I was coming to the issues section just now to ask almost exactly the same question!
Like OP, I've got a queryDB and a targetDB.
I ran mmseqs search queryDB targetDB resultDB ./tmp and am now trying to figure out how to extract everything from queryDB that is NOT in resultDB.
Looking through the documentation on the database structure, I can sort of see how things are linked together, but I don't see a clear way to pick out the entries that are not in the result DB.
In my case these are FASTA entries, so I guess I could brute-force it by converting resultDB to a *.m8 file and then parsing that together with all the input sequences from queryDB. But that is a massively intensive and inefficient process, and I hope we can find an integrated way to achieve this!
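For reference, the brute-force route described above can be sketched in a few lines of Python. This is only a sketch under assumptions, not an MMseqs2 feature: it assumes the result DB was exported as a tab-separated .m8 file (for example with mmseqs convertalis), that the first column holds the query identifier, and that this identifier equals the first whitespace-delimited token of the FASTA header. The function names are hypothetical.

```python
def fasta_ids(fasta_lines):
    """Yield the identifier (first token after '>') of each FASTA record."""
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                yield tokens[0]


def queries_without_hits(fasta_lines, m8_lines):
    """Return query IDs from the FASTA input that never appear in the .m8 file.

    Assumption: column 1 of each .m8 line is the query identifier.
    """
    hit_queries = {
        line.strip().split("\t")[0] for line in m8_lines if line.strip()
    }
    return [qid for qid in fasta_ids(fasta_lines) if qid not in hit_queries]


# Tiny in-memory example; with real files, pass open file handles instead.
example_fasta = [">q1 some protein", "MKVL", ">q2", "MAAT", ">q3", "MTTS"]
example_m8 = ["q1\tt9\t0.95\t4\t0\t0\t1\t4\t1\t4\t1e-5\t20"]
no_hit_ids = queries_without_hits(example_fasta, example_m8)
```

A query without a reported hit only means the search found nothing at its sensitivity and E-value settings, so the list is an approximation of "not similar to anything in targetDB".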
Hi!
I want to use MMseqs2 to obtain protein sequences from the query dataset that are completely dissimilar to the protein sequences in the target dataset (e.g., with a similarity threshold of 0.3). What should I do? Can I achieve this goal using the following code:
```
mmseqs search queryDB targetDB resultDB tmp
mmseqs filterresult queryDB targetDB resultDB resultDB0.3 --max-seq-id 0.3
mmseqs createtsv queryDB targetDB resultDB0.3 resultDB0.3.tsv
```
The results I obtained with the above code (resultDB0.3.tsv) only show query sequences that are below the threshold for certain proteins in the target dataset, rather than for all of them, which is confusing to me. Did I make a mistake?
Thank you!
Best wishes!
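One way to think about the goal: filtering individual hits leaves query-target pairs, whereas "dissimilar to every target" is a decision per query, namely that its best identity over all of its hits stays below the threshold. The sketch below illustrates that idea in Python. It is not an MMseqs2 command. It assumes the unfiltered search result was exported to .m8 (for example with mmseqs convertalis), that column 3 holds the sequence identity as a fraction between 0 and 1, and that queries without any hit count as identity 0. The function names are hypothetical.

```python
def best_identity_per_query(m8_lines):
    """Map each query ID to its highest identity over all reported hits.

    Assumption: column 1 is the query ID and column 3 is the identity
    expressed as a fraction in [0, 1].
    """
    best = {}
    for line in m8_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        query, identity = fields[0], float(fields[2])
        best[query] = max(best.get(query, 0.0), identity)
    return best


def dissimilar_queries(query_ids, m8_lines, threshold=0.3):
    """Keep queries whose best hit identity is below the threshold.

    Queries with no hit at all are treated as identity 0.0 and kept.
    """
    best = best_identity_per_query(m8_lines)
    return [q for q in query_ids if best.get(q, 0.0) < threshold]


# q1 has one weak and one strong hit, q2 only a weak hit, q3 no hit.
example_m8 = [
    "q1\tt1\t0.25",
    "q1\tt2\t0.80",
    "q2\tt3\t0.28",
]
kept = dissimilar_queries(["q1", "q2", "q3"], example_m8, threshold=0.3)
```

Here q1 is dropped because one of its hits exceeds the threshold even though another is below it, which is the case a per-hit filter cannot express.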