-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closes #92 New semi-enzymatic function with new notation. #94
Conversation
crates/sage/src/enzyme.rs
Outdated
if len >= self.min_len && len <= self.max_len && len > 0 && seen.insert(sequence) { | ||
digests.push(Digest { | ||
sequence: sequence.into(), | ||
missed_cleavages: 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be missed_cleavages: site.missed_cleavages,
crates/sage/src/enzyme.rs
Outdated
} | ||
} | ||
v | ||
} | ||
} | ||
} | ||
|
||
fn missed_cleavage_sites(&self, sites: &mut Vec<DigestSite>, missed_cleavages: &u8) -> Vec<DigestSite> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick, but missed_cleavages should just be u8
, not &u8
(no need to pass by reference for a single byte/copy-type that is small)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Changed.
crates/sage/src/enzyme.rs
Outdated
for cut_site in start..end { | ||
semi_enzymatic_sites.push(DigestSite{ | ||
site: start..cut_site, | ||
missed_cleavages: site.missed_cleavages |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct? It seems to me that the number of missed_cleavages should be split across both children DigestSites - as it currently stands, a peptide with 2 missed cleavages will produce 2 different peptides, each with 2 missed cleavages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct. Currently, this section of code relies on seen
seeing the non-missed cleavage semi-enzymatic digested peptides first (with the correct number of missed_cleavages
) followed by those from peptides with missed cleavages. This preserves the correct number of missed_cleavages
for all peptides since duplicate peptides are formed but are likely to be seen from "parent" peptides with the lowest possible missed_cleavages
number first; however, it is not as robust as I would like it to be.
One optimization is to have DigestSite keep track of the location of missed cleaves and adjust the semi-enzymatic peptide's number accordingly. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, that makes sense, and checks out with the tests too. I think we can probably leave as-is for now - I'd like to refactor digest/peptide generation at some point anyway
_ => self.missed_cleavage_sites(&mut sites, &missed_cleavages) | ||
}; | ||
|
||
let mut sites = match self.is_semi_enzymatic() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can probably just use self.enzyme.is_semi_enzymatic
and eliminate this redundant function (which should hopefully be inlined by the compiler anyway)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I keep getting error[E0609]: no field
enzymeon type
&EnzymeParameters`` at compile time when I match to self.enzyme.is_semi_enzymatic
. That was my first instinct until I kept getting the error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, it's because it's an Option
- will need to be something like self.enzyme.map(|e| e.is_semi_enzymatic).unwrap_or(false)
... which is not as concise as I initially imagined. We can just leave it as-is for now
02cabe3
to
53e600a
Compare
Got through some initial testing and everything looks good! Will do a couple more tests when I get some more time and then merge. How do you feel about specifying semi-enzymatic searching by a parameter instead of modifying the enzyme string? I think this is more in line with how other tools are configured "enzyme": {
"missed_cleavages": 2,
"min_len": 7,
"max_len": 50,
"cleave_at": "KR",
"restrict": "P",
"semi_enzymatic": true // <-- new param
}, |
- propagate semi_enzymatic through to output - add "database.enzyme.semi_enzymatic" parameter, property-based tests for missed cleavages
62c9ca5
to
fa1ceb9
Compare
This update adds a semi-enzymatic function with new notation. "?KR" would produce all non-specific digests between all tryptic peptides. This update can also account for missed cleavages.
Updated the broken code in my previously PR