Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closes #92 New semi-enzymatic function with new notation. #94

Merged
merged 1 commit into from
Nov 3, 2023

Conversation

FriedLabJHU
Copy link
Contributor

This update adds a semi-enzymatic function with new notation. "?KR" would produce all non-specific digests between all tryptic peptides. This update can also account for missed cleavages.

Updated the broken code in my previously PR

if len >= self.min_len && len <= self.max_len && len > 0 && seen.insert(sequence) {
digests.push(Digest {
sequence: sequence.into(),
missed_cleavages: 1,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be missed_cleavages: site.missed_cleavages,

}
}
v
}
}
}

fn missed_cleavage_sites(&self, sites: &mut Vec<DigestSite>, missed_cleavages: &u8) -> Vec<DigestSite> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick, but missed_cleavages should just be u8, not &u8 (no need to pass by reference for a single byte/copy-type that is small)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Changed.

for cut_site in start..end {
semi_enzymatic_sites.push(DigestSite{
site: start..cut_site,
missed_cleavages: site.missed_cleavages
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct? It seems to me that the number of missed_cleavages should be split across both children DigestSites - as it currently stands, a peptide with 2 missed cleavages will produce 2 different peptides, each with 2 missed cleavages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct. Currently, this section of code relies on seen seeing the non-missed cleavage semi-enzymatic digested peptides first (with the correct number of missed_cleavages) followed by those from peptides with missed cleavages. This preserves the correct number of missed_cleavages for all peptides since duplicate peptides are formed but are likely to be seen from "parent" peptides with the lowest possible missed_cleavages number first; however, it is not as robust as I would like it to be.

One optimization is to have DigestSite keep track of the location of missed cleaves and adjust the semi-enzymatic peptide's number accordingly. What do you think?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, that makes sense, and checks out with the tests too. I think we can probably leave as-is for now - I'd like to refactor digest/peptide generation at some point anyway

_ => self.missed_cleavage_sites(&mut sites, &missed_cleavages)
};

let mut sites = match self.is_semi_enzymatic() {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can probably just use self.enzyme.is_semi_enzymatic and eliminate this redundant function (which should hopefully be inlined by the compiler anyway)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I keep getting error[E0609]: no field enzymeon type&EnzymeParameters`` at compile time when I match to self.enzyme.is_semi_enzymatic. That was my first instinct until I kept getting the error.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it's because it's an Option - will need to be something like self.enzyme.map(|e| e.is_semi_enzymatic).unwrap_or(false)... which is not as concise as I initially imagined. We can just leave it as-is for now

@lazear lazear force-pushed the FriedLabJHU-patch-1 branch from 02cabe3 to 53e600a Compare October 26, 2023 05:21
@lazear
Copy link
Owner

lazear commented Oct 26, 2023

Got through some initial testing and everything looks good! Will do a couple more tests when I get some more time and then merge.

How do you feel about specifying semi-enzymatic searching by a parameter instead of modifying the enzyme string? I think this is more in line with how other tools are configured
e.g.:

"enzyme": {
        "missed_cleavages": 2,
        "min_len": 7,
        "max_len": 50,
        "cleave_at": "KR",
        "restrict": "P",
        "semi_enzymatic": true  // <-- new param
    },

- propagate semi_enzymatic through to output
- add "database.enzyme.semi_enzymatic" parameter, property-based tests for missed cleavages
@lazear lazear force-pushed the FriedLabJHU-patch-1 branch from 62c9ca5 to fa1ceb9 Compare November 3, 2023 04:47
@lazear lazear merged commit 7abbb08 into lazear:master Nov 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants