Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign upPatch/get mass #547
Conversation
johnmay
added some commits
Feb 18, 2019
This comment has been minimized.
This comment has been minimized.
@schymane... hard to find you. Will also ping in CDK-R/cdkr#65 |
egonw
requested changes
Feb 18, 2019
Thanks, I like the API. I also like your suggestion of a |
* a distribution of isotopes by splitting the set of isotopes in two, | ||
* the one under consideration (specified by 'idx') and the remaining to be | ||
* considered ('>idx'). The inflection point is calculate as 'k' | ||
* &le 'count' isotopes added. If there are remaining isotopes the method |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
schymane
commented
Mar 3, 2019
So, it seems you have built in 4 cases and we had 5 in the latest changes @rajarshi and I worked on in rcdk ... the missing one is the mass integer (maybe worth adding for our nominal mass friends?). Our test cases don't overlap which is making the terminology match confusing. https://github.com/CDK-R/cdkr/blob/massfuncs/rcdk/inst/unitTests/runit.mass.R I cannot figure out if these overlap completely with the different terms and the unfamiliar masses ... I suspect one option doesn't - and I think once we have all the options we need, we should come to an agreement on names. It's not clear for mass spec people which one is monoisotopic mass in our rcdk terms, it is for yours, but it's not clear if and when isotopes are ignored and this is very confusing for end users. Are you able to implement a couple of my trickier test cases? I chose them to check the implicit/explicit H handling (a nasty case for deuterium, especially deuterium exchange) and the isotope handling as well. |
This comment has been minimized.
This comment has been minimized.
I kind of think MonoIsotopic should also me modified such that explicitly specified isotopes are considered (same as MolWeight) currently it just throws them away.. In terms of naming, happy to modify and drop the Weight from AverageWeight as needed but "total.exact" and "total.natural" are non starters for me, works in R but not in Java - we can't have the "dots" and TotalExact TotalNautral meh. As you say "MolWeight" is too ubiquitous, and we have to think about the community in general rather just MS so I would keep that, it's currently the default which I think makes sense.
Not really a fan of that duplication, think it makes it more confusing. |
This comment has been minimized.
This comment has been minimized.
schymane
commented
Mar 3, 2019
I agree that it's "wrong" but agree we have to leave it in for backwards compatibility.
I confess I was severely undercaffeinated this morning and found those definitions in my emails after giving my feedback, sorry :-/ and thanks for pasting again
OK this is good, behaviour is "normal" unless we specify, and the name is what most would consider logical.
As you said, it's needed for backwards compatibility and having the "weight" in there will connect it back to MolWeight. I would not know a chemically correct term for this offhand since conceptually it's not really logical. Suggest to add "(this option is retained for backwards compatibility)" or similar to the documentation, so purists that have an issue with a rather illogical calculation know the reasoning?
This is an absolute killer for mass spec if this is the only exact mass option and we'd need an additional function that gives us the exact mass for specified isotopes. So that's the case that was missing. If we go strictly by definition (which is from the IUPAC Gold Book): Then we will need two cases. CDK's MonoIsotopic option should match the IUPAC definition (and does if your major isotope the principal by abundance - maybe adjust the definitions to be as close to the IUPAC as possible?), then we would need another option ExactMass (or similar) to return the exact mass of the exact isotopic combination specified. , i.e
Agree with the need for this and like the name. It says what it is.
See above - we should match the IUPAC definition.
Sorry I was meaning we should match terms in terms of "getting our words consistent" and ignoring the obvious styling differences between the languages. Whether we use caps or dots to do the separation doesn't really matter (and we'll need to keep the dots to keep the syntax in rcdk consistent) but I think most humans can match AvgWeight v avg.weight and ExactMass v exact.mass ;-) and this is what we need to get consistent.
Fine with me - I agree in this case that it could add too much confusion. I would strongly suggest to put in some of the examples I suggested, or similar, as they were selected to show the need for the two different styles of exact mass - and will also check the H behaviour. Or I can give it a go once we have all the options if you trust my copy-paste work ;-) |
This comment has been minimized.
This comment has been minimized.
schymane
commented
Mar 3, 2019
I spoke too soon, I find the definition a little unclear, suggest a minor wording change (if I understood it correctly...seems to match the enviPat output at least ... ):
For the record: |
This comment has been minimized.
This comment has been minimized.
Sorry miss-documented! I propose the following, we can have ugly names for the ones we don't want people to use:
|
This comment has been minimized.
This comment has been minimized.
schymane
commented
Mar 3, 2019
I'm not yet convinced it does ... comments below ... and I've finally found the good overview:
I find MolWeight clearer than Average. Molecular weight is the term commonly used. Note at the above site, MolWeight (which is calls MolMass ;-) ) and Average mass are defined separately. Would much rather MolWeight for clarity.
I'm concerned here that you deviate from the IUPAC definition, because it specifies the exact isotopic state for this mass: I think this should be "Exact" (given your comment you don't want repetition).
OK for ugly name here but then please MolWeightIgnoringSpecified? Or rather IgnoringIsotopes?
See above I think this should be MonoIsotopic (it gives people the MonoIso mass irrespective of the input isotopes, which is exactly in line with the IUPAC definition; Exact will give MonoIso only if isotopes are unspecified) - there will be cases for mass spec where this is very useful
All good with this one still as far as I can tell
Looks like you hit the nail on the head ... Do you see a need for the Nominal Mass in addition? As far as I can tell it's just a simple operation from MonoIso and shouldn't need to be hard coded (and becoming slowly obsolete ...) |
johnmay commentedFeb 18, 2019
Adds
getMass()
functions to AtomContainerManipulator and MolecularFormulaManipulator unify the existing calculations. Existing methods are deprecated pointing to the new API.Some open questions, I used integers for the options but a new 'MassType' enum might be useful. I really don't like adding new class but in this case might make sense. Naming of the options is flexible, would be good to get Emma's thoughts on this (can't find her GitHub handle ATM).
The tests summarise it nicely: