CTS retrieving non-live CIDs #225
Hi Emma,
This is a great enhancement. Using the source data source is always the
best option.
Have a nice weekend,
Tobi
getPcId will also return non-live CIDs for non-standard tautomeric forms. We can fix this using a function in RChemMass, but we should make sure we upgrade the getPcId function to automatically do this. On the "todo" list ...
@meowcat I will need to upgrade getPcId to make sure it returns "live" CIDs; this is fine, but where can I check whether we grab PubChem CIDs from PubChem vs CTS? If we already use getPcId (not CTS), then all I need to do is fix that function and this issue is solved. Thanks.
So:
Line 577 in 611b785
Lines 602..608 are where we get CTS data. To do: what do we still need from CTS at this point?
Lines 602 to 608 in 611b785
Lines 775-786 are where we decide which PubChem ID to use. I guess you want to drop CTS completely as an option?
Lines 775 to 786 in 611b785
Then the actual data retrieval from PubChem is in gatherPubChem, where getPcId is called:
Lines 454 to 466 in 611b785
getPcId is then the function in webAccess.R:
Lines 109 to 144 in 611b785
@meowcat I've created a new branch: I've added getPCIDs.CIDtype, adjusted getPcId and createMassBank.R (and updated my old email address). I'm stuck on the documentation - see emails.
Just pushed 445b432 thanks to @MaliRemorker for docs tips.
schymane commented Dec 13, 2019
We should make PubChem, not CTS, our default source of CIDs. There are too many discrepancies/errors cropping up.
@meier-rene we may have to check the "status" of CIDs during validation, to catch and fix.
Example from freshly-created infolist:
https://pubchem.ncbi.nlm.nih.gov/compound/4644