External report: issues with conflicting stereochemistry in identifiers #70
Comments
Well, another good example of why MassBank metadata needs curation. People approach us frequently now, and this is a good sign that the community is interested in MassBank. However, if errors are not handled, people will lose trust in its reliability. We are on the right track.
Next answer, plus a list of affected identifiers. @egonw is following this up on the Wikidata side; @meier-rene @Treutler, we will need to follow this up on the MassBank side to address the immediate issue, plus add some ideas on how to catch these cases in the validator. I think we may be able to do this by checking identifiers for consistency and flagging clashes? MassBank/MassBank-web#158
The list: JP000136
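As a rough illustration of the "check identifiers for consistency and flag clashes" idea, here is a minimal Python sketch. It is not the MassBank validator: the function name `find_stereo_clashes`, the source labels, and the input layout (a mapping from an identifier source to the InChIKey it reports) are all assumptions made for this example. It relies only on the fact that the first 14-letter block of an InChIKey encodes the molecular skeleton, so keys that share the first block but differ afterwards disagree in stereochemistry or another non-skeleton layer.

```python
from collections import defaultdict
from typing import Dict, Set


def find_stereo_clashes(keys_by_source: Dict[str, str]) -> Dict[str, Set[str]]:
    """Return skeleton blocks that are reported with more than one full InChIKey.

    keys_by_source is a hypothetical layout: source label -> InChIKey.
    """
    groups: Dict[str, Set[str]] = defaultdict(set)
    for key in keys_by_source.values():
        skeleton = key.split("-")[0]  # first block: connectivity only
        groups[skeleton].add(key)
    # A clash is one skeleton carrying several distinct full keys.
    return {skel: keys for skel, keys in groups.items() if len(keys) > 1}


# The two InChIKeys quoted in the report, attached to placeholder sources.
example = {
    "source_a": "CWVRJTMFETXNAD-NCZKRNLISA-N",
    "source_b": "CWVRJTMFETXNAD-JUHZACGLSA-N",
}
for skeleton, keys in find_stereo_clashes(example).items():
    print(skeleton, sorted(keys))
```

A flagged clash only says that two identifiers disagree; deciding which one is right still needs a look at the structure itself.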
egonw commented May 21, 2019
I want to stress that this is caused neither by our data import into Wikidata nor by MassBank. This example is caused by a merger of two Wikidata items with different InChIKeys. I'm still exploring how this happened, as the person who did it is an experienced chemist. These things do happen because of inconsistencies in Wikipedia, and if you clean them up, it can have downstream effects that are not always easy to detect (without automated, regular tests).
So, if this is not caused by problems on the MassBank side, we just need to double-check that these records have structural identifiers that are consistent within themselves (MassBank/MassBank-web#158 (comment)), and if so, we close the issue on our side. Do I understand that correctly?
schymane commented May 19, 2019
Copy-paste from email received; @meier-rene are you able to follow-up? Thx!
Comparing data from different databases, I found some discrepancies in your data. For the following entry of your database (https://massbank.eu/MassBank/RecordDisplay.jsp?id=OUF00136), the chemical structure indicates that the configuration of the double bond is not defined. Other databases define this configuration, giving InChIKey CWVRJTMFETXNAD-NCZKRNLISA-N:
See:
PubChem: https://pubchem.ncbi.nlm.nih.gov/compound/9476
ChEBI: https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:95271
ChEMBL: https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL3186431/
EPA: https://comptox.epa.gov/dashboard/dsstoxdb/results?search=DTXSID3024786
Could you please check whether the definition of your entry is correct, that is, whether the chemical structure is the correct one or whether the structural identifiers are wrong?
The same problem affects other entries such as FIO00619, JP000136, FIO00623..., where the chemical structure does not match the stereoconfiguration behind InChIKey CWVRJTMFETXNAD-JUHZACGLSA-N. This InChIKey requires all 4 chiral carbons on the ring to be defined. Please see:
ChEBI: https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:16112
ChEMBL: https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL284616/
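To make the difference between the two InChIKeys in this report concrete, the sketch below splits a key into its blocks and compares them. It is illustrative only: `split_inchikey`, `InChIKeyParts`, and `differ_only_beyond_skeleton` are hypothetical helper names, and the code works purely on the key strings (14-letter skeleton block, 8-letter block for the remaining layers such as stereochemistry, a standard/non-standard flag, a version letter, and a protonation letter). It does not look at the structures themselves.

```python
import re
from typing import NamedTuple

# 14 letters, then 8 letters + flag + version, then 1 protonation letter.
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([A-Z])([A-Z])-([A-Z])$")


class InChIKeyParts(NamedTuple):
    skeleton: str      # hash of formula and connectivity
    layers: str        # hash of remaining layers (stereo, isotopes, ...)
    flag: str          # "S" for a standard InChIKey, "N" for non-standard
    version: str       # InChI version letter
    protonation: str   # "N" means neutral


def split_inchikey(key: str) -> InChIKeyParts:
    """Split an InChIKey string into its blocks, or raise ValueError."""
    match = INCHIKEY_RE.match(key.strip())
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    return InChIKeyParts(*match.groups())


def differ_only_beyond_skeleton(key_a: str, key_b: str) -> bool:
    """True if both keys share the skeleton block but differ in the layers block."""
    a, b = split_inchikey(key_a), split_inchikey(key_b)
    return a.skeleton == b.skeleton and a.layers != b.layers


# The two keys quoted in the report above.
print(differ_only_beyond_skeleton(
    "CWVRJTMFETXNAD-NCZKRNLISA-N",
    "CWVRJTMFETXNAD-JUHZACGLSA-N",
))
```

Both keys share the first block `CWVRJTMFETXNAD`, so they describe the same connectivity and differ only in what the second block encodes.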