External report: issues with conflicting stereochemistry in identifiers #70

Open
schymane opened this issue May 19, 2019 · 4 comments

@schymane
Member

commented May 19, 2019

Copy-paste from an email we received; @meier-rene are you able to follow up? Thx!

Comparing data from different databases, I found some discrepancies in your data. For this entry of your database (https://massbank.eu/MassBank/RecordDisplay.jsp?id=OUF00136), the chemical structure indicates that the configuration of the double bond is not defined. In other databases this configuration is defined, giving the InChIKey CWVRJTMFETXNAD-NCZKRNLISA-N:

See:

PubChem: https://pubchem.ncbi.nlm.nih.gov/compound/9476
ChEBI: https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:95271
ChEMBL: https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL3186431/
EPA: https://comptox.epa.gov/dashboard/dsstoxdb/results?search=DTXSID3024786

Could you please check whether the definition of your entry is correct, that is, whether the chemical structure is the correct one or the structural identifiers are wrong?

The problem is the same for other entries such as FIO00619, JP000136, FIO00623..., where the chemical structure does not match the stereoconfiguration encoded in the InChIKey CWVRJTMFETXNAD-JUHZACGLSA-N. This InChIKey requires the 4 chiral carbons on the ring to be defined. Please see:

ChEBI: https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:16112
CHEMBL: https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL284616/
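
To make the comparison above concrete, here is a minimal sketch of how the InChIKeys quoted in this report can be taken apart. It assumes the usual InChIKey layout: a 14-letter connectivity block, then an 8-letter hash of the remaining layers (stereochemistry, isotopes and similar) followed by a standard flag and a version letter, then a single protonation letter. The function name `split_inchikey` is hypothetical and not part of MassBank.

```python
import re

# Assumed InChIKey layout: 14 letters, 8 + 1 + 1 letters, 1 letter.
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([A-Z])([A-Z])-([A-Z])$")

# Value of the second-block hash when no stereo or isotopic layers exist.
NO_EXTRA_LAYERS = "UHFFFAOY"


def split_inchikey(key):
    """Split an InChIKey into its blocks (hypothetical helper)."""
    match = INCHIKEY_RE.match(key.strip())
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    connectivity, layers, flag, version, protonation = match.groups()
    return {
        "connectivity": connectivity,   # formula and atom connections
        "layers": layers,               # hash of stereo and other layers
        "standard": flag == "S",        # "S" marks a standard InChIKey
        "version": version,
        "protonation": protonation,
        "has_extra_layers": layers != NO_EXTRA_LAYERS,
    }


if __name__ == "__main__":
    # The three InChIKeys that appear in this thread.
    for key in (
        "CWVRJTMFETXNAD-NCZKRNLISA-N",
        "CWVRJTMFETXNAD-JUHZACGLSA-N",
        "CWVRJTMFETXNAD-UHFFFAOYSA-N",
    ):
        parts = split_inchikey(key)
        print(parts["connectivity"], parts["layers"], parts["has_extra_layers"])
```

All three keys share the connectivity block `CWVRJTMFETXNAD`; they differ only in the second block, which is where the stereochemistry is encoded.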

@tsufz

Member

commented May 20, 2019

Well, another good example of why MassBank metadata needs curation. People now approach us frequently, which is a good sign that the community is interested in MassBank. However, if errors are not handled, people will lose trust in the data. We are on the right track.

@schymane

Member Author

commented May 21, 2019

Next answer, plus a list of affected identifiers. @egonw is following this up on the Wikidata side; @meier-rene @Treutler, we will need to follow this up on the MassBank side to address the immediate issue, plus add some ideas on how to catch these cases in the validator. I think we may be able to do this by checking identifiers for consistency and flagging clashes? MassBank/MassBank-web#158

As we agree about the problem of the chemical structure and the structural identifiers (mainly InChIKey and InChI), I can provide a full list of MassBank entries to check. I am curating chemical entries in Wikidata, and I found that somebody uploaded all MassBank entries with InChIKey = CWVRJTMFETXNAD-JUHZACGLSA-N to the wrong item. I did not check all entries mentioned on that page, https://www.wikidata.org/wiki/Q27167119 (scroll down to find the MassBank identifiers), but I think that most of the entries have no defined chiral centers and should have the InChIKey = CWVRJTMFETXNAD-UHFFFAOYSA-N according to the chemical structure.

The list:

JP000136
FIO00618
FIO00619
FIO00620
FIO00621
FIO00622
FIO00623
FIO00624
FIO00625
FIO00626
FIO00627
PB005541
PB006181
PB006182
KO000466
KO000467
KO000468
KO000469
KO000470
KO002577
KO002578
KO002579
KO002580
KO002581
KO008922
KO008923
OUF00135
OUF00136
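
The idea of "checking identifiers for consistency and flagging clashes" could be sketched roughly as below. This is not the MassBank validator; `find_stereo_clashes` and the record IDs `REC_A` to `REC_C` are hypothetical placeholders, and only the InChIKeys are taken from this thread. The sketch groups records by the first (connectivity) block of the InChIKey and reports any group in which the second block differs.

```python
from collections import defaultdict


def find_stereo_clashes(records):
    """Report connectivity blocks that occur with more than one second block.

    `records` maps a record ID to its InChIKey (hypothetical input format).
    Returns {connectivity: {second_block: [record IDs]}} for clashing groups.
    """
    by_skeleton = defaultdict(lambda: defaultdict(list))
    for record_id, key in records.items():
        connectivity, layers, _protonation = key.split("-")
        by_skeleton[connectivity][layers].append(record_id)
    return {
        connectivity: {layers: sorted(ids) for layers, ids in variants.items()}
        for connectivity, variants in by_skeleton.items()
        if len(variants) > 1  # same skeleton, different stereo description
    }


if __name__ == "__main__":
    # Placeholder record IDs; the keys are the ones quoted in this issue.
    records = {
        "REC_A": "CWVRJTMFETXNAD-JUHZACGLSA-N",
        "REC_B": "CWVRJTMFETXNAD-UHFFFAOYSA-N",
        "REC_C": "CWVRJTMFETXNAD-JUHZACGLSA-N",
    }
    for connectivity, variants in find_stereo_clashes(records).items():
        print(connectivity, variants)
```

A clash found this way is only a hint for manual review: distinct stereoisomers legitimately share a connectivity block, so the check cannot decide on its own which record is wrong.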

@egonw


commented May 21, 2019

I want to stress that this is caused neither by our data import into Wikidata nor by MassBank. This example is caused by a merger of two Wikidata items with different InChIKeys. I'm still exploring how this happened, as the person who did it is an experienced chemist. These things happen because of inconsistencies in Wikipedia, and cleaning them up can have downstream effects that are not always easy to detect (without automated, regular tests).

@schymane

Member Author

commented May 21, 2019

So, if this is not caused by problems on the MassBank side, we just need to double-check that these records have structural identifiers that are consistent within themselves (MassBank/MassBank-web#158 (comment)), and if so, we close the issue on our side. Do I understand that correctly?
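
One way to read "consistent within themselves" is sketched below, as a heuristic only. It assumes standard InChI strings, in which the double-bond (`/b`), tetrahedral (`/t`, `/m`, `/s`) and isotopic (`/i`) layers feed the second InChIKey block, so that a key ending in `-UHFFFAOYSA-N` should go with an InChI that has none of these layers. The function name `identifiers_agree` is hypothetical, and the ethanol identifiers are a well-known illustration, not data from this issue. A full check would regenerate the InChIKey from the structure with an InChI-capable toolkit.

```python
import re

# Layer prefixes assumed to contribute to the second InChIKey block.
EXTRA_LAYER_RE = re.compile(r"/[btmsi]")

# Second-block hash when no such layers are present.
NO_EXTRA_LAYERS = "UHFFFAOY"


def identifiers_agree(inchi, inchikey):
    """Heuristic: do InChI and InChIKey agree on having stereo/isotope layers?

    This does not verify the hashes themselves, only that both identifiers
    tell the same story about whether extra layers exist.
    """
    inchi_has_layers = EXTRA_LAYER_RE.search(inchi) is not None
    key_has_layers = inchikey.split("-")[1][:8] != NO_EXTRA_LAYERS
    return inchi_has_layers == key_has_layers


if __name__ == "__main__":
    # Ethanol has no stereo layers, and its key has the "empty" second block.
    ethanol_inchi = "InChI=1S/C2H6O/c1-2-3/h3H,1-2H3"
    print(identifiers_agree(ethanol_inchi, "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"))
```

A record whose InChI carries no stereo layers but whose InChIKey has a non-empty second block (or the reverse) would fail this check and could be flagged for curation.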
