Inconsistency in data for records on MassBank vs MoNA #63
Comments
As for that particular example, we render from the SMILES. The SMILES has no stereochemistry but the InChIKey does, which is inconsistent ...
Does this record pass validation, @meier-rene?
I am not sure which field MoNA renders from. There are no hidden fields on our side.
…-------------------------------------------
PI: EnvCheminf @ LCSB
FNR ATTRACT Fellow
emma.schymanski@uni.lu
ssmehta commented Apr 27, 2019
Just to note that in this case MoNA would generate the displayed structure from the InChI. In general, it depends on what is provided: MOL data is given preference for validation and display purposes, followed by InChI, then SMILES, and finally falling back to an InChIKey/CSID/PubChem CID/CAS lookup if they are provided.
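The order of preference described above can be sketched roughly as follows. This is only an illustration of the stated priority, not MoNA's actual code: the record is assumed to be a plain dictionary, and the field names (`mol`, `inchi`, `smiles`, `inchikey`, `csid`, `pubchem_cid`, `cas`) as well as the function name are hypothetical.

```python
from typing import Optional

# Priority as described in the comment: MOL data first, then InChI, then SMILES.
# All field names here are hypothetical, chosen only for illustration.
STRUCTURE_PRIORITY = ["mol", "inchi", "smiles"]

# Identifier lookups are only a fallback when no structure field is provided.
LOOKUP_PRIORITY = ["inchikey", "csid", "pubchem_cid", "cas"]


def pick_structure_source(record: dict) -> Optional[str]:
    """Return the name of the first non-empty field in priority order, or None."""
    for field in STRUCTURE_PRIORITY + LOOKUP_PRIORITY:
        value = record.get(field)
        if value is not None and str(value).strip():
            return field
    return None
```

Under this sketch, a record that carries both an InChI and a SMILES would be depicted from the InChI, which matches the stereo depiction seen on MoNA for a record whose SMILES has no stereochemistry.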
Yes, it passes validation. At the moment we do not check that the molecular structures in the InChI and the SMILES are identical. Only the molecular formula derived from them is compared to the molecular formula given in the record. There are probably hundreds of inconsistencies of this kind in the data and I cannot think of an automatic procedure to fix them properly. That's why I haven't implemented this identity check.
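Short of a full identity check, a coarse flag for this particular kind of mismatch might look like the sketch below. It is not part of the MassBank validator, and the function names are hypothetical. It assumes only that stereochemistry in an InChI appears in the `/b` and `/t` layers, and that stereochemistry in a SMILES appears as `@`, `/` or `\`. It compares whether stereo information is present at all, not whether the two structures agree; a real comparison would need a cheminformatics toolkit such as CDK.

```python
def inchi_has_stereo(inchi: str) -> bool:
    """True if the InChI carries a double-bond (/b) or tetrahedral (/t) stereo layer."""
    # Skip the version prefix ("InChI=1S") and the formula layer.
    layers = inchi.split("/")[2:]
    return any(layer[:1] in ("b", "t") for layer in layers)


def smiles_has_stereo(smiles: str) -> bool:
    """True if the SMILES contains chirality (@) or double-bond geometry (/ or \\) marks."""
    return any(mark in smiles for mark in ("@", "/", "\\"))


def stereo_flags_differ(inchi: str, smiles: str) -> bool:
    """Flag records where one representation has stereo information and the other has none."""
    return inchi_has_stereo(inchi) != smiles_has_stereo(smiles)


# Example with L-alanine: the InChI defines the stereocentre, the first SMILES does not.
ALANINE_INCHI = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
print(stereo_flags_differ(ALANINE_INCHI, "CC(N)C(O)=O"))
print(stereo_flags_differ(ALANINE_INCHI, "C[C@H](N)C(O)=O"))
```

Such a flag only lists candidate records for manual curation; it does not decide which of the two representations is correct.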
Thanks for clarifying, @ssmehta!
I would like to help in any way I can to map/curate/collapse the data and provide the appropriate SMILES strings to use with CDK depiction. I understand it will be an incremental effort and take time, but @schymane and I have been working iteratively on data streams for a couple of years now. For example, having the correct structure for cholesterol should be easily achievable: https://massbank.eu/MassBank/Result.jsp?compound=cholesterol&op1=and&mz=&tol=0.3&op2=and&formula=&type=quick&searchType=keyword&sortKey=not&sortAction=1&pageNo=1&exec=&inst_grp=ESI&inst=CE-ESI-TOF&inst=ESI-ITFT&inst=ESI-ITTOF&inst=ESI-QTOF&inst=ESI-TOF&inst=LC-ESI-IT&inst=LC-ESI-ITFT&inst=LC-ESI-ITTOF&inst=LC-ESI-Q&inst=LC-ESI-QFT&inst=LC-ESI-QIT&inst=LC-ESI-QQ&inst=LC-ESI-QTOF&inst=LC-ESI-TOF&ms=MS2&ion=0
I understand that we would likely not have DTXSIDs for all chemicals in the combined MassBank EU and JP, and that it could be very difficult to curate some of the data. However, I think we can make good progress in providing fully defined stereo forms of SMILES, InChIs, and molfiles if necessary, and mapped DTXSIDs for more records than are available at present. For tasks like this I am willing to dedicate some time every day to check and curate as appropriate. It would be best to coordinate the process through @schymane, based on our previous experience of doing this with other datasets.
ChemConnector commented Apr 26, 2019
I am comparing the MoNA record at http://mona.fiehnlab.ucdavis.edu/spectra/display/BSU00002 with the MassBank record at https://massbank.eu/MassBank/RecordDisplay.jsp?id=BSU00002
I see stereochem in the structure depiction on MoNA but not in the MassBank record. I assume that InChIs are the basis of the stereo on MoNA but the SMILES has no stereochem on MassBank. The inconsistency is confusing. Is there a StereoSMILES in MassBank that is not displayed?