Add DTXSIDs to all MassBank records with InChIKey match #66

Open
schymane opened this issue May 7, 2019 · 11 comments

@schymane
Member

commented May 7, 2019

@meier-rene @Treutler The EPA have set up a basic service that should allow retrieval of DTXSIDs by InChIKey. Can you look into implementing this on the database end, so that DTXSIDs are added to all records with matching entries for now? I will post a separate issue to get this into RMassBank and linked up in MassBank-web.
The identifier is already in our record format as
CH$LINK: COMPTOX DTXSID50274017
(https://github.com/MassBank/MassBank-web/blob/master/Documentation/MassBankRecordFormat.md)

https://actorws.epa.gov/actorws/chemIdentifier/v01/resolve?identifier=IKHGUXGNUITLKF-UHFFFAOYSA-N
https://actorws.epa.gov/actorws/chemIdentifier/v01/resolve.json?identifier=IKHGUXGNUITLKF-UHFFFAOYSA-N
https://actorws.epa.gov/actorws/chemIdentifier/v01/resolve.xml?identifier=IKHGUXGNUITLKF-UHFFFAOYSA-N
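
For anyone scripting against the links above, here is a minimal sketch of building the resolver URL for an arbitrary InChIKey. Only the base URL, the `.json`/`.xml` suffixes and the `identifier` query parameter come from the examples above; the helper name `resolver_url` and the up-front InChIKey format check are assumptions added for illustration. The sketch does not call the service or assume anything about the response layout.

```python
import re
from urllib.parse import urlencode

# Base URL copied from the example links above.
RESOLVE_BASE = "https://actorws.epa.gov/actorws/chemIdentifier/v01/resolve"

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter, hyphen-separated.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def resolver_url(inchikey, fmt=""):
    """Build a resolver URL (hypothetical helper).

    fmt is "" for the default view, or "json" / "xml" for the
    machine-readable variants shown in the example links.
    """
    if not INCHIKEY_RE.match(inchikey):
        raise ValueError(f"not a valid InChIKey: {inchikey!r}")
    if fmt not in ("", "json", "xml"):
        raise ValueError(f"unsupported format: {fmt!r}")
    suffix = f".{fmt}" if fmt else ""
    query = urlencode({"identifier": inchikey})
    return f"{RESOLVE_BASE}{suffix}?{query}"


print(resolver_url("IKHGUXGNUITLKF-UHFFFAOYSA-N", "json"))
```

Validating the key before building the URL keeps obviously malformed identifiers from being sent to the service at all.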

Please send any feedback about the service to @ChemConnector.

Thanks!

@meier-rene

Collaborator

commented May 7, 2019

I will take care of this.

And I would like to give a short update on a related topic: I curated all records that have any structural information available so that they contain a proper InChI and InChIKey. There are just 900 records left which don't have structural information, only chemical names.

@schymane

Member Author

commented May 7, 2019

Great!
Can you post a list somewhere of the 900, with basic details like name, accession etc? Some of them are "tentative", but I am not sure we have that many ... I would be curious ... Thanks!

@meier-rene

Collaborator

commented May 8, 2019

noStructure.txt
The list of all records without a Structure given.

@schymane

Member Author

commented May 8, 2019

Oh interesting ... so the EawagAdditional ones almost certainly don't have a structure because they are tentative records ... but I see a lot from BS, Fac_Eng_Univ_Tokyo (major culprit) and even IPB Halle! @sneumann should be able to comment on the latter ... do you see a systematic issue with BS and Fac_Eng_Univ_Tokyo (one critical identifier missing that we could fill in from other available information)?

@meier-rene

Collaborator

commented May 8, 2019

There are roughly 60 records with other database identifiers, such as CAS numbers, which I could use to retrieve proper chemical information. The remaining records have only chemical names. These need manual lookup, which might be unsuccessful in some cases. This will take some time...
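
A rough sketch of the triage described here: sorting records into those with structural information, those with only another database identifier, and those with names only. It is not the code used for the curation. The field tags `CH$IUPAC` and `CH$SMILES`, the `N/A` placeholder for missing values, and the function name `classify_record` are assumptions for illustration; only the `CH$LINK: <DATABASE> <ID>` layout is taken from the record format line quoted earlier in this issue.

```python
def classify_record(record_text):
    """Return 'structure', 'other_id' or 'name_only' (hypothetical helper)."""
    has_structure = False
    has_other_id = False
    for line in record_text.splitlines():
        tag, _, value = line.partition(": ")
        value = value.strip()
        if tag in ("CH$IUPAC", "CH$SMILES"):
            # Assumed convention: "N/A" marks a missing value.
            if value and value != "N/A":
                has_structure = True
        elif tag == "CH$LINK":
            database, _, identifier = value.partition(" ")
            if not identifier.strip():
                continue
            if database == "INCHIKEY":
                has_structure = True
            else:
                # Any other database link (e.g. CAS) could be used for lookup.
                has_other_id = True
    if has_structure:
        return "structure"
    if has_other_id:
        return "other_id"
    return "name_only"


example = "CH$NAME: Example compound\nCH$SMILES: N/A\nCH$IUPAC: N/A\nCH$LINK: CAS 80-05-7"
print(classify_record(example))  # prints "other_id"
```

Records that come out as `other_id` are candidates for automated lookup; `name_only` records are the ones that need manual work.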

Different topic:
Could someone please explain the difference between DTXCID and DTXSID? The code for adding COMPTOX IDs is nearly finished.

@schymane

Member Author

commented May 8, 2019

C = compound/chemical and S = substance. The "C" entries are the unique chemicals (put simply, roughly the "MS-ready" forms) and the "S" entries are the official database entries.
Effectively, we should always use and link via the substance identifier, the DTXSID.


Check out infoboxes here (@ChemConnector note inconsistencies in the DTXCID!)
https://comptox.epa.gov/dashboard/dsstoxdb/batch_search

@meier-rene

Collaborator

commented May 8, 2019

Sorry, I didn't understand this concept.

In PubChem we have the SID, which is something like the label on a bottle of chemicals and could potentially be a mixture, and we have the CID, which is a unique compound represented by exactly one formula (like you would draw on paper).

That's why I have more questions:
Does this mean that there might be several DTXSIDs for one InChIKey?
Is there a 1 to n relation between DTXCID and DTXSID like in PubChem?

@schymane

Member Author

commented May 8, 2019

As far as I'm aware it's one DTXSID per InChIKey. The service should return one DTXSID for one InChIKey request, and this is what @ChemConnector asked us to do: use InChIKey to DTXSID to add these identifiers to MassBank (therefore I'm assuming this is the most robust way in his opinion, and from my experience I'd agree).

One DTXSID may have multiple DTXCIDs associated with it. It's a bit different to the PubChem construct. IMHO we should not yet try mapping on DTXCIDs, as they don't have the full functionality associated with them that the DTXSIDs have; until recently they were hidden entirely.
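
Since the one-DTXSID-per-InChIKey relation is stated here as an expectation rather than a guarantee, a small check over the collected mappings could confirm it. The sketch below is purely illustrative: the function name `ambiguous_inchikeys` and the input shape (a list of InChIKey/DTXSID pairs gathered from resolver results) are assumptions.

```python
from collections import defaultdict


def ambiguous_inchikeys(pairs):
    """Return the InChIKeys that map to more than one DTXSID.

    pairs is an iterable of (inchikey, dtxsid) tuples (hypothetical input).
    The result maps each offending InChIKey to its sorted DTXSIDs.
    """
    seen = defaultdict(set)
    for inchikey, dtxsid in pairs:
        seen[inchikey].add(dtxsid)
    return {key: sorted(ids) for key, ids in seen.items() if len(ids) > 1}
```

An empty result means the collected data is consistent with one DTXSID per InChIKey; anything else would be worth reporting back.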

Some examples:
https://comptox.epa.gov/dashboard/dsstoxdb/results?search=nicotine
https://comptox.epa.gov/dashboard/dsstoxdb/ms_ready_mixture?cid=28128

https://comptox.epa.gov/dashboard/dsstoxdb/results?search=DTXSID10858175
This one has two DTXCIDs associated with it.

@meier-rene

Collaborator

commented May 14, 2019

I have created a program which adds these identifiers with the help of the InChIKey to DTXSID resolver, and I have processed all records. We now have 39962 outlinks in place. This program can be executed on all new records and also on a regular basis on the existing records. I think this one can be closed.
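
For readers who want to picture the step described here, the following is a minimal sketch and not the actual program. It assumes the InChIKey is stored in a `CH$LINK: INCHIKEY ...` line, that the new link can simply be placed after the last existing `CH$LINK` line (real records may require a specific ordering), and that `resolve` is a caller-supplied function returning a DTXSID string or `None`. The name `add_comptox_link` is hypothetical.

```python
LINK_PREFIX = "CH$LINK: "


def add_comptox_link(record_text, resolve):
    """Return record_text with a CH$LINK: COMPTOX line added, if resolvable.

    resolve: callable taking an InChIKey and returning a DTXSID or None
    (assumed interface; it would wrap the InChIKey to DTXSID resolver).
    """
    lines = record_text.splitlines()
    inchikey = None
    last_link_index = None
    for index, line in enumerate(lines):
        if not line.startswith(LINK_PREFIX):
            continue
        last_link_index = index
        database, _, identifier = line[len(LINK_PREFIX):].partition(" ")
        if database == "COMPTOX":
            return record_text  # already linked, nothing to do
        if database == "INCHIKEY":
            inchikey = identifier.strip()
    if not inchikey:
        return record_text  # no InChIKey to resolve
    dtxsid = resolve(inchikey)
    if not dtxsid:
        return record_text  # no matching entry
    lines.insert(last_link_index + 1, f"{LINK_PREFIX}COMPTOX {dtxsid}")
    trailing_newline = "\n" if record_text.endswith("\n") else ""
    return "\n".join(lines) + trailing_newline
```

Because records that already carry a COMPTOX link are returned unchanged, the function is safe to run repeatedly over existing records as well as over new ones.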

@meier-rene meier-rene closed this May 14, 2019

@meier-rene

Collaborator

commented May 16, 2019

Reopening until #68 is solved.

@meier-rene meier-rene reopened this May 16, 2019

@schymane

Member Author

commented May 21, 2019

@ChemConnector has added additional services that might be of interest.
NOTE that these actor-based web services will be switched off next year and replaced with CompTox ones once they are up and running.

Data Source: dsstox v02

https://ni.epa.gov/actorws/dsstox/v02/msready?identifier=80-05-7
https://ni.epa.gov/actorws/dsstox/v02/msready.json?identifier=80-05-7
https://ni.epa.gov/actorws/dsstox/v02/msready.xml?identifier=80-05-7

https://ni.epa.gov/actorws/dsstox/v02/msready?identifier=DTXCID60513
https://ni.epa.gov/actorws/dsstox/v02/msready.json?identifier=DTXCID60513
https://ni.epa.gov/actorws/dsstox/v02/msready.xml?identifier=DTXCID60513

https://ni.epa.gov/actorws/dsstox/v02/msready?identifier=UVOFGKIRTCCNKG-UHFFFAOYSA-N
https://ni.epa.gov/actorws/dsstox/v02/msready.json?identifier=UVOFGKIRTCCNKG-UHFFFAOYSA-N
https://ni.epa.gov/actorws/dsstox/v02/msready.xml?identifier=UVOFGKIRTCCNKG-UHFFFAOYSA-N

https://ni.epa.gov/actorws/dsstox/v02/qsar?identifier=80-05-7
https://ni.epa.gov/actorws/dsstox/v02/qsar.json?identifier=80-05-7
https://ni.epa.gov/actorws/dsstox/v02/qsar.xml?identifier=80-05-7

https://ni.epa.gov/actorws/dsstox/v02/qsar?identifier=DTXCID60513
https://ni.epa.gov/actorws/dsstox/v02/qsar.json?identifier=DTXCID60513
https://ni.epa.gov/actorws/dsstox/v02/qsar.xml?identifier=DTXCID60513

https://ni.epa.gov/actorws/dsstox/v02/qsar?identifier=UVOFGKIRTCCNKG-UHFFFAOYSA-N
https://ni.epa.gov/actorws/dsstox/v02/qsar.json?identifier=UVOFGKIRTCCNKG-UHFFFAOYSA-N
https://ni.epa.gov/actorws/dsstox/v02/qsar.xml?identifier=UVOFGKIRTCCNKG-UHFFFAOYSA-N

The hyperlinks to the MS-Ready and QSAR-Ready forms have been added to the resolver service.
