Add DTXSIDs into RMassBank identifiers #215

Open
schymane opened this issue May 7, 2019 · 8 comments

@schymane
Member

commented May 7, 2019

The EPA now has preliminary web services available for retrieving DTXSIDs by InChIKey. We should add this to RMassBank so that DTXSIDs are retrieved for new records (note that the services may change again in the future).

https://github.com/MassBank/RMassBank/blob/master/R/webAccess.R

https://actorws.epa.gov/actorws/chemIdentifier/v01/resolve?identifier=IKHGUXGNUITLKF-UHFFFAOYSA-N
https://actorws.epa.gov/actorws/chemIdentifier/v01/resolve.json?identifier=IKHGUXGNUITLKF-UHFFFAOYSA-N
https://actorws.epa.gov/actorws/chemIdentifier/v01/resolve.xml?identifier=IKHGUXGNUITLKF-UHFFFAOYSA-N
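
A minimal sketch of how the three URLs above are put together from an InChIKey. It is written in Python purely for illustration (RMassBank itself is R), and the names `build_resolve_url`, `BASE_URL` and `INCHIKEY_RE` are hypothetical, not part of RMassBank. It sends no request and assumes nothing about the response format, since the service is preliminary and may change.

```python
import re
from urllib.parse import urlencode

# Base URL copied from the links in this issue; the service is preliminary and may move.
BASE_URL = "https://actorws.epa.gov/actorws/chemIdentifier/v01/resolve"

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter, separated by hyphens.
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def build_resolve_url(inchikey, fmt=None):
    """Return the resolve URL for an InChIKey; fmt is None, "json" or "xml"."""
    if not INCHIKEY_RE.fullmatch(inchikey):
        raise ValueError(f"not a valid InChIKey: {inchikey!r}")
    if fmt not in (None, "json", "xml"):
        raise ValueError(f"unsupported format: {fmt!r}")
    suffix = f".{fmt}" if fmt else ""
    return f"{BASE_URL}{suffix}?{urlencode({'identifier': inchikey})}"


if __name__ == "__main__":
    print(build_resolve_url("IKHGUXGNUITLKF-UHFFFAOYSA-N", "json"))
```

Validating the InChIKey before building the URL keeps malformed keys from reaching the service at all.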

@adelenelai would you be interested in looking into this?

We are initially looking at doing this on the database side, to get DTXSIDs into records that have already been created:
MassBank/MassBank-data#66
and
MassBank/MassBank-web#80

@adelenelai

commented May 8, 2019

working on it

@meier-rene

commented May 14, 2019

We have added this in MassBank. If you like, you can skip this, because it's easy to post-process records in MassBank.

@tsufz

Member

commented May 14, 2019

@meier-rene I don't agree with processing everything in MassBank only. MassBank records should be usable in a private repository with parsing tools, without MassBank being involved. DTXSID is an important identifier, so it should be processed by RMassBank.

@adelenelai

commented May 16, 2019

Hi, the EPA web services seem to be down at the moment; the links in @schymane's original post don't work.
@ChemConnector, are there new URLs that work?

@ChemConnector

commented May 16, 2019

@schymane

Member Author

commented May 17, 2019

The services should be up again. @adelenelai, can you check whether they work for you now?
See also MassBank/MassBank-data#68 for updates

@adelenelai

commented May 27, 2019

Note (mostly to self, also for documentation):

It is possible for a DTXSID to not exist for a particular compound, because not all stereochemistries of a compound exist in the CompTox Dashboard.

Assuming the web services and the Dashboard are in sync (see MassBank/MassBank-data#68), any attempt to manually curate the infolists for DTXSID in Step 2 of mbWorkflow will not succeed either: the DTXSID simply does not exist!

Therefore, if the final MB record does not contain the field CH$LINK: COMPTOX, this should not be seen as a bug; it reflects the fact that no DTXSID exists for that compound. (This behaviour was modelled on that of other pre-existing identifiers, e.g. CHEBI.)

Whether an explicit declaration should be incorporated for these cases e.g. CH$LINK: COMPTOX **none found** is another issue...

In future: if the DTXSID does come into existence over time, post-MB-record-generation-and-upload, Rene's post-processing would be handy.
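
The behaviour described in this note can be sketched as follows, in Python for illustration only (RMassBank is R, and this is not its actual code). The function name `ch_link_lines` and the DTXSID value are hypothetical placeholders; the only assumption taken from the thread is that an identifier that was not found produces no `CH$LINK` line at all.

```python
def ch_link_lines(identifiers):
    """Render CH$LINK lines, skipping identifiers that were not found."""
    lines = []
    for database, value in identifiers.items():
        # None or an empty string means "no identifier found": emit nothing.
        if value is None or str(value).strip() == "":
            continue
        lines.append(f"CH$LINK: {database} {value}")
    return lines


if __name__ == "__main__":
    found = {
        "CHEBI": None,               # no match, so no line is written
        "COMPTOX": "DTXSID0000000",  # placeholder value, not a real DTXSID
        "INCHIKEY": "IKHGUXGNUITLKF-UHFFFAOYSA-N",
    }
    for line in ch_link_lines(found):
        print(line)
```

With this approach a missing `CH$LINK: COMPTOX` line and a missing `CH$LINK: CHEBI` line are handled identically, and no explicit "none found" entry is written.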

adelenelai pushed a commit to adelenelai/RMassBank that referenced this issue May 28, 2019

Adelene Lai
Add getCompTox to Mbworkflow MassBank#215
getCompTox retrieves DTXSID from EPA webservices using InChiKey. Modify generation of infolists and of final Mbrecord to include DTXSID.
Resolves MassBank#215

@adelenelai adelenelai referenced a pull request that will close this issue May 28, 2019

Open

Add getCompTox to Mbworkflow #215 #217

@schymane

Member Author

commented Jun 3, 2019

Yes, it is common that identifiers are missing, so modelling this on the way that ChEBI identifiers are handled is the right way to go ... I don't think we need an explicit declaration because the fact that it is missing is implicitly clear in the absence of a corresponding entry in the infolist. Some of the identifiers like KEGG, ChEBI and LipidMaps have very few matches and explicit statements would get annoying over time... and yes, Rene's workflow will catch those that do appear later.
The validation should account for potential identifier deprecation over time ... but this is another (trickier) topic ... @meier-rene
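
As a rough illustration of catching identifiers that appear later, the sketch below adds a `CH$LINK: COMPTOX` line to an existing record only if none is present. It is written in Python for illustration and is not Rene's actual workflow. The function name `add_comptox_link`, the record contents and the DTXSID are hypothetical placeholders, and placing the new line after the last existing `CH$LINK` line is an assumption about record layout.

```python
def add_comptox_link(record_lines, dtxsid):
    """Return a copy of record_lines with a CH$LINK: COMPTOX line added if absent."""
    lines = list(record_lines)
    prefix = "CH$LINK: COMPTOX"
    # Nothing to do if no DTXSID is known or the record already has one.
    if not dtxsid or any(line.startswith(prefix) for line in lines):
        return lines
    # Insert after the last CH$LINK line; fall back to the last CH$ line.
    insert_at = None
    for marker in ("CH$LINK:", "CH$"):
        positions = [i for i, line in enumerate(lines) if line.startswith(marker)]
        if positions:
            insert_at = positions[-1] + 1
            break
    if insert_at is None:
        raise ValueError("record has no CH$ section")
    return lines[:insert_at] + [f"{prefix} {dtxsid}"] + lines[insert_at:]


if __name__ == "__main__":
    record = [
        "ACCESSION: XX000001",       # placeholder record, not real data
        "CH$NAME: Example compound",
        "CH$LINK: CHEBI 00000",
        "AC$INSTRUMENT: Example instrument",
    ]
    for line in add_comptox_link(record, "DTXSID0000000"):
        print(line)
```

Running the function a second time on its own output changes nothing, so it can be applied repeatedly as new DTXSIDs become available.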
