Create file for PubChem deposition at every release #102

Open
schymane opened this issue Nov 21, 2019 · 4 comments

Comments

Member

@schymane schymane commented Nov 21, 2019

It would be great if we could auto-create a file to deposit in PubChem with every stable release of MassBank-data.
To discuss: should the file contain compound information only (relatively easy), mappings to spectral IDs (slightly more information needed), or the actual spectra as well (more work on our side)?
Shall we start with a deposit file for compound information only? Then we would need, e.g.:

PUBCHEM_EXT_DATASOURCE_REGID <= InChIKey, or any unique identifier on our side
PUBCHEM_EXT_DATASOURCE_SMILES <= SMILES
PUBCHEM_EXT_DATASOURCE_CID <= PubChem CID (if available)
PUBCHEM_SUBSTANCE_COMMENT <= here we could e.g. provide the accession IDs, collapsed per compound
PUBCHEM_SUBSTANCE_SYNONYM <= any names on our side (can have multiple columns, but e.g. a maximum of 3 would be sensible)
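
To make the compound-only option concrete, here is a rough Python sketch of how such a file could be assembled. The column names come from the list above; everything else is an assumption for illustration: the tab-separated layout, the repeated synonym column header, the keys of the input dictionaries, and the placeholder records. The format PubChem actually expects would need to be confirmed with them.

```python
import csv
import io
from collections import defaultdict

# Assumption: at most three synonym columns, as suggested in the list above.
MAX_SYNONYMS = 3

# Column names from the issue text; repeating the synonym header is an assumption.
HEADER = [
    "PUBCHEM_EXT_DATASOURCE_REGID",
    "PUBCHEM_EXT_DATASOURCE_SMILES",
    "PUBCHEM_EXT_DATASOURCE_CID",
    "PUBCHEM_SUBSTANCE_COMMENT",
] + ["PUBCHEM_SUBSTANCE_SYNONYM"] * MAX_SYNONYMS


def build_deposit_rows(records):
    """Collapse per-spectrum records into one row per compound, keyed by InChIKey."""
    grouped = defaultdict(
        lambda: {"smiles": "", "cid": "", "accessions": [], "names": []}
    )
    for rec in records:
        entry = grouped[rec["inchikey"]]
        # Keep the first non-empty SMILES and CID seen for this compound.
        entry["smiles"] = entry["smiles"] or rec.get("smiles", "")
        entry["cid"] = entry["cid"] or str(rec.get("cid") or "")
        entry["accessions"].append(rec["accession"])
        for name in rec.get("names", []):
            if name not in entry["names"]:
                entry["names"].append(name)

    rows = []
    for inchikey in sorted(grouped):
        entry = grouped[inchikey]
        synonyms = entry["names"][:MAX_SYNONYMS]
        # Pad with empty strings so every row has the same number of columns.
        synonyms += [""] * (MAX_SYNONYMS - len(synonyms))
        comment = "MassBank accessions: " + "; ".join(entry["accessions"])
        rows.append([inchikey, entry["smiles"], entry["cid"], comment] + synonyms)
    return rows


def write_deposit_tsv(rows, handle):
    """Write the header and rows as tab-separated text to an open file handle."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)


if __name__ == "__main__":
    # Placeholder records, not real MassBank entries.
    example_records = [
        {"inchikey": "KEY-A", "smiles": "CCO", "cid": None,
         "accession": "XX000001", "names": ["name one", "name two"]},
        {"inchikey": "KEY-A", "smiles": "CCO", "cid": None,
         "accession": "XX000002", "names": ["name one"]},
    ]
    buffer = io.StringIO()
    write_deposit_tsv(build_deposit_rows(example_records), buffer)
    print(buffer.getvalue())
```

Grouping by InChIKey gives one row per compound and keeps the accession IDs together in the comment field, so the file stays small even when a compound has many spectra.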

@meier-rene @sneumann @tsufz what do you think?
If yes, who will look after the file?
I would contact PubChem to get us a MassBank login for deposition, so credit goes to MassBank(EU) and we can track our submissions.

Member

@sneumann sneumann commented Nov 21, 2019

Hi, we have started to embed Bioschemas information, e.g. line 60+ in view-source:https://msbi.ipb-halle.de/MassBank/RecordDisplay2?id=PB006301 and I would prefer that PubChem adopt that. The benefit would be that this way they can also scrape other Bioschemas-compatible sources; I would reckon that WikiPathways also embeds such information. Otherwise we end up generating and maintaining mappings for PubChem, ChemSpider, CompTox, ... separately. Yours, Steffen

Member Author

@schymane schymane commented Nov 21, 2019

Let's discuss with Evan and @egonw at Dagstuhl then ...

Member Author

@schymane schymane commented Nov 22, 2019

Adding @AlasdairGray to contribute his Bioschemas wisdom ...


@egonw egonw commented Nov 22, 2019

@AlasdairGray, I guess a good first step forward is to have that aggregator website you just showed crawl the MassBank website and extract the chemical structures and data record JSON-LD.
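
As a rough illustration of the extraction step, the sketch below pulls the JSON-LD blocks out of a page's HTML using only the Python standard library. It is not the aggregator's actual code. The sample HTML and its `@type` value are made up for the example, and a real crawler would also need to fetch the record pages and respect the site's crawling rules.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                # Skip blocks that are not valid JSON instead of aborting the crawl.
                pass


def extract_jsonld(html):
    """Return a list with one parsed object per JSON-LD block found in the HTML."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents


# Made-up page for illustration; not the markup MassBank actually serves.
SAMPLE_HTML = """
<html><head>
<script type="text/javascript">var ignored = true;</script>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Dataset", "identifier": "EXAMPLE"}
</script>
</head><body></body></html>
"""

if __name__ == "__main__":
    for doc in extract_jsonld(SAMPLE_HTML):
        print(doc.get("@type"), doc.get("identifier"))
```

Because the parser only looks at the script type, the same function would work on any Bioschemas-annotated site, which is the benefit Steffen describes above.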
