Include scan number in output spectra/MassBank records #218

Open
schymane opened this issue Jul 1, 2019 · 15 comments

@schymane
Member

commented Jul 1, 2019

See discussion
MassBank/MassBank-data#79

  • PSI identifier
  • GNPS example

Including the scan number will also allow users to extract the raw data. This will also require that we cross-link our records to the raw data (same discussion).

@schymane

Member Author

commented Jul 1, 2019

We (@tsufz and I) would propose to have a new field to do this:
MS$DATA_PROCESSING: SCAN_NUMBER 1234
and we will post an issue in MassBank-web
@meier-rene @Treutler

@MaliRemorker @adelenelai the scan number is already used/available in the RMassBank workflow, do you want to investigate further? It will help with debugging if it's in the records, not just in the failpeaks list.

@meowcat

commented Jul 2, 2019

Hmm... On what time scale is this important? I'm asking because in the best of cases, I would like to substitute the actual record generation step by simply casting into a template using github.com/MSnio. It can easily be hacked in already, of course.

@schymane

Member Author

commented Jul 2, 2019

I would prefer a quick hack for now so that we can start playing around with linking raw data

@meier-rene

commented Jul 2, 2019

I would appreciate it if you could create a minimal example for me.

@schymane

Member Author

commented Jul 2, 2019

Do you need a minimal example for RMassBank? Or MassBank-data?

@meier-rene

commented Jul 2, 2019

Just an example record file.

@schymane

Member Author

commented Jul 2, 2019

Basically, the scan number is already stored internally in RMassBank, we just need to print it out - but haven't actually done this yet. There also isn't an accepted tag yet - do you agree with MS$DATA_PROCESSING: SCAN_NUMBER 1234 ?

@meier-rene

commented Jul 2, 2019

For me this is fine. I could include this in the record spec file. But if I understand it correctly, this is related to the raw data file linking. Is it directly connected, or does it make sense without the raw data linking?

@schymane

Member Author

commented Jul 2, 2019

It makes less sense without the raw data linking, but would still be useful. For instance, we are now doing a prescreening workflow internally and if we can actually print the scan number into the record, it helps us debug our results in the vendor file. Eventually our plans are that when we upload the records, we will also deposit the raw data with the records and add those respective fields as well. For instance here is our "fail peak" list with scan number:
[image: "fail peak" list with scan numbers]

and this would mean that e.g. https://massbank.eu/MassBank/RecordDisplay.jsp?id=EA281702 would have MS$DATA_PROCESSING: SCAN_NUMBER 558
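As a rough illustration of "printing the scan number into the record", here is a minimal Python sketch (not RMassBank code, which is written in R). The function name `add_scan_number` and the record excerpt are hypothetical, the `SCAN_NUMBER` subtag is only the proposal from this thread, and placing the line before the first `PK$` line is an assumption about where `MS$` lines belong in a record.

```python
def add_scan_number(record_lines, scan_number):
    """Return a copy of record_lines with the proposed SCAN_NUMBER line added.

    Assumption: MS$ lines sit before the peak block, so the new line is
    inserted just before the first PK$ line; if no PK$ line exists, it is
    appended at the end.
    """
    new_line = f"MS$DATA_PROCESSING: SCAN_NUMBER {int(scan_number)}"
    result = list(record_lines)  # do not modify the caller's list
    for i, line in enumerate(result):
        if line.startswith("PK$"):
            result.insert(i, new_line)
            return result
    result.append(new_line)
    return result


# Abbreviated, hypothetical excerpt; not the real content of EA281702.
record = [
    "ACCESSION: EA281702",
    "MS$FOCUSED_ION: PRECURSOR_TYPE [M+H]+",
    "PK$NUM_PEAK: 0",
    "//",
]

for line in add_scan_number(record, 558):
    print(line)
```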

@meier-rene

commented Jul 2, 2019

Ok, then I will implement these two items independently, which makes it easier.

@schymane

Member Author

commented Jul 2, 2019

Makes sense. Then people will be able to link raw data without specifying the scan, and specify the scan without linking raw data. I see that both cases will happen. Even if the ideal is that they are coupled, we do not want to eliminate one because the other is missing.

@meowcat

commented Jul 2, 2019

Is DATA_PROCESSING the right tag for this? Shouldn't DATA_PROCESSING say which steps were taken to process the data?

Retention time, which is analogous to scan number, is in CHROMATOGRAPHY. (But I understand that here the important thing is the link to the raw data and not so much the "time" dimension of the scan #.)

By the way, how would you deal with spectra that are derived from multiple scans?

@meowcat

commented Jul 2, 2019

Maybe we shouldn't be too shy to implement a new MS$ tag for provenance, but I don't know how hard this is on the database side.

@schymane

Member Author

commented Jul 2, 2019

@tsufz and I iterated through a few options and ended up at DATA_PROCESSING as the best but not perfect solution. A new MS$ tag would be an idea too. So something like this?

MS$RAW: SCAN_NUMBER 1234

or MS$SCAN: 1234 1235 1236 1237
(space separated in the case of multiple? other suggestions?)

[the rest for the record were]
MS$RAW: DOI ...
MS$RAW: GNPS ....
MS$RAW: METABOLIGHTS ...
MS$RAW: METABOLOMICSWB ...
MS$RAW: ZENODO ...
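To make the `MS$RAW` idea concrete, here is a minimal Python sketch of how such lines might be read back from a record. Everything in it is an assumption for illustration: `MS$RAW` is not an accepted tag, the function name `parse_raw_fields` is hypothetical, `placeholder-id` stands in for a real identifier, and combining `MS$RAW: SCAN_NUMBER` with several space-separated scans is just one of the variants floated above.

```python
from collections import defaultdict

PREFIX = "MS$RAW: "


def parse_raw_fields(record_text):
    """Collect the hypothetical MS$RAW subtags into a dict of lists."""
    fields = defaultdict(list)
    for line in record_text.splitlines():
        if not line.startswith(PREFIX):
            continue
        parts = line[len(PREFIX):].split()
        if not parts:
            continue  # tag without a subtag, nothing to record
        subtag, values = parts[0], parts[1:]
        if subtag == "SCAN_NUMBER":
            # Space-separated scan numbers, as suggested for multiple scans.
            fields[subtag].extend(int(v) for v in values)
        else:
            fields[subtag].extend(values)
    return dict(fields)


example = """\
ACCESSION: XX000000
MS$RAW: SCAN_NUMBER 1234 1235 1236 1237
MS$RAW: ZENODO placeholder-id
"""

print(parse_raw_fields(example))
```

Keeping every subtag as a list means a record can carry several repository links and several scans without a change to the structure.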

@sneumann

Member

commented Jul 3, 2019

Hi, I would like to keep this not too far from the mzML specification http://www.psidev.info/mzML
where spectrum references have been discussed in-depth. There are two flavours in
http://www.peptideatlas.org/tmp/mzML1.1.0.html#spectrum:

  1. The index, which is "the zero-based, consecutive index of the spectrum in the SpectrumList".
  2. The id, which is "the native identifier for a spectrum" and e.g. captures the function for Waters, with examples in http://www.peptideatlas.org/tmp/mzML1.1.0.html#sourceFile

And we do need to be able to reference multiple spectra, which could be a comma-separated list, or a dash-separated range.
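A minimal Python sketch of what reading such a reference could look like, assuming plain integer scan numbers or indices, commas between entries, and a dash for an inclusive range. The function name `parse_scan_spec` is hypothetical, and native id strings (the second flavour above) are not handled.

```python
def parse_scan_spec(spec):
    """Expand a spec such as '558,560-562' into a list of scan numbers.

    Assumptions: entries are non-negative integers, ranges are inclusive,
    and empty entries (e.g. from a trailing comma) are ignored.
    """
    scans = []
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue
        if "-" in token:
            start_text, end_text = token.split("-", 1)
            start, end = int(start_text), int(end_text)
            if end < start:
                raise ValueError(f"descending range: {token!r}")
            scans.extend(range(start, end + 1))
        else:
            scans.append(int(token))
    return scans


print(parse_scan_spec("558,560-562"))  # [558, 560, 561, 562]
```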

We also need the raw data filename and/or direct download URL, to which we index. This is in addition to the DOI or MTBLS accession number. DOI might only refer to a ZIP file.

Yours, Steffen
