Include scan number in output spectra/MassBank records #218
We (@tsufz and I) would propose adding a new field for this. @MaliRemorker @adelenelai, the scan number is already used/available in the RMassBank workflow; do you want to investigate further? It will help debugging if it is in the records and not just in the failpeaks list.
meowcat commented Jul 2, 2019
Hmm... On what time scale is this important? I'm asking because, in the best case, I would like to replace the actual record generation step by simply filling in a template using github.com/MSnio. It can easily be hacked in already, of course.
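A minimal sketch of the "fill in a template" idea, using Python's standard `string.Template` purely as a stand-in (how MSnio templates work is not described in this thread). The `accession` and `scan_number` placeholders and the `XX000001` accession are hypothetical. One practical detail: MassBank tags contain a literal `$`, which `string.Template` treats as a placeholder marker, so each tag's `$` must be escaped as `$$`.

```python
from string import Template

# Hypothetical two-line record skeleton, not a complete MassBank record.
# "$$" is string.Template's escape for a literal "$", which is needed because
# MassBank tags such as MS$DATA_PROCESSING contain one.
RECORD_TEMPLATE = Template(
    "ACCESSION: $accession\n"
    "MS$$DATA_PROCESSING: SCAN_NUMBER $scan_number\n"
)


def render_record(accession: str, scan_number: int) -> str:
    """Fill the skeleton; substitute() raises KeyError if a value is missing."""
    return RECORD_TEMPLATE.substitute(accession=accession, scan_number=scan_number)


# XX000001 is a made-up accession used only for illustration.
print(render_record("XX000001", 1234), end="")
```

With a template approach, adding a new field such as the scan number becomes a one-line change to the skeleton rather than a change to the generation code.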
I would prefer a quick hack for now so that we can start playing around with linking raw data.
meier-rene commented Jul 2, 2019
I would appreciate it if you could create a minimal example for me.
Do you need a minimal example for RMassBank? Or for MassBank-data?
meier-rene commented Jul 2, 2019
Just an example record file.
Basically, the scan number is already stored internally in RMassBank; we just need to print it out, but haven't actually done this yet. There also isn't an accepted tag yet. Do you agree with MS$DATA_PROCESSING: SCAN_NUMBER 1234?
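A minimal sketch of "just printing it out": inserting the proposed (not yet accepted) line into a record held as a list of lines. The placement rule is an assumption: the line goes after the last existing MS$DATA_PROCESSING line, or before the first PK$ (peak) line if there is none. The function name `add_scan_number` is hypothetical, and the record spec's subtag ordering rules are not checked.

```python
from typing import List

# Proposed tag/subtag from this discussion; not part of the record spec yet.
SCAN_PREFIX = "MS$DATA_PROCESSING: SCAN_NUMBER "


def add_scan_number(record_lines: List[str], scan: int) -> List[str]:
    """Return a copy of record_lines with a SCAN_NUMBER line inserted.

    Placement (an assumption): after the last MS$DATA_PROCESSING line;
    otherwise before the first PK$ line; otherwise at the end.
    """
    if scan < 0:
        raise ValueError("scan number must be non-negative")
    lines = list(record_lines)
    dp = [i for i, line in enumerate(lines) if line.startswith("MS$DATA_PROCESSING:")]
    if dp:
        pos = dp[-1] + 1
    else:
        pk = [i for i, line in enumerate(lines) if line.startswith("PK$")]
        pos = pk[0] if pk else len(lines)
    lines.insert(pos, f"{SCAN_PREFIX}{scan}")
    return lines


# Illustrative record fragment with made-up values.
record = [
    "ACCESSION: XX000001",
    "MS$FOCUSED_ION: PRECURSOR_TYPE [M+H]+",
    "MS$DATA_PROCESSING: WHOLE RMassBank",
    "PK$NUM_PEAK: 1",
    "//",
]
print("\n".join(add_scan_number(record, 1234)))
```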
meier-rene commented Jul 2, 2019
For me this is fine. I could include this in the record spec file. But if I understand correctly, this is related to the raw data file linking. Is it directly connected, or does it make sense without the raw data linking?
It makes less sense without the raw data linking, but would still be useful. For instance, we are now running a prescreening workflow internally, and if we can print the scan number into the record, it helps us debug our results against the vendor file. Eventually, we plan to deposit the raw data together with the records when we upload them, and to add the respective fields as well. For instance, our "fail peak" list already includes the scan number, and this would mean that e.g. https://massbank.eu/MassBank/RecordDisplay.jsp?id=EA281702 would have MS$DATA_PROCESSING: SCAN_NUMBER 558
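For the debugging use case, the scan number has to be read back out of a record so the matching scan can be located in the vendor file. A minimal sketch, assuming the proposed line format; the helper name `read_scan_number` is hypothetical, and the two-line record text is built only from the EA281702 example above, not from the actual record.

```python
import re
from typing import Optional

# Matches the proposed line, e.g. "MS$DATA_PROCESSING: SCAN_NUMBER 558".
# MULTILINE lets ^ and $ anchor at each line of the record text.
SCAN_RE = re.compile(
    r"^MS\$DATA_PROCESSING: SCAN_NUMBER (\d+)[ \t\r]*$", re.MULTILINE
)


def read_scan_number(record_text: str) -> Optional[int]:
    """Return the scan number from a record, or None if the line is absent."""
    match = SCAN_RE.search(record_text)
    return int(match.group(1)) if match else None


text = "ACCESSION: EA281702\nMS$DATA_PROCESSING: SCAN_NUMBER 558\n"
print(read_scan_number(text))
```

Returning `None` rather than raising keeps records without the line (for example, all existing ones) usable.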
meier-rene commented Jul 2, 2019
OK, then I will implement these two items independently, which makes it easier.
Makes sense. Then people will be able to link raw data without specifying the scan, and specify the scan without linking raw data. I expect both cases to happen: even if ideally the two are coupled, we do not want to exclude one just because the other is missing.
meowcat commented Jul 2, 2019
Is DATA_PROCESSING the right tag for this? Shouldn't DATA_PROCESSING describe which steps were taken to process the data? Retention time, which is analogous to scan number, is in CHROMATOGRAPHY. (But I understand that the important thing here is the link to the raw data, not so much the "time" dimension of the scan number.) By the way, how would you deal with spectra that are derived from multiple scans?
meowcat commented Jul 2, 2019
Maybe we shouldn't be too shy to implement a new …
@tsufz and I iterated through a few options and settled on DATA_PROCESSING as the best, though not perfect, solution. A new MS$ tag would be an idea too. So something like this?
MS$RAW: SCAN_NUMBER 1234
or
MS$SCAN: 1234 1235 1236 1237
[the rest of the record here]
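The MS$SCAN option can carry several scans for a spectrum derived from multiple scans. A minimal sketch of writing and reading such a line, assuming space-separated integers as in the proposal; MS$SCAN itself is only a suggestion in this thread, and `format_scan_tag`/`parse_scan_tag` are hypothetical names.

```python
from typing import Iterable, List

# Tag proposed above; not part of the MassBank record spec.
TAG = "MS$SCAN: "


def format_scan_tag(scans: Iterable[int]) -> str:
    """Render scan numbers as one space-separated line, sorted and de-duplicated."""
    unique = sorted(set(scans))
    if not unique:
        raise ValueError("at least one scan number is required")
    return TAG + " ".join(str(s) for s in unique)


def parse_scan_tag(line: str) -> List[int]:
    """Inverse of format_scan_tag."""
    if not line.startswith(TAG):
        raise ValueError(f"not an {TAG.strip()} line: {line!r}")
    return [int(tok) for tok in line[len(TAG):].split()]


line = format_scan_tag([1236, 1234, 1235, 1237, 1235])
print(line)
print(parse_scan_tag(line))
```

Sorting and de-duplicating is a design choice that makes the line canonical (two records built from the same scans produce identical text), at the cost of not preserving any input order.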
Hi, I would like to keep this not too far from the mzML specification http://www.psidev.info/mzML
And we do need to be able to reference multiple spectra, either as a comma-separated list or as a dash-separated range. We also need the raw data filename and/or direct download URL that the scan numbers refer to. This is in addition to the DOI or MTBLS accession number, since a DOI might only refer to a ZIP file. Yours, Steffen
schymane commented Jul 1, 2019
See discussion MassBank/MassBank-data#79.
Including the scan number will also allow users to extract the raw data; this will additionally require that we cross-link our records to the raw data (same discussion).