Deprecated records #171

Open
tsufz opened this issue Apr 3, 2019 · 15 comments

6 participants
@tsufz
Member

commented Apr 3, 2019

@schymane and I chatted a bit about the handling of deprecated records. We agreed that those records should be tagged rather than removed, for historical reasons. We suggest moving those records to a dedicated deprecated folder and tagging them with a deprecated tag added to the title, like 13a-Hydroxylupanin; LC-ESI-ITFT; MS2; CE: 10%; R=15000; [M+H]+ (deprecated).

The tag could also be placed directly under the accession:
Deprecated: This record was deprecated on date()

Should we add a comment explaining why the record was deprecated (for learning purposes)?

@meier-rene

Contributor

commented May 3, 2019

Hi all,
today we talked a bit about a proper way to implement a mechanism for record deprecation:

  • I think we all agree that the accession should stay occupied by the deprecated record, so deletion is not an option.
  • On the other hand, deprecated records cannot be treated like normal records by the database, because they might be faulty and cannot be validated successfully.

An option would be to introduce a tag, let's call it '[DEPRECATED]', and put it in the title. Because we want to limit the potential for breaking 3rd party clients, we propose the tag like this:

RECORD_TITLE: Bassanolide; LC-ESI-ITFT; MS2; CE: 55; R=17500; [M+Na]+

To mark this record as deprecated:

RECORD_TITLE: [DEPRECATED] Bassanolide; LC-ESI-ITFT; MS2; CE: 55; R=17500; [M+Na]+

The first field in the record title is free format and this change will most likely not break 3rd party clients.

Treatment of deprecated records by MassBank:
Records marked with '[DEPRECATED]' will not be found by keyword search, because they will not be parsed and written to the database. The only possible operation on deprecated records is a record display of plain text without displaying a spectrum or a chemical structure.
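
A client that wants to respect this proposal could skip such records with a simple title check. The following is only a minimal sketch, assuming the tag is written exactly as '[DEPRECATED]' at the start of the RECORD_TITLE value; the function names are hypothetical and not part of MassBank:

```python
from typing import Optional

# Tag as proposed in this thread; it is not (yet) part of the record specification.
DEPRECATED_TAG = "[DEPRECATED]"
TITLE_FIELD = "RECORD_TITLE:"


def record_title(record_text: str) -> Optional[str]:
    """Return the RECORD_TITLE value, or None if the field is missing."""
    for line in record_text.splitlines():
        if line.startswith(TITLE_FIELD):
            return line[len(TITLE_FIELD):].strip()
    return None


def is_deprecated(record_text: str) -> bool:
    """True if the title starts with the proposed deprecation tag."""
    title = record_title(record_text)
    return title is not None and title.startswith(DEPRECATED_TAG)
```

Because only the free-format first field of the title is touched, a client that ignores the tag would still read the remaining title fields as before.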

Of course it's possible to give the reason for deprecation in the COMMENT section.

I would appreciate your opinion...

@schymane

Member

commented May 3, 2019

I like this way of handling it in general ... I am just wondering if we should add another (non-compulsory) tag, so that it is not just in the title, as an alternative / in addition to the reason for deprecation in COMMENT. The reason is that I, for one, do not often parse the TITLE field, but I would e.g. parse a "DEPRECATED" field if I want to check that nothing's deprecated ... (otherwise how would we detect it without parsing the title field)?
A "DEPRECATED" field could inherently either be empty or contain the reason for deprecation (rather than requiring yet another COMMENT)?

@schymane

Member

commented May 3, 2019

Alternative: make a systematic "COMMENT [DEPRECATED]" recommendation for adding the reasons. But then we may as well add an official tag ... it's also rare to parse the comments because they are free text and rather difficult to process automatically...

@Treutler

Contributor

commented May 3, 2019

Potentially, deprecated records are not syntactically valid anymore. Hence, we can add every field we want.
Possibilities would be fields like

  • COMMENT: DEPRECATED considered noisy (03/05/19)
  • DEPRECATED: considered noisy (03/05/19)

In the first case we would follow the pattern of fields like COMMENT: CONFIDENCE; the latter case would be a completely new field.

@schymane

Member

commented May 3, 2019

@tsufz also made the point that we could/should document the date of deprecation (I see you just added that). I'd suggest putting the date first, and would prefer this:

  • DEPRECATED: 2019-05-03 considered noisy
@schymane

Member

commented May 3, 2019

We should also add this to the Record Specification. With DEPRECATED: YYYY-MM-DD free text we'd have good flexibility to auto-parse the essential information and leave flexibility for reasons. Optional addition of name/github handle of curator/deprecator?
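
To illustrate how auto-parsing such a field might look, here is a rough sketch. It assumes the field is spelled exactly DEPRECATED: followed by an ISO date and optional free text, as suggested above; the Deprecation type and parse_deprecated function are hypothetical names, not existing MassBank code:

```python
import re
from datetime import date
from typing import NamedTuple, Optional


class Deprecation(NamedTuple):
    when: date    # date of deprecation
    reason: str   # free text, may be empty


# Matches "DEPRECATED: YYYY-MM-DD" with optional free text after the date.
_DEPRECATED_RE = re.compile(r"^DEPRECATED:\s*(\d{4})-(\d{2})-(\d{2})(?:\s+(.*))?$")


def parse_deprecated(record_text: str) -> Optional[Deprecation]:
    """Return the deprecation info of a record, or None if it has no DEPRECATED field."""
    for line in record_text.splitlines():
        match = _DEPRECATED_RE.match(line.strip())
        if match:
            year, month, day = (int(match.group(i)) for i in (1, 2, 3))
            # date() raises ValueError for impossible dates such as 2019-13-40.
            return Deprecation(date(year, month, day), (match.group(4) or "").strip())
    return None
```

A line such as COMMENT: DEPRECATED: 2019-05-03 considered noisy is deliberately not matched by this sketch, since it only looks for a dedicated top-level field.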

@meier-rene

Contributor

commented May 3, 2019

If we add a new field like DEPRECATED: considered noisy (03/05/19) we might break 3rd party software. On the other hand, this new tag might prevent 3rd party software from processing invalid data.

So the question here is: Do we want to force 3rd party software to be aware of our deprecation mechanism?

@schymane

Member

commented May 3, 2019

I guess that could be avoided by COMMENT: DEPRECATED: 2019-05-03 considered noisy
I personally still prefer a new tag but would accept either, just need to know as we need to mark some soon ;-)

@meowcat


commented May 3, 2019

Do we want to force 3rd party software to be aware of our deprecation mechanism?

Except for RMassBank, what software will be bothered by this?

@Treutler

Contributor

commented May 3, 2019

Basically every piece of software which parses MassBank records. I guess that MoNA parses MassBank records from time to time or some scripts of scientists using MassBank records...

@tsufz

Member Author

commented May 3, 2019

Hi,

  1. I also prefer a dedicated tag. This is the easiest way for third party developers to avoid deprecated records.
  2. I would not remove the spectral part, for the sake of data consistency. The reader of a paper must be able to review the old record in order to decide whether the message of the paper is reliable or not.
  3. I suggest to move all deprecated records to an own folder.
  4. We should actively advise known third party software maintainers to implement a corresponding check in their software (e.g. by an announcement at MassBank.eu and by writing issues if the software is open source). It might also be possible to use contacts at the instrument vendors to pass the issue on to their software developers.
@Treutler

Contributor

commented May 3, 2019

I agree with 1, 2, and 4.

I suggest to move all deprecated records to an own folder.

I think we should leave the records in the same place. This has the advantage that we avoid assigning the same accession code twice, the assignment of the records to the contributors stays very clear, and it is not necessary to move records. What is the rationale for moving deprecated records to a separate folder?

@schymane

Member

commented May 3, 2019

Agree with @Treutler re all points ....

@sneumann

Member

commented May 3, 2019

Hi, one more thought: one could remove (large) parts of the record,
and point to the last git state of the record:

ACCESSION: SMI00034
RECORD_TITLE: Glucolesquerellin; LC-ESI-QTOF; MS2; CE:40 eV;
DATE: 2012.08.31 (Created 2012.08.31)
AUTHORS: S. Neumann: IPB-Halle, Germany & E. Schymanski: Eawag, Switzerland
LICENSE: CC BY
COPYRIGHT: CASMI2012
PUBLICATION: Schymanski, E.; Neumann, S. The Critical Assessment of Small Molecule Identification (CASMI): Challenges and Solutions. Metabolites 2013, 3 (3), 517–38. DOI:10.3390/metabo3030517
COMMENT: http://casmi-contest.org/challenges-cat1-2.shtml
COMMENT: CASMI2012 LC Challenge 3
COMMENT: DEPRECATED: 2019-05-03 considered noisy
COMMENT: SUPERSEEDED: SMI00035
COMMENT: LASTGIT: 71bfc632750600db42864739472d87bc6abd6e47

where MassBank-web would render the git hash to point to
https://github.com/MassBank/MassBank-data/blob/71bfc632750600db42864739472d87bc6abd6e47/CASMI_2012/SMI00034.txt
The SUPERSEEDED would be only human readable to point to one or more replacement(s).
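
As a rough sketch of how that rendering could work, assuming the hash is stored in a COMMENT: LASTGIT: line as in the example above and the caller knows the record's path inside MassBank-data (the function name and the record_path parameter are hypothetical):

```python
from typing import Optional

# URL pattern taken from the example link above.
BASE_URL = "https://github.com/MassBank/MassBank-data/blob"
LASTGIT_PREFIX = "COMMENT: LASTGIT:"


def last_git_url(record_text: str, record_path: str) -> Optional[str]:
    """Build a link to the last full version of a record.

    record_path is the path of the record file relative to the repository
    root, e.g. "CASMI_2012/SMI00034.txt". Returns None if the record has
    no LASTGIT comment.
    """
    for line in record_text.splitlines():
        if line.startswith(LASTGIT_PREFIX):
            commit = line[len(LASTGIT_PREFIX):].strip()
            return f"{BASE_URL}/{commit}/{record_path}"
    return None
```

The path has to be supplied separately because the stripped-down record itself only carries the commit hash, not its location in the repository.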

I also had the idea to distinguish between "DEPRECATED", which should be interpreted as
"There are good reasons to not use this record" and "DELETED" which really means
"This record is gone. Away. You can time travel if needed.".

Yours,
Steffen

@schymane

Member

commented May 3, 2019

I don't think we should replace a record with a new one? Isn't that what versioning is meant to avoid?
I also don't quite agree with a git state because, well, we have text files (not just MassBank-web) and I can't auto-interpret that lastgit bit into anything useable as a human ... as Tobias pointed out, even if we decide to deprecate records, they are history and it's good to have the data so that humans can look at them and agree or disagree; ChemSpider caused many issues for us by deprecating structures suddenly ... we should not do the same ...
