Duplicated records #53

Open
Treutler opened this Issue Mar 25, 2019 · 1 comment

Treutler commented Mar 25, 2019

I just stumbled over two records that seem to be duplicates. The metadata as well as the spectrum are exactly the same.
https://massbank.eu/MassBank/RecordDisplay.jsp?id=TY000228&dsn=Univ_Toyama
https://massbank.eu/MassBank/RecordDisplay.jsp?id=TY000237&dsn=Univ_Toyama
It may be worth searching MassBank globally for such cases.
I guess we will have to contact the contributors in any case.

How should we tackle this? I suggest introducing a "DEPRECATED" tag for records that are duplicated (this issue), noisy (e.g. #51), or otherwise erroneous (#9).

schymane commented Mar 25, 2019

Yes to a DEPRECATED tag. I think this will help us keep the record IDs live while communicating, beyond COMMENT, that there is an issue. If we hide this in COMMENT tags, the information will get lost, as several records have several COMMENTs.

We should do a global check for duplicates. I found some UF cases that are likely duplicates too:
Butylparaben UF4158** records and UF4234** records? I did not do a 1:1 match, but they were flagged by Oberacher and have identical "scores" in his results.
Can you check by SPLASH?
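
A global SPLASH check could look roughly like the sketch below. It assumes each record is available as plain text with an `ACCESSION:` line and a `PK$SPLASH:` line; the helper names `field` and `group_by_splash`, the `XX...` accessions, and the placeholder SPLASH strings are all made up for illustration. A shared SPLASH only suggests identical spectra, so every group it reports would still need a manual look at the metadata.

```python
from collections import defaultdict


def field(record_text, tag):
    """Return the value of the first line starting with `tag`, or None."""
    for line in record_text.splitlines():
        if line.startswith(tag):
            return line[len(tag):].strip()
    return None


def group_by_splash(records):
    """Group accessions by SPLASH and keep only SPLASHes shared by 2+ records."""
    groups = defaultdict(list)
    for text in records:
        accession = field(text, "ACCESSION:")
        splash = field(text, "PK$SPLASH:")
        # Skip records that lack either field instead of guessing.
        if accession and splash:
            groups[splash].append(accession)
    return {s: sorted(a) for s, a in groups.items() if len(a) > 1}


if __name__ == "__main__":
    # Invented records; real input would be the record files from the repository.
    demo = [
        "ACCESSION: XX000001\nPK$SPLASH: splash-A\n",
        "ACCESSION: XX000002\nPK$SPLASH: splash-A\n",
        "ACCESSION: XX000003\nPK$SPLASH: splash-B\n",
    ]
    for splash, accessions in group_by_splash(demo).items():
        print(splash, accessions)
```

Grouping by SPLASH first keeps the candidate list small; a full 1:1 comparison of metadata is then only needed within each group.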

I am going to post some validation suggestions on the Validator issue shortly.
