"AUTHOR"ship in MassBank spectra #194

Open
meowcat opened this issue Aug 26, 2019 · 11 comments

@meowcat

commented Aug 26, 2019

Could we devise a better specification for the AUTHORS field in the MassBank record?
Currently:

Authors and Affiliations of MassBank Record. Mandatory
Example: AUTHORS: Akimoto N, Grad Sch Pharm Sci, Kyoto Univ and Maoka T, Res Inst Prod Dev.

In particular, can we specify who should be in the author list, or differentiate who contributed the data and who made the records?

In "Eawag additional specs", we have situations where the record creator and uploader is not an author on the PUBLICATION. (example) My understanding is that we just added the record creator to a subset of the paper authors, somewhere between first and last author. I don't see this as a really clear and transparent solution, since as the record creator I wouldn't want to steal authority from the actual paper authors.

On the other hand, for the MetaboLights records (example), which I created from publicly available data, I am not listed at all. This is also not ideal, since the listed AUTHORS may not even know about this record existing and should not be held responsible for problems in it, e.g. if processing went wrong.

I suggest allowing the use of MARC relator terms. For example, [dtc] for where the data comes from, and [com] for who made the record.

This is how the terms are specifically defined for R packages: https://journal.r-project.org/archive/2012-1/RJournal_2012-1_Hornik~et~al.pdf

 Data contributor [dtc]
    A person or organization that submits data for inclusion in a database or other collection of data 
 Compiler [com] 
    A person, family, or organization responsible for creating a new work (e.g., a bibliography, a directory) through the act of compilation, e.g., selecting, arranging, aggregating, and editing data, information, etc.

(This is already a loose application of [dtc] in the MetaboLights case since they did not actively submit the data; but it's the one that makes most sense, it seems)

There is also e.g.

 Annotator [ann]
    A person who makes manuscript annotations on an item
 Curator [cur]
    A person, family, or organization conceiving, aggregating, and/or organizing an exhibition, collection, or other item 
 Abridger [abr]
    A person, family, or organization contributing to a resource by shortening or condensing the original work but leaving the nature and content of the original work substantially unchanged. For substantial modifications that result in the creation of a new work, see author 

In R usage, [cre] is the package maintainer, but the MARC definition is

 Creator [cre]
    A person or organization responsible for the intellectual or artistic content of a resource 

so I would leave this out probably...

@meowcat

Author

commented Aug 26, 2019

Another one:

 Editor [edt]
    A person, family, or organization contributing to a resource by revising or elucidating the content, e.g., adding an introduction, notes, or other critical matter. An editor may also prepare a resource for production, publication, or distribution. For major revisions, adaptations, etc., that substantially change the nature and content of the original work, resulting in a new work, see author 
@meowcat

Author

commented Aug 26, 2019

Last one, then I'll leave you alone.

 Metadata contact [mdc]
    A person or organization primarily responsible for compiling and maintaining the original description of a metadata set (e.g., geospatial metadata set) 
@tsufz

Member

commented Aug 28, 2019

I agree with @meowcat.

@meowcat

Author

commented Aug 29, 2019

Should I propose a PR to the record format description, proposing the use of [dtc] and [com] if appropriate?

@meier-rene

Contributor

commented Aug 29, 2019

You are welcome to propose a PR for the record format description. But, if I understood it correctly, this will be a change that breaks compatibility. This means it will take some time until it's included in all needed places.
With the formal description I can implement it in the codebase. Once the code is working, I need to convert all existing data to the new format as well. Finally, we need to incorporate the changes into RMassBank, which I can't do.

@meowcat

Author

commented Aug 29, 2019

I don't believe this would break compatibility; it would merely explicitly allow putting MARC relator tags after names. Right now, there is actually no specification at all for how the author names should be presented. So maybe this would require a small adaptation in the validator, but otherwise I don't think a change is required...
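
To make the idea concrete, here is a minimal sketch of what such a validator adaptation could look like. It assumes that a relator tag is written in square brackets directly after a name and that entries are comma-separated; neither convention is specified anywhere yet. The function names, the allowed-tag set, and the example line are all hypothetical.

```python
import re

# Hypothetical whitelist: only the two relator codes proposed in this issue.
ALLOWED_TAGS = {"dtc", "com"}

# One bracketed three-letter code, and a trailing run of such codes.
TAG_RE = re.compile(r"\[([a-z]{3})\]")
TRAILING_TAGS_RE = re.compile(r"(?:\s*\[[a-z]{3}\])+\s*$")


def split_relator_tags(entry):
    """Split one comma-separated AUTHORS entry into (text, [tags])."""
    entry = entry.strip()
    match = TRAILING_TAGS_RE.search(entry)
    if match is None:
        return entry, []  # untagged entries stay valid, as before
    tags = TAG_RE.findall(match.group(0))
    unknown = [t for t in tags if t not in ALLOWED_TAGS]
    if unknown:
        raise ValueError(f"unknown relator tag(s): {unknown}")
    return entry[:match.start()].strip(), tags


def parse_authors(value):
    """Parse the text after 'AUTHORS: ' into a list of (text, tags) pairs."""
    return [split_relator_tags(part) for part in value.split(",")]


# Made-up example line, not taken from any existing record.
example = "Doe J [dtc], Roe R [com], Example Institute"
parsed = parse_authors(example)
print(parsed)
```

Entries without a tag pass through unchanged, so existing records would not need to be touched under this assumption.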

@sneumann

Member

commented Aug 29, 2019

Do we have a pointer on how R package descriptions do that?
There is a machine-readable
Authors@R: c(person(given = "John Doe", email = "...", role = c("cre")), ...)
and the "compiled" version Author: John Doe, Jane Austen, ...
which we have in https://github.com/MassBank/RMassBank/blob/master/DESCRIPTION#L5
I'd like to not change our AUTHOR, but add a machine-readable (and compilable/renderable)
extended version. Yours, Steffen
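
A rough sketch of that two-layer idea, in Python for brevity: a structured list of persons with roles is the machine-readable source, and the plain AUTHORS line is rendered from it, analogous to how `Authors@R` is compiled into `Author`. The `Person` class, its fields, and the rendering rule are assumptions for illustration, not an agreed format.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    """Hypothetical machine-readable author entry (all fields are assumptions)."""
    name: str
    affiliation: str = ""
    roles: list = field(default_factory=list)  # MARC relator codes, e.g. "dtc"


def render_authors(people):
    """Compile structured entries into one plain-text AUTHORS line."""
    names = ", ".join(p.name for p in people)
    # Keep each affiliation once, in order of first appearance.
    affiliations = list(dict.fromkeys(p.affiliation for p in people if p.affiliation))
    return "AUTHORS: " + ", ".join([names] + affiliations)


# Made-up entries, not taken from any existing record.
people = [
    Person("Doe J", "Example Institute", ["dtc"]),
    Person("Roe R", "Example Institute", ["com"]),
]
line = render_authors(people)
compilers = [p.name for p in people if "com" in p.roles]
print(line)
print(compilers)
```

The plain line stays human-readable, while role queries such as "who compiled this record" run against the structured entries only.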

@meier-rene

Contributor

commented Aug 29, 2019

@meowcat Then I probably misunderstood your proposed extension/changes. Could you please give an example?

@meowcat

Author

commented Aug 29, 2019

@sneumann: I would argue that there is currently no syntax that would be changed, since there is a variety of formats used for the AUTHORS field, with more or less the same info in slightly different iterations:

https://github.com/MassBank/MassBank-data/blob/master/Eawag/EA016651.txt
AUTHORS: Stravs M, Schymanski E, Singer H, Department of Environmental Chemistry, Eawag
https://github.com/MassBank/MassBank-data/blob/master/Fac_Eng_Univ_Tokyo/JP000004.txt
AUTHORS: KOGA M, UNIV. OF OCCUPATIONAL AND ENVIRONMENTAL HEALTH
https://github.com/MassBank/MassBank-data/blob/master/Fiocruz/FIO00009.txt
AUTHORS: Markus Kohlhoff, Natural Product Chemistry Lab (CPqRR/FIOCRUZ, Brazil)
https://github.com/MassBank/MassBank-data/blob/master/Fukuyama_Univ/FU000010.txt
AUTHORS: Matsuura F, Ohta M, Kittaka M, Faculty of Life Science and Biotechnology, Fukuyama University
https://github.com/MassBank/MassBank-data/blob/master/NaToxAq/NA000011.txt
AUTHORS: Tobias Schulze, Hubert Schupke, Martin Krauss, Department of Effect-Directed Analysis, Helmholtz Centre for Environmental Research GmbH - UFZ, Leipzig, Germany
https://github.com/MassBank/MassBank-data/blob/master/MetaboLights/ML001351.txt
AUTHORS: Mark Earll, Stephan Beisken, EMBL-EBI
https://github.com/MassBank/MassBank-data/blob/master/CASMI_2016/SM801802.txt
AUTHORS: Krauss M, Schymanski EL, Weidauer C, Schupke H, UFZ and Eawag
https://github.com/MassBank/MassBank-data/blob/master/Athens_Univ/AU100903.txt
AUTHORS: Nikiforos Alygizakis, Anna Bletsou, Nikolaos Thomaidis, University of Athens
https://github.com/MassBank/MassBank-data/blob/master/UFZ/UA000303.txt
AUTHORS: C. Gallampois (Umea), E. Schymanski (Eawag), W. Brack (UFZ)
https://github.com/MassBank/MassBank-data/blob/master/Washington_State_Univ/BML00009.txt
AUTHORS: Cuthbertson DJ, Johnson SR, Lange BM, Institute of Biological Chemistry, Washington State University
https://github.com/MassBank/MassBank-data/blob/master/Metabolon/MT000006.txt
AUTHORS: Evans A M, Mitchell M, DeHaven C D, Barrett T, Milgram E, Metabolon Inc.
https://github.com/MassBank/MassBank-data/blob/master/MPI_for_Chemical_Ecology/CE000022.txt
AUTHORS: Ales Svatos, Ravi Kumar Maddula, MPI for Chemical Ecology, Jena, Germany
https://github.com/MassBank/MassBank-data/blob/master/ISAS_Dortmund/IA000011.txt
AUTHORS: Nils Hoffmann, Dominik Kopczynski, Bing Peng
https://github.com/MassBank/MassBank-data/blob/master/RIKEN_ReSpect/PM000309.txt
AUTHORS: Parejo I, et al.
https://github.com/MassBank/MassBank-data/blob/master/UPAO/UPA00014.txt
AUTHORS: K.A. Wilkinson & S.N. Miranda

Different order of first and last name, use of brackets, spelling of double initials, use of punctuation, different specifications for the institutes both in format and detail etc. So whatever anyone chooses to put in their AUTHORS doesn't really contradict any existing format or rule. I could go on endlessly; to my dismay, apparently no one is using the semicolon, which I would have wanted since I find it convenient.
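
The inconsistency is easy to quantify. A minimal sketch, assuming the AUTHORS values have already been read into a list of strings; the three values are copied from records listed above, while the helper name `separator_profile` and the list of candidate separators are made up for illustration.

```python
from collections import Counter

# Candidate separators to look for (an arbitrary, illustrative selection).
SEPARATORS = [";", "&", "/", " and ", "(", ":"]


def separator_profile(values):
    """Count how many AUTHORS values contain each candidate separator."""
    counts = Counter()
    for value in values:
        for sep in SEPARATORS:
            if sep in value:
                counts[sep] += 1
    return counts


values = [
    "Stravs M, Schymanski E, Singer H, Department of Environmental Chemistry, Eawag",
    "C. Gallampois (Umea), E. Schymanski (Eawag), W. Brack (UFZ)",
    "K.A. Wilkinson & S.N. Miranda",
]
profile = separator_profile(values)
print(profile)
```

Run over the whole repository, a count like this would show which conventions are common enough to be worth recommending.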

My current PR, as a basis for discussion, is mostly a suggestion for how people might want to specify authorship. It would fit in with any scheme people are currently using, and would be neither more nor less machine-readable than before.

I agree that a thought-out new version of AUTHORS (or a substitute) should be machine-readable.

(Ideally we would incorporate ORCID.)

@meowcat

Author

commented Aug 29, 2019

Ah, I found a few more interesting schemes. Including my semicolon! Note that many of these are actually by "us" as in "the people discussing here".

https://github.com/MassBank/MassBank-data/blob/master/CASMI_2012/SMI00021.txt
AUTHORS: S. Neumann: IPB-Halle, Germany & E. Schymanski: Eawag, Switzerland
https://github.com/MassBank/MassBank-data/blob/master/Literature_Specs/LIT00014.txt
AUTHORS: E. Schymanski; retrieved from M. Castillo et al. 2000
https://github.com/MassBank/MassBank-data/blob/master/Boise_State_Univ/BSU00003.txt
AUTHORS: Chandler, C. and Habig, J. Boise State University
https://github.com/MassBank/MassBank-data/blob/master/BS/BS001003.txt
AUTHORS: Plant Biology, The Noble Foundation, Ardmore, OK, US/Dennis Fine, Daniel Wherritt, and Lloyd Sumner
(Institute first, and even a slash!)

This is not meant to criticise any of the formats that were used, only pointing out the complete absence of anything systematic, even among high-quality and involved contributors.

@schymane

Member

commented Aug 29, 2019

Yes, I totally agree it's a good time to start using some conventions; @MaliRemorker and I were discussing how to write the author statement for the hopefully soon-to-be-coming LCSB records, and he's looking into your suggestions on our side. I agree with @sneumann that we should retain a plain-text AUTHOR field (but add some recommendations for use to the documentation to avoid this in the future) and add a machine-readable one as an extra, to retain backwards compatibility and ease of use for users.
We would then have to decide with @meier-rene whether we standardize this information in already-existing records - at least the plain text field?

ORCIDs ... maybe a separate field?
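
If ORCID iDs do get their own field, they can at least be checked mechanically: the last character of an ORCID iD is a check digit computed with ISO 7064 MOD 11-2. A minimal sketch follows; the function name is made up, and the iD in the example is the one ORCID uses for its fictitious sample researcher, Josiah Carberry.

```python
import re

# Four hyphen-separated groups of four; the last character may be "X".
ORCID_RE = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")


def is_valid_orcid(orcid):
    """Check the format and the ISO 7064 MOD 11-2 check digit of an ORCID iD."""
    if not ORCID_RE.match(orcid):
        return False
    chars = orcid.replace("-", "")
    total = 0
    for ch in chars[:-1]:  # the first 15 digits feed the checksum
        total = (total + int(ch)) * 2
    check = (12 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return chars[-1] == expected


print(is_valid_orcid("0000-0002-1825-0097"))
```

This only catches typos; it says nothing about whether the iD belongs to the named person.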

5 participants