Citation and Research Objects: Toward Active Research Objects

A short talk at the Workshop on Research Objects 2019 (with eScience 2019), based on an extended abstract (https://doi.org/10.5281/zenodo.3338176)

  1. Citation and Research Objects: Toward Active Research Objects
     Research Objects 2019, eScience 2019, 24 September 2019
     Daniel S. Katz (d.katz@ieee.org, http://danielskatz.org, @danielskatz)
     Assistant Director for Scientific Software & Applications, NCSA
     Research Associate Professor, CS, ECE, iSchool
     https://doi.org/10.5281/zenodo.3338176
  2. Definitions
     • Simple research object
       • Small unit of citable work
       • E.g., paper, dataset, version of software, etc.
     • Complex research object
       • Collection of multiple simple research objects
       • E.g., Research Objects as thought of in this workshop
  3. Citing simple research objects
     • Recent progress in creating principles for citing simple research objects, such as data [1] and software [2]
       • Different principles because these are fundamentally different objects [3]
     • More recently, community efforts to implement these citation principles
       • Data: FORCE11 Data Citation Implementation Group
       • Software: FORCE11 Software Citation Implementation Working Group
       • Data (in the context of the FAIR data principles): Enabling FAIR Data [4]
     • Note: There’s no widely-accepted equivalent of FAIR data principles for software or for other research objects, though some researchers are working in this area
     [1] Joint declaration of data citation principles https://doi.org/10.25490/a97f-egyk
     [2] Software citation principles https://doi.org/10.7717/peerj-cs.86
     [3] Software vs. data in the context of citation https://doi.org/10.7287/peerj.preprints.2630v1
     [4] The FAIR guiding principles for scientific data management and stewardship https://doi.org/10.1038/sdata.2016.18
  4. How to cite simple research objects
     • Follow the example of the long-established method for citing papers:
       1. Deposit the item (data, software) and associated metadata in an archival repository
          • Possible peer-review or repository checks
       2. The repository (aka publisher) stores/archives the item and metadata, and provides an identifier that can be used to retrieve them
       3. Identifier and metadata are used to cite the object
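The deposit, identify, cite sequence above can be sketched in a few lines of Python. This is a minimal illustration, not a repository API: the `Deposit` record, its field names, the `format_citation` helper, and the citation layout are all assumptions made for the example, and the DOI shown is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deposit:
    """Metadata a repository might hold for one deposited item (hypothetical fields)."""
    authors: tuple      # author names, in citation order
    year: int           # publication year
    title: str          # title of the paper, dataset, or software
    publisher: str      # the archival repository acting as publisher
    identifier: str     # identifier assigned by the repository, here a DOI
    version: str = ""   # optional, mainly relevant for software


def format_citation(d: Deposit) -> str:
    """Build a citation string from the identifier and metadata (illustrative layout)."""
    authors = ", ".join(d.authors)
    version = f" (version {d.version})" if d.version else ""
    return (f"{authors} ({d.year}). {d.title}{version}. "
            f"{d.publisher}. https://doi.org/{d.identifier}")


# Placeholder record; none of these values refer to a real deposit.
example = Deposit(
    authors=("A. Author", "B. Author"),
    year=2019,
    title="Example Dataset",
    publisher="Example Repository",
    identifier="10.1234/example.5678",
    version="1.0",
)
print(format_citation(example))
```

The point of the sketch is that step 3 needs nothing beyond what step 2 stored: once the identifier and metadata exist, the citation is a formatting exercise.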
  5. Citing complex research objects?
     • Complex research objects are objects that contain other objects, e.g., “Research Objects”
     • What could be cited?
       • Entire complex object (as a single entity)
       • Some of the contained objects (which may already have identifiers)
       • Both
     • How to cite?
       • Two proposals follow
  6. Basic citation of complex research objects
     • Proposal 1: Treat the complex research object as a container and a set of contents, and cite both the complex research object and all the contained objects that were used
     • FORCE11 Software Citation Implementation Working Group recently defined some challenges [5]
       • One is how to cite complex software objects, namely frameworks that include components
       • A framework can have lots of components
       • Only some components are used in a particular research project
       • So a set of citations for that project should cite the framework and the components that were used
     • Citation of Research Objects (ROs) [6] is similar
       • The RO itself should be cited, plus the objects in the RO that are used, but not those that are not
       • Citing objects in the RO can then be handled similarly to how those objects outside an RO would be cited, whether they are data, software, or something else
     • Note: this relies on separability of objects, which is not the case for some complex research objects, e.g., Jupyter Notebooks, where all the software, data, and text are bundled in such a way that they cannot be separated and individually cited
     [5] Software citation implementation challenges http://arxiv.org/abs/1905.08674
     [6] Why linked data is not enough for scientists https://doi.org/10.1016/j.future.2011.08.004
  7. How to cite complex research objects
     • The necessary steps are thus:
       1. Tracking what parts of the RO were used (both the RO itself and the objects within it)
       2. Finding identifiers & other metadata for the RO and its objects that were used
       3. Building correctly formatted citations for the RO and its objects that were used
     • Step 1 is the greatest challenge
       • With current Research Objects, this must be done outside the RO, either manually or by tools that use the RO (e.g., an electronic notebook system)
     • For Steps 2 and 3
       • Cite the RO as a data object; follow data citation principles
       • Cite software, data, and documentation objects in an RO as you would for any software or data objects or papers
       • Contents may have identifiers already based on their existence outside the RO, or they can be given identifiers when the RO is given an identifier, with suitable relationship metadata between the RO and the content
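A rough sketch of these three steps in Python, assuming the RO is represented as a plain dictionary. The layout (`"citation"` and `"contents"` keys), the `citations_for_use` helper, and all names and DOIs are hypothetical. Step 1 happens outside the RO here, as it must with current Research Objects: the caller supplies the list of objects that were used.

```python
def citations_for_use(ro: dict, used_names: list) -> list:
    """Return citations for the RO itself plus only the contained objects that were used.

    `ro` is a hypothetical dictionary with a "citation" string for the RO and a
    "contents" mapping from object name to that object's own metadata.
    """
    citations = [ro["citation"]]                 # the RO itself is always cited
    for name in used_names:                      # step 1 result, tracked externally
        metadata = ro["contents"][name]          # step 2: look up identifier/metadata
        citations.append(metadata["citation"])   # step 3: emit the formatted citation
    return citations


# Placeholder RO; the names and DOIs below are illustrative only.
ro = {
    "citation": "Example RO (2019). https://doi.org/10.1234/ro",
    "contents": {
        "data.csv": {"citation": "Example Data (2019). https://doi.org/10.1234/data"},
        "model.py": {"citation": "Example Software v1 (2019). https://doi.org/10.1234/sw"},
        "notes.txt": {"citation": "Example Notes (2019). https://doi.org/10.1234/notes"},
    },
}

# Only data.csv and model.py were used, so notes.txt is not cited.
for line in citations_for_use(ro, ["data.csv", "model.py"]):
    print(line)
```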
  8. Active research objects and citation
     • Move beyond current Research Objects to automatically track usage of objects inside ROs
       • As stated on http://www.researchobject.org: “Enriching these resources and collections with any and all additional information required to make research reusable, and reproducible!”
     • Proposal 2: Active Research Objects (AROs), which add internal data and methods to the RO
     • Basic ARO methods: put() and get() to place and access an object within the ARO
       • put() requires data beyond the object being placed
         • Data currently required by many ROs, including description, checksum, etc.
         • External identifier (DOI) and a citation
         • Perhaps also internal identifier (e.g., IDO [7])
       • get() tracks when an object is accessed
     • ARO data includes: flags for each internal object
       • Initially set to false when the object is put
       • Set to true when the object is accessed via get()
     • Next ARO method: validate(), to provide fixity
     • Final ARO method: citation(), similar to the citation method in R [8], except that it can be used to obtain a citation for the whole RO, citations for the RO and the internal objects that have been used, or a citation for one specific internal object
     [7] Identifiers for Digital Objects: the Case of Software Source Code Preservation https://hal.archives-ouvertes.fr/hal-01865790
     [8] Citing R https://cran.r-project.org/doc/FAQ/R-FAQ.html#Citing-R
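One way the proposed ARO could look in code is sketched below. The talk names only the four methods (put(), get(), validate(), citation()) and the per-object usage flag; everything else here is an assumption for illustration, including the class name, the parameter names, the use of SHA-256 for the checksum, and the `used_only` switch on citation().

```python
import hashlib


class ActiveResearchObject:
    """Illustrative ARO: stores objects with metadata and tracks which were accessed."""

    def __init__(self, citation: str):
        self._citation = citation   # citation for the ARO as a whole
        self._objects = {}          # name -> metadata, content, and usage flag

    def put(self, name: str, content: bytes, *, description: str, doi: str, citation: str):
        """Place an object in the ARO along with the extra data put() requires."""
        self._objects[name] = {
            "content": content,
            "checksum": hashlib.sha256(content).hexdigest(),
            "description": description,
            "doi": doi,
            "citation": citation,
            "used": False,          # flag starts false when the object is put
        }

    def get(self, name: str) -> bytes:
        """Return an object's content and record that it was accessed."""
        entry = self._objects[name]
        entry["used"] = True        # flag flips to true on access
        return entry["content"]

    def validate(self) -> bool:
        """Fixity check: every stored object still matches its recorded checksum."""
        return all(
            hashlib.sha256(e["content"]).hexdigest() == e["checksum"]
            for e in self._objects.values()
        )

    def citation(self, name: str = None, used_only: bool = False) -> list:
        """Citations for one object, for the whole ARO, or for the ARO plus used objects."""
        if name is not None:
            return [self._objects[name]["citation"]]
        cites = [self._citation]
        if used_only:
            cites += [e["citation"] for e in self._objects.values() if e["used"]]
        return cites


# Placeholder contents; names, DOIs, and citations are illustrative only.
aro = ActiveResearchObject("Example ARO (2019). https://doi.org/10.1234/aro")
aro.put("data.csv", b"a,b\n1,2\n", description="input data",
        doi="10.1234/data", citation="Example Data (2019). https://doi.org/10.1234/data")
aro.put("model.py", b"print('hi')\n", description="analysis code",
        doi="10.1234/sw", citation="Example Software (2019). https://doi.org/10.1234/sw")

aro.get("data.csv")                     # only the dataset is accessed
print(aro.citation(used_only=True))     # ARO plus the dataset, not the unused code
```

Because get() is the only access path in this sketch, the usage flags are a by-product of normal use, which is what removes the manual tracking step that current Research Objects require.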
  9. Acknowledgements
     • Prior support from NIH Data Commons Pilot Program Consortium (DCPPC) via Harvard as part of Team Sodium
     • Thanks!
     • Questions?
