FAIR is not Fair Enough, Particularly for Software Citation, Availability, or Quality

A talk at the 2018 AGU Fall Meeting

  1. FAIR is not Fair Enough, Particularly for Software Citation, Availability, or Quality
     Daniel S. Katz, dskatz@illinois.edu, d.katz@ieee.org, @danielskatz
     Assistant Director for Scientific Software & Applications, NCSA
     Research Associate Professor, CS, ECE, iSchool
     with Neil Chue Hong (@npch), U. Edinburgh & SSI
  2. FAIR for data
     • Findable, Accessible, Interoperable, and Reusable
     • Work “started” in 2014, leading to:
       • https://www.force11.org/group/fairgroup/fairprinciples
     • Published in 2016: Wilkinson et al., “The FAIR Guiding Principles for scientific data management and stewardship,” https://doi.org/10.1038/sdata.2016.18
     • Further definition, 2017: Mons et al., “Cloudy, increasingly FAIR; revisiting the FAIR Data guiding principles for the European Open Science Cloud,” https://doi.org/10.3233/ISU-170824
     • Yet more definition, 2018: Hodson et al., “Turning FAIR data into reality: interim report from the European Commission Expert Group on FAIR data,” https://doi.org/10.5281/zenodo.1285272
  3. FAIR for software: Is software data?
     • Katz et al., 2016, “Software vs. data in the context of citation,” https://doi.org/10.7287/peerj.preprints.2630v1
       • Backstory: we wanted to say that software was different from data in the Software Citation Principles paper, but reviewers wanted evidence, so we wrote this 12-author preprint via GitHub
     • “Software is data, but it is not just data”
       • “While ‘data’ in computing and information science can refer to anything that can be processed by a computer, software is a special kind of data that can be a creative, executable tool that operates on data.”
     • Differences
       • Software is executable; data is not
       • Data provides evidence; software provides a tool
       • Software is a creative work; data are facts or observations (hence different licenses)
       • Rot is different: data has bit rot; software additionally has software collapse
       • Software lifetimes are generally shorter than data lifetimes
  4. FAIR for software?
     • Sounds great!
       • Findable ✓
       • Accessible ✓
       • Interoperable ✓?
       • Reusable ✓
     • Do these terms mean the same thing for software as for data?
       • Maybe not, because of the differences between software and data
       • So we need to define FAIR principles for software first, likely differently
     • Is that enough?
       • Do we also need to redefine the FAIR terms?
       • Consider citation, availability, and quality
  5. FAIR for software citation: what else is needed?
     • FAIR makes credit for data implicit, not explicit
       • “There are numerous and diverse stakeholders who stand to benefit from overcoming these obstacles: researchers wanting to share, get credit, and reuse each other’s data and interpretations”
     • The open source model for software leads to liberal forking and pull requests
       • Software is much more collaborative than data
       • Since reuse of software is often collaboration, credit is more important for software than for data
     • So FAIR needs to encourage credit
       • Metadata should clearly and explicitly include the contributors of the software it describes (maybe within F (findable), though this is not really related to findability)
       • Citations should be used to record software use/reuse (not sure where this goes)
  6. FAIR for software availability: what else is needed?
     • Much software is open
       • Open source as inspired by ideal science
       • Open science as inspired by open source
     • Is it good for software to be open?
       • Or is releasing software source code under clear licenses sufficient for software availability?
     • Perhaps FAIR should encourage software to be open
       • Open should be identified as the default, likely within A (accessible)
  7. FAIR for software quality: what else is needed?
     • How to judge software quality?
       • Internal metrics
       • External usage
       • (Peer) review
     • Is software quality important?
       • Or is software usage (R, reusable) more important?
     • Perhaps FAIR should encourage software to be of higher quality
       • Quality should be added to FAIR (FAIRQ?)
  8. Conclusions
     • Software is not just data
     • FAIR for software is not sufficient: particularly for citation, probably for availability, and maybe for quality
       • For citation, credit needs to be explicitly encouraged
       • For availability, software should be encouraged to be open
       • For quality, not sure; maybe an addition to FAIR is needed
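
The citation and availability slides argue that software metadata should name its contributors explicitly and that open licensing should be the default. A minimal Python sketch of what such checks could look like is given here. The `SoftwareMetadata` record, its field names, the helpers `check_metadata` and `format_citation`, and the small license set are all hypothetical: they are only loosely in the spirit of software metadata formats such as CodeMeta and are not an implementation of FAIR or of any existing schema.

```python
from dataclasses import dataclass, field

# Hypothetical, deliberately incomplete set of SPDX identifiers for common
# open-source licenses; a real check would consult the full SPDX license list.
OPEN_LICENSES = {
    "Apache-2.0", "BSD-3-Clause", "GPL-3.0-only",
    "LGPL-3.0-only", "MIT", "MPL-2.0",
}


@dataclass
class SoftwareMetadata:
    """Hypothetical metadata record; field names are illustrative only."""
    name: str
    version: str
    license: str  # SPDX license identifier, e.g. "MIT"
    contributors: list[str] = field(default_factory=list)


def check_metadata(meta: SoftwareMetadata) -> list[str]:
    """Return a list of problems; an empty list means both checks pass."""
    problems = []
    # Citation: credit requires contributors to be recorded explicitly.
    if not meta.contributors:
        problems.append("no contributors listed, so credit cannot be given")
    # Availability: treat an open license as the expected default.
    if meta.license not in OPEN_LICENSES:
        problems.append(f"license {meta.license!r} is not in the open-license set")
    return problems


def format_citation(meta: SoftwareMetadata) -> str:
    """Build a simple citation string naming every contributor and the version."""
    authors = ", ".join(meta.contributors) or "[no contributors listed]"
    return f"{authors}. {meta.name} (version {meta.version})."


if __name__ == "__main__":
    tool = SoftwareMetadata(
        name="example-tool",
        version="1.2.0",
        license="Apache-2.0",
        contributors=["A. Author", "B. Contributor"],
    )
    print(check_metadata(tool))   # []
    print(format_citation(tool))  # A. Author, B. Contributor. example-tool (version 1.2.0).
```

Storing the license as an SPDX identifier turns the openness check into a simple set lookup. The sketch does not attempt a quality check, since the talk leaves open how quality should be judged.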