Requiring Publicly-Funded Software, Algorithms, and Workflows to be Made Public: Why and Why Not

A presentation made to OECD's Committee for Scientific and Technological Policy (CSTP) at the Workshop on the Revision of the Recommendation of the Council concerning Access to Research Data from Public Funding, 15 October 2019

Published in: Government & Nonprofit


  1. Requiring Publicly-Funded Software, Algorithms, and Workflows to be Made Public: Why and Why Not
     OECD, 15 October 2019
     Daniel S. Katz (d.katz@ieee.org, http://danielskatz.org, @danielskatz)
     Assistant Director for Scientific Software & Applications, NCSA
     Research Associate Professor, CS, ECE, iSchool
  2. Why do we care about research software?
     • Examining funding
       • ~20% of NSF projects over 11 years topically discuss software in their abstracts ($10b) [1]
       • 2 of 3 main ECP areas are research software (~$4b)
     • Examining publications
       • Software-intensive projects are a majority of current publications [2]
       • Most-cited papers are methods and software [3]
     • Asking researchers [4-6]
       • >90% of US/UK researchers use research software
       • ~65% would not be able to do their research without it
       • ~50% develop software as part of their research
     [1] Collected from http://www.dia2.org in 2017
     [2] Nangia & Katz, 10.1109/eScience.2017.78
     [3] “Top 100-cited papers of all time,” 10.1038/514550a
     [4] Hettrick, http://bit.ly/2B8y6Iz
     [5] Hettrick et al., 10.5281/zenodo.14809
     [6] Nangia & Katz, 10.6084/m9.figshare.5328442.v1
  3. Software (vs data) properties
     • Software and data are fundamentally different
       • Software is executable, data is not
       • Data provides evidence, software provides a tool
       • Software is a creative work, data are facts or observations
       • Copyright applies to software but not data; different licenses are appropriate
     • Software suffers from software collapse
     • Software is not a one-time effort, it must be sustained
     • Development, production, and maintenance are human-intensive
     • Personal aside: FAIR was created for data; work is still needed to decide whether it can be applied to software and, if so, how
     Katz, et al., https://doi.org/10.7287/peerj.preprints.2630v1
  4. Background
     • Now at University of Illinois
       • Assistant Director for Scientific Software & Applications, NCSA
       • Research Associate Professor, CS, ECE, iSchool
     • From 2012-2016, I ran the Software Infrastructure for Sustained Innovation at NSF
       • Led the writing of NSF documents
         • Software Vision and Strategy Report
         • Implementation of Software Vision
       • Funded about US$30m in software projects/year
         • 2/3 of funding under my control from Cyberinfrastructure Office
         • 1/3 raised under agreement of Science & Engineering Divisions
     http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=504817
     http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf12113
  5. NSF Support for Infrastructure Software
     • Some software intended for research
       • Funded by many parts of NSF, sometimes explicitly, often implicitly
       • Intended for use by developer
     • Other software intended as infrastructure
       • Funded by many parts of NSF, often Office of Cyberinfrastructure (OCI), almost always explicitly
       • Intended for use by community
     • NSF’s Software Infrastructure for Sustained Innovation (SI2) focused on research infrastructure projects
  6. SI2 Review Criteria
     • Standard NSF Criteria
       • Intellectual Merit – advancing knowledge
         • Generally not direct knowledge advances made by project; usually indirect based on how the software would be used by others
       • Broader Impacts – benefitting society
     • Some of the additional SI2 review criteria
       • Fill a recognized need and advance research capabilities?
       • Security, trustworthiness, reproducibility, and usability are integrated?
       • User interaction, community-driven approach?
       • Leverage & interoperate with other software?
       • Appropriate and justified license?
       • Sustainability of software beyond award?
     https://www.nsf.gov/pubs/2016/nsf16532/nsf16532.htm
  7. SI2 licensing and sustainability
     • Goal: software that has impact beyond the lifetime of the award
     • How
       • Ask proposers to provide a sustainability plan
       • Open source as default, but not required
       • Proposers make a case for the best way to achieve sustainability
         • In some fields (e.g., chemistry), may include integration into commercial packages with low-cost licenses for academic research
     • Over time, sustainability plans improved
       • Realization that putting the software on GitHub is not a sustainability plan
       • But still no clear model that works in all cases
       • And few cases where sustainability path and success were clear
  8. Software collapse
     • Software eventually stops working if it is not actively maintained
     • Structure of computational science software stacks:
       1. Project-specific software (developed by researchers): software to do a computation using building blocks from the lower levels: scripts, workflows, computational notebooks, small special-purpose libraries & utilities
       2. Discipline-specific software (developed by developers & researchers): tools & libraries that implement disciplinary models & methods
       3. Scientific infrastructure (developed by developers): libraries & utilities used for research in many disciplines
       4. Non-scientific infrastructure (developed by developers): operating systems, compilers, and support code for I/O, user interfaces, etc.
     • Software builds & depends on software in all layers below it; any change below may cause collapse
     • Note: Containers freeze software; this can stop collapse but also prevents bug fixes, new algorithms, adaptations for new hardware, etc.; too long a freeze can kill software
     K. Hinsen, “Dealing With Software Collapse,” 2019. https://doi.org/10.1109/MCSE.2019.2900945
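The layering on slide 8 can be sketched in a few lines of Python. This is only an illustration of the "any change below may cause collapse" rule, not anything taken from Hinsen's paper: the `LAYERS` list and the `at_risk_after_change` helper are hypothetical names introduced here, and real stacks have finer-grained dependencies than four ordered layers.

```python
# Hypothetical model of the four-layer stack from slide 8, ordered top to bottom.
LAYERS = [
    "project-specific software",
    "discipline-specific software",
    "scientific infrastructure",
    "non-scientific infrastructure",
]


def at_risk_after_change(changed_layer: str) -> list[str]:
    """Return the layers above the changed one, which may collapse as a result.

    Assumes every layer depends on all layers below it, as the slide states.
    """
    index = LAYERS.index(changed_layer)
    return LAYERS[:index]


if __name__ == "__main__":
    # A compiler or OS change puts everything above it at risk.
    print(at_risk_after_change("non-scientific infrastructure"))
    # A change in a researcher's own script affects no other layer.
    print(at_risk_after_change("project-specific software"))
```

The further down the stack a change happens, the more layers are exposed, which is one way to read the warning about long container freezes: they defer these changes rather than absorb them.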
  9. Software Sustainability
     • Software sustainability is the capacity of the software to endure
       • Will the software continue to be available in the future, on new platforms, meeting new needs?
     • Software sustainability ≡ sufficient ∆ software state
       • Sufficient to deal with: software collapse, bugs, new features needed
     • ∆ software state = (human effort in - human effort out - friction) * efficiency
       • Software stops being sustained when human effort out > human effort in over some time
     • Human effort ⇆ $
       • All human effort works (community open source)
       • All $ (salary) works (commercial software, grant funded projects)
       • Combined is hard, equation is not completely true, humans are not purely rational
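The equation on slide 9 can be worked through numerically. The sketch below is a minimal illustration only: the units (person-hours per month), the example numbers, and the names `delta_software_state` and `is_sustained` are assumptions made here, and the slide itself cautions that the equation is not completely true.

```python
def delta_software_state(effort_in: float, effort_out: float,
                         friction: float, efficiency: float) -> float:
    """Slide 9: (human effort in - human effort out - friction) * efficiency."""
    return (effort_in - effort_out - friction) * efficiency


def is_sustained(delta: float, required_delta: float) -> bool:
    """Sustainability means the change in software state is sufficient
    to cover collapse, bugs, and needed features (required_delta)."""
    return delta >= required_delta


if __name__ == "__main__":
    # Hypothetical project, in person-hours per month.
    healthy = delta_software_state(effort_in=120, effort_out=40,
                                   friction=20, efficiency=0.5)
    print(healthy, is_sustained(healthy, required_delta=25))

    # Same project after contributors leave: effort in drops sharply.
    shrinking = delta_software_state(effort_in=50, effort_out=40,
                                     friction=20, efficiency=0.5)
    print(shrinking, is_sustained(shrinking, required_delta=25))
```

In the second case the net effort is negative, so no level of efficiency can make up the shortfall; only more effort coming in, or less friction, changes the sign.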
  10. What can funding agencies do?
     • Human effort ⇆ $
       • All human effort works (community open source)
       • All $ (salary) works (commercial software, grant funded projects)
       • Combined is hard, equation is not completely true, humans are not purely rational
     • Provide incentives to support community contributions
     • Provide funds to directly support software
  11. Publicly-funded software
     • Goal is funding software that is useful to a community over time, not just during the period of public funding
       • Personal aside: reproducibility also is a function of time, not an absolute
     • Leads to options for each software package
       • Make software public, commit to pay for maintenance/support
       • Make software public, software developers grow community that performs maintenance/support (as needed to sustain the software for their own needs)
       • Make software commercial, use sales/service to pay for maintenance/support
  12. Recommendations for publicly-funded software
     • Let the developers/proposers state what they will do as part of requesting funds
       • Open source as default
       • Take this into account when making decisions about what to fund
     • Commit to reasonable maintenance funding, not tied to novel research by the maintainers
     • Support policy to provide incentives for community contributions
       • Career paths, e.g., Research Software Engineers
       • Credit, e.g., software citation, to include software in decisions such as hiring, promotion, grants
     • Overall: software is not data; policies must be carefully considered
     https://rse.ac.uk
     Smith, Katz, Niemeyer et al., 10.7717/peerj-cs.86
  13. Recommendations for algorithms and workflows
     • Algorithms
       • If algorithms are executable, treat them the same as software
       • If not, treat them the same as papers
     • Workflows
       • Can be data (e.g., DAG) or software (e.g., program)
       • Treat software workflows as software
       • Treat data workflows as data, and
       • Ideally treat software that generates data workflows as software
     Katz, https://danielskatzblog.wordpress.com/2018/01/08/expressing-workflows-as-code-vs-data/
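The three workflow cases on slide 13 can be made concrete with a small Python sketch. The step names and the functions `run_order`, `workflow_program`, and `build_dag` are hypothetical and chosen only for illustration; the sketch uses the standard-library `graphlib` module (Python 3.9+) and is not drawn from the linked blog post.

```python
from graphlib import TopologicalSorter

# Workflow as data: a DAG mapping each step to the steps it depends on.
# Nothing here executes; a separate engine must interpret it.
workflow_dag = {
    "clean": {"download"},
    "analyze": {"clean"},
    "plot": {"analyze"},
}


def run_order(dag: dict[str, set[str]]) -> list[str]:
    """Interpret a data workflow: return its steps in dependency order."""
    return list(TopologicalSorter(dag).static_order())


def workflow_program(make_plot: bool = True) -> list[str]:
    """Workflow as software: the order and branching live in the code itself."""
    executed = ["download", "clean", "analyze"]
    if make_plot:
        executed.append("plot")
    return executed


def build_dag(make_plot: bool = True) -> dict[str, set[str]]:
    """Software that generates a data workflow (the third case on the slide)."""
    dag = {"clean": {"download"}, "analyze": {"clean"}}
    if make_plot:
        dag["plot"] = {"analyze"}
    return dag


if __name__ == "__main__":
    print(run_order(workflow_dag))
    print(workflow_program(make_plot=False))
    print(build_dag() == workflow_dag)
```

Under the slide's recommendations, `workflow_dag` would be handled like data, while `workflow_program` and `build_dag` would be handled like software, including the licensing and maintenance questions that come with it.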
