
First Steps in Research Data Management Under Constraints of a National Security Laboratory

"First Steps in Research Data Management Under Constraints of a National Security Laboratory"
Presentation at CNI Fall Meeting 2018


  1. Slide header: 1st Steps in RDM at LANL @mart1nkle1n & @briancain101 CNI Fall Meeting 2018, 12/10/2018, Washington, DC
     First Steps in Research Data Management Under Constraints of a National Security Laboratory
     • Martin Klein, ORCID 0000-0003-0130-2097, @mart1nkle1n
     • Brian Cain, ORCID 0000-0001-7356-5860, @briancain101
     • Research Library, Los Alamos National Laboratory
     • Acknowledgements: Herbert Van de Sompel, Frances Knudson, Joshua Finnell, Wei Gu, Jason Keith
  2. 2013 OSTP Memo
     • All federal agencies with more than $100M in annual R&D spending are required to ensure that data is stored and publicly accessible to search, retrieve, and analyze.
     • Scope: data necessary to validate research findings, including data sets used to support scholarly publications.
  3. United States Department of Energy
     • $12 billion in R&D funding
     • National labs: Ames, Argonne, Brookhaven, Fermi, Idaho, Los Alamos, Lawrence Berkeley, Lawrence Livermore, NETL, NREL, Oak Ridge, Pacific Northwest, Princeton, SLAC, Sandia, Savannah River, Thomas Jefferson
     • Scientific & Technical Information (STI / R&D results): ≥ 50,000 STI “products” annually
       • Text: journal articles/accepted manuscripts, technical reports, conference papers, patents
       • Data: large and small datasets, images, visualizations
       • Software/code
  4. United States Department of Energy
  5. Environmental Scan
     • Data Working Group identified 12 institutions and contacted the data services managers at each institution
     • Questions covered:
       • Budget
       • Staffing
       • Data Security/Platform
       • Discovery Tools
       • Interdepartmental cooperation
       • Permanent Data Storage
     • Responses:
       • Budget: $200,000 - $2,000,000
       • Staffing: 2-15 FTE
       • Platform: Hydra/Fedora; Dataverse; DSpace; third party
       • Data Discovery Tools: individual silos and catalogs; APIs
       • Interdepartmental Cooperation: Library; IT + Library; partnership between 3+
       • Permanent Storage: none; 10-year retention policy; forever?
  6. Data Interviews
     • Data collection consisted of in-depth interviews with researchers from across the Lab and was completed during the summer of 2016
     • Identify all LANL data sets deposited in external data repositories by LANL researchers (Figshare, Zenodo, Dryad, Dataverse)
     • Contact all researchers who have submitted a data set for review
     • Contact all researchers who have created a Data Management Plan using the Library’s DMPTool
     • Identify data-intensive researchers from the Library Roadshows
     • Recommendations from the Data Executive Team
  7. “Highlights” from Data Management Surveys, Working Groups
     • “A centralized storage solution would be great. Wouldn’t have to keep all my files on an office computer.”
     • “It would be great if the Lab had a centralized repository where I could collaborate, store, and share my data both internally and externally. Something like Dropbox would even be helpful.”
     • “Much of this data is hosted on old websites that we maintain (scripted in Perl; accessible through FTP). In other words, old crusty crap.”
     • “When my post-doc is gone, I don’t know where their stuff is.”
     • “I know this experiment has been run before, but I can’t find the data, so I’m running it again.”
     • “There are many files named ‘data’ on my group’s share drive.”
  8. Results from Data Management Surveys, Working Groups
     • Need for infrastructure to support internal and external research collaboration
     • Providing data storage and data sharing
     • Integration with frequently utilized research tools/flows
     • Support documentation and preservation
     • Compliant with LANL policies re data management, review and release, security
  9. Taking Action at LANL
     • Reality: LANL policies prevent the use of off-the-shelf, cloud-based “open science” platforms that many other research institutions use.
     • Ergo:
       • Investigate the feasibility of a local solution
       • Goal: Provide internal collaboration platform as a path toward structured data management
     • Nucleus Project
       • Pilot effort by the Research Library
       • Since January 2017; 1 FTE hired; 4 PT contributors
       • Based on a local install of the Open Science Framework software
  10. Some Anticipated Benefits for Researchers
      • Optimize a researcher’s use of time by:
        • Making it easier to accomplish collaborative goals
        • Reducing number of steps to achieve goals
        • Reducing potential for errors when accomplishing goals
        • Improving project management, communication
      • By deploying a platform that:
        • Streamlines workflows
        • Provides glue between systems/tools
        • Provides an overview of assets involved in research collaboration
  11. Submit Dataset to RASSTI - Before (step marker: 1)
  12. Submit Dataset to RASSTI - Before (step markers: 1, 2)
  13. Submit Dataset to RASSTI - Before (step markers: 1, 2)
  14. Submit Dataset to RASSTI - Before (step markers: 1, 2, 3)
  15. Submit Dataset to RASSTI - Before (step markers: 1, 2, 3)
  16. Submit Dataset to RASSTI - Before (step markers: 1, 2, 3, 4)
  17. Submit Dataset to RASSTI - Before (step markers: 1, 2, 3, 4, 5)
  18. Some Anticipated Benefits for LANL
      • Making it easier to accomplish collaborative goals
      • Overview of assets involved in research (collaboration)
      • Tracking of idea → funding → data → publication → patent
      • “Single” point of (data) preservation
      • Synergies with internal and external funding requirements
      • Provide a seamless method for compliance with LANL review and release and security policies
  19. Open Science Framework
      • Open Source Software, developed and maintained by the Center for Open Science
      • Default use is in the cloud-based portal osf.io that supports multi-organizational open science and collaborative scholarship
      • Provides glue for many aspects of the research workflow
      • Offers integrations with many existing productivity tools
  20. Diagram labels:
      • Storage: Data Storage (×4)
      • Tools: R, MATLAB, Echo, Granta, TeXmaker, ...
      • CODES (CODES store), RASSTI (RASSTI store)
      • Review, Release
      • Outside LANL: DOE GitLab, DOE OSTI
      • experiment, simulation, documentation
      • arXiv, GitHub, ...
      • firewall
  21. Diagram labels:
      • Storage: Data Storage (×4)
      • Tools: R, MATLAB, Echo, Granta, TeXmaker, ...
      • CODES, RASSTI, LANL authenticate, LANL GitLab
      • RASSTI store, CODES store
      • Review, Release
      • Outside LANL: DOE GitLab, DOE OSTI
      • experiment, simulation, documentation
      • arXiv, GitHub, ...
      • firewall
  22. Diagram labels:
      • NUCLEUS: OSF project collaboration; Project Metadata & Access Control
      • Connectors: S3 storage connect, Google Drive connect, CODES connect, RASSTI connect, LANL auth connect, GitLab connect
      • Storage: Data Storage (×4)
      • Tools: R, MATLAB, Echo, Granta, TeXmaker, ...
      • CODES, RASSTI, LANL authenticate, LANL GitLab
      • RASSTI store, CODES store
      • Review, Release
      • Outside LANL: DOE GitLab, DOE OSTI
      • experiment, simulation, documentation
      • arXiv, GitHub, ...
      • firewall
  23. Diagram labels:
      • NUCLEUS: OSF project collaboration; Project Metadata & Access Control
      • Connectors: Google Drive connect, CODES connect, RASSTI connect, LANL auth connect, GitLab connect
      • Storage: Data Storage (×4)
      • Tools: R, MATLAB, Echo, Granta, TeXmaker, ...
      • CODES, RASSTI, LANL authenticate, LANL GitLab
      • RASSTI store, CODES store
      • Review, Release
      • Outside LANL: DOE GitLab, DOE OSTI
      • experiment, simulation, documentation
      • arXiv, GitHub, ...
      • firewall
  27. Submit Dataset to RASSTI - After (step markers: 1, 2)
  29. Challenges & Questions
      • (Technical) Challenges
        • Ownership of the software environment
        • Storage at institution, division, group level?
        • Contamination, who is responsible for clean-up?
        • Cybersecurity review
      • Questions
        • Is this useful to researchers? Under what conditions?
        • What other local/homegrown systems should be integrated?
        • What does success look like?
          • Active users?
          • Structured data management?
  30. Outreach & Feedback
      • Pilot Nucleus (Summer 2018)
        • Invited “friends and family” at LANL
        • Identifying bugs and eliciting feedback before expanding pilot
        • Generally positive response, some skepticism for yet another tool
      • Data Management Workshop (August 2018)
        • ~50 attendees introduced to Nucleus and “Data Management 101”
        • Positive response, but constrained by “Pilot” status
        • Some concerns about competition between collaboration tools
  31. Current Status
      • Nucleus (Fall/Winter 2018)
        • Utilizing feedback to seek institutional support, ownership (i.e. funding and project resources)
        • Possible integration with institutional Google Suite, impact on adoption and sustainability?
      • New laboratory management (since November 1, 2018)
        • Developing relationships with new partners and hierarchy
  32. Lessons Learned
      • Limitation of a pilot to assess utility
      • Contradictions between survey results and a few pilot responses
        • “LANL collaboration tools are terrible, we need something new!” vs. “LANL collaboration tools are amazing, why would you want to compete with them?!”
        • What might be going on here? Grass is greener? Novelty? Endowment effect?
      • Need to have a strong use case and value-added proposition prepared for end users; they do not always see the apparent utility
  33. First Steps in Research Data Management Under Constraints of a National Security Laboratory
      • Martin Klein, ORCID 0000-0003-0130-2097, @mart1nkle1n
      • Brian Cain, ORCID 0000-0001-7356-5860, @briancain101
      • Research Library, Los Alamos National Laboratory
