The EPA CompTox Chemicals Dashboard as a Data Integration Hub for Environmental Chemistry Data

The U.S. Environmental Protection Agency (EPA) Computational Toxicology Program integrates advances in biology, chemistry, and computer science to help prioritize chemicals for further research based on potential human health risks. This involves computational and data-driven approaches that integrate chemistry, exposure and biological data. The National Center for Computational Toxicology (NCCT) has measured, assembled and delivered a large and diverse body of data for the environmental sciences, including high-throughput in vitro screening data, in vivo and functional use data, exposure models and chemical databases with associated properties. The CompTox Chemicals Dashboard is a web-based application providing access to data associated with ~875,000 chemical substances. New data are added to the database on an ongoing basis, along with registration of new and emerging chemicals. These include chemicals extracted from the literature, identified by our analytical labs, or otherwise of interest to specific research projects at the agency. Because these data are registered with their associated chemical identifiers (names and CAS Registry Numbers), the dashboard can use linking approaches to automate searches of PubMed, Google Scholar and an array of public databases. This presentation will provide an overview of the CompTox Chemicals Dashboard, how it has developed into an integrated data hub for environmental data, and how it can be used to analyze emerging chemicals by sourcing related chemicals of interest and by deriving read-across and QSAR predictions in real time. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
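
As an illustration of identifier-based linking, the sketch below validates a CAS Registry Number using the standard CASRN check digit and then builds literature search links from a name and CASRN. It is not the dashboard's implementation: the function names and the query format are hypothetical, and the only external details assumed are the public PubMed and Google Scholar search URL patterns.

```python
from urllib.parse import urlencode


def casrn_is_valid(casrn: str) -> bool:
    """Return True if the CASRN has the expected shape and a correct check digit."""
    parts = casrn.split("-")
    if len(parts) != 3 or not all(p.isdecimal() for p in parts):
        return False
    # A CASRN has 2-7 digits, then 2 digits, then a single check digit.
    if not (2 <= len(parts[0]) <= 7 and len(parts[1]) == 2 and len(parts[2]) == 1):
        return False
    digits = parts[0] + parts[1]
    # Weight the digits 1, 2, 3, ... from right to left; the sum modulo 10
    # must equal the check digit.
    checksum = sum(w * int(d) for w, d in enumerate(reversed(digits), start=1))
    return checksum % 10 == int(parts[2])


def literature_search_urls(name: str, casrn: str) -> dict:
    """Build search links from a chemical name and CASRN (hypothetical query format)."""
    query = f'"{name}" OR "{casrn}"'
    return {
        "pubmed": "https://pubmed.ncbi.nlm.nih.gov/?" + urlencode({"term": query}),
        "google_scholar": "https://scholar.google.com/scholar?" + urlencode({"q": query}),
    }


if __name__ == "__main__":
    # Bisphenol A (CASRN 80-05-7) is used here only as a familiar example.
    if casrn_is_valid("80-05-7"):
        for site, url in literature_search_urls("Bisphenol A", "80-05-7").items():
            print(site, url)
```

Searching on the name and the CASRN together is one plausible way to catch articles that cite only one of the two identifiers; a production system would likely also include synonyms.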

Published in: Science

  1. US-EPA CompTox chemicals dashboard: A web-based data integration hub for environmental chemistry data
      • Antony Williams, Chris Grulke, Richard Judson, John Wambaugh, Jeremy Dunne and Jeff Edwards
      • National Center for Computational Toxicology, U.S. Environmental Protection Agency, RTP, NC
      • Spring 2019 ACS Spring Meeting, Orlando
      • http://www.orcid.org/0000-0002-2668-4821
      • The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
  2. Limit notetaking if you wish: www.slideshare.net/AntonyWilliams
  3. CompTox Portal
  4. CompTox Chemicals Dashboard: https://comptox.epa.gov/dashboard (875k Chemical Substances)
  5. BASIC Search
  6. Detailed Chemical Pages
  7. Experimental and Predicted Data
  8. Transparency for prediction models
  9. OPERA Predicted Properties. OPERA Models: https://github.com/kmansouri/OPERA
  10. Access to Chemical Hazard Data
  11. Hazard Data from “ToxVal_DB”
      • ToxVal Database contains following data:
        – 30,050 chemicals
        – 772,721 toxicity values
        – 29 sources of data
        – 21,507 sub-sources
        – 4585 journals cited
        – 69,833 literature citations
  12. In Vitro Bioassay Screening: ToxCast and Tox21
  13. In Vitro Bioassay Screening: ToxCast and Tox21
  14. Bioactivity: Downloadable Data https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data
  15. Sources of Exposure to Chemicals
  16. Sources of Exposure to Chemicals
  17. An “Executive Summary” Quick Look Tox Info
  18. Identifiers to Support Searches
  19. BUILT-IN “MODULES”
  20. Abstract Sifter for Excel
  21. Literature Searching
  22. Literature Searching
  23. Literature Searching
  24. Generalized Read-Across (GenRA)
  25. Related Publications
  26. MAPPED RELATIONSHIPS
  27. Relationships in the Data
  28. (slide contains no text)
  29. “MS-Ready Structures” https://doi.org/10.1186/s13321-018-0299-2
  30. Bisphenol A: 27 Total MS-Ready Mappings
  31. Related Substances – Transformation Products, “Monomer-Polymer”. What No Structures???
  32. UVCB CHEMICAL SUBSTANCES
  33. UVCB Chemicals
  34. UVCB: Complex Surfactants
  35. “Markush Structures” https://en.wikipedia.org/wiki/Markush_structure
  36. Xylenes
  37. UVCB: Complex Surfactants
  38. CHEMICAL LISTS AND CATEGORIES
  39. Category example – PAHs
  40. EPAHFR: Hydraulic Fracturing
  41. PFAS lists of Chemicals
  42. List of Assays
  43. From Assay to Chemicals…
  44. Other Searches
  45. Product/Use Categories
  46. Lubricant
  47. Lots of UVCBs in Commerce….
  48. Other Searches: Chemical-Biology
  49. Assay/Gene Search
  50. Assay/Gene Search
  51. Mass/Formula Searching and Metadata Ranking
  52. Suspect Screening and Non-Targeted Analysis Workflows
      • Workflow diagram; labels: DSSTox Chemical Database; “Molecular Features”; Extracted Samples; Raw Samples; Raw Features; Matched Formulas; Mapped Structures; Prioritized Structures (using ToxPi); Confirmed Structures (using ToxCast standards); Processed Features; Prioritized Features; Predicted Formulas; Candidate Structures; Sorted Structures; Predicted Retention Times; Predicted/Observed Functional Use; Top Candidate Structure(s); Suspect Screening; Non-Targeted Analysis; Predicted Concentrations; Predicted/Observed Media Occurrence; Predicted Mass Spectra; Methodological Concordance
      • Color Key: Red = Analytical Chemistry; Blue = Data Processing & Analysis; Green = Informatics & Web Services; Purple = Mathematical & QSPR Modeling
  53. Advanced Searches: Mass Search
  54. Advanced Searches: Mass Search
  55. MS-Ready Structures for Formula Search
  56. MS-Ready Mappings
      • EXACT Formula: C10H16N2O8: 3 Hits
  57. MS-Ready Mappings
      • Same Input Formula: C10H16N2O8
      • MS Ready Formula Search: 125 Chemicals
  58. Mass Spec Focused Applications
  59. Mass Spec Focused Applications
  60. Candidate ranking using public resources
  61. Is a bigger database better?
      • ChemSpider was 26 million chemicals then
      • Much BIGGER today
      • Is bigger better??
  62. Using Metadata for Ranking
      • Use available metadata to rank candidates
        – Associated data sources
          • Associated lists in the underlying database
          • Associated data sources in PubChem
          • Specific types (e.g. water, surfactants, pesticides etc.)
        – Number of associated literature articles (PubMed)
        – Chemicals in the environment – the number of products/categories containing the chemical is a very important source of data
  63. Comparing Search Performance
      • Dashboard content was 720k chemicals
      • Only 3% of ChemSpider size
      • What was the comparison in performance?
  64. SAME dataset for comparison
  65. How did performance compare?
  66. Data Quality is important
      • Data quality in free web-based databases!
  67. Will the correct Microcystin LR Stand Up? ChemSpider Skeleton Search
  68. Comparing ChemSpider Structures
  69. Quality Control of the Database
      • We have full time curators checking data
  70. Names to CASRN Mappings
  71. Subtleties: E/Z-stereochemistry; E-stereochemistry; “4-Decene”
  72. CAS Registry Numbers
  73. Crowdsourced Curation
  74. Batch Searching
  75. Batch Searching
      • Singleton searches are useful but people generally want data on LOTS of chemicals!
      • Typical questions
        – What is the list of chemicals for the formula CxHyOz
        – What is the list of chemicals for a mass +/- error
        – Can I get chemical lists in Excel files? In SDF files?
        – Can I include properties in the download file?
  76. Aggregate data for a list of chemicals
  77. Batch Search Names; Excel Download
  78. Add Other Data of Interest
  79. Batch Search in specific lists
  80. Built in Checks…
  81. Related Substance Relationships
  82. Batch Searching Formula/Mass
  83. Searching batches using MS-Ready Formula (or mass) searching
  84. Real-Time Predictions
  85. Real-Time Predictions
  86. Real-Time Predictions with detailed calculation reports
  87. Real-Time Predictions with detailed calculation reports
  88. Open Data Download Files
  89. Downloadable Data
  90. Work in Progress: Prototypes in Development
  91. Predicted Mass Spectra http://cfmid.wishartlab.com/
      • MS/MS spectra prediction for ESI+, ESI-, and EI
      • Predictions generated and stored for >800,000 structures, to be accessible via Dashboard
  92. Search Expt. vs. Predicted Spectra
  93. Search Expt. vs. Predicted Spectra
  94. Spectral Viewer Comparison
  95. Prototype Development
  96. In Progress: pKa Prediction Model
      • pKa prediction models based on Open Data Set of 8000 chemicals – acidic, basic and amphoteric chemicals
  97. API services and Open Data
      • Groups waiting on our API and web services
      • Mass Spec companies instrument integration
      • Release will be in iterations but for now our data are available
  98. Conclusion
      • Building an integrated hub for environmental chemistry data to serve computational toxicology
      • Transparent access to data and models – file downloads, SQL data dumps and web services
      • Expansion of functionality to serve all data streams generated by NCCT across the agency & community
      • Data QUALITY is a key focus - ongoing curation
      • We are committed to open API development with time.
  99. Acknowledgements EPA-RTP
      • An enormous team of contributors from NCCT, especially the IT software development team
      • Our curation team for their care and focus on data quality
      • Multiple centers and laboratories across the EPA
      • Many public domain databases and open data contributors
  100. Contact: Antony Williams, NCCT, US EPA Office of Research and Development, Williams.Antony@epa.gov. ORCID: https://orcid.org/0000-0002-2668-4821. https://doi.org/10.1186/s13321-017-0247-6
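
The mass/formula searching and metadata ranking ideas in the slides can be sketched in a few lines. The example below computes a monoisotopic mass for the formula used in the slides (C10H16N2O8), derives a mass window from a ppm tolerance, and ranks the candidates inside that window by a single metadata count. Everything here is an assumption for illustration: the candidate names, masses and source counts are made up, the 5 ppm tolerance is arbitrary, and no dashboard API is called. The slides describe ranking on several kinds of metadata (data sources, literature counts, product categories); this sketch collapses them into one hypothetical `source_count`.

```python
import re
from dataclasses import dataclass

# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}


def monoisotopic_mass(formula: str) -> float:
    """Sum isotope masses for a simple formula such as 'C10H16N2O8'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def mass_window(mass: float, ppm: float) -> tuple:
    """Return the (low, high) bounds for a mass +/- error given in ppm."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta


@dataclass
class Candidate:
    name: str
    mass: float
    source_count: int  # hypothetical metadata: number of associated data sources


def rank_candidates(candidates, target_mass: float, ppm: float = 5.0):
    """Keep candidates inside the mass window, most data sources first."""
    low, high = mass_window(target_mass, ppm)
    hits = [c for c in candidates if low <= c.mass <= high]
    return sorted(hits, key=lambda c: c.source_count, reverse=True)


if __name__ == "__main__":
    target = monoisotopic_mass("C10H16N2O8")
    # Illustrative candidates only; not real database records.
    candidates = [
        Candidate("candidate A", 292.0907, 12),
        Candidate("candidate B", 292.0910, 40),
        Candidate("candidate C", 292.1200, 95),
    ]
    for c in rank_candidates(candidates, target):
        print(c.name, c.source_count)
```

Filtering by mass first and ranking second mirrors the order implied by the slides: the mass or formula search defines the candidate set, and metadata only decides the order within it, so a heavily cited chemical outside the window never appears.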