
The US-EPA CompTox Chemicals Dashboard to support Non-Targeted Analysis

This presentation was given at the ASMS Sanibel Conference "Unraveling the Exposome". It provided a general overview of the dashboard and how it integrates with many of the projects that we support, with a special focus on list generation, mass and formula searching based on MS-Ready structures, and some of the prototypes that we have been developing to support non-targeted analysis.

Published in: Science

  1. The US-EPA CompTox Chemicals Dashboard to support Non-Targeted Analysis • Unravelling the Exposome, ASMS Sanibel Conference, January 2020 • Antony Williams, Center for Computational Toxicology and Exposure, US-EPA, RTP, NC …and an enormous cast of characters • http://www.orcid.org/0000-0002-2668-4821 • The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
  2. Outline • Quick overview of the dashboard • Specific data of interest to this audience (it’s not just Computational Toxicology) • Support for Mass Spectrometry • Data quality in the public domain • Work in progress – prototypes • A request for help
  3. CompTox Chemicals Dashboard https://comptox.epa.gov/dashboard • SEARCH • TOX DATA • BIOACTIVITY • SIMILARITY • READ-ACROSS • PUBMED • BATCH SEARCH
  4. BASIC Search
  5. Detailed Chemical Pages
  6. Properties, Fate and Transport
  7. Properties, Fate and Transport, e.g. Solubility
  8. Properties, Fate and Transport, e.g. logP
  9. Sources of Exposure to Chemicals
  10. Identifiers to Support Searches
  11. Link Access
  12. Mass Spec Links
  13. NIST WebBook https://webbook.nist.gov/chemistry/
  14. MassBank of North America https://mona.fiehnlab.ucdavis.edu
  15. Batch Searching
  16. Aggregate data for a list of chemicals
  17. Batch Search Names: Excel Download
  18. Add Other Data of Interest
  19. Chemical Lists of Interest…
  20. 225 Chemical Lists (and growing)
  21. “Volatilome”: Human Breath
  22. “Volatilome”: Saliva
  23. PFAS lists of Chemicals
  24. Building a “reference” PFAS list • The PFAS structure list (PFASSTRUCT) is expanded from public databases, EPA agency lists and literature • Approaching ~7,000 structures, 98.8% have associated CAS Numbers • Compare with PubChem: 220,720 structures
  25. Formula Search can find isomers
  26. Active expansion of the PFAS list: from 2 to 8 variants of PFOS
  27. Disinfection By-Products
  28. Mycotoxins • Two lists: 328 and 88 members
  29. Vomitoxin
  30. BIG databases are GREAT! • [Chart: number of chemical substances on a log scale (10^4 to 10^9) for PubChem, CAS Registry, ChemSpider, EPA DSSTox and the Blood Exposome] • Thanks to all of the public database efforts • So much benefit from what’s been done • There are hundreds of them at this point…
  31. Vomitoxin – ChemSpider • 19 “Vomitoxins”, 3 of them isotopically labeled
  32. Vomitoxin – PubChem • 33 unique InChIKeys
  33. PubChem – “virtual chemistry” • Other databases grow quickly, with a lot of “virtual chemistry” and “make on demand” compounds; Vomitoxin has 7 ZINC stereoforms • The Dashboard database grows slowly (next release: +20k chemicals in 6 months)
  34. ChemSpider – lots of virtuals??? • 52 million chemicals from one vendor
  35. Taxol: 79 Results
  36. Data Quality is important • Data quality in free web-based databases!
  37. We’re still cleaning data too
  38. Tire Crumb Rubber (298)
  39. Terpenes in Vape (37)
  40. Hydraulic Fracturing (1640)
  41. Opioids and Metabolites (160)
  42. “MS-ready” structures
  43. Overview of MS-Ready Structures • All structure-based chemical substances are algorithmically processed to: – split multicomponent chemicals into individual structures – desalt and neutralize individual structures – remove stereochemical bonds from all chemicals • MS-Ready structures are then mapped back to the original substances, providing a path from the chemicals detected by mass spectrometry to the original substances
  44. (Image-only slide)
  45. MS-Ready Mapping: What is PFOS?? • Perfluorooctanesulfonic acid • Perfluorooctanesulfonate • 1763-23-1 • 45298-90-6 • 132324-11-9
  46. MS-Ready Mappings from Details Page
  47. MS-Ready Mappings: set of 20 substances for “PFOS”
  48. Mass and Formula Searching
  49. Advanced Searches: Mass Search
  50. Advanced Searches: Mass Search
  51. MS-Ready Structures for Formula Search
  52. MS-Ready Mappings • EXACT Formula: C10H16N2O8: 3 Hits
  53. MS-Ready Mappings • Same Input Formula: C10H16N2O8 • MS-Ready Formula Search: 125 Chemicals
  54. MS-Ready Mappings • 125 chemicals returned in total – 8 of the 125 are single-component chemicals – 3 of the 8 are isotope-labeled – 3 are neutral compounds and 2 are charged • Multiple components, stereochemistry, isotopes and charge are all collapsed and mapped through MS-Ready
  55. Batch Searching mass and formula
  56. Batch Searching • Singleton searches are useful, but we work with thousands of masses and formulae! • Typical questions: – What is the list of chemicals for the formula CxHyOz? – What is the list of chemicals for a mass ± error? – Can I get chemical lists in Excel files? In SDF files? – Can I include properties in the download file?
  57. Batch Searching Formula/Mass
  58. Searching batches using MS-Ready Formula (or mass) searching
  59. Batch Search in specific lists
  60. Benefits of bringing it all together • The true dashboard benefit is integration • Rank potential candidates for toxicity using available data: hazard, exposure, in vitro
  61. Candidate ranking using metadata
  62. Data Source Ranking of “known unknowns” • A mass and/or formula search is for a chemical that is unknown to the analyst but is a known chemical contained within a reference database • The most likely candidate chemicals have the most associated data sources, the most associated literature articles, or both • [Figure: C14H22N2O3, 266.16304 → Chemical Reference Database → sorted candidate structures]
  63. Data Streams for Ranking • CompTox Dashboard data sources • PubChem data source count • PubMed reference count • ToxCast in vitro bioactivity • Presence in the CPDat database • OPERA PhysChem properties • Other possibilities: predicted media occurrence, frequency of InChIs online
  64. Search 228.115 ± 5.0 ppm: 234 single-component chemicals
  65. Search 228.115 ± 5.0 ppm: 234 single-component chemicals
  66. The original ChemSpider work
  67. Is a bigger database better? • ChemSpider contained 26 million chemicals at the time of the original work • It is much BIGGER today • Is bigger better?? • Are there other metadata to use for ranking?
  68. Comparing Search Performance • When the dashboard contained 720k chemicals • Only 3% of the size of ChemSpider • How did performance compare?
  69. SAME dataset for comparison
  70. How did performance compare? • For the same 162 chemicals, the Dashboard outperforms ChemSpider for both Mass and Formula Ranking
  71. Identification ranks for 1783 chemicals using multiple data streams • Legend: DS = Data Sources; PC = PubChem; PM = PubMed; STOFF = database; KEMI = database • Data Sources alone rank ~75% of the chemicals as Top Hit
  72. “UVCB” Chemicals
  73. UVCB Chemicals
  74. UVCBs: a challenge in non-target analysis • Homologue screening plots from Swiss wastewater (Schymanski et al. 2014, left) and Novi Sad (right) • Complex mixtures (UVCBs) are a huge and very challenging part of the unknowns in many environmental samples
  75. Public TSCA Inventory on Dashboard: 31,460 Chemicals (1/24/2020)
  76. Many Chemicals are “Complex”: >14,000 chemicals are UVCBs
  77. “Markush Structures” https://en.wikipedia.org/wiki/Markush_structure
  78. How to represent complexity?
  79. In the Dashboard: Abstract Sifter
  80. Literature Searching
  81. Literature Searching
  82. Abstract Sifter for Excel
  83. Work in Progress
  84. List Registration Activities • Registering and curating numerous lists: – NIST library of chemicals, clean-up especially around stereochemical representation – United States Geological Survey chemicals in water – Scientific Working Group for the Analysis of Seized Drugs – Synthetic Cannabinoids – Blood Exposome Database
  85. Blood Exposome Curation • Blood exposome data collection from Barupal and Fiehn: great work, and we are reviewing it • Aggregating large datasets is CHALLENGING • Comparing with our “Abstract Sifter” approach • We will iterate it into a dashboard form
  86. Prototype Work in Progress • CFM-ID: – viewing and downloading pre-predicted spectra – searching spectra against the database • Structure/substructure/similarity search • Access to API and web services • Integration with the EPA “Chemical Transformation Simulator”
  87. Predicted Mass Spectra http://cfmid.wishartlab.com/ • MS/MS spectra prediction for ESI+, ESI- and EI • Predictions generated and stored for >800,000 structures, to be accessible via the Dashboard
  88. Search Expt. vs. Predicted Spectra
  89. Search Expt. vs. Predicted Spectra
  90. Spectral Viewer Comparison
  91. Presented at ACS Fall 2019
  92. Example match
  93. Predicted Data Already Public: Publication and Data Files https://epa.figshare.com/articles/CFM-ID_Paper_Data/7776212/1
  94. Published: Alex Chao et al.
  95. Prototype Development
  96. CASMI 2012-2017 revisited • Application of metadata candidate ranking and CFM-ID to all five years of CASMI data
  97. Method Amenability Prediction (Charlie Lowe) • Why? • Chromatography-mass spectrometry can be LC or GC • Which phase is more appropriate for which chemicals?
  98. Ongoing Work • Data sources to date: • MassBank of North America – 9,275 chemicals for non-derivatized GC – 846 chemicals for derivatized GC – 816 chemicals for APCI+ – 454 chemicals for APCI- – 4,907 chemicals for ESI+ – 3,430 chemicals for ESI- • EPA Non-targeted Analysis Collaborative Trial (ENTACT) – 886 chemicals for non-derivatized GC – 44 chemicals for derivatized GC – 774 chemicals for APCI+ – 431 chemicals for APCI- – 1,113 chemicals for ESI+ – 648 chemicals for ESI-
  99. TMAP Visualization of MoNA GC Data
  100. Future Work: Add database of Collision Cross Section Prediction
  101. API services and Open Data • Web Services: https://actorws.epa.gov/actorws/ • Data sets are also available for download
  102. Web Services https://actorws.epa.gov/actorws/ • Data in UI, JSON and XML formats • Our services are free, of course
  103. InChIKey to DTXCIDs: https://actorws.epa.gov/actorws/dsstox/v02/msready?identifier=UVOFGKIRTCCNKG-UHFFFAOYSA-N
  104. Data and Services used by the Community
  105. NORMAN Suspect List Exchange https://www.norman-network.com/?q=node/236
  106. Integration with MetFrag in place https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0299-2
  107. MassBank mapping to Dashboard based on Web Service lookup
  108. Conclusion • Dashboard access to data for ~875,000 chemicals (~895k in the Spring Release) • MS-Ready data facilitates structure identification • Related metadata facilitates candidate ranking • Relationship mappings and chemical lists are of great utility • Curation and mutual sharing of chemical lists is important (e.g. NORMAN)
  109. Request for the Audience (and thanks for ideas so far!)
  110. A request for the audience • Please submit comments for curation
  111. Please share data and lists • Help expand existing lists with new data • Consider using DTXSIDs instead of just Names/CASRNs in your published tables
  112. Acknowledgements • EPA ORD: Ann Richard, Chris Grulke, John Wambaugh, Jeremy Dunne, Jeff Edwards, Grace Patlewicz, Alex Chao, Kristin Isaacs, Charles Lowe, James McCord, Seth Newton, Katherine Phillips, Tom Purucker, Jon Sobus, Mark Strynar, Elin Ulrich, Joach Pleil • ILS: Kamel Mansouri • GDIT: Ilya Balabin, Tom Transue, Tommy Cathey • TEAMS: IT Development Team, Curation Team • Collaborators: Emma Schymanski, NORMAN Network, Andrew McEachran, Jerry Zweigenbaum
  113. MANY presentations online https://tinyurl.com/w5hqs55
  114. Contact • Antony Williams, CCTE, US EPA Office of Research and Development, Williams.Antony@epa.gov • ORCID: https://orcid.org/0000-0002-2668-4821 • https://doi.org/10.1186/s13321-017-0247-6
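Slide 43 lists the MS-Ready processing steps (split multicomponent substances, desalt and neutralize, remove stereochemistry) and the mapping of MS-Ready forms back to the original substances; slides 45 to 47 apply this to PFOS. The Python below is only a string-level sketch of that idea under stated assumptions: it edits SMILES text directly, uses a hypothetical counterion list, neutralizes only `[O-]`, and does no canonicalization, so inputs must be written in the same atom order to group together. The substance labels are hypothetical, a short sulfonate (triflate) stands in for PFOS to keep the strings readable, and the Dashboard's actual MS-Ready workflow is not reproduced here.

```python
import re
from collections import defaultdict

# Hypothetical counterion/solvent fragments to discard (assumption, not a
# curated list). "O" is water, e.g. from hydrates.
COUNTERIONS = {"[Na+]", "[K+]", "[NH4+]", "[Cl-]", "O"}


def ms_ready_smiles(smiles: str) -> str:
    """Toy MS-Ready transform on SMILES text (no canonicalization)."""
    components = smiles.split(".")
    # Step 1: split multicomponent substances and drop known counterions.
    kept = [c for c in components if c not in COUNTERIONS] or components
    # Keep the longest remaining component as a crude "parent" choice.
    parent = max(kept, key=len)
    # Step 2: neutralize deprotonated oxygens; an unbracketed O regains its H.
    # Toy rule only: it would wrongly alter nitro groups written [N+](=O)[O-].
    parent = parent.replace("[O-]", "O")
    # Step 3: remove stereochemistry. [C@H]/[C@@H] become a plain C (valid when
    # the carbon has three heavy-atom neighbours); other @ marks and the / and \
    # double-bond markers are simply deleted.
    parent = re.sub(r"\[C@@?H\]", "C", parent)
    for marker in ("@", "/", "\\"):
        parent = parent.replace(marker, "")
    return parent


def build_ms_ready_index(substances: dict[str, str]) -> dict[str, list[str]]:
    """Map each MS-Ready form back to every original substance that yields it."""
    index = defaultdict(list)
    for substance_id, smiles in substances.items():
        index[ms_ready_smiles(smiles)].append(substance_id)
    return dict(index)


# Hypothetical substances, written in a consistent atom order on purpose.
substances = {
    "triflic acid": "OS(=O)(=O)C(F)(F)F",
    "sodium triflate": "[Na+].[O-]S(=O)(=O)C(F)(F)F",
    "triflate anion": "[O-]S(=O)(=O)C(F)(F)F",
    "L-alanine": "C[C@H](N)C(=O)O",
    "D-alanine": "C[C@@H](N)C(=O)O",
    "alanine (no stereo)": "CC(N)C(=O)O",
}

for ms_ready, members in build_ms_ready_index(substances).items():
    print(ms_ready, "->", members)
```

Grouping by the MS-Ready key is what lets one detected structure point back to every salt, anion and stereoisomer record, in the same spirit as the 20-substance PFOS set on slide 47.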
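Slide 45 lists three CAS Registry Numbers associated with PFOS, and slide 111 suggests publishing DTXSIDs rather than only names/CASRNs. CASRNs carry a check digit, so a quick validation step catches many transcription errors before any lookup. A minimal Python sketch of the standard check-digit rule (weighted digit sum modulo 10); the function name is ours and is not part of any Dashboard API. A passing check digit only shows the number is well formed, not that it belongs to the intended substance.

```python
import re

# Two to seven digits, two digits, one check digit.
CASRN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_casrn(casrn: str) -> bool:
    """Check the layout and the check digit of a CAS Registry Number."""
    match = CASRN.fullmatch(casrn.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the rightmost digit of the body;
    # the weighted sum modulo 10 must equal the final check digit.
    total = sum(w * int(d) for w, d in enumerate(reversed(body), start=1))
    return total % 10 == int(match.group(3))


for casrn in ["1763-23-1", "45298-90-6", "132324-11-9", "1763-23-2"]:
    print(casrn, is_valid_casrn(casrn))
```

The last entry is a deliberately mistyped variant of the first.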
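The mass-search slides (49, 50, 64 and 65, e.g. "Search 228.115 ± 5.0 ppm") and the "C14H22N2O3, 266.16304" example on slide 62 rest on two small calculations: the monoisotopic mass of a formula, and a tolerance window in parts per million, Δm = m × ppm × 10⁻⁶. At 5.0 ppm around 228.115 that window is about ±0.00114 Da. The Python sketch below shows both; the element table covers only a few common elements (others raise `KeyError`) and the candidate formulas are hypothetical.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope; a small subset only.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "F": 18.99840322,
    "P": 30.97376163,
    "S": 31.97207100,
    "Cl": 34.96885268,
    "Br": 78.9183371,
}
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def monoisotopic_mass(formula: str) -> float:
    """Sum element masses for a simple neutral formula such as 'C14H22N2O3'."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    mass = 0.0
    for element, count in TOKEN.findall(formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def ppm_window(mass: float, ppm: float) -> tuple[float, float]:
    """Return (low, high) bounds for mass ± ppm."""
    delta = mass * ppm * 1e-6
    return mass - delta, mass + delta


def mass_search(target: float, ppm: float, formulas: list[str]) -> list[str]:
    """Keep the candidate formulas whose monoisotopic mass falls in the window."""
    low, high = ppm_window(target, ppm)
    return [f for f in formulas if low <= monoisotopic_mass(f) <= high]


print(round(monoisotopic_mass("C14H22N2O3"), 5))
print(mass_search(266.16304, 5.0, ["C14H22N2O3", "C15H26N2O2", "C13H18N2O4"]))
```

The two extra formulas share the nominal mass 266 but differ by roughly 0.036 Da, far outside a 5 ppm window, which is why accurate mass narrows candidate lists so effectively.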
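Slides 52 to 58 contrast an EXACT formula search (C10H16N2O8: 3 hits) with an MS-Ready formula search (125 chemicals) and list typical batch questions, such as the chemicals for a formula, with Excel-friendly downloads. A minimal sketch of that contrast, assuming each substance record already stores both its own formula and the formula of its MS-Ready form; the records, IDs and CSV column names are hypothetical, and CSV text stands in for the Excel/SDF downloads.

```python
import csv
import io
from collections import defaultdict

# Hypothetical records: (substance_id, exact_formula, ms_ready_formula).
# Salts carry the formula of their neutral parent as the MS-Ready formula.
SUBSTANCES = [
    ("SUBST-1", "C10H16N2O8", "C10H16N2O8"),     # neutral parent
    ("SUBST-2", "C10H14N2Na2O8", "C10H16N2O8"),  # disodium salt
    ("SUBST-3", "C10H14CaN2O8", "C10H16N2O8"),   # calcium salt
    ("SUBST-4", "C9H14N2O8", "C9H14N2O8"),       # unrelated chemical
]
COLUMNS = {"exact": 1, "ms_ready": 2}


def build_formula_index(mode):
    """Index substance IDs by either the exact or the MS-Ready formula."""
    column = COLUMNS[mode]
    index = defaultdict(list)
    for record in SUBSTANCES:
        index[record[column]].append(record[0])
    return index


def batch_formula_search(formulas, mode="exact"):
    """Return one (input_formula, substance_id) row per hit for every query."""
    index = build_formula_index(mode)
    return [(query, sid) for query in formulas for sid in index.get(query, [])]


def to_csv(rows):
    """Serialize hits as CSV text, which spreadsheet tools such as Excel open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["INPUT_FORMULA", "SUBSTANCE_ID"])
    writer.writerows(rows)
    return buffer.getvalue()


queries = ["C10H16N2O8", "C9H14N2O8"]
print(len(batch_formula_search(queries, "exact")))     # exact formula matches only
print(len(batch_formula_search(queries, "ms_ready")))  # salts collapse onto the parent
print(to_csv(batch_formula_search(queries, "ms_ready")))
```

Building the index once per batch keeps each lookup constant-time, which matters when the input is thousands of formulas rather than one.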
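Slide 62 ranks the candidates returned by a mass or formula search by how many data sources and literature articles are associated with them. One simple way to express that, assuming the counts are already available for each candidate, is a sort on data-source count with PubMed count as a tie-breaker. This ordering rule is an illustrative choice, and the candidate IDs and counts are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    substance_id: str   # hypothetical identifier
    data_sources: int   # number of associated data sources
    pubmed_refs: int    # number of associated PubMed articles


def rank_candidates(candidates: list[Candidate]) -> list[Candidate]:
    """Most data sources first; ties broken by PubMed reference count."""
    return sorted(candidates, key=lambda c: (-c.data_sources, -c.pubmed_refs))


hits = [
    Candidate("CAND-A", data_sources=3, pubmed_refs=0),
    Candidate("CAND-B", data_sources=41, pubmed_refs=1200),
    Candidate("CAND-C", data_sources=41, pubmed_refs=15),
    Candidate("CAND-D", data_sources=7, pubmed_refs=2),
]
for rank, c in enumerate(rank_candidates(hits), start=1):
    print(rank, c.substance_id, c.data_sources, c.pubmed_refs)
```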
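Slides 63 and 71 combine several metadata streams and report how often the correct chemical comes out as the top hit. The slides do not give the combination formula, so the sketch below assumes one possibility: each stream is divided by its maximum within the candidate set and the results are summed with equal weights. It also shows the top-hit rate calculation. The stream names, weights and data are hypothetical.

```python
# Equal weights over three hypothetical stream names (assumption).
WEIGHTS = {"data_sources": 1.0, "pubchem_sources": 1.0, "pubmed_refs": 1.0}


def combined_scores(candidates, weights):
    """Weighted sum of each stream, scaled by the stream's maximum over the candidates."""
    maxima = {s: max(values[s] for values in candidates.values()) for s in weights}
    return {
        cand_id: sum(
            w * (values[s] / maxima[s] if maxima[s] else 0.0)
            for s, w in weights.items()
        )
        for cand_id, values in candidates.items()
    }


def rank_of(true_id, scores):
    """1-based rank of the true chemical; tied candidates share the better rank."""
    return 1 + sum(1 for score in scores.values() if score > scores[true_id])


def top_hit_rate(ranks):
    """Fraction of queries in which the true chemical is ranked first."""
    return sum(1 for r in ranks if r == 1) / len(ranks)


# Two hypothetical queries: (candidate metadata, identity of the true chemical).
queries = [
    ({"A": {"data_sources": 40, "pubchem_sources": 120, "pubmed_refs": 300},
      "B": {"data_sources": 5, "pubchem_sources": 10, "pubmed_refs": 0}}, "A"),
    ({"C": {"data_sources": 20, "pubchem_sources": 50, "pubmed_refs": 10},
      "D": {"data_sources": 10, "pubchem_sources": 60, "pubmed_refs": 5}}, "D"),
]
ranks = [rank_of(true_id, combined_scores(cands, WEIGHTS)) for cands, true_id in queries]
print(ranks, top_hit_rate(ranks))
```

Setting all weights except `data_sources` to zero reproduces a "data sources alone" ranking, which is the kind of comparison slide 71 summarizes.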
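Slide 74 shows homologue screening plots, in which members of a complex UVCB mixture differ by a repeating unit such as CH2 or CF2. A minimal sketch of the underlying idea, assuming a list of measured neutral monoisotopic masses: greedily chain together masses separated by the repeating-unit mass within a tolerance. The masses, tolerance and minimum series length are illustrative choices, not values from the slides.

```python
from bisect import bisect_left

CH2 = 12.0 + 2 * 1.00782503207   # repeating unit for hydrocarbon homologues
CF2 = 12.0 + 2 * 18.99840322     # repeating unit for perfluorinated homologues


def find_homologue_series(masses, unit, tol=0.002, min_length=3):
    """Greedily chain masses spaced by `unit` (within ±tol Da) into series."""
    ordered = sorted(masses)
    used = set()
    series = []
    for start_index in range(len(ordered)):
        if start_index in used:
            continue
        chain = [start_index]
        current = ordered[start_index]
        while True:
            target = current + unit
            # First unused mass at or above target - tol; accept it if within tol.
            j = bisect_left(ordered, target - tol)
            while j < len(ordered) and j in used:
                j += 1
            if j < len(ordered) and abs(ordered[j] - target) <= tol:
                chain.append(j)
                current = ordered[j]
            else:
                break
        if len(chain) >= min_length:
            used.update(chain)
            series.append([ordered[i] for i in chain])
    return series


measured = [200.0 + i * CH2 for i in range(4)] + [205.5, 301.2]
for found in find_homologue_series(measured, CH2):
    print([round(m, 4) for m in found])
```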
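Slides 88 to 90 compare an experimental MS/MS spectrum against CFM-ID predicted spectra for candidate structures. The slides do not state the scoring function, so the sketch below assumes a simple cosine (normalized dot-product) score with one-to-one peak matching within an m/z tolerance. Spectra are lists of (m/z, intensity) pairs, and all values and candidate IDs are made up.

```python
import math


def cosine_score(query, reference, tol=0.01):
    """Cosine similarity of two centroided spectra, matching peaks within tol (m/z)."""
    used = set()
    dot = 0.0
    for mz_q, int_q in query:
        # Pick the closest unused reference peak inside the tolerance, if any.
        best = None
        for j, (mz_r, _) in enumerate(reference):
            if j in used or abs(mz_q - mz_r) > tol:
                continue
            if best is None or abs(mz_q - mz_r) < abs(mz_q - reference[best][0]):
                best = j
        if best is not None:
            used.add(best)
            dot += int_q * reference[best][1]
    norm_q = math.sqrt(sum(i * i for _, i in query))
    norm_r = math.sqrt(sum(i * i for _, i in reference))
    return dot / (norm_q * norm_r) if norm_q and norm_r else 0.0


def rank_by_spectrum(query, predicted, tol=0.01):
    """Sort candidate IDs by descending cosine score against their predicted spectra."""
    scores = {cid: cosine_score(query, spec, tol) for cid, spec in predicted.items()}
    return sorted(scores, key=scores.get, reverse=True)


experimental = [(91.054, 100.0), (119.049, 35.0), (147.044, 60.0)]
predicted = {
    "CAND-1": [(91.054, 80.0), (119.049, 30.0), (147.044, 55.0)],
    "CAND-2": [(77.039, 100.0), (105.033, 40.0)],
}
print(rank_by_spectrum(experimental, predicted))
```

A spectral score like this could be combined with the metadata ranking sketched above, which is the general direction described for the CASMI revisit on slide 96.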
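Slide 103 shows a web-service call that takes an InChIKey and returns the matching MS-Ready DSSTox chemical IDs. A small Python helper can check the InChIKey layout (14 + 10 + 1 uppercase letters) and build that URL with the standard library. The endpoint is copied from the 2020 slide and may since have been retired or moved; the slides do not describe the response format, so the fetch helper only returns the raw body and is not called here.

```python
import re
import urllib.parse
import urllib.request

# Endpoint as shown on the 2020 slide; it may no longer be live.
MSREADY_ENDPOINT = "https://actorws.epa.gov/actorws/dsstox/v02/msready"
INCHIKEY = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def msready_url(inchikey: str) -> str:
    """Build the InChIKey-to-DTXCID lookup URL after a basic format check."""
    if not INCHIKEY.fullmatch(inchikey):
        raise ValueError(f"not a standard InChIKey: {inchikey!r}")
    return f"{MSREADY_ENDPOINT}?{urllib.parse.urlencode({'identifier': inchikey})}"


def fetch_raw(url: str, timeout: float = 30.0) -> str:
    """Return the raw response body; parsing depends on the (unstated) format."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


print(msready_url("UVOFGKIRTCCNKG-UHFFFAOYSA-N"))
```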
