Applications of the US EPA’s CompTox chemicals dashboard to support structure identification and chemical forensics using mass spectrometry

High-resolution mass spectrometry (HRMS) and non-targeted analysis (NTA) are of increasing interest in chemical forensics for the identification of emerging contaminants and chemical signatures of interest. At the US Environmental Protection Agency, our research using HRMS for non-targeted and suspect screening analyses utilizes databases and cheminformatics approaches that are applicable to chemical forensics. The CompTox Chemicals Dashboard is an open chemistry resource and web-based application containing data for ~760,000 substances. Basic search functionality is provided through identifier searches, such as systematic names, trade names and CAS Registry Numbers. Advanced Search capabilities supporting mass spectrometry include mass- and formula-based searches, combined substructure-mass searches, and searches of experimental mass spectral data against predicted fragmentation spectra. A specific type of data mapping in the underpinning database, using “MS-Ready” structures, has proven to be a valuable approach for structure identification: it links structures that can be identified via HRMS with related substances, such as salts and other multi-component mixtures, that are available in commerce. This presentation will provide an overview of the CompTox Chemicals Dashboard and demonstrate its utility for supporting structure identification and NTA in chemical forensics. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
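
The mass- and formula-based searches mentioned in the abstract can be illustrated with a minimal sketch. It computes a monoisotopic mass from a molecular formula and filters a candidate list by a ppm tolerance window. Everything here is an assumption for illustration: the function names, the two-entry `CANDIDATES` list and the 5 ppm tolerance are hypothetical, and the code does not call the Dashboard or reproduce its implementation.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}

# Hypothetical stand-in for a chemical database: (name, neutral formula).
CANDIDATES = [
    ("EDTA", "C10H16N2O8"),
    ("caffeine", "C8H10N4O2"),
]


def parse_formula(formula):
    """Return element counts for a simple formula such as 'C10H16N2O8'."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses of all atoms in the formula."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


def ppm_window(mass, ppm):
    """Return the (low, high) mass bounds for a tolerance given in ppm."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta


def search_by_mass(candidates, observed_mass, ppm=5):
    """Return names of candidates whose mass lies inside the ppm window."""
    low, high = ppm_window(observed_mass, ppm)
    return [name for name, formula in candidates
            if low <= monoisotopic_mass(formula) <= high]


if __name__ == "__main__":
    print(search_by_mass(CANDIDATES, 292.0907, ppm=5))
```

The sketch works on neutral monoisotopic masses only; a real workflow would first correct the observed m/z for the adduct (for example, the mass of a proton in positive mode) before applying the window.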

Published in: Science

  1. Applications of the US EPA’s CompTox chemicals dashboard to support structure identification and chemical forensics using mass spectrometry. Authors: Antony Williams (1), Andrew D. McEachran (2), Jon R. Sobus (3) and Emma Schymanski (4). Affiliations: (1) National Center for Computational Toxicology, U.S. Environmental Protection Agency, RTP, NC; (2) Oak Ridge Institute of Science and Education (ORISE) Research Participant, RTP, NC; (3) National Exposure Research Laboratory, U.S. Environmental Protection Agency, RTP, NC; (4) Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, Luxembourg. 2019 ACS Spring Meeting, Orlando. http://www.orcid.org/0000-0002-2668-4821 The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA.
  2. Suspect Screening and Non-Targeted Analysis Workflows [Workflow diagram. Labels: DSSTox Chemical Database; Raw Samples; Extracted Samples; Raw Features (“Molecular Features”); Processed Features; Prioritized Features; Matched Formulas; Mapped Structures; Prioritized Structures (using ToxPi); Confirmed Structures (using ToxCast standards); Predicted Formulas; Candidate Structures; Sorted Structures; Top Candidate Structure(s); Predicted Retention Times; Predicted/Observed Functional Use; Predicted Concentrations; Predicted/Observed Media Occurrence; Predicted Mass Spectra; Methodological Concordance. Branches: Suspect Screening; Non-Targeted Analysis. Color key: red = Analytical Chemistry; blue = Data Processing & Analysis; green = Informatics & Web Services; purple = Mathematical & QSPR Modeling.]
  3. CompTox Chemicals Dashboard https://comptox.epa.gov/dashboard
  4. BASIC Search
  5. Detailed Chemical Pages
  6. Sources of Exposure to Chemicals
  7. Identifiers to Support Searches
  8. Link Access
  9. NIST WebBook https://webbook.nist.gov/chemistry/
  10. MassBank of North America https://mona.fiehnlab.ucdavis.edu
  11. BATCH SEARCHING
  12. Aggregate data for a list of chemicals
  13. Batch Search Names: Excel Download
  14. Add Other Data of Interest
  15. CHEMICAL LISTS
  16. Chemical Lists
  17. PFAS lists of Chemicals
  18. EPAHFR: Hydraulic Fracturing
  19. Batch Search in specific lists
  20. “MS-READY” STRUCTURES
  21. [no text on slide]
  22. “MS-Ready Structures” https://doi.org/10.1186/s13321-018-0299-2
  23. MS-Ready Mappings
  24. MS-Ready Mappings Set
  25. MASS AND FORMULA SEARCHING
  26. Advanced Searches: Mass Search
  27. Advanced Searches: Mass Search
  28. MS-Ready Structures for Formula Search
  29. MS-Ready Structures Batch Searches
  30. MS-Ready Mappings
      • EXACT Formula: C10H16N2O8: 3 Hits
  31. MS-Ready Mappings
      • Same Input Formula: C10H16N2O8
      • MS-Ready Formula Search: 125 Chemicals
  32. MS-Ready Mappings
      • 125 chemicals returned in total
          – 8 of the 125 are single-component chemicals
          – 3 of the 8 are isotope-labeled
          – 3 are neutral compounds and 2 are charged
  33. Batch Searching
      • Singleton searches are useful, but we work with thousands of masses and formulae!
      • Typical questions
          – What is the list of chemicals for the formula CxHyOz?
          – What is the list of chemicals for a mass +/- error?
          – Can I get chemical lists in Excel files? In SDF files?
          – Can I include properties in the download file?
  34. Batch Searching Formula/Mass
  35. Searching batches using MS-Ready Formula (or mass) searching
  36. SUPPORTING FUNCTIONALITY FOR MASS-SPEC
  37. Formula-Based Search
  38. Select Chemicals of Interest
  39. Prune to list of interest
  40. Structure Similarity Searches
  41. Structure Similarity Searches
  42. Literature Searching
  43. Literature Searching
  44. Literature Searching
  45. Example Online Resources for MS
  46. DO WE REALLY NEED ANOTHER DATABASE?
  47. Is a bigger database better?
      • ChemSpider was 26 million chemicals then
      • Much BIGGER today
      • Is bigger better?
  48. Comparing Search Performance
      • Dashboard content was 720k chemicals
      • Only 3% of ChemSpider size
      • What was the comparison in performance?
  49. SAME dataset for comparison
  50. How did performance compare?
  51. Data Quality is important
      • Data quality in free web-based databases!
  52. Will the correct Microcystin LR Stand Up? ChemSpider Skeleton Search
  53. International Chemical Identifier
  54. Comparing ChemSpider Structures
  55. Comparing ChemSpider Structures
  56. Other Searches
  57. Delivering a Better Database
      • We have full-time curators checking data
  58. UVCB CHEMICAL SUBSTANCES
  59. UVCB Chemicals
  60. “Markush Structures” https://en.wikipedia.org/wiki/Markush_structure
  61. UVCB: Complex Surfactants
  62. UVCB: Complex Surfactants
  63. WORK IN PROGRESS
  64. Work in Progress
      • CFM-ID
          – Viewing and Downloading pre-predicted spectra
          – Search spectra against the database
  65. Predicted Mass Spectra http://cfmid.wishartlab.com/
      • MS/MS spectra prediction for ESI+, ESI-, and EI
      • Predictions generated and stored for >800,000 structures, to be accessible via Dashboard
  66. Search Expt. vs. Predicted Spectra
  67. Search Expt. vs. Predicted Spectra
  68. Spectral Viewer Comparison
  69. Work in Progress
      • CFM-ID
          – Viewing and Downloading pre-predicted spectra
          – Search spectra against the database
      • Retention Time Index Prediction
  70. Moving to Relative Retention Times
  71. Work in Progress
      • CFM-ID
          – Viewing and Downloading pre-predicted spectra
          – Search spectra against the database
      • Retention Time Index Prediction
      • Structure/substructure/similarity search
  72. Prototype Development
  73. Prototype Development
  74. Work in Progress
      • CFM-ID
          – Viewing and Downloading pre-predicted spectra
          – Search spectra against the database
      • Retention Time Index Prediction
      • Structure/substructure/similarity search
      • Integration of predicted ion mobility data
  75. Collision Cross Section Prediction
  76. Work in Progress
      • CFM-ID
          – Viewing and Downloading pre-predicted spectra
          – Search spectra against the database
      • Retention Time Index Prediction
      • Structure/substructure/similarity search
      • Integration of predicted ion mobility data
      • Access to API and web services for programmatic access
  77. API services and Open Data
      • Groups waiting on our API and web services
      • Mass Spec companies instrument integration
      • Release will be in iterations but for now our data are available
  78. SIDE EFFECTS OF SHARING OPEN DATA
  79. NORMAN Suspect List Exchange https://www.norman-network.com/?q=node/236
  80. Integration to MetFrag in place https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0299-2
  81. Conclusion
      • Dashboard access to data for ~875,000 chemicals
      • MS-Ready data facilitates structure identification
      • Related metadata facilitates candidate ranking
      • Relationship mappings and chemical lists are of great utility
      • Dashboard and contents are one part of the solution
      • Future releases will offer even more utility
      • We are committed to open API development over time
  82. Acknowledgements
      EPA-RTP
      • An enormous team of contributors from NCCT, especially the IT software development team
      • Our curation team for their care and focus on data quality
      • Multiple centers and laboratories across the EPA
      • Many public domain databases and open data contributors
  83. Contact: Antony Williams, NCCT, US EPA Office of Research and Development, Williams.Antony@epa.gov ORCID: https://orcid.org/0000-0002-2668-4821 https://doi.org/10.1186/s13321-017-0247-6
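
The search of experimental spectra against predicted spectra (slides 66 to 68) can be sketched as a similarity ranking. The slides do not state which scoring function the Dashboard uses, so the binned cosine score below is an assumption chosen because it is a common way to compare two peak lists. The function names, the 0.01 m/z bin width and the peak lists in the usage example are all hypothetical.

```python
import math


def bin_spectrum(peaks, bin_width=0.01):
    """Group (m/z, intensity) peaks into fixed-width m/z bins.

    Note: two peaks that differ by less than bin_width can still land in
    adjacent bins if they straddle a bin boundary.
    """
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_score(spec_a, spec_b, bin_width=0.01):
    """Cosine similarity of two spectra; 0.0 if either spectrum is empty."""
    a = bin_spectrum(spec_a, bin_width)
    b = bin_spectrum(spec_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(experimental, predicted_by_id, bin_width=0.01):
    """Return (candidate_id, score) pairs sorted from best to worst match."""
    scores = [(cid, cosine_score(experimental, spectrum, bin_width))
              for cid, spectrum in predicted_by_id.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Illustrative peak lists only, not real measured or predicted data.
    experimental = [(91.054, 100.0), (119.049, 40.0), (147.044, 10.0)]
    predicted = {
        "candidate_A": [(91.054, 90.0), (119.049, 50.0)],
        "candidate_B": [(55.018, 100.0), (83.049, 20.0)],
    }
    for candidate_id, score in rank_candidates(experimental, predicted):
        print(candidate_id, round(score, 3))
```

In practice the candidate set would first be narrowed by a mass or formula search, and the spectral score would be one input to candidate ranking alongside the metadata listed in the conclusion.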
