Environmental Cheminformatics for Unknown ID UC Davis Nov 2018

Environmental Cheminformatics to Identify Unknown Chemicals and their Effects
Assoc. Prof. Dr. Emma L. Schymanski
FNR ATTRACT Fellow and PI: Environmental Cheminformatics, Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 avenue du Swing, L-4367 Belvaux, Luxembourg.
The Environmental Cheminformatics group at the Luxembourg Centre for Systems Biomedicine focuses on the comprehensive identification of known and unknown chemicals in our environment to investigate their effects on health and disease. The environment and the chemicals to which we are exposed are incredibly complex, with over 125 million chemicals registered in the largest chemical registry and over 70,000 in household use alone. Detectable molecules in complex samples can now be captured using high resolution mass spectrometry (HRMS), which provides a “snapshot” of all chemicals present in a sample and allows for retrospective data analysis through digital archiving. However, scientists cannot yet identify the vast majority of the tens of thousands of features in each sample, leading to critical bottlenecks in identification and data interpretation. For instance, recent studies indicate a strong connection between the gut microbiome and Parkinson’s disease, yet over 60 % of significant metabolites in microbiome experiments are unknown. Unknown identification remains extremely time-consuming and, in many cases, a matter of luck. The key is to prioritize efforts towards the significant metabolites or potentially toxic substances responsible for observed effects, which involves reconciling highly complex samples with expert knowledge and careful validation. This talk will cover European, US and worldwide community initiatives to help connect knowledge on chemistry and toxicity with environmental observations, from compound databases to spectral libraries and retrospective screening. It will touch on the challenges of standardized structure representations, data curation, deposition and communication between resources. Finally, it will show how interdisciplinary efforts and data sharing can facilitate research in metabolomics, exposomics and beyond.
NOTE: some slides causing errors have been removed but can be accessed through the tinyurl on the front page.
Active hyperlinks can be retrieved using the tinyurl on the front page. Please cite this work if you use any of the contents.

Published in: Science

  1. 1. 1 Environmental Cheminformatics to Identify Unknown Chemicals and their Effects Assoc. Prof. Dr. Emma L. Schymanski FNR ATTRACT Fellow and PI in Environmental Cheminformatics Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg Email: emma.schymanski@uni.lu …and many colleagues who contributed to my science over the years! Image©www.seanoakley.com/ https://tinyurl.com/ucdavis-echidna Metabolomics Group Seminar, UC Davis, California, November 28, 2018. Host: Oliver Fiehn
  2. 2. 2 Outline for Today o Background about LCSB • LCSB & University of Luxembourg • Biomedicine and Parkinson’s Disease • Environmental Cheminformatics @LCSB o European(+) Community Efforts for Unknown ID • Mass Spectral Libraries (www.massbank.eu) • NORMAN Suspect Exchange and CompTox Chemicals Dashboard • Metadata, MS-ready and MetFrag • Bigger Picture Examples (Rhine, NormaNEWS, DSFP) o Work in Progress and Future Challenges • Complex Mixtures – Cheminformatics to Screen Undefined Structures • Preview: Disease-specific & MetFrag-compatible Metadata • Bonus slides on HDX (an entire presentation) if anyone wants
  3. 3. 3 Luxembourg Source: https://en.wikipedia.org/wiki/File:Luxembourg-CIA_WFB_Map.png and https://en.wikipedia.org/wiki/File:EU-Luxembourg.svg
  4. 4. 4 University of Luxembourg & LCSB o Uni Lu was founded in 2003 • We just turned 15 (teenage years!) o LCSB was founded in 2009 • …and is still a pre-teenager • Young and very dynamic working environment!
  5. 5. 5 Environmental Cheminformatics … the Group Sources: S. Gene; https://en.wikipedia.org/wiki/File:Zwei_zigaretten.jpg; R. Singh; DOI: 10.1186/s13321-017-0223-1; DOI: 10.1016/j.aca.2017.12.034
  6. 6. 6 Our challenge? We still have many unknowns … o …in both environmental and metabolomics analysis (l) Data from Schymanski et al 2014, ES&T DOI: 10.1021/es4044374. (r) E. coli data provided by N. Zamboni, IMSB, ETH Zürich. Wastewater Cells
  7. 7. 7 (European) Environmental Community (subset!) Schymanski et al. 2015, ABC, DOI: 10.1007/s00216-015-8681-7 Croatian Water RWS Specialist Knowledge Highly Disjointed
  8. 8. 8 Our (Community) Challenge: Identifying Chemicals [Figure: a sample analysed by high resolution mass spectrometry, on a scale from 1 to 1 billion chemicals] Data: Schymanski et al 2014, Environ. Sci. Technol. DOI: 10.1021/es4044374; Hollender et al 2017 DOI: 10.1021/acs.est.7b02184
  9. 9. 9 Our (Community) Challenge: Identifying Chemicals AND connecting chemical knowledge [Figure: a sample analysed by high resolution mass spectrometry, on a scale from 1 to 1 billion chemicals] Data: Schymanski et al 2014, DOI: 10.1021/es4044374; https://www.slideshare.net/EmmaSchymanski/small-molecules-in-big-data-analytica-munich
  10. 10. 10 Mass Spectral Libraries http://massbank.eu/MassBank https://github.com/MassBank/MassBank-data/ >46,000 spectra 32 contributors
  11. 11. 11 MassBank EU https://github.com/MassBank/MassBank-data; https://github.com/MassBank/MassBank-web/; Rösch et al DOI 10.1021/acs.est.5b05186 http://massbank.eu/MassBank o MassBank.EU was founded late 2012, hosted at UFZ, Leipzig, Germany o >16,000 MS/MS spectra; 1,200 substances from NORMAN members o MassBank now has >46,000 spectra from 32 contributing institutes! o Thorough Github-based modernization in progress for traceability: o Tentative/unknown/literature spectra (Level Scheme) as SI for publications Schymanski et al DOI: 10.1021/es5002105
  12. 12. 12 Mass Spectral Libraries https://github.com/cdk/depict/
  13. 13. 13 Confidence Levels for Tentative Structures o Annotation is the key to communicating information o Identification confidence, with minimum data requirements in brackets: • Level 1: Confirmed structure by reference standard (MS, MS2, RT, Reference Std.) • Level 2: Probable structure a) by library spectrum match (MS, MS2, Library MS2) b) by diagnostic evidence (MS, MS2, Exp. data) • Level 3: Tentative candidate(s): structure, substituent, class (MS, MS2, Exp. data) • Level 4: Unequivocal molecular formula, example C6H5N3O4 (MS isotope/adduct) • Level 5: Exact mass of interest, example 192.0757 (MS) Schymanski, Jeon, Gulde, Fenner, Ruff, Singer & Hollender (2014) ES&T, 48 (4), 2097-2098. DOI: 10.1021/es5002105
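The level scheme on slide 13 can be read as a simple decision ladder. The following is a minimal sketch, assuming each piece of evidence is reduced to a yes/no flag; the `Evidence` class and `confidence_level` function are hypothetical names for illustration and are not part of any tool mentioned in the talk. In practice, assigning a level needs expert judgement rather than a lookup.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical yes/no summary of the evidence available for one feature."""
    reference_standard: bool = False    # MS, MS2 and RT match a reference standard
    library_match: bool = False         # MS2 matches a library spectrum
    diagnostic_evidence: bool = False   # experimental data point to one structure
    tentative_candidates: bool = False  # candidate structure(s), substituent or class
    unequivocal_formula: bool = False   # formula fixed by MS isotope/adduct data


def confidence_level(ev: Evidence) -> str:
    """Return the highest confidence level supported by the evidence flags."""
    if ev.reference_standard:
        return "Level 1"
    if ev.library_match:
        return "Level 2a"
    if ev.diagnostic_evidence:
        return "Level 2b"
    if ev.tentative_candidates:
        return "Level 3"
    if ev.unequivocal_formula:
        return "Level 4"
    # Only an exact mass of interest is known.
    return "Level 5"


if __name__ == "__main__":
    print(confidence_level(Evidence(library_match=True, unequivocal_formula=True)))
```

The checks run from strongest to weakest evidence, so a feature with both a formula and a library match is reported at the higher confidence (Level 2a).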
  14. 14. 14 Creating High-Quality Mass Spectra Stravs, Schymanski, Singer and Hollender, 2013, Journal of Mass Spectrometry, 48, 89–99. DOI: 10.1002/jms.3131 Automatic MS and MS/MS Recalibration and Clean-up Remove interfering peaks Spectral Annotation with - Experimental Details - Compound Information https://github.com/MassBank/RMassBank/ http://bioconductor.org/packages/RMassBank/
  15. 15. 15 Communicating Mass Spectra for Mixtures Stravs et al. (2013), J. Mass Spectrom, 48(1):89-99. DOI: 10.1002/jms.3131 OHSO O CH3 O OH m n SPA-9C m+n=6 Formulas: http://sourceforge.net/projects/genform/ Meringer et al, 2011, MATCH 65, 259-290 Data: Schymanski et al. 2014, ES&T, 48: 1811-1818. DOI: 10.1021/es4044374 Chromatography and MS/MS Annotation Literature: LIT00034,35 Sample: ETS00002 Standard: ETS00016,17,19,20 https://github.com/MassBank/RMassBank/
  16. 16. 16 Our (Community) Challenge: Identifying Chemicals AND connecting chemical knowledge [Figure: a sample analysed by high resolution mass spectrometry, on a scale from 1 to 1 billion chemicals] Data: Schymanski et al 2014, DOI: 10.1021/es4044374; https://www.slideshare.net/EmmaSchymanski/small-molecules-in-big-data-analytica-munich
  17. 17. 17 European (World-)Wide Exchange of Suspects Schymanski et al. 2015, ABC, DOI: 10.1007/s00216-015-8681-7 NORMAN Suspect List Exchange: http://www.norman-network.com/?q=node/236
  18. 18. 18 NORMAN Suspect List Exchange o http://www.norman-network.com/?q=node/236 o Full Lists; References Schymanski, Aalizadeh et al. in prep; https://www.researchgate.net/project/Supporting-Mass-Spectrometry-Through-Cheminformatics
  19. 19. 19 NORMAN Suspect Exchange Lists o Now 21 lists available online … from small to large! • Specialist collections (e.g. NormaNEWS) to large market lists • Integrated into the CompTox Chemistry Dashboard
  20. 20. 20 NORMAN Lists => CompTox Dashboard http://www.norman-network.com/?q=node/236 https://comptox.epa.gov/dashboard/chemical_lists/normanews
  21. 21. 21 Lists on CompTox Chemicals Dashboard https://comptox.epa.gov/dashboard/chemical_lists/ More lists become available with every release
  22. 22. 22 CompTox Chemicals Dashboard https://comptox.epa.gov/dashboard/
  23. 23. 23 CompTox Chemistry Dashboard – Presence in Lists https://comptox.epa.gov/dashboard/chemical_lists
  24. 24. 24 CompTox Chemistry Dashboard – Additional Data https://comptox.epa.gov/dashboard/
  25. 25. 25 Metadata & Different Chemical Forms Schymanski & Williams, 2017, ES&T, 51 (10), pp 5357–5359. DOI: 10.1021/acs.est.7b01908
  26. 26. 26 Metadata & Different Chemical Forms MS-ready: McEachran et al. 2018, J Cheminform. DOI: 10.1186/s13321-018-0299-2
  27. 27. 27 Metadata & Different Chemical Forms https://comptox.epa.gov/dashboard/dsstoxdb/mixture_search?cid=930
  28. 28. 28 Connecting Resources in MetFrag https://msbi.ipb-halle.de/MetFragBeta/ AND https://comptox.epa.gov/dashboard/dsstoxdb/batch_search (MetFrag Export) https://msbi.ipb-halle.de/MetFragBeta/
  29. 29. 29 MetFrag2.3: Non-target Identification Status: 2010 => 2016 o Precursor: m/z [M-H]- 213.9637; candidates from ChemSpider or PubChem, ± 5 ppm o MS/MS: 134.0054 339689; 150.0001 77271; 213.9607 632466 (5 ppm, 0.001 Da) o RT: 4.54 min; 355 InChI/RTs o Metadata: References, External Refs, Data Sources, RSC Count, PubMed Count, Suspect Lists o Elements: C,N,S Ruttkies, Schymanski, Wolf, Hollender, Neumann (2016) J. Cheminf., DOI: 10.1186/s13321-016-0115-9
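The ± 5 ppm candidate search on slide 29 comes down to a small piece of arithmetic. The sketch below only illustrates that arithmetic for the m/z value shown on the slide; the function names are hypothetical and this is not MetFrag's own code or API. The proton mass is rounded to six decimals.

```python
PROTON_MASS = 1.007276  # Da, rounded; assumption used for the [M-H]- correction


def ppm_window(mz: float, ppm: float) -> tuple[float, float]:
    """Return the (low, high) m/z bounds for a relative tolerance in ppm."""
    delta = mz * ppm * 1e-6
    return mz - delta, mz + delta


def neutral_mass_from_deprotonated(mz: float) -> float:
    """Estimate the neutral monoisotopic mass from an [M-H]- m/z (charge 1)."""
    return mz + PROTON_MASS


def within_ppm(theoretical: float, measured: float, ppm: float) -> bool:
    """Check whether a measured value lies within ppm of a theoretical one."""
    return abs(measured - theoretical) / theoretical * 1e6 <= ppm


if __name__ == "__main__":
    low, high = ppm_window(213.9637, 5)
    print(f"search window: {low:.4f} to {high:.4f}")
    print(f"neutral mass:  {neutral_mass_from_deprotonated(213.9637):.4f}")
```

At this m/z, 5 ppm corresponds to roughly ± 0.001 Da, which is why a relative and an absolute tolerance of that size are of similar width in this mass range.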
  30. 30. 30 MetFrag2.3: Non-target Identification Ruttkies, Schymanski, Wolf, Hollender, Neumann (2016) J. Cheminf., 2016, DOI: 10.1186/s13321-016-0115-9 Try with the Web Interface: http://msbi.ipb-halle.de/MetFragBeta/
  31. 31. 31 MetFrag2.3: Non-target Identification Ruttkies, Schymanski, Wolf, Hollender, Neumann (2016) J. Cheminf., 2016, DOI: 10.1186/s13321-016-0115-9 Try with the Web Interface: http://msbi.ipb-halle.de/MetFragBeta/
  32. 32. 32 Connecting Resources in MetFrag o Combined evidence clearly highlights potential neurotoxicant among chemical candidates https://msbi.ipb-halle.de/MetFragBeta/ ; https://comptox.epa.gov/dashboard/ ; https://massbank.eu ; http://normandata.eu/
  33. 33. 33 Connecting Resources in MetFrag MS-ready: McEachran et al. 2018, J Cheminform. DOI: 10.1186/s13321-018-0299-2
  34. 34. 34 Connecting and Enhancing Open Resources https://www.slideshare.net/EmmaSchymanski/small-molecules-in-big-data-analytica-munich o Sharing knowledge is a win-win situation 2014 2015: found in waters across Europe 2016: 1 datapoint cross-annotates 3072 in GNPS Hits in GNPS MassIVE datasets: Surfactants: http://goo.gl/7sY9Pf 2017: Early-Warning System is born 2018: Highlighted in Science
  35. 35. 35 NORMAN Digital Sample Freezing Platform “Live” retrospective screening of known and unknown chemicals in European samples (various matrices) http://norman-data.eu/ AND Alygizakis et al, in prep.
  36. 36. 36 Interactive heatmap available at http://norman-data.eu/NORMAN-REACH NORMAN Digital Sample Freezing Platform Retrospective screening of REACH chemicals in Black Sea samples (various matrices)
  37. 37. 37 NORMAN Digital Sample Freezing Platform “Live” retrospective screening of known and unknown chemicals in European samples (various matrices) Future work: use results of unknowns to drive prioritization efforts http://norman-data.eu/ AND Alygizakis et al, in prep.
  38. 38. 38 Real-time Monitoring of the Rhine River Hollender, Schymanski, Singer & Ferguson, 2017, ES&T Feature, 51:20, 11505-11512. DOI: 10.1021/acs.est.7b02184 Previously unknown chemicals detected due to “stand-out” patterns
  39. 39. 39 Real-time Monitoring of the Rhine River Hollender, Schymanski, Singer & Ferguson, 2017, ES&T Feature, 51:20, 11505-11512. DOI: 10.1021/acs.est.7b02184 Previously unknown chemicals detected due to “stand-out” patterns
  40. 40. 40 We still have many unknowns … (l) Data from Schymanski et al 2014, ES&T DOI: 10.1021/es4044374. (r) E. coli data provided by N. Zamboni, IMSB, ETH Zürich. Environment Cells
  41. 41. 41 NORMAN Suspects don’t all have structures!
  42. 42. 42 Accessing Metadata Behind Complex Mixtures Highest Priority PFAS are also highly complex UVCBs!
  43. 43. 43 Homologous Series Detection o Search for discrete mass differences http://www.envihomolog.eawag.ch/ [Figure: example structures of surfactant homologues with variable chain lengths m, n] M. Loos & H Singer, 2017. J. Cheminf. DOI: 10.1186/s13321-017-0197-z & Schymanski et al. 2014, ES&T DOI: 10.1021/es4044374
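"Search for discrete mass differences" on slide 43 can be illustrated with a very small example. This is a simplified sketch, assuming only a list of masses and a fixed repeating unit; it is not the algorithm behind envihomolog, which also considers retention time. The function name, tolerance and example masses are illustrative assumptions.

```python
CH2 = 14.01565  # Da, monoisotopic mass of a CH2 unit
CF2 = 49.99681  # Da, monoisotopic mass of a CF2 unit


def find_series(masses, unit, tol=0.002, min_length=3):
    """Group masses into chains whose neighbours differ by `unit` within `tol` Da."""
    ordered = sorted(masses)
    used = set()
    series = []
    for i, start in enumerate(ordered):
        if i in used:
            continue
        chain = [i]
        current = start
        extended = True
        while extended:
            extended = False
            target = current + unit
            # Look for the next unused mass one repeating unit higher.
            for j in range(chain[-1] + 1, len(ordered)):
                if j not in used and abs(ordered[j] - target) <= tol:
                    chain.append(j)
                    current = ordered[j]
                    extended = True
                    break
        if len(chain) >= min_length:
            used.update(chain)
            series.append([ordered[k] for k in chain])
    return series


if __name__ == "__main__":
    # Illustrative m/z values: four members spaced by CH2 plus two unrelated peaks.
    peaks = [297.1530, 311.1686, 325.1843, 339.1999, 250.0000, 400.5000]
    print(find_series(peaks, CH2))
```

Calling the same function with `CF2` as the unit would pick out a perfluorinated series instead, which is the kind of pattern shown on the later slides about biological matrices.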
  44. 44. 44 Towards high throughput MS screening of UVCBs o https://github.com/schymane/RChemMass/
  45. 45. 45 Cross-Linking Homologues in the Dashboard Schymanski, Grulke, Williams et al, in prep. & Williams et al. 2017 J. Cheminformatics 9:61 DOI: 10.1186/s13321-017-0247-6 https://comptox.epa.gov/dashboard/chemical_lists/eawagsurf
  46. 46. 46 Homologous Series in Biological Matrices Lipid extract of Mycobacterium smegmatis C23F48O7 +CF2 Schymanski & Zamboni … random data exploration …
  47. 47. 47 Exchanging Knowledge … Open Science Helps! We need to be able to find and annotate the unexpected! C23F48O7 +CF2 Schymanski & Zamboni … random data exploration …
  48. 48. 48 Exchanging Knowledge … Open Science Helps! We need to be able to find and annotate the unexpected!
  49. 49. 49 Exchanging data reveals things we never expected! Schymanski & Zamboni … random data exploration … o Lipid extract of Mycobacterium smegmatis C23F48O7 +CF2 DTXSID70880513
  50. 50. 50 Community Challenges … and Solutions Data: Schymanski et al 2014, DOI: 10.1021/es4044374; https://www.slideshare.net/EmmaSchymanski/small-molecules-in-big-data-analytica-munich High resolution mass spectrometry AND connecting chemical knowledge
  51. 51. 51 [Workflow diagram] o Sampling, extraction (SPE), HPLC separation, HR-MS/MS o Conversion (Proteowizard) and Peak Picking (enviPick, xcms, MZmine, …) o Detection of blank/blind/noise/internal standards; time trend analysis (enviMass) o TARGET ANALYSIS, SUSPECT SCREENING, NON-TARGET SCREENING (enviMass, vendor software) • Target List • Suspect List (e.g. NORMAN, LMC, Eawag-PPS, ReSOLUTION) • Componentization (nontarget); Prioritization (enviMass); Masses of interest; Molecular formula determination (enviPat, GenForm); Non-target identification (MetFrag2.3, ReSOLUTION) o Gather evidence (nontarget, ReSOLUTION, RMassBank); MS/MS Extraction (RMassBank) o Interpretation, confirmation, peak inventory, confidence and reporting
  52. 52. 52 Coming Soon … (WiP and Already Online!) Schymanski, Baker, Williams, Singh et al. in preparation. Excel macro: https://figshare.com/s/824f6606644f474c7288 https://comptox.epa.gov/dashboard/chemical_lists/litminedneuro
  53. 53. 53 Conclusions / Outlook / Perspectives Monzel et al 2017 Stem Cell Reports, DOI: 10.1016/j.stemcr.2017.03.010 (Organoids) o Over 60 % of HR-MS peaks are potentially relevant but unknown o Non-target screening requires data and evidence from many different sources o Many excellent workflows now available to collate this information o Incorporation of all available metadata (expert knowledge) is critical to success! o Complex mixtures (UVCBs) are a huge and very challenging part of the puzzle o New cheminformatics approaches needed - great progress so far o Information in the public domain helps everyone! o Additional experimental methods can provide more information o H-D exchange-based labelling [EXTRA SLIDES] o Integration of computational toxicity knowledge essential o LCSB has some amazing facilities and expertise (I am just beginning to appreciate how much …)
  54. 54. 54 Acknowledgements emma.schymanski@uni.lu Further Information: https://massbank.eu/MassBank/ http://c-ruttkies.github.io/MetFrag/ https://comptox.epa.gov/dashboard/ http://www.norman-network.com/?q=node/236 https://wwwen.uni.lu/lcsb/research/environmental_cheminformatics EU Grant 603437