
RSC Environmental Cheminformatics to Identify Unknowns Feb 2019

The Environmental Cheminformatics group at the Luxembourg Centre for Systems Biomedicine focuses on the comprehensive identification of known and unknown chemicals in our environment to investigate their effects on health and disease. The environment and the chemicals to which we are exposed are incredibly complex, with over 125 million chemicals registered in the largest chemical registry and over 70,000 in household use alone. Detectable molecules in complex samples can now be captured using high resolution mass spectrometry (HRMS), which provides a “snapshot” of all chemicals present in a sample and allows for retrospective data analysis through digital archiving. However, scientists cannot yet identify the vast majority of the tens of thousands of features in each sample, leading to critical bottlenecks in identification and data interpretation. For instance, recent studies indicate a strong connection between the gut microbiome and Parkinson’s disease, yet over 60 % of significant metabolites in microbiome experiments are unknown. Unknown identification remains extremely time-consuming and, in many cases, a matter of luck. The key is to prioritize efforts to find the significant metabolites or potentially toxic substances responsible for observed effects, which involves reconciling highly complex samples with expert knowledge and careful validation. This talk will cover European, US and worldwide community initiatives to help connect knowledge on chemistry and toxicity with environmental observations, from compound databases to spectral libraries and retrospective screening. It will touch on the challenges of standardized structure representations, data curation, deposition and communication between resources. Finally, it will show how interdisciplinary efforts and data sharing can facilitate research in metabolomics, exposomics and beyond.

Published in: Science


  1. 1. 1 Environmental Cheminformatics to Identify Unknown Chemicals and their Effects Assoc. Prof. Dr. Emma L. Schymanski FNR ATTRACT Fellow and PI in Environmental Cheminformatics Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg Email: emma.schymanski@uni.lu …and many colleagues who contributed to my science over the years! Image©www.seanoakley.com/ https://tinyurl.com/rsc-echidna RSC Event: Latest Advances in the Analysis of Complex Environmental Matrices, 22 Feb. 2019, London, UK.
  2. 2. 2 Environmental Cheminformatics and Biomedicine? Chemistry what? Medicine why? Biology how? Patient cohort Yeast Mouse Human Zebrafish Microbiome
  3. 3. 3 Our challenge? We still have many unknowns … (l) Data from Schymanski et al 2014, ES&T DOI: 10.1021/es4044374. (r) E. coli data provided by N. Zamboni, IMSB, ETH Zürich. Wastewater Cells
  4. 4. 4 Target, Suspect and Non-Target Screening KNOWNS SUSPECTS No Prior Knowledge HPLC separation and HR-MS/MS TARGET ANALYSIS SUSPECT SCREENING NON-TARGET SCREENING Targets found Suspects found Masses of interest (Molecular formula) DATABASE SEARCH STRUCTURE GENERATION Confirmation and quantification of compounds present Candidate selection (retention time, MS/MS, calculated properties) Sampling extraction (SPE) HPLC separation HR-MS/MS Time, Effort & Number of Compounds…. SUSPECTS SPECTRUM SEARCH Spectral match
  5. 5. 5 Identification Strategies and Confidence Schymanski et al, 2014, ES&T. DOI: 10.1021/es5002105 & Schymanski et al. 2015, ABC, DOI: 10.1007/s00216-015-8681-7 Peak picking Non-target HR-MS(/MS) Acquisition Target Screening Suspect Screening Non-target Screening Start Level 1 Confirmed Structure by reference standard Level 2 Probable Structure by library/diagnostic evidence Start Level 3 Tentative Candidate(s) suspect, substructure, class Level 4 Unequivocal Molecular Formula insufficient structural evidence Start Level 5 Mass of Interest multiple detection, trends, … “downgrading” with contradictory evidence Increasing identification confidence Target list Suspect list Peak picking or XICs
  6. 6. 6 What is in our (Swiss) Wastewater? France Germany Austria Italy Vernier Uetendorf Zug Werdhölzli, Zürich Bioggio Bussigny-près-Lausanne Hallau Zwillikon Thal Winterthur Map © Eawag/BAFU/SwissTopo 10 Wastewater Treatment Plants 24 hr flow-proportional samples February 2010 364 target substances Schymanski et al. (2014), ES&T, 48: 1811-1818. DOI: 10.1021/es4044374
  7. 7. 7 Target Analysis: Status Quo (>364 targets) Schymanski et al. (2014), ES&T, 48: 1811-1818. DOI: 10.1021/es4044374 Target List HPLC separation and HR-MS/MS TARGET ANALYSIS Targets found Confirmation and quantification of compounds present Sampling extraction (SPE) HPLC separation HR-MS/MS TPs!
  8. 8. 8 Target Analysis: Status Quo (>364 targets) Schymanski et al. (2014), ES&T, 48: 1811-1818. DOI: 10.1021/es4044374 Target List HPLC separation and HR-MS/MS TARGET ANALYSIS Targets found Confirmation and quantification of compounds present Sampling extraction (SPE) HPLC separation HR-MS/MS m/z RT
  9. 9. 9 Swiss Wastewater: Top 30 Peaks (ESI-) Schymanski et al. (2014), ES&T, 48: 1811-1818. DOI: 10.1021/es4044374 Artificial Sweeteners Diclofenac Pictures: www.coca-cola-com; www.rivella.ch; www.voltargengel.com
  10. 10. 10 Suspect Screening: Different Approaches Target List Suspect List HPLC separation and HR-MS/MS TARGET ANALYSIS SUSPECT SCREENING Targets found Suspects found Confirmation and quantification of compounds present Candidate selection (retention time, MS/MS, calculated properties) Sampling extraction (SPE) HPLC separation HR-MS/MS o Search in mass spectral libraries o Screen for predicted transformation products of known parent compounds o Look for “well known” substances without reference standards o Screen for known homologue series
  11. 11. 11 Searching Mass Spectral Libraries o … which one? Peisl, Schymanski & Wilmes, 2018 Anal. Chim. Acta, DOI: 10.1016/j.aca.2017.12.034
  12. 12. 12 Do we need all these libraries? Vinaixa, Schymanski, Navarro, Neumann, Salek, Yanes, 2016, TrAC, DOI: 10.1016/j.trac.2015.09.005 o Yes … most libraries still have many unique entries = HMDB, GNPS, MassBank, ReSpect Compound lists provided by: S. Stein, R. Mistrik, Agilent
  13. 13. 13 Mind the Gap! Frainay, C. et al. (2018) “Mind the Gap: …” Metabolites: http://www.mdpi.com/2218-1989/8/3/51 o Best library to choose depends highly on your dataset • Example: MSforID (https://msforid.com/) is poor for metabolic networks – but great for forensic toxicology!
  14. 14. 14 MassBank EU http://massbank.eu/MassBank https://github.com/MassBank/MassBank-data/ >52,800 spectra >40 contributors
  15. 15. 15 Creating High-Quality Mass Spectra Stravs, Schymanski, Singer and Hollender, 2013, Journal of Mass Spectrometry, 48, 89–99. DOI: 10.1002/jms.3131 Automatic MS and MS/MS Recalibration and Clean-up Remove interfering peaks Spectral Annotation with - Experimental Details - Compound Information https://github.com/MassBank/RMassBank/ http://bioconductor.org/packages/RMassBank/
  16. 16. 16 Communicating Mass Spectra for Mixtures Stravs et al. (2013), J. Mass Spectrom, 48(1):89-99. DOI: 10.1002/jms.3131 OHSO O CH3 O OH m n SPA-9C m+n=6 Formulas: http://sourceforge.net/projects/genform/ Meringer et al, 2011, MATCH 65, 259-290 Data: Schymanski et al. 2014, ES&T, 48: 1811-1818. DOI: 10.1021/es4044374 Chromatography and MS/MS Annotation Literature: LIT00034,35 Sample: ETS00002 Standard: ETS00016,17,19,20 https://github.com/MassBank/RMassBank/
  17. 17. 17 Suspect Screening: Different Approaches Target List Suspect List HPLC separation and HR-MS/MS TARGET ANALYSIS SUSPECT SCREENING Targets found Suspects found Confirmation and quantification of compounds present Candidate selection (retention time, MS/MS, calculated properties) Sampling extraction (SPE) HPLC separation HR-MS/MS o Search in mass spectral libraries o Screen for predicted transformation products of known parent compounds o Look for “well known” substances without reference standards o Screen for known homologue series
  18. 18. 18 Suspect Screening: Benzotriazole TPs Huntscha et al. 2014, ES&T, 48(8), 4435-4443. DOI: 10.1021/es405694z 28 Suspects HPLC separation and HR-MS/MS SUSPECT SCREENING 11 masses for 6 suspect formulas 7 with MS/MS 1 reference std. 1 TP confirmed 1 TP “likely”, no std. [UM-PPS] ↓ Eawag-PPS ↓ [enviPath]
  19. 19. 19 Suspect Screening: Benzotriazole TPs Huntscha et al. 2014, ES&T, 48(8), 4435-4443. DOI: 10.1021/es405694z 28 Suspects HPLC separation and HR-MS/MS SUSPECT SCREENING 11 masses for 6 suspect formulas 7 with MS/MS 1 reference std. 1 TP confirmed 1 TP “likely”, no std. [UM-PPS] ↓ Eawag-PPS ↓ [enviPath] N N N H O OH N N N H O OH - Predicted with Eawag-PPS - No standard - Not in databases (at that time) - Confirmed with reference std. - Observed in WWTP effluents
  20. 20. 20 Suspect Screening: Benzotriazole TPs Huntscha et al. 2014, ES&T, 48(8), 4435-4443. DOI: 10.1021/es405694z 1H-BT .eu
  21. 21. 21 Suspect Screening: Different Approaches Target List Suspect List HPLC separation and HR-MS/MS TARGET ANALYSIS SUSPECT SCREENING Targets found Suspects found Confirmation and quantification of compounds present Candidate selection (retention time, MS/MS, calculated properties) Sampling extraction (SPE) HPLC separation HR-MS/MS o Search in mass spectral libraries o Screen for predicted transformation products of known parent compounds o Look for “well known” substances without reference standards o Screen for known homologue series
  22. 22. 22 European (World-)Wide Exchange of Suspects Schymanski et al. 2015, ABC, DOI: 10.1007/s00216-015-8681-7 NORMAN Suspect List Exchange: http://www.norman-network.com/?q=node/236
  23. 23. 23 NORMAN Suspect List Exchange o http://www.norman-network.com/?q=node/236 Schymanski, Aalizadeh et al. in prep; https://www.researchgate.net/project/Supporting-Mass-Spectrometry-Through-Cheminformatics References Full Lists
  24. 24. 24 NORMAN SusDat (merged list) https://www.norman-network.com/nds/susdat/ https://comptox.epa.gov/dashboard/chemical_lists/susdat Schymanski, Aalizadeh et al. in prep; https://www.researchgate.net/project/Supporting-Mass-Spectrometry-Through-Cheminformatics
  25. 25. 25 Suspect Screening: Different Approaches Target List Suspect List HPLC separation and HR-MS/MS TARGET ANALYSIS SUSPECT SCREENING Targets found Suspects found Confirmation and quantification of compounds present Candidate selection (retention time, MS/MS, calculated properties) Sampling extraction (SPE) HPLC separation HR-MS/MS o Search in mass spectral libraries o Screen for predicted transformation products of known parent compounds o Look for “well known” substances without reference standards o Screen for known homologue series
  26. 26. 26 RECAP: Target Analysis: Status Quo (>364 targets) Schymanski et al. (2014), ES&T, 48: 1811-1818. DOI: 10.1021/es4044374 Target List HPLC separation and HR-MS/MS TARGET ANALYSIS Targets found Confirmation and quantification of compounds present Sampling extraction (SPE) HPLC separation HR-MS/MS m/z RT
  27. 27. 27 Grouping Isotopes and Adducts Schymanski et al. (2014), ES&T, 48: 1811-1818. DOI: 10.1021/es4044374 [Figure: two bar charts of peak counts (axis 0 to 15000) split into Noise/Blank, Targets and Non-targets; first panel (positive): 2%, 27%, 100%; second panel (positive, negative): 1%, 30%, 100%]
  28. 28. 28 Swiss Wastewater: Top 30 Peaks (ESI-) Schymanski et al. (2014), ES&T, 48: 1811-1818. DOI: 10.1021/es4044374 S OO O - O S O - O CH2 m/z = 79.96 m/z = 183.01 Picture: www.momsteam.com
  29. 29. 29 Surfactant Screening From Literature Schymanski et al. (2014), ES&T, 48: 1811-1818. DOI: 10.1021/es4044374 Literature sources o Formulas, masses (ions), retention times and intensities o Spectra of selected compounds (different instruments) Gonzalez et al. Rapid Comm. Mass Spec. 2008, 22: 1445-54 Lara-Martin et al. EST. 2010, 44: 1670-1676
  30. 30. 30 Homologous Series Detection M. Loos & H Singer, 2017. J. Cheminf. DOI: 10.1186/s13321-017-0197-z & Schymanski et al. 2014, ES&T DOI: 10.1021/es4044374 http://www.envihomolog.eawag.ch/ Search for discrete mass differences S OO OH CH3 CH3 m n C9H19 O O S O O OHm
  31. 31. 31 Homologous Series Detection M. Loos & H Singer, 2017. J. Cheminf. DOI: 10.1186/s13321-017-0197-z & Schymanski et al. 2014, ES&T DOI: 10.1021/es4044374 S OO OH CH3 CH3 m n DATS S OO OH O OH CH3 ()n ()m SPAC S OO OH O OHCH3 ()n ()m STAC http://www.envihomolog.eawag.ch/
  32. 32. 32 Supporting Evidence for Homologues Stravs et al. (2013), J. Mass Spectrom, 48(1):89-99. DOI: 10.1002/jms.3131 OHSO O CH3 O OH m n SPA-9C m+n=6 Formulas: http://sourceforge.net/projects/genform/ Meringer et al, 2011, MATCH 65, 259-290 Data: Schymanski et al. 2014, ES&T, 48: 1811-1818. DOI: 10.1021/es4044374 Chromatography and MS/MS Annotation Literature: LIT00034,35 Sample: ETS00002 Standard: ETS00016,17,19,20 https://github.com/MassBank/RMassBank/
  33. 33. 33 Swiss Wastewater: Top 30 Peaks (ESI-) Schymanski et al. (2014), ES&T, 48: 1811-1818. DOI: 10.1021/es4044374 Acesulfame Diclofenac Cyclamate Saccharin C10DATS C10SPAC SPA5C C15DATS STA6C C9DATS SPA2DC S OO OH O OH CH3 S OO OH CH3 CH3 ()n ()m SPAC DATS ()n ()m
  34. 34. 34 Cross-Linking Homologues in the Dashboard Schymanski, Grulke, Williams et al, in prep. & Williams et al. 2017 J. Cheminformatics 9:61 DOI: 10.1186/s13321-017-0247-6 https://comptox.epa.gov/dashboard/chemical_lists/eawagsurf
  35. 35. 35 What about Non-Target Screening? Target List Suspect List (no prior information) HPLC separation and HR-MS/MS TARGET ANALYSIS SUSPECT SCREENING NON-TARGET SCREENING Targets found Suspects found Masses of interest (Molecular formula) DATABASE SEARCH STRUCTURE GENERATION Confirmation and quantification of compounds present Candidate selection (retention time, MS/MS, calculated properties) Sampling extraction (SPE) HPLC separation HR-MS/MS Number of compounds
  36. 36. 36 Target List Suspect List (e.g. NORMAN, LMC, Eawag-PPS, ReSOLUTION) Componentization (nontarget) TARGET ANALYSIS SUSPECT SCREENING NON-TARGET SCREENING (enviMass, vendor software) Gather evidence (nontarget, ReSOLUTION, RMassBank) Masses of interest Molecular formula determination (enviPat, GenForm) Non-target identification (MetFrag2.3, ReSOLUTION) Sampling extraction (SPE) HPLC separation HR-MS/MS Detection of blank/blind/noise/internal standards; time trend analysis (enviMass) Conversion (Proteowizard) and Peak Picking (enviPick, xcms, MZmine, …) Prioritization (enviMass) MS/MS Extraction (RMassBank) Interpretation, confirmation, peak inventory, confidence and reporting
  37. 37. 37 MetFrag Relaunched! Ruttkies, Schymanski, Wolf, Hollender, Neumann (2016) J. Cheminf., 2016, DOI: 10.1186/s13321-016-0115-9 Status: 2010 => 2019 5 ppm 0.001 Da mz [M-H]- 213.9637 ChemSpider or PubChem± 5 ppm RT: 4.54 min 355 InChI/RTs References External Refs Data Sources RSC Count PubMed Count Suspect Lists MS/MS 134.0054 339689 150.0001 77271 213.9607 632466 Elements: C,N,S S OO OH
  38. 38. 38 MetFrag Relaunched! Ruttkies, Schymanski, Wolf, Hollender, Neumann (2016) J. Cheminf., 2016, DOI: 10.1186/s13321-016-0115-9 Try with the Web Interface: http://msbi.ipb-halle.de/MetFragBeta/
  39. 39. 39 Ruttkies, Schymanski, Wolf, Hollender, Neumann (2016) J. Cheminf., 2016, DOI: 10.1186/s13321-016-0115-9 Try with the Web Interface: http://msbi.ipb-halle.de/MetFragBeta/ MetFrag Relaunched!
  40. 40. 40 State of the Art in Small Molecule Identification Schymanski et al, 2017, J Cheminf., DOI: 10.1186/s13321-017-0207-1 www.casmi-contest.org Metadata is critical to improving annotation of known unknowns!
  41. 41. 41 https://msbi.ipb-halle.de/MetFragBeta/ ; https://comptox.epa.gov/dashboard/ ; https://massbank.eu ; http://normandata.eu/ Combined evidence clearly highlights potential neurotoxicant among chemical candidates Connecting Resources in MetFrag
  42. 42. 42 McEachran et al. 2018, DOI: 10.1186/s13321-018-0299-2; Schymanski & Williams, 2017 ES&T DOI: 10.1021/acs.est.7b01908 “MS-ready” Form for MetaData in MetFrag
  43. 43. 43 “MS-ready” Form for MetaData in MetFrag
  44. 44. 44 Connecting and Enhancing Open Resources https://www.slideshare.net/EmmaSchymanski/small-molecules-in-big-data-analytica-munich o Sharing knowledge is a win-win situation 2014 2015: found in waters across Europe 2016: 1 datapoint cross-annotates 3072 in GNPS Hits in GNPS MassIVE datasets: Surfactants: http://goo.gl/7sY9Pf 2017: Early-Warning System is born 2018: Highlighted in Science
  45. 45. 45 NORMAN Digital Sample Freezing Platform “Live” retrospective screening of known and unknown chemicals in European samples (various matrices) http://norman-data.eu/ and Alygizakis et al, submitted.
  46. 46. 46 Interactive heatmap available at http://norman-data.eu/NORMAN-REACH NORMAN Digital Sample Freezing Platform Retrospective screening of REACH chemicals in Black Sea samples (various matrices)
  47. 47. 47 Real-time Monitoring of the Rhine River Hollender, Schymanski, Singer & Ferguson, 2018, ES&T Feature, 51:20, 11505-11512. DOI: 10.1021/acs.est.7b02184 Previously unknown chemicals detected due to “stand-out” patterns
  48. 48. 48 NORMAN Digital Sample Freezing Platform Future work: use results of unknowns to drive prioritization efforts http://norman-data.eu/ and Alygizakis et al, submitted
  49. 49. 49 Future MetaData: Topic-Specific Reference Counts Schymanski, Baker, Williams, Singh et al. submitted. Excel macro: https://figshare.com/s/824f6606644f474c7288 https://comptox.epa.gov/dashboard/chemical_lists/litminedneuro
  50. 50. 50 Take Home Messages Unknowns and High Resolution Mass Spectrometry o Over 60 % of HR-MS peaks are potentially relevant but unknown Environment Cells
  51. 51. 51 Take Home Messages o Over 60 % of HR-MS peaks are potentially relevant but unknown o Annotating unknowns requires data and evidence from many different sources o Many excellent workflows available to collate this information o Incorporation of all available metadata is critical to success! o E.g. MetFrag has greatly improved the speed and success of tentative identification of “known unknowns”: 15 % => 89 % Ranked Number 1 o https://ipb-halle.github.io/MetFrag/ Unknowns and High Resolution Mass Spectrometry
  52. 52. 52 Take Home Messages o Over 60 % of HR-MS peaks are potentially relevant but unknown o Annotating unknowns requires data and evidence from many different sources o Exchange expert knowledge worldwide o Community efforts contribute greatly to improved cross-annotation o Information in the public domain helps everyone! o You never know when it will help you Unknowns and High Resolution Mass Spectrometry Schymanski et al. 2015, ABC, DOI: 10.1007/s00216-015-8681-7; Alygizakis et al. 2018 ES&T, DOI: 10.1021/acs.est.8b00365
  53. 53. 53 Acknowledgements emma.schymanski@uni.lu Further Information: https://massbank.eu/MassBank/ https://ipb-halle.github.io/MetFrag/ https://comptox.epa.gov/dashboard/ http://www.norman-network.com/?q=node/236 https://wwwen.uni.lu/lcsb/research/environmental_cheminformatics .eu EU Grant 603437
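
Slide 5 lists five identification confidence levels, from Level 1 (confirmed structure by reference standard) down to Level 5 (mass of interest). As a rough illustration only, the sketch below encodes those levels and assigns one from the evidence available for a feature. The level names follow the slide; the function `assign_level`, its boolean parameters and the decision order are assumptions made for this example, not the published decision scheme.

```python
from enum import IntEnum


class Confidence(IntEnum):
    """Level names as listed on slide 5 (a lower number means higher confidence)."""
    CONFIRMED_STRUCTURE = 1    # confirmed by reference standard
    PROBABLE_STRUCTURE = 2     # library or diagnostic evidence
    TENTATIVE_CANDIDATE = 3    # suspect, substructure or class
    UNEQUIVOCAL_FORMULA = 4    # formula known, insufficient structural evidence
    MASS_OF_INTEREST = 5       # only an exact mass of interest


def assign_level(reference_standard=False,
                 library_or_diagnostic_evidence=False,
                 candidate_structures=False,
                 unequivocal_formula=False):
    """Hypothetical helper: return the best level the evidence supports.

    The parameter names are illustrative assumptions, not part of any tool.
    """
    if reference_standard:
        return Confidence.CONFIRMED_STRUCTURE
    if library_or_diagnostic_evidence:
        return Confidence.PROBABLE_STRUCTURE
    if candidate_structures:
        return Confidence.TENTATIVE_CANDIDATE
    if unequivocal_formula:
        return Confidence.UNEQUIVOCAL_FORMULA
    return Confidence.MASS_OF_INTEREST


# A suspect hit with a library spectrum match but no reference standard.
level = assign_level(library_or_diagnostic_evidence=True, candidate_structures=True)
print(level.name, int(level))
```

The "downgrading with contradictory evidence" noted on the slide would correspond to calling the helper again with the contradicted evidence switched off.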

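Slide 30 describes homologous series detection as a "search for discrete mass differences". The sketch below shows that idea in its simplest form: it chains peaks whose m/z values differ by one CH2 unit within a tolerance. It is not the enviHomolog algorithm; the function name `find_series`, the tolerance, the minimum series length and the example m/z values are all assumptions chosen for illustration.

```python
CH2 = 14.01565  # monoisotopic mass of one CH2 unit in Da (12.00000 + 2 * 1.007825)


def find_series(mzs, delta=CH2, tol=0.002, min_len=3):
    """Group m/z values into chains separated by `delta` within +/- `tol` Da.

    Hypothetical helper: real tools typically also use retention time trends.
    """
    mzs = sorted(mzs)
    used = set()
    series = []
    for start in mzs:
        if start in used:
            continue
        chain = [start]
        while True:
            target = chain[-1] + delta
            # Next unused peak that sits one repeating unit above the chain end.
            nxt = next((m for m in mzs
                        if m not in used and abs(m - target) <= tol), None)
            if nxt is None:
                break
            chain.append(nxt)
        if len(chain) >= min_len:
            series.append(chain)
            used.update(chain)
    return series


# Made-up peak list: four peaks spaced by CH2 plus two unrelated peaks.
peaks = [200.0, 214.01565, 228.0313, 242.04695, 183.01, 79.96]
for chain in find_series(peaks):
    print(chain)
```

Changing `delta` to another repeating unit would search for a different homologue type under the same assumptions.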
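
The MetFrag example on slide 37 searches candidates for an [M-H]- ion at m/z 213.9637 within ± 5 ppm. The arithmetic behind such a search window can be sketched as follows. The m/z value and the 5 ppm tolerance come from the slide; the helper names `ppm_window`, `within_ppm` and `neutral_mass_from_m_minus_h` are hypothetical and are not part of MetFrag.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da


def ppm_window(mz, ppm):
    """Return (low, high) m/z bounds for a relative tolerance given in ppm."""
    delta = mz * ppm * 1e-6
    return mz - delta, mz + delta


def within_ppm(observed, reference, ppm):
    """True if `observed` lies within `ppm` of `reference`."""
    return abs(observed - reference) <= reference * ppm * 1e-6


def neutral_mass_from_m_minus_h(mz):
    """Neutral monoisotopic mass for a singly charged [M-H]- ion."""
    return mz + PROTON_MASS


low, high = ppm_window(213.9637, 5.0)
print(f"search window: {low:.4f} to {high:.4f}")
print(f"neutral mass:  {neutral_mass_from_m_minus_h(213.9637):.4f}")
```

At this m/z, 5 ppm corresponds to roughly 0.001 Da, so a relative and an absolute tolerance of that size behave similarly for small molecules in this mass range.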