Small Molecules in Big Data fTALES Ghent

Finding small molecules in big data, f-TALES meeting in Ghent (https://ftales.be/).

Published in: Data & Analytics

  1. Finding Small Molecules in Big Data
     https://tinyurl.com/smallmol-ftales-ghent, 26-27 September 2018
     Emma Schymanski, FNR ATTRACT Fellow and PI, Environmental Cheminformatics, Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg. Email: emma.schymanski@uni.lu
     Antony J. Williams, National Center for Computational Toxicology (NCCT), US Environmental Protection Agency (US EPA), NC, USA
     Image © www.seanoakley.com/
     The views expressed in this presentation are those of the authors and do not necessarily reflect the views or policies of the U.S. Environmental Protection Agency.
  2. Small molecules … big problems?
     o Status quo of small molecules:
       • How many are in compound databases?
       • How many could there be?
       • How many are in spectral libraries?
       • Mind the Gap!
     o Exchanging “expert knowledge”
       • Suspect Lists in Europe
       • Live, retrospective screening & untargeted MS
     o Tackling Complex Structures
       • Exchanging Information on Unknowns
       • …and how Open Science helps!
     [Figure: Cells. E. coli data provided by N. Zamboni, IMSB, ETH Zürich.]
  3. Searching for Small Molecules …
     o Compound Databases
       • PubChem: >95 million, https://pubchem.ncbi.nlm.nih.gov/
       • ChemSpider: >60 million, http://www.chemspider.com/
       • CompTox Chemicals Dashboard: >765 000, https://comptox.epa.gov/dashboard/
       • Human Metabolome DB (HMDB): >115 000, http://www.hmdb.ca/
     Peisl, Schymanski & Wilmes, 2018, Anal. Chim. Acta, DOI: 10.1016/j.aca.2017.12.034
  4. Searching for Small Molecules …
     o Compound Databases … isn’t 95 million enough?
       • Quick answer … NO!
     [Figure: E. coli data (N. Zamboni, IMSB, ETH Zürich); in silico prediction]
     Peisl, Schymanski & Wilmes, 2018, Anal. Chim. Acta, DOI: 10.1016/j.aca.2017.12.034
  5. Searching for More Small Molecules …
     o In silico metabolite prediction: example of MINE (2015)
       • KEGG MINE: 13,307 => 571,368
       • EcoCyc MINE: 1,832 => 54,719
       • YMDB MINE: 1,978 => 100,755
       • HMDB [15] MINE: 23,035 => 400,414
     Jeffryes et al., 2015, MINEs, J. Cheminf., 7:44. DOI: 10.1186/s13321-015-0087-1
  6. Searching for More Small Molecules …
     o In silico metabolite prediction: example of MINE (2015)
       • First generation only … combinatorial explosion!
       • KEGG MINE: 13,307 => 571,368
       • EcoCyc MINE: 1,832 => 54,719
       • YMDB MINE: 1,978 => 100,755
       • HMDB [15] MINE: 23,035 => 400,414
     o Speculation … PubChem MINE: 95 million => 1.6 billion … first generation only?!?!
     Jeffryes et al., 2015, MINEs, J. Cheminf., 7:44. DOI: 10.1186/s13321-015-0087-1
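The expansion factors behind the MINE numbers on the slide above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the database sizes are copied from the slide, while the function names and the idea of scaling PubChem by a per-database factor are assumptions made here, not a method stated by the authors. The slide's speculative 1.6 billion corresponds to a factor of roughly 17, which is close to the HMDB ratio.

```python
# Source size and first-generation MINE size, as listed on the slide
# (Jeffryes et al., 2015).
MINE_SIZES = {
    "KEGG": (13_307, 571_368),
    "EcoCyc": (1_832, 54_719),
    "YMDB": (1_978, 100_755),
    "HMDB": (23_035, 400_414),
}

# PubChem size quoted earlier in the deck (>95 million).
PUBCHEM_SIZE = 95_000_000


def expansion_factor(source: int, predicted: int) -> float:
    """Ratio of predicted MINE compounds to source compounds."""
    return predicted / source


def extrapolate(source_size: int, factor: float) -> float:
    """Naive linear extrapolation (an assumption, not the authors' method)."""
    return source_size * factor


factors = {name: expansion_factor(*sizes) for name, sizes in MINE_SIZES.items()}

if __name__ == "__main__":
    for name, factor in factors.items():
        estimate = extrapolate(PUBCHEM_SIZE, factor)
        print(f"{name}: x{factor:.1f} -> PubChem-scale estimate {estimate:.2e}")
```

The spread of factors (from about 17 to about 51) shows how sensitive any such extrapolation is to the choice of reference database.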
  7. Searching for EVEN MORE Small Molecules …
     o Structure Generation
       • But of course most of these do not exist
     [Figure: number of structures (log scale, 1 to 100,000,000) versus molecular mass (50 to 150) for molecular graphs (structure generation), the Beilstein Registry and the NIST MS Library. Annotations: structure generation, 100 million at MW 150; NIST MS Library (spectral libraries), ~1-200 at MW 150.]
     Source: A. Kerber, R. Laue, M. Meringer, C. Rücker (2005) MATCH 54 (2), 301-312.
  8. Searching for Small Molecules in Spectral Libraries
     o … to find what is “on record”…
       • Too many different MS/MS libraries (and they are still too small)
     Peisl, Schymanski & Wilmes, 2018, Anal. Chim. Acta, DOI: 10.1016/j.aca.2017.12.034
  9. Do we need all these libraries?
     o Yes … most libraries still have many unique entries
     [Figure legend: … = HMDB, GNPS, MassBank, ReSpect. Compound lists provided by: S. Stein, R. Mistrik, Agilent]
     Vinaixa, Schymanski, Navarro, Neumann, Salek, Yanes, 2016, TrAC, DOI: 10.1016/j.trac.2015.09.005
  10. Mind the Gap!
      o Only 23-60 % of (defined) metabolites in Genome-Scale Metabolic Networks are covered by (combined!) Mass Spectral Libraries
      Frainay, C. et al. (2018) “Mind the Gap: …” Metabolites: http://www.mdpi.com/2218-1989/8/3/51
  11. Mind the Gap!
      o Best library to choose depends highly on your dataset
        • Example: MSforID (https://msforid.com/) is poor for metabolic networks, but great for forensic toxicology!
      Frainay, C. et al. (2018) “Mind the Gap: …” Metabolites: http://www.mdpi.com/2218-1989/8/3/51
  12. SPectraL hASH (SPLASH): Search between libraries
      splash10 - 0002 - 0900000000 - b112e4e059e1ecf98c5f
      [version] - [top10] - [histogram] - [hash of full spectrum]
      http://splash.fiehnlab.ucdavis.edu/
      Wohlgemuth et al., 2016, Nature Biotechnology 34, 1099-1101, DOI: 10.1038/nbt.3689
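The four-block layout of the SPLASH shown above lends itself to a small parsing example. The following is a minimal sketch, assuming only the block labels given on the slide; the `Splash` type and the helper names are hypothetical and this is not the reference SPLASH implementation (it does not compute a SPLASH from a spectrum, it only splits and compares existing identifiers).

```python
from typing import List, NamedTuple


class Splash(NamedTuple):
    """The four blocks named on the slide (field names are illustrative)."""
    version: str
    top10: str
    histogram: str
    spectrum_hash: str


def parse_splash(text: str) -> Splash:
    """Split a SPLASH string on hyphens, tolerating spaces around them."""
    parts = [part.strip() for part in text.split("-")]
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected 4 non-empty blocks, got {parts!r}")
    return Splash(*parts)


def shared_blocks(a: str, b: str) -> List[str]:
    """Names of the blocks on which two SPLASH strings agree."""
    sa, sb = parse_splash(a), parse_splash(b)
    return [name for name in Splash._fields if getattr(sa, name) == getattr(sb, name)]


if __name__ == "__main__":
    example = "splash10 - 0002 - 0900000000 - b112e4e059e1ecf98c5f"
    print(parse_splash(example))
```

Comparing block by block, rather than the whole string, is one way to look for records across libraries that agree on some blocks but not on the full-spectrum hash.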
  13. Small molecules … big problems?
      o Status quo of small molecules:
        • How many are in compound databases?
        • How many could there be?
        • How many are in spectral libraries?
        • Mind the Gap!
      o Exchanging “expert knowledge”
        • Suspect Lists in Europe
        • Live, retrospective screening & untargeted MS
      o Tackling Complex Structures
        • Exchanging Information on Unknowns
        • …and how Open Science helps!
      [Figure: Cells. E. coli data provided by N. Zamboni, IMSB, ETH Zürich.]
  14. 2015: European Non-target Screening Trial
      [Figure labels: Croatian Water, RWS]
      Schymanski et al. 2015, ABC, DOI: 10.1007/s00216-015-8681-7
  15. European (World-)Wide Exchange of Suspects
      NORMAN Suspect List Exchange: http://www.norman-network.com/?q=node/236
      Schymanski et al. 2015, ABC, DOI: 10.1007/s00216-015-8681-7; Alygizakis et al. 2018, ES&T, DOI: 10.1021/acs.est.8b00365
  16. NORMAN Suspect Exchange Lists
      o http://www.norman-network.com/?q=node/236
      o 21 lists available, from specialist collections to market lists
        • Integrated in NORMAN Databases & CompTox Chemicals Dashboard
      o Coordinated with publications!
  17. on the CompTox Chemicals Dashboard
      https://comptox.epa.gov/dashboard/chemical_lists/normanews
  18. NORMAN Digital Sample Freezing Platform
      “Live” retrospective screening of known and unknown chemicals in European samples (various matrices)
      http://norman-data.eu/ and Alygizakis et al., in prep.
  19. NORMAN Digital Sample Freezing Platform
      Retrospective screening of REACH chemicals in Black Sea samples (various matrices)
      Interactive heatmap available at http://norman-data.eu/NORMAN-REACH
  20. Challenge 1: Not all suspects have structures!
  21. Challenge 2: Mass Spec “sees” ONE part at a time
      o What do all these chemicals have in common?
      See: Schymanski & Williams, 2017, DOI: 10.1021/acs.est.7b01908 and McEachran et al., 2018, DOI: 10.1186/s13321-018-0299-2
  22. Small molecules … big problems?
      o Status quo of small molecules:
        • How many are in compound databases?
        • How many could there be?
        • How many are in spectral libraries?
        • Mind the Gap!
      o Exchanging “expert knowledge”
        • Suspect Lists in Europe
        • Live, retrospective screening & untargeted MS
      o Tackling Complex Structures
        • Exchanging Information on Unknowns
        • …and how Open Science helps!
      [Figure: Cells. E. coli data provided by N. Zamboni, IMSB, ETH Zürich.]
  23. We still have many unknowns …
      [Figure: two panels, Environment (left) and Cells (right). (l) Data from Schymanski et al. 2014, ES&T, DOI: 10.1021/es4044374. (r) E. coli data provided by N. Zamboni, IMSB, ETH Zürich.]
  24. …and many are interconnected by mass
      Homologous Series in environmental and biological samples …
      [Figure: structures of homologous series with variable chain lengths m and n. Lipid extract data of Mycobacterium smegmatis provided by N. Zamboni, IMSB, ETHZ.]
      Schymanski et al. 2014, ES&T, DOI: 10.1021/es4044374; M. Loos & H. Singer, 2017, J. Cheminf., DOI: 10.1186/s13321-017-0197-z
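To make "interconnected by mass" concrete: members of a homologous series differ by a fixed repeating unit, so their masses are spaced by a constant increment. The sketch below assumes a CH2 repeating unit and a simple greedy chaining strategy; both are illustrative assumptions, the function name and tolerance are hypothetical, and the cited tools do considerably more (for example, using retention time trends and other repeating units).

```python
from typing import List, Sequence

# Monoisotopic mass of a CH2 unit: 12C (12.0 by definition) + 2 x 1H.
CH2 = 12.0 + 2 * 1.00782503207


def find_series(
    masses: Sequence[float],
    delta: float = CH2,
    tol: float = 0.002,
    min_len: int = 3,
) -> List[List[float]]:
    """Greedily chain masses that are spaced by `delta` (within `tol` Da)."""
    ordered = sorted(masses)
    used = set()
    series = []
    for i, start in enumerate(ordered):
        if i in used:
            continue
        chain = [i]
        current = start
        while True:
            # Look for the next unused mass one repeating unit heavier.
            nxt = next(
                (
                    j
                    for j in range(chain[-1] + 1, len(ordered))
                    if j not in used and abs(ordered[j] - current - delta) <= tol
                ),
                None,
            )
            if nxt is None:
                break
            chain.append(nxt)
            current = ordered[nxt]
        if len(chain) >= min_len:
            used.update(chain)
            series.append([ordered[j] for j in chain])
    return series


if __name__ == "__main__":
    # Synthetic, made-up masses: four CH2-spaced values plus two unrelated ones.
    demo = [200.0 + k * CH2 for k in range(4)] + [205.1234, 310.5]
    for s in find_series(demo):
        print([round(m, 4) for m in s])
```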
  25. New Ways to Store and Access Homologues
      https://comptox.epa.gov/dashboard/dsstoxdb/results?search=DTXSID3020041
      https://comptox.epa.gov/dashboard/chemical_lists/eawagsurf
      Schymanski, Grulke, Williams et al., in prep. & Williams et al. 2017, J. Cheminformatics 9:61, DOI: 10.1186/s13321-017-0247-6
  26. RECAP: Mass Spec “sees” ONE part at a time
      o What do all these chemicals have in common?
      See: Schymanski & Williams, 2017, DOI: 10.1021/acs.est.7b01908 and McEachran et al., 2018, DOI: 10.1186/s13321-018-0299-2
  27. MS-Ready: Accessing Data from Salts and Mixtures
      McEachran et al. 2018, J. Cheminformatics 10:45, DOI: 10.1186/s13321-018-0299-2 and https://msbi.ipb-halle.de/MetFragBeta/
  28. Annotated Spectra of Homologues in MassBank.EU
      [Figure: structure of SPA-9C (m+n=6)]
      www.massbank.eu ACCESSIONS (LAS, SPACs):
        • Literature MS/MS: LIT00034, LIT00037
        • Std Mix., Sample: ETS00012, ETS00018
      https://github.com/MassBank/RMassBank/
      Tentatively Identified Spectra: http://goo.gl/0t7jGp
      Stravs et al. (2013), J. Mass Spectrom., 48(1):89-99. DOI: 10.1002/jms.3131
  29. European (World-)Wide Exchange of Suspects
      NORMAN Suspect List Exchange: http://www.norman-network.com/?q=node/236
      Tentatively Identified Spectra: http://goo.gl/0t7jGp
      Hits in GNPS MassIVE datasets:
        • TPs in skin: http://goo.gl/NmO4tx
        • Surfactants: http://goo.gl/7sY9Pf
      Schymanski et al. 2015, ABC, DOI: 10.1007/s00216-015-8681-7
  30. NORMAN Digital Sample Freezing Platform
      “Live” retrospective screening of known and unknown chemicals in European samples (various matrices)
      Future work: use results of unknowns to drive prioritization efforts
      http://norman-data.eu/ and Alygizakis et al., in prep.
  31. Small molecules … big problems OPPORTUNITIES!
      o Identifying small molecules requires information from many sources
        • Extensive compound databases available
        • Many mass spectral libraries available
        • Many excellent workflows available to collate this information
        • Community initiatives to improve communication between resources
      Peisl, Schymanski & Wilmes, 2018, Anal. Chim. Acta, DOI: 10.1016/j.aca.2017.12.034
      http://splash.fiehnlab.ucdavis.edu/
  32. Small molecules … big problems OPPORTUNITIES!
      o Identifying small molecules requires information from many sources
        • Extensive compound databases available
        • Many mass spectral libraries available
        • Many excellent workflows available to collate this information
        • Community initiatives to improve communication between resources
      o Exchanging expert knowledge worldwide
        • Community efforts contribute greatly to improved cross-annotation
      Schymanski et al. 2015, ABC, DOI: 10.1007/s00216-015-8681-7; Alygizakis et al. 2018, ES&T, DOI: 10.1021/acs.est.8b00365
  33. Small molecules … big problems OPPORTUNITIES!
      o Identifying small molecules requires information from many sources
        • Extensive compound databases available
        • Many mass spectral libraries available
        • Many excellent workflows available to collate this information
        • Community initiatives to improve communication between resources
      o Exchanging expert knowledge worldwide
        • Community efforts contribute greatly to improved cross-annotation
      o Tackling complex structures
        • Huge progress in cheminformatics approaches in very short time …
      https://www.researchgate.net/project/Supporting-Mass-Spectrometry-Through-Cheminformatics
  34. Small molecules … big problems OPPORTUNITIES!
      o Identifying small molecules requires information from many sources
        • Extensive compound databases available
        • Many mass spectral libraries available
        • Many excellent workflows available to collate this information
        • Community initiatives to improve communication between resources
      o Exchanging expert knowledge worldwide
        • Community efforts contribute greatly to improved cross-annotation
      o Tackling complex structures
        • Huge progress in cheminformatics approaches in very short time …
        • Information in the public domain helps everyone! (you never know when it will help you!)
      Open Science Viewpoint: Schymanski & Williams, 2017, ES&T, 51 (10), pp 5357–5359. DOI: 10.1021/acs.est.7b01908
  35. Acknowledgements
      emma.schymanski@uni.lu
      Further Information:
        • http://www.norman-network.com/?q=node/236
        • https://massbank.eu/MassBank/
        • https://comptox.epa.gov/dashboard/
        • https://www.researchgate.net/project/Supporting-Mass-Spectrometry-Through-Cheminformatics
        • https://github.com/MassBank/
      EU Grant 603437
  38. MS-Ready: Metadata & Chemical Forms
      Schymanski & Williams, 2017, ES&T, 51 (10), pp 5357–5359. DOI: 10.1021/acs.est.7b01908
  39. MetFrag2.3: Non-target Identification
      Status: 2010 => 2016
      [Figure: MetFrag example. m/z [M-H]- 213.9637; candidates from ChemSpider or PubChem, ± 5 ppm; MS/MS peaks: 134.0054 339689, 150.0001 77271, 213.9607 632466; tolerances 5 ppm and 0.001 Da; RT: 4.54 min, 355 InChI/RTs; Elements: C, N, S; additional inputs: References, External Refs, Data Sources, RSC Count, PubMed Count, Suspect Lists.]
      Ruttkies, Schymanski, Wolf, Hollender, Neumann (2016) J. Cheminf., DOI: 10.1186/s13321-016-0115-9
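The "± 5 ppm" candidate search on the MetFrag slide above comes down to a simple mass-window calculation. The sketch below uses the m/z value from the slide; the function names are hypothetical, the proton mass constant is an assumption of this example, and this is not MetFrag's own code or API.

```python
from typing import Tuple

# Proton mass in Da (assumed constant for this sketch).
PROTON_MASS = 1.007276466


def ppm_window(mz: float, ppm: float) -> Tuple[float, float]:
    """Lower and upper m/z bounds for a relative tolerance given in ppm."""
    delta = mz * ppm * 1e-6
    return mz - delta, mz + delta


def neutral_mass_from_deprotonated(mz: float) -> float:
    """Neutral monoisotopic mass M for an [M-H]- ion: add back one proton."""
    return mz + PROTON_MASS


def within_ppm(measured: float, theoretical: float, ppm: float) -> bool:
    """True if the measured value lies within `ppm` of the theoretical value."""
    return abs(measured - theoretical) <= theoretical * ppm * 1e-6


if __name__ == "__main__":
    mz = 213.9637  # [M-H]- value shown on the slide
    low, high = ppm_window(mz, 5)
    print(f"5 ppm window: {low:.4f} to {high:.4f}")
    print(f"neutral mass: {neutral_mass_from_deprotonated(mz):.4f}")
```

At this m/z, 5 ppm is roughly 0.001 Da, which is why relative and absolute tolerances of that size are of similar magnitude for small molecules.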
