The Challenge of Deeper Knowledge Graphs for Science

Over the past five years, we have seen multiple successes in developing knowledge graphs that support science in domains ranging from drug discovery to social science. However, to substantially improve scientific productivity, we need to expand and deepen these knowledge graphs. To do so, I believe we must address two critical challenges: 1) dealing with low-resource domains; and 2) improving quality. In this talk, I describe these challenges in detail and discuss efforts to overcome them through techniques such as unsupervised learning, the use of non-experts in expert domains, and the integration of action-oriented knowledge (i.e., experiments) into knowledge graphs.

  1. THE CHALLENGE OF DEEPER KNOWLEDGE GRAPHS FOR SCIENCE. PAUL GROTH | @PGROTH | PGROTH.COM. CONTRIBUTIONS: RON DANIEL, MICHAEL LAURUHN & @ELSEVIERLABS TEAM
  2. OUTLINE ▸ Research Performance ▸ Knowledge Graphs ▸ Research as a low-resource domain ▸ Quality
  3-7. Bloom, N., Jones, C. I., Van Reenen, J., & Webb, M. (2017). Are ideas getting harder to find? (No. w23782). National Bureau of Economic Research. Slides: https://web.stanford.edu/~chadj/slides-ideas.pdf
  8. WHY? INFORMATION OVERLOAD
  9. WHY? IN PRACTICE. Gregory, K., Groth, P., Cousijn, H., Scharnhorst, A., & Wyatt, S. (2017). Searching Data: A Review of Observational Data Retrieval Practices. arXiv preprint arXiv:1707.06937. Some observations from @gregory_km's survey and interviews: • The needs and behaviors of specific user groups (e.g., early-career researchers, policy makers, students) are not well documented. • Participants require details about data collection and handling. • Reconstructing data tables from journal articles, using general search engines, and making direct data requests are common. Gregory, K., Cousijn, H., Groth, P., Scharnhorst, A., & Wyatt, S. (2018). Understanding Data Retrieval Practices: A Social Informatics Perspective. arXiv preprint arXiv:1801.04971.
  10. THE ROLE OF METADATA IN THE SECOND MACHINE AGE – DC-2016 / KØBENHAVN / 13 OCTOBER. ANSWERS ARE ABOUT THINGS, NOT JUST WORKS: “Why shouldn’t a search on an author return information about the author, including the author’s works? Where was the author born, when did she live, what is she known for? … All of this is possible, but only if we can make some fundamental changes in our approach to bibliographic description. ... The challenge for us lies in transforming what we can of our data into interrelated ‘things’ without overindulging that metaphor.” Coyle, K. (2016). FRBR, Before and After: A Look at Our Bibliographic Models. Chicago: ALA Editions.
  11. ENTER KNOWLEDGE GRAPHS. Ernst, Patrick, et al. "DeepLife: An Entity-Aware Search, Analytics and Exploration Platform for Health and Life Sciences." Proceedings of ACL-2016 System Demonstrations (2016): 19-24.
  12. Knowledge Graphs: The Science System
  13. Knowledge Graphs: Curated Databases. From: Wikidata as a semantic framework for the Gene Wiki initiative. Database (Oxford). 2016;2016. doi:10.1093/database/baw015
  14. RESEARCH IS DIVERSE. http://knowescape.org/map-of-science-an-update/
  15. SCIENTIFIC TEXT IS CHALLENGING. Augenstein, Isabelle, et al. "SemEval 2017 Task 10: ScienceIE-Extracting Keyphrases and Relations from Scientific Publications." Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017). 2017.
  16. UNSUPERVISED & DISTANT SUPERVISION. EXAMPLE: UNIVERSAL SCHEMAS AND REVERB. Groth et al., Applying Universal Schemas for Domain Specific Ontology Expansion, http://www.akbc.ws/2016/papers/3_Paper.pdf • Successful in predicting new triples (F1 ≈ 0.7) • ReVerb’s relations were very interesting, but recall was very low • The approach was not domain-independent • Matched arguments against a medical ontology to improve precision • Predicted relations were restricted to relation types from the same ontology
  17. OPEN INFORMATION EXTRACTION IN SCIENCE IS HARD. Open Information Extraction on Scientific Text: An Evaluation. Paul Groth, Mike Lauruhn, Antony Scerri and Ron Daniel, Jr. COLING 2018. Example: “The patient was treated with Emtricitabine, Etravirine, and Darunavir” ‣ (The patient :: was treated with :: Emtricitabine, Etravirine, and Darunavir). Another possible extraction is: ‣ (The patient :: was treated with :: Emtricitabine) ‣ (The patient :: was treated with :: Etravirine) ‣ (The patient :: was treated with :: Darunavir). 698 unique relation types – 400 relation types
  18. CROWDS ARE NOT EXPERTS. Use of Internal Testing Data to Help Determine Compensation for Crowdsourcing Tasks. Michael Lauruhn, Paul Groth, Corey Harper, Helena Deus. HUML 2018
  19. TRANSFER LEARNING. Sujit Pal @ Elsevier Labs
  20. TRANSFER LEARNING & MACHINE DEPENDENCIES
  21. QUALITY IS DEPENDENT ON SOURCES
  22. PROVENANCE
  23. SOURCES AREN’T JUST DATA. Lauruhn, Michael, and Paul Groth. "Sources of Change for Modern Knowledge Organization Systems." Knowledge Organization 43, no. 8 (2016).
  24. A MORE TRANSPARENT SUPPLY CHAIN. Groth, Paul. "Transparency and Reliability in the Data Supply Chain." IEEE Internet Computing, vol. 17, no. 2, pp. 69-71, March-April 2013. doi: 10.1109/MIC.2013.41
  25. REPRODUCIBILITY AS QUALITY? (1) https://www.elsevier.com/connect/how-elsevier-is-breaking-down-barriers-to-reproducibility
  26. QUALITY AS MORE AUTOMATION
  27. “There are some catches too of course, especially since it's very early in the evolution of these tools. If it were the internet it would be around 1994” http://blog.booleanbiotech.com/genetic_engineering_pipeline_python.html
  28. RESEARCH QUESTIONS: 1. Does basic lab-based biomedical research reuse and assemble existing methods, or is it primarily focused on the development of new techniques? 2. What existing methods are covered by robotic labs?
  29. RESULTS
  30. DIRECTION: GROUNDING KNOWLEDGE GRAPHS IN ACTIONS. http://www.researchobject.org, https://smart-api.info
  31. CONCLUSIONS ▸ Knowledge graphs are crucial for overcoming information overload in research ▸ Research has less redundancy than other domains ▸ fewer resources and high diversity ▸ challenge: effectively using general knowledge in these domains ▸ Quality is central ▸ turn towards processes and reproducibility as foundations
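
Slide 10 quotes Coyle's argument that a search on an author should return the author as a thing, with a birthplace, a lifespan, and works attached. As a minimal sketch of that entity-centric view, assuming a plain in-memory list of (subject, predicate, object) triples, the following Python gathers everything known about one node, including the works that point to it. All identifiers (`ex:author1`, `ex:creator`, and so on) and the `describe` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical entity-centric graph: nodes are "things" (people, places, works)
# and edges are typed relations. All identifiers and facts are made up.
Triple = tuple[str, str, str]

TRIPLES: list[Triple] = [
    ("ex:author1", "rdf:type", "ex:Person"),
    ("ex:author1", "ex:bornIn", "ex:cityA"),
    ("ex:author1", "ex:knownFor", "ex:topicX"),
    ("ex:work1", "ex:creator", "ex:author1"),
    ("ex:work2", "ex:creator", "ex:author1"),
    ("ex:work3", "ex:creator", "ex:author2"),
]


def describe(entity: str, triples: list[Triple]) -> dict[str, list[str]]:
    """Collect outgoing facts about an entity plus the works that name it as creator."""
    facts: dict[str, list[str]] = defaultdict(list)
    for s, p, o in triples:
        if s == entity:
            facts[p].append(o)
        elif o == entity and p == "ex:creator":
            facts["ex:works"].append(s)
    return dict(facts)


print(describe("ex:author1", TRIPLES))
# {'rdf:type': ['ex:Person'], 'ex:bornIn': ['ex:cityA'], 'ex:knownFor': ['ex:topicX'], 'ex:works': ['ex:work1', 'ex:work2']}
```

The answer to "who is this author?" is assembled from edges around a node rather than read off a single catalog record, which is the shift the quoted passage asks for.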
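
Slide 16 applies universal schemas, in which textual surface patterns (of the kind ReVerb extracts) and ontology relations are columns of one entity-pair × relation matrix that is factorized jointly, so an entity pair seen only with a text pattern can still be scored for an ontology relation. The Python below is a toy logistic matrix factorization under our own simplifications: made-up pairs and columns, every unobserved cell treated as a negative, plain per-cell gradient steps, and a single latent dimension (enough to separate two relation clusters; real systems use many more). It is not the code or training objective of the cited paper.

```python
import math
import random

# Toy universal-schema matrix (all data hypothetical): rows are entity pairs,
# columns mix textual surface patterns with ontology relations.
PAIRS = ["p1", "p2", "p3", "p4", "p5", "p6"]
COLUMNS = ["pat:is used to treat", "rel:may_treat", "pat:causes", "rel:causes"]
OBSERVED = {
    ("p1", "pat:is used to treat"), ("p1", "rel:may_treat"),
    ("p2", "pat:is used to treat"), ("p2", "rel:may_treat"),
    ("p3", "pat:is used to treat"),  # ontology relation not observed
    ("p4", "pat:causes"), ("p4", "rel:causes"),
    ("p5", "pat:causes"), ("p5", "rel:causes"),
    ("p6", "pat:causes"),  # ontology relation not observed
}


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def train(dim=1, epochs=2000, lr=0.1, reg=0.01, seed=0):
    """Logistic matrix factorization fitted with per-cell gradient steps.

    Simplification: every unobserved cell is treated as a negative example.
    """
    rng = random.Random(seed)
    u = {p: [rng.gauss(0.0, 0.1) for _ in range(dim)] for p in PAIRS}
    v = {c: [rng.gauss(0.0, 0.1) for _ in range(dim)] for c in COLUMNS}
    for _ in range(epochs):
        for p in PAIRS:
            for c in COLUMNS:
                y = 1.0 if (p, c) in OBSERVED else 0.0
                err = y - sigmoid(dot(u[p], v[c]))
                for k in range(dim):
                    uk, vk = u[p][k], v[c][k]  # update both factors from old values
                    u[p][k] += lr * (err * vk - reg * uk)
                    v[c][k] += lr * (err * uk - reg * vk)
    return u, v


def score(u, v, pair, column):
    """Predicted probability that `column` holds for `pair`."""
    return sigmoid(dot(u[pair], v[column]))


u, v = train()
for column in ("rel:may_treat", "rel:causes"):
    print("p3", column, round(score(u, v, "p3", column), 2))
```

Because `p3` shares its surface pattern with pairs that carry `rel:may_treat`, it should score higher on that unobserved relation than on `rel:causes`; restricting the scored columns to one ontology mirrors the slide's note that predictions were limited to that ontology's relation types.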
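
Slide 17 shows that one sentence admits more than one reasonable extraction: a single triple with a coordinated object, or one triple per drug. A simple heuristic for producing the second form, sketched below in Python under the assumption that conjuncts are separated by commas and a final "and", splits the object argument. The `split_coordination` function is hypothetical, not part of any OpenIE system evaluated in the paper, and it will wrongly split names that themselves contain " and ".

```python
import re

Triple = tuple[str, str, str]

# Conjunct separators: a comma optionally followed by "and", or a bare " and ".
_CONJUNCT_SPLIT = re.compile(r",\s*(?:and\s+)?|\s+and\s+")


def split_coordination(triple: Triple) -> list[Triple]:
    """Hypothetical heuristic: emit one triple per conjunct in the object."""
    subject, relation, obj = triple
    conjuncts = [part.strip() for part in _CONJUNCT_SPLIT.split(obj)]
    return [(subject, relation, c) for c in conjuncts if c]


extraction = ("The patient", "was treated with", "Emtricitabine, Etravirine, and Darunavir")
for t in split_coordination(extraction):
    print(t)
# ('The patient', 'was treated with', 'Emtricitabine')
# ('The patient', 'was treated with', 'Etravirine')
# ('The patient', 'was treated with', 'Darunavir')
```

Which form counts as "correct" depends on the downstream use, which is part of why evaluating OpenIE output on scientific text is hard.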
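
Slides 21 to 24 argue that a knowledge graph's quality depends on its sources, which makes provenance part of the data. A minimal sketch, assuming each assertion simply records the set of sources it was derived from (loosely in the spirit of W3C PROV's `wasDerivedFrom`, but not a PROV serialization), shows how a consumer could ask where a fact came from and keep only facts backed by sources it trusts. The class names and source identifiers are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    subject: str
    predicate: str
    obj: str


class ProvenanceStore:
    """Hypothetical store mapping each assertion to the sources it was derived from."""

    def __init__(self) -> None:
        self._sources: dict[Assertion, set[str]] = defaultdict(set)

    def add(self, assertion: Assertion, source: str) -> None:
        self._sources[assertion].add(source)

    def sources_of(self, assertion: Assertion) -> set[str]:
        # .get() avoids creating an empty entry for unknown assertions.
        return set(self._sources.get(assertion, set()))

    def supported_by(self, trusted: set[str]) -> list[Assertion]:
        """Assertions backed by at least one trusted source, in insertion order."""
        return [a for a, srcs in self._sources.items() if srcs & trusted]


store = ProvenanceStore()
fact = Assertion("ex:drugA", "ex:treats", "ex:diseaseB")
weak = Assertion("ex:drugA", "ex:causes", "ex:symptomC")
store.add(fact, "ex:curatedDatabase")
store.add(fact, "ex:textMiningRun")
store.add(weak, "ex:textMiningRun")
print(store.supported_by({"ex:curatedDatabase"}))
# [Assertion(subject='ex:drugA', predicate='ex:treats', obj='ex:diseaseB')]
```

Keeping sources as a set per assertion lets the same fact accumulate independent support, which a quality filter can then weigh.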
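
Slide 30 points toward grounding knowledge graphs in actions, citing Research Objects and SmartAPI. One way to picture this, under our own assumptions, is to treat experiments as first-class nodes that record the methods they used and the datasets they produced, so that a claim can be traced back to the experiments behind it. The data model below is hypothetical and does not follow the Research Object or SmartAPI specifications.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    """Hypothetical action node: a protocol run, the methods it used, the data it produced."""
    id: str
    methods: set[str]
    outputs: set[str]


@dataclass
class ActionGraph:
    experiments: dict[str, Experiment] = field(default_factory=dict)
    # claim identifier -> datasets offered as evidence for that claim
    evidence: dict[str, set[str]] = field(default_factory=dict)

    def add_experiment(self, exp: Experiment) -> None:
        self.experiments[exp.id] = exp

    def add_claim(self, claim: str, datasets: set[str]) -> None:
        self.evidence.setdefault(claim, set()).update(datasets)

    def grounding(self, claim: str) -> list[str]:
        """IDs of experiments whose outputs are evidence for the claim."""
        data = self.evidence.get(claim, set())
        return sorted(e.id for e in self.experiments.values() if e.outputs & data)

    def claims_using_method(self, method: str) -> list[str]:
        """Claims grounded in at least one experiment that used `method`."""
        return sorted(
            claim for claim in self.evidence
            if any(method in self.experiments[e].methods for e in self.grounding(claim))
        )


g = ActionGraph()
g.add_experiment(Experiment("exp1", {"method:pcr"}, {"data:d1"}))
g.add_experiment(Experiment("exp2", {"method:western_blot"}, {"data:d2"}))
g.add_claim("claim:A", {"data:d1"})
g.add_claim("claim:B", {"data:d2", "data:d3"})
g.add_claim("claim:C", {"data:d4"})  # no recorded experiment produced d4
print(g.grounding("claim:B"))               # ['exp2']
print(g.claims_using_method("method:pcr"))  # ['claim:A']
print(g.grounding("claim:C"))               # []
```

A claim with an empty grounding, like `claim:C`, is exactly the kind of statement that a process-oriented view of quality would flag for review.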
