
Cognitive data

Slides of my talk at OSLCfest in Stockholm Nov 6, 2019
Video recording of the talk is available here:
https://www.facebook.com/oslcfest/videos/2261640397437958/

  1. From Linked Data to Cognitive Data. KTH Royal Institute of Technology, Stockholm, November 11, 2018
  2. --- CONFIDENTIAL --- Zuse Z3: the beginning of computing, close to the hardware (Photo: Konrad Zuse Internet Archiv/Deutsches Museum/DFG)
  3. © Fraunhofer
  4. We can make things more intuitive (Picture: The illustrated recipes of Lucy Eldridge, http://thefoxisblack.com/2013/07/18/the-illustrated-recipes-of-lucy-eldridge/)
  5. Making computing more intuitive: procedural programming
  6. Sören Auer
  7. Making computing more intuitive: object-oriented programming
  8.
  9. Making computing even more intuitive: with cognitive data?!
  10. Machine Learning and Big Data: AI is not just the next hype after Big Data; Big Data is the reason why we have AI! (http://www.spacemachine.net/views/2016/3/datasets-over-algorithms)
  11. The three “V”s of Big Data: Variety is often neglected (Source: Gesellschaft für Informatik)
  12. Linked Data principles, addressing the neglected third V (Variety): (1) Use URIs to identify the “things” in your data. (2) Use http:// URIs so people (and machines) can look them up on the web. (3) When a URI is looked up, return a description of the thing in the W3C Resource Description Framework (RDF). (4) Include links to related things. Sources: http://www.w3.org/DesignIssues/LinkedData.html; [1] Auer, Lehmann, Ngomo, Zaveri: Introduction to Linked Data and Its Lifecycle on the Web. Reasoning Web 2013.
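Principle 3 is commonly realized through HTTP content negotiation: the client asks for an RDF serialization via the `Accept` header. The sketch below only prepares such a request with the standard library and does not send it; the preference for Turtle over RDF/XML is an assumption, and which formats a given server actually offers varies.

```python
import urllib.request

# Media types for two common RDF serializations; preferring Turtle is an assumption.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_lookup_request(uri: str) -> urllib.request.Request:
    """Prepare (but do not send) an HTTP GET asking the server for an RDF description of `uri`."""
    if not uri.startswith(("http://", "https://")):
        # Principle 2: things should be named with HTTP URIs so they can be looked up.
        raise ValueError(f"not an HTTP(S) URI: {uri!r}")
    return urllib.request.Request(uri, headers={"Accept": RDF_ACCEPT}, method="GET")


req = build_lookup_request("http://dbpedia.org/resource/Stockholm")
# Sending it would be urllib.request.urlopen(req); omitted to keep the sketch offline.
```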
  13. RDF & Linked Data in a nutshell: (1) A graph-based RDF data model consisting of subject-predicate-object (S-P-O) statements (facts); the example graph links KTH via conf:organizes to OSLCFest, which has conf:starts 05.11.2018 and conf:takesPlaceIn dbpedia:Stockholm. (2) Serialized as RDF triples: KTH conf:organizes OSLCFest . OSLCFest conf:starts "2018-11-05"^^xsd:date . OSLCFest conf:takesPlaceIn dbpedia:Stockholm . (3) Published under a URL on the Web, an intranet or an extranet.
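To make the S-P-O structure concrete, here is a minimal sketch that writes the three example statements in N-Triples syntax (one triple per line, full IRIs in angle brackets, typed literals as `"value"^^<datatype>`). The `conf:` namespace and the IRIs for KTH and OSLCFest are hypothetical placeholders; only the XSD and DBpedia namespaces are real.

```python
from typing import NamedTuple, Union

XSD = "http://www.w3.org/2001/XMLSchema#"
DBR = "http://dbpedia.org/resource/"
CONF = "http://example.org/conf#"  # hypothetical namespace for the conf: vocabulary
EX = "http://example.org/id/"      # hypothetical namespace for KTH and OSLCFest


class Literal(NamedTuple):
    value: str
    datatype: str  # datatype IRI, e.g. XSD + "date"


Term = Union[str, Literal]  # a plain str stands for an IRI


def term(t: Term) -> str:
    """Render one term in N-Triples syntax (only quotes and backslashes are escaped)."""
    if isinstance(t, Literal):
        escaped = t.value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"^^<{t.datatype}>'
    return f"<{t}>"


def to_ntriples(triples) -> str:
    """One S-P-O statement per line, each terminated by ' .'."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


graph = [
    (EX + "KTH", CONF + "organizes", EX + "OSLCFest"),
    (EX + "OSLCFest", CONF + "starts", Literal("2018-11-05", XSD + "date")),
    (EX + "OSLCFest", CONF + "takesPlaceIn", DBR + "Stockholm"),
]
print(to_ntriples(graph))
```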
  14. Creating knowledge graphs with RDF Linked Data (figure: DHL, full name “DHL International GmbH”, industry Logistics, labelled “Logistik” and “物流”, has its headquarters in the Post Tower, height 162.5 m, located in Bonn)
  15. RDF data model (a bit more technical). The graph consists of: resources (identified via URIs); literals, i.e. data values with a datatype (a URI) or a language tag (multilinguality is built in); attributes of resources, which are also identified by URIs (from vocabularies). Various data sources and vocabularies can be arbitrarily mixed and meshed. URIs can be shortened with namespace prefixes, e.g. dbp: → http://dbpedia.org/resource/ (figure: the DHL example in RDF: dbp:DHL_International_GmbH foaf:name "DHL International GmbH"^^xsd:string, dbo:industry dbp:Logistics, ex:headquarters dbp:Post_Tower; dbp:Post_Tower gn:locatedIn dbp:Bonn and ex:height with rdf:value "162.5"^^xsd:decimal and ex:unit unit:Meter; dbp:Logistics rdfs:label "Logistik"@de, "物流"@zh)
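The prefix mechanism and the two kinds of literals can be sketched in a few lines. This is an illustration only: the prefix table follows the slide (`dbp:` for DBpedia resources) plus standard namespaces, and literal escaping is left out because the example values contain no quotes.

```python
PREFIXES = {
    "dbp": "http://dbpedia.org/resource/",
    "dbo": "http://dbpedia.org/ontology/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as dbp:Bonn into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


def iri(curie: str) -> str:
    return f"<{expand(curie)}>"


def lang_literal(text: str, lang: str) -> str:
    """A string literal with a language tag, e.g. "Logistik"@de (no escaping in this sketch)."""
    return f'"{text}"@{lang}'


def typed_literal(text: str, datatype_curie: str) -> str:
    """A literal with a datatype IRI, e.g. "162.5"^^xsd:decimal."""
    return f'"{text}"^^{iri(datatype_curie)}'


statements = [
    (iri("dbp:Logistics"), iri("rdfs:label"), lang_literal("Logistik", "de")),
    (iri("dbp:Logistics"), iri("rdfs:label"), lang_literal("物流", "zh")),
    (iri("dbp:DHL_International_GmbH"), iri("foaf:name"), typed_literal("DHL International GmbH", "xsd:string")),
    (iri("dbp:DHL_International_GmbH"), iri("dbo:industry"), iri("dbp:Logistics")),
]
for s, p, o in statements:
    print(s, p, o, ".")
```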
  16. Vocabularies: breaking the mold! Semantic data virtualization allows data and metadata to be continuously expanded and enhanced across data sources without losing the overall perspective. Relational data models: 1:1 relation between data model and application. Graph-based data model (chained subject-predicate-object statements): 1:n relation between data model and applications.
  17. RDF mediates between different data models and bridges the conceptual and operational layers. Tabular/relational data: a product table (Id, Title, Screen: 5624, SmartTV, 104 cm; 5627, Tablet, 21 cm) becomes Prod:5624 rdf:type Electronics . Prod:5624 rdfs:label "SmartTV" . Prod:5624 hasScreenSize "104"^^unit:cm . ... Taxonomic/tree data: Electronics; Vehicle with Car, Bus and Truck, which becomes Vehicle rdf:type owl:Thing . Car rdfs:subClassOf Vehicle . Bus rdfs:subClassOf Vehicle . ... Logical axioms/schema: Male rdfs:subClassOf Human . Female rdfs:subClassOf Human . Male owl:disjointWith Female . ...
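The tabular-to-RDF step on this slide can be sketched as a row-by-row mapping: the `Id` column becomes the subject IRI and each other column becomes a predicate. The namespaces behind `Prod:`, `hasScreenSize`, `Electronics` and `unit:cm` are hypothetical placeholders, and typed literals are represented as plain `(value, datatype)` pairs for brevity.

```python
import csv
import io

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
PROD = "http://example.org/prod/"       # hypothetical namespace behind Prod:
VOCAB = "http://example.org/vocab#"     # hypothetical vocabulary for Electronics, hasScreenSize
UNIT_CM = "http://example.org/unit#cm"  # hypothetical datatype IRI standing in for unit:cm

TABLE = """Id,Title,Screen
5624,SmartTV,104
5627,Tablet,21
"""


def lift_rows(csv_text: str) -> list:
    """Map each table row to triples: the Id becomes the subject IRI, columns become predicates."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = PROD + row["Id"]
        triples.append((subject, RDF_TYPE, VOCAB + "Electronics"))
        triples.append((subject, RDFS_LABEL, row["Title"]))
        # Typed literal as a (lexical value, datatype IRI) pair, mirroring "104"^^unit:cm.
        triples.append((subject, VOCAB + "hasScreenSize", (row["Screen"], UNIT_CM)))
    return triples


for t in lift_rows(TABLE):
    print(t)
```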
  18. Engineering, manufacturing, logistics, marketing, ...: parts of the data are curated, duplicated, annotated and simply changed over time, making reconciliation and interpretation a challenge. Perspectives on data turn into silos.
  19. Engineering, manufacturing, logistics, marketing: integrate using RDF and vocabularies.
  20. The trinity of semantic integration. Knowledge graphs: a complex fabric of concepts and relationships; focus on heterogeneous, multi-domain knowledge representation. Data spaces: a community of organizations agreeing on standards for data access, security, semantics, governance and licenses; focus on data sharing and exchange. Semantic data lakes: a storage facility for enterprise/research data using Big Data (HDFS) management; focus on scalable data access. Scope ranges from use within a single organization to inter-organizational use.
  21. Knowledge graphs: a definition (smart data for machine learning). A knowledge graph is a fabric of concept, class, property, relationship and entity descriptions. It uses a knowledge representation formalism (typically RDF, RDF Schema, OWL) and holds holistic knowledge (multi-domain, multi-source, multi-granularity): instance data (ground truth), whether open (e.g. DBpedia, Wikidata), private (e.g. supply chain data) or closed (product models); derived and aggregated data; schema data (vocabularies, ontologies); metadata (e.g. provenance, versioning, documentation, licensing); comprehensive taxonomies to categorize entities; links between internal and external data; and mappings to data stored in other systems and databases.
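Taxonomies pay off because types propagate along `rdfs:subClassOf`. The sketch below applies the RDFS entailment rule rdfs9 (if x has type C and C is a subclass of D, x also has type D) via a transitive closure, and flags entities that violate an `owl:disjointWith` axiom. It is a toy in-memory reasoner, not a description of any particular product; the class hierarchy reuses the examples from slide 17, while the instance names are hypothetical.

```python
# rdfs:subClassOf edges, taken from the taxonomy and schema examples on slide 17.
SUB_CLASS_OF = {
    "Car": ["Vehicle"], "Bus": ["Vehicle"], "Truck": ["Vehicle"],
    "Male": ["Human"], "Female": ["Human"],
}
DISJOINT = [("Male", "Female")]  # owl:disjointWith


def superclasses(cls: str) -> set:
    """All classes reachable from `cls` via rdfs:subClassOf (transitive, cycle-safe)."""
    seen, stack = set(), [cls]
    while stack:
        for parent in SUB_CLASS_OF.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def infer_types(asserted: dict) -> dict:
    """Apply rule rdfs9: x rdf:type C and C rdfs:subClassOf D entail x rdf:type D."""
    return {x: set(cs).union(*(superclasses(c) for c in cs)) for x, cs in asserted.items()}


def disjointness_violations(typed: dict) -> list:
    """Entities typed with two classes declared owl:disjointWith each other."""
    return [(x, a, b) for x, cs in typed.items() for a, b in DISJOINT if a in cs and b in cs]


# Hypothetical instance data.
typed = infer_types({"ex:bus42": ["Bus"], "ex:alex": ["Male"], "ex:broken": ["Male", "Female"]})
print(typed, disjointness_violations(typed))
```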
  22.
  23. Emerging knowledge graphs and data spaces. Search engine optimization and web commerce: Schema.org is used by more than 20% of websites, and major search engines exploit semantic descriptions. Pharma and life sciences: mature, comprehensive vocabularies and ontologies; billions of disease, drug and clinical trial descriptions. Digital libraries: many established vocabularies (Dublin Core, FRBR, EDM); millions of metadata records aggregated from thousands of memory institutions in Europeana and the German Digital Library.
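Schema.org descriptions are often embedded in web pages as JSON-LD inside a `<script type="application/ld+json">` element, which is one way search engines pick them up. A minimal sketch that builds such a snippet for the OSLCFest example; the choice of `Event`, `Place` and `Organization` and the field values are illustrative assumptions, not taken from the talk.

```python
import json


def event_jsonld(name: str, start_date: str, venue: str, city: str, organizer: str) -> dict:
    """A schema.org Event description as a JSON-LD object."""
    return {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": name,
        "startDate": start_date,  # ISO 8601 date
        "location": {"@type": "Place", "name": venue, "address": city},
        "organizer": {"@type": "Organization", "name": organizer},
    }


def as_script_tag(doc: dict) -> str:
    """Wrap the JSON-LD so it can be embedded in an HTML page."""
    body = json.dumps(doc, ensure_ascii=False, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


snippet = as_script_tag(event_jsonld("OSLCFest", "2018-11-05", "KTH Royal Institute of Technology", "Stockholm", "KTH"))
print(snippet)
```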
  24. Example: Enterprise Data Integration with a Semantic Data Lake
  25. © eccenca GmbH 2016. The future of data management is semantic! The problem today: applications (App. 1, App. 2, App. 3) with data access limited to the connected source, and exploding ETL costs. The solution tomorrow: full access to all data, a lean architecture and great synergies in data lifting.
  26. Corporate Memory high-level architecture: inbound data sources feed an inbound raw data store (Big Data DWH infrastructure); a knowledge graph holds metadata, KPI definitions and data models; frontends give access to relationship and KPI definitions/documentation and to (ad hoc) reports; outbound data delivery serves target systems. Consumers include management accounting, risk management, regulatory reporting, treasury, marketing and accounting.
  27. Integration via knowledge graph and semantic data models: OEM and supplier exchange XML, EDI, CSV, iDoc, RDF and JSON data through a knowledge graph (RDF), reducing supplier onboarding cost and time thanks to the rich and flexible pivot format.
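The pivot-format idea can be sketched as: map each partner format to the same triples, then take the union, so identical facts arriving in different formats collapse into one. The formats, field names and namespace below are hypothetical stand-ins for real EDI or iDoc messages, which would need proper parsers.

```python
import csv
import io
import json

EX = "http://example.org/"  # hypothetical namespace for parts and properties

# Two hypothetical partner formats describing the same kind of fact.
SUPPLIER_CSV = "part_no,weight_kg\nA-1,1.5\n"
OEM_JSON = '[{"partNumber": "A-1", "weight": {"kg": 1.5}}, {"partNumber": "B-2", "weight": {"kg": 0.25}}]'


def from_csv(text):
    """Map supplier CSV rows onto the shared vocabulary."""
    for row in csv.DictReader(io.StringIO(text)):
        yield (EX + "part/" + row["part_no"], EX + "weightKg", float(row["weight_kg"]))


def from_json(text):
    """Map OEM JSON records onto the same vocabulary."""
    for rec in json.loads(text):
        yield (EX + "part/" + rec["partNumber"], EX + "weightKg", float(rec["weight"]["kg"]))


def merge(*sources) -> set:
    """Union of triples: identical facts from different formats collapse into one."""
    graph = set()
    for source in sources:
        graph.update(source)
    return graph


pivot = merge(from_csv(SUPPLIER_CSV), from_json(OEM_JSON))
print(sorted(pivot))
```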
  28. Turning strings into things for graph synchronization (figure: lift, tabulate, subscribe and sync steps connecting the OEM's and the supplier's ERP systems and CMEM instances via CMEM-SYNC)
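One simple way to turn a string into a "thing" is to mint a stable URI from its label, so that independent systems lifting the same label arrive at the same identifier. This is only a sketch of the general idea, not how CMEM does it; the base namespace is hypothetical, and real deployments also need a disambiguation strategy when distinct entities share a label.

```python
import re
import unicodedata

BASE = "http://example.org/id/"  # hypothetical namespace for minted entity URIs


def mint_uri(kind: str, label: str) -> str:
    """Turn a string label into a stable, readable URI for the given entity kind."""
    # Decompose accented characters (ö -> o + combining mark), then drop the non-ASCII marks.
    ascii_label = unicodedata.normalize("NFKD", label).encode("ascii", "ignore").decode("ascii")
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_label.lower()).strip("-")
    if not slug:
        # e.g. purely non-Latin labels such as 物流 need another scheme (hash, registry ID).
        raise ValueError(f"cannot derive a slug from {label!r}")
    return f"{BASE}{kind}/{slug}"


print(mint_uri("company", "DHL International GmbH"))
```

Because the URI depends only on the normalized label, whitespace and accent variants map to the same identifier.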
  29. Ingestion/cataloging: cataloging of datasets and vocabularies; rich metadata model; automatic profiling of datasets; data lake (HDFS) integration; extraction of metadata; continuous monitoring for new versions and structural changes. (Workflow: Ingestion, Cataloging, Mapping, Discovery, Linking, Selection, Analytics/Experiments)
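Automatic profiling typically starts with simple per-column statistics. The sketch below computes three plausible ones (share of empty cells, number of distinct values, majority datatype) for a CSV file; the choice of statistics and the integer/decimal/string type guesser are assumptions, not the platform's actual profiler.

```python
import csv
import io
from collections import Counter


def value_kind(value: str) -> str:
    """Guess a datatype for one cell: integer, decimal or string."""
    for kind, cast in (("integer", int), ("decimal", float)):
        try:
            cast(value)
            return kind
        except ValueError:
            pass
    return "string"


def profile(csv_text: str) -> dict:
    """Per-column statistics: share of empty cells, distinct values, majority datatype."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    result = {}
    for col in reader.fieldnames or []:
        values = [row[col] for row in rows]
        present = [v for v in values if v not in ("", None)]
        kinds = Counter(value_kind(v) for v in present)
        result[col] = {
            "null_ratio": 1 - len(present) / len(values) if values else 0.0,
            "distinct": len(set(present)),
            "type": kinds.most_common(1)[0][0] if kinds else "empty",
        }
    return result


stats = profile("Id,Title,Screen\n5624,SmartTV,104\n5627,Tablet,\n")
print(stats)
```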
  30. Managing datasets
  31. Profiling data
  32. Mapping: sophisticated mapping management; mapping towards semantic vocabularies (lifting); self-documentation of data (data dictionary); normalization of data; mapping suggestions; mapping reuse based on data profiling; advanced mapping suggestions using machine learning and data fingerprinting.
  33. Discovery: calculation of dataset relatedness/similarity; visual exploration of the data neighborhood; similarity measures based on profiling and mapping; similarity measures based on data fingerprinting.
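A similarity measure based on mapping can be as simple as comparing the sets of vocabulary terms two datasets are mapped to. The sketch uses the Jaccard coefficient (shared terms divided by all terms) as a stand-in; the slide does not say which measure is used, and the catalog entries are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|; defined as 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def rank_related(target: str, catalog: dict) -> list:
    """Other datasets ordered by descending similarity of their feature sets to `target`."""
    scores = [(name, jaccard(catalog[target], feats)) for name, feats in catalog.items() if name != target]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical catalog: each dataset is described by the vocabulary terms its columns are mapped to.
CATALOG = {
    "erp_parts": {"ex:partNumber", "ex:weightKg", "ex:supplier"},
    "supplier_feed": {"ex:partNumber", "ex:weightKg", "ex:price"},
    "hr_staff": {"foaf:name", "ex:department"},
}
related = rank_related("erp_parts", CATALOG)
print(related)
```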
  34. Linking: linking based on expressive rule trees; interactive machine learning of linkage rules; continuous integration of a gold standard for quality assurance; data fusion support.
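A linkage rule tree combines per-property comparisons (leaves) with aggregation operators (inner nodes) and compares the root score against a threshold. The sketch below shows that structure with `difflib` string similarity, an exact-match comparator, `min` as an "all must agree" aggregator and a 0.8 threshold; the rule, the threshold and the records are hypothetical, and learning such rules from a gold standard is out of scope here.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Union

Record = Dict[str, str]


def string_similarity(x: str, y: str) -> float:
    """Case-insensitive similarity in [0, 1] from difflib's matching-blocks ratio."""
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()


def exact(x: str, y: str) -> float:
    return 1.0 if x and x == y else 0.0


@dataclass
class Compare:
    """Leaf node: compare one property of both records with a similarity measure."""
    prop: str
    measure: Callable[[str, str], float]

    def score(self, a: Record, b: Record) -> float:
        return self.measure(a.get(self.prop, ""), b.get(self.prop, ""))


@dataclass
class Aggregate:
    """Inner node: combine child scores, e.g. min (all must agree) or max (any may agree)."""
    combine: Callable[[List[float]], float]
    children: List[Union["Compare", "Aggregate"]]

    def score(self, a: Record, b: Record) -> float:
        return self.combine([child.score(a, b) for child in self.children])


# Hypothetical rule: same city AND sufficiently similar names.
RULE = Aggregate(min, [Compare("city", exact), Compare("name", string_similarity)])


def is_link(a: Record, b: Record, threshold: float = 0.8) -> bool:
    return RULE.score(a, b) >= threshold


A = {"name": "DHL International GmbH", "city": "Bonn"}
B = {"name": "DHL International", "city": "Bonn"}
C = {"name": "DHL International", "city": "Köln"}
print(is_link(A, B), is_link(A, C))
```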
  35. Create declarative matching rules: create context-aware, deterministic rules to match pairs of records, supported by machine learning.
  36. Industrial/International Data Space: establishing data value chains
  37. Digitisation of industry: digitisation enables data-driven business models, for example precision farming. Value creation in the “digital farming ecosystem” involves machine producers, seed providers, farmers, wholesale, technology providers and weather services. (Image sources: wiwo, traction-magazin.de. Source: Beecham Research Ltd. (2014), “Precision Farming”)
  38. Goal and architecture of the Industrial Data Space: the Industrial Data Space aims at blueprinting a “Network of Trusted Data”. Secure data exchange; trustworthiness (certified members); decentralisation (federated architecture); sovereignty over data and services; governance (common rules of the game); scalability (network effects); openness (neutral and user-driven); ecosystem (platform and services).
  39. Goal and architecture of the Industrial Data Space: component reference architecture
  40. Location in the context of “Industry 4.0”: focus on data (www.industrialdataspace.org). Related initiatives: Retail 4.0, Bank 4.0, Insurance 4.0, …; Industrie 4.0 focuses on the manufacturing industry (smart services, transfer and networks, real-time systems), while the Industrial Data Space focuses on data.
  41. Goal and architecture of the Industrial Data Space: the Industrial Data Space connects the Internet of Things and smart services.
  42. A cultural heritage data space: integrating millions of metadata records from more than 2,000 memory institutions for the German Digital Library
  43. A data space with 2,000 memory institutions in Germany alone; common semantic data model: EDM; common data governance: CC0; common access scheme: OAI-PMH
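With OAI-PMH as the common access scheme, an aggregator harvests each institution with `ListRecords` requests and follows `resumptionToken`s until an empty token signals the end of the list. The sketch below shows that loop; it requests `oai_dc` because every OAI-PMH repository must support it (an EDM metadata prefix would be repository-specific), and the base URL and response pages are hypothetical offline stand-ins injected through `fetch`.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, metadata_prefix: str, token=None) -> str:
    """ListRecords request; resumptionToken is an exclusive argument, so it replaces metadataPrefix."""
    params = {"verb": "ListRecords"}
    if token:
        params["resumptionToken"] = token
    else:
        params["metadataPrefix"] = metadata_prefix
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_page(xml_text: str):
    """Return record identifiers and the next resumptionToken (None when the list is complete)."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.iterfind(".//oai:record/oai:header/oai:identifier", OAI)]
    token = root.find(".//oai:resumptionToken", OAI)
    return ids, (token.text if token is not None and token.text else None)


def harvest(base_url: str, fetch, metadata_prefix: str = "oai_dc") -> list:
    """Follow resumptionTokens until exhausted; `fetch(url) -> str` is injected (e.g. an HTTP GET)."""
    ids, token = [], None
    while True:
        page_ids, token = parse_page(fetch(list_records_url(base_url, metadata_prefix, token)))
        ids.extend(page_ids)
        if token is None:
            return ids


# Offline stand-in for a repository at a hypothetical base URL.
BASE = "https://example.org/oai"
PAGE = ('<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>{}'
        "<resumptionToken>{}</resumptionToken></ListRecords></OAI-PMH>")
REC = "<record><header><identifier>{}</identifier></header></record>"
PAGES = {
    BASE + "?verb=ListRecords&metadataPrefix=oai_dc": PAGE.format(REC.format("oai:ex:1") + REC.format("oai:ex:2"), "t2"),
    BASE + "?verb=ListRecords&resumptionToken=t2": PAGE.format(REC.format("oai:ex:3"), ""),
}
harvested = harvest(BASE, PAGES.__getitem__)
print(harvested)
```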
  44.
  45.
  46. Conclusion
  47. Cognitive data challenges where we can make a difference: hybrid AI, combining smart data (knowledge graphs) and smart analytics; distributed semantic technologies, i.e. knowledge representation using vocabularies and ontologies; question answering, including an open question answering architecture (a flexible, knowledge-based integration architecture for QA components and pipelines) and dialogue systems (combining language models and goal-driven question answering); integration with crowdsourcing; knowledge graphs and semantic data lakes; a systematic enterprise Linked Data framework (GDPR is a driver); robotics, using semantics for actuation; agile interoperability, leveraging community-driven vocabulary development.
  48. Prof. Dr. Sören Auer, TIB & Leibniz University of Hannover, auer@tib.eu. https://de.linkedin.com/in/soerenauer, https://twitter.com/soerenauer, https://www.xing.com/profile/Soeren_Auer, http://www.researchgate.net/profile/Soeren_Auer
