Updates the README

Tpt
Tpt committed Jan 23, 2019
1 parent cb1b283 commit 46c7dc69d045a777c18d3cf2c05123b39e9d2bc6
Showing with 3 additions and 51 deletions.
  1. +3 −51 README.md
@@ -3,67 +3,19 @@ SPARQL endpoint for Wikidata history

This repository provides a SPARQL endpoint for Wikidata history, allowing queries such as "count the number of humans in Wikidata in 2015" or "how many contributors have added values for the sex or gender property".

Warning: This is a work in progress and is not ready yet.
A [documentation page is available on Wikidata.org](https://www.wikidata.org/wiki/Wikidata:History_Query_Service).

[![Build Status](https://travis-ci.org/Tpt/wikidata-sparql-history.svg?branch=master)](https://travis-ci.org/Tpt/wikidata-sparql-history)


## User documentation

A public endpoint should be available soon. Here are some example queries:

Number of humans in Wikidata on February 2nd, 2015 at midnight:
```sparql
SELECT (COUNT(?item) AS ?count) WHERE {
?rev schema:dateCreated "2015-02-02T00:00:00Z"^^xsd:dateTime ;
hist:globalState ?state .
GRAPH ?state {
?item wdt:P31 wd:Q5
}
}
```

Number of contributors having changed sex or gender values:
```sparql
SELECT (COUNT(DISTINCT ?user) AS ?count) WHERE {
# binds ?addOrDel only to the graphs in which a wdt:P21 value is added or removed
GRAPH ?addOrDel {
?item wdt:P21 ?value .
}
?rev hist:additions|hist:deletions ?addOrDel ;
schema:author ?user .
}
```

Statistics on the number of main snak additions by property for user Tpt:
```sparql
SELECT ?prop (COUNT(?revision) AS ?c) WHERE {
?revision schema:author "Tpt" ;
hist:additions ?additionsGraph .
GRAPH ?additionsGraph {
?topic ?prop ?o .
}
} GROUP BY ?prop ORDER BY DESC(COUNT(?revision))
```

These queries assume the following prefixes:
```sparql
PREFIX schema: <http://schema.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX hist: <http://wikiba.se/history/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
```
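As a minimal sketch of how these prefixes combine with the example queries, the Python snippet below prepends them to a query body and builds a SPARQL Protocol GET URL. The endpoint URL (`http://localhost:8080/sparql`) and the helper names `with_prefixes` and `request_url` are assumptions for illustration only. The README does not state where the endpoint is served.

```python
from urllib.parse import urlencode

# Prefix declarations copied from the list above.
PREFIXES = """\
PREFIX schema: <http://schema.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX hist: <http://wikiba.se/history/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
"""

# Hypothetical endpoint URL: an assumption, not given in the README.
ENDPOINT = "http://localhost:8080/sparql"


def with_prefixes(query_body: str) -> str:
    """Return a complete SPARQL query: prefix declarations, then the body."""
    return PREFIXES + query_body.strip() + "\n"


def request_url(query_body: str, endpoint: str = ENDPOINT) -> str:
    """Build a GET URL that carries the full query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": with_prefixes(query_body)})


# The "number of humans" example query from the user documentation.
HUMANS_IN_2015 = """
SELECT (COUNT(?item) AS ?count) WHERE {
  ?rev schema:dateCreated "2015-02-02T00:00:00Z"^^xsd:dateTime ;
       hist:globalState ?state .
  GRAPH ?state {
    ?item wdt:P31 wd:Q5
  }
}
"""

if __name__ == "__main__":
    print(with_prefixes(HUMANS_IN_2015))
    print(request_url(HUMANS_IN_2015))
```

The snippet only builds strings and sends nothing over the network, so it can be adapted once the actual endpoint address is known.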


## Developer documentation

To set up a working endpoint:

* Compile the Java program: `mvn package`
* Download the Wikidata history dumps to a directory: `mkdir dumps && cd dumps && bash ../download_wd_history.sh`. Warning: this requires around 600GB of disk space.
* Preprocess the dump to get all revision metadata and triples annotated with there insertions and deletions (takes a few days and all your CPU cores): `java -server -jar target/sparql-endpoint-0.1-SNAPSHOT.jar -preprocess`
* Build database indexes: `java -server -jar target/sparql-endpoint-0.1-SNAPSHOT.jar -load`. You may use the `--wdt-only` argument to only load wdt: triples
* Preprocess the dump to get all revision metadata and triples annotated with their insertions and deletions (takes a few days and all your CPU cores): `java -server -jar target/sparql-endpoint-0.1-SNAPSHOT.jar -preprocess`
* Build database indexes: `java -server -jar target/sparql-endpoint-0.1-SNAPSHOT.jar -load`.
* Start the web server: `java -server -classpath target/sparql-endpoint-0.1-SNAPSHOT.jar org.wikidata.history.web.Main`
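
The steps above could be chained in a small driver script. The following Python sketch is one possible way to do so, not part of the repository. The names `Step`, `setup_steps` and `run_steps` are hypothetical, and it assumes every command is launched from the repository root. It defaults to a dry run that only prints each command, since the real steps take days and hundreds of gigabytes.

```python
import subprocess
from typing import List, NamedTuple

# Jar path as written in the commands above.
JAR = "target/sparql-endpoint-0.1-SNAPSHOT.jar"


class Step(NamedTuple):
    description: str
    command: str  # shell command line, copied from the list above


def setup_steps(jar: str = JAR) -> List[Step]:
    """Return the setup commands in the order the README lists them."""
    return [
        Step("Compile the Java program", "mvn package"),
        Step("Download the Wikidata history dumps",
             "mkdir dumps && cd dumps && bash ../download_wd_history.sh"),
        Step("Preprocess the dumps", f"java -server -jar {jar} -preprocess"),
        Step("Build database indexes", f"java -server -jar {jar} -load"),
        Step("Start the web server",
             f"java -server -classpath {jar} org.wikidata.history.web.Main"),
    ]


def run_steps(steps: List[Step], dry_run: bool = True) -> None:
    """Print each step; execute it too when dry_run is False."""
    for number, step in enumerate(steps, start=1):
        print(f"[{number}/{len(steps)}] {step.description}: {step.command}")
        if not dry_run:
            # shell=True because the download step chains commands with &&;
            # check=True stops at the first failing step.
            subprocess.run(step.command, shell=True, check=True)


if __name__ == "__main__":
    run_steps(setup_steps())  # dry run: prints the commands only
```

Calling `run_steps(setup_steps(), dry_run=False)` would execute the commands in order and stop at the first failure. Note that the last step starts a server and does not return.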

## License
