Showing 2 changed files with 30 additions and 0 deletions.
- +24 −0 README.md
- +6 −0 download_wd_history.sh
@@ -0,0 +1,24 @@
SPARQL endpoint for Wikidata history
====================================

This repository provides a SPARQL endpoint for Wikidata history, allowing queries such as "count the number of humans in Wikidata in 2015" or "how many contributors have added values for the sex or gender property".

Warning: This is a work in progress and is not ready yet.


## User documentation

A public endpoint should be available soon, together with some example queries.
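In the meantime, here is a minimal sketch of how such an endpoint could be queried over HTTP with `curl` once it is running. The endpoint URL is hypothetical, and the history-specific syntax (e.g. restricting the count to 2015) is not shown because it is not documented yet; the query itself is a plain Wikidata-style count of items that are instances of human (`wd:Q5`).

```bash
# Sketch only: the endpoint URL below is hypothetical, and the history-specific
# vocabulary (e.g. "as of 2015") is not documented here.
ENDPOINT="http://localhost:8080/sparql"   # assumption: adjust to the real endpoint

# Count the items that are instances of human (wd:Q5), sent via the standard
# SPARQL protocol "query" parameter.
curl -G "$ENDPOINT" \
  -H 'Accept: application/sparql-results+json' \
  --data-urlencode 'query=
    PREFIX wd: <http://www.wikidata.org/entity/>
    PREFIX wdt: <http://www.wikidata.org/prop/direct/>
    SELECT (COUNT(?item) AS ?humans) WHERE { ?item wdt:P31 wd:Q5 }'
```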


## Developer documentation

To set up a working endpoint, do the following (a consolidated sketch follows the list):

* Compile the Java program: `mvn package`
* Download the Wikidata history dumps to a directory: `mkdir dumps && cd dumps && bash ../download_wd_history.sh`. Warning: it requires around 600GB of disk space.
* Preprocess the dumps to get all revision metadata and triples annotated with their insertions and deletions (takes a few days and all your CPU cores): `java -server -jar target/sparql-endpoint-0.1-SNAPSHOT.jar -preprocess`
* Build the database indexes: `java -server -jar target/sparql-endpoint-0.1-SNAPSHOT.jar -load`. You may use the `--wdt-only` argument to only load wdt: triples.
* Start the web server: `java -server -classpath target/sparql-endpoint-0.1-SNAPSHOT.jar org.wikidata.history.web.Main`
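Taken together, and assuming all commands are run from the repository root, the steps above amount to roughly the following sketch (the commands simply mirror the list; paths and options are taken from it, not verified independently):

```bash
# Consolidated sketch of the setup steps listed above, run from the repository root.
mvn package                                     # compile the Java program

mkdir dumps && cd dumps                         # needs around 600GB of disk space
bash ../download_wd_history.sh                  # download the Wikidata history dumps
cd ..

# Preprocess the dumps (takes a few days and all CPU cores), then build the indexes.
java -server -jar target/sparql-endpoint-0.1-SNAPSHOT.jar -preprocess
java -server -jar target/sparql-endpoint-0.1-SNAPSHOT.jar -load      # add --wdt-only to load only wdt: triples

# Start the web server.
java -server -classpath target/sparql-endpoint-0.1-SNAPSHOT.jar org.wikidata.history.web.Main
```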
@@ -0,0 +1,6 @@
#!/usr/bin/env bash

# List the latest Wikidata dump directory, keep the paths of the
# pages-meta-history dump files, and download each of them.
# wget -c resumes a partial download if the script is re-run.
curl https://dumps.wikimedia.org/wikidatawiki/latest/ | grep -Po "wikidatawiki/[0-9]+/wikidatawiki-[0-9]+-pages-meta-history[0-9]+\.xml-[p0-9]+\.bz2" | while read -r url ; do
    echo "$url"
    wget -c "https://dumps.wikimedia.org/$url"
done