On Scholia landing page, provide some overview stats about Wikidata and scholarly publications in it #336

Closed
Daniel-Mietchen opened this issue Apr 9, 2018 · 14 comments

Comments

@Daniel-Mietchen
Collaborator

@Daniel-Mietchen Daniel-Mietchen commented Apr 9, 2018

e.g. number of triples in Wikidata

SELECT (count(*) as ?counts) WHERE {
  ?s ?p ?o .
}

and some WikiCite-focused ones, e.g. as per this list

or some version of http://wikicite.org/statistics.html .

fnielsen added a commit that referenced this issue Apr 9, 2018
Now statistics on number of triples, DOIs and PMIDs.
@fnielsen fnielsen self-assigned this Apr 9, 2018
@fnielsen

Owner

@fnielsen fnielsen commented Apr 9, 2018

@fnielsen fnielsen closed this Apr 9, 2018
@Daniel-Mietchen

Collaborator Author

@Daniel-Mietchen Daniel-Mietchen commented Apr 9, 2018

I think adding a few more would be useful, e.g. the total number of items and of scientific articles, and then a good selection of properties from the above list and/or from https://www.wikidata.org/wiki/Template:Bibliographic_properties .

@Daniel-Mietchen

Collaborator Author

@Daniel-Mietchen Daniel-Mietchen commented Apr 11, 2018

Here is a query that gives a more comprehensive list:

SELECT ?count ?description
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] ?p [] . }
} AS %triples
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P50 []. }
} AS %authors
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P356 []. }
} AS %dois
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P496 []. }
} AS %orcids
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P698 []. }
} AS %pmids
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P932 []. }
} AS %pmcids
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P2093 [] . }
} AS %authorstrings
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P2860 [] . }
} AS %cites
WHERE {
  {
    INCLUDE %triples
    BIND("Total number of triples" AS ?description)
  }
  UNION
  {
    INCLUDE %pmids
    BIND("Items with a PubMed ID" AS ?description)
  }
  UNION
  {
    INCLUDE %pmcids
    BIND("Items with a PubMed Central ID" AS ?description)
  }
  UNION
  {
    INCLUDE %dois
    BIND("Items with a DOI" AS ?description)
  }
  UNION
  {
    INCLUDE %cites
    BIND("Citations" AS ?description)
  }
  UNION
  {
    INCLUDE %authors
    BIND("Links from items about works to items about their authors" AS ?description)
  }
  UNION
  {
    INCLUDE %authorstrings
    BIND("Author name strings on items about works" AS ?description)
  }
  UNION
  {
    INCLUDE %orcids
    BIND("Items about authors with an ORCID profile that has public content" AS ?description)
  }
}
ORDER BY DESC(?count)
 

Still missing:

  • total number of items
  • total number of works
  • probably some more, e.g. arXiv ID, taxon author, doctoral advisor, published in, affiliation/employer, field of work, educated at, ISSN
@fnielsen

Owner

@fnielsen fnielsen commented Apr 12, 2018

Added with b8f8f6a and now running at https://tools.wmflabs.org/scholia/

@Daniel-Mietchen

Collaborator Author

@Daniel-Mietchen Daniel-Mietchen commented Apr 28, 2018

Here are some further ideas on what to include in these stats:

@Daniel-Mietchen Daniel-Mietchen added this to To do in Taxa May 5, 2018
@lucaswerkmeister

Contributor

@lucaswerkmeister lucaswerkmeister commented May 18, 2018

Number of properties:

SELECT (COUNT(*) AS ?propertyCount) WHERE {
  ?property a wikibase:Property.
}

For the number of triples, you can also use ?s ?p ?o (subject predicate object) instead of [] ?p [] – equivalent but slightly more readable :)

Daniel-Mietchen added a commit that referenced this issue May 18, 2018
@Daniel-Mietchen

Collaborator Author

@Daniel-Mietchen Daniel-Mietchen commented May 18, 2018

Thanks, @lucaswerkmeister — I've just included it in the above batch of additional stats.

@Daniel-Mietchen

Collaborator Author

@Daniel-Mietchen Daniel-Mietchen commented May 19, 2018

The above patch caused display problems, so we reverted it. Here is the query again:

SELECT ?count ?description
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] ?p [] . }
} AS %triples
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { ?property a wikibase:Property . }
} AS %properties
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P50 []. }
} AS %authors
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P69 [] . }
} AS %almamater
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P108 [] . }
} AS %employer
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P166 [] . }
} AS %award_received
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P212 [] . }
} AS %isbn13
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P225 []. }
} AS %taxa
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P234 []. }
} AS %inchi
WITH {
  SELECT (COUNT(DISTINCT ?serials) AS ?count) WHERE { ?serials wdt:P236 [] . }
} AS %issn
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P356 []. }
} AS %dois
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P496 []. }
} AS %orcids
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P625 []. }
} AS %geoloc
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P638 [] . }
} AS %pdb
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P686 [] . }
} AS %gene
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P698 []. }
} AS %pmids
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P699 [] . }
} AS %disease
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P859 [] . }
} AS %sponsor
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P818 [] . }
} AS %arxivID
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P921 []. }
} AS %topics
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P932 []. }
} AS %pmcids
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P1416 [] . }
} AS %affiliation
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P2093 []. }
} AS %authorstrings
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P2427 [] . }
} AS %GRID
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P2860 [] . }
} AS %cites
WHERE {
  {
    INCLUDE %triples
    BIND("Total number of triples" AS ?description)
  }
  UNION
  {
    INCLUDE %properties
    BIND("Total number of properties" AS ?description)
  }
  UNION
  {
    INCLUDE %pmids
    BIND("Items with a PubMed ID" AS ?description)
  }
  UNION
  {
    INCLUDE %pmcids
    BIND("Items with a PubMed Central ID" AS ?description)
  }
  UNION
  {
    INCLUDE %dois
    BIND("Items with a Digital Object Identifier (DOI)" AS ?description)
  }
  UNION
  {
    INCLUDE %cites
    BIND("Citations" AS ?description)
  }
  UNION
  {
    INCLUDE %authors
    BIND("Links from items about works to items about their authors" AS ?description)
  }
  UNION
  {
    INCLUDE %authorstrings
    BIND("Author name strings on items about works" AS ?description)
  }
  UNION
  {
    INCLUDE %orcids
    BIND("Items about authors with an ORCID profile that has public content" AS ?description)
  }
  UNION
  {
    INCLUDE %taxa
    BIND("Items with a taxon name" AS ?description)
  }
  UNION
  {
    INCLUDE %geoloc
    BIND("Items with a geolocation" AS ?description)
  }
  UNION
  {
    INCLUDE %topics
    BIND("Links from items about works to items about their main subjects" AS ?description)
  }
  UNION
  {
    INCLUDE %inchi
    BIND("Items with an International Chemical Identifier (InChI)" AS ?description)
  }
  UNION
  {
    INCLUDE %isbn13
    BIND("Items with a 13-digit International Standard Book Number (ISBN 13)" AS ?description)
  }
  UNION
  {
    INCLUDE %award_received
    BIND("Links from items about people or others to an award they have received" AS ?description)
  }
  UNION
  {
    INCLUDE %affiliation
    BIND("Links from items about people to items about groups they are affiliated with" AS ?description)
  }
  UNION
  {
    INCLUDE %employer
    BIND("Links from items about people to items about their employer" AS ?description)
  }
  UNION
  {
    INCLUDE %almamater
    BIND("Links from items about people to items about the educational establishments they attended" AS ?description)
  }
  UNION
  {
    INCLUDE %issn
    BIND("Items with an International Standard Serial Number (ISSN)" AS ?description)
  }
  UNION
  {
    INCLUDE %arxivID
    BIND("Items with an arXiv ID" AS ?description)
  }
  UNION
  {
    INCLUDE %GRID
    BIND("Items about institutions with an identifier from the Global Research Identifier Database (GRID ID)" AS ?description)
  }
  UNION
  {
    INCLUDE %sponsor
    BIND("Links from items about anything to items about corresponding sponsors" AS ?description)
  }
  UNION
  {
    INCLUDE %disease
    BIND("Items indexed in the Disease Ontology" AS ?description)
  }
  UNION
  {
    INCLUDE %gene
    BIND("Items indexed in the Gene Ontology" AS ?description)
  }
  UNION
  {
    INCLUDE %pdb
    BIND("Protein structures indexed in the Protein Data Bank" AS ?description)
  }
}
ORDER BY DESC(?count)

Pinging @lucaswerkmeister
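
A note on maintaining the long query above: each statistic repeats the same two blocks (a named subquery and an INCLUDE/BIND branch), so the query text could be generated from a table instead of being edited by hand. Below is a minimal Python sketch of that idea. The helper `build_stats_query` and the `STATS` table are hypothetical and list only three of the rows above. The `WITH ... AS %name` / `INCLUDE` syntax is the Blazegraph named-subquery extension used in this thread, not standard SPARQL 1.1.

```python
# Hypothetical table of statistics: (subquery name, triple pattern, description).
# Only a small subset of the rows from the query in this thread.
STATS = [
    ("triples", "[] ?p []", "Total number of triples"),
    ("dois", "[] wdt:P356 []", "Items with a Digital Object Identifier (DOI)"),
    ("cites", "[] wdt:P2860 []", "Citations"),
]


def build_stats_query(stats):
    """Return a SPARQL string with one named subquery and one UNION branch per row."""
    subqueries = []
    branches = []
    for name, pattern, description in stats:
        # One counting subquery per statistic, named %name.
        subqueries.append(
            "WITH {\n"
            f"  SELECT (COUNT(*) AS ?count) WHERE {{ {pattern} . }}\n"
            f"}} AS %{name}"
        )
        # One branch that pulls in the count and attaches its label.
        branches.append(
            "  {\n"
            f"    INCLUDE %{name}\n"
            f'    BIND("{description}" AS ?description)\n'
            "  }"
        )
    return (
        "SELECT ?count ?description\n"
        + "\n".join(subqueries)
        + "\nWHERE {\n"
        + "\n  UNION\n".join(branches)
        + "\n}\nORDER BY DESC(?count)\n"
    )


if __name__ == "__main__":
    print(build_stats_query(STATS))
```

Adding a statistic then means adding one row to the table, which keeps the subquery names and the INCLUDE names from drifting apart.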

@lucaswerkmeister

Contributor

@lucaswerkmeister lucaswerkmeister commented May 20, 2018

What kinds of display problems did it cause?

@fnielsen

Owner

@fnielsen fnielsen commented May 21, 2018

There was no response from WDQS, probably because the query was too long. Perhaps the getJSON call can be changed to a POST.
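
To illustrate the POST idea: the SPARQL 1.1 Protocol allows the query to be sent as a URL-encoded form field in the request body, so the query length no longer affects the URL. The sketch below is in Python rather than the jQuery code Scholia's page actually uses. The endpoint URL, the function name `build_post_request` and the User-Agent string are assumptions for illustration and do not come from this thread.

```python
import urllib.parse
import urllib.request

# Assumed endpoint URL; it is not stated in this thread.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_post_request(query, endpoint=WDQS_ENDPOINT):
    """Build (but do not send) a POST request that carries the query in the body."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
            # Placeholder User-Agent, for illustration only.
            "User-Agent": "stats-sketch/0.1 (example only)",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_post_request(
        "SELECT (COUNT(*) AS ?count) WHERE { ?property a wikibase:Property . }"
    )
    # Sending is left to the caller, e.g. urllib.request.urlopen(request).
    print(request.get_method(), request.full_url, len(request.data), "bytes in body")
```

The URL stays the same length however many subqueries are added, since only the body grows.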

@lucaswerkmeister

Contributor

@lucaswerkmeister lucaswerkmeister commented May 24, 2018

WDQS already retries using POST if the GET request fails due to being too long. If I run the query in @Daniel-Mietchen’s comment on WDQS, it works both on index.html and embed.html.

@fnielsen

Owner

@fnielsen fnielsen commented Jul 3, 2018

"Items about authors with an ORCID profile that has public content": why "that has public content"?

@fnielsen

Owner

@fnielsen fnielsen commented Jul 3, 2018

"Items with a 13-digit International Standard Book Number (ISBN 13)": this should be rephrased, as there may be items with multiple ISBNs (there are, especially Springer volumes).
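
The distinction behind this remark, counting statements versus counting items, can be shown on toy data. In the sketch below the item names and ISBN strings are invented placeholders, not real Wikidata content. `count_statements` mirrors what `COUNT(*)` over `[] wdt:P212 []` does, and `count_items` mirrors `COUNT(DISTINCT ?item)`, the form the ISSN subquery in this thread already uses.

```python
# Toy data, invented for illustration: (item, property, value) statements.
STATEMENTS = [
    ("Q_book_a", "P212", "isbn-placeholder-1"),
    ("Q_book_a", "P212", "isbn-placeholder-2"),  # same item, second ISBN
    ("Q_book_b", "P212", "isbn-placeholder-3"),
]


def count_statements(statements, prop):
    """Analogue of COUNT(*): one count per matching statement."""
    return sum(1 for _, p, _ in statements if p == prop)


def count_items(statements, prop):
    """Analogue of COUNT(DISTINCT ?item): each item counted once."""
    return len({item for item, p, _ in statements if p == prop})


if __name__ == "__main__":
    print(count_statements(STATEMENTS, "P212"))  # 3 statements
    print(count_items(STATEMENTS, "P212"))       # 2 distinct items
```

With `COUNT(*)` the label would more accurately read as a number of ISBN statements; to keep the "Items with ..." wording, the subquery would need the distinct form.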

@Daniel-Mietchen

Collaborator Author

@Daniel-Mietchen Daniel-Mietchen commented Nov 29, 2019

I have reworked the query, as per Daniel-Mietchen/ideas#1022 (comment) .

Daniel-Mietchen added a commit that referenced this issue Nov 29, 2019
Daniel-Mietchen added a commit that referenced this issue Nov 30, 2019
more stats on homepage, as per #336
Taxa automation moved this from To do to Done Nov 30, 2019
@Daniel-Mietchen Daniel-Mietchen added this to To do in Meta via automation Nov 30, 2019
@Daniel-Mietchen Daniel-Mietchen moved this from To do to Done in Meta Nov 30, 2019