Create "missing" subpage for organization aspect #598
Daniel-Mietchen added the aspects and missing-data labels Jan 2, 2019
Daniel-Mietchen added this to To do in Organizations via automation Jan 2, 2019
Daniel-Mietchen added this to To do in Missing via automation Jan 2, 2019
Daniel-Mietchen referenced this issue Jan 2, 2019: Resolve David Safronetz' Zika papers on Wikidata #1105 (Closed)
Here is a nugget for a good candidate query, which checks for author name strings that match the names of organization staff:

SELECT (COUNT(?paper) AS ?countPapers) ?person ?nameString {
  ?person wdt:P108 wd:Q317070 ;
          rdfs:label ?name .
  BIND(STR(?name) AS ?nameString)
  FILTER(LANG(?name)="en")
  ?paper wdt:P2093 ?nameString .
}
GROUP BY ?person ?nameString
ORDER BY DESC(?countPapers)
A demo page based on that query is up at https://www.wikidata.org/wiki/Wikidata:University_of_Virginia/Listeria/UVa_people/Author_name_strings_popular_on_publications_co-authored_by_UVa_people .
Another panel useful on an organization's missing page would be for publications where an author (P50) and an author name string (P2093) both have the same series ordinal (P1545). Here is a demo page for that: https://www.wikidata.org/wiki/Wikidata:University_of_Virginia/Listeria/UVa_people/Semi-disambiguated_UVa_authors .
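A minimal sketch of how such a panel query could be generated per organization. The function name `same_ordinal_query` and the exact query shape are assumptions for illustration, not the query behind the demo page; the example QID is the one used in the staff-name query earlier in this thread.

```python
import re


def same_ordinal_query(org_qid: str, limit: int = 100) -> str:
    """Build a SPARQL query (hypothetical shape) listing works where a P50
    author statement and a P2093 author name string statement share the
    same P1545 series ordinal, for authors employed (P108) by org_qid."""
    # Only accept plain Wikidata item IDs to avoid injecting arbitrary text.
    if not re.fullmatch(r"Q[1-9][0-9]*", org_qid):
        raise ValueError(f"not a Wikidata item ID: {org_qid!r}")
    return f"""SELECT ?work ?author ?nameString ?ordinal WHERE {{
  ?author wdt:P108 wd:{org_qid} .
  ?work p:P50 ?authorStatement ;
        p:P2093 ?nameStatement .
  ?authorStatement ps:P50 ?author ;
                   pq:P1545 ?ordinal .
  ?nameStatement ps:P2093 ?nameString ;
                 pq:P1545 ?ordinal .
}}
LIMIT {int(limit)}
"""


if __name__ == "__main__":
    print(same_ordinal_query("Q317070"))
```

Reusing `?ordinal` in both qualifier patterns is what enforces the "same series ordinal" condition; each hit is a likely duplicate of an already resolved author.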
Another panel could highlight "Stated As" (P1932) strings from publications already associated with the organization and identify other publications with those author name strings; demo at
Daniel-Mietchen added the P50-author, P2093-author-name-string, P1545-series-ordinal and P1932-stated-as labels Jan 7, 2019
fnielsen added a commit that referenced this issue Apr 10, 2019
I also think we do not necessarily need to limit ourselves to just one or two panels on this page: I could imagine versions of all of the above queries being present, plus some equivalents for missing topics, publication dates, affiliations of co-authors and so on.
Another way to address the high-energy physics issue would be to just sort affiliated people by number of unidentified co-authors and then link to the /missing page for that identified affiliate instead of linking to the author disambiguator directly. Also, when linking to the disambiguator, the known co-author should be specified, e.g. as per https://tools.wmflabs.org/author-disambiguator/?name=J%C3%BCrgen+Popp&doit=Look+for+author&limit=500&filter=wdt%3AP50+wd%3AQ1707155 .
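For illustration, a link of that shape can be assembled with Python's standard library. The parameter names (`name`, `doit`, `limit`, `filter`) are taken from the example URL above; the helper name `disambiguator_url` is hypothetical, and nothing here is checked against the tool itself.

```python
from urllib.parse import urlencode

BASE_URL = "https://tools.wmflabs.org/author-disambiguator/"


def disambiguator_url(name: str, coauthor_qid: str, limit: int = 500) -> str:
    """Build an author disambiguator link restricted to works that already
    have the known co-author (a Wikidata QID) as a P50 author."""
    params = {
        "name": name,
        "doit": "Look for author",
        "limit": limit,
        # Same filter shape as in the example link: wdt:P50 wd:<QID>
        "filter": f"wdt:P50 wd:{coauthor_qid}",
    }
    # urlencode percent-encodes UTF-8 and turns spaces into "+".
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(disambiguator_url("Jürgen Popp", "Q1707155"))
```

With these arguments the sketch reproduces the example link character for character.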
Here is a variant that filters out strings starting with initials:

PREFIX organization: <http://www.wikidata.org/entity/Q1269766>
SELECT
  # Number of works with the coauthor
  ?count
  # Build URL to the Author disambiguator tool for a given author name string and a coauthor associated with the institution
  (CONCAT(
    "[https://tools.wmflabs.org/author-disambiguator/?doit=Look+for+author&name=",
    ENCODE_FOR_URI(?coauthor), "&filter=wdt%3AP50+wd%3A", ?qid,
    " Publications with at least one affiliated co-author and with author name string '''''", ?coauthor, "''''']") AS ?disambiguator1)
WHERE {
  SELECT DISTINCT ?coauthor (COUNT(DISTINCT ?work) as ?count) ?item (REPLACE(STR(?item), ".*Q", "Q") AS ?qid) WHERE {
    { ?item wdt:P108 / wdt:P361* organization: . }
    UNION
    { ?item wdt:P463 / wdt:P361* organization: . }
    UNION
    { ?item wdt:P1416 / wdt:P361* organization: . }
    ?work wdt:P50 ?item ; wdt:P2093 ?coauthor .
    FILTER(regex(?coauthor, "^(?=^[A-Z][a-z]{1,}.*)(?=.*[a-z]$).*$")) .
    FILTER(!CONTAINS(LCASE(?coauthor), "."))
  }
  GROUP BY ?coauthor ?count ?item ?qid
  HAVING (?count > 4)
  # LIMIT 2000
}
ORDER BY DESC(?count)
LIMIT 100

Regarding the use of PREFIX here, see also #431 .
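To see which name strings the two FILTER clauses keep, here is a small Python sketch that applies the same regular expression and the same "no period" check locally. The function name `keeps_name_string` and the sample names are made up for illustration, and Python's `re` is assumed to treat this pattern the same way as the query service's regex engine.

```python
import re

# Same pattern as in the query: the string must start with an uppercase
# letter followed by at least one lowercase letter, and end in a lowercase letter.
NAME_PATTERN = re.compile(r"^(?=^[A-Z][a-z]{1,}.*)(?=.*[a-z]$).*$")


def keeps_name_string(name: str) -> bool:
    """Return True if the name string would pass both filters."""
    return bool(NAME_PATTERN.search(name)) and "." not in name


if __name__ == "__main__":
    for sample in ["Jane Doe", "J. Doe", "J Doe", "Jane A. Doe", "Jane DOE", "Jürgen Popp"]:
        print(f"{sample!r}: {keeps_name_string(sample)}")
```

In this sketch only "Jane Doe" passes. Because `[A-Z]` and `[a-z]` are ASCII ranges, a name such as "Jürgen Popp" is rejected along with the initial-only strings, which may or may not be intended.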
Daniel-Mietchen commented Jan 2, 2019
as per #281