Create "missing" subpage for organization aspect #598

Open
Daniel-Mietchen opened this Issue Jan 2, 2019 · 8 comments

Comments

Collaborator

Daniel-Mietchen commented Jan 2, 2019

as per #281

Collaborator Author

Daniel-Mietchen commented Jan 5, 2019

Here is a nugget for a good candidate query, which checks for author name strings that match the names of organization staff:

SELECT (COUNT(?paper) AS ?countPapers) ?person ?nameString {
  ?person wdt:P108 wd:Q317070 ;
          rdfs:label ?name .
  BIND(STR(?name) AS ?nameString)
  FILTER(LANG(?name)="en")
  ?paper wdt:P2093 ?nameString .
}
GROUP BY ?person ?nameString
ORDER BY DESC(?countPapers)

Hat tip to @zuphilip

Daniel-Mietchen commented Jan 7, 2019

Another panel useful on an organization's missing page would be for publications where an author (P50) and an author name string (P2093) both have the same series ordinal (P1545).

Here is a demo page for that: https://www.wikidata.org/wiki/Wikidata:University_of_Virginia/Listeria/UVa_people/Semi-disambiguated_UVa_authors .
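
To make the matching rule concrete, here is a minimal Python sketch of the "same series ordinal" check. It does not query Wikidata. The `Work` class and the `semi_disambiguated` function are hypothetical stand-ins, and series ordinals are assumed to be plain integers, which real P1545 values do not guarantee.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Work:
    """Hypothetical in-memory stand-in for a publication item."""
    qid: str
    # series ordinal (P1545) -> author item QID (P50)
    authors: dict[int, str] = field(default_factory=dict)
    # series ordinal (P1545) -> author name string (P2093)
    name_strings: dict[int, str] = field(default_factory=dict)


def semi_disambiguated(work: Work) -> list[tuple[int, str, str]]:
    """Return (ordinal, author QID, name string) for every ordinal that
    carries both an author item and an author name string."""
    shared = work.authors.keys() & work.name_strings.keys()
    return [(o, work.authors[o], work.name_strings[o]) for o in sorted(shared)]
```

Each returned triple is a candidate for cleanup: the name string at that position has most likely already been resolved to the author item, so the P2093 statement may be redundant.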

Daniel-Mietchen commented Jan 7, 2019

Another panel could highlight "Stated As" (P1932) strings from publications already associated with the organization and identify other publications with those author name strings. Demo: https://www.wikidata.org/wiki/Wikidata:University_of_Virginia/Listeria/UVa_people/Author_name_strings_matched_to_UVa_people_items_using_Stated_As .

Daniel-Mietchen commented Apr 11, 2019

Thanks. This is a good start, but for institutions engaged in high-energy physics (where papers have thousands of co-authors, typically stated only with initials), it is not as useful as it could be. See https://tools.wmflabs.org/scholia/organization/Q213439/missing for an example.

So we need to filter a bit more. One option is to drop names stated with initials, as per the regex filtering in https://www.wikidata.org/wiki/Wikidata:University_of_Virginia/Listeria/UVa_people/Author_name_strings_popular_on_publications_co-authored_by_UVa_people
or
https://www.wikidata.org/wiki/Wikidata:University_of_Virginia/Listeria/UVa_people/Author_name_strings_matched_to_UVa_people_items_using_Stated_As .

Another option would be to filter more strongly on co-authorship, e.g. as per https://www.wikidata.org/wiki/Wikidata:University_of_Virginia/Listeria/UVa_people/Authors_frequently_publishing_together_with_two_or_more_UVa_authors
or
https://www.wikidata.org/wiki/Wikidata:University_of_Virginia/Listeria/UVa_people/Author_name_strings_that_are_on_multiple_papers_with_at_least_three_identical_co-authors,_at_least_one_of_which_is_a_UVa_person .

Yet another interesting option for prioritizing is the length of the author name string, e.g. as per
https://www.wikidata.org/wiki/Wikidata:University_of_Virginia/Listeria/UVa_people/Long_author_name_strings_in_works_co-authored_by_UVa_people .
Longer strings tend to be less ambiguous, so resolving them first can then help resolve some of the more ambiguous strings by way of co-authorship.
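
The length-based ordering can be sketched in a few lines of Python. The function name `prioritize` is hypothetical, and breaking ties alphabetically is an assumption added only to make the output deterministic.

```python
def prioritize(name_strings):
    """Order distinct author name strings longest first.

    Ties are broken alphabetically so the result is stable.
    """
    return sorted(set(name_strings), key=lambda s: (-len(s), s))
```

A panel built this way would list the least ambiguous strings at the top, so they can be resolved first.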

Daniel-Mietchen commented Apr 11, 2019

I also think we do not necessarily need to limit ourselves to just one or two panels on this page - I could imagine versions of all of the above queries being present, plus some equivalents for missing topics, publication dates, affiliations of co-authors and so on.

Daniel-Mietchen commented Apr 11, 2019

Another way to address the high-energy physics issue would be to just sort affiliated people by number of unidentified co-authors and then link to the /missing page for that identified affiliate instead of linking to the author disambiguator directly.

Also, when linking to the disambiguator, the known co-author should be specified, e.g. as per https://tools.wmflabs.org/author-disambiguator/?name=J%C3%BCrgen+Popp&doit=Look+for+author&limit=500&filter=wdt%3AP50+wd%3AQ1707155 .
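
As an illustration, a link of that shape could be assembled as follows. This is only a sketch: the function name `disambiguator_url` is hypothetical, and the parameter names (`name`, `doit`, `limit`, `filter`) are taken from the example URL above rather than from any documentation of the tool.

```python
from urllib.parse import urlencode

BASE = "https://tools.wmflabs.org/author-disambiguator/"


def disambiguator_url(name, coauthor_qid, limit=500):
    """Build an Author disambiguator link for an author name string,
    restricted to works that have the given known co-author (P50)."""
    params = {
        "name": name,
        "doit": "Look for author",
        "limit": limit,
        "filter": f"wdt:P50 wd:{coauthor_qid}",
    }
    # urlencode percent-encodes non-ASCII characters and ':' and turns spaces into '+'
    return BASE + "?" + urlencode(params)
```

Calling `disambiguator_url("Jürgen Popp", "Q1707155")` reproduces the example link given above.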

Daniel-Mietchen commented Apr 15, 2019

Here is a variant that filters out strings starting with initials:

PREFIX organization: <http://www.wikidata.org/entity/Q1269766>

SELECT
  # Number of works shared with the affiliated co-author
  ?count

  # Build a link to the Author disambiguator tool for a given author name string
  # and a co-author associated with the institution
  (CONCAT(
      "[https://tools.wmflabs.org/author-disambiguator/?doit=Look+for+author&name=",
      ENCODE_FOR_URI(?coauthor), "&filter=wdt%3AP50+wd%3A", ?qid, " Publications with at least one affiliated co-author and with author name string '''''", ?coauthor, "''''']") AS ?disambiguator1)

WHERE {
  SELECT ?coauthor (COUNT(DISTINCT ?work) AS ?count) ?item (REPLACE(STR(?item), ".*Q", "Q") AS ?qid) WHERE {
    # People employed by (P108), members of (P463) or affiliated with (P1416)
    # the organization or any of its parts (P361)
    { ?item wdt:P108 / wdt:P361* organization: . }
    UNION
    { ?item wdt:P463 / wdt:P361* organization: . }
    UNION
    { ?item wdt:P1416 / wdt:P361* organization: . }

    ?work wdt:P50 ?item ; wdt:P2093 ?coauthor .

    # Keep strings that start with a capitalized word and end in a lowercase letter
    FILTER(REGEX(?coauthor, "^(?=^[A-Z][a-z]{1,}.*)(?=.*[a-z]$).*$"))
    # Drop strings containing a period, i.e. abbreviated names
    FILTER(!CONTAINS(?coauthor, "."))
  }
  # Group only by the non-aggregated variables and repeat the aggregate in HAVING
  GROUP BY ?coauthor ?item
  HAVING (COUNT(DISTINCT ?work) > 4)
}
ORDER BY DESC(?count)
LIMIT 100

Regarding the use of PREFIX here, see also #431 .
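
The two FILTER conditions in the query above can be tried offline on sample strings. The Python sketch below reuses the same regular expression. The helper name `keeps` is hypothetical, and Python's `re` module may differ in details from the regex engine behind the query service, so treat the results as indicative only.

```python
import re

# Same pattern as in the query: starts with a capital letter followed by at
# least one lowercase ASCII letter, and ends with a lowercase ASCII letter.
FULL_NAME = re.compile(r"^(?=^[A-Z][a-z]{1,}.*)(?=.*[a-z]$).*$")


def keeps(name):
    """Return True if the author name string would pass both filters."""
    return bool(FULL_NAME.search(name)) and "." not in name
```

In this sketch, `keeps("John Smith")` is true, while `"J. Smith"`, `"J Smith"` and `"John A. Smith"` are rejected. Because the character classes are ASCII-only, `"Jürgen Popp"` is rejected here too, so names with a non-ASCII second letter may need a broader pattern.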
