Resolve all Zika authors with 10 or more publications on https://tools.wmflabs.org/scholia/topic/Q202864/missing #1102

Closed
Daniel-Mietchen opened this issue Jan 2, 2019 · 14 comments


@Daniel-Mietchen
Owner

commented Jan 2, 2019

Currently, the query

SELECT
  # Number of works with the author
  ?count

  # Author as a string
  ?author

  # Build URL to the Author disambiguator tool
  (CONCAT(
      'https://tools.wmflabs.org/author-disambiguator/?doit=Look+for+author&name=',
      ENCODE_FOR_URI(?author)) AS ?resolver_url)
WITH {
  SELECT DISTINCT ?work WHERE {
    ?work wdt:P921 / (wdt:P361+ | wdt:P1269+ | (wdt:P31* / wdt:P279*) ) wd:Q202864 .
  }
} AS %works
WITH {
  SELECT
    (COUNT(?work) AS ?count)
    ?author
  WHERE {
    INCLUDE %works
    ?work wdt:P2093 ?author .
  }
  GROUP BY ?author
} AS %result
WHERE {
  INCLUDE %result
  FILTER ( ?count > 9 )

  # Label the result
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,da,de,es,fr,ja,nl,no,ru,sv,zh". }
}
ORDER BY DESC(?count)
LIMIT 500

yields 37 results.

I tried to set up a Listeria list for this (with a threshold of 5) at https://www.wikidata.org/wiki/Wikidata:WikiProject_Zika_Corpus/Listeria/Missing_authors but did not get it to work properly.
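Since the same query is used here with a threshold of 10 and on the Listeria page with a threshold of 5, it may help to parameterize the cutoff. The following is only a sketch, not the code behind Scholia or Listeria: `build_query` and `run_query` are hypothetical helper names, the endpoint is assumed to be the public Wikidata Query Service at https://query.wikidata.org/sparql, and the `?resolver_url` column and the label service are left out for brevity.

```python
import json
import re
import urllib.parse
import urllib.request
from string import Template

# Assumption: the public Wikidata Query Service SPARQL endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Same structure as the query above, with the topic and the cutoff as placeholders.
QUERY_TEMPLATE = Template("""\
SELECT ?count ?author
WITH {
  SELECT DISTINCT ?work WHERE {
    ?work wdt:P921 / (wdt:P361+ | wdt:P1269+ | (wdt:P31* / wdt:P279*) ) wd:$topic .
  }
} AS %works
WITH {
  SELECT (COUNT(?work) AS ?count) ?author WHERE {
    INCLUDE %works
    ?work wdt:P2093 ?author .
  }
  GROUP BY ?author
} AS %result
WHERE {
  INCLUDE %result
  FILTER ( ?count >= $min_count )
}
ORDER BY DESC(?count)
LIMIT 500
""")


def build_query(topic_qid, min_count):
    """Return the missing-authors query for one topic and a minimum paper count."""
    if not re.fullmatch(r"Q[1-9]\d*", topic_qid):
        raise ValueError(f"not a Wikidata item ID: {topic_qid!r}")
    return QUERY_TEMPLATE.substitute(topic=topic_qid, min_count=int(min_count))


def run_query(query):
    """Send the query to the endpoint and return (author string, count) pairs."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # Hypothetical client name; replace with a descriptive one.
            "User-Agent": "missing-authors-sketch/0.1 (example)",
        },
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return [
        (row["author"]["value"], int(row["count"]["value"]))
        for row in data["results"]["bindings"]
    ]


if __name__ == "__main__":
    # Zika virus topic (Q202864), author name strings with 10 or more works.
    for author, count in run_query(build_query("Q202864", 10)):
        print(count, author)
```

Calling `build_query("Q202864", 5)` would give the variant with the lower threshold tried for the Listeria list.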

@Daniel-Mietchen

Owner Author

commented Jan 2, 2019

Top of the list is https://tools.wmflabs.org/author-disambiguator/?doit=Look+for+author&name=Michael%20McCarthy with 22 publications, and since Michael McCarthy is a very common name, I'm filtering it down to just items about Zika: https://tools.wmflabs.org/author-disambiguator/?name=Michael+McCarthy&doit=Look+for+author&filter=wdt%3AP921+wd%3AQ202864 .
On that basis, I have set up a new item (none of the existing items with variants of that name seemed to refer to the same person) and am now linking those 22 papers to it: https://tools.wmflabs.org/quickstatements/#/batch/6723 .
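The two Author disambiguator URLs in this comment can be reproduced programmatically. The sketch below only mirrors the URL shapes quoted in this thread; the function names `resolver_url` and `filtered_resolver_url` are hypothetical, and no parameters of the tool beyond `name`, `doit` and `filter` are assumed.

```python
from urllib.parse import quote, urlencode

# Base URL as quoted in this thread.
BASE = "https://tools.wmflabs.org/author-disambiguator/"


def resolver_url(name):
    """Mirror the query's CONCAT(..., ENCODE_FOR_URI(?author)) expression."""
    # safe="" percent-encodes every reserved character, including "/".
    return BASE + "?doit=Look+for+author&name=" + quote(name, safe="")


def filtered_resolver_url(name, topic_qid):
    """Restrict the lookup to works whose main subject (P921) is the given topic."""
    params = {
        "name": name,
        "doit": "Look for author",
        "filter": f"wdt:P921 wd:{topic_qid}",
    }
    # urlencode form-encodes the values: spaces become "+", ":" becomes "%3A".
    return BASE + "?" + urlencode(params)


print(resolver_url("Michael McCarthy"))
# https://tools.wmflabs.org/author-disambiguator/?doit=Look+for+author&name=Michael%20McCarthy
print(filtered_resolver_url("Michael McCarthy", "Q202864"))
# https://tools.wmflabs.org/author-disambiguator/?name=Michael+McCarthy&doit=Look+for+author&filter=wdt%3AP921+wd%3AQ202864
```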

@Daniel-Mietchen

Owner Author

commented Jan 2, 2019

The list also contains 12 papers by Manon Vouga [Q28051248], for whom
https://tools.wmflabs.org/author-disambiguator/?doit=Look+for+author&name=Manon%20Vouga
yields 16 papers that are now being matched to her item: https://tools.wmflabs.org/quickstatements/#/batch/6725 .

@Daniel-Mietchen

Owner Author

commented Jan 2, 2019

Doing all 37 author strings through comments in this thread may become unwieldy, so I may start to set up individual tickets for each of them.

@Daniel-Mietchen

Owner Author

commented Jan 2, 2019

@Daniel-Mietchen

Owner Author

commented Jan 2, 2019

I am running an additional batch for "M Vouga" papers at https://tools.wmflabs.org/quickstatements/#/batch/6730 .

@Daniel-Mietchen

Owner Author

commented Jan 2, 2019

One way to reduce the number of P2093 statements in the corpus is to run SourceMD over items that have P2093 statements, in the hope that the tool might find the affected papers in some ORCID records. I have done this multiple times in the past, but since both the corpus and the ORCID records are constantly evolving, now might be a good time to do it again.

For simplicity, I will use this query that I am exploring for a Listeria list.

@Daniel-Mietchen

Owner Author

commented Jan 2, 2019

@Daniel-Mietchen

Owner Author

commented Jan 2, 2019

That batch has finished. It resulted in 92 edits but had no effect on the remaining 33 strings with 10 or more Zika publications.

@Daniel-Mietchen

Owner Author

commented Jan 11, 2019

The query from the first comment in this thread now has 19 results.

@Daniel-Mietchen

Owner Author

commented May 29, 2019

The query results are down to 10, and all of these currently have 10 papers.

@Daniel-Mietchen

Owner Author

commented Jun 29, 2019

There are 5 author name strings left with more than 9 papers (all of them have 10).

@Daniel-Mietchen

Owner Author

commented Jun 29, 2019

No author name strings left with 10 or more papers, but 22 with 9.

@Daniel-Mietchen

Owner Author

commented Aug 26, 2019

By now, we have no author name strings left with 8 or more occurrences, and 55 with 7.

@Daniel-Mietchen

Owner Author

commented Aug 26, 2019

I also checked https://www.wikidata.org/wiki/Wikidata:WikiProject_Zika_Corpus/Listeria/Missing_authors , which shows basically the same information (I have some batches running right now), so I think this ticket is ripe to be closed, knowing that curation will need to go on as new publications are being indexed.
