Datacite DOIs partly broken #2018
DataCite have confirmed verbally that they'll fix this.
How can I tell which DOIs are affected? I've got a couple that resolve just fine via the doi.org URI, but not with Zotero.
@Jmuccigr: You can't tell from looking at the DOI: you need to know whether the database where the corresponding metadata are stored is at DataCite or CrossRef. If the former, Zotero import won't work until the bug mentioned above is fixed at their end.
Hi, just a note to thank you for the information on what's going wrong here and to confirm that the fix will be hailed with much enthusiasm. The Pangloss Collection, an open archive of endangered languages, now has DOIs for all resources, with serious metadata (code available here), and seamless integration with Zotero would be super-cool. (Here's one such DOI, if anyone wants to test: https://doi.org/10.24397/pangloss-0005709 )
If you use https://search.datacite.org/works/{doi} in your browser and it shows a search result, then it's a DataCite DOI. So for the ones above:
Would it be at all possible to follow up with DataCite? We'd really like to use Zenodo DOIs for import to Zotero... https://forums.zotero.org/discussion/comment/353170
DataCite Technical Director Martin Fenner kindly answered an e-mail query sent to him on this topic saying:
@alexis-michaud thanks for posting this. Because we are dealing with a number of other important issues at DataCite right now, I am reluctant to give a specific timeline for a fix. But we will aim to resolve this in April.
@adam3smith: This seems to be causing a freeze in PDF metadata retrieval. We should obviously fix that in Zotero itself, but as a quick hack that we can push out right away, do you think we can change translators/DOI Content Negotiation.js Line 87 in ec2eaef "agency": "DataCite" ? (Trying to parse the text as JSON and access the property would be cleaner but probably not necessary. Obviously checking the response Content-Type would be best but we can't do that currently.)
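To make the "parse the text as JSON and access the property" option concrete, here is a minimal sketch. It is not the actual translator code: the function name `looksLikeDataCiteJSON` is hypothetical, and it assumes that `schemaVersion` and `agency` are top-level properties of the DataCite JSON and that `agency` has exactly the value "DataCite".

```javascript
// Hypothetical helper, not the actual Zotero translator code.
// Assumption: schemaVersion and agency are top-level properties.
const DATACITE_SCHEMA_PREFIX = "http://datacite.org/schema";

function looksLikeDataCiteJSON(text) {
	let data;
	try {
		data = JSON.parse(text);
	}
	catch (e) {
		// Not JSON at all (for example, Crossref UNIXREF XML)
		return false;
	}
	if (data === null || typeof data !== "object") {
		return false;
	}
	// Preferred signal: the schema URI, when the response includes it
	if (typeof data.schemaVersion === "string"
			&& data.schemaVersion.startsWith(DATACITE_SCHEMA_PREFIX)) {
		return true;
	}
	// Fallback signal; it may not be present for every DOI
	return data.agency === "DataCite";
}
```

Checking parsed properties rather than searching the raw text avoids false positives when the same string happens to appear inside a title or description.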
Unfortunately "agency" is not yet available for all DOIs in our API, so would probably also not reliably work. We have some other fields that are probably unique (e.g. ClientId, ProviderId, state), but schema and agency are really the two fields that make the most sense.

Until the end of the year we will be adding many DOIs from other registration agencies to our index, and we have already added 8.5 million Crossref DOIs. As you can see in https://api.datacite.org/dois?client-id=crossref.citations, all these Crossref DOIs use schema

Obviously you fetch metadata via DOI content negotiation so should never run into the Crossref DOIs in our index. One question I have is why you don't directly ask for Citeproc JSON from DataCite in the DOI content negotiation, the implementation is much better than a few years ago.

To summarize, what is the best way to fix this? Should DataCite fix the inclusion of the schema "http://datacite.org/schema" as quickly as possible, but definitely before the end of April, or do you want to switch to another strategy to detect a DataCite DOI, e.g. via
I don't remember the details, but we experimented with this in January 2019 and found that we got significantly better import using the DataCite JSON
#2018 (comment) This is a temporary fix until the DataCite JSON includes "http://datacite.org/schema" again or we can check the Content-Type response.
```
curl -L -v \
  -H "Accept: application/vnd.datacite.datacite+json, application/vnd.crossref.unixref+xml, application/vnd.citationstyles.csl+json" \
  'https://doi.org/10.7916/d8959hr1'
```

The JSON for https://doi.org/10.7916/d8959hr1 didn't have many of the fields expected by the translator, so I made them optional. The new test fails because of #2018.
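For anyone who wants to reproduce the curl request above from a script, here is a rough JavaScript equivalent. It is only a sketch: `buildDOIRequest` and `fetchDOIMetadata` are hypothetical names, and it relies on the standard `fetch` API (browsers, Node 18+) rather than Zotero's own HTTP utilities.

```javascript
// Media types taken from the curl request, in the same order.
const ACCEPT_TYPES = [
	"application/vnd.datacite.datacite+json",
	"application/vnd.crossref.unixref+xml",
	"application/vnd.citationstyles.csl+json"
];

// Hypothetical helper: builds the URL and headers for a DOI lookup.
function buildDOIRequest(doi) {
	return {
		url: "https://doi.org/" + encodeURI(doi),
		headers: { Accept: ACCEPT_TYPES.join(", ") }
	};
}

// Sends the request. fetch follows redirects by default, like curl -L.
async function fetchDOIMetadata(doi) {
	const { url, headers } = buildDOIRequest(doi);
	const response = await fetch(url, { headers });
	if (!response.ok) {
		throw new Error("DOI lookup failed with status " + response.status);
	}
	return {
		// Exposed so a caller could branch on the media type actually returned
		contentType: response.headers.get("content-type"),
		body: await response.text()
	};
}
```

Calling `fetchDOIMetadata("10.7916/d8959hr1")` would issue the same request as the curl command. The sketch returns the Content-Type alongside the body because, outside the translator sandbox, that header is available for deciding which parser to use.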
OK, I put in a temporary fix for Retrieve Metadata for PDF. The DOI Content Negotiation translator is now looking for the strings Unfortunately that wasn't sufficient, because at least for 10.7916/D8959HR1, the DataCite JSON translator then failed with For reference, here's the request we're making:
When I tried just removing But for now, with the above fixes, the DataCite JSON is working, at least for this DOI.
adam3smith commented Oct 10, 2019
I have not investigated how widely that's the case, but for at least some DataCite DOIs, the schemaVersion has been removed and thus our detection (and hence import) fails. I've reported this upstream at DataCite. I'd suggest we wait for a couple of days before changing the translator, as the schema is by far the most elegant way to detect.