Datacite DOIs partly broken #2018

adam3smith · 2019-10-10T16:04:05Z

I have not investigated how widespread this is, but for at least some DataCite DOIs the schemaVersion has been removed, so our detection (and hence import) fails. I've reported this upstream at DataCite. I'd suggest we wait a couple of days before changing the translator, as the schema is by far the most elegant way to detect DataCite JSON.

adam3smith · 2019-10-21T16:01:51Z

DataCite have confirmed verbally that they'll fix this.

Jmuccigr · 2019-12-18T21:52:00Z

How can I tell which DOIs are affected? I've got a couple that resolve just fine via the doi.org URI, but not with Zotero.

alexis-michaud · 2020-01-22T18:56:23Z

> How can I tell which DOIs are affected? I've got a couple that resolve just fine via the doi.org URI, but not with Zotero.

@Jmuccigr: You can't tell from looking at the DOI; you need to know whether the database where the corresponding metadata are stored is at DataCite or Crossref. If the former, Zotero import won't work until the bug mentioned above is fixed at their end.

alexis-michaud · 2020-01-22T18:57:03Z

Hi, just a note to thank you for the information on what's going wrong here, and to confirm that the fix will be greeted with much enthusiasm. The Pangloss Collection, an open archive of endangered languages, now has DOIs for all resources, with serious metadata (code available here), and seamless integration with Zotero would be super-cool.

(Here's one such DOI, if anyone wants to test: https://doi.org/10.24397/pangloss-0005709 )

adam3smith · 2020-01-29T15:58:16Z

> How can I tell which DOIs are affected?

If you open https://search.datacite.org/works/{doi} in your browser and it shows a search result, then it's a DataCite DOI. So for the one above:
https://search.datacite.org/works/10.24397/pangloss-0005709

alexis-michaud · 2020-01-29T15:59:34Z

👍

bjohas · 2020-04-11T20:20:13Z

Would it be at all possible to follow up with DataCite? We'd really like to use Zenodo DOIs for import to Zotero... https://forums.zotero.org/discussion/comment/353170

alexis-michaud · 2020-04-12T08:08:03Z

DataCite Technical Director Martin Fenner kindly answered an e-mail query sent to him on this topic saying:
"this is an open bug that we will hopefully fix soon enough"
The message is from April 6th, 2020.
We too look forward very much to the fix.

mfenner · 2020-04-12T09:48:23Z

@alexis-michaud thanks for posting this. Because we are dealing with a number of other important issues at DataCite right now, I am reluctant to give a specific timeline for a fix. But we will aim to resolve this in April.

dstillman · 2020-04-20T11:29:28Z

@adam3smith: This seems to be causing a freeze in PDF metadata retrieval. We should obviously fix that in Zotero itself, but as a quick hack that we can push out right away, do you think we can change

`translators/DOI Content Negotiation.js`, line 87 in ec2eaef:

    else if (text.includes("http://datacite.org/schema")) {

to just look for "agency": "DataCite"? (Trying to parse the text as JSON and access the property would be cleaner but probably not necessary. Obviously checking the response Content-Type would be best but we can't do that currently.)
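For illustration, the JSON-parsing variant mentioned above might look like the following minimal sketch. It is not the translator's actual code: the function name `isDataCiteJSON` is hypothetical, and it assumes `agency` is a top-level property of the response. It keeps the existing schema-string check and only falls back to the `agency` property when that check fails.

```javascript
// Hypothetical helper; not part of the actual translator.
function isDataCiteJSON(text) {
    // Existing check: the schema URL appears somewhere in the response.
    if (text.includes("http://datacite.org/schema")) {
        return true;
    }
    try {
        const data = JSON.parse(text);
        // Assumption: "agency" is a top-level string property.
        return data !== null && typeof data === "object" && data.agency === "DataCite";
    }
    catch (e) {
        // Not parseable as JSON (e.g. Crossref XML), so not DataCite JSON.
        return false;
    }
}
```

Parsing avoids depending on the exact whitespace between the key and the value, which a plain substring search for `"agency": "DataCite"` would be sensitive to.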

mfenner · 2020-04-20T14:29:06Z

Unfortunately "agency" is not yet available for all DOIs in our API, so would probably also not reliably work. We have some other fields that are probably unique (e.g. ClientId, ProviderId, state), but schema and agency are really the two fields that make the most sense.

Until the end of the year we will be adding many DOIs from other registration agencies to our index, and we have already added 8.5 million Crossref DOIs. As you can see in https://api.datacite.org/dois?client-id=crossref.citations, all these Crossref DOIs use schema http://datacite.org/schema/kernel-4, as we convert the Crossref metadata into DataCite XML. So thinking about it a bit more, agency is the best field to determine whether a DOI from our APIs is a DataCite DOI.

Obviously you fetch metadata via DOI content negotiation so should never run into the Crossref DOIs in our index. One question I have is why you don't directly ask for Citeproc JSON from DataCite in the DOI content negotiation, the implementation is much better than a few years ago.

To summarize, what is the best way to fix this? Should DataCite fix the inclusion of the schema "http://datacite.org/schema" as quickly as possible, but definitely before the end of April, or do you want to switch to another strategy to detect a DataCite DOI, e.g. via agency?

adam3smith · 2020-04-20T14:34:58Z

> One question I have is why you don't directly ask for Citeproc JSON from DataCite in the DOI content negotiation, the implementation is much better than a few years ago.

I don't remember the details, but we experimented with this in January 2019 and found that we got significantly better import using the DataCite JSON.


DataCite JSON: Make most fields optional

```
curl -L -v \
  -H "Accept: application/vnd.datacite.datacite+json, application/vnd.crossref.unixref+xml, application/vnd.citationstyles.csl+json" \
  'https://doi.org/10.7916/d8959hr1'
```

The JSON for https://doi.org/10.7916/d8959hr1 didn't have many of the fields expected by the translator, so I made them optional. The new test fails because of #2018.

dstillman · 2020-04-20T20:58:58Z

OK, I put in a temporary fix for Retrieve Metadata for PDF. The DOI Content Negotiation translator is now looking for the strings "agency": "DataCite" or "providerId": in addition to http://datacite.org/schema.
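A rough sketch of that string-based check, for readers following along. The names `DATACITE_INDICATORS` and `hasDataCiteIndicator` are hypothetical and this is not the committed code; note that a plain substring match depends on the exact whitespace the server emits after the colon.

```javascript
// Hypothetical sketch of the string-based detection; not the committed code.
const DATACITE_INDICATORS = [
    "http://datacite.org/schema",
    '"agency": "DataCite"',
    '"providerId":'
];

function hasDataCiteIndicator(text) {
    // True as soon as any one indicator string occurs in the response body.
    return DATACITE_INDICATORS.some(indicator => text.includes(indicator));
}
```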

Unfortunately that wasn't sufficient, because at least for 10.7916/D8959HR1, the DataCite JSON translator then failed with `data.subjects is undefined` looking for tags, and if I put that in a conditional it failed with `data.formats is undefined` looking for medium. Others would have failed after that, so I just made most of the fields optional.
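Making the fields optional amounts to guarding each access, roughly as in this sketch. The function name `extractOptionalFields` and the assumed shapes (`subjects` as an array of objects with a `subject` property, `formats` as an array of strings) are illustrative assumptions, not the translator's actual code.

```javascript
// Hypothetical sketch: read tags and medium without assuming the fields exist.
function extractOptionalFields(data) {
    const item = { tags: [], medium: "" };
    // Guard: data.subjects may be undefined for some DOIs.
    if (Array.isArray(data.subjects)) {
        for (const entry of data.subjects) {
            // Assumption: each entry looks like { subject: "..." }.
            if (entry && entry.subject) {
                item.tags.push(entry.subject);
            }
        }
    }
    // Guard: data.formats may be undefined as well.
    if (Array.isArray(data.formats)) {
        item.medium = data.formats.join(", ");
    }
    return item;
}
```

With guards like these, a record that lacks both fields yields an item with no tags and an empty medium instead of throwing.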

For reference, here's the request we're making:

curl -v -L -H "Accept: application/vnd.datacite.datacite+json, application/vnd.crossref.unixref+xml, application/vnd.citationstyles.csl+json" 'https://doi.org/10.7916%2Fd8959hr1'

When I tried just removing `application/vnd.datacite.datacite+json` from the `Accept` header, DataCite returned a 200 with `Content-Type: application/vnd.crossref.unixref+xml` and no content (which seems wrong). It only worked if I removed Crossref XML from `Accept` as well, in which case it returned CSL JSON.

But for now, with the above fixes, the DataCite JSON is working, at least for this DOI.

zuphilip added the Upstream Dependency label Oct 10, 2019

dstillman added a commit that referenced this issue Apr 20, 2020

DOI Content Negotiation: Look for other indicators of DataCite JSON

17e3773

#2018 (comment) This is a temporary fix until the DataCite JSON includes "http://datacite.org/schema" again or we can check the Content-Type response.
