Datacite DOIs partly broken #2018

Open
adam3smith opened this issue Oct 10, 2019 · 13 comments

@adam3smith (Collaborator) commented Oct 10, 2019

I have not investigated how widespread this is, but for at least some DataCite DOIs the schemaVersion has been removed, so our detection (and hence import) fails. I've reported this upstream to DataCite. I'd suggest we wait a couple of days before changing the translator, as the schema is by far the most elegant way to detect DataCite JSON.

@adam3smith (Collaborator, Author) commented Oct 21, 2019

DataCite have confirmed verbally that they'll fix this.

@Jmuccigr (Contributor) commented Dec 18, 2019

How can I tell which DOIs are affected? I've got a couple that resolve just fine via the doi.org URI, but not with Zotero.

@alexis-michaud commented Jan 22, 2020

> How can I tell which DOIs are affected? I've got a couple that resolve just fine via the doi.org URI, but not with Zotero.

@Jmuccigr: You can't tell by looking at the DOI itself; you need to know whether the corresponding metadata are stored at DataCite or at Crossref. If it's DataCite, Zotero import won't work until the bug mentioned above is fixed on their end.

@alexis-michaud commented Jan 22, 2020

Hi, just a note to thank you for the information on what's going wrong here, and to confirm that the fix will be hailed with much enthusiasm. The Pangloss Collection, an open archive of endangered languages, now has DOIs for all its resources, with serious metadata (code available here), and seamless integration with Zotero would be super-cool.

(Here's one such DOI, if anyone wants to test: https://doi.org/10.24397/pangloss-0005709 )

@adam3smith (Collaborator, Author) commented Jan 29, 2020

> How can I tell which DOIs are affected?

If you open https://search.datacite.org/works/{doi} in your browser and it shows a search result, then it's a DataCite DOI. For example, for the DOI above:
https://search.datacite.org/works/10.24397/pangloss-0005709
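Checking by hand is fine for one or two DOIs. A minimal sketch for scripting it, assuming you only need the lookup URL to open yourself; `dataciteSearchUrl` is a hypothetical helper, and its prefix-stripping regex only handles `doi.org` and `dx.doi.org` resolver URLs:

```javascript
// Sketch: build the DataCite search URL for a DOI, as described above.
// Hypothetical helper; accepts a bare DOI or a doi.org / dx.doi.org resolver URL.
function dataciteSearchUrl(doi) {
  // Remove a leading resolver prefix, if any, leaving e.g. "10.24397/pangloss-0005709"
  const bare = doi.trim().replace(/^https?:\/\/(dx\.)?doi\.org\//i, "");
  return "https://search.datacite.org/works/" + bare;
}

console.log(dataciteSearchUrl("https://doi.org/10.24397/pangloss-0005709"));
// https://search.datacite.org/works/10.24397/pangloss-0005709
```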

@alexis-michaud commented Jan 29, 2020

👍

@bjohas commented Apr 11, 2020

Would it be at all possible to follow up with DataCite? We'd really like to import Zenodo DOIs into Zotero... https://forums.zotero.org/discussion/comment/353170

@alexis-michaud commented Apr 12, 2020

DataCite Technical Director Martin Fenner kindly replied to an e-mail query on this topic, saying:
"this is an open bug that we will hopefully fix soon enough"
The message is from April 6th, 2020.
We too look forward very much to the fix.

@mfenner (Contributor) commented Apr 12, 2020

@alexis-michaud thanks for posting this. Because we are dealing with a number of other important issues at DataCite right now, I am reluctant to give a specific timeline for a fix. But we will aim to resolve this in April.

@dstillman (Member) commented Apr 20, 2020

@adam3smith: This seems to be causing a freeze in PDF metadata retrieval. We should obviously fix that in Zotero itself, but as a quick hack that we can push out right away, do you think we can change

```
else if (text.includes("http://datacite.org/schema")) {
```

to just look for `"agency": "DataCite"`? (Trying to parse the text as JSON and access the property would be cleaner but probably not necessary. Obviously checking the response Content-Type would be best, but we can't do that currently.)
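A minimal sketch of that quick string check, assuming the response body is available as a string named `text`, as in the snippet above. `looksLikeDataCite` is a hypothetical helper, not the translator's actual code; the regexes tolerate optional whitespace around the colon, which a literal `includes('"agency": "DataCite"')` would not.

```javascript
// Sketch of a string-based check for DataCite JSON, as proposed above.
// Hypothetical helper; it does not reproduce the actual translator code.
const DATACITE_MARKERS = [
  /http:\/\/datacite\.org\/schema/, // schema URI (missing from some records)
  /"agency"\s*:\s*"DataCite"/,      // registration agency, with or without spaces
];

function looksLikeDataCite(text) {
  // True if any marker appears anywhere in the raw response body
  return DATACITE_MARKERS.some((re) => re.test(text));
}

console.log(looksLikeDataCite('{"agency":"DataCite","id":"x"}')); // true
console.log(looksLikeDataCite('{"schemaVersion":"http://datacite.org/schema/kernel-4"}')); // true
console.log(looksLikeDataCite("<doi_records></doi_records>")); // false
```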

@mfenner (Contributor) commented Apr 20, 2020

Unfortunately "agency" is not yet available for all DOIs in our API, so checking it would probably not work reliably either. We have some other fields that are probably unique (e.g. ClientId, ProviderId, state), but schema and agency are really the two fields that make the most sense.

Between now and the end of the year we will be adding many DOIs from other registration agencies to our index, and we have already added 8.5 million Crossref DOIs. As you can see in https://api.datacite.org/dois?client-id=crossref.citations, all these Crossref DOIs use the schema http://datacite.org/schema/kernel-4, because we convert the Crossref metadata into DataCite XML. So, thinking about it a bit more, agency is the best field for determining whether a DOI from our APIs is a DataCite DOI.

Since you fetch metadata via DOI content negotiation, you should never run into the Crossref DOIs in our index. One question I have is why you don't ask DataCite for Citeproc JSON directly via DOI content negotiation; the implementation is much better than it was a few years ago.

To summarize, what is the best way to fix this? Should DataCite restore the schema "http://datacite.org/schema" as quickly as possible (and definitely before the end of April), or would you rather switch to another strategy for detecting a DataCite DOI, e.g. via agency?
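Given that `agency` is the more reliable field but not yet present everywhere, a minimal sketch of the "parse the JSON and check the property" approach mentioned earlier in the thread. The function name is hypothetical, the case-insensitive comparison is an assumption, and the `"Crossref"` agency value in the second usage line is illustrative rather than taken from DataCite's API.

```javascript
// Sketch: parse the response and prefer the registration agency over the schema URI.
// Field names come from this thread; not every record is guaranteed to carry them.
function isDataCiteRecord(text) {
  let data;
  try {
    data = JSON.parse(text);
  } catch (e) {
    return false; // not JSON at all, e.g. Crossref UNIXREF XML
  }
  if (data === null || typeof data !== "object") return false;

  // Converted Crossref records in the DataCite index also carry a DataCite
  // schema URI, so "agency" wins whenever it is present.
  if (typeof data.agency === "string") {
    return data.agency.toLowerCase() === "datacite";
  }
  // Fall back to the schema URI for records that lack "agency".
  return typeof data.schemaVersion === "string" &&
    data.schemaVersion.startsWith("http://datacite.org/schema");
}

console.log(isDataCiteRecord('{"agency": "DataCite"}')); // true
// Illustrative converted Crossref record: DataCite schema, non-DataCite agency
console.log(isDataCiteRecord(
  '{"agency": "Crossref", "schemaVersion": "http://datacite.org/schema/kernel-4"}')); // false
console.log(isDataCiteRecord("<doi_records/>")); // false
```

The trade-off versus a plain string search is one `JSON.parse` per response, in exchange for not matching markers that happen to appear inside unrelated field values.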

@adam3smith (Collaborator, Author) commented Apr 20, 2020

> One question I have is why you don't directly ask for Citeproc JSON from DataCite in the DOI content negotiation, the implementation is much better than a few years ago.

I don't remember the details, but we experimented with this in January 2019 and found that we got significantly better imports using the DataCite JSON.

dstillman added a commit that referenced this issue Apr 20, 2020
#2018 (comment)

This is a temporary fix until the DataCite JSON includes
"http://datacite.org/schema" again or we can check the response
Content-Type.
dstillman added a commit that referenced this issue Apr 20, 2020
```
curl -L -v \
    -H "Accept: application/vnd.datacite.datacite+json, application/vnd.crossref.unixref+xml, application/vnd.citationstyles.csl+json" \
    'https://doi.org/10.7916/d8959hr1'
```

The JSON for https://doi.org/10.7916/d8959hr1 didn't have many of the
fields expected by the translator, so I made them optional.

The new test fails because of #2018.

@dstillman (Member) commented Apr 20, 2020

OK, I put in a temporary fix for Retrieve Metadata for PDF. The DOI Content Negotiation translator is now looking for the strings `"agency": "DataCite"` or `"providerId":` in addition to `http://datacite.org/schema`.

Unfortunately that wasn't sufficient: at least for 10.7916/D8959HR1, the DataCite JSON translator then failed with `data.subjects is undefined` when looking for tags, and when I put that in a conditional it failed with `data.formats is undefined` when looking for medium. Other fields would have failed after that, so I just made most of the fields optional.
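A minimal sketch of what "making the fields optional" can look like, assuming `subjects` is an array of `{ subject: "..." }` objects and `formats` an array of strings. `mapOptionalFields` and the plain-object item it returns are hypothetical (not Zotero's item API), and taking the first format as the medium is an assumption.

```javascript
// Sketch: read optional DataCite JSON fields without assuming they exist.
// Hypothetical helper; returns a plain object, not a Zotero item.
function mapOptionalFields(data) {
  const item = { tags: [] };

  // `subjects` may be absent (as for 10.7916/D8959HR1): default to no tags.
  const subjects = Array.isArray(data.subjects) ? data.subjects : [];
  for (const s of subjects) {
    if (s && typeof s.subject === "string") item.tags.push(s.subject);
  }

  // `formats` may be absent too: only set a medium when one is available.
  if (Array.isArray(data.formats) && data.formats.length > 0) {
    item.medium = data.formats[0];
  }
  return item;
}

console.log(JSON.stringify(mapOptionalFields({}))); // {"tags":[]}
console.log(JSON.stringify(mapOptionalFields({
  subjects: [{ subject: "Linguistics" }],
  formats: ["audio/wav"],
}))); // {"tags":["Linguistics"],"medium":"audio/wav"}
```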

For reference, here's the request we're making:

```
curl -v -L -H "Accept: application/vnd.datacite.datacite+json, application/vnd.crossref.unixref+xml, application/vnd.citationstyles.csl+json" 'https://doi.org/10.7916%2Fd8959hr1'
```

When I tried just removing application/vnd.datacite.datacite+json from the Accept header, DataCite returned a 200 with Content-Type: application/vnd.crossref.unixref+xml and no content (which seems wrong). It only worked if I removed Crossref XML from Accept as well, in which case it returned CSL JSON.

But for now, with the above fixes, the DataCite JSON is working, at least for this DOI.
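Checking the response Content-Type was named above as the best long-term option. A minimal sketch of that dispatch, assuming the Content-Type header and body are already available as strings; `chooseParser` and the parser labels are hypothetical, while the media types are the ones in the Accept header of the request above. It also treats a 200 with an empty body, as observed for the Crossref XML case, as a failure.

```javascript
// Sketch: pick a parser from the response Content-Type rather than sniffing the body.
// Parser labels are hypothetical; media types match the Accept header used above.
const PARSERS = new Map([
  ["application/vnd.datacite.datacite+json", "datacite-json"],
  ["application/vnd.crossref.unixref+xml", "crossref-unixref"],
  ["application/vnd.citationstyles.csl+json", "csl-json"],
]);

// Rebuilds the same Accept value as the curl request (no q-values).
const ACCEPT = [...PARSERS.keys()].join(", ");

function chooseParser(contentType, body) {
  // An empty body is treated as a failure even when the status is 200.
  if (typeof body !== "string" || body.trim() === "") return null;
  // Ignore parameters such as "; charset=utf-8"; media types are case-insensitive.
  const mediaType = String(contentType || "").split(";")[0].trim().toLowerCase();
  return PARSERS.get(mediaType) ?? null;
}

console.log(chooseParser("application/vnd.crossref.unixref+xml", "")); // null
console.log(chooseParser("application/vnd.datacite.datacite+json; charset=utf-8", "{}")); // datacite-json
```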

7 participants