[don't merge] Switch DOI to DOI.org #1135

adam3smith · Sep 4, 2016

@zuphilip -- this obviously needs cleanup, but I wanted to get your
thoughts on this. It's taking advantage of the now available CSL_JSON
via content negotiation on doi.org (and the functionality you added to
ZU.doGet to use it. The main advantages are

Fewer API requests
Works in Chrome connector for items with DataCite and CrossRef DOIs
Will work with additional services as they become available

Open questions:

What's a good name for the new translator
Should we remove the old DataCite and CrossRef ones, or leave them in
as a fallback at least initially?
Any thoughts on how to test this for data quality? I was going to
manually go through all DOIs that are in the CrossRef translator. Any
other ideas?

As well as any other conceptual issues you can think of (as I say, don't
worry about code quality for now)

zuphilip · Sep 4, 2016

Cool! I also wanted to look at this anyway at some point...

At doi.org the request will only be forwarded to the correct agency with the same header request and only if this agency can handle this header request, we will receive CSL JSON back. Otherwise we might be redirected to some landing page in HTML. Besides CrossRef and DataCite also mEDRA should support content-negation with CSL JSON header, see table at http://www.crosscite.org/cn/.

What's a good name for the new translator

I like DOI Content Negotiation. 👍

It is a search translator and can thereby also been used for the doi search from Zotero itself, i.e. the "magic" wand. I don't know if this effects the discussion here.

Should we remove the old DataCite and CrossRef ones, or leave them in
as a fallback at least initially?

If everything works out fine, then we should not (strictly) need the single search translators anymore. However, this feature is only available from Zotero v4.0.29.11 and therefore anyone running with an older version would be negatively affected by an deletion of the two translators. Maybe we can simple increase the priority numbers of those?

Any thoughts on how to test this for data quality? I was going to
manually go through all DOIs that are in the CrossRef translator. Any
other ideas?

You can create a gist with the examples and then test the web translator (which will implicitly test the search translator), e.g. https://gist.github.com/zuphilip/9e5101219f453964eb7e . It would also be good to have more information when the translation is failing for the users, i.e. the things we see at https://repo.zotero.org/errors.

a) Did you test this translator with Chrome connector already?
b) Do you know if there are any other agency providing metadata in some format over content negotiation or planning to do? Or can we convince them to do this?
c) I don't know what exactly the consequences performance-wise are. I guess that it should be better performance for all working agencies, but how about the performance of dois from other agencies?
d) Maybe, you can try to extend the function filterQuery(items, checkOnly) in a similar way as we have getSearchResults and thereby also using it in detectWeb (there seems to be quite some overlap of lines).
e) It would be nice to indicate which agency actually provided the metadata, but I don't know if we have this information somewhere...

Let me know if I can help you further.

adam3smith · Sep 8, 2016

To your questions:
a) No -- Standalone is actually not on 4.0.29.11 yet, so would have to force Firefox into server mode and might as well wait then.
b) just the three listed on http://www.crosscite.org/cn/
c) The only agency not covered explicitly for which we currently have a translator is Airiti.
d)OK, will do
e) CrossRef actually puts this in the data (it says source: CrossRef in the JSON)

More generally, in doing some quality testing, I'm finding that metadata for books and book chapters in CrossRef are of much lower quality in the CSL JSON than in their UnixRef format. I have contacted them about this and will see what they say. If improvements are not in the card soon, we can still handle all of this in a single translator using content negotation: we can specify an order of preferred formats in the accept header and can just put UnixRef and Datacite XML in there, then check which we get (which should be easy) and run the respective parsing code.

zuphilip · Jun 4, 2017

zuphilip reviewed Jun 4, 2017

View changes

zuphilip · Jun 4, 2017

DOI Content Negotiations.js

+			item.genre = item['container-title']
+		}
+		if (item.type != "article-journal" && item.type != "paper-conference") {
+			item.note = "DOI: " + item.DOI;


This should actually go into extra field.

zuphilip · Jan 1, 2018

More generally, in doing some quality testing, I'm finding that metadata for books and book chapters in CrossRef are of much lower quality in the CSL JSON than in their UnixRef format. I have contacted them about this and will see what they say.

@adam3smith Any update on this matter?

Morveover, testing in 5.0 would be needed now...

@adam3smith And just checking: Is this change still what we want?

adam3smith · Jan 1, 2018

I definitely still want to do this, yes, and I'd be surprised if it didnt' work in 5.0.
The biggest issue is that last I checked we still had the metadata quality issue with CrossRef. We can work around this and still use DOI content negotiation (you can sequence preferred formats and we could distinguish them easily) but obviously it'd be nice to be able to not worry about the UnixRef parser at all any more. I'll look at this again.

dstillman · Feb 26, 2018

@adam3smith: Somewhat tangential, but just to understand this better, for the Zenodo example you gave in the forums, 10.5281/zenodo.889811, it seems the data from content negotiation is different from the metadata available on zenodo.org, even in the same format:

curl -LH "Accept: application/vnd.citationstyles.csl+json" https://doi.org/10.5281/zenodo.889811

{
  "type": "report",
  "id": "https://doi.org/10.5281/zenodo.889811",
  "categories": "qualitative methods",
  "author": [
    {
      "family": "Buthe",
      "given": "Tim"
    },
    {
      "family": "Jacobs",
      "given": "Alan"
    }
  ],
  "issued": {
    "date-parts": [
      [
        2015,
        9,
        30
      ]
    ]
  },
  "abstract": "In this issue, we are delighted to present a symposium, inspired by a panel at the 2015 APSA Annual Meeting, that explores a range of innovative linkages between the ventures of interpretation and causal inference. In remarkably diverse ways, the essays in this collection propose approaches to empirical research that combine tools and logics of causal inferences with techniques and understandings typically associated with interpretive inquiry. Our contributors make the case, respectively, for joining cross-case comparison to ethnography, interpretation to process tracing, ethnographic fieldwork to social network analysis, and interpretive discourse analysis to the quantitative measurement of identity. As Guest Editor of the symposium, Edward Schatz both introduces the pieces and offers a concluding synthesis that unpacks a set of conceptual ambiguities and tensions with which positivist-interpretivist bridge-building efforts must grapple.",
  "DOI": "10.5281/zenodo.889811",
  "page": "-",
  "publisher": "Zenodo",
  "title": "Letter From The Editors"
}

vs.

https://zenodo.org/record/889811/export/csl

{
  "DOI": "10.5281/zenodo.889811", 
  "abstract": "<p>In this issue, we are delighted to present a symposium, inspired by a panel at the 2015 APSA Annual Meeting, that explores a range of innovative linkages between the ventures of interpretation and causal inference. In remarkably diverse ways, the essays in this collection propose approaches to empirical research that combine tools and logics of causal inferences with techniques and understandings typically associated with interpretive inquiry. Our contributors make the case, respectively, for joining cross-case comparison to ethnography, interpretation to process tracing, ethnographic fieldwork to social network analysis, and interpretive discourse analysis to the quantitative measurement of identity. As Guest Editor of the symposium, Edward Schatz both introduces the pieces and offers a concluding synthesis that unpacks a set of conceptual ambiguities and tensions with which positivist-interpretivist bridge-building efforts must grapple.\u00a0</p>", 
  "author": [
    {
      "family": "Buthe, Tim"
    }, 
    {
      "family": "Jacobs, Alan"
    }
  ], 
  "container_title": "Qualitative & Multi-Method Research", 
  "id": "889811", 
  "issue": "2", 
  "issued": {
    "date-parts": [
      [
        2015, 
        9, 
        30
      ]
    ]
  }, 
  "page": "2", 
  "publisher": "Zenodo", 
  "title": "Letter from the editors", 
  "type": "article-journal", 
  "volume": "13"
}

Seems like the former is preprint data that was never updated? Is that common?

I was looking because I was curious how complete CSL-JSON data is compared to other formats. In this case that's not relevant, since they're both CSL-JSON, but are there fields we might care about that don't exist in CSL-JSON? (I didn't figure out any other formats that worked via CN for that DOI anyway, so perhaps irrelevant…)

zuphilip · Feb 26, 2018

Seems like the former is preprint data that was never updated?

I don't think this is what you see here. I guess that the difference comes from the DataCite metadata schema which misses some details which are commons for journal articles, e.g. no journal title, volume, issue, pages.

If you start with less information to create a CSL-JSON at DataCite than it understandable that this results in a different CSL-JSON than when you start with more information at Zenodo.

adam3smith · Feb 26, 2018

yes, @zuphilip has that exactly right -- this is caused by back and forth translation to/from an unsuitable schema. I've been talking to DataCite and they're very likely going to update their schema to better accommodate articles in the future.

adam3smith · Feb 26, 2018

You can see the full entry into the datacite catalog using curl -LH "Accept: application/vnd.datacite.datacite+json" https://doi.org/10.5281/zenodo.889811

  "id": "https://doi.org/10.5281/zenodo.889811",
  "doi": "10.5281/zenodo.889811",
  "creator": [
    {
      "type": "Person",
      "name": "Tim Buthe",
      "givenName": "Tim",
      "familyName": "Buthe"
    },
    {
      "type": "Person",
      "name": "Alan Jacobs",
      "givenName": "Alan",
      "familyName": "Jacobs"
    }
  ],
  "title": "Letter From The Editors",
  "publisher": "Zenodo",
  "resource_type_general": "Text",
  "resource_type": "Journal article",
  "subject": [
    [
      "qualitative methods"
    ]
  ],
  "date_published": "2015-09-30",
  "date_registered": "2018-01-03T19:25:14.000Z",
  "date_updated": "2018-01-10T16:45:14.000Z",
  "publication_year": 2015,
  "alternate_identifier": {
    "type": "url",
    "name": "https://zenodo.org/record/10.5281/zenodo.889811"
  },
  "rights": [
    {
      "id": "https://creativecommons.org/licenses/by-nc-nd/4.0",
      "name": "Creative Commons Attribution-NonCommercial-NoDerivatives"
    },
    {
      "name": "Open Access"
    }
  ],
  "description": {
    "type": "Abstract",
    "text": "In this issue, we are delighted to present a symposium, inspired by a panel at the 2015 APSA Annual Meeting, that explores a range of innovative linkages between the ventures of interpretation and causal inference. In remarkably diverse ways, the essays in this collection propose approaches to empirical research that combine tools and logics of causal inferences with techniques and understandings typically associated with interpretive inquiry. Our contributors make the case, respectively, for joining cross-case comparison to ethnography, interpretation to process tracing, ethnographic fieldwork to social network analysis, and interpretive discourse analysis to the quantitative measurement of identity. As Guest Editor of the symposium, Edward Schatz both introduces the pieces and offers a concluding synthesis that unpacks a set of conceptual ambiguities and tensions with which positivist-interpretivist bridge-building efforts must grapple."
  },
  "schemaVersion": "http://datacite.org/schema/kernel-3",
  "provider_id": "CERN",
  "client_id": "CERN.ZENODO",
  "provider": "DataCite"
}

The obvious things we're not getting from CSL JSON are subject (tags) and license (rights)

adam3smith · Jan 24, 2019

Reviewing with @zuphilip -- we're currently thinking that to get the best result, we'll want to

Specify specific formats (CrossRef, DataCite), both of which are more expressive (and complete) than CSL JSON. We have a CrossRef UnixRef importers (one in the REST the other one in the standard CrossRef; we'll likely take the regualr CrossRef one because that's for the XML). We'd need to write one for DataCite JSON.
Then fall back to CSL JSON

This will currently work for mEDRA, Japan Link Center, and Kistic. crosscite.org suggests it should also work for ISTIC, but that's currently not the case.

Additional idea for the longer term would be to check if the EU publication office (which doesn't support CN) has a very limited set of prefixes. If that's the case, we could follow the URL and import from the landing page which has OK metadata. Not implementing this isn't a showstopper -- we currently fail on OP DOIs, too.

zuphilip · Jan 24, 2019

For the call URL we can use something like

curl -LH "Accept: application/vnd.crossref.unixref+xml, application/vnd.datacite.datacite+json, application/vnd.citationstyles.csl+json" https://doi.org/10.5281/zenodo.11813

and for differentiation of the different JSON formats, DataCite includes

 "schemaVersion": "http://datacite.org/schema/kernel-4",

which CSL JSON does not.

zuphilip · Jan 26, 2019

zuphilip reviewed Jan 26, 2019

View changes

DOI.js

zuphilip · Jan 27, 2019

zuphilip reviewed Jan 27, 2019

View changes

These are some comments about the code to clean it up. Some easier things I suggested directly in my PR adam3smith#11 to your branch. In general this looks good. This PR now depends that we first merge #1812 or merge that first into this branch. I only tested a few cases and try to do some more tests, but they work fine.

deleted.txt

DataCite.js

DOI Content Negotiations.js

zuphilip · Jan 27, 2019

DOI Content Negotiations.js

+			Z.debug(item)
+			if (item.itemType == "report") {
+				item.reportType = item.publicationTitle;
+			}


Why do we do these fixes here? I would rather suggest to do these fixes in the CSL JSON translator... If possible then these three cases would just differentiate by the different translator we choose.

zuphilip · Jan 27, 2019

DOI Content Negotiations.js

+				"url": "http://dx.doi.org/10.12763/ONA1045",
+				"DOI": "10.12763/ONA1045",
+				"date": "1734",
+				"libraryCatalog": "DataCite"


Update the test here.

You do this manually, right?

zuphilip · Jan 28, 2019

Okay, I tested with 140 Crossref DOIs and 13 Datacite DOIs found in My Library. All Datacite DOIs are working, but two Crossref DOIs failed because the title was empty (fixed in PR) and two other DOIs failed where I don't know why exactly: 10.1002/(SICI)1097-461X(1999)72:4<287::AID-QUA11>3.0.CO;2-Y, 10.1002/(SICI)1099-1360(199711)6:6<320::AID-MCDA164>3.0.CO;2-2. Overall this PR seems now quite stable. For testing the Web translator one can use the page https://gist.github.com/zuphilip/9e5101219f453964eb7e but this will be extended anyway in #1799 by @mrtcode.

zuphilip · Jan 28, 2019

zuphilip reviewed Jan 28, 2019

View changes

DOI Content Negotiations.js

mfenner · Jan 31, 2019

@zuphilip SICIs (your failing examples, e.g. 10.1002/(SICI)1097-461X(1999)72:4<287::AID-QUA11>3.0.CO;2-Y) are a pain to work and break many things. Luckily there a child of the 1990s and no new ones are generated (Crossref stopped supporting them at some point I think).

adam3smith · Feb 2, 2019

This is working nicely for me now & a clear improvement over current behavior. It's a major change, though, so would be good to get another set of eyes on this before merging.
@dstillman @mrtcode would one of you be willing to take a look?

adam3smith · Apr 2, 2019

Or do you mean to stash on the branch and then pop on master?

Yes, some version of that.

retorquere · Apr 2, 2019

When I run the current state of this DOI translator, the tests time out without completing.

adam3smith · Apr 2, 2019

You included the other translators in the translator directory? Pretty sure I tested this last night.

retorquere · Apr 2, 2019

Ah -- no, I did not.

adam3smith · Apr 2, 2019

The deletions are only relevant if you're going to test add by identifier, but you'll need all added&changed files for the DOI.js tests

retorquere · Apr 2, 2019

I think I have the PR-on-PR submitted.

adam3smith · Apr 2, 2019

Yup! I'll look more closely tonight, but generally looks great, thanks. I have one question right away -- this was a part of the code that I didn't quite understand and I think your code does something slightly different -- which may be fine, but I want to understand what we were thinking originally.

retorquere · Apr 2, 2019

I don't know -- my guess was that at some point, when there was just one, the selector wasn't shown (edit: this was indeed the case, but this was removed in 569df5d while the check stayed in place). But I didn't want to change any more than required to pacify the linter.

retorquere · Apr 3, 2019

When I go to https://www.ncbi.nlm.nih.gov/pubmed/?term=Recent+Trends+in+Advanced+Contact+Lenses.+Advanced+Healthcare+Materials and fetch using the DOI translator, I don't get the abstract, but if I follow the DOI, I end up at https://onlinelibrary.wiley.com/doi/full/10.1002/adhm.201801390, and when I fetch there, I do get an abstract. DOI Content Negotiation seems to play a role here, but can someone walk me through the rough outline of what the DOI chain does? Would it be hard to add the abstract fetching?

adam3smith · Apr 3, 2019

(...) I do get an abstract. DOI Content Negotiation seems to play a role here, but can someone walk me through the rough outline of what the DOI chain does? Would it be hard to add the abstract fetching?

The two access entirely different things: When you use the DOI translator, it requests metadata for the DOI via content negotiation (i.e. specifying the metadata format in the accept header of the request) and the metadata is then provided by Crossref (or another DOI registration agency, but mostly Crossref for articles). Very few articles on Crossref include abstracts in the metadata.

When you go straight to Wiley, you use the dedicated translator for Wiley, which uses the metadata provided by Wiley in its export functions (and may also try to scrape the abstract from the page). But of course prior to following the DOI, you don't know which publisher/site you'll land on, so you can't specify the translator. Because of that, within the current translator framework, getting the abstract isn't possible.

I think it'd be conceptually doable to follow the DOI resolution (i.e. in this case to Wiley), see if we have a translator for that page, and if so, import using that translator (and otherwise fall back to using DOI metadata), but whether and how to do that is a more involved discussion, probably better for zotero-dev. Among its downsides is that it's much slower since you're making multiple calls that require full pageloads now rather than a lightweight call to an API.

adam3smith · Apr 4, 2019

OK, the only linting error remaining are the no-loop-func ones in DOI: @retorquere removed the outer IIFE, but there are callback function on .setHandler that I think we need.
@dstillman OK to add exceptions for those? See https://github.com/adam3smith/translators/blob/DOI/DOI.js#L144

retorquere · Apr 4, 2019

It's possible to lift the handlers out of the loop but I feel it'd be less readable. The other possibility I thought might work was using Promise.all, but I've tested and I couldn't get it to work.

adam3smith · Apr 4, 2019

Yeah, agree on not moving the handlers out of the loop, that makes no sense conceptually.

dstillman · Apr 4, 2019

OK to add exceptions for those?

Yeah, that's fine.

adam3smith · Apr 5, 2019

Alright! I think this is ready to merge. Will do so tomorrow PM until I hear any objections. Thanks everyone for help & advice !

retorquere · Apr 5, 2019

Just as a general point of interest -- can someone tell my why this promisified version doesn't work? The DOI scraping works of course, but in Scaffold I get TranslatorTester: DOI Test 1: failed (Translation failed: no items returned) even if I can see the then part is ran just after:

20:12:35 TranslatorTester: Running 1 test for DOI
20:12:35 TranslatorTester: Running DOI Test 1
20:12:37 TranslatorTester: Translating http://blog.apastyle.org/apastyle/digital-object-identifier-doi/
20:12:37 [
             "0": "10.1037/arc0000014"
             "1": "10.1037/rmh0000008"
             "2": "10.1037/a0028240"
         ]
20:12:38 Translation successful
20:12:38 TranslatorTester: DOI Test 1: failed (Translation failed: no items returned)
20:12:38 got 3 DOIs

dstillman · Apr 5, 2019

Web translators don't properly support promises yet. We'll have proper async/await support in the near future.

dstillman · Apr 5, 2019

Related: zotero/zotero#1609

adam3smith · Apr 5, 2019

sorry, I think I said something to the contrary above, having remembered Dan's work on import translators.

adam3smith · Apr 7, 2019

Alright, here we go! Thanks everyone!

Some cleanup
This now works decently well

3c91d53

adam3smith referenced this pull request Apr 12, 2018
Merged
CrossRef REST #1618

Merge branch 'master' into DOI

Verified

This commit was created on GitHub.com and signed with a verified signature using GitHub’s key.

GPG key ID: 4AEE18F83AFDEB23 Learn about signing commits

8dd8d6a

zuphilip referenced this pull request Jan 26, 2019
Merged
Add new Datacite JSON translator #1812

adam3smith and others added some commits Jan 26, 2019

Actually add license
Delete Datacite & Crossref

Loading status checks…

8c5e682

Update and simplify

3d331e4

Handle cases where title is empty
E.g. 10.1023/A:1005640604267 or 10.4086/cjtcs.2012.002

88e1877

Merge pull request #11 from zuphilip/DOI
Update and simplify

Verified

This commit was created on GitHub.com and signed with a verified signature using GitHub’s key.

GPG key ID: 4AEE18F83AFDEB23 Learn about signing commits

Loading status checks…

d8d0e01

Address review comments
Update tests

Loading status checks…

a3cfc13

retorquere added some commits Apr 2, 2019

treeWalker is substantially faster than xpath

30e8e17

address lint reports

9e87c5e

remove multi-provider handling

4ed54fa

locally disable no-loop-func

Loading status checks…

fa24f6f

adam3smith merged commit ed48cb5 into zotero:master Apr 7, 2019
1 check passed

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

zotero/translators

Join GitHub today

[don't merge] Switch DOI to DOI.org #1135

Conversation

adam3smith commented Sep 4, 2016

This comment has been minimized.

zuphilip commented Sep 4, 2016

This comment has been minimized.

adam3smith commented Sep 8, 2016

zuphilip reviewed Jun 4, 2017 View changes

This comment has been minimized.

zuphilip Jun 4, 2017

This comment has been minimized.

zuphilip commented Jan 1, 2018

This comment has been minimized.

adam3smith commented Jan 1, 2018

This comment has been minimized.

dstillman commented Feb 26, 2018

This comment has been minimized.

zuphilip commented Feb 26, 2018

This comment has been minimized.

adam3smith commented Feb 26, 2018

This comment has been minimized.

adam3smith commented Feb 26, 2018

adam3smith referenced this pull request Apr 12, 2018

CrossRef REST #1618

This comment has been minimized.

adam3smith commented Jan 24, 2019 • edited

This comment has been minimized.

zuphilip commented Jan 24, 2019

zuphilip referenced this pull request Jan 26, 2019

Add new Datacite JSON translator #1812

adam3smith added some commits Jan 26, 2019

zuphilip reviewed Jan 26, 2019 View changes

adam3smith and others added some commits Jan 26, 2019

zuphilip reviewed Jan 27, 2019 View changes

zuphilip left a comment • edited

This comment has been minimized.

zuphilip Jan 27, 2019

This comment has been minimized.

zuphilip Jan 27, 2019

This comment has been minimized.

zuphilip commented Jan 28, 2019 • edited

zuphilip reviewed Jan 28, 2019 View changes

This comment has been minimized.

mfenner commented Jan 31, 2019

This comment has been minimized.

adam3smith commented Feb 2, 2019

This comment has been minimized.

adam3smith commented Apr 2, 2019

This comment has been minimized.

retorquere commented Apr 2, 2019

This comment has been minimized.

adam3smith commented Apr 2, 2019

This comment has been minimized.

retorquere commented Apr 2, 2019

This comment has been minimized.

adam3smith commented Apr 2, 2019

retorquere added some commits Apr 2, 2019

This comment has been minimized.

retorquere commented Apr 2, 2019

This comment has been minimized.

adam3smith commented Apr 2, 2019

This comment has been minimized.

retorquere commented Apr 2, 2019 • edited

This comment has been minimized.

retorquere commented Apr 3, 2019

This comment has been minimized.

adam3smith commented Apr 3, 2019

retorquere and others added some commits Apr 3, 2019

This comment has been minimized.

adam3smith commented Apr 4, 2019

This comment has been minimized.

retorquere commented Apr 4, 2019

This comment has been minimized.

adam3smith commented Apr 4, 2019

This comment has been minimized.

dstillman commented Apr 4, 2019

This comment has been minimized.

adam3smith commented Apr 5, 2019

zuphilip reviewed Jun 4, 2017

View changes

adam3smith commented Jan 24, 2019 •

edited

zuphilip reviewed Jan 26, 2019

View changes

zuphilip reviewed Jan 27, 2019

View changes

zuphilip left a comment •

edited

zuphilip commented Jan 28, 2019 •

edited

zuphilip reviewed Jan 28, 2019

View changes

retorquere commented Apr 2, 2019 •

edited

adam3smith merged commit `ed48cb5` into zotero:master Apr 7, 2019
1 check passed