[don't merge] Switch DOI to DOI.org #1135
Conversation
Cool! I also wanted to look at this anyway at some point... At
I like DOI Content Negotiation. It is a search translator and can thereby also be used for the DOI search from Zotero itself, i.e. the "magic" wand. I don't know if this affects the discussion here.
If everything works out fine, then we should not (strictly) need the single search translators anymore. However, this feature is only available from Zotero
You can create a gist with the examples and then test the web translator (which will implicitly test the search translator), e.g. https://gist.github.com/zuphilip/9e5101219f453964eb7e . It would also be good to have more information for users when the translation is failing, i.e. the things we see at https://repo.zotero.org/errors. a) Did you test this translator with the Chrome connector already? Let me know if I can help you further.
To your questions: More generally, in doing some quality testing, I'm finding that metadata for books and book chapters in CrossRef is of much lower quality in the CSL JSON than in their UnixRef format. I have contacted them about this and will see what they say. If improvements are not in the cards soon, we can still handle all of this in a single translator using content negotiation: we can specify an order of preferred formats in the Accept header and can just put UnixRef and DataCite XML in there, then check which we get (which should be easy) and run the respective parsing code.
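A rough sketch of what that could look like in the translator, just to make the idea concrete — parseUnixref() and parseDataciteXML() are hypothetical helpers, and the way request headers are passed to ZU.doGet should be checked against the current signature:

// Sketch only: ask doi.org for several formats in order of preference and
// branch on whatever comes back.
function lookupDOI(doi) {
	var url = "https://doi.org/" + encodeURIComponent(doi);
	ZU.doGet(url, function (text) {
		// Both preferred formats are XML, so peek at the root element
		// to decide which parser to run.
		if (text.includes("<doi_records")) {
			parseUnixref(text);
		}
		else if (text.includes("<resource")) {
			parseDataciteXML(text);
		}
	}, null, null, {
		Accept: "application/vnd.crossref.unixref+xml, application/vnd.datacite.datacite+xml"
	});
}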
zuphilip reviewed on Jun 4, 2017
item.genre = item['container-title']
}
if (item.type != "article-journal" && item.type != "paper-conference") {
	item.note = "DOI: " + item.DOI;
@adam3smith Any update on this matter? Moreover, testing in 5.0 would be needed now... @adam3smith And just checking: is this change still what we want?
I definitely still want to do this, yes, and I'd be surprised if it didn't work in 5.0.
@adam3smith: Somewhat tangential, but just to understand this better, for the Zenodo example you gave in the forums, 10.5281/zenodo.889811, it seems the data from content negotiation is different from the metadata available on zenodo.org, even in the same format:
vs. https://zenodo.org/record/889811/export/csl
Seems like the former is preprint data that was never updated? Is that common? I was looking because I was curious how complete CSL-JSON data is compared to other formats. In this case that's not relevant, since they're both CSL-JSON, but are there fields we might care about that don't exist in CSL-JSON? (I didn't figure out any other formats that worked via CN for that DOI anyway, so perhaps irrelevant…)
I don't think that is what you see here. I guess the difference comes from the DataCite metadata schema, which misses some details that are common for journal articles, e.g. no journal title, volume, issue, or pages. If you start with less information to create a CSL-JSON at DataCite, then it is understandable that this results in a different CSL-JSON than when you start with more information at Zenodo.
yes, @zuphilip has that exactly right -- this is caused by back and forth translation to/from an unsuitable schema. I've been talking to DataCite and they're very likely going to update their schema to better accommodate articles in the future.
You can see the full entry into the datacite catalog using
The obvious things we're not getting from CSL JSON are subject (tags) and license (rights).
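If we do end up parsing the DataCite record directly, the mapping could look roughly like this — the subjects and rightsList field names are my assumption about the vnd.datacite.datacite+json output and should be verified:

// Sketch: copy tags and rights from a parsed DataCite JSON record onto a
// Zotero item.
function addDataciteExtras(item, data) {
	for (var subject of data.subjects || []) {
		if (subject.subject) {
			item.tags.push(subject.subject);
		}
	}
	if (data.rightsList && data.rightsList.length) {
		item.rights = data.rightsList
			.map(function (entry) { return entry.rights || entry.rightsUri; })
			.filter(Boolean)
			.join(", ");
	}
}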
Reviewing with @zuphilip -- we're currently thinking that to get the best result, we'll want to
This will currently work for mEDRA, Japan Link Center, and KISTI. crosscite.org suggests it should also work for ISTIC, but that's currently not the case. An additional idea for the longer term would be to check whether the EU publications office (which doesn't support CN) has a very limited set of prefixes. If that's the case, we could follow the URL and import from the landing page, which has OK metadata. Not implementing this isn't a showstopper -- we currently fail on OP DOIs, too.
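One possible shape for that prefix-based fallback, sketched only — the prefix values below are placeholders, not a verified list of OP prefixes:

// Sketch: DOIs whose registration agency does not support content negotiation
// could be routed to landing-page import instead. Placeholder prefixes only.
var LANDING_PAGE_PREFIXES = ["10.2759", "10.2760"];

function needsLandingPageFallback(doi) {
	var prefix = doi.split("/")[0];
	return LANDING_PAGE_PREFIXES.includes(prefix);
}

// Usage idea: if (needsLandingPageFallback(doi)) follow "https://doi.org/" + doi
// and hand the resolved page to the web translation framework.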
For the call URL we can use something like
and for differentiation of the different JSON formats, DataCite includes
which CSL JSON does not. |
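Whatever the distinguishing field is, the check itself can stay small; purely as an illustration, keying on DataCite's creators array versus CSL JSON's author field (which may or may not be the field meant above):

// Sketch: guess which JSON dialect came back from content negotiation.
// "creators" (DataCite) vs. "author" (CSL JSON) is an illustrative choice,
// not necessarily the field referred to in the comment above.
function looksLikeDataciteJSON(json) {
	return Array.isArray(json.creators) && !json.author;
}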
adam3smith added some commits on Jan 26, 2019
zuphilip reviewed on Jan 26, 2019
adam3smith and others added some commits on Jan 26, 2019
zuphilip reviewed on Jan 27, 2019
These are some comments about the code to clean it up. Some easier things I suggested directly in my PR adam3smith#11 to your branch. In general this looks good. This PR now depends on merging #1812 first, or on merging that into this branch first. I have only tested a few cases so far and will try to do some more, but those work fine.
deleted.txt Outdated
DOI Content Negotiations.js Outdated
Z.debug(item)
if (item.itemType == "report") {
	item.reportType = item.publicationTitle;
}
zuphilip (Collaborator), Jan 27, 2019
Why do we do these fixes here? I would rather suggest doing these fixes in the CSL JSON translator... If possible, these three cases would then just be differentiated by the different translator we choose.
"url": "http://dx.doi.org/10.12763/ONA1045", | ||
"DOI": "10.12763/ONA1045", | ||
"date": "1734", | ||
"libraryCatalog": "DataCite" |
Okay, I tested with 140 Crossref DOIs and 13 DataCite DOIs found in My Library. All DataCite DOIs are working, but two Crossref DOIs failed because the title was empty (fixed in PR), and two other DOIs failed where I don't know exactly why:
@zuphilip SICIs (your failing examples, e.g. 10.1002/(SICI)1097-461X(1999)72:4<287::AID-QUA11>3.0.CO;2-Y) are a pain to work with and break many things. Luckily they're a child of the 1990s and no new ones are generated (Crossref stopped supporting them at some point, I think).
This is working nicely for me now & a clear improvement over current behavior. It's a major change, though, so it would be good to get another set of eyes on this before merging.
zuphilip reviewed on Feb 3, 2019
if(typeof item[field] != 'string') continue;
//check for control characters that should never be in strings from CrossRef
if(/[\u007F-\u009F]/.test(item[field])) {
	item[field] = decodeURIComponent(escape(item[field]));
zuphilip (Collaborator), Feb 3, 2019
Can we do the same here without the escape function, which one shouldn't use anymore?
adam3smith (Author, Collaborator), Feb 4, 2019
Not easily. Neither JSON.parse (SO's top answer) nor encodeURIComponent (which is almost the same as escape) works exactly the same way (and neither works for the case referenced here), so we'd have to build a function for this specific purpose. Since escape is going nowhere -- it's technically not even deprecated -- I'd be inclined to just leave it.
zuphilip (Collaborator), Feb 4, 2019
I played around a little and it seems that the following code might work:
var escaped = item[field].replace(/[^0-9A-Za-z ]/g, function(c) {
	return "%" + c.charCodeAt(0).toString(16);
});
item[field] = decodeURIComponent(escaped);
However, I don't understand what we do here conceptually. Escaping and then decoding again should more or less give the same string as we had at the beginning...
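For what it's worth, my understanding of the existing pattern is that it only changes anything when the string's code units are really UTF-8 bytes that were mis-read as Latin-1, e.g.:

// escape() percent-encodes each code unit ("Ã©" becomes "%C3%A9"), and
// decodeURIComponent() then reads those bytes back as UTF-8, giving "é".
// Plain ASCII passes through unchanged; genuinely non-ASCII text can throw,
// which is presumably why the control-character test guards the call.
var garbled = "Ã©"; // the UTF-8 bytes of "é" mis-decoded as Latin-1
var repaired = decodeURIComponent(escape(garbled)); // "é"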
Note to self -- we still need to adjust priorities; no need to keep content-negotiation at 80.
(This should be "DOI Content Negotiation.js" (no "s"), right?)
We should address https://forums.zotero.org/discussion/54197/aps-articles-missing-page-number-in-add-by-doi#latest before merging.
adam3smith commented on Sep 4, 2016
@zuphilip -- this obviously needs cleanup, but I wanted to get your thoughts on this. It's taking advantage of the now available CSL JSON via content negotiation on doi.org (and the functionality you added to ZU.doGet to use it). The main advantages are
Open questions:
as a fallback at least initially?
manually go through all DOIs that are in the CrossRef translator. Any other ideas?
As well as any other conceptual issues you can think of (as I say, don't worry about code quality for now).