Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix recognition of articles and getting PDFs in Nature #1766

Merged
merged 2 commits into from Oct 13, 2018

Conversation

Projects
None yet
2 participants
@adam3smith
Copy link
Collaborator

adam3smith commented Oct 12, 2018

(this was reported on the Zotero forums, but can't find it (?)

@adam3smith adam3smith requested a review from zuphilip Oct 12, 2018

@zuphilip

This comment has been minimized.

Copy link
Collaborator

zuphilip commented Oct 12, 2018

I guess that it fixes #1763 and maybe this is the report you remembered?

@zuphilip
Copy link
Collaborator

zuphilip left a comment

Okay, looks fine. Some small comments and suggestions. The most fragile thing is IMO the order of the attribute values of data-track-action which we are depending on several places. However, we can hope that the order is fixed in their sites (at least I would assume that, as long as we don't hear anything else).

if (!item.attachments) {
var hasPDF = false;
for (let attach of item.attachments){
if (attach.title.includes("PDF")) {

This comment has been minimized.

Copy link
@zuphilip

zuphilip Oct 12, 2018

Collaborator

Let us filter on the mimeType == "application/pdf" instead.

}
}
if (!hasPDF) {
item.attachments=[];

This comment has been minimized.

Copy link
@zuphilip

zuphilip Oct 12, 2018

Collaborator

That line seems negligible, because its value is overwritten by the next line.

var m = url.match(/(^[^#?]+\/)(?:full|abs)(\/[^#?]+?\.)[a-zA-Z]+(?=$|\?|#)/);
if (m && m.length) return m[1] + 'pdf' + m[2] + 'pdf';
else if (attr(doc, 'a[data-track-action="download pdf"]', 'href')) {
return attr(doc, 'a[data-track-action="download pdf"]', 'href');

This comment has been minimized.

Copy link
@zuphilip

zuphilip Oct 12, 2018

Collaborator

Is the check before here needed? attr should return null if the element is not found, which should be okay here.

@@ -354,7 +361,7 @@ function scrapeRIS(doc, url, next) {
if (!risURL) risURL = doc.evaluate('//li[@class="download-citation"]/a', doc, null, XPathResult.ANY_TYPE, null).iterateNext();
if (!risURL) risURL = doc.evaluate('//a[normalize-space(text())="Export citation" and not(@href="#")]', doc, null, XPathResult.ANY_TYPE, null).iterateNext();
if (!risURL) risURL = ZU.xpath(doc, '//ul[@data-component="article-info-list"]//a[@data-track-source="citation-download"]')[0];

if (!risURL) risURL = doc.querySelectorAll('a[data-track-action="download article citation"]')[0];

This comment has been minimized.

Copy link
@zuphilip

zuphilip Oct 12, 2018

Collaborator

For the first element you can also use .querySelector cf. https://developer.mozilla.org/de/docs/Web/API/Document/querySelector

@adam3smith

This comment has been minimized.

Copy link
Collaborator Author

adam3smith commented Oct 12, 2018

OK, all done

@zuphilip zuphilip merged commit 4381a20 into zotero:master Oct 13, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@zuphilip

This comment has been minimized.

Copy link
Collaborator

zuphilip commented Oct 13, 2018

Thank you very much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.