Update CiteSeer to work from PDF paper view #1535

GuyAglionby · Jan 28, 2018

As the title says, as the DOI is available in the PDF url we can get the URL for the summary page and grab the information from there.

Thanks and any comments appreciated.

zuphilip · Jan 28, 2018

zuphilip requested changes Jan 28, 2018

View changes

Can you add a test case for this as well? Please have a look also at my other two inline comments, just two nits.

zuphilip · Jan 28, 2018

CiteSeer.js

@@ -40,7 +40,8 @@ function detectWeb(doc, url) {
 	if ((url.indexOf('/search') != -1 || url.indexOf('/showciting') != -1) && getSearchResults(doc).length) {
 		return "multiple";
 	}
-	if (url.indexOf('/viewdoc/') != -1 && doc.getElementById('bibtex')) {
+	if ((url.indexOf('/viewdoc/') != -1 && doc.getElementById('bibtex'))
+		|| url.indexOf('/download?doi=') != -1) {


We prefer nowadays the function .includes() for such tests. Can you change these two (and possibly also the ones in the line above)?

zuphilip · Jan 28, 2018

CiteSeer.js

@@ -66,6 +67,11 @@ function doWeb(doc, url) {
 			}
 			ZU.processDocuments(articles, scrape);
 		});
+	} else if (url.indexOf('/download?doi=') !== -1) {
+		let doi = url.replace('http://citeseerx.ist.psu.edu/viewdoc/download?doi=', '');
+		doi = doi.replace('&rep=rep1', '').replace('&type=pdf', '');


This looks a little fragile. Thus, I would prefer to work here with a regular expression, i.e. something like:

let doi = url.match(/\/download\?doi=([^&]*)/); let paperUrl = 'http://citeseerx.ist.psu.edu/viewdoc/summary?doi=' + doi[1]; ZU.processDocuments(paperUrl, scrape);

(Note, this code is not tested.)

GuyAglionby · Jan 28, 2018

Thanks for the comments! I added a test with the correct info, however when I run it I get an error:

TranslatorTester: Translating http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.415.9750&rep=rep1&type=pdf

TranslatorTester: CiteSeer Test 1: failed (Translation failed to initialize: TypeError: doc is null)

I assume this is because it's a direct link to a PDF. I looked at the translator for arXiv.org and it has a PDF test case that also fails with the same error. I'm on Zotero 5.0.34 and Scaffold 3.3.0. Let me know if you want me to push the test case anyway.

I have made the other changes as requested and tested them :)

zuphilip · Jan 28, 2018

Thank you! You are right, Scaffold tries to open the pdf directly and therefore a detection is not possible. Thus, I mentioned the test case in the comment and add some other cleanup and pushed directly to your branch.

Update CiteSeer to work from PDF paper view

Loading status checks…

c27de8b

Small changes to CiteSeer

Loading status checks…

78e8c93

GuyAglionby referenced this pull request Jan 28, 2018
Closed
Add new Google Research translator #1503

Update CiteSeer.js

Loading status checks…

e881b6c

zuphilip merged commit ba56c70 into zotero:master Jan 28, 2018
1 check passed

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

psisquared2 added a commit to psisquared2/translators that referenced this pull request Feb 8, 2018

Update CiteSeer to work from PDF paper view (zotero#1535)
Also hardcode detection of one test case and small cleanup work.

58ba007

zotero/translators

Update CiteSeer to work from PDF paper view #1535

Update CiteSeer to work from PDF paper view #1535

GuyAglionby commented Jan 28, 2018

zuphilip requested changes Jan 28, 2018

View changes

zuphilip left a comment

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

GuyAglionby commented Jan 28, 2018

GuyAglionby referenced this pull request Jan 28, 2018

Add new Google Research translator #1503

This comment has been minimized.

zuphilip commented Jan 28, 2018

zuphilip merged commit `ba56c70` into zotero:master Jan 28, 2018
1 check passed

1 check passed

psisquared2 added a commit to psisquared2/translators that referenced this pull request Feb 8, 2018

zotero/translators

Join GitHub today

Update CiteSeer to work from PDF paper view #1535

Conversation

GuyAglionby commented Jan 28, 2018

zuphilip requested changes Jan 28, 2018 View changes

zuphilip left a comment

This comment has been minimized.

zuphilip Jan 28, 2018

This comment has been minimized.

zuphilip Jan 28, 2018

This comment has been minimized.

GuyAglionby commented Jan 28, 2018

GuyAglionby referenced this pull request Jan 28, 2018

Add new Google Research translator #1503

This comment has been minimized.

zuphilip commented Jan 28, 2018

Hide details View details zuphilip merged commit ba56c70 into zotero:master Jan 28, 2018 1 check passed

1 check passed

psisquared2 added a commit to psisquared2/translators that referenced this pull request Feb 8, 2018

zuphilip requested changes Jan 28, 2018

View changes

zuphilip merged commit `ba56c70` into zotero:master Jan 28, 2018
1 check passed