Join GitHub today
GitHub is home to over 36 million developers working together to host and review code, manage projects, and build software together.
Sign upUpdate CiteSeer to work from PDF paper view #1535
Conversation
zuphilip
requested changes
Jan 28, 2018
Can you add a test case for this as well? Please have a look also at my other two inline comments, just two nits. |
@@ -40,7 +40,8 @@ function detectWeb(doc, url) { | |||
if ((url.indexOf('/search') != -1 || url.indexOf('/showciting') != -1) && getSearchResults(doc).length) { | |||
return "multiple"; | |||
} | |||
if (url.indexOf('/viewdoc/') != -1 && doc.getElementById('bibtex')) { | |||
if ((url.indexOf('/viewdoc/') != -1 && doc.getElementById('bibtex')) | |||
|| url.indexOf('/download?doi=') != -1) { |
This comment has been minimized.
This comment has been minimized.
zuphilip
Jan 28, 2018
Collaborator
We prefer nowadays the function .includes()
for such tests. Can you change these two (and possibly also the ones in the line above)?
@@ -66,6 +67,11 @@ function doWeb(doc, url) { | |||
} | |||
ZU.processDocuments(articles, scrape); | |||
}); | |||
} else if (url.indexOf('/download?doi=') !== -1) { | |||
let doi = url.replace('http://citeseerx.ist.psu.edu/viewdoc/download?doi=', ''); | |||
doi = doi.replace('&rep=rep1', '').replace('&type=pdf', ''); |
This comment has been minimized.
This comment has been minimized.
zuphilip
Jan 28, 2018
Collaborator
This looks a little fragile. Thus, I would prefer to work here with a regular expression, i.e. something like:
let doi = url.match(/\/download\?doi=([^&]*)/);
let paperUrl = 'http://citeseerx.ist.psu.edu/viewdoc/summary?doi=' + doi[1];
ZU.processDocuments(paperUrl, scrape);
(Note, this code is not tested.)
This comment has been minimized.
This comment has been minimized.
Thanks for the comments! I added a test with the correct info, however when I run it I get an error:
I assume this is because it's a direct link to a PDF. I looked at the translator for arXiv.org and it has a PDF test case that also fails with the same error. I'm on Zotero 5.0.34 and Scaffold 3.3.0. Let me know if you want me to push the test case anyway. I have made the other changes as requested and tested them :) |
GuyAglionby
referenced this pull request
Jan 28, 2018
Closed
Add new Google Research translator #1503
This comment has been minimized.
This comment has been minimized.
Thank you! You are right, Scaffold tries to open the pdf directly and therefore a detection is not possible. Thus, I mentioned the test case in the comment and add some other cleanup and pushed directly to your branch. |
GuyAglionby commentedJan 28, 2018
As the title says, as the DOI is available in the PDF url we can get the URL for the summary page and grab the information from there.
Thanks and any comments appreciated.