Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.
Sign upAdd new Google Research translator #1503
Conversation
zuphilip
added
the
New Translator
label
Dec 25, 2017
GuyAglionby
closed this
Dec 27, 2017
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
did you close this on purpose? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
GuyAglionby
Dec 27, 2017
Contributor
Oh -- no, sorry about that. Reopened.
I will commit changes later on to take into account the feedback on the other commit
Oh -- no, sorry about that. Reopened. I will commit changes later on to take into account the feedback on the other commit |
GuyAglionby
reopened this
Dec 27, 2017
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
OK, will wait with the review then, thanks! |
zuphilip
added
the
waiting
label
Jan 1, 2018
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
GuyAglionby
Jan 28, 2018
Contributor
Hi, sorry for the delay but I've now rewritten the translator so it can handle author pages, search results, and the pages itself. I apologise that this is quite a big PR.
Just a weird thing I'd like to explain: some of the search results are direct links to the paper's PDF in the format https://research.google.com/pubs/archive/#####.pdf . This ID (the #####
) can be used to find the publication's info page containing all the necessary info to create a Zotero item, which is what I have done. However, in some cases this page doesn't exist, so the extra GET checks that.
I'm not happy about how this is done on line 140 -- ideally we'd use responseObj.status
, however when I try this I get a Permission Denied error, which I understand is CSRF protection. I don't know if there's any way to get around this, and would appreciate any pointers :)
Any other comments welcomed, thanks in advance!
Hi, sorry for the delay but I've now rewritten the translator so it can handle author pages, search results, and the pages itself. I apologise that this is quite a big PR. Just a weird thing I'd like to explain: some of the search results are direct links to the paper's PDF in the format https://research.google.com/pubs/archive/#####.pdf . This ID (the I'm not happy about how this is done on line 140 -- ideally we'd use Any other comments welcomed, thanks in advance! |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
dstillman
Jan 28, 2018
Member
However, in some cases this page doesn't exist, so the extra GET checks that.
Can you provide an example of a page where that happens?
Can you provide an example of a page where that happens? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
GuyAglionby
Jan 28, 2018
Contributor
Sorry, not sure how this slipped my mind. First result of https://research.google.com/search.html?q=simple%20risk%20bounds is https://research.google.com/pubs/archive/35621.pdf , where the corresponding https://research.google.com/pubs/pub35621.html is a 404.
Sorry, not sure how this slipped my mind. First result of https://research.google.com/search.html?q=simple%20risk%20bounds is https://research.google.com/pubs/archive/35621.pdf , where the corresponding https://research.google.com/pubs/pub35621.html is a 404. |
adam3smith
requested changes
Jan 28, 2018
I'm happy with this, just 2 small nits. @dstillman any objections to the handling of the 404s?
] | ||
}, | ||
{ | ||
"type": "web", |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
adam3smith
Jan 28, 2018
Collaborator
could you also add a test for search results (or am I just overlooking that)?
adam3smith
Jan 28, 2018
Collaborator
could you also add a test for search results (or am I just overlooking that)?
zuphilip
removed
the
waiting
label
Jan 28, 2018
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
GuyAglionby
Jan 28, 2018
Contributor
This may be a silly question, but if I add a test as follows:
{
"type": "web",
"url": "https://research.google.com/search.html?q=algorithm",
"defer": true,
"items": "multiple"
}
It fails with TranslatorTester: Google Research Test 1: failed (Translation failed: no items returned)
. It seems like this is because the search results are loaded by JS after the page is loaded, which is why I put the defer
in there, but that doesn't seem to be actually making the test wait. I think I'm missing something?
I'm also happy to add a test case for the direct PDF link, but am having the same issue with the test failing as with the CiteSeer parser.
Happy to make the other change, will push with the test cases as and when. Thanks!
This may be a silly question, but if I add a test as follows:
It fails with I'm also happy to add a test case for the direct PDF link, but am having the same issue with the test failing as with the CiteSeer parser. Happy to make the other change, will push with the test cases as and when. Thanks! |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
adam3smith
Feb 1, 2018
Collaborator
Yeah, the test framework sometimes hits limits, do what @zuphilip did for the other translator, i.e. comment out the test so that we have it ready for manual testing as needed. Once you remove the itemID, this is ready to go. Thanks!
Yeah, the test framework sometimes hits limits, do what @zuphilip did for the other translator, i.e. comment out the test so that we have it ready for manual testing as needed. Once you remove the itemID, this is ready to go. Thanks! |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
Amended as requested. Thanks! |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
GuyAglionby
Feb 15, 2018
Contributor
Hey there, just wanted to double check that I wasn't missing any other requested changes before this is merged?
Hey there, just wanted to double check that I wasn't missing any other requested changes before this is merged? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
I don't think so. I'm just behind with review & merge |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
No problem at all -- I just wanted to be sure. Cheers! |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
superseded by #1741 |
GuyAglionby commentedDec 23, 2017
There was some embedded metadata but it didn't capture everything + it didn't capture the PDF, so added a translator :)