New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new Google Research translator #1503

Closed
wants to merge 2 commits into
base: master
from

Conversation

Projects
None yet
4 participants
@GuyAglionby
Contributor

GuyAglionby commented Dec 23, 2017

There was some embedded metadata but it didn't capture everything + it didn't capture the PDF, so added a translator :)

@adam3smith

This comment has been minimized.

Show comment
Hide comment
@adam3smith

adam3smith Dec 27, 2017

Collaborator

did you close this on purpose?

Collaborator

adam3smith commented Dec 27, 2017

did you close this on purpose?

@GuyAglionby

This comment has been minimized.

Show comment
Hide comment
@GuyAglionby

GuyAglionby Dec 27, 2017

Contributor

Oh -- no, sorry about that. Reopened.

I will commit changes later on to take into account the feedback on the other commit

Contributor

GuyAglionby commented Dec 27, 2017

Oh -- no, sorry about that. Reopened.

I will commit changes later on to take into account the feedback on the other commit

@GuyAglionby GuyAglionby reopened this Dec 27, 2017

@adam3smith

This comment has been minimized.

Show comment
Hide comment
@adam3smith

adam3smith Dec 27, 2017

Collaborator

OK, will wait with the review then, thanks!

Collaborator

adam3smith commented Dec 27, 2017

OK, will wait with the review then, thanks!

@zuphilip zuphilip added the waiting label Jan 1, 2018

@GuyAglionby

This comment has been minimized.

Show comment
Hide comment
@GuyAglionby

GuyAglionby Jan 28, 2018

Contributor

Hi, sorry for the delay but I've now rewritten the translator so it can handle author pages, search results, and the pages itself. I apologise that this is quite a big PR.

Just a weird thing I'd like to explain: some of the search results are direct links to the paper's PDF in the format https://research.google.com/pubs/archive/#####.pdf . This ID (the #####) can be used to find the publication's info page containing all the necessary info to create a Zotero item, which is what I have done. However, in some cases this page doesn't exist, so the extra GET checks that.

I'm not happy about how this is done on line 140 -- ideally we'd use responseObj.status, however when I try this I get a Permission Denied error, which I understand is CSRF protection. I don't know if there's any way to get around this, and would appreciate any pointers :)

Any other comments welcomed, thanks in advance!

Contributor

GuyAglionby commented Jan 28, 2018

Hi, sorry for the delay but I've now rewritten the translator so it can handle author pages, search results, and the pages itself. I apologise that this is quite a big PR.

Just a weird thing I'd like to explain: some of the search results are direct links to the paper's PDF in the format https://research.google.com/pubs/archive/#####.pdf . This ID (the #####) can be used to find the publication's info page containing all the necessary info to create a Zotero item, which is what I have done. However, in some cases this page doesn't exist, so the extra GET checks that.

I'm not happy about how this is done on line 140 -- ideally we'd use responseObj.status, however when I try this I get a Permission Denied error, which I understand is CSRF protection. I don't know if there's any way to get around this, and would appreciate any pointers :)

Any other comments welcomed, thanks in advance!

@dstillman

This comment has been minimized.

Show comment
Hide comment
@dstillman

dstillman Jan 28, 2018

Member

However, in some cases this page doesn't exist, so the extra GET checks that.

Can you provide an example of a page where that happens?

Member

dstillman commented Jan 28, 2018

However, in some cases this page doesn't exist, so the extra GET checks that.

Can you provide an example of a page where that happens?

@GuyAglionby

This comment has been minimized.

Show comment
Hide comment
@GuyAglionby
Contributor

GuyAglionby commented Jan 28, 2018

@adam3smith

I'm happy with this, just 2 small nits. @dstillman any objections to the handling of the 404s?

Show outdated Hide outdated Google Research.js
]
},
{
"type": "web",

This comment has been minimized.

@adam3smith

adam3smith Jan 28, 2018

Collaborator

could you also add a test for search results (or am I just overlooking that)?

@adam3smith

adam3smith Jan 28, 2018

Collaborator

could you also add a test for search results (or am I just overlooking that)?

@zuphilip zuphilip removed the waiting label Jan 28, 2018

@GuyAglionby

This comment has been minimized.

Show comment
Hide comment
@GuyAglionby

GuyAglionby Jan 28, 2018

Contributor

This may be a silly question, but if I add a test as follows:

{
	"type": "web",
	"url": "https://research.google.com/search.html?q=algorithm",
	"defer": true,
	"items": "multiple"
}

It fails with TranslatorTester: Google Research Test 1: failed (Translation failed: no items returned). It seems like this is because the search results are loaded by JS after the page is loaded, which is why I put the defer in there, but that doesn't seem to be actually making the test wait. I think I'm missing something?

I'm also happy to add a test case for the direct PDF link, but am having the same issue with the test failing as with the CiteSeer parser.

Happy to make the other change, will push with the test cases as and when. Thanks!

Contributor

GuyAglionby commented Jan 28, 2018

This may be a silly question, but if I add a test as follows:

{
	"type": "web",
	"url": "https://research.google.com/search.html?q=algorithm",
	"defer": true,
	"items": "multiple"
}

It fails with TranslatorTester: Google Research Test 1: failed (Translation failed: no items returned). It seems like this is because the search results are loaded by JS after the page is loaded, which is why I put the defer in there, but that doesn't seem to be actually making the test wait. I think I'm missing something?

I'm also happy to add a test case for the direct PDF link, but am having the same issue with the test failing as with the CiteSeer parser.

Happy to make the other change, will push with the test cases as and when. Thanks!

@adam3smith

This comment has been minimized.

Show comment
Hide comment
@adam3smith

adam3smith Feb 1, 2018

Collaborator

Yeah, the test framework sometimes hits limits, do what @zuphilip did for the other translator, i.e. comment out the test so that we have it ready for manual testing as needed. Once you remove the itemID, this is ready to go. Thanks!

Collaborator

adam3smith commented Feb 1, 2018

Yeah, the test framework sometimes hits limits, do what @zuphilip did for the other translator, i.e. comment out the test so that we have it ready for manual testing as needed. Once you remove the itemID, this is ready to go. Thanks!

@GuyAglionby

This comment has been minimized.

Show comment
Hide comment
@GuyAglionby

GuyAglionby Feb 1, 2018

Contributor

Amended as requested. Thanks!

Contributor

GuyAglionby commented Feb 1, 2018

Amended as requested. Thanks!

@GuyAglionby

This comment has been minimized.

Show comment
Hide comment
@GuyAglionby

GuyAglionby Feb 15, 2018

Contributor

Hey there, just wanted to double check that I wasn't missing any other requested changes before this is merged?

Contributor

GuyAglionby commented Feb 15, 2018

Hey there, just wanted to double check that I wasn't missing any other requested changes before this is merged?

@adam3smith

This comment has been minimized.

Show comment
Hide comment
@adam3smith

adam3smith Feb 15, 2018

Collaborator

I don't think so. I'm just behind with review & merge

Collaborator

adam3smith commented Feb 15, 2018

I don't think so. I'm just behind with review & merge

@GuyAglionby

This comment has been minimized.

Show comment
Hide comment
@GuyAglionby

GuyAglionby Feb 15, 2018

Contributor

No problem at all -- I just wanted to be sure. Cheers!

Contributor

GuyAglionby commented Feb 15, 2018

No problem at all -- I just wanted to be sure. Cheers!

@adam3smith

This comment has been minimized.

Show comment
Hide comment
@adam3smith

adam3smith Sep 15, 2018

Collaborator

superseded by #1741

Collaborator

adam3smith commented Sep 15, 2018

superseded by #1741

@adam3smith adam3smith closed this Sep 15, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment