Skip to content
Please note that GitHub no longer supports your web browser.

We recommend upgrading to the latest Google Chrome or Firefox.

Learn more
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Google Books #2052

Merged
merged 6 commits into from Dec 8, 2019

Conversation

@adam3smith
Copy link
Collaborator

adam3smith commented Nov 8, 2019

Account for new format, simplify

Google Books.js Outdated Show resolved Hide resolved
Google Books.js Outdated Show resolved Hide resolved
@adam3smith adam3smith force-pushed the adam3smith:google-books branch from 58808ed to 50b67b5 Nov 8, 2019
@zuphilip

This comment has been minimized.

Copy link
Collaborator

zuphilip commented Nov 9, 2019

I guess that this then fixes #2049.

Copy link
Collaborator

zuphilip left a comment

Looks fine in general. Just a few comments.

It is bad, that the authors in the test cases are multiplied with different name variants, but this is completely Google's fault and I don't see a way for us to fix it here.

Google Books.js Outdated Show resolved Hide resolved
Google Books.js Outdated Show resolved Hide resolved
Google Books.js Outdated Show resolved Hide resolved
var items = {};
var found = false;
// TODO: adjust the CSS selector
var rows = doc.querySelectorAll('div.bHexk>a, div.Q9MA7b>a');

This comment has been minimized.

Copy link
@zuphilip

This comment has been minimized.

Copy link
@adam3smith

adam3smith Nov 10, 2019

Author Collaborator

yeah, the first one works for all regular searches, the 2nd one for all google play ones. Will add a comment to explain. I know the class names look wonky, but I looked through quite a number of different versions & searches and it worked everywhere. The alternative would be a longer div tree without attribute, which I felt was more fragile.

This comment has been minimized.

Copy link
@zuphilip

zuphilip Nov 10, 2019

Collaborator

When I open the mentioned search page in my browser, then I don't see any divs of class bHexk. Do you? Is this page depending on the regions where you are located?

firefox_2019-11-10_12-13-51

This comment has been minimized.

Copy link
@adam3smith

adam3smith Nov 11, 2019

Author Collaborator

oh wow, yes, everything below <div class="srg"> looks completely different for me.
I think I'll go with ZU.xpath(doc, '//div[@class="srg"]//a[h3]) and then just test for length and handle the play store separately.
image

@zuphilip zuphilip changed the title Google books Google Books Nov 9, 2019
@adam3smith

This comment has been minimized.

Copy link
Collaborator Author

adam3smith commented Nov 10, 2019

Thanks -- fixed or responded. Yeah, I looked for a solution to the author issue, but they're not just in the XML data but also in the page and the bibtex Google offer for the new google books layout, so I really wouldn't know how to avoid them.

@zuphilip

This comment has been minimized.

Copy link
Collaborator

zuphilip commented Nov 17, 2019

@adam3smith Let me know when you could look into the open problem #2052 (comment) and fix it.

@zuphilip zuphilip added the waiting label Nov 17, 2019
@adam3smith

This comment has been minimized.

Copy link
Collaborator Author

adam3smith commented Nov 19, 2019

@zuphilip please take another look. Could you check if the playstore works for you / looks the same for you? I'm worried that class also changes depending on locale/user account/googles current mood

@zuphilip zuphilip removed the waiting label Nov 20, 2019
Copy link
Collaborator

zuphilip left a comment

Yes, the books on playstore work for me as well. That is fine. I have two more suggestions/comments.

"creator": "Simon Kornblith, Michael Berkowitz and Rintze Zelle",
"target": "^https?://(books|www)\\.google\\.[a-z]+(\\.[a-z]+)?/(books(/.*)?\\?(.*id=.*|.*q=.*)|search\\?.*?(btnG=Search\\+Books|tbm=bks))|^https?://play\\.google\\.[a-z]+(\\.[a-z]+)?/(store/)?(books|search\\?.+&c=books)",
"creator": "Simon Kornblith, Michael Berkowitz, Rintze Zelle, and Sebastian Karcher",
"target": "^https?://(books|www)\\.google\\.[a-z]+/(books(/.*)?\\?(.*id=.*|.*q=.*)|search\\?.*?(btnG=Search\\+Books|tbm=bks)|books/edition/)|^https?://play\\.google\\.[a-z]+(\\.[a-z]+)?/(store/)?(books|search\\?.+&c=books)",

This comment has been minimized.

Copy link
@zuphilip

zuphilip Nov 21, 2019

Collaborator

What about audiobooks, e.g. https://play.google.com/store/audiobooks/details/Arthur_Conan_Doyle_Der_Hund_von_Baskerville?id=AQAAAECM-EG43M which is also linked in the test case below?

This comment has been minimized.

Copy link
@zuphilip

zuphilip Nov 21, 2019

Collaborator

Moreover, also the following link is possible https://play.google.com/store/search?c=books&q=kling but not recognized.

@adam3smith

This comment has been minimized.

Copy link
Collaborator Author

adam3smith commented Dec 8, 2019

@zuphilip if you'd take another look?
I've added support for https://play.google.com/store/search?c=books&q=kling type URLs in the play store and removed detection/selection for audiobooks -- their metadata isn't included in the standard google book metadata API and I think it's fine to not cover them. The play store is probably a fairly marginal source here anyway

@zuphilip zuphilip merged commit 82b64f9 into zotero:master Dec 8, 2019
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@zuphilip

This comment has been minimized.

Copy link
Collaborator

zuphilip commented Dec 8, 2019

Great! Thank you @adam3smith ! This is now squashed and merged 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.