Fix itemType detection and add ISBN to Internet Archive metadata #2137
Books on IA have ISBNs but this info isn't being scraped. Add support to add this info. Add book to tests with ISBN (Example URL) https://archive.org/details/darktowersdeutsc0000enri/mode/2up
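For illustration, a minimal sketch of how ISBNs might be gathered into a single field. The helper name `collectISBNs` and the shape of the `metadata` object (an `isbn` property that is either a string or an array of strings) are assumptions, not the code in this PR. It only checks the shape of each value, not the check digit.

```javascript
// Hypothetical helper: not part of the translator in this PR.
function collectISBNs(metadata) {
	// Assumption: `isbn` may be absent, a single string, or an array of strings.
	const raw = metadata.isbn === undefined ? [] : [].concat(metadata.isbn);
	const seen = new Set();
	for (const value of raw) {
		// Drop hyphens and whitespace; upper-case a trailing ISBN-10 "x".
		const cleaned = String(value).replace(/[\s-]/g, "").toUpperCase();
		// Keep only ISBN-10 (9 digits + digit or X) or ISBN-13 (13 digits) shapes.
		if (/^(\d{9}[\dX]|\d{13})$/.test(cleaned)) {
			seen.add(cleaned);
		}
	}
	// Join the de-duplicated values with spaces, in first-seen order.
	return Array.from(seen).join(" ");
}
```

Keeping both ISBN-10 and ISBN-13 values in the output makes it easier to see later whether one kind is being dropped somewhere downstream.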
This works, but for some reason the 10-digit ISBNs aren't making it into the final citation... do those get cleaned out or something upstream?
Which citation output (and how was it generated) are you referring to?
The test case I added: https://archive.org/details/darktowersdeutsc0000enri/mode/2up
When I call ZU.debug(newItem.ISBN), the field has 4 ISBNs, but when I use the translator file and curl it, the ISBN-10s are missing. I'm not too bothered by that if you aren't, but I was wondering whether it was a symptom of something going wrong!
PS: I've linted a little. Do you prefer it to go as additional commits in this thread, or as a separate PR?
Linting in the same PR, in a different (non-squashed, obviously) commit, please. Thanks!
@mvolz add your linted update when you get a chance?
I never finished it! But in the meantime it looks like IA type detection is broken, so I've added a fix for that to the commit. I'll try to finish the linting too.
Upstream, the icon that indicates the itemType is no longer within the h1 tag, so detectWeb was broken. This fixes it so the xpath is now able to select the div containing the icon and use this to determine item type.
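As a rough sketch of the idea, the item type can be derived from the icon's class attribute once that element has been selected. The function name `itemTypeFromIconClass` and the class-name fragments below are assumptions for illustration; the actual XPath and mapping in the translator may differ.

```javascript
// Hypothetical mapping from an icon's class attribute to a Zotero item type.
// The class-name fragments are assumptions, not taken from this PR.
const ICON_TYPES = [
	["texts", "book"],
	["movies", "film"],
	["audio", "audioRecording"]
];

function itemTypeFromIconClass(className) {
	// No icon found: report "no item detected", as detectWeb conventionally does.
	if (!className) return false;
	for (const [fragment, itemType] of ICON_TYPES) {
		// Match on a fragment so extra classes on the element don't matter.
		if (className.includes(fragment)) return itemType;
	}
	return false;
}
```

Keeping the mapping separate from the element lookup means that a future change in where the icon lives on the page should only require updating the selector.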
There are now only two lines left to lint: I actually wasn't sure how to handle those because I'm not familiar enough with the code base >.<
Thanks! -- I fixed those two things.
Fix problems identified by eslint
mvolz commented Mar 6, 2020
Books on IA have ISBNs but this info isn't being
scraped. Add support to add this info.
Add book to tests with ISBN (Example URL)
https://archive.org/details/darktowersdeutsc0000enri/mode/2up