Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix itemType detection and add ISBN to Internet Archive metadata #2137

Open
wants to merge 3 commits into
base: master
from

Conversation

@mvolz
Copy link
Contributor

mvolz commented Mar 6, 2020

Books on IA have ISBNs but this info isn't being
scraped. Add support to add this info.

Add book to tests with ISBN (Example URL)
https://archive.org/details/darktowersdeutsc0000enri/mode/2up

Books on IA have ISBNs but this info isn't being
scraped. Add support to add this info.

Add book to tests with ISBN (Example URL)
https://archive.org/details/darktowersdeutsc0000enri/mode/2up
@mvolz
Copy link
Contributor Author

mvolz commented Mar 6, 2020

This work but for some reason the 10 digit isbns aren't making it into the final citation... do those get cleaned out or something upstream?

@adam3smith
Copy link
Collaborator

adam3smith commented Mar 6, 2020

which citation output (how generated) are you referring to?
Zotero does have a clean.ISBN function that might discard the second ISBN, but I'm not sure where it would be run automatically (as opposed to being called explicitly in the translator)

@mvolz
Copy link
Contributor Author

mvolz commented Mar 6, 2020

which citation output (how generated) are you referring to?
Zotero does have a clean.ISBN function that might discard the second ISBN, but I'm not sure where it would be run automatically (as opposed to being called explicitly in the translator)

The test case I added, https://archive.org/details/darktowersdeutsc0000enri/mode/2up

When I do ZU.debug(newItem.ISBN) the field has 4 isbns, but when I use the translator file and curl it the isbn 10s are missing. I'm not too bothered by that if you aren't but I was wondering if that was a symptom of something going wrong!

curl -d 'https://archive.org/details/darktowersdeutsc0000enri/mode/2up' -H 'Content-Type: text/plain' http://127.0.0.1:1969/web

[{"key":"9S6E5TKU","version":0,"itemType":"book","creators":[{"firstName":"David","lastName":"Enrich","creatorType":"author"},{"name":"Internet Archive","creatorType":"contributor"}],"tags":[{"tag":"Trump, Donald, 1946-","type":1}],"title":"Dark towers : Deutsche Bank, Donald Trump, and an epic trail of destruction","abstractNote":"x, 402 pages ; 24 cm; \"A searing exposé by an award-winning journalist of the most scandalous bank in the world, including its shadowy ties to Donald Trump's business empire\"--; Includes bibliographical references (pages 367-390) and index; \"A searing expose by an award-winning journalist of the most scandalous bank in the world, including its shadowy ties to Donald Trump's business empire\"--","date":"2020","publisher":"New York, NY : Custom House","language":"eng","numPages":"426","ISBN":"9780062878816 9780062878830","url":"http://archive.org/details/darktowersdeutsc0000enri","libraryCatalog":"Internet Archive","accessDate":"2020-03-06T13:45:37Z","shortTitle":"Dark towers"}]
@mvolz
Copy link
Contributor Author

mvolz commented Mar 6, 2020

PS I've linted a little, do you prefer it to go as additional commits in this thread, or as a separate PR?

@adam3smith
Copy link
Collaborator

adam3smith commented Mar 6, 2020

linting in the same PR, different (non-squashed, obv) commit, please. Thanks!

@adam3smith
Copy link
Collaborator

adam3smith commented Apr 15, 2020

@mvolz add your linted update when you get a chance?

@mvolz
Copy link
Contributor Author

mvolz commented Jul 16, 2020

@mvolz add your linted update when you get a chance?

I never finished it! But in the meantime it looks like IA type detection is broken, which I've added to the commit. I'll try to finish the linting too.

@mvolz mvolz changed the title Add ISBN to Internet Archive metadata Fix itemType detection and add ISBN to Internet Archive metadata Jul 16, 2020
@mvolz mvolz force-pushed the mvolz:internetarchive branch from b9e9a8a to 817a25e Jul 16, 2020
Upstream, the icon that indicates the
itemType is no longer within the h1 tag,
so detectWeb was broken. This fixes it
so the xpath is now able to select the div
containing the icon and use this to
determine item type.
@mvolz mvolz force-pushed the mvolz:internetarchive branch from 817a25e to 671e338 Jul 16, 2020
@mvolz
Copy link
Contributor Author

mvolz commented Jul 16, 2020

There are now only two lines left to lint:
39:10 error Expected to return a value at the end of function 'detectWeb' consistent-return
225:52 error Expected to return a value at the end of function consistent-return

I actually wasn't sure how to handle those because I'm not familiar enough with the code base >.<

@mvolz mvolz force-pushed the mvolz:internetarchive branch from e852221 to 64f618d Jul 16, 2020
@adam3smith
Copy link
Collaborator

adam3smith commented Jul 16, 2020

Thanks! -- I fixed those two things.
Could you

  1. make sure the tests are updated (Zotero now runs those in CI and it looks like there are some differences, but those may also been due to different locations)
  2. pull my two commits and then rebase&squash this into 2 commits, one substantive the other one all the linting changes?
Fix problems identified by eslint
@mvolz mvolz force-pushed the mvolz:internetarchive branch from 17fcb52 to 36e3515 Jul 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants
You can’t perform that action at this time.