Join GitHub today
GitHub is home to over 40 million developers working together to host and review code, manage projects, and build software together.
Sign upSAGE Journals - Resolved tag issue + detection issues #2077
Conversation
It looks like the tags are grouped. If you only take the first group, that should work nicely. |
@@ -2,14 +2,14 @@ | |||
"translatorID": "908c1ca2-59b6-4ad8-b026-709b7b927bda", | |||
"label": "SAGE Journals", | |||
"creator": "Sebastian Karcher", | |||
"target": "^https?://journals\\.sagepub\\.com(/doi/((abs|full)/)?10\\.|/action/doSearch\\?|/toc/)", | |||
"target": "^https?://journals\\.sagepub\\.com(/doi/((abs|full|pdf)/)?10\\.|/action/doSearch\\?|/toc/)", |
This comment has been minimized.
This comment has been minimized.
adam3smith
Dec 7, 2019
Collaborator
Did you test this on a PDF page? Given that the filename for the post request (l. 91) is scraped from the page, I don't see how it would work.
This comment has been minimized.
This comment has been minimized.
placardo
Dec 7, 2019
Author
Contributor
I added the pdf because a lot of webpages besides doi/abs and doi/full have doi/pdf in their URL, see eg the search results included in the translator tests (https://journals.sagepub.com/action/doSearch?text1=test)
On Firefox when I'm on the actual pdf page the PDF translator is activated instead of SAGE Journals translator and on Chrome the SAGE Journal translator is sometimes activated and yet I can still import the item (eg here : https://journals.sagepub.com/doi/pdf/10.1177/003754979005400307?casa_token=8AElK4mX07EAAAAA:WxK8lh-rsHkjRwcM1Z8EldOJF-eKIOC3nCaL0gCLxHTQOohZcnxb9ykF8F7KJSboIuwQsisPgNl8cQ I think that's when I previously visited the webpage linking to the pdf), sometimes only the PDF translator shows up (eg here which goes directly to pdf without a webpage: https://journals.sagepub.com/doi/pdf/10.1177/003754979005400307)
This comment has been minimized.
This comment has been minimized.
adam3smith
Dec 8, 2019
Collaborator
Good catch on these turning up in search results, but that actually means we need to replace /doi/pdf
in search results with /doi/abs
to avoid import errors. Let me know if that makes sense and you can add it.
I don't have an explanation for why you'd get a working import from Chrome in the case you have -- there's really no way this should work that I can think of. The fact that it's always showing the PDF translator for Firefox is due to a restriction of the extension framework there, but that's obviously not a reason to include PDF URLs. With things working correctly, I'd expect false positives there so please rever that part both in the target and the detect function.
This comment has been minimized.
This comment has been minimized.
placardo
Dec 8, 2019
•
Author
Contributor
Actually this works without the filename argument in the POST request, if you remove it you still have a valid and complete response I guess the DOI is enough but I wasn't able to find any documentation about which arguments are necessary.
Should I really remove it from detect and target? I don't mean to squabble but it would defeat the fix as loads of pages have pdf in their URL but do not lead directly to a pdf because of the paywall and the translator wouldn't be triggered on those pages
This comment has been minimized.
This comment has been minimized.
adam3smith
Dec 10, 2019
Collaborator
You're right, the pdf should stay in the taget (and please do squabble if you think any of my or zuphilip's comments are wrong). I thought it redirected on paywalled pages, but it doesn't.
I've removed the unneeded post parameter (thanks for that) and redirected multiple imports to the abstract page (so we can get language, improved abstract, and where they exist, keywords).
If you could please update tests (since they now get keywords), this is ready to merge.
This comment has been minimized.
This comment has been minimized.
adam3smith
Dec 10, 2019
Collaborator
Oh, and please fix linting errors and warnings. Should be quick.
This comment has been minimized.
This comment has been minimized.
placardo
Dec 10, 2019
Author
Contributor
I added the tests manually since it does not work in Scaffold (cf above) and fixed the errors and warnings
This comment has been minimized.
This comment has been minimized.
adam3smith
Dec 10, 2019
Collaborator
oh, sorry! I could have done it then. I don't have that problem, so not sure what's going on with that.
This comment has been minimized.
This comment has been minimized.
placardo
Dec 10, 2019
Author
Contributor
np glad to help! I will look into it I'm sure there is a reasonable explanation
This comment has been minimized.
This comment has been minimized.
Thanks! |
placardo commentedDec 6, 2019
Fix #1789 and fix #2075. This works in my browser but it doesn't work in Scaffold; when the scrape function evaluate url.match() line 85 it throws an error, because in Scaffold the URL is https://journals.sagepub.com/action/**cookieAbsent** instead of the DOI but it seems to work otherwise.
From issue #1789 :
Does some Zotero Utility or something else exists to select only one language?