Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Galegroup #1750

Merged
merged 9 commits into from Oct 8, 2018

Conversation

Projects
None yet
2 participants
@adam3smith
Copy link
Collaborator

commented Sep 24, 2018

This works for all standard Gale databases and is greatly simplified. A
small subset of Gale looks completely different & it makes little sense
to put/keep those in the same translator:

  • NewsVault
  • Archives Unbound
  • The Times Digital Archive
  • The Times Literary Supplement Historical Archive, 1902-2013
  • Eighteenth Century Collections Online

There already is a separate translator for GaleGDC.

No tests -- while there are permalinks, those include the university username...

Fix Galegroup
This works for all standard Gale databases and is greatly simplified. A 
small subset of Gale looks completely different & it makes little sense 
to put/keep those in the same translator:
- NewsVault
- Archives Unbound
- The Times Digital Archive
- The Times Literary Supplement Historical Archive, 1902-2013
- Eighteenth Century Collections Online

There already is a separate translator for GaleGDC

@adam3smith adam3smith requested a review from zuphilip Sep 24, 2018

@adam3smith adam3smith changed the title Fix Galegroup [WIP; don't merge] Fix Galegroup Sep 24, 2018

adam3smith added some commits Sep 25, 2018

@adam3smith adam3smith changed the title [WIP; don't merge] Fix Galegroup Fix Galegroup Sep 25, 2018

@adam3smith

This comment has been minimized.

Copy link
Collaborator Author

commented Sep 25, 2018

@zuphilip This is now actually ready to check out. Two separate translators, one simple handling the unified modern databases, one adapting the old one for the legacy ones.

@zuphilip
Copy link
Collaborator

left a comment

The new translator looks great and I have only some small comments, which should be easy to fix.

}

processPage(doc, url);
}
}
/** BEGIN TEST CASES **/
}/** BEGIN TEST CASES **/

This comment has been minimized.

Copy link
@zuphilip

zuphilip Oct 7, 2018

Collaborator

Let us keep that newline before the (empty) test section. This makes it much easier to parse later...

@@ -2,20 +2,20 @@
"translatorID": "4ea89035-3dc4-4ae3-b22d-726bc0d83a64",
"label": "Galegroup",
"creator": "Sebastian Karcher and Aurimas Vinckevicius",
"target": "^https?://(find|go)\\.galegroup\\.com",
"target": "^https?://(find\\.galegroup\\.com/|go\\.galegroup\\.com/gdsc)",
"minVersion": "3.0",
"maxVersion": "",
"priority": 100,
"inRepository": true,
"translatorType": 4,
"browserSupport": "gcsib",

This comment has been minimized.

Copy link
@zuphilip

zuphilip Oct 7, 2018

Collaborator

Add all of the options here

Show resolved Hide resolved Galegroup.js
return results;
}

//Academic OneFile
//Academic ASAP
// Ecco

This comment has been minimized.

Copy link
@zuphilip

zuphilip Oct 7, 2018

Collaborator

1 space after //

@@ -196,6 +200,30 @@ function composeAttachmentTDA(doc, url) {
return composeAttachmentGNV(doc, url);
}

function composeAttachmentEcco(doc, url) {
// This is the code for a post request for the PDF, not sure if this is possible, though?

This comment has been minimized.

Copy link
@zuphilip

zuphilip Oct 7, 2018

Collaborator

Was this the question Dan answered somewhere? I would rather not have these large code block in the comments as following here.

var productName = attr(doc, 'input.citationToolsData', 'data-productname');

var documentData = '{"docId":"' + docId +'","documentUrl":"' + documentUrl + '","productName":"' + productName + '"}';
var post = "citationFormat=RIS&documentData=" +encodeURIComponent(documentData).replace(/%20/g, "+");

This comment has been minimized.

Copy link
@zuphilip

zuphilip Oct 7, 2018

Collaborator

Add space after +


function parseRis(text, attachment) {
text = text.trim();
//gale puts issue numbers in M1

This comment has been minimized.

Copy link
@zuphilip

zuphilip Oct 7, 2018

Collaborator

Add space after //. Also in the following lines.


var documentData = '{"docId":"' + docId +'","documentUrl":"' + documentUrl + '","productName":"' + productName + '"}';
var post = "citationFormat=RIS&documentData=" +encodeURIComponent(documentData).replace(/%20/g, "+");
var attachment = composeAttachment(doc, url);

This comment has been minimized.

Copy link
@zuphilip

zuphilip Oct 7, 2018

Collaborator

Any reason that you do the attachment thing here in the scrape function and then pass it to the scrapeRIS function rather than doing it directly in the scrapeRIS function?

//gale puts issue numbers in M1
text = text.replace(/M1\s*\-/g, "IS -");
//L2 is probably meant to be UR, but we can ignore it altogether
text = text.replace(/^L2\s+-.+\n/gm, '');

This comment has been minimized.

Copy link
@zuphilip

zuphilip Oct 7, 2018

Collaborator

Can we restrict the content a little further here which we can safely delete? Is it always an URL? I fear the .+ regex part here a little...

This comment has been minimized.

Copy link
@adam3smith

adam3smith Oct 7, 2018

Author Collaborator

This just remove the L2 line entirely and is meant to. .+ doesn't match new lines, so there's really no danger here.

This comment has been minimized.

Copy link
@zuphilip

zuphilip Oct 7, 2018

Collaborator

Okay, but we could also be more specific, e.g. something like

text = text.replace(/^L2\s+-\s+https?:\/\/\S+\s*$/gm, '');

Then, if the field L2 is also used as something else then URLs, we still consider it.

translator.setTranslator("32d59d2d-b65a-4da4-b0a3-bdd3cfb979e7");
translator.setString(text);
translator.setHandler("itemDone", function (obj, item) {
if(attachment) item.attachments.push(attachment);

This comment has been minimized.

Copy link
@zuphilip

zuphilip Oct 7, 2018

Collaborator

Space after if

adam3smith added some commits Oct 7, 2018

@adam3smith

This comment has been minimized.

Copy link
Collaborator Author

commented Oct 8, 2018

Thanks! I addressed all your other comments except the L2 one. L2 maps to attachment/html in Zotero, i.e. it's looking for a local attachment and doesn't make a lot of sense here. I've also failed to find examples of this in a couple of translators, so this might be a moot point, but I'd like to keep the line in just in case.

@zuphilip zuphilip merged commit 449675b into zotero:master Oct 8, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@zuphilip

This comment has been minimized.

Copy link
Collaborator

commented Oct 8, 2018

Great! Merged now 🎉 ! Thank you @adam3smith !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.