Translator for Oxford Reference #1325
I looked at your two examples and I would suggest trying to differentiate the two types by looking at the classes on the body element:
var body = document.getElementsByTagName("body")[0];
var item;
if (body.className.indexOf('dctype-oxencycl-entry') > -1) {
	item = new Zotero.Item("encyclopediaArticle");
} else { // class is then 'dctype-book'
	item = new Zotero.Item("book");
}
@zuphilip Pages that are locked and only fully visible with a subscription have limited content in the abstract note (e.g. Accutane in the test cases). Is it fine to leave it that way?
sonali0901 changed the title from "WIP: Translator for Oxford Reference" to "Translator for Oxford Reference" on Jun 15, 2017
zuphilip reviewed on Jun 15, 2017
Some of your XPaths should already be covered by the Embedded Metadata translator, and in the future we will also try to scrape the JSON-LD there. Thus, I suggest that you call EM first and then just add the missing parts. Please also have a look at my other comments.
"translatorID": "62415874-b53c-4afd-86e8-814e18a986f6",
"label": "Oxford Reference",
"creator": "Sonali Gupta",
"target": "http://www.oxfordreference.com/",
zuphilip
Jun 15, 2017
Collaborator
Start with ^https? and escape the dots, which means \. in Scaffold or \\. in the text file directly.
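As a rough illustration of the advice above (not necessarily the value used in the merged translator), the target could be written as an anchored regular expression with escaped dots. The exact pattern and the helper name matchesTarget are assumptions made for this sketch.

```javascript
// Hypothetical target pattern: anchored at the start, optional "s", dots escaped.
// In the translator's JSON header the backslashes would be doubled ("\\.").
var target = /^https?:\/\/www\.oxfordreference\.com\//;

// Hypothetical helper, used only to try the pattern out.
function matchesTarget(url) {
	return target.test(url);
}
```

Without the escaping, each dot would match any character, so a host such as wwwXoxfordreference.com would also be accepted.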
function detectWeb(doc, url) {
	if (url.indexOf("/search") != -1)
		return "multiple";
		return "bookSection";
	}
	else
		return "book";
zuphilip
Jun 15, 2017
Collaborator
I guess this will also lead to a lot of misclassifications, because all other pages on the domain will return book, e.g. http://www.oxfordreference.com/page/law-subject/law but also simply http://www.oxfordreference.com/
Is it enough to filter on url.indexOf('/view/') > -1?
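A minimal sketch of that filter, assuming that content pages always contain /view/ in their URL. The function name detectType is hypothetical and only stands in for detectWeb; how to tell a book from a book section is deliberately left open here.

```javascript
// Hypothetical stand-in for detectWeb that only inspects the URL.
function detectType(url) {
	if (url.indexOf("/search") != -1) {
		return "multiple";
	}
	if (url.indexOf("/view/") > -1) {
		// Assumption: every /view/ page is a saveable item. Distinguishing
		// "book" from "bookSection" would need a further check.
		return "book";
	}
	// Landing pages, subject pages, etc. are not detected at all.
	return false;
}
```

With this check, the two URLs mentioned above would no longer be reported as books.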
var items = {};
var found = false;
var rows = ZU.xpath(doc, '//span[@class="titlePart"]/a');
var rowsExtendedTitle = ZU.xpath(doc, '//span[@class="title"]');
zuphilip
Jun 15, 2017
Collaborator
Why do you need two xpaths here? This is IMO fragile because the elements of the two xpaths could be numbered differently...
sonali0901
Jun 15, 2017
Author
Contributor
I did this because with the first XPath only the title of the word we searched for was visible. For example, I searched for Shalimar the Clown and all the entries in the pop-up had Salman Rushdie as the title. So I used the second XPath to append the name of the book the reference came from, to make it easier for the user to choose which one to save.
zuphilip
Jun 15, 2017
Collaborator
I think it would then be better to try something like:
var rows = ZU.xpath(doc, '//span[@class="titlePart"]');
...
var href = ZU.xpathText(rows[i], './a/@href');
var title = ZU.trimInternal(rows[i].textContent);
(Note: this is untested code.)
sonali0901
Jun 15, 2017
Author
Contributor
The two spans have different class ids. I will check this though.
zuphilip
Jun 15, 2017
Collaborator
Okay, I looked closer now.
How about the xpath
var rows = ZU.xpath(doc, '//h3[@class="source"]/a[span[@class="title"]]');
?
sonali0901
Jun 22, 2017
Author
Contributor
This XPath gives the book title (and not the section title) in the pop-up window, while the href points to the book section page.
var edition = ZU.xpathText(doc, '//meta[@property="http://schema.org/bookEdition"]/@content');
if(edition)
	item.edition = edition;
zuphilip
Jun 15, 2017
Collaborator
You can do this directly (because ZU.xpathText is a nice function), i.e.
item.edition = ZU.xpathText(doc, '//meta[@property="http://schema.org/bookEdition"]/@content');
"items": [
	{
		"itemType": "bookSection",
		"title": "Accutane - Oxford Reference",
"url": "http://www.oxfordreference.com/view/10.1093/acref/9780199546572.001.0001/acref-9780199546572-e-0009",
"items": [
	{
		"itemType": "bookSection",
zuphilip
Jun 15, 2017
Collaborator
The title of the containing book should also be saved, i.e. here it should be "A-Z of Plastic Surgery".
"url": "http://www.oxfordreference.com/view/10.1093/acref/9780199546572.001.0001/acref-9780199546572-e-0009",
"items": [
	{
		"itemType": "bookSection",
zuphilip
Jun 15, 2017
Collaborator
For chapters we should also save the title of the book. This seems to be missing here.
],
"date": "2008",
"ISBN": "9780199546572",
"abstractNote": "Isotretinoin. The synthetic retinoid derivative 13-cis-retinoic acid (Accutane) used for severe Acne vulgaris. The dose is 1",
"items": [
	{
		"itemType": "book",
		"title": "Concise Oxford Companion to English Literature - Oxford Reference",
zuphilip
Jun 15, 2017
Collaborator
We should clean the title such that the output is without the " - Oxford Reference".
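One possible way to do this, sketched under the assumption that the suffix always appears exactly as " - Oxford Reference" at the end of the page title. The helper name cleanTitle is hypothetical and not part of the submitted translator.

```javascript
// Hypothetical helper: drop a trailing " - Oxford Reference" from a page title.
function cleanTitle(title) {
	// Only a suffix at the very end is removed; titles that merely
	// contain the words "Oxford Reference" elsewhere are left alone.
	return title.replace(/\s+-\s+Oxford Reference\s*$/, "");
}
```

In the translator this could be applied when the title is assigned, for instance item.title = cleanTitle(item.title).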
sonali0901 and others added some commits on Jun 22, 2017
zuphilip added the New Translator label on Nov 12, 2017
adam3smith merged commit 80bd37d into zotero:master on Nov 24, 2017
1 check passed
Thanks!
sonali0901 commented Jun 8, 2017
I need some input on how to identify the item type, as I couldn't figure out a way to reliably distinguish books from book sections.
Fixes #796