Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.
Sign upRewrite ACL translator #1577
Conversation
zuphilip
requested changes
Mar 18, 2018
This looks great! Only some smaller things to go through and maybe try to distinguish the journal articles from the proceeding articles.
Are you aware of the new platform https://aclanthology.coli.uni-saarland.de/ ? They write that the legacy version (the one you looked at) will be replaced as the default version starting some time in 2018.
ACLWeb.js
function scrapeIndex(doc, items) { | ||
var results; | ||
var doImport; | ||
Copyright © 2018 Nathan Schneider, Guy Aglionby |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
ACLWeb.js
"minVersion": "1.0.7", | ||
"creator": "Nathan Schneider, Guy Aglionby", | ||
"target": "^https?://(www\\.)?aclweb\\.org/anthology/[^#]+", | ||
"minVersion": "3", |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
ACLWeb.js
function scrapeProceedings(doc, id) { | ||
let newItem = new Zotero.Item("conferencePaper"); |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
zuphilip
Mar 18, 2018
Collaborator
There are also journal articles (at character J
and maybe others), e.g. http://aclweb.org/anthology/J/J17/. Can we try to distinguish those?
zuphilip
Mar 18, 2018
Collaborator
There are also journal articles (at character J
and maybe others), e.g. http://aclweb.org/anthology/J/J17/. Can we try to distinguish those?
ACLWeb.js
let paragraphXpath = '//p[a[text()="' + id + '"]]/'; | ||
let pdfURL = ZU.xpath(doc, paragraphXpath + 'a')[0].href; |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
zuphilip
Mar 18, 2018
Collaborator
I suggest to filter here on the href attribute instead of relying on the order, i.e.
let pdfURL = ZU.xpathText(doc, paragraphXpath + 'a[contains(@href, "pdf")]/@href');
zuphilip
Mar 18, 2018
Collaborator
I suggest to filter here on the href attribute instead of relying on the order, i.e.
let pdfURL = ZU.xpathText(doc, paragraphXpath + 'a[contains(@href, "pdf")]/@href');
ACLWeb.js
return 'not(contains(., "' + title + '"))'; | ||
}).join(' and '); | ||
let baseXpath = '//div[@id="content"]/p[i[' + unwantedTitles + ']]/' |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
ACLWeb.js
let etAl = ''; | ||
if (articleAuthors.length > 1) { | ||
etAl = ' et al.'; | ||
} |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
zuphilip
Mar 18, 2018
Collaborator
You can also write this with the ternary operator:
let etAl = articleAuthors.length > 1 ? ' et al.' : '';
zuphilip
Mar 18, 2018
Collaborator
You can also write this with the ternary operator:
let etAl = articleAuthors.length > 1 ? ' et al.' : '';
}); | ||
}); | ||
} else if(url.endsWith('.bib')) { | ||
// e.g. http://www.aclweb.org/anthology/P10-4014.bib |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
GuyAglionby
Mar 18, 2018
Contributor
This test fails with an error
21:58:35 TranslatorTester: Running ACL Test 3
21:58:35 TranslatorTester: Translating http://www.aclweb.org/anthology/P10-4014.bib
21:58:35 TranslatorTester: ACL Test 3: failed (Translation failed to initialize: TypeError: doc is null)
I have double checked that it works in Chrome though, is that sufficient?
GuyAglionby
Mar 18, 2018
Contributor
This test fails with an error
21:58:35 TranslatorTester: Running ACL Test 3
21:58:35 TranslatorTester: Translating http://www.aclweb.org/anthology/P10-4014.bib
21:58:35 TranslatorTester: ACL Test 3: failed (Translation failed to initialize: TypeError: doc is null)
I have double checked that it works in Chrome though, is that sufficient?
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
GuyAglionby
Mar 18, 2018
Contributor
Thanks for all of your comments! I have made the requested changes.
I've seen the other site, but didn't realise it was changing to be the main version. I'll look into making a translator for that as well. Thanks for the heads up!
Edit: Just seen @adam3smith commented here saying he was working on a rewrite for the new site. I'll hold off working on it until I hear back from you; happy to work on it if you haven't started but equally happy if you have.
Thanks for all of your comments! I have made the requested changes. I've seen the other site, but didn't realise it was changing to be the main version. I'll look into making a translator for that as well. Thanks for the heads up! Edit: Just seen @adam3smith commented here saying he was working on a rewrite for the new site. I'll hold off working on it until I hear back from you; happy to work on it if you haven't started but equally happy if you have. |
zuphilip
merged commit 9fba327
into
zotero:master
Mar 24, 2018
1 check passed
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
zuphilip
Mar 24, 2018
Collaborator
Thank you @GuyAglionby very much! I did some final changes and the PR is now merged
@adam3smith Did you start to work on the new site of ACL? Or can @GuyAglionby start from scratch?
Thank you @GuyAglionby very much! I did some final changes and the PR is now merged @adam3smith Did you start to work on the new site of ACL? Or can @GuyAglionby start from scratch? |
GuyAglionby commentedMar 6, 2018
Hi there. While the previous translator works on some of the proceedings pages (http://www.aclweb.org/anthology/P/P93/), it doesn't work from the direct PDF view of papers, nor does it work for proceedings pages where there is no bibtex link (e.g.). I've re-written and modernised the code so it works for both of these cases.
Comments welcomed, thanks in advance!