New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite ACL translator #1577

Merged
merged 3 commits into from Mar 24, 2018

Conversation

Projects
None yet
2 participants
@GuyAglionby
Contributor

GuyAglionby commented Mar 6, 2018

Hi there. While the previous translator works on some of the proceedings pages (http://www.aclweb.org/anthology/P/P93/), it doesn't work from the direct PDF view of papers, nor does it work for proceedings pages where there is no bibtex link (e.g.). I've re-written and modernised the code so it works for both of these cases.

Comments welcomed, thanks in advance!

@zuphilip

This looks great! Only some smaller things to go through and maybe try to distinguish the journal articles from the proceeding articles.

Are you aware of the new platform https://aclanthology.coli.uni-saarland.de/ ? They write that the legacy version (the one you looked at) will be replaced as the default version starting some time in 2018.

Show outdated Hide outdated ACLWeb.js
function scrapeIndex(doc, items) {
var results;
var doImport;
Copyright © 2018 Nathan Schneider, Guy Aglionby

This comment has been minimized.

@zuphilip

zuphilip Mar 18, 2018

Collaborator

Write only your name here, because this is a complete rewrite.

@zuphilip

zuphilip Mar 18, 2018

Collaborator

Write only your name here, because this is a complete rewrite.

Show outdated Hide outdated ACLWeb.js
"minVersion": "1.0.7",
"creator": "Nathan Schneider, Guy Aglionby",
"target": "^https?://(www\\.)?aclweb\\.org/anthology/[^#]+",
"minVersion": "3",

This comment has been minimized.

@zuphilip

zuphilip Mar 18, 2018

Collaborator

3.0

@zuphilip

zuphilip Mar 18, 2018

Collaborator

3.0

Show outdated Hide outdated ACLWeb.js
function scrapeProceedings(doc, id) {
let newItem = new Zotero.Item("conferencePaper");

This comment has been minimized.

@zuphilip

zuphilip Mar 18, 2018

Collaborator

There are also journal articles (at character J and maybe others), e.g. http://aclweb.org/anthology/J/J17/. Can we try to distinguish those?

@zuphilip

zuphilip Mar 18, 2018

Collaborator

There are also journal articles (at character J and maybe others), e.g. http://aclweb.org/anthology/J/J17/. Can we try to distinguish those?

Show outdated Hide outdated ACLWeb.js
let paragraphXpath = '//p[a[text()="' + id + '"]]/';
let pdfURL = ZU.xpath(doc, paragraphXpath + 'a')[0].href;

This comment has been minimized.

@zuphilip

zuphilip Mar 18, 2018

Collaborator

I suggest to filter here on the href attribute instead of relying on the order, i.e.

let pdfURL = ZU.xpathText(doc, paragraphXpath + 'a[contains(@href, "pdf")]/@href');
@zuphilip

zuphilip Mar 18, 2018

Collaborator

I suggest to filter here on the href attribute instead of relying on the order, i.e.

let pdfURL = ZU.xpathText(doc, paragraphXpath + 'a[contains(@href, "pdf")]/@href');
Show outdated Hide outdated ACLWeb.js
return 'not(contains(., "' + title + '"))';
}).join(' and ');
let baseXpath = '//div[@id="content"]/p[i[' + unwantedTitles + ']]/'

This comment has been minimized.

@zuphilip

zuphilip Mar 18, 2018

Collaborator

Add semi-colon

@zuphilip

zuphilip Mar 18, 2018

Collaborator

Add semi-colon

Show outdated Hide outdated ACLWeb.js
let etAl = '';
if (articleAuthors.length > 1) {
etAl = ' et al.';
}

This comment has been minimized.

@zuphilip

zuphilip Mar 18, 2018

Collaborator

You can also write this with the ternary operator:

let etAl = articleAuthors.length > 1 ? ' et al.' : '';
@zuphilip

zuphilip Mar 18, 2018

Collaborator

You can also write this with the ternary operator:

let etAl = articleAuthors.length > 1 ? ' et al.' : '';
});
});
} else if(url.endsWith('.bib')) {
// e.g. http://www.aclweb.org/anthology/P10-4014.bib

This comment has been minimized.

@zuphilip

zuphilip Mar 18, 2018

Collaborator

This looks like a good test case we also should add.

@zuphilip

zuphilip Mar 18, 2018

Collaborator

This looks like a good test case we also should add.

This comment has been minimized.

@GuyAglionby

GuyAglionby Mar 18, 2018

Contributor

This test fails with an error

21:58:35 TranslatorTester: Running ACL Test 3
21:58:35 TranslatorTester: Translating http://www.aclweb.org/anthology/P10-4014.bib
21:58:35 TranslatorTester: ACL Test 3: failed (Translation failed to initialize: TypeError: doc is null)

I have double checked that it works in Chrome though, is that sufficient?

@GuyAglionby

GuyAglionby Mar 18, 2018

Contributor

This test fails with an error

21:58:35 TranslatorTester: Running ACL Test 3
21:58:35 TranslatorTester: Translating http://www.aclweb.org/anthology/P10-4014.bib
21:58:35 TranslatorTester: ACL Test 3: failed (Translation failed to initialize: TypeError: doc is null)

I have double checked that it works in Chrome though, is that sufficient?

This comment has been minimized.

@zuphilip

zuphilip Mar 18, 2018

Collaborator

Ah, okay. Yes, fine then.

@zuphilip

zuphilip Mar 18, 2018

Collaborator

Ah, okay. Yes, fine then.

@GuyAglionby

This comment has been minimized.

Show comment
Hide comment
@GuyAglionby

GuyAglionby Mar 18, 2018

Contributor

Thanks for all of your comments! I have made the requested changes.

I've seen the other site, but didn't realise it was changing to be the main version. I'll look into making a translator for that as well. Thanks for the heads up!

Edit: Just seen @adam3smith commented here saying he was working on a rewrite for the new site. I'll hold off working on it until I hear back from you; happy to work on it if you haven't started but equally happy if you have.

Contributor

GuyAglionby commented Mar 18, 2018

Thanks for all of your comments! I have made the requested changes.

I've seen the other site, but didn't realise it was changing to be the main version. I'll look into making a translator for that as well. Thanks for the heads up!

Edit: Just seen @adam3smith commented here saying he was working on a rewrite for the new site. I'll hold off working on it until I hear back from you; happy to work on it if you haven't started but equally happy if you have.

@zuphilip zuphilip referenced this pull request Mar 19, 2018

Closed

Update ACLWeb.js #946

@zuphilip zuphilip merged commit 9fba327 into zotero:master Mar 24, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@zuphilip

This comment has been minimized.

Show comment
Hide comment
@zuphilip

zuphilip Mar 24, 2018

Collaborator

Thank you @GuyAglionby very much! I did some final changes and the PR is now merged 🎉 .

@adam3smith Did you start to work on the new site of ACL? Or can @GuyAglionby start from scratch?

Collaborator

zuphilip commented Mar 24, 2018

Thank you @GuyAglionby very much! I did some final changes and the PR is now merged 🎉 .

@adam3smith Did you start to work on the new site of ACL? Or can @GuyAglionby start from scratch?

@nschneid nschneid referenced this pull request Jul 30, 2018

Open

New ACL Anthology #1702

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment