Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bryn mawr #1115

Merged
merged 14 commits into from Aug 23, 2016

Conversation

@zuphilip
Copy link
Collaborator

commented Aug 22, 2016

This is the continuation of #1045 :

  • adapted the detectWeb
  • restructure the doWeb
  • use ZU.xpath and ZU.xpathText instead of doc.evaluate
  • try to use a more stable method for reviewAuthors
  • use url directly for extracting date
  • save the BMCR ID in extra

I tested it quite extensively and it seems to be stable now.

var items = {};
var found = false;
var rows = ZU.xpath(doc, '//*[@id="indexcontent"]//li//a');
Z.debug(rows.length);

This comment has been minimized.

Copy link
@adam3smith

adam3smith Aug 22, 2016

Collaborator

comment out (or if you think it's useful, add a descriptor, but I don't think so)


var title = ZU.xpathText(doc, '//h3/i');
item.title = "Review of: " + Zotero.Utilities.trimInternal(title);
var title = title.replace("(", "\\(").replace(")", "\\)");

This comment has been minimized.

Copy link
@adam3smith

adam3smith Aug 22, 2016

Collaborator

this isn't your code, but I find it confusing: it doesn't do anything useful, title isn't used again so it's pointless, and var isn't needed. If I'm right, delete. Otherwise remove var

if (m) {
item.extra = "BMCR ID: " + m[1] + "." + m[2] + "." + m[3];
item.date = m[1] + "-" + m[2];
}

This comment has been minimized.

Copy link
@Jmuccigr

Jmuccigr Aug 22, 2016

Contributor

This should actually work from 1998 on. For years between 1994 and 1997 inclusively, the match is

/(\d{2})\.(\d{2})\.(\d{2})/);

and then we've got:

item.date = "19" + m[1] + "-" + m[2];

See this page for the change in numbering method.

In earlier years (1990-1993), the year isn't included, but the URL ends with 01… where 01 = 1990, 02 is 1991, 03 is 1992, and 04 is 1993. There's no month indicator for those years, so it's just a year. That could be taken care of by checking m[1] in the 2-digit example just given and if < 5, add 1989 to it, then make that the date.

@zuphilip

This comment has been minimized.

Copy link
Collaborator Author

commented Aug 22, 2016

Please have a look at the new version.

@adam3smith You are absolutely right about the unnecessary line and the comment. I delete those.

@Jmuccigr I added the different cases depending on the year. However, it looks like the HTML is not always consistent in the older years and therefore the translator might or might not work with older years. It should work for 1999ff.

@Jmuccigr

This comment has been minimized.

Copy link
Contributor

commented Aug 22, 2016

I'm not surprised that it's inconsistent. I might be able to get the editors to fix that. Let me look into it. Which run of years do you mean?

@zuphilip

This comment has been minimized.

Copy link
Collaborator Author

commented Aug 22, 2016

Well, I tried to run the translators on the archives-by-year, e.g. http://bmcr.brynmawr.edu/1998/indexb.html which contains e.g. http://bmcr.brynmawr.edu/1998/98.1.02.html but I encountered more differences in the older years...

@Jmuccigr

This comment has been minimized.

Copy link
Contributor

commented Aug 22, 2016

Yeah, in 97 and 98 there are a few months where they didn't prefix a 0 on the month portion of the URL, so, e.g., they're 98-1-21 instead of 98-01-21. One example of this in the 1991 series beginning "02" as well. Might be easier to handle this here rather than asking them to fix a few dozen.

Three of them also have a "b" at the end of the number, all before 1999 when things got pretty regular. Checking for that with a b? should be OK, I think.

What do you think?

@zuphilip

This comment has been minimized.

Copy link
Collaborator Author

commented Aug 22, 2016

The url is okay and IMO we are currently handling all of them. But the content of http://bmcr.brynmawr.edu/1998/98.1.02.html is showing an error:

COST initialization failure: can't read "review_id": no such variable can't read "review_id": no such variable while executing "output "\n"" invoked from within "$translateSpec do startAction" invoked from within "translateHandler " (procedure "main" line 3) invoked from within "$COST(mainProcedure)" (in namespace eval "::" script line 5) invoked from within "namespace eval :: { if { [llength [info procs $COST(mainProcedure)]] } { set ::tcl_interactive 0 cost::EnsureInputFile $COST(mainProce..." (procedure "cost::commandLine" line 72) invoked from within "cost::commandLine"

(There are more inconsistencies, which I forgot...)

@Jmuccigr

This comment has been minimized.

Copy link
Contributor

commented Aug 22, 2016

Sorry, that was just an example, and not a real one. My bad. However, a bunch of the URLs in the 1998 listing at http://bmcr.brynmawr.edu/1998/ are not working. I'll see if I can find out something.

@zuphilip

This comment has been minimized.

Copy link
Collaborator Author

commented Aug 22, 2016

There is another, officially linked listing of the 1998 reviews: http://bmcr.brynmawr.edu/1998/indexb.html

@Jmuccigr

This comment has been minimized.

Copy link
Contributor

commented Aug 22, 2016

Let's see what they say.

return "multiple";
} else if (url.match(/[\d\-]+\.html$/)) {
} else if (url.match(/[\d\-]+\.html$/) && ZU.xpathText(doc, '//h3/i')) {

This comment has been minimized.

Copy link
@adam3smith

adam3smith Aug 23, 2016

Collaborator

use .search here too.

@adam3smith

This comment has been minimized.

Copy link
Collaborator

commented Aug 23, 2016

@zuphilip -- not sure I follow the last bits of the discussion. Is this ready to merge or should we wait to hear back from the editors?

@zuphilip

This comment has been minimized.

Copy link
Collaborator Author

commented Aug 23, 2016

Please have a look at the updated version.

The translator is ready to merge and works good for 1999ff and even in most cases also before. We don't have to wait to hear from the editors whether they correct some errors or non-standard design (also I appreciate any effort on that).

@adam3smith adam3smith merged commit 1f0634a into zotero:master Aug 23, 2016

@adam3smith

This comment has been minimized.

Copy link
Collaborator

commented Aug 23, 2016

Thanks!

@zuphilip zuphilip deleted the zuphilip:bryn-mawr branch Aug 23, 2016

@zuphilip

This comment has been minimized.

Copy link
Collaborator Author

commented Aug 23, 2016

Thanks to @Jmuccigr for starting with this and having patience to wait some time until we now have it merged. Thank you!

zuphilip added a commit to zuphilip/translators that referenced this pull request Mar 28, 2018

Bryn mawr (zotero#1115)
This is the continuation of zotero#1045 :

    adapted the detectWeb
    restructure the doWeb
    use ZU.xpath and ZU.xpathText instead of doc.evaluate
    try to use a more stable method for reviewAuthors
    use url directly for extracting date
    save the BMCR ID in extra

zuphilip added a commit to zuphilip/translators that referenced this pull request Mar 28, 2018

Bryn mawr (zotero#1115)
This is the continuation of zotero#1045 :

    adapted the detectWeb
    restructure the doWeb
    use ZU.xpath and ZU.xpathText instead of doc.evaluate
    try to use a more stable method for reviewAuthors
    use url directly for extracting date
    save the BMCR ID in extra
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.