Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix some for each cases #896

Merged
merged 9 commits into from Jun 17, 2015

Conversation

@zuphilip
Copy link
Collaborator

commented Jun 1, 2015

  • These cases are easy, because for each was used to loop trough arrays (derived e.g. by functions like match, split, Zotero.RDF.getStatementsMatching).
  • Replace for each loops with for .. of.
Fix some for each cases
 * These cases are easy, because for each was used to loop trough arrays.
 * Replace for each loops with for .. of.
@zuphilip

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 1, 2015

I tried to perform all possible tests (however some translator seem broken). Please have a look at the suggested changes. I can update the lastUpdated afterwards in a bulk edit.

@simonster

This comment has been minimized.

Copy link
Member

commented Jun 1, 2015

let won't work in any browser but Firefox, and there's not really a good way to make it work on other browsers, since one can't simply replace let with var. for ... of will work in newer Chrome and Safari versions, but it would break IE. I could add a regexp to convert for ... of to ordinary for loops in IE as we do right now for for each, but I'd rather just use ordinary for loops to begin with. We could run that regexp on all the translators now and get rid of all the for each uses, but the code it generates is pretty ugly.

@aurimasv

This comment has been minimized.

Copy link
Contributor

commented Jun 1, 2015

Also, note that there are some intricate differences between for each and for of zotero/zotero#224 (comment)

@zuphilip

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 1, 2015

Actually, I began with normal for before trying the "smarter" for ... of. I still have that work in my stash, e.g.

diff --git "a/C:\\Users\\zumstein\\AppData\\Local\\Temp\\TortoiseGit\\Age1E26.tmp\\Agencia del ISBN-cf42ad8-left.js" "b/C:\\Users\\zumstein\\AppData\\Roaming\\Mozilla\\Firefox\\Profiles\\766q0b3g.zotero-developing\\zotero\\translators\\Agencia del ISBN.js"
index 170b43a..06df1cb 100644
--- "a/C:\\Users\\zumstein\\AppData\\Local\\Temp\\TortoiseGit\\Age1E26.tmp\\Agencia del ISBN-cf42ad8-left.js"   
+++ "b/C:\\Users\\zumstein\\AppData\\Roaming\\Mozilla\\Firefox\\Profiles\\766q0b3g.zotero-developing\\zotero\\translators\\Agencia del ISBN.js" 
@@ -63,8 +63,9 @@ function doWeb(doc, url) {
        author = data['Autores'];
        if (author) {
            var authors = author.match(/\b.*,\s+\w+[^([]/g);
-           for each (aut in authors) {
-               item.creators.push(Zotero.Utilities.cleanAuthor(Zotero.Utilities.trimInternal(aut), "author", true));
+           for (var i=0; i<authors.lenght; i++) {
+               var aut = Zotero.Utilities.trimInternal(authors[i]);
+               item.creators.push(Zotero.Utilities.cleanAuthor(aut, "author", true));
            }
        }
        if (data['Publicacin']) item.publisher = Zotero.Utilities.trimInternal(data['Publicacin']);

Would that be nicer than the automatic replacement and compatible along all browsers?

@simonster

This comment has been minimized.

Copy link
Member

commented Jun 1, 2015

Yes, that would be better (except for the typo in length).

@zuphilip

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 2, 2015

Okay, I can work on that further manually.

@zuphilip

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 2, 2015

I just updated the first ones, i.e.

  • using a simple for (for all cases where the variable is an array and indexed by numbers).
  • use var instead of let
  • check that there is no other variable within the function scope wit the same name
  • run test cases whenever possible
  • fix some spacing
    I would like to continue in this way for these cases.
Update
(excluding the RDF translators, IEEE Xplore.js)
@zuphilip

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 2, 2015

Okay, here are the other cases, except the three RDF translators (and IEEE Xplore translator). Is this okay for you to review or do you wish some other action before?

Some more for eachs are in the RDF and TEI translators.

Excluding the RDF translators and IEEE Xplore.js
 * IEEE Xplore.js is dealt seperately in a commit
 * RDF translator (and TEI) needs more consideration to replace all `for eachs`
@zuphilip

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 3, 2015

I would suggest to sqash now to have one clean commit for you to review?

@zuphilip

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 9, 2015

@simonster @aurimasv Any opinion here? How to proceed?

if(censusNos[i] == year) { censusYear = 1 } else {censusYear= 0 };
for(var j in censusNos) {
if(censusYear == 1) { censusNo = censusNos[j] };
if(censusNos[j] == year) { censusYear = 1 } else {censusYear= 0 };

This comment has been minimized.

Copy link
@aurimasv

aurimasv Jun 10, 2015

Contributor

All of this should just be a year -> census number map. The whole loop is unnecessary.

This comment has been minimized.

Copy link
@zuphilip

zuphilip Jun 10, 2015

Author Collaborator

I am not sure whether I should change something her now AND what I should change this exactly.

"translatorID": "0dda3f89-15de-4479-987f-cc13f1ba7999",
"label": "Ancestry.com US Federal Census",
"creator": "Elena Razlogova",
"target": "^https?://search.ancestry.com/(.*)usfedcen|1890orgcen|1910uscenindex",

This comment has been minimized.

Copy link
@aurimasv

aurimasv Jun 10, 2015

Contributor

this target regexp is wrong. Should be "^https?://search.ancestry.com/.*(usfedcen|1890orgcen|1910uscenindex)"

if (data['FechaEdicin']) item.date = Zotero.Utilities.trimInternal(data['FechaEdicin']);
item.complete();
}

This comment has been minimized.

Copy link
@aurimasv

aurimasv Jun 10, 2015

Contributor

we should probably avoid all of this whitespace change

This comment has been minimized.

Copy link
@zuphilip

zuphilip Jun 10, 2015

Author Collaborator

I sometimes had problems to read the code because the whitespacing was wrong and therefore I tried to improve a little. Is this in general a bad idea or are you more concenrned about the git history?

if(temp.value.indexOf("Story")>-1 || temp.value=="Blog") {
scrape(doc,url);
}
}

This comment has been minimized.

Copy link
@aurimasv

aurimasv Jun 10, 2015

Contributor

undo all of the whitespace change above

@@ -95,4 +98,4 @@ var testCases = [
"items": "multiple"
}
]
/** END TEST CASES **/
/** END TEST CASES **/

This comment has been minimized.

Copy link
@aurimasv

aurimasv Jun 10, 2015

Contributor

since we're undoing whitespace, might as well undo this one and the one above.

InfoTrac.js Outdated
@@ -151,7 +151,7 @@ function extractCitation(url, elmts, title, doc) {

// ensure author is not already there
var add = true;
for each(var existingAuthor in newItem.creators) {
for (let existingAuthor of newItem.creators) {

This comment has been minimized.

Copy link
@aurimasv

aurimasv Jun 10, 2015

Contributor

can't use let in translators, since it's currently only supported in Firefox

This comment has been minimized.

Copy link
@aurimasv

aurimasv Jun 10, 2015

Contributor

Also this one lacks timestamp bump and is different from other changes.

for each (var aut in authors) {
aut = aut.replace(/[^\w^\s^\.]/g, "").replace(/\d/g, "");
for (var i=0; i<authors.length; i++) {
var aut = authors[i].replace(/[^\w^\s^\.]/g, "").replace(/\d/g, "");

This comment has been minimized.

Copy link
@aurimasv

aurimasv Jun 10, 2015

Contributor

[^\w^\s^\.] not entirely sure what this is supposed to mean, but it looks like this should be just var aut = authors[i].replace(/[^A-Z\s.]+/gi);

This comment has been minimized.

Copy link
@zuphilip

zuphilip Jun 10, 2015

Author Collaborator

I don't understand. Replaced with empty string? Only this one replace or also the second replace from the line above? Should I really change something here? I just undo all space improvements ...

}/** BEGIN TEST CASES **/
}

/** BEGIN TEST CASES **/

This comment has been minimized.

Copy link
@aurimasv

aurimasv Jun 10, 2015

Contributor

undo whitespace

// fix capitalization
var words = author.split(" ");
for (var i in words) {
words[i] = words[i][0].toUpperCase() + words[i].substr(1).toLowerCase();
for (var k in words) {

This comment has been minimized.

Copy link
@aurimasv

aurimasv Jun 10, 2015

Contributor

this is also iterating an array

This comment has been minimized.

Copy link
@zuphilip

zuphilip Jun 10, 2015

Author Collaborator

I didn't touch any of the for ... in except for variable renaming. I thought that for ... in is not deprecated and compatible among all browsers. Am I wrong?

@@ -165,8 +166,8 @@ function scrape(doc, url) {
if (metaTags["keywords"]) {
var keywords = metaTags["keywords"];
newItem.tags = keywords.split(",");
for (var i in newItem.tags) {
newItem.tags[i] = newItem.tags[i].replace(" ", ", ");
for (var t in newItem.tags) {

This comment has been minimized.

Copy link
@aurimasv

aurimasv Jun 10, 2015

Contributor

same

This comment has been minimized.

Copy link
@aurimasv

aurimasv Jun 10, 2015

Contributor

nit: Also, all of these can be i, unless they share the same scope.

This comment has been minimized.

Copy link
@zuphilip

zuphilip Jun 10, 2015

Author Collaborator

This may be a little too much caution, but yes they are in the same function which is the scope for var.

This comment has been minimized.

Copy link
@dstillman

dstillman Jun 10, 2015

Member

They should only be something other than i if they're nested.

This comment has been minimized.

Copy link
@zuphilip

zuphilip Jun 10, 2015

Author Collaborator

How about double variable declaration? I.e. something like

for (var i=0; ....) {

}
for (var i=0; ...) {

}

AFAIK this is still okay, but maybe you have a coding convention on these...

This comment has been minimized.

Copy link
@dstillman

dstillman Jun 10, 2015

Member

Yup, those should be i.

This comment has been minimized.

Copy link
@zuphilip

zuphilip Jun 10, 2015

Author Collaborator

Or

for (var i=0; ....) {

}
for (i=0; ...) {

}

This comment has been minimized.

Copy link
@dstillman

dstillman Jun 10, 2015

Member

No, var for both. (Doesn't really matter in translators, since they're in a sandbox, but otherwise if the first one gets removed the second becomes a global assignment, which would trigger an error in strict mode.)

PubMed.js Outdated
@@ -260,7 +260,7 @@ function doWeb(doc, url) {
//extract PMID from a context object
function getPMID(co) {
var coParts = co.split("&");
for each(part in coParts) {
for (let part of coParts) {

This comment has been minimized.

Copy link
@aurimasv

aurimasv Jun 10, 2015

Contributor

no let

@@ -278,7 +278,7 @@ function doExport() {
// tags
if(item.tags) {
var keywordTag = "";
for each(var tag in item.tags) {
for (let tag of item.tags) {

This comment has been minimized.

Copy link
@aurimasv

aurimasv Jun 10, 2015

Contributor

let

This comment has been minimized.

Copy link
@aurimasv

aurimasv Jun 10, 2015

Contributor

And timestamp

"translatorID": "6f5f1b24-7519-4314-880f-d7004fbcfe7e",
"label": "ReliefWeb",
"creator": "Michael Berkowitz",
"target": "http://(www.)?reliefweb.int/",

This comment has been minimized.

Copy link
@aurimasv

aurimasv Jun 10, 2015

Contributor

might as well tidy this up. "^https?://(www\\.)?reliefweb\\.int/"

sel_items[i.url] = i.title;
sel_items = Zotero.selectItems(sel_items);

for (var i in sel_items)
items[i].complete();
} else if (post_count == 1)
for each(var i in items)
for (let i of items)

This comment has been minimized.

Copy link
@aurimasv

aurimasv Jun 10, 2015

Contributor

let and timestamp

"translatorID": "e0234bcf-bc56-4577-aa94-fe86a27f6fd6",
"label": "The Globe and Mail",
"creator": "Adam Crymble",
"target": "http://www.theglobeandmail.com",

This comment has been minimized.

Copy link
@aurimasv

aurimasv Jun 10, 2015

Contributor

"^https?://www\\.theglobeandmail\\.com/"

for (var i in words) {
if (words[i] != "") {
words[i] = words[i][0].toUpperCase() + words[i].substr(1).toLowerCase();
for (var j in words) {

This comment has been minimized.

Copy link
@aurimasv

aurimasv Jun 10, 2015

Contributor

array

YouTube.js Outdated
@@ -86,7 +86,7 @@ function getData(ids){
var keywords;
if ((keywords = ZU.xpathText(doc, '//media:group/media:keywords', ns))) {
keywords = keywords.split(",");
for each(var tag in keywords){
for (let tag of keywords){

This comment has been minimized.

Copy link
@aurimasv

aurimasv Jun 10, 2015

Contributor

let and timestamp

@adam3smith

This comment has been minimized.

Copy link
Collaborator

commented Jun 10, 2015

just dropping in quickly to point out that some of these translators are entirely defunct (e.g. Ancestry afaik) so it doesn't make any sense to fix details of them.

@zuphilip

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 10, 2015

@adam3smith When do you delete a translator completely from the repo? An "entirely defunct translator" sounds for me like we should delete that. When there is some interest in the future, it might be possible to write a new one.

@adam3smith

This comment has been minimized.

Copy link
Collaborator

commented Jun 10, 2015

we delete translators when sites don't exist anymore or translators are merged. We don't delete translators that are completely broken if there's any chance they'll be replaced by functioning versions in the future.
That's at least what we've been doing so far--I don't have a strong opinion on whether that's a good policy or whether it'd makes sense to delete more. Once consideration might be that even where sites change dramatically, mappings and clean-up code may still be useful.

@zuphilip

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 10, 2015

Hm... it would at least be good to "mark" such completely broken translator. If "code may still be useful" then I suggest also to do these refactoring on this code as well.

zuphilip added some commits Jun 10, 2015

Exclude Embedded Metadata.js
 * I opened that just during the process in Scaffold which then updated the date.
Undo all spacing stuff
 * deleting all spacing stuff from before
 * delete also the test case in Agencia del ISBN.js because it seems that the url is not valid anymore
@zuphilip

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 10, 2015

@aurimasv I updated the PR as you requested. Please have a look at the new version.

var byline = doc.evaluate('//div[@class="article_author"]|//span[contains(@class,"article-byline")]', doc, null, XPathResult.ANY_TYPE, null);
var authors = byline.iterateNext().textContent.replace(/^by\s*/i, "").split(" and ");
for (var i=0; i<authors.length; i++) {
newItem.creators.push(Zotero.Utilities.cleanAuthor(authors[i], "author"));

This comment has been minimized.

Copy link
@zuphilip

zuphilip Jun 10, 2015

Author Collaborator

I fixed some of the regexp for the test case. Should I make a seperate pull request for that (i.e. exclude this translator here)?

@aurimasv aurimasv merged commit a393202 into zotero:master Jun 17, 2015

aurimasv added a commit that referenced this pull request Jun 17, 2015

@aurimasv

This comment has been minimized.

Copy link
Contributor

commented Jun 17, 2015

Thanks!! (though I probably should have squashed those commits on merge. Ooops)

@zuphilip zuphilip deleted the zuphilip:refactor-for-eachs branch Jun 17, 2015

@zuphilip

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 17, 2015

Okay. Cool. Merging would have been done a cleaner git history, but in this way my stats (+9 commits) are evolving faster ;-)

Now, we only should have for each loops in the three RDF translators and in the TEI translator. They seem to be more tricky, because they involve cases with objects which are not arrays.

@aurimasv

This comment has been minimized.

Copy link
Contributor

commented Jun 17, 2015

Should be able to do a straightforward replacement from for each(var i in obj) { to

for (var j in obj) {
  var i = obj[j];

Though I haven't looked at those cases in particular

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.