Add Sacramento Bee #1363

owcz · Jul 8, 2017

Tests the text()/attr() polyfill from #1277.

dstillman · Jul 8, 2017

dstillman reviewed Jul 8, 2017

dstillman · Jul 8, 2017

Sacramento Bee.js

+
+	// Authors
+	var authorMetadata = doc.querySelectorAll('.ng_byline_name');
+	if (authorMetadata) {


querySelectorAll() always returns a NodeList, even if it's empty, so you would want to use authorMetadata.length here.
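Below is a minimal sketch of the `.length` check, using the byline selector from the diff. `getBylineNames` is a hypothetical helper name. The usage lines pass a plain object as a stand-in document so the block runs outside a browser.

```javascript
// Sketch: querySelectorAll() returns a NodeList even when nothing matches,
// and any object (including an empty NodeList) is truthy, so test .length.
function getBylineNames(doc) {
	var authorMetadata = doc.querySelectorAll('.ng_byline_name');
	// `if (authorMetadata)` would pass here even with zero matches
	if (!authorMetadata.length) {
		return [];
	}
	var names = [];
	for (var i = 0; i < authorMetadata.length; i++) {
		// Collapse internal whitespace, as ZU.trimInternal would inside Zotero
		var name = authorMetadata[i].textContent.replace(/\s+/g, ' ').trim();
		if (name) {
			names.push(name);
		}
	}
	return names;
}

// Stand-in document (an assumption, so this runs outside a browser):
// an empty array, like an empty NodeList, is truthy but has length 0.
var emptyDoc = { querySelectorAll: function () { return []; } };
console.log(Boolean(emptyDoc.querySelectorAll('.ng_byline_name'))); // true
console.log(getBylineNames(emptyDoc).length); // 0
```

In the translator itself, each name would then typically go through ZU.cleanAuthor before being pushed onto `item.creators`.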

dstillman · Jul 8, 2017

Sacramento Bee.js

+	item.title = attr(doc,'[property="og:title"]','content');
+	item.date = text(doc,'.published-date');
+	item.abstractNote = text(doc,'#content-body- p');
+	item.tags = attr(doc,'meta[name="keywords"]','content').split(", ");


attr() will return null if the selector isn't found, which would produce an error, so it'd be safer to assign it to a keywords variable and then do if (keywords) { item.tags = keywords.split(", "); }.
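Here is a sketch of the suggested guard. The attr() below is a readable rewrite of the minified polyfill from the diff. `addKeywordTags` is a hypothetical helper name, and `item` can be any object with a `tags` field.

```javascript
// Readable form of the attr() helper from the diff: returns the attribute
// of the first match (or of match number `index`), and null when the
// selector finds nothing.
function attr(doc, selector, attrName, index) {
	var elem = index > 0
		? doc.querySelectorAll(selector).item(index)
		: doc.querySelector(selector);
	return elem ? elem.getAttribute(attrName) : null;
}

// Guard the null case before splitting, as suggested in the review
function addKeywordTags(doc, item) {
	var keywords = attr(doc, 'meta[name="keywords"]', 'content');
	if (keywords) {
		item.tags = keywords.split(", ");
	}
	// Without the guard, null.split(", ") would throw a TypeError
}
```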

adam3smith · Jul 9, 2017

adam3smith requested changes Jul 9, 2017

Looks good. Some small things.

adam3smith · Jul 9, 2017

Sacramento Bee.js

+
+function scrape(doc, url) {
+	var item = new Zotero.Item("newspaperArticle");
+	item.websiteTitle = "The Sacramento Bee";


Use item.publicationTitle here; cf. https://aurimasv.github.io/z2csl/typeMap.xml#map-newspaperArticle
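The following sketches the field choice. `MockItem` is a hypothetical stand-in for Zotero.Item so the block runs outside Zotero. Inside a translator you would pass Zotero.Item, or simply call `new Zotero.Item("newspaperArticle")` as the diff does.

```javascript
// Hypothetical stand-in for Zotero.Item so the sketch runs outside Zotero
function MockItem(itemType) {
	this.itemType = itemType;
	this.creators = [];
	this.tags = [];
	this.attachments = [];
}

// For a newspaperArticle, the paper's name goes in publicationTitle;
// websiteTitle belongs to the webpage item type instead.
function createArticleItem(ItemType) {
	var item = new ItemType("newspaperArticle");
	item.publicationTitle = "The Sacramento Bee";
	return item;
}

var item = createArticleItem(MockItem);
console.log(item.publicationTitle); // The Sacramento Bee
```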

adam3smith · Jul 9, 2017

Sacramento Bee.js

+function detectWeb(doc, url) {
+	if (url.match(/article\d+/)) {
+		return "newspaperArticle";
+	} else if (url.match(/\/(news|sports|entertainment)\//)) {


I'd be more comfortable if you implement a version of a getSearchResults function (using querySelectorAll instead of ZU.xpath, of course). These matches cover a broad set of pages and we really want to avoid false positives. There's a reason @zuphilip puts them in all translators.
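Below is a minimal sketch of the getSearchResults(doc, checkOnly) shape described here, written with querySelectorAll. The selector `h4.title a` is a hypothetical placeholder, not the site's actual markup. The whitespace cleanup stands in for ZU.trimInternal so the block runs outside Zotero.

```javascript
// Sketch of the getSearchResults(doc, checkOnly) pattern using
// querySelectorAll instead of ZU.xpath.
function getSearchResults(doc, checkOnly) {
	var items = {};
	var found = false;
	var rows = doc.querySelectorAll('h4.title a'); // hypothetical selector
	for (var i = 0; i < rows.length; i++) {
		var href = rows[i].href;
		// Collapse whitespace (ZU.trimInternal does this inside Zotero)
		var title = rows[i].textContent.replace(/\s+/g, ' ').trim();
		if (!href || !title) continue;
		// detectWeb only needs to know a result exists, so stop at the first one
		if (checkOnly) return true;
		found = true;
		items[href] = title;
	}
	return found ? items : false;
}
```

With checkOnly set, the function returns true at the first usable link, which is all a detectWeb check needs. Without it, the href-to-title map is the shape that doWeb would typically hand to Zotero.selectItems.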

Also, use .search(/regex/) != -1 for efficient tests of strings against regexes.

.search is more efficient than .match?

Yes, because it only has to return a position (or -1) rather than build an actual match. See https://jsperf.com/exec-vs-match-vs-test-vs-search/5 (indexOf is dramatically more efficient, so you could also use that here, since you don't really need a regex, just three separate strings).

Mozilla actually makes this distinction explicit in the docs:

"When you want to know whether a pattern is found and also its index in a string use search() (if you only want to know it exists, use the similar test() method on the RegExp prototype, which returns a boolean); for more information (but slower execution) use match() (similar to the regular expression exec() method)."

I poked around, and it looks like test() is about 15% more efficient than multiple indexOf() calls when checking for multiple strings.
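For reference, the three approaches discussed in this thread give the same answer for the section check. They differ only in speed, and the relative timings come from the jsperf links above, which vary by engine. Here is a sketch using the section names from the diff:

```javascript
// The same section check written three ways. The regex has no /g flag,
// so test() keeps no lastIndex state between calls.
var SECTION_RE = /\/(news|sports|entertainment)\//;

function viaSearch(url) {
	// search() returns the index of the first match, or -1
	return url.search(SECTION_RE) != -1;
}

function viaTest(url) {
	// test() returns a boolean directly
	return SECTION_RE.test(url);
}

function viaIndexOf(url) {
	// indexOf() needs one call per literal substring, no regex at all
	return url.indexOf("/news/") != -1 ||
		url.indexOf("/sports/") != -1 ||
		url.indexOf("/entertainment/") != -1;
}
```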

adam3smith · Jul 9, 2017

Sacramento Bee.js

+		return "multiple";
+	} else if (url.indexOf("/search/?q=") != -1) {
+		return "multiple";
+	} else return null;


(no need for an explicit return null or false here)
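A small sketch of the point, mirroring only the branches visible in the diff above (the `doc` argument is unused here):

```javascript
// Sketch: no trailing `else return null`. If no branch returns, the call
// yields undefined, which is falsy just like null.
function detectWeb(doc, url) {
	if (/article\d+/.test(url)) {
		return "newspaperArticle";
	} else if (url.indexOf("/search/?q=") != -1) {
		return "multiple";
	}
}

console.log(detectWeb(null, "/opinion/")); // undefined
```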

adam3smith · Jul 9, 2017

Sacramento Bee.js

+		item.tags = keywords.split(", ");
+	}
+	item.attachments.push({
+		title: "The Sacramento Bee snapshot",


capitalize "snapshot" per convention.

adam3smith · Jul 9, 2017

Sacramento Bee.js

+						"creatorType": "author"
+					}
+				],
+				"date": "January 08, 2015 1:09 PM",


use ZU.strToISO to avoid importing English dates into non-English locales.
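ZU.strToISO is only available inside Zotero, so the block below uses a hypothetical `toISODate` helper to show the idea: the English date from the test case becomes a language-neutral ISO 8601 date. It handles only the "Month DD, YYYY" shape and is not a replacement for ZU.strToISO, which accepts far more formats. In the translator, the call would be along the lines of `item.date = ZU.strToISO(text(doc, '.published-date'));`.

```javascript
// Hypothetical helper standing in for ZU.strToISO, handling only the
// "January 08, 2015 1:09 PM" shape from the test case; it keeps the
// date part, drops the time, and returns null if it cannot parse.
var MONTHS = ["january", "february", "march", "april", "may", "june",
	"july", "august", "september", "october", "november", "december"];

function toISODate(str) {
	var m = /([A-Za-z]+)\s+(\d{1,2}),\s*(\d{4})/.exec(str);
	if (!m) {
		return null;
	}
	var month = MONTHS.indexOf(m[1].toLowerCase()) + 1; // 0 when unknown
	if (!month) {
		return null;
	}
	var pad = function (n) {
		return (n < 10 ? "0" : "") + n;
	};
	return m[3] + "-" + pad(month) + "-" + pad(parseInt(m[2], 10));
}

console.log(toISODate("January 08, 2015 1:09 PM")); // 2015-01-08
```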

adam3smith · Jul 9, 2017

adam3smith reviewed Jul 9, 2017

adam3smith · Jul 9, 2017

Sacramento Bee.js

+function detectWeb(doc, url) {
+	if (url.search(/article\d+/) != -1) {
+		return "newspaperArticle";
+	} else if (url.search(/(\/((news|sports|entertainment)\/)|(search\/\?q=))|sacbee\.com\/?$/) != -1  && getSearchResults(doc, true)) {


@owcz -- this here is half the reason we want the getSearchResults function (or something like it): by including it in detectWeb, you make sure you don't get false positives. Even if you're quite confident that you have the site covered as it is now, imagine they change their CMS -- with things as you had them, you could get false positives in all sorts of places, including on article pages where the generic metadata translator might perform OK otherwise.
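Here is a sketch of the detectWeb shape under discussion, reusing the URL regex from the diff. getSearchResults is repeated in compact form so the block stands alone, again with a hypothetical `h4.title a` selector.

```javascript
// Compact getSearchResults; the selector is a hypothetical placeholder.
function getSearchResults(doc, checkOnly) {
	var items = {};
	var found = false;
	var links = doc.querySelectorAll('h4.title a');
	for (var i = 0; i < links.length; i++) {
		var href = links[i].href;
		var title = links[i].textContent.trim();
		if (!href || !title) continue;
		if (checkOnly) return true;
		found = true;
		items[href] = title;
	}
	return found ? items : false;
}

function detectWeb(doc, url) {
	if (/article\d+/.test(url)) {
		return "newspaperArticle";
	}
	// A section-like URL alone is not enough: also require real result
	// links on the page, so a CMS change cannot produce false positives.
	if (/(\/((news|sports|entertainment)\/)|(search\/\?q=))|sacbee\.com\/?$/.test(url) &&
		getSearchResults(doc, true)) {
		return "multiple";
	}
}
```

A section URL on a page with no result links now yields undefined, so another translator, such as the generic metadata one, can still handle the page.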

adam3smith · Jul 9, 2017

This is good to merge, but I'll hold off for a little to see if the discussion in #1277 changes anything.

adam3smith · Jul 9, 2017

adam3smith reviewed Jul 9, 2017

adam3smith · Jul 9, 2017

Sacramento Bee.js

@@ -40,9 +40,9 @@
 function attr(doc,selector,attr,index){if(index>0){var elem=doc.querySelectorAll(selector).item(index);return elem?elem.getAttribute(attr):null}var elem=doc.querySelector(selector);return elem?elem.getAttribute(attr):null}function text(doc,selector,index){if(index>0){var elem=doc.querySelectorAll(selector).item(index);return elem?elem.textContent:null}var elem=doc.querySelector(selector);return elem?elem.textContent:null}

 function detectWeb(doc, url) {
-	if (url.search(/article\d+/) != -1) {
+	if (/article\d+/.test(url) != false) {


No need for != false -- that's what `if` does already.
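test() returns a boolean, so the comparison adds nothing. The short sketch below also shows why `!= false` is a trap with search(), whose return value is an index.

```javascript
// test() already returns true or false, so `if` can consume it directly;
// appending `!= false` to a boolean changes nothing.
function isArticleUrl(url) {
	return /article\d+/.test(url);
}

// With non-booleans the comparison is risky: search() returns an index,
// and loose equality coerces false to 0.
console.log("x".search(/y/) != false); // true (-1 != 0), although nothing matched
console.log("article1".search(/article\d+/) != false); // false (0 != 0), although it matched
```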

adam3smith · Jul 15, 2017

Cool, thanks!

Add Sacramento Bee

9d2fe33

fixes from review, handle more multis

3de67c9

owcz and others added some commits Jul 9, 2017

misc fixes & fixes per review

727cf9f

use getSearchResults in detectWeb

8bf1c9a

use test() in detectWeb
test() is most efficient for testing regex, more efficient than multiple indexOf()s: https://jsperf.com/zotero/1

2078956

rmv redundant check against false

6fc4601

adam3smith merged commit ae41654 into zotero:master Jul 15, 2017
1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

zuphilip added a commit to zuphilip/translators that referenced this pull request Mar 28, 2018

Add Sacramento Bee (zotero#1363)
* use test() in detectWeb
test() is most efficient for testing regex, more efficient than multiple indexOf()s: https://jsperf.com/zotero/1

f61e1fe

zuphilip added a commit to zuphilip/translators that referenced this pull request Mar 28, 2018

Add Sacramento Bee (zotero#1363)
* use test() in detectWeb
test() is most efficient for testing regex, more efficient than multiple indexOf()s: https://jsperf.com/zotero/1

6fdfe8a

owcz deleted the owcz:sacbee branch Jul 8, 2018

zotero/translators
