Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate common name attributes upward #3524

Conversation

Projects
None yet
4 participants
@fbennett
Copy link
Member

commented May 23, 2018

Not sure how this one will be received ...

This is a more ambitious set of changes than #3517. It removes inheritable attributes from cs:names and cs:name nodes if they are the same in all calls originating from their cs:citation or cs:bibliography parent, and if they do not conflict across the two parents, and sets them on the parent instead.

The advantage of the reprocessed styles is a reduction in code replication. Also, since all inheritable attributes that remain on cs:names and cs:name nodes are consequential, it makes it easier to spot discrepancies, such as disparate application of initialize-with.

Here is the script used to apply the changes, with notes on the logic of the scan. (If this full pull request seems overly ambitious, the script can be applied to individual styles as well.)

(Code below is the working code, edited since the initial post here went up.)

// Safely migrate cs:names and cs:name attributes upward

// (1) Walk all macros in each of cs:citation and cs:bibliography
// (2) Check inheritable attributes on all cs:names and cs:name nodes found in the walk
// (3) Flag all attributes with identical values across all calls in the scope for migration
// (4) Forego migration of attribute if its migration values differ in cs:citation and cs:bibliography
//     cope, and there are cs:names-calling macros common to the two scopes
// (5) Forego migration of attribute if common value differs from an existing value set on parent
//     (cs:citation or cs:bibliography respectively)
//
// If run with no arguments as node ./migrateNamesAttributes.js, processes all independent styles.
// If run with style file name as sole argument, processes only that single file.
//
// The script currently touches only et-al-* attributes. See lines 78-88 below.

var fs = require('fs');
var xml = require('xmldom').DOMParser;
var xpath = require('xpath');
var enc = require('encode-decode');
var select = xpath.useNamespaces({"cs": "http://purl.org/net/xbiblio/csl"});

var singleFile = process.argv[2];

function EntityFixer() {
    this.dat = {};
    // Avoid chaos
    var lurkers = {
        "<": "{LESS-THAN}",
        ">": "{GREATER-THAN}",
        " ": "{NO-BREAK-SPACE}"
    }
    this.memo = function(txt) {
        for (var key in lurkers) {
            var rex = new RegExp(key, "g");
            txt = txt.replace(rex, lurkers[key]);
        }
        var m = txt.match(/(\&\#[0-9]+;)/g);
        if (m) {
            for (var memo of m) {
                // Avoid chaos again. Don't touch these!
                if (" " === memo) continue;
                if ("	" === memo) continue;
                this.dat[memo] = enc.htmlDecode(memo);
            }
        }
        return txt;
    }
    this.dememo = function(txt) {
        for (var key in this.dat) {
            var rex = new RegExp(this.dat[key], "g");
            txt = txt.replace(rex, key);
        }
        for (var key in lurkers) {
            var rex = new RegExp(lurkers[key], "g");
            txt = txt.replace(rex, key);
        }
        return txt;
    }
}

var AttributeCollector = function(doc) {
    this.doc = doc;
    this.parent_macro = [];
    this. acc = {
        citation: {
            names: {},
            name: {},
            seen_macros: {},
            macros: {}
        },
        bibliography: {
            names: {},
            name: {},
            seen_macros: {},
            macros: {}
        }
    };
//            "and": "and",
//            "delimiter-precedes-et-al": "delimiter-precedes-et-al",
//            "delimiter-precedes-last": "delimiter-precedes-last",
//            "initialize": "initialize",
//            "initialize-with": "initialize-with",
//            "name-as-sort-order": "name-as-sort-order",
//            "sort-separator": "sort-separator",
//            "form": "name-form",
//            "delimiter": "name-delimiter",

//            "delimiter": "names-delimiter"

    this.attrMap = {
        names: {
        },
        name: {
            "et-al-min": "et-al-min",
            "et-al-use-first": "et-al-use-first",
            "et-al-use-last": "et-al-use-last",
            "et-al-subsequent-min": "et-al-subsequent-min",
            "et-al-subsequent-use-first": "et-al-subsequent-use-first"
        }
    };
    this.processAttributes = function(seg, subseg, callback, node) {
        if (!node) {
            node = select("//cs:" + seg + "/cs:" + subseg, this.doc)[0];
            if (!node) return;
        }
        if (!this[seg + "Node"]) {
            this[seg + "Node"] = select("//cs:" + seg, this.doc)[0];
        }
        // Accumulate attributes on cs:names / cs:name
        var tags = ["names", "name"];
        for (var tag of tags) {
            var nxNodes = select(".//cs:" + tag, node);
            for (var nxNode of nxNodes) {
                callback.call(this, seg, subseg, tag, nxNode);
            }
        }
        // Drill down on either cs:key or cs:text macros
        // (the former will appear on cs:sort, and the latter never appear there)
        var macroNodes = select(".//cs:key[@macro]", node);
        if (macroNodes.length === 0) {
            macroNodes = select(".//cs:text[@macro]", node);
        }
        for (var macroNode of macroNodes) {
            var macroName = macroNode.getAttribute("macro");
            if (this.acc[seg].seen_macros[macroName]) continue;
            this.acc[seg].seen_macros[macroName] = true;
            var macroSourceNode = select(".//cs:macro[@name='" + macroName + "']", this.doc)[0];
            this.parent_macro.push(macroName);
            this.processAttributes(seg, subseg, callback, macroSourceNode);
            this.parent_macro.pop();
        }
    }
    this.cleanup = function() {
        // Strip no-migrate values
        for (var seg of ["citation", "bibliography"]) {
            for (var tag of ["names", "name"]) {
                var deletes = [];
                for (var key in this.acc[seg][tag]) {
                    if (!this.acc[seg][tag][key]) {
                        deletes.push(key);
                    }
                }
                for (var del of deletes) {
                    delete this.acc[seg][tag][del];
                    if (this.falsifyEtAl[del]) {
                        delete this.acc[seg][tag][this.falsifyEtAl[del]];
                    }
                }
            }
        }

        // Block migration if there is a conflict with existing parent values
        for (var seg of ["citation", "bibliography"]) {
            for (var tag of ["names", "name"]) {
                var deletes = [];
                for (var attr in this.acc[seg][tag]) {
                    var higherAttr = this.attrMap[tag][attr];
                    var higherVal = this[seg + "Node"].getAttribute(higherAttr);
                    if (higherVal && higherVal !== this.acc[seg][tag][attr]) {
                        deletes.push(attr);
                    }
                }
                for (var del of deletes) {
                    delete this.acc[seg][tag][del];
                    if (this.falsifyEtAl[del]) {
                        delete this.acc[seg][tag][this.falsifyEtAl[del]];
                    }
                }
            }
        }

        // Note that name-as-sort-order should be ignored
        // in sort context, since it is always forced to
        // "always".
        
        // In each segment, check each attribute marked for migration, to see if
        // (a) any macro is used in both segments, and (b) the migration value differs
        // from that in the other segment. If the answer to both is "yes," delete the
        // attribute migration marker from BOTH segments.

        var hasCommon = false;
        for (var key in this.acc.citation.macros) {
            if (this.acc.bibliography.macros[key]) {
                hasCommon = true;
                break;
            }
        }
        if (hasCommon) {
            var deletes = [];
            for (var key in this.acc.citation.names) {
                var cVal = this.acc.citation.names[key];
                if (this.acc.bibliography.names[key] !== cVal) {
                    deletes.push(key);
                }
            }
            for (var del of deletes) {
                delete this.acc.citation.names[del];
                delete this.acc.bibliography.names[del];
            }
            var deletes = [];
            for (var key in this.acc.citation.name) {
                var cVal = this.acc.citation.name[key];
                if (this.acc.bibliography.name[key] !== cVal) {
                    deletes.push(key);
                }
            }
            for (var del of deletes) {
                delete this.acc.citation.name[del];
                delete this.acc.bibliography.name[del];
                if (this.falsifyEtAl[del]) {
                    delete this.acc.citation.name[this.falsifyEtAl[del]];
                    delete this.acc.bibliography.name[this.falsifyEtAl[del]];
                }
            }
        }
        this.acc.citation.macros = {};
        this.acc.bibliography.macros = {};
        this.acc.citation.seen_macros = {};
        this.acc.bibliography.seen_macros = {};
    }
    this.finish = function(fn) {
        var writeFile = false;
        for (var seg of ["citation", "bibliography"]) {
            for (var tag of ["names", "name"]) {
                if (this[seg + "Node"]) {
                    for (var attr in this.acc[seg][tag]) {
                        writeFile = true;
                        var higherAttr = this.attrMap[tag][attr];
                        this[seg + "Node"].setAttribute(higherAttr, this.acc[seg][tag][attr])
                    }
                }
            }
        }
        if (writeFile) {
            var txt = entityFixer.dememo(this.doc.toString());
            //console.log(txt);
            fs.writeFileSync(fn, txt + "\n")
        }
    }
    
    this.falsifyEtAl = {
        "et-al-min": "et-al-use-first",
        "et-al-use-first": "et-al-min",
        "et-al-subsequent-min": "et-al-subsequent-use-first",
        "et-al-subsequent-use-first": "et-al-subsequent-min"
    }
}

function collector(seg, subseg, tag, nxNode) {
    var parent_macro = this.parent_macro.slice(-1)[0];
    if (parent_macro) {
        this.acc[seg].macros[parent_macro] = true;
    }
    if (tag === "names" && nxNode.parentNode.tagName === "substitute" && nxNode.attributes.length === 1) return;
    for (var attr in this.attrMap[tag]) {
        if (subseg === "sort" && attr === "name-as-sort-order") continue;
        var val = nxNode.getAttribute(attr);
        var accVal = this.acc[seg][tag][attr];
        if (accVal === undefined) {
            this.acc[seg][tag][attr] = val;
        } else if (accVal !== val) {
            this.acc[seg][tag][attr] = false;
        }
    }
}

function editor(seg, subseg, tag, nxNode) {
    if (tag === "names" && nxNode.parentNode.tagName === "substitute" && nxNode.attributes.length === 1) return;
    for (var attr in this.acc[seg][tag]) {
        nxNode.removeAttribute(attr);
    }
}

for (var fn of fs.readdirSync('.')) {
    var writeToFile = false;
    var entityFixer = new EntityFixer();
    if (fn.slice(-4) !== ".csl") continue;
    if (singleFile && fn !== singleFile) continue;
    
    console.log(fn);
    var txt = fs.readFileSync(fn).toString();
    txt = entityFixer.memo(txt);
    var doc = new xml().parseFromString(txt);
    var attributeCollector = new AttributeCollector(doc);

    // Collect common attributes
    attributeCollector.processAttributes("citation", "sort", collector);
    attributeCollector.processAttributes("citation", "layout", collector);
    attributeCollector.processAttributes("bibliography", "sort", collector);
    attributeCollector.processAttributes("bibliography", "layout", collector);

    //console.log(JSON.stringify(attributeCollector.acc, null, 2));
    
    // Reset for rerun of walk
    attributeCollector.cleanup();

    //console.log(JSON.stringify(attributeCollector.acc, null, 2));

    attributeCollector.processAttributes("citation", "sort", editor);
    attributeCollector.processAttributes("citation", "layout", editor);
    attributeCollector.processAttributes("bibliography", "sort", editor);
    attributeCollector.processAttributes("bibliography", "layout", editor);

    attributeCollector.finish(fn);
}
@csl-bot

This comment has been minimized.

Copy link

commented May 23, 2018

Awesome! You just created a pull request to the Citation Styles Language styles repository. One of our human volunteers will try to get in touch soon (usually within a week). In the meantime, I will run some automated checks. You should be notified of the results in a few minutes.

If you haven't done so yet, please make sure your style validates and follows all our other Style Requirements.

To update this pull request, visit the "Files changed" tab above, and click on the pencil icon (see below) in the top-right corner of your style to start editing.

If you have any questions, please leave a comment and we'll get back to you. While we usually respond in English, feel free to write in whatever language you're most comfortable.

@csl-bot

This comment has been minimized.

Copy link

commented May 23, 2018

😟 There are some issues with your submission. Please check the test report for details.

@fbennett

This comment has been minimized.

Copy link
Member Author

commented May 23, 2018

Oops. It seems there is a problem with validation ...

@fbennett

This comment has been minimized.

Copy link
Member Author

commented May 23, 2018

It's actually a problem with the Travis checks, not with the styles. Travis is demanding that if one of et-al-min or et-al-use-first is present on a node, its partner must also be present. That is not required by the schema. The first style caught by Travis, gallia-prehistoire.csl, sets a common value of et-al-use-first="1" in cs:bibliography, with values of et-al-min="3" or et-al-min="10" depending on the name variable rendered. With the et-al-use-first attribute alone moved to cs:bibliography, the style still validates fine against the schema.

@fbennett fbennett force-pushed the fbennett:migrate-common-name-attributes-upward branch from 146ae2a to 4af4fb1 May 24, 2018

@csl-bot

This comment has been minimized.

Copy link

commented May 24, 2018

😟 There are some issues with your submission. Please check the test report for details.

@fbennett

This comment has been minimized.

Copy link
Member Author

commented May 24, 2018

Oh! I jumped to conclusions there, as I so often do.

There is a specific problem with the gallia-prehistoire.csl style. It's an edge case that trips up Travis when the edits are applied. The style has et-al-use-first="1" on the cs:style node, without its partner. That isn't a problem with the style in its current form, though, because the style sets et-al-use-first and et-al-min attributes everywhere anyway.

When the script performs the migration, the name="author" macro is called only from cs:citation, so its attributes are migrated to cs:citation. So far, so good, and Travis would be fine with that.

However (ahem) the same macro is called from cs:bibliography with the attributes names-min="3" and names-use-first="3". Travis doesn't treat those attributes as equivalent to et-al-min and et-al-use-first, so what it sees is only the single attribute that was set on cs:style, and it blows up—but the style should work fine with those settings.

Removing et-al-use-first="1" from the cs:style node should allow Travis checks to pass and have no effect on the output of the style. Let's see how that works out ...

(First draft of this post edited after more careful inspection.)

@fbennett fbennett force-pushed the fbennett:migrate-common-name-attributes-upward branch from 4af4fb1 to de7ebcc May 24, 2018

@csl-bot

This comment has been minimized.

Copy link

commented May 24, 2018

😟 There are some issues with your submission. Please check the test report for details.

@fbennett fbennett force-pushed the fbennett:migrate-common-name-attributes-upward branch from de7ebcc to 7034200 May 24, 2018

@csl-bot

This comment has been minimized.

Copy link

commented May 24, 2018

😟 There are some issues with your submission. Please check the test report for details.

@fbennett fbennett force-pushed the fbennett:migrate-common-name-attributes-upward branch from 7034200 to 5ebef65 May 24, 2018

@csl-bot

This comment has been minimized.

Copy link

commented May 24, 2018

😟 There are some issues with your submission. Please check the test report for details.

@adam3smith

This comment has been minimized.

Copy link
Member

commented May 24, 2018

The concern here is readability. We've been adding the name attributes to cs:name because it makes sense to think about them where they names are called (it's also where the visual editor takes you when you click on a name). So I'm wondering if there really is sufficient benefit in avoiding the relatively minor code duplication.
This isn't a strong position, though, so I'm perfectly happy to be convinced. Curious what @rmzelle and @POBrien333 think

@fbennett

This comment has been minimized.

Copy link
Member Author

commented May 24, 2018

Yes---I'm not pushing at all. It's just something that I started tinkering with, and wanted to finish. When it's done and working, I can bundle the script in a project so that it's available for use if desired.

@fbennett fbennett force-pushed the fbennett:migrate-common-name-attributes-upward branch from 5ebef65 to d9217f8 May 24, 2018

@csl-bot

This comment has been minimized.

Copy link

commented May 24, 2018

😟 There are some issues with your submission. Please check the test report for details.

@fbennett fbennett force-pushed the fbennett:migrate-common-name-attributes-upward branch from d9217f8 to 96d3367 May 24, 2018

@csl-bot

This comment has been minimized.

Copy link

commented May 24, 2018

😟 There are some issues with your submission. Please check the test report for details.

@fbennett fbennett force-pushed the fbennett:migrate-common-name-attributes-upward branch from 96d3367 to 4bd39d0 May 24, 2018

@csl-bot

This comment has been minimized.

Copy link

commented May 24, 2018

😃 Your submission passed all our automated tests.

@fbennett

This comment has been minimized.

Copy link
Member Author

commented May 25, 2018

Well, that was an interesting journey. Tests finally pass!

The script touches 1604 styles in all. The repeated test failures above were triggered by a series of sloppy-coding bugs that failed to preserve the pairing of et-al-min and et-al-use-first on individual nodes. With that fixed, all is well.

Caution advised, but these should produce output identical to that of the originals. As to whether moving these attributes actually makes maintenance easier, that is one for active maintainers to call. I just wanted to give the script a proper test run once I had begun coding it.

(If this one isn't adopted, feel free to close the pull request at will!)

@fbennett

This comment has been minimized.

Copy link
Member Author

commented May 25, 2018

As a final thought, this could be extended to reverse the migration, removing all inheritable attributes and placing them always directly on cs:names and cs:name nodes instead. If that would make behavior in the WYSIWYG editor less confusing for users, it could be done.

@POBrien333

This comment has been minimized.

Copy link
Contributor

commented May 25, 2018

As far as I understand, this script looks for the et-al settings in cs-name and if they're the same it actually puts it straight up in the first lines?
I generally put them under the <citation and <bibliography line and that's where I look first. And I do hate when the et-al settings are under cs-names.

@adam3smith

This comment has been minimized.

Copy link
Member

commented May 25, 2018

I agree on the et-al attributes -- I prefer those on cs:citation and cs:bibliography, too. So I'd definitely be in favor of a modified version of this that only affects et als.

This goes further, though, and moves all inheritable elements there, i.e. also things like initialize-with, and="text" etc.

@csl-bot

This comment has been minimized.

Copy link

commented May 25, 2018

😃 Your submission passed all our automated tests.

@fbennett

This comment has been minimized.

Copy link
Member Author

commented May 25, 2018

@adam3smith @POBrien333 @rmzelle Here you go. I've pushed a changeset limited to the et-al-* attributes. It touches just 30 styles, so manual review of the changes is feasible this time around.

@fbennett

This comment has been minimized.

Copy link
Member Author

commented May 25, 2018

As a side-note, more styles would be affected if et-al-min and et-al-use-first were allowed to move independently—many styles apply a common value for et-al-use-first, but differing values of et-al-min. Travis throws an error if the two attributes are not set as a pair on individual nodes, though, so the script currently treats them as a unit.

@csl-bot

This comment has been minimized.

Copy link

commented May 26, 2018

😃 Your submission passed all our automated tests.

@fbennett fbennett force-pushed the fbennett:migrate-common-name-attributes-upward branch from a1703a6 to 076c373 May 26, 2018

@fbennett

This comment has been minimized.

Copy link
Member Author

commented May 26, 2018

I've run a little poke-it-and-see test, and I was wrong about the Travis logic. Either of et-al-min or et-al-use-first can be set solo on a node, so long as the partner can be acquired by inheritance—so the Travis test is robust, and migrating either of those individually should be perfectly okay, barring bugs in the migration script. I'll try to fix that and make an additional checkin, so that you can see which arrangement you like best.

@csl-bot

This comment has been minimized.

Copy link

commented May 26, 2018

😃 Your submission passed all our automated tests.

@fbennett fbennett force-pushed the fbennett:migrate-common-name-attributes-upward branch from 45b0454 to 076c373 May 26, 2018

@csl-bot

This comment has been minimized.

Copy link

commented May 26, 2018

😟 There are some issues with your submission. Please check the test report for details.

@fbennett fbennett force-pushed the fbennett:migrate-common-name-attributes-upward branch from 26eff65 to 076c373 May 26, 2018

@csl-bot

This comment has been minimized.

Copy link

commented May 26, 2018

😟 There are some issues with your submission. Please check the test report for details.

@csl-bot

This comment has been minimized.

Copy link

commented May 26, 2018

😃 Your submission passed all our automated tests.

@fbennett fbennett force-pushed the fbennett:migrate-common-name-attributes-upward branch from 8a5609f to 076c373 May 26, 2018

@csl-bot

This comment has been minimized.

Copy link

commented May 26, 2018

😃 Your submission passed all our automated tests.

@fbennett

This comment has been minimized.

Copy link
Member Author

commented May 26, 2018

There we go. The third check-in touches just six additional styles, so there was a high degree of editorial consistency already (and no, that is not surprising!). Two styles were very slightly edited to avoid Travis errors: lethaia.csl and paleobiology.csl. These seem to differ only slightly, and both set a value on cs:citation that was overridden in the sole cs:name call within that scope. This blocked the migration of one element of the pair, and Travis blew up on that macro when (I think) inspecting it through a call incs:key. Aligning the values allowed the migration to go through for both attributes, and all six styles pass with flying colors.

@csl-bot

This comment has been minimized.

Copy link

commented Jul 1, 2018

😃 Your submission passed all our automated tests.

@adam3smith adam3smith merged commit 69304f7 into citation-style-language:master Jul 1, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@adam3smith

This comment has been minimized.

Copy link
Member

commented Jul 1, 2018

Cool! CC @POBrien333 -- no more unnecessary et als on names!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.