Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question about sort_ConditionalMacroDates.txt #31

Closed
jgm opened this issue Jun 25, 2020 · 9 comments
Closed

Question about sort_ConditionalMacroDates.txt #31

jgm opened this issue Jun 25, 2020 · 9 comments

Comments

@jgm
Copy link

@jgm jgm commented Jun 25, 2020

I'm baffled by this one -- there must be something I'm not understanding.

We have

    <sort>
      <key macro="issued-date" sort="ascending"/>
    </sort>

and the issued-date macro looks like this:

  <macro name="issued-date">
    <group prefix="[" suffix="]">
    <text variable="title"/>
    <choose>
      <if type="book">
        <date variable="issued">
          <date-part name="year" form="long"/>
        </date>
      </if>
      <else>
        <date variable="issued">
          <date-part name="year" form="long" suffix="-"/>
          <date-part name="month" form="numeric" suffix="-"/>
          <date-part name="day" form="numeric"/>
        </date>
      </else>
    </choose>
    </group>
  </macro>

Note the <text variable="title"/> which appears before the date. So the output of this macro for the various items should be something like

item-1:  title 220021025
item-2:  title 32002
item-3:  title 120021025

And that should give (for an ascending sort): item-3, item-1, item-2, resulting in:

Title 1
Title 2
Title 3

Instead, the expected result is

>>===== RESULT =====>>
<div class="csl-bib-body">
  <div class="csl-entry">Title 3</div>
  <div class="csl-entry">Title 1</div>
  <div class="csl-entry">Title 2</div>
</div>
<<===== RESULT =====<<

Here we get sorting on the date, apparently ignoring the title (and indeed, removing the line with <text variable="title"/> makes no difference to citeproc.js).

There is no way an ascending sort that looks at titles before dates could give this result. So I'm confused! What am I missing?

@adam3smith
Copy link
Member

@adam3smith adam3smith commented Jun 25, 2020

This one doesn't make any sense to me either.
Some quick tests show that:

  1. citeproc-js indeed behaves this way (as one would hope given the test)
  2. This is unrelated to the numbers in the title: Title A, Title B, Title C behave the exact same way
<sort>
  <key variable="title"/>
  <key macro="issued-date"/>
</sort>

Sorts as

Title 1
Title 2
Title 3

as expected (with or without the title being part of the issued-date variable).

It'd be great if @fbennett could chime in, but I'd currently call this as an incorrect test and suggest changing it (likely by just removing the title variable from the macro since the test is about dates)

@fbennett
Copy link
Member

@fbennett fbennett commented Jun 25, 2020

Ah, I remember this one. Rintze and I discussed the case in 2013, back in the BitBucket days. In the discussion Rintze also questioned the discarding of the non-date text from the macro when generating keys, and noted that doing so is an off-spec invention.

As I note in the linked thread, intermixing date and string values in a single key can't be made to produce intuitive sort results. Dates are a bit special: keys for non-date variables are simple strings or numbers that can be derived directly from the variable; but date keys must generate a machine-readable sort key that takes account of the date-parts element of the rendered form. As a result, there are cases in which a macro containing a condition must be used to produce the date key (APA is mentioned in the discussion).

The need to express date keys in a macro can be addressed in one of two ways. One would be to assume that such macros will always contain exactly one date. The other would be to extract a date from an existing macro, and ignore everything else it contains. citerproc-js adopts the latter approach. While it (obviously) can create confusion, it has the advantage that, if a style designer mistakenly assumes that a macro combining two elements (creator and date, say) will treat the contents as two atomic keys, the attempt will throw bad sort results immediately. (There is an example of this in the repo, in the initial history of deutsche-sprache.csl, where the initial checkin of the style is followed immediately by an amendment to the sort keys.)

So that's the reasoning behind it. On a quick check of the repository (based on macro names alone), it looks like a few styles would need to be amended to add and call date-only macros if the former approach is preferred.

beltz-padagogik.csl
berlin-school-of-economics-and-law-international-marketing-management.csl
biuletyn-polskiego-towarzystwa-jezykoznawczego.csl
deutsche-sprache.csl
hochschule-hannover-soziale-arbeit.csl
keel-ja-kirjandus.csl
zeitschrift-fur-qualitative-forschung.csl
wirtschaftsuniversitat-wien-abteilung-fur-bildungswissenschaft.csl
wirtschaftsuniversitat-wien-unternehmensrechnung-und-controlling.csl
wirtschaftsuniversitat-wien-wirtschaftspadagogik.csl
@jgm
Copy link
Author

@jgm jgm commented Jun 26, 2020

As I note in the linked thread, intermixing date and string values in a single key can't be made to produce intuitive sort results

I still don't understand this, even after reading the linked discussion. It's perfectly coherent to sort first on a text field, then on a structured date field (whose fields are determined by some kind of condition).

@fbennett
Copy link
Member

@fbennett fbennett commented Jun 27, 2020

The test is muddled, and should be divided into two separate fixtures for clarity. The reason for boiling the macro down to the date alone is that sorting on a string concatenation of title and date can produce different results than sorting atomically on the title and then on the date. Things can get messed up if the title ends in a number. Here are three entries sorted by title, then by date:

Title     | 2002-12-01
Title 1   | 2002-11-01
Title Two | 2002-10-01

If the same entries are sorted on a concatenation of title and date, the order comes out differently:

Title 1 2002-11-01
Title 2002-12-01
Title Two 2002-10-01

It's not a common case, but it does come up. Date extraction went into the processor in the early days, and I don't have the records on it to hand, but I'm pretty sure it was prompted by a user complaint about a badly sorted bibliography.

@adam3smith
Copy link
Member

@adam3smith adam3smith commented Jun 27, 2020

Thanks -- the reason now makes more sense to me, though I'm still not clear what's particular about dates here:

Title | 99 Red balloons
Title 1 | 1 Flew over the cuckoo's nest

Also reverses sort order when concatenated into a single string.

My inclination would therefore be to have processors/tests behave according to spec and to discourage using multi-variable macros for sorting, potentially even in the specs (discourage rather than forbid because there may be cases for them, though only rarely).

@fbennett
Copy link
Member

@fbennett fbennett commented Jun 27, 2020

That's the best for processor simplicity. The styles listed above should be amended to assure consistent results, though.

@adam3smith
Copy link
Member

@adam3smith adam3smith commented Jun 27, 2020

The styles listed above should be amended to assure consistent results, though.

I'm working on that -- seems like a good idea anyway in case other processors decide(d) to ignore the test & follow the letter of the specs and it won't affect output given current citeproc-js behavior, so we don't need to coordinate timing.

Will also create a PR for changing the test by removing the title from the issued-date macro.

@bdarcus would you create a PR for the specs that adds something along the lines of:

Except in rare cases, macros that render more than one element should be avoided as sort keys, as they can lead to unexpected sorting in edge cases. Instead, use their elements as individual, consecutive sort keys.

(wrote this quickly, happy for edits)

@jgm I think that should settle this, you can assume that the test will change -- thanks for bringing this up.

@bdarcus
Copy link
Member

@bdarcus bdarcus commented Jun 27, 2020

Shouldn't we just remove this test?

The file name, and purpose, of the test would change too, if we remove the macro call, and there should be other tests that cover this sorting?

If not, then I'm misunderstanding the resolution, and so what the change would be.

@adam3smith
Copy link
Member

@adam3smith adam3smith commented Jun 27, 2020

I interpret the name & content of the test to test sorting by a macro where the date format is conditional on the item type and the test would still do this without the title in the macro.

Edit:
This appears to be Frank's view as well, see above:

The test is muddled, and should be divided into two separate fixtures for clarity.

So instead of two separate fixtures, we remove one of the things the fixture tests for

adam3smith added a commit to citation-style-language/styles that referenced this issue Jun 27, 2020
So styles don't rely on the behavior described in 
citation-style-language/test-suite#31
bdarcus added a commit that referenced this issue Jun 27, 2020
Except in rare cases, macros that render more than one element should 
be avoided as sort keys, as they can lead to unexpected sorting in edge 
cases. Instead, use their elements as individual, consecutive sort keys.

Closes #31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

4 participants
You can’t perform that action at this time.