New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove duplicate literal types #155

Open
wants to merge 3 commits into
base: master
from

Conversation

Projects
None yet
3 participants
@dhimmel
Contributor

dhimmel commented Aug 10, 2018

Closes #154

I verified the following executed without error (but have not done further testing):

import jsonref
import jsonschema
url = 'https://github.com/dhimmel/schema/raw/01d112e0ba606c4f61fcc9e27e4f623f5aad8da6/csl-data.json'
schema = jsonref.load_uri(url, jsonschema=True)
jsonschema.Draft3Validator.check_schema(schema)

Another question is whether there is a larger simplification that can be made. Should type remain a list in these instances?

@rmzelle

This comment has been minimized.

Member

rmzelle commented Aug 10, 2018

the following executed without error

Sorry for being dense, but that is a Python script, right?

Another question is whether there is a larger simplification that can be made. Should type remain a list in these instances?

What exactly are you proposing to change? What would the alternative be?

@dhimmel

This comment has been minimized.

Contributor

dhimmel commented Aug 10, 2018

Sorry for being dense, but that is a Python script, right?

Ah yes!

What exactly are you proposing to change? What would the alternative be?

Refer to this section of the current schema:

schema/csl-data.json

Lines 74 to 128 in 4846e02

"id": "name-variable",
"type": [
{
"properties": {
"family" : {
"type": "string"
},
"given" : {
"type": "string"
},
"dropping-particle" : {
"type": "string"
},
"non-dropping-particle" : {
"type": "string"
},
"suffix" : {
"type": "string"
},
"comma-suffix" : {
"type": [
"string",
"number",
"boolean"
]
},
"static-ordering" : {
"type": [
"string",
"number",
"boolean"
]
},
"literal" : {
"type": "string"
},
"parse-names" : {
"type": [
"string",
"number",
"boolean"
]
}
},
"additionalProperties" : false
},
{
"properties": {
"literal" : {
"type": "string"
}
},
"additionalProperties" : false
}
]

This is defining name-variable, but type is an array. Previously, the array had multiple options, but post this PR it will only have one. Therefore should type remain an array or be refactored somehow? I'm not very familiar with how JSON Schemas are definited so bear with me.

@rmzelle

This comment has been minimized.

Member

rmzelle commented Dec 4, 2018

@adam3smith, @fbennett, could you give this a quick look?

The problem was that we defined the "literal" field for names and dates twice in csl-data.json, which apparently trips up some JSON validators. This pull request eliminates this redundancy. I'm assuming that "literal" names, as well as "literal" and "raw" dates, are all mutually exclusive with the other fields ("family", "given", "dropping-particle", "non-dropping-particle", "suffix", "comma-suffix", "static-ordering", and "parse-names" for names; "date-parts", "season", and "circa" for dates).

See https://github.com/Juris-M/citeproc-js/blob/master/manual/citeproc-doc.rst, https://citeproc-js.readthedocs.io/en/latest/csl-json/markup.html, and https://github.com/citation-style-language/test-suite (search for "raw" or "literal" to find the relevant tests). Some tests will need to be touched up (such as deleting empty "date-parts" fields for dates that have a "raw" string), but I didn't come across any cases that are truly incompatible with mutual exclusion.

@dhimmel

This comment has been minimized.

Contributor

dhimmel commented Dec 4, 2018

I'm assuming that "literal" names, as well as "literal" and "raw" dates, are all mutually exclusive with the other fields

I think it could be a bit over-constraining to have the schema enforce that literal and raw are mutually exclusive with other fields for dates/names. For example, let's say a resource returns a literal author name and an automated name parsing program infers the family and given name. In this case, the user may want to include literal, family, and given in the CSL JSON, since literal is the original data and family and given are inferred (and therefore possibly incorrect).

What is the benefit to enforcing mutual exclusivity? Is there real harm to having both literal/raw and parsed values? Because on the other hand, I do imagine there will be several CSL JSON files in the wild that do include both literal and raw values.

@rmzelle

This comment has been minimized.

Member

rmzelle commented Dec 4, 2018

What is the benefit to enforcing mutual exclusivity?

It would eliminate the need to dictate what the CSL processor should do if both literal/raw and parsed fields are provided.

@rmzelle

This comment has been minimized.

Member

rmzelle commented Dec 4, 2018

And I understand the trade-off between offering clearer input data for the CSL processor, or increasing flexibility in storing metadata. CSL JSON wasn't really designed as a general metadata exchange format, so I'm leaning more towards the former. Happy to switch back to your original proposal if we conclude the exclusivity requirements is too restrictive.

@dhimmel

This comment has been minimized.

Contributor

dhimmel commented Dec 5, 2018

It would eliminate the need to dictate what the CSL processor should do if both literal/raw and parsed fields are provided.

I imagined most style specifications would implement an if-else operation, but perhaps this is not the case? @adam3smith is it common to display both literal and parsed author information if both exist?

CSL JSON wasn't really designed as a general metadata exchange format

Indeed, although it does get used this way to some extent.

Since the current schema allows for literal and parsed values together (although perhaps this was not the original intention), no existing users require the schema to become more constraining. However, if it does become more constraining it could break existing usages.

Happy to switch back to your original proposal if we conclude the exclusivity requirements is too restrictive.

I'm interested to get another opinion on the matter. CC @adam3smith, @fbennett.

@rmzelle

This comment has been minimized.

Member

rmzelle commented Dec 5, 2018

is it common to display both literal and parsed author information if both exist?

No, and I don't see a way to combine these types of data and still generate properly formatted names (for a single author; it's of course no problem to have a mix of names that are each either literal or parsed). So I assume that any CSL processor would just ignore one type of data if both are provided. But unless we rule out this situation, there will be uncertainty what CSL processors will do unless we put the required behavior in the specification.

We can also just accept your original proposal, and file a separate ticket on this mutual exclusivity issue, if we can't come to a quick decision.

@adam3smith

This comment has been minimized.

Member

adam3smith commented Dec 6, 2018

I think I'm with Daniel on this based on the general principle of not breaking previously valid CSL JSON (and I don't think it matter hugely either way in terms of functionality)

@adam3smith

This comment has been minimized.

Member

adam3smith commented Dec 6, 2018

In other words, let's do this:

We can also just accept your original proposal, and file a separate ticket on this mutual exclusivity issue, if we can't come to a quick decision.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment