Remove duplicate literal types #155

dhimmel · Aug 10, 2018

Closes #154

I verified the following executed without error (but have not done further testing):

import jsonref
import jsonschema
url = 'https://github.com/dhimmel/schema/raw/01d112e0ba606c4f61fcc9e27e4f623f5aad8da6/csl-data.json'
schema = jsonref.load_uri(url, jsonschema=True)
jsonschema.Draft3Validator.check_schema(schema)

Another question is whether there is a larger simplification that can be made. Should type remain a list in these instances?

rmzelle · Aug 10, 2018

the following executed without error

Sorry for being dense, but that is a Python script, right?

Another question is whether there is a larger simplification that can be made. Should type remain a list in these instances?

What exactly are you proposing to change? What would the alternative be?

dhimmel · Aug 10, 2018

Sorry for being dense, but that is a Python script, right?

Ah yes!

What exactly are you proposing to change? What would the alternative be?

Refer to this section of the current schema:

schema/csl-data.json

Lines 74 to 128 in 4846e02

    
           "id": "name-variable", 
        
           "type": [ 
        
               { 
        
                   "properties": { 
        
                       "family" : { 
        
                           "type": "string"  
        
                       }, 
        
                       "given" : { 
        
                           "type": "string"  
        
                       }, 
        
                       "dropping-particle" : { 
        
                           "type": "string"  
        
                       }, 
        
                       "non-dropping-particle" : { 
        
                           "type": "string"  
        
                       }, 
        
                       "suffix" : { 
        
                           "type": "string"  
        
                       }, 
        
                       "comma-suffix" : { 
        
                           "type": [ 
        
                               "string", 
        
                               "number", 
        
                               "boolean"  
        
                           ]  
        
                       }, 
        
                       "static-ordering" : { 
        
                           "type": [ 
        
                               "string", 
        
                               "number", 
        
                               "boolean"  
        
                           ]  
        
                       }, 
        
                       "literal" : { 
        
                           "type": "string"  
        
                       }, 
        
                       "parse-names" : { 
        
                           "type": [ 
        
                               "string", 
        
                               "number", 
        
                               "boolean"  
        
                           ]  
        
                       }  
        
                   }, 
        
                   "additionalProperties" : false  
        
               }, 
        
               { 
        
                   "properties": { 
        
                       "literal" : { 
        
                           "type": "string"  
        
                       }  
        
                   }, 
        
                   "additionalProperties" : false  
        
               }  
        
           ]

This is defining name-variable, but type is an array. Previously, the array had multiple options, but post this PR it will only have one. Therefore should type remain an array or be refactored somehow? I'm not very familiar with how JSON Schemas are definited so bear with me.

rmzelle · Dec 4, 2018

@adam3smith, @fbennett, could you give this a quick look?

The problem was that we defined the "literal" field for names and dates twice in csl-data.json, which apparently trips up some JSON validators. This pull request eliminates this redundancy. I'm assuming that "literal" names, as well as "literal" and "raw" dates, are all mutually exclusive with the other fields ("family", "given", "dropping-particle", "non-dropping-particle", "suffix", "comma-suffix", "static-ordering", and "parse-names" for names; "date-parts", "season", and "circa" for dates).

See https://github.com/Juris-M/citeproc-js/blob/master/manual/citeproc-doc.rst, https://citeproc-js.readthedocs.io/en/latest/csl-json/markup.html, and https://github.com/citation-style-language/test-suite (search for "raw" or "literal" to find the relevant tests). Some tests will need to be touched up (such as deleting empty "date-parts" fields for dates that have a "raw" string), but I didn't come across any cases that are truly incompatible with mutual exclusion.

dhimmel · Dec 4, 2018

I'm assuming that "literal" names, as well as "literal" and "raw" dates, are all mutually exclusive with the other fields

I think it could be a bit over-constraining to have the schema enforce that literal and raw are mutually exclusive with other fields for dates/names. For example, let's say a resource returns a literal author name and an automated name parsing program infers the family and given name. In this case, the user may want to include literal, family, and given in the CSL JSON, since literal is the original data and family and given are inferred (and therefore possibly incorrect).

What is the benefit to enforcing mutual exclusivity? Is there real harm to having both literal/raw and parsed values? Because on the other hand, I do imagine there will be several CSL JSON files in the wild that do include both literal and raw values.

rmzelle · Dec 4, 2018

What is the benefit to enforcing mutual exclusivity?

It would eliminate the need to dictate what the CSL processor should do if both literal/raw and parsed fields are provided.

rmzelle · Dec 4, 2018

And I understand the trade-off between offering clearer input data for the CSL processor, or increasing flexibility in storing metadata. CSL JSON wasn't really designed as a general metadata exchange format, so I'm leaning more towards the former. Happy to switch back to your original proposal if we conclude the exclusivity requirements is too restrictive.

dhimmel · Dec 5, 2018

It would eliminate the need to dictate what the CSL processor should do if both literal/raw and parsed fields are provided.

I imagined most style specifications would implement an if-else operation, but perhaps this is not the case? @adam3smith is it common to display both literal and parsed author information if both exist?

CSL JSON wasn't really designed as a general metadata exchange format

Indeed, although it does get used this way to some extent.

Since the current schema allows for literal and parsed values together (although perhaps this was not the original intention), no existing users require the schema to become more constraining. However, if it does become more constraining it could break existing usages.

Happy to switch back to your original proposal if we conclude the exclusivity requirements is too restrictive.

I'm interested to get another opinion on the matter. CC @adam3smith, @fbennett.

rmzelle · Dec 5, 2018

is it common to display both literal and parsed author information if both exist?

No, and I don't see a way to combine these types of data and still generate properly formatted names (for a single author; it's of course no problem to have a mix of names that are each either literal or parsed). So I assume that any CSL processor would just ignore one type of data if both are provided. But unless we rule out this situation, there will be uncertainty what CSL processors will do unless we put the required behavior in the specification.

We can also just accept your original proposal, and file a separate ticket on this mutual exclusivity issue, if we can't come to a quick decision.

adam3smith · Dec 6, 2018

I think I'm with Daniel on this based on the general principle of not breaking previously valid CSL JSON (and I don't think it matter hugely either way in terms of functionality)

adam3smith · Dec 6, 2018

In other words, let's do this:

We can also just accept your original proposal, and file a separate ticket on this mutual exclusivity issue, if we can't come to a quick decision.

Remove duplicate literal types
Refs #154

01d112e

Update csl-data.json

Verified

This commit was created on GitHub.com and signed with a verified signature using GitHub’s key.

GPG key ID: 4AEE18F83AFDEB23 Learn about signing commits

9075e00

Update csl-data.json

Verified

This commit was created on GitHub.com and signed with a verified signature using GitHub’s key.

GPG key ID: 4AEE18F83AFDEB23 Learn about signing commits

eade14e

citation-style-language/schema

Remove duplicate literal types #155

Remove duplicate literal types #155

dhimmel commented Aug 10, 2018

This comment has been minimized.

rmzelle commented Aug 10, 2018

This comment has been minimized.

dhimmel commented Aug 10, 2018

This comment has been minimized.

rmzelle commented Dec 4, 2018

This comment has been minimized.

dhimmel commented Dec 4, 2018

This comment has been minimized.

rmzelle commented Dec 4, 2018

This comment has been minimized.

rmzelle commented Dec 4, 2018

This comment has been minimized.

dhimmel commented Dec 5, 2018

This comment has been minimized.

rmzelle commented Dec 5, 2018

This comment has been minimized.

adam3smith commented Dec 6, 2018

This comment has been minimized.

adam3smith commented Dec 6, 2018

citation-style-language/schema

Join GitHub today

Remove duplicate literal types #155

Conversation

dhimmel commented Aug 10, 2018

This comment has been minimized.

rmzelle commented Aug 10, 2018

This comment has been minimized.

dhimmel commented Aug 10, 2018

This comment has been minimized.

rmzelle commented Dec 4, 2018

This comment has been minimized.

dhimmel commented Dec 4, 2018

This comment has been minimized.

rmzelle commented Dec 4, 2018

This comment has been minimized.

rmzelle commented Dec 4, 2018

This comment has been minimized.

dhimmel commented Dec 5, 2018

This comment has been minimized.

rmzelle commented Dec 5, 2018

This comment has been minimized.

adam3smith commented Dec 6, 2018

This comment has been minimized.

adam3smith commented Dec 6, 2018