Memento headers and commas #239

Open
acoburn opened this Issue Oct 12, 2018 · 4 comments

acoburn commented Oct 12, 2018

The Memento specification describes two optional parameters in the associated link headers: from and until. These parameters, if included, MUST be formatted according to RFC 1123. For example: until="Wed, 20 Jan 2010 09:34:33 GMT"
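For illustration, a value in that format can be produced with Python's standard library. This is only a sketch: the helper name `rfc1123` is hypothetical, and the timestamp is the one from the example above.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def rfc1123(dt):
    """Format an aware datetime as an RFC 1123 date in GMT (hypothetical helper)."""
    # usegmt=True emits "GMT" instead of "+0000"; it requires a UTC datetime.
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


until = rfc1123(datetime(2010, 1, 20, 9, 34, 33, tzinfo=timezone.utc))
param = 'until="{}"'.format(until)
print(param)
```

Note that the day name is always followed by a comma, so every value in this format carries one.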

Generating this value is no problem. The problem is that many downstream libraries that parse Link headers don't expect a comma to appear inside a link, even though putting it inside a quoted parameter is legit. This is actually a pretty significant problem from a practical perspective, and I would like to suggest that the from and until parameters be removed from the memento headers by default.

I will introduce a configuration property that can be set to enable those parameters, but I would like the default for them to be turned off.
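The parsing problem described above can be reproduced with a few lines of Python. This is a sketch, not the code of any particular library: `naive_split` is a hypothetical stand-in for a parser that splits on every comma, and the header value is an assumed example of a single timemap link with both parameters.

```python
# Illustration only: one timemap link carrying from and until parameters.
header = (
    '<http://a.example.org/?version=all&style=timemap>; rel="timemap"; '
    'type="application/link-format"; '
    'from="Tue, 15 Sep 2000 11:28:26 GMT"; '
    'until="Wed, 20 Jan 2010 09:34:33 GMT"'
)


def naive_split(value):
    """Split a Link header on every comma, ignoring quoted strings."""
    return [part.strip() for part in value.split(",")]


parts = naive_split(header)
# One link goes in, but several fragments come out, because each
# RFC 1123 date contains a comma after the day name.
print(len(parts))
for part in parts:
    print(part)
```

A parser of this kind treats the fragments as separate links, so the from and until values are cut in half and the remaining pieces no longer start with a link target.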

@acoburn acoburn added the area/http label Oct 12, 2018

@acoburn acoburn added this to the 0.8.0 Release milestone Oct 12, 2018

ajs6f commented Oct 12, 2018

Shouldn't we be engaging with the Memento community as well about something like this?

acoburn commented Oct 12, 2018

That's a really good idea. I'll send a question to their dev list.

ajs6f commented Oct 12, 2018

Yeah, @hvdsomp is very responsive and I guarantee you'll get some good discussion.

acoburn added a commit that referenced this issue Oct 12, 2018

@acoburn acoburn referenced a pull request that will close this issue Oct 12, 2018

Open

Suppress timemap link parameters by default #240

acoburn added a commit that referenced this issue Oct 14, 2018

martinklein0815 commented Oct 15, 2018

Thanks for the ping, @ajs6f and @acoburn! I am replying on behalf of @hvdsomp and other members of the Memento crew.

Suppressing the optional "from" and "until" parameters in a timemap link is definitely a pragmatic solution but one that comes with the loss of relevant information - the time span covered in the timemap.
The preferable solution, in our opinion, is to use and promote HTTP link header parsers that do "the right thing", meaning that they recognize commas in headers and interpret them in context. This would likely also help with parsing other headers that hold a datetime, such as Memento-Datetime, Last-Modified, and Date. In our experience, the popular Python-based "requests" library is a good example of a library that distinguishes between commas that separate links and commas used within quotes (like for the Memento datetime). For the following link header:

Link: <https://github.com/leeper/pdfcount>; rel="original", <https://scholarlyorphans.org/memento/https://github.com/leeper/pdfcount>; rel="timegate", <https://scholarlyorphans.org/memento/timemap/link/https://github.com/leeper/pdfcount>; rel="timemap"; type="application/link-format", <https://scholarlyorphans.org/memento/20180828012815/https://github.com/leeper/pdfcount>; rel="memento"; datetime="Tue, 28 Aug 2018 01:28:15 GMT"; collection="memento"

the parsed and formatted output is:
{
'original': {'url': 'https://github.com/leeper/pdfcount', 'rel': 'original'},
'timegate': {'url': 'https://scholarlyorphans.org/memento/https://github.com/leeper/pdfcount', 'rel': 'timegate'},
'timemap': {'url': 'https://scholarlyorphans.org/memento/timemap/link/https://github.com/leeper/pdfcount', 'rel': 'timemap', 'type': 'application/link-format'},
'memento': {'url': 'https://scholarlyorphans.org/memento/20180828012815/https://github.com/leeper/pdfcount', 'rel': 'memento', 'datetime': 'Tue, 28 Aug 2018 01:28:15 GMT', 'collection': 'memento'}
}

and for the link header with from and until parameters:

Link: <http://a.example.org/>; rel="original timegate", <http://a.example.org/?version=all&style=timemap> ; rel="timemap"; type="application/link-format" ; from="Tue, 15 Sep 2000 11:28:26 GMT" ; until="Wed, 20 Jan 2010 09:34:33 GMT"

it outputs:
{
'original timegate': {'url': 'http://a.example.org/', 'rel': 'original timegate'},
'timemap': {'url': 'http://a.example.org/?version=all&style=timemap', 'rel': 'timemap', 'type': 'application/link-format', 'from': 'Tue, 15 Sep 2000 11:28:26 GMT', 'until': 'Wed, 20 Jan 2010 09:34:33 GMT'}
}
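To make "recognizing commas in context" concrete, here is a minimal sketch of a quote-aware parser using only the Python standard library. It is not the code used by "requests" or by our Java library. The function names `split_outside` and `parse_link_header` are hypothetical. The sketch assumes link targets are wrapped in angle brackets, as Link header syntax requires, and it does not unescape backslashes inside quoted values.

```python
def split_outside(value, sep):
    """Split value on sep, but only where sep is outside quotes and <...>."""
    pieces, current = [], []
    in_quotes = in_uri = escaped = False
    for ch in value:
        if escaped:
            escaped = False  # character after a backslash is taken literally
        elif in_quotes and ch == "\\":
            escaped = True
        elif ch == '"' and not in_uri:
            in_quotes = not in_quotes
        elif ch == "<" and not in_quotes:
            in_uri = True
        elif ch == ">" and not in_quotes:
            in_uri = False
        elif ch == sep and not in_quotes and not in_uri:
            pieces.append("".join(current).strip())
            current = []
            continue
        current.append(ch)
    pieces.append("".join(current).strip())
    return [piece for piece in pieces if piece]


def parse_link_header(value):
    """Return a dict of links keyed by rel (or by URL when rel is absent)."""
    result = {}
    for link in split_outside(value, ","):
        target, *params = split_outside(link, ";")
        entry = {"url": target.strip("<> ")}
        for param in params:
            name, _, raw = param.partition("=")
            entry[name.strip()] = raw.strip().strip('"')
        result[entry.get("rel", entry["url"])] = entry
    return result


header = (
    '<http://a.example.org/>; rel="original timegate", '
    '<http://a.example.org/?version=all&style=timemap> ; rel="timemap"; '
    'type="application/link-format" ; '
    'from="Tue, 15 Sep 2000 11:28:26 GMT" ; '
    'until="Wed, 20 Jan 2010 09:34:33 GMT"'
)
links = parse_link_header(header)
print(links["timemap"]["from"])
print(links["timemap"]["until"])
```

The only state the splitter needs is whether it is currently inside a quoted string or an angle-bracketed target, which is why the commas in the two dates are left alone.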

We are not aware of many Java-based libraries that perform equally well. We developed our own Java-based library to correctly parse headers and we are happy to share the code base. Our implementation is based on input from here:
https://jar-download.com/artifacts/org.jboss.resteasy.mobile/resteasy-mobile/1.0.0/source-code/org/jboss/resteasy/plugins/delegates/LinkHeaderDelegate.java

Another example of a project that adopted the same code base to implement a link header parser is:
https://github.com/temenostech/IRIS/blob/master/interaction-core/src/main/java/com/temenos/interaction/core/hypermedia/LinkHeaderDelegate.java

Regardless, we would be interested in seeing our custom implementation integrated into off-the-shelf Java parsers - can you see a collaborative path to make this happen?
