mandatory absolute URI for anchor #117

Open
hvdsomp opened this issue Jan 30, 2019 · 11 comments


@hvdsomp (Collaborator) commented Jan 30, 2019

For a link in the HTTP Link header, the following holds:

  • If the link has no anchor, then the URI of the resource that delivered the HTTP Link header (responding resource) is the anchor
  • If the link has an anchor and its value is a relative URI, then the URI of the resource that delivered the HTTP Link header (responding resource) is the baseURI for the anchor
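Here is a minimal Python sketch of these two rules. It uses the standard library's `urllib.parse.urljoin` for RFC 3986 reference resolution. The helper name `link_context` and the example URIs are hypothetical. The sketch assumes the responding resource's URI is already known, so it does not deal with redirects or `Content-Location`.

```python
from typing import Optional
from urllib.parse import urljoin


def link_context(responding_uri: str, anchor: Optional[str] = None) -> str:
    """Context (anchor) URI of a link delivered in an HTTP Link header.

    - no anchor: the responding resource's URI is the context;
    - relative anchor: resolved with the responding resource's URI as base;
    - absolute anchor: urljoin returns it unchanged.
    """
    if anchor is None:
        return responding_uri
    return urljoin(responding_uri, anchor)


origin = "https://example.org/articles/42"  # hypothetical responding resource
print(link_context(origin))                          # https://example.org/articles/42
print(link_context(origin, "figures/1"))             # https://example.org/articles/figures/1
print(link_context(origin, "https://example.net/x")) # https://example.net/x
```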

Applying the above to a linkset yields:

  • If a link in a linkset has no anchor, then the URI of the link set resource (resource that delivered the linkset) is the anchor
  • If a link in a linkset has an anchor and its value is a relative URI, then the URI of the link set resource (resource that delivered the linkset) is the baseURI for the anchor
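The same rules can be applied to a linkset. The following sketch assumes a hypothetical origin resource and a separate link set resource on another host. For each anchor, it contrasts the context that the Link-header rules yield with the context an implementer most likely has in mind.

```python
from typing import Optional
from urllib.parse import urljoin

# Hypothetical URIs: an origin resource and a link set resource about it.
ORIGIN = "https://example.org/articles/42"
LINKSET = "https://links.example.org/sets/42.json"


def linkset_context(linkset_uri: str, anchor: Optional[str] = None) -> str:
    """Context URI of a link in a linkset under the Link-header rules:
    the link set resource takes the place of the responding resource."""
    return linkset_uri if anchor is None else urljoin(linkset_uri, anchor)


for anchor in (None, "42", "figures/1"):
    got = linkset_context(LINKSET, anchor)
    # What was most likely meant: relative to the origin resource.
    meant = ORIGIN if anchor is None else urljoin(ORIGIN, anchor)
    print(f"anchor={anchor!r}: resolves to {got}, probably meant {meant}")
```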

The current linkset I-D contains explicit cautionary language in this regard, anticipating that the above behavior is most likely not what implementers want.

In addition, unlike the typical use of links in an HTTP header (follow your nose during an HTTP navigation session), links in a link set may be used in a standalone manner, i.e. disconnected from the link set resource that, as per the above, is supposed to provide the URI/baseURI for anchors. In such standalone uses, information about the link set resource that provided the linkset may no longer be available. It would therefore be good if linksets were self-contained, i.e. explicit about what the anchor of each link is.

The above tries to make the point that it would be beneficial to simultaneously:

  • Avoid the risk of misinterpreting link anchors, against which the I-D explicitly warns
  • Make linksets self-contained regarding link anchors

I see two possible approaches:

  1. Require that each link in a linkset has an anchor with an absolute URI
  2. In serializations, require an information element similar to BASE in HTML to express a BaseURI for link anchors
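Approach (1) amounts to a simple syntactic check. In the following sketch, links are held in memory as Python dicts with hypothetical `anchor`, `rel`, and `href` keys; this is not either serialization. Any reference that has a scheme is treated as absolute.

```python
from urllib.parse import urlsplit


def is_absolute(uri_ref: str) -> bool:
    """Syntactic check: a reference with a scheme is not a relative reference.
    (RFC 3986's stricter absolute-URI production also forbids a fragment;
    that distinction is ignored here.)"""
    return bool(urlsplit(uri_ref).scheme)


def violations(links):
    """Links that break approach (1): the anchor is missing or relative."""
    return [link for link in links if not is_absolute(link.get("anchor", ""))]


links = [
    {"anchor": "https://example.org/articles/42", "rel": "author", "href": "https://example.org/people/7"},
    {"anchor": "figures/1", "rel": "license", "href": "https://example.org/license"},
    {"rel": "cite-as", "href": "https://example.org/doi/42"},
]
for bad in violations(links):
    print("missing or relative anchor:", bad)
```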

Note that (2) could be achieved in JSON, especially when following proposal #103 by @BigBlueHat, which is the plan. However, it could not be achieved for the application/linkset serialization, since that is a direct mapping of the Link header syntax.

@hvdsomp added the linkset label Jan 30, 2019

@dret (Owner) commented Mar 6, 2019

It seems this one has currently been resolved to require that anchors MUST be absolute. That makes sense, as explained by @hvdsomp, but it introduces a processing model that should be made very explicit in the spec: for serializing into the media types, all links must be parsed, and all anchors must be resolved to absolute URIs.
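One way to picture that processing model is sketched below. Before serializing, each link's anchor and href are resolved against the URI they were originally relative to, and a missing anchor becomes that URI. The dict-based link representation and the name `absolutize_links` are hypothetical. Serialization into either media type is left out.

```python
from typing import Dict, List
from urllib.parse import urljoin

Link = Dict[str, str]


def absolutize_links(links: List[Link], context_uri: str) -> List[Link]:
    """Give every link an explicit, absolute anchor and href.

    context_uri is the URI the links were relative to where they were
    obtained (e.g. the origin resource that sent them in a Link header).
    A missing anchor becomes context_uri itself: urljoin(base, "") == base.
    """
    resolved = []
    for link in links:
        out = dict(link)
        out["anchor"] = urljoin(context_uri, link.get("anchor", ""))
        out["href"] = urljoin(context_uri, link["href"])
        resolved.append(out)
    return resolved


origin = "https://example.org/articles/42"  # hypothetical origin resource
print(absolutize_links([{"rel": "author", "href": "../people/7"}], origin))
# [{'rel': 'author', 'href': 'https://example.org/people/7', 'anchor': 'https://example.org/articles/42'}]
```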

@dret (Owner) commented Mar 6, 2019 (comment minimized)

@hvdsomp (Collaborator, Author) commented Aug 1, 2019

I was trying to get this "ignore relative URIs" approach into a new version of the I-D, and I am afraid it just does not make sense to me. I explained the reasons above already, but I'll repeat them:

  • Standalone: Link sets will be used in a standalone manner, be merged with other linksets, etc. So, one way or another, for these link sets to be usable, the contained links will eventually need to be expressed with absolute URIs for anchors and hrefs.
  • Confusion: Following existing conventions, if a link in a linkset has no anchor, then the URI of the link set resource (resource that delivered the linkset) is the anchor. And, most likely, that is not what is intended. Most likely what is intended is that the URI of the origin resource is the anchor.
  • Confusion: Following existing conventions, if a link in a linkset has an anchor and its value is a relative URI, then the URI of the link set resource (resource that delivered the linkset) is the baseURI for the anchor. And, most likely, that is not what is intended. Most likely what is intended is that the URI is relative to that of the origin resource.
  • Confusion: Following existing conventions, if a link in a linkset has an href whose value is a relative URI, then the URI of the link set resource (resource that delivered the linkset) is the baseURI for the href. And, most likely, that is not what is intended. Most likely what is intended is that the URI is relative to that of the origin resource.

As such, the confusion introduced by allowing relative URIs is rather significant and definitely error prone. Additionally, by allowing relative URIs, we are basically requiring the client to do the following:

  1. If you "somehow" know the baseURL to which the URIs are relative, write them out as absolute URIs yourself, because we know you will need to store the links in absolute URI terms.
  2. If you don't know the baseURL, ignore all relative URIs.
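The two client steps above can be sketched as a single function. The dict-based link representation and the `base_url` parameter are assumptions for illustration. A link with no anchor is also dropped in step 2. This is a further assumption: such a link's context would be the unknown link set resource.

```python
from typing import Dict, List, Optional
from urllib.parse import urljoin, urlsplit

Link = Dict[str, str]


def has_scheme(ref: str) -> bool:
    return bool(urlsplit(ref).scheme)


def usable_links(links: List[Link], base_url: Optional[str] = None) -> List[Link]:
    """Hypothetical client handling of a linkset that may contain relative URIs.

    Step 1: base_url known ("somehow"): write anchors and hrefs out as absolute URIs.
    Step 2: base_url unknown: ignore links with a relative (or missing) anchor
    or a relative href.
    """
    if base_url is not None:
        return [
            {**link,
             "anchor": urljoin(base_url, link.get("anchor", "")),
             "href": urljoin(base_url, link["href"])}
            for link in links
        ]
    return [
        link for link in links
        if has_scheme(link.get("anchor", "")) and has_scheme(link["href"])
    ]


links = [
    {"anchor": "https://example.org/a", "rel": "author", "href": "https://example.org/p/1"},
    {"anchor": "a", "rel": "license", "href": "https://example.org/license"},
    {"rel": "describedby", "href": "meta.json"},
]
print(len(usable_links(links)))  # 1
print(usable_links(links, "https://example.org/dir/"))
```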

Based on the above, I am not sure what value "allow relative URIs and ignore them" adds. Agreed, relative URIs are meaningful when they are part of an HTTP interaction, where there is no doubt about what their baseURL is. But link sets will be used outside of HTTP interactions, and not providing absolute URIs leads to confusion and more work for the client.

@stain (Contributor) commented Aug 1, 2019

I think allowing relative URIs will open up a minefield, which we know developers are not very good at handling (and particularly bad at generating).

I see the point of this linkset media type as precisely decoupling the expression of the links from the resource that has the links.

If the anchors (and targets) are allowed to be relative, it opens up many points of confusion:

  • Relative to the original resource, or relative to the linkset? All Linked Data and Web practices for base URIs (e.g. as in CSS) say it should be the second, but developers may expect the first ("hey, you sent me here!")
  • False assumptions based on accidental co-location, e.g. both in the same folder, both under /bar, where a later move breaks those clients
  • Possible duplicates with different relative paths (./foo vs foo vs /foo vs //example.com/foo vs http://example.com/foo): how are these nodes merged or looked up? Sounds like you need a full graph store! (This applies to both the text and JSON variants)
  • Can extension relation types be relative URIs?
  • Does the relative URI <> ("" in JSON) mean the linkset itself?
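The duplicate-spelling and empty-reference points can be checked with `urllib.parse.urljoin`. Whether the five spellings collapse to one URI depends entirely on where the linkset happens to live. The two linkset locations below are hypothetical.

```python
from urllib.parse import urljoin

# The five spellings from the list above.
REFS = ["./foo", "foo", "/foo", "//example.com/foo", "http://example.com/foo"]


def resolve_all(base: str) -> dict:
    """Resolve each spelling against base (RFC 3986 resolution via urljoin)."""
    return {ref: urljoin(base, ref) for ref in REFS}


# Hypothetical linkset locations: at the root, and one folder down.
for base in ("http://example.com/linkset", "http://example.com/sets/linkset"):
    resolved = resolve_all(base)
    print(base, "->", len(set(resolved.values())), "distinct target(s)")
    for ref, uri in resolved.items():
        print(f"  {ref!r:26} {uri}")

# An empty reference ("" in JSON, <> in the Link syntax) resolves to the
# base, i.e. to the linkset resource itself.
print(urljoin("http://example.com/sets/linkset", ""))
```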

I think it should be possible to use a linkset file detached from its origin - e.g. save it to disk or do a simple lookup - without having to further process other HTTP headers (beyond Content-Type) to interpret it - e.g. having to deal with Content-Location on the linkset resource.

So I think a core principle should be that a linkset file is possible to save and reuse standalone without having to process anything first. This would be the case for both application/linkset and application/linkset+json variants.

In that sense the motivation is the same as the simplicity of the N-Triples format where absolute URIs can be treated as opaque strings.

Counter arguments

Sorry, I have to play devil's advocate here...

Always having absolute URIs in the linkset means the client has to be more diligent in recording the URI they want to look up, e.g. http://example.com/foo vs http://www.example.com/foo vs https://www.example.com/foo vs https://www.example.com/foo.html vs https://www.example.com/foo.html?query vs https://www.example.com/foo.html?query#baz

Always using absolute URIs means clients always have to calculate the absolute URI of the requested document (after redirections) rather than guessing based on neighbouring folders. In practice, they need to check the Content-Location according to rfc7231 section 3.1.4.1 so they can find it again in the linkset. However, they have to do that anyway even if relative URIs are allowed, as they don't know whether absolute URIs are used or not.
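The following sketch shows the lookup bookkeeping described here. It assumes the client already knows the request URI after redirects and the raw `Content-Location` value. A relative `Content-Location` is resolved against the request URI, per RFC 7231. The `{anchor: links}` index and the function names are hypothetical. Lookup is exact string matching, which is why `foo` and `foo.html` do not meet.

```python
from typing import Dict, List, Optional
from urllib.parse import urljoin


def candidate_uris(request_uri: str, content_location: Optional[str] = None) -> List[str]:
    """URIs under which the requested document may appear in a linkset.

    request_uri: the request URI after following any redirects.
    content_location: the raw Content-Location header value, if any; a
    relative value is resolved against request_uri.
    """
    candidates = [request_uri]
    if content_location:
        resolved = urljoin(request_uri, content_location)
        if resolved not in candidates:
            candidates.append(resolved)
    return candidates


def lookup(index: Dict[str, list], request_uri: str,
           content_location: Optional[str] = None) -> list:
    """Return the links whose absolute anchor equals one of the candidates.

    index is a hypothetical {anchor URI: [links]} table built from a linkset;
    matching is plain string equality, with no URI normalization.
    """
    for uri in candidate_uris(request_uri, content_location):
        if uri in index:
            return index[uri]
    return []


index = {"https://www.example.com/foo.html": [{"rel": "license", "href": "https://example.org/license"}]}
print(lookup(index, "https://www.example.com/foo", "foo.html"))  # found via Content-Location
print(lookup(index, "https://www.example.com/foo"))              # [] (no exact match)
```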

It might be that an HTTP server wants to read a neighbouring linkset file (e.g. ./.linkset.json) to produce the Link headers. In this case it would be much better if the paths were relative, so they work regardless of how the file is served. The same argument applies to someone storing a linkset file outside the web, e.g. in a GitHub repo.
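The server-side case can be illustrated with a sketch that turns already-parsed entries from a neighbouring linkset file into an RFC 8288 Link header field value. The entry format is an assumption, and the formatter does not escape quotes inside parameter values. The relative references are copied verbatim. A client resolves them against the request URI, so the header remains correct wherever the directory is served.

```python
from typing import Dict, List

# Hypothetical, already-parsed entries from a neighbouring ./.linkset.json;
# references are relative to the directory the file sits in.
NEIGHBOUR_LINKS: List[Dict[str, str]] = [
    {"anchor": "report.pdf", "rel": "license", "href": "LICENSE.txt"},
    {"anchor": "report.pdf", "rel": "describedby", "href": "report.json"},
]


def link_header(links: List[Dict[str, str]]) -> str:
    """Format links as a Link header field value, keeping relative references."""
    return ", ".join(
        f'<{link["href"]}>; rel="{link["rel"]}"; anchor="{link["anchor"]}"'
        for link in links
    )


print(link_header(NEIGHBOUR_LINKS))
```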

Q: Is http://example.com/foo/../bar/ a valid absolute URI?
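For what it's worth, under RFC 3986 `http://example.com/foo/../bar/` does match the absolute-URI grammar (scheme, authority, path; no fragment). Syntactically, `..` is an ordinary path segment. Dot-segments are only removed during reference resolution or normalization (RFC 3986, Sections 5.2.4 and 6.2.2.3). After that step, the URI compares equal to `http://example.com/bar/`. The sketch below transcribes the `remove_dot_segments` algorithm of Section 5.2.4 into Python. The function names are illustrative.

```python
from urllib.parse import urlsplit, urlunsplit


def remove_dot_segments(path: str) -> str:
    """RFC 3986, Section 5.2.4, step by step.

    The output buffer is a list of segments, each carrying its leading "/"
    (if any), so "remove the last segment and its preceding '/'" is a pop().
    """
    inp, out = path, []
    while inp:
        if inp.startswith("../"):        # 2A
            inp = inp[3:]
        elif inp.startswith("./"):       # 2A
            inp = inp[2:]
        elif inp.startswith("/./"):      # 2B
            inp = inp[2:]
        elif inp == "/.":                # 2B
            inp = "/"
        elif inp.startswith("/../"):     # 2C
            inp = inp[3:]
            if out:
                out.pop()
        elif inp == "/..":               # 2C
            inp = "/"
            if out:
                out.pop()
        elif inp in (".", ".."):         # 2D
            inp = ""
        else:                            # 2E: move the first segment to the output
            end = inp.find("/", 1)
            if end == -1:
                end = len(inp)
            out.append(inp[:end])
            inp = inp[end:]
    return "".join(out)


def normalize_path(uri: str) -> str:
    """Apply dot-segment removal to the path component of an absolute URI."""
    parts = urlsplit(uri)
    return urlunsplit(parts._replace(path=remove_dot_segments(parts.path)))


print(normalize_path("http://example.com/foo/../bar/"))  # http://example.com/bar/
```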

Parsing as JSON-LD

It depends on whether we say the URIs "SHOULD" be absolute or MUST be absolute. In the first case, normal JSON-LD parsing with the Content-Location etc. of the linkset will work well. In the second case, a conforming parser should instead parse it with @context: { .., "@base": "invalid:///"} to avoid accidental relative URIs. These can then be filtered out from the triples.
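The sentinel-base idea can be sketched without a JSON-LD processor. The sketch assumes a processor has already produced triples with `@base` set to `invalid:///`. Under standard RFC 3986 resolution, any reference that was relative then comes out under the `invalid:` scheme and can be dropped. The triples and predicate IRIs below are made up for illustration. A real pipeline would obtain them from a JSON-LD library.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Sentinel base from the comment above, to be merged into the @context.
# With RFC 3986 resolution, "foo" -> "invalid:///foo" and
# "//example.com/foo" -> "invalid://example.com/foo".
CONTEXT_ADDITION = {"@base": "invalid:///"}


def drop_relative(triples: List[Triple]) -> List[Triple]:
    """Remove triples in which any term was resolved against the sentinel base."""
    return [t for t in triples if not any(term.startswith("invalid:") for term in t)]


# Hypothetical processor output; the second triple stems from a relative href "foo".
triples = [
    ("http://example.com/a", "http://example.com/rel/author", "http://example.com/people/1"),
    ("http://example.com/a", "http://example.com/rel/license", "invalid:///foo"),
]
print(drop_relative(triples))
```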

@dret (Owner) commented Aug 1, 2019 (comment minimized)

@dret (Owner) commented Aug 1, 2019 (comment minimized)

@hvdsomp (Collaborator, Author) commented Aug 1, 2019

The point made several times above is that the relative links will not be ignored because clients will process them the way they are used to process links. Hence the repeated use of "Confusion" above.

  • Regarding "faithful": When mandating absolute URIs, the media type remains faithful to the link header model, since Link headers can represent absolute URIs.
  • Regarding round tripping: The round tripping that may be required is between documents in application/linkset and documents in application/linkset+json. There is no need to round-trip between links in a Link header and links in link set documents, because those links are provided by different parties: the origin resource and the link set resource, respectively. The former are provided in the context of an HTTP interaction (hence relative URIs are OK and interpreted relative to the origin's URL); the latter are not (hence relative URIs are not OK, because they must be interpreted relative to the link set resource, which is most likely not what is intended).
  • Regarding "shifting the work": Indeed. But the link set resource is all about providing links. It's its sole reason for being. It was invented for that purpose; it's a new entity in the ecology. So, let it do the work rather than telling existing entities to behave differently and ignore relative URIs.

@dret (Owner) commented Aug 1, 2019 (comment minimized)

@hvdsomp (Collaborator, Author) commented Aug 1, 2019 (comment minimized)

@dret (Owner) commented Aug 1, 2019 (comment minimized)

@hvdsomp (Collaborator, Author) commented Aug 1, 2019 (comment minimized)

3 participants