mandatory absolute URI for anchor #117
hvdsomp added the linkset label Jan 30, 2019
it seems this one has currently been resolved to require that anchors MUST be absolute. that makes sense, as explained by @hvdsomp, but it introduces a processing model that should be made very explicit in the spec: for serializing into the media types, all links must be parsed, and all anchors must be resolved to absolute URIs.
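Here is a minimal Python sketch of that processing model. It makes a few assumptions. Links are taken to be already parsed into dictionaries with hypothetical `anchor`, `href` and `rel` keys. `base_uri` stands for the URI of the resource whose Link header carried them. Following the Link header rules of RFC 8288, a missing anchor means the context is the base URI itself, and relative anchors and targets are both resolved against that base with `urllib.parse.urljoin`.

```python
from urllib.parse import urljoin


def absolutize_links(links, base_uri):
    """Return copies of `links` whose anchor and href are absolute URIs.

    `links` is a list of dicts with a required "href", an optional
    "anchor", and any other attributes (e.g. "rel"). This dict shape is a
    hypothetical in-memory model, not one defined by the linkset I-D.
    `base_uri` is assumed to be the URI of the resource that carried the
    original Link header.
    """
    resolved = []
    for link in links:
        out = dict(link)
        # A missing anchor means the context is the base URI itself;
        # urljoin(base, "") returns the base unchanged.
        out["anchor"] = urljoin(base_uri, link.get("anchor", ""))
        # Relative targets are resolved against the same base URI.
        out["href"] = urljoin(base_uri, link["href"])
        resolved.append(out)
    return resolved


if __name__ == "__main__":
    base = "https://example.org/docs/page"
    links = [
        {"href": "/terms", "rel": "license"},
        {"anchor": "../x", "href": "y.pdf", "rel": "item"},
    ]
    for link in absolutize_links(links, base):
        print(link["anchor"], link["rel"], link["href"])
```

Note that the resolved output no longer records which references were originally relative. That is the round-trip concern raised in this thread.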
On 2019-01-30 14:06, Herbert Van de Sompel wrote:
> I see two possible approaches:
> 1. Require that each link in a linkset has an anchor with a complete
> (non-relative) URI
> 2. In serializations, require an information element similar to BASE in
> HTML to express a BaseURI for link anchors
> Note that (2) could be achieved in JSON, especially when following
> proposal #103 by @BigBlueHat, which is the plan. But, it could not be
> achieved for the application/linkset serialization since it is a direct
> mapping of the Link header syntax.
i think there is a third possibility to say that links with non-absolute
anchors MUST be ignored if there is no well-defined context for the
linkset. we could remain agnostic as to how this context is established.
this would have the advantage of not creating different data models for
native and JSON, and to still allow full round-trip fidelity from link
headers to linksets and back.
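The third possibility can be sketched in Python as a consumer-side filter. The function names and the dict-based link model are illustrative assumptions. So is the choice to treat a missing anchor like a relative one, since both depend on a context. How `context_uri` is established is left to the caller, which matches the proposal to stay agnostic about it. Only anchors are examined here; hrefs pass through untouched.

```python
from urllib.parse import urljoin, urlsplit


def has_scheme(uri):
    """True if `uri` carries a scheme, i.e. is not a relative reference."""
    return bool(urlsplit(uri).scheme)


def usable_links(links, context_uri=None):
    """Apply an "ignore unless there is a context" rule to parsed links.

    Links whose anchor is absolute are kept as-is. Links with a relative
    or missing anchor are resolved against `context_uri` when one is
    known, and dropped otherwise. Input dicts are never modified.
    """
    kept = []
    for link in links:
        anchor = link.get("anchor")
        if anchor is not None and has_scheme(anchor):
            kept.append(link)
        elif context_uri is not None:
            out = dict(link)
            out["anchor"] = urljoin(context_uri, anchor or "")
            kept.append(out)
        # else: no well-defined context, so the link is ignored
    return kept
```

Without a context, only links with an absolute anchor survive. With a context, every link is kept and its anchor becomes absolute. This preserves the original relative form until a consumer actually knows what to resolve against.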
BigBlueHat referenced this issue on Mar 13, 2019: Value of `application/linkset` vs. existing `message/http` #120 (Open)
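I was working on trying to get this "ignore relative URIs" approach into a new version of the I-D, and I am afraid it just does not make sense to me. I explained the reasons already above, but I'll repeat:
* Standalone: Link sets will be used in standalone manners, be merged with other link sets, etc. So, one way or another, for these link sets to be usable, the contained links will eventually need to be expressed with absolute URIs for anchors and hrefs.
* Confusion: Following existing conventions, if a link in a linkset has no anchor, then the URI of the link set resource (the resource that delivered the linkset) is the anchor. Most likely, that is not what is intended; most likely, the URI of the origin resource is supposed to be the anchor.
* Confusion: Following existing conventions, if a link in a linkset has an anchor whose value is a relative URI, then the URI of the link set resource is the baseURI for the anchor. Most likely, what is intended is that the URI is relative to that of the origin resource.
* Confusion: Following existing conventions, if a link in a linkset has an href whose value is a relative URI, then the URI of the link set resource is the baseURI for the href. Most likely, what is intended is that the URI is relative to that of the origin resource.
As such, the confusion introduced by allowing relative URIs is rather significant and definitely error prone. Additionally, by allowing relative URIs, we are basically requiring the client to do the following:
1. If you "somehow" know the baseURL to which the URIs are relative, write them out as absolute URIs yourself, because we know you will need to store the links in absolute URI terms.
2. If you don't know the baseURL, ignore all relative URIs.
Based on the above, I am not sure what value "allow relative URIs and ignore them" adds. Agreed, relative URIs are meaningful when they are part of an HTTP interaction, where there is no doubt about what their baseURL is. But link sets will be used outside of HTTP interactions, and not providing absolute URIs leads to confusion and more work for the client.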
I think allowing relative URIs will open up a minefield which we know developers are not very good at handling (and particularly bad at generating). As I see it, the point of this linkset media type is precisely to decouple the expression of the links from the resource that has the links. If the anchors (and targets) are allowed to be relative, that opens up many points of confusion:
I think it should be possible to use a linkset file detached from its origin (e.g. save it to disk, or do a simple lookup) without having to further process other HTTP headers (beyond …).

So I think a core principle should be that a linkset file can be saved and reused standalone without having to process anything first. This would be the case for both … In that sense the motivation is the same as the simplicity of the N-Triples format, where absolute URIs can be treated as opaque strings.

Counter arguments

Sorry, I have to be devil's advocate here... Always having absolute URIs in the linkset means the client has to be more diligent in recording the URI they want to look up, e.g. … Always using absolute URIs means clients always have to calculate the absolute URI of the requested document (after redirections) rather than guessing because of neighbouring folders. In practice, check the …

It might be that an HTTP server wants to read a neighbouring linkset file (e.g. …

Q: Is http://example.com/foo/../bar/ a valid absolute URI?

Parsing as JSON-LD

It depends if we say the URIs "SHOULD" be absolute or …
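On the question above: under RFC 3986 the string matches the absolute-URI grammar. It has a scheme, and ".." is a legal path segment, so it is absolute, just not normalized. RFC 3986 (section 6.2.2.3) recommends removing dot segments during normalization, using the remove_dot_segments algorithm of section 5.2.4. The Python below is our own sketch of that algorithm, and the helper names are illustrative. A consumer that treats URIs as opaque strings, N-Triples style, would see the two spellings as different identifiers unless it applies such a step.

```python
from urllib.parse import urlsplit, urlunsplit


def remove_dot_segments(path):
    """RFC 3986, section 5.2.4: remove "." and ".." segments from a path."""
    inp, out = path, []  # out holds segments, each with its leading "/"
    while inp:
        if inp.startswith("../"):
            inp = inp[3:]
        elif inp.startswith("./"):
            inp = inp[2:]
        elif inp.startswith("/./"):
            inp = inp[2:]  # "/./x" becomes "/x"
        elif inp == "/.":
            inp = "/"
        elif inp.startswith("/../"):
            inp = inp[3:]  # "/../x" becomes "/x"
            if out:
                out.pop()  # drop the last output segment
        elif inp == "/..":
            inp = "/"
            if out:
                out.pop()
        elif inp in (".", ".."):
            inp = ""
        else:
            # Move the first segment (with its leading "/", if any) to out.
            end = inp.find("/", 1)
            if end == -1:
                end = len(inp)
            out.append(inp[:end])
            inp = inp[end:]
    return "".join(out)


def is_absolute(uri):
    """True if `uri` has a scheme, i.e. is not a relative reference.

    (RFC 3986's absolute-URI also forbids a fragment; not checked here.)
    """
    return bool(urlsplit(uri).scheme)


def normalize_path(uri):
    """Return `uri` with dot segments removed from its path component."""
    parts = urlsplit(uri)
    return urlunsplit(parts._replace(path=remove_dot_segments(parts.path)))


if __name__ == "__main__":
    uri = "http://example.com/foo/../bar/"
    print(is_absolute(uri))      # True
    print(normalize_path(uri))   # http://example.com/bar/
```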
On 2019-08-01 07:40, Stian Soiland-Reyes wrote:
> I see the point of this linkset media type is precisely to decouple the
> expression of the links from the resource that has the links.
the point of the media type is to be as faithful as possible to the link
header model and format, and to allow round-tripping without introducing
complex processing rules. we should simply say that relative URIs are
meaningless without a context and should be ignored unless the context
can be preserved/determined in some shape or form.
On 2019-08-01 05:45, Herbert Van de Sompel wrote:
> * Standalone: Link sets will be used in standalone manners, be merged
> with other linkset, etc. So, one way or another, for these link sets
> to be usable, the contained links will eventually need to be
> expressed with absolute URIs for anchors and hrefs.
not if the context is preserved. how that's done is not for us to define
or determine, but linksets with relative URIs are perfectly fine
standalone when the context URI is preserved.
> * Confusion: Following existing conventions, if a link in a linkset
> has no anchor, then the URI of the link set resource (resource that
> delivered the linkset) is the anchor. And, most likely, that is not
> what is intended. Most likely what is intended is that the URI of
> the origin resource is supposed to be the anchor.
true. i am not sure how to best deal with this. no matter how strong we
state that this interpretation is wrong, people will still do it.
> * Confusion: Following existing conventions, if a link in a linkset
> has an anchor and its value is a relative URI, then the URI of the
> link set resource (resource that delivered the linkset) is the
> baseURI for the anchor. And, most likely, that is not what is
> intended. Most likely what is intended is that URI is relative to
> that of the origin resource.
that's the same as the above, right?
> * Confusion: Following existing conventions, if a link in a linkset
> has an href with as value a relative URI, then the URI of the link
> set resource (resource that delivered the linkset) is the baseURI
> for the href. And, most likely, that is not what is intended. Most
> likely what is intended is that URI is relative to that of the
> origin resource.
that's the same as the above, right?
> As such, the confusion introduced by allowing relative URIs as such is
> rather significant and definitely error prone. But additionally, by
> allowing relative URIs, we are basically requiring the client to do the
> following:
> 1. If you "somehow" know the baseURL to which the URIs are
> relative, write them out as absolute URIs yourself, because we know
> you will need to store the links in absolute URI terms.
no need to write them out. just standard URI resolution. but yes, you
need to know the URI to resolve against.
> 2. If you don't know the baseURL, ignore all relative URIs.
yes.
> Based on the above, I am not sure what the "allow relative URIs and
> ignore them" adds as value.
allowing the format to be roundtrip-able and not adding a processing
model that no doubt will be ignored by some implementations as well.
> Agreed that relative URIs are meaningful
> when they are part of an HTTP interaction when there is no doubt about
> what their baseURL is. But link sets will be used outside of HTTP
> interactions and not providing absolute URIs leads to confusion and more
> work for the client.
you just shift the work differently, right? either require work to
resolve/filter upfront, or do it later. and in any case, even if we
allow relative, implementations would still be free to resolve if they
want to.
in fact, maybe that could be a good way to address this conundrum: have
a section on relative URIs and their issues. then add one sub-section on
what that means for direct reuse (context needs to be preserved or
relative URIs need to be updated), and one sub-section on resolving all
URIs and what that means (no more roundtrip-ability, allowing linksets
to be standalone).
The point made several times above is that the relative links will not be ignored because clients will process them the way they are used to process links. Hence the repeated use of "Confusion" above.
On 2019-08-01 08:44, Herbert Van de Sompel wrote:
> * Regarding "shifting the work": Indeed. But the link set resource is
> all about providing links. It's its sole reason of being. It was
> invented for that purpose; it's a new entity in the ecology. So, let
> it do the work rather than telling existing entities to behave
> differently and ignore relative URIs.
it's not about being lazy. it's about documenting semantics. if you
process application/linkset, you know the media type. if the media type
tells you to be mindful of relative URIs, that's what you need to do:
only resolve those if you know the original context, and ignore them
otherwise. that's perfectly well-defined and in fact allows more lazy
behavior (which is typically what happens in real life regardless of
what specs are trying to say).
asked the other way around: if you categorically disallow relative URIs
(which makes linkset incongruent with link headers), what rule do you
define if there is one in a linkset? and keep in mind that regardless of
what we are discussing here, you will find plenty of those in the wild,
because people are lazy.
If one finds a relative URI in a format that looks a lot like other document formats with links, lazy as one is, one will do what is typically done in those cases: use the URL of the responding resource as base for the link. And hence end up with a whole bunch of unintended links. This will happen because parties that consume these links will feel they don’t need to read the spec. The link set documents, in both formats, are rather self explanatory from a consumption perspective, after all.
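The misinterpretation described here is easy to reproduce. In this Python sketch the two URIs are hypothetical: an origin resource, and the separate link set resource publishing links about it. The same relative href yields two different targets depending on which of them serves as the base.

```python
from urllib.parse import urljoin

# Hypothetical URIs: an origin resource and the separate link set
# resource that publishes links about it.
ORIGIN = "https://example.org/articles/42"
LINKSET = "https://example.org/linksets/42"

# A relative target as it might appear in a linkset document.
relative_href = "42.pdf"

# What a consumer following ordinary Web conventions computes:
# resolution against the URI of the responding (link set) resource.
as_consumed = urljoin(LINKSET, relative_href)

# What the publisher most likely meant: resolution against the origin.
as_intended = urljoin(ORIGIN, relative_href)

print(as_consumed)  # https://example.org/linksets/42.pdf
print(as_intended)  # https://example.org/articles/42.pdf
```

Neither result is a parsing error, so nothing alerts the consumer that the first link points somewhere unintended.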
In order to avoid the misinterpretation of links with relative URIs, define the format so as not to allow relative URIs. Parties that publish link sets will need to read the spec either way (what’s intuitive to consume is not intuitive to create, definitely not for the JSON) and hence will learn they need to write absolute URIs. That prevents the above problem from happening. If it happens anyhow, the document is in violation of the format spec.
I explained in a previous comment why I think the “lack of congruence” argument is a red herring.
(In reply to Erik Wilde, Aug 1, 2019, 17:51.)
On 2019-08-01 09:22, Herbert Van de Sompel wrote:
> If one finds a relative URI in a format that looks a lot like other
> document formats with links, lazy as one is, one will do what is
> typically done in those cases: use the URL of the responding resource as
> base for the link. And hence end up with a whole bunch of unintended
> links. This will happen because parties that consume these links will
> feel they don’t need to read the spec. The link set documents, in both
> formats, are rather self explanatory from a consumption perspective,
> after all.
all understood, but my question was what you'd define in the spec about
processing relative URIs if you disallow them. semantics are undefined?
they must be ignored? the whole linkset must be ignored?
the point is: undoubtedly these linksets will exist en masse, if we say
"linkset is like a link header, but not quite". assuming that everybody
will resolve/normalize is unrealistic. defining what to do would be
useful. what would you define?
> In order to avoid the misinterpretation of links with relative URIs,
> define the format so not to allow relative URIs.
i do understand this. but we're not on a green field here. we started
out with the mission to define a media type for link header field
payloads, and not something new.
> Parties that publish
> link sets will need to read the spec whichever way (what’s intuitive to
> consume is not intuitive to create, definitely not for the JSON) and
> hence will learn they need to write absolute URIs. Which prevents the
> above problem from happening. If it happens anyhow, the document is in
> violation of the format spec.
but now what? these things will exist en masse, even if you tell people
that what they will do anyway is the wrong thing.
> I explained in a previous comment why I think the “lack of congruence”
> argument is IMO a red herring.
what's red herring about it? i'd like to be able to simply store the
contents of a link header field. you're telling me i can't do that. i
think that's a discussion that can be had.
You state: "we started out with the mission to define a media type for link header field payloads, and not something new."
That is really incorrect. We started on a mission to provide links by reference (instead of by value in the Link header of the origin resource) and provide motivations in the I-D why this is valuable. And we decided to use the format of the Link header as the basis, because that’s what exists. And then we started understanding that there is something different about providing links by reference:
- relative URIs can be misinterpreted
- sets of links will be used in standalone manners, disconnected from HTTP interactions
(In reply to Erik Wilde, Aug 1, 2019, 18:34.)
hvdsomp commented Jan 30, 2019
For a link in the HTTP Link header, the following holds:
Applying the above to a linkset yields:
The current linkset I-D has explicit cautionary language in this regard, anticipating that the above behavior is most likely not what implementers would want to achieve.
In addition, contrary to a typical use of links in an HTTP header (follow your nose during an HTTP navigation session), links in a link set may be used in a standalone manner, that is, disconnected from the link set resource that, as per the above, is supposed to provide the URI/baseURI for anchors. In such standalone uses, information about the link set resource that provided the linkset may no longer be available. As such, it would be good if linksets were self-contained, i.e. explicit with regard to what the anchor of each link is.
The above tries to make the point that it would be beneficial to simultaneously:
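I see two possible approaches:
1. Require that each link in a linkset has an anchor with a complete (non-relative) URI.
2. In serializations, require an information element similar to BASE in HTML to express a BaseURI for link anchors.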
Note that (2) could be achieved in JSON, especially when following proposal #103 by @BigBlueHat, which is the plan. But, it could not be achieved for the application/linkset serialization since it is a direct mapping of the Link header syntax.
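Approach (2) might look roughly like the JSON below, parsed in Python. The top-level `base` member and the `links` array layout are hypothetical and come from neither the I-D nor proposal #103. The sketch applies the base to both anchors and hrefs, as HTML's BASE element applies to all relative references. Restricting it to anchors, as the approach is worded, would be an equally valid reading.

```python
import json
from urllib.parse import urljoin

# Hypothetical JSON serialization with a document-level "base" member,
# analogous to HTML's <base> element. Neither the member name nor the
# "links" array layout comes from the linkset I-D or proposal #103.
DOC = """
{
  "base": "https://example.org/articles/",
  "links": [
    {"anchor": "42", "rel": "item", "href": "42.pdf"},
    {"anchor": "https://example.com/other", "rel": "license", "href": "/terms"}
  ]
}
"""


def resolve_with_base(doc_text):
    """Parse the document and resolve anchors and hrefs against its base.

    Every link is assumed to carry an explicit "anchor"; absolute
    references are returned unchanged by urljoin.
    """
    doc = json.loads(doc_text)
    base = doc["base"]
    return [
        {**link,
         "anchor": urljoin(base, link["anchor"]),
         "href": urljoin(base, link["href"])}
        for link in doc["links"]
    ]


if __name__ == "__main__":
    for link in resolve_with_base(DOC):
        print(link["anchor"], link["rel"], link["href"])
```

As noted above, the Link-header-style application/linkset serialization offers no place for such a member, so this only helps the JSON format.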