Make it possible to restrict the number of memento-related headers #402
acoburn added the memento and area/http labels on May 15, 2019
acoburn added this to To Do in Trellis Linked Data server via automation on May 15, 2019
acoburn added this to the 0.9.0 Release milestone on May 15, 2019
mjgiarlo
commented
May 15, 2019
@acoburn Like you say, the TimeMap should contain the full set. I wonder whether in that case it would be defensible to return 0 Memento versions and assume the client will GET the TimeMap and act on it should the client need access to versions? I don't have strong inclinations though. Any of the ideas you propose would likely work for our purposes.
Can we call on the Memento gang for this? @azaroth42, @hvdsomp, @phonedude, any thoughts? (And thanks in advance!) Has this problem come up for Memento sites (seems like it must have at some point)?
mjgiarlo
commented
May 15, 2019
|
@mjgiarlo or possibly just the first and last version URLs. Given the Java types involved, that would be an exceedingly cheap operation, and putting these values in Link headers is nice for LDP clients, even if it is only a small subset.
mjgiarlo
commented
May 15, 2019
@acoburn That too would be |
azaroth42
commented
May 15, 2019
When I was working on Memento, our systems only ever included: First, Last, TimeMap, TimeGate, Prev, Next. For very thorough archives, such as content management systems with full version persistence like Wikipedia, the number of mementos is overwhelming, and the TimeMap has the full list. HTTP header links are good for navigation, not for data management :)
hvdsomp
commented
May 15, 2019
What Rob said, below.
The one absolutely crucial link is the “timegate” link.
The “timemap” link is very very nice to have but could also be retrieved via the TimeGate. I wouldn’t ditch it on Mementos though.
The “first/last/next/prev” links serve machine navigation purposes but are also handy in UIs, see for example the Time Travel search results. Whether you want these really depends on use cases you want to serve.
The “memento” links are icing on the cake. We used to include a single “memento” link to recursively point at the Memento in which the link is embedded, i.e. the Memento links to itself to say it’s a Memento. But that’s really redundant because the correct way to figure out whether a resource is a Memento is by checking whether it has a Memento-Datetime header. Other than that, if one wants an overview of Mementos, as Rob said, the TimeMap is the place to go.
So, bottom line: a minimum of 1 (preferably 2) and a maximum of 6 Memento protocol links will do.
Cheers
Herbert
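To make the "at most 6 links" summary above concrete, here is a minimal Java sketch of how the six navigation links could be assembled for one Memento. It is an illustration only: the class name `MementoNavigationLinks`, the `?version=<epoch seconds>` URL layout, and the method signature are assumptions made for this example and are not taken from Trellis. The `rel` values and the `datetime` attribute follow the Memento protocol's Link header conventions.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper for illustration; not part of any real library API. */
final class MementoNavigationLinks {

    // HTTP-date layout with a two-digit day, always rendered in GMT.
    private static final DateTimeFormatter HTTP_DATE = DateTimeFormatter
            .ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.ENGLISH)
            .withZone(ZoneOffset.UTC);

    private MementoNavigationLinks() { }

    /**
     * Builds at most six Link header values for the Memento at {@code index}.
     * {@code versions} must be sorted in ascending order.
     */
    static List<String> build(String timeGate, String timeMap,
                              List<Instant> versions, int index) {
        List<String> links = new ArrayList<>();
        links.add("<" + timeGate + ">; rel=\"timegate\"");
        links.add("<" + timeMap + ">; rel=\"timemap\"");
        if (versions.isEmpty()) {
            return links;
        }
        links.add(memento(timeGate, versions.get(0), "first"));
        links.add(memento(timeGate, versions.get(versions.size() - 1), "last"));
        if (index > 0) {
            links.add(memento(timeGate, versions.get(index - 1), "prev"));
        }
        if (index < versions.size() - 1) {
            links.add(memento(timeGate, versions.get(index + 1), "next"));
        }
        return links;
    }

    // Assumed URL layout: <resource>?version=<epoch seconds>.
    private static String memento(String base, Instant time, String rel) {
        return "<" + base + "?version=" + time.getEpochSecond() + ">; rel=\""
                + rel + " memento\"; datetime=\"" + HTTP_DATE.format(time) + "\"";
    }
}
```

However many versions exist, the result never exceeds six values. The first and last versions emit only the neighbour link that applies.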
acoburn
commented
May 15, 2019 (edited)
Over time, the number of Memento headers will continue to grow for a given resource. It turns out that certain HTTP proxies might encounter issues with such a large number of (uncompressed) link headers. Given that it will always be possible to retrieve a TimeMap resource with the complete list of Memento URLs for a resource, it seems sensible to (a) provide a default limit on the number of Memento headers produced in GET responses and (b) make that limit configurable.
What would folks think about a scenario in which, by default, only the most recent 20 Memento versions are listed in the headers? Alternatively, it would also be possible to include the first memento and the last, say, 19 version URLs. The number 20 is somewhat arbitrary here, but it seems like an OK value, unless folks think it should be lower (or higher).
/cc @mjgiarlo @jermnelson @jmartin-sul
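For discussion, the two options above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Trellis implementation: the class name `MementoHeaderLimiter`, both method names, and the choice of a `SortedSet<Instant>` to represent a resource's versions are all hypothetical. The `limit` argument stands in for the proposed configurable value (20 by default).

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

/** Hypothetical helper for illustration; not part of the Trellis API. */
final class MementoHeaderLimiter {

    private MementoHeaderLimiter() { }

    /** Option 1: keep only the most recent {@code limit} versions. */
    static List<Instant> mostRecent(SortedSet<Instant> versions, int limit) {
        List<Instant> all = new ArrayList<>(versions); // ascending order
        int from = Math.max(0, all.size() - Math.max(0, limit));
        return new ArrayList<>(all.subList(from, all.size()));
    }

    /** Option 2: keep the first version plus the most recent {@code limit - 1}. */
    static List<Instant> firstPlusMostRecent(SortedSet<Instant> versions, int limit) {
        if (limit <= 0 || versions.isEmpty()) {
            return new ArrayList<>();
        }
        if (versions.size() <= limit) {
            return new ArrayList<>(versions); // nothing to trim
        }
        List<Instant> result = new ArrayList<>();
        result.add(versions.first());
        result.addAll(mostRecent(versions, limit - 1));
        return result;
    }
}
```

Either way, the full list stays available through the TimeMap. Only the set of versions advertised in the GET response headers is trimmed.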