Question: Parallelization and Memento (in particular) #289
@atz I know little about the RDB backend, but from a theoretical POV, I'm not sure your reading of Memento is correct. Keep in mind that the Memento RFC does not define write behavior, so this is an area that we (and several other projects) are currently exploring with muck boots and mosquito veils. I understand the timestamp of a Memento to be entirely determined by the server, not by client behavior in any way, so I encourage you to bring this up at the Memento Google group and let us know what results. @hvdsomp is usually very willing to answer questions and @martinklein0815 has been so kind as to check in on and speak to issues here at Trellis.
Trellis is structured so that its core just presents a Java API, and there can be different implementations of those APIs (here, the file-based Memento storage is just a simple implementation). The interfaces are generally pretty simple (the resource layer is the most complex), and the memento interface is one of the simpler ones to implement. While file-based persistence is useful in certain contexts, it is definitely problematic in others. One thing you might want to take a look at is the trellis-ext-aws project, which implements an S3-based MementoService, along with an S3-based BinaryService and an SNS-based NotificationService. That project is still maturing, but for multi-node (especially cloud-based) deployments, that kind of structure (avoiding files) will be much more appropriate.
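To make the "one API, several implementations" idea concrete, here is a minimal sketch. The names VersionStore and InMemoryVersionStore are hypothetical and invented for illustration; this is not the Trellis MementoService interface, only the general shape of a small storage interface with a swappable backend.

```java
import java.time.Instant;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical interface for illustration only; NOT the Trellis MementoService API.
interface VersionStore {
    void put(String resourceId, Instant time, String body);

    // Returns the latest stored version at or before the requested time, if any.
    Optional<String> get(String resourceId, Instant time);
}

// One possible implementation that keeps everything in memory.
// A file-, S3- or database-backed class could implement the same interface.
class InMemoryVersionStore implements VersionStore {
    private final ConcurrentHashMap<String, ConcurrentSkipListMap<Instant, String>> data =
            new ConcurrentHashMap<>();

    @Override
    public void put(String resourceId, Instant time, String body) {
        data.computeIfAbsent(resourceId, k -> new ConcurrentSkipListMap<>()).put(time, body);
    }

    @Override
    public Optional<String> get(String resourceId, Instant time) {
        ConcurrentSkipListMap<Instant, String> versions = data.get(resourceId);
        if (versions == null) {
            return Optional.empty();
        }
        // floorEntry yields the entry with the greatest key <= time, or null
        return Optional.ofNullable(versions.floorEntry(time)).map(e -> e.getValue());
    }
}
```

Code that depends only on the interface does not need to change when the in-memory class is swapped for one backed by external storage, which is the property that matters for multi-node deployments.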
And to the point about parallelization, the Trellis HTTP layer is built in such a way that it is entirely stateless, which means that -- so long as the persistence layer(s) are external to a given web node -- there can be an arbitrary number of web nodes running. For that sort of deployment strategy, a distributed (persistence) backend clearly makes the most sense, and that's where things are heading with the Cassandra-based backend.
atz commented Nov 9, 2018
I definitely looked at
Right, but with parallelization, what the RFC defines as one (logical) server is actually several. It is unrealistic to presume that they all always have exactly the same system time. Maybe Trellis' Memento writes happening only downstream from a queue or DB means it isn't as much of a problem as I perceived? That is a level of detailed implementation knowledge that I can't speak to.
@atz I was thinking of the client-server relationship between the frontend(s) (running Trellis) and the backend(s) (running some kind of persistence service, e.g. a database like Cassandra). If, as you quite rightly write, we cannot speak of a single timestamp there, there is no way to specify the behavior to the original (HTTP) client. If you look at ResourceService you will see that Trellis (very intentionally) does not assume synchronicity, and in large part that is to avoid putting constraints on the implementation.
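As a rough illustration of an API that does not assume synchronicity, the sketch below has the caller receive a CompletionStage and leaves the timestamp to the backend. AsyncStore and FakeRemoteStore are hypothetical names made up for this example and are not the actual Trellis ResourceService; the counter simply stands in for whatever notion of time a real persistence service would have.

```java
import java.time.Instant;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical interface for illustration; not the actual Trellis ResourceService.
interface AsyncStore {
    // Completes, at some later point, with the timestamp the backend assigned to the write.
    CompletionStage<Instant> write(String resourceId, String body);
}

// Stand-in for a remote persistence service with its own notion of time.
class FakeRemoteStore implements AsyncStore {
    // Arbitrary starting point in epoch milliseconds; each write advances it by one.
    private final AtomicLong backendMillis = new AtomicLong(1_541_764_800_000L);

    @Override
    public CompletionStage<Instant> write(String resourceId, String body) {
        // The web node never consults its own clock; the "backend" decides the time.
        return CompletableFuture.supplyAsync(
                () -> Instant.ofEpochMilli(backendMillis.incrementAndGet()));
    }
}

class AsyncSketch {
    public static void main(String[] args) {
        AsyncStore store = new FakeRemoteStore();
        Instant first = store.write("res", "v1").toCompletableFuture().join();
        Instant second = store.write("res", "v2").toCompletableFuture().join();
        System.out.println("first before second: " + first.isBefore(second));
    }
}
```

Because the interface only promises eventual completion, an implementation is free to talk to a database, a queue, or a distributed store without the HTTP layer needing to know which.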
acoburn added the question label Nov 14, 2018
@atz I'm not sure where we are at with this question, and I don't want to leave you hanging. As we've described above, Trellis very intentionally hands questions of time directly to persistence and webapp instances are always share-nothing, so there is no synchronization at the HTTP layer (and never will be). In that sense:
(as you wrote) is done. That's how we roll now. If there are ambiguities or gaps you see here, can you write a little more about them? These are difficult questions and I'm sure we haven't got all the corners and edges covered.
martinklein0815 commented Nov 19, 2018
My apologies for the late response!
atz commented Nov 19, 2018
That's enough to answer my question, thanks!
atz commented Nov 9, 2018
To what extent can Trellis be parallelized? I.e., multiple systems running against the same persistence. Using trellis-ext-db seems to get part of the way there.
My outstanding question regards /opt/trellis/data/mementos. In the dockerized Trellis docs (with external DB), it seems to be the only persistence that needs both:
My understanding of Memento is limited. The RFC defines a bunch of interactions between client and server, but my question is "below" the server level, since the client should be unaware of whether the app is parallelized or not. My reading is that the RFC certainly would be violated if two systems write to the same resource/memento and, because their clocks differ, the second write is stamped as appearing before the first. There doesn't seem to be any flexibility on "sameness" of the server system(s)' perception of time. Is that correct?
If so, then using a common DB and delegating the perception of NOW to the DB could help resolve the question. I read in another issue that the MementoService can be configured to reuse the same external relational DB as the app. Two questions:
MementoService delegate time-perceiving logic to the DB?
I could be missing something more architectural (like designating a single instance as TimeGate), or 100 other things, so any feedback is appreciated.
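The clock-skew concern in this question can be shown with a small simulation. This is only a sketch using java.time.Clock; the class ClockSkewSketch, the stamp helper, and the skew values are all invented for illustration and say nothing about how Trellis or any particular database assigns timestamps.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;

class ClockSkewSketch {
    // Timestamp for a write that really happens at realTime,
    // as reported by a clock that is off by the given skew.
    static Instant stamp(Instant realTime, Duration skew) {
        Clock skewed = Clock.offset(Clock.fixed(realTime, ZoneOffset.UTC), skew);
        return skewed.instant();
    }

    public static void main(String[] args) {
        Instant firstWrite = Instant.parse("2018-11-09T12:00:00Z");
        Instant secondWrite = firstWrite.plusSeconds(1); // really happens one second later

        // Each node stamps with its own clock: node A runs 2 s fast, node B 2 s slow.
        Instant a = stamp(firstWrite, Duration.ofSeconds(2));
        Instant b = stamp(secondWrite, Duration.ofSeconds(-2));
        System.out.println("node-local stamps out of order: " + b.isBefore(a)); // true

        // Both writes stamped by one shared source (e.g. the DB), even if it is 5 s off.
        Duration sharedSkew = Duration.ofSeconds(5);
        Instant a2 = stamp(firstWrite, sharedSkew);
        Instant b2 = stamp(secondWrite, sharedSkew);
        System.out.println("single-source stamps out of order: " + b2.isBefore(a2)); // false
    }
}
```

The point of the second half is that a single time source does not need to be accurate to preserve ordering; it only needs to be the same source for every write.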