Semantics for runtime `disks` mount points are confusing
#672
Labels
K-clarification: (Kind) Clarifications regarding the WDL specification.
S05-in-progress: (State) A task that is in progress.
T-requirements-hints: (Topic) Issues related to requirements or hints.
The `disks` key in the `runtime` section is documented as providing "persistent volumes" at certain mount-point paths. In WDL 1.2, those paths are specified as being "in the host environment"; in 1.1, I think the wording is "on the host machine". (See wdl/SPEC.md, lines 5072 to 5088 at commit 664adc3.)
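For concreteness, here is a minimal Python sketch of how an engine might parse one `disks` entry. It assumes the format as we read it from the spec (an optional absolute mount point, a size, and an optional unit suffix defaulting to GiB) and handles only an assumed subset of suffixes; `DiskSpec` and `parse_disk_spec` are hypothetical names, not part of any engine's API.

```python
from dataclasses import dataclass
from typing import Optional

# Byte multipliers for the suffixes handled here (an assumed subset of what WDL allows).
_UNITS = {
    "B": 1,
    "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4,
    "KB": 1000, "MB": 1000**2, "GB": 1000**3, "TB": 1000**4,
}


@dataclass
class DiskSpec:
    mount_point: Optional[str]  # None when the entry gives no mount point
    size_bytes: int


def parse_disk_spec(spec: str) -> DiskSpec:
    """Parse '[<mount-point>] <size> [<suffix>]', defaulting the suffix to GiB (assumed)."""
    parts = spec.split()
    mount_point = None
    # An absolute path in the first position is treated as the mount point.
    if parts and parts[0].startswith("/"):
        mount_point = parts.pop(0)
    if len(parts) not in (1, 2):
        raise ValueError(f"malformed disk spec: {spec!r}")
    size = float(parts[0])
    unit = parts[1] if len(parts) == 2 else "GiB"
    if unit not in _UNITS:
        raise ValueError(f"unknown size suffix: {unit!r}")
    return DiskSpec(mount_point, int(size * _UNITS[unit]))
```

For example, `parse_disk_spec("/mnt/outputs 10 GiB")` returns `DiskSpec(mount_point='/mnt/outputs', size_bytes=10737418240)`. Note that parsing alone says nothing about whether `/mnt/outputs` is meant as a host path, a container path, or both, which is exactly the ambiguity here.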
If the mount point is a host-side path, then where is the storage expected to be mounted inside the container where the WDL `command` section runs? If the storage is meant to be mounted at the given path in the container, why does that path also need to exist on the host? Is this meant simply to give the WDL task access to a particular directory on the host, by mounting that host path into the container at the same path?
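The "same path on both sides" reading can be sketched as a Docker bind mount whose source and target are identical. This is a hypothetical helper, not Toil's code; it only builds arguments for `docker run`. With `--mount type=bind`, Docker reports an error if the source does not already exist on the host, which is one concrete reason a host-side path would have to exist under this interpretation.

```python
from typing import List


def same_path_bind_args(mount_point: str) -> List[str]:
    """Docker CLI arguments exposing host `mount_point` at the same path in the container.

    Under this reading, the container sees whatever the host directory already
    contains, and the directory must exist on the host beforehand.
    Paths containing commas are not handled (they would break the --mount syntax).
    """
    return ["--mount", f"type=bind,source={mount_point},target={mount_point}"]
```

A caller would splice these into something like `["docker", "run", *same_path_bind_args("/mnt/outputs"), image, ...]`.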
What kind of "persistence", specifically, is supposed to be available? If two tasks run one after the other, and both have a `disks` entry with the same mount point and size, is the second task guaranteed to see files written there by the first? Or can the "persistent" volume be a fresh, empty directory for each task? Or is some kind of opportunistic sharing expected? If a task requests a 100 GiB persistent volume, does it have to deal with the possibility that, when mounted, the volume already holds 50 GiB of files left over from previous tasks and has only 50 GiB of free space?
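If volumes can be reused, an engine (or task) would presumably need a check like the following minimal sketch, which compares free space against the request. `has_requested_space` is a hypothetical name. `shutil.disk_usage` reports on the whole filesystem containing the path, not a per-directory quota, so this only approximates "space available on this volume".

```python
import shutil


def has_requested_space(mount_point: str, requested_bytes: int) -> bool:
    """Return True if the filesystem holding `mount_point` has `requested_bytes` free.

    A 100 GiB volume that already holds 50 GiB of leftover files reports
    roughly 50 GiB free, so a 100 GiB request would fail this check.
    """
    return shutil.disk_usage(mount_point).free >= requested_bytes
```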
If two tasks run at the same time, can they ever share a persistent volume? Or does a task get exclusive ownership of a persistent volume while it is mounted?
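If the answer is "exclusive ownership", one way an engine on a single Unix host might enforce it is an advisory `flock` lock held for the task's lifetime. This is a minimal sketch under that assumption: the lock-file name `.volume.lock` and the helper `exclusive_volume` are hypothetical, and `flock` is advisory and may not coordinate across hosts on network filesystems.

```python
import fcntl
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def exclusive_volume(mount_point: str) -> Iterator[None]:
    """Hold an exclusive, non-blocking lock on a volume while a task uses it.

    Raises BlockingIOError if another holder already has the volume locked.
    """
    lock_path = os.path.join(mount_point, ".volume.lock")  # hypothetical lock-file name
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_NB makes a concurrent claim fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

Switching `LOCK_NB` off would instead make a second task wait until the first releases the volume, which is the other plausible semantics.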
We're trying to implement this in Toil in DataBiosphere/toil#5001, and so far we've come up with an implementation that just mounts the specified path from the host into the container. But I think it makes more sense to mount fresh, empty directories with the requested amount of reserved space into the container instead, since that matches what I imagine a workflow would actually want. However, that completely ignores the "persistent" part of the spec.
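The "fresh empty directory" alternative could look roughly like this minimal sketch (hypothetical helper, not the Toil PR's code): the host-side location is arbitrary, and only the in-container path comes from the `disks` entry. It does not actually reserve the requested size; that would need something like a quota or a dedicated filesystem, which is left out here.

```python
import tempfile
from typing import List, Tuple


def fresh_scratch_mount(container_path: str, scratch_root: str) -> Tuple[str, List[str]]:
    """Create an empty host directory under `scratch_root` and bind it at `container_path`.

    Nothing persists between tasks unless the caller keeps and reuses `host_dir`.
    """
    host_dir = tempfile.mkdtemp(prefix="wdl-disk-", dir=scratch_root)
    args = ["--mount", f"type=bind,source={host_dir},target={container_path}"]
    return host_dir, args
```

The design trade-off is that this gives each task a clean, predictable volume, but any cross-task "persistence" would then have to be layered on deliberately (for example, by keying and reusing `host_dir`).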
Are there any workflow examples that use mount points, beyond the test examples in the spec that just measure their size? What kind of behavior do they expect w.r.t. persistence or the relationship between in-container and host-side paths?