This resolves an ordering issue that prevented quotas from being
applied. XFS quotas are applied recursively, but only for
subdirectories created after the quota is applied; if we create
`_data` before the quota, and then use `_data` for all data in
the volume, the quota will never be used by the volume.
Also, add a test that volume quotas are working as designed using
an XFS formatted loop device in the system tests. This should
prevent any further regressions on basic quota functionality,
such as quotas being shared between volumes.
Fixes#25368
Signed-off-by: Matt Heon <mheon@redhat.com>
This will appease the higher-level quota logic. Basically, to
find a free quota ID to prevent reuse, we will iterate through
the contents of the directory and check the quota IDs of all
subdirectories, then use the first free ID found that is larger
than the base ID (the one set on the base directory). Problem:
our volumes use a two-tier directory structure, where the volume
has an outer directory (with the name of the actual volume) and
an inner directory (always named _data). We were only setting the
quota on _data, meaning the outer directory did not have an ID,
and the ID-choosing logic thus never detected that any IDs had
been allocated and always chose the same ID.
Setting the ID on the outer directory with PROJINHERIT set makes
the ID allocation logic work properly, and guarantees children
inherit the ID - so _data and all contents of the volume get the
ID as we'd expect.
No tests as we don't have a filesystem in our CI that supports
XFS quotas (setting it on / needs kernel flags added).
Fixes https://issues.redhat.com/browse/RHEL-18038
Signed-off-by: Matt Heon <mheon@redhat.com>
We cannot get first the volume lock and the container locks. Other code
paths always have to first lock the container and the lock the volumes,
i.e. to mount/umount them. As such locking the volume fust can always
result in ABBA deadlocks.
To fix this move the lock down after the container removal. The removal
code is racy regardless of the lock as the volume lcok on create is no
longer taken since commit 3cc9db8626 due another deadlock there.
Fixes#23613
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Moving from Go module v4 to v5 prepares us for public releases.
Move done using gomove [1] as with the v3 and v4 moves.
[1] https://github.com/KSubedi/gomove
Signed-off-by: Matt Heon <mheon@redhat.com>
We had something like 6 different boolean options (removing a
container turns out to be rather complicated, because there are a
million-odd things that want to do it), and the function
signature was getting unreasonably large. Change to a struct to
clean things up.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
This allows for accurate reporting of dependency removal, but the
work is still incomplete: pods can be removed, but do not report
the containers they removed as part of said removal. Will add
this in a subsequent commit.
Major note: I made ignoring no-such-container errors automatic
once it has been determined that a container did exist in the
first place. I can't think of any case where this would not be a
TOCTOU - IE, no reason not to ignore them. The `--ignore` option
to `podman rm` should still retain meaning as it will ignore
errors from containers that didn't exist in the first place.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
This is the initial stage of implementation. The current API
functions but does not report the additional containers and pods
removed. This is necessary to properly display results to the
user after `podman rm --all`.
The existing remove-dependencies code has been removed in favor
of this more native solution.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Two main changes:
- The transient state tests relied on BoltDB paths, change to
make them agnostic
- The volume code in SQLite wasn't retrieving and setting the
volume plugin for volumes that used one.
Signed-off-by: Matt Heon <mheon@redhat.com>
Currently Podman prevents SELinux container separation,
when running within a container. This PR adds a new
--security-opt label=nested
When setting this option, Podman unmasks and mountsi
/sys/fs/selinux into the containers making /sys/fs/selinux
fully exposed. Secondly Podman sets the attribute
run.oci.mount_context_type=rootcontext
This attribute tells crun to mount volumes with rootcontext=MOUNTLABEL
as opposed to context=MOUNTLABEL.
With these two settings Podman inside the container is allowed to set
its own SELinux labels on tmpfs file systems mounted into its parents
container, while still being confined by SELinux. Thus you can have
nested SELinux labeling inside of a container.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
This ignores the create request if the named volume already exists.
It is very useful when scripting stuff.
Signed-off-by: Alexander Larsson <alexl@redhat.com>