384 Commits

Author SHA1 Message Date
4d56292e7a libpod: mount safely subpaths
add a function to securely mount a subpath inside a volume.  We cannot
trust that the subpath is safe since it is beneath a volume that could
be controlled by a separate container.  To avoid TOCTOU races between
when we check the subpath and when the OCI runtime mounts it, we open
the subpath, validate it, bind mount to a temporary directory and use
it instead of the original path.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-03-31 19:48:03 +02:00
2718f54a29 Merge pull request #17729 from rhatdan/selinux
Support running nested SELinux container separation
2023-03-15 12:07:03 -04:00
2d1f4a8bff cgroupns: private cgroupns on cgroupv1 breaks --systemd
On cgroup v1 we need to mount only the systemd named hierarchy as
writeable, so we configure the OCI runtime to mount /sys/fs/cgroup as
read-only and on top of that bind mount /sys/fs/cgroup/systemd.

But when we use a private cgroupns, we cannot do that since we don't
know the final cgroup path.

Also, do not override the mount if there is already one for
/sys/fs/cgroup/systemd.

Closes: https://github.com/containers/podman/issues/17727

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-03-14 12:34:52 +01:00
01fd5bcc30 libpod: remove error stutter
the error is already clear.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-03-14 12:34:52 +01:00
ad8a96ab95 Support running nested SELinux container separation
Currently Podman prevents SELinux container separation,
when running within a container. This PR adds a new
--security-opt label=nested

When setting this option, Podman unmasks and mountsi
/sys/fs/selinux into the containers making /sys/fs/selinux
fully exposed. Secondly Podman sets the attribute
run.oci.mount_context_type=rootcontext

This attribute tells crun to mount volumes with rootcontext=MOUNTLABEL
as opposed to context=MOUNTLABEL.

With these two settings Podman inside the container is allowed to set
its own SELinux labels on tmpfs file systems mounted into its parents
container, while still being confined by SELinux. Thus you can have
nested SELinux labeling inside of a container.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2023-03-13 14:21:12 -04:00
0999991b20 add support for limiting tmpfs size for systemd-specific mnts
* add tests
* add documentation for --shm-size-systemd
* add support for both pod and standalone run

Signed-off-by: danishprakash <danish.prakash@suse.com>
2023-02-14 14:56:09 +05:30
0bc3d35791 libpod: move NetNS into state db instead of extra bucket
This should simplify the db logic. We no longer need a extra db bucket
for the netns, it is still supported in read only mode for backwards
compat. The old version required us to always open the netns before we
could attach it to the container state struct which caused problem in
some cases were the netns was no longer valid.

Now we use the netns as string throughout the code, this allow us to
only open it when needed reducing possible errors.

[NO NEW TESTS NEEDED] Existing tests should cover it and it is only a
flake so hard to reproduce the error.

Fixes #16140

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2022-12-16 18:30:12 +01:00
51c376c8a1 libpod: Factor out the call to PidFdOpen from (*Container).WaitForExit
This allows us to add a simple stub for FreeBSD which returns -1,
leading WaitForExit to fall back to the sleep loop approach.

[NO NEW TESTS NEEDED]

Signed-off-by: Doug Rabson <dfr@rabson.org>
2022-10-14 13:24:32 +01:00
36cfd05a7d libpod: Move platform-specific bind mounts to a per-platform method
This adds a new per-platform method makePlatformBindMounts and moves the
/etc/hostname mount. This file is only needed on Linux.

[NO NEW TESTS NEEDED]

Signed-off-by: Doug Rabson <dfr@rabson.org>
2022-09-12 16:11:25 +01:00
2c63b8439b Fix stutters
Podman adds an Error: to every error message.  So starting an error
message with "error" ends up being reported to the user as

Error: error ...

This patch removes the stutter.

Also ioutil.ReadFile errors report the Path, so wrapping the err message
with the path causes a stutter.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2022-09-10 07:52:00 -04:00
f75c3181bf podman: skip /sys/fs/cgroup/systemd if not present
skip adding the /sys/fs/cgroup/systemd bind mount if it is not already
present on the host.

[NO NEW TESTS NEEDED] requires a system without systemd.

Closes: https://github.com/containers/podman/issues/15647

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2022-09-07 15:33:08 +02:00
a3aecf0f26 libpod: Factor out setting volume atime to container_internal_linux.go
It turns out that field names in syscall.Stat_t are platform-specific.
An alternative to this could change fixVolumePermissions to use
unix.Lstat since unix.Stat_t uses the same mmember name for Atim on both
Linux and FreeBSD.

[NO NEW TESTS NEEDED]

Signed-off-by: Doug Rabson <dfr@rabson.org>
2022-09-05 10:20:50 +01:00
7a1abd03c5 libpod: Move miscellaneous file handlling to container_internal_common.go
[NO NEW TESTS NEEDED]

Signed-off-by: Doug Rabson <dfr@rabson.org>
2022-09-05 10:20:50 +01:00
212b11c34c libpod: Factor out handling of slirp4netns and net=none
[NO NEW TESTS NEEDED]

Signed-off-by: Doug Rabson <dfr@rabson.org>
2022-09-05 10:20:50 +01:00
eab4291d99 libpod: Move functions related to /etc bind mounts to container_internal_common.go
[NO NEW TESTS NEEDED]

Signed-off-by: Doug Rabson <dfr@rabson.org>
2022-09-05 10:20:50 +01:00
b3989be768 libpod: Move getRootNetNsDepCtr to container_internal_common.go
[NO NEW TESTS NEEDED]

Signed-off-by: Doug Rabson <dfr@rabson.org>
2022-09-05 10:17:50 +01:00
7518a9136a libpod: Move functions related to checkpoints to container_internal_common.go
[NO NEW TESTS NEEDED]

Signed-off-by: Doug Rabson <dfr@rabson.org>
2022-09-05 10:17:49 +01:00
be5d1261b4 libpod: Move mountNotifySocket to container_internal_common.go
[NO NEW TESTS NEEDED]

Signed-off-by: Doug Rabson <dfr@rabson.org>
2022-09-05 10:17:49 +01:00
71e2074e83 libpod: Move getUserOverrides, lookupHostUser to container_internal_common.go
[NO NEW TESTS NEEDED]

Signed-off-by: Doug Rabson <dfr@rabson.org>
2022-09-05 10:17:49 +01:00
232eea5a00 libpod: Move isWorkDirSymlink, resolveWorkDir to container_internal_common.go
[NO NEW TESTS NEEDED]

Signed-off-by: Doug Rabson <dfr@rabson.org>
2022-09-05 10:17:49 +01:00
0889215d83 libpod: Use platform-specific mount type for volume mounts
[NO NEW TESTS NEEDED]

Signed-off-by: Doug Rabson <dfr@rabson.org>
2022-09-05 10:17:49 +01:00
c1a86a8c4c libpod: Factor out platform-specific sections from generateSpec
[NO NEW TESTS NEEDED]

Signed-off-by: Doug Rabson <dfr@rabson.org>
2022-09-05 10:17:49 +01:00
e101f4350b libpod: Move getOverlayUpperAndWorkDir and generateSpec to container_internal_common.go
[NO NEW TESTS NEEDED]

Signed-off-by: Doug Rabson <dfr@rabson.org>
2022-09-05 10:17:49 +01:00
d82a41687e Add container GID to additional groups
Mitigates a potential permissions issue. Mirrors Buildah PR #4200
and CRI-O PR #6159.

Signed-off-by: Matthew Heon <mheon@redhat.com>
2022-09-02 15:51:36 -04:00
1572420c3f libpod: Move uses of unix.O_PATH to container_internal_linux.go
The O_PATH flag is a recent addition to the open syscall and is not
present in darwin or in FreeBSD releases before 13.1. The constant is
not present in the FreeBSD version of x/sys/unix since that package
supports FreeBSD 12.3 and later.

[NO NEW TESTS NEEDED]

Signed-off-by: Doug Rabson <dfr@rabson.org>
2022-08-17 12:55:41 +01:00
5d7778411a libpod: Move rootless network setup details to container_internal_linux.go
This removes a use of state.NetNS which is a linux-specific field defined
in container_linux.go from the generic container_internal.go, allowing
that to build on non-linux platforms.

[NO NEW TESTS NEEDED]

Signed-off-by: Doug Rabson <dfr@rabson.org>
2022-08-17 12:55:32 +01:00
92bbae40de Merge pull request #15248 from vrothberg/RUN-1606
kube play: sd-notify integration
2022-08-11 15:44:55 +00:00
3fc126e152 libpod: allow the notify socket to be passed programatically
The notify socket can now either be specified via an environment
variable or programatically (where the env is ignored).  The
notify mode and the socket are now also displayed in `container inspect`
which comes in handy for debugging and allows for propper testing.

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2022-08-10 21:10:17 +02:00
658960c97b build(deps) bump CDI dependency from 0.4.0 to 0.5.0
bump github.com/container-orchestrated-devices/container-device-interface from 0.4.0 to 0.5.0

This requires that the cdi.Registry be instantiated with AutoRefresh disabled for CLI clients.

[NO NEW TESTS NEEDED]

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-10 10:49:42 +02:00
dd2b794061 libpod: create /etc/passwd if missing
create the /etc/passwd and /etc/group files if they are missing in the
image.

Closes: https://github.com/containers/podman/issues/14966

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2022-07-21 17:58:16 +02:00
377057b400 [CI:DOCS] Improve language. Fix spelling and typos.
* Correct spelling and typos.

* Improve language.

Co-authored-by: Ed Santiago <santiago@redhat.com>
Signed-off-by: Erik Sjölund <erik.sjolund@gmail.com>
2022-07-11 21:59:32 +02:00
251d91699d libpod: switch to golang native error wrapping
We now use the golang error wrapping format specifier `%w` instead of
the deprecated github.com/pkg/errors package.

[NO NEW TESTS NEEDED]

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2022-07-05 16:06:32 +02:00
ed2afb2059 Merge pull request #14732 from dfr/criu
Add missing criu symbols to criu_unsupported.go
2022-06-27 17:47:06 +00:00
4c5788bac6 Fix spelling of GetCriuVersion
Signed-off-by: Doug Rabson <dfr@rabson.org>
2022-06-27 12:57:44 +01:00
2792e598c7 podman cgroup enhancement
currently, setting any sort of resource limit in a pod does nothing. With the newly refactored creation process in c/common, podman ca now set resources at a pod level
meaning that resource related flags can now be exposed to podman pod create.

cgroupfs and systemd are both supported with varying completion. cgroupfs is a much simpler process and one that is virtually complete for all resource types, the flags now just need to be added. systemd on the other hand
has to be handeled via the dbus api meaning that the limits need to be passed as recognized properties to systemd. The properties added so far are the ones that podman pod create supports as well as `cpuset-mems` as this will
be the next flag I work on.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2022-06-24 15:39:15 -04:00
aa4279ae15 Fix spelling "setup" -> "set up" and similar
* Replace "setup", "lookup", "cleanup", "backup" with
  "set up", "look up", "clean up", "back up"
  when used as verbs. Replace also variations of those.

* Improve language in a few places.

Signed-off-by: Erik Sjölund <erik.sjolund@gmail.com>
2022-06-22 18:39:21 +02:00
2827140907 [CI:DOCS] "setup" -> "set up" in source code comments
* Replace "setup", "lookup" with "set up", "look up"
  when used as verbs.

Signed-off-by: Erik Sjölund <erik.sjolund@gmail.com>
2022-06-19 12:18:08 +02:00
e084f0ee1e Merge pull request #14585 from Luap99/nolint
golangci-lint: enable nolintlint
2022-06-14 18:58:53 +00:00
41528739ce golangci-lint: enable nolintlint
The nolintlint linter does not deny the use of `//nolint`
Instead it allows us to enforce a common nolint style:
- force that a linter name must be specified
- do not add a space between `//` and `nolint`
- make sure nolint is only used when there is actually a problem

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2022-06-14 16:29:42 +02:00
fcfcd4cdb1 container: do not create .containerenv with -v SRC:/run
if /run is on a volume do not create the file /run/.containerenv as it
would leak outside of the container.

Closes: https://github.com/containers/podman/issues/14577

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2022-06-14 10:49:19 +02:00
b4c981893d Merge pull request #14220 from Luap99/resolvconf
use resolvconf package from c/common/libnetwork
2022-06-07 18:00:34 -04:00
fef40e2ad3 Merge pull request #14483 from jakecorrenti/restart-privelaged-containers-after-host-device-change
Privileged containers can now restart if the host devices change
2022-06-07 15:48:36 -04:00
90d80cf81e use resolvconf package from c/common/libnetwork
Podman and Buildah should use the same code the generate the resolv.conf
file. This mostly moved the podman code into c/common and created a
better API for it so buildah can use it as well.

[NO NEW TESTS NEEDED] All existing tests should continue to pass.

Fixes #13599 (There is no way to test this in CI without breaking the
hosts resolv.conf)

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2022-06-07 15:17:04 +02:00
8533ea0004 Privileged containers can now restart if the host devices change
If a privileged container is running, stops, and the devices on the host
change, such as a USB device is unplugged, then a container would no
longer start. Previously, the devices from the host were only being
added to the container once: when the container was created. Now, this
happens every time the container starts.

I did this by adding a boolean to the container config that indicates
whether to mount all of the devices or not, which can be set via an option.

During spec generation, if the `MountAllDevices` option is set in the
container config, all host devices are added to the container.

Additionally, a couple of functions from `pkg/specgen/generate/config_linux.go`
were moved into `pkg/util/utils_linux.go` as they were needed in
multiple packages.

Closes #13899

Signed-off-by: Jake Correnti <jcorrenti13@gmail.com>
2022-06-06 14:14:22 -04:00
aadae49ad3 overlay-volumes: add support for non-volatile upperdir,workdir for anonymous volumes
Similar feature was added for named overlay volumes here: https://github.com/containers/podman/pull/12712
Following PR just mimics similar feature for anonymous volumes.

Often users want their anonymous overlayed volumes to be `non-volatile` in nature
that means that same `upper` dir can be re-used by one or more
containers but overall of nature of volumes still have to be overlay
so work done is still on a overlay not on the actual volume.

Following PR adds support for more advanced options i.e custom `workdir`
and `upperdir` for overlayed volumes. So that users can re-use `workdir`
and `upperdir` across new containers as well.

Usage

```console
podman run -it -v /some/path:/data:O,upperdir=/path/persistant/upper,workdir=/path/persistant/work alpine sh
```

Signed-off-by: Aditya R <arajan@redhat.com>
2022-06-06 18:58:42 +05:30
9fcfea7643 First batch of resolutions to FIXMEs
Most of these are no longer relevant, just drop the comments.

Most notable change: allow `podman kill` on paused containers.
Works just fine when I test it.

Signed-off-by: Matthew Heon <mheon@redhat.com>
2022-05-25 13:28:04 -04:00
1dcd1c970d Merge pull request #14308 from n1hility/root-cgroup
Support running podman under a root v2 cgroup
2022-05-25 08:53:15 -04:00
5d37d80ff9 Use containers/common/pkg/util.StringToSlice
[NO NEW TESTS NEEDED] Just code cleanup for better reuse

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2022-05-23 12:16:54 -04:00
94e82121bf Support running podman under a root v2 cgroup
Signed-off-by: Jason T. Greene <jason.greene@redhat.com>
2022-05-21 09:28:52 -05:00
b22143267b linter: enable unconvert linter
Detects unneccessary type conversions and helps in keeping the code base
cleaner.

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2022-05-19 13:59:15 +02:00