This seems to have been added as part of the cleanup of our
handling of OOM files, but code was never added to remove it, so
we leaked a single directory with an exit file and OOM file per
container run. Apparently have been doing this for a while - I'd
guess since March of '23 - so I'm surprised more people didn't
notice.
Fixes#25291
Signed-off-by: Matt Heon <mheon@redhat.com>
the kernel checks that both the uid and the gid are mapped inside the
user namespace, not only the uid:
/**
* privileged_wrt_inode_uidgid - Do capabilities in the namespace work over the inode?
* @ns: The user namespace in question
* @idmap: idmap of the mount @inode was found from
* @inode: The inode in question
*
* Return true if the inode uid and gid are within the namespace.
*/
bool privileged_wrt_inode_uidgid(struct user_namespace *ns,
struct mnt_idmap *idmap,
const struct inode *inode)
{
return vfsuid_has_mapping(ns, i_uid_into_vfsuid(idmap, inode)) &&
vfsgid_has_mapping(ns, i_gid_into_vfsgid(idmap, inode));
}
for this reason, improve the check for hasCurrentUserMapped to verify
that the gid is also mapped, and if it is not, use an intermediate
mount for the container rootfs.
Closes: https://github.com/containers/podman/issues/24159
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
This started off as an attempt to make `podman stop` on a
container started with `--rm` actually remove the container,
instead of just cleaning it up and waiting for the cleanup
process to finish the removal.
In the process, I realized that `podman run --rmi` was rather
broken. It was only done as part of the Podman CLI, not the
cleanup process (meaning it only worked with attached containers)
and the way it was wired meant that I was fairly confident that
it wouldn't work if I did a `podman stop` on an attached
container run with `--rmi`. I rewired it to use the same
mechanism that `podman run --rm` uses, so it should be a lot more
durable now, and I also wired it into `podman inspect` so you can
tell that a container will remove its image.
Tests have been added for the changes to `podman run --rmi`. No
tests for `stop` on a `run --rm` container as that would be racy.
Fixes#22852
Fixes RHEL-39513
Signed-off-by: Matt Heon <mheon@redhat.com>
There are two major problems with UpdateContainerStatus()
First, it can deadlock when the the state json is to big as it tries to
read stderr until EOF but it will never hit EOF as long as the runtime
process is alive. This means if the runtime json is to big to git into
the pipe buffer we deadlock ourselves.
Second, the function modifies the container state struct and even adds
and exit code to the db however when it is called from the stop() code
path we will be unlocked here.
While the first problem is easy to fix the second one not so much. And
when we cannot update the state there is no point in reading the from
runtime in the first place as such remove the function as it does more
harm then good.
And add some warnings the the functions that might be called unlocked.
Fixes#22246
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
We bind ports to ensure there are no conflicts and we leak them into
conmon to keep them open. However we bound the ports after the network
was set up so it was possible for a second network setup to overwrite
the firewall configs of a previous container as it failed only later
when binding the port. As such we must ensure we bind before the network
is set up.
This is not so simple because we still have to take care of
PostConfigureNetNS bool in which case the network set up happens after
we launch conmon. Thus we end up with two different conditions.
Also it is possible that we "leak" the ports that are set on the
container until the garbage collector will close them. This is not
perfect but the alternative is adding special error handling on each
function exit after prepare until we start conmon which is a lot of work
to do correctly.
Fixes https://issues.redhat.com/browse/RHEL-50746
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
when a --rootfs is specified with idmap, always use the specified
rootfs since we need a new mount on top of the original directory.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
if the current user is not mapped into the new user namespace, use an
intermediate mount to allow the mount point to be accessible instead
of opening up all the parent directories for the mountpoint.
Closes: https://github.com/containers/podman/issues/23028
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
with the new mount API is available, the OCI runtime doesn't require
that each parent directory for a bind mount must be accessible.
Instead it is opened in the initial user namespace and passed down to
the container init process.
This requires that the kernel supports the new mount API and that the
OCI runtime uses it.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
I believe the previous code meant to use cmd.Run instead of cmd.Start.
The issue is that cmd.Start returns before the command has finished
executing, so the conditional body checking for the stderr of the
command never gets executed.
Raise the cmd.Start up into it's own conditional, which is checking for
whether the process could be started. Then we consume stderr, check for
some specific strings in the output, and then finally continue on with
the rest of the code.
Signed-off-by: Keith Johnson <kj@ubergeek42.com>
Conmon writes the exit file and oom file (if container
was oom killed) to the persist directory. This directory
is retained across reboots as well.
Update podman to create a persist-dir/ctr-id for the exit
and oom files for each container to be written to. The oom
state of container is set after reading the files
from the persist-dir/ctr-id directory.
The exit code still continues to read the exit file from
the exits directory.
Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
Moving from Go module v4 to v5 prepares us for public releases.
Move done using gomove [1] as with the v3 and v4 moves.
[1] https://github.com/KSubedi/gomove
Signed-off-by: Matt Heon <mheon@redhat.com>
Use the new rootlessnetns logic from c/common, drop the podman code
here and make use of the new much simpler API.
ref: https://github.com/containers/common/pull/1761
[NO NEW TESTS NEEDED]
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
add a new option --preserve-fd that allows to specify a list of FDs to
pass down to the container.
It is similar to --preserve-fds but it allows to specify a list of FDs
instead of the maximum FD number to preserve.
--preserve-fd and --preserve-fds are mutually exclusive.
It requires crun since runc would complain if any fd below
--preserve-fds is not preserved.
Closes: https://github.com/containers/podman/issues/20844
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
All `[]string`s in containers.conf have now been migrated to attributed
string slices which require some adjustments in Buildah and Podman.
[NO NEW TESTS NEEDED]
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
commit 7ade9721020468438e822b16ed7a65380cc7fbd2 introduced the change
that caused an issue in crun since it forces the root user session
instead of the system one when DBUS_SESSION_BUS_ADDRESS is set.
I am addressing it in crun, but for the time being, let's also not
pass the variable down to conmon since the assumption is that when
running as root the containers must be created on the system bus.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Pass the _entire_ environment to conmon instead of selectively enabling
only specific variables. The main reasoning is to make sure that conmon
and the podman-cleanup callback process operate in the exact same
environment than the initial podman process. Some configuration files
may be passed via environment variables. Podman not passing those down
to conmon has led to subtle and hard to debug issues in the past, so
passing all down will avoid such kinds of issues in the future.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
The processing and setting of the static and volume directories was
scattered across the code base (including c/common) leading to subtle
errors that surfaced in #19938.
There were multiple issues that I try to summarize below:
- c/common loaded the graphroot from c/storage to set the defaults for
static and volume dir. That ignored Podman's --root flag and
surfaced in #19938 and other bugs. c/common does not set the
defaults anymore which gives Podman the ability to detect when the
user/admin configured a custom directory (not empty value).
- When parsing the CLI, Podman (ab)uses containers.conf structures to
set the defaults but also to override them in case the user specified
a flag. The --root flag overrode the static dir which is wrong and
broke a couple of use cases. Now there is a dedicated field for in
the "PodmanConfig" which also includes a containers.conf struct.
- The defaults for static and volume dir and now being set correctly
and adhere to --root.
- The CONTAINERS_CONF_OVERRIDE env variable has not been passed to the
cleanup process. I believe that _all_ env variables should be passed
to conmon to avoid such subtle bugs.
Overall I find that the code and logic is scattered and hard to
understand and follow. I refrained from larger refactorings as I really
just want to get #19938 fixed and then go back to other priorities.
https://github.com/containers/common/pull/1659 broke three pkg/machine
tests. Those have been commented out until getting fixed.
Fixes: #19938
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
The --syslog flag has not been passed to the cleanup process (i.e.,
conmon's exit args) complicating debugging quite a bit.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Under some circumstances podman tries to kill a container
using signal 37, for which unix.SignalName() returns "".
Not helpful. So, when that happens, show "(signal number)".
Signed-off-by: Ed Santiago <santiago@redhat.com>
when the "kill" command fails, print the stderr from the OCI runtime
only after we check the container state.
It also simplifies the code since we don't have to hard code the error
messages we want to ignore.
Closes: https://github.com/containers/podman/issues/18452
[NO NEW TESTS NEEDED] it fixes a flake.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
The attach API used to always return the Content-Type
`vnd.docker.raw-stream`, however docker api v1.42 added the
`vnd.docker.multiplexed-stream` type when no tty was used.
Follow suit and return the same header for docker api v1.42 and libpod
v4.7.0. This technically allows clients to make a small optimization as
they no longer need to inspect the container to see if they get a raw or
multiplexed stream.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
When conmon is started it blocks and waits for us to signal it to start
via pipe. This works but when conmon exits before it waits for the start
message it causes podman to fail with `write child: broken pipe`. This
error is meaningless to podman users.
The real error is that conmon failed so we should not return early if we
fail to send the start message to conmon. Instead ignore the EPIPE error
case as it is safe to assume to the conmon died and for other errors we
make sure to kill conmon so that the following wait() call does not hang
forever. This also fixes problems with having conmon zombie processes
leaked as wait() was never called.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
We need to actually check the output not just exit codes. While doing
this it was clear that the first test was not checking what it should
be so I had to remove the quotes from the arg.
Also this check did not work with remote testing at all, we must set the
env then restart the server as the env for conmon must be set on the
server obviously.
Also we can only match the conmon error messages on the local client.
Lastly this test requires the journald driver but we cannot use the in
container tests so skip it there.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Because it will cause memory leak if we do not stop timer when the function has completed.
[NO NEW TESTS NEEDED]
Signed-off-by: hang.jiang <hang.jiang@daocloud.io>
Compat api for containers/stop should take -1 value
Add support for `podman stop --time -1`
Add support for `podman restart --time -1`
Add support for `podman rm --time -1`
Add support for `podman pod stop --time -1`
Add support for `podman pod rm --time -1`
Add support for `podman volume rm --time -1`
Add support for `podman network rm --time -1`
Fixes: https://github.com/containers/podman/issues/17542
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Most of the code moved there so if from there and remove it here.
Some extra changes are required here. This is a bit of a mess. The pipe
handling makes this a bit more difficult.
[NO NEW TESTS NEEDED] This is just a rework, existing tests must pass.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Conmon very early dups the std streams with /dev/null, therefore all
errors it reports go nowhere. When you run podman with debug level we
set --syslog and we can see the error in the journal. This should be
the default. We have a lot of weird failures in CI that could be caused
by conmon and we have access to the journal in the cirrus tasks so that
should make debugging much easier.
Conmon still uses the same logging level as podman so it will not spam
the journal and only log warning and errors by default.
[NO NEW TESTS NEEDED]
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
podman info prints the network information about binary path,
package version, program version and DNS information.
Fixes: #18443
Signed-off-by: Toshiki Sonoda <sonoda.toshiki@fujitsu.com>
There are certain messages logged by OCI runtimes when killing a
container that has already stopped that we really do not care
about when stopping a container. Due to our architecture, there
are inherent races around stopping containers, and so we cannot
guarantee that *we* are the people to kill it - but that doesn't
matter because Podman only cares that the container has stopped,
not who delivered the fatal signal.
Unfortunately, the OCI runtimes don't understand this, and log
various warning messages when the `kill` command is invoked on a
container that was already dead. These cause our tests to fail,
as we now check for clean STDERR when running Podman. To work
around this, capture STDERR for the OCI runtime in a buffer only
for stopping containers, and go through and discard any of the
warnings we identified as spurious.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
The documentation says
> The new Buffer takes ownership of buf, and the
> caller should not use buf after this call.
so use the more directly applicable, and simpler, bytes.Reader instead, to avoid this potentially risky use.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Kill is a fast syscall, so we can reduce the sleep time from 100ms to
10ms in hope to speed things up a bit.
[NO NEW TESTS NEEDED]
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>