75 Commits

Author SHA1 Message Date
ce8813dc8d Remove persist directory when cleaning up Conmon files
This seems to have been added as part of the cleanup of our
handling of OOM files, but code was never added to remove it, so
we leaked a single directory with an exit file and OOM file per
container run. Apparently we have been doing this for a while - I'd
guess since March of '23 - so I'm surprised more people didn't
notice.

Fixes #25291

Signed-off-by: Matt Heon <mheon@redhat.com>
2025-02-11 14:51:34 -05:00
e46ae46f18 libpod: hasCurrentUserMapped checks for gid too
the kernel checks that both the uid and the gid are mapped inside the
user namespace, not only the uid:

/**
 * privileged_wrt_inode_uidgid - Do capabilities in the namespace work over the inode?
 * @ns: The user namespace in question
 * @idmap: idmap of the mount @inode was found from
 * @inode: The inode in question
 *
 * Return true if the inode uid and gid are within the namespace.
 */
bool privileged_wrt_inode_uidgid(struct user_namespace *ns,
				 struct mnt_idmap *idmap,
				 const struct inode *inode)
{
	return vfsuid_has_mapping(ns, i_uid_into_vfsuid(idmap, inode)) &&
	       vfsgid_has_mapping(ns, i_gid_into_vfsgid(idmap, inode));
}

for this reason, improve the check for hasCurrentUserMapped to verify
that the gid is also mapped, and if it is not, use an intermediate
mount for the container rootfs.

Closes: https://github.com/containers/podman/issues/24159

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-10-04 16:17:04 +02:00
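
The rule described in e46ae46f18 (both the UID and the GID must be mapped, not only the UID) can be sketched roughly as follows. This is a minimal illustration, not libpod's code: the `idMap` type and the `isMapped` and `currentUserMapped` helpers are hypothetical names invented for this example.

```go
package main

import "fmt"

// idMap is a hypothetical stand-in for one line of a uid_map or gid_map:
// Size consecutive host IDs starting at HostID are visible in the namespace.
type idMap struct {
	ContainerID int
	HostID      int
	Size        int
}

// isMapped reports whether the host ID falls inside any of the given ranges.
func isMapped(id int, maps []idMap) bool {
	for _, m := range maps {
		if id >= m.HostID && id < m.HostID+m.Size {
			return true
		}
	}
	return false
}

// currentUserMapped follows the kernel rule quoted above: the check only
// passes when both the UID and the GID have a mapping.
func currentUserMapped(uid, gid int, uidMaps, gidMaps []idMap) bool {
	return isMapped(uid, uidMaps) && isMapped(gid, gidMaps)
}

func main() {
	uidMaps := []idMap{{ContainerID: 0, HostID: 1000, Size: 1}}
	gidMaps := []idMap{{ContainerID: 0, HostID: 100000, Size: 65536}}

	// UID 1000 is mapped but GID 1000 is not, so in this configuration an
	// intermediate mount would be needed for the container rootfs.
	fmt.Println(currentUserMapped(1000, 1000, uidMaps, gidMaps)) // false
}
```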
458ba5a8af Fix podman stop and podman run --rmi
This started off as an attempt to make `podman stop` on a
container started with `--rm` actually remove the container,
instead of just cleaning it up and waiting for the cleanup
process to finish the removal.

In the process, I realized that `podman run --rmi` was rather
broken. It was only done as part of the Podman CLI, not the
cleanup process (meaning it only worked with attached containers)
and the way it was wired meant that I was fairly confident that
it wouldn't work if I did a `podman stop` on an attached
container run with `--rmi`. I rewired it to use the same
mechanism that `podman run --rm` uses, so it should be a lot more
durable now, and I also wired it into `podman inspect` so you can
tell that a container will remove its image.

Tests have been added for the changes to `podman run --rmi`. No
tests for `stop` on a `run --rm` container as that would be racy.

Fixes #22852
Fixes RHEL-39513

Signed-off-by: Matt Heon <mheon@redhat.com>
2024-08-20 09:51:18 -04:00
ddece758a4 libpod: remove UpdateContainerStatus()
There are two major problems with UpdateContainerStatus().
First, it can deadlock when the state JSON is too big, as it tries to
read stderr until EOF but it will never hit EOF as long as the runtime
process is alive. This means that if the runtime JSON is too big to fit
into the pipe buffer, we deadlock ourselves.
Second, the function modifies the container state struct and even adds
an exit code to the db; however, when it is called from the stop() code
path we will be unlocked here.

While the first problem is easy to fix, the second one is not. And
when we cannot update the state there is no point in reading from the
runtime in the first place, so remove the function as it does more
harm than good.

Also add some warnings to the functions that might be called unlocked.

Fixes #22246

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-08-16 15:34:16 +02:00
803ef5c16f Merge pull request #23384 from edsantiago/root-namespace
CI: enable root user namespaces
2024-08-01 10:32:16 +00:00
77081df8cd libpod: bind ports before network setup
We bind ports to ensure there are no conflicts and we leak them into
conmon to keep them open. However we bound the ports after the network
was set up so it was possible for a second network setup to overwrite
the firewall configs of a previous container as it failed only later
when binding the port. As such we must ensure we bind before the network
is set up.

This is not so simple because we still have to take care of the
PostConfigureNetNS bool, in which case the network setup happens after
we launch conmon. Thus we end up with two different conditions.

Also it is possible that we "leak" the ports that are set on the
container until the garbage collector will close them. This is not
perfect but the alternative is adding special error handling on each
function exit after prepare until we start conmon which is a lot of work
to do correctly.

Fixes https://issues.redhat.com/browse/RHEL-50746

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-07-30 14:39:08 +02:00
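
The ordering that 77081df8cd argues for (bind first, configure the network only if the bind succeeded) might look like the sketch below. It is an assumption-laden illustration: `setupContainer` and the `setupNetwork` callback are hypothetical names, and the real code also has to handle the PostConfigureNetNS case and pass the open sockets on to conmon.

```go
package main

import (
	"fmt"
	"net"
)

// setupContainer is a hypothetical helper: it binds the published port
// first and only then runs network setup, so a port conflict is detected
// before any firewall state could be touched.
func setupContainer(addr string, setupNetwork func() error) (net.Listener, error) {
	l, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, fmt.Errorf("binding %s: %w", addr, err)
	}
	if err := setupNetwork(); err != nil {
		l.Close() // do not keep the port reserved if setup failed
		return nil, err
	}
	return l, nil
}

func main() {
	first, err := setupContainer("127.0.0.1:0", func() error { return nil })
	if err != nil {
		fmt.Println("unexpected error:", err)
		return
	}
	defer first.Close()

	// A second container asking for the same port fails at bind time,
	// so its network setup callback never runs.
	touched := false
	_, err = setupContainer(first.Addr().String(), func() error {
		touched = true
		return nil
	})
	fmt.Println("conflict detected:", err != nil, "network touched:", touched)
}
```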
b59918e536 libpod: force rootfs for OCI path with idmap
when a --rootfs is specified with idmap, always use the specified
rootfs since we need a new mount on top of the original directory.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-07-27 19:25:10 +02:00
49eb5af301 libpod: intermediate mount if UID not mapped into the userns
if the current user is not mapped into the new user namespace, use an
intermediate mount to allow the mount point to be accessible instead
of opening up all the parent directories for the mountpoint.

Closes: https://github.com/containers/podman/issues/23028

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-06-21 18:01:26 +02:00
08a8429459 libpod: avoid chowning the rundir to root in the userns
so it is possible to remove the code to make the entire directory
world accessible.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-06-21 18:01:26 +02:00
c81f075f43 libpod: do not chmod bind mounts
when the new mount API is available, the OCI runtime doesn't require
that each parent directory for a bind mount must be accessible.
Instead it is opened in the initial user namespace and passed down to
the container init process.

This requires that the kernel supports the new mount API and that the
OCI runtime uses it.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-06-21 18:01:26 +02:00
120660e239 fix: close resource file
Signed-off-by: guoguangwu <guoguangwug@gmail.com>
2024-04-18 09:29:05 +08:00
83671f95d8 Properly parse stderr when updating container status
I believe the previous code meant to use cmd.Run instead of cmd.Start.
The issue is that cmd.Start returns before the command has finished
executing, so the conditional body checking for the stderr of the
command never gets executed.

Raise the cmd.Start up into its own conditional, which is checking for
whether the process could be started. Then we consume stderr, check for
some specific strings in the output, and then finally continue on with
the rest of the code.

Signed-off-by: Keith Johnson <kj@ubergeek42.com>
2024-03-25 10:15:23 -04:00
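
The distinction drawn in 83671f95d8 between starting a command and waiting for it can be sketched as below. This is only an illustration of the pattern: `runAndCheckStderr` is a hypothetical helper, the `sh` invocation stands in for the OCI runtime, and the "does not exist" string is an invented example rather than one of the strings the real code matches.

```go
package main

import (
	"fmt"
	"io"
	"os/exec"
	"strings"
)

// runAndCheckStderr is a hypothetical helper. Start only tells us whether
// the process could be launched; stderr is read to EOF afterwards, and
// Wait reports how the process exited.
func runAndCheckStderr(name string, args ...string) (string, error) {
	cmd := exec.Command(name, args...)
	stderr, err := cmd.StderrPipe()
	if err != nil {
		return "", err
	}
	if err := cmd.Start(); err != nil {
		return "", fmt.Errorf("starting %s: %w", name, err)
	}
	// All reads from the pipe must finish before Wait is called.
	out, readErr := io.ReadAll(stderr)
	waitErr := cmd.Wait()
	if readErr != nil {
		return "", readErr
	}
	return string(out), waitErr
}

func main() {
	// The shell command and its message are made up for this example.
	out, err := runAndCheckStderr("sh", "-c", "echo 'container does not exist' >&2; exit 1")
	if err != nil && strings.Contains(out, "does not exist") {
		fmt.Println("runtime reports the container is gone")
	}
}
```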
950f612b56 logging: new mode -l passthrough-tty
it works in a similar way to passthrough but it can also be used on a
TTY.

conmon support: https://github.com/containers/conmon/pull/465

Closes: https://github.com/containers/podman/issues/20767

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-02-28 17:23:59 +01:00
667311c7d5 Use persist dir for oom file
Conmon writes the exit file and oom file (if container
was oom killed) to the persist directory. This directory
is retained across reboots as well.
Update podman to create a persist-dir/ctr-id for the exit
and oom files for each container to be written to. The oom
state of container is set after reading the files
from the persist-dir/ctr-id directory.
The exit code still continues to read the exit file from
the exits directory.

Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
2024-02-12 09:13:39 -05:00
72f1617fac Bump Go module to v5
Moving from Go module v4 to v5 prepares us for public releases.

Move done using gomove [1] as with the v3 and v4 moves.

[1] https://github.com/KSubedi/gomove

Signed-off-by: Matt Heon <mheon@redhat.com>
2024-02-08 09:35:39 -05:00
2a2d0b0e18 chore: delete obsolete // +build lines
Signed-off-by: Oleksandr Redko <Oleksandr_Redko@epam.com>
2024-01-04 11:53:38 +02:00
03d411abc0 libpod: split out cgroups call into linux specific file
So that we do not cause compile error on freebsd.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-12-07 11:24:47 +01:00
a687c38860 use rootless netns from c/common
Use the new rootlessnetns logic from c/common, drop the podman code
here and make use of the new much simpler API.

ref: https://github.com/containers/common/pull/1761

[NO NEW TESTS NEEDED]

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-12-07 11:24:46 +01:00
01d397a658 podman: new option --preserve-fd
add a new option --preserve-fd that allows specifying a list of FDs to
pass down to the container.

It is similar to --preserve-fds but it takes a list of FDs instead of
the maximum FD number to preserve.

--preserve-fd and --preserve-fds are mutually exclusive.

It requires crun since runc would complain if any fd below
--preserve-fds is not preserved.

Closes: https://github.com/containers/podman/issues/20844

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-12-05 10:16:41 +01:00
cd21973f47 pkg/util: use code from c/storage
[NO NEW TESTS NEEDED] no new functionalities are added

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-11-23 21:36:42 +01:00
e966c86d98 container.conf: support attributed string slices
All `[]string`s in containers.conf have now been migrated to attributed
string slices which require some adjustments in Buildah and Podman.

[NO NEW TESTS NEEDED]

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-10-27 12:44:33 +02:00
bad25da92e libpod: add !remote tag
This should never be pulled into the remote client.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-10-24 12:11:34 +02:00
19c870da0d Merge pull request #20425 from giuseppe/podman-do-not-leak-DBUS_SESSION_BUS_ADDRESS-into-conmon
libpod: skip DBUS_SESSION_BUS_ADDRESS in conmon
2023-10-21 18:36:02 +00:00
29273cda10 lint: fix warnings found by perfsprint
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-10-20 16:27:46 +02:00
03947ab031 libpod: skip DBUS_SESSION_BUS_ADDRESS in conmon
commit 7ade9721020468438e822b16ed7a65380cc7fbd2 introduced the change
that caused an issue in crun since it forces the root user session
instead of the system one when DBUS_SESSION_BUS_ADDRESS is set.

I am addressing it in crun, but for the time being, let's also not
pass the variable down to conmon since the assumption is that when
running as root the containers must be created on the system bus.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-10-20 16:06:51 +02:00
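
Passing the whole environment to a child process while dropping a single variable, as 03947ab031 describes, can be done with a small filter. The sketch below is a generic illustration; `envWithout` is a hypothetical helper and is not claimed to match the function used in libpod.

```go
package main

import (
	"fmt"
	"strings"
)

// envWithout returns a copy of env (entries in KEY=VALUE form) with every
// entry for the variable named skip removed.
func envWithout(env []string, skip string) []string {
	prefix := skip + "="
	out := make([]string, 0, len(env))
	for _, kv := range env {
		if strings.HasPrefix(kv, prefix) {
			continue
		}
		out = append(out, kv)
	}
	return out
}

func main() {
	env := []string{
		"PATH=/usr/bin",
		"DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/0/bus",
		"LANG=C",
	}
	fmt.Println(envWithout(env, "DBUS_SESSION_BUS_ADDRESS")) // [PATH=/usr/bin LANG=C]
}
```

In a real caller the input would typically be `os.Environ()` and the result would be assigned to the `Env` field of an `exec.Cmd`.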
7ade972102 libpod: pass entire environment to conmon
Pass the _entire_ environment to conmon instead of selectively enabling
only specific variables.  The main reasoning is to make sure that conmon
and the podman-cleanup callback process operate in the exact same
environment as the initial podman process.  Some configuration files
may be passed via environment variables.  Podman not passing those down
to conmon has led to subtle and hard to debug issues in the past, so
passing all down will avoid such kinds of issues in the future.

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-09-26 16:48:52 +02:00
6293ec2e2d fix handling of static/volume dir
The processing and setting of the static and volume directories was
scattered across the code base (including c/common) leading to subtle
errors that surfaced in #19938.

There were multiple issues that I try to summarize below:

 - c/common loaded the graphroot from c/storage to set the defaults for
   static and volume dir.  That ignored Podman's --root flag and
   surfaced in #19938 and other bugs.  c/common does not set the
   defaults anymore which gives Podman the ability to detect when the
   user/admin configured a custom directory (not empty value).

 - When parsing the CLI, Podman (ab)uses containers.conf structures to
   set the defaults but also to override them in case the user specified
   a flag.  The --root flag overrode the static dir which is wrong and
   broke a couple of use cases.  Now there is a dedicated field for it in
   the "PodmanConfig" which also includes a containers.conf struct.

 - The defaults for static and volume dir are now being set correctly
   and adhere to --root.

 - The CONTAINERS_CONF_OVERRIDE env variable has not been passed to the
   cleanup process.  I believe that _all_ env variables should be passed
   to conmon to avoid such subtle bugs.

Overall I find that the code and logic is scattered and hard to
understand and follow.  I refrained from larger refactorings as I really
just want to get #19938 fixed and then go back to other priorities.

https://github.com/containers/common/pull/1659 broke three pkg/machine
tests.  Those have been commented out until getting fixed.

Fixes: #19938
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-09-25 14:14:30 +02:00
639eb52c89 Merge pull request #20062 from vrothberg/syslog-fix
pass --syslog to the cleanup process
2023-09-20 11:57:33 -04:00
4652a2623f pass --syslog to the cleanup process
The --syslog flag has not been passed to the cleanup process (i.e.,
conmon's exit args) complicating debugging quite a bit.

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-09-20 15:37:07 +02:00
73dc72f80d vendor of containers/common
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2023-09-20 08:39:49 -04:00
70cf9740f1 StopContainer: display signal num when name unknown
Under some circumstances podman tries to kill a container
using signal 37, for which unix.SignalName() returns "".
Not helpful. So, when that happens, show "(signal number)".

Signed-off-by: Ed Santiago <santiago@redhat.com>
2023-09-07 14:13:14 -06:00
8bda49608f Merge pull request #19696 from Luap99/api-stream-format
api docs: document stream format
2023-08-28 19:43:24 +02:00
4b347609d6 oci: print stderr only after checking state
when the "kill" command fails, print the stderr from the OCI runtime
only after we check the container state.

It also simplifies the code since we don't have to hard code the error
messages we want to ignore.

Closes: https://github.com/containers/podman/issues/18452

[NO NEW TESTS NEEDED] it fixes a flake.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-08-28 09:22:48 +02:00
7c9c969815 API attach: return vnd.docker.multiplexed-stream header
The attach API used to always return the Content-Type
`vnd.docker.raw-stream`, however docker api v1.42 added the
`vnd.docker.multiplexed-stream` type when no tty was used.

Follow suit and return the same header for docker api v1.42 and libpod
v4.7.0. This technically allows clients to make a small optimization as
they no longer need to inspect the container to see if they get a raw or
multiplexed stream.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-08-24 16:22:28 +02:00
c726cf8106 libpod: improve conmon error handling
When conmon is started it blocks and waits for us to signal it to start
via pipe. This works but when conmon exits before it waits for the start
message it causes podman to fail with `write child: broken pipe`. This
error is meaningless to podman users.

The real error is that conmon failed, so we should not return early if we
fail to send the start message to conmon. Instead, ignore the EPIPE error
case, as it is safe to assume that conmon died, and for other errors
make sure to kill conmon so that the following wait() call does not hang
forever. This also fixes problems with having conmon zombie processes
leaked as wait() was never called.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-08-17 15:32:59 +02:00
8f85aaf07f fixup "podman logs with non ASCII log tag" tests
We need to actually check the output not just exit codes. While doing
this it was clear that the first test was not checking what it should
be so I had to remove the quotes from the arg.

Also this check did not work with remote testing at all, we must set the
env then restart the server as the env for conmon must be set on the
server obviously.
Also we can only match the conmon error messages on the local client.

Lastly this test requires the journald driver but we cannot use the in
container tests so skip it there.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-08-17 15:30:59 +02:00
ff66f31ddd libpod: correctly pass env so alternative locales work
in addition to b6167cedb2
we also need to pass LANG. Do so, and add a test to verify

Signed-off-by: Peter Hunt <pehunt@redhat.com>
2023-08-17 12:15:08 +02:00
406c480535 Merge pull request #19533 from hangscer8/fix_waitPidStop_timer
Stop timer in function waitPidStop
2023-08-08 06:59:20 -04:00
282594e58f Stop timer in function waitPidStop
Because it will cause a memory leak if we do not stop the timer when the function has completed.

[NO NEW TESTS NEEDED]

Signed-off-by: hang.jiang <hang.jiang@daocloud.io>
2023-08-08 16:31:27 +08:00
1e54539432 Add support for passing container stop timeout as -1 (infinite)
Compat api for containers/stop should take -1 value

Add support for `podman stop --time -1`
Add support for `podman restart --time -1`
Add support for `podman rm --time -1`
Add support for `podman pod stop --time -1`
Add support for `podman pod rm --time -1`
Add support for `podman volume rm --time -1`
Add support for `podman network rm --time -1`

Fixes: https://github.com/containers/podman/issues/17542

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2023-08-04 08:36:45 -04:00
49e0bde2bf Merge pull request #18946 from Luap99/slirp4netns
use slirp4netns code from c/common
2023-06-22 16:15:18 +02:00
a699ed0ebf StopContainer(): ignore one more conmon warning
Resolves: #18865

[NO NEW TESTS NEEDED] -- it's a flake

Signed-off-by: Ed Santiago <santiago@redhat.com>
2023-06-22 05:02:19 -06:00
614c962c23 use libnetwork/slirp4netns from c/common
Most of the code moved there so use it from there and remove it here.

Some extra changes are required here. This is a bit of a mess. The pipe
handling makes this a bit more difficult.

[NO NEW TESTS NEEDED] This is just a rework, existing tests must pass.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-06-22 11:16:13 +02:00
f07aa1bfdc make lint: enable wastedassign
Because we shouldn't waste assigns.

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-06-19 14:14:48 +02:00
60a5a59475 make lint: enable mirror
Helpful reports to avoid unnecessary allocations.

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-06-19 14:11:12 +02:00
13c2aca219 libpod: make conmon always log to syslog
Conmon very early dups the std streams with /dev/null, therefore all
errors it reports go nowhere. When you run podman with debug level we
set --syslog and we can see the error in the journal. This should be
the default. We have a lot of weird failures in CI that could be caused
by conmon and we have access to the journal in the cirrus tasks so that
should make debugging much easier.

Conmon still uses the same logging level as podman so it will not spam
the journal and only log warnings and errors by default.

[NO NEW TESTS NEEDED]

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-06-14 13:54:57 +02:00
6f821634ad libpod: Podman info output more network information
podman info prints the network information about binary path,
package version, program version and DNS information.

Fixes: #18443

Signed-off-by: Toshiki Sonoda <sonoda.toshiki@fujitsu.com>
2023-06-13 11:19:29 +09:00
2ebc9004f4 Ignore spurious warnings when killing containers
There are certain messages logged by OCI runtimes when killing a
container that has already stopped that we really do not care
about when stopping a container. Due to our architecture, there
are inherent races around stopping containers, and so we cannot
guarantee that *we* are the people to kill it - but that doesn't
matter because Podman only cares that the container has stopped,
not who delivered the fatal signal.

Unfortunately, the OCI runtimes don't understand this, and log
various warning messages when the `kill` command is invoked on a
container that was already dead. These cause our tests to fail,
as we now check for clean STDERR when running Podman. To work
around this, capture STDERR for the OCI runtime in a buffer only
for stopping containers, and go through and discard any of the
warnings we identified as spurious.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2023-06-08 09:19:47 -04:00
e9356ba206 Don't use bytes.NewBuffer to read data
The documentation says
> The new Buffer takes ownership of buf, and the
> caller should not use buf after this call.

so use the more directly applicable, and simpler, bytes.Reader instead, to avoid this potentially risky use.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2023-04-14 22:40:47 +02:00
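
The point made in e9356ba206 is that `bytes.NewReader` gives a read-only view of a byte slice, whereas `bytes.NewBuffer` takes ownership of it. A minimal sketch of reading existing data through a `bytes.Reader` follows; `countLines` is a hypothetical helper chosen only to have something to read.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// countLines reads data through a bytes.Reader, which never writes to or
// takes ownership of the slice, so the caller may keep using data.
func countLines(data []byte) int {
	sc := bufio.NewScanner(bytes.NewReader(data))
	n := 0
	for sc.Scan() {
		n++
	}
	return n
}

func main() {
	data := []byte("a\nb\n")
	fmt.Println(countLines(data), len(data)) // 2 4
}
```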
4faa139b78 waitPidStop: reduce sleep time to 10ms
Kill is a fast syscall, so we can reduce the sleep time from 100ms to
10ms in the hope of speeding things up a bit.

[NO NEW TESTS NEEDED]

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-01-19 12:31:37 +01:00