Commit Graph

3963 Commits

Author SHA1 Message Date
Paul Holzinger
99a14332ef healthcheck: make sure to always show health_status events
This fixes a regression caused by commit 7e6e267329, unfortunately this
was not caught during review as for some reason this works fine rootless
and only fails as root.

Because we set the systemd log level to notice in order to hide the unit
started/stopped messages to prevent spamming the journal the issue is
that this now also causes systemd to ignore the events we write to
journald as we also send them as info level.

To fix this we simply send health_status events now on notice level. I
decided against sending all events on notice as I think info is fine for
them. Whenever the notice level is right is of course debatable but
given it may contain the unhealthy message I think having this a notice
should be ok.

The main reason this made it through testing is because we do not rely
on the systemd unit to fire healthchecks in the tests as this is flaky.
There is one test were we rely on it though and I added a check there
to make sure events are displayed correctly when trigger via systemd.

Fixes #20342

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-10-12 15:02:32 +02:00
Paul Holzinger
3cc9db8626 libpod: fix deadlock while parallel container create
When containers are created with a named volume it can deadlock because
the create logic tried to lock all volumes in a loop, this is fine if it
only ever creates a single container at any given time. However because
we multiple containers can be created at the same time they can cause a
deadlock between the volumes. This is because the order of the loop is
not stable, in fact it is based on the order of how the volumes were
specified on the cli.

So if you create two containers at the same time with
`-v vol1:/dir2 -v vol2:/dir2` and the other one with
`-v vol2:/dir2 -v vol1:/dir1` then there is chance for a deadlock.

Now one solution could be to order the volumes to prevent the issue but
the reason for holding the lock is dubious. The goal was to prevent the
volume from being removed in the meantime. However that could still
have happend before we acquired the lock so it didn't protect against
that.

Both boltdb and sqlite already prevent us from adding a container with
volumes that do not exists due their internal consistency checks.
Sqlite even uses FOREIGN KEY relationships so the schema will prevent us
from doing anything wrong.

The create code currently first checks if the volume exists and if not
creates it. I have checked that the db will guarantee that this will not
work:
Boltdb: `no volume with name test2 found in database when adding container xxx: no such volume`
Sqlite: `adding container volume test2 to database: FOREIGN KEY constraint failed`

Keep in mind that this error is normally not seen, only if the volume is
removed between the volume exists check and adding the container in the
db this messages will be seen wich is an acceptable race and a
pre-existing condition anyway.

[NO NEW TESTS NEEDED] Race condition, hard to test in CI.

Fixes #20313

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-10-11 11:40:35 +02:00
Paul Holzinger
29ae516006 use sqlite as default database
Use sqlite as default but for upgrades it will still use boltdb to avoid
breaking anyone. This is done by checking if the boltdb file already
exists and if it does then we have to use it.

I added a e2e test to check the new logic and removed the system test
for it, the problem with the system test is that we share the storage
dir there so all following commands without --db-backend would try to
use boltdb as a single --db-backend boltdb command will create the file
and then all folllwing commands will use it because of the backwards
compat. In e2e tests each test uses their own --root so it is not an
issue there.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-10-10 17:11:28 +02:00
Paul Holzinger
8a52e638e6 vendor latest c/common
Includes the default db backend changes.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-10-10 17:08:04 +02:00
Giuseppe Scrivano
8ac2aa7938 container: always check if mountpoint is mounted
when running as a service, the c.state.Mounted flag could get out of
sync if the container is cleaned up through the cleanup process.

To avoid this, always check if the mountpoint is really present before
skipping the mount.

[NO NEW TESTS NEEDED]

Closes: https://github.com/containers/podman/issues/17042

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-10-09 17:20:22 +02:00
openshift-ci[bot]
3b39d4b082 Merge pull request #20132 from cgiradkar/Issue-17856
Change log level for health_check
2023-10-04 17:59:32 +00:00
Chetan Giradkar
7e6e267329 Filter health_check and exec events for logging in console
Podman server logs are mostly full of healthcheck output, making them hard to navigate. Hence, made healthcheck service to run with LogLevelMax=notice, this would remove the normal output, inclusive the started/stopped messages from systemd itself.

Fixes #17856

Signed-off-by: Chetan Giradkar <cgiradka@redhat.com>
2023-10-04 14:50:15 +01:00
Ygal Blum
979c77f10e Volume create - fast exit when ignore is set and volume exists
Signed-off-by: Ygal Blum <ygal.blum@gmail.com>
2023-10-01 16:54:24 +03:00
Wolfgang Pross
bfbd0c8960 move IntelRdtClosID to HostConfig
Signed-off-by: Wolfgang Pross <wolfgang.pross@intel.com>
2023-09-27 16:44:13 +00:00
Wolfgang Pross
40d3c3b9b0 Add Intel RDT support
Add --rdt-class=COS to the create and run command to enable the
assignment of a container to a Class of Service (COS). The COS
represents a part of the cache based on the Cache Allocation Technology
(CAT) feature that is part of Intel's Resource Director Technology
(Intel RDT) feature set. By assigning a container to a COS, all PID's of
the container have only access to the cache space defined for this COS.
The COS has to be pre-configured based on the resctrl kernel driver.
cat_l2 and cat_l3 flags in /proc/cpuinfo represent CAT support for cache
level 2 and 3 respectively.

Signed-off-by: Wolfgang Pross <wolfgang.pross@intel.com>
2023-09-27 16:44:13 +00:00
Valentin Rothberg
7ade972102 libpod: pass entire environment to conmon
Pass the _entire_ environment to conmon instead of selectively enabling
only specific variables.  The main reasoning is to make sure that conmon
and the podman-cleanup callback process operate in the exact same
environment than the initial podman process.  Some configuration files
may be passed via environment variables.  Podman not passing those down
to conmon has led to subtle and hard to debug issues in the past, so
passing all down will avoid such kinds of issues in the future.

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-09-26 16:48:52 +02:00
Valentin Rothberg
6293ec2e2d fix handling of static/volume dir
The processing and setting of the static and volume directories was
scattered across the code base (including c/common) leading to subtle
errors that surfaced in #19938.

There were multiple issues that I try to summarize below:

 - c/common loaded the graphroot from c/storage to set the defaults for
   static and volume dir.  That ignored Podman's --root flag and
   surfaced in #19938 and other bugs.  c/common does not set the
   defaults anymore which gives Podman the ability to detect when the
   user/admin configured a custom directory (not empty value).

 - When parsing the CLI, Podman (ab)uses containers.conf structures to
   set the defaults but also to override them in case the user specified
   a flag.  The --root flag overrode the static dir which is wrong and
   broke a couple of use cases.  Now there is a dedicated field for in
   the "PodmanConfig" which also includes a containers.conf struct.

 - The defaults for static and volume dir and now being set correctly
   and adhere to --root.

 - The CONTAINERS_CONF_OVERRIDE env variable has not been passed to the
   cleanup process.  I believe that _all_ env variables should be passed
   to conmon to avoid such subtle bugs.

Overall I find that the code and logic is scattered and hard to
understand and follow.  I refrained from larger refactorings as I really
just want to get #19938 fixed and then go back to other priorities.

https://github.com/containers/common/pull/1659 broke three pkg/machine
tests.  Those have been commented out until getting fixed.

Fixes: #19938
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-09-25 14:14:30 +02:00
Paul Holzinger
af2665c28a pod rm: do not log error if anonymous volume is still used
This is not really an error, if the anonymous volume is still used then
this likely means it was transferred to another container with
--volumes-from. This is what the user wants and it is not like the user
can act on the logged error anyway. Once the last user of the volume is
removed it will be removed correctly.

see https://github.com/containers/podman/pull/19637

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-09-22 14:44:14 +02:00
OpenShift Merge Robot
639eb52c89 Merge pull request #20062 from vrothberg/syslog-fix
pass --syslog to the cleanup process
2023-09-20 11:57:33 -04:00
OpenShift Merge Robot
c5be3aecde Merge pull request #20056 from Luap99/fast-net-ls
compat API: speed up network list
2023-09-20 16:43:00 +02:00
Valentin Rothberg
4652a2623f pass --syslog to the cleanup process
The --syslog flag has not been passed to the cleanup process (i.e.,
conmon's exit args) complicating debugging quite a bit.

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-09-20 15:37:07 +02:00
Daniel J Walsh
73dc72f80d vendor of containers/common
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2023-09-20 08:39:49 -04:00
Paul Holzinger
8e5adde0b3 compat API: speed up network list
The network list compat API requires us to include all containers with
their ip addresses for the selected networks. Because we have no network
-> container mapping in the db we have to go through all containers
every time. However the old code did it in the most ineffective way
possible, it quered the containers from the db for each individual
network. The of course is extremely expensive. Now the other expensive
call is calling Inspect() on the container each time. Inspect does for
more than we need.

To fix this we fist query containers only once for the API call, then
replace the inspect call with directly accessing the network status.
This will speed things up a lot!
The reported scenario includes 100 containers and 25 networks,
previously it took 1.5s for the API call not it takes 24ms, that is a
more than a 62x improvement. (tested with curl)

[NO NEW TESTS NEEDED] We have no timing tests.

Fixes #20035

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-09-20 13:08:42 +02:00
Paul Holzinger
befdb41995 libpod: remove unused ContainerState() fucntion
First that function claims to deep copy but then actually return the
original state so it does not work correctly, given that there are no
users just remove it instead of fixing it.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-09-20 11:01:09 +02:00
OpenShift Merge Robot
1e43fae5ad Merge pull request #19873 from rst0git/update-checkpointctl
vendor: update github.com/checkpoint-restore/checkpointctl to 1.1.0
2023-09-14 15:22:02 +02:00
Daniel J Walsh
b1e3e8d972 Run codespell on code
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2023-09-14 06:13:23 -04:00
Radostin Stoyanov
9b17d6cb06 vendor: update checkpointctl to v1.1.0
Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2023-09-12 08:41:02 +01:00
danishprakash
cdcf18b862 kube: add DaemonSet support for generate
Signed-off-by: danishprakash <danish.prakash@suse.com>
2023-09-12 10:30:57 +05:30
OpenShift Merge Robot
cbb955811c Merge pull request #19245 from mheon/fix_19237
Ensure HC events fire after logs are written
2023-09-11 19:47:37 +02:00
OpenShift Merge Robot
325736fcb7 Merge pull request #19914 from umohnani8/term
Add support for kube TerminationGracePeriodSeconds
2023-09-11 19:24:18 +02:00
Giuseppe Scrivano
19bd9b33dd libpod: move oom_score_adj clamp to init
commit 8b4a79a744 introduced
oom_score_adj clamping when the container oom_score_adj value is lower
than the current one in a rootless environment.  Move the check to
init() time so it is performed every time the container starts and not
only when it is created.  It is more robust if the oom_score_adj value
is changed for the current user session.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-09-11 17:04:37 +02:00
Matt Heon
925794c6aa Ensure HC events fire after logs are written
HC events were firing as part of the `exec` call, before it had
even been decided whether the HC succeeded or failed. As such,
the status was not going to be correct any time there was a
change (e.g. the first event after a container went healthy to
unhealthy would still read healthy). Move the event into the
actual Healthcheck function and throw it in a defer to make sure
it happens at the very end, after logs are written.

Ignores several conditions that did not log previously (container
in question does not have a healthcheck, or an internal failure
that should not really happen).

Still not a perfect solution. This relies on the HC log being
written, when instead we could just get the status straight from
the function writing the event - so if we fail to write the log,
we can still report a bad status. But if the log wasn't written,
we're in bad shape regardless - `podman ps` would disagree with
the event written, for example.

Fixes #19237

Signed-off-by: Matt Heon <mheon@redhat.com>
2023-09-11 08:02:46 -04:00
Urvashi Mohnani
d9a85466a0 Add support for kube TerminationGracePeriodSeconds
Add support to kube play to support the TerminationGracePeriodSeconds
fiels by sending the value of that to podman's stopTimeout.
Add support to kube generate to generate TerminationGracePeriodSeconds
if stopTimeout is set for a container (will ignore podman's default).

Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
2023-09-10 16:41:24 -04:00
Giuseppe Scrivano
b8f6a12d01 libpod: create the cgroup pod before containers
When a container is created and it is part of a pod, we ensure the pod
cgroup exists so limits can be applied on the pod cgroup.

Closes: https://github.com/containers/podman/issues/19175

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-09-08 14:58:48 +02:00
Giuseppe Scrivano
5de8f4aba0 libpod: allow cgroup path without infra container
a pod can use cgroups without an infra container.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-09-08 14:58:48 +02:00
Giuseppe Scrivano
5121c9eb0e libpod: check if cgroup exists before creating it
do not create the pod cgroup if it already exists.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-09-08 14:58:48 +02:00
Giuseppe Scrivano
38209ef49d libpod: refactor platformMakePod signature
accept only the resources to be used by the pod, so that the function
can more easily be used by a successive patch.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-09-08 14:58:48 +02:00
Giuseppe Scrivano
627ac1c96b libpod: destroy pod cgroup on pod stop
When the pod is stopped, we need to destroy the pod cgroup, otherwise
it is leaked.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-09-08 14:58:48 +02:00
Giuseppe Scrivano
556db46a68 libpod: refactor code to new function
move the code to remove the pod cgroup to a separate function.  It is
a preparation for the next patch.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-09-08 14:58:44 +02:00
Daniel J Walsh
6ee8f73d41 Merge pull request #19885 from rhatdan/kube
Add support for kube  securityContext.procMount
2023-09-08 06:56:05 -04:00
Ed Santiago
70cf9740f1 StopContainer: display signal num when name unknown
Under some circumstances podman tries to kill a container
using signal 37, for which unix.SignalName() returns "".
Not helpful. So, when that happens, show "(signal number)".

Signed-off-by: Ed Santiago <santiago@redhat.com>
2023-09-07 14:13:14 -06:00
Daniel J Walsh
b83485022d Add support for kube securityContext\.procMount
Fixes: https://github.com/containers/podman/issues/19881

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2023-09-07 09:49:11 -04:00
Valentin Rothberg
589867d716 podman: don't restart after kill
Also add a new `StoppedByUser` field to the container-inspect state
which can be useful during debugging and is now also used in the
regression test.  Note that I moved the `false` check one test above
such that we can compare the previous Podman version which should just
be stuck in the `wait $ctr` command since it will continue restarting.

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-09-07 15:18:02 +02:00
Matt Heon
71549c642f Ignore spurious container-removal errors
When removing a container's dependency, getting an error that the
container has already been removed (ErrNoSuchCtr and
ErrCtrRemoved) should not be fatal. We wanted the container gone,
it's gone, no need to error out.

[NO NEW TESTS NEEDED] This is a race and thus hard to test for.

Fixes #18874

Signed-off-by: Matt Heon <mheon@redhat.com>
2023-09-05 14:35:37 -04:00
Giuseppe Scrivano
702709a916 libpod: do not parse --hostuser in base 8
fix the parsing of --hostuser to treat the input in base 10.

Closes: https://github.com/containers/podman/issues/19800

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-08-31 12:34:58 +02:00
OpenShift Merge Robot
8bda49608f Merge pull request #19696 from Luap99/api-stream-format
api docs: document stream format
2023-08-28 19:43:24 +02:00
Giuseppe Scrivano
4b347609d6 oci: print stderr only after checking state
when the "kill" command fails, print the stderr from the OCI runtime
only after we check the container state.

It also simplifies the code since we don't have to hard code the error
messages we want to ignore.

Closes: https://github.com/containers/podman/issues/18452

[NO NEW TESTS NEEDED] it fixes a flake.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-08-28 09:22:48 +02:00
OpenShift Merge Robot
4ff21cf1ac Merge pull request #19568 from umohnani8/infra-name
Add infra-name annotations to kube gen/play
2023-08-25 15:23:47 +02:00
Daniel J Walsh
6f284dbd46 podman exec should set umask to match container
Fixes: https://github.com/containers/podman/issues/19713

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2023-08-24 13:20:06 -04:00
Urvashi Mohnani
52ed7fce2a Add infra-name annotations to kube gen/play
Add io.podman.annotations.infra.name annotation to kube play so
users can set the name of the infra container created.
When a pod is created with --infra-name set, the generated
kube yaml will have an infraName annotation set that will
be used when playing the generated yaml with podman.

Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
2023-08-24 11:29:56 -04:00
Paul Holzinger
7c9c969815 API attach: return vnd.docker.multiplexed-stream header
The attach API used to always return the Content-Type
`vnd.docker.raw-stream`, however docker api v1.42 added the
`vnd.docker.multiplexed-stream` type when no tty was used.

Follow suit and return the same header for docker api v1.42 and libpod
v4.7.0. This technically allows clients to make a small optimization as
they no longer need to inspect the container to see if they get a raw or
multiplexed stream.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-08-24 16:22:28 +02:00
OpenShift Merge Robot
1bb96a87c9 Merge pull request #19687 from dfr/freebsd-netstat
libpod: sum per-interface network stats for FreeBSD
2023-08-21 19:49:56 -02:30
OpenShift Merge Robot
f727428b52 Merge pull request #19663 from rhatdan/ramfs
Add support for ramfs as well as tmpfs in volume mounts
2023-08-21 16:51:06 -02:30
Doug Rabson
3d00744d29 libpod: sum per-interface network stats for FreeBSD
This sums the metric values from all interfaces similar to the Linux
version.

[NO NEW TESTS NEEDED]

Signed-off-by: Doug Rabson <dfr@rabson.org>
2023-08-21 16:00:41 +01:00
OpenShift Merge Robot
375eb045ca Merge pull request #19661 from dfr/freebsd-var-run
libpod: use /var/run instead of /run on FreeBSD
2023-08-21 12:24:50 -02:30