Commit Graph

6792 Commits

Author SHA1 Message Date
Ed Santiago
fbbfd07463 kube SIGINT system test: fix race in timeout handling
Up to now this test has been run using:

    PODMAN_TIMEOUT=2 run_podman kube play ...

...and this gives podman time to start the pod before getting
the signal.

When run in parallel, under heavy load, the above command seems
to time out before podman has gotten its act together. Weird
things happen, like weird exit status and (most crucially)
zombie containers.

Solution: wait for container to actually start before we kill it.

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-11-07 11:01:08 -07:00
Paul Holzinger
22152a2f9c test/buildah-bud: build new inet helper
Added in https://github.com/containers/buildah/pull/5783

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-11-07 10:39:15 +01:00
Paul Holzinger
fb3a0e93a8 test/system: add regression test for TZDIR local issue
Regression test for #23550. Setting the TZDIR env should make no
difference for the local timezone as this is not a real timezone name
that is resolved from that directory.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-11-07 10:39:15 +01:00
openshift-merge-bot[bot]
aac206e9c5 Merge pull request #24412 from Sativarsainath-26/network-events
Fix: To print create and remove network in podman events
2024-11-06 18:33:18 +00:00
Daniel J Walsh
6346a11b09 AdditionalSupport for SubPath volume mounts
Add support for inspecting Mounts which include SubPaths.

Handle SubPaths for kubernetes image volumes.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2024-11-06 10:10:26 -05:00
openshift-merge-bot[bot]
a358d83ce9 Merge pull request #24437 from lambinoo/feat-split-pod-container-start-24401
Add key to control if a container can get started by its pod
2024-11-05 15:04:16 +00:00
Sainath Sativar
c23d9c6f23 Log network creation and removal events in Podman
This commit resolves an issue where network creation and removal events were not being logged in `podman events`. A new function has been introduced in the `events` package to ensure consistent logging of network lifecycle events. This update will allow users to track network operations more effectively through the event log, improving visibility and aiding in debugging network-related issues.

Fixes: #24032
Signed-off-by: Sainath Sativar <Sativar.sainath@gmail.com>
2024-11-05 11:58:47 +00:00
openshift-merge-bot[bot]
c8af2f2c1e Merge pull request #24334 from rhatdan/quadlet
Honor users requests in quadlet files
2024-11-05 09:45:11 +00:00
Farya L. Maerten
2597eeae70 Add key to control if a container can get started by its pod
By default today, the container is always started if its pod is also
started. This prevents to create custom with systemd where containers in
a pod could be started through their `[Install]` section.

We add a key `StartWithPod=`, enabled by default, that enables one to
disable that behavior.

This prevents the pod service from changing the state of the container
service.

Fixes #24401

Signed-off-by: Farya L. Maerten <me@ltow.me>
2024-11-05 08:39:23 +01:00
Daniel J Walsh
c6be5a6684 Honor users requests in quadlet files
Fixes: https://github.com/containers/podman/issues/24322

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2024-11-04 15:08:26 -05:00
openshift-merge-bot[bot]
df41725d61 Merge pull request #24461 from edsantiago/stop-trap-timeout
CI: systests: workaround for parallel podman-stop flake
2024-11-04 18:56:59 +00:00
openshift-merge-bot[bot]
0f25d9ee15 Merge pull request #24406 from Luap99/event-api-response
fix API issue about missing the status code in the events and logs endpoints
2024-11-04 18:54:14 +00:00
Ed Santiago
2c01264568 CI: systests: workaround for parallel podman-stop flake
Just bump up a timeout when running parallel, because of high load.

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-11-04 10:45:14 -07:00
openshift-merge-bot[bot]
2279a77303 Merge pull request #24403 from Luap99/tools-vendor
go.mod vendor: ensure we never have the toolchain directive set
2024-11-04 17:15:12 +00:00
Ygal Blum
dbfc8cccda Quadlet - support image file based mount in container file
Signed-off-by: Ygal Blum <ygal.blum@gmail.com>
2024-11-01 16:20:23 -04:00
Paul Holzinger
e6d987882e API: container logs flush status code
API clients expect the status code quickly otherwise they can time out.
If we do not flush we may not write the header immediately and only when
futher logs are send.

Fixes #23712

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-11-01 18:54:13 +01:00
Paul Holzinger
768ad8653a rework event code to improve API errors
One of the problems with the Events() API was that you had to call it in
a new goroutine. This meant the the error returned by it had to be read
back via a second channel. This cuased other bugs in the past but here
the biggest problem is that basic errors such as invalid since/until
options were not directly returned to the caller.
It meant in the API we were not able to write http code 200 quickly
because we always waited for the first event or error from the
channels. This in turn made some clients not happy as they assume the
server hangs on time out if no such events are generated.

To fix this we resturcture the entire event flow. First we spawn the
goroutine inside the eventer Read() function so not all the callers have
to. Then we can return the basic error quickly without the goroutine.
The caller then checks the error like any normal function and the API
can use this one to decide which status code to return.
Second we now return errors/event in one channel then the callers can
decide to ignore or log them which makes it a bit more clear.

Fixes c46884aa93 ("podman events: check for an error after we finish reading events")
Fixes #23712

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-11-01 18:54:13 +01:00
Paul Holzinger
0acd192b59 Makefile: vendor target should always remove toolchain
We never want the toolchain as the default is to use the same as the go
version. So the only purpose of toolchain is to force a newer compiler
than necessary which we do not want as we are getting build by many
different distributions and block builds that would otherwise work fine
is just not helpful to anyone.

Also update the go.mod comments remind people that there should be no
toolchain. The make vendor target with the toolchain will now guarantee
this so the CI will fail otherwise.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-11-01 13:23:01 +01:00
Paul Holzinger
f4ad93d5f6 test/tools/go.mod: remove toolchain
Like our main go.mod we never want to force a specific toolchain.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-11-01 13:23:01 +01:00
Paul Holzinger
d633824a95 Instrument cleanup tracer to log weird volume removal flake
Debug for #23913, I though if we have no idea which process is nuking
the volume then we need to figure this out. As there is no reproducer
we can (ab)use the cleanup tracer. Simply trace all unlink syscalls to
see which process deletes our special named volume. Given the volume
name is used as path on the fs and is deleted on volume rm we should
know exactly which process deleted it the next time hopefully.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-10-30 18:50:07 +01:00
renovate[bot]
c7ff3b75cb fix(deps): update module github.com/onsi/ginkgo/v2 to v2.21.0
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2024-10-30 12:17:30 +00:00
renovate[bot]
5f66277138 chore(deps): update dependency setuptools to ~=75.3.0
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2024-10-29 10:33:01 +00:00
openshift-merge-bot[bot]
a56cda18cf Merge pull request #24388 from shenpengfeng/main
chore: fix some function names in comment
2024-10-29 10:32:12 +00:00
shenpengfeng
9abc17f1e1 chore: fix some function names in comment
Signed-off-by: shenpengfeng <xinhangzhou@icloud.com>
2024-10-29 17:57:31 +08:00
openshift-merge-bot[bot]
3a7e1deed4 Merge pull request #24390 from edsantiago/safename-070
CI: make 070-build.bats use safe image names
2024-10-28 14:41:28 +00:00
openshift-merge-bot[bot]
2cbb2e8c42 Merge pull request #24392 from edsantiago/parallelize-520
CI: parallelize 520-checkpoint tests
2024-10-28 13:49:13 +00:00
Ed Santiago
41a82c9a95 CI: parallelize 450-interactive system tests
This has been running reliably for weeks in #23275

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-10-28 07:03:29 -06:00
Ed Santiago
10d056cc5e CI: parallelize 520-checkpoint tests
This has been running reliably for weeks in #23275

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-10-28 07:02:51 -06:00
Ed Santiago
e6b7e4ff84 CI: make 070-build.bats use safe image names
In preparation for maybe some day being able to run build tests
in parallel.

SUPER IMPORTANT NOTE! BUILD TESTS CANNOT BE PARALLELIZED YET!
buildah, when run in parallel, barfs with:

    race: parallel builds: copying...committing...creating... layer not known

Until this is fixed, podman-build can never be run in parallel.
See https://github.com/containers/buildah/issues/5674

This PR is simply cleaning things up so, if/when that day comes,
the ensuing parallelize PR will be short & sweet.

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-10-28 06:58:26 -06:00
openshift-merge-bot[bot]
0962a1e1bf Merge pull request #24352 from edsantiago/systemd-leak-cleanup
System tests: clean up unit file leaks
2024-10-28 12:07:27 +00:00
Paul Holzinger
64516e1b8f test/system: add podman network reload test to distro gating
The recent fedora kernel 6.11.4 has a problem with ipv6 networks [1].
This is not a podman bug at all but rather a kernel regression. I can
reproduce the issue easily by running this test.

Given many users were hit by this add it to the distro level gating
which runs in the fedora openQA framework and then we should catch a
bad kernel like this hopefully in the future and prevent it from going
into stable.

[1] https://github.com/containers/podman/issues/24374

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-10-28 11:51:43 +01:00
Ed Santiago
743a0d49eb System tests: clean up unit file leaks
Quadlet tests and some systemd tests leak unit files, as
reported by 'systemctl list-units --failed'. Clean them up.

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-10-28 04:45:04 -06:00
Paul Holzinger
6069cdda00 healthcheck: do not leak statup service
The startup service is special because we have to transition from
startup to the normal unit. And in order to do so we kill ourselves (as
we are run as part of the service). This means we always exited 1 which
causes systemd to keep us failure and not remove the transient unit
unless "reset-failed" is called. As there is no process around to do
that we cannot really do this, thus make us exit(0) which makes more
sense.

Of course we could try to reset-failed the unit later but the code for
that seems more complicated than that.

Add a new test from Ed that ensures we check for all healthcheck units
not just the timer to avoid leaks. I slightly modified it to provide a
better error on leaks.

Fixes: 0bbef4b830 ("libpod: rework shutdown handler flow")
Fixes: #24351

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-10-25 13:47:59 +02:00
Jan Rodák
afedb83917 Add Startup HealthCheck configuration to the podman inspect
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
2024-10-24 13:49:51 +02:00
Ed Santiago
ee9c681f31 Buildah treadmill: improve wording in test-fail instructions
Clarify, expand, fix a typo. These are the instructions
shown when the **patching** step fails, typically when
buildah's helpers.bash is changed in a way that conflicts
with our make-it-work-in-podman patches.

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-10-23 12:34:33 -06:00
Paul Holzinger
0cdb9b3b22 ps: fix display of exposed ports
This fixes two problems, first if a port is published and exposed it
should not be shown twice. It is enough to show the published one.

Second, if there is a huge range the ports were no grouped causing the
output to be unreadable basically. Now we group exposed ports like we do
with the normal published ports.

Fixes #23317

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-10-23 15:03:30 +02:00
David Gibson
5b131b8273 test/system: Fix spurious "duplicate tests" failures in pasta tests
As an internal consistency check, the pasta tests check for duplicated test
cases by grepping a log file for a parsed test id.  However it uses
grep -F for the purpose which will not perform an exact match, but a
substring match.  There are some tests which generate an id which is a
substring of the id for other tests, so when test order is randomised, this
can cause a spurious failure.  This can happen in practice when running
the test in parallel with very high concurrency (e.g. -j 100).

Fix this by adding the -x option to grep, which only checks for full line
exact matches.

Fixes: https://github.com/containers/podman/issues/24342

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2024-10-23 14:02:53 +11:00
Miloslav Trmač
6fd0e227b4 Improve "podman load - from URL"
Don't assume that the loaded image will be deduplicated
with the server image.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2024-10-22 19:36:14 +02:00
Miloslav Trmač
77ef28c14f Try to repair c/storage after removing an additional image store
The additional image store feature assumes that images / layers
in the additional store never go away, while we do remove it after
this test. Try to repair the store.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2024-10-22 19:36:03 +02:00
Miloslav Trmač
1d7ec1ef5f Use the config digest to compare images loaded/pulled using different methods
Historically, non-schema1 images had a deterministic image ID == config digest.
With zstd:chunked, we don't want to deduplicate layers pulled by consuming the
full tarball and layers partially pulled based on TOC, because we can't cheaply
ensure equivalence; so, image IDs for images where a TOC was used differ.

To accommodate that, compare images using their configs digests, not using image IDs.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2024-10-22 19:36:02 +02:00
Miloslav Trmač
bf8f2b5551 Simplify the additional store test
When looking up the current-store image ID, do that
from the same output where we verify that the ID is from the
current store, instead of listing images twice.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2024-10-22 19:15:46 +02:00
Miloslav Trmač
3bc6072142 Fix the store choice in "podman pull image with additional store"
The test got the stores RW status backwards.

Before zstd:chunked, both image IDs should be the same, so this used
to make no difference.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2024-10-22 19:15:46 +02:00
openshift-merge-bot[bot]
57095a9e62 Merge pull request #24335 from giuseppe/test-set-soft-ulimit
test: set soft ulimit
2024-10-22 11:09:41 +00:00
openshift-merge-bot[bot]
f4227e887c Merge pull request #24275 from Luap99/wait-condition
libpod API: only return exit code without conditions
2024-10-22 10:53:12 +00:00
Giuseppe Scrivano
94878af151 test: set soft ulimit
when the current soft limit is higher than the new value, ulimit fails
to set the hard limit as (tested on Rawhide):

[root@rawhide ~]# ulimit -n -H 1048575
-bash: ulimit: open files: cannot modify limit: Invalid argument

to avoid the problem, set also the soft limit:

[root@rawhide ~]# ulimit -n -H
12345678
[root@rawhide ~]# ulimit -n -H 1048575
-bash: ulimit: open files: cannot modify limit: Invalid argument
[root@rawhide ~]# ulimit -n -SH 1048575
[root@rawhide ~]# ulimit -n -H
1048575

commit 71d5ee0e04 introduced the issue.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-10-22 12:05:07 +02:00
Miloslav Trmač
fdc9feea0e Fix 330-corrupt-images.bats in composefs test runs
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2024-10-18 23:44:04 +02:00
openshift-merge-bot[bot]
290d94d3c0 Merge pull request #24300 from edsantiago/flake-fix-checkpoint-test
CI: e2e: fix checkpoint flake
2024-10-18 16:42:44 +00:00
Paul Holzinger
67e0fa8b89 quadlet: add default network dependencies to all units
There is no good reason for the special case, kube and pod units
definitely need it. Volume and network units maybe not but for
consistency we add it there as well. This makes the docs much easier to
write and understand for users as the behavior will not differ.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-10-18 14:01:22 +02:00
Paul Holzinger
57b022782b quadlet: ensure user units wait for the network
As documented in the issue there is no way to wait for system units from
the user session[1]. This causes problems for rootless quadlet units as
they might be started before the network is fully up. TWhile this was
always the case and thus was never really noticed the main thing that
trigger a bunch of errors was the switch to pasta.

Pasta requires the network to be fully up in order to correctly select
the right "template" interface based on the routes. If it cannot find a
suitable interface it just fails and we cannot start the container
understandingly leading to a lot of frustration from users.

As there is no sign of any movement on the systemd issue we work around
here by using our own user unit that check if the system session
network-online.target it ready.

Now for testing it is a bit complicated. While we do now correctly test
the root and rootless generator since commit ada75c0bb8 the resulting
Wants/After= lines differ between them and there is no logic in the
testfiles themself to say if root/rootless to match specifics. One idea
was to use `assert-key-is-rootless/root` but that seemed like more
duplication for little reason so use a regex and allow both to make it
pass always. To still have some test coverage add a check in the system
test to ask systemd if we did indeed have the right depdendencies where
we can check for exact root/rootless name match.

[1] https://github.com/systemd/systemd/issues/3312

Fixes #22197

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-10-18 11:43:48 +02:00
Paul Holzinger
ada75c0bb8 test/e2e: test quadlet with and without --user
This seems to be a testing gap, we need to test both for full coverage.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-10-17 15:53:10 +02:00