Commit Graph

1838 Commits

Author SHA1 Message Date
Ed Santiago
1c77ee6fc5 CI: system tests: parallelize 010
Final cleanup. Has been working fine in #23257 for weeks.
Not much gain here, but every little bit helps.

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-11-13 04:14:57 -07:00
Ed Santiago
969417711d system tests: safer install_kube_template()
Previous version was badly broken: it relied on 'make'
rebuilding a file under cwd, which is a no-no; and, in
the case where we don't have a source directory, just
blindly hoped that there'd be a system-installed .service
file with the correct path to podman.

Solution:
  . if running in source directory, run sed directly into
    destination service file in $UNIT_DIR. This is ugly
    duplication of a line in Makefile.

  . if NOT running in a source directory, check $PODMAN:
    . if it's /usr/bin/podman, continue. Include a warning
      that will be shown only on test failure.
    . otherwise skip, because we don't know what we're testing

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-11-11 10:44:32 -07:00
openshift-merge-bot[bot]
ee5b8de70d Merge pull request #24413 from giuseppe/add-test-zstd-chunked
tests: add basic zstd:chunked system test
2024-11-08 14:36:06 +00:00
openshift-merge-bot[bot]
a1c1ae62e7 Merge pull request #24340 from l0rd/ssh-knownhosts-test
New `system connection add` test
2024-11-08 13:24:46 +00:00
Giuseppe Scrivano
30a82cad7a test: add zstd:chunked system tests
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-11-08 12:39:07 +01:00
Ed Santiago
fbbfd07463 kube SIGINT system test: fix race in timeout handling
Up to now this test has been run using:

    PODMAN_TIMEOUT=2 run_podman kube play ...

...and this gives podman time to start the pod before getting
the signal.

When run in parallel, under heavy load, the above command seems
to time out before podman has gotten its act together. Weird
things happen, like weird exit status and (most crucially)
zombie containers.

Solution: wait for container to actually start before we kill it.

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-11-07 11:01:08 -07:00
Mario Loriedo
cbf1d7fcae Avoid printing PR text to stdout in system test
Signed-off-by: Mario Loriedo <mario.loriedo@gmail.com>
2024-11-07 17:48:27 +01:00
Paul Holzinger
fb3a0e93a8 test/system: add regression test for TZDIR local issue
Regression test for #23550. Setting the TZDIR env should make no
difference for the local timezone as this is not a real timezone name
that is resolved from that directory.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-11-07 10:39:15 +01:00
Daniel J Walsh
6346a11b09 AdditionalSupport for SubPath volume mounts
Add support for inspecting Mounts which include SubPaths.

Handle SubPaths for kubernetes image volumes.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2024-11-06 10:10:26 -05:00
Ed Santiago
2c01264568 CI: systests: workaround for parallel podman-stop flake
Just bump up a timeout when running parallel, because of high load.

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-11-04 10:45:14 -07:00
Paul Holzinger
d633824a95 Instrument cleanup tracer to log weird volume removal flake
Debug for #23913, I though if we have no idea which process is nuking
the volume then we need to figure this out. As there is no reproducer
we can (ab)use the cleanup tracer. Simply trace all unlink syscalls to
see which process deletes our special named volume. Given the volume
name is used as path on the fs and is deleted on volume rm we should
know exactly which process deleted it the next time hopefully.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-10-30 18:50:07 +01:00
openshift-merge-bot[bot]
3a7e1deed4 Merge pull request #24390 from edsantiago/safename-070
CI: make 070-build.bats use safe image names
2024-10-28 14:41:28 +00:00
openshift-merge-bot[bot]
2cbb2e8c42 Merge pull request #24392 from edsantiago/parallelize-520
CI: parallelize 520-checkpoint tests
2024-10-28 13:49:13 +00:00
Ed Santiago
41a82c9a95 CI: parallelize 450-interactive system tests
This has been running reliably for weeks in #23275

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-10-28 07:03:29 -06:00
Ed Santiago
10d056cc5e CI: parallelize 520-checkpoint tests
This has been running reliably for weeks in #23275

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-10-28 07:02:51 -06:00
Ed Santiago
e6b7e4ff84 CI: make 070-build.bats use safe image names
In preparation for maybe some day being able to run build tests
in parallel.

SUPER IMPORTANT NOTE! BUILD TESTS CANNOT BE PARALLELIZED YET!
buildah, when run in parallel, barfs with:

    race: parallel builds: copying...committing...creating... layer not known

Until this is fixed, podman-build can never be run in parallel.
See https://github.com/containers/buildah/issues/5674

This PR is simply cleaning things up so, if/when that day comes,
the ensuing parallelize PR will be short & sweet.

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-10-28 06:58:26 -06:00
openshift-merge-bot[bot]
0962a1e1bf Merge pull request #24352 from edsantiago/systemd-leak-cleanup
System tests: clean up unit file leaks
2024-10-28 12:07:27 +00:00
Paul Holzinger
64516e1b8f test/system: add podman network reload test to distro gating
The recent fedora kernel 6.11.4 has a problem with ipv6 networks [1].
This is not a podman bug at all but rather a kernel regression. I can
reproduce the issue easily by running this test.

Given many users were hit by this add it to the distro level gating
which runs in the fedora openQA framework and then we should catch a
bad kernel like this hopefully in the future and prevent it from going
into stable.

[1] https://github.com/containers/podman/issues/24374

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-10-28 11:51:43 +01:00
Ed Santiago
743a0d49eb System tests: clean up unit file leaks
Quadlet tests and some systemd tests leak unit files, as
reported by 'systemctl list-units --failed'. Clean them up.

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-10-28 04:45:04 -06:00
Paul Holzinger
6069cdda00 healthcheck: do not leak statup service
The startup service is special because we have to transition from
startup to the normal unit. And in order to do so we kill ourselves (as
we are run as part of the service). This means we always exited 1 which
causes systemd to keep us failure and not remove the transient unit
unless "reset-failed" is called. As there is no process around to do
that we cannot really do this, thus make us exit(0) which makes more
sense.

Of course we could try to reset-failed the unit later but the code for
that seems more complicated than that.

Add a new test from Ed that ensures we check for all healthcheck units
not just the timer to avoid leaks. I slightly modified it to provide a
better error on leaks.

Fixes: 0bbef4b830 ("libpod: rework shutdown handler flow")
Fixes: #24351

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-10-25 13:47:59 +02:00
Jan Rodák
afedb83917 Add Startup HealthCheck configuration to the podman inspect
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
2024-10-24 13:49:51 +02:00
David Gibson
5b131b8273 test/system: Fix spurious "duplicate tests" failures in pasta tests
As an internal consistency check, the pasta tests check for duplicated test
cases by grepping a log file for a parsed test id.  However it uses
grep -F for the purpose which will not perform an exact match, but a
substring match.  There are some tests which generate an id which is a
substring of the id for other tests, so when test order is randomised, this
can cause a spurious failure.  This can happen in practice when running
the test in parallel with very high concurrency (e.g. -j 100).

Fix this by adding the -x option to grep, which only checks for full line
exact matches.

Fixes: https://github.com/containers/podman/issues/24342

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2024-10-23 14:02:53 +11:00
Miloslav Trmač
6fd0e227b4 Improve "podman load - from URL"
Don't assume that the loaded image will be deduplicated
with the server image.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2024-10-22 19:36:14 +02:00
Miloslav Trmač
77ef28c14f Try to repair c/storage after removing an additional image store
The additional image store feature assumes that images / layers
in the additional store never go away, while we do remove it after
this test. Try to repair the store.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2024-10-22 19:36:03 +02:00
Miloslav Trmač
1d7ec1ef5f Use the config digest to compare images loaded/pulled using different methods
Historically, non-schema1 images had a deterministic image ID == config digest.
With zstd:chunked, we don't want to deduplicate layers pulled by consuming the
full tarball and layers partially pulled based on TOC, because we can't cheaply
ensure equivalence; so, image IDs for images where a TOC was used differ.

To accommodate that, compare images using their configs digests, not using image IDs.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2024-10-22 19:36:02 +02:00
Miloslav Trmač
bf8f2b5551 Simplify the additional store test
When looking up the current-store image ID, do that
from the same output where we verify that the ID is from the
current store, instead of listing images twice.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2024-10-22 19:15:46 +02:00
Miloslav Trmač
3bc6072142 Fix the store choice in "podman pull image with additional store"
The test got the stores RW status backwards.

Before zstd:chunked, both image IDs should be the same, so this used
to make no difference.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2024-10-22 19:15:46 +02:00
Giuseppe Scrivano
94878af151 test: set soft ulimit
when the current soft limit is higher than the new value, ulimit fails
to set the hard limit as (tested on Rawhide):

[root@rawhide ~]# ulimit -n -H 1048575
-bash: ulimit: open files: cannot modify limit: Invalid argument

to avoid the problem, set also the soft limit:

[root@rawhide ~]# ulimit -n -H
12345678
[root@rawhide ~]# ulimit -n -H 1048575
-bash: ulimit: open files: cannot modify limit: Invalid argument
[root@rawhide ~]# ulimit -n -SH 1048575
[root@rawhide ~]# ulimit -n -H
1048575

commit 71d5ee0e04 introduced the issue.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-10-22 12:05:07 +02:00
Miloslav Trmač
fdc9feea0e Fix 330-corrupt-images.bats in composefs test runs
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2024-10-18 23:44:04 +02:00
Paul Holzinger
57b022782b quadlet: ensure user units wait for the network
As documented in the issue there is no way to wait for system units from
the user session[1]. This causes problems for rootless quadlet units as
they might be started before the network is fully up. TWhile this was
always the case and thus was never really noticed the main thing that
trigger a bunch of errors was the switch to pasta.

Pasta requires the network to be fully up in order to correctly select
the right "template" interface based on the routes. If it cannot find a
suitable interface it just fails and we cannot start the container
understandingly leading to a lot of frustration from users.

As there is no sign of any movement on the systemd issue we work around
here by using our own user unit that check if the system session
network-online.target it ready.

Now for testing it is a bit complicated. While we do now correctly test
the root and rootless generator since commit ada75c0bb8 the resulting
Wants/After= lines differ between them and there is no logic in the
testfiles themself to say if root/rootless to match specifics. One idea
was to use `assert-key-is-rootless/root` but that seemed like more
duplication for little reason so use a regex and allow both to make it
pass always. To still have some test coverage add a check in the system
test to ask systemd if we did indeed have the right depdendencies where
we can check for exact root/rootless name match.

[1] https://github.com/systemd/systemd/issues/3312

Fixes #22197

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-10-18 11:43:48 +02:00
Ed Santiago
67e39c1ec5 pasta udp tests: new bytecheck helper
...for debugging #24147, because "md5sum mismatch" is not
the best way to troubleshoot bytestream differences.

socat is run on the container, so this requires building a
new testimage (20241011). Bump to new CI VMs[1] which include it.

 [1] https://github.com/containers/automation_images/pull/389

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-10-16 10:15:19 -06:00
Ed Santiago
1ddb15c81f System tests: safer pause-image creation
The current mypod hack breaks down when running individual tests:

    $ hack/bats 010   <<< barfs because it does not want pause-image!

Reason: Bats does not provide any official way to tell if tests
are being run in parallel.

Workaround: use an undocumented way.

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-10-16 06:02:23 -06:00
openshift-merge-bot[bot]
a2eb5429b3 Merge pull request #24264 from edsantiago/try-try-again
CI: fix changing-rootFsSize flake
2024-10-15 22:05:42 +00:00
Ed Santiago
1b57dcab61 CI: fix changing-rootFsSize flake
(Second try). Use an airgapped image in the inspect-data tests.

Fixes: #23756

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-10-15 05:14:49 -06:00
Giuseppe Scrivano
71d5ee0e04 podman: do not set rlimits to the default value
since the effect would be to lower the rlimits when their definition
is higher than the default value.

The test doesn't fail on the previous version, unless the system is
configured with a nofile ulimit higher than the default value.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2317721

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-10-11 23:04:27 +02:00
Paul Holzinger
fe404959ed test: update timezone checks
In debian EST and MST7MDT are gone by default and moved to a special
package[1], instead of also installing that in the images lets use
different timezones in the test.

[1] 42c0008f86

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-10-10 17:44:08 +02:00
Ed Santiago
38803713d6 CI: quadlet system tests: use airgapped testimage
This command sequence causes SizeRootFs to change on foo:

   podman tag foo newimagename
   podman save ... newimagename
   podman load ...

Solution: get foo completely out of the picture. Use an
airgapped image: new image, new digest, new everything.

Fixes: #23756

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-10-09 14:11:00 -06:00
Ed Santiago
e7833d52cf 055-rm test: clean up a test, and document
There's an important reason why the healthcheck container in 055-rm
test uses 'sleep infinity' and not 'top. Document it.

And, the test itself wasn't actually working as intended. Make
it safer by confirming that the container actually enters
the "stopping" state.

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-10-07 15:22:49 -06:00
openshift-merge-bot[bot]
1f7fe1d1e8 Merge pull request #24167 from giuseppe/improve-check-for-current-user-mapped
libpod: hasCurrentUserMapped checks for gid too
2024-10-04 16:55:13 +00:00
Giuseppe Scrivano
e46ae46f18 libpod: hasCurrentUserMapped checks for gid too
the kernel checks that both the uid and the gid are mapped inside the
user namespace, not only the uid:

/**
 * privileged_wrt_inode_uidgid - Do capabilities in the namespace work over the inode?
 * @ns: The user namespace in question
 * @idmap: idmap of the mount @inode was found from
 * @inode: The inode in question
 *
 * Return true if the inode uid and gid are within the namespace.
 */
bool privileged_wrt_inode_uidgid(struct user_namespace *ns,
				 struct mnt_idmap *idmap,
				 const struct inode *inode)
{
	return vfsuid_has_mapping(ns, i_uid_into_vfsuid(idmap, inode)) &&
	       vfsgid_has_mapping(ns, i_gid_into_vfsgid(idmap, inode));
}

for this reason, improve the check for hasCurrentUserMapped to verify
that the gid is also mapped, and if it is not, use an intermediate
mount for the container rootfs.

Closes: https://github.com/containers/podman/issues/24159

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-10-04 16:17:04 +02:00
openshift-merge-bot[bot]
06f24180ce Merge pull request #24125 from edsantiago/ci-desired-network
CI: require and test CI_DESIRED_NETWORK on RHEL
2024-10-02 12:48:49 +00:00
Ed Santiago
410537808e System tests: sdnotify: wait for socket file creation
Potential race between starting socat (which creates a socket
file) and processes accessing said socket. Or maybe not. I
dunno, I'm grasping at straws. This is an elusive flake.

Fixes: #23798 (I hope)

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-10-01 14:50:16 -06:00
Ed Santiago
b791dfb558 CI: require and test CI_DESIRED_NETWORK on RHEL
Although podman has moved on from CNI, RHEL has not. Make
sure that builds on RHEL test the desired network backend(s).

Effective immediately, gating.yaml on all RHEL branches
must set CI_DESIRED_NETWORK (=cni or =netavark)

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-10-01 10:44:07 -06:00
openshift-merge-bot[bot]
514d25d53b Merge pull request #24068 from edsantiago/cors-fixes
CORS system test: clean up
2024-09-27 13:19:28 +00:00
openshift-merge-bot[bot]
87dcf9d9d2 Merge pull request #24062 from ygalblum/quadlet-restore-dir-order
Quadlet - make sure the order of the UnitsDir is deterministic
2024-09-27 12:02:24 +00:00
openshift-merge-bot[bot]
08cbd38994 Merge pull request #24073 from edsantiago/oh-i-give-up
System tests: set a default XDG_RUNTIME_DIR
2024-09-26 18:45:39 +00:00
Ygal Blum
ebbec00b0d Quadlet - make sure the order of the UnitsDir is deterministic
Change getUnitDirs to maintain a slice in addition to the map and return the slice
Add helper functions to make the code more readable
Adjust unit tests
Restore system test

Signed-off-by: Ygal Blum <ygal.blum@gmail.com>
2024-09-26 10:57:47 -04:00
openshift-merge-bot[bot]
4e38381d37 Merge pull request #23900 from Honny1/healthcheck-log
HealthCheck log output options
2024-09-26 11:55:55 +00:00
Ed Santiago
70c131ed68 System tests: set a default XDG_RUNTIME_DIR
Yield to reality: if $XDG_RUNTIME_DIR is unset, assume a
reasonable default (rootless only). This clears up a
common failure in Fedora gating tests, and will probably
prevent future time wasters.

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-09-25 12:45:17 -06:00
Ed Santiago
73cbc13190 CORS system test: clean up
Primary motivator: 'curl -v' format changes in f42

Drive-bys:
 * 127.0.0.1, not localhost
 * use wait_for_port, not sleep
 * show curl commands and their output, to ease debugging failures
 * better failure assertions

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-09-25 07:46:07 -06:00