podman

mirror of https://github.com/containers/podman.git synced 2025-09-09 11:32:19 +08:00

Author	SHA1	Message	Date
Ed Santiago	cf5df5b805	quadlet tests: skip on RHEL8 rootless. Skip in setup() if journald is unavailable. To be pedantic, this is overkill: some quadlet tests pass because they don't run journald. Too bad. Also skip a play-kube test that requires journal. Signed-off-by: Ed Santiago <santiago@redhat.com>	2023-03-21 07:18:14 -06:00
Valentin Rothberg	1541ce56cf	kube play: set service container as main PID when possible Commit 4fa307f14923 fixed a number of issues in the sdnotify proxies. Whenever a container runs with a custom sdnotify policy, the proxies need to keep running which in turn required Podman to run and wait for the service container to stop. Improve on that behavior and set the service container as the main PID (instead of Podman) when no container needs sdnotify. Fixes: #17345 Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>	2023-02-10 13:31:03 +01:00
OpenShift Merge Robot	2dd9e0859c	Merge pull request #16947 from ygalblum/kube-service-container-logdriver Kube Play: use passthrough as the default log-driver if service-container is set	2023-01-03 09:28:00 -05:00
Ygal Blum	68fbebfacc	Kube Play: use passthrough as the default log-driver if service-container is set. Reasoning: When the log-driver is passthrough, the journal socket is passed to the containers as-is, which has two advantages: (1) journald can see who the actual sender of the log event is, rather than thinking everything comes from the conmon process; (2) conmon does not have to copy all the log data. Code changes: If the log-driver was not set by the user and service-container is set, use passthrough as the default log-driver. Update the system tests: explicitly set the log driver in the sdnotify and play tests; in the podman-kube template test, verify the default log driver for service-container. Signed-off-by: Ygal Blum <ygal.blum@gmail.com>	2023-01-03 10:34:24 +02:00
Ed Santiago	16b595c32c	Build and use a newer systemd image ...based on f37, not f31. And make it fedora-minimal so it's smaller. And clean up dnf so it's even smaller. And tag it with our proper YMD tag, and commit the script that builds it. This broke the system-df tests. In the process of resolving that, I found those tests a little lacking. So, improve their coverage a little bit. Signed-off-by: Ed Santiago <santiago@redhat.com>	2023-01-02 13:26:46 -07:00
Valentin Rothberg	4fa307f149	kube sdnotify: run proxies for the lifespan of the service. As outlined in #16076, a subsequent BARRIER may follow the READY message sent by a container. To correctly imitate the behavior of systemd's NOTIFY_SOCKET, the notify proxies spun up by `kube play` must hence process messages for the entirety of the workload. We know that the workload is done and that all containers and pods have exited when the service container exits. Hence, all proxies are closed at that time. The above changes imply that Podman runs for the entirety of the workload and will henceforth act as the MAINPID when running inside of systemd. Prior to this change, the service container acted as the MAINPID, which is no longer possible; Podman would be killed immediately on exit of the service container and could not clean up. The kube template now correctly transitions to inactive instead of failed in systemd. Fixes: #16076 Fixes: #16515 Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>	2022-12-06 14:15:11 +01:00
Valentin Rothberg	8c3af71862	notify k8s system test: move sending message into exec The flake in #16076 is likely related to the notify message not being delivered/read correctly. Move sending the message into an exec session such that flakes will reveal an error message. Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>	2022-12-05 14:32:06 +01:00
Ed Santiago	120a77e394	testimage: add iproute2 & socat, for pasta networking PR #16141 introduces a new network type, "pasta". Its tests rely on running 'ip -j' and socat in the container. Add them. Also: bump to alpine 3.16.2 (from 3.16.0) Also: clean up apk cache, this saves us 2MB+ in the image Also (unrelated): clean up two broken uses of '$(< ...)' that are causing tests to blow up under bats 1.8 on my laptop New testimage is 20221018 and, sigh, is 12.7MB (up 4MB). Signed-off-by: Ed Santiago <santiago@redhat.com>	2022-10-18 11:50:48 -06:00
Ed Santiago	d5f044ee7a	System tests: reenable some skipped aarch64 tests Background: in order to add aarch64 tests, we had to add emergency skips to a lot of failing tests. No attempt was ever made to understand why they were failing. Fast forward to today, I filed #15888 just to see if tests are still failing. Looks like a number of them are fixed. (Yes, magically). Remove those skips. See: #15074, #15277 Signed-off-by: Ed Santiago <santiago@redhat.com>	2022-09-21 14:07:22 -06:00
Valentin Rothberg	79e21b5b16	kube play: sd-notify integration. Integrate sd-notify policies into `kube play`. The policies can be configured for all containers via the `io.containers.sdnotify` annotation or for individual containers via the `io.containers.sdnotify/$name` annotation. The `kube play` process will wait for all containers to be ready by waiting for the individual `READY=1` messages, which are received via the `pkg/systemd/notifyproxy` proxy mechanism. Also update the simple "container" sd-notify test, as it did not fully test the expected behavior, which became obvious when adding the new tests. Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>	2022-08-10 21:12:39 +02:00
Valentin Rothberg	3fc126e152	libpod: allow the notify socket to be passed programmatically. The notify socket can now either be specified via an environment variable or programmatically (where the env is ignored). The notify mode and the socket are now also displayed in `container inspect`, which comes in handy for debugging and allows for proper testing. Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>	2022-08-10 21:10:17 +02:00
Lokesh Mandvekar	da98c88778	Cirrus: enable Fedora 36 aarch64 tasks on EC2 new file: test/e2e/config_arm64.go Tests that fail on aarch64 have been skipped with `skip_if_aarch64`. Co-authored-by: Chris Evich <cevich@redhat.com> Co-authored-by: Ed Santiago <santiago@redhat.com> Signed-off-by: Lokesh Mandvekar <lsm5@fedoraproject.org>	2022-07-27 15:27:52 -04:00
Valentin Rothberg	8684d41e38	k8systemd: run k8s workloads in systemd Support running `podman play kube` in systemd by exploiting the previously added "service containers". During `play kube`, a service container is started before all the pods and containers, and is stopped last. The service container communicates its conmon PID via sdnotify. Add a new systemd template to dispatch such k8s workloads. The argument of the template is the path to the k8s file. Note that the path must be escaped for systemd not to bark: Let's assume we have a `top.yaml` file in the home directory: ``` $ escaped=$(systemd-escape ~/top.yaml) $ systemctl --user start podman-play-kube@$escaped.service ``` Closes: https://issues.redhat.com/browse/RUN-1287 Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>	2022-05-17 10:18:58 +02:00
Valentin Rothberg	03af8213ce	sdnotify: send MAINPID only once Send the main PID only once. Previously, `(*Container).start()` and the conmon handler sent them ~simultaneously and went into a race. I noticed the issue while debugging a WIP PR. Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>	2022-05-12 11:11:37 +02:00
Ed Santiago	cc42321697	sdnotify test: accept MAINPID anywhere systemd sometimes spits out lines in the wrong order. Deal with it. This fixes an infrequent flake that I haven't filed because I didn't understand it well enough. (Hence, this reduces BUGS but does not reduce BUG COUNT. Sorry!) Signed-off-by: Ed Santiago <santiago@redhat.com>	2021-09-30 12:09:48 -06:00
Ed Santiago	e3c7e02a0e	System tests: add cleanup & debugging output Cleanup: the final 'play' test wasn't cleaning up after itself, leading to angry warning messages when rerunning tests (in my environment; never in CI) Debug: I'm seeing a lot of "Could not parse READY=1 as MAINPID=nnn" flakes in the sdnotify:container test (nine in the past month). Add debug traces to help diagnose in future flakes. Signed-off-by: Ed Santiago <santiago@redhat.com>	2021-09-01 11:29:59 -06:00
Daniel J Walsh	c22f3e8b4e	Implement SD-NOTIFY proxy in conmon. This leverages conmon's ability to proxy the SD-NOTIFY socket. This prevents locking caused by OCI runtime blocking, waiting for SD-NOTIFY messages, and instead passes the messages directly up to the host. NOTE: Also re-enable the auto-update tests, which had been disabled due to flakiness. With this change, Podman properly integrates into systemd. Fixes: #7316 Signed-off-by: Joseph Gooch <mrwizard@dok.org> Signed-off-by: Daniel J Walsh <dwalsh@redhat.com> Signed-off-by: Valentin Rothberg <rothberg@redhat.com>	2021-08-20 11:12:05 +02:00
Ed Santiago	9fd7ab50f8	System tests: honor $OCI_RUNTIME (for CI) Some CI systems set $OCI_RUNTIME as a way to override the default crun. Integration (e2e) tests honor this, but system tests were not aware of the convention; this means we haven't been testing system tests with runc, which means RHEL gating tests are now failing. The proper solution would be to edit containers.conf on CI systems. Sorry, that would involve too much CI-VM work. Instead, this PR detects $OCI_RUNTIME and creates a dummy containers.conf file using that runtime. Add: various skips for tests that don't work with runc. Refactor: add a helper function so we don't need to do the complicated 'podman info blah blah .OCIRuntime.blah' thing in many places. BUG: we leave a tmp file behind on exit. Signed-off-by: Ed Santiago <santiago@redhat.com>	2021-05-03 20:15:21 -06:00
Ed Santiago	660a72993c	sdnotify tests: try real hard to kill socat processes podman gating tests are hanging in the new Fedora CI setup; long and tedious investigation suggests that 'socat' processes are being left unkilled, which then causes BATS to hang when it (presumably) runs a final 'wait' in its end cleanup. The two principal changes are to exec socat in a subshell with fd3 closed, and to pkill its child processes before killing the process itself. I don't know if both are needed. The pkill definitely is; the exec may just be superstition. Since I've wasted more than a day of PTO time on this, I'm okay with a little superstition. What I do know is that with these two changes, my reproducer fails to reproduce in over one hour of trying (normally it fails within 5 minutes). AND, update: only rawhide (f35) leaves stray socat processes behind. f33 and ubuntu do not, so 'pkill -P' fails. I really have no idea what's going on. Signed-off-by: Ed Santiago <santiago@redhat.com>	2021-03-11 16:21:51 -07:00
Ed Santiago	1345d0358b	system tests: the catch-up game - run test: minor cleanup to .containerenv test. Basically, make it do only two podman-runs (they're expensive) and tighten up the results checks - ps test: add ps -a --storage. Requires small tweak to run_podman helper, so we can have "timeout" be an expected result - sdnotify test: workaround for #8718 (seeing MAINPID=xxx as last output line instead of READY=1). As found by the newly-added debugging echos, what we are seeing is: MAINPID=103530 READY=1 MAINPID=103530 It's not supposed to be that way; it's supposed to be just the first two. But when faced with reality, we must bend to accommodate it, so let's accept READY=1 anywhere in the output stream, not just as the last line. Signed-off-by: Ed Santiago <santiago@redhat.com>	2020-12-14 15:06:43 -07:00
Ed Santiago	f5b3dc976c	Tests: Fix common flakes, and improve apiv2 test log - apiv2 - the 'ten /info requests' test is flaking often, taking ~8 seconds (our limit is 7, up from 5 a few weeks ago). Brent suggested that the first /info call might be expensive, because it needs to access storage. So, let's prime it by running one /info outside the timing loop. And, because even that continues to fail, bump it up to 10 seconds and file #8076 to track the slowdown. - toolbox test - WaitForReady() has timed out, even on one occasion causing a run failure because it failed 3 times. Solution: bump up timeout from 2s to 5s. Not really great, but CI systems are underpowered, and it's not unreasonable that 2s might be too low. - sdnotify test - add a 'podman wait' between stop & rm. This may prevent a "cannot rm container as it is running" race condition. While working on this, Brent and I noticed a few ways that test-apiv2 logging can be improved: - test name: when request is POST, display the jsonified parameters, not the original input ones. This should make it much easier to reproduce failures. - use curl's "--write-out" option to capture http code, content type, and request time. We were getting the first two via grep from logged headers; this is cleaner. And there was no other way to get timing. We now include the timing as X-Response-Time in the log file. - abort on any curl error, not just 7 (cannot connect). Any error at all from curl is bad news. Signed-off-by: Ed Santiago <santiago@redhat.com>	2020-10-20 11:32:49 -06:00
Ed Santiago	1646da834c	System test additions - run --userns=keep-id: confirm that $HOME gets set (#8013) - inspect: confirm that JSON output is a sane number of lines (10 or more), not an unreadable one-liner (#8011 and #8021). Do so with image, pod, network, volume because the code paths might be different. - cgroups: confirm that 'run' preserves cgroup manager (#7970) - sdnotify: reenable tests, and hope CI doesn't hang. This test was disabled on August 18 because CI jobs were hanging and timing out. My suspicion was that it was #7316, which in turn seems to have hinged on conmon #182. The latter was merged on Sep 16, so let's cross our fingers and see what happens. Also: remove inaccurate warning from a networking test. And, wow, fix is_cgroupsv2(), it has never actually worked. Signed-off-by: Ed Santiago <santiago@redhat.com>	2020-10-14 15:32:02 -06:00
Ed Santiago	a9dbd2b3de	Migrate away from docker.io. CI and system tests currently pull some images from docker.io. Eliminate that, by: - building a custom image containing much of what we need for testing; and - copying other needed images to quay.io (Reason: effective 2020-11-01 docker.io will limit the number of image pulls). The principal change is to create a new quay.io/libpod/testimage, using the new test/system/build-testimage script, instead of relying on quay.io/libpod/alpine_labels. We also switch to using a hardcoded :YYYYMMDD tag, instead of :latest, in an attempt to futureproof our CI. This image includes 'httpd' from busybox-extras, which we use in our networking test (previously we had to pull and run busybox from docker.io). The testimage can and should be extended as needed for future tests, e.g. adding test file content or other useful tools. For the '--pull' tests which require actually pulling from the registry, I've created an image with the same name but tagged :00000000 so it will never be pulled by default. Since this image is only used minimally, it's just busybox. Unfortunately there remain two cases we cannot solve in this tiny alpine-based image: 1) docker registry 2) systemd For those, I've (manually) run: podman pull [ docker.io/library/registry:2.7 | registry.fedoraproject.org/fedora:31 ] podman tag !$ quay.io/... podman push !$ ...and amended the calling tests accordingly. I've tried to make the smallest reasonable diff, not the smallest possible one. I hope it's a reasonable tradeoff. Signed-off-by: Ed Santiago <santiago@redhat.com>	2020-09-08 06:06:06 -06:00
Ed Santiago	d254fa4c35	system tests: enable more remote tests; cleanup info, images, run, networking tests: remove some skip_if_remote()s that were added in the varlink days. All of these tests now seem to work with APIv2. help test: check that first output line from 'podman --help' is the program description (regression check for #7273). load test: clean up stray images, rewrite test to make it conform to existing convention. In the process, discover and file #7337 exec test (and networking): file #7360, and add FIXME comment to skip()s suggesting evaluating those tests once that is fixed. pod test: now that #6328 is fixed, use 'podman pod inspect --format' instead of relying on jq Various other tests: add an explanation of why test is disabled so we can more easily distinguish "this will never be meaningful under remote" vs "hey, doesn't work for now, but maybe someday". Signed-off-by: Ed Santiago <santiago@redhat.com>	2020-08-19 08:12:14 -06:00
Ed Santiago	18f36d8cf6	Re-disable sdnotify tests to try to fix CI Some CI tests are hanging, timing out in 60 or 120 minutes. I wonder if it's #7316, the bug where all podman commands hang forever if NOTIFY_SOCKET is set? Signed-off-by: Ed Santiago <santiago@redhat.com>	2020-08-18 07:21:47 -06:00
Ed Santiago	60ab5f3ae6	system tests: enable sdnotify tests Oops. PR #6693 (sdnotify) added tests, but they were disabled due to broken crun on f31. I tried for three weeks to get a magic CI:IMG PR to update crun on the CI VMs ... but in that time I forgot to actually enable those new tests. This PR removes a 'skip', replacing it with a check that systemd is running plus one more to make sure our runtime is crun. It looks like sdnotify just doesn't work on Ubuntu (it hangs), and my guess is that it's a crun/runc issue. I also changed the test image from fedora:latest to :31, because, sigh, fedora:latest removed the systemd-notify tool. WARNING WARNING WARNING: the symptom of a missing systemd-notify is that podman will hang forever, not even stopped by the timeout command in podman_run! (Filed: #7316). This means that if the sdnotify-in-container test ever fails, the symptom will be that Cirrus itself will time out (2 hours?). This is horrible. I don't know what to do about it other than push for a fix for 7316. Signed-off-by: Ed Santiago <santiago@redhat.com>	2020-08-13 19:16:25 -06:00
Ed Santiago	10ad46eb73	BATS system tests for new sdnotify Signed-off-by: Ed Santiago <santiago@redhat.com>	2020-07-06 17:47:22 +00:00

27 Commits
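Commits cc42321697 and 1345d0358b describe a change to the sdnotify test. It now accepts `READY=1` anywhere in the captured notify output, not just as the last line, because systemd may emit the `MAINPID=` and `READY=1` lines out of order (e.g. `MAINPID=103530`, `READY=1`, `MAINPID=103530`). The following is a minimal bash sketch of such an order-independent check. It assumes the captured messages arrive one `KEY=value` per line. The function name `check_sdnotify_output` is hypothetical and is not the helper used in Podman's BATS suite.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not Podman's actual test code): validate captured
# sd_notify messages without depending on the order of the lines.
# Prints the MAINPID on success; returns 1 with a message on stderr otherwise.
check_sdnotify_output() {
    local output="$1"
    local line pid mainpid="" ready=0

    # The here-string keeps the loop in the current shell, so `return` works.
    while IFS= read -r line; do
        case "$line" in
            READY=1)
                ready=1 ;;
            MAINPID=*)
                pid="${line#MAINPID=}"
                # A repeated MAINPID is tolerated only if it is the same value.
                if [[ -n "$mainpid" && "$pid" != "$mainpid" ]]; then
                    echo "conflicting MAINPID values: $mainpid vs $pid" >&2
                    return 1
                fi
                mainpid="$pid" ;;
        esac
    done <<< "$output"

    if (( ready != 1 )); then
        echo "READY=1 not found in sdnotify output" >&2
        return 1
    fi
    if [[ -z "$mainpid" ]]; then
        echo "no MAINPID line in sdnotify output" >&2
        return 1
    fi
    echo "$mainpid"
}
```

For example, `check_sdnotify_output $'MAINPID=103530\nREADY=1\nMAINPID=103530'` prints `103530`. Checking for membership rather than position is the design choice those commits describe: the test no longer fails merely because systemd reordered otherwise-valid messages.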