WaitForExit now returns the exit code as stored in the DB instead of
returning an error when the container has been removed.
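A rough sketch of the fallback, with assumed internal names (`GetContainerExitCode` on the state store is an assumption):
```go
// Hypothetical sketch: the container is gone, so consult the exit code
// recorded in the database instead of returning an error.
exitCode, err := c.runtime.state.GetContainerExitCode(c.ID())
if err != nil {
	return -1, fmt.Errorf("no exit code recorded for removed container %s: %w", c.ID(), err)
}
return exitCode, nil
```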
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Wait for another interval when the container has transitioned to
"stopped" to give the healthcheck status more time to change.
Closes: https://github.com/containers/podman/issues/22760
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Wait for the healthy status on the thread where the container lock is
held. Otherwise, if the wait is performed from a goroutine, a different
OS thread is used (the runtime.LockOSThread() call has no effect
there), causing pthread_mutex_unlock() to fail with EPERM.
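A standalone sketch of the underlying pitfall: runtime.LockOSThread() pins only the goroutine that calls it, so work dispatched to a new goroutine runs on a different OS thread (observable via gettid(2) on Linux):
```go
package main

import (
	"fmt"
	"runtime"

	"golang.org/x/sys/unix"
)

func main() {
	// Pin the current goroutine to its OS thread.
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	fmt.Println("locking thread (tid):", unix.Gettid())

	tid := make(chan int)
	go func() {
		// This goroutine is not pinned; because the main goroutine
		// holds its thread exclusively, this runs on another thread.
		// Unlocking an error-checking pthread mutex from here would
		// fail with EPERM.
		tid <- unix.Gettid()
	}()
	fmt.Println("goroutine thread (tid):", <-tid)
}
```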
Closes: https://github.com/containers/podman/issues/22651
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
This is something Docker does, and we did not do until now. The most
difficult/annoying part was the REST API, where I did not really
want to modify the struct being sent, so I made the new restart-policy
parameters query parameters instead.
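The resulting endpoint shape is roughly the following (parameter names are illustrative):
```
POST /libpod/containers/<name>/update?restartPolicy=on-failure&restartRetries=3
```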
Testing was also a bit annoying, because testing restart policy
always is.
Signed-off-by: Matt Heon <mheon@redhat.com>
The logic here is more complex than I would like, largely due to
the behavior of `podman inspect` for running containers. When a
container is running, `podman inspect` will source as much as
possible from the OCI spec used to run that container, to grab
up-to-date information on things like devices. We don't want to
change this (it's definitely the right behavior), but it does make
updating a running container inconvenient: we have to rewrite the
OCI spec as part of the update to make sure that `podman inspect`
will read the correct resource limits.
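Roughly, the update path for a running container therefore does something like this (`saveSpec` and `newResources` are assumed names, not necessarily the actual code):
```go
// Persist the new limits in the container's config...
c.config.Spec.Linux.Resources = newResources
// ...and rewrite the on-disk OCI spec so that `podman inspect`,
// which sources from the spec while the container runs, reports them.
if err := c.saveSpec(c.config.Spec); err != nil {
	return err
}
```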
Also, make update emit events. Docker does it, we should as well.
Signed-off-by: Matt Heon <mheon@redhat.com>
Moving from Go module v4 to v5 prepares us for public releases.
Move done using gomove [1] as with the v3 and v4 moves.
[1] https://github.com/KSubedi/gomove
Signed-off-by: Matt Heon <mheon@redhat.com>
Also add a new `StoppedByUser` field to the container-inspect state,
which can be useful during debugging and is now also used in the
regression test. Note that I moved the `false` check one test up so
that we can compare against the previous Podman version, which should
just be stuck in the `wait $ctr` command since the container will
continue restarting.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Add a new "healthy" sdnotify policy that instructs Podman to send the
READY message once the container has turned healthy.
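Usage sketch:
```
podman run -d --health-cmd 'curl -f http://localhost:8080' --sdnotify=healthy IMAGE
```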
Fixes: #6160
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Support two new wait conditions, "healthy" and "unhealthy". This
further paves the way for integrating sdnotify with health checks,
which is currently being tracked in #6160.
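For example:
```
podman wait --condition=healthy myctr
```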
Fixes: #13627
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Massage the internal APIs to use a string slice instead of a state slice
for passing wait conditions. This paves the way for waiting on
non-state conditions such as "healthy".
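A minimal sketch of dispatching string conditions, assuming a `define.StringToContainerStatus` style parser (variable names are illustrative):
```go
for _, condition := range conditions {
	switch condition {
	case "healthy", "unhealthy":
		// Non-state conditions are handled by the health-check logic.
		healthCheckConditions = append(healthCheckConditions, condition)
	default:
		// Everything else must parse as a container state, e.g. "exited".
		state, err := define.StringToContainerStatus(condition)
		if err != nil {
			return -1, err
		}
		stateConditions = append(stateConditions, state)
	}
}
```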
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
When waiting for a container, there may be a time window where conmon
has already exited but the container hasn't been fully cleaned up.
In that case, we give the container at most 20 seconds to be fully
cleaned up. We cannot wait forever since conmon may have been killed or
something else went wrong.
After the timeout, we optimistically assume the container to be cleaned
up and its exit code to be present. If no exit code can be found, we
return an error.
Indicate in the error whether the timeout kicked in to help debug
(transient) errors and flakes (e.g., #18860).
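The bounded wait, roughly (helper and constant names are assumptions, not Podman's exact code):
```go
const cleanupTimeout = 20 * time.Second

deadline := time.Now().Add(cleanupTimeout)
timedOut := false
for !c.ensureState(define.ContainerStateExited) {
	if time.Now().After(deadline) {
		// Optimistically assume cleanup finished; if no exit code was
		// recorded, the caller returns an error that mentions the
		// timeout to aid debugging.
		timedOut = true
		break
	}
	time.Sleep(250 * time.Millisecond)
}
```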
[NO NEW TESTS NEEDED]
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Make sure to look for the container's exit code when it's in the
stopped state. With `--restart=always`, the container seems to stay in
the stopped state, which led the wait logic to loop until the 20-second
timeout for the cleanup process kicked in.
Also defensively make sure to loop when the container is in stopped
state but no exit code has been written yet.
Add a regression test to make sure Podman doesn't wait more than 20
seconds. Even on a CI machine under high load I expect it to take much,
much less than that, so I do not expect this test to flake in the
future.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
If the container was already cleaned up, we should not try to do it
again. `podman stop` will always try to call Cleanup(): if you look at
the podman event log and keep calling `podman stop --all`, you see a
cleanup event every time. This is not wanted. Also, in the case of the
host pidns we report an error every single time; see the linked issue.
Fixes: #18460
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Commit 1ab833fb73 improved the situation but it is still not enough.
If you run short-lived containers with --restart=always, podman is
basically permanently restarting them. The only way to stop this is
podman stop. However, podman stop does not do anything when the
container is already in a non-running state. While this makes sense, we
should still mark the container as explicitly stopped by the user.
Together with the change in shouldRestart(), which now checks for
StoppedByUser, this makes sure the cleanup process is not going to
start it back up again.
A simple reproducer is:
```
podman run --restart=always --name test -d alpine true
podman stop test
```
Then check whether the container is still running. The behavior is very
flaky; it took me about 20 podman stop tries before I finally hit the
correct window where it stayed stopped.
With this patch it worked on the first try.
Fixes: #18259
[NO NEW TESTS NEEDED] This is super flaky and hard to test correctly
in CI. My ginkgo v2 work seems to trigger this in play kube tests, so
that should catch at least some regressions. Also this may be something
that should be tested at podman test days by users (#17912).
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
On FreeBSD, c.config.Spec.Linux is not populated; in this case, we can
assume that the container is not using a pid namespace.
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
Do not sync containers with the runtime and the database when listing
containers. It turns out to be extremely expensive and unnecessary.
The sync was needed because listing all containers from the database
did not populate their state. Populating it there, however, is much
faster since we already have an open connection to the database.
This change makes listing 200 containers 2 times faster than before.
[NO NEW TESTS NEEDED]
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
If the container has no PID namespace, its exec session processes are
not killed when the container process ends. In this case, attempt to
kill them in the same way.
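Roughly (field names around libpod's exec-session state are assumptions):
```go
// Without a PID namespace, exec'd processes survive the death of the
// container's main process, so signal each session's PID directly.
for _, session := range c.state.ExecSessions {
	// Ignore ESRCH: the process is already gone.
	if err := unix.Kill(session.PID, unix.SIGKILL); err != nil && err != unix.ESRCH {
		logrus.Errorf("Killing exec session process %d: %v", session.PID, err)
	}
}
```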
The problem was noticed with toolbox where the exec'ed sessions are
not terminated when the container is stopped, blocking the system
shutdown.
[NO NEW TESTS NEEDED]
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Deferred calls that record the function's error appear in several
top-level API functions. These are the first line of the function that
contains them, which makes sense; we want to capture any error returned
by the function. However, making this the first defer means that it is
the last thing to run after the function returns, meaning that the
container's `defer c.lock.Unlock()` has already fired, leaving a chance
that we modify the container without holding its lock.
We could move the function around so it's no longer the first
defer, but then we'd have to call it twice (immediately after
`defer c.lock.Unlock()` if the container is not batched, and a
second time in a new `else` block right after the lock/sync call
to make sure we handle batched containers). Seems simpler to just
leave it like this.
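For reference, a standalone demonstration of the ordering: deferred calls run last-in, first-out, so the first defer in a function body is the last to execute on return:
```go
package main

import "fmt"

func main() {
	defer fmt.Println("first defer: runs last")   // e.g., the error-recording save()
	defer fmt.Println("second defer: runs first") // e.g., c.lock.Unlock()
}
```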
[NO NEW TESTS NEEDED] Can't really test for DB corruption easily.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
This change aims to store an error message in the ContainerState struct
with the last known error from the Start, StartAndAttach, and Stop OCI
Runtime functions.
The goal was to act in accordance with Docker's behavior.
Fixes: #13729
Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
This allows us to use STDOUT directly without having to call open
again; it also makes the export API endpoint much more performant
since it no longer needs to copy to a temp file.
I noticed that there was no export API test, so I added one.
And lastly, opening /dev/stdout will not work on Windows.
Fixes: #16870
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The OCI Runtime's KillContainer interface can modify container
state (if the signal fails to send, as it would if the container
failed immediately after starting, we will update state to pick
up the fact that the container exited). As such, it can edit the
DB, and needs to be run locked.
There are fortunately only a few places where this function is
used, and most of them are already safe. The only exception is
StartAndAttach(), which does a SIGWINCH in an unlocked portion of
the function. Fortunately it's a goroutine, so just add a lock
and defer unlock and it should be fixed.
[NO NEW TESTS NEEDED] I have no idea how to induce a scenario
that would cause this consistently.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
This allows us to add a simple stub for FreeBSD which returns -1,
leading WaitForExit to fall back to the sleep loop approach.
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
Make sure to wait for the container to exit after kill. While the
cleanup process will eventually take care of transitioning the state,
we need to guarantee to the user that the container is left in the
expected state once the (kill) command has finished.
The issue could be observed in a flaking test (#16142) where
`podman rm -f -t0` failed because the preceding `podman kill`
left the container in "running" state which ultimately confused
the "stop" backend.
Note that we should only wait for the container to exit when SIGKILL is
being used. Other signals have different semantics.
[NO NEW TESTS NEEDED] as I do not know how to reliably reproduce the
issue. If #16142 stops flaking, we are good.
Fixes: #16142
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
This just gets ctr.config.Spec.Process.Terminal with some nil checks,
allowing several places that open-coded this to use the helper.
In particular, this helps the code in
pkg/domain/infra/abi/terminal.StartAttachCtr(), that used to do:
`ctr.Spec().Process.Terminal`, which looks fine, but actually causes
a deep json copy in the `ctr.Spec()` call that takes over 3 msec.
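The helper is essentially the following (a sketch based on the description above; nil checks guard config.Spec and Spec.Process):
```go
func (c *Container) Terminal() bool {
	if c.config.Spec != nil && c.config.Spec.Process != nil {
		return c.config.Spec.Process.Terminal
	}
	return false
}
```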
[NO NEW TESTS NEEDED] Just minor performance effects
Signed-off-by: Alexander Larsson <alexl@redhat.com>
When you run "podman run foo" we attach to the container, which essentially
blocks until the container process exits. When that happens podman immediately
calls Container.WaitForExit(), but at this point the exit value has not
yet been written to the db by conmon. This means that we almost always
hit the "check for exit state; sleep 250msec" loop in WaitForExit(),
delaying the exit of podman run by 250 msec.
More recent kernels (>= 5.3) support the pidfd_open() syscall, which
lets you open an fd representing a pid and then poll on it to wait
until the process exits. We can use this to make the first sleep
exactly as long as is needed for conmon to exit (if we know its pid).
If for whatever reason there are still issues, we use the old sleep
loop on later iterations.
This makes "time podman run fedora true" about 200msec faster.
[NO NEW TESTS NEEDED]
Signed-off-by: Alexander Larsson <alexl@redhat.com>
Package `io/ioutil` was deprecated in golang 1.16, preventing podman
from building under Fedora 37. Fortunately, functionally identical
replacements are provided by the packages `io` and `os`. Replace all
usage of `io/ioutil` symbols with the appropriate substitutions
according to the golang docs.
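The typical one-to-one substitutions:
```go
data, err := os.ReadFile(path)         // was ioutil.ReadFile
err = os.WriteFile(path, data, 0o644)  // was ioutil.WriteFile
dir, err := os.MkdirTemp("", "podman") // was ioutil.TempDir
buf, err := io.ReadAll(reader)         // was ioutil.ReadAll
```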
Signed-off-by: Chris Evich <cevich@redhat.com>
Podman adds an Error: to every error message. So starting an error
message with "error" ends up being reported to the user as
Error: error ...
This patch removes the stutter.
Also, ioutil.ReadFile errors already report the path, so wrapping the
error message with the path causes a stutter.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
podman update allows users to change the cgroup configuration of an
existing container using the resource-limit flags already defined for
podman create/run. The command is also supported in the libpod API via
the /libpod/containers/<CID>/update endpoint, where the resource limits
are passed in the request body and follow the OCI resource spec format.
The supported flags in crun are:
--memory
--cpus
--cpuset-cpus
--cpuset-mems
--memory-swap
--memory-reservation
--cpu-shares
--cpu-quota
--cpu-period
--blkio-weight
--cpu-rt-period
--cpu-rt-runtime
--device-read-bps
--device-write-bps
--device-read-iops
--device-write-iops
--memory-swappiness
--blkio-weight-device
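For example:
```
podman update --memory 1g --cpus 2 mycontainer
```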
Resolves: #15067
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Improve the error message when looking up the exit code of a container.
The state of the container may help us track down #14859 which flakes
rarely and is impossible to reproduce on my machine.
[NO NEW TESTS NEEDED]
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Since conmon-rs also uses this code, we moved it to c/common. Podman
now uses it from there as well, preventing duplication.
[NO NEW TESTS NEEDED]
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
We now use the golang error wrapping format specifier `%w` instead of
the deprecated github.com/pkg/errors package.
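For example, a typical conversion (the message text is illustrative):
```go
// Before, with github.com/pkg/errors:
//     return errors.Wrapf(err, "syncing container %s state", c.ID())
// After, with the standard library:
return fmt.Errorf("syncing container %s state: %w", c.ID(), err)
```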
[NO NEW TESTS NEEDED]
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
Make sure `Sync()` handles state transitions and exit codes correctly.
The function was only being called when batching, which could render
containers in an unusable state when running concurrently with other
state-altering functions/commands, since the state must be re-read from
the database before acting upon it.
Fixes: #14761
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
This bug was introduced in https://github.com/containers/podman/pull/8906.
When we run a 'podman rm/restart/stop/kill etc...' command against a
container running with --rm, the OCI runtime directory remains at
/run/<runtime name> (root user) or
/run/user/<user id>/<runtime name> (rootless user).
This bug could cause other bugs.
For example, when we checkpoint a container running with
--rm (podman checkpoint --export) and restore it
(podman restore --import) with crun, the error message
"Error: OCI runtime error: crun: container `<container id>`
already exists" is output.
This error is caused by an attempt to restore the container with
the same container ID as the leftover OCI runtime container's ID.
Therefore, fix this by running the cleanupRuntime() function to remove
the OCI runtime directory even if the container has already been
removed via the --rm option.
Signed-off-by: Toshiki Sonoda <sonoda.toshiki@fujitsu.com>
This commit addresses three intertwined bugs to fix an issue when using
Gitlab runner on Podman. The three bug fixes are not split into
separate commits as tests won't pass otherwise; avoidable noise when
bisecting future issues.
1) Podman conflated states: even when asking to wait for the `exited`
state, Podman returned as soon as a container transitioned to
   `stopped`. The issue surfaced as failing Gitlab tests [1], since
   `conmon`'s buffers had not (yet) been emptied when attaching to a
   container right after a wait. The race window was extremely narrow,
and I only managed to reproduce with the Gitlab runner [1] unit
tests.
2) The clearer separation between `exited` and `stopped` revealed a race
condition predating the changes. If a container is configured for
autoremoval (e.g., via `run --rm`), the "run" process competes with
the "cleanup" process running in the background. The window of the
race condition was sufficiently large that the "cleanup" process has
already removed the container and storage before the "run" process
could read the exit code and hence waited indefinitely.
   Address the exit-code race condition by recording exit codes in the
   main libpod database; exit codes can now be read from the database
   (see the sketch after this list).
When waiting for a container to exit, Podman first waits for the
container to transition to `exited` and will then query the database
for its exit code. Outdated exit codes are pruned during cleanup
(i.e., non-performance critical) and when refreshing the database
after a reboot. An exit code is considered outdated when it is older
than 5 minutes.
While the race condition predates this change, the waiting process
has apparently always been fast enough in catching the exit code due
to issue 1): `exited` and `stopped` were conflated. The waiting
process hence caught the exit code after the container transitioned
to `stopped` but before it `exited` and got removed.
3) With 1) and 2), Podman is now waiting for a container to properly
transition to the `exited` state. Some tests did not pass after 1)
and 2) which revealed the third bug: `conmon` was executed with its
working directory pointing to the OCI runtime bundle of the
container. The changed working directory broke resolving relative
paths in the "cleanup" process. The "cleanup" process error'ed
before actually cleaning up the container and waiting "main" process
ran indefinitely - or until hitting a timeout. Fix the issue by
executing `conmon` with the same working directory as Podman.
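A sketch of the new wait flow from 2); `waitForConditionLocked` is a hypothetical helper name, and `GetContainerExitCode` on the state store is an assumption:
```go
// Wait for the container to transition to the `exited` state...
if err := c.waitForConditionLocked(ctx, define.ContainerStateExited); err != nil {
	return -1, err
}
// ...then read the exit code recorded in the libpod database.
return c.runtime.state.GetContainerExitCode(c.ID())
```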
Note that fixing 3) *may* address a number of issues we have seen in the
past where for *some* reason cleanup processes did not fire.
[1] https://gitlab.com/gitlab-org/gitlab-runner/-/issues/27119#note_970712864
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
[MH: Minor reword of commit message]
Signed-off-by: Matthew Heon <mheon@redhat.com>
With conmon-rs on the horizon, we need to disentangle Libpod from
legacy Conmon to the greatest extent possible. There are
definitely opportunities for codesharing between the two, but we
have to assume the implementations will be largely disjoint given
the different architectures.
Fortunately, most of the work has already been done in the past.
The conmon-managed OCI runtime mostly sits behind an interface,
with a few exceptions - the most notable of those being attach.
This PR thus moves Attach behind the interface, to ensure that we
can have attach implementations that don't use our existing unix
socket streaming if necessary.
Still to-do is conmon cleanup. There's a lot of code that removes
Conmon-specific files, or kills the Conmon PID, and all of it
will need to be refactored behind the interface.
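Schematically, the interface now owns attach (abridged; the exact signature may differ):
```go
// OCIRuntime abstracts the OCI runtime backend (conmon, conmon-rs, ...).
type OCIRuntime interface {
	// Attach streams the container's stdio; implementations are free to
	// use something other than the existing unix-socket streaming.
	Attach(c *Container, params *AttachOptions) error
	// ... other lifecycle methods (start, stop, kill, ...)
}
```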
[NO NEW TESTS NEEDED] Just moving some things around.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Most of these are no longer relevant, just drop the comments.
Most notable change: allow `podman kill` on paused containers.
Works just fine when I test it.
Signed-off-by: Matthew Heon <mheon@redhat.com>
The linter ensures a common code style.
- use switch/case instead of else if
- use if instead of switch/case for single case statement
- add space between comment and text
- detect the use of defer with os.Exit()
- use short form var += "..." instead of var = var + "..."
- detect problems with append()
```
newSlice := append(orgSlice, val)
```
This can lead to nasty bugs because orgSlice will be changed in
place if it has enough capacity to hold the new elements. Thus
newSlice might not be a copy.
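The linter-friendly fix is to force a copy, for example:
```go
// Safe: appending to a nil slice always allocates a new backing array,
// so orgSlice is left untouched.
newSlice := append([]string(nil), orgSlice...)
newSlice = append(newSlice, val)
```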
Of course, most of the changes are just cosmetic and do not cause any
logic errors, but I think it is a good idea to enforce a common style.
This should help maintainability.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>