The --env flag is used to add a new environment variable to the container
or to override an existing one. The --unsetenv flag is used to remove
an environment variable.
This is done by sharing the "env" and "unsetenv" flags between the
"update" and "create" commands and later handling these flags
in the "update" command handler.
The lists of environment variables to add/remove are stored
in newly added fields of ContainerUpdateOptions.
The Container.Update API call is refactored to take
a ContainerUpdateOptions as input, to limit the number of its
arguments.
The Env and UnsetEnv lists are then handled using the envLib
package, and the container is updated.
The remote API is also extended to handle Env and EnvUnset.
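A minimal sketch of that flow, assuming hypothetical field names (the real ContainerUpdateOptions lives in libpod, and envLib refers to Podman's env-handling package):

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical shape of the options described above; the real field
// names in ContainerUpdateOptions may differ.
type ContainerUpdateOptions struct {
	Env      []string // variables to add or override, "KEY=VALUE"
	UnsetEnv []string // variable names to remove
}

// applyEnv mimics an envLib-style merge: add/override first, then unset.
func applyEnv(current map[string]string, opts ContainerUpdateOptions) map[string]string {
	for _, kv := range opts.Env {
		parts := strings.SplitN(kv, "=", 2)
		if len(parts) == 2 {
			current[parts[0]] = parts[1]
		}
	}
	for _, name := range opts.UnsetEnv {
		delete(current, name)
	}
	return current
}

func main() {
	env := map[string]string{"FOO": "bar", "DEBUG": "1"}
	env = applyEnv(env, ContainerUpdateOptions{
		Env:      []string{"FOO=baz"},
		UnsetEnv: []string{"DEBUG"},
	})
	fmt.Println(env) // map[FOO:baz]
}
```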
Fixes: #24875
Signed-off-by: Jan Kaluza <jkaluza@redhat.com>
As pointed out by Valentin on #25491, this is not an actual bug, but the
change makes it clearer how the code works and should not leave readers
confused about why this case has no return.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
There are multiple concurrent goroutines which produce a result, and they
race against each other while producing different results.
This commit addresses at least part of the problem: competing
"sources" producing different results.
Fixes: #25479
Signed-off-by: Yuri Timenkov <yuri@timenkov.pro>
When waiting for a container to be not-running, wait sometimes returns
code -1 with an empty error instead of the actual exit code.
It turned out that syncContainer returns ErrCtrRemoved for a removed
container instead of ErrNoSuchCtr, even though the data can still be
pulled from the database.
This fixes the issue by taking both error codes into account.
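A self-contained sketch of the fix, using stand-in sentinels (define.ErrNoSuchCtr and define.ErrCtrRemoved are the real libpod error names; the surrounding logic is illustrative):

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-ins for libpod's define.ErrNoSuchCtr and define.ErrCtrRemoved.
var (
	errNoSuchCtr  = errors.New("no such container")
	errCtrRemoved = errors.New("container has been removed")
)

// exitCode tolerates both "container is gone" errors instead of only one,
// because the exit code can still be read from the database in both cases.
func exitCode(syncErr error, readFromDB func() int32) (int32, error) {
	if syncErr != nil && !errors.Is(syncErr, errNoSuchCtr) && !errors.Is(syncErr, errCtrRemoved) {
		return -1, syncErr // a real error, not just a removed container
	}
	return readFromDB(), nil
}

func main() {
	code, err := exitCode(errCtrRemoved, func() int32 { return 0 })
	fmt.Println(code, err) // 0 <nil>
}
```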
Fixes: #25479
Signed-off-by: Yuri Timenkov <yuri@timenkov.pro>
First, refactor our existing graph traversal code to improve code
sharing. There still isn't much sharing between inward traversal
(stop, remove) and outward traversal (start), but stop and remove
share most of their code, which seems like a positive.
Second, add a new graph-traversal function to stop containers.
We already had start and remove; stop uses the newly-refactored
inward-traversal code which it shares with removal.
Third, rework the shared stop/removal inward-traversal code to
add locking. This allows parallel execution of stop and removal,
which should improve the performance of `podman pod rm` and
retain the performance of `podman pod stop` at about what it is
right now.
Fourth and finally, use the new graph-based stop when possible
to solve unordered stop problems with pods - specifically, the
infra container stopping before application containers, leaving
those containers without a working network.
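A toy illustration of the inward ordering, not libpod's actual traversal code: dependents stop in parallel, a node stops only after everything depending on it has stopped, and each stop happens under that node's own lock:

```go
package main

import (
	"fmt"
	"sync"
)

// node is a toy stand-in for a container in the pod dependency graph.
type node struct {
	name       string
	mu         sync.Mutex
	dependents []*node // containers that depend on this one
	stopped    bool
}

// stopInward stops all dependents (in parallel) before stopping the node
// itself, holding each node's lock only while that node stops.
func stopInward(n *node) {
	var wg sync.WaitGroup
	for _, dep := range n.dependents {
		wg.Add(1)
		go func(d *node) {
			defer wg.Done()
			stopInward(d)
		}(dep)
	}
	wg.Wait() // this node stops only after all of its dependents
	n.mu.Lock()
	defer n.mu.Unlock()
	if !n.stopped {
		n.stopped = true
		fmt.Println("stopped", n.name)
	}
}

func main() {
	infra := &node{name: "infra"}
	app := &node{name: "app"}
	infra.dependents = []*node{app}
	stopInward(infra) // prints "stopped app" before "stopped infra"
}
```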
Fixes https://issues.redhat.com/browse/RHEL-76827
Signed-off-by: Matt Heon <mheon@redhat.com>
The intention behind this is to stop races between
`pod stop|start` and `container stop|start` being run at the same
time. This could result in containers with no working network
(they join the still-running infra container's netns, which is
then torn down as the infra container is stopped, leaving the
container in an otherwise unused, nonfunctional, orphan netns).
Locking the pod (if present) in the public container start and
stop APIs should be sufficient to stop this.
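A minimal sketch of the locking order, with hypothetical names (the real entry points are libpod's public container Start/Stop APIs):

```go
package main

import "sync"

type Pod struct{ mu sync.Mutex }

type Container struct {
	mu  sync.Mutex
	pod *Pod // nil when the container is not in a pod
}

// Start takes the pod lock (when present) before the container lock, so a
// concurrent `pod stop` cannot tear down the infra netns mid-start.
func (c *Container) Start() {
	if c.pod != nil {
		c.pod.mu.Lock()
		defer c.pod.mu.Unlock()
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	// ... perform the actual start while the locks are held ...
}

func main() {
	p := &Pod{}
	c := &Container{pod: p}
	c.Start()
}
```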
Signed-off-by: Matt Heon <mheon@redhat.com>
New flags for `podman update` can change the HealthCheck configuration of a started container without having to restart or recreate it.
This can help determine why a given container suddenly started failing its HealthCheck, without interfering with the services it provides: for example, reconfigure the HealthCheck to keep logs longer than the usual last X results, store logs at other destinations, etc.
Fixes: https://issues.redhat.com/browse/RHEL-60561
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
As it turns out, things are not so simple after all...
In podman-py it was reported[1] that waiting might hang. Per our docs,
waiting on multiple conditions should exit once the first one is hit, not
once all of them are. However, because the new wait logic never checked
whether the context was cancelled, the goroutines kept running until
conmon exited, and because we used a waitgroup to wait for all of them to
finish, wait blocked until that happened.
First, we can remove the waitgroup, as we only need to wait for one of
the conditions anyway via the channel. While this alone fixes the hang, it
would still leak the other goroutines. As there is no way to cancel a
goroutine from the outside, all the code must check for a cancelled
context in its wait loop so as not to leak.
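A self-contained sketch of that shape, assuming simple polling checkers (the real conditions come from libpod's wait logic):

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitForCondition polls one condition and honors ctx so the goroutine
// does not leak once another condition has already won. Illustrative
// only; the real conditions come from libpod's wait logic.
func waitForCondition(ctx context.Context, check func() bool, result chan<- string, name string) {
	for {
		if check() {
			select {
			case result <- name: // the first winner reports...
			default: // ...later finishers simply return
			}
			return
		}
		select {
		case <-ctx.Done():
			return // wait was satisfied elsewhere: stop polling, no leak
		case <-time.After(100 * time.Millisecond):
		}
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel() // cancelling releases every still-running checker
	result := make(chan string, 1)
	go waitForCondition(ctx, func() bool { return true }, result, "exited")
	go waitForCondition(ctx, func() bool { return false }, result, "removed")
	fmt.Println(<-result) // prints "exited"; cancel() then stops "removed"
}
```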
Fixes 8a943311db ("libpod: simplify WaitForExit()")
[1] https://github.com/containers/podman-py/issues/425
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The podman container cleanup process runs asynchronously, and by the time
it gets the lock it is possible that another podman process has already
done the cleanup and then run a new init() to start the container again.
If the cleanup process proceeds at that point, it will cause very weird
things.
This can be observed in the remote start API as CI flakes.
Fixes #23754
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This is very similar to commit 3280da0500: we cannot check the state,
then unlock, then lock again and perform the action. Everything must
happen under one lock. To fix this, move the code into the HTTPAttach
function in libpod. The locking here is a bit unusual because attach
blocks for the lifetime of the attach session, which can be very long, so
we must unlock before performing the attach itself.
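A toy of the resulting shape, with stand-in names (the real function is libpod's HTTPAttach):

```go
package main

import (
	"fmt"
	"sync"
)

type ctr struct {
	mu      sync.Mutex
	running bool
}

// httpAttach does the state check and attach setup under one lock, then
// releases it before blocking on the attach itself, which can last as
// long as the container runs.
func (c *ctr) httpAttach(doAttach func() error) error {
	c.mu.Lock()
	if !c.running { // state check happens under the same lock
		c.mu.Unlock()
		return fmt.Errorf("container is not running")
	}
	c.mu.Unlock() // must not hold the lock while attach blocks
	return doAttach()
}

func main() {
	c := &ctr{running: true}
	_ = c.httpAttach(func() error { fmt.Println("attached"); return nil })
}
```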
Fixes #23757
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
By default, wait only waits for the exit of a container; there is really
no way to make it also wait for the removal when the container was
created with --rm. I thought I had found a clever way in 8a943311db, but
it does not work race free. While it works most of the time, any other
parallel process might call syncContainer() before the cleanup process
takes the lock that it holds until it removes the container. As such, the
wait hack of only updating the state and not syncing the exit file did
not work, so we can drop it.
However, the test wants to wait for the removal performed by the cleanup
process, and we can already say --condition=removing to do this, but that
throws an error if the container was removed instead of counting it as
success, so fix that as well.
Fixes #23640
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The current code did several complicated state checks that simply do not
work properly on a fast-restarting container. It had a special case for
--restart=always but forgot to take care of --restart=on-failure, which
always hung for 20s until it ran into the timeout.
The old logic also used to call CheckConmonRunning(), but it synced the
state beforehand, which means it may have checked a new conmon every time
and thus missed exits.
The new code is much simpler: check the conmon pid, and if it is no
longer running, check the exit file and read the exit code.
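A sketch of the simpler check, using the classic kill(pid, 0) probe (the helper name is illustrative):

```go
package main

import (
	"fmt"
	"syscall"
)

// conmonAlive probes the conmon pid with kill(pid, 0); an illustrative
// stand-in for the check described above.
func conmonAlive(pid int) bool {
	err := syscall.Kill(pid, 0)
	// EPERM means the process exists but belongs to someone else.
	return err == nil || err == syscall.EPERM
}

func main() {
	if !conmonAlive(999999) {
		// conmon is gone: the exit file must already be written, so
		// read the exit code from it instead of polling state.
		fmt.Println("conmon exited; reading exit file")
	}
}
```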
This is related to #23473, but I am not sure whether it fixes that issue
because we cannot reproduce it.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
If we try to stop a container that is not running or paused, we get an
ErrCtrStateInvalid or ErrCtrStopped error. As podman stop is idempotent,
this is not a user-visible error at all, so we should also never log it
in the container state.
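A sketch with stand-in sentinels (ErrCtrStateInvalid and ErrCtrStopped are the real libpod error names):

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-ins for libpod's define.ErrCtrStateInvalid and define.ErrCtrStopped.
var (
	errCtrStateInvalid = errors.New("container state improper")
	errCtrStopped      = errors.New("container is already stopped")
)

// stop swallows "already stopped"-class errors: stop is idempotent, so
// they must neither reach the user nor be recorded in the container state.
func stop(doStop func() error) error {
	err := doStop()
	if errors.Is(err, errCtrStateInvalid) || errors.Is(err, errCtrStopped) {
		return nil // not an error for an idempotent stop; do not log it
	}
	return err
}

func main() {
	fmt.Println(stop(func() error { return errCtrStopped })) // <nil>
}
```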
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
We cannot unlock and then lock again without syncing the state, as this
will save a potentially old state, causing very bad things such as
double netns cleanup issues.
The fix here is simple: move the saveContainerError() call under the same
lock. The comment about the re-lock is just wrong: not doing this under
the same lock would cause us to update the error after something else had
already changed the container.
Most likely this was caused by a misunderstanding of how Go defers work.
Given they run last in, first out (LIFO), it is safe as long as our
defer function comes after the defer unlock() call.
I think this issue is very bad and might have caused a variety of other
weird flakes. In fact, I am confident that this fixes the double cleanup
errors.
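A runnable demonstration of that LIFO ordering: the error-saving defer registered after the unlock defer fires first, while the lock is still held:

```go
package main

import "fmt"

func main() {
	fmt.Println("lock")
	defer fmt.Println("unlock (registered first, runs last)")
	defer fmt.Println("save error state (registered second, runs first, still locked)")
	fmt.Println("work")
}
// Output:
// lock
// work
// save error state (registered second, runs first, still locked)
// unlock (registered first, runs last)
```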
Fixes #21569
Also fixes the netns removal ENOENT issues seen in #19721.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The current code did something like this:
```
lock()
getState()
unlock()
if state != running
    lock()
    getState() == running -> error
    unlock()
```
This of course is wrong because between the first unlock() and second
lock() call another process could have modified the state. This meant
that sometimes you would get a weird error on start because the internal
setup errored as the container was already running.
In general any state check without holding the lock is incorrect and
will result in race conditions. As such refactor the code to combine
both StartAndAttach and Attach() into one function that can handle both.
With that we can move the running check into the locked code.
Also, use a typed error for this specific error case so that callers can
check for and ignore the specific error when needed. This also allows us
to fix races in the compat API that did a similar racy state check.
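A minimal sketch of the error contract, with a hypothetical sentinel name (the real typed error lives in libpod's define package):

```go
package main

import (
	"errors"
	"fmt"
)

// A typed sentinel analogous to the one the commit adds (name
// hypothetical; the real one lives in libpod's define package).
var errCtrAlreadyRunning = errors.New("container is already running")

// start performs the running check inside the locked section in the real
// code; this sketch only shows the error contract.
func start(running bool) error {
	if running {
		return errCtrAlreadyRunning
	}
	return nil
}

func main() {
	// Callers (e.g. the compat API) can ignore this specific error to stay
	// idempotent while still printing the container id/name.
	if err := start(true); err != nil && !errors.Is(err, errCtrAlreadyRunning) {
		panic(err)
	}
	fmt.Println("container-id")
}
```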
This commit slightly changes how we output the result: previously, a
start on an already-running container would never print the id/name of
the container, which is confusing and sort of breaks idempotence. Now the
output includes it, except when --all is used; then only the ids of
containers that were actually started are reported.
Fixes #23246
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Now WaitForExit returns the exit code as stored in the db instead of
returning an error when the container was removed.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Wait for another interval when the container has transitioned to
"stopped", to give the healthcheck status more time to change.
Closes: https://github.com/containers/podman/issues/22760
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Wait for the healthy status on the thread where the container lock is
held. Otherwise, if the wait is performed from a goroutine, a different
thread is used (since the runtime.LockOSThread() call then doesn't have
any effect), causing pthread_mutex_unlock() to fail with EPERM.
Closes: https://github.com/containers/podman/issues/22651
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
This is something Docker does that we did not do until now. The most
difficult/annoying part was the REST API, where I did not really
want to modify the struct being sent, so I made the new restart
policy parameters query parameters instead.
Testing was also a bit annoying, because testing restart policy
always is.
Signed-off-by: Matt Heon <mheon@redhat.com>
The logic here is more complex than I would like, largely due to
the behavior of `podman inspect` for running containers. When a
container is running, `podman inspect` will source as much as
possible from the OCI spec used to run that container, to grab
up-to-date information on things like devices. We don't want to
change this (it's definitely the right behavior), but it does make
updating a running container inconvenient: we have to rewrite the
OCI spec as part of the update to make sure that `podman inspect`
will read the correct resource limits.
Also, make update emit events. Docker does it, so we should as well.
Signed-off-by: Matt Heon <mheon@redhat.com>
Moving from Go module v4 to v5 prepares us for public releases.
The move was done using gomove [1], as with the v3 and v4 moves.
[1] https://github.com/KSubedi/gomove
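For consumers of the Go API, the change is a mechanical import-path bump; for example (bindings shown here, any package under the module moves the same way):

```go
package main

// Before the move: github.com/containers/podman/v4/pkg/bindings
// After the move, the same package lives under the v5 module path:
import (
	"fmt"

	"github.com/containers/podman/v5/pkg/bindings"
)

func main() {
	// Reference a real symbol from the package to show the import is live.
	_ = bindings.NewConnection
	fmt.Println("using the v5 module path")
}
```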
Signed-off-by: Matt Heon <mheon@redhat.com>
Also add a new `StoppedByUser` field to the container-inspect state,
which can be useful during debugging and is now also used in the
regression test. Note that I moved the `false` check one test up
so that we can compare against the previous Podman version, which should
just be stuck in the `wait $ctr` command since the container will keep
restarting.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Add a new "healthy" sdnotify policy that instructs Podman to send the
READY message once the container has turned healthy.
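A sketch of the policy, assuming a hypothetical waitUntilHealthy helper (daemon.SdNotify and daemon.SdNotifyReady are the real go-systemd APIs):

```go
package main

import "github.com/coreos/go-systemd/v22/daemon"

// notifyWhenHealthy sketches the "healthy" sdnotify policy: send READY=1
// only once the health check reports healthy. waitUntilHealthy is a
// hypothetical stand-in for Podman's internal health-check wait.
func notifyWhenHealthy(waitUntilHealthy func() error) error {
	if err := waitUntilHealthy(); err != nil {
		return err
	}
	// daemon.SdNotifyReady is "READY=1"; the first argument controls
	// whether NOTIFY_SOCKET is unset after sending.
	_, err := daemon.SdNotify(false, daemon.SdNotifyReady)
	return err
}

func main() {
	_ = notifyWhenHealthy(func() error { return nil })
}
```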
Fixes: #6160
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Support two new wait conditions, "healthy" and "unhealthy". This
further paves the way for integrating sdnotify with health checks, which
is currently being tracked in #6160.
Fixes: #13627
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Massage the internal APIs to use a string slice instead of a state slice
for passing wait conditions. This paves the way for waiting on
non-state conditions such as "healthy".
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
When waiting for a container, there may be a time window where conmon
has already exited but the container hasn't been fully cleaned up.
In that case, we give the container at most 20 seconds to be fully
cleaned up. We cannot wait forever since conmon may have been killed or
something else went wrong.
After the timeout, we optimistically assume the container to be cleaned
up and its exit code to be present. If no exit code can be found, we
return an error.
Indicate in the error whether the timeout kicked in to help debug
(transient) errors and flakes (e.g., #18860).
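A self-contained sketch of the bounded wait (helper names illustrative):

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitCleanedUp gives the container at most 20 seconds to finish cleanup,
// since conmon may have been killed or something else went wrong.
func waitCleanedUp(cleanedUp func() bool, exitCode func() (int, bool)) (int, error) {
	deadline := time.Now().Add(20 * time.Second)
	timedOut := false
	for !cleanedUp() {
		if time.Now().After(deadline) {
			timedOut = true // optimistically assume cleanup happened anyway
			break
		}
		time.Sleep(250 * time.Millisecond)
	}
	if code, ok := exitCode(); ok {
		return code, nil
	}
	// indicate whether the timeout kicked in, to help debug transient flakes
	if timedOut {
		return -1, errors.New("no exit code found after timing out waiting for cleanup")
	}
	return -1, errors.New("no exit code found")
}

func main() {
	code, err := waitCleanedUp(
		func() bool { return true },
		func() (int, bool) { return 0, true },
	)
	fmt.Println(code, err) // 0 <nil>
}
```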
[NO NEW TESTS NEEDED]
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Make sure to look for the container's exit code when it's in the stopped
state. With `--restart=always`, the container seems to stay in the
stopped state, which led the wait logic to loop until the 20-second
timeout for the cleanup process kicked in.
Also, defensively make sure to loop when the container is in the stopped
state but no exit code has been written yet.
Add a regression test to make sure Podman doesn't wait more than 20
seconds. Even on a CI machine under high load I expect it to take much,
much less than that, so I do not expect this test to flake in the
future.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
If the container was already cleaned up, we should not try to do it
again. Podman stop will always try to call Cleanup(); if you look at the
podman event log and just keep calling podman stop --all, you see a
cleanup event every time. This is not wanted. Also, in the case of the
host pidns, we report an error every single time; see the linked issue.
Fixes #18460
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Commit 1ab833fb73 improved the situation, but it is still not enough.
If you run short-lived containers with --restart=always, podman
basically keeps restarting them permanently. The only way to stop this is
podman stop. However, podman stop does not do anything when the
container is already in a non-running state. While this makes sense, we
should still mark the container as explicitly stopped by the user.
Together with the change in shouldRestart(), which now checks for
StoppedByUser, this makes sure the cleanup process is not going to start
it back up again.
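A sketch of the check, assuming the StoppedByUser state field the commit describes (the real shouldRestart() also weighs other factors):

```go
package main

import "fmt"

// shouldRestart must not restart a container the user explicitly stopped,
// even when podman stop found it already exited. Sketch only; the real
// function also considers exit codes and restart counts.
func shouldRestart(policy string, stoppedByUser bool) bool {
	if stoppedByUser {
		return false
	}
	return policy == "always" // on-failure etc. elided
}

func main() {
	fmt.Println(shouldRestart("always", true)) // false: stay stopped
}
```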
A simple reproducer is:
```
podman run --restart=always --name test -d alpine true
podman stop test
```
Then check whether the container is still running. The behavior is very
flaky; it took me about 20 podman stop tries before I finally hit the
correct window where it stayed stopped permanently.
With this patch it worked on the first try.
Fixes #18259
[NO NEW TESTS NEEDED] This is super flaky and hard to test correctly
in CI. My ginkgo v2 work seems to trigger this in the play kube tests, so
that should catch at least some regressions. Also, this may be something
that should be tested by users at podman test days (#17912).
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
On FreeBSD, c.config.Spec.Linux is not populated - in this case, we can
assume that the container is not using a pid namespace.
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
Do not sync containers with the runtime and the database when listing
containers. It turns out to be extremely expensive and unnecessary.
The sync was needed since listing all containers from the database did
not populate their state. Doing that, however, is much faster since we
already have a connection to the database.
This change makes listing 200 containers 2 times faster than before.
[NO NEW TESTS NEEDED]
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
If the container has no pid namespace, exec sessions are not killed when
the container process ends. In this case, attempt to kill them in the
same way.
The problem was noticed with toolbox where the exec'ed sessions are
not terminated when the container is stopped, blocking the system
shutdown.
[NO NEW TESTS NEEDED]
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
in several top-level API functions. These are the first line of
the functions that contain them, which makes sense; we want to
capture any error returned by the function. However, making this
the first defer means that it is the last thing to run after the
function returns - meaning that the container's
`defer c.lock.Unlock()` has already fired, leading to a chance we
modify the container without holding its lock.
We could move the function around so it's no longer the first
defer, but then we'd have to call it twice (immediately after
`defer c.lock.Unlock()` if the container is not batched, and a
second time in a new `else` block right after the lock/sync call
to make sure we handle batched containers). Seems simpler to just
leave it like this.
[NO NEW TESTS NEEDED] Can't really test for DB corruption easily.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
This change aims to store an error message in the ContainerState struct
with the last known error from the Start, StartAndAttach, and Stop OCI
Runtime functions.
The goal was to act in accordance with Docker's behavior.
Fixes: #13729
Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
This allows us to use STDOUT directly without having to call open
again; it also makes the export API endpoint much more performant,
since it no longer needs to copy to a temp file.
I noticed that there was no export API test, so I added one.
And lastly, opening /dev/stdout does not work on Windows.
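A minimal sketch of the idea (names illustrative; the real code streams the container export):

```go
package main

import (
	"io"
	"os"
	"strings"
)

// export streams the container tarball straight to the given writer; the
// CLI passes os.Stdout directly instead of opening /dev/stdout (which
// fails on Windows) or spooling to a temp file.
func export(tar io.Reader, out io.Writer) error {
	_, err := io.Copy(out, tar)
	return err
}

func main() {
	_ = export(strings.NewReader("tar bytes..."), os.Stdout)
}
```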
Fixes #16870
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The OCI Runtime's KillContainer interface can modify container
state (if the signal fails to send, as it would if the container
failed immediately after starting, we will update state to pick
up the fact that the container exited). As such, it can edit the
DB, and needs to be run locked.
There are fortunately only a few places where this function is
used, and most of them are already safe. The only exception is
StartAndAttach(), which does a SIGWINCH in an unlocked portion of
the function. Fortunately it's a goroutine, so just add a lock
and defer unlock and it should be fixed.
[NO NEW TESTS NEEDED] I have no idea how to induce a scenario
that would cause this consistently.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
This allows us to add a simple stub for FreeBSD which returns -1,
leading WaitForExit to fall back to the sleep loop approach.
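Presumably the platform split looks roughly like this; the name getExitFD is hypothetical, and the -1 return is what sends WaitForExit down the sleep-loop path:

```go
//go:build freebsd

package main

import "fmt"

// getExitFD is a hypothetical stand-in for the stubbed helper: -1 means
// "no exit notification fd on this platform", so WaitForExit falls back
// to its sleep-loop approach.
func getExitFD() int { return -1 }

func main() {
	if getExitFD() == -1 {
		fmt.Println("falling back to the sleep loop")
	}
}
```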
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
Make sure to wait for the container to exit after a kill. While the
cleanup process will eventually take care of transitioning the state, we
need to guarantee to the user that the container is in the expected state
once the kill command has finished.
The issue could be observed in a flaking test (#16142) where
`podman rm -f -t0` failed because the preceding `podman kill`
left the container in "running" state which ultimately confused
the "stop" backend.
Note that we should only wait for the container to exit when SIGKILL is
being used. Other signals have different semantics.
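A sketch of that rule (sendSignal and waitForExit are stand-ins):

```go
package main

import (
	"fmt"
	"syscall"
)

func sendSignal(sig syscall.Signal) error { return nil } // stub

// killAndMaybeWait blocks until exit only for SIGKILL, the one signal
// that guarantees the container will die; other signals carry no such
// guarantee, so we return immediately.
func killAndMaybeWait(sig syscall.Signal, waitForExit func() error) error {
	if err := sendSignal(sig); err != nil {
		return err
	}
	if sig == syscall.SIGKILL {
		return waitForExit()
	}
	return nil
}

func main() {
	_ = killAndMaybeWait(syscall.SIGKILL, func() error {
		fmt.Println("waited for exit")
		return nil
	})
}
```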
[NO NEW TESTS NEEDED] as I do not know how to reliably reproduce the
issue. If #16142 stops flaking, we are good.
Fixes: #16142
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
This just gets ctr.config.Spec.Process.Terminal with some nil checks,
allowing several places that open-coded this to use the helper.
In particular, this helps the code in
pkg/domain/infra/abi/terminal.StartAttachCtr(), which used to do
`ctr.Spec().Process.Terminal`. That looks fine, but it actually causes
a deep JSON copy in the `ctr.Spec()` call, which takes over 3 msec.
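A self-contained sketch of such a helper over minimal stand-in types (the real ones come from the OCI runtime spec):

```go
package main

import "fmt"

// Minimal stand-ins for the OCI spec types (specs-go's Spec and Process).
type Process struct{ Terminal bool }
type Spec struct{ Process *Process }
type containerConfig struct{ Spec *Spec }
type Container struct{ config containerConfig }

// Terminal returns the spec's terminal flag with nil checks, avoiding the
// deep JSON copy that ctr.Spec() performs.
func (c *Container) Terminal() bool {
	if c.config.Spec != nil && c.config.Spec.Process != nil {
		return c.config.Spec.Process.Terminal
	}
	return false
}

func main() {
	c := &Container{config: containerConfig{Spec: &Spec{Process: &Process{Terminal: true}}}}
	fmt.Println(c.Terminal()) // true
}
```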
[NO NEW TESTS NEEDED] Just minor performance effects
Signed-off-by: Alexander Larsson <alexl@redhat.com>