It seems that if some background tasks are queued in libpod's Runtime before the worker's channel is set up (eg. in the refresh phase), they are not executed later on, but the workerGroup's counter is still ticked up. This leads podman to hang when the imageEngine is shutdown, since it waits for the workerGroup to be done.
fixescontainers/podman#22984
Signed-off-by: Farya Maerten <me@ltow.me>
Simplify the work-queue implementation by using a wait group. Once all
queued work items are done, the channel can be closed.
The system tests revealed a flake (i.e., #14351) which indicated that
the service container does not always get stopped which suggests a race
condition when queuing items. Those items are queued in a goroutine to
prevent potential dead locks if the queue ever filled up too quickly.
The race condition in question is that if a work item queues another,
the goroutine for queuing may not be scheduled fast enough and the
runtime shuts down; it seems to happen fairly easily on the slow CI
machines. The wait group fixes this race and allows for simplifying
the code.
Also increase the queue's buffer size to 10 to make things slightly
faster.
[NO NEW TESTS NEEDED] as we are fixing a flake.
Fixes: #14351
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Add the notion of an "exit policy" to a pod. This policy controls the
behaviour when the last container of pod exits. Initially, there are
two policies:
- "continue" : the pod continues running. This is the default policy
when creating a pod.
- "stop" : stop the pod when the last container exits. This is the
default behaviour for `play kube`.
In order to implement the deferred stop of a pod, add a worker queue to
the libpod runtime. The queue will pick up work items and in this case
helps resolve dead locks that would otherwise occur if we attempted to
stop a pod during container cleanup.
Note that the default restart policy of `play kube` is "Always". Hence,
in order to really solve #13464, the YAML files must set a custom
restart policy; the tests use "OnFailure".
Fixes: #13464
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>