if the container has no pid namespace, they are not killed when the
container process ends. In this case, attempt to kill them in the
same way.
The problem was noticed with toolbox where the exec'ed sessions are
not terminated when the container is stopped, blocking the system
shutdown.
[NO NEW TESTS NEEDED]
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
in several top-level API functions. These are the first line of
the function that contains them, which makes sense; we want to
capture any error returned by the function. However, making this
the first defer means that it is the last thing to run after the
function returns - meaning that the container's
`defer c.lock.Unlock()` has already fired, leading to a chance we
modify the container without holding its lock.
We could move the function around so it's no longer the first
defer, but then we'd have to call it twice (immediately after
`defer c.lock.Unlock()` if the container is not batched, and a
second time in a new `else` block right after the lock/sync call
to make sure we handle batched containers). Seems simpler to just
leave it like this.
[NO NEW TESTS NEEDED] Can't really test for DB corruption easily.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
This change aims to store an error message to the ContainerState struct
with the last known error from the Start, StartAndAttach, and Stop OCI
Runtime functions.
The goal was to act in accordance with Docker's behavior.
Fixes: #13729
Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
This allows use to use STDOUT directly without having to call open
again, also this makes the export API endpoint much more performant
since it no longer needs to copy to a temp file.
I noticed that there was no export API test so I added one.
And lastly opening /dev/stdout will not work on windows.
Fixes#16870
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The OCI Runtime's KillContainer interface can modify container
state (if the signal fails to send, as it would if the container
failed immediately after starting, we will update state to pick
up the fact that the container exited). As such, it can edit the
DB, and needs to be run locked.
There are fortunately only a few places where this function is
used, and most of them are already safe. The only exception is
StartAndAttach(), which does a SIGWINCH in an unlocked portion of
the function. Fortunately it's a goroutine, so just add a lock
and defer unlock and it should be fixed.
[NO NEW TESTS NEEDED] I have no idea how to induce a scenario
that would cause this consistently.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
This allows us to add a simple stub for FreeBSD which returns -1,
leading WaitForExit to fall back to the sleep loop approach.
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
Make sure to wait for the container to exit after kill. While the
cleanup process will take care eventually of transitioning the state, we
need to give a guarantee to the user to leave the container in the
expected state once the (kill) command has finished.
The issue could be observed in a flaking test (#16142) where
`podman rm -f -t0` failed because the preceding `podman kill`
left the container in "running" state which ultimately confused
the "stop" backend.
Note that we should only wait for the container to exit when SIGKILL is
being used. Other signals have different semantics.
[NO NEW TESTS NEEDED] as I do not know how to reliably reproduce the
issue. If #16142 stops flaking, we are good.
Fixes: #16142
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
This just gets ctr.config.Spec.Process.Terminal with some null checks,
allowing several places that open-coded this to use the helper.
In particular, this helps the code in
pkg/domain/infra/abi/terminal.StartAttachCtr(), that used to do:
`ctr.Spec().Process.Terminal`, which looks fine, but actually causes
a deep json copy in the `ctr.Spec()` call that takes over 3 msec.
[NO NEW TESTS NEEDED] Just minor performance effects
Signed-off-by: Alexander Larsson <alexl@redhat.com>
When you run "podman run foo" we attach to the container, which essentially
blocks until the container process exits. When that happens podman immediately
calls Container.WaitForExit(), but at this point the exit value has not
yet been written to the db by conmon. This means that we almost always
hit the "check for exit state; sleep 250msec" loop in WaitForExit(),
delaying the exit of podman run by 250 msec.
More recent kernels (>= 5.3) supports the pidfd_open() syscall, that
lets you open a fd representing a pid and then poll on it to wait
until the process exits. We can use this to have the first sleep
be exactly as long as is needed for conmon to exit (if we know its pid).
If for whatever reason there is still issues we use the old sleep loop
on later iterations.
This makes "time podman run fedora true" about 200msec faster.
[NO NEW TESTS NEEDED]
Signed-off-by: Alexander Larsson <alexl@redhat.com>
Package `io/ioutil` was deprecated in golang 1.16, preventing podman from
building under Fedora 37. Fortunately, functionality identical
replacements are provided by the packages `io` and `os`. Replace all
usage of all `io/ioutil` symbols with appropriate substitutions
according to the golang docs.
Signed-off-by: Chris Evich <cevich@redhat.com>
Podman adds an Error: to every error message. So starting an error
message with "error" ends up being reported to the user as
Error: error ...
This patch removes the stutter.
Also ioutil.ReadFile errors report the Path, so wrapping the err message
with the path causes a stutter.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
podman update allows users to change the cgroup configuration of an existing container using the already defined resource limits flags
from podman create/run. The supported flags in crun are:
this command is also now supported in the libpod api via the /libpod/containers/<CID>/update endpoint where
the resource limits are passed inthe request body and follow the OCI resource spec format
–memory
–cpus
–cpuset-cpus
–cpuset-mems
–memory-swap
–memory-reservation
–cpu-shares
–cpu-quota
–cpu-period
–blkio-weight
–cpu-rt-period
–cpu-rt-runtime
-device-read-bps
-device-write-bps
-device-read-iops
-device-write-iops
-memory-swappiness
-blkio-weight-device
resolves#15067
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Improve the error message when looking up the exit code of a container.
The state of the container may help us track down #14859 which flakes
rarely and is impossible to reproduce on my machine.
[NO NEW TESTS NEEDED]
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Since conmon-rs also uses this code we moved it to c/common. Now podman
should has this also to prevent duplication.
[NO NEW TESTS NEEDED]
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
We now use the golang error wrapping format specifier `%w` instead of
the deprecated github.com/pkg/errors package.
[NO NEW TESTS NEEDED]
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
Make sure `Sync()` handles state transitions and exit codes correctly.
The function was only being called when batching which could render
containers in an unusable state when running concurrently with other
state-altering functions/commands since the state must be re-read from
the database before acting upon it.
Fixes: #14761
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
This bug was introduced in https://github.com/containers/podman/pull/8906.
When we use 'podman rm/restart/stop/kill etc...' command to
the container running with --rm, the OCI runtime directory
remains at /run/<runtime name> (root user) or
/run/user/<user id>/<runtime name> (rootless user).
This bug could cause other bugs.
For example, when we checkpoint the container running with
--rm (podman checkpoint --export) and restore it
(podman restore --import) with crun, error message
"Error: OCI runtime error: crun: container `<container id>`
already exists" is outputted.
This error is caused by an attempt to restore the container with
the same container ID as the remaining OCI runtime's container ID.
Therefore, I fix that the cleanupRuntime() function runs to
remove the OCI runtime directory,
even if the container has already been removed by --rm option.
Signed-off-by: Toshiki Sonoda <sonoda.toshiki@fujitsu.com>
This commit addresses three intertwined bugs to fix an issue when using
Gitlab runner on Podman. The three bug fixes are not split into
separate commits as tests won't pass otherwise; avoidable noise when
bisecting future issues.
1) Podman conflated states: even when asking to wait for the `exited`
state, Podman returned as soon as a container transitioned to
`stopped`. The issues surfaced in Gitlab tests to fail [1] as
`conmon`'s buffers have not (yet) been emptied when attaching to a
container right after a wait. The race window was extremely narrow,
and I only managed to reproduce with the Gitlab runner [1] unit
tests.
2) The clearer separation between `exited` and `stopped` revealed a race
condition predating the changes. If a container is configured for
autoremoval (e.g., via `run --rm`), the "run" process competes with
the "cleanup" process running in the background. The window of the
race condition was sufficiently large that the "cleanup" process has
already removed the container and storage before the "run" process
could read the exit code and hence waited indefinitely.
Address the exit-code race condition by recording exit codes in the
main libpod database. Exit codes can now be read from a database.
When waiting for a container to exit, Podman first waits for the
container to transition to `exited` and will then query the database
for its exit code. Outdated exit codes are pruned during cleanup
(i.e., non-performance critical) and when refreshing the database
after a reboot. An exit code is considered outdated when it is older
than 5 minutes.
While the race condition predates this change, the waiting process
has apparently always been fast enough in catching the exit code due
to issue 1): `exited` and `stopped` were conflated. The waiting
process hence caught the exit code after the container transitioned
to `stopped` but before it `exited` and got removed.
3) With 1) and 2), Podman is now waiting for a container to properly
transition to the `exited` state. Some tests did not pass after 1)
and 2) which revealed the third bug: `conmon` was executed with its
working directory pointing to the OCI runtime bundle of the
container. The changed working directory broke resolving relative
paths in the "cleanup" process. The "cleanup" process error'ed
before actually cleaning up the container and waiting "main" process
ran indefinitely - or until hitting a timeout. Fix the issue by
executing `conmon` with the same working directory as Podman.
Note that fixing 3) *may* address a number of issues we have seen in the
past where for *some* reason cleanup processes did not fire.
[1] https://gitlab.com/gitlab-org/gitlab-runner/-/issues/27119#note_970712864
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
[MH: Minor reword of commit message]
Signed-off-by: Matthew Heon <mheon@redhat.com>
With conmon-rs on the horizon, we need to disentangle Libpod from
legacy Conmon to the greatest extent possible. There are
definitely opportunities for codesharing between the two, but we
have to assume the implementations will be largely disjoint given
the different architectures.
Fortunately, most of the work has already been done in the past.
The conmon-managed OCI runtime mostly sits behind an interface,
with a few exceptions - the most notable of those being attach.
This PR thus moves Attach behind the interface, to ensure that we
can have attach implementations that don't use our existing unix
socket streaming if necessary.
Still to-do is conmon cleanup. There's a lot of code that removes
Conmon-specific files, or kills the Conmon PID, and all of it
will need to be refactored behind the interface.
[NO NEW TESTS NEEDED] Just moving some things around.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Most of these are no longer relevant, just drop the comments.
Most notable change: allow `podman kill` on paused containers.
Works just fine when I test it.
Signed-off-by: Matthew Heon <mheon@redhat.com>
The linter ensures a common code style.
- use switch/case instead of else if
- use if instead of switch/case for single case statement
- add space between comment and text
- detect the use of defer with os.Exit()
- use short form var += "..." instead of var = var + "..."
- detect problems with append()
```
newSlice := append(orgSlice, val)
```
This could lead to nasty bugs because the orgSlice will be changed in
place if it has enough capacity too hold the new elements. Thus we
newSlice might not be a copy.
Of course most of the changes are just cosmetic and do not cause any
logic errors but I think it is a good idea to enforce a common style.
This should help maintainability.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This is an enhancement proposal for the checkpoint / restore feature of
Podman that enables container migration across multiple systems with
standard image distribution infrastructure.
A new option `--create-image <image>` has been added to the
`podman container checkpoint` command. This option tells Podman to
create a container image. This is a standard image with a single layer,
tar archive, that that contains all checkpoint files. This is similar to
the current approach with checkpoint `--export`/`--import`.
This image can be pushed to a container registry and pulled on a
different system. It can also be exported locally with `podman image
save` and inspected with `podman inspect`. Inspecting the image would
display additional information about the host and the versions of
Podman, criu, crun/runc, kernel, etc.
`podman container restore` has also been extended to support image
name or ID as input.
Suggested-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
CRIU supports checkpoint/restore of file locks. This feature is
required to checkpoint/restore containers running applications
such as MySQL.
Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
This adds the parameter '--print-stats' to 'podman container restore'.
With '--print-stats' Podman will measure how long Podman itself, the OCI
runtime and CRIU requires to restore a checkpoint and print out these
information. CRIU already creates process restore statistics which are
just read in addition to the added measurements. In contrast to just
printing out the ID of the restored container, Podman will now print
out JSON:
# podman container restore --latest --print-stats
{
"podman_restore_duration": 305871,
"container_statistics": [
{
"Id": "47b02e1d474b5d5fe917825e91ac653efa757c91e5a81a368d771a78f6b5ed20",
"runtime_restore_duration": 140614,
"criu_statistics": {
"forking_time": 5,
"restore_time": 67672,
"pages_restored": 14
}
}
]
}
The output contains 'podman_restore_duration' which contains the
number of microseconds Podman required to restore the checkpoint. The
output also includes 'runtime_restore_duration' which is the time
the runtime needed to restore that specific container. Each container
also includes 'criu_statistics' which displays the timing information
collected by CRIU.
Signed-off-by: Adrian Reber <areber@redhat.com>
This adds the parameter '--print-stats' to 'podman container checkpoint'.
With '--print-stats' Podman will measure how long Podman itself, the OCI
runtime and CRIU requires to create a checkpoint and print out these
information. CRIU already creates checkpointing statistics which are
just read in addition to the added measurements. In contrast to just
printing out the ID of the checkpointed container, Podman will now print
out JSON:
# podman container checkpoint --latest --print-stats
{
"podman_checkpoint_duration": 360749,
"container_statistics": [
{
"Id": "25244244bf2efbef30fb6857ddea8cb2e5489f07eb6659e20dda117f0c466808",
"runtime_checkpoint_duration": 177222,
"criu_statistics": {
"freezing_time": 100657,
"frozen_time": 60700,
"memdump_time": 8162,
"memwrite_time": 4224,
"pages_scanned": 20561,
"pages_written": 2129
}
}
]
}
The output contains 'podman_checkpoint_duration' which contains the
number of microseconds Podman required to create the checkpoint. The
output also includes 'runtime_checkpoint_duration' which is the time
the runtime needed to checkpoint that specific container. Each container
also includes 'criu_statistics' which displays the timing information
collected by CRIU.
Signed-off-by: Adrian Reber <areber@redhat.com>
The backend for `ps --sync` has been nonfunctional for a long
while now - probably since v2.0. It's questionable how useful the
flag is in modern Podman (the original case it was intended to
catch, Conmon gone via SIGKILL, should be handled now via pinging
the process with a signal to ensure it's still alive) but having
the ability to force a refresh of container state from the OCI
runtime is still useful.
Signed-off-by: Matthew Heon <mheon@redhat.com>
This allows you to stop a container after a `podman stop` process
started, but did not finish, stopping the container (probably an
ignored stop signal, with no time to SIGKILL?). This is a very
narrow case, but once you're in it the only way to recover is a
`podman rm -f` of the container or extensive manual remediation
(you'd have to kill the container yourself, manually, and then
force a `podman ps --all --sync` to update its status from the
OCI runtime).
[NO NEW TESTS NEEDED] I have no idea how to verify this one -
we need to test that it actually started *during* the other stop
command, and that's nontrivial.
Signed-off-by: Matthew Heon <mheon@redhat.com>
it allows to pass the current std streams down to the container.
conmon support: https://github.com/containers/conmon/pull/289
[NO TESTS NEEDED] it needs a new conmon.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
This adds support to checkpoint containers out of pods and restore
container into pods.
It is only possible to restore a container into a pod if it has been
checkpointed out of pod. It is also not possible to restore a non pod
container into a pod.
The main reason this does not work is the PID namespace. If a non pod
container is being restored in a pod with a shared PID namespace, at
least one process in the restored container uses PID 1 which is already
in use by the infrastructure container. If someone tries to restore
container from a pod with a shared PID namespace without a shared PID
namespace it will also fail because the resulting PID namespace will not
have a PID 1.
Signed-off-by: Adrian Reber <areber@redhat.com>
Implement container to container copy. Previously data could only be
copied from/to the host.
Fixes: #7370
Co-authored-by: Mehul Arora <aroram18@mcmaster.ca>
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
The checkpoint archive compression was hardcoded to `archive.Gzip`.
There have been requests to make the used compression algorithm
selectable. There was especially the request to not compress the
checkpoint archive to be able to create faster checkpoints when not
compressing it.
This also changes the default from `gzip` to `zstd`. This change should
not break anything as the restore code path automatically handles
whatever compression the user provides during restore.
Signed-off-by: Adrian Reber <areber@redhat.com>
The --trace has helped in early stages analyze Podman code. However,
it's contributing to dependency and binary bloat. The standard go
tooling can also help in profiling, so let's turn `--trace` into a NOP.
[NO TESTS NEEDED]
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Traditionally, the path resolution for containers has been resolved on
the *host*; relative to the container's mount point or relative to
specified bind mounts or volumes.
While this works nicely for non-running containers, it poses a problem
for running ones. In that case, certain kinds of mounts (e.g., tmpfs)
will not resolve correctly. A tmpfs is held in memory and hence cannot
be resolved relatively to the container's mount point. A copy operation
will succeed but the data will not show up inside the container.
To support these kinds of mounts, we need to join the *running*
container's mount namespace (and PID namespace) when copying.
Note that this change implies moving the copy and stat logic into
`libpod` since we need to keep the container locked to avoid race
conditions. The immediate benefit is that all logic is now inside
`libpod`; the code isn't scattered anymore.
Further note that Docker does not support copying to tmpfs mounts.
Tests have been extended to cover *both* path resolutions for running
and created containers. New tests have been added to exercise the
tmpfs-mount case.
For the record: Some tests could be improved by using `start -a` instead
of a start-exec sequence. Unfortunately, `start -a` is flaky in the CI
which forced me to use the more expensive start-exec option.
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
prune a dependency that was only being used for a simple struct. Should
correct checksum issue on tarballs
[NO TESTS NEEDED]
Fixes: #9355
Signed-off-by: baude <bbaude@redhat.com>
We missed bumping the go module, so let's do it now :)
* Automated go code with github.com/sirkon/go-imports-rename
* Manually via `vgrep podman/v2` the rest
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>