8 Commits

Author SHA1 Message Date
70c131ed68 System tests: set a default XDG_RUNTIME_DIR
Yield to reality: if $XDG_RUNTIME_DIR is unset, assume a
reasonable default (rootless only). This clears up a
common failure in Fedora gating tests, and will probably
prevent future time wasters.

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-09-25 12:45:17 -06:00
792796183f libpod: setupNetNS() correctly mount netns
The netns dir has a special logic to bind mout itself and make itslef
shared. This code here didn't which lead to catastrophic bug during
netns unmounting as we were unable to unmount the netns as the mount got
duplicated and had the wrong parent mount. This caused us to loop forever
trying to remove the file.

Fixes https://issues.redhat.com/browse/RHEL-59620
Fixes #23685

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-09-20 15:19:22 +02:00
3350cd3eed pkg/rootless: simplify reexec for container code
The code currently tried to avoid joining the userns from conmon
directly and rather joined to only read the pid file and then send this
back to use so we could join the userns. From the comment this was done
because we could not read the pid file. However this is no longer true
as of commit 49eb5af301 and file is no always owned by the real user.

This means we can just remove this special logic and join the namespace
directly there. A test has been added to check the rejoin logic with a
custom uidmapping.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-07-08 13:28:31 +02:00
2a609b0f74 rootless: fix reexec to use /proc/self/exe
Under some circumstances podman might be executed with a different argv0
than the actual path to the podman binary. This breaks the reexec logic
as it tried to exec argv0 which failed.

This is visible when using podmansh as login shell which get's the
special -podmansh on argv0 to signal the shell it is a login shell.

To fix this we can simply use /proc/self/exe as command path which is
much more robust and the argv array is still passed correctly.

Fixes #22672

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-05-14 12:02:19 +02:00
4c1c4c082a Vendor latest c/common and fix tests
This vendors the latest c/common version, including making Pasta
the default rootless network provider. That broke a number of
tests, which have been fixed as part of this PR.

Also includes a change to network stats logic, which simplifies
the code a bit and makes it actually work with Pasta.

Signed-off-by: Matt Heon <mheon@redhat.com>
2024-02-29 12:16:51 -05:00
232c32bd35 CI: systests: safer isolation in registry & tests
Our test registry (used for login & local registry tests)
was being run using the standard podman tmpdir, hence the
standard podman database, This was then getting clobbered
in the 330-corrupt-images test, which runs "system reset".
We just didn't know this was happening. Until we added
a registry test after the system reset. Oops.

Solution: new helper function podman_isolation_opts()
sets --root, --runroot, *and --tmpdir*. Refactor all
existing --root/--runroot usages. Document.

Next problem: the "network reload" test in 500-networking.bats
did not (could not) know about our registry port, so the
"iptables -F" command reverted that to DROP, so the subsequent
podman-auth in 700-play timed out.

Solution: add a podman-isolated "network reload" to start_registry().

Final problem, because, really, those weren't enough: a BATS
bug where running with --filter-tags would set IFS=',' in setup_suite
which in turn has catastrophic consequences:

    https://github.com/bats-core/bats-core/issues/812

See #20966 for details of the failure and further conversation.

Signed-off-by: Ed Santiago <santiago@redhat.com>
2023-12-13 09:46:09 -07:00
6bc52c9c5e pkg/rootless: correctly handle proxy signals on reexec
There are quite a lot of places in podman were we have some signal
handlers, most notably libpod/shutdown/handler.go.

However when we rexec we do not want any of that and just send all
signals we get down to the child obviously. So before we install our
signal handler we must first reset all others with signal.Reset().

Also while at it fix a problem were the joinUserAndMountNS() code path
would not forward signals at all. This code path is used when you have
running containers but the pause process was killed.

Fixes #16091
Given that signal handlers run in different goroutines parallel it would
explain why it flakes sometimes in CI. However to my understanding this
flake can only happen when the pause process is dead before we run the
podman command. So the question still is what kills the pause process?

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-05-25 16:48:15 +02:00
bab95de9a2 rootless: make sure we only use a single pause process
Currently --tmpdir changes the location of the pause.pid file. this
causes issues because the c code in pkg/rootless does not know about
that. I tried to fix this[1] by fixing the c code to not use the
shortcut. While this fix worked it will result in many pause processes
leaking in the integrration tests.

Commit ab88632 added this behavior but following the disccusion it was
never the intention that we end up having more than one pause process.
The issues that was trying to fix was caused by somthing else AFAICT,
the main problem seems to be that the pause.pid file parent directory
may not be created when we try to create the pid file so it failed with
ENOENT. This patch fixes it by creating this directory always and revert
the change to no longer depend on the tmpdir value.

With this commit we now always use XDG_RUNTIME_DIR/libpod/tmp/pause.pid
for all podman processes. This allows the c shortcut to work reliably
and should therefore improve perfomance over my other approach.

A system test is added to ensure we see the right behavior and that
podman system migrate actually stops the pause process. Thanks to Ed
Santiago for the improved test to make it work for both `catatonit` and
`podman pause`.

This should fix the issues with namespace missmatches that we can see in
CI as flakes.

[1] https://github.com/containers/podman/pull/18057

Fixes #18057

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-04-11 10:57:46 +02:00