Add some test steps to the "quadlet - ContainerName" test. These steps
ensure that the default configuration for quadlet-generated service
files sends stdout/stderr/syslog to journald.
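A minimal sketch of the kind of check this adds (unit and container
names are illustrative; quadlet's default ContainerName is assumed to
be systemd-<unit>):

    systemctl start basic.service
    podman container inspect --format '{{.HostConfig.LogConfig.Type}}' systemd-basic
    # expected output: journald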
Signed-off-by: Yiqiao Pu <ypu@redhat.com>
The restore code path never called completeNetworkSetup(), which means
the hosts/resolv.conf files were not populated. The fix is simply to
call this function. There is one big catch: technically it is supposed
to be called after the container is created but before it is started.
There is no such window for restore, the container runs right away.
This means that if we made the call afterwards there would be a short
interval where the file is still empty. Thus I decided to call it
before, which means it does not work with PostConfigureNetNS (userns),
but as that does not work today anyway I don't see it as a problem.
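A rough illustration of the user-visible effect (the container name is
made up; checkpoint/restore needs root and CRIU):

    podman container checkpoint myctr
    podman container restore myctr
    podman exec myctr cat /etc/resolv.conf   # was empty before this fix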
Fixes #22901
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The pod was set after we checked the namespace, and the namespace code
only checked the --pod flag but did not consider the --pod-id-file
option. Fix the check to first set the pod option on the spec and then
use that for the namespace. Also make sure we always use an empty
default, otherwise it would be impossible for the backend to know
whether a user requested a specific userns or not; i.e. even when the
PODMAN_USERNS env var is set, a container should still get the userns
from the pod and not use the env var. Therefore unset it from the
default cli value.
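For illustration, the kind of invocation that was broken (names and
paths are made up):

    podman pod create --userns=auto --pod-id-file /tmp/pod.id mypod
    PODMAN_USERNS=keep-id podman run --pod-id-file /tmp/pod.id quay.io/libpod/testimage:latest true
    # the container must join the pod's user namespace, not use PODMAN_USERNS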
There are more issues here around --pod-id-file and cli validation,
which does not treat the option as conflicting with --userns the way
--pod does, but I decided to fix the bug at hand and not try to
untangle the entire mess, which would most likely take days.
Fixes #22931
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Do not return a 200 status code before we know whether there will be
an error. Delay writing the status code until we send the first
response. That way we can set an error code inside the loop when we
get an error on the first try, e.g. because an invalid descriptor was
used.
Fixes #22986
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
When we fail to do anything we should return 500; the 409 code has a
special meaning to the client, as it uses a different error format. As
a result the remote client was not able to unmarshal the error
correctly and just returned an empty string.
Fixes #22989
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
With the (esp. Debian) CI VM images built by
https://github.com/containers/automation_images/pull/338, CI no longer
tests with runc or cgroups v1. Add logic to fail under these
conditions. Prune back the high-level YAML/script envars and logic
formerly required to support them.
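A hedged sketch of the kind of guard this adds (the die helper and the
exact checks are illustrative):

    # refuse to run CI in environments we no longer support
    if [[ "$(podman info --format '{{.Host.OCIRuntime.Name}}')" == "runc" ]]; then
        die "runc is no longer supported in CI"
    fi
    if [[ "$(stat -f -c %T /sys/fs/cgroup)" != "cgroup2fs" ]]; then
        die "cgroups v1 is no longer supported in CI"
    fi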
Signed-off-by: Chris Evich <cevich@redhat.com>
When a user specifies an invalid connection in CONTAINER_CONNECTION,
podman should return a proper error saying so. Currently it ignored
the error, and rootFlags() just exited early without defining any
flags. This then caused a panic when trying to use the flags later.
To address this, first store the connection error in the PodmanConfig
struct instead of aborting right away during flag setup. This is
important, as the user might have specified a flag with a valid remote
connection. As such, we check all flags and only return the connection
error when none were given.
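Roughly, the expected behaviour (connection name and socket path are
made up):

    # an unknown connection must produce a clear error instead of a panic
    CONTAINER_CONNECTION=doesnotexist podman ps
    # ...but a valid connection given via a flag still takes precedence
    CONTAINER_CONNECTION=doesnotexist podman -r --url unix://$XDG_RUNTIME_DIR/podman/podman.sock ps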
While at it I also noticed that the default connection reported via
podman --help was wrong, as it only used the old containers.conf field
and did not consider the podman-connections.json default.
New regression tests have been added to make sure it behaves correctly.
This fixes the problem reported in PR #22997.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Add a test to verify that restoring a container in a Pod works when
the `container restore --pod` option is used with a Pod *name* (this
functionality previously supported only full Pod IDs).
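A heavily abridged sketch of the scenario (names are illustrative; the
real test also needs CRIU and a pod created with suitable namespace
sharing):

    podman pod create --name mypod
    podman run -d --pod mypod --name myctr quay.io/libpod/testimage:latest sleep 100
    podman container checkpoint --export /tmp/ckpt.tar myctr
    podman rm myctr
    podman container restore --pod mypod --import /tmp/ckpt.tar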
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
PR #22821 (CI speedup) was overly aggressive in one kube test, which
is now flaking. Bump the timeout from 3s to 4s.
Signed-off-by: Ed Santiago <santiago@redhat.com>
At the end of all tests, always check for leaks. That should make us
more robust against tests added at the end that would otherwise leak
stuff.
TODO: something seems wrong with bats when returning an error in
teardown_suite(); it prints a warning:
bats warning: Executed <NUM+1> instead of expected <NUM> tests
The output is also formatted weirdly in this case, with the podman
args split over multiple lines.
But the test fails as expected, so I don't think it is a problem.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
While these functions are not really slow, they still take about
100-250ms when I time them locally. Given they run for every test,
this adds up quickly. Looking at CI logs I can see the timings for
skipped tests are all in the 600ms range, so I think it is safe to
assume that these functions need to get faster.
We currently have over 670 test cases, so we are talking about over
400s spent in these functions in CI. This allows for big gains.
Overall this is a tricky trade-off: while all tests should clean up
after themselves, there is no guarantee of that, so errors can leak
into other tests and make debugging much harder. To guard against this
at least a bit, teardown checks whether the test was successful and
only skips the podman commands based on that. Without this, a single
flake could cause all following tests to fail.
As such, this commit does the proper setup once on suite start and
then again only after a test has failed.
In order for this to work at all we have to fix all leaks first, see
the previous commits. And going forward we have to keep a very strong
eye on this during reviews.
Also add a PODMAN_BATS_LEAK_CHECK option.
By default tests must clean up after themselves, and to speed up CI we
no longer do any cleanup in teardown by default. However, there are
still many cases where we might have to debug a leak, so add a new
PODMAN_BATS_LEAK_CHECK env option that can be set and causes teardown
to fail if the test did not clean up properly.
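A hedged sketch of what teardown does now (the helper names are
illustrative):

    function teardown() {
        if [[ -n "$PODMAN_BATS_LEAK_CHECK" ]]; then
            # opt-in: fail loudly if the test left anything behind
            leak_check
        elif [[ "$BATS_TEST_COMPLETED" != "1" ]]; then
            # test failed: clean up fully so the leak does not poison later tests
            basic_teardown
        fi
    }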
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Remove leaking containers and remove unnecessary push/pull args. For
push it tried to push an image as an argument, which makes no sense,
and for pull we tried to pull the argument as an image, which is also
wrong.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This is the same as what --squash-all is doing, and we already support
--squash with --layers=true since this is the default.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
If a container was stopped and we tried to start it before cleanup was
called, it tried to reuse the network, which caused a panic as the
pasta code cannot deal with that. It is also never correct, as the
netns must be created by the runtime when custom user namespaces are
used. As such, the proper thing is to clean up the netns first.
Also change an e2e test to report better errors. It is not directly
related to this change, but it failed on v1 of this patch, so we
noticed the ugly error message it produced. Thanks to Ed for the fix.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This fixes a regression added in commit 4fd84190b8: because the name
was overwritten by the createTimer() call, the removeTransientFiles()
call removed the new timer and not the startup healthcheck timer.
Then, when the container was stopped, we leaked the unit, as the wrong
unit name was stored in the state.
A new test has been added to ensure the logic works and we never leak
the system timers.
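A rough sketch of the check (image and options are illustrative; the
transient unit names are keyed on the container ID):

    cid=$(podman run -d --health-cmd true --health-startup-cmd true quay.io/libpod/testimage:latest sleep 100)
    podman rm -f -t0 $cid
    systemctl list-timers --all | grep $cid   # must print nothing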
Fixes #22884
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The buildah build kill trick is bad, as we have to sleep and wait to
avoid flakes, which takes time. Instead it is possible to redo this
build part manually with buildah commands. It is not trivial and is
harder to understand, but it saves 2-3s, so I think it is worth it.
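A hedged sketch of the manual replacement (image and commands are
illustrative):

    ctr=$(buildah from quay.io/libpod/testimage:latest)
    buildah run $ctr touch /testfile
    buildah commit $ctr localhost/build-test
    buildah rm $ctr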
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
First, as root, don't wait 5s for the timeout; 1s is enough. Also
switch to the curl --max-time option instead; that way we know we do
not kill curl before it possibly had a chance to do anything.
Second, combine the podman inspect commands into one. This makes the
test faster by over one second, as we save a bunch of podman commands.
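For illustration (format fields, container name, and URL are made up):

    # one inspect call instead of several separate ones
    podman inspect --format '{{.State.Status}} {{.NetworkSettings.IPAddress}}' myctr
    # hard cap on the request itself instead of an external timeout
    curl --max-time 1 http://localhost:8080/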
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Another case of a container that does not exit on SIGTERM, so we waste
10s. Now, because our container reacts to SIGTERM and exits 0, the
systemd unit status changes to inactive instead of failed.
Most importantly, add Notify=yes, because the socat call always failed
as the default is to not leak the notify socket into the container.
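A hedged sketch of the relevant quadlet bits (unit name and image are
illustrative):

    # /etc/containers/systemd/test.container (illustrative)
    [Container]
    Image=quay.io/libpod/testimage:latest
    # without Notify=yes the notify socket is not passed into the
    # container, so an in-container socat/sd_notify call can never
    # reach systemd
    Notify=yes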
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
It is not at all clear why a count of 30 was chosen; it seems like a
lot and of course takes quite a while. The test takes over 16s in CI.
To speed it up, reduce the count to 10. This should still be good
enough to ensure there are no races IMO.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>