In case of timeouts actually log the command again and make sure to send
SIGABRT to the process as Go will create a useful stack trace where we
can see where things are hanging. It also kills the process unlike the
default Eventually().Should(Exit()) call that leaves the process around.
The output will be captured by default in the log so we just see the
stack trace there.
And while at it bump the timeout up to 10 mins, we are hitting hard
flakes in CI where machine init takes longer than 5 mins for unknown
reasons but this seems to be good enough.
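
A minimal standalone sketch of the timeout handling described above,
using only os/exec rather than the Ginkgo/gexec helpers the tests use.
The names runWithTimeout, errTimeout and demoTimeout are hypothetical
and the short timeout is only there to keep the demo quick; it assumes
a Unix-like system with a sleep binary:

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// demoTimeout is deliberately tiny so the example finishes quickly.
const demoTimeout = 200 * time.Millisecond

// errTimeout is a hypothetical sentinel error for this sketch.
var errTimeout = errors.New("command timed out")

// runWithTimeout starts the command and waits for it to exit. On timeout
// it logs the command again, sends SIGABRT (a Go binary dumps its
// goroutine stacks on that signal) and reaps the process.
func runWithTimeout(timeout time.Duration, name string, args ...string) error {
	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil {
		return err
	}

	done := make(chan error, 1)
	go func() { done <- cmd.Wait() }()

	select {
	case err := <-done:
		return err
	case <-time.After(timeout):
		fmt.Printf("timed out after %s: %s %v\n", timeout, name, args)
		if err := cmd.Process.Signal(syscall.SIGABRT); err != nil {
			// The signal could not be delivered; fall back to a hard kill.
			_ = cmd.Process.Kill()
		}
		<-done // wait for the exit so no process is left around
		return errTimeout
	}
}

func main() {
	err := runWithTimeout(demoTimeout, "sleep", "5")
	fmt.Println("result:", err)
}
```

Unlike a plain wait with a deadline, the process is always gone by the
time the function returns.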
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
I am seeing a weird flake in my parallel system test PR. The issue is
that system units generated by podman systemd generate leave a container
in the Removing state behind.
As far as I can tell the problem seems to be that the cleanup process
is killed while it tries to remove the container from the db. Because
the cidfile was removed before, the ExecStopPost=podman rm ... process
no longer had access to the cidfile and reported no error because it
runs with --ignore.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
- fix test name to reflect that it's not pasta-only
(followup from #21563)
- in one podman-update test run in OpenQA, defer assertion
failures so we can gather better data on regressions.
This would've been helpful in diagnosing bz2281805.
- add an error-message check to one test that needed it
(found by accident)
- add distro-integration test tag to a handful of new tests,
so they run in OpenQA. Found via 'git diff 33891e8 test/system'
and scanning for '^\+@test '. I only added tests that IMO
have some risk of interacting poorly with kernel or systemd
updates, e.g. quadlet, modules, tmpfs+noswap.
Signed-off-by: Ed Santiago <santiago@redhat.com>
As we want to get rid of the special titles convert the existing skips
to the only_if condition, this makes it more readable as we do not need
to negate so much.
Then add similar conditions for all test tasks, this removes the need for
a special title such as CI:DOCS as the logic is smart enough to detect
docs-only changes, i.e. when no source code was changed.
Update the documentation for the new logic and no longer point
contributors to the CI:DOCS title as it is gone now.
There is a bunch of duplication in the rules as yaml doesn't allow us to
share only parts of a string. To prevent unwanted drift a test case in
contrib/cirrus/cirrus_yaml_test.py is added to ensure all conditions
follow the same base ruleset.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The events code makes use of two channels, one for the events and one
for the resulting error. Then in the main file we have a loop reading
from both channels that should exit on first error it gets.
However in case the event channel is closed before the error channel
contains the error it could cause an early exit as it looked like all
events were done. Commit c46884aa93 fixed that somewhat by checking for
an error in the error channel before exiting. This however was still
racy as it added a default case in the select which means the channel
check is non-blocking. Thus the error might not have been sent into the
channel yet.
To fix this we should make it a blocking read to wait for the error in
the channel. Also the err != nil check can be removed as we either
return err or nil anyway.
And as a last step make sure the error channel is closed, that prevents us
from blocking forever in case the main select already processed the nil
error.
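
The pattern described above can be sketched roughly as follows. This is
not the actual podman events code; produce and consume are hypothetical
names, and the sleep only simulates the window in which a non-blocking
check would miss the error:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// produce is a hypothetical event source: it emits n events, closes the
// event channel, and only afterwards reports its final error (or nil).
func produce(n int, fail bool) (<-chan string, <-chan error) {
	events := make(chan string)
	errs := make(chan error, 1)
	go func() {
		// Closing errs guarantees that a reader never blocks forever,
		// even if the single value was already consumed elsewhere.
		defer close(errs)
		for i := 0; i < n; i++ {
			events <- fmt.Sprintf("event-%d", i)
		}
		close(events)
		// Simulated delay between closing events and sending the error.
		time.Sleep(10 * time.Millisecond)
		if fail {
			errs <- errors.New("event source failed")
			return
		}
		errs <- nil
	}()
	return events, errs
}

// consume reads events until the first error or until events is closed.
func consume(events <-chan string, errs <-chan error) ([]string, error) {
	var seen []string
	for {
		select {
		case ev, ok := <-events:
			if !ok {
				// Blocking read, no default case: wait for the final
				// error. A closed errs channel yields nil right away.
				return seen, <-errs
			}
			seen = append(seen, ev)
		case err := <-errs:
			return seen, err
		}
	}
}

func main() {
	seen, err := consume(produce(3, true))
	fmt.Println(len(seen), err)
}
```

With a default case around the final read, the function could return
nil before the producer had a chance to send its error.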
Fixes #23165
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Currently all podman machine rm errors in AfterEach were ignored.
This means some leaked and caused issues later on, see #22844.
To fix it first rework the logic to only remove machines when needed at
the place where they are created using DeferCleanup(). However,
DeferCleanup() does not work well together with AfterEach() as
AfterEach() always runs before DeferCleanup(). As AfterEach() deletes the
dir the podman machine rm call cannot be done afterwards.
As such migrate all cleanup to use DeferCleanup() and while I have to
touch this fix the code to remove the per file duplication and define
the setup/cleanup once in the global scope.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
On linux and macos the connections are stored under the home dir by
default so it is not a problem there but on windows we first check
the APPDATA env and use this dir as config storage. This has the problem
that it is not cleaned up after each test, so connections might leak
into the following test causing failures there.
Fixes #22844
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The test checks that the default volume is not on tmpfs, however what it
should really check is that the volume is on our container storage fs. It
is possible that users run the storage on top of tmpfs so this test
always failed there.
The better check is to compare the fs from the graphroot and the volume.
Unfortunately, for unknown reasons stat -f -c %T returns UNKNOWN and not
the actual fs. I have no idea why, to work around that we now parse
/proc/mounts manually for the fs. Not nice but at least it works
correctly.
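
For illustration only, the /proc/mounts lookup could look roughly like
this in Go (the test itself is not written in Go, and fsTypeFor is a
hypothetical helper). It assumes absolute, cleaned paths and does not
decode escaped characters such as \040 in mount points:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// fsTypeFor returns the filesystem type of the mount that contains path.
// mounts holds text in /proc/mounts format:
//
//	device mountpoint fstype options dump pass
func fsTypeFor(mounts, path string) (string, error) {
	best, fstype := "", ""
	sc := bufio.NewScanner(strings.NewReader(mounts))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 3 {
			continue
		}
		mp := fields[1]
		if path != mp && mp != "/" && !strings.HasPrefix(path, mp+"/") {
			continue
		}
		// The longest matching mount point wins; ">=" lets a later mount
		// stacked on the same mount point override an earlier one.
		if len(mp) >= len(best) {
			best, fstype = mp, fields[2]
		}
	}
	if best == "" {
		return "", fmt.Errorf("no mount found for %q", path)
	}
	return fstype, nil
}

func main() {
	data, err := os.ReadFile("/proc/mounts")
	if err != nil {
		fmt.Println("cannot read /proc/mounts:", err)
		return
	}
	fs, err := fsTypeFor(string(data), "/")
	fmt.Println(fs, err)
}
```

Calling it once for the graphroot and once for the volume path and
comparing the two results gives the check described above.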
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Try to speed up the CI tests by using tmpfs as container storage.
This is important for system tests, other tests set up their own --root
already on tmpfs so it should not affect them.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This sentence does not add any value and instead confuses users as it
suggests that the name is somehow special and related to bridge networks
which is not the case. Using either the name or id is fine as described
in the sentence before.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
When a user asks for specific devices we should still add them and not
ignore them just because privileged adds all of them.
Most notably if you set --device /dev/null:/dev/test you expect
/dev/test in the container, however as we ignored them this was not the
case. Another side effect is that the input was not validated at all.
This leads to confusion as described in the issue.
Fixes #23132
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
I found that Quadlet didn't currently have support for log options.
This merge allows Quadlet to handle log options and correctly
pass those values through to `podman run` for Container and Kube
types.
Syntactically consistent with existing parameters:
```ini
[Container]
Image=localhost/imagename
LogOpt=path=/var/log/container/mycontainer.json
LogOpt=size=10mb
```
Signed-off-by: Brett Calliss <brett@obligatory.email>
Close loophole that would allow you to assign more memory than the
system has to a podman machine
Fixes: #18206
Signed-off-by: Brent Baude <bbaude@redhat.com>
The logs are not verbose if `--debug` is not set, and very useful to
have if gvproxy exits unexpectedly.
Signed-off-by: Christophe Fergeau <cfergeau@redhat.com>
This PR is a couple of small fixes so that our CI would be capable of running the machine test suite on the libkrun provider.
RUN-2172
Signed-off-by: Brent Baude <bbaude@redhat.com>
Podman machine reset now removes and resets machines from all providers available on the platform.
On Windows, if the user does not have admin privs, machine will only reset WSL, but will emit a warning that it is unable to remove Hyper-V machines without elevated privs.
Signed-off-by: Ashley Cui <acui@redhat.com>