In the Debian environment we are hitting an edge case where an older buildah
version is not compatible with a newer podman version because the two
default to different storage drivers, i.e.:
* Podman defaults to native `overlay`.
* Older buildah versions default to `vfs`.
See the discussion below for more details:
* containers#18510 (comment)
Co-authored-by: Ed Santiago <santiago@redhat.com>
Signed-off-by: Aditya R <arajan@redhat.com>
The Sysctl=name=value entry can be used to set --sysctl=name=value
directly without the need to use PodmanArgs=--sysctl=name=value.
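For illustration, a rough Go sketch of the mapping (not Quadlet's actual
code; the helper name and example value are made up):
```go
package main

import "fmt"

// translateSysctls is a hypothetical helper showing the mapping: each
// Sysctl=name=value entry from the unit file becomes a --sysctl=name=value
// argument on the generated podman command line.
func translateSysctls(entries []string) []string {
	var args []string
	for _, e := range entries {
		args = append(args, "--sysctl="+e)
	}
	return args
}

func main() {
	// e.g. a unit containing: Sysctl=net.ipv4.ip_forward=1
	fmt.Println(translateSysctls([]string{"net.ipv4.ip_forward=1"}))
}
```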
Signed-off-by: Laurenz Kruty <git@laurenzkruty.de>
There are certain messages logged by OCI runtimes when killing a
container that has already stopped that we really do not care
about when stopping a container. Due to our architecture, there
are inherent races around stopping containers, and so we cannot
guarantee that *we* are the people to kill it - but that doesn't
matter because Podman only cares that the container has stopped,
not who delivered the fatal signal.
Unfortunately, the OCI runtimes don't understand this, and log
various warning messages when the `kill` command is invoked on a
container that was already dead. These cause our tests to fail,
as we now check for clean STDERR when running Podman. To work
around this, capture STDERR for the OCI runtime in a buffer only
for stopping containers, and go through and discard any of the
warnings we identified as spurious.
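Roughly, the approach looks like this (a standalone sketch, not the real
Libpod/OCI integration code; the runtime name, warning string, and helper
are illustrative):
```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
	"strings"
)

// spuriousKillWarnings holds substrings of runtime messages that are
// harmless when killing an already-stopped container (illustrative values,
// not the exact strings the runtimes emit).
var spuriousKillWarnings = []string{
	"container not running",
}

// stopKill runs `<runtime> kill <ctr> <signal>`, capturing the runtime's
// stderr in a buffer and dropping lines that match a known-spurious warning,
// so only unexpected messages are surfaced.
func stopKill(runtime, ctr, signal string) error {
	var stderr bytes.Buffer
	cmd := exec.Command(runtime, "kill", ctr, signal)
	cmd.Stderr = &stderr
	runErr := cmd.Run()

	for _, line := range strings.Split(stderr.String(), "\n") {
		if line == "" {
			continue
		}
		spurious := false
		for _, warn := range spuriousKillWarnings {
			if strings.Contains(line, warn) {
				spurious = true
				break
			}
		}
		if !spurious {
			fmt.Println("runtime stderr:", line)
		}
	}
	return runErr
}

func main() {
	if err := stopKill("crun", "example-ctr", "SIGKILL"); err != nil {
		fmt.Println("kill failed:", err)
	}
}
```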
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
On FreeBSD, bash usually lives in /usr/local/bin/bash. This uses the shell
'command' builtin to find the path, which works in bash, dash and the
FreeBSD /bin/sh.
Signed-off-by: Doug Rabson <dfr@rabson.org>
Due to a bad file-format design, if a cirrus-cron job happened to have a
name w/ spaces, the generated e-mail text would be broken. For example:
```
Cron build 'VM' Failed: https://cirrus-ci.com/build/Image Maintenance
5630822628196352
```
Fix this by flipping the field-order in an intermediate file, so the
build ID comes first, then the job name. This makes it much easier for
`read` to process, since all remaining words will be stored into the final
variable (now the job name).
Also change all variables that reference this intermediate file such
that they continue to reflect the expected field order. Update script
tests and add a new test to confirm expected file processing and output.
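The script does this with bash `read`; the Go snippet below is only an
illustration of why ID-first is easier to split (the example line reuses the
build ID and job name from the broken e-mail above):
```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	// Hypothetical intermediate-file line: build ID first, then a job name
	// that may contain spaces. Splitting on the first space leaves the whole
	// remainder as the name - the same effect as bash `read`, where the last
	// variable absorbs all remaining words.
	line := "5630822628196352 Image Maintenance"
	fields := strings.SplitN(line, " ", 2)
	buildID, jobName := fields[0], fields[1]
	fmt.Printf("build=%s job=%q\n", buildID, jobName)
}
```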
Signed-off-by: Chris Evich <cevich@redhat.com>
First: fix podman-registry script so it preserves the initial $PODMAN,
so all subsequent invocations of ps, logs, and stop will use the
same binary and arguments. Until now we've handled this by requiring
that our caller manage $PODMAN (and keep it the same), but that's
just wrong.
Next, simplify the golang interface: move the $PODMAN setting into
registry.go, instead of requiring e2e callers to set it. (This
could use some work: the local/remote conditional is icky).
IMPORTANT: To prevent registry.go from using the wrong podman binary,
the Start() call is gone. Only StartWithOptions() is valid now.
And, minor cleanup: comments, plus an actual error-message check.
The reason for this PR is a recurring flake, #18355, whose multiple
failure modes I truly can't understand. I don't think this PR
is going to fix it, but this is still necessary work.
Signed-off-by: Ed Santiago <santiago@redhat.com>
We use shared-memory pthread mutexes to handle mutual exclusion
in Libpod. It turns out that these have configurable options for
how to handle a recursive lock (i.e., a thread trying to lock a
lock that the same thread had previously locked). The mutex can
either deadlock, or allow the duplicate lock without deadlocking.
Default behavior is, helpfully, unspecified, so if not explicitly
set there is no clear indication of which of these behaviors will
be seen. Unfortunately, today is the first I learned of this, so
our initial implementation did *not* explicitly set our preferred
behavior.
This turns out to be a major problem with a language like Golang,
where multiple goroutines can (and often do) use the same OS
thread. So we can have two goroutines trying to stop the same
container, and if the no-deadlock mutex behavior is in use, both
threads will successfully acquire the lock because the C library,
not knowing about Go's lightweight threads, sees the same OS thread
trying to lock a mutex twice, and allows it without question.
It appears that, at least on Fedora/RHEL/Debian libc, the default
(unspecified) behavior of the locks is the non-deadlocking
version - so, effectively, our locks have been of questionable
utility within the same Podman process for the last four years.
This is somewhat concerning.
What's even more concerning is that the Golang-native sync.Mutex
that was also in use did nothing to prevent the duplicate locking
(I don't know if I like the implications of this).
Anyways, this resolves the major issue of our locks not working
correctly by explicitly setting the correct pthread mutex
behavior.
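Conceptually, the fix amounts to the following (a standalone cgo sketch, not
the actual shared-memory lock code):
```go
package main

/*
#cgo LDFLAGS: -lpthread
#include <pthread.h>

// init_mutex explicitly requests error-checking behavior (and process-shared,
// since Libpod's locks live in shared memory) rather than relying on the
// unspecified default relock behavior.
static int init_mutex(pthread_mutex_t *m) {
	pthread_mutexattr_t attr;
	int ret = pthread_mutexattr_init(&attr);
	if (ret != 0) {
		return ret;
	}
	ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (ret != 0) {
		return ret;
	}
	ret = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
	if (ret != 0) {
		return ret;
	}
	return pthread_mutex_init(m, &attr);
}
*/
import "C"

import (
	"fmt"
	"runtime"
)

func main() {
	// Ensure both lock calls below happen on the same OS thread, mimicking
	// two goroutines that share a thread.
	runtime.LockOSThread()

	var m C.pthread_mutex_t
	if ret := C.init_mutex(&m); ret != 0 {
		fmt.Println("mutex init failed:", ret)
		return
	}
	C.pthread_mutex_lock(&m)
	// With PTHREAD_MUTEX_ERRORCHECK the second lock from the same thread
	// fails (EDEADLK) instead of quietly succeeding as the unspecified
	// default may allow.
	fmt.Println("relock returned:", C.pthread_mutex_lock(&m))
	C.pthread_mutex_unlock(&m)
}
```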
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
If the first container to get the pod lock is the infra container
it's going to want to remove the entire pod, which will also
remove every other container in the pod. Subsequent containers
will get the pod lock and try to access the pod, only to realize
it no longer exists - and that, actually, the container being
removed also no longer exists.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
This was causing some CI flakes. I'm pretty sure that a pod already
having been removed isn't a bug, but just the result of another
container in the pod removing it first - so there's no reason not to
ignore the errors.
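A minimal sketch of the idea, with made-up sentinel errors standing in for
Libpod's real "no such pod/container" errors:
```go
package main

import (
	"errors"
	"fmt"
)

// Illustrative sentinels standing in for Libpod's "no such object" errors.
var (
	errNoSuchPod = errors.New("no such pod")
	errNoSuchCtr = errors.New("no such container")
)

// removeCtrFromPod pretends the pod (and our container with it) was already
// removed because the infra container's removal raced ahead of us.
func removeCtrFromPod() error {
	return fmt.Errorf("looking up pod: %w", errNoSuchPod)
}

func main() {
	err := removeCtrFromPod()
	// The pod (or the container) already being gone just means another
	// container in the pod won the race and removed it first - treat that
	// as success rather than flaking.
	if err != nil && !errors.Is(err, errNoSuchPod) && !errors.Is(err, errNoSuchCtr) {
		fmt.Println("real error:", err)
		return
	}
	fmt.Println("pod already removed; nothing to do")
}
```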
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
This fixes a lint issue, but I'm keeping it in its own commit so
it can be reverted independently if necessary; I don't know what
side effects this may have. I don't *think* there are any
issues, but I'm not sure why it wasn't a pointer in the first
place, so there may have been a reason.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
- trust_test: adding 'Ordered' seems to resolve a very common
flake. I've tested this for dozens of CI runs, and haven't
seen the flake recur (normally it fails every few runs).
- exec and search tests: add FlakeAttempts(3). This is a NOP
under our current CI setup, in which we run ginkgo with
a global --flake-attempts=3. I am submitting this as an
  optimistic step toward a no-flake-attempts world (#17967).
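For reference, a minimal ginkgo/v2 sketch of what the decorators look like
(package and spec names are made up, and the two decorators are combined
here only to show the syntax; in the actual change they go to separate
test files):
```go
package flake_test

import (
	. "github.com/onsi/ginkgo/v2"
)

// Ordered makes the specs in this container run sequentially in declaration
// order; FlakeAttempts(3) retries a failing spec up to three times before
// reporting it as failed.
var _ = Describe("Podman trust", Ordered, FlakeAttempts(3), func() {
	It("runs a previously flaky spec", func() {
		// assertions would go here
	})
})
```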
Fixes: #18358
Signed-off-by: Ed Santiago <santiago@redhat.com>
For filter=id=XXX (containers, pods) and =ctr-ids=XXX (pods):
if XXX is only hex characters, treat it as a PREFIX
otherwise, treat it as a REGEX
Add tests. Update documentation. And fix an incorrect help message.
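A sketch of the matching rule (helper and variable names are illustrative,
not the actual filter code):
```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var hexOnly = regexp.MustCompile(`^[0-9a-fA-F]+$`)

// matchID implements the documented rule: a value consisting only of hex
// characters is treated as an ID prefix; anything else is treated as a
// regular expression matched against the ID.
func matchID(id, filter string) (bool, error) {
	if hexOnly.MatchString(filter) {
		return strings.HasPrefix(id, filter), nil
	}
	re, err := regexp.Compile(filter)
	if err != nil {
		return false, err
	}
	return re.MatchString(id), nil
}

func main() {
	id := "deadbeef1234"
	for _, f := range []string{"dead", "^dead.*34$", "abc"} {
		ok, err := matchID(id, f)
		fmt.Println(f, ok, err)
	}
}
```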
Fixes: #18471
Signed-off-by: Ed Santiago <santiago@redhat.com>
Ginkgo test names can have more than two levels: there can be
a nested series of Describes() before the final It(). (e.g.,
quadlet_test.go). Handle that.
Before: we just assumed that the third-or-maybe-fourth line
after a "-----" divider was the test name.
Now: examine every line after the "-----" divider, until the
first empty line. Lines with /path/to/source/file are ignored,
lines with text strings are assembled together to make anchors.
This is still imperfect but it's much better than before.
SPECIAL NOTE: in order to allow linking to timing results
in the AfterSuite, I've changed the test name from Leaf to Full.
This will now be a much longer string, and hence much less
readable, but I'm inclined to think it's more correct. Please
review carefully and let me know if I should revert.
Finally, as an unrelated add-on, add links (at top) to original
log, journal, and (if applicable) podman-remote server logs.
Signed-off-by: Ed Santiago <santiago@redhat.com>
To debug a deadlock, we really want to know what lock is actually
locked, so we can figure out what is using that lock. This PR
adds support for this, using trylock to check if every lock on
the system is free or in use. It will need to be run a few
times in quick succession to verify that a lock isn't just transiently
held but actually stuck, but that's not really a big deal.
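The probe works roughly like this; the sketch below uses plain
sync.Mutex.TryLock as a stand-in for pthread_mutex_trylock on the real
shared-memory locks:
```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	// Stand-ins for the lock array; in Libpod these are SHM pthread
	// mutexes, probed with pthread_mutex_trylock instead.
	locks := make([]sync.Mutex, 4)
	locks[2].Lock() // pretend lock 2 is held by some operation

	for i := range locks {
		if locks[i].TryLock() {
			// Lock was free: release it immediately, we only probed it.
			locks[i].Unlock()
			continue
		}
		// Lock is currently held. Run this a few times in quick
		// succession: a lock that stays held across runs is a likely
		// deadlock; one that comes and goes is just normal activity.
		fmt.Printf("lock %d is in use\n", i)
	}
}
```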
Signed-off-by: Matt Heon <mheon@redhat.com>
This is a general debug command that identifies any lock
conflicts that could lead to a deadlock. It's only intended for
Libpod developers (while it does tell you if you need to run
`podman system renumber`, you should never have to do that
anyways, and the next commit will include a lot more technical
info in the output that no one except a Libpod dev will want).
Hence, hidden command, and only implemented for the local driver
(recommend just running it by SSHing into a `podman machine` VM
in the unlikely case it's needed by remote Podman).
These conflicts should normally never happen, but having a
command like this is useful for debugging deadlock conditions
when they do occur.
Signed-off-by: Matt Heon <mheon@redhat.com>
This is a nice quality-of-life change that should help to debug
situations where someone runs out of locks (usually when a bunch
of unused volumes accumulate).
Signed-off-by: Matt Heon <mheon@redhat.com>
Being able to easily identify what lock has been allocated to a
given Libpod object is only somewhat useful for debugging lock
issues, but it's trivial to expose and I don't see any harm in
doing so.
Signed-off-by: Matt Heon <mheon@redhat.com>