This was added by commit 84e42877a ("make lint: re-enable revive"),
making nolintlint almost useless.
Remove the ungodly number of unused nolint annotations.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
We use shared-memory pthread mutexes to handle mutual exclusion
in Libpod. It turns out that these have configurable options for
how to handle a recursive lock (i.e., a thread trying to lock a
lock that the same thread had previously locked). The mutex can
either deadlock, or allow the duplicate lock without deadlocking.
The default behavior is, helpfully, unspecified, so if the type is
not explicitly set there is no clear indication of which of these
behaviors will be seen. Unfortunately, today is the first time I
learned of this, so our initial implementation did *not* explicitly
set our preferred behavior.
This turns out to be a major problem with a language like Golang,
where multiple goroutines can (and often do) use the same OS
thread. So we can have two goroutines trying to stop the same
container, and if the no-deadlock mutex behavior is in use, both
goroutines will successfully acquire the lock because the C
library, not knowing about Go's lightweight threads, sees the same
OS thread trying to lock a mutex twice, and allows it without
question.
It appears that, at least on Fedora/RHEL/Debian libc, the default
(unspecified) behavior of the locks is the non-deadlocking
version - so, effectively, our locks have been of questionable
utility within the same Podman process for the last four years.
This is somewhat concerning.
What's even more concerning is that the Golang-native sync.Mutex
that was also in use did nothing to prevent the duplicate locking
(I don't know if I like the implications of this).
Anyways, this resolves the major issue of our locks not working
correctly by explicitly setting the correct pthread mutex
behavior.
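To illustrate the idea (a minimal standalone sketch, not the actual
libpod code): initialize the mutex with PTHREAD_MUTEX_ERRORCHECK, and
a second lock attempt from the owning thread fails with EDEADLK
instead of being silently granted:

    #include <errno.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        pthread_mutexattr_t attr;
        pthread_mutex_t mutex;

        pthread_mutexattr_init(&attr);
        /* Allow use across processes (our mutexes live in SHM). */
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        /* Explicitly pick the deadlock-detecting behavior: relocking
           from the owning thread returns EDEADLK rather than being
           silently allowed. */
        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
        pthread_mutex_init(&mutex, &attr);
        pthread_mutexattr_destroy(&attr);

        pthread_mutex_lock(&mutex);
        int ret = pthread_mutex_lock(&mutex); /* same thread, again */
        printf("second lock: %s\n", strerror(ret)); /* EDEADLK */
        return 0;
    }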
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
To debug a deadlock, we really want to know which lock is actually
locked, so we can figure out what is using that lock. This PR
adds support for this, using trylock to check whether every lock
on the system is free or in use. It will need to be run a few
times in quick succession to verify that a lock is not just
transiently held but actually stuck, but that's not a big deal.
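As a sketch of the approach (hypothetical helper, not the PR's actual
code), probing a mutex without blocking looks roughly like this:

    #include <errno.h>
    #include <pthread.h>
    #include <stdio.h>

    /* Report whether a mutex is currently held, without blocking.
       If we manage to take it, it was free; release it at once. */
    static const char *lock_state(pthread_mutex_t *m) {
        int ret = pthread_mutex_trylock(m);
        if (ret == 0) {
            pthread_mutex_unlock(m);
            return "free";
        }
        return ret == EBUSY ? "in use" : "error";
    }

    int main(void) {
        pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
        printf("%s\n", lock_state(&m)); /* free */
        pthread_mutex_lock(&m);
        printf("%s\n", lock_state(&m)); /* in use */
        return 0;
    }

Note the probe itself briefly takes free locks, which is another
reason results are only meaningful across repeated runs.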
Signed-off-by: Matt Heon <mheon@redhat.com>
This is a nice quality-of-life change that should help to debug
situations where someone runs out of locks (usually when a bunch
of unused volumes accumulate).
Signed-off-by: Matt Heon <mheon@redhat.com>
On FreeBSD, the path argument to shm_open is not a filesystem path and we
must use shm_unlink to remove it. This changes the Linux build to
also use shm_unlink, which avoids assuming that shared memory
segments live in /dev/shm.
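A minimal sketch of the portable pattern (the segment name here is
illustrative):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void) {
        /* The name is an abstract identifier, not a file path; Linux
           happens to expose it under /dev/shm, FreeBSD does not. */
        int fd = shm_open("/example_segment", O_RDWR | O_CREAT, 0600);
        if (fd < 0) {
            perror("shm_open");
            return 1;
        }
        close(fd);
        /* Remove by name with shm_unlink(); do not unlink(2) a path
           like /dev/shm/example_segment directly. */
        if (shm_unlink("/example_segment") < 0) {
            perror("shm_unlink");
            return 1;
        }
        return 0;
    }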
Signed-off-by: Doug Rabson <dfr@rabson.org>
Motivated to have a working `make lint` on Fedora 37 (beta).
Most changes come from the new `gofmt` standards.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
We now use the golang error wrapping format specifier `%w` instead of
the deprecated github.com/pkg/errors package.
[NO NEW TESTS NEEDED]
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
The nolintlint linter does not deny the use of `//nolint`.
Instead it allows us to enforce a common nolint style:
- force that a linter name must be specified
- do not add a space between `//` and `nolint`
- make sure nolint is only used when there is actually a problem
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
When starting a container, libpod/runtime_pod_linux.go:NewPod
calls libpod/lock/lock.go:AllocateLock, which ends up in here. If
you exceed num_locks, in response to a "podman run ..." you will
see:
Error: error allocating lock for new container: no space left on device
As noted inline, this error is technically true as it is talking about
the SHM area, but for anyone who has not dug into the source (i.e. me,
before a few hours ago :) your initial thought is going to be that
your disk is full. I spent quite a bit of time trying to diagnose
what disk, partition, overlay, etc. was filling up before I
realised this was actually due to locks leaking from failing
containers.
This overrides this case to give a more explicit message that
hopefully puts people on the right track to fixing this faster. You
will now see:
$ ./bin/podman run --rm -it fedora bash
Error: error allocating lock for new container: allocation failed; exceeded num_locks (20)
[NO NEW TESTS NEEDED] (just changes an existing error message)
Signed-off-by: Ian Wienand <iwienand@redhat.com>
Currently, subsequent runs of `make localunit` fail and complain about
prior existing /dev/shm/libpod_test and /dev/shm/test1.
This commit deletes these files, if they already exist, prior to
running the tests.
Signed-off-by: Lokesh Mandvekar <lsm5@fedoraproject.org>
To avoid unnecessary warnings and errors in the future, I'd like to
propose building all cgo-related sources with `-Wall -Werror`. This
commit fixes some warnings that came up in `shm_lock.c`, too.
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
After a reboot, when we refresh Podman's state, we retrieved the
lock from the fresh SHM instance, but we did not mark it as
allocated, so nothing prevented it from being handed out again to
other containers and pods.
Provide a method for marking locks as in-use, and use it when we
refresh Podman's state after a reboot.
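Roughly, assuming allocations are tracked in a bitmap inside the SHM
segment (a hypothetical sketch, not the actual libpod code), marking
a given lock as in-use looks like:

    #include <errno.h>
    #include <stdint.h>

    /* Hypothetical: claim lock number idx in an allocation bitmap,
       failing if it is already taken. */
    static int mark_lock_in_use(uint32_t *bitmap, uint32_t idx) {
        uint32_t word = idx / 32;
        uint32_t bit  = UINT32_C(1) << (idx % 32);

        if (bitmap[word] & bit)
            return -EBUSY; /* already allocated */
        bitmap[word] |= bit;
        return 0;
    }

    int main(void) {
        uint32_t bitmap[4] = {0};
        int first  = mark_lock_in_use(bitmap, 5); /* 0: newly claimed */
        int second = mark_lock_in_use(bitmap, 5); /* -EBUSY: taken */
        return (first == 0 && second == -EBUSY) ? 0 : 1;
    }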
Fixes #2900
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
When we're renumbering locks, we're destroying all existing
allocations anyways, so destroying the old lock struct is not a
particularly big deal. Existing long-lived libpod instances will
continue to use the old locks, but that will be solved in a
follow-on.
Also, solve an issue with returning error values in the C code.
There were a few places where we returned errno when it was not
set, so make them return actual error codes.
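The usual convention for this (an illustrative sketch; the helper
name is made up) is to capture errno immediately after the failed
call and return it negated, rather than returning a possibly-stale
value:

    #include <errno.h>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>

    /* Illustrative: return a negative errno on failure so callers
       get a real error code even if errno is later clobbered. */
    static int open_lock_shm(const char *name) {
        int fd = shm_open(name, O_RDWR, 0600);
        if (fd < 0)
            return -errno; /* capture right away; not a bare -1 */
        return fd;
    }

    int main(void) {
        return open_lock_shm("/does_not_exist") == -ENOENT ? 0 : 1;
    }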
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Renumber is a way of renumbering container locks after the number
of locks available has changed.
For now, renumber only works with containers.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
This patch makes the path unique to each UID.
It also cleans up some error returns to include the path it is
trying to lock.
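Something along these lines (a hypothetical sketch; the exact name
format is an assumption):

    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        char path[64];
        /* Hypothetical: embed the effective UID in the SHM name so
           each user gets a distinct segment, and report that exact
           path in any locking error. */
        snprintf(path, sizeof(path), "/libpod_lock_%d", (int)geteuid());
        printf("%s\n", path);
        return 0;
    }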
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Golint wants to rename the struct. I think the name is fine. I
can disable golint. Golint will no longer complain about the
name.
Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
Move SHM specific code into a subpackage. Within the main locks
package, move the manager to be linux-only and add a non-Linux
unsupported build file.
Signed-off-by: Matthew Heon <matthew.heon@gmail.com>