When the path does not exist, filepath.EvalSymlinks returns an
empty string - so we can't just ignore ENOENT, we have to discard
the result if an ENOENT is returned.
Should fix Jira issue RHEL-37948
Signed-off-by: Matt Heon <mheon@redhat.com>
Moving from Go module v4 to v5 prepares us for public releases.
Move done using gomove [1] as with the v3 and v4 moves.
[1] https://github.com/KSubedi/gomove
Signed-off-by: Matt Heon <mheon@redhat.com>
When Podman starts, it checks a number of critical runtime paths
against stored values in the database to make sure that existing
containers are not broken by a configuration change. We recently
made some changes to this logic to make our handling of the some
options more sane (StaticDir in particular was set based on other
passed options in a way that was not particularly sane) which has
made the logic more sensitive to paths with symlinks. As a simple
fix, handle symlinks properly in our DB vs runtime comparisons.
The BoltDB bits are uglier because very, very old Podman versions
sometimes did not stuff a proper value in the database and
instead used the empty string. SQLite is new enough that we don't
have to worry about such things.
Fixes#20872
Signed-off-by: Matt Heon <mheon@redhat.com>
Only one process can write to the sqlite db at the same time, if another
process tries to use it at that time it fails and a database is locked
error is returned. If this happens sqlite should keep retrying until it
can write. To do that we can just set the _busy_timeout option. A 100s
timeout should be enough even on slower systems but not to much in case
there is a deadlock so it still returns in a reasonable time.
[NO NEW TESTS NEEDED] I think we strongly need to consider some form of
parallel stress testing to catch bugs like this.
Fixes#20809
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Added additional check for event type to be remove and set the correct exitcode.
While it was getting difficult to maintain the omitempty notation for Event->ContainerExitCode, changing the type from int to int ptr gives us the ability to check for ContainerExitCode to be not nil and continue operations from there.
closes#19124
Signed-off-by: Chetan Giradkar <cgiradka@redhat.com>
If a transaction is started it must either be committed or rolled back.
The function uses defer to call `tx.Rollback()` if there is an error
returned. However it also called `tx.Commit()` and afterwards further
errors can be returned which means it tries to roll back a already
committed transaction which cannot work.
This fix is to make sure tx.Commit() is the last call in that function.
see https://github.com/containers/podman/issues/20731
[NO NEW TESTS NEEDED]
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
We have to Commit() the transaction. Note this is only in a rare pod
remove code path and very unlikely to ever be used.
[NO NEW TESTS NEEDED]
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Use sqlite as default but for upgrades it will still use boltdb to avoid
breaking anyone. This is done by checking if the boltdb file already
exists and if it does then we have to use it.
I added a e2e test to check the new logic and removed the system test
for it, the problem with the system test is that we share the storage
dir there so all following commands without --db-backend would try to
use boltdb as a single --db-backend boltdb command will create the file
and then all folllwing commands will use it because of the backwards
compat. In e2e tests each test uses their own --root so it is not an
issue there.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The processing and setting of the static and volume directories was
scattered across the code base (including c/common) leading to subtle
errors that surfaced in #19938.
There were multiple issues that I try to summarize below:
- c/common loaded the graphroot from c/storage to set the defaults for
static and volume dir. That ignored Podman's --root flag and
surfaced in #19938 and other bugs. c/common does not set the
defaults anymore which gives Podman the ability to detect when the
user/admin configured a custom directory (not empty value).
- When parsing the CLI, Podman (ab)uses containers.conf structures to
set the defaults but also to override them in case the user specified
a flag. The --root flag overrode the static dir which is wrong and
broke a couple of use cases. Now there is a dedicated field for in
the "PodmanConfig" which also includes a containers.conf struct.
- The defaults for static and volume dir and now being set correctly
and adhere to --root.
- The CONTAINERS_CONF_OVERRIDE env variable has not been passed to the
cleanup process. I believe that _all_ env variables should be passed
to conmon to avoid such subtle bugs.
Overall I find that the code and logic is scattered and hard to
understand and follow. I refrained from larger refactorings as I really
just want to get #19938 fixed and then go back to other priorities.
https://github.com/containers/common/pull/1659 broke three pkg/machine
tests. Those have been commented out until getting fixed.
Fixes: #19938
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
It turns out, after iterating over rows, we need to check for errors. It
also turns out that we did not do that at all.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Make sure to prune container exit codes only when the associated
container does not exist anymore. This is needed when checking if any
container in kube-play exited non-zero and a building block for the
below linked Jira card.
[NO NEW TESTS NEEDED] - there are no unit tests for exit code pruning.
Jira: https://issues.redhat.com/browse/RUN-1776
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
As shown in #17831, WAL mode plays a role in causing `database is locked`
errors. Those are errors, in theory, should not happen as the DB should
busy wait. mattn/go-sqlite3/issues/274 has some comments indicating
that the busy handler behaves differently in WAL mode which may be an
explanation to the error.
For now, let's disable WAL mode and only re-enable it when we have
clearer understanding of what's going on. The upstream issue along with
the SQLite documentation do not give me the clear guidance that I would
need.
[NO NEW TESTS NEEDED] - flake is only reproducible in CI.
Fixes: #18356
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
According to an old upstream issue [1]: "If the first statement after
BEGIN DEFERRED is a SELECT, then a read transaction is started.
Subsequent write statements will upgrade the transaction to a write
transaction if possible, or return SQLITE_BUSY."
So let's move the first SELECT under the same transaction as the table
initialization.
[NO NEW TESTS NEEDED] as it's a hard to cause race.
[1] https://github.com/mattn/go-sqlite3/issues/274#issuecomment-1429054597Fixes: #17859
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
`Ping()` requires the DB lock, so we had to move it into a transaction
to fix#17859. Since we try to access the DB directly afterwards, I
prefer to let that fail instead of paying the cost of a transaction
which would lock the DB for _all_ processes.
[NO NEW TESTS NEEDED] as it's a hard to reproduce race.
Fixes: #17859
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
If a container with an ID starting with "db1" exists, and a
container named "db1" also exists, and they are different
containers - if I run `podman inspect db1` the container named
"db1" should be inspected, and there should not be an error that
multiple containers matched the name or id "db1". This was
already handled by BoltDB, and now is properly managed by SQLite.
Fixes#17905
Signed-off-by: Matt Heon <mheon@redhat.com>
The DB config is a single-row table, and the first Podman process
to run against the database creates it. However, there was a race
where multiple Podman processes, started simultaneously, could
try and write it. Only the first would succeed, with subsequent
processes failing once (and then running correctly once re-ran),
but it was happening often in CI and deserves fixing.
[NO NEW TESTS NEEDED] It's a CI flake fix.
Signed-off-by: Matt Heon <mheon@redhat.com>
SQLite developers consider it a misfeature [1], and after turning it on,
we saw a new set of flakes. Let's turn it off and trust the developers
[1] that WAL mode is sufficient for our purposes.
Turning the shared cache off also makes the DB smaller and faster.
[NO NEW TESTS NEEDED]
[1] https://sqlite.org/forum/forumpost/1f291cdca4
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
The symptoms in #17859 indicate that setting the PRAGMAs in individual
EXECs outside of a transaction can lead to concurrency issues and
failures when the DB is locked. Hence set all PRAGMAs when opening
the connection. Move them into individual constants to improve
documentation and readability.
Further make transactions exclusive as #17859 also mentions an error
that the DB is locked during a transaction.
[NO NEW TESTS NEEDED] - existing tests cover the code.
Fixes: #17859
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
<MH: Cherry-picked on top of my branch>
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
I was searching the SQLite docs for a fix, but apparently that
was the wrong place; it's a common enough error with the Go
frontend for SQLite that the fix is prominently listed in the API
docs for go-sqlite3. Setting cache mode to 'shared' and using a
maximum of 1 simultaneous open connection should fix.
Performance implications of this are unclear, but cache=shared
sounds like it will be a benefit, not a curse.
[NO NEW TESTS NEEDED] This fixes a flake with concurrent DB
access.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Transient mode means the DB should not persist, so instead of
using the GraphRoot we should use the RunRoot instead.
Signed-off-by: Matt Heon <mheon@redhat.com>
Return more sensible errors than SQLite's embedded constraint
failure ones. Should fix a number of integration tests.
Signed-off-by: Matt Heon <mheon@redhat.com>
A partial name match is tricky as we want it to be fast but also make
sure there's only one partial match iff there's no full one.
[NO NEW TESTS NEEDED] as it fixes a system test.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
I wasn't able to find a way to get error-checks working with the sqlite3
library with the time at hand.
[NO NEW TESTS NEEDED]
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
A number of fixes for pod creation and removal.
The important part is that matching partial IDs requires a trailing `%`
for SQL to interpret it as a wildcard. More information at
https://www.sqlitetutorial.net/sqlite-like/
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Allow to replace existing exit codes. A container may be started and
stopped multiple times etc.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
The value of -1 is used when we do not _yet_ know the exit code of the
container. Otherwise, the DB checks would error. There's probably a
smarter than allowing -1 but for now, that will do the trick and let the
tests progress.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>