64 Commits

Author SHA1 Message Date
cb53abca28 In SQLite state, use defaults for empty-string checks
As part of our database init, we perform a check of the current
values for a few fields (graph driver, graph root, static dir,
and a few more) to validate that Libpod is being started with a
sane & sensible config, and the user's containers can actually be
expected to work. Basically, we take the current runtime config
and compare against values cached in the database from the first
time Podman was run.

We've had some issues with this logic before this year around
symlink resolution, but this is a new edge case. Somehow, the
database is being loaded with the empty string for some fields
(at least graph driver) which is causing comparisons to fail
because we will never compare against "" for those fields - we
insert the default value instead, assuming we have one.

Having a value of "" in the database largely invalidates the
check so arguably we could just drop it, but what BoltDB did -
and what SQLite does after this patch - is to use the default
value for comparison instead of "". This should still catch some
edge cases, and shouldn't be too harmful.

What this does not do is identify or solve the reason that we are
seeing the empty string in the database at all. From my read on
the logic, it must mean that the graph driver is explicitly set
to "" in the c/storage config at the time Podman is first run and
I'm not precisely sure how that happens.

Fixes #24738

Signed-off-by: Matt Heon <mheon@redhat.com>
2025-02-10 12:42:11 -05:00
6e027c0e37 Fix an improperly ignored error in SQLite
This looks like a case of boilerplate error handling making it
too easy to miss a legitimately ignored error, which is annoying.
In a more featureful language most of the SQL code here could be
macros (at the very least, Rust would have forced us to handle
all error cases, not just the one seen here).

Found while looking through the Libpod DB code, no actual bug I
can think of associated with this.

Signed-off-by: Matt Heon <mheon@redhat.com>
2024-08-27 13:38:40 -04:00
b6b61a6a49 libpod: add hidden env to set sqlite timeout
Some users want to experiment with different timeout values.

Fixes #23236

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-07-22 12:59:00 +02:00
830e550073 Ignore result of EvalSymlinks on ENOENT
When the path does not exist, filepath.EvalSymlinks returns an
empty string - so we can't just ignore ENOENT, we have to discard
the result if an ENOENT is returned.

Should fix Jira issue RHEL-37948

Signed-off-by: Matt Heon <mheon@redhat.com>
2024-07-11 09:39:56 -04:00
6dd9abf9ec sqlite_state: Fix RewriteVolumeConfig
The VolumeConfig table does not have an ID column, thus
use the Name column to update it.

Fixes #23052

Signed-off-by: Marius Hoch <mail@mariushoch.de>
2024-06-20 11:39:44 +02:00
c681df35c0 chore: fix function names in comment
Signed-off-by: findnature <cricis@aliyun.com>
2024-04-24 12:07:38 +08:00
72f1617fac Bump Go module to v5
Moving from Go module v4 to v5 prepares us for public releases.

Move done using gomove [1] as with the v3 and v4 moves.

[1] https://github.com/KSubedi/gomove

Signed-off-by: Matt Heon <mheon@redhat.com>
2024-02-08 09:35:39 -05:00
8d14d41555 Run codespell on code
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2024-01-28 07:30:52 -05:00
2a2d0b0e18 chore: delete obsolete // +build lines
Signed-off-by: Oleksandr Redko <Oleksandr_Redko@epam.com>
2024-01-04 11:53:38 +02:00
f384bdf66b Handle symlinks when checking DB vs runtime configs
When Podman starts, it checks a number of critical runtime paths
against stored values in the database to make sure that existing
containers are not broken by a configuration change. We recently
made some changes to this logic to make our handling of the some
options more sane (StaticDir in particular was set based on other
passed options in a way that was not particularly sane) which has
made the logic more sensitive to paths with symlinks. As a simple
fix, handle symlinks properly in our DB vs runtime comparisons.

The BoltDB bits are uglier because very, very old Podman versions
sometimes did not stuff a proper value in the database and
instead used the empty string. SQLite is new enough that we don't
have to worry about such things.

Fixes #20872

Signed-off-by: Matt Heon <mheon@redhat.com>
2023-12-02 15:48:47 -05:00
5b3d82f9bc sqlite: set busy timeout to 100s
Only one process can write to the sqlite db at the same time, if another
process tries to use it at that time it fails and a database is locked
error is returned. If this happens sqlite should keep retrying until it
can write. To do that we can just set the _busy_timeout option. A 100s
timeout should be enough even on slower systems but not to much in case
there is a deadlock so it still returns in a reasonable time.

[NO NEW TESTS NEEDED] I think we strongly need to consider some form of
parallel stress testing to catch bugs like this.

Fixes #20809

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-11-29 18:07:29 +01:00
83c08a2f5c Merge pull request #20609 from cgiradkar/19124_remove_event_fix
Set correct exitcode in remove events
2023-11-28 16:21:17 +00:00
2645f91bfe Merge pull request #20813 from Luap99/sqlite-removepodcontainers
sqlite: fix missing Commit() in RemovePodContainers()
2023-11-28 16:07:18 +00:00
572f38c0db Set correct exitcode in remove events and change ContainerExitCode from int to int ptr
Added additional check for event type to be remove and set the correct exitcode.
While it was getting difficult to maintain the omitempty notation for Event->ContainerExitCode, changing the type from int to int ptr gives us the ability to check for ContainerExitCode to be not nil and continue operations from there.

closes #19124

Signed-off-by: Chetan Giradkar <cgiradka@redhat.com>
2023-11-28 13:31:18 +00:00
d7b970a4c4 sqlite: fix issue in ValidateDBConfig()
If a transaction is started it must either be committed or rolled back.
The function uses defer to call `tx.Rollback()` if there is an error
returned. However it also called `tx.Commit()` and afterwards further
errors can be returned which means it tries to roll back a already
committed transaction which cannot work.

This fix is to make sure tx.Commit() is the last call in that function.
see https://github.com/containers/podman/issues/20731

[NO NEW TESTS NEEDED]

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-11-28 14:27:49 +01:00
e26f677b16 sqlite: fix missing Commit() in RemovePodContainers()
We have to Commit() the transaction. Note this is only in a rare pod
remove code path and very unlikely to ever be used.

[NO NEW TESTS NEEDED]

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-11-28 14:26:29 +01:00
478afa728d vendor: update containers/{common,storage,image,buildah}
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-11-21 21:04:47 +01:00
bad25da92e libpod: add !remote tag
This should never be pulled into the remote client.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-10-24 12:11:34 +02:00
29ae516006 use sqlite as default database
Use sqlite as default but for upgrades it will still use boltdb to avoid
breaking anyone. This is done by checking if the boltdb file already
exists and if it does then we have to use it.

I added a e2e test to check the new logic and removed the system test
for it, the problem with the system test is that we share the storage
dir there so all following commands without --db-backend would try to
use boltdb as a single --db-backend boltdb command will create the file
and then all folllwing commands will use it because of the backwards
compat. In e2e tests each test uses their own --root so it is not an
issue there.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-10-10 17:11:28 +02:00
6293ec2e2d fix handling of static/volume dir
The processing and setting of the static and volume directories was
scattered across the code base (including c/common) leading to subtle
errors that surfaced in #19938.

There were multiple issues that I try to summarize below:

 - c/common loaded the graphroot from c/storage to set the defaults for
   static and volume dir.  That ignored Podman's --root flag and
   surfaced in #19938 and other bugs.  c/common does not set the
   defaults anymore which gives Podman the ability to detect when the
   user/admin configured a custom directory (not empty value).

 - When parsing the CLI, Podman (ab)uses containers.conf structures to
   set the defaults but also to override them in case the user specified
   a flag.  The --root flag overrode the static dir which is wrong and
   broke a couple of use cases.  Now there is a dedicated field for in
   the "PodmanConfig" which also includes a containers.conf struct.

 - The defaults for static and volume dir and now being set correctly
   and adhere to --root.

 - The CONTAINERS_CONF_OVERRIDE env variable has not been passed to the
   cleanup process.  I believe that _all_ env variables should be passed
   to conmon to avoid such subtle bugs.

Overall I find that the code and logic is scattered and hard to
understand and follow.  I refrained from larger refactorings as I really
just want to get #19938 fixed and then go back to other priorities.

https://github.com/containers/common/pull/1659 broke three pkg/machine
tests.  Those have been commented out until getting fixed.

Fixes: #19938
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-09-25 14:14:30 +02:00
2efa7c3fa1 make lint: enable rowserrcheck
It turns out, after iterating over rows, we need to check for errors. It
also turns out that we did not do that at all.

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-06-19 14:31:40 +02:00
6dbc138339 prune exit codes only when container doesn't exist
Make sure to prune container exit codes only when the associated
container does not exist anymore.  This is needed when checking if any
container in kube-play exited non-zero and a building block for the
below linked Jira card.

[NO NEW TESTS NEEDED] - there are no unit tests for exit code pruning.

Jira: https://issues.redhat.com/browse/RUN-1776
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-05-25 13:14:27 +02:00
1fb3cdf8a8 sqlite: disable WAL mode
As shown in #17831, WAL mode plays a role in causing `database is locked`
errors.  Those are errors, in theory, should not happen as the DB should
busy wait.  mattn/go-sqlite3/issues/274 has some comments indicating
that the busy handler behaves differently in WAL mode which may be an
explanation to the error.

For now, let's disable WAL mode and only re-enable it when we have
clearer understanding of what's going on.  The upstream issue along with
the SQLite documentation do not give me the clear guidance that I would
need.

[NO NEW TESTS NEEDED] - flake is only reproducible in CI.

Fixes: #18356
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-05-09 15:54:26 +02:00
bbe9d61c49 sqlite: move first read into a transaction
According to an old upstream issue [1]: "If the first statement after
BEGIN DEFERRED is a SELECT, then a read transaction is started.
Subsequent write statements will upgrade the transaction to a write
transaction if possible, or return SQLITE_BUSY."

So let's move the first SELECT under the same transaction as the table
initialization.

[NO NEW TESTS NEEDED] as it's a hard to cause race.

[1] https://github.com/mattn/go-sqlite3/issues/274#issuecomment-1429054597

Fixes: #17859
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-04-25 16:01:49 +02:00
cdb5b3e990 sqlite: do not Ping() after connecting
`Ping()` requires the DB lock, so we had to move it into a transaction
to fix #17859. Since we try to access the DB directly afterwards, I
prefer to let that fail instead of paying the cost of a transaction
which would lock the DB for _all_ processes.

[NO NEW TESTS NEEDED] as it's a hard to reproduce race.

Fixes: #17859
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-03-28 11:27:43 +02:00
8bd9109fb8 Merge pull request #17917 from mheon/fix_17905
Ensure that SQLite state handles name-ID collisions
2023-03-27 07:48:37 -04:00
7daab31f1f Ensure that SQLite state handles name-ID collisions
If a container with an ID starting with "db1" exists, and a
container named "db1" also exists, and they are different
containers - if I run `podman inspect db1` the container named
"db1" should be inspected, and there should not be an error that
multiple containers matched the name or id "db1". This was
already handled by BoltDB, and now is properly managed by SQLite.

Fixes #17905

Signed-off-by: Matt Heon <mheon@redhat.com>
2023-03-24 15:09:25 -04:00
e061cb968c Fix a race around SQLite DB config validation
The DB config is a single-row table, and the first Podman process
to run against the database creates it. However, there was a race
where multiple Podman processes, started simultaneously, could
try and write it. Only the first would succeed, with subsequent
processes failing once (and then running correctly once re-ran),
but it was happening often in CI and deserves fixing.

[NO NEW TESTS NEEDED] It's a CI flake fix.

Signed-off-by: Matt Heon <mheon@redhat.com>
2023-03-23 19:48:27 -04:00
b31d9e15f2 sqlite: do not use shared cache
SQLite developers consider it a misfeature [1], and after turning it on,
we saw a new set of flakes.  Let's turn it off and trust the developers
[1] that WAL mode is sufficient for our purposes.

Turning the shared cache off also makes the DB smaller and faster.

[NO NEW TESTS NEEDED]

[1] https://sqlite.org/forum/forumpost/1f291cdca4

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-03-22 15:44:38 +01:00
3925cd653b Drop SQLite max connections
The SQLite transaction lock Valentin found is (slightly) faster.
So let's go with that.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2023-03-21 14:20:34 -04:00
0fbc325156 sqlite: set connection attributes on open
The symptoms in #17859 indicate that setting the PRAGMAs in individual
EXECs outside of a transaction can lead to concurrency issues and
failures when the DB is locked.  Hence set all PRAGMAs when opening
the connection.  Move them into individual constants to improve
documentation and readability.

Further make transactions exclusive as #17859 also mentions an error
that the DB is locked during a transaction.

[NO NEW TESTS NEEDED] - existing tests cover the code.

Fixes: #17859
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>

<MH: Cherry-picked on top of my branch>

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2023-03-21 12:51:31 -04:00
9f0e0e8331 Fix database locked errors with SQLite
I was searching the SQLite docs for a fix, but apparently that
was the wrong place; it's a common enough error with the Go
frontend for SQLite that the fix is prominently listed in the API
docs for go-sqlite3. Setting cache mode to 'shared' and using a
maximum of 1 simultaneous open connection should fix.

Performance implications of this are unclear, but cache=shared
sounds like it will be a benefit, not a curse.

[NO NEW TESTS NEEDED] This fixes a flake with concurrent DB
access.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2023-03-21 09:57:56 -04:00
94f905a503 Fix SQLite DB schema migration code
It now can safely run on bare databases, before any tables are
created.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2023-03-17 13:24:53 -04:00
6142c16a9c Ensure SQLite places uses the runroot in transient mode
Transient mode means the DB should not persist, so instead of
using the GraphRoot we should use the RunRoot instead.

Signed-off-by: Matt Heon <mheon@redhat.com>
2023-03-15 14:45:28 -04:00
6e0f11da5d Improve handling of existing container names in SQLite
Return more sensible errors than SQLite's embedded constraint
failure ones. Should fix a number of integration tests.

Signed-off-by: Matt Heon <mheon@redhat.com>
2023-03-15 14:44:47 -04:00
38acab832d sqlite: remove dead code
Found by golangci-lint.

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-03-01 16:09:51 +01:00
86d12520e9 sqlite: implement RewriteVolumeConfig
[NO NEW TESTS NEEDED]

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-03-01 16:09:51 +01:00
df88f546b6 sqlite: LookupVolume: fix partial name match
A partial name match is tricky as we want it to be fast but also make
sure there's only one partial match iff there's no full one.

[NO NEW TESTS NEEDED] as it fixes a system test.

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-03-01 16:09:51 +01:00
01359457c4 sqlite: LookupVolume: wrap error
Wrap the error with the message expexted by the system tests.

[NO NEW TESTS NEEDED]

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-03-01 16:09:51 +01:00
e87014e444 sqlite: return correct error on pod-name conflict
I wasn't able to find a way to get error-checks working with the sqlite3
library with the time at hand.

[NO NEW TESTS NEEDED]

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-03-01 16:09:51 +01:00
84b5c6c713 sqlite: RewritePodConfig: update error message
Use the same error message as the boltdb backend.

[NO NEW TESTS NEEDED]

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-03-01 16:09:51 +01:00
5d2d609be4 sqlite: fix volume lookups with partial names
Requires the trailing `%` to work correctly, see
        https://www.sqlitetutorial.net/sqlite-like/

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-02-23 13:56:58 +01:00
495314a16a sqlite: fix container lookups with partial IDs
Requires the trailing `%` to work correctly, see
	https://www.sqlitetutorial.net/sqlite-like/

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-02-23 13:47:32 +01:00
efe7aeb1da sqlite: fix LookupPod
To return the error message expected by the system tests.

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-02-23 13:42:41 +01:00
19c2f37ba5 sqlite: fix pod create/rm
A number of fixes for pod creation and removal.

The important part is that matching partial IDs requires a trailing `%`
for SQL to interpret it as a wildcard.  More information at
	https://www.sqlitetutorial.net/sqlite-like/

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-02-23 13:38:17 +01:00
e32bea9378 sqlite: LookupContainer: update error message
As expected by the system tests.

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-02-23 11:36:47 +01:00
565bb56454 sqlite: AddContainerExitCode: allow to replace
Allow to replace existing exit codes.  A container may be started and
stopped multiple times etc.

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-02-23 11:30:46 +01:00
1b1cdfa357 sqlite: fix AllContainers with state
The state has been unmarshalled into the config which surfaced in wrong
states.

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-02-23 11:19:43 +01:00
21fcc9070f sqlite: fix "UPDATE TABLE" typos
"TABLE" should refer to the actual table.

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-02-23 10:48:11 +01:00
3f96b0ef28 sqlite: SaveVolume: fix syntax error updating the volumes table
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-02-23 10:35:48 +01:00