Use sqlite as the default backend, but keep using boltdb on upgrades to
avoid breaking anyone. This is done by checking whether the boltdb file
already exists; if it does, we have to use it.
I added an e2e test to check the new logic and removed the system test
for it. The problem with the system test is that we share the storage
dir there, so a single --db-backend boltdb command creates the file and
all following commands without --db-backend then use boltdb because of
the backwards compat. In e2e tests each test uses its own --root, so it
is not an issue there.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The processing and setting of the static and volume directories was
scattered across the code base (including c/common) leading to subtle
errors that surfaced in #19938.
There were multiple issues that I try to summarize below:
- c/common loaded the graphroot from c/storage to set the defaults for
static and volume dir. That ignored Podman's --root flag and
surfaced in #19938 and other bugs. c/common does not set the
defaults anymore which gives Podman the ability to detect when the
user/admin configured a custom directory (non-empty value).
- When parsing the CLI, Podman (ab)uses containers.conf structures to
set the defaults but also to override them in case the user specified
a flag. The --root flag overrode the static dir which is wrong and
broke a couple of use cases. Now there is a dedicated field for it in
the "PodmanConfig" which also includes a containers.conf struct.
- The defaults for static and volume dir are now being set correctly
and adhere to --root.
- The CONTAINERS_CONF_OVERRIDE env variable has not been passed to the
cleanup process. I believe that _all_ env variables should be passed
to conmon to avoid such subtle bugs.
Overall I find that the code and logic are scattered and hard to
understand and follow. I refrained from larger refactorings as I really
just want to get #19938 fixed and then go back to other priorities.
https://github.com/containers/common/pull/1659 broke three pkg/machine
tests. Those have been commented out until they get fixed.
Fixes: #19938
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
It turns out, after iterating over rows, we need to check for errors. It
also turns out that we did not do that at all.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Make sure to prune container exit codes only when the associated
container does not exist anymore. This is needed when checking if any
container in kube-play exited non-zero and a building block for the
below linked Jira card.
[NO NEW TESTS NEEDED] - there are no unit tests for exit code pruning.
Jira: https://issues.redhat.com/browse/RUN-1776
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
As shown in #17831, WAL mode plays a role in causing `database is locked`
errors. Those errors should, in theory, not happen as the DB should
busy wait. mattn/go-sqlite3/issues/274 has some comments indicating
that the busy handler behaves differently in WAL mode, which may
explain the error.
For now, let's disable WAL mode and only re-enable it when we have a
clearer understanding of what's going on. The upstream issue along with
the SQLite documentation do not give me the clear guidance that I would
need.
[NO NEW TESTS NEEDED] - flake is only reproducible in CI.
Fixes: #18356
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
According to an old upstream issue [1]: "If the first statement after
BEGIN DEFERRED is a SELECT, then a read transaction is started.
Subsequent write statements will upgrade the transaction to a write
transaction if possible, or return SQLITE_BUSY."
So let's move the first SELECT under the same transaction as the table
initialization.
[NO NEW TESTS NEEDED] as it's a hard-to-cause race.
[1] https://github.com/mattn/go-sqlite3/issues/274#issuecomment-1429054597
Fixes: #17859
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
`Ping()` requires the DB lock, so we had to move it into a transaction
to fix #17859. Since we try to access the DB directly afterwards, I
prefer to let that fail instead of paying the cost of a transaction
which would lock the DB for _all_ processes.
[NO NEW TESTS NEEDED] as it's a hard-to-reproduce race.
Fixes: #17859
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
If a container with an ID starting with "db1" exists, and a
container named "db1" also exists, and they are different
containers - if I run `podman inspect db1` the container named
"db1" should be inspected, and there should not be an error that
multiple containers matched the name or id "db1". This was
already handled by BoltDB, and now is properly managed by SQLite.
Fixes: #17905
Signed-off-by: Matt Heon <mheon@redhat.com>
The DB config is a single-row table, and the first Podman process
to run against the database creates it. However, there was a race
where multiple Podman processes, started simultaneously, could
try and write it. Only the first would succeed, with subsequent
processes failing once (and then running correctly once re-run),
but it was happening often in CI and deserves fixing.
[NO NEW TESTS NEEDED] It's a CI flake fix.
Signed-off-by: Matt Heon <mheon@redhat.com>
SQLite developers consider shared-cache mode a misfeature [1], and after
turning it on, we saw a new set of flakes. Let's turn it off and trust the
developers
[1] that WAL mode is sufficient for our purposes.
Turning the shared cache off also makes the DB smaller and faster.
[NO NEW TESTS NEEDED]
[1] https://sqlite.org/forum/forumpost/1f291cdca4
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
The symptoms in #17859 indicate that setting the PRAGMAs in individual
EXECs outside of a transaction can lead to concurrency issues and
failures when the DB is locked. Hence set all PRAGMAs when opening
the connection. Move them into individual constants to improve
documentation and readability.
Further make transactions exclusive as #17859 also mentions an error
that the DB is locked during a transaction.
[NO NEW TESTS NEEDED] - existing tests cover the code.
Fixes: #17859
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
<MH: Cherry-picked on top of my branch>
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
I was searching the SQLite docs for a fix, but apparently that
was the wrong place; it's a common enough error with the Go
frontend for SQLite that the fix is prominently listed in the API
docs for go-sqlite3. Setting cache mode to 'shared' and using a
maximum of 1 simultaneous open connection should fix it.
Performance implications of this are unclear, but cache=shared
sounds like it will be a benefit, not a curse.
[NO NEW TESTS NEEDED] This fixes a flake with concurrent DB
access.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Transient mode means the DB should not persist, so we should use the
RunRoot instead of the GraphRoot.
Signed-off-by: Matt Heon <mheon@redhat.com>
Return more sensible errors than SQLite's embedded constraint
failure ones. Should fix a number of integration tests.
Signed-off-by: Matt Heon <mheon@redhat.com>
A partial name match is tricky as we want it to be fast but also make
sure there's only one partial match iff there's no full one.
[NO NEW TESTS NEEDED] as it fixes a system test.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
I wasn't able to find a way to get error-checks working with the sqlite3
library with the time at hand.
[NO NEW TESTS NEEDED]
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
A number of fixes for pod creation and removal.
The important part is that matching partial IDs requires a trailing `%`
for SQL to interpret it as a wildcard. More information at
https://www.sqlitetutorial.net/sqlite-like/
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Allow replacing existing exit codes. A container may be started and
stopped multiple times.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
The value of -1 is used when we do not _yet_ know the exit code of the
container. Otherwise, the DB checks would error. There's probably a
smarter way than allowing -1, but for now that will do the trick and let
the tests progress.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
[NO NEW TESTS NEEDED] - the sqlite backend is still in development and
is not enabled by default.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
- Added a mechanism to check schema version and migrate
(no migrations yet since schema hasn't changed yet).
- Added pod support to AddContainer, and unified AddContainer and
RemoveContainer between containers and pods.
- Fixed newly-added GetPodName and GetCtrName in BoltDB so they
only return pod/container names.
Signed-off-by: Matt Heon <mheon@redhat.com>
This contains the implementation of (most) container functions,
with stubs for all pod and volume functions. Presently accessed
via environment variable only for testing purposes.
Signed-off-by: Matt Heon <mheon@redhat.com>