This commit adds two new annotations named
io.podman.annotations.cpuset/$ctrname and
io.podman.annotations.memory-nodes/$ctrname
The first one allows restricting a container's execution to specific
CPU cores while the second restricts memory allocations to specific
NUMA memory nodes. They are also added automatically when the
--cpuset-cpus and --cpuset-mems options are used.
Fixes: containers#26172
Signed-off-by: François Poirotte <clicky@erebot.net>
This fixes an issue where multiple paths separated by a colon were
treated as a single path, contrary to what docs say and unlike how mask
option works.
Test was updated with a case that fails without this commit.
Signed-off-by: Šimon Škoda <ver4a@uncontrol.me>
This commit adds new annotation called:
io.podman.annotations.pids-limit/$ctrname
This annotation is used to define the PIDsLimit for
a particular pod. It is also automatically defined
when newly added --pids-limit option is used.
Fixes: #24418
Signed-off-by: Jan Kaluza <jkaluza@redhat.com>
Add a new option to allow for mounting artifacts in the container, the
syntax is added to the existing --mount option:
type=artifact,src=$artifactName,dest=/path[,digest=x][,title=x]
This works very similar to image mounts. The name is passed down into
the container config and then on each start we lookup the artifact and
the figure out which blobs to mount. There is no protaction against a
user removing the artifact while still being used in a container. When
the container is running the bind mounted files will stay there (as the
kernel keeps the mounts active even if the bind source was deleted).
On the next start it will fail to start as if it does not find the
artifact. The good thing is that this technically allows someone to
update the artifact with the new file by creating a new artifact with
the same name.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Use a helper struct to hold the mounts instead of returning 5+ return
values from the functions. This allows use to easily add more volume
types without having to update all return lines every time in the
future. And 5+ return values are really not readable anymore so this
should make it easier to follow the code.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This should be set only by podman as it is used for the podman generate
systemd --new command. For the api it was set to the system service
command which is simply pointless. It must be empty in these cases.
Fixes#25026
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Fixes: https://github.com/containers/podman/issues/25002
Also add the ability to inspect containers for
UseImageHosts and UseImageHostname.
Finally fixed some bugs in handling of --no-hosts for Pods,
which I descovered.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* Add --hosts-file flag to container create, container run and pod create
* Add HostsFile field to pod inspect and container inspect results
* Test BaseHostsFile config in containers.conf
Signed-off-by: Gavin Lam <gavin.oss@tutamail.com>
New flags in a `podman update` can change the configuration of HealthCheck when the container is started, without having to restart or recreate the container.
This can help determine why a given container suddenly started failing HealthCheck without interfering with the services it provides. For example, reconfigure HealthCheck to keep logs longer than the usual last X results, store logs to other destinations, etc.
Fixes: https://issues.redhat.com/browse/RHEL-60561
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
These flags can affect the output of the HealtCheck log. Currently, when a container is configured with HealthCheck, the output from the HealthCheck command is only logged to the container status file, which is accessible via `podman inspect`.
It is also limited to the last five executions and the first 500 characters per execution.
This makes debugging past problems very difficult, since the only information available about the failure of the HealthCheck command is the generic `healthcheck service failed` record.
- The `--health-log-destination` flag sets the destination of the HealthCheck log.
- `none`: (default behavior) `HealthCheckResults` are stored in overlay containers. (For example: `$runroot/healthcheck.log`)
- `directory`: creates a log file named `<container-ID>-healthcheck.log` with JSON `HealthCheckResults` in the specified directory.
- `events_logger`: The log will be written with logging mechanism set by events_loggeri. It also saves the log to a default directory, for performance on a system with a large number of logs.
- The `--health-max-log-count` flag sets the maximum number of attempts in the HealthCheck log file.
- A value of `0` indicates an infinite number of attempts in the log file.
- The default value is `5` attempts in the log file.
- The `--health-max-log-size` flag sets the maximum length of the log stored.
- A value of `0` indicates an infinite log length.
- The default value is `500` log characters.
Add --health-max-log-count flag
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
Add --health-max-log-size flag
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
Add --health-log-destination flag
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
The pod was set after we checked the namespace and the namespace code
only checked the --pod flag but didn't consider --pod-id-file option.
As such fix the check to first set the pod option on the spec then use
that for the namespace. Also make sure we always use an empty default
otherwise it would be impossible in the backend to know if a user
requested a specific userns or not, i.e. even in case of a set
PODMAN_USERNS env a container should still get the userns from the pod
and not use the var in this case. Therefore unset it from the default
cli value.
There are more issues here around --pod-id-file and cli validation that
does not consider the option as conflicting with --userns like --pod
does but I decided to fix the bug at hand and don't try to fix the
entire mess which most likely would take days.
Fixes#22931
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Moving from Go module v4 to v5 prepares us for public releases.
Move done using gomove [1] as with the v3 and v4 moves.
[1] https://github.com/KSubedi/gomove
Signed-off-by: Matt Heon <mheon@redhat.com>
SpecGen is our primary container creation abstraction, and is
used to connect our CLI to the Libpod container creation backend.
Because container creation has a million options (I exaggerate
only slightly), the struct is composed of several other structs,
many of which are quite large.
The core problem is that SpecGen is also an API type - it's used
in remote Podman. There, we have a client and a server, and we
want to respect the server's containers.conf. But how do we tell
what parts of SpecGen were set by the client explicitly, and what
parts were not? If we're not using nullable values, an explicit
empty string and a value never being set are identical - and we
can't tell if it's safe to grab a default from the server's
containers.conf.
Fortunately, we only really need to do this for booleans. An
empty string is sufficient to tell us that a string was unset
(even if the user explicitly gave us an empty string for an
option, filling in a default from the config file is acceptable).
This makes things a lot simpler. My initial attempt at this
changed everything, including strings, and it was far larger and
more painful.
Also, begin the first steps of removing all uses of
containers.conf defaults from client-side. Two are gone entirely,
the rest are marked as remove-when-possible.
[NO NEW TESTS NEEDED] This is just a refactor.
Signed-off-by: Matt Heon <mheon@redhat.com>
Cut is a cleaner & more performant api relative to SplitN(_, _, 2) added in go 1.18
Previously applied this refactoring to buildah:
https://github.com/containers/buildah/pull/5239
Signed-off-by: Philip Dubé <philip@peerdb.io>
add a new option --preserve-fd that allows to specify a list of FDs to
pass down to the container.
It is similar to --preserve-fds but it allows to specify a list of FDs
instead of the maximum FD number to preserve.
--preserve-fd and --preserve-fds are mutually exclusive.
It requires crun since runc would complain if any fd below
--preserve-fds is not preserved.
Closes: https://github.com/containers/podman/issues/20844
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Add --rdt-class=COS to the create and run command to enable the
assignment of a container to a Class of Service (COS). The COS
represents a part of the cache based on the Cache Allocation Technology
(CAT) feature that is part of Intel's Resource Director Technology
(Intel RDT) feature set. By assigning a container to a COS, all PID's of
the container have only access to the cache space defined for this COS.
The COS has to be pre-configured based on the resctrl kernel driver.
cat_l2 and cat_l3 flags in /proc/cpuinfo represent CAT support for cache
level 2 and 3 respectively.
Signed-off-by: Wolfgang Pross <wolfgang.pross@intel.com>
The original SELinux support in Docker and Podman does not follow the
default SELinux rules for how label transitions are supposed to be
handled. Containers always switch their user and role to
system_u:system_r, rather then maintain the collers user and role.
For example
unconfined_u:unconfined_r:container_t:s0:c1,c2
Advanced SELinux administrators want to confine users but still allow
them to create containers from their role, but not allow them to launch
a privileged container like spc_t.
This means if a user running as
container_user_u:container_user_r:container_user_t:s0
Ran a container they would get
container_user_u:container_user_r:container_t:s0:c1,c2
If they run a privileged container they would run it with:
container_user_u:container_user_r:container_user_t:s0
If they want to force the label they would get an error
podman run --security-opt label=type:spc_t ...
Should fail. Because the container_user_r can not run with the spc_t.
SELinux rules would also prevent the user from forcing system_u user and
the sytem_r role.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Signed-off-by: Chris Evich <cevich@redhat.com>
the combination --pod and --userns is already blocked. Ignore the
PODMAN_USERNS variable when a pod is used, since it would cause to
create a new user namespace for the container.
Ideally a container should be able to do that, but its user namespace
must be a child of the pod user namespace, not a sibling. Since
nested user namespaces are not allowed in the OCI runtime specs,
disallow this case, since the end result is just confusing for the
user.
Closes: https://github.com/containers/podman/issues/18580
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Add --restart flag to pod create to allow users to set the
restart policy for the pod, which applies to all the containers
in the pod. This reuses the restart policy already there for
containers and has the same restart policy options.
Add "never" to the restart policy options to match k8s syntax.
It is a synonym for "no" and does the exact same thing where the
containers are not restarted once exited.
Only the containers that have exited will be restarted based on the
restart policy, running containers will not be restarted when an exited
container is restarted in the same pod (same as is done in k8s).
Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
The default_ulimits field is currently ignored in podman run commands.
This PR fixes this.
Fixes: https://github.com/containers/podman/issues/17396
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Currently Podman prevents SELinux container separation,
when running within a container. This PR adds a new
--security-opt label=nested
When setting this option, Podman unmasks and mountsi
/sys/fs/selinux into the containers making /sys/fs/selinux
fully exposed. Secondly Podman sets the attribute
run.oci.mount_context_type=rootcontext
This attribute tells crun to mount volumes with rootcontext=MOUNTLABEL
as opposed to context=MOUNTLABEL.
With these two settings Podman inside the container is allowed to set
its own SELinux labels on tmpfs file systems mounted into its parents
container, while still being confined by SELinux. Thus you can have
nested SELinux labeling inside of a container.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* add tests
* add documentation for --shm-size-systemd
* add support for both pod and standalone run
Signed-off-by: danishprakash <danish.prakash@suse.com>
Reserved annotations are used internally by Podman and would effect
nothing when run with Kubernetes so we should not be generating these
annotations.
Fixes: https://github.com/containers/podman/issues/17105
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
If you are running temporary containers within podman play kube
we should really be running these in read-only mode. For automotive
they plan on running all of their containers in read-only temporal
mode. Adding this option guarantees that the container image is not
being modified during the running of the container.
The containers can only write to tmpfs mounted directories.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
False is the assumed value, and inspect and podman generate kube are
being cluttered with a ton of annotations that indicate nothing.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
The containers should be able to write to tmpfs mounted directories.
Also cleanup output of podman kube generate to not show default values.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Startup healthchecks are similar to K8S startup probes, in that
they are a separate check from the regular healthcheck that runs
before it. If the startup healthcheck fails repeatedly, the
associated container is restarted.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Podman adds an Error: to every error message. So starting an error
message with "error" ends up being reported to the user as
Error: error ...
This patch removes the stutter.
Also ioutil.ReadFile errors report the Path, so wrapping the err message
with the path causes a stutter.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
For systems that have extreme robustness requirements (edge devices,
particularly those in difficult to access environments), it is important
that applications continue running in all circumstances. When the
application fails, Podman must restart it automatically to provide this
robustness. Otherwise, these devices may require customer IT to
physically gain access to restart, which can be prohibitively difficult.
Add a new `--on-failure` flag that supports four actions:
- **none**: Take no action.
- **kill**: Kill the container.
- **restart**: Restart the container. Do not combine the `restart`
action with the `--restart` flag. When running inside of
a systemd unit, consider using the `kill` or `stop`
action instead to make use of systemd's restart policy.
- **stop**: Stop the container.
To remain backwards compatible, **none** is the default action.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
podman update allows users to change the cgroup configuration of an existing container using the already defined resource limits flags
from podman create/run. The supported flags in crun are:
this command is also now supported in the libpod api via the /libpod/containers/<CID>/update endpoint where
the resource limits are passed inthe request body and follow the OCI resource spec format
–memory
–cpus
–cpuset-cpus
–cpuset-mems
–memory-swap
–memory-reservation
–cpu-shares
–cpu-quota
–cpu-period
–blkio-weight
–cpu-rt-period
–cpu-rt-runtime
-device-read-bps
-device-write-bps
-device-read-iops
-device-write-iops
-memory-swappiness
-blkio-weight-device
resolves#15067
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Code from this dependency was replaced with a simple version. As a
result Podman's memory consumption has been reduced by ~10%.
[NO NEW TESTS NEEDED]
Signed-off-by: Mikhail Khachayants <tyler92@inbox.ru>
Allow end users to preprocess default environment variables before
injecting them into container using `--env-merge`
Usage
```
podman run -it --rm --env-merge some=${some}-edit --env-merge
some2=${some2}-edit2 myimage sh
```
Closes: https://github.com/containers/podman/issues/15288
Signed-off-by: Aditya R <arajan@redhat.com>
pod resource limits introduced a regression where `FinishThrottleDevices` was not called for create/run
Signed-off-by: Charlie Doern <cdoern@redhat.com>
added the following flags and handling for podman pod create
--memory-swap
--cpuset-mems
--device-read-bps
--device-write-bps
--blkio-weight
--blkio-weight-device
--cpu-shares
given the new backend for systemd in c/common, all of these can now be exposed to pod create.
most of the heavy lifting (nearly all) is done within c/common. However, some rewiring needed to be done here
as well!
Signed-off-by: Charlie Doern <cdoern@redhat.com>
We now use the golang error wrapping format specifier `%w` instead of
the deprecated github.com/pkg/errors package.
[NO NEW TESTS NEEDED]
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
Previously, if a container had healthchecks disabled in the
docker-compose.yml file and the user did a `podman inspect <container>`,
they would have an incorrect output:
```
"Healthcheck":{
"Test":[
"CMD-SHELL",
"NONE"
],
"Interval":30000000000,
"Timeout":30000000000,
"Retries":3
}
```
After a quick change, the correct output is now the result:
```
"Healthcheck":{
"Test":[
"NONE"
]
}
```
Additionally, I extracted the hard-coded strings that were used for
comparisons into constants in `libpod/define` to prevent a similar issue
from recurring.
Closes: #14493
Signed-off-by: Jake Correnti <jcorrenti13@gmail.com>