podman/docs/tutorials/rootless_tutorial.md
Sergio Oller 91b8bc7f13 uid/gid mapping flags
Motivation
===========

This feature aims to make --uidmap and --gidmap easier to use, especially in rootless podman setups.

(I will focus here on the --gidmap option, although the same applies for --uidmap.)

In rootless podman, the user namespace mapping happens in two steps, through an intermediate mapping.

See https://docs.podman.io/en/latest/markdown/podman-run.1.html#uidmap-container-uid-from-uid-amount
for further detail. Here is a summary:

First the user GID is mapped to 0 (root), and all subordinate GIDs (defined at /etc/subgid, and
usually >100000) are mapped starting at 1.

One way to customize the mapping is through the `--gidmap` option, which maps that intermediate mapping
to the final mapping that will be seen by the container.

As an example, let's say we have as main GID the group 1000, and we also belong to the additional GID 2000,
that we want to make accessible inside the container.

We first ask the sysadmin to subordinate the group to us, by adding "$user:2000:1" to /etc/subgid.

Then we need to use --gidmap to specify that we want to map GID 2000 into some GID inside the container.

And here is the first trouble:

Since the --gidmap option operates on the intermediate mapping, we first need to figure out where
podman has placed our GID 2000 in that intermediate mapping using:

    podman unshare cat /proc/self/gid_map

Then, we may see that GID 2000 was mapped to intermediate GID 5. So our --gidmap option should include:

    --gidmap 20000:5:1

This intermediate mapping may change in the future if further groups are subordinated to us (or if an
existing subordination is removed), so we are forced to verify the mapping with
`podman unshare cat /proc/self/gid_map` every time, and to parse it if we want to script it.
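
The lookup described above can be scripted. What follows is a minimal sketch, assuming the usual three-column `gid_map` layout (ID inside the namespace, ID outside it, range length). The function name `host_to_intermediate` and the sample table are hypothetical; the table is only shaped so that GID 2000 lands on intermediate GID 5, as in the example.

```shell
# Translate a host ID into its intermediate ID using a uid_map/gid_map table
# read from stdin. Columns: inside-ID, outside-ID, length.
# Exit status is 1 when the host ID is not covered by any range.
host_to_intermediate() {
  awk -v id="$1" '
    id >= $2 && id < $2 + $3 { print $1 + (id - $2); found = 1; exit }
    END { exit !found }
  '
}

# Hypothetical table, shaped like the output of:
#   podman unshare cat /proc/self/gid_map
sample_map='0 1000 1
1 100000 4
5 2000 1
6 200000 65536'

printf '%s\n' "$sample_map" | host_to_intermediate 2000   # prints 5
```

Against a real setup the table would come from `podman unshare cat /proc/self/gid_map | host_to_intermediate 2000`.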

**The first usability improvement** we agreed on in #18333 is to be able to use:

    --gidmap 20000:@2000:1

so podman does this lookup in the parent user namespace for us.

But this is only part of the problem. We must specify a **full** gidmap and not only what we want:

    --gidmap 0:0:5 --gidmap 5:6:15000 --gidmap 20000:5:1

This is becoming complicated. We had to break the gidmap at 5, because the intermediate 5 had to
be mapped to another value (20000), and then we had to keep mapping all other subordinate ids... up to
close to the maximum number of subordinate ids that we have (or some reasonable value). This is hard
to explain to someone who does not understand how the mappings work internally.
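
To make the arithmetic behind that full gidmap explicit, here is a small sketch that rebuilds it from three numbers. The helper `full_gidmap` is hypothetical and only covers the single-extra-group case discussed here; it assumes the intermediate ID is at least 1.

```shell
# Print the --gidmap flags that move intermediate ID $1 to container ID $2
# while keeping $3 further intermediate IDs mapped directly after the gap.
# Assumes $1 >= 1, since a zero-length first range would not be valid.
full_gidmap() (
  k=$1 target=$2 rest=$3
  printf '%s\n' "--gidmap 0:0:$k --gidmap $k:$((k + 1)):$rest --gidmap $target:$k:1"
)

full_gidmap 5 20000 15000
# prints: --gidmap 0:0:5 --gidmap 5:6:15000 --gidmap 20000:5:1
```

The middle range starts at intermediate ID `k + 1` because intermediate ID `k` has been taken out of sequence and mapped to the target instead.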

To simplify this, **the second usability improvement** is to be able to use:

   --gidmap "+20000:@2000:1"

where the plus flag (`+`) states that the given mapping should extend any previous/default mapping,
overriding any previous conflicting assignment.

Podman will set that mapping and fill the rest of mapped gids with all other subordinated gids, leading
to the same (or an equivalent) full gidmap that we were specifying before.

One final usability improvement related to this is the following:

By default, when podman gets a --gidmap argument but not a --uidmap argument, it copies the mapping.
This is convenient in many scenarios, since usually subordinated uids and gids are assigned in chunks
simultaneously, and the subordinated IDs in /etc/subuid and /etc/subgid for a given user match.

For scenarios with additional subordinated GIDs, this map copying is annoying, since it forces the user
to provide a --uidmap to prevent the copy from being made. This means that when the user wants:

    --gidmap 0:0:5 --gidmap 5:6:15000 --gidmap 20000:5:1

The user has to include a uidmap as well:

    --gidmap 0:0:5 --gidmap 5:6:15000 --gidmap 20000:5:1 --uidmap 0:0:65000

making everything even harder to understand without proper context.

For this reason, besides the "+" flag, we introduce the "u" and "g" flags. Those flags, applied to a
mapping, tell podman that the mapping should apply only to users or only to groups, and be ignored otherwise.

Therefore we can use:

   --gidmap "+g20000:@2000:1"

So the mapping only applies to groups and is ignored for uidmaps. If neither the "u" nor the "g" flag is given,
podman assumes the mapping applies to both users and groups as before, so we preserve backwards compatibility.

Co-authored-by: Tom Sweeney <tsweeney@redhat.com>
Signed-off-by: Sergio Oller <sergioller@gmail.com>
2023-08-28 20:21:04 +02:00



Basic Setup and Use of Podman in a Rootless environment.

Prior to allowing users without root privileges to run Podman, the administrator must install or build Podman and complete the following configurations.

cgroup V2 support

The cgroup V2 Linux kernel feature allows the user to limit the amount of resources a rootless container can use. If the Linux distribution that you are running Podman on has cgroup V2 enabled, then you might need to change the default OCI runtime. Some older versions of runc do not work with cgroup V2, so you might have to switch to the alternative OCI runtime crun.

The alternative OCI runtime support for cgroup V2 can also be turned on at the command line by using the --runtime option:

podman --runtime crun

or for all commands by changing the value for the "Default OCI runtime" in the containers.conf file either at the system level or at the user level from runtime = "runc" to runtime = "crun".
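
As an illustration of the containers.conf route, the sketch below writes a minimal file that sets the runtime in the `[engine]` table. The helper name `write_runtime_conf` is hypothetical, and it deliberately refuses to touch an existing file, since a real containers.conf usually carries other settings that should be edited by hand.

```shell
# Write a minimal containers.conf selecting the OCI runtime.
# $1: directory to write into; for a user-level file this would typically be
#     "${XDG_CONFIG_HOME:-$HOME/.config}/containers"
# $2: runtime name, for example crun
# Fails without changing anything if the file already exists.
write_runtime_conf() (
  dir=$1 runtime=$2
  conf="$dir/containers.conf"
  if [ -e "$conf" ]; then
    echo "$conf already exists; edit its [engine] table by hand" >&2
    exit 1
  fi
  mkdir -p "$dir" || exit 1
  printf '[engine]\nruntime = "%s"\n' "$runtime" > "$conf"
)
```

A user-level call might look like `write_runtime_conf "${XDG_CONFIG_HOME:-$HOME/.config}/containers" crun`.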

Administrator Actions

Installing Podman

For installing Podman, see the installation instructions.

Building Podman

For building Podman, see the build instructions.

Install slirp4netns

The slirp4netns package provides user-mode networking for unprivileged network namespaces and must be installed on the machine in order for Podman to run in a rootless environment. The package is available on most Linux distributions via their package distribution software such as yum, dnf, apt, zypper, etc. If the package is not available, you can build and install slirp4netns from GitHub.

Ensure fuse-overlayfs is installed

When using Podman in a rootless environment, it is recommended to use fuse-overlayfs rather than the VFS file system. For that you need the fuse-overlayfs executable available in $PATH.

Your distribution might already provide it in the fuse-overlayfs package, but be aware that you need at least version 0.7.6. This especially needs to be checked on Ubuntu distributions as fuse-overlayfs is not generally installed by default and the 0.7.6 version is not available natively on Ubuntu releases prior to 20.04.

The fuse-overlayfs project is available from GitHub, and provides instructions for easily building a static fuse-overlayfs executable.

If Podman is used before fuse-overlayfs is installed, it may be necessary to adjust the storage.conf file (see "User Configuration Files" below) to change the driver option under [storage] to "overlay" and point the mount_program option in [storage.options.overlay] to the path of the fuse-overlayfs executable:

[storage]
  driver = "overlay"

  (...)

[storage.options.overlay]

  (...)

  mount_program = "/usr/bin/fuse-overlayfs"

Enable user namespaces (on RHEL7 machines)

The number of user namespaces that are allowed on the system is specified in the file /proc/sys/user/max_user_namespaces. On most Linux platforms this is preset by default and no adjustment is necessary. However, on RHEL7 machines, a user with root privileges may need to set that to a reasonable value by using this command: sysctl user.max_user_namespaces=15000.

/etc/subuid and /etc/subgid configuration

Rootless Podman requires the user running it to have a range of UIDs listed in the files /etc/subuid and /etc/subgid. The shadow-utils or newuid package provides these files on different distributions and they must be installed on the system. Root privileges are required to add or update entries within these files. The following is a summary from the "How does rootless Podman work?" article by Dan Walsh on opensource.com.

For each user that will be allowed to create containers, update /etc/subuid and /etc/subgid for the user with fields that look like the following. Note that the values for each user must be unique. If there is overlap, there is a potential for a user to use another user's namespace and they could corrupt it.

cat /etc/subuid
johndoe:100000:65536
test:165536:65536

The format of this file is USERNAME:UID:RANGE

  • username as listed in /etc/passwd or in the output of getpwent.
  • The initial UID allocated for the user.
  • The size of the range of UIDs allocated for the user.

This means the user johndoe is allocated UIDs 100000-165535 as well as their standard UID in the /etc/passwd file. NOTE: this is not currently supported with network installs; these files must be available locally to the host machine. It is not possible to configure this with LDAP or Active Directory.
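
The range arithmetic above (first ID plus range size minus one) can be checked with a short sketch. The helper `subid_ranges` is hypothetical, and the sample file is a temporary copy of the example entries rather than the real /etc/subuid.

```shell
# Print "FIRST-LAST" for every range assigned to user $1 in file $2,
# where each line has the form NAME:START:COUNT.
subid_ranges() {
  awk -F: -v user="$1" '$1 == user { print $2 "-" ($2 + $3 - 1) }' "$2"
}

# Demonstration on a temporary file holding the example entries.
sample=$(mktemp)
printf '%s\n' 'johndoe:100000:65536' 'test:165536:65536' > "$sample"
subid_ranges johndoe "$sample"   # prints 100000-165535
rm -f "$sample"
```

On a real host the second argument would be /etc/subuid or /etc/subgid.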

If you update either /etc/subuid or /etc/subgid, you need to stop all the running containers owned by the user and kill the pause process that is running on the system for that user. This can be done automatically by using the podman system migrate command which will stop all the containers for the user and will kill the pause process.

Rather than updating the files directly, the usermod program can be used to assign UIDs and GIDs to a user.

usermod --add-subuids 100000-165535 --add-subgids 100000-165535 johndoe
grep johndoe /etc/subuid /etc/subgid
/etc/subuid:johndoe:100000:65536
/etc/subgid:johndoe:100000:65536
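
After adding ranges, it can be worth re-checking the requirement that ranges of different users do not overlap. The following is a rough sketch of such a check, assuming every line has the NAME:START:COUNT form with no comments or blank lines; `find_subid_overlaps` is a hypothetical helper, not part of shadow-utils or Podman.

```shell
# Report overlapping ranges that belong to different names in a
# subuid/subgid-style file ($1). Exit status is 1 if any overlap is found.
find_subid_overlaps() {
  sort -t: -k2,2n "$1" | awk -F: '
    # After sorting by START, a range overlaps if it begins before the
    # furthest end (exclusive) seen so far.
    NR > 1 && $2 < prev_end && $1 != prev_name {
      printf "%s overlaps %s\n", $1, prev_name
      bad = 1
    }
    NR == 1 || $2 + $3 > prev_end { prev_end = $2 + $3; prev_name = $1 }
    END { exit bad + 0 }
  '
}

# Usage (as an administrator, on the real files):
#   find_subid_overlaps /etc/subuid && find_subid_overlaps /etc/subgid
```

Because only the furthest-reaching range is tracked, the sketch may miss some overlaps when a single user holds several interleaved ranges; it is meant as a quick sanity check only.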

Giving access to additional groups

Users can fully map additional groups to a container namespace if those groups are subordinated to the user:

usermod --add-subgids 2000-2000 johndoe
grep johndoe /etc/subgid

This means the user johndoe can "impersonate" the group 2000 inside the container. Note that it is usually not a good idea to subordinate active user ids to other users, because it would allow user impersonation.

johndoe can use --group-add keep-groups to preserve the additional groups, and --gidmap="+g102000:@2000" to map the group 2000 in the host to the group 102000 in the container:

podman run \
  --rm \
  --group-add keep-groups \
  --gidmap="+g102000:@2000" \
  --volume "$PWD:/data:ro" \
  --workdir /data \
  alpine ls -lisa

Enable unprivileged ping

Users running in a non-privileged container may not be able to use the ping utility from that container.

If this is required, the administrator must verify that the GID of the user is part of the range in the /proc/sys/net/ipv4/ping_group_range file.

To change its value the administrator can use a call similar to: sysctl -w "net.ipv4.ping_group_range=0 2000000".

To make the change persist, the administrator will need to add a file with the .conf file extension in /etc/sysctl.d that contains net.ipv4.ping_group_range=0 $MAX_GID, where $MAX_GID is the highest assignable GID of the user running the container.
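
The range check can be sketched as follows. The helper `gid_in_ping_range` is hypothetical; it assumes the file holds two whitespace-separated numbers, the lowest and highest group ID allowed to open ping sockets.

```shell
# Succeed if GID $1 lies inside the "LOW HIGH" range stored in file $2
# (defaults to the kernel's ping_group_range entry).
gid_in_ping_range() (
  gid=$1
  file=${2:-/proc/sys/net/ipv4/ping_group_range}
  read -r low high < "$file" || exit 2
  [ "$gid" -ge "$low" ] && [ "$gid" -le "$high" ]
)

# Usage for the current user's primary group:
#   gid_in_ping_range "$(id -g)" && echo "ping should be permitted"
```

A range whose low value is greater than its high value, such as `1 0`, contains no group at all, so the check fails for every GID in that case.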

User Actions

The majority of the work necessary to run Podman in a rootless environment is on the shoulders of the machine's administrator.

Once the Administrator has completed the setup on the machine and then the configurations for the user in /etc/subuid and /etc/subgid, the user can just start using any Podman command that they wish.

User Configuration Files

The Podman configuration files for root reside in /usr/share/containers with overrides in /etc/containers. In the rootless environment they reside in ${XDG_CONFIG_HOME}/containers (usually ~/.config/containers) and are owned by each individual user.

The three main configuration files are containers.conf, storage.conf and registries.conf. The user can modify these files as they wish.

containers.conf

Podman reads

  1. /usr/share/containers/containers.conf
  2. /etc/containers/containers.conf
  3. $HOME/.config/containers/containers.conf

in that order, if they exist. Each file can override the previous for particular fields.

storage.conf

For storage.conf the order is

  1. /etc/containers/storage.conf
  2. $HOME/.config/containers/storage.conf

In rootless Podman certain fields in /etc/containers/storage.conf are ignored. These fields are:

graphroot=""
 container storage graph dir (default: "/var/lib/containers/storage")
 Default directory to store all writable content created by container storage programs.

runroot=""
 container storage run dir (default: "/run/containers/storage")
 Default directory to store all temporary writable content created by container storage programs.

In rootless Podman these fields default to

graphroot="$HOME/.local/share/containers/storage"
runroot="$XDG_RUNTIME_DIR/containers"

$XDG_RUNTIME_DIR defaults on most systems to /run/user/$UID.
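
As a quick way to see what those defaults expand to for a given user, the sketch below prints them from the environment. The helper `rootless_storage_defaults` is hypothetical, and its fallback to /run/user/<uid> when XDG_RUNTIME_DIR is unset is an assumption based on the "most systems" remark above, not something Podman guarantees.

```shell
# Print the rootless storage defaults described above, expanded for the
# current environment.
rootless_storage_defaults() (
  # Assumed fallback when XDG_RUNTIME_DIR is not set.
  runtime_dir=${XDG_RUNTIME_DIR:-/run/user/$(id -u)}
  printf 'graphroot="%s"\n' "$HOME/.local/share/containers/storage"
  printf 'runroot="%s"\n' "$runtime_dir/containers"
)

rootless_storage_defaults
```

The values Podman actually uses, including any overrides from the user's storage.conf, are reported in the store section of `podman info`.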

registries

Registry configuration is read in this order

  1. /etc/containers/registries.conf
  2. /etc/containers/registries.d/*
  3. $HOME/.config/containers/registries.conf

The files in the home directory should be used to configure rootless Podman for personal needs. These files are not created by default. Users can copy the files from /usr/share/containers or /etc/containers and modify them.

Authorization files

The default authorization file used by the podman login and podman logout commands resides in ${XDG_RUNTIME_DIR}/containers/auth.json.

Using volumes

Rootless Podman is not, and will never be, root; it's not a setuid binary, and gains no privileges when it runs. Instead, Podman makes use of a user namespace to shift the UIDs and GIDs of a block of users it is given access to on the host (via the newuidmap and newgidmap executables) and your own user within the containers that Podman creates.

If your container runs with the root user, then root in the container is actually your user on the host. UID/GID 1 is the first UID/GID specified in your user's mapping in /etc/subuid and /etc/subgid, etc. If you mount a directory from the host into a container as a rootless user, and create a file in that directory as root in the container, you'll see it's actually owned by your user on the host.

So, for example,

host> whoami
john

# a folder which is empty
host> ls /home/john/folder
host> podman run -v /home/john/folder:/container/volume mycontainer /bin/bash

# Now I'm in the container
root@container> whoami
root
root@container> touch /container/volume/test
root@container> ls -l /container/volume
total 0
-rw-r--r-- 1 root root 0 May 20 21:47 test
root@container> exit

# I check again
host> ls -l /home/john/folder
total 0
-rw-r--r-- 1 john john 0 May 20 21:47 test
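
The ownership seen in the session above follows from simple arithmetic on the default mapping. Here is a minimal sketch, assuming the user has a single range in /etc/subuid, no custom --uidmap is in use, and the container UID fits inside that range; `container_to_host_uid` is a hypothetical helper, and the values 1000 and 100000 are illustrative.

```shell
# Map a container UID to the host UID it corresponds to under the default
# rootless mapping.
# $1: container UID, $2: the user's own host UID,
# $3: START of the user's range in /etc/subuid
container_to_host_uid() {
  if [ "$1" -eq 0 ]; then
    # root in the container is the invoking user on the host
    echo "$2"
  else
    # container UID 1 is the first subordinate UID, and so on
    echo "$(( $3 + $1 - 1 ))"
  fi
}

container_to_host_uid 0 1000 100000    # prints 1000
container_to_host_uid 1 1000 100000    # prints 100000
```

The same calculation applies to GIDs, using the range from /etc/subgid.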

We do recognize that this doesn't really match how many people intend to use rootless Podman - they want their UID inside and outside the container to match. Thus, we provide the --userns=keep-id flag, which ensures that your user is mapped to its own UID and GID inside the container.

It is also helpful to distinguish between running Podman as a rootless user, and a container which is built to run rootless. If the container you're trying to run has a USER which is not root, then when mounting volumes you must use --userns=keep-id. This is because the container user would not be able to become root and access the mounted volumes.

Another consideration regarding volumes:

  • When providing the path of a directory you'd like to bind-mount, the path needs to be provided as an absolute path or a relative path that starts with . (a dot), otherwise the string will be interpreted as the name of a named volume.
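
The rule in the bullet above can be illustrated with a tiny classifier. This is only a restatement of the rule as written, not Podman's actual parser, and `volume_source_kind` is a hypothetical name.

```shell
# Classify the source part of a --volume argument according to the rule
# above: absolute paths and paths starting with a dot are bind mounts,
# anything else is taken as the name of a named volume.
volume_source_kind() {
  case $1 in
    /*|.*) echo "bind mount" ;;
    *)     echo "named volume" ;;
  esac
}

volume_source_kind /home/john/folder   # prints: bind mount
volume_source_kind ./folder            # prints: bind mount
volume_source_kind folder              # prints: named volume
```

In practice this means that `-v folder:/container/volume` refers to a named volume called `folder`, whereas `-v ./folder:/container/volume` bind-mounts the directory next to the current one.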

More information

If you are still experiencing problems running Podman in a rootless environment, please refer to the Shortcomings of Rootless Podman page which lists known issues and solutions to known issues in this environment.

For more information on Podman and its subcommands, follow the links on the main README.md page or the podman.io web site.