166 Commits

Author SHA1 Message Date
3967c46544 vendor: update opencontainers/runtime-spec
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-08-21 19:06:04 +02:00
bd63a252f3 Don't limit the size on /run for systemd based containers
We had a customer incident where they ran out of space on /run.

If you don't specify size, it will be still limited to 50% or memory
available in the cgroup the container is running in.  If the cgroup is
unlimited then the /run will be limited to 50% of the total memory
on the system.

Also /run is mounted on the host as exec, so no reason for us to mount
it noexec.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2020-08-18 14:31:00 -04:00
7b3cf0c085 Change /sys/fs/cgroup/systemd mount to rprivate
I used the wrong propagation first time around because I forgot
that rprivate is the default propagation. Oops. Switch to
rprivate so we're using the default.

Signed-off-by: Matthew Heon <mheon@redhat.com>
2020-08-12 09:15:02 -04:00
a064cfc99b Ensure correct propagation for cgroupsv1 systemd cgroup
On cgroups v1 systems, we need to mount /sys/fs/cgroup/systemd
into the container. We were doing this with no explicit mount
propagation tag, which means that, under some circumstances, the
shared mount propagation could be chosen - which, combined with
the fact that we need a mount to mask
/sys/fs/cgroup/systemd/release_agent in the container, means we
would leak a never-ending set of mounts under
/sys/fs/cgroup/systemd/ on container restart.

Fortunately, the fix is very simple - hardcode mount propagation
to something that won't leak.

Signed-off-by: Matthew Heon <mheon@redhat.com>
2020-08-11 09:53:36 -04:00
333d9af77a Ensure WORKDIR from images is created
A recent crun change stopped the creation of the container's
working directory if it does not exist. This is arguably correct
for user-specified directories, to protect against typos; it is
definitely not correct for image WORKDIR, where the image author
definitely intended for the directory to be used.

This makes Podman create the working directory and chown it to
container root, if it does not already exist, and only if it was
specified by an image, not the user.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2020-08-03 14:44:52 -04:00
044a7cb100 Merge pull request #6991 from mheon/change_passwd_ondisk
Make changes to /etc/passwd on disk for non-read only
2020-07-29 14:27:50 -04:00
a5e37ad280 Switch all references to github.com/containers/libpod -> podman
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2020-07-28 08:23:45 -04:00
bae6853906 Make changes to /etc/passwd on disk for non-read only
Bind-mounting /etc/passwd into the container is problematic
becuase of how system utilities like `useradd` work. They want
to make a copy and then rename to try to prevent breakage; this
is, unfortunately, impossible when the file they want to rename
is a bind mount. The current behavior is fine for read-only
containers, though, because we expect useradd to fail in those
cases.

Instead of bind-mounting, we can edit /etc/passwd in the
container's rootfs. This is kind of gross, because the change
will show up in `podman diff` and similar tools, and will be
included in images made by `podman commit`. However, it's a lot
better than breaking important system tools.

Fixes #6953

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2020-07-23 14:27:19 -04:00
4c4a00f63e Support default profile for apparmor
Currently you can not apply an ApparmorProfile if you specify
--privileged.  This patch will allow both to be specified
simultaniosly.

By default Apparmor should be disabled if the user
specifies --privileged, but if the user specifies --security apparmor:PROFILE,
with --privileged, we should do both.

Added e2e run_apparmor_test.go

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2020-07-22 06:27:20 -04:00
d4d3fbc155 Add --umask flag for create, run
--umask sets the umask inside the container
Defaults to 0022

Co-authored-by: Daniel J Walsh <dwalsh@redhat.com>
Signed-off-by: Ashley Cui <acui@redhat.com>
2020-07-21 14:22:30 -04:00
020d81f113 Add support for overlay volume mounts in podman.
Add support -v for overlay volume mounts in podman.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>

Signed-off-by: Qi Wang <qiwan@redhat.com>
2020-07-20 09:48:55 -04:00
1ad7042a34 Preserve passwd on container restart
We added code to create a `/etc/passwd` file that we bind-mount
into the container in some cases (most notably,
`--userns=keep-id` containers). This, unfortunately, was not
persistent, so user-added users would be dropped on container
restart. Changing where we store the file should fix this.

Further, we want to ensure that lookups of users in the container
use the right /etc/passwd if we replaced it. There was already
logic to do this, but it only worked for user-added mounts; it's
easy enough to alter it to use our mounts as well.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2020-07-15 10:25:46 -04:00
4b784b377c Remove all instances of named return "err" from Libpod
This was inspired by https://github.com/cri-o/cri-o/pull/3934 and
much of the logic for it is contained there. However, in brief,
a named return called "err" can cause lots of code confusion and
encourages using the wrong err variable in defer statements,
which can make them work incorrectly. Using a separate name which
is not used elsewhere makes it very clear what the defer should
be doing.

As part of this, remove a large number of named returns that were
not used anywhere. Most of them were once needed, but are no
longer necessary after previous refactors (but were accidentally
retained).

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2020-07-09 13:54:47 -04:00
6c6670f12a Add username to /etc/passwd inside of container if --userns keep-id
If I enter a continer with --userns keep-id, my UID will be present
inside of the container, but most likely my user will not be defined.

This patch will take information about the user and stick it into the
container.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2020-07-07 08:34:31 -04:00
9532509c50 Merge pull request #6836 from ashley-cui/tzlibpod
Add --tz flag to create, run
2020-07-06 13:28:20 -04:00
8489dc4345 move go module to v2
With the advent of Podman 2.0.0 we crossed the magical barrier of go
modules.  While we were able to continue importing all packages inside
of the project, the project could not be vendored anymore from the
outside.

Move the go module to new major version and change all imports to
`github.com/containers/libpod/v2`.  The renaming of the imports
was done via `gomove` [1].

[1] https://github.com/KSubedi/gomove

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
2020-07-06 15:50:12 +02:00
9a1543caec Add --tz flag to create, run
--tz flag sets timezone inside container
Can be set to IANA timezone as well as `local` to match host machine

Signed-off-by: Ashley Cui <acui@redhat.com>
2020-07-02 13:30:59 -04:00
6ee5f740a4 podman: add new cgroup mode split
When running under systemd there is no need to create yet another
cgroup for the container.

With conmon-delegated the current cgroup will be split in two sub
cgroups:

- supervisor
- container

The supervisor cgroup will hold conmon and the podman process, while
the container cgroup is used by the OCI runtime (using the cgroupfs
backend).

Closes: https://github.com/containers/libpod/issues/6400

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-06-25 17:16:12 +02:00
5b3503c0a1 Add container name to the /etc/hosts within the container
This will allow containers that connect to the network namespace be
able to use the container name directly.

For example you can do something like

podman run -ti --name foobar fedora ping foobar

While we can do this with hostname now, this seems more natural.

Also if another container connects on the network to this container it
can do

podman run --network container:foobar fedora ping foobar

And connect to the original container,without having to discover the name.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2020-06-20 06:20:46 -04:00
200cfa41a4 Turn on More linters
- misspell
    - prealloc
    - unparam
    - nakedret

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2020-06-15 07:05:56 -04:00
8ef1b461ae libpod: fix check for slirp4netns netns
fix the check for c.state.NetNS == nil.  Its value is changed in the
first code block, so the condition is always true in the second one
and we end up running slirp4netns twice.

Closes: https://github.com/containers/libpod/issues/6538

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-06-11 13:06:26 +02:00
6c27e27b8c container: do not set hostname when joining uts
do not set the hostname when joining an UTS namespace, as it could be
owned by a different userns.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-06-10 14:52:10 +02:00
a389eab8d1 container: make resolv.conf and hosts accessible in userns
when running in a new userns, make sure the resolv.conf and hosts
files bind mounted from another container are accessible to root in
the userns.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-06-10 14:46:48 +02:00
77e4b077b9 check --user range for rootless containers
Check --user range if it's a uid for rootless containers. Returns error if it is out of the range. From https://github.com/containers/libpod/issues/6431#issuecomment-636124686

Signed-off-by: Qi Wang <qiwan@redhat.com>
2020-06-02 11:28:58 -04:00
35829854a2 Fix mountpont in SecretMountsWithUIDGID
In FIPS Mode we expect to work off of the Mountpath not the Rundir path.
This is causing FIPS Mode checks to fail.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2020-05-19 16:33:24 -04:00
b69ba30b14 libpod: set hostname from joined container
when joining a UTS namespace, take the hostname from the destination
container.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-04-27 17:08:53 +02:00
e62d081770 Update podman to use containers.conf
Add more default options parsing

Switch to using --time as opposed to --timeout to better match Docker.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2020-04-20 16:11:36 -04:00
3a0a727110 userns: support --userns=auto
automatically pick an empty range and create an user namespace for the
container.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-04-06 16:32:36 +02:00
4352d58549 Add support for containers.conf
vendor in c/common config pkg for containers.conf

Signed-off-by: Qi Wang qiwan@redhat.com
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2020-03-27 14:36:03 -04:00
b6954758bb Attempt manual removal of CNI IP allocations on refresh
We previously attempted to work within CNI to do this, without
success. So let's do it manually, instead. We know where the
files should live, so we can remove them ourselves instead. This
solves issues around sudden reboots where containers do not have
time to fully tear themselves down, and leave IP address
allocations which, for various reasons, are not stored in tmpfs
and persist through reboot.

Fixes #5433

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2020-03-19 17:20:31 -04:00
b41c864d56 Ensure that exec sessions inherit supplemental groups
This corrects a regression from Podman 1.4.x where container exec
sessions inherited supplemental groups from the container, iff
the exec session did not specify a user.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2020-02-28 11:32:56 -05:00
97323808ed Add network options to podman pod create
Enables most of the network-related functionality from
`podman run` in `podman pod create`. Custom CNI networks can be
specified, host networking is supported, DNS options can be
configured.

Also enables host networking in `podman play kube`.

Fixes #2808
Fixes #3837
Fixes #4432
Fixes #4718
Fixes #4770

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2020-02-19 11:29:30 -05:00
67165b7675 make lint: enable gocritic
`gocritic` is a powerful linter that helps in preventing certain kinds
of errors as well as enforcing a coding style.

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
2020-01-13 14:27:02 +01:00
225c7ae6c9 Correctly export the root file-system changes
When doing a checkpoint with --export the root file-system diff was not
working as expected. Instead of getting the changes from the running
container to the highest storage layer it got the changes from the
highest layer to that parent's layer. For a one layer container this
could mean that the complete root file-system is part of the checkpoint.

With this commit this changes to use the same functionality as 'podman
diff'. This actually enables to correctly diff the root file-system
including tracking deleted files.

This also removes the non-working helper functions from libpod/diff.go.

Signed-off-by: Adrian Reber <areber@redhat.com>
2019-12-09 13:29:36 +01:00
b0b9103cca Allow chained network namespace containers
The code currently assumes that the container we delegate network
namespace to will never further delegate to another container, so
when looking up things like /etc/hosts and /etc/resolv.conf we
won't pull the correct files from the chained dependency. The
changes to resolve this are relatively simple - just need to keep
looking until we find a container without NetNsCtr set.

Fixes #4626

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-12-03 10:27:15 -05:00
e4275b3453 Merge pull request #4493 from mheon/add_removing_state
Add ContainerStateRemoving
2019-12-02 16:31:11 +01:00
5e43c7cde1 Disable checkpointing of containers started with --rm
Trying to checkpoint a container started with --rm works, but it makes
no sense as the container, including the checkpoint, will be deleted
after writing the checkpoint. This commit inhibits checkpointing
containers started with '--rm' unless '--export' is used. If the
checkpoint is exported it can easily be restored from the exported
checkpoint, even if '--rm' is used. To restore a container from a
checkpoint it is even necessary to manually run 'podman rm' if the
container is not started with '--rm'.

Signed-off-by: Adrian Reber <areber@redhat.com>
2019-11-28 20:25:45 +01:00
25cc43c376 Add ContainerStateRemoving
When Libpod removes a container, there is the possibility that
removal will not fully succeed. The most notable problems are
storage issues, where the container cannot be removed from
c/storage.

When this occurs, we were faced with a choice. We can keep the
container in the state, appearing in `podman ps` and available for
other API operations, but likely unable to do any of them as it's
been partially removed. Or we can remove it very early and clean
up after it's already gone. We have, until now, used the second
approach.

The problem that arises is intermittent problems removing
storage. We end up removing a container, failing to remove its
storage, and ending up with a container permanently stuck in
c/storage that we can't remove with the normal Podman CLI, can't
use the name of, and generally can't interact with. A notable
cause is when Podman is hit by a SIGKILL midway through removal,
which can consistently cause `podman rm` to fail to remove
storage.

We now add a new state for containers that are in the process of
being removed, ContainerStateRemoving. We set this at the
beginning of the removal process. It notifies Podman that the
container cannot be used anymore, but preserves it in the DB
until it is fully removed. This will allow Remove to be run on
these containers again, which should successfully remove storage
if it fails.

Fixes #3906

Signed-off-by: Matthew Heon <mheon@redhat.com>
2019-11-19 15:38:03 -05:00
368d2ecfb6 container-restore: Fix restore with user namespace
When restoring a container with user namespace, the user namespace is
created by the OCI runtime, and the network namespace is created after
the user namespace to ensure correct ownership.

In this case PostConfigureNetNS will be set and the value of
c.state.NetNS would be nil. Hence, the following error occurs:

    $ sudo podman run --name cr \
	   --uidmap 0:1000:500 \
	   -d docker.io/library/alpine \
	   /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'

    $ sudo podman container checkpoint cr
    $ sudo podman container restore cr
    ...
    panic: runtime error: invalid memory address or nil pointer dereference
    [signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x13a5e3c]

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2019-11-17 00:34:02 +00:00
2497b6c77b podman: add support for specifying MAC
I basically copied and adapted the statements for setting IP.

Closes #1136

Signed-off-by: Jakub Filak <jakub.filak@sap.com>
2019-11-06 16:22:19 +01:00
2a149ad90a Vendor in latest containers/buildah
Pull in changes to pkg/secrets/secrets.go that adds the
logic to disable fips mode if a pod/container has a
label set.

Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
2019-11-01 09:41:09 -04:00
11c282ab02 add libpod/config
Refactor the `RuntimeConfig` along with related code from libpod into
libpod/config.  Note that this is a first step of consolidating code
into more coherent packages to make the code more maintainable and less
prone to regressions on the long runs.

Some libpod definitions were moved to `libpod/define` to resolve
circular dependencies.

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
2019-10-31 17:42:37 +01:00
0d5d6dab57 systemd: mask /sys/fs/cgroup/systemd/release_agent
when running in systemd mode on cgroups v1, make sure the
/sys/fs/cgroup/systemd/release_agent is masked otherwise the container
is able to modify it and execute scripts on the host.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-10-25 21:50:29 +02:00
b6a7d88397 When restoring containers, reset cgroup path
Previously, `podman checkport restore` with exported containers,
when told to create a new container based on the exported
checkpoint, would create a new container, with a new container
ID, but not reset CGroup path - which contained the ID of the
original container.

If this was done multiple times, the result was two containers
with the same cgroup paths. Operations on these containers would
this have a chance of crossing over to affect the other one; the
most notable was `podman rm` once it was changed to use the --all
flag when stopping the container; all processes in the cgroup,
including the ones in the other container, would be stopped.

Reset cgroups on restore to ensure that the path matches the ID
of the container actually being run.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-10-10 14:53:29 -04:00
6f630bc09b Move OCI runtime implementation behind an interface
For future work, we need multiple implementations of the OCI
runtime, not just a Conmon-wrapped runtime matching the runc CLI.

As part of this, do some refactoring on the interface for exec
(move to a struct, not a massive list of arguments). Also, add
'all' support to Kill and Stop (supported by runc and used a bit
internally for removing containers).

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-10-10 10:19:32 -04:00
e4835f6b01 Merge pull request #4086 from mheon/cni_del_on_refresh
Force a CNI Delete on refreshing containers
2019-09-25 09:35:40 +02:00
b57d2f4cc7 Force a CNI Delete on refreshing containers
CNI expects that a DELETE be run before re-creating container
networks. If a reboot occurs quickly enough that containers can't
stop and clean up, that DELETE never happens, and Podman
currently wipes the old network info and thinks the state has
been entirely cleared. Unfortunately, that may not be the case on
the CNI side. Some things - like IP address reservations - may
not have been cleared.

To solve this, manually re-run CNI Delete on refresh. If the
container has already been deleted this seems harmless. If not,
it should clear lingering state.

Fixes: #3759

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-09-24 09:52:11 -04:00
5813c8246e rootless: Rearrange setup of rootless containers
In order to run Podman with VM-based runtimes unprivileged, the
network must be set up prior to the container creation. Therefore
this commit modifies Podman to run rootless containers by:
  1. create a network namespace
  2. pass the netns persistent mount path to the slirp4netns
     to create the tap inferface
  3. pass the netns path to the OCI spec, so the runtime can
     enter the netns

Closes #2897

Signed-off-by: Gabi Beyer <gabrielle.n.beyer@intel.com>
2019-09-24 11:01:28 +02:00
fb353f6f42 execuser: look at the source for /etc/{passwd,group} overrides
look if there are bind mounts that can shadow the /etc/passwd and
/etc/group files.  In that case, look at the bind mount source.

Closes: https://github.com/containers/libpod/pull/4068#issuecomment-533782941

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-09-21 22:11:09 +02:00
e42e1c45ae container: make sure $HOME is always set
If the HOME environment variable is not set, make sure it is set to
the configuration found in the container /etc/passwd file.

It was previously depending on a runc behavior that always set HOME
when it is not set.  The OCI runtime specifications do not require
HOME to be set so move the logic to libpod.

Closes: https://github.com/debarshiray/toolbox/issues/266

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-09-20 16:01:38 +02:00