58 Commits

Author SHA1 Message Date
b87bdced1f Fix up handling of user defined network namespaces
If user specifies network namespace and the /etc/netns/XXX/resolv.conf
exists, we should use this rather then /etc/resolv.conf

Also fail cleaner if the user specifies an invalid Network Namespace.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2019-02-23 05:47:27 -05:00
7141f97270 OpenTracing support added to start, stop, run, create, pull, and ps
Drop context.Context field from cli.Context

Signed-off-by: Sebastian Jug <sejug@redhat.com>
2019-02-18 09:57:08 -05:00
52df1fa7e0 Fix volume handling in podman
iFix builtin volumes to work with podman volume

Currently builtin volumes are not recored in podman volumes when
they are created automatically. This patch fixes this.

Remove container volumes when requested

Currently the --volume option on podman remove does nothing.
This will implement the changes needed to remove the volumes
if the user requests it.

When removing a volume make sure that no container uses the volume.

Signed-off-by: Daniel J Walsh dwalsh@redhat.com
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2019-02-14 13:21:52 -05:00
5c86efb289 Merge pull request #2138 from giuseppe/rootless-pod-fix
rootless: fix usage of create --pod=new:FOO
2019-01-11 15:42:21 -08:00
b3e7be7a0b spec: add nosuid,noexec,nodev to ro bind mount
runc fails to change the ro mode of a rootless bind mount if the other
flags are not kept.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-01-11 10:34:30 +01:00
3966d3bf4e Replace tab with spaces in MarshalIndent in libpod
The json-iterator package will panic on attempting to use
MarshalIndent with a non-space indentation. This is sort of silly
but swapping from tabs to spaces is not a big issue for us, so
let's work around the silly panic.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-01-10 15:48:09 -05:00
167d50a9fa Move all libpod/ JSON references over to jsoniter
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-01-10 15:48:09 -05:00
edb285d176 apparmor: apply default profile at container initialization
Apply the default AppArmor profile at container initialization to cover
all possible code paths (i.e., podman-{start,run}) before executing the
runtime.  This allows moving most of the logic into pkg/apparmor.

Also make the loading and application of the default AppArmor profile
versio-indepenent by checking for the `libpod-default-` prefix and
over-writing the profile in the run-time spec if needed.

The intitial run-time spec of the container differs a bit from the
applied one when having started the container, which results in
displaying a potentially outdated AppArmor profile when inspecting
a container.  To fix that, load the container config from the file
system if present and use it to display the data.

Fixes: #2107
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
2019-01-09 22:18:11 +01:00
7b9d4f1c92 Merge pull request #2061 from adrianreber/static-ip
Use existing interface to request IP address during restore
2019-01-09 07:41:47 -08:00
2553dad766 Use existing interface to request IP address during restore
The initial implementation to request the same IP address for a
container during a restore was based on environment variables
influencing CNI.

With this commit the IP address selection switches to Podman's internal
static IP API.

This commit does a comment change in libpod/container_easyjson.go to
avoid unnecessary re-generation of libpod/container_easyjson.go during
build as this fails in CI. The reason for this is that make sees that
libpod/container_easyjson.go needs to be re-created. The commit,
however, only changes a part of libpod/container.go which is marked as
'ffjson: skip'.

Signed-off-by: Adrian Reber <areber@redhat.com>
2019-01-09 07:34:57 +01:00
f6a2b6bf2b hooks: Add pre-create hooks for runtime-config manipulation
There's been a lot of discussion over in [1] about how to support the
NVIDIA folks and others who want to be able to create devices
(possibly after having loaded kernel modules) and bind userspace
libraries into the container.  Currently that's happening in the
middle of runc's create-time mount handling before the container
pivots to its new root directory with runc's incorrectly-timed
prestart hook trigger [2].  With this commit, we extend hooks with a
'precreate' stage to allow trusted parties to manipulate the config
JSON before calling the runtime's 'create'.

I'm recycling the existing Hook schema from pkg/hooks for this,
because we'll want Timeout for reliability and When to avoid the
expense of fork/exec when a given hook does not need to make config
changes [3].

[1]: https://github.com/opencontainers/runc/pull/1811
[2]: https://github.com/opencontainers/runc/issues/1710
[3]: https://github.com/containers/libpod/issues/1828#issuecomment-439888059

Signed-off-by: W. Trevor King <wking@tremily.us>
2019-01-08 21:06:17 -08:00
df99522c67 Fixes to handle /dev/shm correctly.
We had two problems with /dev/shm, first, you mount the
container read/only then /dev/shm was mounted read/only.
This is a bug a tmpfs directory should be read/write within
a read-only container.

The second problem is we were ignoring users mounted /dev/shm
from the host.

If user specified

podman run -d -v /dev/shm:/dev/shm ...

We were dropping this mount and still using the internal mount.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2018-12-24 09:03:53 -05:00
aa9507054d Containers sharing a netns should share resolv/hosts
When sharing a network namespace, containers should also share
resolv.conf and /etc/hosts in case a container process made
changes to either (for example, if I set up a VPN client in
container A and join container B to its network namespace, I
expect container B to use the DNS servers from A to ensure it can
see everything on the VPN).

Resolves: #1546

Signed-off-by: Matthew Heon <mheon@redhat.com>
2018-12-11 16:56:11 -05:00
bc57ecec42 Prevent a second lookup of user for image volumes
Instead of forcing another user lookup when mounting image
volumes, just use the information we looked up when we started
generating the spec.

This may resolve #1817

Signed-off-by: Matthew Heon <mheon@redhat.com>
2018-12-11 13:36:50 -05:00
39a036e24d bind mount /etc/resolv.conf|hosts in pods
containers inside pods need to make sure they get /etc/resolv.conf
and /etc/hosts bind mounted when network is expected

Signed-off-by: baude <bbaude@redhat.com>
2018-12-06 13:56:57 -06:00
be74acee1c Merge pull request #1940 from wking/numeric-gid
libpod/container_internal_linux: Allow gids that aren't in the group file
2018-12-05 08:09:58 -08:00
650f95cb06 libpod/container_internal_linux: Allow gids that aren't in the group file
When an image config sets config.User [1] to a numeric group (like
1000:1000), but those values do not exist in the container's
/etc/group, libpod is currently breaking:

  $ podman run --rm registry.svc.ci.openshift.org/ci-op-zvml7cd6/pipeline:installer --help
  error creating temporary passwd file for container 228f6e9943d6f18b93c19644e9b619ec4d459a3e0eb31680e064eeedf6473678: unable to get gid 1000 from group file: no matching entries in group file

However, the OCI spec requires converters to copy numeric uid and gid
to the runtime config verbatim [2].

With this commit, I'm frontloading the "is groupspec an integer?"
check and only bothering with lookup.GetGroup when it was not.

I've also removed a few .Mounted checks, which are originally from
00d38cb3 (podman create/run need to load information from the image,
2017-12-18, #110).  We don't need a mounted container filesystem to
translate integers.  And when the lookup code needs to fall back to
the mounted root to translate names, it can handle erroring out
internally (and looking it over, it seems to do that already).

[1]: https://github.com/opencontainers/image-spec/blame/v1.0.1/config.md#L118-L123
[2]: https://github.com/opencontainers/image-spec/blame/v1.0.1/conversion.md#L70

Signed-off-by: W. Trevor King <wking@tremily.us>
2018-12-04 12:00:42 -08:00
a4b483c848 libpod/container_internal: Deprecate implicit hook directories
Part of the motivation for 800eb863 (Hooks supports two directories,
process default and override, 2018-09-17, #1487) was [1]:

> We only use this for override. The reason this was caught is people
> are trying to get hooks to work with CoreOS. You are not allowed to
> write to /usr/share... on CoreOS, so they wanted podman to also look
> at /etc, where users and third parties can write.

But we'd also been disabling hooks completely for rootless users.  And
even for root users, the override logic was tricky when folks actually
had content in both directories.  For example, if you wanted to
disable a hook from the default directory, you'd have to add a no-op
hook to the override directory.

Also, the previous implementation failed to handle the case where
there hooks defined in the override directory but the default
directory did not exist:

  $ podman version
  Version:       0.11.2-dev
  Go Version:    go1.10.3
  Git Commit:    "6df7409cb5a41c710164c42ed35e33b28f3f7214"
  Built:         Sun Dec  2 21:30:06 2018
  OS/Arch:       linux/amd64
  $ ls -l /etc/containers/oci/hooks.d/test.json
  -rw-r--r--. 1 root root 184 Dec  2 16:27 /etc/containers/oci/hooks.d/test.json
  $ podman --log-level=debug run --rm docker.io/library/alpine echo 'successful container' 2>&1 | grep -i hook
  time="2018-12-02T21:31:19-08:00" level=debug msg="reading hooks from /usr/share/containers/oci/hooks.d"
  time="2018-12-02T21:31:19-08:00" level=warning msg="failed to load hooks: {}%!(EXTRA *os.PathError=open /usr/share/containers/oci/hooks.d: no such file or directory)"

With this commit:

  $ podman --log-level=debug run --rm docker.io/library/alpine echo 'successful container' 2>&1 | grep -i hook
  time="2018-12-02T21:33:07-08:00" level=debug msg="reading hooks from /usr/share/containers/oci/hooks.d"
  time="2018-12-02T21:33:07-08:00" level=debug msg="reading hooks from /etc/containers/oci/hooks.d"
  time="2018-12-02T21:33:07-08:00" level=debug msg="added hook /etc/containers/oci/hooks.d/test.json"
  time="2018-12-02T21:33:07-08:00" level=debug msg="hook test.json matched; adding to stages [prestart]"
  time="2018-12-02T21:33:07-08:00" level=warning msg="implicit hook directories are deprecated; set --hooks-dir="/etc/containers/oci/hooks.d" explicitly to continue to load hooks from this directory"
  time="2018-12-02T21:33:07-08:00" level=error msg="container create failed: container_linux.go:336: starting container process caused "process_linux.go:399: container init caused \"process_linux.go:382: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: oh, noes!\\\\n\\\"\""

(I'd setup the hook to error out).  You can see that it's silenly
ignoring the ENOENT for /usr/share/containers/oci/hooks.d and
continuing on to load hooks from /etc/containers/oci/hooks.d.

When it loads the hook, it also logs a warning-level message
suggesting that callers explicitly configure their hook directories.
That will help consumers migrate, so we can drop the implicit hook
directories in some future release.  When folks *do* explicitly
configure hook directories (via the newly-public --hooks-dir and
hooks_dir options), we error out if they're missing:

  $ podman --hooks-dir /does/not/exist run --rm docker.io/library/alpine echo 'successful container'
  error setting up OCI Hooks: open /does/not/exist: no such file or directory

I've dropped the trailing "path" from the old, hidden --hooks-dir-path
and hooks_dir_path because I think "dir(ectory)" is already enough
context for "we expect a path argument".  I consider this name change
non-breaking because the old forms were undocumented.

Coming back to rootless users, I've enabled hooks now.  I expect they
were previously disabled because users had no way to avoid
/usr/share/containers/oci/hooks.d which might contain hooks that
required root permissions.  But now rootless users will have to
explicitly configure hook directories, and since their default config
is from ~/.config/containers/libpod.conf, it's a misconfiguration if
it contains hooks_dir entries which point at directories with hooks
that require root access.  We error out so they can fix their
libpod.conf.

[1]: https://github.com/containers/libpod/pull/1487#discussion_r218149355

Signed-off-by: W. Trevor King <wking@tremily.us>
2018-12-03 12:54:30 -08:00
f3289fed2e Merge pull request #1880 from baude/f29fixes
Fix golang formatting issues
2018-11-28 08:18:24 -08:00
ade0b30844 Merge pull request #1846 from cgwalters/netns-dns-localhost
Use host's resolv.conf if no network namespace enabled
2018-11-28 07:58:55 -08:00
61d4db4806 Fix golang formatting issues
Whe running unittests on newer golang versions, we observe failures with some
formatting types when no declared correctly.

Signed-off-by: baude <bbaude@redhat.com>
2018-11-28 09:26:24 -06:00
0592558289 Use also a struct to pass options to Restore()
This is basically the same change as

 ff47a4c2d5485fc49f937f3ce0c4e2fd6bdb1956 (Use a struct to pass options to Checkpoint())

just for the Restore() function. It is used to pass multiple restore
options to the API and down to conmon which is used to restore
containers. This is for the upcoming changes to support checkpointing
and restoring containers with '--tcp-established'.

Signed-off-by: Adrian Reber <areber@redhat.com>
2018-11-28 08:00:37 +01:00
870eed9378 Use host's resolv.conf if no network namespace enabled
My host system runs Fedora Silverblue 29 and I have NetworkManager's
`dns=dnsmasq` setting enabled, so my `/etc/resolv.conf` only has
`127.0.0.1`.

I also run my development podman containers with `--net=host`
for various reasons.

If we have a host network namespace, there's no reason not to just
use the host's nameserver configuration either.

This fixes e.g. accessing content on a VPN, and is also faster
since the container is using cached DNS.

I know this doesn't solve the bigger picture issue of localhost-DNS
conflicting with bridged networking, but that's far more involved,
probably requiring a DNS proxy in the container.  This patch
makes my workflow a lot nicer and was easy to write.

Signed-off-by: Colin Walters <walters@verbum.org>
2018-11-27 15:28:09 -05:00
049defa984 Merge pull request #1850 from vrothberg/mount-propagation
set root propagation based on volume properties
2018-11-27 03:29:17 -08:00
1d3e24239a Merge pull request #1734 from rhatdan/network
libpod should know if the network is disabled
2018-11-27 03:29:07 -08:00
0e2042ebd7 set root propagation based on volume properties
Set the root propagation based on the properties of volumes and default
mounts.  To remain compatibility, follow the semantics of Docker.  If a
volume is shared, keep the root propagation shared which works for slave
and private volumes too.  For slave volumes, it can either be shared or
rshared.  Do not change the root propagation for private volumes and
stick with the default.

Fixes: #1834
Signed-off-by: Valentin Rothberg <vrothberg@suse.com>
2018-11-26 13:55:02 +01:00
b0572d6229 Added option to keep containers running after checkpointing
CRIU supports to leave processes running after checkpointing:

  -R|--leave-running    leave tasks in running state after checkpoint

runc also support to leave containers running after checkpointing:

   --leave-running      leave the process running after checkpointing

With this commit the support to leave a container running after
checkpointing is brought to Podman:

   --leave-running, -R  leave the container running after writing checkpoint to disk

Now it is possible to checkpoint a container at some point in time
without stopping the container. This can be used to rollback the
container to an early state:

$ podman run --tmpfs /tmp --name podman-criu-test -d docker://docker.io/yovfiatbeb/podman-criu-test
$ curl 10.88.64.253:8080/examples/servlets/servlet/HelloWorldExample
3
$ podman container checkpoint -R -l
$ curl 10.88.64.253:8080/examples/servlets/servlet/HelloWorldExample
4
$ curl 10.88.64.253:8080/examples/servlets/servlet/HelloWorldExample
5
$ podman stop -l
$ podman container restore -l
$ curl 10.88.64.253:8080/examples/servlets/servlet/HelloWorldExample
4

So after checkpointing the container kept running and was stopped after
some time. Restoring this container will restore the state right at the
checkpoint.

Signed-off-by: Adrian Reber <areber@redhat.com>
2018-11-20 17:25:44 +01:00
ff47a4c2d5 Use a struct to pass options to Checkpoint()
For upcoming changes to the Checkpoint() functions this commit switches
checkpoint options from a boolean to a struct, so that additional
options can be passed easily to Checkpoint() without changing the
function parameters all the time.

Signed-off-by: Adrian Reber <areber@redhat.com>
2018-11-20 17:25:44 +01:00
bb6c1cf8d1 libpod should know if the network is disabled
/etc/resolv.conf and /etc/hosts should not be created and mounted when the
network is disabled.

We should not be calling the network setup and cleanup functions when it is
disabled either.

In doing this patch, I found that all of the bind mounts were particular to
Linux along with the generate functions, so I moved them to
container_internal_linux.go

Since we are checking if we are using a network namespace, we need to check
after the network namespaces has been created in the spec.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2018-11-13 06:33:10 -05:00
7e15084d19 Accurately update state if prepare() partially fails
We are seeing some issues where, when part of prepare() fails
(originally noticed due to a bad static IP), the other half does
not successfully clean up, and the state can be left in a bad
place (not knowing about an active SHM mount for example).

Signed-off-by: Matthew Heon <mheon@redhat.com>
2018-11-08 16:51:57 -05:00
1370c311f5 Merge pull request #1771 from baude/prepare
move defer'd function declaration ahead of prepare error return
2018-11-07 10:55:51 -08:00
e022efa0f8 move defer'd function declaration ahead of prepare error return
Signed-off-by: baude <bbaude@redhat.com>
2018-11-07 10:44:33 -06:00
f813881b81 rootless: mount /sys/fs/cgroup/systemd from the host
systemd requires /sys/fs/cgroup/systemd to be writeable.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2018-11-07 16:10:34 +01:00
11c5b0237b rootless: don't bind mount /sys/fs/cgroup/systemd in systemd mode
it is not writeable by non-root users so there is no point in having
access to it from a container.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2018-11-07 16:10:33 +01:00
1dd7f13dfb get user and group information using securejoin and runc's user library
for the purposes of performance and security, we use securejoin to contstruct
the root fs's path so that symlinks are what they appear to be and no pointing
to something naughty.

then instead of chrooting to parse /etc/passwd|/etc/group, we now use the runc user/group
methods which saves us quite a bit of performance.

Signed-off-by: baude <bbaude@redhat.com>
2018-10-29 08:59:46 -05:00
3efa068528 Merge pull request #1699 from baude/rund
run performance improvements
2018-10-25 05:59:31 -07:00
6246942d37 Increase security and performance when looking up groups
We implement the securejoin method to make sure the paths to /etc/passwd and
/etc/group are not symlinks to something naughty or outside the container
image. And then instead of actually chrooting, we use the runc functions to
get information about a user.  The net result is increased security and
a a performance gain from 41ms to 100us.

Signed-off-by: baude <bbaude@redhat.com>
2018-10-25 06:42:43 -05:00
e2aef6341d run prepare in parallel
run prepare() -- which consists of creating a network namespace and
mounting the container image is now run in parallel.   This saves 25-40ms.

Signed-off-by: baude <bbaude@redhat.com>
2018-10-25 06:34:23 -05:00
8f6fb79ba8 Use the CRIU version check in checkpoint/restore
The newly introduced CRIU version check is now used to make sure
checkpointing and restoring is only used if the CRIU version is new
enough.

Signed-off-by: Adrian Reber <areber@redhat.com>
2018-10-23 12:52:03 +02:00
4a60656dbb Fix CGroup paths used for systemd CGroup mount
We already have functions for retrieving the container's CGroup
path, so use them instead of manually generating a path.

Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
2018-10-17 10:45:58 -04:00
57a8c2e5e8 Mount proper cgroup for systemd to manage inside of the container.
We are still requiring oci-systemd-hook to be installed in order to run
systemd within a container.  This patch properly mounts

/sys/fs/cgroup/systemd/libpod_parent/libpod-UUID on /sys/fs/cgroup/systemd inside of container.

Since we need the UUID of the container, we needed to move Systemd to be a config option of the
container.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2018-10-15 16:19:11 -04:00
f7c8fd8a3d Add support to checkpoint/restore containers
runc uses CRIU to support checkpoint and restore of containers. This
brings an initial checkpoint/restore implementation to podman.

None of the additional runc flags are yet supported and container
migration optimization (pre-copy/post-copy) is also left for the future.

The current status is that it is possible to checkpoint and restore a
container. I am testing on RHEL-7.x and as the combination of RHEL-7 and
CRIU has seccomp troubles I have to create the container without
seccomp.

With the following steps I am able to checkpoint and restore a
container:

 # podman run --security-opt="seccomp=unconfined" -d registry.fedoraproject.org/f27/httpd
 # curl -I 10.22.0.78:8080
 HTTP/1.1 403 Forbidden # <-- this is actually a good answer
 # podman container checkpoint <container>
 # curl -I 10.22.0.78:8080
 curl: (7) Failed connect to 10.22.0.78:8080; No route to host
 # podman container restore <container>
 # curl -I 10.22.0.78:8080
 HTTP/1.1 403 Forbidden

I am using CRIU, runc and conmon from git. All required changes for
checkpoint/restore support in podman have been merged in the
corresponding projects.

To have the same IP address in the restored container as before
checkpointing, CNI is told which IP address to use.

If the saved network configuration cannot be found during restore, the
container is restored with a new IP address.

For CRIU to restore established TCP connections the IP address of the
network namespace used for restore needs to be the same. For TCP
connections in the listening state the IP address can change.

During restore only one network interface with one IP address is handled
correctly. Support to restore containers with more advanced network
configuration will be implemented later.

v2:
 * comment typo
 * print debug messages during cleanup of restore files
 * use createContainer() instead of createOCIContainer()
 * introduce helper CheckpointPath()
 * do not try to restore a container that is paused
 * use existing helper functions for cleanup
 * restructure code flow for better readability
 * do not try to restore if checkpoint/inventory.img is missing
 * git add checkpoint.go restore.go

v3:
 * move checkpoint/restore under 'podman container'

v4:
 * incorporated changes from latest reviews

Signed-off-by: Adrian Reber <areber@redhat.com>
2018-10-03 21:41:39 +02:00
7ee6bf1573 Disable problematic SELinux code causing runc issues
Signed-off-by: Matthew Heon <matthew.heon@gmail.com>

Closes: #1541
Approved by: baude
2018-09-25 19:32:17 +00:00
52c1365f32 Add --mount option for create & run command
Signed-off-by: Kunal Kushwaha <kushwaha_kunal_v7@lab.ntt.co.jp>
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>

Closes: #1524
Approved by: mheon
2018-09-21 21:33:41 +00:00
fbfcc7842e Add new field to libpod to indicate whether or not to use labelling
Also update some missing fields libpod.conf obtions in man pages.

Fix sort order of security options and add a note about disabling
labeling.

When a process requests a new label.  libpod needs to reserve all
labels to make sure that their are no conflicts.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>

Closes: #1406
Approved by: mheon
2018-09-20 16:01:29 +00:00
2cbb8c216a Bind Mounts should be mounted read-only when in read-only mode
We don't want to allow users to write to /etc/resolv.conf or /etc/hosts if in read
only mode.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>

Closes: #1510
Approved by: TomSweeneyRedHat
2018-09-20 13:55:35 +00:00
663ee91eec Fix Mount Propagation
Default mount propagation inside of containes should be private

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>

Closes: #1305
Approved by: mheon
2018-08-27 13:26:28 +00:00
0e6266858a Fixing network ns segfault
As well as small style corrections, update pod_top_test to use CreatePod, and move handling of adding a container to the pod's namespace from container_internal_linux to libpod/option.

Signed-off-by: haircommander <pehunt@redhat.com>

Closes: #1187
Approved by: mheon
2018-08-23 18:16:28 +00:00
2a7449362f Change pause container to infra container
Signed-off-by: haircommander <pehunt@redhat.com>

Closes: #1187
Approved by: mheon
2018-08-23 18:16:28 +00:00
d5e690914d Added option to share kernel namespaces in libpod and podman
A pause container is added to the pod if the user opts in. The default pause image and command can be overridden. Pause containers are ignored in ps unless the -a option is present. Pod inspect and pod ps show shared namespaces and pause container. A pause container can't be removed with podman rm, and a pod can be removed if it only has a pause container.

Signed-off-by: haircommander <pehunt@redhat.com>

Closes: #1187
Approved by: mheon
2018-08-23 18:16:28 +00:00