221 Commits

Author SHA1 Message Date
72f03f0c25 Add support to disable creation of network config files
Specifically, we want to be able to specify whether resolv.conf
and /etc/hosts will be create and bind-mounted into the
container.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-03-27 10:12:18 -04:00
7f6f2f3f4a userns: use the intermediate mountns for volumes
when --uidmap is used, the user won't be able to access
/var/lib/containers/storage/volumes.  Use the intermediate mount
namespace, that is accessible to root in the container, for mounting
the volumes inside the container.

Closes: https://github.com/containers/libpod/issues/2713

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-03-21 21:18:13 +01:00
9d81be9614 Make sure buildin volumes have the same ownership and permissions as image
When creating a new image volume to be mounted into a container, we need to
make sure the new volume matches the Ownership and permissions of the path
that it will be mounted on.

For example if a volume inside of a containre image is owned by the database
UID, we want the volume to be mounted onto the image to be owned by the
database UID.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2019-03-15 10:44:44 -04:00
1c45b42e9f Merge pull request #2585 from giuseppe/build-honor-net
build: honor --net
2019-03-12 12:19:47 -07:00
66a72d9283 Ensure that tmpfs mounts do not have symlinks
When mounting a tmpfs, runc attempts to make the directory it
will be mounted at. Unfortunately, Golang's os.MkdirAll deals
very poorly with symlinks being part of the path. I looked into
fixing this in runc, but it's honestly much easier to just ensure
we don't trigger the issue on our end.

Fixes BZ #1686610

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-03-11 14:39:29 -04:00
e6139b4824 slirp4netns: add builtin DNS server to resolv.conf
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-03-11 11:42:01 +01:00
bb0377eb3d Don't delete another container's resolv and hosts files
The logic of deleting and recreating /etc/hosts and
/etc/resolv.conf only makes sense when we're the one that creates
the files - when we don't, it just removes them, and there's
nothing left to use.

Fixes #2602

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-03-10 12:18:12 -04:00
2f3875d009 Move secrets package to buildah
Trying to remove circular dependencies between libpod and buildah.

First step to move pkg content from libpod to buildah.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2019-03-08 16:08:44 -05:00
6c8f2072aa Append hosts to dependency container's /etc/hosts file
Before, any container with a netNS dependency simply used its dependency container's hosts file, and didn't abide its configuration (mainly --add-host). Fix this by always appending to the dependency container's hosts file, creating one if necessary.

Signed-off-by: Peter Hunt <pehunt@redhat.com>
2019-03-05 13:15:25 -05:00
43fe2bf064 Verify that used OCI runtime supports checkpoint
To be able to use OCI runtimes which do not implement checkpoint/restore
this adds a check to the checkpoint code path and the checkpoint/restore
tests to see if it knows about the checkpoint subcommand. If the used
OCI runtime does not implement checkpoint/restore the tests are skipped
and the actual 'podman container checkpoint' returns an error.

Signed-off-by: Adrian Reber <areber@redhat.com>
2019-03-01 08:08:55 +01:00
0a8a1deed1 Label CRIU log files correctly
CRIU creates a log file during checkpointing in .../userdata/dump.log.
The problem with this file is, is that CRIU injects a parasite code into
the container processes and this parasite code also writes to the same
log file. At this point a process from the inside of the container is
trying to access the log file on the outside of the container and
SELinux prohibits this. To enable writing to the log file from the
injected parasite code, this commit creates an empty log file and labels
the log file with c.MountLabel(). CRIU uses existing files when writing
it logs so the log file label persists and now, with the correct label,
SELinux no longer blocks access to the log file.

Signed-off-by: Adrian Reber <areber@redhat.com>
2019-02-26 11:28:54 +01:00
e45c442080 Merge pull request #2358 from rhatdan/namespace
Fix up handling of user defined network namespaces
2019-02-25 21:31:50 +01:00
c83e78277a In shared networkNS /etc/resolv.conf&/etc/hosts should be shared
We should just bind mount the original containers /etc/resolv.conf and /etchosts
into the new container.  Changes in the resolv.conf and hosts should be seen
by all containers,  This matches Docker behaviour.

In order to make this work the labels on these files need to have a shared
SELinux label.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2019-02-23 07:52:10 -05:00
b87bdced1f Fix up handling of user defined network namespaces
If user specifies network namespace and the /etc/netns/XXX/resolv.conf
exists, we should use this rather then /etc/resolv.conf

Also fail cleaner if the user specifies an invalid Network Namespace.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2019-02-23 05:47:27 -05:00
7141f97270 OpenTracing support added to start, stop, run, create, pull, and ps
Drop context.Context field from cli.Context

Signed-off-by: Sebastian Jug <sejug@redhat.com>
2019-02-18 09:57:08 -05:00
52df1fa7e0 Fix volume handling in podman
iFix builtin volumes to work with podman volume

Currently builtin volumes are not recored in podman volumes when
they are created automatically. This patch fixes this.

Remove container volumes when requested

Currently the --volume option on podman remove does nothing.
This will implement the changes needed to remove the volumes
if the user requests it.

When removing a volume make sure that no container uses the volume.

Signed-off-by: Daniel J Walsh dwalsh@redhat.com
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2019-02-14 13:21:52 -05:00
5c86efb289 Merge pull request #2138 from giuseppe/rootless-pod-fix
rootless: fix usage of create --pod=new:FOO
2019-01-11 15:42:21 -08:00
b3e7be7a0b spec: add nosuid,noexec,nodev to ro bind mount
runc fails to change the ro mode of a rootless bind mount if the other
flags are not kept.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-01-11 10:34:30 +01:00
3966d3bf4e Replace tab with spaces in MarshalIndent in libpod
The json-iterator package will panic on attempting to use
MarshalIndent with a non-space indentation. This is sort of silly
but swapping from tabs to spaces is not a big issue for us, so
let's work around the silly panic.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-01-10 15:48:09 -05:00
167d50a9fa Move all libpod/ JSON references over to jsoniter
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-01-10 15:48:09 -05:00
edb285d176 apparmor: apply default profile at container initialization
Apply the default AppArmor profile at container initialization to cover
all possible code paths (i.e., podman-{start,run}) before executing the
runtime.  This allows moving most of the logic into pkg/apparmor.

Also make the loading and application of the default AppArmor profile
versio-indepenent by checking for the `libpod-default-` prefix and
over-writing the profile in the run-time spec if needed.

The intitial run-time spec of the container differs a bit from the
applied one when having started the container, which results in
displaying a potentially outdated AppArmor profile when inspecting
a container.  To fix that, load the container config from the file
system if present and use it to display the data.

Fixes: #2107
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
2019-01-09 22:18:11 +01:00
7b9d4f1c92 Merge pull request #2061 from adrianreber/static-ip
Use existing interface to request IP address during restore
2019-01-09 07:41:47 -08:00
2553dad766 Use existing interface to request IP address during restore
The initial implementation to request the same IP address for a
container during a restore was based on environment variables
influencing CNI.

With this commit the IP address selection switches to Podman's internal
static IP API.

This commit does a comment change in libpod/container_easyjson.go to
avoid unnecessary re-generation of libpod/container_easyjson.go during
build as this fails in CI. The reason for this is that make sees that
libpod/container_easyjson.go needs to be re-created. The commit,
however, only changes a part of libpod/container.go which is marked as
'ffjson: skip'.

Signed-off-by: Adrian Reber <areber@redhat.com>
2019-01-09 07:34:57 +01:00
f6a2b6bf2b hooks: Add pre-create hooks for runtime-config manipulation
There's been a lot of discussion over in [1] about how to support the
NVIDIA folks and others who want to be able to create devices
(possibly after having loaded kernel modules) and bind userspace
libraries into the container.  Currently that's happening in the
middle of runc's create-time mount handling before the container
pivots to its new root directory with runc's incorrectly-timed
prestart hook trigger [2].  With this commit, we extend hooks with a
'precreate' stage to allow trusted parties to manipulate the config
JSON before calling the runtime's 'create'.

I'm recycling the existing Hook schema from pkg/hooks for this,
because we'll want Timeout for reliability and When to avoid the
expense of fork/exec when a given hook does not need to make config
changes [3].

[1]: https://github.com/opencontainers/runc/pull/1811
[2]: https://github.com/opencontainers/runc/issues/1710
[3]: https://github.com/containers/libpod/issues/1828#issuecomment-439888059

Signed-off-by: W. Trevor King <wking@tremily.us>
2019-01-08 21:06:17 -08:00
df99522c67 Fixes to handle /dev/shm correctly.
We had two problems with /dev/shm, first, you mount the
container read/only then /dev/shm was mounted read/only.
This is a bug a tmpfs directory should be read/write within
a read-only container.

The second problem is we were ignoring users mounted /dev/shm
from the host.

If user specified

podman run -d -v /dev/shm:/dev/shm ...

We were dropping this mount and still using the internal mount.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2018-12-24 09:03:53 -05:00
aa9507054d Containers sharing a netns should share resolv/hosts
When sharing a network namespace, containers should also share
resolv.conf and /etc/hosts in case a container process made
changes to either (for example, if I set up a VPN client in
container A and join container B to its network namespace, I
expect container B to use the DNS servers from A to ensure it can
see everything on the VPN).

Resolves: #1546

Signed-off-by: Matthew Heon <mheon@redhat.com>
2018-12-11 16:56:11 -05:00
bc57ecec42 Prevent a second lookup of user for image volumes
Instead of forcing another user lookup when mounting image
volumes, just use the information we looked up when we started
generating the spec.

This may resolve #1817

Signed-off-by: Matthew Heon <mheon@redhat.com>
2018-12-11 13:36:50 -05:00
39a036e24d bind mount /etc/resolv.conf|hosts in pods
containers inside pods need to make sure they get /etc/resolv.conf
and /etc/hosts bind mounted when network is expected

Signed-off-by: baude <bbaude@redhat.com>
2018-12-06 13:56:57 -06:00
be74acee1c Merge pull request #1940 from wking/numeric-gid
libpod/container_internal_linux: Allow gids that aren't in the group file
2018-12-05 08:09:58 -08:00
650f95cb06 libpod/container_internal_linux: Allow gids that aren't in the group file
When an image config sets config.User [1] to a numeric group (like
1000:1000), but those values do not exist in the container's
/etc/group, libpod is currently breaking:

  $ podman run --rm registry.svc.ci.openshift.org/ci-op-zvml7cd6/pipeline:installer --help
  error creating temporary passwd file for container 228f6e9943d6f18b93c19644e9b619ec4d459a3e0eb31680e064eeedf6473678: unable to get gid 1000 from group file: no matching entries in group file

However, the OCI spec requires converters to copy numeric uid and gid
to the runtime config verbatim [2].

With this commit, I'm frontloading the "is groupspec an integer?"
check and only bothering with lookup.GetGroup when it was not.

I've also removed a few .Mounted checks, which are originally from
00d38cb3 (podman create/run need to load information from the image,
2017-12-18, #110).  We don't need a mounted container filesystem to
translate integers.  And when the lookup code needs to fall back to
the mounted root to translate names, it can handle erroring out
internally (and looking it over, it seems to do that already).

[1]: https://github.com/opencontainers/image-spec/blame/v1.0.1/config.md#L118-L123
[2]: https://github.com/opencontainers/image-spec/blame/v1.0.1/conversion.md#L70

Signed-off-by: W. Trevor King <wking@tremily.us>
2018-12-04 12:00:42 -08:00
a4b483c848 libpod/container_internal: Deprecate implicit hook directories
Part of the motivation for 800eb863 (Hooks supports two directories,
process default and override, 2018-09-17, #1487) was [1]:

> We only use this for override. The reason this was caught is people
> are trying to get hooks to work with CoreOS. You are not allowed to
> write to /usr/share... on CoreOS, so they wanted podman to also look
> at /etc, where users and third parties can write.

But we'd also been disabling hooks completely for rootless users.  And
even for root users, the override logic was tricky when folks actually
had content in both directories.  For example, if you wanted to
disable a hook from the default directory, you'd have to add a no-op
hook to the override directory.

Also, the previous implementation failed to handle the case where
there hooks defined in the override directory but the default
directory did not exist:

  $ podman version
  Version:       0.11.2-dev
  Go Version:    go1.10.3
  Git Commit:    "6df7409cb5a41c710164c42ed35e33b28f3f7214"
  Built:         Sun Dec  2 21:30:06 2018
  OS/Arch:       linux/amd64
  $ ls -l /etc/containers/oci/hooks.d/test.json
  -rw-r--r--. 1 root root 184 Dec  2 16:27 /etc/containers/oci/hooks.d/test.json
  $ podman --log-level=debug run --rm docker.io/library/alpine echo 'successful container' 2>&1 | grep -i hook
  time="2018-12-02T21:31:19-08:00" level=debug msg="reading hooks from /usr/share/containers/oci/hooks.d"
  time="2018-12-02T21:31:19-08:00" level=warning msg="failed to load hooks: {}%!(EXTRA *os.PathError=open /usr/share/containers/oci/hooks.d: no such file or directory)"

With this commit:

  $ podman --log-level=debug run --rm docker.io/library/alpine echo 'successful container' 2>&1 | grep -i hook
  time="2018-12-02T21:33:07-08:00" level=debug msg="reading hooks from /usr/share/containers/oci/hooks.d"
  time="2018-12-02T21:33:07-08:00" level=debug msg="reading hooks from /etc/containers/oci/hooks.d"
  time="2018-12-02T21:33:07-08:00" level=debug msg="added hook /etc/containers/oci/hooks.d/test.json"
  time="2018-12-02T21:33:07-08:00" level=debug msg="hook test.json matched; adding to stages [prestart]"
  time="2018-12-02T21:33:07-08:00" level=warning msg="implicit hook directories are deprecated; set --hooks-dir="/etc/containers/oci/hooks.d" explicitly to continue to load hooks from this directory"
  time="2018-12-02T21:33:07-08:00" level=error msg="container create failed: container_linux.go:336: starting container process caused "process_linux.go:399: container init caused \"process_linux.go:382: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: oh, noes!\\\\n\\\"\""

(I'd setup the hook to error out).  You can see that it's silenly
ignoring the ENOENT for /usr/share/containers/oci/hooks.d and
continuing on to load hooks from /etc/containers/oci/hooks.d.

When it loads the hook, it also logs a warning-level message
suggesting that callers explicitly configure their hook directories.
That will help consumers migrate, so we can drop the implicit hook
directories in some future release.  When folks *do* explicitly
configure hook directories (via the newly-public --hooks-dir and
hooks_dir options), we error out if they're missing:

  $ podman --hooks-dir /does/not/exist run --rm docker.io/library/alpine echo 'successful container'
  error setting up OCI Hooks: open /does/not/exist: no such file or directory

I've dropped the trailing "path" from the old, hidden --hooks-dir-path
and hooks_dir_path because I think "dir(ectory)" is already enough
context for "we expect a path argument".  I consider this name change
non-breaking because the old forms were undocumented.

Coming back to rootless users, I've enabled hooks now.  I expect they
were previously disabled because users had no way to avoid
/usr/share/containers/oci/hooks.d which might contain hooks that
required root permissions.  But now rootless users will have to
explicitly configure hook directories, and since their default config
is from ~/.config/containers/libpod.conf, it's a misconfiguration if
it contains hooks_dir entries which point at directories with hooks
that require root access.  We error out so they can fix their
libpod.conf.

[1]: https://github.com/containers/libpod/pull/1487#discussion_r218149355

Signed-off-by: W. Trevor King <wking@tremily.us>
2018-12-03 12:54:30 -08:00
f3289fed2e Merge pull request #1880 from baude/f29fixes
Fix golang formatting issues
2018-11-28 08:18:24 -08:00
ade0b30844 Merge pull request #1846 from cgwalters/netns-dns-localhost
Use host's resolv.conf if no network namespace enabled
2018-11-28 07:58:55 -08:00
61d4db4806 Fix golang formatting issues
Whe running unittests on newer golang versions, we observe failures with some
formatting types when no declared correctly.

Signed-off-by: baude <bbaude@redhat.com>
2018-11-28 09:26:24 -06:00
0592558289 Use also a struct to pass options to Restore()
This is basically the same change as

 ff47a4c2d5485fc49f937f3ce0c4e2fd6bdb1956 (Use a struct to pass options to Checkpoint())

just for the Restore() function. It is used to pass multiple restore
options to the API and down to conmon which is used to restore
containers. This is for the upcoming changes to support checkpointing
and restoring containers with '--tcp-established'.

Signed-off-by: Adrian Reber <areber@redhat.com>
2018-11-28 08:00:37 +01:00
870eed9378 Use host's resolv.conf if no network namespace enabled
My host system runs Fedora Silverblue 29 and I have NetworkManager's
`dns=dnsmasq` setting enabled, so my `/etc/resolv.conf` only has
`127.0.0.1`.

I also run my development podman containers with `--net=host`
for various reasons.

If we have a host network namespace, there's no reason not to just
use the host's nameserver configuration either.

This fixes e.g. accessing content on a VPN, and is also faster
since the container is using cached DNS.

I know this doesn't solve the bigger picture issue of localhost-DNS
conflicting with bridged networking, but that's far more involved,
probably requiring a DNS proxy in the container.  This patch
makes my workflow a lot nicer and was easy to write.

Signed-off-by: Colin Walters <walters@verbum.org>
2018-11-27 15:28:09 -05:00
049defa984 Merge pull request #1850 from vrothberg/mount-propagation
set root propagation based on volume properties
2018-11-27 03:29:17 -08:00
1d3e24239a Merge pull request #1734 from rhatdan/network
libpod should know if the network is disabled
2018-11-27 03:29:07 -08:00
0e2042ebd7 set root propagation based on volume properties
Set the root propagation based on the properties of volumes and default
mounts.  To remain compatibility, follow the semantics of Docker.  If a
volume is shared, keep the root propagation shared which works for slave
and private volumes too.  For slave volumes, it can either be shared or
rshared.  Do not change the root propagation for private volumes and
stick with the default.

Fixes: #1834
Signed-off-by: Valentin Rothberg <vrothberg@suse.com>
2018-11-26 13:55:02 +01:00
b0572d6229 Added option to keep containers running after checkpointing
CRIU supports to leave processes running after checkpointing:

  -R|--leave-running    leave tasks in running state after checkpoint

runc also support to leave containers running after checkpointing:

   --leave-running      leave the process running after checkpointing

With this commit the support to leave a container running after
checkpointing is brought to Podman:

   --leave-running, -R  leave the container running after writing checkpoint to disk

Now it is possible to checkpoint a container at some point in time
without stopping the container. This can be used to rollback the
container to an early state:

$ podman run --tmpfs /tmp --name podman-criu-test -d docker://docker.io/yovfiatbeb/podman-criu-test
$ curl 10.88.64.253:8080/examples/servlets/servlet/HelloWorldExample
3
$ podman container checkpoint -R -l
$ curl 10.88.64.253:8080/examples/servlets/servlet/HelloWorldExample
4
$ curl 10.88.64.253:8080/examples/servlets/servlet/HelloWorldExample
5
$ podman stop -l
$ podman container restore -l
$ curl 10.88.64.253:8080/examples/servlets/servlet/HelloWorldExample
4

So after checkpointing the container kept running and was stopped after
some time. Restoring this container will restore the state right at the
checkpoint.

Signed-off-by: Adrian Reber <areber@redhat.com>
2018-11-20 17:25:44 +01:00
ff47a4c2d5 Use a struct to pass options to Checkpoint()
For upcoming changes to the Checkpoint() functions this commit switches
checkpoint options from a boolean to a struct, so that additional
options can be passed easily to Checkpoint() without changing the
function parameters all the time.

Signed-off-by: Adrian Reber <areber@redhat.com>
2018-11-20 17:25:44 +01:00
bb6c1cf8d1 libpod should know if the network is disabled
/etc/resolv.conf and /etc/hosts should not be created and mounted when the
network is disabled.

We should not be calling the network setup and cleanup functions when it is
disabled either.

In doing this patch, I found that all of the bind mounts were particular to
Linux along with the generate functions, so I moved them to
container_internal_linux.go

Since we are checking if we are using a network namespace, we need to check
after the network namespaces has been created in the spec.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2018-11-13 06:33:10 -05:00
7e15084d19 Accurately update state if prepare() partially fails
We are seeing some issues where, when part of prepare() fails
(originally noticed due to a bad static IP), the other half does
not successfully clean up, and the state can be left in a bad
place (not knowing about an active SHM mount for example).

Signed-off-by: Matthew Heon <mheon@redhat.com>
2018-11-08 16:51:57 -05:00
1370c311f5 Merge pull request #1771 from baude/prepare
move defer'd function declaration ahead of prepare error return
2018-11-07 10:55:51 -08:00
e022efa0f8 move defer'd function declaration ahead of prepare error return
Signed-off-by: baude <bbaude@redhat.com>
2018-11-07 10:44:33 -06:00
f813881b81 rootless: mount /sys/fs/cgroup/systemd from the host
systemd requires /sys/fs/cgroup/systemd to be writeable.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2018-11-07 16:10:34 +01:00
11c5b0237b rootless: don't bind mount /sys/fs/cgroup/systemd in systemd mode
it is not writeable by non-root users so there is no point in having
access to it from a container.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2018-11-07 16:10:33 +01:00
1dd7f13dfb get user and group information using securejoin and runc's user library
for the purposes of performance and security, we use securejoin to contstruct
the root fs's path so that symlinks are what they appear to be and no pointing
to something naughty.

then instead of chrooting to parse /etc/passwd|/etc/group, we now use the runc user/group
methods which saves us quite a bit of performance.

Signed-off-by: baude <bbaude@redhat.com>
2018-10-29 08:59:46 -05:00
3efa068528 Merge pull request #1699 from baude/rund
run performance improvements
2018-10-25 05:59:31 -07:00
6246942d37 Increase security and performance when looking up groups
We implement the securejoin method to make sure the paths to /etc/passwd and
/etc/group are not symlinks to something naughty or outside the container
image. And then instead of actually chrooting, we use the runc functions to
get information about a user.  The net result is increased security and
a a performance gain from 41ms to 100us.

Signed-off-by: baude <bbaude@redhat.com>
2018-10-25 06:42:43 -05:00