Commit Graph

2770 Commits

Author SHA1 Message Date
OpenShift Merge Robot
ccf4ec295b Merge pull request #3671 from openSUSE/runtime-path-discovery
Add runtime and conmon path discovery
2019-08-01 10:04:19 +02:00
Sascha Grunert
7dfaef7766 Add runtime and conmon path discovery
The `$PATH` environment variable will now used as fallback if no valid
runtime or conmon path matches. The debug logs has been updated to state
the used executable.

Signed-off-by: Sascha Grunert <sgrunert@suse.com>
2019-08-01 08:32:25 +02:00
Giuseppe Scrivano
223fe64dc0 systemd, cgroupsv2: not bind mount /sys/fs/cgroup/systemd
when running on a cgroups v2 system, do not bind mount
the named hierarchy /sys/fs/cgroup/systemd as it doesn't exist
anymore.  Instead bind mount the entire /sys/fs/cgroup.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-08-01 07:31:06 +02:00
Matthew Heon
9dcd76e369 Ensure we generate a 'stopped' event on force-remove
When forcibly removing a container, we are initiating an explicit
stop of the container, which is not reflected in 'podman events'.
Swap to using our standard 'stop()' function instead of a custom
one for force-remove, and move the event into the internal stop
function (so internal calls also register it).

This does add one more database save() to `podman remove`. This
should not be a terribly serious performance hit, and does have
the desirable side effect of making things generally safer.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-07-31 17:29:14 -04:00
Matthew Heon
cc63aff571 System events are valid, don't error on them
The logfile driver was not aware that system events existed.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-07-31 17:28:42 -04:00
Matthew Heon
cdd5639d56 Expose Null eventer and allow its use in the Podman CLI
We need this specifically for tests, but others may find it
useful if they don't explicitly need events and don't want the
performance implications of using them.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-07-31 17:28:42 -04:00
Matthew Heon
8e8d1ac193 Add a flag to set events logger type
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-07-31 17:28:42 -04:00
Matthew Heon
ebacfbd091 podman: fix memleak caused by renaming and not deleting
the exit file

If the container exit code needs to be retained, it cannot be retained
in tmpfs, because libpod runs in a memcg itself so it can't leave
traces with a daemon-less design.

This wasn't a memleak detectable by kmemleak for example. The kernel
never lost track of the memory and there was no erroneous refcounting
either. The reference count dependencies however are not easy to track
because when a refcount is increased, there's no way to tell who's
still holding the reference. In this case it was a single page of
tmpfs pagecache holding a refcount that kept pinned a whole hierarchy
of dying memcg, slab kmem, cgropups, unrechable kernfs nodes and the
respective dentries and inodes. Such a problem wouldn't happen if the
exit file was stored in a regular filesystem because the pagecache
could be reclaimed in such case under memory pressure. The tmpfs page
can be swapped out, but that's not enough to release the memcg with
CONFIG_MEMCG_SWAP_ENABLED=y.

No amount of more aggressive kernel slab shrinking could have solved
this. Not even assigning slab kmem of dying cgroups to alive cgroup
would fully solve this. The only way to free the memory of a dying
cgroup when a struct page still references it, would be to loop over
all "struct page" in the kernel to find which one is associated with
the dying cgroup which is a O(N) operation (where N is the number of
pages and can reach billions). Linking all the tmpfs pages to the
memcg would cost less during memcg offlining, but it would waste lots
of memory and CPU globally. So this can't be optimized in the kernel.

A cronjob running this command can act as workaround and will allow
all slab cache to be released, not just the single tmpfs pages.

    rm -f /run/libpod/exits/*

This patch solved the memleak with a reproducer, booting with
cgroup.memory=nokmem and with selinux disabled. The reason memcg kmem
and selinux were disabled for testing of this fix, is because kmem
greatly decreases the kernel effectiveness in reusing partial slab
objects. cgroup.memory=nokmem is strongly recommended at least for
workstation usage. selinux needs to be further analyzed because it
causes further slab allocations.

The upstream podman commit used for testing is
1fe2965e4f (v1.4.4).

The upstream kernel commit used for testing is
f16fea666898dbdd7812ce94068c76da3e3fcf1e (v5.2-rc6).

Reported-by: Michele Baldessari <michele@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>

<Applied with small tweaks to comments>
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-07-31 17:28:42 -04:00
Gabi Beyer
80dcd4bebc rootless: Rearrange setup of rootless containers
In order to run Podman with VM-based runtimes unprivileged, the
network must be set up prior to the container creation. Therefore
this commit modifies Podman to run rootless containers by:
  1. create a network namespace
  2. pass the netns persistent mount path to the slirp4netns
     to create the tap inferface
  3. pass the netns path to the OCI spec, so the runtime can
     enter the netns

Closes #2897

Signed-off-by: Gabi Beyer <gabrielle.n.beyer@intel.com>
2019-07-30 23:28:52 +00:00
Gabi Beyer
ef8834aeab Add comment to describe postConfigureNetNS
Provide information stating what the postConfigureNetNS option
is used for.

Signed-off-by: Gabi Beyer <gabrielle.n.beyer@intel.com>
2019-07-30 23:28:52 +00:00
Daniel J Walsh
141c7a5165 Vendor in buildah 1.9.2
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2019-07-30 16:48:18 -04:00
OpenShift Merge Robot
680a383874 Merge pull request #3672 from petejohanson/32bit-build-fixes
Build fix for 32-bit systems.
2019-07-30 22:07:32 +02:00
Pete Johanson
32aaf8da56 Build fix for 32-bit systems.
* Fixes #3664.

Signed-off-by: Pete Johanson <peter@peterjohanson.com>
2019-07-30 12:25:36 -04:00
OpenShift Merge Robot
1a008958d4 Merge pull request #3661 from openSUSE/nixos-friendly-config
Update libpod.conf to be more friendly to NixOS
2019-07-30 16:33:48 +02:00
Sascha Grunert
52ae51c79f Update libpod.conf to be NixOS friendly
NixOS links the current system state to `/run/current-system`, so we
have to add these paths to the configuration files as well to work out
of the box.

Signed-off-by: Sascha Grunert <sgrunert@suse.com>
2019-07-30 12:59:11 +02:00
OpenShift Merge Robot
7d635ac1c5 Merge pull request #3656 from jwhonce/wip/env
Fix commit --changes env=X=Y
2019-07-29 21:57:08 +02:00
TomSweeneyRedHat
5779e89809 Touch up XDG, add rootless links
Touch up a number of formating issues for XDG_RUNTIME_DIRS in a number
of man pages.  Make use of the XDG_CONFIG_HOME environment variable
in a rootless environment if available, or set it if not.

Also added a number of links to the Rootless Podman config page and
added the location of the auth.json files to that doc.

Signed-off-by: TomSweeneyRedHat <tsweeney@redhat.com>
2019-07-29 11:29:41 -04:00
OpenShift Merge Robot
6665269ab8 Merge pull request #3233 from wking/fatal-requested-hook-directory-does-not-exist
libpod/container_internal: Make all errors loading explicitly configured hook dirs fatal
2019-07-29 16:39:08 +02:00
Jhon Honce
40bf0649af Fix commit --changes env=X=Y
Signed-off-by: Jhon Honce <jhonce@redhat.com>
2019-07-26 16:04:17 -07:00
OpenShift Merge Robot
0c4dfcfe57 Merge pull request #3639 from giuseppe/user-ns-container
podman: support --userns=ns|container
2019-07-26 15:06:06 +02:00
Giuseppe Scrivano
1d72f651e4 podman: support --userns=ns|container
allow to join the user namespace of another container.

Closes: https://github.com/containers/libpod/issues/3629

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-07-25 23:04:55 +02:00
Sascha Grunert
7630f1b52e Fix possible runtime panic if image history len is zero
We now return an empty string for the `Comment` field if an OCI v1 image
contains no history.

Signed-off-by: Sascha Grunert <sgrunert@suse.com>
2019-07-25 12:45:08 +02:00
Matthew Heon
f747a06d53 When retrieving volumes, only use exact names
We should not be fuzzy matching on volume names. Docker doesn't
do  it, and it doesn't make much sense. Everything requires exact
matches for names - only IDs allow partial matches.

Fixes #3635

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-07-24 22:30:16 -04:00
OpenShift Merge Robot
2283471f8d Merge pull request #3626 from mheon/fix_ps_segfault
Fix a segfault on Podman no-store commands with refresh
2019-07-24 14:45:01 +02:00
Peter Hunt
01a8483a59 refactor to reduce duplicated error parsing
Signed-off-by: Peter Hunt <pehunt@redhat.com>
2019-07-23 16:49:04 -04:00
Matthew Heon
5fb4feb36a Fix a segfault on Podman no-store commands with refresh
When a command (like `ps`) requests no store be created, but also
requires a refresh be performed, we have to ignore its request
and initialize the store anyways to prevent segfaults. This work
was done in #3532, but that missed one thing - initializing a
storage service. Without the storage service, Podman will still
segfault. Fix that oversight here.

Fixes #3625

Signed-off-by: Matthew Heon <mheon@redhat.com>
2019-07-23 13:30:30 -04:00
Peter Hunt
479eeac62c move editing of exitCode to runtime
There's no way to get the error if we successfully get an exit code (as it's just printed to stderr instead).
instead of relying on the error to be passed to podman, and edit based on the error code, process it on the varlink side instead

Also move error codes to define package

Signed-off-by: Peter Hunt <pehunt@redhat.com>
2019-07-23 13:29:33 -04:00
baude
a793bccae6 golangci-lint cleanup
a PR slipped through without running the new linter.  this cleans things
up for the master branch.

Signed-off-by: baude <bbaude@redhat.com>
2019-07-23 10:13:04 -05:00
OpenShift Merge Robot
26749204d5 Merge pull request #3621 from baude/golangcilint4
golangci-lint phase 4
2019-07-23 10:21:41 +02:00
baude
0c3038d4b5 golangci-lint phase 4
clean up some final linter issues and add a make target for
golangci-lint. in addition, begin running the tests are part of the
gating tasks in cirrus ci.

we cannot fully shift over to the new linter until we fix the image on
the openshift side.  for short term, we will use both

Signed-off-by: baude <bbaude@redhat.com>
2019-07-22 15:44:04 -05:00
Peter Hunt
a1a79c08b7 Implement conmon exec
This includes:
	Implement exec -i and fix some typos in description of -i docs
	pass failed runtime status to caller
	Add resize handling for a terminal connection
	Customize exec systemd-cgroup slice
	fix healthcheck
	fix top
	add --detach-keys
	Implement podman-remote exec (jhonce)
	* Cleanup some orphaned code (jhonce)
	adapt remote exec for conmon exec (pehunt)
	Fix healthcheck and exec to match docs
		Introduce two new OCIRuntime errors to more comprehensively describe situations in which the runtime can error
		Use these different errors in branching for exit code in healthcheck and exec
	Set conmon to use new api version

Signed-off-by: Jhon Honce <jhonce@redhat.com>

Signed-off-by: Peter Hunt <pehunt@redhat.com>
2019-07-22 15:57:23 -04:00
baude
db826d5d75 golangci-lint round #3
this is the third round of preparing to use the golangci-lint on our
code base.

Signed-off-by: baude <bbaude@redhat.com>
2019-07-21 14:22:39 -05:00
Daniel J Walsh
20302cb65d Cleanup Pull Message
Currently the pull message on failure is UGLY.  This patch removes a lot of the noice
when pulling an image from multiple registries to make the user experience better.

Our current messages are way too verbose and need to be dampened down.  Still has
verbose mode if you turn on log-level=debug.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2019-07-20 06:08:22 -04:00
Daniel J Walsh
8ae97b2f57 Add support for listing read/only and read/write images
When removing --all images prune images only attempt to remove read/write images,
ignore read/only images

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2019-07-19 06:59:49 -04:00
OpenShift Merge Robot
deb087d7b1 Merge pull request #3443 from adrianreber/rootfs-changes-migration
Include changes to the container's root file-system in the checkpoint archive
2019-07-19 02:38:26 +02:00
OpenShift Merge Robot
22e62e8691 Merge pull request #3595 from mheon/fix_exec_leak
Remove exec PID files after use to prevent memory leaks
2019-07-18 15:52:57 +02:00
Matthew Heon
5bbede9d9f Remove exec PID files after use to prevent memory leaks
We have another patch running to do the same for exit files, with
a much more in-depth explanation of why it's necessary. Suffice
to say that persistent files in tmpfs tied to container CGroups
lead to significant memory allocations that last for the lifetime
of the file.

Based on a patch by Andrea Arcangeli (aarcange@redhat.com).

Signed-off-by: Matthew Heon <mheon@redhat.com>
2019-07-18 09:06:11 -04:00
Matthew Heon
c91bc31570 Populate inspect with security-opt settings
We can infer no-new-privileges. For now, manually populate
seccomp (can't infer what file we sourced from) and
SELinux/Apparmor (hard to tell if they're enabled or not).

Signed-off-by: Matthew Heon <mheon@redhat.com>
2019-07-17 16:48:38 -04:00
Matthew Heon
156b6ef222 Properly retrieve Conmon PID
Our previous method (just read the PID that we spawned) doesn't
work - Conmon double-forks to daemonize, so we end up with a PID
pointing to the first process, which dies almost immediately.

Reading from the PID file gets us the real PID.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-07-17 16:48:38 -04:00
Matthew Heon
1e3e99f2fe Move the HostConfig portion of Inspect inside libpod
When we first began writing Podman, we ran into a major issue
when implementing Inspect. Libpod deliberately does not tie its
internal data structures to Docker, and stores most information
about containers encoded within the OCI spec. However, Podman
must present a CLI compatible with Docker, which means it must
expose all the information in 'docker inspect' - most of which is
not contained in the OCI spec or libpod's Config struct.

Our solution at the time was the create artifact. We JSON'd the
complete CreateConfig (a parsed form of the CLI arguments to
'podman run') and stored it with the container, restoring it when
we needed to run commands that required the extra info.

Over the past month, I've been looking more at Inspect, and
refactored large portions of it into Libpod - generating them
from what we know about the OCI config and libpod's (now much
expanded, versus previously) container configuration. This path
comes close to completing the process, moving the last part of
inspect into libpod and removing the need for the create
artifact.

This improves libpod's compatability with non-Podman containers.
We no longer require an arbitrarily-formatted JSON blob to be
present to run inspect.

Fixes: #3500

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-07-17 16:48:38 -04:00
Stefan Becker
5ed2de158f healthcheck: reject empty commands
An image with "HEALTHCHECK CMD ['']" is valid but as there is no command
defined the healthcheck will fail. Reject such a configuration.

Fixes #3507

Signed-off-by: Stefan Becker <chemobejk@gmail.com>
2019-07-16 07:01:43 +03:00
Stefan Becker
dd0ea08cef healthcheck: improve command list parser
- remove duplicate check, already called in HealthCheck()
- reject zero-length command list and empty command string as errorneous
- support all Docker command list keywords: NONE, CMD or CMD-SHELL
- use Docker default "/bin/sh -c" for CMD-SHELL

Fixes #3507

Signed-off-by: Stefan Becker <chemobejk@gmail.com>
2019-07-16 07:01:43 +03:00
OpenShift Merge Robot
547cb4e55e Merge pull request #3532 from mheon/ensure_store_on_refresh
Ensure we have a valid store when we refresh
2019-07-15 21:26:16 +02:00
dom finn
ee76ba5e68 Improves STD output/readability in combination
with debug output.

Added \n char to specific standard output

Signed-off-by: dom finn <dom.finn00@gmail.com>
2019-07-14 16:03:49 +10:00
OpenShift Merge Robot
20f11718de Merge pull request #3558 from mheon/fix_pod_remove
Fix a bug where ctrs could not be removed from pods
2019-07-11 21:35:53 +02:00
OpenShift Merge Robot
d614372c2f Merge pull request #3552 from baude/golangcilint2
golangci-lint pass number 2
2019-07-11 21:35:45 +02:00
Matthew Heon
8713483362 Fix a bug where ctrs could not be removed from pods
Using pod removal worked, but container removal was missing the
most critical step - the actual removal. Must have been
accidentally removed during a refactor.

Fixes #3556

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-07-11 10:17:33 -04:00
baude
a78c885397 golangci-lint pass number 2
clean up and prepare to migrate to the golangci-linter

Signed-off-by: baude <bbaude@redhat.com>
2019-07-11 09:13:06 -05:00
Adrian Reber
05549e8b29 Add --ignore-rootfs option for checkpoint/restore
The newly added functionality to include the container's root
file-system changes into the checkpoint archive can now be explicitly
disabled. Either during checkpoint or during restore.

If a container changes a lot of files during its runtime it might be
more effective to migrated the root file-system changes in some other
way and to not needlessly increase the size of the checkpoint archive.

If a checkpoint archive does not contain the root file-system changes
information it will automatically be skipped. If the root file-system
changes are part of the checkpoint archive it is also possible to tell
Podman to ignore these changes.

Signed-off-by: Adrian Reber <areber@redhat.com>
2019-07-11 14:43:35 +02:00
Adrian Reber
1a32074884 Fix typo in checkpoint/restore related texts
Signed-off-by: Adrian Reber <areber@redhat.com>
2019-07-11 14:43:35 +02:00