podman

mirror of https://github.com/containers/podman.git synced 2025-12-05 04:40:47 +08:00

Author	SHA1	Message	Date
Matthew Heon	ebacfbd091	podman: fix memleak caused by renaming and not deleting the exit file If the container exit code needs to be retained, it cannot be retained in tmpfs, because libpod runs in a memcg itself so it can't leave traces with a daemon-less design. This wasn't a memleak detectable by kmemleak for example. The kernel never lost track of the memory and there was no erroneous refcounting either. The reference count dependencies however are not easy to track because when a refcount is increased, there's no way to tell who's still holding the reference. In this case it was a single page of tmpfs pagecache holding a refcount that kept pinned a whole hierarchy of dying memcg, slab kmem, cgroups, unreachable kernfs nodes and the respective dentries and inodes. Such a problem wouldn't happen if the exit file was stored in a regular filesystem because the pagecache could be reclaimed in such case under memory pressure. The tmpfs page can be swapped out, but that's not enough to release the memcg with CONFIG_MEMCG_SWAP_ENABLED=y. No amount of more aggressive kernel slab shrinking could have solved this. Not even assigning slab kmem of dying cgroups to alive cgroup would fully solve this. The only way to free the memory of a dying cgroup when a struct page still references it, would be to loop over all "struct page" in the kernel to find which one is associated with the dying cgroup which is a O(N) operation (where N is the number of pages and can reach billions). Linking all the tmpfs pages to the memcg would cost less during memcg offlining, but it would waste lots of memory and CPU globally. So this can't be optimized in the kernel. A cronjob running this command can act as workaround and will allow all slab cache to be released, not just the single tmpfs pages. rm -f /run/libpod/exits/* This patch solved the memleak with a reproducer, booting with cgroup.memory=nokmem and with selinux disabled. 
The reason memcg kmem and selinux were disabled for testing of this fix, is because kmem greatly decreases the kernel effectiveness in reusing partial slab objects. cgroup.memory=nokmem is strongly recommended at least for workstation usage. selinux needs to be further analyzed because it causes further slab allocations. The upstream podman commit used for testing is `1fe2965e4f` (v1.4.4). The upstream kernel commit used for testing is f16fea666898dbdd7812ce94068c76da3e3fcf1e (v5.2-rc6). Reported-by: Michele Baldessari <michele@redhat.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> <Applied with small tweaks to comments> Signed-off-by: Matthew Heon <matthew.heon@pm.me>	2019-07-31 17:28:42 -04:00
OpenShift Merge Robot	680a383874	Merge pull request #3672 from petejohanson/32bit-build-fixes Build fix for 32-bit systems.	2019-07-30 22:07:32 +02:00
Pete Johanson	32aaf8da56	Build fix for 32-bit systems. * Fixes #3664. Signed-off-by: Pete Johanson <peter@peterjohanson.com>	2019-07-30 12:25:36 -04:00
OpenShift Merge Robot	1a008958d4	Merge pull request #3661 from openSUSE/nixos-friendly-config Update libpod.conf to be more friendly to NixOS	2019-07-30 16:33:48 +02:00
Sascha Grunert	52ae51c79f	Update libpod.conf to be NixOS friendly NixOS links the current system state to `/run/current-system`, so we have to add these paths to the configuration files as well to work out of the box. Signed-off-by: Sascha Grunert <sgrunert@suse.com>	2019-07-30 12:59:11 +02:00
OpenShift Merge Robot	7d635ac1c5	Merge pull request #3656 from jwhonce/wip/env Fix commit --changes env=X=Y	2019-07-29 21:57:08 +02:00
OpenShift Merge Robot	6665269ab8	Merge pull request #3233 from wking/fatal-requested-hook-directory-does-not-exist libpod/container_internal: Make all errors loading explicitly configured hook dirs fatal	2019-07-29 16:39:08 +02:00
Jhon Honce	40bf0649af	Fix commit --changes env=X=Y Signed-off-by: Jhon Honce <jhonce@redhat.com>	2019-07-26 16:04:17 -07:00
OpenShift Merge Robot	0c4dfcfe57	Merge pull request #3639 from giuseppe/user-ns-container podman: support --userns=ns|container	2019-07-26 15:06:06 +02:00
Giuseppe Scrivano	1d72f651e4	podman: support --userns=ns|container allow to join the user namespace of another container. Closes: https://github.com/containers/libpod/issues/3629 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2019-07-25 23:04:55 +02:00
Sascha Grunert	7630f1b52e	Fix possible runtime panic if image history len is zero We now return an empty string for the `Comment` field if an OCI v1 image contains no history. Signed-off-by: Sascha Grunert <sgrunert@suse.com>	2019-07-25 12:45:08 +02:00
Matthew Heon	f747a06d53	When retrieving volumes, only use exact names We should not be fuzzy matching on volume names. Docker doesn't do it, and it doesn't make much sense. Everything requires exact matches for names - only IDs allow partial matches. Fixes #3635 Signed-off-by: Matthew Heon <matthew.heon@pm.me>	2019-07-24 22:30:16 -04:00
OpenShift Merge Robot	2283471f8d	Merge pull request #3626 from mheon/fix_ps_segfault Fix a segfault on Podman no-store commands with refresh	2019-07-24 14:45:01 +02:00
Peter Hunt	01a8483a59	refactor to reduce duplicated error parsing Signed-off-by: Peter Hunt <pehunt@redhat.com>	2019-07-23 16:49:04 -04:00
Matthew Heon	5fb4feb36a	Fix a segfault on Podman no-store commands with refresh When a command (like `ps`) requests no store be created, but also requires a refresh be performed, we have to ignore its request and initialize the store anyways to prevent segfaults. This work was done in #3532, but that missed one thing - initializing a storage service. Without the storage service, Podman will still segfault. Fix that oversight here. Fixes #3625 Signed-off-by: Matthew Heon <mheon@redhat.com>	2019-07-23 13:30:30 -04:00
Peter Hunt	479eeac62c	move editing of exitCode to runtime There's no way to get the error if we successfully get an exit code (as it's just printed to stderr instead). instead of relying on the error to be passed to podman, and edit based on the error code, process it on the varlink side instead Also move error codes to define package Signed-off-by: Peter Hunt <pehunt@redhat.com>	2019-07-23 13:29:33 -04:00
baude	a793bccae6	golangci-lint cleanup a PR slipped through without running the new linter. this cleans things up for the master branch. Signed-off-by: baude <bbaude@redhat.com>	2019-07-23 10:13:04 -05:00
OpenShift Merge Robot	26749204d5	Merge pull request #3621 from baude/golangcilint4 golangci-lint phase 4	2019-07-23 10:21:41 +02:00
baude	0c3038d4b5	golangci-lint phase 4 clean up some final linter issues and add a make target for golangci-lint. in addition, begin running the tests as part of the gating tasks in cirrus ci. we cannot fully shift over to the new linter until we fix the image on the openshift side. for short term, we will use both Signed-off-by: baude <bbaude@redhat.com>	2019-07-22 15:44:04 -05:00
Peter Hunt	a1a79c08b7	Implement conmon exec This includes: Implement exec -i and fix some typos in description of -i docs pass failed runtime status to caller Add resize handling for a terminal connection Customize exec systemd-cgroup slice fix healthcheck fix top add --detach-keys Implement podman-remote exec (jhonce) * Cleanup some orphaned code (jhonce) adapt remote exec for conmon exec (pehunt) Fix healthcheck and exec to match docs Introduce two new OCIRuntime errors to more comprehensively describe situations in which the runtime can error Use these different errors in branching for exit code in healthcheck and exec Set conmon to use new api version Signed-off-by: Jhon Honce <jhonce@redhat.com> Signed-off-by: Peter Hunt <pehunt@redhat.com>	2019-07-22 15:57:23 -04:00
baude	db826d5d75	golangci-lint round #3 this is the third round of preparing to use the golangci-lint on our code base. Signed-off-by: baude <bbaude@redhat.com>	2019-07-21 14:22:39 -05:00
Daniel J Walsh	20302cb65d	Cleanup Pull Message Currently the pull message on failure is UGLY. This patch removes a lot of the noise when pulling an image from multiple registries to make the user experience better. Our current messages are way too verbose and need to be dampened down. Still has verbose mode if you turn on log-level=debug. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>	2019-07-20 06:08:22 -04:00
Daniel J Walsh	8ae97b2f57	Add support for listing read/only and read/write images When removing --all images prune images only attempt to remove read/write images, ignore read/only images Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>	2019-07-19 06:59:49 -04:00
OpenShift Merge Robot	deb087d7b1	Merge pull request #3443 from adrianreber/rootfs-changes-migration Include changes to the container's root file-system in the checkpoint archive	2019-07-19 02:38:26 +02:00
OpenShift Merge Robot	22e62e8691	Merge pull request #3595 from mheon/fix_exec_leak Remove exec PID files after use to prevent memory leaks	2019-07-18 15:52:57 +02:00
Matthew Heon	5bbede9d9f	Remove exec PID files after use to prevent memory leaks We have another patch running to do the same for exit files, with a much more in-depth explanation of why it's necessary. Suffice to say that persistent files in tmpfs tied to container CGroups lead to significant memory allocations that last for the lifetime of the file. Based on a patch by Andrea Arcangeli (aarcange@redhat.com). Signed-off-by: Matthew Heon <mheon@redhat.com>	2019-07-18 09:06:11 -04:00
Matthew Heon	c91bc31570	Populate inspect with security-opt settings We can infer no-new-privileges. For now, manually populate seccomp (can't infer what file we sourced from) and SELinux/Apparmor (hard to tell if they're enabled or not). Signed-off-by: Matthew Heon <mheon@redhat.com>	2019-07-17 16:48:38 -04:00
Matthew Heon	156b6ef222	Properly retrieve Conmon PID Our previous method (just read the PID that we spawned) doesn't work - Conmon double-forks to daemonize, so we end up with a PID pointing to the first process, which dies almost immediately. Reading from the PID file gets us the real PID. Signed-off-by: Matthew Heon <matthew.heon@pm.me>	2019-07-17 16:48:38 -04:00
Matthew Heon	1e3e99f2fe	Move the HostConfig portion of Inspect inside libpod When we first began writing Podman, we ran into a major issue when implementing Inspect. Libpod deliberately does not tie its internal data structures to Docker, and stores most information about containers encoded within the OCI spec. However, Podman must present a CLI compatible with Docker, which means it must expose all the information in 'docker inspect' - most of which is not contained in the OCI spec or libpod's Config struct. Our solution at the time was the create artifact. We JSON'd the complete CreateConfig (a parsed form of the CLI arguments to 'podman run') and stored it with the container, restoring it when we needed to run commands that required the extra info. Over the past month, I've been looking more at Inspect, and refactored large portions of it into Libpod - generating them from what we know about the OCI config and libpod's (now much expanded, versus previously) container configuration. This patch comes close to completing the process, moving the last part of inspect into libpod and removing the need for the create artifact. This improves libpod's compatibility with non-Podman containers. We no longer require an arbitrarily-formatted JSON blob to be present to run inspect. Fixes: #3500 Signed-off-by: Matthew Heon <matthew.heon@pm.me>	2019-07-17 16:48:38 -04:00
Stefan Becker	5ed2de158f	healthcheck: reject empty commands An image with "HEALTHCHECK CMD ['']" is valid but as there is no command defined the healthcheck will fail. Reject such a configuration. Fixes #3507 Signed-off-by: Stefan Becker <chemobejk@gmail.com>	2019-07-16 07:01:43 +03:00
Stefan Becker	dd0ea08cef	healthcheck: improve command list parser - remove duplicate check, already called in HealthCheck() - reject zero-length command list and empty command string as erroneous - support all Docker command list keywords: NONE, CMD or CMD-SHELL - use Docker default "/bin/sh -c" for CMD-SHELL Fixes #3507 Signed-off-by: Stefan Becker <chemobejk@gmail.com>	2019-07-16 07:01:43 +03:00
OpenShift Merge Robot	547cb4e55e	Merge pull request #3532 from mheon/ensure_store_on_refresh Ensure we have a valid store when we refresh	2019-07-15 21:26:16 +02:00
dom finn	ee76ba5e68	Improves STD output/readability in combination with debug output. Added \n char to specific standard output Signed-off-by: dom finn <dom.finn00@gmail.com>	2019-07-14 16:03:49 +10:00
OpenShift Merge Robot	20f11718de	Merge pull request #3558 from mheon/fix_pod_remove Fix a bug where ctrs could not be removed from pods	2019-07-11 21:35:53 +02:00
OpenShift Merge Robot	d614372c2f	Merge pull request #3552 from baude/golangcilint2 golangci-lint pass number 2	2019-07-11 21:35:45 +02:00
Matthew Heon	8713483362	Fix a bug where ctrs could not be removed from pods Using pod removal worked, but container removal was missing the most critical step - the actual removal. Must have been accidentally removed during a refactor. Fixes #3556 Signed-off-by: Matthew Heon <matthew.heon@pm.me>	2019-07-11 10:17:33 -04:00
baude	a78c885397	golangci-lint pass number 2 clean up and prepare to migrate to the golangci-linter Signed-off-by: baude <bbaude@redhat.com>	2019-07-11 09:13:06 -05:00
Adrian Reber	05549e8b29	Add --ignore-rootfs option for checkpoint/restore The newly added functionality to include the container's root file-system changes into the checkpoint archive can now be explicitly disabled. Either during checkpoint or during restore. If a container changes a lot of files during its runtime it might be more effective to migrate the root file-system changes in some other way and to not needlessly increase the size of the checkpoint archive. If a checkpoint archive does not contain the root file-system changes information it will automatically be skipped. If the root file-system changes are part of the checkpoint archive it is also possible to tell Podman to ignore these changes. Signed-off-by: Adrian Reber <areber@redhat.com>	2019-07-11 14:43:35 +02:00
Adrian Reber	1a32074884	Fix typo in checkpoint/restore related texts Signed-off-by: Adrian Reber <areber@redhat.com>	2019-07-11 14:43:35 +02:00
Adrian Reber	217f2e77f8	Include root file-system changes in container migration One of the last limitations when migrating a container using Podman's 'podman container checkpoint --export=/path/to/archive.tar.gz' was that it was necessary to manually handle changes to the container's root file-system. The recommendation was to mount everything as --tmpfs where the root file-system was changed. This extends the checkpoint export functionality to also include all changes to the root file-system in the checkpoint archive. The checkpoint archive now includes a tarstream of the result from 'podman diff'. This tarstream will be applied to the restored container before restoring the container. With this any container can now be migrated, even if there are changes to the root file-system. There was some discussion before implementing this to base the root file-system migration on 'podman commit', but it seemed wrong to do a 'podman commit' before the migration as that would change the parent layer the restored container is referencing. Probably not really a problem, but it would have meant that a migrated container will always reference another storage top layer than it used to reference during initial creation. Signed-off-by: Adrian Reber <areber@redhat.com>	2019-07-11 14:43:34 +02:00
Adrian Reber	d5f1caaf50	Add function to get a filtered tarstream diff The newly added function GetDiffTarStream() mirrors the GetDiff() function. It tries to get the correct layer ID from getLayerID() and it filters out containerMounts from the tarstream. Thus the behavior is the same as GetDiff(), but it returns a tarstream. This also adds the function ApplyDiffTarStream() to apply the tarstream generated by GetDiffTarStream(). These functions are targeted to support container migration with root file-system changes. Signed-off-by: Adrian Reber <areber@redhat.com>	2019-07-11 14:43:34 +02:00
OpenShift Merge Robot	144567b42d	Merge pull request #3527 from adrianreber/finish Correctly set FinishedTime for checkpointed container	2019-07-11 10:23:19 +02:00
Adrian Reber	f187bab497	Correctly set FinishedTime for checkpointed container During 'podman container checkpoint' the finished time was not set. This resulted in a strange container status after checkpointing: Exited (0) 292 years ago During checkpointing FinishedTime is now set to time.now(). Signed-off-by: Adrian Reber <areber@redhat.com>	2019-07-11 07:35:38 +02:00
OpenShift Merge Robot	e2e8477f83	Merge pull request #3521 from baude/golangcilint1 first pass of corrections for golangci-lint	2019-07-11 01:22:30 +02:00
baude	e053e0e05e	first pass of corrections for golangci-lint Signed-off-by: baude <bbaude@redhat.com>	2019-07-10 15:52:17 -05:00
Giuseppe Scrivano	18c4d73867	runtime: drop spurious message log fix a regression introduced by `1d36501f96` Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2019-07-10 15:47:38 +02:00
Matthew Heon	5ef972d87b	Ensure we have a valid store when we refresh Fixes #3520 Signed-off-by: Matthew Heon <matthew.heon@pm.me>	2019-07-10 08:55:48 -04:00
OpenShift Merge Robot	76aa8f6d2d	Merge pull request #3529 from giuseppe/healthcheck-rootless healthcheck: support rootless mode	2019-07-09 16:09:37 +02:00
Giuseppe Scrivano	c6c637da00	healthcheck: support rootless mode now that dbus authentication works fine from a user namespace (systemd 241 works fine), we can enable rootless healthchecks. It uses "systemd-run --user" for creating the healthcheck timer and communicates with the user instance of systemd listening at $XDG_RUNTIME_DIR/systemd/private. Closes: https://github.com/containers/libpod/issues/3523 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2019-07-09 14:20:20 +02:00
OpenShift Merge Robot	fce2e6577e	Merge pull request #3497 from QazerLab/bugfix/systemd-generate-pidfile Use conmon pidfile in generated systemd unit as PIDFile.	2019-07-08 23:39:42 +02:00

1409 Commits