PouchContainer with LXCFS for Highly Reliable Isolation of Containers

Alibaba Cloud
Sep 6, 2018

PouchContainer is Alibaba's open-source container engine. The latest version is 0.3.0, and the code is available at https://github.com/alibaba/pouch. PouchContainer integrates LXCFS to make container isolation more reliable. Linux uses cgroups for resource isolation, but a container still mounts the host's /proc file system, so a user who reads a file such as /proc/meminfo inside the container sees host information. This lack of /proc view isolation causes a series of problems and slows down or impedes enterprises' efforts to containerize services.

LXCFS (https://github.com/lxc/lxcfs) is an open-source file system built on Filesystem in Userspace (FUSE). It is designed to address the issue of /proc view isolation and make containers look like virtual machines at the presentation layer. This document describes the use cases and mechanism of LXCFS, as well as the integration of LXCFS in PouchContainer.

LXCFS Use Cases

Over years of running physical and virtual machines, enterprises have gradually built tool chains for compiling and packaging, application deployment, and unified monitoring. These tool chains provide stable services for the applications deployed on physical and virtual machines. The following describes how LXCFS helps with monitoring, O&M tools, and application deployment when services are containerized.

Monitoring and O&M Tools

Many monitoring tools depend on the /proc file system to get system information. For example, some basic monitoring tools at Alibaba use tsar (https://github.com/alibaba/tsar) to collect information, and tsar reads memory and CPU information from the /proc file system. You can download the tsar source code and search it for the /proc files that tsar references.

$ git remote -v
origin https://github.com/alibaba/tsar.git (fetch)
origin https://github.com/alibaba/tsar.git (push)
$ grep -r cpuinfo .
./modules/mod_cpu.c: if ((ncpufp = fopen("/proc/cpuinfo", "r")) == NULL) {
$ grep -r meminfo .
./include/define.h:#define MEMINFO "/proc/meminfo"
./include/public.h:#define MEMINFO "/proc/meminfo"
./info.md: The memory counter is in /proc/meminfo, which contains key items.
./modules/mod_proc.c: /* read total mem from /proc/meminfo */
./modules/mod_proc.c: fp = fopen("/proc/meminfo", "r");
./modules/mod_swap.c: * Read swapping statistics from /proc/vmstat & /proc/meminfo.
./modules/mod_swap.c: /* read /proc/meminfo */
$ grep -r diskstats .
./include/public.h:#define DISKSTATS "/proc/diskstats"
./info.md: The I/O counter file is /proc/diskstats, for example:
./modules/mod_io.c:#define IO_FILE "/proc/diskstats"
./modules/mod_io.c:FILE *iofp; /* /proc/diskstats*/
./modules/mod_io.c: handle_error("Can't open /proc/diskstats", !iofp);

Process, I/O, and CPU monitoring by tsar depends on the /proc file system.
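
To make this dependency concrete, the following minimal sketch mimics what a collector such as tsar does when it opens /proc/meminfo: it looks up one field and prints its value in kB. The helper name meminfo_kb and its optional file argument are illustrative assumptions and are not part of tsar.

```shell
# Print the value (in kB) of one field from a meminfo-style file.
# Usage: meminfo_kb FIELD [FILE]   (FILE defaults to /proc/meminfo)
# meminfo_kb is a hypothetical helper name used only for illustration.
meminfo_kb() {
    # Lines look like "MemTotal:        2039520 kB"; match the label exactly
    # and return a non-zero status if the field is absent.
    awk -v key="$1:" '
        $1 == key { print $2; found = 1; exit }
        END { exit !found }
    ' "${2:-/proc/meminfo}"
}
```

Running meminfo_kb MemTotal prints the MemTotal value of whichever /proc the process sees, which is why the same tool reports host memory inside a container that lacks /proc view isolation.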

When the /proc file system in a container reports host resource information, these monitoring tools cannot monitor the container itself. To meet service requirements, you would have to adapt the monitoring tools for containers or even develop a separate set of tools for intra-container monitoring. Such changes slow down and can even impede enterprises’ efforts to containerize existing services. Container technology should be as compatible as possible with enterprises’ existing tool chains and should respect engineers’ habits.

PouchContainer addresses these issues through its LXCFS support. For monitoring and O&M tools that depend on the /proc file system, it becomes transparent whether they are deployed in a container or on a host. Existing monitoring and O&M tools can be migrated smoothly to containers without adaptation or redevelopment, which enables intra-container monitoring and O&M.

In the following example, PouchContainer 0.3.0 is installed on an Ubuntu virtual machine with this kernel:

# uname -a
Linux p4 4.13.0-36-generic #40~16.04.1-Ubuntu SMP Fri Feb 16 23:25:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

systemd starts pouchd. LXCFS is disabled by default, so containers created this way cannot use it. The following commands show /proc/meminfo and /proc/uptime first on the host and then in a container:

# systemctl start pouch
# head -n 5 /proc/meminfo
MemTotal: 2039520 kB
MemFree: 203028 kB
MemAvailable: 777268 kB
Buffers: 239960 kB
Cached: 430972 kB
# cat /proc/uptime
2594341.81 2208722.33
# pouch run -m 50m -it registry.hub.docker.com/library/busybox:1.28
/ # head -n 5 /proc/meminfo
MemTotal: 2039520 kB
MemFree: 189096 kB
MemAvailable: 764116 kB
Buffers: 240240 kB
Cached: 433928 kB
/ # cat /proc/uptime
2594376.56 2208749.32

The /proc/meminfo and /proc/uptime files in the container show the same values as on the host. Although the memory limit is set to 50 MB at container startup, /proc/meminfo does not reflect the container's memory limit.

The following commands start the LXCFS service on the host and then start the pouchd process manually with the LXCFS parameters set:

# systemctl start lxcfs
# pouchd -D --enable-lxcfs --lxcfs /usr/bin/lxcfs >/tmp/1 2>&1 &
[1] 32707
# ps -ef |grep lxcfs
root 698 1 0 11:08 ? 00:00:00 /usr/bin/lxcfs /var/lib/lxcfs/
root 724 32144 0 11:08 pts/22 00:00:00 grep --color=auto lxcfs
root 32707 32144 0 11:05 pts/22 00:00:00 pouchd -D --enable-lxcfs --lxcfs /usr/bin/lxcfs

The following commands start the container and get file content:

# pouch run --enableLxcfs -it -m 50m registry.hub.docker.com/library/busybox:1.28
/ # head -n 5 /proc/meminfo
MemTotal: 51200 kB
MemFree: 50804 kB
MemAvailable: 50804 kB
Buffers: 0 kB
Cached: 4 kB
/ # cat /proc/uptime
10.00 10.00

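When the container is started with LXCFS enabled, the /proc files read inside it report the container's own values: MemTotal matches the 50 MB limit, and uptime counts from the start of the container.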
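
The MemTotal value of 51200 kB in the transcript above is the 50m limit expressed in the kB units that /proc/meminfo uses (50 × 1024). The sketch below performs that conversion. The function name limit_to_kb is hypothetical, and the b/k/m/g suffix handling is an assumption made for illustration rather than PouchContainer's actual parser.

```shell
# Convert a memory limit such as "50m" or "1g" to kB, where 1 kB = 1024 bytes
# as in /proc/meminfo. limit_to_kb is a hypothetical helper; the accepted
# suffixes (b, k, m, g) are an assumption, not taken from the pouch source.
limit_to_kb() {
    num=${1%[bkmgBKMG]}     # numeric part with one trailing unit letter removed
    unit=${1#"$num"}        # the unit letter, or empty if none was given
    case $unit in
        b|B|'') echo $(( num / 1024 )) ;;          # plain bytes
        k|K)    echo "$num" ;;
        m|M)    echo $(( num * 1024 )) ;;
        g|G)    echo $(( num * 1024 * 1024 )) ;;
    esac
}
```

For instance, limit_to_kb 50m prints 51200, the same number that LXCFS reports as MemTotal for the container started with -m 50m.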

Service Applications

Many applications that depend heavily on the system use a startup program to read system information, such as memory and CPU, and configure themselves accordingly. When the /proc files in a container do not accurately describe the container's resources, these applications are seriously affected.

For example, the startup scripts of some Java applications read /proc/meminfo to dynamically determine the stack size allocated to the running program. When the container memory limit is lower than the host memory, the program may request more memory than the container allows, and startup fails because the allocation fails. For DPDK-related applications, the application tools read /proc/cpuinfo to obtain the logical CPU cores needed to initialize the environment abstraction layer (EAL). If this information cannot be obtained accurately in the container, the DPDK application tools must be modified.
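
The following sketch shows the kind of logic such a startup script might contain. It sizes a memory setting as a fixed percentage of MemTotal and counts logical cores from a cpuinfo-style file. The function names, the 75% default, and the assumption that each logical core appears as a line starting with "processor" (as on x86) are illustrative and are not taken from any specific application.

```shell
# Memory budget in MB: a percentage of MemTotal from a meminfo-style file.
# Usage: app_memory_mb [FILE] [PERCENT]
# The 75% default is a hypothetical policy chosen only for this example.
app_memory_mb() {
    total_kb=$(awk '$1 == "MemTotal:" { print $2; exit }' "${1:-/proc/meminfo}")
    echo $(( total_kb * ${2:-75} / 100 / 1024 ))
}

# Number of logical cores: counts the "processor : N" lines in a
# cpuinfo-style file (assumes the x86 layout of /proc/cpuinfo).
logical_cores() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}
```

With the host view from the earlier transcript (MemTotal of 2039520 kB), app_memory_mb yields 1493 MB even though the container is capped at 50 MB. With the LXCFS view (MemTotal of 51200 kB), it yields 37 MB, which fits within the limit.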

PouchContainer Integrated with LXCFS

PouchContainer 0.1.0 and later versions support LXCFS. For implementation details, see https://github.com/alibaba/pouch/pull/502.

When a container starts, files under the LXCFS mount point /var/lib/lxc/lxcfs/proc/ on the host are mounted over the corresponding files in the container's virtual /proc file system by using -v. The files mounted this way are meminfo, uptime, swaps, stat, diskstats, and cpuinfo. The related parameters are as follows:

-v /var/lib/lxc/:/var/lib/lxc/:shared
-v /var/lib/lxc/lxcfs/proc/uptime:/proc/uptime
-v /var/lib/lxc/lxcfs/proc/swaps:/proc/swaps
-v /var/lib/lxc/lxcfs/proc/stat:/proc/stat
-v /var/lib/lxc/lxcfs/proc/diskstats:/proc/diskstats
-v /var/lib/lxc/lxcfs/proc/meminfo:/proc/meminfo
-v /var/lib/lxc/lxcfs/proc/cpuinfo:/proc/cpuinfo
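
The per-file flags above follow a single pattern, so they can be generated rather than typed by hand. The following sketch prints them for a given LXCFS proc directory. The function name lxcfs_volume_flags is an illustrative assumption, and the shared mount of /var/lib/lxc/ still has to be passed separately.

```shell
# Print one "-v SRC:/proc/NAME" flag per LXCFS-provided proc file.
# Usage: lxcfs_volume_flags [LXCFS_PROC_DIR]
# lxcfs_volume_flags is a hypothetical helper, not a pouch command.
lxcfs_volume_flags() {
    root=${1:-/var/lib/lxc/lxcfs/proc}
    for name in uptime swaps stat diskstats meminfo cpuinfo; do
        # "-v" is passed as data so printf never treats it as an option.
        printf '%s %s/%s:/proc/%s\n' -v "$root" "$name" "$name"
    done
}
```

Called without arguments, the function prints the six per-file flags listed above, one per line.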

To simplify use, the pouch create and pouch run commands provide the --enableLxcfs parameter. When creating a container, you can set this parameter instead of specifying the complex -v parameters yourself.

After a period of use and testing, we found that /proc access in containers could return a connect failed error. The cause is that proc and cgroup are re-created when LXCFS restarts, which leaves the mounts in running containers stale. To improve LXCFS stability, we refined LXCFS management in https://github.com/alibaba/pouch/pull/885 and handed the management over to systemd. Specifically, we added ExecStartPost to lxcfs.service to perform the remount: it traverses the LXCFS-enabled containers and remounts the files in each of them.
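
As a rough illustration of how such a failure could be detected, the sketch below tries to read each LXCFS-backed file and reports the ones that cannot be read. The function name check_proc_view and the idea of using it as a health check are assumptions made for illustration. The actual remount logic is in the pull request above.

```shell
# Report LXCFS-backed proc files that cannot be read.
# Usage: check_proc_view [PROC_DIR]   (PROC_DIR defaults to /proc)
# Prints the name of each unreadable file and returns 1 if any was found.
# check_proc_view is a hypothetical helper, not part of pouch or LXCFS.
check_proc_view() {
    dir=${1:-/proc}
    status=0
    for name in uptime swaps stat diskstats meminfo cpuinfo; do
        # A read that fails suggests the bind mount behind the file is stale.
        if ! head -n 1 "$dir/$name" >/dev/null 2>&1; then
            echo "$name"
            status=1
        fi
    done
    return "$status"
}
```

Run inside a container, a non-zero return status would indicate that at least one of the mounted files needs to be remounted.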


PouchContainer supports LXCFS to isolate the view of the /proc file system in containers. This minimizes the changes that enterprises must make to existing tool chains and O&M practices when containerizing their existing applications, and it speeds up containerization. LXCFS effectively supports enterprises’ efforts to transition smoothly from traditional virtualization to container virtualization.




