Exploring Alibaba Group’s PouchContainer Resource Management APIs — Part 2

PouchContainer is Alibaba Group’s efficient, open source, enterprise-class container engine technology featuring strong isolation, high portability and low resource consumption. This article will introduce you to the common APIs of PouchContainer resource management and corresponding underlying kernel APIs.

The following is a detailed description of each resource management API. Test cases are provided for the sake of understandability. PouchContainer 0.4.0 is used in these cases. If the stress command is not available in your image, you can install the stress tool via the command sudo apt-get install stress.

1. Memory Resource Management

1.1 -m, --memory

It can limit the amount of memory used by the container, and the corresponding cgroup file is cgroup/memory/memory.limit_in_bytes.

Unit: B, KB, MB, GB

By default, a container can consume an unlimited amount of memory until the host’s memory resources are exhausted.

Run the following command to confirm that the cgroup file corresponds to the resource management of the container memory.

It can be seen that when the memory is limited to 100 MB, the corresponding value of the cgroup file is 104,857,600 in bytes, which is equal to 100 MB.
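The unit arithmetic above can be checked with a small sketch. It assumes the units are binary multiples (1 KB = 1,024 B), which is what the 104,857,600 figure implies; the helper name to_bytes is hypothetical and is not part of PouchContainer.

```python
# Binary multipliers; assumption: B, KB, MB, GB are powers of 1024,
# as implied by 100 MB == 104,857,600 bytes in the text.
UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def to_bytes(amount: int, unit: str) -> int:
    """Convert an amount expressed in B/KB/MB/GB to bytes (hypothetical helper)."""
    return amount * UNITS[unit.upper()]


print(to_bytes(100, "MB"))  # 104857600
print(to_bytes(1, "GB"))    # 1073741824
```

The second value is the one that appears for a 1 GB limit in the --memory-swap section.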

The local memory environment is as follows:

We use the stress tool to verify whether the memory limit is in effect. The following command will create a process in the container, in which memory is constantly being occupied (malloc) or freed (free). Theoretically, as long as the memory used is less than the limit, the container will work normally. Note that if you try to use a boundary value, that is, occupying 100 MB of memory in the container using the stress tool, this operation usually fails because there are other processes running in the container.

Then we attempt to perform an operation that occupies 150 MB memory on a container where memory usage is limited to 100 MB, but the operation is normal and no OOM occurs.

Check the memory usage of the system with the following command; you will find that swap usage has increased, indicating that the --memory option does not limit swap usage.

After disabling swap with the swapoff -a command, we execute the previous command again. As you can see from the following log, an error occurs when the container uses more memory than the limit allows.

1.2 --memory-swap

It can limit the total amount of swap partition and memory used by the container, and the corresponding cgroup file is cgroup/memory/memory.memsw.limit_in_bytes.

Value range: greater than the memory limit

Unit: B, KB, MB, GB

Run the following command to confirm that the cgroup file corresponds to the resource management of the container swap partition. It can be seen that when the memory swap is limited to 1 GB, the corresponding value of the cgroup file is 1,073,741,824 in bytes, which is equal to 1 GB.

As shown below, the container throws an exception when it tries to occupy more memory than is available in the memory swap.

1.3 --memory-swappiness

This API sets the container's tendency to use the swap partition. Its value is an integer ranging from 0 to 100 (inclusive). 0 indicates that the container does not use the swap partition, and 100 indicates that the container uses the swap partition as much as possible. The corresponding cgroup file is cgroup/memory/memory.swappiness.

1.4 --memory-wmark-ratio

Used to calculate low_wmark, low_wmark = memory.limit_in_bytes * MemoryWmarkRatio. When memory.usage_in_bytes is greater than low_wmark, the kernel thread is triggered to perform memory reclamation. When the memory.usage_in_bytes is less than high_wmark, the reclamation is stopped. The corresponding cgroup file is cgroup/memory/memory.wmark_ratio.

1.5 --oom-kill-disable

When out-of-memory (OOM) occurs, the system kills the container process by default. If you do not want the container process to be killed, you can use this API. This API corresponds to the cgroup file cgroup/memory/memory.oom_control.

OOM is triggered when the container attempts to use more memory than the limit allows. There are then two cases: with --oom-kill-disable=false, the container is killed; with --oom-kill-disable=true, the container is suspended.

The following command sets the container’s memory usage limit to 20 MB and sets --oom-kill-disable to true. Checking the cgroup file corresponding to the API shows that the value of oom_kill_disable is 1.

oom_kill_disable: A value of 0 or 1. 1 indicates that when the container tries to use a memory that exceeds the limit (i.e. 20 MB), the container will be suspended.

under_oom: A value of 0 or 1. When the value is 1, OOM has already occurred in the container.
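The two fields described above can be read programmatically. The following is a minimal sketch that assumes memory.oom_control holds one "name value" pair per line; the sample string is illustrative, not a captured log, and parse_oom_control is a hypothetical helper.

```python
def parse_oom_control(text: str) -> dict:
    """Parse 'name value' lines, the layout assumed for memory.oom_control."""
    fields = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or unexpected lines
            fields[parts[0]] = int(parts[1])
    return fields


# Illustrative content (assumption): OOM killer disabled, no OOM so far.
sample = "oom_kill_disable 1\nunder_oom 0\n"
state = parse_oom_control(sample)
print(state["oom_kill_disable"], state["under_oom"])  # 1 0
```

On a real host the text would come from reading the container's memory.oom_control file instead of the sample string.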

Use x=a; while true; do x=$x$x$x$x; done to occupy as much memory as possible and force OOM to be triggered. The log is as follows.

As can be seen from the above log, when the container’s memory is exhausted, the container exits with an exit code of 137. Because the container tries to use more memory than the limit allows, the system triggers OOM, the container is killed, and the under_oom value becomes 1. We can view the value of under_oom through the cgroup file (/sys/fs/cgroup/memory/docker/${container_id}/memory.oom_control) in the system (oom_kill_disable 1, under_oom 1).

When --oom-kill-disable=true, the container will not be killed, but will be suspended by the system.

1.6 --oom-score-adj

The parameter --oom-score-adj adjusts how likely the container process is to be killed when OOM occurs. The larger the value, the more likely the container process is to be killed. When the value is -1000, the container process is never killed by OOM. This option corresponds to the underlying API /proc/$pid/oom_score_adj.

2. CPU resource management

2.1 --cpu-period

The default period of the Linux CFS (Completely Fair Scheduler) is 100 ms. We use --cpu-period to set the CPU period for the container, and --cpu-period needs to be used together with --cpu-quota, which sets how much CPU time the container may use within each period. CFS is the default scheduling model used by the kernel to allocate CPU resources to running processes. For multi-core CPUs, the value of --cpu-quota is adjusted as needed.

It corresponds to the cgroup file cgroup/cpu/cpu.cfs_period_us. The following command creates a container, sets the container's CPU period to 50,000 (in microseconds), and verifies the value of the cgroup file corresponding to the API.

The following command sets the value of --cpu-period to 50,000 and the value of --cpu-quota to 25,000. The container can get 50% of the CPU resource at runtime.

As can be seen from the last line of the log, the CPU usage of the container is about 50.0%, which is in line with expectations.
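The 50% figure follows directly from dividing the quota by the period. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
def cpu_limit_percent(quota_us: int, period_us: int) -> float:
    """Share of one CPU core the container may use, as a percentage."""
    return 100.0 * quota_us / period_us


# Values from the example above: quota 25,000 us within a 50,000 us period.
print(cpu_limit_percent(25_000, 50_000))  # 50.0
```

A quota larger than the period yields more than 100%, which is how the quota is scaled on multi-core machines.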

2.2 --cpu-quota

It corresponds to the cgroup file cgroup/cpu/cpu.cfs_quota_us.

The API --cpu-quota sets the CPU time the container may use in each period. Normally it needs to be used with the API --cpu-period. For detailed use, refer to the description of --cpu-period.

2.3 --cpu-share

It corresponds to the cgroup file cgroup/cpu/cpu.shares.

The --cpu-shares option sets the weight for the container using the CPU. This weight setting is for CPU-intensive processes. If the process in a container is idle, then other containers can use the CPU resources that would otherwise be occupied by the idle container. That is, the --cpu-shares setting is only applied when two or more containers are trying to occupy the entire CPU resource.

We use the following command to create two containers with weights of 1024 and 512 respectively.

As can be seen from the log of the top command, the PID generated by the first container is 10513, and the CPU usage is 65.1%; the PID generated by the second container is 10687, and the CPU usage is 34.9%. The CPU usage of the two containers is approximately 2:1, and the test results are in line with expectations.
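The expected split can be derived from the weights alone. This sketch assumes both containers are fully CPU-bound and compete for the same CPU; expected_cpu_split is a hypothetical helper.

```python
def expected_cpu_split(weights):
    """Expected CPU percentage per container when all are CPU-bound."""
    total = sum(weights)
    return [round(100 * w / total, 1) for w in weights]


# Weights from the example above.
print(expected_cpu_split([1024, 512]))  # [66.7, 33.3]
```

The measured 65.1% and 34.9% are close to this ideal 2:1 split.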

2.4 --cpuset-cpus

This API corresponds to the cgroup file cgroup/cpuset/cpuset.cpus.

In a virtual machine with a multi-core CPU, start a container, set the container to use only CPU core 1, and check that the corresponding cgroup file of the API is set to 1. The log is as follows.

Use the following command to specify that the container uses CPU core 1 and use the stress command.

The log of the top command for viewing CPU resources is as follows. Note that after starting top, pressing the number key 1 displays the status of each CPU core in the terminal.

From the above log, only the load of CPU core 1 is 100%, while other CPU cores are idle, and the result is in line with expectations.

2.5 --cpuset-mems

This API corresponds to the cgroup file cgroup/cpuset/cpuset.mems.

The following command will restrict the container process to using only the memory of memory nodes 1 and 3.

The following command will restrict the container process to using only the memory of memory nodes 0, 1 and 2.
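Both cpuset.cpus and cpuset.mems hold values in the kernel's list format: comma-separated numbers and ranges. The sketch below expands such a value into individual IDs. The helper name is hypothetical, and the sample strings "1,3" and "0-2" are assumed spellings of the node sets named above.

```python
def parse_cpuset_list(value: str) -> list:
    """Expand a cpuset list such as '0-2,5' into [0, 1, 2, 5]."""
    ids = []
    for part in value.strip().split(","):
        if not part:  # empty file or trailing comma
            continue
        if "-" in part:
            start, end = part.split("-")
            ids.extend(range(int(start), int(end) + 1))
        else:
            ids.append(int(part))
    return ids


print(parse_cpuset_list("1,3"))  # [1, 3]
print(parse_cpuset_list("0-2"))  # [0, 1, 2]
```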

3. IO resource management

3.1 --blkio-weight

The container’s block device IO weight can be set by the API --blkio-weight, which is an integer ranging from 10 to 1,000 (inclusive). By default, all containers get the same weight value (500). It corresponds to the cgroup file cgroup/blkio/blkio.weight. The following command sets the IO weight of the container block device to 10, and you can see that the value of the corresponding cgroup file is 10 in the log.

Use the following two commands to create containers for different block device IO weight values.

Block device operations (such as the following command) are simultaneously performed in two containers. You will find that the time spent is inversely proportional to the IO weight of the block device for the container.

3.2 --blkio-weight-device

The IO weight of a specific block device for the container can be set by the API --blkio-weight-device="deviceName:weight", where weight is an integer ranging from 10 to 1,000 (inclusive).

It corresponds to the cgroup file cgroup/blkio/blkio.weight_device.

The “8:0” in the above log is the device number of /dev/sda. You can use the stat command to obtain the device number of a device. You can see that the major device number corresponding to /dev/sda is 8 and the minor device number is 0.
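The same major:minor pair can be obtained from Python's standard library on Linux. This is a sketch with hypothetical helper names; the demonstration builds the device number with os.makedev rather than reading /dev/sda, which may not exist on every machine.

```python
import os


def format_device_number(rdev: int) -> str:
    """Render a raw device number as 'major:minor'."""
    return f"{os.major(rdev)}:{os.minor(rdev)}"


def device_number_of(path: str) -> str:
    """Look up the 'major:minor' pair of a device node such as /dev/sda."""
    return format_device_number(os.stat(path).st_rdev)


# Compose major 8, minor 0, then split it again.
print(format_device_number(os.makedev(8, 0)))  # 8:0
```

Calling device_number_of("/dev/sda") on a host that has that disk should print the pair used in the cgroup file.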

If the API --blkio-weight-device is used with the API --blkio-weight, the container engine uses the value of --blkio-weight as the default weight and then uses the value of --blkio-weight-device to set the weight for the specified device. The default weight does not take effect on that device.

As can be seen from the above log, when the API --blkio-weight is used with the API --blkio-weight-device, the weight of the /dev/sda device is determined by the value set by --blkio-weight-device.

3.3 --device-read-bps

This API limits the read rate of a specified device. The unit can be KB, MB, or GB. It corresponds to the cgroup file cgroup/blkio/blkio.throttle.read_bps_device.

The above log shows 8:0 1000, in which 8:0 means /dev/sda, and the value of the cgroup file corresponding to this API is 1,048,576, which is the number of bytes corresponding to 1 MB, i.e. the square of 1,024.
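Entries in the blkio throttle files pair a device number with a limit. The sketch below parses such lines under the assumption that each one has the form "major:minor value"; the sample line is assembled from the values named above, not copied from a log, and the helper name is hypothetical.

```python
def parse_throttle_entries(text: str) -> dict:
    """Map (major, minor) to the limit from 'major:minor value' lines."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        device, value = line.split()
        major, minor = device.split(":")
        entries[(int(major), int(minor))] = int(value)
    return entries


sample = "8:0 1048576\n"  # illustrative: /dev/sda limited to 1 MB per second
limits = parse_throttle_entries(sample)
print(limits[(8, 0)] == 1024 ** 2)  # True
```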

Use the API --device-read-bps to set the device read rate to 500 KB/s when creating the container. As can be seen from the following log, the read rate is limited to 498 KB/s, which is in line with expectations.

3.4 --device-write-bps

This API limits the write rate of a specified device. The unit can be KB, MB, or GB. It corresponds to the cgroup file cgroup/blkio/blkio.throttle.write_bps_device.

The above log shows 8:0 1000, in which 8:0 means /dev/sda, and the value of the cgroup file corresponding to this API is 1,048,576, which is the number of bytes corresponding to 1 MB, i.e. the square of 1,024.

Use the API --device-write-bps to set the device write rate to 1 MB/s when creating the container. As can be seen from the following log, the write rate is limited to 1.0 MB/s, which is in line with expectations.

Rate limiting operation:

3.5 --device-read-iops

This API sets the IO read rate of the device, and corresponds to the cgroup file cgroup/blkio/blkio.throttle.read_iops_device.

The IO read rate of /dev/sda can be limited by "--device-read-iops /dev/sda:400" (400 reads per second), and the log is as follows.

As can be seen from the above log, the number of reads of IO per second is 400, and a total of 1,024 reads (line 2 in the log: count=1024) are needed. The test result shows that the execution time is 2.51044 seconds, which is close to the expected value 2.56 (1,024/400) seconds.
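The expected duration is simply the number of operations divided by the IOPS limit. A short sketch of that check, using the figures quoted above and a hypothetical helper name:

```python
def expected_seconds(operations: int, iops_limit: int) -> float:
    """Minimum time needed to issue `operations` IOs at `iops_limit` per second."""
    return operations / iops_limit


expected = expected_seconds(1024, 400)
measured = 2.51044  # execution time reported in the text
deviation = abs(measured - expected) / expected
print(f"expected {expected:.2f} s, deviation {deviation:.1%}")
# expected 2.56 s, deviation 1.9%
```

The same calculation applies to the write test in the next section, where the measured time is 2.50754 seconds.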

3.6 --device-write-iops

This API sets the IO write rate of the device, and corresponds to the cgroup file cgroup/blkio/blkio.throttle.write_iops_device.

The IO write rate of /dev/sda can be limited by "--device-write-iops /dev/sda:400" (400 writes per second), and the log is as follows.

As can be seen from the above log, the number of writes of IO per second is 400, and a total of 1,024 writes (line 2 in the log: count=1024) are needed. The test result shows that the execution time is 2.50754 seconds, which is close to the expected value of 2.56 (1,024/400) seconds.

3.7 Other resource management APIs

--pids-limit

--pids-limit limits the number of PIDs within a container, and corresponds to the cgroup API cgroup/pids/pids.max.
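When inspecting pids.max, note that the kernel writes the literal string max when no limit is set, and a number otherwise. A minimal sketch of reading such a value, with a hypothetical helper name and illustrative inputs:

```python
def parse_pids_max(text: str):
    """Return the PID limit as an int, or None when the file says 'max'."""
    value = text.strip()
    return None if value == "max" else int(value)


print(parse_pids_max("max\n"))  # None (unlimited)
print(parse_pids_max("10\n"))   # 10
```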

If new processes are continuously created inside the container, the system will prompt the following error.

4. Summary

PouchContainer’s resource management depends on the underlying technology of the Linux kernel. You can add some targeted tests as needed to learn more. The implementation of the kernel technology on which it depends is far beyond the scope of this article. For more information, you can read the kernel manual and the PouchContainer community documentation.

Reference:

https://www.alibabacloud.com/blog/exploring-alibaba-group%27s-pouchcontainer-resource-management-apis-%E2%80%93-part-2_593886?spm=a2c41.11873439.0.0
