Choosing the Right ECS Instance through Benchmarking
We are spoiled for choice when it comes to cloud compute instances. The luxury of choice also means that picking the right instance is not an easy task. Apart from deciding on the right cloud vendor, we also need to identify the appropriate instance family type as well as the right instance type. The considerations include (but are not limited to) budget allocation, target workload behavior, service level agreements, and regulatory and compliance requirements.
Generally, to decide on the right instance, we first need to understand the target workload behavior. For example, does the workload involve heavy calculation, or does it involve analysis of huge sets of historical data files? Obviously, domain knowledge and experience play a key role in understanding the workload behavior. If you lack experience with the target workload, talk to someone who has it. In my opinion, experience is remarkably important in determining the workload behavior.
Most cloud providers offer different types of servers to cater for various scenarios. Table 1 shows the high-level use cases of generic family types. For more details on Alibaba Cloud ECS family types, you may refer to this link.
Table 1: Family Type and Use Cases
Once you’ve narrowed down to a particular family type, it’s time to select an instance type that meets the minimum workload capacity. Usually, the capacity requirement can be derived from business requirements or from a service level agreement. Again, it’s our responsibility to figure it out. Figure 1 suggests a general flow for selecting the right instance.
Figure 1: Instance Selection Flowchart
This post presents an approach that may be beneficial in the instance type selection process. In a nutshell, the suggested approach is to benchmark the performance of the shortlisted instance types with a benchmarking test suite, Phoronix. The article then proposes an instance type based on the outcome of that benchmarking activity. Please note that the configuration recommendations in this post are for reference only and may exclude other considerations such as regulation and the cloud vendor’s SLA. In reality, those considerations are just as vital when drawing a conclusion on instance type selection. There are tons of good guidelines and discussions out there (such as my previous post) that cover the general considerations when choosing the right instance.
2. Benchmarking Environment Setup
For the purpose of discussion, we presume that there is a request to prepare an on-cloud compute instance for running a video clip transcoding service in the Singapore region. In addition, we assume that there is neither a reference system nor a prior setup of such a service for us to refer to.
In general, the high-level requirements for the video clip transcoding service are as follows:
The service would be served via a Web API (web application), and the transcoding task would be performed with open transcoding libraries like ffmpeg. The transcoded output would be available for download once it’s ready, and there is no need to store it afterwards. In addition, the size of a video uploaded by a user is limited to 50MB. The service would be running for a temporary period only, and hence it would be better if there is no lock-in period for the compute instance. At the same time, the service runs in best-effort mode, and there is no committed service duration for each transcoding request. Lastly, the maximum budget allocated for the compute instance shall not exceed USD 250 per month (pretty reasonable, ya?).
Goal: Ultimately, we want a compute instance that fulfills the stated requirements with maximum cost efficiency!
3. Selecting the Right Instance
3.1. Family Type Selection
The presumed scenario is pretty straightforward (yes, I know), and obviously, the requirements for real-world applications would be much more complicated than that. Anyway, based on these requirements, we expect the service (video transcoding) workload to be mainly CPU bound (I believe). Hence, we’ll tentatively explore the instance types under the “Compute Optimized” family. By the way, it’s possible to revisit other family types if none of the instance types from this family is within our budget.
Tentative Family Type: Compute Optimized
3.2 Instance Type Selection
Specifically for the Alibaba Cloud Singapore region, there are numerous instance types under the “Compute Optimized” family, such as “se1”, “cm4”, “ce4”, and “c4”. To find out more details on each instance type, please visit this link. Table 2 shows the cost of the relevant pay-as-you-go (on-demand) instance types with 100GB of SSD storage. We decided on 100GB of storage as there is no storage requirement in the workload and we would like to keep our cost low while enjoying reasonable IOPS. Refer to this link to find out more about storage performance.
Table 2: Cost for Relevant Instance Type
Based on the monthly costs, there are two instance types, namely ecs.c4.xlarge and ecs.cm4.xlarge, that could be considered for our compute instance. Benchmarking would be carried out on these two instance types in order to figure out which one is the most cost-efficient. Besides, for comparison purposes, we’ll also benchmark an instance type (ecs.n4.xlarge) from the “General Purpose” family. The detailed specifications of each instance type are shown in Table 3.
Benchmarked instance types: ecs.c4.xlarge, ecs.cm4.xlarge, ecs.n4.xlarge
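The budget check behind this shortlist can be sketched in a few lines of Python. This is only an illustration: the instance names and hourly prices below are placeholders rather than Alibaba Cloud’s actual rates, and the 730 hours per month figure is an assumption (8760 hours per year divided by 12).

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def monthly_cost(hourly_instance_usd, monthly_disk_usd=0.0, hours=HOURS_PER_MONTH):
    """Estimate the monthly pay-as-you-go cost of an instance plus its disk."""
    return hourly_instance_usd * hours + monthly_disk_usd


def within_budget(monthly_costs, budget_usd):
    """Return the names (sorted) whose monthly cost does not exceed the budget."""
    return sorted(name for name, cost in monthly_costs.items() if cost <= budget_usd)


# Placeholder prices for illustration only, NOT real Alibaba Cloud rates.
hourly_prices = {"type-a": 0.20, "type-b": 0.30, "type-c": 0.40}
costs = {
    name: monthly_cost(price, monthly_disk_usd=10.0)
    for name, price in hourly_prices.items()
}
shortlist = within_budget(costs, budget_usd=250.0)
print(shortlist)
```

With real prices from Table 2 plugged in, the same filter would reproduce the shortlist for the USD 250 budget.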
Table 3: Instance Type Specifications
3.3 Benchmarking Tool and Criteria
Benchmarking would be carried out on CPU, Memory, Storage, and Network Throughput. However, we shall place more emphasis on the CPU benchmark result since the requested workload (video transcoding) is hypothesized to be mainly CPU bound. As stated earlier, this post relies on Phoronix Test Suite for benchmarking. Briefly, Phoronix Test Suite is an open-source benchmarking tool that is widely used in the industry to test server performance. For more details, you can visit this link.
The following test profiles have been set up in Phoronix Test Suite for the benchmarking activity:
Table 4: Phoronix Test Profile
4. Benchmarking Result
The benchmarking results are shown in the following sections. Alternatively, you may view the complete results at this link. By the way, the setup steps for Phoronix Test Suite are available in the Appendix section.
The ecs.n4.xlarge (yes, the General Purpose type!) outperformed the other two instance types in most of the CPU test profiles. In particular, for the “pts/ffmpeg” test, it’s 17.93% better than the worst-performing instance (ecs.cm4.xlarge), which is a “Compute Optimized” instance with the same number of cores. The benchmark result suggests that ecs.n4.xlarge is the most appealing instance type (cheapest and best performing so far) for our workload. Anyway, we’ll hold our decision until we’ve checked the test profiles for the other resources. Table 5 summarizes the CPU benchmarking result.
Table 5: Benchmark — CPU Performance
i. Test Profile — pts/ffmpeg
ii. Test Profile — pts/openssl-1.8.0
iii. Test Profile — pts/compress-gzip
iv. Test Profile — pts/apache
v. Test Profile — pts/stress-ng
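A percentage gap such as the one quoted for pts/ffmpeg can be computed as sketched below. This is one possible convention (the gap expressed relative to the worst result), and I’m not claiming it is the exact formula behind the reported figure. The sample numbers are made up for illustration. Note that some profiles report a time (lower is better) while others report a rate (higher is better), so the direction has to be passed in explicitly.

```python
def percent_better(best, worst, higher_is_better=True):
    """Gap between the best and worst results, as a percentage of the worst."""
    if worst == 0:
        raise ValueError("worst result must be non-zero")
    if higher_is_better:
        gap = best - worst   # e.g. requests per second
    else:
        gap = worst - best   # e.g. seconds to finish an encode
    return gap / worst * 100.0


# Made-up sample results, for illustration only.
print(percent_better(best=20.0, worst=25.0, higher_is_better=False))
print(percent_better(best=120.0, worst=100.0, higher_is_better=True))
```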
The ecs.cm4.xlarge instance type claimed the top place in the memory test profiles. That’s kind of expected since this instance type has the largest amount of RAM (16GB). However, the cheaper ecs.n4.xlarge instance type is almost on par with the best-performing instance type. Table 6 shows the memory benchmark result.
Table 6: Benchmark — Memory
i. Test Profile — pts/stream (Copy)
ii. Test Profile — pts/stream (Scale)
iii. Test Profile — pts/stream (Triad)
iv. Test Profile — pts/stream (Add)
v. Test Profile — pts/ramspeed (Integer)
vi. Test Profile — pts/ramspeed (Floating Point)
There is no significant difference in the storage benchmarks. The storage performance of all the instance types is comparable. Table 7 shows the storage benchmark result.
Table 7: Benchmark — Storage
i.Test Profile — pts/fio (Random Read — MB/s)
ii.Test Profile — pts/fio (Random Read — IOPS)
iii.Test Profile — pts/fio (Random Write — MB/s)
iv.Test Profile — pts/fio (Random Write — IOPS)
v.Test Profile — pts/fio (Sequential Read — MB/s)
vi.Test Profile — pts/fio (Sequential Read — IOPS)
vii.Test Profile — pts/fio (Sequential Write — MB/s)
viii.Test Profile — pts/fio (Sequential Write — IOPS)
4.4. Network Throughput
The instance types from the “Compute Optimized” family are clearly the winners in this benchmark. The difference between the top and the lowest results is up to 47.13%. However, the actual impact on our application could be small since the video size is limited to 50MB. Table 8 shows the network throughput benchmark result.
Table 8: Benchmark — Network Throughput
i. Test Profile — pts/iperf (TCP)
ii. Test Profile — pts/iperf (UDP)
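To get a feel for why the throughput gap may matter little for a 50MB upload, here is a rough back-of-the-envelope sketch. The two throughput values are hypothetical, not the measured results. It also assumes 1MB = 8 megabits and ignores protocol overhead and the user’s own uplink, which in practice is often the real bottleneck.

```python
def transfer_seconds(size_mb, throughput_mbit_per_s):
    """Ideal time to move size_mb megabytes over a link of the given throughput."""
    if throughput_mbit_per_s <= 0:
        raise ValueError("throughput must be positive")
    return size_mb * 8 / throughput_mbit_per_s


MAX_VIDEO_MB = 50  # upload limit from the requirements

# Hypothetical throughputs in Mbit/s, for illustration only.
slow = transfer_seconds(MAX_VIDEO_MB, 500)
fast = transfer_seconds(MAX_VIDEO_MB, 1000)
print(slow, fast, slow - fast)
```

Even with one link twice as fast as the other, the ideal transfer times in this example differ by well under a second per video.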
The benchmark results obtained above suggest that ecs.n4.xlarge is the most cost-efficient (performance/cost) instance type among the benchmarked instance types. For our workload, this instance type is the best fit. Additionally, I can’t help but reiterate how affordable it is for its performance!
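The performance/cost comparison can be sketched as follows. The scores and monthly costs are placeholders rather than the benchmarked figures, and treating the inverse of a time as “performance” for lower-is-better results is just one reasonable choice.

```python
def cost_efficiency(score, monthly_cost_usd, higher_is_better=True):
    """Performance per dollar; for time-like scores, performance is 1 / score."""
    performance = score if higher_is_better else 1.0 / score
    return performance / monthly_cost_usd


def rank_by_efficiency(results, higher_is_better=True):
    """Sort instance names from most to least cost-efficient.

    results maps a name to a (score, monthly_cost_usd) tuple.
    """
    return sorted(
        results,
        key=lambda name: cost_efficiency(*results[name], higher_is_better),
        reverse=True,
    )


# Placeholder data: (encode time in seconds, monthly cost in USD).
sample = {
    "type-a": (20.0, 150.0),
    "type-b": (25.0, 230.0),
    "type-c": (18.0, 300.0),
}
ranking = rank_by_efficiency(sample, higher_is_better=False)
print(ranking)
```

In this made-up example, the fastest instance (type-c) is not the most cost-efficient one, because its extra speed does not make up for its higher price.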
What does this result mean? Does it mean that Alibaba Cloud incorrectly labeled their servers? Not so much. In fact, what counts as “compute optimized” may vary significantly with the actual application. For example, a server may show unimpressive performance in each individual category, yet taken as a whole it may be the perfect one for your application. The result also indicates that we shouldn’t rely solely on the “Family Type” when choosing an instance type. In this post, the “tentative” family type we selected, “Compute Optimized”, ended up not containing the most cost-efficient instance type in our context. To some extent, this is also influenced by the way we define performance.
In short, treat the “Family Type” only as a reference. If there is a specific feature that you are looking for in your server, then you should perform benchmarking according to the actual workload behavior. You will then be able to find your RIGHT INSTANCE objectively!
The steps and setup for Phoronix Test Suite used in the benchmarking activity above are shown below:
1. Install Phoronix Test Suite on the benchmarking instance.
sudo apt-get install --yes ./phoronix-test-suite_7.6.0_all.deb unzip
2. Install the selected test profiles on the benchmarking instance.
phoronix-test-suite install pts/openssl-1.8.0
phoronix-test-suite install pts/apache
phoronix-test-suite install pts/stress-ng
phoronix-test-suite install pts/compress-gzip
phoronix-test-suite install pts/ffmpeg
phoronix-test-suite install pts/stream
phoronix-test-suite install pts/ramspeed
phoronix-test-suite install pts/fio
phoronix-test-suite install pts/iperf
3. Configure the “batch” mode setup accordingly (for example, by running phoronix-test-suite batch-setup).
4. Execute the tests accordingly.
phoronix-test-suite batch-run pts/openssl-1.8.0 pts/apache pts/stress-ng pts/compress-gzip pts/ffmpeg pts/stream pts/ramspeed pts/fio pts/iperf
5. For the pts/iperf test, set up a corresponding iperf server to receive the data packets. Install iperf3 on another server (same instance type as the benchmarking instance) before executing step 4.
sudo apt-get install --yes iperf3