By Cao Liang (Kehuai)
Caching is one of the essential methods by which programmers build a high-performance and high-concurrency system. It is often used to solve performance bottlenecks. Programmers must understand caches and demonstrate their knowledge during job interviews.
Although product managers may not focus on system performance, programmers must consider system concurrency and user volumes while drawing up implementation requirements. Caching is mainly used to solve performance bottlenecks, but its improper use may cause the system to crash. This article focuses on outlining the what, why, where, and when of caches.
With the rapid iteration of Internet businesses and the surge in the user base, application architectures must constantly adjust, and sometimes be restructured, to keep pace with business growth. As data volumes increase, business logic becomes more complex, and service links grow longer, the response time (RT) lengthens, so service performance must keep improving to preserve a good user experience. Two methods are usually used to optimize the system architecture: scale-up and scale-out. Scale-out, commonly referred to as horizontal scaling, designs application services to be stateless so that access pressure can be relieved by adding hardware. Scale-up improves the performance of a single service link to raise the QPS and system throughput. In the pursuit of better performance, caches provide a common solution since the majority of business scenarios involve more reads than writes.
1) What Is a Cache?
According to a standard definition on Wikipedia, a cache is a collection of data duplicating original values stored elsewhere on a computer, usually for easier access.
This definition implies that a cache must exist as a copy of the existing data. Also, the cache is designed for scenarios with quick data access (reading) requirements. In existing Internet applications, caches serve as a key technology that accelerates the service response time. They are also a non-functional requirement that product managers generally do not consider. While designing technical solutions, it is important to evaluate business scenarios in advance to determine whether the technical architecture needs a cache to address non-functional requirements.
In the computer field, various caches are used. For example, the CPU cache is used to solve the imbalance between the CPU computing speed and memory data reading. The CPU operation speed is far faster than the memory read/write speed. To reduce the time the CPU spends waiting on data read/write operations, an L1, L2, and L3 multi-level cache is incorporated into the CPU.
The file cache in Linux is another example. Address translation is a third: during programming we work only with virtual addresses, not physical addresses, and the memory management unit (MMU) and page tables in the computer convert virtual addresses to physical addresses, with the translation lookaside buffer (TLB) caching recent translations so that the page tables do not have to be walked on every access. In the field of computer hardware, many cache-related applications have already proven successful. In fact, the cache design in the software architecture borrows many ideas from the designs of computer hardware caches.
2) Why Do We Need a Cache?
Users will use software services if they trust them and these services provide product value and solve users’ difficulties. As mentioned in “Growth Hacker”, whether or not products give users an “aha moment” determines whether users will use them frequently and continuously. The user experience is considered to be the key factor that increases user loyalty to software products.
2.1 What Is the User Experience?
The concept of user experience was defined and advocated by Donald Norman in the 1990s. It matters most in the field of human-machine interaction and is broadly consistent with the three traditional usability indicators: efficiency, effectiveness, and satisfaction.
The ISO 9241–210 standard officially defines the user experience as “a person’s perceptions and responses that result from the use or anticipated use of a product, system, or service”. The user experience concerns users’ subjective feelings about software products. Specifically, it includes the user’s emotions, preferences, cognitive impressions, psychological reactions, emotional expressions, and other subjective feelings before, during, and after use. As different users have different perspectives and concerns about a product, it is challenging for software products to provide a good user experience for the majority of users.
In the industry, the user experience is usually divided into three categories: user status, system performance, and the environment of software products. Ensuring a good user status and environment (both the user’s environment and the product’s position among similar products) requires efforts in many professional fields, such as interaction design and application research. Meanwhile, software developers need to solve system performance problems. The most basic requirement for users is that software products deliver their content promptly. If users are constantly left waiting for content to load, the user experience will suffer. Content timeliness is therefore also the most basic requirement for system performance.
Product managers typically do not consider system performance. It is a non-functional requirement that developers must weigh carefully. There are many metrics for evaluating system performance, so it is important to determine which of them are essential to improving the user experience.
2.2 Common Performance Metrics
Common metrics to be considered during software architecture design include the response time, latency, throughput, number of concurrent users, and resource utilization.
1) System Response Time: It refers to the time that the system takes to respond to user requests. Different functions require longer or shorter processes and involve different data volumes. As a result, different functions have different response times. Therefore, when measuring the system response time, we usually look at the average response time and maximum response time of all software product functions.
2) Latency: To analyze the system response time, break it down into the time required by each stage of the process.
- For example, the “presentation time” is the time that the client takes to render content when receiving data.
- The network transmission time and application latency affect how long the service takes to receive requests sent by users and return the appropriate response. The application latency is the time that the server takes to execute the entire service process. This is also what performance optimization primarily seeks to reduce.
3) Throughput: It refers to the number of requests that are processed per unit time. For non-concurrent applications, throughput is inversely proportional to the request-response time, which means that a longer service latency results in lower throughput.
4) Concurrent Users: The number of concurrent users refers to the number of users to which the system simultaneously provides system functions. This metric is more general, but easier for non-professionals to understand.
5) Resource Utilization: The resource utilization rate reflects the proportion of resources used within a period of time.
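As a rough illustration of how these metrics relate, the following sketch computes the average and maximum response time from a handful of sample timings and derives the throughput of a non-concurrent service, where throughput is simply the inverse of the average response time. The sample values and function names are hypothetical.

```python
from statistics import mean


def response_time_summary(samples_ms):
    """Return (average, maximum) response time in milliseconds."""
    return mean(samples_ms), max(samples_ms)


def serial_throughput(avg_response_ms):
    """Requests per second for a service that handles one request at a time."""
    return 1000.0 / avg_response_ms


samples_ms = [20, 35, 50, 95]  # hypothetical measurements
avg_ms, max_ms = response_time_summary(samples_ms)
qps = serial_throughput(avg_ms)
print(f"avg={avg_ms} ms, max={max_ms} ms, throughput={qps} req/s")
```

Doubling the average response time in this model halves the throughput, which is why reducing application latency is the usual starting point for optimization.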
2.3 Advantages of Caching
To optimize the user experience, it is necessary to keep improving the preceding performance metrics and move steadily toward the best achievable system experience. Now let’s dive into the advantages of caching and explore whether it is worth spending significant time and effort designing a cache structure that fits current business scenarios.
1) Highly Improved Software User Experience
Software products mainly focus on two core issues: solving the difficulties of target users and increasing product stickiness. In the abstract, software services seek to solve the problem of data transmission along the entire process chain. Implementation focuses on improving the efficiency and smoothness of data flow. Caching can be applied at every stage of the process, including browsers, server load balancers, application servers, and databases. When data is stored closer to users, for example, when data replicas exist on the client, requests are answered faster and it takes less time to present data to the user. Today, users have very short attention spans and very little patience. If a software product fails to grab their attention quickly, it has little chance of succeeding. Therefore, caching allows for a better subjective user experience.
2) Increased Throughput
Serving request data from the cache eliminates the need to fetch a large volume of data from the source application server, which reduces the frequency of network transmission to and from the source server. Given a fixed IDC bandwidth, the system can reduce the network transmission time and application latency and therefore support more requests. This, in turn, improves the overall system throughput and the number of concurrent users it supports. The efficiency of hardware usage also improves significantly.
In practice, cache-based optimization is often the first choice during system performance optimization, and it has proven to be an effective method. Caching is also described as the art of “trading space for time”.
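To make the throughput argument concrete, here is a minimal sketch of how the cache hit ratio affects the average latency and the load that still reaches the source server. It assumes a simplified model in which every request first checks the cache and only misses go on to the source; the function names and all numbers are hypothetical.

```python
def expected_latency_ms(hit_ratio, cache_ms, source_ms):
    """Average latency when every request checks the cache and misses go to the source."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    miss_ratio = 1.0 - hit_ratio
    return cache_ms + miss_ratio * source_ms


def source_load_qps(total_qps, hit_ratio):
    """Requests per second that still reach the source server."""
    return total_qps * (1.0 - hit_ratio)


# Hypothetical figures: 2 ms cache lookup, 48 ms source query, 75% hit ratio.
latency = expected_latency_ms(hit_ratio=0.75, cache_ms=2.0, source_ms=48.0)
source_qps = source_load_qps(total_qps=20_000, hit_ratio=0.75)
print(latency, source_qps)
```

Under these assumptions the average latency drops from 50 ms without a cache hit to 14 ms, and only a quarter of the traffic reaches the source.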
3) Where Are Caches Located in the Service Process?
3.1 Cache Types
The process from request to final response goes through many steps, and a cache can be deployed at almost any node along the link. According to their features, caches are classified as follows:
1) Location of the cache in the service process:
- Client Cache
- Network Cache
- Server Cache
2) Cache architecture deployment method:
- Single-host Cache
- Cache Cluster
- Distributed Cache
3) Cache-enabled memory area:
- Local Cache or in-process Cache
- Inter-process Cache
- Remote Cache
Let’s systematically analyze different cache applications based on the locations of the caches in the service process.
3.2 Client Cache
A client cache is a storage medium that is “closest” to users. This cache is often used together with a network and server cache. Common client cache types are as follows:
1) Webpage Cache: A webpage cache stores some elements of a static page on the local machine so that the next request does not need to obtain these resource files again. Mobile webpages also support offline caching by specifying a manifest file on the page. When a browser accesses a page with the manifest attribute, it first obtains the page’s resource files from the application cache and handles cache updates through a check mechanism.
2) Browser Cache: The browser cache usually creates a dedicated memory space to store resource copies. When users go back to a previous page, the relevant data is quickly obtained from the browser cache. HTTP 1.1 introduced the ETag header, which is used together with the Expires and Cache-Control headers to support browser caching.
3) App Cache: An app may cache content to the memory or local database. For example, some open-source image libraries employ caching technology. When images and other resource files are obtained from the remote server, they are cached. In this way, users do not have to make repeated requests the next time, which reduces their traffic fees.
Client caches are an important means of frontend performance optimization. As the client is the node closest to users, client caches allow developers to explore the full potential of optimization.
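The ETag mechanism mentioned above can be sketched as follows. This is a simplified, in-process model of the server side of a conditional request, not a real HTTP server: the browser sends back the ETag it holds in an If-None-Match header, and the server answers 304 Not Modified with an empty body if the resource is unchanged. The hashing scheme and the function names are assumptions made for illustration.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a validator from the response body (one possible scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def handle_get(body: bytes, if_none_match=None):
    """Return (status, headers, payload) for a GET on a static resource."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=60"}
    if if_none_match == etag:
        # The browser's copy is still current, so no body is sent.
        return 304, headers, b""
    return 200, headers, body


resource = b"body { color: black; }"
status, headers, payload = handle_get(resource)                # first visit
status2, _, payload2 = handle_get(resource, headers["ETag"])   # revalidation
status3, _, _ = handle_get(b"body { color: red; }", headers["ETag"])  # resource changed
```

The second request transfers only headers, which is where the bandwidth and latency savings of browser caching come from.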
3.3 Network Cache
The network cache is located between the client and the server. It handles the response to data requests through proxies and reduces the back-to-source rate of data requests. There are several general types of network caches:
1) Web Proxy Cache: Common proxy methods include forward proxy, reverse proxy, and transparent proxy. The web proxy cache is usually a forward proxy that stores resource files and hotspot data on the proxy server. When a new request is received and the data is retrievable from the proxy server, the request does not have to be sent to the application server.
2) Edge Cache: Like a forward proxy, a reverse proxy can also be used for caching. For example, Nginx provides a caching function. In addition, if these reverse proxy servers are located in the same network as the requesting user, resources are obtained even faster. Such reverse proxy servers are called edge caches. A common type of edge cache is a content delivery network (CDN), which allows users to store static resource files, such as images, on CDN nodes.
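The back-to-source rate that network caches aim to reduce can be illustrated with a toy proxy. The sketch below is not how Nginx or a CDN is implemented; it only shows the bookkeeping, with a hypothetical origin function standing in for the application server.

```python
class ProxyCache:
    """Toy proxy that caches origin responses by URL."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._store = {}
        self.requests = 0
        self.origin_fetches = 0

    def get(self, url):
        self.requests += 1
        if url not in self._store:
            self.origin_fetches += 1  # back-to-source request
            self._store[url] = self._fetch(url)
        return self._store[url]

    def back_to_source_rate(self):
        """Fraction of requests that had to be forwarded to the origin."""
        return self.origin_fetches / self.requests if self.requests else 0.0


def origin(url):
    """Hypothetical origin server."""
    return f"content of {url}"


proxy = ProxyCache(origin)
for url in ["/logo.png", "/logo.png", "/app.js", "/logo.png"]:
    proxy.get(url)
rate = proxy.back_to_source_rate()
```

Here four requests result in two origin fetches, so half of the traffic never reaches the application server.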
3.4 Server Cache
Server caching is a focus of performance optimization in backend development. Common backend performance optimization methods involve the introduction of a cache. Common server cache applications include database query caches, cache frameworks, and application-level caches.
3.4.1 Database Query Cache
As one example of a database query cache, MySQL caches SELECT statements and the corresponding ResultSets. After a SELECT request is received, if MySQL has enabled the Query Cache function, the SELECT statement is hashed as a string and then looked up in the cache. If a matching entry is found, the result is returned directly. This skips the subsequent work of the optimizer and the storage engine, including its I/O operations, and greatly improves the response efficiency. Note that the Query Cache was deprecated in MySQL 5.7.20 and removed in MySQL 8.0, so this section applies to earlier versions. To optimize the query cache, consider the following parameters and status variables:
query_cache_size: This determines the size of the memory area that caches ResultSets.
query_cache_type: This indicates cache usage scenarios. 0 (OFF) indicates that the query cache is not used in any scenario, 1 (ON) indicates that the query cache is used unless explicitly specified otherwise, and 2 (DEMAND) indicates that the query cache takes effect only when explicitly specified.
Qcache_hits: This indicates the number of query cache hits.
Qcache_inserts: This indicates the number of times that data is inserted when a request does not hit the query cache.
Qcache_lowmem_prunes: This indicates the number of queries that have been removed from the cache due to insufficient space.
Qcache_free_memory: This indicates the amount of free memory.
Qcache_free_blocks: A large value of this metric indicates that there are many memory fragments that should be cleaned up promptly.
When optimizing Qcache, analyze the preceding metrics together. For example, judge the current Qcache efficiency from the hit ratio: Qcache hit ratio = Qcache_hits / (Qcache_hits + Qcache_inserts). The current Qcache memory usage efficiency can likewise be judged from Qcache_lowmem_prunes, Qcache_free_memory, and Qcache_free_blocks.
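The hit ratio calculation can be sketched as below. It assumes that cache misses are approximated by the insert counter (Qcache_inserts), so the ratio is hits divided by hits plus inserts. The status values are hypothetical sample numbers of the kind returned by SHOW STATUS LIKE 'Qcache%', which reports them as strings.

```python
def qcache_hit_ratio(status):
    """Hit ratio = hits / (hits + inserts), with inserts approximating misses."""
    hits = int(status["Qcache_hits"])
    misses = int(status["Qcache_inserts"])
    total = hits + misses
    return hits / total if total else 0.0


# Hypothetical sample values.
status = {
    "Qcache_hits": "7500",
    "Qcache_inserts": "2500",
    "Qcache_lowmem_prunes": "12",
}
ratio = qcache_hit_ratio(status)
print(f"Qcache hit ratio: {ratio:.0%}")
```

A low ratio combined with a high prune count suggests that the cache is too small or that the workload is not a good fit for query caching.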
In addition, when using the InnoDB storage engine, pay attention to the innodb_buffer_pool_size parameter, which determines whether enough space is available in the cache to store InnoDB indexes and data.
table_cache (named table_open_cache since MySQL 5.1.3), which determines the maximum number of open tables that can be cached, is another parameter that requires attention.
3.4.2 Cache Framework
During functional development, cache frameworks or class libraries that implement caching features are commonly used to speed up development. Common examples include Ehcache and Guava. They are easy to configure and are convenient and flexible to use. These open-source frameworks support local caching on a single host, and some, such as Ehcache, can also be scaled out by configuring clusters.
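Ehcache and Guava are Java libraries. As a language-neutral illustration of what a local, in-process cache provides, the following sketch uses functools.lru_cache from the Python standard library instead. The load_user function and the call counter are hypothetical stand-ins for a slow database query.

```python
from functools import lru_cache

calls = {"count": 0}


@lru_cache(maxsize=128)  # keep at most 128 entries, evicting the least recently used
def load_user(user_id):
    calls["count"] += 1  # stands in for a slow database query
    return f"user-{user_id}"


load_user(1)  # miss: runs the function body
load_user(1)  # hit: served from the in-process cache
load_user(2)  # miss
info = load_user.cache_info()
print(info.hits, info.misses, calls["count"])
```

The size bound matters: a local cache shares memory with the application, so an unbounded cache trades one performance problem for another.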
3.4.3 Application Cache
When a cache framework doesn’t meet specific needs, it is recommended to introduce an application-level cache, such as Redis, MongoDB, or another type of NoSQL database. Application-level caches have highly available and scalable distributed architectures that support business needs, though it is also very challenging to properly use an application-level cache product.
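One common way to put an application-level cache in front of a database is the cache-aside pattern: read from the cache first, fall back to the database on a miss, and invalidate the cached copy on writes. The sketch below shows that pattern under stated assumptions; TTLCache is an in-memory stand-in for a remote cache such as Redis, not a Redis client, and the database dictionary and function names are hypothetical.

```python
import time


class TTLCache:
    """In-memory stand-in for a remote cache with per-key expiration."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def delete(self, key):
        self._data.pop(key, None)


database = {"user:1": "Alice"}  # hypothetical source of truth
cache = TTLCache()


def read_user(key):
    value = cache.get(key)
    if value is None:  # miss: load from the database and populate the cache
        value = database.get(key)
        if value is not None:
            cache.set(key, value, ttl_seconds=60)
    return value


def write_user(key, value):
    database[key] = value  # update the source of truth first
    cache.delete(key)      # then invalidate the cached copy
```

Note that the database remains the authoritative store throughout; the cache only ever holds copies that can be discarded and rebuilt.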
4) When Is Cache Required?
A cache is neither a mandatory element of architecture design nor a function required for business development. It is only necessary to use a cache to improve system performance when you encounter performance bottlenecks. Caching is not suitable for all business scenarios. It is more suitable for scenarios with more reads than writes and low data timeliness requirements. Caching is not a magic fix for all performance problems. When misused, a cache may lead to high maintenance costs and make the system more complex and difficult to maintain.
In addition, using a cache as the primary data store is a fatal decision. It reflects a misunderstanding of the purpose of a cache, which holds copies of data rather than the authoritative data, and puts the system at risk from the moment the cache is introduced. A deep understanding of appropriate cache usage is required to ensure that the decision to introduce a cache is correct.
Always consider the following factors while designing a cache structure:
1) Traffic Volume and Application Scale: For applications with low concurrency and low traffic, the introduction of a cache will not significantly improve performance, but will increase application complexity and produce extremely high O&M costs. Not all data needs to be cached. For example, it is better to use a distributed file system for images, videos, and other files than to use a cache. Therefore, before introducing a cache, evaluate the traffic of the current business. In high-concurrency and high-traffic business scenarios, introducing a cache will be more beneficial.
2) Cache Application Selection: Many cache applications, such as Redis, Memcached, and Tair, are available. Therefore, before determining an appropriate cache product, understand the pros and cons, application scopes, memory efficiency, and O&M costs of different cache applications, and even the knowledge structure of the developers.
3) Correct Assessment of Cache Impact Factors: Before introducing a cache, it is important to evaluate and consider multiple factors, such as value sizes, cache memory space, peak QPS, expiration time, cache hit ratio, read/write update policies, key-value distribution routing policies, expiration policies, and data consistency solutions.
4) High-availability Cache Architecture: Like any distributed system, distributed caches need to be highly available. The design of cache clusters and primary-secondary synchronization solutions must be reliable enough to serve business systems and create business value.
5) Comprehensive Monitoring Platform: When the cache is used in the production environment, a monitoring system is required to explicitly observe the operational status of the cache system so that problems are quickly detected. In addition, a hotspot discovery system is also required to solve the unexpected hotspot data caching problem.
6) Cache Proximity Principle: Placing the cached data in the location closest to the user significantly improves the response speed. This is also the core idea of the multi-level cache design approach.
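The proximity principle behind multi-level caching can be sketched as a lookup that tries the nearest cache first and promotes values toward the user when they are found further away. This is a minimal illustration that uses plain dictionaries as cache levels; the class name, the level labels, and the loader function are hypothetical.

```python
class MultiLevelCache:
    """Look up a key in caches ordered from nearest to farthest, then in the source."""

    def __init__(self, levels, load_from_source):
        self._levels = levels  # list of dict-like caches, nearest first
        self._load = load_from_source

    def get(self, key):
        for i, level in enumerate(self._levels):
            if key in level:
                value = level[key]
                for nearer in self._levels[:i]:
                    nearer[key] = value  # promote to the nearer levels
                return value, f"L{i + 1}"
        value = self._load(key)  # every level missed: go back to the source
        for level in self._levels:
            level[key] = value
        return value, "source"


local, remote = {}, {"sku:1": "cached remotely"}
cache = MultiLevelCache([local, remote], lambda key: f"loaded {key}")

first = cache.get("sku:1")   # found in the remote level, promoted to local
second = cache.get("sku:1")  # now served from the local level
third = cache.get("sku:2")   # missing everywhere, loaded from the source
```

A production design would also need expiration and invalidation across levels, which this sketch deliberately leaves out.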
5) How Do You Use Cache Correctly?
Many factors affect the overall performance of a cache to a greater or lesser extent. Language characteristics are one example: in Java, the impact of garbage collection (GC) needs to be considered. Multiple factors must also be weighed to increase the cache hit ratio.
- Dive into distributed cache systems
- Discussions of cache penetration, cache breakdown, and cache avalanche
- Cache update policies
- Eventual consistency
- Cache design