Understanding Data Caching

Introduction

Caching is an efficient and easy way to reduce interactions between your application and its data storage location. To use it effectively, you need to understand the various cache implementations and their effects on your application. This article focuses on the usage scenarios of data caching rather than on the details of the technical implementation.

Data Caching Features and Application Scenarios

All businesses rely on data, but the relevance of different types of data varies by industry. This blog emphasizes classifying data and caching it in proportion to business needs, with the following goals:

  • Returning less data
  • Reducing interactions with the underlying layer
  • Reducing Central Processing Unit (CPU) overhead and making better use of machine resources

Advantages and Disadvantages of Caching

Two types of cache exist in Java development: local cache and cluster cache.

Local cache:

  • Access speed is very fast.
  • The threshold for use is very low.
  • Well-developed libraries such as Ehcache, OSCache, and Guava cache are readily available. (Ehcache and OSCache also provide cluster caching.)
  • When the local cache is too large, it can easily trigger Garbage Collection (GC) on the server.

Cluster cache:

  • It relies on mature middleware tools such as Tair, Redis, and Memcached.
  • The threshold for use is high, and cache hotspot issues may occur, such as the Tair hotspot issues.
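One way to keep a local cache from growing until it triggers GC is to cap the number of entries it holds. The following is a minimal sketch of that idea, not a replacement for the libraries named above. The class name BoundedLocalCache and the least-recently-used eviction policy are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical size-bounded local cache that evicts the least recently used entry. */
final class BoundedLocalCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    BoundedLocalCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently accessed
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // LinkedHashMap calls this after each insertion; returning true drops the eldest entry
        return size() > maxEntries;
    }
}
```

LinkedHashMap is not thread-safe, so a shared instance would need external synchronization, for example through Collections.synchronizedMap.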

Cache Usage Methods

Cache usage varies for different data sizes and scenarios. Three usage methods exist, which are as follows:

  • Passive cache processing: In this method, the query is run against the data source when the cache has no result, and the result is stored in the cache for later requests.
  • Full data cache warming or key data cache warming: In this method, results are already available in the cache before requests arrive. The results are retrieved from the cache with no need for recalculation.
  • Implementation of Sentinel Caching: Sentinel caching uses multiple integrated layers to retrieve data.

Passive cache processing is the simplest to implement and is suitable for scenarios with a small data size. Full data cache warming requires a sound design and is suitable for hotspot access scenarios with a medium data size. Sentinel caching integrates various layers, but applying it across the board involves complex technologies.
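The difference between passive cache processing and cache warming can be sketched in a few lines. This is an illustration under assumptions, not a definitive implementation: the class name PassiveCache, the loader function that stands in for the query against the data source, and the warm method are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical cache that supports both passive loading and warming. */
final class PassiveCache<K, V> {
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    // Stands in for the query against the data source (an assumption of this sketch)
    private final Function<K, V> loader;

    PassiveCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Passive processing: load on the first miss, serve from the cache afterwards. */
    V get(K key) {
        // A null result from the loader is not cached by computeIfAbsent
        return cache.computeIfAbsent(key, loader);
    }

    /** Cache warming: preload known results before requests arrive. */
    void warm(Map<K, V> preloaded) {
        cache.putAll(preloaded);
    }
}
```

With warming, a request for a preloaded key never reaches the loader. With passive processing, only the first request for each key does.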

Local Cache Bottleneck

The choice between a local cache and a cluster cache does not restrict the cache usage methods. Although extensions are available, this article does not focus on the cluster cache implementation but on the bottleneck that local cache usage causes. In most cases, a local cache in the JVM lives in heap memory.

  • Several hundred megabytes for the permanent generation (PermSize)
  • The "mark-copy" algorithm also becomes costly when local cache usage is too high and too little heap memory is available. In addition, improper use of the local cache can create large objects, which compromises young generation GC (YGC).
  • Developers often run into the JVM GC bottleneck only toward the end of coding, and then have to fight or compromise with JVM garbage collection. Workarounds include declaring a temporary List variable and inserting objects into it. Splitting large objects means investing more design effort and writing a lot of code that is unrelated to business logic.
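Because the GC bottleneck tends to surface late, it can help to watch heap usage and GC activity while the local cache fills. The sketch below reads both through the standard java.lang.management API. The class name HeapReport and its method names are hypothetical, and what counts as "too high" depends on the project.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical helper that reports heap usage and GC activity of the current JVM. */
final class HeapReport {
    /** Used heap as a fraction of the maximum heap, or -1.0 if the maximum is undefined. */
    static double heapUsedFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 when the maximum is undefined
        return max > 0 ? (double) heap.getUsed() / max : -1.0;
    }

    /** Total number of collections reported by all garbage collectors so far. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if undefined for this collector
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }
}
```

Logging these two values before and after loading the cache gives a rough picture of how much heap the cache occupies and how often collections run.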

Problems with Using Multiple Threads

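Why does the improper use of multiple threads lead to GC problems? Besides the heap, a JVM process also uses memory for the following, and every thread adds its own stack: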

  • Thread stack: Set through -Xss
  • Socket buffer cache
  • Java Native Interface (JNI) code
  • Virtual machine and GC: The execution of virtual machine and GC code also consumes part of the memory
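Since every thread reserves its own stack (sized through -Xss), capping the number of threads also caps that part of the memory. A fixed-size thread pool is one common way to do this. In the sketch below, the class name BoundedWorkers, the method name, and the 5 ms sleep that stands in for real work are assumptions for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical example: run many tasks on a capped number of threads. */
final class BoundedWorkers {
    /** Runs taskCount tasks on at most maxThreads threads and returns the peak concurrency seen. */
    static int runAndMeasurePeak(int maxThreads, int taskCount) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.submit(() -> {
                int now = active.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(5); // placeholder for real work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    active.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }
}
```

However many tasks are submitted, the pool never runs more than maxThreads of them at once, so the number of thread stacks stays bounded.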

Preliminary Study on Off-Heap Memory

Remember the following key points when using local cache:

  • Avoid excessive local cache
  • Adjust JVM configuration as appropriate to meet the current project characteristics

The following libraries support off-heap (non-heap) memory:

  • Imcache: It supports an n-level caching hierarchy with various caching methods such as heap, off-heap, and more. Imcache also supports Redis, the well-known caching solution.
  • SharedHashMap: SharedHashMap is targeted at high-performance off-heap memory storage of entries for low latency applications, to avoid GC overheads on that data.
  • Ehcache: Only the enterprise edition supports non-heap memory.
  • BigMemory: The free edition is time-limited. The enterprise edition has mature applications, but it is not free.

Note the following when using non-heap memory:

  • When data is stored in non-heap memory, the impact of GC can be avoided, but developers need to watch for non-heap memory overflow problems.
  • When data is stored in non-heap memory and a client requests it, the data must be copied from non-heap memory into heap memory and then returned to the client. Retrieving the data therefore still takes up a portion of the heap memory.
  • Most non-heap memory implementations require the application to have read and write permissions on a server file directory, because the non-heap memory is backed by a file.
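The first two points can be illustrated with the JDK's own direct buffers. This is a minimal sketch and does not show how any of the libraries above work internally. The class name OffHeapSlot, the single-value design, and the capacity check are assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical single-value store backed by memory outside the Java heap. */
final class OffHeapSlot {
    private final ByteBuffer buffer; // direct buffer: its contents live outside the heap
    private int length;

    OffHeapSlot(int capacityBytes) {
        this.buffer = ByteBuffer.allocateDirect(capacityBytes);
    }

    /** Copies the value off-heap and rejects values that would overflow the slot. */
    void put(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > buffer.capacity()) {
            throw new IllegalArgumentException("value exceeds off-heap capacity");
        }
        buffer.clear();
        buffer.put(bytes);
        length = bytes.length;
    }

    /** Reading copies the bytes back into a heap array before returning them. */
    String get() {
        byte[] bytes = new byte[length]; // this temporary array is allocated on the heap
        ByteBuffer view = buffer.duplicate();
        view.flip(); // limit = bytes written, position = 0
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

The capacity check in put corresponds to the overflow concern, and the heap array allocated in get shows why reads from non-heap memory still use some heap.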

Summary

This blog focuses on the concept of caching and its applications. Two types of cache, local and cluster, were discussed along with their advantages and disadvantages. This article also sheds light on GC and large local caches, as well as the potential of non-heap memory.
