Implementation of Message Push and Storage Architectures of Modern IM Systems


IM stands for instant messaging. In today's highly connected mobile Internet era, IM products have become part of everyday life. The best-known IM products in China are DingTalk, WeChat, and QQ. WeChat has grown into a full ecosystem, but its core function is still instant messaging. IM is also an indispensable module in many applications whose core business is not IM, most typically online games and social networking applications: any application with social features needs IM. An IM system rests on two core capabilities:

  • Message synchronization: Delivering complete messages from the sender to the recipient quickly. The most important metrics of a message synchronization system are the timeliness and integrity of delivered messages, and the maximum message size it supports. Functionally, an IM system must at least support message push to both online and offline users. Advanced IM systems also support “multi-terminal synchronization”, keeping all of a user's devices in sync.
  • Message storage: The persistent storage of messages in the cloud, as opposed to local storage on the client. This enables the so-called “message roaming” function: you can log on to your account from any terminal and view your full message history. Message roaming is another distinguishing feature of advanced IM systems.

Architecture Design

This section describes the architecture design of a modern IM system built on Table Store. Before going into details, we introduce the timeline logic model, which abstracts and simplifies the IM synchronization and storage models. With the timeline model in place, we then describe how to model message synchronization and storage on top of it. Implementing message synchronization and storage also involves technical trade-offs in several areas, for example choosing between the common pull and push modes for message synchronization, and choosing an underlying database that suits the characteristics of the timeline model.

Comparison between Conventional and Modern Architectures

Timeline Model

Before analyzing the design and implementation of the “message synchronization database” and the “message storage database”, we first introduce the timeline logic model. Understanding the timeline model makes the message synchronization and storage models easier to understand, and the design and implementation of the message databases are driven by the characteristics and requirements of the timeline model. A timeline can be thought of as a message queue with the following characteristics:

  • Each message has a sequence ID (SeqId), and a message nearer the tail of the queue always has a greater SeqId than a message nearer the head. SeqIds therefore increase over time, but they do not have to be consecutive: gaps between adjacent SeqIds are allowed.
  • New messages are always added to the end of a queue, ensuring that the SeqId of the new message is always greater than that of existing messages.
  • We can either read a specific message based on the SeqId, or read all messages within a given range.
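To make these three properties concrete, here is a minimal in-memory sketch in Python. It is an illustration under simplifying assumptions (a single process, plain lists, and the hypothetical class name `Timeline`), not how Table Store physically stores a timeline. Appends reject any SeqId that is not greater than the current tail, point reads and range reads use binary search because SeqIds stay sorted, and an explicit SeqId may skip values to show that gaps are allowed.

```python
import bisect
from typing import List, Optional, Tuple


class Timeline:
    """In-memory sketch of a timeline: an append-only message queue whose
    SeqIds strictly increase from head to tail (gaps are allowed)."""

    def __init__(self) -> None:
        self._seq_ids: List[int] = []   # sorted by construction
        self._messages: List[str] = []

    def append(self, message: str, seq_id: Optional[int] = None) -> int:
        """Add a message to the tail and return its SeqId.
        Defaults to last SeqId + 1; an explicit SeqId may skip values but
        must be positive and greater than every existing SeqId."""
        last = self._seq_ids[-1] if self._seq_ids else 0
        if seq_id is None:
            seq_id = last + 1
        if seq_id <= last:
            raise ValueError(f"SeqId {seq_id} must be greater than {last}")
        self._seq_ids.append(seq_id)
        self._messages.append(message)
        return seq_id

    def get(self, seq_id: int) -> Optional[str]:
        """Point read: binary search works because SeqIds are sorted."""
        i = bisect.bisect_left(self._seq_ids, seq_id)
        if i < len(self._seq_ids) and self._seq_ids[i] == seq_id:
            return self._messages[i]
        return None

    def scan(self, start: int, end: int) -> List[Tuple[int, str]]:
        """Range read: every message with start <= SeqId < end, in order."""
        lo = bisect.bisect_left(self._seq_ids, start)
        hi = bisect.bisect_left(self._seq_ids, end)
        return list(zip(self._seq_ids[lo:hi], self._messages[lo:hi]))


tl = Timeline()
tl.append("hello")             # SeqId 1
tl.append("world", seq_id=5)   # SeqId 5: a gap is fine
tl.append("!")                 # SeqId 6
print(tl.get(5))               # world
print(tl.scan(1, 6))           # [(1, 'hello'), (5, 'world')]
```

A receiving terminal that remembers the last SeqId it has read can resume with `scan(last + 1, ...)`, which is why strictly increasing (but not necessarily consecutive) SeqIds are sufficient.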

Message Storage Model

Message Synchronization Model

The message synchronization model is slightly more complicated than the message storage model. Message synchronization is generally implemented in one of two modes, pull or push, each corresponding to a different physical timeline model.

  • Pull mode: In the message storage model, all messages of a session are saved in the session's timeline. In pull mode, each new message only needs to be written once, to the session's storage timeline, and receiving terminals pull it from there. The advantage is that each message is written only once, which greatly reduces the number of writes compared with push mode, especially for group chat messages. The drawback is equally clear: the logic for a receiving terminal to pull messages is relatively complicated and inefficient. To get all new messages, the receiving terminal must pull from every session it belongs to, so reads are amplified. Many of these reads are wasted, because not every session has new messages.
  • Push mode: In push mode, additional timelines are required for message synchronization. Usually, each receiving terminal has an independent synchronization timeline that stores all messages to be synchronized to that terminal. Each message in a session is therefore written to both the storage timeline and the synchronization timelines. In a one-to-one chat, a message is written two extra times: besides the session's storage timeline, it must be written to the synchronization timelines of both parties. In a group chat, writes are amplified further: if a group has N participants, each message must be written N+1 times. The outstanding advantage of push mode is that the synchronization logic at the receiving terminal is very simple: it only needs to read its own synchronization timeline once, which greatly reduces the read load of message synchronization. The drawback is write amplification, especially for group chats.
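The write and read amplification of the two modes can be illustrated with a toy Python model. Everything here is a hypothetical simplification: in-memory lists, the made-up class `ToyMessageSystem`, and one synchronization timeline per user rather than per terminal. It counts one storage write per message in both modes, plus one sync-timeline write per participant in push mode, and counts how many timelines a client must query to catch up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class ToyMessageSystem:
    """Toy model that counts timeline writes and reads under pull or push sync."""

    def __init__(self, push: bool) -> None:
        self.push = push
        self.members: Dict[str, List[str]] = {}                           # session -> users
        self.storage: Dict[str, List[str]] = defaultdict(list)            # session -> storage timeline
        self.sync: Dict[str, List[Tuple[str, str]]] = defaultdict(list)   # user -> sync timeline
        self.writes = 0

    def create_session(self, session_id: str, users: List[str]) -> None:
        self.members[session_id] = list(users)

    def send(self, session_id: str, message: str) -> None:
        # Both modes: write once to the session's storage timeline.
        self.storage[session_id].append(message)
        self.writes += 1
        if self.push:
            # Push mode: also fan out to every participant's sync timeline.
            for user in self.members[session_id]:
                self.sync[user].append((session_id, message))
                self.writes += 1

    def timelines_to_read(self, user: str) -> int:
        """How many timelines a client must query to catch up."""
        if self.push:
            return 1  # only its own sync timeline
        # Pull mode: every session the user belongs to, new messages or not.
        return sum(1 for users in self.members.values() if user in users)


group = [f"u{i}" for i in range(100)]
for push in (False, True):
    system = ToyMessageSystem(push=push)
    system.create_session("group", group)
    system.create_session("chat", ["u0", "u1"])
    system.create_session("idle", ["u0", "u2"])
    system.send("group", "hi all")
    system.send("chat", "hi u1")
    print("push" if push else "pull", system.writes, system.timelines_to_read("u0"))
# pull 2 3
# push 104 1
```

With one 100-member group message and one one-to-one message, pull mode performs 2 writes but user u0 must query all 3 of its sessions, including the idle one; push mode performs 101 + 3 = 104 writes, yet u0 reads a single timeline.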

Message Database Design

Building on the timeline model and its application to message storage and synchronization, let’s look at the design of the message synchronization database and the message storage database.

  • Message synchronization database: The message synchronization database stores all message synchronization timelines. Each timeline corresponds to one receiving terminal and is mainly used for push mode message synchronization. This database does not need to keep messages permanently: once a message has been synchronized to all terminals, its lifecycle ends and it can be deleted. However, as mentioned before, a simple multi-terminal message synchronization system does not store the synchronization status of each receiving terminal on the server; the terminals drive synchronization themselves. As a result, the server does not know when a message can be deleted. The common practice is to give messages in this database a fixed lifecycle, for example one week or one month, and delete each message when its lifecycle ends.
  • Message storage database: The message storage database stores the timelines of all sessions, each holding all messages of one session. It is mainly used to pull a session's full history during message roaming, and it can also serve pull mode message synchronization.
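The fixed-lifecycle approach for the message synchronization database can be sketched as follows. This is a hypothetical in-memory model assuming one timeline per receiver and a time-to-live in seconds (the one-week default mirrors the example above); the names `SyncStore`, `sync`, and `expire` are illustrative, not a Table Store API. Each terminal passes its own checkpoint, so the server never needs per-terminal delivery state, and reads already hide expired entries before `expire` physically removes them.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

ONE_WEEK_SECONDS = 7 * 24 * 3600


class SyncStore:
    """Sketch of a message synchronization database: one timeline per receiver.
    Entries expire after a fixed TTL because the server does not track which
    terminals have already synchronized a message."""

    def __init__(self, ttl_seconds: float = ONE_WEEK_SECONDS,
                 clock: Callable[[], float] = time.time) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be demonstrated
        self._timelines: Dict[str, List[Tuple[int, float, str]]] = defaultdict(list)

    def push(self, receiver: str, seq_id: int, message: str) -> None:
        self._timelines[receiver].append((seq_id, self.clock(), message))

    def sync(self, receiver: str, checkpoint: int) -> List[Tuple[int, str]]:
        """Unexpired messages newer than the terminal's own checkpoint."""
        cutoff = self.clock() - self.ttl
        return [(seq, msg) for seq, written_at, msg in self._timelines[receiver]
                if seq > checkpoint and written_at >= cutoff]

    def expire(self) -> int:
        """Physically delete entries past their lifecycle; return the count."""
        cutoff = self.clock() - self.ttl
        removed = 0
        for receiver, entries in self._timelines.items():
            kept = [e for e in entries if e[1] >= cutoff]
            removed += len(entries) - len(kept)
            self._timelines[receiver] = kept
        return removed


now = [0.0]                                   # fake clock for the example
store = SyncStore(ttl_seconds=60, clock=lambda: now[0])
store.push("bob", 1, "hi")
now[0] = 100.0
store.push("bob", 2, "there")
print(store.sync("bob", 0))                   # [(2, 'there')]
print(store.expire())                         # 1
```

A terminal that stays offline longer than the TTL misses the expired entries; in the design described above it would fall back to the message storage database to recover history.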

Database Selection

The message synchronization database and the message storage database are the core databases of the message system. They have different database requirements:

Architecture Implementation

Let the code speak for itself. The detailed sample code is linked from the original post.
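The following Python sketch ties the pieces together for push mode. It is hypothetical and in-memory: the class `PushModeIM` and its methods are illustrative names, SeqIds are assigned per timeline by a simple counter, and one synchronization timeline is kept per user rather than per terminal. Sending writes once to the session's storage timeline and fans out to each member's synchronization timeline; each terminal syncs by passing its own last-seen SeqId (its checkpoint); message roaming reads the storage timeline.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Entry = Tuple[int, str, str]


class PushModeIM:
    """Hypothetical end-to-end sketch: push-mode sync plus message roaming."""

    def __init__(self) -> None:
        self.members: Dict[str, List[str]] = {}
        self.store: Dict[str, List[Entry]] = defaultdict(list)  # session -> (seq, sender, text)
        self.sync: Dict[str, List[Entry]] = defaultdict(list)   # user -> (seq, session, text)

    @staticmethod
    def _append(timeline: List[Entry], a: str, b: str) -> int:
        # Per-timeline SeqId: one more than the current tail.
        seq = timeline[-1][0] + 1 if timeline else 1
        timeline.append((seq, a, b))
        return seq

    def send(self, session: str, sender: str, text: str) -> None:
        self._append(self.store[session], sender, text)    # storage timeline
        for user in self.members[session]:                  # fan out, sender included
            self._append(self.sync[user], session, text)

    def pull_sync(self, user: str, checkpoint: int) -> Tuple[List[Entry], int]:
        """A terminal sends its last-seen SeqId and gets newer entries plus
        its new checkpoint; the server keeps no per-terminal state."""
        new = [e for e in self.sync[user] if e[0] > checkpoint]
        return new, (new[-1][0] if new else checkpoint)

    def roam(self, session: str) -> List[Entry]:
        """Message roaming: full history from the session's storage timeline."""
        return list(self.store[session])


im = PushModeIM()
im.members["s1"] = ["alice", "bob"]
im.send("s1", "alice", "hi")
im.send("s1", "bob", "hello")

phone_msgs, phone_cp = im.pull_sync("bob", 0)    # first sync on bob's phone
laptop_msgs, _ = im.pull_sync("bob", 1)          # laptop already saw SeqId 1
print(phone_cp, [t for _, _, t in phone_msgs])   # 2 ['hi', 'hello']
print([t for _, _, t in laptop_msgs])            # ['hello']
print(im.roam("s1"))                             # [(1, 'alice', 'hi'), (2, 'bob', 'hello')]
```

Because checkpoints live on the terminals, bob's phone and laptop can be at different positions in the same synchronization timeline without the server tracking either one.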


This article described the implementation of the message push and storage architectures in a modern IM system. The timeline logic model gives a clear picture of both the message synchronization and message storage architectures. Table Store is well suited to implementing the timeline model: its auto-increment primary key feature solves the most critical problem of the timeline model, which is generating an auto-incrementing SeqId.
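Why an auto-incrementing SeqId matters becomes clearer under concurrency. The Python sketch below is a local illustration only, assuming a single process with several writer threads; it does not use any Table Store SDK, and the class name `AutoIncrementTimeline` is hypothetical. It mimics the guarantee a server-side auto-increment column is meant to provide: each append gets a unique SeqId greater than all earlier ones, even when writers race.

```python
import threading
from typing import List, Tuple


class AutoIncrementTimeline:
    """Local stand-in for server-assigned auto-increment SeqIds (not a Table
    Store API): allocating the SeqId and appending the row happen under one
    lock, so the order of rows always matches the order of SeqIds."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._next_seq = 1
        self.rows: List[Tuple[int, str]] = []

    def append(self, message: str) -> int:
        with self._lock:
            seq_id = self._next_seq
            self._next_seq += 1
            self.rows.append((seq_id, message))
            return seq_id


def writer(timeline: AutoIncrementTimeline, name: str, count: int) -> None:
    for i in range(count):
        timeline.append(f"{name}-{i}")


timeline = AutoIncrementTimeline()
threads = [threading.Thread(target=writer, args=(timeline, f"w{k}", 1000))
           for k in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

seq_ids = [seq for seq, _ in timeline.rows]
print(len(seq_ids), seq_ids == sorted(seq_ids), len(set(seq_ids)) == len(seq_ids))
# 4000 True True
```

The lock deliberately covers both allocation and append. If the SeqId were allocated outside it, a writer holding a smaller SeqId could append after one holding a larger SeqId, and a reader that had already advanced its checkpoint past the larger value would silently miss the smaller one.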




Alibaba Cloud

Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website: