Speeding Up Logistics with a Lightweight Timer Task Scheduling Engine

Traditional Solution

The common practice for the scenario mentioned above is writing the timer tasks into a database, querying the tasks at a scheduled time with a thread when said tasks are near expiration and then executing logics related to the tasks. The advantage of this solution is ease of implementation; this is especially true for scenarios featuring single machine or small-scale businesses. But its disadvantage is also obvious: more complexity gets introduced for scenarios featuring distributed and large-scale business.

Basics of Timing Wheel Algorithm

The timing wheel solution introduces the concept of a real-life clock into the design of software, the main idea of which is to define a clock cycle (e.g., 12 hours) and the step length (e.g., a step per second). Every time the pointer goes one step, the task mounted on the corresponding degree of timescale will be gotten and executed. The whole structure is as shown in figure 1.

Timing Wheel Timer Tasks’ Storage

We’ve found that the modification to the Alibaba Message Queue (MQ) provides a great solution to this problem. The old solution checks if a message is expired based on regular polling, which is limited for discrete time. For this reason, the former solution can only support a few latency levels of scheduled messages. The Alibaba MQ team has modified the old solution to a new one which is based on timing wheel+linked list to solve this problem. This new solution can enable discrete scheduled messages; it can also satisfy the complexity in the management of individual lists of tasks for individual scale degrees required for the traditional timing wheel.

Lightweight Timer Task Scheduling Engine

The timer task scheduling engine introduces the core idea of timing wheel solution while eliminating the necessity of system’s dependency on the disk. As the data is not to be stored directly on a disk, it will have to depend on specialized storage services for storage. For this, the store at the lower level gets replaced with a structure available for applying the storage service:

Partitioning of Linked List of Tasks- Accelerate the Extraction of a Single Linked List

The advantage of a linked list is that, instead of storing the entire list of tasks, you store a simple ID, which reduces the consumption of storage but also introduces a new issue. It is well-known that a linked list is writing-friendly, but the reading efficiency is not very high; as a result, when a long linked list of tasks is to be mounted at a specific time, the form of a linked list will not allow the improvement in reading efficiency through concurrency.

Clustered Management — Self-Identification of Cluster Nodes

After the issues about timer task storage and task extraction efficiency of the single linked list are solved, it seems that the entire solution is ready for use; however, there is still another problem which is considerable, that is, how to restore a node after its cluster gets deployed and it gets restarted?

Automatic Management of a Cluster — Auto-Perceiving of Resizing and Reduction

Once we have introduced the Master node and knew that all nodes would register themselves into the Master node (including the Master node itself), we can then build a mechanism for clustered management based on the concept of a master node. For an ordinary app, the resizing and reduction of a cluster are quite normal operations; under the conditions of dynamic scheduling these operations are even unaware of the app owner; therefore, will the cluster itself be able to know when it has been resized and reduced? The answer is yes, as there is a Master node, which can perceive the statuses of other nodes within the cluster. For example, the Master node perceives a change in cluster’s capacity and adjusts cluster’s load.

Clustered Extraction of Timer Tasks — Soft Loading of Cluster Pressure

We have discussed previously the solution that splits a single linked list associated with a specific time on the timing wheel into multiple linked lists to improve the concurrent extraction of tasks, and we have also formulated a solution for clustered management. If a node only relies on itself for task extraction, even if the extraction gets done concurrently, the node can experience too high pressure when there are a vast number of tasks to be done for the current node at a specific time.


We have discussed the solution in a progressed way and detailed the issues arising at each phase; some points may need further elaboration. The solution discussed in this article can be summarized with the flowchart below.



Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Alibaba Cloud

Alibaba Cloud

Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website:https://www.alibabacloud.com