In-depth Analysis of HLC-based Distributed Transaction Processing

Distributed transactions are among the hardest problems in the distributed database field. They enable consistent data access across a distributed database, ensure the atomicity and isolation of global reads and writes, and let users work with the distributed database as if it were a single system. This article describes clock solutions in distributed databases and distributed transaction management. A hybrid logical clock (HLC) lets each instance obtain timestamps locally, which avoids the performance bottleneck and single point of failure that a centralized clock would otherwise introduce. At the same time, HLC preserves the causal (happens-before) relationship between events or transactions across instances.

This article covers the following two topics:

  1. Clock solution
  2. Distributed transaction management

1. Clock Solution

(1) Why Are Clocks Needed in Databases?

(2) Clock in a Distributed Database

(3) Clock Solution

(4) Logical Clock

A logical clock alone can ensure causal consistency and causal ordering. What, then, is its biggest problem? In some extreme situations, two nodes never exchange a message, so no relationship is ever established between them and the gap between their logical clocks keeps growing. If a query or backup later has to span both nodes, a relationship must be established between them forcibly, and by then their logical clocks may be far apart.
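
The drift described above can be illustrated with a minimal Lamport-style logical clock. This is only a sketch under assumptions: the class name `LamportClock`, its methods, and the node names `busy` and `idle` are hypothetical and do not come from any particular database. One node handles many events, the other almost none, and the two never exchange a message until a cross-node operation forces them to.

```python
class LamportClock:
    """Hypothetical minimal logical clock (Lamport-style counter)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event or message send: advance the counter by one.
        self.time += 1
        return self.time

    def receive(self, remote_time):
        # Message receipt: move past the sender's timestamp so that
        # the receive event is ordered after the send event.
        self.time = max(self.time, remote_time) + 1
        return self.time


busy, idle = LamportClock(), LamportClock()

# The two nodes never communicate: one is busy, the other nearly idle.
for _ in range(1000):
    busy.tick()
idle.tick()
gap_before = busy.time - idle.time  # the clocks have drifted far apart

# A cross-node query forces a relationship: busy sends its timestamp to idle.
stamp = busy.tick()
idle.receive(stamp)  # idle's clock jumps across the accumulated gap
```

In this sketch the counters carry no relation to physical time, so nothing bounds how far apart they can drift while the nodes stay isolated.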

(5) Hybrid Logical Clock

(6) Differences Between HLC and Centralized Clock

(7) Centralized Clock, Distributed Clock, and TrueTime

2. Distributed Transaction Management

(1) Two-Phase Commit Protocol

(2) Other Distributed Transaction Management Techniques

A professor in the United States put forward the concept of deterministic transactions, established a company based on the deterministic transaction model, and created a distributed database (Fauna). Deterministic transactions are submitted as complete transactions rather than interactive ones.

For example, all the transactions processed at the internet company Taobao are non-deterministic. A non-deterministic transaction is issued as a series of statements, such as Begin Transaction followed by Select statements. Each statement is interactive, that is, the app goes back and forth with the database. From the database's perspective, the next statement can never be predicted, which is what makes transactions of this type non-deterministic.

By contrast, all the logic of a deterministic transaction is written at once and then sent to the database. On receiving the transaction, the database knows which tables it involves, which records it needs to read, and which operations it will perform. From the database's perspective, the transaction is completely deterministic. Deterministic transactions can therefore be ordered in advance as they arrive. If two transactions touch the same records, they are ordered one after the other; if they do not, they are executed in parallel. This removes the need for locks and conflict detection in the later commit phase. However, the method requires that transactions are not interactive.
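
The pre-ordering idea can be sketched in a few lines. This is an illustrative simplification, not Fauna's implementation: the names `Txn` and `schedule` are hypothetical, each transaction declares every record it touches up front, and reads and writes are treated alike as conflicts. Transactions arrive in an agreed order; each one is placed in the earliest "wave" that comes after every earlier transaction sharing a record with it, and transactions within one wave share no records and could run in parallel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    name: str
    keys: frozenset  # every record the transaction touches, declared up front


def schedule(txns):
    """Group transactions into waves without using locks.

    Transactions in the same wave touch disjoint records; a transaction
    is always placed after any earlier transaction it conflicts with.
    """
    last_wave = {}  # record key -> index of the latest wave touching it
    waves = []
    for txn in txns:  # input order is the agreed global order
        wave = 1 + max(
            (last_wave[k] for k in txn.keys if k in last_wave), default=-1
        )
        if wave == len(waves):
            waves.append([])
        waves[wave].append(txn.name)
        for k in txn.keys:
            last_wave[k] = wave
    return waves


batch = [
    Txn("t1", frozenset({"a", "b"})),
    Txn("t2", frozenset({"c"})),
    Txn("t3", frozenset({"b", "c"})),  # conflicts with t1 and t2
    Txn("t4", frozenset({"d"})),
]
plan = schedule(batch)
# plan == [["t1", "t2", "t4"], ["t3"]]
```

The sketch only works because the full record set is known before execution starts, which is exactly what an interactive transaction cannot provide.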

(3) HLC and Two-Phase Commit Protocol

(4) HLC Offset
