Introducing the Redis-full-check Tool

By Zhu Zhao

redis-full-check is a tool from the Alibaba Cloud Redis & MongoDB team that checks data consistency between two Redis databases. It is typically used to verify the result of a Redis data migration, for example one performed with redis-shake.

Basic Principle

redis-full-check verifies data by running a full comparison between the source and the target Redis databases. The comparison is done in multiple rounds: in each round, data is fetched from both sides and compared, and any inconsistent data is recorded in a sqlite3 db as the input for the next round. Because incremental synchronization may still be in progress, differences seen in an early round can be transient; repeating the comparison over several rounds lets them converge. The data left in sqlite after the last round is the final set of differences.

The comparison conducted by redis-full-check is unidirectional: it fetches data from source database A and checks whether that data is also present in target database B. It does not check in the reverse direction. In other words, it checks whether the source database is a subset of the target database. If you want a bidirectional comparison, run the comparison twice: first with A as the source and B as the target, then with B as the source and A as the target.

The following is the basic data flow diagram. redis-full-check uses the multi-round comparison, as shown in the yellow box. For each comparison, keys are fetched. In the first-round comparison, keys are fetched from the source database and the subsequent rounds of comparison fetch keys from sqlite3 db. After keys are fetched, the corresponding field and value of a key are fetched for comparison. Inconsistent data is stored in sqlite3 db for the next round of comparison.

Inconsistency Types

Redis-full-check divides data inconsistency into two types: key inconsistency and value inconsistency.

Key Inconsistency

Key inconsistency falls into the following subtypes:

  • lack_target: A key exists in the source database but does not exist in the target database.
  • type: A key exists both in the source database and the target database, but the type is inconsistent.
  • value: A key exists in both the source database and the target database and is of the same type, but the value is inconsistent.
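As a rough sketch, the three key-level subtypes can be expressed as a small Go function. The `keyInfo` type and `classifyKey` function are hypothetical names for illustration, the value is simplified to a single string, and "equal" is used here as the label for a consistent key.

```go
package main

import "fmt"

// keyInfo is a simplified snapshot of one key on one side.
type keyInfo struct {
	Type  string // "string", "hash", "list", "set" or "zset"
	Value string // simplified: the value rendered as one string
}

// classifyKey compares a source key with its counterpart on the target.
// A nil tgt means the key was not found in the target database.
func classifyKey(src keyInfo, tgt *keyInfo) string {
	switch {
	case tgt == nil:
		return "lack_target"
	case src.Type != tgt.Type:
		return "type"
	case src.Value != tgt.Value:
		return "value"
	default:
		return "equal"
	}
}

func main() {
	src := keyInfo{Type: "string", Value: "abc"}

	fmt.Println(classifyKey(src, nil))                                    // lack_target
	fmt.Println(classifyKey(src, &keyInfo{Type: "hash", Value: "abc"}))   // type
	fmt.Println(classifyKey(src, &keyInfo{Type: "string", Value: "xyz"})) // value
	fmt.Println(classifyKey(src, &keyInfo{Type: "string", Value: "abc"})) // equal
}
```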

Value Inconsistency

Different data types have different comparison criteria:

  • string: The value is different.
  • hash: at least one field meets one of the following conditions:
      • The field exists on the source side but not on the target side.
      • The field exists on the target side but not on the source side.
      • The field exists on both the source side and the target side, but the value is different.
  • set/zset: similar to hash.
  • list: similar to hash.

The field conflict type falls into the following cases (only applicable to keys of types hash, set, zset, and list):

  • lack_source: A field does not exist in the source-side key, but exists in the target-side key.
  • lack_target: A field exists in the source-side key, but does not exist in the target-side key.
  • value: A field exists both in a source-side key and a target-side key, but the values of the two fields are different.

Comparison Principle

Three compare modes (comparemode) are available:

  • KeyOutline: compares only whether the keys are equal.
  • ValueOutline: compares only whether the values have the same length.
  • FullValue: compares whether the keys, the value lengths, and the values themselves are equal.
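The difference between the three modes can be illustrated with a short Go sketch for a string key. This reflects one reading of the descriptions above rather than the tool's implementation: the constant names mirror the mode names, while `consistent` and the assumption that a key missing on the target fails in every mode are hypothetical.

```go
package main

import "fmt"

type compareMode int

const (
	KeyOutline compareMode = iota
	ValueOutline
	FullValue
)

// consistent reports whether a source string value matches the target
// under the given mode. tgtExists is false when the key is missing on
// the target, which is treated as inconsistent in every mode.
func consistent(mode compareMode, src, tgt string, tgtExists bool) bool {
	if !tgtExists {
		return false
	}
	switch mode {
	case KeyOutline:
		return true // the key is present on both sides; values are ignored
	case ValueOutline:
		return len(src) == len(tgt) // only the lengths are compared
	case FullValue:
		return len(src) == len(tgt) && src == tgt
	}
	return false
}

func main() {
	src, tgt := "abcdef", "abcxyz" // same length, different content

	fmt.Println(consistent(KeyOutline, src, tgt, true))   // true
	fmt.Println(consistent(ValueOutline, src, tgt, true)) // true
	fmt.Println(consistent(FullValue, src, tgt, true))    // false
}
```

The cheaper modes trade accuracy for speed: in this example only FullValue notices that the content differs.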

The number of comparison rounds is determined by comparetimes (comparetimes is set to 3 by default):

  • In the first round, all keys in the source database are enumerated, and each key is then fetched from both the source and the target database for comparison.
  • From the second round on, the comparison is iterative: only the keys and fields found to be inconsistent in the previous round are compared again.
      • For key inconsistency (including lack_source, lack_target, and type), the key and its value are re-fetched from the source and target databases and compared again.
      • For string keys with inconsistent values, the key is compared again: the key and value are re-fetched from the source and target databases.
      • For hash, set, and zset keys with inconsistent values, only the inconsistent fields are compared again. Fields that were already found to be consistent are not re-checked. This prevents a big key that is updated frequently from failing the verification in every round.
      • For list keys with inconsistent values, the whole key is compared again: the key and its values are re-fetched from the source and target databases.
  • Consecutive rounds are separated by a time interval (see the --interval parameter).

For big keys of hash, set, zset, and list, follow these rules:

  • If len is less than or equal to 5192, all fields and values are fetched at once for comparison, using hgetall, smembers, zrange 0 -1 withscores, or lrange 0 -1, depending on the type.
  • If len is greater than 5192, fields and values are fetched in batches, using hscan, sscan, zscan, or lrange, depending on the type.
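The threshold rule can be written down as a small lookup in Go. The command names and the 5192 limit come from the text above; the function `fetchCommand` and the constant name are hypothetical and only return the command that would be chosen, without talking to Redis.

```go
package main

import "fmt"

// bigKeyThreshold is the element count above which batch fetching is used.
const bigKeyThreshold = 5192

// fetchCommand returns the Redis command that would be used to read a
// key of the given type and length.
func fetchCommand(keyType string, length int) (string, error) {
	big := length > bigKeyThreshold
	switch keyType {
	case "hash":
		if big {
			return "hscan", nil
		}
		return "hgetall", nil
	case "set":
		if big {
			return "sscan", nil
		}
		return "smembers", nil
	case "zset":
		if big {
			return "zscan", nil
		}
		return "zrange 0 -1 withscores", nil
	case "list":
		if big {
			return "lrange", nil // fetched range by range, in batches
		}
		return "lrange 0 -1", nil
	}
	return "", fmt.Errorf("unsupported key type %q", keyType)
}

func main() {
	cmd, _ := fetchCommand("hash", 5192)
	fmt.Println(cmd) // hgetall
	cmd, _ = fetchCommand("hash", 5193)
	fmt.Println(cmd) // hscan
}
```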

Parameter Description

The following are the main parameters in redis-full-check:

-s, --source=SOURCE              the source Redis database address (ip:port)
-p, --sourcepassword=Password    the password of the source Redis database
    --sourceauthtype=AUTH-TYPE   the management permission of the source database (not required for open-source Redis)
-t, --target=TARGET              the target Redis database address (ip:port)
-a, --targetpassword=Password    the password of the target Redis database
    --targetauthtype=AUTH-TYPE   the management permission of the target database (not required for open-source Redis)
-d, --db=Sqlite3-DB-FILE         the sqlite3 db file in which inconsistent keys are stored (result.db by default)
    --comparetimes=COUNT         the number of comparison rounds
-m, --comparemode=               the comparison mode
    --id=                        used for identifying metrics
    --jobid=                     used for identifying metrics
    --taskid=                    used for identifying metrics
-q, --qps=                       the QPS limit
    --interval=Second            the time interval between two comparison rounds
    --batchcount=COUNT           the number of items aggregated per batch
    --parallel=COUNT             the number of parallel coroutines (5 by default)
    --log=FILE                   the log file
    --result=FILE                the result file; inconsistent results are recorded in this format: "db diff-type key field"
    --metric=FILE                the metric file
-v, --version

For example, if the source Redis database is 10.1.1.1:1234 and the target database is 10.2.2.2:5678, run:

./redis-full-check -s 10.1.1.1:1234 -t 10.2.2.2:5678 -p mock_source_password -a mock_target_password --metric metric --log log --result result

The metric information uses the following format:

type Metric struct {
    DateTime           string       `json:"datetime"`             // time format: 2018-01-09T15:30:03Z
    Timestamp          int64        `json:"timestamp"`            // unix timestamp in seconds
    Id                 string       `json:"id"`                   // run id
    CompareTimes       int          `json:"comparetimes"`         // comparison round
    Db                 int32        `json:"db"`                   // db id
    DbKeys             int64        `json:"dbkeys"`               // the total number of keys in the db
    Process            int64        `json:"process"`              // progress percentage
    OneCompareFinished bool         `json:"has_finished"`         // indicates if this comparison round has finished
    AllFinished        bool         `json:"all_finished"`         // indicates if all comparison rounds have finished
    KeyScan            *CounterStat `json:"key_scan"`             // the number of scanned keys
    TotalConflict      int64        `json:"total_conflict"`       // total conflicts, keys + fields
    TotalKeyConflict   int64        `json:"total_key_conflict"`   // total key conflicts
    TotalFieldConflict int64        `json:"total_field_conflict"` // total field conflicts
    // In the two maps below, the first-level key is the data type
    // (string, hash, list, set, or zset) and the second-level key is the
    // conflict type (type, value, lack source, lack target, or equal).
    KeyMetric   map[string]map[string]*CounterStat `json:"key_stat"`   // key metric
    FieldMetric map[string]map[string]*CounterStat `json:"field_stat"` // field metric
}

type CounterStat struct {
    Total int64 `json:"total"` // total
    Speed int64 `json:"speed"` // speed
}
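As an illustration of how the json tags map onto a record, the sketch below decodes one metric record in Go. It assumes each record in the metric file is a JSON object, declares only a subset of the fields above, and uses an invented sample record; `parseMetric` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CounterStat and Metric repeat a subset of the fields listed above.
type CounterStat struct {
	Total int64 `json:"total"`
	Speed int64 `json:"speed"`
}

type Metric struct {
	Db            int32                              `json:"db"`
	CompareTimes  int                                `json:"comparetimes"`
	AllFinished   bool                               `json:"all_finished"`
	TotalConflict int64                              `json:"total_conflict"`
	KeyMetric     map[string]map[string]*CounterStat `json:"key_stat"`
}

// parseMetric decodes one JSON-encoded metric record.
func parseMetric(record []byte) (*Metric, error) {
	var m Metric
	if err := json.Unmarshal(record, &m); err != nil {
		return nil, err
	}
	return &m, nil
}

func main() {
	// Invented sample record, for illustration only.
	sample := []byte(`{"db":0,"comparetimes":3,"all_finished":true,` +
		`"total_conflict":2,"key_stat":{"string":{"value":{"total":2,"speed":0}}}}`)

	m, err := parseMetric(sample)
	if err != nil {
		panic(err)
	}
	// First-level key: data type. Second-level key: conflict type.
	fmt.Println(m.TotalConflict, m.KeyMetric["string"]["value"].Total) // 2 2
}
```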

Sqlite 3 DB File

Results are saved in a sqlite3 db file. If no file is specified, the result.db file in the current directory is used. If three comparison rounds are run, three files are produced: result.db.1, result.db.2, and result.db.3.

  • Table key: stores inconsistent keys.
  • Table field: stores inconsistent fields of hash, set, zset, and list keys. For a list, the subscript (index) is stored.
      • The key_id column in table field refers to the id column in table key.
  • Tables key_<N> and field_<N>: store the results after the Nth comparison round (that is, the intermediate results).

Example:

$ sqlite3 result.db
sqlite> select * from key;
id          key              type        conflict_type  db          source_len  target_len
----------  ---------------  ----------  -------------  ----------  ----------  ----------
1           keydiff1_string  string      value          1           6           6
2           keydiff_hash     hash        value          0           2           1
3           keydiff_string   string      value          0           6           6
4           key_string_diff  string      value          0           6           6
5           keylack_string   string      lack_target    0           6           0
sqlite>
sqlite> select * from field;
id          field       conflict_type  key_id
----------  ----------  -------------  ----------
1           k1          lack_source    2
2           k2          value          2
3           k3          lack_target    2

Reference Materials for the Open-Source Project

Here are some reference materials for the open-source project:

redis-full-check

Data migration tool redis-shake

Feel free to post your problems or suggestions in Issues on GitHub. You are welcome to join our open-source project development.

Reference: https://www.alibabacloud.com/blog/introducing-the-redis-full-check-tool_594759?spm=a2c41.12843076.0.0
