Introducing the Redis-full-check Tool
By Zhu Zhao
Redis-full-check is a tool from the Alibaba Cloud Redis & MongoDB team that checks data consistency between two Redis databases. It is typically used to verify data correctness after a Redis data migration (for example, one performed with redis-shake).
Basic Principle
redis-full-check verifies data by performing a full comparison between the data on the source side and the data on the target side. It uses a multi-round comparison method: in each round, data is fetched from the source and the target, the differences are identified, and inconsistent data is recorded in a sqlite3 db for the next round. Over multiple rounds, the recorded differences converge, which filters out temporary inconsistencies between the source and target databases caused by incremental data synchronization that is still in progress. The data left in sqlite3 after the last round is the final set of differences.
The comparison performed by redis-full-check is unidirectional: it fetches data from source database A and checks whether that data is also present in target database B. It does not perform the reverse check. In other words, it checks whether the source database is a subset of the target database. For a bidirectional comparison, run the check twice: the first time with A as the source and B as the target, and the second time with B as the source and A as the target.
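To make the direction of the check concrete, here is a minimal Go sketch that models the two databases as in-memory maps (a hypothetical simplification; the real tool talks to Redis) and only checks key presence. One run reports source keys missing from the target; running it again with the roles swapped gives the bidirectional comparison.

```go
package main

import (
	"fmt"
	"sort"
)

// missingFrom returns, in sorted order, the keys of src that are absent from dst.
// An empty result means src is a subset of dst, which is what a single
// unidirectional run (source = src, target = dst) verifies.
func missingFrom(src, dst map[string]string) []string {
	var missing []string
	for k := range src {
		if _, ok := dst[k]; !ok {
			missing = append(missing, k)
		}
	}
	sort.Strings(missing) // map iteration order is random; sort for stable output
	return missing
}

func main() {
	// Hypothetical in-memory stand-ins for databases A and B.
	a := map[string]string{"k1": "v1", "k2": "v2"}
	b := map[string]string{"k1": "v1", "k2": "v2", "k3": "v3"}

	// Run 1: A as source, B as target. Nothing is missing, so A is a subset of B.
	fmt.Println("A -> B missing:", missingFrom(a, b)) // A -> B missing: []
	// Run 2: B as source, A as target. Only this run can see that k3 is absent from A.
	fmt.Println("B -> A missing:", missingFrom(b, a)) // B -> A missing: [k3]
}
```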
The following is the basic data flow diagram. redis-full-check uses multi-round comparison, as shown in the yellow box. Each round starts by fetching keys: the first round fetches keys from the source database, and later rounds fetch keys from the sqlite3 db. For each fetched key, the corresponding fields and values are then fetched and compared, and inconsistent data is stored in the sqlite3 db for the next round.
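The round structure can be illustrated with a toy Go model. It is not redis-full-check's implementation: Redis and sqlite3 are replaced by in-memory maps and a slice, and the betweenRounds callback is a hypothetical stand-in for the pause between rounds during which synchronization may catch up. Only keys that fail a round are carried into the next one, so transient differences drop out while persistent ones remain.

```go
package main

import (
	"fmt"
	"sort"
)

// compareRounds is a toy model of multi-round comparison. Round 1 checks every
// source key; each later round re-checks only the keys that were inconsistent in
// the previous round (the role the sqlite3 db plays in redis-full-check).
// betweenRounds is a hypothetical hook that runs before rounds 2..n.
func compareRounds(source, target map[string]string, rounds int, betweenRounds func(round int)) []string {
	candidates := make([]string, 0, len(source))
	for k := range source {
		candidates = append(candidates, k)
	}
	sort.Strings(candidates)

	for round := 1; round <= rounds; round++ {
		if round > 1 && betweenRounds != nil {
			betweenRounds(round)
		}
		var inconsistent []string
		for _, k := range candidates {
			sv, inSource := source[k]
			tv, inTarget := target[k]
			if !inSource && !inTarget {
				continue // gone on both sides; nothing left to compare
			}
			if inSource != inTarget || sv != tv {
				inconsistent = append(inconsistent, k)
			}
		}
		candidates = inconsistent // only these are carried into the next round
	}
	return candidates
}

func main() {
	source := map[string]string{"a": "1", "b": "2", "c": "3"}
	target := map[string]string{"a": "1", "b": "old"} // b is stale, c not yet synced
	// Simulate synchronization catching up on b before round 2; c never arrives.
	catchUp := func(round int) {
		if round == 2 {
			target["b"] = "2"
		}
	}
	fmt.Println(compareRounds(source, target, 3, catchUp)) // [c]
}
```

Here b is stale only until synchronization catches up, so it drops out after the first round, while c, which never reaches the target, is still reported after the third round.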
Inconsistency Types
Redis-full-check divides data inconsistency into two types: key inconsistency and value inconsistency.
Key Inconsistency
Key inconsistency falls into the following subtypes:
- lack_target: A key exists in the source database but does not exist in the target database.
- type: A key exists in both the source database and the target database, but its type is different.
- value: A key exists in both the source database and the target database and is of the same type, but its value is different.
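The key-level checks can be read as an ordered sequence: presence first, then type, then value. Below is a minimal Go sketch under simplifying assumptions: keyState is a hypothetical struct, and a key's whole value is reduced to a single comparable string.

```go
package main

import "fmt"

// keyState is a hypothetical, simplified view of one key on one side:
// whether it exists, its Redis type, and a comparable encoding of its value.
type keyState struct {
	Exists bool
	Type   string
	Value  string
}

// classifyKey applies the key-level rules in order and returns the conflict
// type, or "" when the key is consistent.
func classifyKey(src, dst keyState) string {
	if !src.Exists {
		// The check is driven by source keys, so this case is out of scope here.
		return ""
	}
	switch {
	case !dst.Exists:
		return "lack_target"
	case src.Type != dst.Type:
		return "type"
	case src.Value != dst.Value:
		return "value"
	default:
		return ""
	}
}

func main() {
	src := keyState{Exists: true, Type: "string", Value: "hello"}
	fmt.Println(classifyKey(src, keyState{}))                                         // lack_target
	fmt.Println(classifyKey(src, keyState{Exists: true, Type: "hash", Value: "..."}))  // type
	fmt.Println(classifyKey(src, keyState{Exists: true, Type: "string", Value: "hi"})) // value
}
```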
Value Inconsistency
Different data types have different comparison criteria:
- string: The value is different.
- hash: At least one field meets one of the following conditions:
  - The field exists on the source side but not on the target side.
  - The field exists on the target side but not on the source side.
  - The field exists on both the source side and the target side, but its value is different.
- set/zset: similar to hash.
- list: similar to hash, with element indexes (subscripts) playing the role of fields.
The field conflict types are as follows (only applicable to keys of type hash, set, zset, and list):
- lack_source: A field exists in the target-side key but not in the source-side key.
- lack_target: A field exists in the source-side key but not in the target-side key.
- value: A field exists in both the source-side key and the target-side key, but the values of the two fields are different.
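Below is a Go sketch of the field-level comparison for a hash key, with both sides held in hypothetical in-memory maps. It follows the lack_<side> naming used for keys, where lack_target means the field is missing from the target side and lack_source means it is missing from the source side.

```go
package main

import (
	"fmt"
	"sort"
)

// fieldConflict records one inconsistent field of a hash key.
type fieldConflict struct {
	Field string
	Type  string // "lack_target", "lack_source", or "value"
}

// diffHash compares two hash values field by field in both directions.
func diffHash(src, dst map[string]string) []fieldConflict {
	var out []fieldConflict
	for f, sv := range src {
		dv, ok := dst[f]
		switch {
		case !ok:
			out = append(out, fieldConflict{f, "lack_target"}) // missing on target
		case sv != dv:
			out = append(out, fieldConflict{f, "value"}) // present on both, different value
		}
	}
	for f := range dst {
		if _, ok := src[f]; !ok {
			out = append(out, fieldConflict{f, "lack_source"}) // missing on source
		}
	}
	// Map iteration order is random in Go; sort for stable output.
	sort.Slice(out, func(i, j int) bool { return out[i].Field < out[j].Field })
	return out
}

func main() {
	src := map[string]string{"k2": "a", "k3": "x"}
	dst := map[string]string{"k1": "z", "k2": "b"}
	for _, c := range diffHash(src, dst) {
		fmt.Println(c.Field, c.Type)
	}
}
```

With these inputs the sketch reports k1 as lack_source, k2 as value, and k3 as lack_target, the same field conflicts that appear in the sqlite3 example later in this article.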
Comparison Principle
Three comparison modes (comparemode) are available:
- KeyOutline: only compares keys; values are not compared.
- ValueOutline: only compares whether the values have the same length.
- FullValue: compares keys, value lengths, and value contents.
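The difference between the modes can be sketched in Go for a single string key. This assumes KeyOutline only checks that the key exists on both sides, ValueOutline additionally checks value length, and FullValue checks the value contents. The compareMode constants are hypothetical and do not correspond to the values that --comparemode accepts.

```go
package main

import "fmt"

// compareMode mirrors the three documented modes. The numeric values here are
// arbitrary and are NOT the values accepted by --comparemode.
type compareMode int

const (
	keyOutline compareMode = iota
	valueOutline
	fullValue
)

// stringsMatch reports whether a string key is consistent under the given mode.
// A nil pointer means the key does not exist on that side.
func stringsMatch(mode compareMode, src, dst *string) bool {
	if src == nil || dst == nil {
		return src == nil && dst == nil // presence is checked in every mode
	}
	switch mode {
	case keyOutline:
		return true // both keys exist; values are not inspected
	case valueOutline:
		return len(*src) == len(*dst)
	default: // fullValue: length and content
		return len(*src) == len(*dst) && *src == *dst
	}
}

func main() {
	a, b := "hello", "world" // same length, different content
	for _, m := range []struct {
		name string
		mode compareMode
	}{{"KeyOutline", keyOutline}, {"ValueOutline", valueOutline}, {"FullValue", fullValue}} {
		fmt.Println(m.name, stringsMatch(m.mode, &a, &b))
	}
}
```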
The number of comparison rounds is determined by comparetimes (3 by default):
- In the first round, all keys in the source database are scanned, and the data of each key is fetched from both the source database and the target database for comparison.
- From the second round on, the comparison is iterative and only covers the keys and fields found inconsistent in the previous round:
  - For key inconsistencies (lack_source, lack_target, and type), keys and values are re-fetched from the source and target databases and compared again.
  - For string keys with inconsistent values, the keys are compared again: keys and values are re-fetched from the source and target databases.
  - For hash, set, and zset keys with inconsistent values, only the inconsistent fields are re-compared. Fields already found consistent are not compared again. This prevents frequently updated big keys from failing verification in every round.
  - For list keys with inconsistent values, the whole key is re-compared: keys and values are re-fetched from the source and target databases.
- There is a specific interval between two rounds of comparison, set by --interval.
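The per-type rules for the second and later rounds can be summarized as a small decision function. This Go sketch is illustrative only: recheckPlan and planRecheck are hypothetical names, and the conflict strings reuse the conflict type names from this article.

```go
package main

import "fmt"

// recheckPlan says what a later round re-fetches for one previously inconsistent key.
type recheckPlan struct {
	WholeKey bool     // re-fetch the full key and value from both sides
	Fields   []string // otherwise, re-fetch only these fields
}

// planRecheck applies the per-type rules. conflict is the key-level conflict
// from the previous round; badFields lists the fields that were inconsistent
// (only meaningful for hash, set, and zset value conflicts).
func planRecheck(keyType, conflict string, badFields []string) recheckPlan {
	if conflict != "value" {
		// lack_source, lack_target, type: compare the whole key again.
		return recheckPlan{WholeKey: true}
	}
	switch keyType {
	case "hash", "set", "zset":
		// Fields already found consistent are skipped.
		return recheckPlan{Fields: badFields}
	default:
		// string and list value conflicts re-fetch the whole key.
		return recheckPlan{WholeKey: true}
	}
}

func main() {
	fmt.Printf("%+v\n", planRecheck("hash", "value", []string{"k1", "k3"}))
	fmt.Printf("%+v\n", planRecheck("list", "value", []string{"0"}))
	fmt.Printf("%+v\n", planRecheck("zset", "type", nil))
}
```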
For big keys of type hash, set, zset, and list, the following rules apply, where len is the number of elements in the key:
- If len is less than or equal to 5192, all fields and values are fetched at once for comparison with hgetall, smembers, zrange 0 -1 withscores, and lrange 0 -1, respectively.
- If len is greater than 5192, fields and values are fetched in batches with hscan, sscan, zscan, and lrange, respectively.
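The big-key rule boils down to picking a read strategy from the key type and its element count. The Go sketch below returns the command family named above. fetchCommand is a hypothetical helper, and the cursor/COUNT arguments for the SCAN commands and the range boundaries for batched LRANGE calls are deliberately left out because the article does not specify them.

```go
package main

import "fmt"

// bigKeyThreshold is the element count quoted in this article.
const bigKeyThreshold = 5192

// fetchCommands maps a key type to
// {command when len <= threshold, command when len > threshold}.
var fetchCommands = map[string][2]string{
	"hash": {"HGETALL", "HSCAN"},
	"set":  {"SMEMBERS", "SSCAN"},
	"zset": {"ZRANGE 0 -1 WITHSCORES", "ZSCAN"},
	"list": {"LRANGE 0 -1", "LRANGE (batched ranges)"},
}

// fetchCommand picks the read strategy for a collection key of the given length.
func fetchCommand(keyType string, length int64) (string, error) {
	pair, ok := fetchCommands[keyType]
	if !ok {
		return "", fmt.Errorf("no big-key rule for type %q", keyType)
	}
	if length <= bigKeyThreshold {
		return pair[0], nil
	}
	return pair[1], nil
}

func main() {
	for _, c := range []struct {
		keyType string
		length  int64
	}{{"hash", 100}, {"hash", 10000}, {"list", 6000}} {
		cmd, err := fetchCommand(c.keyType, c.length)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s with %d elements -> %s\n", c.keyType, c.length, cmd)
	}
}
```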
Parameter Description
The following are the main parameters in redis-full-check:
-s, --source=SOURCE the source Redis database address (ip:port)
-p, --sourcepassword=Password the password of the source Redis database
--sourceauthtype=AUTH-TYPE the management permission type of the source database (not required for open-source Redis)
-t, --target=TARGET the target Redis database address (ip:port)
-a, --targetpassword=Password the password of the target Redis database
--targetauthtype=AUTH-TYPE the management permission type of the target database (not required for open-source Redis)
-d, --db=Sqlite3-DB-FILE the sqlite3 db file in which inconsistent keys are stored (result.db by default)
--comparetimes=COUNT the number of comparison rounds
-m, --comparemode= the comparison mode
--id= used for identifying metrics
--jobid= used for identifying metrics
--taskid= used for identifying metrics
-q, --qps= the QPS rate limit
--interval=Second the time interval between two comparison rounds, in seconds
--batchcount=COUNT the number of items aggregated into one batch
--parallel=COUNT the number of parallel coroutines (5 by default)
--log=FILE the log file
--result=FILE the file in which inconsistent results are recorded, in the format "db diff-type key field"
--metric=FILE the metric file
-v, --version print the version
For example, if the source Redis database is 10.1.1.1:1234 and the target database is 10.2.2.2:5678, run:
./redis-full-check -s 10.1.1.1:1234 -t 10.2.2.2:5678 -p mock_source_password -a mock_target_password --metric metric --log log --result result
The metric information uses the following format:
type Metric struct {
DateTime string `json:"datetime"` // time format: 2018-01-09T15:30:03Z
Timestamp int64 `json:"timestamp"` // second-level unix timestamp
Id string `json:"id"` // run id
CompareTimes int `json:"comparetimes"` // comparison rounds
Db int32 `json:"db"` // db id
DbKeys int64 `json:"dbkeys"` // the total number of keys in the db
Process int64 `json:"process"` // progress percentage
OneCompareFinished bool `json:"has_finished"` // indicates if this comparison has finished
AllFinished bool `json:"all_finished"` // indicates if all comparisons have finished
KeyScan *CounterStat `json:"key_scan"` // the number of scanned keys
TotalConflict int64 `json:"total_conflict"` // total conflicts, including keys + fields
TotalKeyConflict int64 `json:"total_key_conflict"` // total key conflicts
TotalFieldConflict int64 `json:"total_field_conflict"` // total field conflicts
// In the two maps below, the first-level key is the key type (string, hash, list, set, or zset) and the second-level key is the conflict type (type, value, lack source, lack target, or equal).
KeyMetric map[string]map[string]*CounterStat `json:"key_stat"` // key metric
FieldMetric map[string]map[string]*CounterStat `json:"field_stat"` // field metric
}

type CounterStat struct {
Total int64 `json:"total"` // total
Speed int64 `json:"speed"` // speed
}
Sqlite3 DB File
Results are saved in a sqlite3 db file. If no file is specified, result.db in the current directory is used. If there are three comparison rounds, three files are produced: result.db.1, result.db.2, and result.db.3.
- Table key: saves inconsistent keys.
- Table field: saves inconsistent fields of hash, set, zset, and list keys. For list keys, the saved values are element indexes (subscripts).
- The key_id field in table field is associated with the id field in table key.
- Tables key_<N> and field_<N>: save the results after the Nth comparison round (that is, the intermediate results).
Example:
$ sqlite3 result.db
sqlite> select * from key;
id key type conflict_type db source_len target_len
---------- --------------- ---------- ------------- ---------- ---------- ----------
1 keydiff1_string string value 1 6 6
2 keydiff_hash hash value 0 2 1
3 keydiff_string string value 0 6 6
4 key_string_diff string value 0 6 6
5 keylack_string string lack_target 0 6 0
sqlite> select * from field;
id field conflict_type key_id
---------- ---------- ------------- ----------
1 k1 lack_source 2
2 k2 value 2
3 k3 lack_target 2
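Since field.key_id refers to key.id, the two tables can be joined to list each inconsistent field alongside its key. Below is a minimal Go sketch that performs this join in memory on a subset of the rows from the example above; keyRow, fieldRow, and groupFields are hypothetical names, and against a real result.db the same relationship would be expressed as a SQL join on field.key_id = key.id.

```go
package main

import "fmt"

// keyRow and fieldRow hold a few columns of the key and field tables.
type keyRow struct {
	ID   int
	Key  string
	Type string
}

type fieldRow struct {
	Field        string
	ConflictType string
	KeyID        int // references keyRow.ID
}

// groupFields attaches each field row to the key whose id matches its key_id.
// Field rows whose key_id has no matching key row are dropped.
func groupFields(keys []keyRow, fields []fieldRow) map[string][]fieldRow {
	byID := make(map[int]string, len(keys))
	for _, k := range keys {
		byID[k.ID] = k.Key
	}
	out := make(map[string][]fieldRow)
	for _, f := range fields {
		if name, ok := byID[f.KeyID]; ok {
			out[name] = append(out[name], f)
		}
	}
	return out
}

func main() {
	// A subset of the rows from the example output above.
	keys := []keyRow{{1, "keydiff1_string", "string"}, {2, "keydiff_hash", "hash"}}
	fields := []fieldRow{{"k1", "lack_source", 2}, {"k2", "value", 2}, {"k3", "lack_target", 2}}
	for _, f := range groupFields(keys, fields)["keydiff_hash"] {
		fmt.Println("keydiff_hash", f.Field, f.ConflictType)
	}
}
```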
Reference Materials for the Open-Source Project
Here are some reference materials for the open-source project:
Data migration tool redis-shake
Feel free to post your problems or suggestions in Issues on GitHub. You are welcome to join our open-source project development.