Smart Metadata Management Solution Using Table Store

Alibaba Cloud
May 21, 2019

By Wang Tantan (Si Ming)

Background

When users store massive numbers of documents, media files, and other data, managing the file metadata is essential. Metadata consists of multi-dimensional field information. Basic information includes the file size, creation time, user, and so on. With the development of artificial intelligence, the core elements that AI technology extracts from files have become an important part of file metadata. Take an image as an example: users can use smart media services to obtain and analyze the core tags of the image and to score those tags. They can also extract face recognition information, geographic location information, and other information, all of which also needs to be stored in the file metadata. As a result, the amount of file metadata keeps growing, and its formats and types are increasingly diverse.

Scenario

A smart media management platform provides users with management services for files such as images and videos. Users can analyze target files through self-developed (or purchased) smart media analysis tools, and the analysis results enrich the original metadata. The platform therefore needs an effective metadata management solution that lets users manage, analyze, and collect metadata information. For example:

User A: all pictures that meet the following conditions, sorted by tag score: [files of user A] [last year] {tags containing [Happy]}

User B: all videos that meet the following conditions, sorted by similarity to the celebrity: [files of user B] * [XX celebrity has appeared]
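To make the User A request concrete, here is a minimal in-memory sketch of its semantics: filter by owner, file type, and creation time, keep only files carrying the tag, then sort by that tag's score. The class and field names (FileMeta, UserAQuery, createdAt, and so on) are hypothetical and chosen for illustration; this models the query logic only, not how Table Store executes it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical metadata record; field names are assumptions for illustration.
class FileMeta {
    final String fileId;
    final String userId;
    final String type;
    final long createdAt;            // epoch milliseconds
    final Map<String, Double> tags;  // tag name -> score

    FileMeta(String fileId, String userId, String type, long createdAt,
             Map<String, Double> tags) {
        this.fileId = fileId;
        this.userId = userId;
        this.type = type;
        this.createdAt = createdAt;
        this.tags = tags;
    }
}

class UserAQuery {
    // Files of the given user and type, created in [from, to), that carry the
    // tag, ordered by that tag's score from highest to lowest.
    static List<FileMeta> run(List<FileMeta> files, String userId, String type,
                              long from, long to, String tag) {
        List<FileMeta> hits = new ArrayList<>();
        for (FileMeta f : files) {
            boolean match = f.userId.equals(userId)
                    && f.type.equals(type)
                    && f.createdAt >= from && f.createdAt < to
                    && f.tags.containsKey(tag);
            if (match) {
                hits.add(f);
            }
        }
        hits.sort(Comparator.comparingDouble((FileMeta f) -> f.tags.get(tag)).reversed());
        return hits;
    }
}
```

A linear scan like this is fine for a handful of rows; at the scale the article describes, the same conditions would be expressed as an index query instead.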

A sample of the management system is as follows: Project Sample

Technical Considerations

For smart metadata management systems, the technical factors that need to be considered generally include the following:

  • Query capability: The database should have powerful query capabilities (such as multiple index types and multi-dimensional combination queries), as well as sorting, statistics, and other functions;
  • Horizontal scaling (multi-field): The metadata has a wide variety of field types, and fields are frequently changed, added, and deleted. The database should be schema-free to ensure the horizontal scaling capability;
  • Vertical scaling (data volume): Massive files correspond to massive metadata. As data grows, the database must meet basic requirements such as scalability and low cost;
  • Service performance: While handling highly concurrent requests, the database must ensure low latency, strong consistency, and high availability.

Table Store

The SearchIndex solution developed by Table Store effectively solves the problem of managing massive metadata. Table Store works out of the box and is billed on a pay-as-you-go basis.

Table Store is a fully managed, zero-maintenance, distributed NoSQL data storage service from Alibaba Cloud. It provides features such as storage of massive amounts of data, automatic sharding of hot data, and multi-dimensional retrieval of massive amounts of data, so it can efficiently handle the data explosion challenge. A SearchIndex can be created at any time, which makes it an appropriate solution for metadata management.

At the same time, SearchIndex provides multi-dimensional data search, statistics, and other capabilities while ensuring high availability of user data. You can create multiple indexes for multiple scenarios to support retrieval in multiple modes, and you can create and activate indexes as needed. Table Store ensures the consistency of data synchronization, greatly reducing the work required for your solution design, service maintenance, and code development.

Overview of the Smart Metadata Management System Built Based on Table Store

The sample is integrated in the Table Store console. You can log on to the console to experiment with the system. (If you are a new Table Store user, you need to click Activate Now for a trial of this service. The service activation is free. Metadata is stored in public instances. A trial doesn’t consume storage, network traffic, or CUs.)

Note: This sample provides file metadata at the scale of 100 million entries. Official console address: Project Sample

Preparation for Building

If you are interested in the smart metadata management system and want to build your own system, you can follow these steps:

(1) Activate Table Store

Activate the Table Store service in the console. Table Store is out-of-the-box (post-paid) and billed on a pay-as-you-go basis. Table Store also provides a free quota that is sufficient for functional tests. For more information, visit Table Store Console and Free quota description.

(2) Create an Instance

Create a Table Store instance in the console and select a region that supports SearchIndex. (Currently the SearchIndex feature has not been commercialized and is supported in the following regions: Beijing, Shanghai, Hangzhou, and Shenzhen. This feature will gradually become available in other regions.)

After the instance is created, open a ticket to apply for the SearchIndex beta test invitation. (After becoming commercialized, SearchIndex will be enabled by default. No fees will be incurred if the feature is not used.)

  • Beta test invitation request: Open a ticket and select “Table Store” > “Product Features and Characteristics” > “Create a Ticket”. Fill in the ticket as follows:
  • Question description: Enter “Apply for SearchIndex beta test invitation”
  • Confidential information: Enter the region and instance name, for example, Shanghai + myInstanceName

(3) Download SDKs

Use an SDK that supports SearchIndex (see the official website for more details). Currently, the new functions are available in the Java, Go, and Node.js SDKs.

Java-SDK

<dependency>
    <groupId>com.aliyun.openservices</groupId>
    <artifactId>tablestore</artifactId>
    <version>4.8.0</version>
</dependency>

Go-SDK

$ go get github.com/aliyun/aliyun-tablestore-go-sdk

Nodejs-SDK

$ npm install tablestore@4.1.0

(4) Design a Table

Table name: order_contract

Start Building (Core Code)

(1) Create a Data Table

To create a smart metadata table, users only need to maintain one instance and create the table under the instance as follows:

Create and manage data tables through the console (users can also directly create data tables through the SDK):

(2) Create a Data Table Index

Table Store automatically synchronizes full and incremental index data. Users can create and manage a SearchIndex through the console or through the SDK:

(3) Data Import

Insert some test data. (The console sample contains 100 million entries. Users can insert a small amount of test data in the console.)

File ID (MD5 primary key): f0525357421bce….
User ID: u05254
Tag (array string): [{"score":99.999999,"tag":"Table Store"},{"score":78.962224,"tag":"Hail"},{"score":18.328385,"tag":"Happy"},{"score":16.886812,"tag":"Snow Mountain"}]
Type: image
Link: https://prd-console-demo.oss-cn-hangzhou.aliyuncs.com/image/imm1.jpg
Size: 9022066
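The sample row uses an MD5 string as the file ID primary key, but the article does not say what input is hashed. The sketch below assumes, purely for illustration, that the ID is the hex-encoded MD5 of the file link; the FileIds class and md5Hex method are hypothetical names. A hashed key is a common way to spread rows evenly across partitions, which may be the motivation here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class FileIds {
    // Returns the MD5 digest of the input as 32 lowercase hex characters.
    // Assumption: the input is the file link; the article does not specify this.
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }
}
```

For instance, FileIds.md5Hex("abc") returns 900150983cd24fb0d6963f7d28e17f72, the standard MD5 test vector for that input.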

(4) Data Reading

Data reading falls into two types:

1. Primary Key Reading

Primary key reads use the native Table Store interfaces: getRow, getRange, and batchGetRow. Primary key reads are also used (automatically) for the reverse lookup from index results to rows. Users can also offer a single-row query page keyed on the primary key (the file ID MD5). The query remains extremely fast at the scale of 100 million entries, but a single primary key query does not support multi-dimensional retrieval.

2. Index Reading

Index reads use the search interface provided by the new SearchIndex function. Users can freely design multi-dimensional combination queries over the index fields. Setting different query parameters builds different query criteria and different sorting methods. Currently, exact query, range query, prefix query, match query, wildcard query, phrase match query, word breaking string query, nested query, and geo query are supported, and they can be combined with boolean AND and OR.
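The relationship between the two read types can be pictured with a toy model: an index maps field values to primary keys, and the matching rows are then fetched by primary key. The sketch below is a conceptual illustration only, with hypothetical names (ToyMetadataStore, searchAll); it is not how Table Store implements SearchIndex, and it ignores updates, deletes, and every query type other than exact match.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class ToyMetadataStore {
    // "Data table": primary key (file ID) -> attribute columns.
    private final Map<String, Map<String, String>> rows = new HashMap<>();
    // "Index": "field=value" -> sorted set of primary keys holding that value.
    private final Map<String, TreeSet<String>> index = new HashMap<>();

    // Stores a new row and indexes each of its columns (no update handling).
    void put(String fileId, Map<String, String> columns) {
        rows.put(fileId, columns);
        for (Map.Entry<String, String> e : columns.entrySet()) {
            index.computeIfAbsent(e.getKey() + "=" + e.getValue(), k -> new TreeSet<>())
                 .add(fileId);
        }
    }

    // Primary key read: a direct lookup that never touches the index.
    Map<String, String> getRow(String fileId) {
        return rows.get(fileId);
    }

    // Index read: AND together exact-match conditions on the index, then
    // fetch each matching row by primary key (the reverse lookup).
    List<Map<String, String>> searchAll(Map<String, String> conditions) {
        TreeSet<String> keys = null;
        for (Map.Entry<String, String> e : conditions.entrySet()) {
            TreeSet<String> match =
                    index.getOrDefault(e.getKey() + "=" + e.getValue(), new TreeSet<>());
            if (keys == null) {
                keys = new TreeSet<>(match);
            } else {
                keys.retainAll(match);
            }
        }
        List<Map<String, String>> result = new ArrayList<>();
        if (keys != null) {
            for (String key : keys) {
                result.add(getRow(key));
            }
        }
        return result;
    }
}
```

The design point the model captures is that a primary key read needs only the key, while an index read can start from any combination of indexed fields.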

For example, to query the files with [tag: Table Store, creation time (2018-01-01, 2018-12-01), type: image]: (the SDK and the console query)

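import java.util.ArrayList;
import java.util.List;
import com.alicloud.openservices.tablestore.model.ColumnValue;
import com.alicloud.openservices.tablestore.model.search.query.*;

// All conditions in this list must match (boolean AND)
List<Query> mustQueries = new ArrayList<Query>();

// Nested query: the tags array contains an element whose tag is "Table Store"
TermQuery tagQuery = new TermQuery();
tagQuery.setFieldName("tags.tag");
tagQuery.setTerm(ColumnValue.fromString("Table Store"));
NestedQuery nestedQuery = new NestedQuery();
nestedQuery.setPath("tags");
nestedQuery.setScoreMode(ScoreMode.Avg);
nestedQuery.setQuery(tagQuery);
mustQueries.add(nestedQuery);

// Range query: createdAt (epoch milliseconds), lower bound inclusive, upper bound exclusive
RangeQuery rangeQuery = new RangeQuery();
rangeQuery.setFieldName("createdAt");
rangeQuery.setFrom(ColumnValue.fromLong(1514793600000L), true);
rangeQuery.setTo(ColumnValue.fromLong(1543651200000L), false);
mustQueries.add(rangeQuery);

// Exact query: type equals "image"
TermQuery typeQuery = new TermQuery();
typeQuery.setFieldName("type");
typeQuery.setTerm(ColumnValue.fromString("image"));
mustQueries.add(typeQuery);

// Combine the three conditions into one boolean query
BoolQuery boolQuery = new BoolQuery();
boolQuery.setMustQueries(mustQueries);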
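The two long values in the range query are epoch-millisecond timestamps. Working backwards from the numbers, they correspond to midnight on 2018-01-01 and 2018-12-01 at a UTC-8 offset; that offset is an observation from the literals, not something the article states. A small helper (RangeBounds and startOfDayMillis are hypothetical names) can derive such bounds instead of hard-coding them:

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

class RangeBounds {
    // Epoch milliseconds for 00:00 on the given date at the given UTC offset.
    static long startOfDayMillis(int year, int month, int day, ZoneOffset offset) {
        return LocalDate.of(year, month, day)
                .atStartOfDay(offset)
                .toInstant()
                .toEpochMilli();
    }
}
```

With this helper, RangeBounds.startOfDayMillis(2018, 1, 1, ZoneOffset.ofHours(-8)) returns 1514793600000 and RangeBounds.startOfDayMillis(2018, 12, 1, ZoneOffset.ofHours(-8)) returns 1543651200000, matching the literals in the query. Use the offset that matches how createdAt is written in your own data.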

Reference: https://www.alibabacloud.com/blog/smart-metadata-management-solution-using-table-store_594802?spm=a2c41.12883498.0.0
