A Deep Dive into the Core Concepts of ApsaraDB for MongoDB

Related Concepts of MongoDB

In this blog, we will discuss in detail the features of ApsaraDB for MongoDB (hereinafter referred to as MongoDB).

In terms of positioning, MongoDB sits between Memcached and a relational database management system (RDBMS). In terms of scalability and performance, MongoDB is closer to Memcached. In terms of functionality, MongoDB is closer to an RDBMS.

MongoDB Deployment Model

In the production environment, MongoDB is often deployed as a three-node replica set or a sharded cluster.

The left of the figure above shows that when MongoDB is deployed as a replica set, the application sends its requests through the driver directly to the primary node of the replica set, which completes the read-write operations.

The other two nodes, the secondaries, synchronize automatically with the primary node to keep their data up to date.

If the primary node fails during cluster operation, the two secondary nodes elect a new primary node within seconds so that the application can continue its read-write operations.

The right of the figure shows that when MongoDB is deployed as a sharded cluster, applications access the routing nodes (mongos) through the driver. Based on the shard key value in each read-write operation, a mongos node distributes the operation to the specific shards that should execute it. The mongos node then merges the results and returns them to the application.

How is the data in the cluster distributed? The metadata is recorded in the configuration server, which is also a highly available replica set. Each shard manages a portion of the overall data in the cluster and is also a high-availability replica set. In addition, multiple routing nodes are deployed in the production environment. By doing so, the entire sharded cluster has no single point of failure.
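To make the routing step concrete, here is a minimal pure-Python sketch of range-based routing on a shard key. It is a simplified model and not how mongos is actually implemented: the chunk boundaries, the shard names, and the functions `route` and `targeted_shards` are all assumptions made for illustration.

```python
from bisect import bisect_right

# Hypothetical chunk map, standing in for the metadata kept on the
# configuration server. Chunk i covers shard key values from
# CHUNK_LOWER_BOUNDS[i] up to (but not including) the next lower bound.
CHUNK_LOWER_BOUNDS = [float("-inf"), 1000, 2000]
CHUNK_OWNERS = ["shard0", "shard1", "shard2"]


def route(shard_key_value):
    """Return the shard that owns the chunk containing shard_key_value."""
    index = bisect_right(CHUNK_LOWER_BOUNDS, shard_key_value) - 1
    return CHUNK_OWNERS[index]


def targeted_shards(low, high):
    """Return the shards a range query on [low, high] has to visit."""
    first = bisect_right(CHUNK_LOWER_BOUNDS, low) - 1
    last = bisect_right(CHUNK_LOWER_BOUNDS, high) - 1
    return CHUNK_OWNERS[first:last + 1]
```

An operation that carries the shard key goes to a single shard, while a range that spans chunk boundaries is sent to several shards whose results are then merged.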

MongoDB Basic Concepts and Its Mappings with Relational Database Management System

As shown in the figure above, the databases and tables of an RDBMS correspond to databases and collections in MongoDB. Parent-child tables in a relational database correspond to nested sub-documents or arrays in MongoDB. Indexes exist in both. A piece of data is called a row in an RDBMS and a document in MongoDB, and a column in the former is called a field in the latter. What an RDBMS solves with a join is usually solved in MongoDB by embedding the related data in the document. If the data is linked instead of embedded, the $lookup stage can be used to perform a left join. Moreover, views in an RDBMS correspond to read-only views and on-demand materialized views in MongoDB, and multi-record ACID transactions correspond to multi-document ACID transactions.
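The two modeling options mentioned above, embedding and linking, can be sketched as plain Python dictionaries. The collection names (orders, customers) and all field names here are hypothetical; only the four $lookup fields (from, localField, foreignField, as) are part of MongoDB itself.

```python
# Embedded modeling: the rows of a child table (for example "order_items")
# become an array of sub-documents inside the parent document.
order = {
    "_id": 1,
    "customer_id": 42,
    "items": [
        {"sku": "A-100", "qty": 2},
        {"sku": "B-200", "qty": 1},
    ],
}

# Linked modeling: a $lookup stage performs a left outer join from the
# "orders" collection to a separate "customers" collection.
lookup_pipeline = [
    {
        "$lookup": {
            "from": "customers",          # collection to join with
            "localField": "customer_id",  # field in the orders documents
            "foreignField": "_id",        # field in the customers documents
            "as": "customer",             # name of the output array field
        }
    }
]
```

A pipeline like this would be passed to the aggregate command on the orders collection. Every order is kept in the result even when no customer matches, which is what makes it a left join.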

Data Hierarchy of MongoDB

MongoDB data is organized in three layers: documents, collections, and databases. Multiple documents are stored in one collection, and multiple collections are stored in one database. Each cluster can hold multiple databases as well.

Example:

  • Database: Products
  • Collections: Books, Movies, Music

The combination of databases and collections forms the MongoDB namespace:

  • Products.Books
  • Products.Movies
  • Products.Music
  • Length limits: The database name cannot exceed 64 bytes in length, and the namespace cannot exceed 120 bytes.
  • When the feature compatibility version (FCV) is 4.4 or higher, the namespace length limit is raised to 255 bytes.
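As a small illustration of how a namespace is formed and checked, the following sketch uses the limits exactly as quoted in this article. The function names and the `fcv_44_or_higher` flag are assumptions for the example, not MongoDB APIs.

```python
# Limits as stated in the text above (in bytes).
DB_NAME_MAX_BYTES = 64
NAMESPACE_MAX_BYTES_LEGACY = 120  # FCV lower than 4.4
NAMESPACE_MAX_BYTES_FCV_44 = 255  # FCV 4.4 or higher


def make_namespace(database, collection):
    """Join a database name and a collection name into a namespace."""
    return f"{database}.{collection}"


def is_valid_namespace(database, collection, fcv_44_or_higher=True):
    """Check the database name and the namespace against the byte limits."""
    if fcv_44_or_higher:
        limit = NAMESPACE_MAX_BYTES_FCV_44
    else:
        limit = NAMESPACE_MAX_BYTES_LEGACY
    db_bytes = len(database.encode("utf-8"))
    ns_bytes = len(make_namespace(database, collection).encode("utf-8"))
    return 0 < db_bytes <= DB_NAME_MAX_BYTES and ns_bytes <= limit
```

Lengths are measured in bytes after UTF-8 encoding, so names with non-ASCII characters reach the limit sooner than their character count suggests.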

Data Structure of MongoDB

MongoDB uses the JSON document structure:

  • JSON stands for JavaScript Object Notation.
  • JSON supports the following data types:
      • string: Such as "Thomas"
      • number: Such as 29 or 3.7
      • boolean: true or false
      • null value: null
      • array: Such as [88.5, 91.3, 67.1]
      • object: A set of key-value pairs, which can itself be nested
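The JSON types listed above can be seen together in one small document. The sketch below parses it with Python's standard json module; the field names and values are made up for the example.

```python
import json

# One sample document that uses every JSON type from the list above.
text = """
{
  "name": "Thomas",
  "age": 29,
  "score": 3.7,
  "active": true,
  "nickname": null,
  "grades": [88.5, 91.3, 67.1],
  "address": {"city": "Hangzhou"}
}
"""

doc = json.loads(text)

# Map each field to the Python type the parser produced for it.
types = {key: type(value).__name__ for key, value in doc.items()}
```

Here `types` records how each JSON type is represented in Python: strings become str, numbers become int or float, booleans become bool, null becomes None, arrays become list, and objects become dict.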

Data Storage in BSON Format

MongoDB data types

The preceding figure shows a list of MongoDB data types, and almost all of the common types are supported by MongoDB.

Cluster Deployment

Install the First MongoDB System

First command: Download

Second command: Extract

Third command: Change the directory name

Fourth command: Nothing!

Run MongoDB

[Code comment]

[/bin/mongod]: The mongod server program in the bin directory of the MongoDB installation
[data/db]: Location of the MongoDB data files

Access MongoDB

Create replica sets

1. Create a data directory:

2. Start three MongoDB services

3. Connect to the MongoDB service:

4. Specify replica set configuration
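The replica set configuration in step 4 is a document with an _id (the replica set name) and a members array. Below is a minimal sketch of building such a document in Python. The replica set name rs0 and the localhost ports are placeholders chosen for this example, not values taken from the original commands.

```python
def replica_set_config(name, hosts):
    """Build a replica set configuration document for the given hosts."""
    return {
        "_id": name,  # replica set name
        "members": [
            {"_id": index, "host": host}  # one entry per node
            for index, host in enumerate(hosts)
        ],
    }


# Hypothetical three-node set running on one machine.
config = replica_set_config(
    "rs0",
    ["localhost:28017", "localhost:28018", "localhost:28019"],
)
```

A document of this shape is what gets passed when the replica set is initiated from a connection to one of the three services.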

Create sharded cluster instances

There are five steps:

  1. Create the configuration server
  2. Create one or more shards. Each shard is a replica set
  3. Start one or more mongos nodes
  4. Access mongos and add the shards to the cluster
  5. Select the shard key and enable sharding

The entire sharded cluster has now been deployed.
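Steps 4 and 5 come down to three administrative commands sent to mongos: addShard, enableSharding, and shardCollection. The sketch below only builds the command documents as Python dictionaries. The helper function names, the shard name, the hosts, and the shard key field are hypothetical.

```python
def add_shard_command(replica_set, hosts):
    """Command document that registers a replica set as a shard."""
    return {"addShard": f"{replica_set}/{','.join(hosts)}"}


def enable_sharding_command(database):
    """Command document that enables sharding for a database."""
    return {"enableSharding": database}


def shard_collection_command(database, collection, key_field, hashed=False):
    """Command document that shards a collection on one key field."""
    return {
        "shardCollection": f"{database}.{collection}",
        "key": {key_field: "hashed" if hashed else 1},
    }


# Hypothetical values for illustration only.
commands = [
    add_shard_command("shard1", ["localhost:27010", "localhost:27011"]),
    enable_sharding_command("Products"),
    shard_collection_command("Products", "Books", "isbn", hashed=True),
]
```

A key value of 1 requests range-based sharding on the field, while "hashed" distributes documents by a hash of the field value.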

Production environment deployment suggestions

Deployments in the production environment should follow a set of best practices. For example,

  1. Capacity planning: Computing resources, storage capacity, IOPS, Oplog, and network bandwidth
  2. High availability: Deploy replica sets or sharded clusters
  3. Number of nodes: Deploy an odd number of nodes in a replica set to avoid split brain.
  4. Apply the best practices for the production environment, such as:
  • Using host names instead of IP addresses
  • File system: XFS is recommended on Linux
  • Disabling NUMA
  • Disabling THP (Transparent Huge Pages)
  • Raising resource limits
  • Tuning swappiness
  • Tuning readahead
  • Setting tcp_keepalive_time
  • Clock synchronization
  • Security settings
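The advice to deploy an odd number of nodes follows from majority arithmetic: a primary can only be elected by a strict majority of voting members. The following sketch, with function names chosen for this example, shows that adding a fourth node to a three-node set does not let it survive any more failures.

```python
def majority(voting_members):
    """Smallest number of votes that forms a strict majority."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members):
    """How many members can fail while a majority can still be formed."""
    return voting_members - majority(voting_members)


# Compare replica sets of 3, 4, and 5 voting members.
summary = {n: (majority(n), fault_tolerance(n)) for n in (3, 4, 5)}
```

With three members the set tolerates one failure, with four it still tolerates only one, and with five it tolerates two. Because two halves of a partitioned set can never both hold a majority, at most one side can elect a primary.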

Basic Operations

Insert New Document

Delete the Document

Delete collections with drop

  • Use db.<collection>.drop() to delete a collection.
  • All documents in the collection are deleted.
  • The indexes of the collection are also deleted.

Delete databases with the dropDatabase command

  • To delete a database, run the db.dropDatabase() command.
  • The corresponding files in the database will also be deleted, and disk space will be released.

Query Documents with the find Command

find is the basic query command for MongoDB.

find returns a cursor over the matching documents.

SQL query conditions comparison

Query operators
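As a rough guide to how SQL conditions translate into MongoDB query documents, the sketch below lists a few common mappings and includes a tiny in-memory matcher so the filters can be tried without a server. The matcher is a simplified model written for this article, not MongoDB's query engine: it supports only the operators shown, treats any dictionary value as an operator document, and never matches a field that is missing from the document.

```python
import operator

# Common SQL conditions and their query-document equivalents.
SQL_TO_FILTER = {
    "a = 1": {"a": 1},
    "a <> 1": {"a": {"$ne": 1}},
    "a > 1": {"a": {"$gt": 1}},
    "a >= 1": {"a": {"$gte": 1}},
    "a < 1": {"a": {"$lt": 1}},
    "a <= 1": {"a": {"$lte": 1}},
    "a = 1 AND b = 1": {"a": 1, "b": 1},
    "a = 1 OR b = 1": {"$or": [{"a": 1}, {"b": 1}]},
    "a IN (1, 2, 3)": {"a": {"$in": [1, 2, 3]}},
}

COMPARISONS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, choices: value in choices,
}


def matches(document, query):
    """Return True if the document satisfies every condition in the query."""
    for key, condition in query.items():
        if key == "$and":
            if not all(matches(document, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(document, sub) for sub in condition):
                return False
        elif key not in document:
            return False  # simplification: missing fields never match
        elif isinstance(condition, dict):
            for op, operand in condition.items():
                if not COMPARISONS[op](document[key], operand):
                    return False
        elif document[key] != condition:
            return False
    return True
```

For example, `matches({"a": 5}, {"a": {"$gt": 1, "$lte": 5}})` evaluates the same kind of filter that would be passed to find for the SQL condition a > 1 AND a <= 5.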

Update Operation

Parameters required for the update operation

The update operation takes two parameters:

  • Query parameters: select the documents to update
  • Update parameters: specify the changes to apply

Update Arrays

  • $push: Adds an object to the end of the array.
  • $pushAll: Adds multiple objects to the end of the array. It was removed in MongoDB 3.6; use $push with the $each modifier instead.
  • $pop: Removes the last element of the array (with the value 1) or the first element (with the value -1).
  • $pull: Removes all array elements that match the specified value or condition.
  • $pullAll: Removes all array elements that equal any of the values in the specified list.
  • $addToSet: Adds a value to the array if it does not exist.
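The effect of each array operator can be modeled on a plain Python list. The functions below are an illustrative model of the semantics only, with names chosen for this example. They are not driver calls, and the `pull` model handles only a plain value, not a condition.

```python
def push(array, value):
    """$push: append one value to the end of the array."""
    return array + [value]


def push_each(array, values):
    """$push with $each: append several values to the end of the array."""
    return array + list(values)


def pop(array, direction=1):
    """$pop: drop the last element (1) or the first element (-1)."""
    return array[:-1] if direction == 1 else array[1:]


def pull(array, value):
    """$pull: remove every element equal to the given value."""
    return [item for item in array if item != value]


def pull_all(array, values):
    """$pullAll: remove every element equal to any of the listed values."""
    return [item for item in array if item not in values]


def add_to_set(array, value):
    """$addToSet: append the value only if it is not already present."""
    return array if value in array else array + [value]
```

Each function returns a new list and leaves its input unchanged, which keeps the model easy to test. In MongoDB the operators modify the array field of the stored document in place.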

Use {upsert: true} to update or insert

Set the upsert parameter to true to insert a new document when no document matches the query.

If there is no matching object, no update will be performed by default
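The difference between the default behavior and an upsert can be modeled with a list of dictionaries standing in for a collection. This is a simplified sketch under stated assumptions: the `update_one` function here is written for this example and is not the driver method of the same name, it supports only equality queries and $set-style field updates, and it does not generate an _id for inserted documents.

```python
def update_one(collection, query, fields, upsert=False):
    """Set fields on the first document that matches an equality query.

    Returns "updated", "inserted", or "none" to report what happened.
    """
    for document in collection:
        if all(document.get(key) == value for key, value in query.items()):
            document.update(fields)
            return "updated"
    if upsert:
        # No match: build a new document from the query and the update.
        collection.append({**query, **fields})
        return "inserted"
    return "none"


books = [{"title": "Dune", "stock": 3}]

first = update_one(books, {"title": "Dune"}, {"stock": 5})
second = update_one(books, {"title": "Emma"}, {"stock": 1})
third = update_one(books, {"title": "Emma"}, {"stock": 1}, upsert=True)
```

The first call updates the existing document, the second changes nothing because no document matches and upsert is off, and the third inserts a new document built from the query and the update fields.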
