A Deep Dive into the Core Concepts of ApsaraDB for MongoDB

Alibaba Cloud
Aug 18, 2021

Related Concepts of MongoDB

In this blog, we will discuss the features of ApsaraDB for MongoDB (hereinafter referred to as MongoDB) in detail.

In terms of positioning, MongoDB sits between Memcached and a relational database management system (RDBMS). Its scalability and performance are closer to Memcached, while its functionality is closer to an RDBMS.

MongoDB Deployment Model

In the production environment, MongoDB is often deployed as a three-node replica set or a sharded cluster.

The left side of the figure above shows a replica set deployment. By default, the application, through the driver, sends reads and writes directly to the primary node of the replica set.

The other two nodes (secondaries) automatically replicate data from the primary to stay up to date.

If the primary fails while the cluster is running, the two secondaries elect a new primary within seconds, so the application can continue reading and writing.

The right side of the figure shows a sharded cluster deployment. Applications connect through the driver to routing nodes (mongos). Based on the shard key values in each read or write operation, mongos routes the operation to the relevant shards, then merges the results and returns them to the application.

How is data distributed across the cluster? The metadata is recorded on the config servers, which are themselves deployed as a highly available replica set. Each shard manages a portion of the cluster's data and is also a highly available replica set. Because multiple mongos routing nodes are deployed in production as well, the sharded cluster has no single point of failure.
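The routing step can be pictured with a small sketch. The JavaScript below (runnable in Node.js) models range-based sharding only: a hypothetical chunk table maps shard key ranges to shards, a single-key operation goes to exactly one shard, and a range query fans out to every shard whose chunk overlaps the range. The chunk boundaries, shard names, and helper functions are illustrative assumptions; a real mongos reads chunk metadata from the config servers and also supports hashed shard keys, which this sketch ignores.

```javascript
// Hypothetical chunk table: each chunk covers [min, max) of the shard key
// and lives on one shard. In a real cluster this metadata is on the config servers.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard01" },
  { min: 1000, max: 5000, shard: "shard02" },
  { min: 5000, max: Infinity, shard: "shard03" },
];

// Single-key operation: find the one chunk whose range contains the key.
function routeByShardKey(key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  if (!chunk) throw new Error(`no chunk covers shard key ${key}`);
  return chunk.shard;
}

// Range query [lo, hi): every shard owning an overlapping chunk is queried.
function routeRange(lo, hi) {
  const shards = chunks
    .filter((c) => c.min < hi && lo < c.max)
    .map((c) => c.shard);
  return [...new Set(shards)];
}

// The router concatenates the per-shard results before replying.
function mergeResults(perShardResults) {
  return perShardResults.flat();
}

console.log(routeByShardKey(42));   // shard01
console.log(routeByShardKey(1000)); // shard02
console.log(routeRange(900, 6000)); // [ 'shard01', 'shard02', 'shard03' ]
```

Routing an operation that includes the shard key to a single shard is why the choice of shard key matters: a query without the shard key has to be broadcast to every shard.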

MongoDB Basic Concepts and Their Mapping to RDBMS Concepts

As shown in the figure above, most RDBMS concepts have a MongoDB counterpart:

  • Database -> database
  • Table -> collection
  • Parent-child tables -> nested subdocuments or arrays
  • Index -> index (common to both)
  • Row -> document
  • Column -> field
  • Join -> usually replaced by embedding related data; when a join is still needed, the $lookup aggregation stage supports left outer joins
  • View -> read-only view and on-demand materialized view
  • Multi-record ACID transaction -> multi-document ACID transaction
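To show what the $lookup stage returns, here is a minimal in-memory sketch in JavaScript (Node.js) of its equality form (localField/foreignField). The lookup helper and the sample orders/products data are hypothetical; on a real deployment the same join would be written as db.orders.aggregate([{ $lookup: { from: "products", localField: "item", foreignField: "sku", as: "product" } }]).

```javascript
// Hypothetical in-memory model of $lookup's equality form.
// Each local document gets an array field `as` holding the matching foreign
// documents; an empty array means no match (left outer join).
function lookup(localDocs, foreignDocs, { localField, foreignField, as }) {
  return localDocs.map((doc) => ({
    ...doc,
    [as]: foreignDocs.filter((f) => f[foreignField] === doc[localField]),
  }));
}

// Hypothetical sample data.
const orders = [
  { _id: 1, item: "card", qty: 2 },
  { _id: 2, item: "pen", qty: 1 },
];
const products = [{ sku: "card", price: 3 }];

const joined = lookup(orders, products, {
  localField: "item",
  foreignField: "sku",
  as: "product",
});
console.log(JSON.stringify(joined));
// [{"_id":1,"item":"card","qty":2,"product":[{"sku":"card","price":3}]},{"_id":2,"item":"pen","qty":1,"product":[]}]
```

Each order stays in the output even when nothing matches, which is what makes the join a left outer join. With the embedded approach, the product data would instead be stored inside the order document, so no join is needed at read time.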

Data Hierarchy of MongoDB

MongoDB data is divided into three layers: documents, collections, and databases. Multiple documents are stored in one collection, multiple collections are stored in one database, and each cluster can hold multiple databases.

Example:

  • Database: Products
  • Collections: Books, Movies, Music

The database name and the collection name, joined by a dot, form the MongoDB namespace:

  • Products.Books
  • Products.Movies
  • Products.Music

Length limits:

  • The database name must be less than 64 bytes, and the namespace cannot exceed 120 bytes.
  • When the feature compatibility version (FCV) is 4.4 or higher, the namespace length limit is raised to 255 bytes.
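The length rules above can be checked before a collection is created. The sketch below is a hypothetical Node.js helper, not a MongoDB API; it assumes a database name must be shorter than 64 bytes and a namespace (database name, a dot, and collection name) may be at most 120 bytes, or 255 bytes when the FCV is 4.4 or higher. Lengths are measured in UTF-8 bytes, so non-ASCII names use up the limit faster.

```javascript
// Byte length in UTF-8: the limits are in bytes, not characters.
const utf8Bytes = (s) => Buffer.byteLength(s, "utf8");

// Hypothetical pre-flight check (not a MongoDB API) for the limits above.
function checkNamespace(db, collection, fcv) {
  const ns = `${db}.${collection}`;
  const nsLimit = fcv >= 4.4 ? 255 : 120; // FCV given as a number, e.g. 4.2 or 4.4
  const problems = [];
  if (utf8Bytes(db) >= 64) {
    problems.push(`database name is ${utf8Bytes(db)} bytes (must be < 64)`);
  }
  if (utf8Bytes(ns) > nsLimit) {
    problems.push(`namespace is ${utf8Bytes(ns)} bytes (limit ${nsLimit})`);
  }
  return { ns, ok: problems.length === 0, problems };
}

console.log(checkNamespace("Products", "Books", 4.2).ok);         // true
console.log(checkNamespace("Products", "x".repeat(150), 4.2).ok); // false
console.log(checkNamespace("Products", "x".repeat(150), 4.4).ok); // true
```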

Data Structure of MongoDB

MongoDB uses a JSON-style document structure:

  • JSON stands for JavaScript Object Notation.
  • JSON supports the following data types:
      • string: such as "Thomas"
      • number: such as 29 or 3.7
      • boolean: true or false
      • null: null
      • array: such as [88.5, 91.3, 67.1]
      • object: such as
{
  "firstName": "Thomas",
  "lastName": "Smith",
  "age": 29
}
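A quick way to see the six JSON types in practice is to parse a document and inspect each field. The Node.js sketch below uses the standard JSON.parse; the jsonType helper and the sample fields are hypothetical. Note that JSON has a single number type, whereas BSON distinguishes 32-bit integers, 64-bit integers, doubles, and Decimal128 values.

```javascript
// Report the JSON type of a parsed value. JSON has six types; typeof alone
// cannot tell arrays or null apart from objects, so check those first.
function jsonType(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  return typeof value; // "string", "number", "boolean", or "object"
}

const doc = JSON.parse(`{
  "name": { "firstName": "Thomas", "lastName": "Smith" },
  "age": 29,
  "height": 3.7,
  "active": true,
  "nickname": null,
  "scores": [88.5, 91.3, 67.1]
}`);

// Prints one line per field: name: object, age: number, height: number,
// active: boolean, nickname: null, scores: array
for (const [field, value] of Object.entries(doc)) {
  console.log(`${field}: ${jsonType(value)}`);
}
```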

Data Storage in BSON Format

MongoDB data types

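MongoDB stores documents in BSON (Binary JSON), a binary encoding of JSON-like documents that adds types JSON lacks, such as Date, ObjectId, Decimal128, and binary data. The preceding figure lists the MongoDB data types; almost all common types are supported.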

Cluster Deployment

Install the First MongoDB System

First command: Download

curl -O https://fastdl.mongodb.org/linux/mongodb-linux-x86_64-rhel70-4.4.2.tgz

Second command: Extract

tar xzvf mongodb-linux-x86_64-rhel70-4.4.2.tgz

Third command: Change the directory name

mv mongodb-linux-x86_64-rhel70-4.4.2 mongodb

Fourth command: none needed! MongoDB is installed.

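Run MongoDB

cd mongodb
mkdir -p /data/db
./bin/mongod --dbpath /data/db

[Code comment]

[./bin/mongod]: the mongod server binary in the bin directory of the MongoDB installation
[/data/db]: location of the MongoDB data files; the directory must exist before mongod starts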

Access MongoDB

$ ./bin/mongo // run the shell from the bin directory of the MongoDB installation
MongoDB shell version: 4.4.2
...
Server has startup warnings:
2020-12-15T04:23:25.268+0000 I CONTROL [initandlisten]
2020-12-15T04:23:25.268+0000 I CONTROL [initandlisten] ** WARNING: Access control is not enabled for the database.
...

Create a replica set

1. Create a data directory for each member:

mkdir rs1 rs2 rs3

2. Start three mongod instances:

mongod --replSet rs --dbpath ./rs1 --port 27017 --fork --logpath ./rs1/mongod.log
mongod --replSet rs --dbpath ./rs2 --port 27018 --fork --logpath ./rs2/mongod.log
mongod --replSet rs --dbpath ./rs3 --port 27019 --fork --logpath ./rs3/mongod.log

3. Connect to the MongoDB service:

mongo // connects to the default port 27017

4. Configure the replica set:

rs.initiate() // Initiate the replica set
rs.add('<HOSTNAME>:27018') // Add the second member
rs.add('<HOSTNAME>:27019') // Add the third member
rs.status() // Check the replica set status

Create a sharded cluster

There are five steps:

  1. Create the config server replica set.
  2. Create one or more shards; each shard is a replica set.
  3. Start one or more mongos routers.
  4. Connect to mongos and add the shards to the cluster.
  5. Enable sharding for the database, choose a shard key, and shard the collection.

After these five steps, the sharded cluster is deployed.
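The five steps can be made concrete with a small command generator. The Node.js sketch below is hypothetical: shardedClusterPlan only prints shell commands for a local test cluster built from single-member replica sets (one config server replica set, N shards, one mongos). The mongo 4.4 shell (replaced by mongosh in later versions), ports 27100 and 27201 onward, the Products.Books collection, and the shard key { isbn: 1 } are all assumptions for illustration. Production clusters would use three-member replica sets on separate hosts.

```javascript
// Hypothetical plan generator: returns shell commands for a local test cluster.
function shardedClusterPlan({ host = "localhost", shards = 2, db = "Products",
                              coll = "Books", key = { isbn: 1 } } = {}) {
  // Start a single-member replica set member.
  const startMongod = (role, name, port) =>
    `mkdir -p ./${name} && mongod ${role} --replSet ${name} --dbpath ./${name} ` +
    `--port ${port} --fork --logpath ./${name}/mongod.log`;
  // Initiate that replica set through the mongo shell.
  const initiate = (name, port, extra = {}) => {
    const config = { _id: name, ...extra, members: [{ _id: 0, host: `${host}:${port}` }] };
    return `mongo --port ${port} --eval 'rs.initiate(${JSON.stringify(config)})'`;
  };

  const plan = [];
  // Step 1: config server replica set.
  plan.push(startMongod("--configsvr", "cfg", 27100), initiate("cfg", 27100, { configsvr: true }));
  // Step 2: one replica set per shard.
  const shardNames = [];
  for (let i = 1; i <= shards; i++) {
    const name = `sh${i}`;
    const port = 27200 + i;
    plan.push(startMongod("--shardsvr", name, port), initiate(name, port));
    shardNames.push(`${name}/${host}:${port}`);
  }
  // Step 3: a mongos router that reads metadata from the config servers.
  plan.push(`mongos --configdb cfg/${host}:27100 --port 27017 --fork --logpath ./mongos.log`);
  // Step 4: register each shard through mongos.
  for (const s of shardNames) plan.push(`mongo --port 27017 --eval 'sh.addShard("${s}")'`);
  // Step 5: enable sharding on the database and shard the collection on the key.
  plan.push(
    `mongo --port 27017 --eval 'sh.enableSharding("${db}"); ` +
    `sh.shardCollection("${db}.${coll}", ${JSON.stringify(key)})'`
  );
  return plan;
}

shardedClusterPlan().forEach((cmd) => console.log(cmd));
```

Run the printed commands in order, and wait for each replica set to elect a primary (check with rs.status()) before starting mongos and adding shards.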

Production environment deployment suggestions

In production, follow these deployment best practices:

  1. Capacity planning: plan computing resources, storage capacity, IOPS, oplog size, and network bandwidth.
  2. High availability: deploy replica sets or sharded clusters.
  3. Node count: deploy an odd number of nodes in a replica set so that a majority can always be formed, avoiding split brain.
  4. Apply the operating system best practices for production, such as:
      • Use host names instead of IP addresses
      • Use the XFS file system on Linux
      • Disable NUMA
      • Disable transparent huge pages (THP)
      • Raise resource limits (ulimit)
      • Tune swappiness
      • Tune readahead
      • Tune tcp_keepalive_time
      • Synchronize clocks
      • Configure security settings
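The odd-number recommendation follows from majority arithmetic: a replica set can elect a primary only when a strict majority of its voting members can reach each other. The Node.js sketch below (helper names are hypothetical) computes the majority, the number of failures a set can tolerate, and whether one side of a network partition can still elect a primary.

```javascript
// Votes needed to elect a primary: a strict majority of voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Number of members that can fail while the rest can still elect a primary.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

// After a network partition, only a side holding a majority can elect a primary.
function sideCanElect(sideSize, votingMembers) {
  return sideSize >= majority(votingMembers);
}

for (const n of [3, 4, 5, 6, 7]) {
  console.log(`${n} members: majority ${majority(n)}, tolerates ${faultTolerance(n)} failure(s)`);
}
console.log(sideCanElect(2, 4)); // false: a 2/2 split leaves no side with a majority
console.log(sideCanElect(3, 5)); // true: the 3-member side can elect a primary
```

The output shows that adding a fourth member raises the majority from 2 to 3 without tolerating any additional failure, and that splitting a four-member set 2/2 leaves neither side able to elect a primary.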

Basic Operations

Insert New Document

insertOne
db.products.insertOne( { item: "card", qty: 15 } );
insertMany
db.products.insertMany( [ { _id: 10, item: "large box", qty: 20 }, { _id: 11, item: "small box", qty: 55 }, { _id: 12, item: "medium box", qty: 30 } ] );
insert
db.collection.insert( <document or array of documents>, { writeConcern: <document>, ordered: <boolean> } )

Delete Documents

deleteOne 
db.orders.deleteOne( { "_id" : ObjectId("563237a41a4d68582c2509da") } );
db.orders.deleteOne( { "expirationTime" : { $lt: ISODate("2015-11-01T12:40:15Z") } } );
deleteMany
db.orders.deleteMany( { "client" : "Crude Traders Inc." } );
remove
db.collection.remove( <query>, <justOne> )

Delete collections with drop

  • Use db.<collection>.drop() to delete a collection.
  • All documents in the collection are deleted.
  • All indexes on the collection are also deleted.
db.colToBeDropped.drop()

Delete databases with the dropDatabase command

  • To delete the current database, run db.dropDatabase().
  • The database's data files are also deleted, and the disk space is released.
use tempDB
db.dropDatabase()
show collections // No collections
show dbs // The db is gone

Query Documents with find

find is the basic query method in MongoDB.

find returns a cursor over the matching documents.

db.movies.find( { "year" : 1975 } ) // Single-condition query
db.movies.find( { "year" : 1989, "title" : "Batman" } ) // Multi-condition AND query
db.movies.find( { $or: [{"year" : 1989}, {"title" : "Batman"}] } ) // Multi-condition OR query
db.movies.find( { $and : [ {"title" : "Batman"}, { "category" : "action" }] } ) // Explicit $and query
db.movies.find( { "title" : /^B/} ) // Regular expression: titles starting with "B"

SQL query conditions comparison

a = 1 -> {a: 1} 
a <> 1 -> {a: {$ne: 1}}
a > 1 -> {a: {$gt: 1}}
a >= 1 -> {a: {$gte: 1}}
a < 1 -> {a: {$lt: 1}}
a <= 1 -> {a: {$lte: 1}}
a = 1 AND b = 1 -> {a: 1, b: 1} or {$and: [{a: 1}, {b: 1}]}
a = 1 OR b = 1 -> {$or: [{a: 1}, {b: 1}]}
a IS NULL -> {a: null} (matches null or missing; {a: {$exists: false}} matches only a missing field)
a IN (1, 2, 3) -> {a: {$in: [1, 2, 3]}}

Query operators

$lt: Exists and is less than
$lte: Exists and is less than or equal to
$gt: Exists and is greater than
$gte: Exists and is greater than or equal to
$ne: Does not exist, or exists but is not equal to
$in: Exists and equals a value in the specified array
$nin: Does not exist, or is not in the specified array
$or: Matches at least one of two or more conditions
$and: Matches all conditions
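To make the operator semantics above concrete, especially the difference between "exists and is ..." and "does not exist or ...", here is a minimal in-memory matcher in JavaScript (Node.js). It is a hypothetical sketch, not how the server evaluates queries: it handles only top-level scalar fields and ignores dotted paths, array-element matching, and BSON cross-type ordering. The sample movies are made up for this example.

```javascript
// Hypothetical in-memory sketch of MongoDB-style filter matching.
function matchesCondition(value, present, cond) {
  if (cond instanceof RegExp) {
    return present && typeof value === "string" && cond.test(value);
  }
  if (cond === null) return !present || value === null; // null also matches a missing field
  if (typeof cond !== "object" || Array.isArray(cond)) {
    return present && value === cond; // implicit equality
  }
  // Operator document, e.g. { $gt: 1980 }: every operator must hold.
  return Object.entries(cond).every(([op, arg]) => {
    switch (op) {
      case "$lt": return present && value < arg;
      case "$lte": return present && value <= arg;
      case "$gt": return present && value > arg;
      case "$gte": return present && value >= arg;
      case "$ne": return !present || value !== arg;
      case "$in": return present && arg.includes(value);
      case "$nin": return !present || !arg.includes(value);
      case "$exists": return present === arg;
      default: throw new Error(`unsupported operator ${op}`);
    }
  });
}

function matches(doc, filter) {
  return Object.entries(filter).every(([key, cond]) => {
    if (key === "$and") return cond.every((f) => matches(doc, f));
    if (key === "$or") return cond.some((f) => matches(doc, f));
    const present = Object.prototype.hasOwnProperty.call(doc, key);
    return matchesCondition(doc[key], present, cond);
  });
}

// Hypothetical sample documents.
const movies = [
  { title: "Batman", year: 1989, category: "action" },
  { title: "Jaws", year: 1975 },
  { title: "Big", year: 1988, category: null },
];
const find = (filter) => movies.filter((m) => matches(m, filter)).map((m) => m.title);

console.log(find({ year: 1975 }));                                 // [ 'Jaws' ]
console.log(find({ $or: [{ year: 1975 }, { title: "Batman" }] })); // [ 'Batman', 'Jaws' ]
console.log(find({ year: { $gt: 1980 } }));                        // [ 'Batman', 'Big' ]
console.log(find({ title: /^B/ }));                                // [ 'Batman', 'Big' ]
console.log(find({ category: { $ne: "action" } }));                // [ 'Jaws', 'Big' ]
console.log(find({ category: null }));                             // [ 'Jaws', 'Big' ]
console.log(find({ category: { $exists: false } }));               // [ 'Jaws' ]
```

The last two queries show why {a: null} and {a: {$exists: false}} differ: the former also matches a field explicitly set to null.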

Update Operation

The update operation takes two required parameters:

  • query: selects the documents to update
  • update: describes the modifications to apply
// Insert sample data
db.movies.insert( [
  {
    "title" : "Batman",
    "category" : [ "action", "adventure" ],
    "imdb_rating" : 7.6,
    "budget" : 35
  },
  {
    "title" : "Godzilla",
    "category" : [ "action", "adventure", "sci-fi" ],
    "imdb_rating" : 6.6
  },
  {
    "title" : "Home Alone",
    "category" : [ "family", "comedy" ],
    "imdb_rating" : 7.4
  }
] )
db.movies.update( { "title" : "Batman" }, { $set : { "imdb_rating" : 7.7 } } )
// { "title" : "Batman" }: query that selects the Batman document
// { $set : { "imdb_rating" : 7.7 } }: sets the imdb_rating field to 7.7
// By default, update() modifies only the first matching document

Update Arrays

  • $push: Appends a value to the end of an array.
  • $pushAll: Appends multiple values to the end of an array. (Removed in MongoDB 3.6; use $push with the $each modifier instead.)
  • $pop: Removes the first or last element of an array (-1 for the first, 1 for the last).
  • $pull: Removes all elements that match a specified value or condition.
  • $pullAll: Removes all elements that match any of the values in a specified list.
  • $addToSet: Adds a value to an array only if it is not already present.
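The array operators can be modeled on a plain JavaScript array. The helpers below are hypothetical Node.js functions that mimic the effect of each operator on a single array of scalar values; the server compares embedded documents and arrays structurally, which includes() does not do. pushEach models $push with $each, the replacement for $pushAll.

```javascript
// Hypothetical helpers; each mutates the array in place and returns it.

// $push appends one value; $push with $each (pushEach) appends several.
function push(arr, value) { arr.push(value); return arr; }
function pushEach(arr, values) { arr.push(...values); return arr; }

// $pop: 1 removes the last element, -1 removes the first.
function pop(arr, direction) {
  if (direction === 1) arr.pop();
  else if (direction === -1) arr.shift();
  else throw new Error("$pop expects 1 or -1");
  return arr;
}

// $pull: remove every element that satisfies a condition (a predicate here).
function pull(arr, predicate) {
  for (let i = arr.length - 1; i >= 0; i--) {
    if (predicate(arr[i])) arr.splice(i, 1);
  }
  return arr;
}

// $pullAll: remove every element equal to any of the listed values.
function pullAll(arr, values) { return pull(arr, (x) => values.includes(x)); }

// $addToSet: append only if the value is not already present.
function addToSet(arr, value) { if (!arr.includes(value)) arr.push(value); return arr; }

const category = ["action"];
push(category, "adventure");              // [action, adventure]
pushEach(category, ["sci-fi", "drama"]);  // [action, adventure, sci-fi, drama]
addToSet(category, "action");             // unchanged: "action" is already present
pop(category, 1);                         // removes "drama"
pull(category, (c) => c.startsWith("a")); // removes "action" and "adventure"
console.log(category);                    // [ 'sci-fi' ]
```

In the mongo shell, the pushEach step corresponds to an update such as db.movies.update( { "title" : "Batman" }, { $push : { "category" : { $each : [ "sci-fi", "drama" ] } } } ), and the pop step to { $pop : { "category" : 1 } }.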

Use { upsert: true } to update or insert

By default, if no document matches the query, no update is performed.

To insert a new document when there is no match, set the upsert parameter to true.

db.movies.update( { "title" : "Jaws" }, { $inc: { "budget" : 5 } },
{ upsert: true } )
// upsert: true: if "Jaws" is not found,
// a new "Jaws" document is inserted:
{
"_id" : ObjectId("5847f65f83432667e51e5ea8"),
"title" : "Jaws",
"budget" : 5
}
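The upsert logic can be sketched with an in-memory collection. The Node.js code below is hypothetical: updateOne here is a local helper, not the driver method of the same name; it supports only equality filters and the $inc and $set operators, and it uses sequential integer _id values instead of ObjectId. It follows the same rule as the example above: when nothing matches, the new document is seeded from the query's equality fields, and then the update operators are applied.

```javascript
// Hypothetical in-memory collection; sequential integer _id stands in for ObjectId.
let nextId = 1;

// Local helper (not the driver's updateOne): equality filters, $inc and $set only.
function updateOne(collection, filter, update, { upsert = false } = {}) {
  const isMatch = (doc) => Object.entries(filter).every(([k, v]) => doc[k] === v);
  let doc = collection.find(isMatch);
  let upserted = false;
  if (!doc) {
    if (!upsert) return { matched: 0, upserted: false }; // default: nothing changes
    doc = { _id: nextId++, ...filter }; // seed the new document from the query
    collection.push(doc);
    upserted = true;
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    doc[field] = (doc[field] ?? 0) + amount; // $inc on a missing field starts at 0
  }
  Object.assign(doc, update.$set || {});
  return { matched: upserted ? 0 : 1, upserted };
}

const films = [{ _id: 100, title: "Batman", budget: 35 }];
const jaws = { title: "Jaws" };
const incBudget = { $inc: { budget: 5 } };

updateOne(films, jaws, incBudget);                   // no match: nothing happens
updateOne(films, jaws, incBudget, { upsert: true }); // inserts Jaws with budget 5
updateOne(films, jaws, incBudget, { upsert: true }); // now matches: budget becomes 10
console.log(films[1]); // { _id: 1, title: 'Jaws', budget: 10 }
```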


Alibaba Cloud

Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website: https://www.alibabacloud.com