
MongoDB Security and Optimization


Improving the security of MongoDB:

By default, MongoDB has no password and only allows local access. If you open it to the public network, you must set a password, and you must configure a firewall so that only specified IP addresses can reach the MongoDB port; otherwise there is a real security risk.
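Local-only access corresponds to the bind address in mongodb.conf; a minimal sketch (the firewall rules for restricting IPs depend on your system and are not shown here):

net:
  bindIp: 127.0.0.1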

Configuring the permission management (RBAC) mechanism:

The RBAC mechanism involves three key definitions: Roles, Privileges, and Users.

A privilege is the combination of a resource and the operations that can be performed on that resource.

A role can have multiple privileges.

A user can be assigned different roles.

1. Create an administrator user:

In Linux or macOS, execute the command "mongo" to open the MongoDB command line client:
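A sketch of the session (the account name, password, and role come from the description below; the lines are laid out to match the line references):

use admin

db.createUser({
  user: 'admin',
  pwd: 'kingnameisgenius',
  roles: [
    {role: 'userAdminAnyDatabase', db: 'admin'}
  ]
})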

Line 1: switch to the admin database. The admin database is the database that comes with MongoDB.

Lines 3-9: create an administrator with the account name admin, password kingnameisgenius, role userAdminAnyDatabase, and database admin.

After creating the administrator account, enter "exit" directly in the MongoDB command line client and press enter to exit the MongoDB command line client.

Modify the configuration file mongodb.conf and add the following two lines:

security:
  authorization: enabled

Save the configuration file and restart the MongoDB database. Execute the "mongo" command again: although you can still connect to the database, you can no longer perform normal operations.

To use the command line client properly, you must change the startup command of mongo to:

mongo -u 'admin' -p 'kingnameisgenius' --authenticationDatabase 'admin'
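The same credentials apply when connecting from Python; a minimal sketch with pymongo:

import pymongo

client = pymongo.MongoClient('localhost', 27017,
                             username='admin',
                             password='kingnameisgenius',
                             authSource='admin')
print(client.admin.command('ping'))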

2. Create a normal user:

The administrator account does not have permission to operate ordinary databases. To work with a normal database, you also need to create a normal user.

After logging in to the command line client with the administrator account, execute the following command to create a normal user who has read and write access to the chapter_8 database and read-only access to chapter_4.
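A sketch of the command (the user name and password here are placeholders; only the databases and roles come from the text):

use admin

db.createUser({
  user: 'normaluser',
  pwd: 'normalpassword',
  roles: [
    {role: 'readWrite', db: 'chapter_8'},
    {role: 'read', db: 'chapter_4'}
  ]
})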

3. Create an administrator user who can operate the database:

The administrator (the admin account) can create other users, which seems like a great deal of privilege, but it cannot read or write any database itself. So, if necessary, you also need to create a user who has full permissions on all databases.

(1) In the MongoDB command line client, connect as the administrator (admin), and then execute the following command to create a user with full control over all databases.
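A sketch using the built-in root role (the password is a placeholder; the user name root matches the next step):

use admin

db.createUser({
  user: 'root',
  pwd: 'rootpassword',
  roles: [
    {role: 'root', db: 'admin'}
  ]
})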

(2) In a graphical client, connect to the database as the root user and set the authentication database to admin:

Performance comparison between bulk insert and one-by-one insert:

A single insert statement may take only a few milliseconds, but network transmission accounts for a large share of that time. IO (input/output) operations are always the most time-consuming part, whether hard disk IO or network IO. With today's broadband technology, uplink and downlink speeds can easily reach hundreds of megabytes per second; if you have MongoDB insert records one at a time, a few bytes each, that is a waste of network bandwidth.

If you write to a local MongoDB, the data only loops back at the network card before being saved to the hard disk.

If you write to a remote MongoDB, the data first leaves the local network card, travels along network cables, is converted between electrical signals, electromagnetic waves, and optical signals, passes through layer after layer of switches and routers, perhaps even a submarine cable halfway around the earth, and only then enters the target server's network card and is finally stored in the database.
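To see the difference, here is a minimal sketch comparing the two approaches with pymongo (the insert_test collection name and the generated documents are made up for the test):

import time

import pymongo

col = pymongo.MongoClient().chapter_8.insert_test

docs = [{'index': i, 'salary': str(i)} for i in range(10000)]

start = time.time()
for doc in docs:
    col.insert_one(dict(doc))  # one network round trip per document
print('one by one:', time.time() - start)

col.drop()

start = time.time()
col.insert_many([dict(doc) for doc in docs])  # batched round trips
print('bulk:', time.time() - start)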

Of course, bulk insertion has several aspects to consider:

1. The amount of data to be inserted (read from Redis, for example) may be so large that holding it all in memory exceeds the available memory.

2. The flow of new data (into Redis) may pause, and it may take a long time before data is added again.

3. Suppose there are 100 million records in Redis. What if the power suddenly goes out while reading record 99999999?

...

If data keeps arriving in Redis in a steady stream, at intervals ranging from milliseconds to hours, the code can be written as follows (Python):
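A minimal sketch of that logic (the Redis list key people_info and the chapter_8.people_info collection are assumptions; the line references below correspond to this layout):

import json
import time
import redis
import pymongo

client = redis.Redis()
handler = pymongo.MongoClient().chapter_8.people_info
people_info_list = []
get_count = 0
while True:
    get_count += 1
    people_info_json = client.lpop('people_info')
    if people_info_json:
        people_info_list.append(json.loads(people_info_json))
        if len(people_info_list) >= 1000:
            handler.insert_many(people_info_list)
            people_info_list = []
    else:
        # Redis is empty right now; flush pending data at most
        # every 1000 polls, i.e. after at most about 100 seconds
        if people_info_list and get_count % 1000 == 0:
            handler.insert_many(people_info_list)
            people_info_list = []
        time.sleep(0.1)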

Line 11: a get_count variable is incremented by 1 each time data is fetched from Redis.

Line 21: when Redis is empty, if there is data in people_info_list, then as long as the number of requests made to Redis is a multiple of 1000, the list is bulk inserted into the database. The advantage of this is that data in people_info_list waits at most about 100 seconds before being inserted. The '%' operator is used here: 'get_count % 1000' is the remainder of get_count divided by 1000, and a result of 0 means that get_count is an exact multiple of 1000.

Line 24: if Redis turns out to be empty this time, pause for 0.1 seconds, which significantly reduces CPU usage.

Performance comparison between insert and update:

(note: the salary field is a string, not an integer)

The code to update records one by one is as follows (Python):
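A sketch (the chapter_8 database and the batch collection name are assumptions based on the following section; the line references below correspond to this layout):

import pymongo

client = pymongo.MongoClient()
handler = client.chapter_8.batch

# scan every document; _id is returned by default
for row in handler.find({}, {'salary': 1}):
    salary = int(row['salary'])
    # write the converted value back, matching on _id
    handler.update_one({'_id': row['_id']}, {'$set': {'salary': salary}})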

Line 7: reads all the data, returning only the "_id" field (included by default) and the "salary" field.

Line 8: converts the "salary" field to an integer.

Line 10: writes the new "salary" value back to the database, matching on the "_id" field.

Updating 19808 records one by one takes 68.7 seconds, which is even longer than inserting them one by one!

Instead of updating data, insert data:

In cases where a large amount of data must be updated one by one, you can also use inserts instead of updates to improve performance.

The basic logic is to insert the data into another collection, then delete the original collection and rename the new collection to the original collection's name.
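A sketch (the collection names batch and update_by_insert come from the text; the line references below correspond to this layout):

import pymongo

client = pymongo.MongoClient()
db = client.chapter_8

# handles to the original collection and the new one
batch_handler = db.batch
insert_handler = db.update_by_insert

new_list = []
for row in batch_handler.find():
    # convert salary from string to integer: the "update"
    row['salary'] = int(row['salary'])
    new_list.append(row)
insert_handler.insert_many(new_list)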

Lines 6-8: initialize two connections, pointing to the batch collection and the update_by_insert collection respectively.

Line 14: add the updated data to the new list.

Line 15: bulk insert the new list into the database.

It takes only 3 seconds to update 119808 records and insert them into the new collection.

After the update is completed, delete the original batch collection, then rename the new collection update_by_insert to "batch", completing a bulk update of the data in disguise.

Use indexes to improve query speed:

After the amount of data in a collection reaches the order of ten million, the query speed will become very slow, so it is necessary to use indexes to speed up the query.

An index is a special data structure that records the location of the data in the collection in a form that can be traversed quickly.

If you do not use an index, MongoDB traverses the entire collection for every query; with an index, MongoDB can quickly locate the content you need based on the index.

1. Creating an index

MongoDB uses ensureIndex to create indexes, for example:

> db.user.ensureIndex({"name": 1})

This creates an index on the name key in the user collection, where 1 indicates the direction in which the index is created. The value can be 1 or -1.

Here we did not name the index, so MongoDB assigns a default name following the rule keyname1_dir1_keyname2_dir2...keynameN_dirN

keyname is the key name and dir is the direction of the index. In the example above, the index created is therefore named name_1.

Indexes can also be created on multiple keys, that is, compound indexes, for example:

> db.user.ensureIndex({"name": 1, "age": 1})

This creates a compound index on name and age.

Instead of letting MongoDB pick a default name, we can give the index an easy-to-remember name by specifying a name value in the second argument of ensureIndex, for example:

> db.user.ensureIndex({"name": 1}, {"name": "IX_name"})

In this way, the index we created is called IX_name.

2. Unique index

Similar to a relational database, we can define a unique index by setting the unique key to true:

> db.user.ensureIndex({"name": 1}, {"unique": true})

3. Viewing the indexes we built

Index information is stored in the system.indexes collection of each database; it can only be modified through ensureIndex and dropIndexes, and cannot be inserted into or modified manually.

You can list the indexes in the database with db.system.indexes.find():

> db.system.indexes.find()
{"v": 1, "key": {"_id": 1}, "ns": "test.entities", "name": "_id_"}
{"v": 1, "key": {"_id": 1}, "ns": "test.blog", "name": "_id_"}
{"v": 1, "key": {"_id": 1}, "ns": "test.authors", "name": "_id_"}
{"v": 1, "key": {"_id": 1}, "ns": "test.papers", "name": "_id_"}
{"v": 1, "key": {"_id": 1}, "ns": "test.analytics", "name": "_id_"}
{"v": 1, "key": {"_id": 1}, "ns": "test.user", "name": "_id_"}
{"v": 1, "key": {"_id": 1}, "ns": "test.food", "name": "_id_"}
{"v": 1, "key": {"_id": 1}, "ns": "test.user.info", "name": "_id_"}
{"v": 1, "key": {"_id": 1}, "ns": "test.userinfo", "name": "_id_"}
{"v": 1, "key": {"name": 1}, "ns": "test.user", "name": "IX_name"}

4. Deleting an index

If the index is useless, you can delete it using dropIndexes:

> db.runCommand({"dropIndexes": "user", "index": "IX_name"})

{"nIndexesWas": 2, "ok": 1}

"ok": 1 indicates that the deletion succeeded.

Introduce Redis to reduce the reading frequency of MongoDB:

Use Redis to reduce the query frequency of MongoDB and thereby improve the crawling efficiency of a news crawler.

(1) read the data of MongoDB and store it in the Redis collection.

(2) use the "sadd" command of the Redis collection to add new data while judging whether the data exists.

Suppose you need to implement a news crawler that crawls news from various news sites and stores it in MongoDB. To avoid storing duplicate news, the crawler needs to determine from the headline whether a story is already in the database.

If MongoDB were queried for every headline to check for duplicates, performance would obviously suffer. To avoid frequent reads of MongoDB, Redis can be introduced to reduce the read frequency.

Suppose the news is saved in the news collection of the chapter_8 database, and that at the start the news collection already contains a lot of news.

When the crawler starts, it first reads all the news headlines in news and puts them into a Redis set named news_title. From then on, MongoDB does not need to be read again.

Every time the crawler crawls a new piece of news, it first uses the "sadd" command to add its headline to the Redis set:

If 1 is returned, the news did not exist before, and it is inserted into MongoDB.

If 0 is returned, the news already exists and is discarded directly.
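A sketch of both steps (the title field name and the function names are assumptions; the line references below correspond to this layout):

def preload_titles(handler, client):
    rows = handler.find({}, {'title': 1})
    client.sadd('news_title', *[row['title'] for row in rows])


def is_new_title(client, title):
    return client.sadd('news_title', title) == 1


if __name__ == '__main__':
    import redis
    import pymongo

    client = redis.Redis()
    handler = pymongo.MongoClient().chapter_8.news
    preload_titles(handler, client)
    title = 'a freshly crawled headline'
    if is_new_title(client, title):
        handler.insert_one({'title': title})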

Line 2: get all the news headlines.

Line 3: add all the news headlines to the collection named news_title in Redis.

Line 7: adds the news title and, at the same time, checks whether it was already in the news_title set: sadd returns 0 if the title already exists, and 1 if it did not exist, in which case it is also added to the Redis set.

Appropriately adding redundant information to improve query speed:

Take the data in one_by_one as an example. Suppose a record counts as a "special person" if age is less than 10 and salary is greater than 10000.

If you add a special_person field when inserting into the database, set to True when the condition is met and False when it is not, then querying becomes easy: simply query for all records whose special_person field is True.
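A sketch (assumes the one_by_one collection and that salary is already stored as an integer):

import pymongo

handler = pymongo.MongoClient().chapter_8.one_by_one

# compute the redundant flag once, at insert time
person = {'age': 9, 'salary': 12000}
person['special_person'] = person['age'] < 10 and person['salary'] > 10000
handler.insert_one(person)

# the query is then a simple equality match instead of a compound condition
for row in handler.find({'special_person': True}):
    print(row)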
