Shulou(Shulou.com)05/31 Report--
This article explains how to solve the problem of MongoDB cursor timeouts. The editor thinks it is very practical, so it is shared here for your reference; follow along and take a look.
When we use Python to read data from MongoDB, we might write code like this:
```python
import pymongo

handler = pymongo.MongoClient().db.col
for row in handler.find():
    parse_data(row)
```
Just a few lines of code: read each row of data from MongoDB, pass it to parse_data for processing, and read the next row once processing is complete. The logic is clear and simple; what could possibly go wrong? As long as parse_data(row) does not raise an error, this code looks perfect.
But this is not the case.
Your code may raise an error on the line for row in handler.find(). The reason is a long story.
To explain it, we first need to know that handler.find() does not return the data in the database, but a cursor object, as shown in the following figure:
The cursor only actually reads data from the database when you start iterating over it with a for loop.
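This lazy behavior can be mimicked with a plain Python generator (a simplified analogy for illustration, not pymongo's real implementation):

```python
def fake_find():
    # A stand-in for handler.find(): the body does not run when the
    # function is called, only when the result is iterated over.
    print('query actually sent to the database')
    for i in range(3):
        yield {'_id': i}

cursor = fake_find()   # nothing printed yet: no "query" has run
rows = list(cursor)    # iterating is what triggers the work
print(len(rows))       # 3
```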
However, if the program connected to the database on every iteration of the loop, the network round trips would waste a lot of time.
So pymongo fetches 100 rows at a time: on the first iteration of for row in handler.find(), it connects to MongoDB, reads 100 rows, and caches them in memory. On iterations 2 through 100, the data is taken directly from memory, without connecting to the database again.
On the 101st iteration, it connects to the database again and reads rows 101-200, and so on.
This logic is very effective at reducing the time spent on network I/O.
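The batching logic described above can be sketched in plain Python (a toy model, not pymongo's real code; fetch_batch stands in for one network round trip):

```python
DATA = list(range(250))   # a fake collection with 250 rows
trips = []                # records every round trip to the "database"

def fetch_batch(skip, limit):
    trips.append(skip)
    return DATA[skip:skip + limit]

def batched_cursor(batch_size=100):
    # Fetch batch_size rows per round trip; serve the rest from memory.
    skip = 0
    while True:
        batch = fetch_batch(skip, batch_size)
        if not batch:
            return
        for row in batch:   # rows after the first come from the in-memory cache
            yield row
        skip += batch_size

rows = list(batched_cursor())
print(len(rows), len(trips))   # 250 4  (3 data batches plus one final empty fetch)
```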
However, MongoDB's default cursor timeout is 10 minutes. If the cursor is not used to fetch the next batch within 10 minutes, MongoDB closes it on the server side, and the next read raises an error:

```
pymongo.errors.CursorNotFound: cursor id 211526444773 not found
```
As shown in the following figure:
So, going back to the original code: if each call to parse_data takes more than 6 seconds, then executing it 100 times takes more than 10 minutes. At that point, when the program tries to read the 101st row, it raises the error above.
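The 6-second figure follows directly from the defaults (assuming the default batch of 100 rows and the default 10-minute timeout):

```python
CURSOR_TIMEOUT_S = 10 * 60   # MongoDB's default cursor timeout: 600 seconds
BATCH_SIZE = 100             # pymongo's default rows per round trip

# If each parse_data(row) call takes longer than this, the next batch is
# requested only after the server has already discarded the cursor:
threshold = CURSOR_TIMEOUT_S / BATCH_SIZE
print(threshold)             # 6.0 seconds per row
```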
To solve this problem, we have four options:
1. Modify the MongoDB configuration to extend the cursor timeout, then restart MongoDB. Since MongoDB in a production environment cannot be restarted casually, this option works but is ruled out.
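For reference, the server setting involved is the cursorTimeoutMillis parameter (600000 milliseconds, i.e. 10 minutes, by default). A hedged sketch of changing it from pymongo, assuming admin rights and a MongoDB version that allows setting this parameter at runtime; on older versions it goes in the config file and requires the restart mentioned above. This is not something to run casually against production:

```python
import pymongo

client = pymongo.MongoClient()

# Raise the idle-cursor timeout to 30 minutes (1,800,000 ms).
# cursorTimeoutMillis is a mongod server parameter.
client.admin.command('setParameter', 1, cursorTimeoutMillis=30 * 60 * 1000)
```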
2. Read all the data at once, then process it:
```python
all_data = [row for row in handler.find()]
for row in all_data:
    parse(row)
```
The disadvantage of this option is also obvious: if the amount of data is very large, it may not all fit in memory. Even if it does fit, the list comprehension traverses all the data once and the for loop traverses it again, wasting time.
3. Make the cursor return fewer than 100 rows at a time, so that consuming each batch takes less than 10 minutes:
```python
# return only 50 rows per round trip to the database
for row in handler.find().batch_size(50):
    parse_data(row)
```
However, this option increases the number of connections to the database, which increases the overall time spent talking to it.
4. Let the cursor never time out. By setting the parameter no_cursor_timeout=True, the cursor never times out:
```python
cursor = handler.find(no_cursor_timeout=True)
for row in cursor:
    parse_data(row)
cursor.close()  # be sure to close the cursor manually
```
However, this is a dangerous operation: if your Python program stops unexpectedly for some reason, the cursor can no longer be closed from the client side. Unless you restart MongoDB, these cursors will remain on the server, consuming resources.
Of course, some may say: wrap the data-reading code in try...except, and close the cursor while handling any exception that is thrown:
```python
cursor = handler.find(no_cursor_timeout=True)
try:
    for row in cursor:
        parse_data(row)
except Exception:
    parse_exception()
finally:
    cursor.close()  # be sure to close the cursor manually
```
The code in the finally block is executed whether or not an exception occurs.
But writing it this way makes the code very ugly. To solve this problem more cleanly, we can use the cursor's context manager:
```python
with handler.find(no_cursor_timeout=True) as cursor:
    for row in cursor:
        parse_data(row)
```
As soon as the program leaves the with block's indentation, the cursor closes automatically. If the program raises an error partway through, the cursor also closes.
Its principle can be explained by the following two pieces of code:
```python
class Test:
    def __init__(self):
        self.x = 1

    def echo(self):
        print(self.x)

    def __enter__(self):
        print('enter context')
        return self

    def __exit__(self, *args):
        print('exit context')


with Test() as t:
    t.echo()
print('exit indent')
```
The running effect is shown in the following figure: the program prints enter context, then 1, then exit context, and finally exit indent.
Next, deliberately create an exception inside the with block:
```python
class Test:
    def __init__(self):
        self.x = 1

    def echo(self):
        print(self.x)

    def __enter__(self):
        print('enter context')
        return self

    def __exit__(self, *args):
        print('exit context')


with Test() as t:
    t.echo()
    1 + 'a'  # this line must raise an error
print('exit indent')
```
The running effect is shown in the following figure: enter context, 1, and exit context are still printed, after which the TypeError traceback appears; exit indent is never printed.
No matter what happens inside the with block, the code in __exit__ in the Test class always runs.
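Incidentally, __exit__ is not limited to *args: it receives the exception type, value, and traceback (all None on a clean exit), and returning True from it suppresses the exception. A minimal sketch (the Probe class is illustrative only, not part of pymongo):

```python
events = []

class Probe:
    def __enter__(self):
        events.append('enter')
        return self

    def __exit__(self, exc_type, exc_val, tb):
        # exc_type is None on a clean exit, the exception class otherwise.
        events.append(exc_type)
        return True   # returning True suppresses the exception

with Probe():
    1 + 'a'   # raises TypeError inside the with block

print(events)   # ['enter', <class 'TypeError'>]
```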
Let's take a look at __exit__ in pymongo's cursor object, as shown in the following figure:
As you can see, this is the operation to close the cursor.
Therefore, if we use the context manager, we can safely use the no_cursor_timeout=True parameter.
Thank you for reading! This concludes the article on how to solve the problem of MongoDB cursor timeouts. I hope the content above is helpful and that you learned something new from it. If you think the article is good, share it so more people can see it!