How to use Scrapy with adbapi in Python to improve database write efficiency

Today I will talk with you about how to use Scrapy together with adbapi in Python to improve database write efficiency. Many people may not know much about this, so the following content is summarized to help you understand it better; I hope you get something out of this article.
One: adbapi in Twisted
pymysql's execute() and commit() both submit data to the database synchronously. The Scrapy framework, however, parses data asynchronously, so it produces items much faster than they can be written to the database. If each write is synchronous and slow, writing becomes the bottleneck: every insert blocks the crawl and drags down overall efficiency.
Using the Twisted asynchronous I/O framework, writes can instead be performed asynchronously: the inserts run in a pool of worker threads, and this multi-threaded, asynchronous form of writing raises the write speed.
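To make the contrast concrete, here is a minimal sketch (not from the original article) of the synchronous baseline being described: a plain pymysql pipeline in which every process_item() call blocks on execute() and commit(). The connection parameters, table, and column names are placeholders.

import pymysql

class MysqlSyncPipeline(object):
    """Synchronous baseline: every item blocks the crawl while it is written."""

    def open_spider(self, spider):
        # Placeholder connection parameters; adjust to your environment.
        self.conn = pymysql.connect(host='localhost', user='root',
                                    passwd='secret', db='qa', charset='utf8')
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        # execute() and commit() are synchronous: the crawl waits here
        # until MySQL acknowledges the write.
        self.cursor.execute(
            "insert into qa_sample (need_id, need_title) values (%s, %s)",
            (item['need_id'], item['need_title']))
        self.conn.commit()
        return item

    def close_spider(self, spider):
        self.cursor.close()
        self.conn.close()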
1.1 Two main methods
adbapi.ConnectionPool:
Creates a database connection pool object. The pool holds multiple connection objects, and each connection works in a separate thread. adbapi only provides a programming framework for asynchronous access to the database; internally it still uses a library such as pymysql or MySQLdb to access the database.
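As a sketch of what pool creation looks like (the connection values here are placeholders), note that Twisted's own cp_min and cp_max keyword arguments bound the pool size, i.e. the number of writer threads; all other keyword arguments are passed through to the driver's connect() function:

from twisted.enterprise import adbapi

# Parameter 1: the DB-API driver module name; the remaining keyword
# arguments (except cp_*) go straight to pymysql.connect().
dbpool = adbapi.ConnectionPool(
    'pymysql',
    host='localhost', user='root', passwd='secret', db='qa',
    charset='utf8',
    cp_min=3,    # keep at least 3 pooled connections open
    cp_max=10,   # never open more than 10
)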
dbpool.runInteraction(do_insert, item):
Calls the do_insert function asynchronously. dbpool selects a connection object from the pool and calls do_insert in that connection's separate thread. The item argument is passed through as the second parameter of do_insert, while the first parameter do_insert receives is a Transaction object. Its interface is similar to that of a Cursor object, so you can call its execute method to run SQL statements. After do_insert finishes, the connection object automatically calls the commit method.
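One detail worth knowing: runInteraction() returns a Deferred, so if do_insert raises, the error is dropped silently unless you attach an errback. A common pattern (a sketch, not part of the original article; handle_error is a name chosen here) is:

def process_item(self, item, spider):
    query = self.dbpool.runInteraction(self.do_insert, item)
    # runInteraction returns a Deferred; without an errback, exceptions
    # raised inside do_insert vanish without a trace.
    query.addErrback(self.handle_error, item, spider)
    return item

def handle_error(self, failure, item, spider):
    # Log the traceback so failed inserts are visible in the crawl log.
    spider.logger.error('Insert failed for item %r: %s', item, failure)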
1.2 Usage example
from twisted.enterprise import adbapi

# Initialize the database connection pool (a thread pool).
# Parameter 1: the MySQL driver module name
# Parameter 2: the configuration for connecting to MySQL
dbpool = adbapi.ConnectionPool('pymysql', **params)

# Parameter 1: the function to execute in the asynchronous task, do_insert
# Parameter 2: the item passed on to do_insert
query = self.dbpool.runInteraction(self.do_insert, item)

# After execute() there is no need to call commit(); the connection pool
# performs the commit itself once do_insert returns.
def do_insert(self, cursor, item):
    insert_sql = """
        insert into qa_sample
            (need_id, need_question_uptime, need_title,
             need_title_describe, need_answer_uptime, need_answer)
        values (%s, %s, %s, %s, %s, %s)
    """
    params = (item['need_id'], item['need_question_uptime'],
              item['need_title'], item['need_title_describe'],
              item['need_answer_uptime'], item['need_answer'])
    cursor.execute(insert_sql, params)

Two: Combining adbapi with pipelines in Scrapy
# -*- coding: utf-8 -*-
from twisted.enterprise import adbapi
import pymysql

# Define your item pipelines here.
# Don't forget to add your pipeline to the ITEM_PIPELINES setting.
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html

class QaSpiderPipeline(object):
    def process_item(self, item, spider):
        return item

class MysqlTwistedPipeline(object):
    def __init__(self, dbpool):
        self.dbpool = dbpool

    @classmethod
    def from_settings(cls, settings):
        dbparams = dict(
            host=settings['MYSQL_HOST'],
            db=settings['MYSQL_DBNAME'],
            user=settings['MYSQL_USER'],
            passwd=settings['MYSQL_PASSWORD'],
            charset='utf8',
            cursorclass=pymysql.cursors.DictCursor,
            use_unicode=True,
        )
        dbpool = adbapi.ConnectionPool('pymysql', **dbparams)
        return cls(dbpool)

    def process_item(self, item, spider):
        query = self.dbpool.runInteraction(self.do_insert, item)
        return item  # return the item so any later pipelines still receive it

    def do_insert(self, cursor, item):
        insert_sql = """
            insert into qa_sample
                (need_id, need_question_uptime, need_title,
                 need_title_describe, need_answer_uptime, need_answer)
            values (%s, %s, %s, %s, %s, %s)
        """
        params = (item['need_id'], item['need_question_uptime'],
                  item['need_title'], item['need_title_describe'],
                  item['need_answer_uptime'], item['need_answer'])
        cursor.execute(insert_sql, params)
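The from_settings() classmethod above reads four keys from the project's settings.py, and the pipeline only runs if it is registered in ITEM_PIPELINES. A sketch of the corresponding entries (the module path 'myproject.pipelines' and all values are placeholders for your own project):

# settings.py
ITEM_PIPELINES = {
    'myproject.pipelines.MysqlTwistedPipeline': 300,
}

# Keys read by MysqlTwistedPipeline.from_settings(); values are examples.
MYSQL_HOST = 'localhost'
MYSQL_DBNAME = 'qa'
MYSQL_USER = 'root'
MYSQL_PASSWORD = 'secret'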
Having read the above, do you have a better understanding of how to use Scrapy with adbapi in Python to improve database write efficiency? Thank you for your support.