How can scrapy store data in a MySQL database? Many newcomers are unclear on this, so this article explains it in detail. It introduces two ways of storing scrapy data in MySQL, synchronous and asynchronous, with detailed example code that should serve as a useful reference for study or work.
Method 1: Synchronous operation
1. pipelines.py (the Python file that processes items):
import pymysql


class LvyouPipeline(object):
    def __init__(self):
        # Connect to the database; the last three arguments are the user,
        # the password and the database name
        self.connect = pymysql.connect(host='XXX', user='root', passwd='XXX', db='scrapy_test')
        # Get a cursor
        self.cursor = self.connect.cursor()
        print("Database connection succeeded")

    def process_item(self, item, spider):
        # SQL statement: one placeholder per column
        insert_sql = """insert into lvyou(name1, address, grade, score, price) VALUES (%s, %s, %s, %s, %s)"""
        # Execute the insert
        self.cursor.execute(insert_sql, (item['name'], item['address'], item['grade'], item['score'], item['price']))
        # Commit; without commit nothing is saved to the database
        self.connect.commit()
        return item

    def close_spider(self, spider):
        # Close the cursor and the connection
        self.cursor.close()
        self.connect.close()
2. Register the pipeline in the configuration file (settings.py), as sketched below.
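A minimal sketch of the registration, assuming the pipeline class above lives in a project module named lvyou (the module path and the priority value 300 are assumptions, not from the original article):

# settings.py (sketch; the module path is an assumption)
ITEM_PIPELINES = {
    'lvyou.pipelines.LvyouPipeline': 300,
}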
Method 2: Asynchronous storage
pipelines.py file:
Twisted makes the database insert asynchronous; the twisted.enterprise.adbapi module provides the connection pool. The steps are:
1. Import adbapi.
2. Create a database connection pool.
3. Perform the database insert operation.
4. Handle errors and print the error message.
import pymysql
from twisted.enterprise import adbapi


# Asynchronous writes
class LvyouPipeline(object):
    def __init__(self, dbpool):
        self.dbpool = dbpool

    @classmethod
    def from_settings(cls, settings):
        """Fixed method name, called by scrapy; the settings values are
        available directly. Builds the database connection and returns an
        instance of the pipeline."""
        adbparams = dict(
            host=settings['MYSQL_HOST'],
            db=settings['MYSQL_DBNAME'],
            user=settings['MYSQL_USER'],
            password=settings['MYSQL_PASSWORD'],
            cursorclass=pymysql.cursors.DictCursor  # specify the cursor type
        )
        # Create the ConnectionPool; pymysql (or MySQLdb) does the connecting
        dbpool = adbapi.ConnectionPool('pymysql', **adbparams)
        # Return an instance built from the pool
        return cls(dbpool)

    def process_item(self, item, spider):
        """Use twisted to make the MySQL insert asynchronous: runInteraction
        executes the given function through the connection pool and returns
        a Deferred."""
        query = self.dbpool.runInteraction(self.do_insert, item)
        # Attach exception handling (addErrback, not addCallback)
        query.addErrback(self.handle_error)
        return item

    def do_insert(self, cursor, item):
        # Insert into the database; no commit needed, twisted commits automatically
        insert_sql = """insert into lvyou(name1, address, grade, score, price) VALUES (%s, %s, %s, %s, %s)"""
        cursor.execute(insert_sql, (item['name'], item['address'], item['grade'], item['score'], item['price']))

    def handle_error(self, failure):
        if failure:
            # Print the error message
            print(failure)
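The from_settings method above reads the connection parameters from settings.py, so the asynchronous pipeline also needs entries like the following (the values are placeholders):

# settings.py (placeholder values)
MYSQL_HOST = 'localhost'
MYSQL_DBNAME = 'scrapy_test'
MYSQL_USER = 'root'
MYSQL_PASSWORD = 'XXX'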
Notes:
1. Python 3.x no longer supports MySQLdb; its replacement is pymysql (import pymysql).
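If legacy code still does import MySQLdb, pymysql can register itself as a drop-in replacement before that import runs:

import pymysql
pymysql.install_as_MySQLdb()  # 'import MySQLdb' now resolves to pymysql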
2. Error pymysql.err.ProgrammingError: (1064, ...)
Cause: this error may be raised when a field such as item['quotes'] contains quotation marks.
Workaround: use the pymysql.escape_string() method.
For example:
sql = """INSERT INTO video_info(video_id, title) VALUES("%s","%s")""" % (video_info["id"], pymysql.escape_string(video_info["title"]))
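An alternative that avoids the quoting problem altogether is a parameterized query, letting the driver escape the values (this is also how the pipeline code above executes its insert):

# Parameterized query: pymysql escapes the values itself
sql = "INSERT INTO video_info(video_id, title) VALUES (%s, %s)"
cursor.execute(sql, (video_info["id"], video_info["title"]))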
3. When the data contains Chinese characters, charset='utf8' must be added to the connection, otherwise the Chinese text is stored as garbled characters.
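A sketch of the connection call with the charset argument (host, user, password and database name are placeholders):

self.connect = pymysql.connect(host='XXX', user='root', passwd='XXX',
                               db='scrapy_test', charset='utf8')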
4. Every run of the crawler appends data to the database, so repeated test runs keep accumulating the same rows. How can crawling be made incremental? Common options (a configuration sketch for the first one follows this list):
scrapy-deltafetch
scrapy-crawl-once (differs from scrapy-deltafetch in the database used to store its state)
scrapy-redis
scrapy-redis-bloomfilter (an enhanced scrapy-redis: stores more URLs and queries faster)
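As an example of the first option, scrapy-deltafetch is typically enabled through the spider middlewares; a sketch based on the plugin's documented configuration (check its README for the current keys):

# settings.py (sketch; see the scrapy-deltafetch README)
SPIDER_MIDDLEWARES = {
    'scrapy_deltafetch.DeltaFetch': 100,
}
DELTAFETCH_ENABLED = True  # skip pages whose items were already scraped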