2025-02-25 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article shows how to crawl Bilibili's following list with Python, and how to design and operate a database to store the results. Let's take a look.
I. Database design and operation
1. Data analysis
Bilibili's following list is available at
https://api.bilibili.com/x/relation/followings?vmid=UID&pn=1&ps=50&order=desc&order_type=attention
A page holds at most 50 entries.
Let's take a rough look at the response.
{"code": 0, "message": "0", "ttl": 1, "data": {"list": [{.
First, the list itself is stored under data:list.
Second, each item in the list carries the following information
"mid": 672353429,
"attribute": 2,
"mtime": 1630510107,
"tag": null,
"special": 0,
"contract_info": {"is_contractor": false, "ts": 0, "is_contract": false, "user_attr": 0},
"uname": "Bella Kira",
"face": "http://i2.hdslb.com/bfs/face/668af440f8a8065743d3fa79cfa8f017905d0065.jpg",
"sign": "energetic A-SOUL Dance Tanshen ~ Target TOP IDOL Let's cheer up!",
"official_verify": {"type": 0, "desc": "artists belonging to the virtual idol group A-SOUL"},
"vip": {
    "vipType": 2, "vipDueDate": 1674576000000, "dueRemark": "", "accessStatus": 0,
    "vipStatus": 1, "vipStatusWarn": "", "themeType": 0,
    "label": {"path": "", "text": "Big member of the year", "label_theme": "annual_vip",
              "text_color": "#FFFFFF", "bg_style": 1, "bg_color": "#FB7299", "border_color": ""},
    "avatar_subscript": 1, "nickname_color": "#FB7299",
    "avatar_subscript_url": "http://i0.hdslb.com/bfs/vip/icon_Certification_big_member_22_3x.png"
}
Among these fields, mid is the user's unique UID. vipType is 0 for no membership, 1 for a big member, and 2 for an annual big member. In official_verify, type 0 means the account is officially verified and -1 means it is not.
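The field meanings above can be sketched as a small helper that turns one list item into a summary. This is only an illustration: summarize_user and VIP_LABELS are hypothetical names that do not appear in the article, the labels follow the article's reading of the fields rather than any official mapping, and treating every type other than -1 as verified is an assumption.

```python
# Labels follow the article's description of vipType; not an official mapping.
VIP_LABELS = {0: "none", 1: "big member", 2: "annual big member"}


def summarize_user(item: dict) -> dict:
    """Pick out the fields the article cares about from one list item."""
    verify = item.get("official_verify") or {}
    vip = item.get("vip") or {}
    return {
        "uid": item["mid"],
        "name": item["uname"],
        "vip": VIP_LABELS.get(vip.get("vipType"), "unknown"),
        # The article documents only 0 (verified) and -1 (not verified);
        # treating any other value as verified is an assumption.
        "verified": verify.get("type", -1) != -1,
        "followed_at": item.get("mtime"),
    }


# A trimmed copy of the sample item shown above.
sample = {
    "mid": 672353429,
    "mtime": 1630510107,
    "uname": "Bella Kira",
    "official_verify": {"type": 0, "desc": "artists belonging to the virtual idol group A-SOUL"},
    "vip": {"vipType": 2},
}

print(summarize_user(sample))
```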
We also find that if the other user has locked their list, the API returns
{"code": -400, "message": "request error", "ttl": 1}
2. Database design
Based on this, we first design the database. It contains two tables: a table of basic user attributes and a table of follow relations.
import sqlite3

def createDB():
    link = sqlite3.connect('BiliFollowDB.db')
    print("database open success")
    # basic attributes of each user
    UserTableDDL = '''
    create table if not exists user(
        UID int PRIMARY KEY NOT NULL,
        NAME varchar NOT NULL,
        SIGN varchar DEFAULT NULL,
        vipType int NOT NULL,
        verifyType int NOT NULL,
        verifyDesc varchar DEFAULT NULL)
    '''
    # one row per (follower, following) pair; each column references user.UID
    # separately. SQLite does not enforce foreign keys by default; that
    # requires "PRAGMA foreign_keys = ON" on the connection.
    RelationTableDDL = '''
    create table if not exists relation(
        follower int NOT NULL,
        following int NOT NULL,
        followTime int NOT NULL,
        PRIMARY KEY (follower, following),
        FOREIGN KEY (follower) REFERENCES user(UID),
        FOREIGN KEY (following) REFERENCES user(UID))
    '''
    # create user table
    link.execute(UserTableDDL)
    # create relation table
    link.execute(RelationTableDDL)
    print("database create success")
    link.commit()
    link.close()
3. Database operation
Next comes inserting new users. My idea is to crawl one person's following list, pass the entire list to a function that determines which users are new, and have it return those users as the starting points for the next round of crawling.
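Feeding newly discovered users back in as the next starting points amounts to a breadth-first traversal of the follow graph. A minimal sketch of that idea follows, with an in-memory dictionary standing in for the API; crawl_breadth_first, fetch_following, max_users and fake_graph are hypothetical names for illustration and are not part of the article's code.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl_breadth_first(
    roots: Iterable[int],
    fetch_following: Callable[[int], Iterable[int]],
    max_users: int = 1000,
) -> List[int]:
    """Visit users level by level, never visiting the same UID twice."""
    roots = list(roots)
    seen = set(roots)
    queue = deque(roots)
    order = []
    while queue and len(order) < max_users:
        uid = queue.popleft()
        order.append(uid)
        for followed in fetch_following(uid):
            if followed not in seen:  # only new users become starting points
                seen.add(followed)
                queue.append(followed)
    return order


# Hypothetical follow graph: UID -> list of followed UIDs.
fake_graph = {1: [2, 3], 2: [3, 4], 3: [1], 4: []}
print(crawl_breadth_first([1], lambda uid: fake_graph.get(uid, [])))  # [1, 2, 3, 4]
```

The max_users cap is there only so the sketch terminates on a large graph; the article's own code instead relies on the database to tell it which users are new.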
def insertUser(infos):
    conn = sqlite3.connect('BiliFollowDB.db')
    link = conn.cursor()
    # one placeholder per column
    InsertCmd = "insert into user (UID,NAME,vipType,verifyType,sign,verifyDesc) values (?,?,?,?,?,?);"
    ExistCmd = "select count(UID) from user where UID=?"
    newID = []
    for info in infos:
        # count is 0 when the user is not in the table yet
        exist_ID = link.execute(ExistCmd, (info['uid'],)).fetchone()[0]
        if exist_ID == 0:
            newID.append(info['uid'])
            link.execute(InsertCmd, (info['uid'], info['name'], info['vipType'],
                                     info['verifyType'], info['sign'], info['verifyDesc']))
    conn.commit()
    conn.close()
    return newID
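As an alternative to checking for existence and then inserting, SQLite's "insert or ignore" can do both in one statement, and the cursor's rowcount then tells whether a row was actually added. The sketch below is not the author's method. It assumes the same user table and the same info dictionary keys, and uses an in-memory database; insert_new_users is a hypothetical name.

```python
import sqlite3


def insert_new_users(conn: sqlite3.Connection, infos: list) -> list:
    """Insert users that are not stored yet and return their UIDs."""
    sql = (
        "insert or ignore into user "
        "(UID, NAME, vipType, verifyType, SIGN, verifyDesc) "
        "values (:uid, :name, :vipType, :verifyType, :sign, :verifyDesc)"
    )
    new_ids = []
    for info in infos:
        cur = conn.execute(sql, info)
        if cur.rowcount == 1:  # 0 means the UID already existed and was skipped
            new_ids.append(info["uid"])
    conn.commit()
    return new_ids


# In-memory database with the article's user table, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """create table user(
        UID int PRIMARY KEY NOT NULL,
        NAME varchar NOT NULL,
        SIGN varchar DEFAULT NULL,
        vipType int NOT NULL,
        verifyType int NOT NULL,
        verifyDesc varchar DEFAULT NULL)"""
)
bella = {
    "uid": 672353429, "name": "Bella Kira", "vipType": 2,
    "verifyType": 0, "sign": "", "verifyDesc": None,
}
first = insert_new_users(conn, [bella])
second = insert_new_users(conn, [bella])
print(first, second)
```

This relies on the primary key on UID to reject duplicates, so it saves one query per user at the cost of being specific to SQLite's syntax.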
Then comes the function that inserts the follow relations, which is relatively simple.
def insertFollowing(uid: int, subscribe):
    conn = sqlite3.connect('BiliFollowDB.db')
    link = conn.cursor()
    InsertCmd = "insert into relation (follower,following,followTime) values (?,?,?);"
    # each entry of subscribe is a (followed UID, follow time) tuple
    for follow in subscribe:
        link.execute(InsertCmd, (uid, follow[0], follow[1]))
    conn.commit()
    conn.close()
II. Crawling the following list
Through observation, we found that Uncle Rui (that is, Bilibili) only exposes the first five pages of a following list.
Even browsing by hand you can reach only 5 pages, so there is no way around it. Let's crawl 5 pages.
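The five-page cap and the locked-list response can be sketched independently of the network by passing in the function that fetches a page. This is a sketch under assumptions: collect_pages, fetch_page and fake_fetch are hypothetical names, stopping on any non-zero code (not just -400) is a choice made here, and stopping early on a short page is an optimization the article does not mention.

```python
from typing import Callable, List

MAX_PAGES = 5   # the cap observed in the article
PAGE_SIZE = 50  # matches ps=50 in the request URL


def collect_pages(fetch_page: Callable[[int], dict]) -> List[dict]:
    """Collect list items from pages 1..MAX_PAGES, stopping early when possible."""
    users = []
    for page in range(1, MAX_PAGES + 1):
        payload = fetch_page(page)
        if payload.get("code") != 0:  # e.g. -400 when the list is locked
            break
        batch = (payload.get("data") or {}).get("list") or []
        users.extend(batch)
        if len(batch) < PAGE_SIZE:  # a short page means nothing is left
            break
    return users


def fake_fetch(page: int) -> dict:
    """Hypothetical stand-in for the API: a user who follows 120 accounts."""
    all_users = [{"mid": n} for n in range(120)]
    start = (page - 1) * PAGE_SIZE
    return {"code": 0, "data": {"list": all_users[start:start + PAGE_SIZE]}}


print(len(collect_pages(fake_fetch)))  # 120
print(collect_pages(lambda page: {"code": -400, "message": "request error", "ttl": 1}))  # []
```

In real use, fetch_page would wrap the HTTP request and JSON decoding for one page of one UID.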
def getFollowingList(uid: int):
    url = "https://api.bilibili.com/x/relation/followings?vmid=%d&pn=%d&ps=50&order=desc&order_type=attention&jsonp=jsonp"  # % (UID, page number)
    infos = []
    subscribe = []
    for i in range(1, 6):
        html = requests.get(url % (uid, i))
        if html.status_code != 200:
            print("GET ERROR!")
            break
        text = html.text
        dic = json.loads(text)
        # -400 means the list is locked
        if dic['code'] == -400:
            break
        users = dic['data']['list']
        for usr in users:
            info = {}
            info['uid'] = usr['mid']
            info['name'] = usr['uname']
            info['vipType'] = usr['vip']['vipType']
            info['verifyType'] = usr['official_verify']['type']
            info['sign'] = usr['sign']
            if info['verifyType'] == -1:
                info['verifyDesc'] = None  # stored as SQL NULL
            else:
                info['verifyDesc'] = usr['official_verify']['desc']
            subscribe.append((usr['mid'], usr['mtime']))
            infos.append(info)
    newID = insertUser(infos)
    insertFollowing(uid, subscribe)
    return newID
III. Complete code
# by concyclics
# -*- coding: UTF-8 -*-
import sqlite3
import json
import requests

URL = "https://api.bilibili.com/x/relation/followings?vmid=%d&pn=%d&ps=50&order=desc&order_type=attention&jsonp=jsonp"  # % (UID, page number)

def createDB():
    link = sqlite3.connect('BiliFollowDB.db')
    print("database open success")
    UserTableDDL = '''
    create table if not exists user(
        UID int PRIMARY KEY NOT NULL,
        NAME varchar NOT NULL,
        SIGN varchar DEFAULT NULL,
        vipType int NOT NULL,
        verifyType int NOT NULL,
        verifyDesc varchar DEFAULT NULL)
    '''
    # SQLite does not enforce foreign keys unless "PRAGMA foreign_keys = ON" is set
    RelationTableDDL = '''
    create table if not exists relation(
        follower int NOT NULL,
        following int NOT NULL,
        followTime int NOT NULL,
        PRIMARY KEY (follower, following),
        FOREIGN KEY (follower) REFERENCES user(UID),
        FOREIGN KEY (following) REFERENCES user(UID))
    '''
    # create user table
    link.execute(UserTableDDL)
    # create relation table
    link.execute(RelationTableDDL)
    print("database create success")
    link.commit()
    link.close()

def insertUser(infos):
    conn = sqlite3.connect('BiliFollowDB.db')
    link = conn.cursor()
    InsertCmd = "insert into user (UID,NAME,vipType,verifyType,sign,verifyDesc) values (?,?,?,?,?,?);"
    ExistCmd = "select count(UID) from user where UID=?"
    newID = []
    for info in infos:
        exist_ID = link.execute(ExistCmd, (info['uid'],)).fetchone()[0]
        if exist_ID == 0:
            newID.append(info['uid'])
            link.execute(InsertCmd, (info['uid'], info['name'], info['vipType'],
                                     info['verifyType'], info['sign'], info['verifyDesc']))
    conn.commit()
    conn.close()
    return newID

def insertFollowing(uid: int, subscribe):
    conn = sqlite3.connect('BiliFollowDB.db')
    link = conn.cursor()
    InsertCmd = "insert into relation (follower,following,followTime) values (?,?,?);"
    for follow in subscribe:
        try:
            link.execute(InsertCmd, (uid, follow[0], follow[1]))
        except sqlite3.Error:
            # report the row that could not be inserted (e.g. a duplicate pair)
            print((uid, follow[0], follow[1]))
    conn.commit()
    conn.close()

def getFollowingList(uid: int):
    infos = []
    subscribe = []
    for i in range(1, 6):
        html = requests.get(URL % (uid, i))
        if html.status_code != 200:
            print("GET ERROR!")
            return []
        dic = json.loads(html.text)
        # -400 means the list is locked
        if dic['code'] == -400:
            return []
        try:
            users = dic['data']['list']
        except (KeyError, TypeError):
            return []
        for usr in users:
            info = {}
            info['uid'] = usr['mid']
            info['name'] = usr['uname']
            info['vipType'] = usr['vip']['vipType']
            info['verifyType'] = usr['official_verify']['type']
            info['sign'] = usr['sign']
            if info['verifyType'] == -1:
                info['verifyDesc'] = None  # stored as SQL NULL
            else:
                info['verifyDesc'] = usr['official_verify']['desc']
            subscribe.append((usr['mid'], usr['mtime']))
            infos.append(info)
    newID = insertUser(infos)
    insertFollowing(uid, subscribe)
    return newID

def getFollowingUid(uid: int):
    # collect UIDs across all pages instead of resetting the list on each page
    IDs = []
    for i in range(1, 6):
        html = requests.get(URL % (uid, i))
        if html.status_code != 200:
            print("GET ERROR!")
            return []
        dic = json.loads(html.text)
        if dic['code'] == -400:
            return []
        try:
            users = dic['data']['list']
        except (KeyError, TypeError):
            return []
        for usr in users:
            IDs.append(usr['mid'])
    return IDs

def Work(root):
    # crawl level by level: the new users found in one round seed the next
    IDlist = root
    while len(IDlist) != 0:
        tmplist = []
        for ID in IDlist:
            print(ID)
            tmplist += getFollowingList(ID)
        IDlist = tmplist

def rework():
    # find users followed by stored users but not yet stored themselves
    conn = sqlite3.connect('BiliFollowDB.db')
    link = conn.cursor()
    SelectCmd = "select uid from user;"
    answer = link.execute(SelectCmd)
    IDs = []
    for row in answer:
        IDs.append(row[0])
    conn.close()
    newID = []
    print(IDs)
    for ID in IDs:
        for followed in getFollowingUid(ID):
            if followed not in IDs:
                newID.append(followed)
    return newID

if __name__ == "__main__":
    createDB()
    # Work([**put root UID here**])
This is the end of the article on how Python crawls Bilibili's following list and the design and operation of the database. Thank you for reading!