Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to get Mi Store data by Python

2025-02-24 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Share

Shulou(Shulou.com)06/01 Report--

This article introduces the relevant knowledge of "how to obtain Mi Store data by Python". In the operation of actual cases, many people will encounter such a dilemma. Then let the editor lead you to learn how to deal with these situations. I hope you can read it carefully and be able to achieve something!

Preface

Mi Store found the best Android apps and games for users, which are safe and reliable, but it is too troublesome to search one by one to download things. And it's not very fast.

Today, we use multithreading to crawl Mi Store's game module. Get it quickly.

II. Project objectives

Goal: application category-chat social app name, application link, displayed in the console for users to download.

Third, the libraries and websites involved

1. Website: Baidu search-Mi Store, enter the official website.

2. Libraries involved: requests, threading, queue, json, time

3. Software: PyCharm

IV. Project analysis

1. Confirm whether it is loaded dynamically.

Through the partial refresh of the page, right-click to view the source code of the web page, and the search keyword is not found. It is concluded that this site is dynamically loaded and needs to crawl network packet analysis.

2. Using chrome browser, F12 crawls network packets.

1) grab the URL address (Request URL in Headers) that returns json data.

Http://app.mi.com/categotyAllListApi?page={}&categoryId=2&pageSize=30

2) View and analyze the query parameters (Query String Parameters in headers).

Page: 1categoryId: 2pageSize: 30

It is found that only page changes again, 0123... so that we can control the direct splicing of page to concatenate multiple URL addresses that return json data.

V. Project implementation

1. We define a class class to inherit object, then define an init method to inherit self, and then define a main function main inheriting self. Prepare to import the library, url address and request header headers.

Import requestsfrom threading import Threadfrom queue import Queueimport jsonimport timeclass XiaomiSpider (object): def _ _ init__ (self): self.headers = {'User-Agent':'Mozilla/5.0'} self.url =' http://app.mi.com/categotyAllListApi?page={}&categoryId=15&pageSize=30'def main (self): passif _ _ name__ = ='_ main__': imageSpider = XiaomiSpider () imageSpider.main ()

2. Define queues, which are used to store URL addresses

Self.url_queue = Queue ()

3. URL is queued

Def url_in (self): # concatenate multiple URL addresses, and then put () to the queue for i in range (67): self.url.format ((str (I) self.url_queue.put (self.url)

4. Define the thread event function get_page (request data)

Defget_page (self): # first get () URL address and request while True:# to get the url address if not self.url_queue.empty (): url = self.url_queue.get () html = requests.get (url,headers=self.headers). Textself.parse_page (html) else:break when the queue is not empty.

5. Define the function parse_page to parse the json module, extract the application name and apply the link content.

# parsing function def parse_page (self,html): app_json = json.loads (html) for app in app_json ['data']: # Application name name = app [' displayName'] # Application Link link = 'http://app.mi.com/details?id={}'.format(app['packageName']) d = {' name': name,' Link': link} print (d)

6. The main method, which defines t_list = [] to store a list of all threads. Call get_page multithreaded crawling.

Def main (self): self.url_in () # list of all threads t_list = [] for i in range (10): t = Thread (target=self.get_page) t.start () t_list.append (t)

7. For cycles through the list to uniformly recycle threads.

# Unified recycling thread for p in t_list: p.join ()

8. Count the execution time.

Start = time.time () spider = XiaomiSpider () spider.main () end = time.time () print ('execution time:% .2f'% (end-start)) 6. Effect display

1. Run the program. Click run, the game name, download link, execution time, display in the console.

This is the end of the content of "how to get Mi Store data by Python". Thank you for reading. If you want to know more about the industry, you can follow the website, the editor will output more high-quality practical articles for you!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Internet Technology

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report