How to crawl transaction data and on-sale data with a Python spider

Updated 2025-02-27. From: SLTechnology News&Howtos, Shulou (Shulou.com), 06/01

This article walks through how to crawl transaction (sold) data and on-sale data with a Python spider. The content is detailed and the logic is clear. Many people are still unfamiliar with this topic, so the article is shared here for reference.

Directory structure:

HomeLinkTest: an Android project (used to crack the signature verification of the Lianjia app)

JsonSource: JSON content samples from the Lianjia client, covering transactions (list page, details page, "more content" page) and on-sale listings (list page, details page, "more content" page)

Spider: the Lianjia crawler scripts (Python), which crawl on-sale data from the PC website and on-sale and transaction data from the mobile app

Features:

1. Web page crawling

This script crawls the listings currently on sale from the web page https://bj.lianjia.com/ershoufang/ (on-sale listings only). It uses regular expressions to match content and output the results. This is the common crawler approach: analyze the page HTML to extract content, and use rotating proxies to disguise the client when accessing the site. See the code for details.
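As a rough illustration of the regular-expression matching described above, here is a minimal sketch. The HTML snippet, the class names, and the `parse_listings` helper are all hypothetical; the real Lianjia markup is different and changes over time, so the pattern would need to be adapted to the actual page.

```python
import re

# Hypothetical markup for illustration only; not the real Lianjia page structure.
SAMPLE_HTML = """
<li class="item"><a class="title" href="/ershoufang/101.html">Two-bedroom near subway</a>
<span class="total-price">520</span><span class="unit-price">61000</span></li>
<li class="item"><a class="title" href="/ershoufang/102.html">Sunny three-bedroom</a>
<span class="total-price">880</span><span class="unit-price">73500</span></li>
"""

# One pattern with named groups captures every field of a listing in a single pass.
LISTING_RE = re.compile(
    r'<a class="title" href="(?P<url>[^"]+)">(?P<title>[^<]+)</a>\s*'
    r'<span class="total-price">(?P<total>\d+(?:\.\d+)?)</span>'
    r'<span class="unit-price">(?P<unit>\d+)</span>'
)


def parse_listings(html):
    """Return one dict per listing matched in the given HTML text."""
    return [
        {
            "url": match.group("url"),
            "title": match.group("title").strip(),
            "total_price": float(match.group("total")),
            "unit_price": int(match.group("unit")),
        }
        for match in LISTING_RE.finditer(html)
    ]


listings = parse_listings(SAMPLE_HTML)
```

Regular expressions are brittle against markup changes; they work here because each field sits in a predictable position, which is the assumption the sketch relies on.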

python LianjiaSpider/spider/salingInfoSpider.py

Proxy servers are taken from an open-source proxy list:

https://raw.githubusercontent.com/fate0/proxylist/master/proxy.list

(The proxy-server code in this project can be reused in other projects.)
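A proxy list like this has to be parsed before it can be used. The sketch below assumes each line of the file is a JSON object with `host`, `port`, and `type` fields; that format is an assumption and should be checked against the actual file. The sample addresses and both helper functions are hypothetical, and no network access is performed.

```python
import json

# Assumed line format (one JSON object per line); verify against the real file.
SAMPLE_PROXY_LIST = """\
{"host": "203.0.113.10", "port": 8080, "type": "http", "response_time": 0.8}
{"host": "203.0.113.11", "port": 3128, "type": "https", "response_time": 2.5}
not-json-line
"""


def parse_proxy_list(text):
    """Parse proxy entries, silently skipping blank or malformed lines."""
    proxies = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
            address = f'{entry["host"]}:{entry["port"]}'
            proxies.append({"type": entry["type"], "address": address})
        except (ValueError, KeyError, TypeError):
            # ValueError covers invalid JSON; the others cover missing fields.
            continue
    return proxies


def to_requests_proxies(entry):
    """Build a scheme-to-proxy-URL mapping, the shape of the `proxies`
    argument accepted by the `requests` library."""
    return {entry["type"]: f'http://{entry["address"]}'}


proxies = parse_proxy_list(SAMPLE_PROXY_LIST)
first = to_requests_proxies(proxies[0])
```

To rotate proxies, one entry could be chosen per request with `random.choice(proxies)`; free proxies fail often, so a real crawler would also need retries and timeouts.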

The script sets up the proxy server, crawls the content, and stores it in the excel directory under the relative path. A run is shown in the figure:

(The page number you enter corresponds to the page number on Lianjia's PC site.)

The crawl result is shown in the figure (a LianJiaSpider.xls Excel workbook is generated in the same directory as salingInfoSpider.py):
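The original script writes an .xls workbook. As a dependency-free stand-in, and not the project's own method, the sketch below writes the same kind of rows as CSV, which Excel can open. The column names and the `write_listings_csv` helper are hypothetical.

```python
import csv
import io

# Hypothetical column names; the real spreadsheet columns may differ.
FIELDS = ["title", "total_price", "unit_price", "url"]


def write_listings_csv(rows, stream):
    """Write listing dicts to a text stream as CSV and return the row count."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)  # keys outside FIELDS are ignored
        count += 1
    return count


buffer = io.StringIO()
written = write_listings_csv(
    [{"title": "A", "total_price": 520.0, "unit_price": 61000, "url": "/x", "extra": 1}],
    buffer,
)
csv_lines = buffer.getvalue().splitlines()
```

To write a file instead of an in-memory buffer, open it with `open("LianJiaSpider.csv", "w", newline="", encoding="utf-8-sig")`; the `utf-8-sig` encoding helps Excel detect UTF-8 text such as Chinese listing titles.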

2. Mobile data crawling (on-sale and transaction data)

This part is based on the Lianjia app (https://bj.lianjia.com/) and works by cracking its signature verification.

It then fetches the corresponding JSON content and crawls it automatically. (For technical exchange only; please do not use it for commercial purposes or any other infringing activity.)

Crawling on-sale data:

python LianjiaSpider/spider/zaishou/zaiShouSpider.py

Set the number of pages to crawl and the number of records per page.
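The two settings above, page count and records per page, typically translate into offset-based requests. The following is a minimal sketch under that assumption; `page_offsets`, `crawl_pages`, and the `fetch_page` callback are hypothetical names and do not reflect the actual parameters of the Lianjia API.

```python
def page_offsets(page_count, page_size):
    """Return the starting offset of each page, beginning at 0."""
    if page_count < 0 or page_size <= 0:
        raise ValueError("page_count must be >= 0 and page_size must be > 0")
    return [page * page_size for page in range(page_count)]


def crawl_pages(fetch_page, page_count, page_size):
    """Call fetch_page(offset=..., limit=...) per page and collect the items.

    Stops early when a page comes back empty, which usually means the
    data has run out before the requested page count was reached.
    """
    items = []
    for offset in page_offsets(page_count, page_size):
        batch = fetch_page(offset=offset, limit=page_size)
        if not batch:
            break
        items.extend(batch)
    return items


# A fake data source stands in for the real HTTP request.
fake_data = list(range(25))


def fake_fetch(offset, limit):
    return fake_data[offset:offset + limit]


collected = crawl_pages(fake_fetch, page_count=5, page_size=10)
```

Passing the fetch function in as a parameter keeps the paging logic testable without network access; a real crawler would supply a function that performs the signed request.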

The results are written to an Excel file in the same directory, as shown in the figure:

Crawling transaction data:

python LianjiaSpider/spider/zaishou/chengJiaoJiaSpider.py

Modify the global settings and comment out the manual input, or keep using manual input:

The number of pages to crawl can be set in chengJiaoJiaSpider.py. The position actually starts from page 0, so the starting value is -100.

The transaction data is shown in the figure:

Automatic crawling of on-sale and transaction data:

python LianjiaSpider/spider/Spider_Thread_Manager.py
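The script name suggests that the two spiders are run from threads. How the project actually does this is not shown here, so the following is only a sketch of one way to run several crawl jobs concurrently with the standard `threading` module; `run_concurrently` and the job names are hypothetical.

```python
import threading


def run_concurrently(jobs):
    """Run each named zero-argument job in its own thread.

    Returns (results, errors): two dicts keyed by job name, so one
    failing spider does not hide the output of the others.
    """
    results = {}
    errors = {}

    def worker(name, func):
        try:
            results[name] = func()
        except Exception as exc:  # record the failure instead of losing it
            errors[name] = exc

    threads = [
        threading.Thread(target=worker, args=(name, func), name=name)
        for name, func in jobs.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait for every job to finish
    return results, errors


# Stand-in jobs; real ones would call the on-sale and transaction spiders.
results, errors = run_concurrently({
    "zaishou": lambda: 3,
    "chengjiao": lambda: 1 / 0,
})
```

Threads suit this workload because crawling is dominated by waiting on network responses rather than by CPU work.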
