
How to use Spyfari

2025-04-04 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article shares how to use Spyfari. I find it very practical, so I'm passing it on as a reference; follow along and take a look.

Spyfari

Visual crawler software in which crawler rules are written in JavaScript.

I used it to crawl some data from the well-known "P station" (Pixiv).

First, let's review the steps a crawler goes through to collect data:

1. Use the browser to inspect the website's structure and find the tags that hold the data you need. Working through the page's element nodes involves class selectors, id selectors, parent and child nodes, regular expressions, and so on. In Python there are libraries for this, such as BeautifulSoup.

2. Handle automatic login, CAPTCHAs, and the like.

3. Extract the data (or URLs) and store it locally or in a database.

4. Follow those URLs to crawl further data, or download files from them (images, videos, text, web pages, etc.).

That's roughly how it works.
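The extraction and follow-up steps (1, 3 and 4) can be sketched in plain JavaScript for Node.js. This is not Spyfari's API, which the post does not show; `extractLinks` and `CrawlQueue` are hypothetical helpers, and the regular expression is only a rough stand-in for a real selector engine such as BeautifulSoup or `document.querySelectorAll`.

```javascript
// Hypothetical helpers, not part of Spyfari.
// Step 1/3: pull <a href="..."> values out of fetched HTML.
// A regex is a rough stand-in for a proper HTML parser and will miss unusual markup.
const LINK_RE = /<a\s[^>]*href\s*=\s*["']([^"']+)["']/gi;

// Return absolute URLs for every link found in `html`, resolved against `pageUrl`.
function extractLinks(html, pageUrl) {
  const links = [];
  for (const match of html.matchAll(LINK_RE)) {
    try {
      links.push(new URL(match[1], pageUrl).href);
    } catch {
      // Skip hrefs that cannot be parsed as URLs.
    }
  }
  return links;
}

// Step 4: a breadth-first crawl frontier that never queues the same URL twice.
class CrawlQueue {
  constructor(seeds = []) {
    this.seen = new Set();   // every URL ever queued
    this.pending = [];       // URLs still waiting to be crawled
    seeds.forEach((url) => this.add(url));
  }

  // Queue `url` if it is new; returns false for duplicates.
  add(url) {
    if (this.seen.has(url)) return false;
    this.seen.add(url);
    this.pending.push(url);
    return true;
  }

  // Next URL to crawl, or undefined when the queue is empty.
  next() {
    return this.pending.shift();
  }

  get size() {
    return this.pending.length;
  }
}
```

A crawl loop would then repeatedly take `queue.next()`, fetch that page, and feed its links back through `queue.add()`; the `seen` set is what stops the crawler from revisiting pages it has already queued.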

There are two big headaches:

1. Sites that can only be accessed after registering and logging in.

2. All kinds of CAPTCHAs.

Logging in is manageable: you grab the cookie after logging in, then send it along with every request when you crawl.
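A minimal sketch of cookie reuse, assuming you already have the raw `Set-Cookie` header values from the login response (Node's `http` module exposes them as an array in `res.headers["set-cookie"]`). `SimpleCookieJar` is a hypothetical helper; it keeps only `name=value` pairs and ignores attributes like `Path`, `Domain` and `Expires`, which a real cookie jar must respect.

```javascript
// Hypothetical helper: remember cookies from a login response and replay them.
// Simplification: cookie attributes (Path, Domain, Expires, ...) are ignored.
class SimpleCookieJar {
  constructor() {
    this.cookies = new Map(); // cookie name -> value, in insertion order
  }

  // Store cookies from an array of raw Set-Cookie header values.
  storeFromSetCookie(setCookieHeaders) {
    for (const header of setCookieHeaders) {
      const pair = header.split(";")[0]; // "name=value" comes before the attributes
      const eq = pair.indexOf("=");
      if (eq <= 0) continue;             // skip malformed or nameless cookies
      const name = pair.slice(0, eq).trim();
      const value = pair.slice(eq + 1).trim();
      this.cookies.set(name, value);     // a newer value replaces the old one
    }
  }

  // Build the value for the Cookie header of the next request.
  toCookieHeader() {
    return [...this.cookies]
      .map(([name, value]) => `${name}=${value}`)
      .join("; ");
  }
}
```

After storing the login response's cookies, each later request would carry `{ Cookie: jar.toCookieHeader() }` in its headers; for the two headers `session=abc123; Path=/; HttpOnly` and `lang=en`, that value is `session=abc123; lang=en`.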

CAPTCHAs are another story. If you run into a really nasty one, all you can do is cry.

Like this:

I simply can't crack that kind.

With Spyfari, however, you just click through the CAPTCHA by hand, which makes this easy.

The login state is saved, so the next time you are already logged in.

The only exception would be a site that demands a CAPTCHA at every step, which is unlikely, since it would come at the expense of the user experience.

That is the advantage of a visual crawler: it can crawl all kinds of websites.

Today I crawled the information from P station's various rankings and downloaded the images.

Fans of anime-style ("2D") artwork should enjoy browsing them.

I have saved some of P station's rankings as JSON files. The rankings are updated every day, so it seems they could be crawled on a regular schedule.
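Since the rankings change daily, a scheduled re-crawl fits. The sketch below assumes a fixed local run time and a hypothetical `crawlRankings` function standing in for the actual crawl; it is not Spyfari's scheduled-task feature, whose interface the post does not describe.

```javascript
// Milliseconds from `now` until the next occurrence of hour:minute (local time).
function msUntilNextRun(now, hour, minute = 0) {
  const next = new Date(now);
  next.setHours(hour, minute, 0, 0);
  if (next <= now) {
    next.setDate(next.getDate() + 1); // today's slot has passed; use tomorrow
  }
  return next - now;
}

// Run `task` every day at hour:minute, re-arming the timer after each run.
// `task` is a hypothetical async crawl function (e.g. crawlRankings).
function scheduleDaily(hour, minute, task) {
  const delay = msUntilNextRun(new Date(), hour, minute);
  return setTimeout(async () => {
    try {
      await task();
    } catch (err) {
      console.error("daily task failed:", err); // keep the schedule alive on errors
    }
    scheduleDaily(hour, minute, task);
  }, delay);
}
```

`scheduleDaily(9, 0, crawlRankings)` would run the crawl every day at 09:00 local time. Recomputing the delay before each run, rather than using a fixed 24-hour `setInterval`, keeps the run pinned to the same local time even across daylight-saving changes.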

The crawling code is pixivRank.js; I will package it into Spyfari as an example.

If you are new to this, you can open it directly in Spyfari, run it, and experience the fun of crawling data.

The following is the format of the data I crawled:

It mainly captures the rank, author, image URL, and submission date.
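One way to keep those four fields consistent before writing them to JSON is to normalize each scraped row. The field names (`rank`, `author`, `imageUrl`, `submittedAt`) and the shape of the raw input are assumptions for illustration, not the actual format produced by pixivRank.js.

```javascript
// Hypothetical record shape: { rank, author, imageUrl, submittedAt }.
// Normalise one scraped row; throws on an invalid rank or URL.
function toRankingRecord(raw) {
  const rank = Number.parseInt(raw.rank, 10);
  if (!Number.isInteger(rank) || rank < 1) {
    throw new Error(`invalid rank: ${raw.rank}`);
  }
  return {
    rank,
    author: String(raw.author ?? "").trim(),
    imageUrl: new URL(raw.imageUrl).href, // throws on a malformed URL
    // Missing dates become null; an unparseable date throws a RangeError.
    submittedAt: raw.submittedAt ? new Date(raw.submittedAt).toISOString() : null,
  };
}

// Serialise a whole ranking, sorted by rank, as pretty-printed JSON.
function rankingToJson(rawRows) {
  const records = rawRows.map(toRankingRecord).sort((a, b) => a.rank - b.rank);
  return JSON.stringify(records, null, 2);
}
```

Calling `rankingToJson(rows)` on the scraped rows yields a JSON array sorted by rank, with dates converted to ISO 8601 strings; rows with no submission date get `null` rather than failing.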

This page loads asynchronously, so you have to keep scrolling to the bottom to get more data. For Spyfari this is easy: as a visual crawler it can fully simulate human operation, and you can watch the whole process as it runs.
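The scroll-until-nothing-new-loads logic can be sketched independently of any particular browser tool. `page` is a hypothetical adapter with async `countItems()` and `scrollToBottom()` methods; in Spyfari or another automation tool these would map to whatever calls it actually provides.

```javascript
// Keep scrolling until the number of loaded items stops growing.
// `page` is a hypothetical adapter: { countItems(): Promise<number>, scrollToBottom(): Promise<void> }.
async function scrollUntilStable(page, { maxRounds = 50, settleMs = 500 } = {}) {
  let previous = await page.countItems();
  for (let round = 0; round < maxRounds; round++) {
    await page.scrollToBottom();
    // Give the asynchronous requests time to finish before counting again.
    await new Promise((resolve) => setTimeout(resolve, settleMs));
    const current = await page.countItems();
    if (current === previous) return current; // nothing new loaded: assume end of list
    previous = current;
  }
  return previous; // safety cap reached
}
```

The `maxRounds` cap guards against pages that never stop loading, and `settleMs` trades speed for reliability on slow connections.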

The submission date, in particular, is loaded asynchronously: the crawler has to simulate a mouse click before it can read it.

Spyfari handles this kind of asynchronously loaded content easily.
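For values such as the submission date that only appear after a click, a common pattern is to click and then poll until the value shows up. `waitFor` below is a hypothetical helper written for this sketch, not a Spyfari function.

```javascript
// Poll `check` until it returns something other than null/undefined.
// Rejects if the value has not appeared within `timeoutMs`.
async function waitFor(check, { timeoutMs = 5000, intervalMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await check();
    if (value !== null && value !== undefined) return value;
    if (Date.now() >= deadline) {
      throw new Error(`waitFor: condition not met within ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In use, you would trigger the click first and then call `waitFor` with a function that reads the date element and returns `null` while it is still empty; if nothing appears within `timeoutMs`, the promise rejects instead of hanging.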

This is a picture downloaded by the test code after I fixed some Spyfari bugs today.

Now let's look at the main working window.

The top bar contains:

Scheduled tasks, cloud code sharing, instructions, and shutting down Spyfari.

These have not been fully developed yet; I will keep improving them next week.

Below that are:

1. The URL bar for the site to crawl, which opens my streamlined Chrome browser. The first step in crawling is analyzing the page structure, so having the browser here makes it easy to debug code and to locate tags.

2. An integrated editor for writing crawler code. It is simple and easy to use: debug the code in the browser on the right, copy it over directly, and save it locally or open existing crawler code from disk.

When the code is finished, just click the Run button.

Then sit back: on the right you can watch the run in real time, including simulated logins, mouse clicks, and scrolling, all at a glance.

3. The output area for crawl results and logs. I have integrated some APIs to make this easy to use.

If there is a download action, such as downloading an image, it is printed automatically as well. A progress indicator will be added later.
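The planned progress prompt could be as simple as formatting bytes received against the total size. This sketch assumes the total comes from the response's `Content-Length` header when the server sends one; `progressLine` and its output format are hypothetical.

```javascript
// Human-readable size: B, KB, MB or GB (1 KB = 1024 B).
function formatBytes(n) {
  const units = ["B", "KB", "MB", "GB"];
  let i = 0;
  while (n >= 1024 && i < units.length - 1) {
    n /= 1024;
    i++;
  }
  return `${n.toFixed(i === 0 ? 0 : 1)} ${units[i]}`;
}

// One-line status for a download; `total` may be 0/NaN when the size is unknown.
function progressLine(fileName, received, total) {
  if (!total) return `${fileName}: ${formatBytes(received)} downloaded`;
  const pct = Math.min(100, Math.floor((received / total) * 100));
  return `${fileName}: ${pct}% (${formatBytes(received)} / ${formatBytes(total)})`;
}
```

With Node's `http` module, `received` would be accumulated in the response's `'data'` handler and `total` read from `Number(res.headers["content-length"])`; for example, `progressLine("a.jpg", 512, 2048)` returns `a.jpg: 25% (512 B / 2.0 KB)`.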

The picture below shows the output for the downloaded files:
