Crawler sharing: Fengyun-2 satellite weather photos

2025-01-17 Update From: SLTechnology News&Howtos

Crawler overview

Back in 2016 I set up a long-running crawler on my Aliyun ECS instance to collect Fengyun-2 weather satellite photos. Over the holiday I finally had time to come back and check the results. The basic statistics are as follows (see attached picture):

Total number of pictures: 45869 files

Earliest file: 201609131345.jpg

Latest file: 201910091415.jpg
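Statistics like these can be derived from the file names alone, since each name is a timestamp. The following is a minimal sketch, assuming every photo is named `YYYYMMDDHHMM.jpg` as in the two examples above; the function name `summarize` is made up for illustration and is not the author's tool.

```python
from datetime import datetime

def summarize(filenames):
    """Return (count, earliest_name, latest_name) for timestamp-named photos."""
    stamped = []
    for name in filenames:
        stem, _, ext = name.rpartition(".")
        if ext.lower() != "jpg":
            continue  # not a photo
        try:
            when = datetime.strptime(stem, "%Y%m%d%H%M")
        except ValueError:
            continue  # a .jpg whose name is not a timestamp
        stamped.append((when, name))
    if not stamped:
        return 0, None, None
    stamped.sort()
    return len(stamped), stamped[0][1], stamped[-1][1]
```

Passing it the list of object names from the storage bucket (or a local directory listing) would give the count plus the earliest and latest file.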

Crawler running process

1. Data source:

At the time I happened to come across a data source that stayed valid for a long time. Its URL parameter is a formatted timestamp, so the URLs follow an obvious pattern, which makes the source well suited to a crawler.
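Because the URL is driven by a timestamp, the list of addresses to fetch can be generated rather than discovered. The sketch below illustrates the idea under several assumptions: the article does not give the real URL, so `URL_TEMPLATE` is a placeholder; the timestamp format is taken from the file names (`YYYYMMDDHHMM`); and the 30-minute step is a guess based on the `:15` and `:45` minutes seen in those names.

```python
from datetime import datetime, timedelta

# Hypothetical placeholder: the real data source URL is not given in the article.
URL_TEMPLATE = "https://example.invalid/fy2/{stamp}.jpg"

def stamp_for(moment):
    """Format a datetime the way the photo file names are formatted."""
    return moment.strftime("%Y%m%d%H%M")

def urls_between(start, end, step_minutes=30):
    """Yield (stamp, url) for every time slot from start to end inclusive.

    step_minutes=30 is an assumption, not a documented publishing interval.
    """
    step = timedelta(minutes=step_minutes)
    current = start
    while current <= end:
        stamp = stamp_for(current)
        yield stamp, URL_TEMPLATE.format(stamp=stamp)
        current += step
```

For example, `list(urls_between(datetime(2016, 9, 13, 13, 45), datetime(2016, 9, 13, 14, 45)))` produces three slots, starting with the stamp `201609131345`.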

2. Running the crawler:

The crawler has two operating requirements: (1) download on a regular schedule, and (2) run without downtime.

Since this is meteorological data, it has to be fetched at regular intervals, so I wrote a Windows service that runs continuously in the background. The crawler was interrupted several times by other problems on the server, but it ran for so long that a large amount of data was still collected.

The other requirement is that the machine cannot be turned off, because the program needs to run continuously. My final solution was to deploy it on a cloud server (Aliyun ECS) rather than keep a personal PC switched on for years.
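The two requirements (fetch on a schedule, keep running through failures) boil down to a loop that never lets one bad round stop the process. The author's implementation was a Windows service; the Python below is only a sketch of the same idea. `fetch_once`, the 1800-second interval, and the `max_rounds` parameter are all assumptions introduced for illustration.

```python
import logging
import time

log = logging.getLogger("crawler")

def run_forever(fetch_once, interval_seconds=1800, sleep=time.sleep, max_rounds=None):
    """Call fetch_once repeatedly, logging failures instead of exiting.

    max_rounds exists only so the loop can be exercised in a test;
    a real service would leave it as None and run until stopped.
    Returns the number of failed rounds.
    """
    rounds = 0
    failures = 0
    while max_rounds is None or rounds < max_rounds:
        try:
            fetch_once()
        except Exception:
            # One failed download must not take the whole crawler down.
            failures += 1
            log.exception("download round failed; retrying at the next interval")
        rounds += 1
        sleep(interval_seconds)
    return failures
```

Logging each failure is what later makes it possible to read the log and see when the data source stopped responding.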

3. Processing and storage:

Because this program was going to run for a long time, piling up all the results in a local folder was something to avoid, so I chose Aliyun Object Storage Service (OSS). And because the machine had to stay on anyway, I ended up with ECS plus OSS in the same region: with both in the cloud, they can connect over the private network and transfer quickly. Files are only stored temporarily on the local disk and no longer occupy it once they are in OSS, so I could start the crawler and leave it running "maintenance-free" for several years.
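The "temporary local copy, then object storage" flow can be sketched as follows. This is not the author's code: `store_and_release` is a made-up name, and `upload(local_path, object_key)` is a hypothetical callable standing in for whatever OSS SDK call performs the upload. The sketch deletes the local file only after the upload returns, so a failed upload leaves the file in place for a retry.

```python
import os
import tempfile

def store_and_release(data, object_key, upload, temp_dir=None):
    """Write data to a temp file, upload it, then free the local disk space.

    `upload` is a hypothetical callable (local_path, object_key) that is
    expected to raise on failure; it is not a real OSS SDK function.
    Returns the path of the temporary file that was used and removed.
    """
    fd, local_path = tempfile.mkstemp(suffix=".jpg", dir=temp_dir)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    # If this raises, the temp file stays on disk so the upload can be retried.
    upload(local_path, object_key)
    os.remove(local_path)
    return local_path
```

With this shape, local disk usage stays bounded by the files that are in flight or awaiting retry, regardless of how many years the crawler runs.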

4. Crawler shutdown:

I did not shut the crawler down on purpose. Only when I checked the log during the holiday did I learn that the data source had stopped working, so this crawler is dead. I could look for a new working data source, but since I have not yet put the old data to good use, I will not look for one for now.

Crawler work finished: downloading the final results

I used the OSS client tool recommended by Aliyun. Because I had placed ECS and OSS in the same region in advance, I could download directly to my ECS instance over the private network. It was very fast: the tool showed more than 60 MB/s. A partial screenshot of the tool is shown in the figure:

To avoid paying for OSS public-network download traffic (in truth, I am just poor: downloading 6 GB of files directly costs 1.50 yuan at the busy-time price), I first pulled the OSS data over the private network and then downloaded it to my local PC using the ECS instance's bandwidth. Viewed locally:

Original address: https://www.opengps.cn/Blog/View.aspx?id=590. Any later updates or edits to the article at this link take precedence. You are welcome to follow the original articles on the source site!
