How do you execute a website crawler on a schedule in Python? This article analyzes and answers the question in detail with a worked example, in the hope of helping readers who want to solve this problem find a simple, workable approach.
Write crawler code
Using the requests and beautifulsoup4 packages, write a crawler that fetches and parses two pages from Yahoo! Stock: the listed-market (TSE) transaction price ranking and the over-the-counter (OTC) transaction price ranking. Then use the pandas package to display the parsed data.
import datetime
import requests
from bs4 import BeautifulSoup
import pandas as pd

def get_price_ranks():
    # Scrape the listed (tse) and over-the-counter (otc) price-ranking pages
    current_dt = datetime.datetime.now().strftime("%Y-%m-%d %X")
    current_dts = [current_dt for _ in range(200)]  # 100 listed rows + 100 OTC rows
    stock_types = ["tse", "otc"]
    price_rank_urls = ["https://tw.stock.yahoo.com/d/i/rank.php?t=pri&e={}&n=100".format(st) for st in stock_types]
    tickers = []
    stocks = []
    prices = []
    volumes = []
    mkt_values = []
    ttl_steps = 10 * 100  # 100 rows per page, 10 <td> cells per row
    each_step = 10
    for pr_url in price_rank_urls:
        r = requests.get(pr_url)
        soup = BeautifulSoup(r.text, "html.parser")
        ticker = [i.text.split()[0] for i in soup.select(".name a")]
        tickers += ticker
        stock = [i.text.split()[1] for i in soup.select(".name a")]
        stocks += stock
        price = [float(soup.find_all("table")[2].find_all("td")[i].text) for i in range(5, 5 + ttl_steps, each_step)]
        prices += price
        volume = [int(soup.find_all("table")[2].find_all("td")[i].text.replace(",", "")) for i in range(11, 11 + ttl_steps, each_step)]
        volumes += volume
        # market values are quoted in hundreds of millions; convert to plain units
        mkt_value = [float(soup.find_all("table")[2].find_all("td")[i].text) * 100000000 for i in range(12, 12 + ttl_steps, each_step)]
        mkt_values += mkt_value
    types = ["listed" for _ in range(100)] + ["OTC" for _ in range(100)]
    ky_registered = [True if "KY" in st else False for st in stocks]
    df = pd.DataFrame()
    df["scrapingTime"] = current_dts
    df["type"] = types
    df["kyRegistered"] = ky_registered
    df["ticker"] = tickers
    df["stock"] = stocks
    df["price"] = prices
    df["volume"] = volumes
    df["mktValue"] = mkt_values
    return df

price_ranks = get_price_ranks()
print(price_ranks.shape)
Running this prints the shape of the resulting DataFrame, 200 rows (100 listed plus 100 OTC) by 8 columns:
## (200, 8)
Next, we use pandas to inspect the first and last few rows.
price_ranks.head()
price_ranks.tail()
Then we deploy the script to a server. Choosing a server and configuring its environment are outside the scope of this lesson; we will focus on how to set up the scheduled task.
Next, let's modify the code so that the results are stored in SQLite.
import datetime
import requests
from bs4 import BeautifulSoup
import pandas as pd
import sqlite3

def get_price_ranks():
    # Scrape the listed (tse) and over-the-counter (otc) price-ranking pages
    current_dt = datetime.datetime.now().strftime("%Y-%m-%d %X")
    current_dts = [current_dt for _ in range(200)]  # 100 listed rows + 100 OTC rows
    stock_types = ["tse", "otc"]
    price_rank_urls = ["https://tw.stock.yahoo.com/d/i/rank.php?t=pri&e={}&n=100".format(st) for st in stock_types]
    tickers = []
    stocks = []
    prices = []
    volumes = []
    mkt_values = []
    ttl_steps = 10 * 100  # 100 rows per page, 10 <td> cells per row
    each_step = 10
    for pr_url in price_rank_urls:
        r = requests.get(pr_url)
        soup = BeautifulSoup(r.text, "html.parser")
        ticker = [i.text.split()[0] for i in soup.select(".name a")]
        tickers += ticker
        stock = [i.text.split()[1] for i in soup.select(".name a")]
        stocks += stock
        price = [float(soup.find_all("table")[2].find_all("td")[i].text) for i in range(5, 5 + ttl_steps, each_step)]
        prices += price
        volume = [int(soup.find_all("table")[2].find_all("td")[i].text.replace(",", "")) for i in range(11, 11 + ttl_steps, each_step)]
        volumes += volume
        # market values are quoted in hundreds of millions; convert to plain units
        mkt_value = [float(soup.find_all("table")[2].find_all("td")[i].text) * 100000000 for i in range(12, 12 + ttl_steps, each_step)]
        mkt_values += mkt_value
    types = ["listed" for _ in range(100)] + ["OTC" for _ in range(100)]
    ky_registered = [True if "KY" in st else False for st in stocks]
    df = pd.DataFrame()
    df["scrapingTime"] = current_dts
    df["type"] = types
    df["kyRegistered"] = ky_registered
    df["ticker"] = tickers
    df["stock"] = stocks
    df["price"] = prices
    df["volume"] = volumes
    df["mktValue"] = mkt_values
    return df

price_ranks = get_price_ranks()
conn = sqlite3.connect("/home/ubuntu/yahoo_stock.db")
price_ranks.to_sql("price_ranks", conn, if_exists="append", index=False)
conn.close()
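To confirm the rows actually landed in the database, a quick check like the following can read them back. This is a minimal sketch: it assumes the table name price_ranks and the database path used in the to_sql call above.

import sqlite3
import pandas as pd

# read back everything appended so far; the row count grows by 200 per run
conn = sqlite3.connect("/home/ubuntu/yahoo_stock.db")
stored = pd.read_sql("SELECT * FROM price_ranks;", conn)
print(stored.shape)
conn.close()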
Next, to make the script run on a schedule, we need the Linux crontab command. Suppose we want it to execute every hour between 9:30 and 16:30 each day. First, save the script as price_rank_scraper.py; the crontab entry is sketched below.
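As a minimal sketch, open the schedule editor with crontab -e and add an entry like the following. The interpreter and script paths (/usr/bin/python3 and /home/ubuntu/price_rank_scraper.py) are assumptions to adjust to your own server, and the log redirection is optional but makes failures easier to diagnose.

# assumed paths; runs at minute 30 of hours 9 through 16, every day
30 9-16 * * * /usr/bin/python3 /home/ubuntu/price_rank_scraper.py >> /home/ubuntu/price_rank.log 2>&1

After saving, crontab -l lists the installed entries so you can confirm the job was registered.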
This is the answer to the question of how to execute a website crawler on a schedule in Python. I hope the above content is of some help to you.