2025-02-24 update. From: SLTechnology News&Howtos (shulou)
Shulou (Shulou.com) 06/03 report
This article covers how to use C# + Selenium + ChromeDriver to crawl web pages while simulating a real user's browsing behavior. Many people run into this kind of problem in real projects, so let me walk you through how to handle it. I hope you read it carefully and get something out of it!
Background
Selenium is a tool for testing web applications. Selenium tests run directly in the browser, just as if a real user were operating it. For crawlers, driving a browser with Selenium to collect data from the web is a very powerful weapon. Here I will introduce the basic usage of Selenium with Google Chrome.
Requirements
In ordinary crawler development, a web page is sometimes little more than a pile of JavaScript that does a lot of asynchronous work. With a plain HTTP request, the response is just that JavaScript source, and you have to assemble the data yourself, which is very laborious. Selenium + ChromeDriver, by contrast, gives you exactly what the browser renders: what you see is what you get.
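Because the data is produced by asynchronous JavaScript, it may not be in the page the instant navigation returns. The code in this article bridges that gap with fixed Thread.Sleep calls; a polling wait is a common alternative. Below is a minimal, stdlib-only sketch of the idea. `WaitUntil` is a hypothetical helper name; in a real session the condition would be something like checking that `driver.FindElements(...)` returns at least one element. Selenium itself provides a `WebDriverWait` class (namespace `OpenQA.Selenium.Support.UI`) built on the same pattern.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Polls condition() every pollInterval until it returns true or timeout elapses.
// Returns true if the condition was met in time, false on timeout.
bool WaitUntil(Func<bool> condition, TimeSpan timeout, TimeSpan pollInterval)
{
    var clock = Stopwatch.StartNew();
    while (true)
    {
        if (condition()) return true;
        if (clock.Elapsed >= timeout) return false;
        Thread.Sleep(pollInterval);
    }
}

// Usage sketch: a stand-in condition that becomes true on the third check,
// imitating content that only appears after some asynchronous work.
int checks = 0;
bool ready = WaitUntil(() => ++checks >= 3, TimeSpan.FromSeconds(2), TimeSpan.FromMilliseconds(10));
Console.WriteLine($"ready={ready}, checks={checks}");
```

Compared with a fixed sleep, the wait ends as soon as the content is there and still gives up after a bounded time.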
Implementation
Project structure: a WinForms program for ease of use, with the NuGet packages attached.
The following is the code for Form1.cs; I only include the key methods here. You need the latest Chrome browser installed, and the chromedriver used in the code is v2.9.248315 (the ChromeDriver version must be one that supports the installed Chrome version).
// Namespaces used by Form1.cs (OpenQA.Selenium comes from the Selenium.WebDriver NuGet package)
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices; // for the DllImport helpers in the #region below
using System.Threading;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

// Inside the Form1 class:
private void crawlingWebFunc()
{
    SetText("\r\nStart trying...");
    // Each subfolder of "Picture url" holds a data.txt with the URL to crawl.
    List<testfold> surls = new List<testfold>();
    string path = System.Environment.CurrentDirectory + "\\Picture url\\";
    DirectoryInfo root = new DirectoryInfo(path);
    DirectoryInfo[] dics = root.GetDirectories();
    foreach (var itemdic in dics)
    {
        string txt = "";
        using (StreamReader sr = new StreamReader(itemdic.FullName + "\\data.txt"))
        {
            while (!sr.EndOfStream)
            {
                string str = sr.ReadLine();
                txt += str; // + "\n"
            }
        }
        surls.Add(new testfold() { key = itemdic.FullName, picurl = txt });
    }

    // chromedriver.exe is expected in the working directory.
    ChromeDriverService service = ChromeDriverService.CreateDefaultService(System.Environment.CurrentDirectory);
    // service.HideCommandPromptWindow = true;
    ChromeOptions options = new ChromeOptions();
    options.AddArguments("--test-type", "--ignore-certificate-errors");
    options.AddArgument("enable-automation");
    // options.AddArgument("headless");
    // options.AddArguments("--proxy-server=http://user:password@yourProxyServer.com:8080");
    using (IWebDriver driver = new ChromeDriver(service, options, TimeSpan.FromSeconds(120)))
    {
        driver.Url = "https://www.1688.com/";
        Thread.Sleep(200);
        try
        {
            int a = 1;
            foreach (var itemsurls in surls)
            {
                SetText("\r\nItem " + a.ToString());
                driver.Navigate().GoToUrl(itemsurls.picurl);
                // Redirected to the login page: try to log in
                if (driver.Url.Contains("login.1688.com"))
                {
                    SetText("\r\nLogin required, trying to log in...");
                    Trylogin(driver); // login attempt finished
                    // Try the same URL again
                    driver.Navigate().GoToUrl(itemsurls.picurl);
                    if (driver.Url.Contains("login.1688.com"))
                    {
                        // Still blocked: give up
                        SetText("\r\nGiving up; try again with a different IP.");
                        return;
                    }
                }
                // Content shown on mouse hover can only be displayed one item at a time, because
                // the page itself cannot show it all at once, so it has to be downloaded some other way.
                // var elements = document.getElementsByClassName('hover-container');
                // Array.prototype.forEach.call(elements, function (element) {
                //     element.style.display = "block";
                //     console.log(element);
                // });
                // IJavaScriptExecutor js = (IJavaScriptExecutor)driver;
                // var sss = js.ExecuteScript("var elements = document.getElementsByClassName('hover-container'); Array.prototype.forEach.call(elements, function (element) { console.log(element); element.setAttribute(\"class\", \"Test title\"); element.style.display = \"block\"; console.log(element); });");
                Thread.Sleep(500);
                // Write (not shown in the article) takes the page source and returns a model exposing data.offerList
                var responseModel = Write(itemsurls.key, driver.PageSource, Pagetypeenum.List);
                Thread.Sleep(500);
                int i = 1;
                if (responseModel?.data?.offerList != null)
                {
                    foreach (var offer in responseModel.data.offerList)
                    {
                        driver.Navigate().GoToUrl(offer.information.detailUrl);
                        string responseDatadetail = driver.PageSource;
                        Write(itemsurls.key, responseDatadetail, Pagetypeenum.Details);
                        SetText("\r\nItem " + a.ToString() + "-" + i.ToString());
                        Thread.Sleep(500);
                        i++;
                    }
                }
                a++;
            }
        }
        catch (Exception)
        {
            CloseChromeDriver(driver);
            throw;
        }
    }
}
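The loop above paces page loads with fixed `Thread.Sleep(500)` calls. Since the goal is to look like a real user, one option is to randomize the pause. This is a minimal sketch under that assumption; `NextHumanDelay` and `HumanPause` are hypothetical helper names, and the 400 to 1500 ms range is an arbitrary example rather than a value from the article.

```csharp
using System;
using System.Threading;

// Returns a random delay in [minMs, maxMs) milliseconds, so consecutive
// page loads are not spaced at a perfectly regular interval.
TimeSpan NextHumanDelay(Random rng, int minMs, int maxMs)
{
    if (minMs < 0 || maxMs <= minMs)
        throw new ArgumentOutOfRangeException(nameof(maxMs), "Require 0 <= minMs < maxMs.");
    return TimeSpan.FromMilliseconds(rng.Next(minMs, maxMs));
}

// Sleeps for one randomized delay and returns it, e.g. for logging.
TimeSpan HumanPause(Random rng, int minMs = 400, int maxMs = 1500)
{
    TimeSpan delay = NextHumanDelay(rng, minMs, maxMs);
    Thread.Sleep(delay);
    return delay;
}

// Usage sketch: a fixed seed makes the sequence reproducible.
var demoRng = new Random(42);
for (int n = 0; n < 3; n++)
{
    Console.WriteLine($"next pause: {NextHumanDelay(demoRng, 400, 1500).TotalMilliseconds} ms");
}
```

Passing the `Random` instance in keeps the helpers testable; in the crawler, one shared instance could replace each `Thread.Sleep(500)` with `HumanPause(rng)`.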
#region Exit chromedriver after an abnormal stop
[DllImport("user32.dll", EntryPoint = "FindWindow")]
private static extern IntPtr FindWindow(string lpClassName, string lpWindowName);

[DllImport("user32.dll", EntryPoint = "SendMessage")]
public static extern int SendMessage(IntPtr hWnd, int Msg, int wParam, int lParam);

public const int SW_HIDE = 0;
public const int SW_SHOW = 5;

[DllImport("user32.dll", EntryPoint = "ShowWindow")]
public static extern int ShowWindow(IntPtr hwnd, int nCmdShow);

/// <summary>Gets the chromedriver window handle (the window title is the exe path).</summary>
public IntPtr GetWindowHandle()
{
    string name = Environment.CurrentDirectory + "\\chromedriver.exe";
    IntPtr hwd = FindWindow(null, name);
    return hwd;
}

/// <summary>Closes the chromedriver window by sending WM_CLOSE (0x10).</summary>
public void CloseWindow()
{
    try
    {
        IntPtr hwd = GetWindowHandle();
        SendMessage(hwd, 0x10, 0, 0);
    }
    catch { }
}

/// <summary>Quits chromedriver, then closes its window.</summary>
public void CloseChromeDriver(IWebDriver driver)
{
    try
    {
        driver.Quit();
        driver.Dispose();
    }
    catch { }
    CloseWindow();
}
#endregion
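The first part of `crawlingWebFunc` builds its task list by reading `data.txt` from every subfolder of `Picture url`. The same step can be written more compactly; this is a sketch, not the author's code, and it returns tuples instead of the article's `testfold` class. Two small behavior changes are assumptions: subfolders without a `data.txt` are skipped instead of throwing, and surrounding whitespace is trimmed from the URL. `LoadCrawlTasks` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Reads <root>/<subfolder>/data.txt for every subfolder and returns (Key, Url) pairs.
// Key is the subfolder's full path; Url is the file's lines joined with no separator,
// as in the original loop, then trimmed (an assumption).
// Subfolders without a data.txt are skipped rather than throwing (also an assumption).
List<(string Key, string Url)> LoadCrawlTasks(string root)
{
    var tasks = new List<(string Key, string Url)>();
    foreach (DirectoryInfo dir in new DirectoryInfo(root).GetDirectories())
    {
        string dataFile = Path.Combine(dir.FullName, "data.txt");
        if (!File.Exists(dataFile)) continue;
        string url = string.Concat(File.ReadLines(dataFile)).Trim();
        tasks.Add((dir.FullName, url));
    }
    return tasks;
}

// Usage sketch: a throwaway folder stands in for "Picture url".
string demoRoot = Path.Combine(Path.GetTempPath(), "picture-url-demo-" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(Path.Combine(demoRoot, "item1"));
File.WriteAllLines(Path.Combine(demoRoot, "item1", "data.txt"),
                   new[] { "https://example.com/", "search?q=1" });
Directory.CreateDirectory(Path.Combine(demoRoot, "no-data"));

var loaded = LoadCrawlTasks(demoRoot);
foreach (var task in loaded)
    Console.WriteLine($"{task.Key} -> {task.Url}");
```

`Path.Combine` also avoids hand-assembling `"\\"` separators, which keeps the path logic independent of the platform.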
Summary
The overall approach:
1. Navigate to the target page with driver.Navigate().GoToUrl
2. Identify the data source and read the data from driver.PageSource
3. Parse the HTML data
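The article does not show how its `Write` helper parses `driver.PageSource`. As a rough, hypothetical illustration of step 3, the sketch below pulls link targets out of the HTML with a regular expression and keeps those containing a marker substring (here `/offer/`, an assumed pattern for detail pages; the sample URLs are made up). Regular expressions only cope with simple, predictable markup; for real pages a dedicated HTML parser such as HtmlAgilityPack or AngleSharp is the more robust choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Returns the distinct href values of <a> tags that contain mustContain.
List<string> ExtractLinks(string html, string mustContain)
{
    // Matches <a ... href="..."> or href='...' and captures the link target.
    // The optional group requires whitespace before "href", so "data-href" is not matched.
    var anchorHref = new Regex(@"<a\s+(?:[^>]*?\s)?href\s*=\s*[""']([^""']+)[""']",
                               RegexOptions.IgnoreCase);
    return anchorHref.Matches(html)
                     .Cast<Match>()
                     .Select(m => m.Groups[1].Value)
                     .Where(href => href.Contains(mustContain))
                     .Distinct()
                     .ToList();
}

// Usage sketch with a made-up page fragment.
string sample =
    "<div><a class=\"title\" href=\"https://detail.example.com/offer/1.html\">A</a>" +
    "<a href='https://detail.example.com/offer/2.html'>B</a>" +
    "<a href=\"/help\">Help</a>" +
    "<a href=\"https://detail.example.com/offer/1.html\">A again</a></div>";

var links = ExtractLinks(sample, "/offer/");
foreach (string link in links)
    Console.WriteLine(link);
```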
That concludes this walkthrough of using C# + Selenium + ChromeDriver to crawl web pages and simulate real user browsing behavior. Thank you for reading!
© 2024 shulou.com SLNews company. All rights reserved.