In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/02 Report--
This article mainly explains "how to achieve data mining in Python". Interested friends may wish to have a look. The method introduced in this paper is simple, fast and practical. Let's let the editor take you to learn "how to achieve data mining with Python".
This Selenium module is mainly used to deal with the need for us to browse web data automatically, let the program execute semi-intelligence, as long as you teach it what to do!
Directly introduce the family modules that need to be used this time:
1 from selenium import webdriver2 import time3 from selenium.webdriver.common.keys import Keys4 from selenium.webdriver.common.action_chains import ActionChains5 from selenium.webdriver.common.by import By 1. Explain each one, and check the signs in order:
The main contents are as follows: 1. The embedding of the main module is mainly for the control program to automatically open the browser to browse the web.
2, as a developer, especially the development tool for automated testing of web pages, you must need the time module to restrict the access time of the program, because the website may directly seal off your IP.
3, selenium module family member Keys, this member should operate with a simulated keyboard, should simulate the input of user login name and password, or value data index input.
4, selenium module family member ActionChains, it is to deal with the simulation mouse operation, double-click with the mouse, click, left and right keys, we turn the page, search button click function.
5. By, a member of the selenium module family, this is what we need to teach it to do, and it is also one of the core value functions that we need to use in data mining to deal with value data capture.
Second, the preliminary development:
1. The operating program opens the browser and opens the web page that we need to enter:
1 url = 'https://www.xxx.com'2 driver=webdriver.Chrome () 3 driver.get (url) 4 time.sleep (5) 5 driver.quit ()
You can test it yourself here. I am using Google browser. You can try to use Firefox. There are some differences between them, mainly the difference between sites!
2. Lock tag after entering the page
Html:
1 2
three
four
Python:
1 element = driver.find_element_by_id ("aaa") 2 frame = driver.find_element_by_tag_name ("div") 3 cheese = driver.find_element_by_name ("ccc") 4 cheeses = driver.find_elements_by_class_name ("bbb") 56 or 7 8 from selenium.webdriver.common.by import By 9 element = driver.find_element (by=By.ID, value= "aaa") 10 frame = driver.find_element (By.TAG_NAME "div" 11 cheese = driver.find_element (By.NAME, "ccc") 12 cheeses = driver.find_elements (By.CLASS_NAME, "bbb")
Each of these is a locked tag tree, and they are defined according to id,class,name,tagname.
1 xpath_class = driver.find_element_by_xpath ('/ / div [@ class= "bbb"] / p') 2 xpath_id = driver.find_element_by_xpath ('/ / div [@ id= "aaa"] / p')
This is the general method, the Xpath method, and they all output content locking tag that belongs to parsing the web page.
3. Processing operation:
When we lock the tag property of the function key, we can further operate, such as page changing and the implementation of the search function.
Here we will introduce the operation of the simulated mouse:
1 elem = driver.find_element_by_xpath ('/ / a [@ id= "tagname"]') 2 ActionChains (driver) .double_click (elem) .perform () 3 time.sleep (3)
Because of the time problem, I just introduce the left mouse button click to change pages, why else refer to the official document: Selenium Webdrive
ActionChains: lock the browser, double_click lock the tag tag tree, .perform (): click the tag tree
4. Obtain value data
The operation here is similar to the syntax of Xpath:
Driver.find_elements_by_tag_name ('td') [3] .textdriver.find _ elements_by_tag_name (' a'). Get_attribute ('href')
Note here that elements, which refers to the href of all tag- > a ratio tags, is in list format and needs to be traversed.
5. Finally, a complete string of code:
1 from selenium import webdriver 2 import time 3 import lxml.html as HTML 4 from bs4 import BeautifulSoup 5 from selenium.webdriver.common.keys import Keys 6 from selenium.webdriver.common.action_chains import ActionChains 7 from pymongo import MongoClient,ASCENDING DESCENDING 8 from selenium.webdriver.common.by import By 9 def parser (): 10 url = 'https://www.xxx.com'11 driver=webdriver.Chrome () 12 driver.get (url) 13 time.sleep (5) 14 for i in range (1675): 15a = driver.find_element_by_xpath (' / / div [@ class= "aaa]') 16 tr = a.find_elements_by_tag_name ('tr') 17 for j in xrange (1 Len (tr): 18 quantity = t [j] .find _ elements_by_tag_name ('td') [3] .text19 producturl = t [j] .find _ elements_by_tag_name (' td') [0]. Find _ elements_by_tag_name ("div") [1]. Find _ element_by_tag_name ('ul'). Find_element_by_tag_name (' li'). Find_element_by_tag_name ('a') .get _ attribute ('href') 20 producturl_db (producturl Quantity) 21 elem = driver.find_element_by_xpath ('/ / a [@ id= "eleNextPage"]') 22 ActionChains (driver) .double_click (elem) .perform () 23 time.sleep (3) 24 25 driver.quit ()
Selenium has a small GUB, that is, when you use Xpath, you have found the parent tag, but there are many parents, such as tr, if you traverse it and look for td, then you still use find_elements_by_tag_name, because that will initialize, regardless of which parent you find.
At this point, I believe you have a deeper understanding of "Python how to achieve data mining", might as well come to the actual operation of it! Here is the website, more related content can enter the relevant channels to inquire, follow us, continue to learn!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.