From: SLTechnology News&Howtos (Shulou.com), Development. Reported 05/31, updated 2025-01-17.
This article explains how a Java crawler can use Jsoup's DOM methods to traverse a Document object. The content is simple and easy to follow, so let's work through it step by step.
First, the page we will crawl:
https://wall.alphacoders.com/featured.php?lang=Chinese
Main steps:
Use Jsoup's connect method to obtain a Document object:
```java
String html = "https://wall.alphacoders.com/featured.php?lang=Chinese";
Document doc = Jsoup.connect(html).get();
```
The printed document is too long to reproduce here.
Let's take this part of the page, a list of site links, as an example:
About Us FAQ Privacy Policy Terms Of Service Acceptable Use Etiquette Advertise With Us Change Consent
Let's first find all the ul elements:
```java
Elements elements = doc.getElementsByTag("ul");
```
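Jsoup's getElementsByTag mirrors getElementsByTagName from the standard W3C DOM. For comparison, here is a minimal sketch of the same lookup using the JDK's built-in javax.xml.parsers DOM instead of Jsoup. It assumes a small, well-formed XHTML snippet made up for illustration: the JDK parser rejects the loose HTML that Jsoup tolerates, so it cannot replace Jsoup on the live page. The class name UlLister is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class UlLister {
    // Parses well-formed XHTML and returns the class attribute of every <ul>, in document order.
    // Assumption: the input is valid XML (unlike Jsoup, this parser does not repair broken HTML).
    static List<String> ulClasses(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        NodeList uls = doc.getElementsByTagName("ul"); // W3C counterpart of Jsoup's getElementsByTag
        List<String> classes = new ArrayList<>();
        for (int i = 0; i < uls.getLength(); i++) {
            Element ul = (Element) uls.item(i);
            classes.add(ul.getAttribute("class")); // "" when the attribute is absent
        }
        return classes;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body>"
                + "<ul class=\"menu\"><li>Login</li></ul>"
                + "<ul class=\"nav nav-pills\"><li><a href=\"/faq\" rel=\"nofollow\">FAQ</a></li></ul>"
                + "</body></html>";
        System.out.println(ulClasses(page)); // [menu, nav nav-pills]
    }
}
```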
The output is as follows:
Submit a fine prize
Chinese login registration
< Previous 1 2 3 4 5 6 7 8 9 10 ... 319 Next > < Previous 1 2 3 4 5 6 7 ... 319 Next >> 1 2 3 4 5 6 7 About Us FAQ Privacy Policy Terms Of Service Acceptable Use Etiquette Advertise With Us Change Consent
Only one of these ul elements has the class "nav nav-pills", so we loop over them and pick it out:
```java
Elements elements = doc.getElementsByTag("ul");
// System.out.println(elements);
Element tempElement = null;
for (Element element : elements) {
    // Keep the first ul whose class attribute is exactly "nav nav-pills"
    if (element.className().equals("nav nav-pills")) {
        tempElement = element;
        // System.out.println(element.className());
        break;
    }
}
```
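Note that className().equals("nav nav-pills") only matches when the class attribute is exactly that string, with the same order and spacing. Jsoup also offers Element.hasClass and CSS selectors such as doc.select("ul.nav.nav-pills"), which match individual class names instead. The sketch below is a plain-Java approximation of that token matching, not Jsoup's implementation (for example, Jsoup's hasClass ignores case while this helper does not); hasAllClasses is a hypothetical helper name.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ClassTokens {
    // Hypothetical helper: true if the whitespace-separated class attribute
    // contains every wanted class name, in any order (case-sensitive).
    static boolean hasAllClasses(String classAttr, String... wanted) {
        Set<String> tokens = new HashSet<>(Arrays.asList(classAttr.trim().split("\\s+")));
        return tokens.containsAll(Arrays.asList(wanted));
    }

    public static void main(String[] args) {
        String attr = "nav-pills nav  extra";
        System.out.println(attr.equals("nav nav-pills"));            // false: exact string differs
        System.out.println(hasAllClasses(attr, "nav", "nav-pills")); // true: both tokens present
    }
}
```

Token matching keeps working if the site reorders the classes or adds another one, which an exact string comparison would silently miss.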
Then loop through the li elements of that ul and print the href and rel attributes of each a element inside them:
```java
Elements li = tempElement.getElementsByTag("li");
for (Element element : li) {
    Elements element2 = element.getElementsByTag("a");
    for (Element element3 : element2) {
        String hrefString = element3.attr("href");
        String relString = element3.attr("rel");
        // attr() returns "" for a missing attribute; check contents, not references
        if (!hrefString.isEmpty() && !relString.isEmpty()) {
            System.out.println("href= " + hrefString + " rel=" + relString);
        }
    }
}
```
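Comparing strings with != (as in hrefString != "") checks whether two variables refer to the same object, not whether the contents match; isEmpty() or equals() checks the contents. The short sketch below shows the difference using only the standard library; bothPresent is a hypothetical helper that mirrors the href/rel filter.

```java
public class StringCompare {
    // Hypothetical helper mirroring the filter: both attributes must be non-empty.
    static boolean bothPresent(String href, String rel) {
        return !href.isEmpty() && !rel.isEmpty();
    }

    public static void main(String[] args) {
        // new String("") always creates a new object, distinct from the literal "".
        String fresh = new String("");
        System.out.println(fresh != "");     // true: different references, even though both are empty
        System.out.println(fresh.isEmpty()); // true: the contents are empty
        System.out.println(bothPresent("https://alphacoders.com/site/faq", "nofollow")); // true
        System.out.println(bothPresent("https://alphacoders.com/site/faq", fresh));      // false
    }
}
```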
Final result:
href= https://alphacoders.com/site/about-us rel=nofollow
href= https://alphacoders.com/site/faq rel=nofollow
href= https://alphacoders.com/site/privacy rel=nofollow
href= https://alphacoders.com/site/tos rel=nofollow
href= https://alphacoders.com/site/acceptable_use rel=nofollow
href= https://alphacoders.com/site/etiquette rel=nofollow
href= https://alphacoders.com/site/advertising rel=nofollow
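To try the same ul > li > a walk offline, without fetching the live page, the sketch below repeats it with the JDK's W3C DOM (javax.xml.parsers) in place of Jsoup. The XHTML snippet is made up for illustration and only modeled on the links above; NavLinkLister and navLinks are hypothetical names, and the input must be well-formed XML.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class NavLinkLister {
    // Walks the first <ul> carrying both "nav" and "nav-pills" classes, then its <li> and <a>
    // descendants, and formats non-empty href/rel pairs like the output above.
    static List<String> navLinks(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        List<String> lines = new ArrayList<>();
        NodeList uls = doc.getElementsByTagName("ul");
        for (int i = 0; i < uls.getLength(); i++) {
            Element ul = (Element) uls.item(i);
            List<String> classes = Arrays.asList(ul.getAttribute("class").trim().split("\\s+"));
            if (!classes.contains("nav") || !classes.contains("nav-pills")) {
                continue; // not the navigation list
            }
            NodeList items = ul.getElementsByTagName("li");
            for (int j = 0; j < items.getLength(); j++) {
                NodeList anchors = ((Element) items.item(j)).getElementsByTagName("a");
                for (int k = 0; k < anchors.getLength(); k++) {
                    Element a = (Element) anchors.item(k);
                    String href = a.getAttribute("href"); // "" when absent
                    String rel = a.getAttribute("rel");
                    if (!href.isEmpty() && !rel.isEmpty()) {
                        lines.add("href= " + href + " rel=" + rel);
                    }
                }
            }
            break; // stop after the first matching ul
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><ul class=\"nav nav-pills\">"
                + "<li><a href=\"https://alphacoders.com/site/about-us\" rel=\"nofollow\">About Us</a></li>"
                + "<li><a href=\"#\">No rel</a></li>"
                + "</ul></body></html>";
        navLinks(page).forEach(System.out::println);
    }
}
```

Running main prints href= https://alphacoders.com/site/about-us rel=nofollow; the second link is skipped because it has no rel attribute.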
Complete code:
```java
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

/**
 * @ClassName: Jsoup_Test
 * @description:
 * @author: KI
 * @Date: 8:15:14
 */
public class Jsoup_Test {
    public static void main(String[] args) throws IOException {
        String html = "https://wall.alphacoders.com/featured.php?lang=Chinese";
        // Download the page and parse it into a Document
        Document doc = Jsoup.connect(html).get();
        System.out.println(doc);

        // Find the first ul whose class attribute is exactly "nav nav-pills"
        Elements elements = doc.getElementsByTag("ul");
        // System.out.println(elements);
        Element tempElement = null;
        for (Element element : elements) {
            if (element.className().equals("nav nav-pills")) {
                tempElement = element;
                // System.out.println(element.className());
                break;
            }
        }
        System.out.println(tempElement);
        if (tempElement == null) {
            return; // no matching ul (e.g. the page layout changed), so nothing to traverse
        }

        // Print href and rel of every a inside each li of that ul
        Elements li = tempElement.getElementsByTag("li");
        for (Element element : li) {
            Elements element2 = element.getElementsByTag("a");
            for (Element element3 : element2) {
                String hrefString = element3.attr("href");
                String relString = element3.attr("rel");
                // attr() returns "" for a missing attribute; check contents, not references
                if (!hrefString.isEmpty() && !relString.isEmpty()) {
                    System.out.println("href= " + hrefString + " rel=" + relString);
                }
            }
        }
    }
}
```

Thank you for reading. That covers how a Java crawler can use Jsoup's DOM methods to traverse a Document object; the specific usage is best verified in practice.