
Java Crawler: Using Jsoup's DOM Methods to Traverse a Document Object

Source: Shulou (Shulou.com), SLTechnology News&Howtos, Development. Published 05/31; updated 2025-01-17.

This article explains how a Java crawler can use Jsoup's DOM methods to traverse a Document object. The steps are short and easy to follow.

First, the target page:

https://wall.alphacoders.com/featured.php?lang=Chinese

Main steps:

Use Jsoup's connect method to fetch the page and obtain a Document object:

String html = "https://wall.alphacoders.com/featured.php?lang=Chinese";
Document doc = Jsoup.connect(html).get();
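To experiment without sending a request to the site on every run, a Document can also be built from an HTML string with Jsoup.parse. The sketch below assumes jsoup is on the classpath; the SAMPLE markup is a hypothetical stand-in for the real page, and the user-agent string and 10-second timeout in fetch are illustrative choices, not values from this article.

```java
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class LoadDocument {
    // Hypothetical markup standing in for the real page (not the site's actual HTML)
    static final String SAMPLE =
        "<html><head><title>Featured</title></head><body>"
        + "<ul class=\"nav nav-pills\"><li><a href=\"/site/faq\" rel=\"nofollow\">FAQ</a></li></ul>"
        + "</body></html>";

    // Live fetch: the same connect(...).get() chain, with an illustrative user agent and timeout
    static Document fetch(String url) throws IOException {
        return Jsoup.connect(url)
                .userAgent("Mozilla/5.0")
                .timeout(10_000)
                .get();
    }

    // Offline parse: the base URI lets absUrl() resolve relative links
    static Document parseSample() {
        return Jsoup.parse(SAMPLE, "https://alphacoders.com/");
    }

    public static void main(String[] args) {
        Document doc = parseSample();
        System.out.println(doc.title());                            // Featured
        System.out.println(doc.select("a").first().absUrl("href")); // https://alphacoders.com/site/faq
    }
}
```

Parsing a saved copy of the page this way keeps the traversal code identical while avoiding repeated network traffic during development.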

The printed page is too long to reproduce here.

Let's take this list of site links as an example:

About Us FAQ Privacy Policy Terms Of Service Acceptable Use Etiquette Advertise With Us Change Consent

First, find all the ul elements:

Elements elements = doc.getElementsByTag("ul");
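getElementsByTag searches every descendant of the element it is called on, so nested lists are returned as well. The sketch below assumes jsoup is on the classpath and uses hypothetical sample markup; it prints each ul's class attribute and its number of direct children, then returns the total count.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class FindLists {
    static int countLists(String html) {
        Document doc = Jsoup.parse(html);
        // Searches the whole subtree, so nested <ul> elements are included
        Elements lists = doc.getElementsByTag("ul");
        for (Element ul : lists) {
            // className() returns the raw class attribute, or "" if there is none
            System.out.println("ul class=\"" + ul.className() + "\" children=" + ul.children().size());
        }
        return lists.size();
    }

    public static void main(String[] args) {
        // Hypothetical markup: a pagination list and a link list
        String html = "<ul class=\"pagination\"><li>1</li><li>2</li></ul>"
                + "<ul class=\"nav nav-pills\"><li><a href=\"/site/faq\">FAQ</a></li></ul>";
        System.out.println(countLists(html)); // 2
    }
}
```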

The output is as follows:

Submit a fine prize

Chinese-flag

Chinese login registration

< Previous 1 2 3 4 5 6 7 8 9 10 ... 319 Next >

< Previous 1 2 3 4 5 6 7 ... 319 Next >

> 1 2 3 4 5 6 7 About Us FAQ Privacy Policy Terms Of Service Acceptable Use Etiquette Advertise With Us Change Consent

Only one of these lists has the class "nav nav-pills", so we locate it:

Elements elements = doc.getElementsByTag("ul");
// System.out.println(elements);
Element tempElement = null;
for (Element element : elements) {
    // Keep the first <ul> whose class attribute is exactly "nav nav-pills"
    if (element.className().equals("nav nav-pills")) {
        tempElement = element;
        // System.out.println(element.className());
        break;
    }
}
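Comparing className() with equals matches the class attribute as one exact string, so a list written as class="nav-pills nav" would be missed. The sketch below, assuming jsoup is on the classpath and using hypothetical sample markup, contrasts that check with Element.hasClass and with a CSS selector, both of which test individual class names in any order.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class FindNav {
    // Hypothetical markup: same classes as the target list, in a different order
    static final String HTML = "<ul class=\"nav-pills nav\"><li><a href=\"/a\">A</a></li></ul>";

    // Exact string comparison, as in the loop above: sensitive to class order and spacing
    static Element byClassName(Document doc) {
        for (Element ul : doc.getElementsByTag("ul")) {
            if (ul.className().equals("nav nav-pills")) return ul;
        }
        return null;
    }

    // hasClass checks one class name at a time
    static Element byHasClass(Document doc) {
        for (Element ul : doc.getElementsByTag("ul")) {
            if (ul.hasClass("nav") && ul.hasClass("nav-pills")) return ul;
        }
        return null;
    }

    // CSS selector: any <ul> carrying both classes; first() is null when nothing matches
    static Element bySelector(Document doc) {
        return doc.select("ul.nav.nav-pills").first();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        System.out.println(byClassName(doc) != null); // false
        System.out.println(byHasClass(doc) != null);  // true
        System.out.println(bySelector(doc) != null);  // true
    }
}
```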

Then iterate over each li in that ul and print the href and rel attributes of every a element inside it:

Elements li = tempElement.getElementsByTag("li");
for (Element element : li) {
    Elements element2 = element.getElementsByTag("a");
    for (Element element3 : element2) {
        String hrefString = element3.attr("href");
        String relString = element3.attr("rel");
        // attr() returns "" when the attribute is absent; test the content with isEmpty(), not !=
        if (!hrefString.isEmpty() && !relString.isEmpty()) {
            System.out.println("href=" + hrefString + " rel=" + relString);
        }
    }
}
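In Java, != on strings compares object references rather than characters, so a check such as hrefString != "" can behave differently depending on how the string was created. The sketch below uses only the JDK to show the difference; bothPresent is a hypothetical helper mirroring the non-empty check used in the loop.

```java
public class StringCompare {
    // Hypothetical helper: true only when both attribute values contain text
    static boolean bothPresent(String href, String rel) {
        // && short-circuits; isEmpty() inspects the content, not the reference
        return !href.isEmpty() && !rel.isEmpty();
    }

    public static void main(String[] args) {
        String literal = "";
        String built = new String(""); // same content, distinct object
        System.out.println(built != literal);      // true: != compares references
        System.out.println(built.equals(literal)); // true: equals compares content
        System.out.println(bothPresent("https://alphacoders.com/site/faq", "nofollow")); // true
        System.out.println(bothPresent("", "nofollow"));                                 // false
    }
}
```

The single & used in some versions of this check does work on booleans, but it always evaluates both sides; && is the idiomatic choice for conditions like this.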

Final result:

href=https://alphacoders.com/site/about-us rel=nofollow
href=https://alphacoders.com/site/faq rel=nofollow
href=https://alphacoders.com/site/privacy rel=nofollow
href=https://alphacoders.com/site/tos rel=nofollow
href=https://alphacoders.com/site/acceptable_use rel=nofollow
href=https://alphacoders.com/site/etiquette rel=nofollow
href=https://alphacoders.com/site/advertising rel=nofollow
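The nested loops can also be replaced by a single CSS selector passed to Document.select. The sketch below assumes jsoup is on the classpath and uses hypothetical sample markup. One difference from the loop: [href][rel] only checks that the attributes are present, so an anchor with an empty rel="" would still match, whereas the isEmpty() check would skip it.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SelectLinks {
    // Collects "href=... rel=..." for each <a> with both attributes inside ul.nav.nav-pills
    static List<String> links(Document doc) {
        List<String> out = new ArrayList<>();
        for (Element a : doc.select("ul.nav.nav-pills li a[href][rel]")) {
            // attr() returns the raw attribute value as written in the markup
            out.add("href=" + a.attr("href") + " rel=" + a.attr("rel"));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical markup: the second link has no rel attribute and is skipped
        String html = "<ul class=\"nav nav-pills\">"
                + "<li><a href=\"https://alphacoders.com/site/about-us\" rel=\"nofollow\">About Us</a></li>"
                + "<li><a href=\"https://alphacoders.com/site/faq\">FAQ</a></li>"
                + "</ul>";
        // Prints: href=https://alphacoders.com/site/about-us rel=nofollow
        links(Jsoup.parse(html)).forEach(System.out::println);
    }
}
```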

Complete code:

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

/**
 * @ClassName: Jsoup_Test
 * @description:
 * @author: KI
 * @Date: 8:15:14
 */
public class Jsoup_Test {
    public static void main(String[] args) throws IOException {
        String html = "https://wall.alphacoders.com/featured.php?lang=Chinese";
        // Fetch the page and parse it into a Document
        Document doc = Jsoup.connect(html).get();
        System.out.println(doc);

        // Find the first <ul> whose class attribute is exactly "nav nav-pills"
        Elements elements = doc.getElementsByTag("ul");
        // System.out.println(elements);
        Element tempElement = null;
        for (Element element : elements) {
            if (element.className().equals("nav nav-pills")) {
                tempElement = element;
                // System.out.println(element.className());
                break;
            }
        }
        System.out.println(tempElement);
        // Stop here instead of throwing a NullPointerException if the list is missing
        if (tempElement == null) {
            return;
        }

        // Print href and rel for every <a> inside each <li> of that list
        Elements li = tempElement.getElementsByTag("li");
        for (Element element : li) {
            Elements element2 = element.getElementsByTag("a");
            for (Element element3 : element2) {
                String hrefString = element3.attr("href");
                String relString = element3.attr("rel");
                if (!hrefString.isEmpty() && !relString.isEmpty()) {
                    System.out.println("href=" + hrefString + " rel=" + relString);
                }
            }
        }
    }
}

That covers how a Java crawler can use Jsoup's DOM methods to traverse a Document object. The specific usage is best verified in practice.


© 2024 shulou.com SLNews company. All rights reserved.
