
A web crawler in Java

Updated 2025-01-29. From: SLTechnology News&Howtos (shulou)


Shulou (Shulou.com), 06/03

I have recently been studying how a crawler collects the links on a website. The crawler needs to:

1. Get all hyperlinks on each page.

2. Skip links that have already been crawled and remove duplicates.

3. Control the breadth and depth of the crawl (PS: I have not studied this part thoroughly yet).
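Requirements 1 and 2 can be sketched on their own. The snippet below is a minimal illustration, not the author's code: the class name `LinkExtractor`, the method `extractNewLinks`, and the regular expression are all assumptions made for this example, and a regex only approximates real HTML parsing.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkExtractor {
    // Hypothetical pattern: captures the value of href="..." or href='...' inside an <a> tag.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    /** Returns the absolute http(s) links found in html that are not yet in visited. */
    static Set<String> extractNewLinks(String html, Set<String> visited) {
        Set<String> found = new LinkedHashSet<>(); // a set drops duplicates, insertion order is kept
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String link = m.group(1).trim();
            int hash = link.indexOf('#');
            if (hash >= 0) {
                link = link.substring(0, hash); // ignore the fragment part of the link
            }
            boolean absolute = link.startsWith("http://") || link.startsWith("https://");
            if (absolute && !visited.contains(link)) {
                found.add(link);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Set<String> visited = new HashSet<>();
        visited.add("http://example.com/b"); // pretend this page was crawled earlier

        String html = "<a href=\"http://example.com/a\">A</a>"
                + "<A HREF='http://example.com/b#top'>B</A>"
                + "<a href=\"http://example.com/a\">A again</a>"
                + "<a href=\"/relative\">R</a>";

        System.out.println(extractNewLinks(html, visited)); // prints [http://example.com/a]
    }
}
```

Relative links are simply skipped here to keep the sketch short. A fuller version might resolve them against the page URL before the duplicate check.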

The following is the implementation code:

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class study {
    private static List<String> waitforUrl = new ArrayList<>();        // URLs found but not yet crawled
    private static Set<String> goforUrl = new HashSet<>();             // URLs already crawled
    private static Map<String, Integer> allUrldepth = new HashMap<>(); // crawl depth recorded for each URL
    private static int Maxdepth = 2;                                   // maximum crawl depth

    public static void main(String[] args) {
        String urlstr = "...";  // the start URL is missing from the source
        study.gourl(urlstr, 1);
    }

    public static void gourl(String urlstr, int depath) {
        // Skip URLs that were already crawled or that lie beyond the depth limit
        if (!(goforUrl.contains(urlstr) || depath > Maxdepth)) {
            goforUrl.add(urlstr);
            try {
                URL url = new URL(urlstr);
                URLConnection urlConn = url.openConnection();               // open a connection to the URL
                InputStream is = urlConn.getInputStream();                  // get the page content as a byte stream
                InputStreamReader isr = new InputStreamReader(is, "utf-8"); // decode the byte stream into characters
                BufferedReader br = new BufferedReader(isr);                // buffer the characters for line-by-line reads
                StringBuffer sb = new StringBuffer();                       // holds the data that has been read
                String line = null;
                while ((line = br.readLine()) != null) {
                    sb.append(line);
                    // System.out.println(line);
                    Pattern p = Pattern.compile("

[The listing is cut off at this point in the source.]
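For requirement 3, one common way to bound a crawl is a breadth-first traversal with a depth limit. The sketch below is an illustration under stated assumptions, not the missing remainder of the listing above: `BreadthFirstCrawl`, `crawl`, and the `linksOf` callback (a stand-in for downloading a page and extracting its links) are hypothetical names, and the demo runs on an in-memory link graph rather than the network.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class BreadthFirstCrawl {
    /**
     * Visits URLs level by level, starting from start at depth 1, and returns every
     * discovered URL with the depth at which it was first seen. URLs at the depth
     * limit are recorded, but their own links are not followed.
     */
    static Map<String, Integer> crawl(String start, int maxDepth,
                                      Function<String, List<String>> linksOf) {
        Map<String, Integer> depthOf = new LinkedHashMap<>(); // also serves as the visited set
        Deque<String> queue = new ArrayDeque<>();             // URLs waiting to be expanded
        if (maxDepth < 1) {
            return depthOf;
        }
        depthOf.put(start, 1);
        queue.add(start);
        while (!queue.isEmpty()) {
            String url = queue.poll();
            int depth = depthOf.get(url);
            if (depth >= maxDepth) {
                continue; // depth limit reached: do not expand this page
            }
            for (String next : linksOf.apply(url)) {
                if (!depthOf.containsKey(next)) { // skip URLs that were already seen
                    depthOf.put(next, depth + 1);
                    queue.add(next);
                }
            }
        }
        return depthOf;
    }

    public static void main(String[] args) {
        // Hypothetical link graph standing in for real pages.
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("a", Arrays.asList("b", "c"));
        graph.put("b", Arrays.asList("d", "a"));
        graph.put("c", Arrays.asList("d"));
        graph.put("d", Arrays.asList("e"));

        Map<String, Integer> result = crawl("a", 3,
                url -> graph.getOrDefault(url, Collections.<String>emptyList()));
        System.out.println(result); // prints {a=1, b=2, c=2, d=3}
    }
}
```

Using a queue gives breadth-first order, so each URL is recorded at its shallowest depth. Swapping the queue for a stack (or recursing, as `gourl` does) would give depth-first order instead, where a URL may first be reached along a longer path.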
