2025-01-29 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
I recently studied how a crawler collects the links on a website. The goals were:
1. Get all hyperlinks on a page.
2. Skip links that have already been crawled, and remove duplicates.
3. Look into breadth-first and depth-first crawling (PS: I have not studied this thoroughly yet).
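Since the third point is the least explored, here is a minimal sketch of a breadth-first crawl with a depth limit. It is not the code from this article: the class name `BreadthFirstSketch`, the method `crawl`, and the in-memory `site` map (page to outgoing links, standing in for real HTTP requests) are all hypothetical. Depth starts at 1 for the start page, matching the depth counter used in the article's code.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch: breadth-first traversal with a depth limit.
// The "site" is an in-memory map from page to outgoing links, so no network is needed.
class BreadthFirstSketch {

    // Returns every page reachable within maxDepth, mapped to the depth at which it was first seen.
    static Map<String, Integer> crawl(Map<String, List<String>> links, String start, int maxDepth) {
        Map<String, Integer> depthOf = new LinkedHashMap<>(); // also serves as the "already seen" set
        Queue<String> waiting = new ArrayDeque<>();           // pages discovered but not yet expanded
        if (maxDepth < 1) {
            return depthOf;
        }
        depthOf.put(start, 1);
        waiting.add(start);
        while (!waiting.isEmpty()) {
            String page = waiting.remove();
            int depth = depthOf.get(page);
            if (depth >= maxDepth) {
                continue; // pages at the limit are recorded, but their links are not followed
            }
            for (String next : links.getOrDefault(page, List.of())) {
                if (!depthOf.containsKey(next)) { // skip pages that were already discovered
                    depthOf.put(next, depth + 1);
                    waiting.add(next);
                }
            }
        }
        return depthOf;
    }

    public static void main(String[] args) {
        // Assumed toy link graph, for illustration only.
        Map<String, List<String>> site = Map.of(
                "/", List.of("/a", "/b"),
                "/a", List.of("/b", "/c"),
                "/b", List.of("/"),
                "/c", List.of("/d"));
        System.out.println(crawl(site, "/", 2)); // prints {/=1, /a=2, /b=2}
    }
}
```

A queue gives breadth-first order, so every page is recorded at its shallowest depth. Replacing the queue with a stack (or using recursion, as the article's code does) gives depth-first order, in which a page may first be reached by a longer path.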
The following is the implementation code:
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class study {
    private static List<String> waitforUrl = new ArrayList<>();         // URLs found but not yet crawled
    private static Set<String> goforUrl = new HashSet<>();              // URLs already crawled
    private static Map<String, Integer> allUrldepth = new HashMap<>();  // crawl depth of every URL
    private static int Maxdepth = 2;                                    // maximum crawl depth

    public static void main(String[] args) {
        String urlstr = ".;
        study.gourl(urlstr, 1);
    }

    public static void gourl(String urlstr, int depath) {
        // Crawl only URLs that have not been visited and are within the depth limit
        if (!(goforUrl.contains(urlstr) || depath > Maxdepth)) {
            goforUrl.add(urlstr);
            try {
                URL url = new URL(urlstr);
                URLConnection urlConn = url.openConnection();               // open a connection to the URL
                InputStream is = urlConn.getInputStream();                  // get the page content through the connection
                InputStreamReader isr = new InputStreamReader(is, "utf-8"); // convert the byte stream into a character stream
                BufferedReader br = new BufferedReader(isr);                // buffered reader over the character stream
                StringBuffer sb = new StringBuffer();                       // stores the data that has been read
                String line = null;
                while ((line = br.readLine()) != null) {
                    sb.append(line);
                    // System.out.println(line);
                    Pattern p = Pattern.compile("
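The listing breaks off at the regular expression, so the original pattern is unknown. As a hedged illustration of the link-extraction and de-duplication step only, the sketch below pulls `href` values out of an HTML string, resolves them against the page URL, and collects them in a set. The class `LinkExtractorSketch`, the method `extract`, and the pattern itself are assumptions of mine, not the author's code. A regular expression is also only an approximation for HTML; a real parser would be more robust.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: extract the distinct absolute links from a page's HTML.
class LinkExtractorSketch {

    // Assumed pattern (not from the article): captures the quoted href value of an <a ...> tag.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\shref\\s*=\\s*[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Returns the distinct absolute links in html, in order of first appearance.
    static Set<String> extract(String html, String pageUrl) {
        URI base = URI.create(pageUrl);
        Set<String> found = new LinkedHashSet<>(); // a set drops repeated links automatically
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(1).trim();
            int hash = href.indexOf('#');
            if (hash >= 0) {
                href = href.substring(0, hash); // ignore fragments: they point into the same page
            }
            if (href.isEmpty() || href.startsWith("javascript:") || href.startsWith("mailto:")) {
                continue; // nothing to crawl
            }
            try {
                found.add(base.resolve(href).toString()); // turn relative links into absolute ones
            } catch (IllegalArgumentException e) {
                // malformed link: skip it rather than abort the whole page
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<a href=\"/a\">A</a> <a href='b.html#top'>B</a> <a href=\"/a\">A again</a>";
        for (String link : extract(html, "http://example.com/dir/index.html")) {
            System.out.println(link);
        }
    }
}
```

In a crawler like the one above, each returned link would be checked against the set of crawled URLs and then visited with the depth increased by one.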