2025-01-18 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
In this article, the editor shows how to use regular expressions to implement a simple web crawler in Java. The example is small but complete: it reads a web page and extracts every mailbox (email) address it finds.
Approach:
1. To simulate a web crawler, first deploy a 1.html page on a local Tomcat server. Deployment step: create a new 1.html in the ROOT directory under the webapps directory of the Tomcat installation, then edit it with Notepad++ so that it contains some mailbox addresses for the crawler to find.
2. Use URL to connect to the web page.
3. Get the input stream, which is used to read the content of the web page.
4. Define the regular expression. Because we are crawling mailbox information from the web page, we create a regular expression that matches a mailbox: String regex = "\\w+@\\w+(\\.\\w+)+";
5. Put the extracted data into a collection.
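Steps 4 and 5 can be tried on their own, without Tomcat, by running the pattern over an in-memory string. The following is a minimal sketch: the class name MailRegexSketch, the method extractMails, and the sample addresses are made up for illustration, and the pattern (word characters, an @ sign, then a dotted domain) is deliberately simple and does not cover every valid email address.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MailRegexSketch {
    // Simple mailbox pattern: word characters, '@', then a domain with at least one dot.
    private static final Pattern MAIL = Pattern.compile("\\w+@\\w+(\\.\\w+)+");

    // Hypothetical helper: returns every substring of text that matches MAIL, in order.
    static List<String> extractMails(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = MAIL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Sample text invented for this sketch.
        String sample = "Contact: alice@example.com, bob@mail.example.org; no mail here.";
        for (String mail : extractMails(sample)) {
            System.out.println(mail);
        }
    }
}
```

Compiling the pattern once into a static field avoids recompiling it on every call, which matters when the same expression is applied to many lines of a page.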
Code:
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/*
 * Web crawler: a program that collects data matching specified rules from the Internet.
 */
public class RegexDemo {
    public static void main(String[] args) throws Exception {
        List<String> list = getMailByWeb();
        for (String str : list) {
            System.out.println(str);
        }
    }

    private static List<String> getMailByWeb() throws Exception {
        // 1. Connect to the web page using URL.
        String path = "http://localhost:8080/1.html";
        URL url = new URL(path);
        // 2. Get the input stream and wrap it in a buffered reader.
        InputStream is = url.openStream();
        BufferedReader br = new BufferedReader(new InputStreamReader(is));
        // 3. Extract the data that matches the mailbox rule.
        String regex = "\\w+@\\w+(\\.\\w+)+";
        // Encapsulate the regular expression in a Pattern object.
        Pattern p = Pattern.compile(regex);
        // Collection that holds the extracted data.
        List<String> list = new ArrayList<>();
        String line = null;
        while ((line = br.readLine()) != null) {
            // Create a matcher for the current line.
            Matcher m = p.matcher(line);
            while (m.find()) {
                // Store each match in the collection.
                list.add(m.group());
            }
        }
        br.close();
        return list;
    }
}
Note: start the Tomcat server before running the program.
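One possible variation, shown here only as a sketch, separates the matching logic from the network access so that it can be exercised without a running server. The class name MailCrawlerSketch and the methods collectMails and collectMailsFromUrl are hypothetical; the sketch also assumes the page is encoded as UTF-8 and that duplicate addresses should be reported only once, neither of which is stated in the original example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MailCrawlerSketch {
    // Same simple mailbox pattern as in the example above.
    private static final Pattern MAIL = Pattern.compile("\\w+@\\w+(\\.\\w+)+");

    // Reads the source line by line and keeps each distinct match in first-seen order.
    static Set<String> collectMails(Reader source) {
        Set<String> mails = new LinkedHashSet<>();
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                Matcher m = MAIL.matcher(line);
                while (m.find()) {
                    mails.add(m.group());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return mails;
    }

    // Opens the page at the given address; UTF-8 is an assumption of this sketch.
    static Set<String> collectMailsFromUrl(String address) throws IOException {
        return collectMails(new InputStreamReader(
                URI.create(address).toURL().openStream(), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // In-memory stand-in for a page, invented for this sketch.
        String html = "<p>a@example.com</p>\n<p>b@example.org and a@example.com</p>";
        System.out.println(collectMails(new StringReader(html)));
    }
}
```

With Tomcat running, calling collectMailsFromUrl with the address of the deployed page would read the real page instead of the in-memory string.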
That is how to use regular expressions to implement a simple web crawler. If you have run into a similar problem, the analysis above may serve as a reference.