In this article, we look at how to use regular expressions to implement a simple web crawler. The material is presented from a practical point of view, and hopefully you will get something out of it.
Approach:
1. To give the crawler a page to work on, deploy a 1.html page on a Tomcat server. Deployment step: create a new 1.html under the ROOT directory inside Tomcat's webapps directory and edit it with Notepad++ (or any text editor) so that the page body contains a few email addresses for the crawler to extract.
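The original sample content is not reproduced in the source; as a stand-in, a minimal 1.html could look like the following (the addresses are placeholders, any addresses will do):

<html>
<head><title>mail list</title></head>
<body>
abc@163.com<br/>
test@sina.com<br/>
hello@qq.com<br/>
</body>
</html>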
2. Use the URL class to connect to the web page.
3. Get the input stream, which is used to read the content of the page.
4. Define a regular rule. Since we are crawling the email addresses in the page, create a regular expression that matches an email address: \w+@\w+(\.\w+)+ (written as the Java string literal "\\w+@\\w+(\\.\\w+)+"). A short standalone check of this pattern appears after this list.
5. Put the extracted data into the collection.
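Before wiring up the crawler, the pattern from step 4 can be tried on its own. The following is a minimal sketch; the class name RegexCheck and the sample string are just for illustration:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexCheck {
    public static void main(String[] args) {
        // Sample text containing two email addresses
        String text = "contact: abc@163.com or test@sina.com";
        // The same pattern the crawler below uses
        Pattern p = Pattern.compile("\\w+@\\w+(\\.\\w+)+");
        Matcher m = p.matcher(text);
        while (m.find()) {
            // Prints abc@163.com, then test@sina.com
            System.out.println(m.group());
        }
    }
}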
Code:
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/*
 * Web crawler: a program used to obtain data matching a specified rule from the Internet.
 */
public class RegexDemo {
    public static void main(String[] args) throws Exception {
        List<String> list = getMailByWeb();
        for (String str : list) {
            System.out.println(str);
        }
    }

    private static List<String> getMailByWeb() throws Exception {
        // 1. Establish a connection with the web page
        String path = "http://localhost:8080/1.html";
        URL url = new URL(path);
        // 2. Get the input stream and wrap it in a buffered reader
        InputStream is = url.openStream();
        BufferedReader br = new BufferedReader(new InputStreamReader(is));
        // 3. Regular expression that matches email addresses
        String regex = "\\w+@\\w+(\\.\\w+)+";
        // Encapsulate the regular rule in a Pattern object
        Pattern p = Pattern.compile(regex);
        // 4. Read the page line by line and put every match into a collection
        List<String> list = new ArrayList<>();
        String line;
        while ((line = br.readLine()) != null) {
            Matcher m = p.matcher(line);
            while (m.find()) {
                // Store the data that matches the rule in the collection
                list.add(m.group());
            }
        }
        br.close();
        return list;
    }
}
Note: the Tomcat server must be running before the program is executed.
Running result:
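The output depends on the addresses placed in 1.html; with the placeholder page sketched in step 1, the program would print each extracted address on its own line, roughly:

abc@163.com
test@sina.com
hello@qq.com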
That is how to use regular expressions to implement a simple web crawler. If you have had similar questions, the analysis above may help clear them up; to learn more, you are welcome to follow the industry information channel.