This article mainly introduces how to implement a crawler in Java. It has some reference value; interested friends can refer to it, and I hope you gain a lot from reading it. Now let the editor walk you through it.
I. Code
The essence of a crawler is to open a web page's source code, search it for matches, and then collect the results it finds.
Open the web page:
URL url = new URL("http://www.cnblogs.com/Renyi-Fan/p/6896901.html");
Read the contents of the web page:
BufferedReader bufr = new BufferedReader(new InputStreamReader(url.openStream()));
Match with a regular expression:
String mail_regex = "\\w+@\\w+(\\.\\w+)+";
Save the results:
List<String> list = new ArrayList<String>();
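Before wiring these pieces together, it can help to check what the email pattern actually extracts. Below is a minimal, self-contained sketch; the class name, the sample line, and the addresses in it are invented purely for illustration:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexCheck {
    public static void main(String[] args) {
        // Invented sample line, standing in for one line of a page's source.
        String line = "Contact: alice@example.com or bob@mail.example.org for details.";
        Pattern p = Pattern.compile("\\w+@\\w+(\\.\\w+)+");
        Matcher m = p.matcher(line);
        List<String> found = new ArrayList<String>();
        while (m.find()) {
            // group() returns the text of the current match.
            found.add(m.group());
        }
        System.out.println(found); // prints [alice@example.com, bob@mail.example.org]
    }
}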
/*
 * How the regex classes are used:
 * Encapsulate the regular rule into an object.
 * Pattern p = Pattern.compile("a*b");
 * Associate the regex object with a string via the matcher method; this
 * returns a Matcher object that is used to operate on the string.
 * Matcher m = p.matcher("aaaaab");
 * Manipulate the string through the Matcher object's methods.
 * boolean b = m.matches();
 */
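Assembled into a runnable snippet, that walkthrough looks like this; the pattern "a*b" and the input "aaaaab" are the ones from the comment above:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherDemo {
    public static void main(String[] args) {
        // Encapsulate the regular rule into a Pattern object.
        Pattern p = Pattern.compile("a*b");
        // Associate the pattern with the target string to get a Matcher.
        Matcher m = p.matcher("aaaaab");
        // matches() succeeds only if the entire string fits the rule.
        boolean b = m.matches();
        System.out.println(b); // true: five 'a's followed by one 'b'
    }
}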
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Spider {
    public static void main(String[] args) throws IOException {
        // List<String> list = getMails();
        // for (String mail : list) {
        //     System.out.println(mail);
        // }
        List<String> list = getMailsByWeb();
        for (String mail : list) {
            System.out.println(mail);
        }
    }

    public static List<String> getMailsByWeb() throws IOException {
        // 1. Read the source of the page.
        // URL url = new URL("http://192.168.1.100:8080/myweb/mail.html");
        // URL url = new URL("http://localhost:8080/SecondWeb/index.jsp");
        URL url = new URL("http://www.cnblogs.com/Renyi-Fan/p/6896901.html");
        // Note: without an explicit charset, InputStreamReader decodes with the
        // platform default, which may garble non-ASCII pages.
        BufferedReader bufr = new BufferedReader(new InputStreamReader(url.openStream()));
        // 2. Match the data read against the regex; keep what conforms to the rule.
        String mail_regex = "\\w+@\\w+(\\.\\w+)+";
        List<String> list = new ArrayList<String>();
        Pattern p = Pattern.compile(mail_regex);
        String line = null;
        while ((line = bufr.readLine()) != null) {
            Matcher m = p.matcher(line);
            while (m.find()) {
                // 3. Store data that conforms to the rule in the collection.
                list.add(m.group());
            }
        }
        return list;
    }

    public static List<String> getMails() throws IOException {
        // 1. Read the source file from disk instead of the network.
        BufferedReader bufr = new BufferedReader(new FileReader("c:\\mail.html"));
        // 2. Match the data read against the regex; keep what conforms to the rule.
        String mail_regex = "\\w+@\\w+(\\.\\w+)+";
        List<String> list = new ArrayList<String>();
        Pattern p = Pattern.compile(mail_regex);
        String line = null;
        while ((line = bufr.readLine()) != null) {
            Matcher m = p.matcher(line);
            while (m.find()) {
                // 3. Store data that conforms to the rule in the collection.
                list.add(m.group());
            }
        }
        return list;
    }
}

Thank you for reading this article carefully. I hope the article "How to Implement a Crawler in Java" shared by the editor is helpful to everyone. I also hope you will support and follow the industry information channel, where more related knowledge is waiting for you!