2025-02-23 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
How does a Java web page data collector store the data it collects? This article analyzes the problem in detail and presents a solution, in the hope of helping readers who face the same problem find a simpler, more practical approach.
Brief introduction:
Java is one of the most widely used programming languages, and application developers value it for its efficiency, portability (it is cross-platform), robust code, and extensibility. In a language this capable, regular expression support is essential, and mastering regular expressions is a mark of an experienced programmer. For anyone doing website development (especially front-end development), regular expressions are a must-have skill.
Recently, for a project, I wrote a program in Java that uses regular expressions to collect data from a football website. Since this was my first time using Java to collect data from HTML pages, I searched the Internet extensively, but I found very few (Chinese-language) articles on using Java regular expressions for HTML collection. Most only explain the concepts behind Java regular expressions without applying them to real web page HTML, and worked example tutorials are even scarcer (although HTML Parser, a very powerful Java library, already exists). Given how popular regular expressions are, I felt there should be plenty of Java example tutorials on the topic. So, after finishing the Java version of the HTML data collection program, I decided to write a series on using Java regular expressions to collect data from HTML pages, so that interested readers can learn from it.
To make the collected data easy to use later, this article explains how to store it in a MySQL database.
Data collection page: Premier League match records for the 2011-2012 season
About operating MySQL from Java
Before using Java to work with a MySQL database, we need to import the driver jar package (mysql-connector-java-5.1.18-bin) into the project.
You can download Connector/J 5.1.18 from the official MySQL website.
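Once the jar is on the project's build path, Java code talks to MySQL through JDBC. The sketch below is a hypothetical helper (`MySqlSetup` and its methods are illustrative names, not part of the article): it checks whether the Connector/J 5.1 driver class `com.mysql.jdbc.Driver` can be loaded and builds a JDBC URL. The host, port 3306, database name and URL parameters are assumptions for a local installation. With a JDBC 4 driver such as Connector/J 5.1, `DriverManager` can normally locate the driver by itself, so the explicit check mainly gives a clear error when the jar is missing.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper; names and defaults are illustrative assumptions.
final class MySqlSetup {
    // Driver class shipped in the Connector/J 5.1 jar.
    static final String DRIVER_CLASS = "com.mysql.jdbc.Driver";

    // Returns true if the named class can be loaded, i.e. its jar is on the classpath.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Builds a JDBC URL; host, port and database are supplied by the caller.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useUnicode=true&characterEncoding=utf-8";
    }

    // Opens a connection; the caller is responsible for closing it.
    static Connection open(String url, String user, String password) throws SQLException {
        if (!isOnClasspath(DRIVER_CLASS)) {
            throw new SQLException("Connector/J jar not found on the classpath");
        }
        return DriverManager.getConnection(url, user, password);
    }
}
```

For example, `MySqlSetup.open(MySqlSetup.jdbcUrl("localhost", 3306, "htmldatacollection"), "root", "")` would connect to the htmldatacollection database created later in this article, assuming a local server that accepts those credentials. Newer Connector/J releases (8.x) rename the driver class to `com.mysql.cj.jdbc.Driver`.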
Using MySQL for the first time? See the article on connecting Java to MySQL.
How do you import a jar package into a Java project?
See the article on importing jar packages in Eclipse.
About the MySQL database
If you are a beginner who wants to use MySQL, you can download the XAMPP package from the Chinese XAMPP website.
XAMPP (Apache+MySQL+PHP+PERL) is a powerful integrated software package for building websites. It installs with one click and needs no changes to configuration files, so it is very easy to use.
All right, everything we need is ready. Let's start writing code.
Open MySQL and create the database and table (copy the following statements into MySQL and execute them directly):
-- create the database htmldatacollection
CREATE DATABASE htmldatacollection;
-- switch to htmldatacollection before creating the table
USE htmldatacollection;
-- create the table Premiership to store the collected data;
-- for convenience, every field is a string
CREATE TABLE Premiership (
    Date varchar(15),
    HomeTeam varchar(20),
    AwayTeam varchar(20),
    Result varchar(20)
);
Once they are created, let's take a look at the database structure.
The database is ready, so let's implement the Java code.
Here is a brief introduction to each class and the methods it contains.
The DataStorage class and its dataStore() method collect the data and store it:
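Each collected match ends up as one row of the Premiership table. As a minimal sketch, assuming an already-open `java.sql.Connection`, a row can be inserted with a `PreparedStatement`; `PremiershipDao` and `insert` are hypothetical names and are not the article's `MySql` class. Binding the four values as parameters, rather than concatenating them into the SQL string, means a team name containing an apostrophe cannot break the statement.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical data-access helper for the Premiership table.
final class PremiershipDao {
    // Column names match the CREATE TABLE statement for Premiership.
    static final String INSERT_SQL =
            "INSERT INTO Premiership (Date, HomeTeam, AwayTeam, Result) VALUES (?, ?, ?, ?)";

    // Inserts one match row using the caller's open connection; returns the update count.
    static int insert(Connection conn, String date, String homeTeam, String awayTeam, String result)
            throws SQLException {
        // try-with-resources closes the statement but leaves the connection open
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setString(1, date);
            ps.setString(2, homeTeam);
            ps.setString(3, awayTeam);
            ps.setString(4, result);
            return ps.executeUpdate();
        }
    }
}
```

The method deliberately leaves the connection open, so one connection can be reused for every match on the page and closed once at the end.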
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;

public class DataStorage {
    public void dataStore() {
        // first, load the web page link into a string
        String strUrl = "http://www.footballresults.org/league.php?all=1&league=EngPrem";
        String sqlLeagues = "";
        try {
            // create a URL object pointing to the page; the link is passed in the parentheses
            // for more information, see http://wenku.baidu.com/view/8186caf4f61fb7360b4c6547.html
            URL url = new URL(strUrl);
            // InputStreamReader converts the bytes it reads into characters
            // for more information, see http://blog.sina.com.cn/s/blog_44a05959010004il.html
            // utf-8 encoding is used throughout
            InputStreamReader isr = new InputStreamReader(url.openStream(), "utf-8");
            // BufferedReader reads the characters converted by InputStreamReader
            BufferedReader br = new BufferedReader(isr);
            String strRead = ""; // holds each line read by BufferedReader
            // three regular expressions used to extract the data we need
            String regularDate = "(\\d{1,2}\\.\\d{1,2}\\.\\d{4})";
            String regularTwoTeam = ">[^<>]*</a>";
            String regularResult = ">(\\d{1,2}-\\d{1,2})";
            // GroupMethod object, used later to call its regularGroup method
            GroupMethod gMethod = new GroupMethod();
            // DataStructure object, used to hold the collected data
            DataStructure ds = new DataStructure();
            // MySql object, used to execute the MySQL statements
            MySql ms = new MySql();
            int i = 0;     // counts loop iterations, i.e. the number of match results collected
            int index = 0; // tells the two teams apart, because both are matched by the same pattern
            // read the page line by line until there is no more data
            while ((strRead = br.readLine()) != null) {
                /* capture the date */
                String strGet = gMethod.regularGroup(regularDate, strRead);
                // if a date was captured, record it (the print statement is commented out)
                if (!strGet.equals("")) {
                    // System.out.println("Date:" + strGet);
                    // store the collected date in the data structure
                    ds.date = strGet;
                    // index + 1 so that the team data can be picked up next,
                    // because in the page's HTML source the team data comes right after the date
                    ++index;
                }
                /* capture the two teams */
                strGet = gMethod.regularGroup(regularTwoTeam, strRead);
                if (!strGet.equals("") && index == 1) { // index 1: home team data
                    // separate out the home team name with substring
                    strGet = strGet.substring(1, strGet.indexOf("</a>"));
                    // System.out.println("HomeTeam:" + strGet); // print the home team
                    // store the home team name in the data structure
                    ds.homeTeam = strGet;
                    index++; // index is now 2; the away team is separated the same way
                } else if (!strGet.equals("") && index == 2) { // index 2: away team data
                    strGet = strGet.substring(1, strGet.indexOf("</a>"));
                    // System.out.println("AwayTeam:" + strGet); // print the away team
                    // store the away team name in the data structure
                    ds.awayTeam = strGet;
                    // after the away team, reset index so the next record starts with the home team
                    index = 0;
                }
                /* capture the match result */
                strGet = gMethod.regularGroup(regularResult, strRead);
                if (!strGet.equals("")) {
                    // the substring method is also used here to remove '
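The collection loop in dataStore() boils down to three regular expressions plus the `index` counter, which records whether the next team link belongs to the home team (1) or the away team (2). The self-contained sketch below reproduces that idea under stated assumptions: the HTML markup (dates as plain text, team names inside `<a>` elements, scores such as `2-1` directly inside a tag) is assumed rather than taken from the live page, and `MatchParser`, `Match` and `firstGroup` are hypothetical names standing in for the article's `GroupMethod` and `DataStructure` classes. Unlike the article's code, it captures the team name with a regex group instead of cutting it out with `substring`, and it sets `index` to 1 on each date rather than incrementing it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical line-oriented parser; the markup it expects is an assumption.
final class MatchParser {
    // One collected match; the four fields mirror the Premiership table columns.
    static final class Match {
        final String date, homeTeam, awayTeam, result;

        Match(String date, String homeTeam, String awayTeam, String result) {
            this.date = date;
            this.homeTeam = homeTeam;
            this.awayTeam = awayTeam;
            this.result = result;
        }
    }

    // Assumed markup: d.m.yyyy dates, team names in <a>...</a>, scores like 2-1 inside a tag.
    private static final Pattern DATE = Pattern.compile("\\d{1,2}\\.\\d{1,2}\\.\\d{4}");
    private static final Pattern TEAM = Pattern.compile(">([^<>]+)</a>");
    private static final Pattern RESULT = Pattern.compile(">(\\d{1,2}-\\d{1,2})<");

    // Returns the requested group of the first match in line, or "" when nothing matches.
    static String firstGroup(Pattern pattern, String line, int group) {
        Matcher m = pattern.matcher(line);
        return m.find() ? m.group(group) : "";
    }

    // index 0 = waiting for a date, 1 = next team link is the home team, 2 = the away team.
    static List<Match> parse(BufferedReader reader) throws IOException {
        List<Match> matches = new ArrayList<>();
        String date = "", home = "", away = "";
        int index = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            String found = firstGroup(DATE, line, 0);
            if (!found.isEmpty()) {
                date = found;
                home = "";
                away = "";
                index = 1;
            }
            found = firstGroup(TEAM, line, 1);
            if (!found.isEmpty() && index == 1) {
                home = found;
                index = 2;
            } else if (!found.isEmpty() && index == 2) {
                away = found;
                index = 0;
            }
            found = firstGroup(RESULT, line, 1);
            // a result completes a record only once date, home and away are all known
            if (!found.isEmpty() && !date.isEmpty() && !home.isEmpty() && !away.isEmpty()) {
                matches.add(new Match(date, home, away, found));
                home = "";
                away = "";
            }
        }
        return matches;
    }

    // Opens the page as UTF-8, as the article's code does, and parses it.
    static List<Match> fetch(String url) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(URI.create(url).toURL().openStream(), StandardCharsets.UTF_8))) {
            return parse(reader);
        }
    }
}
```

In real use, `MatchParser.fetch(strUrl)` would read the page and return one `Match` per fixture it recognizes; each could then be written to the Premiership table with a parameterized INSERT.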