How to use jsoup to crawl and parse data in Java

2025-01-19 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/02 Report--

Many newcomers are unsure how to use jsoup to crawl and parse data in Java. This article walks through a working example in detail; I hope it is useful to anyone who needs it.

1. First, analyze the data to crawl:

a. If you can find the interface that requests the attendance data, the browser's developer tools will capture it without trouble. (I didn't manage this fully, so I took a detour, which is described at the end.)

b. Analyze the data format: a simple HTML table, nice!

2. Fetch the attendance data and output it to Excel

Here I use jsoup to crawl the data and dom4j to process it.

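Maven dependencies:

    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.12.1</version>
    </dependency>
    <dependency>
        <groupId>dom4j</groupId>
        <artifactId>dom4j</artifactId>
        <version>1.6.1</version>
    </dependency>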

The code is as follows:

    // Imports needed at the top of the class file.
    // StringUtils comes from whichever utility library the project already uses.
    import java.util.ArrayList;
    import java.util.List;

    import org.jsoup.Connection;
    import org.jsoup.Connection.Method;
    import org.jsoup.Connection.Response;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    /**
     * Simulates the browser request and extracts the attendance data.
     *
     * @param userId user id
     * @param date   date
     */
    @Override
    public List<KaoQinData> getPunchList(Integer userId, String date) throws Exception {
        UserBean user = userMapper.getUser(userId);

        // Assemble the request address, parameters and required request headers
        StringBuilder url = new StringBuilder();
        url.append("http://xxxxxx.com/newkaoqin?userid=");
        url.append(userId);
        url.append("&seldate=");
        url.append(date);
        Connection con2 = Jsoup.connect(url.toString());
        con2.header("User-Agent", USER_AGENT);
        con2.header("Host", HOST);
        con2.header("Referer", REFERER);
        con2.header("Cookie", getCookie());
        Response response = con2.ignoreContentType(true).method(Method.GET).execute();

        // Process the returned data
        String body = response.body();
        Document doc = Jsoup.parse(body);

        // Navigate to the element to parse; any selector that matches it will do
        Elements tab2 = doc.getElementsByClass("tab2");

        // Only the first match is needed; the later ones hold data we do not use
        Element first = tab2.first();

        // KaoQinData is the attendance entity; this list is the returned result
        List<KaoQinData> kaoQinData = new ArrayList<>();
        if (first == null) {
            // first() returns null when no element matched
            return kaoQinData;
        }

        Elements tr = first.select("tr");
        for (Element tr1 : tr) {
            Elements td = tr1.select("td");
            // Cells 0 (date) and 4 (time) are read, so at least 5 cells are required
            if (td.size() > 4) {
                String punchDateStr = td.get(0).text();
                String punchTimeStr = td.get(4).text();
                int colon = punchTimeStr.indexOf(":");
                if (!StringUtils.isEmpty(punchTimeStr) && colon > 0) {
                    Integer hour = Integer.valueOf(punchTimeStr.substring(0, colon));
                    // Keep only punches at 20:00 or later
                    if (hour >= 20) {
                        System.out.println(punchDateStr + "," + punchTimeStr);
                        kaoQinData.add(new KaoQinData(user.getRealName(), punchDateStr, punchTimeStr));
                    }
                }
            }
        }
        return kaoQinData;
    }
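The hour check in the code above takes the text before the first colon and converts it to an integer. A possible alternative, shown here only as a sketch, is to let java.time parse the cell text so that malformed values are rejected in one place. The class name LatePunchCheck, the method isLatePunch, and the assumed cell format (H:mm, optionally followed by :ss) are illustrative assumptions and do not come from the original project.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper, not part of the original project.
final class LatePunchCheck {

    // Assumed cell format: "9:05", "20:15" or "20:15:30".
    private static final DateTimeFormatter TIME_FORMAT =
            DateTimeFormatter.ofPattern("H:mm[:ss]");

    /**
     * Returns true when punchTime parses as a time whose hour is at or
     * after thresholdHour. Blank or unparseable text counts as "not late".
     */
    static boolean isLatePunch(String punchTime, int thresholdHour) {
        if (punchTime == null || punchTime.trim().isEmpty()) {
            return false;
        }
        try {
            LocalTime time = LocalTime.parse(punchTime.trim(), TIME_FORMAT);
            return time.getHour() >= thresholdHour;
        } catch (DateTimeParseException e) {
            // The cell did not contain a time in the assumed format
            return false;
        }
    }
}
```

Inside the loop, the condition would then read something like LatePunchCheck.isLatePunch(punchTimeStr, 20). Whether the real table cells match the assumed format has to be checked against the actual page.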

That is basically all. Export the result to Excel or any other format you prefer.
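The article leaves the export step open. One dependency-free option is to write a CSV file, which Excel can open directly. The sketch below is an assumption about how that could look, not the author's implementation: the class PunchCsvExport, its method names, and the column headers name, date and time are all hypothetical, and each row is passed as a plain String array instead of the KaoQinData entity, whose getters are not shown in the article.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical CSV exporter, not part of the original project.
final class PunchCsvExport {

    // Quotes a field when it contains a comma, a quote or a line break,
    // and doubles any embedded quotes.
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        String escaped = field.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    // Joins the escaped fields into one CSV line.
    static String toCsvLine(String... fields) {
        List<String> out = new ArrayList<>();
        for (String field : fields) {
            out.add(escape(field));
        }
        return String.join(",", out);
    }

    // Writes a header line followed by one line per row, overwriting target.
    static void write(Path target, List<String[]> rows) throws IOException {
        List<String> lines = new ArrayList<>();
        lines.add(toCsvLine("name", "date", "time")); // assumed column names
        for (String[] row : rows) {
            lines.add(toCsvLine(row));
        }
        Files.write(target, lines, StandardCharsets.UTF_8);
    }
}
```

If the names contain non-ASCII characters, Excel may not detect the UTF-8 encoding on its own; a real .xlsx export would need a spreadsheet library, which is outside this sketch.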

PS: I took a detour because I did not implement automatic login to obtain the cookie.

1. A file was created in the project to store the cookie.

2. An interface was written to update the cookie; it overwrites the cookie file with the new cookie.

3. The cookie is read from the cookie file whenever it is needed. (This is a little cumbersome. I will look into automatic login later when I have time.)

    // Imports needed at the top of the class file
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;

    /**
     * Reads the cookie from the cookie file.
     *
     * @return the cookie string, or an empty string if the file was just created
     * @throws IOException if the file cannot be created or read
     */
    private String getCookie() throws IOException {
        int num;
        char[] buf = new char[1024];
        File file = new File(COOKIE_FILEPATH);
        if (!file.exists()) {
            file.createNewFile();
        }
        StringBuilder stringBuilder = new StringBuilder();
        // try-with-resources closes the reader even if reading fails
        try (FileReader fileReader = new FileReader(file)) {
            while ((num = fileReader.read(buf)) != -1) {
                stringBuilder.append(buf, 0, num);
            }
        }
        return stringBuilder.toString();
    }
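Step 2 above describes an interface that overwrites the cookie file, but only the reading side is shown. The following is a minimal sketch of both sides using java.nio.file. The class CookieFileStore and its save and load methods are hypothetical names chosen for illustration; the original project's update interface may look quite different.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical cookie store, not part of the original project.
final class CookieFileStore {

    private final Path file;

    CookieFileStore(Path file) {
        this.file = file;
    }

    // Overwrites the file with the new cookie. Files.write creates the file
    // if it is missing and truncates it if it already exists.
    void save(String cookie) throws IOException {
        Files.write(file, cookie.trim().getBytes(StandardCharsets.UTF_8));
    }

    // Returns the stored cookie, or an empty string when no file exists yet.
    String load() throws IOException {
        if (!Files.exists(file)) {
            return "";
        }
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
    }
}
```

Trimming on both sides guards against a trailing newline, which is easy to pick up when a cookie is pasted from the browser and would otherwise end up in the Cookie request header.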
