What to do when running Nutch reports the error "unzipBestEffort returned null"

2025-03-14 Update From: SLTechnology News&Howtos

This article explains how to fix the error "unzipBestEffort returned null" that Nutch can report while fetching a page, walking through the cause and the code changes needed.

Error message: fetch of http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html failed with: java.io.IOException: unzipBestEffort returned null

The complete error message is:

    2014-03-12 16:... ERROR http.Http - Failed to get protocol output
    java.io.IOException: unzipBestEffort returned null
        at org.apache.nutch.protocol.http.api.HttpBase.processGzipEncoded(HttpBase.java:317)
        at org.apache.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:164)
        at org.apache.nutch.protocol.http.Http.getResponse(Http.java:64)
        at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:140)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:703)
    2014-03-12 16:... INFO fetcher.Fetcher - fetch of http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html failed with: java.io.IOException: unzipBestEffort returned null
    2014-03-12 16:... INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=0

The exception is thrown at line 317, in the processGzipEncoded method of the HttpBase class (lib-http plug-in), file src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java:

    byte[] content;
    if (getMaxContent() >= 0) {
        content = GZIPUtils.unzipBestEffort(compressed, getMaxContent());
    } else {
        content = GZIPUtils.unzipBestEffort(compressed);
    }

    if (content == null)
        throw new IOException("unzipBestEffort returned null");
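To illustrate how a best-effort gzip decoder can end up returning null, here is a minimal Java sketch. It is not the Nutch GZIPUtils source: the class BestEffortGunzipSketch and its methods are hypothetical names, and the behavior is an assumption (return whatever could be decompressed, and null only when nothing could be decoded, for example because the input does not start with a gzip header).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical illustration only; not the Nutch GZIPUtils implementation. */
final class BestEffortGunzipSketch {

    /** Returns the bytes that could be decompressed, or null if nothing could be decoded. */
    static byte[] bestEffortGunzip(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            // An invalid header fails before any output is produced: report null.
            // A stream that breaks part-way keeps the bytes decoded so far.
            if (out.size() == 0) {
                return null;
            }
        }
        return out.toByteArray();
    }

    /** Helper that gzips a byte array, used to build test input. */
    static byte[] gzip(byte[] plain) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] plain = "hello nutch".getBytes(StandardCharsets.UTF_8);
        byte[] ok = bestEffortGunzip(gzip(plain));
        System.out.println(new String(ok, StandardCharsets.UTF_8));

        // Input that starts with a chunk-size line instead of a gzip header.
        byte[] bad = bestEffortGunzip("1a\r\nnot a gzip stream".getBytes(StandardCharsets.US_ASCII));
        System.out.println(bad == null);
    }
}
```

With a decoder of this shape, a null result means the bytes handed in were not recognizable as gzip at all, which points at how the body was read rather than at the decompression step.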

The processGzipEncoded method is called on line 164 of nutch2.7\src\plugin\protocol-http\src\java\org\apache\nutch\protocol\http\HttpResponse.java (protocol-http plug-in):

    readPlainContent(in);

    String contentEncoding = getHeader(Response.CONTENT_ENCODING);
    if ("gzip".equals(contentEncoding) || "x-gzip".equals(contentEncoding)) {
        content = http.processGzipEncoded(content, url);
    } else if ("deflate".equals(contentEncoding)) {
        content = http.processDeflateEncoded(content, url);
    } else {
        if (Http.LOG.isTraceEnabled()) {
            Http.LOG.trace("fetched " + content.length + " bytes from " + url);
        }
    }

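Inspecting the response with the Firebug tool in Firefox shows that this URL returns the headers Content-Encoding: gzip and Transfer-Encoding: chunked. Since the code above always reads the body with readPlainContent, the chunk framing appears to remain in the bytes passed to processGzipEncoded, so they are not a valid gzip stream.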
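HTTP chunked transfer encoding prefixes each piece of the body with its size in hexadecimal followed by CRLF, so a gzip body that is still chunk-framed does not begin with the gzip magic bytes 0x1f 0x8b. The following Java sketch illustrates this under simplifying assumptions (well-formed input, no trailers). The class ChunkedGzipSketch and its helpers are hypothetical and are not Nutch's readChunkedContent.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical illustration of chunk framing around a gzip body. */
final class ChunkedGzipSketch {

    static byte[] gzip(byte[] plain) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** True if the data starts with the gzip magic bytes 0x1f 0x8b. */
    static boolean startsWithGzipMagic(byte[] data) {
        return data.length >= 2 && data[0] == (byte) 0x1f && data[1] == (byte) 0x8b;
    }

    private static void writeAscii(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.US_ASCII);
        out.write(b, 0, b.length);
    }

    /** Wraps a body in chunked framing, using chunks of at most chunkSize bytes. */
    static byte[] toChunked(byte[] body, int chunkSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int off = 0; off < body.length; off += chunkSize) {
            int len = Math.min(chunkSize, body.length - off);
            writeAscii(out, Integer.toHexString(len) + "\r\n");
            out.write(body, off, len);
            writeAscii(out, "\r\n");
        }
        writeAscii(out, "0\r\n\r\n"); // terminating zero-length chunk
        return out.toByteArray();
    }

    /** Strips chunked framing; assumes well-formed input without trailers. */
    static byte[] fromChunked(byte[] framed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pos = 0;
        while (true) {
            int lineEnd = pos;
            while (framed[lineEnd] != '\r') {
                lineEnd++;
            }
            String sizeLine = new String(framed, pos, lineEnd - pos, StandardCharsets.US_ASCII);
            int semi = sizeLine.indexOf(';'); // ignore any chunk extension
            if (semi >= 0) {
                sizeLine = sizeLine.substring(0, semi);
            }
            int size = Integer.parseInt(sizeLine.trim(), 16);
            pos = lineEnd + 2; // skip CRLF after the size line
            if (size == 0) {
                break;
            }
            out.write(framed, pos, size);
            pos += size + 2; // skip chunk data and its trailing CRLF
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] plain = "The quick brown fox jumps over the lazy dog".getBytes(StandardCharsets.UTF_8);
        byte[] framed = toChunked(gzip(plain), 16);
        System.out.println("framed looks like gzip: " + startsWithGzipMagic(framed));
        byte[] body = fromChunked(framed);
        System.out.println("decoded looks like gzip: " + startsWithGzipMagic(body));
        System.out.println(new String(gunzip(body), StandardCharsets.UTF_8));
    }
}
```

The order matters: the chunk framing has to be removed first, and only then can the result be treated as gzip data.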

The solution is as follows:

1. Modify the file nutch2.7\src\java\org\apache\nutch\metadata\HttpHeaders.java and add a field:

    public final static String TRANSFER_ENCODING = "Transfer-Encoding";

2. Modify the file nutch2.7\src\plugin\protocol-http\src\java\org\apache\nutch\protocol\http\HttpResponse.java and replace the call readPlainContent(in) on line 160 with the following code:

    String transferEncoding = getHeader(Response.TRANSFER_ENCODING);
    if (transferEncoding != null && "chunked".equalsIgnoreCase(transferEncoding.trim())) {
        readChunkedContent(in, line);
    } else {
        readPlainContent(in);
    }

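3. The HTTP content length limit (http.content.limit) cannot be set to a negative value here; use a large integer instead, for example in conf/nutch-site.xml:

    <property>
      <name>http.content.limit</name>
      <value>655360000</value>
    </property>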

4. Because both the core code and the plug-in code were modified, recompile and repackage the release by running the default target (runtime) of nutch2.7\build.xml:

    cd nutch2.7
    ant
