How to Use Storm IPResolutionBolt to Write Crawlers
This article introduces how to use Storm's IPResolutionBolt when writing crawlers. In practice, many people run into problems at this step, so let's walk through how to handle these situations. I hope you read it carefully and get something out of it!
package com.digitalpebble.storm.crawler.bolt;

import java.net.InetAddress;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class IPResolutionBolt extends BaseRichBolt {

    public static final Logger LOG = LoggerFactory.getLogger(IPResolutionBolt.class);

    OutputCollector _collector;

    @SuppressWarnings("unchecked")
    public void execute(Tuple tuple) {
        String url = tuple.getStringByField("url");
        HashMap metadata = null;

        // Check whether the incoming tuple carries a "metadata" field.
        if (tuple.contains("metadata"))
            metadata = (HashMap) tuple.getValueByField("metadata");
        // The metadata here is a HashMap holding a collection of objects.

        String ip = null;
        String host = "";
        URL u;
        try {
            u = new URL(url);
            host = u.getHost();
        } catch (MalformedURLException e1) {
            LOG.warn("Invalid URL: " + url);
            // Ack it so that it doesn't get replayed.
            _collector.ack(tuple);
            return;
        }

        try {
            long start = System.currentTimeMillis();
            final InetAddress addr = InetAddress.getByName(host);
            ip = addr.getHostAddress();
            long end = System.currentTimeMillis();
            LOG.info("IP for: " + host + " > " + ip + " in " + (end - start) + " msec");
            // Emit (url, ip, metadata) anchored to the input tuple, then ack it.
            _collector.emit(tuple, new Values(url, ip, metadata));
            _collector.ack(tuple);
        } catch (final Exception e) {
            LOG.warn("Unable to resolve IP for: " + host);
            _collector.fail(tuple);
        }
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("url", "ip", "metadata"));
    }

    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        _collector = collector;
    }
}
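To see the bolt in context, here is a minimal wiring sketch, not from the original article. The topology class, the spout, and the component names are all hypothetical illustrations; the only assumption is that the upstream spout emits the "url" and "metadata" fields that execute() reads.

package com.digitalpebble.storm.crawler;

import java.util.HashMap;
import java.util.Map;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

import com.digitalpebble.storm.crawler.bolt.IPResolutionBolt;

public class IPResolutionTopology {

    // Hypothetical spout: emits ("url", "metadata") tuples, the two fields
    // IPResolutionBolt expects in execute().
    public static class UrlSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;
        private final String[] urls = { "http://storm.apache.org/", "http://example.com/" };
        private int index = 0;

        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
        }

        public void nextTuple() {
            if (index < urls.length) {
                collector.emit(new Values(urls[index++], new HashMap()));
            }
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("url", "metadata"));
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("urls", new UrlSpout(), 1);
        // Run four resolver tasks and shuffle the URLs evenly across them.
        builder.setBolt("ipResolver", new IPResolutionBolt(), 4)
               .shuffleGrouping("urls");

        Config conf = new Config();
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("ip-resolution", conf, builder.createTopology());
    }
}

Note how the bolt anchors its emit to the input tuple before acking it: if a downstream bolt later fails that tuple, Storm can replay the URL from the spout.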
Note that declareOutputFields declares the output tuple as three separate fields, "url", "ip", and "metadata", rather than as a single encapsulated object.
If you need to pass a large number of values per record, make sure to wrap them in a single object, emit that object as one field, and read it on the receiving side with tuple.getValue(0).
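For illustration, here is a hedged sketch of that single-object pattern. The CrawlRecord class and the "record" field name are hypothetical, not part of storm-crawler.

import java.io.Serializable;
import java.util.HashMap;

// Hypothetical carrier class. Storm ships tuple values between workers,
// so the object must be Serializable or have a registered Kryo serializer.
public class CrawlRecord implements Serializable {
    public String url;
    public String ip;
    public HashMap<String, String> metadata;
}

// In the emitting bolt, declare a single field and emit one object:
//   public void declareOutputFields(OutputFieldsDeclarer declarer) {
//       declarer.declare(new Fields("record"));
//   }
//   ...
//   CrawlRecord rec = new CrawlRecord();
//   rec.url = url; rec.ip = ip; rec.metadata = metadata;
//   _collector.emit(tuple, new Values(rec));

// In the receiving bolt's execute(), read it back by position:
//   CrawlRecord rec = (CrawlRecord) tuple.getValue(0);

The trade-off is that a single field hides the schema from field groupings, so use separate fields when you need to group the stream by one of the values.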
"how to use Storm IPResolutionBolt to write crawlers" content is introduced here, thank you for reading. If you want to know more about the industry, you can follow the website, the editor will output more high-quality practical articles for you!