2025-01-19 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article explains how to write a Java program that collects URLs through a search engine. It walks through the approach in detail and should serve as a useful reference.
The program relies on two search rules supported by both Google and Baidu: keyword search and inurl search. An inurl search matches keywords that appear in the URL itself. For example, the URL http://www.xxx.com/post.asp contains the keyword post.asp, so the corresponding search rule is inurl:post.asp. This is the key to collecting URLs, because many URLs carry specific information of their own, such as publish, submit, or tuijian (pinyin for "recommend"). URLs such as http://www.xxx.com/publish.asp are mostly pages for publishing information. Combining an inurl rule with keywords that the page itself is likely to contain lets the search engine return the pages we want. The program then retrieves the results, parses the HTML, discards useless information, and writes the useful URLs to a file or database for other applications or people to use.
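The "parse the HTML and keep only useful URLs" step can be sketched as follows. This is a minimal illustration rather than the author's code: the class name LinkExtractor, the method extract, and the engineHost parameter are all hypothetical, and a regular expression stands in for a real HTML parser. Real result pages may also wrap target links in redirect addresses, which this sketch does not handle.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls candidate URLs out of a search result page.
class LinkExtractor {
    // Matches href="..." or href='...' and captures the quoted value.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // Returns absolute links that contain inurlKeyword and do not point back
    // at the search engine itself (engineHost, e.g. "baidu.com").
    // Duplicates are dropped; first-seen order is kept.
    static List<String> extract(String html, String inurlKeyword, String engineHost) {
        Set<String> kept = new LinkedHashSet<>();
        String needle = inurlKeyword.toLowerCase();
        String host = engineHost.toLowerCase();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String link = m.group(1).trim();
            String lower = link.toLowerCase();
            boolean absolute = lower.startsWith("http://") || lower.startsWith("https://");
            if (absolute && lower.contains(needle) && !lower.contains(host)) {
                kept.add(link);
            }
        }
        return new ArrayList<>(kept);
    }
}
```

Calling LinkExtractor.extract(html, "publish.asp", "baidu.com") on a fetched result page would return the list that is then written to a file or database.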
The first step is to retrieve the search results programmatically. Take Baidu as an example: to find pages for publishing software, the search keywords are "软件发布 版本 inurl:publish.asp" ("software release", "version", plus the inurl rule). Open Baidu, enter the keywords, and submit. The address bar then shows http://www.baidu.com/s?ie=gb2312&bs=%C8%ED%BC%FE%B7%A2%B2%BC+%C8%ED%BC%FE%B0%E6%B1%BE+inurl%3Apublish.asp&sr=&z=&cl=3&f=8&wd=%C8%ED%BC%FE%B7%A2%B2%BC+%B0%E6%B1%BE+inurl%3Apublish.asp&ct=0. The Chinese keywords have been percent-encoded, but that does not matter; the program can also use the Chinese text directly. Multiple keywords are joined with a + sign.
After dropping the parameters that are not needed, the address can be reduced to http://www.baidu.com/s?lm=0&si=&rn=20&ie=gb2312&ct=0&wd=软件发布+版本+inurl%3Apublish%2Easp&pn=0&cl=0, where rn is the number of results shown per page, wd holds the keywords to search for, and pn is the offset of the first result to display. pn is the variable the program loops over: it increases by 20 on each iteration to fetch the next page of results.
The search process is simulated with a Java program. The key classes are java.net.HttpURLConnection and java.net.URL. First, write a class that submits the search. The key code is as follows:
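import java.net.HttpURLConnection;
import java.net.URL;

class Search
{
    public URL url;                        // address of the result page being requested
    public HttpURLConnection http;         // connection used to submit the search
    public java.io.InputStream urlstream;  // stream the result HTML is read from
    // ...
    // for (int i = 0; ...   (the rest of the loop is missing from the source)
}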
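The listing above is cut off in the source, so the author's loop cannot be recovered. As a minimal sketch under stated assumptions, and not the original code, the class below shows one way to build the trimmed Baidu address, request a page with java.net.HttpURLConnection, and step pn in increments of 20. The class name SearchSketch, its method names, and the timeout values are hypothetical. The query parameters are copied from the address in the article and may not match what Baidu accepts today.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; all names here are illustrative, not the author's.
class SearchSketch {
    static final int PAGE_SIZE = 20; // rn: results per page, as in the article

    // Builds the trimmed search address; pn is the offset of the first result.
    static String buildUrl(String keywords, int pn, String charset) {
        try {
            // URLEncoder turns spaces into '+', which is how keywords are joined.
            String wd = URLEncoder.encode(keywords, charset);
            return "http://www.baidu.com/s?lm=0&si=&rn=" + PAGE_SIZE
                    + "&ie=" + charset + "&ct=0&wd=" + wd + "&pn=" + pn + "&cl=0";
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    // Reads a stream to the end and decodes it with the given charset.
    static String readAll(InputStream in, String charset) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toString(charset);
    }

    // Requests one result page and returns its HTML.
    static String fetch(String address, String charset) throws IOException {
        HttpURLConnection http = (HttpURLConnection) new URL(address).openConnection();
        http.setRequestMethod("GET");
        http.setConnectTimeout(10_000); // assumed timeouts, in milliseconds
        http.setReadTimeout(10_000);
        try (InputStream urlstream = http.getInputStream()) {
            return readAll(urlstream, charset);
        } finally {
            http.disconnect();
        }
    }

    // Fetches the first `pages` result pages, advancing pn by PAGE_SIZE each time.
    static List<String> fetchPages(String keywords, int pages, String charset)
            throws IOException {
        List<String> htmlPages = new ArrayList<>();
        for (int i = 0; i < pages; i++) {
            htmlPages.add(fetch(buildUrl(keywords, i * PAGE_SIZE, charset), charset));
        }
        return htmlPages;
    }
}
```

For example, SearchSketch.fetchPages("软件发布 版本 inurl:publish.asp", 3, "gb2312") would request the pages at pn=0, pn=20, and pn=40. Passing the charset as a parameter keeps the ie value and the keyword encoding consistent with each other.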
© 2024 shulou.com SLNews company. All rights reserved.