Updated 2025-04-02. From: SLTechnology News&Howtos (shulou), Internet Technology
Shulou(Shulou.com)06/01 Report--
How do you extract the body of a news page with GNE? Many newcomers do not know where to start, so this article walks through the problem and how to solve it.
GNE [1] is a general-purpose extractor for the main text of news web pages, and it has been well received by many users since its release.
For a long time, GNE existed only as a Python package: to test its extraction quality, you had to install it with pip and then write code.
To lower the cost of testing GNE and to let more people learn about it and try it, I developed a web version of GNE: Gne Online.
Gne Online is available at http://122.51.39.219/. Opening it shows the page in the following figure.
To test GNE, paste the source code of a web page into the text box at the top and click the extract button:
If the title, author, or publish time is extracted incorrectly, you can enter an XPath in the corresponding Title XPath, Author XPath, or Publish Time XPath field below to target the right element. Take a Jinri Toutiao article as an example:
If the author is extracted incorrectly, you can specify the XPath //div[@class="article-sub"]/span[1]/text() to target it directly, as shown in the following figure.
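To see what an XPath of this shape selects, here is a minimal sketch using only Python's standard library. The markup is a hypothetical, simplified stand-in for the article page, not the real Toutiao source. The standard library's ElementTree supports only a subset of XPath and has no text() step, so the sketch selects the first span and reads its .text instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; a real article page is far larger.
SAMPLE = """
<html><body>
  <div class="article-sub">
    <span>Example Author</span>
    <span>2020-03-01 10:00</span>
  </div>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE)

# Equivalent of //div[@class="article-sub"]/span[1]/text():
# pick the first <span> inside the matching <div>, then read its text.
author_node = root.find(".//div[@class='article-sub']/span[1]")
author = author_node.text if author_node is not None else None

print(author)  # Example Author
```

Real pages are rarely well-formed XML, so in practice an HTML-tolerant parser is needed; the point here is only which node the expression targets.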
If the images in the body of the page use relative paths, fill in the Host field so that they can be joined into full URLs.
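The idea behind the Host field can be illustrated with the standard library's urljoin. This is a sketch of the concept only, not GNE's internal implementation, and the host and image paths are made-up examples.

```python
from urllib.parse import urljoin

# Hypothetical values: the host you would type into the Host field,
# and image paths as they might appear in the page body.
host = "https://www.example.com"
image_srcs = ["/upload/a.png", "https://cdn.example.com/b.png"]

# Relative paths are resolved against the host; absolute URLs pass through.
resolved = [urljoin(host, src) for src in image_srcs]

for url in resolved:
    print(url)
```

The first path becomes https://www.example.com/upload/a.png, while the second is already absolute and stays unchanged.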
Checking the With Body Html box below also returns the HTML source of the region where the body is located.
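For readers who prefer the Python package over the web page, the same options can be passed in code. The sketch below assumes the gne package (installed with pip install gne) exposes GeneralNewsExtractor.extract with keyword arguments that mirror the form fields. The wrapper name extract_article is hypothetical and only three of the options are shown.

```python
def extract_article(page_html, author_xpath="", host="", with_body_html=False):
    """Run GNE on a page's HTML source and return its result."""
    # Imported lazily so this module still loads where gne is not installed.
    # Assumption: the package provides GeneralNewsExtractor with these
    # keyword arguments, matching the fields of the web form.
    from gne import GeneralNewsExtractor

    extractor = GeneralNewsExtractor()
    return extractor.extract(
        page_html,
        author_xpath=author_xpath,      # same role as the Author XPath field
        host=host,                      # same role as the Host field
        with_body_html=with_body_html,  # same role as the With Body Html box
    )
```

A call might look like extract_article(page_html, author_xpath='//div[@class="article-sub"]/span[1]/text()', host="https://www.example.com", with_body_html=True), where page_html holds the saved source of the page and the host is a placeholder.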