How to use the beautifulsoup4 library in PyCharm

2025-01-16 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article shows how to use the beautifulsoup4 library in PyCharm. The content is simple and clearly organized, and I hope it answers your questions. Let's work through it together.

1. Installing the beautifulsoup4 library

Step 1: Install the beautifulsoup4 library by entering the following command on the console.

pip install beautifulsoup4

Step 2: Verify in the console that the beautifulsoup4 library was installed successfully.

Step 3: In PyCharm, go to File > Settings > Project > Python Interpreter, click the + sign, search for beautifulsoup4, and click Install Package.

After that, you can import the module in your .py files.
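One way to confirm the installation from Python itself is to ask the import system whether it can locate the bs4 package. This is only a minimal sketch: the helper name is_installed is made up for illustration and is not part of beautifulsoup4.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the module, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# beautifulsoup4 is imported under the package name bs4.
installed = is_installed("bs4")
print("bs4 installed:", installed)
```

If this prints False inside PyCharm, the project interpreter is probably not the one the package was installed into.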

2. Using the beautifulsoup4 library

import requests
# Although the library is named beautifulsoup4, it is imported under the short name bs4.
# BeautifulSoup is a class in that package.
from bs4 import BeautifulSoup

# requests needs the scheme (https://) in the URL.
url = 'https://www.baidu.com/s?'
# Most websites are meant for human visitors and may refuse requests whose
# User-Agent does not look like a browser, so send a browser-like header.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36'
}
response = requests.get(url=url, headers=headers)
# The page contains Chinese text, so set the encoding to utf-8 to avoid garbled characters.
response.encoding = 'utf-8'
# print(response.text)
# The parser used here is html.parser.
soup = BeautifulSoup(response.text, 'html.parser')
# Print the parsed result.
print(soup.prettify())

The comments in the code explain each step.

3. Basic elements of the beautifulsoup4 library

The beautifulsoup4 library is a library for parsing, traversing, and maintaining a "tag tree."

First, consider the parsers that the BeautifulSoup library can use. Of those listed, the first two are the most commonly used.
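As a rough illustration of choosing a parser, the sketch below passes the parser name as the second argument to BeautifulSoup. The sample HTML string is invented for this example. Only html.parser comes with Python; lxml and html5lib are third-party parsers that would have to be installed separately, so they appear only in comments.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, used instead of a live web page.
html = "<html><body><p>Hello</p></body></html>"

# html.parser is part of the Python standard library, so nothing extra is needed.
soup = BeautifulSoup(html, "html.parser")
parsed_text = soup.p.string
print(parsed_text)

# Third-party alternatives (each needs its own installation):
# BeautifulSoup(html, "lxml")      # pip install lxml
# BeautifulSoup(html, "html5lib")  # pip install html5lib
```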

Next, consider the basic elements of the BeautifulSoup library. An HTML document, its tag tree, and a BeautifulSoup object can be treated as equivalent, so to inspect part of an HTML document we can use an instance of BeautifulSoup.

Building on the code above, add a few lines that use these basic elements to print the corresponding parts of the page.
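The original lines are not reproduced here, so the following is only a sketch of what such lines might look like. It uses an invented HTML string rather than the Baidu response, and reads a tag's name, its parent's name, its attributes, and its string content.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document standing in for response.text.
html = (
    "<html><head><title>Demo page</title></head>"
    '<body><p class="intro">'
    '<a href="http://example.com/a" id="link1">First link</a>'
    "</p></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

tag = soup.a                    # the first <a> tag in the document
tag_name = tag.name             # name of the tag itself
parent_name = tag.parent.name   # name of the enclosing tag
attrs = tag.attrs               # attributes as a dictionary
link_text = tag.string          # text between <a> and </a>
title_text = soup.title.string  # text of the <title> tag

print(tag_name, parent_name, attrs, link_text, title_text)
```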

Note that .string can reach through nested tags, and the value it returns may be a comment. To tell whether the result is an ordinary string inside a tag or a comment, print its type.
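The type check described above can be sketched as follows, again with an invented HTML string. Comment is a class in bs4.element, so isinstance can separate comments from ordinary tag text.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

# Hypothetical sample: one tag holding a comment, one holding text,
# and one whose only child is another tag.
html = (
    "<b><!--This is a comment--></b>"
    "<p>This is not a comment</p>"
    "<div><span>nested text</span></div>"
)
soup = BeautifulSoup(html, "html.parser")

b_string = soup.b.string
p_string = soup.p.string
nested = soup.div.string  # .string reaches through the single <span> child

# Printing the value alone hides the difference; the type reveals it.
print(b_string, type(b_string))
print(p_string, type(p_string))

b_is_comment = isinstance(b_string, Comment)
p_is_comment = isinstance(p_string, Comment)
```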

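In summary, a tag's name, attributes, and string content are all reached through the BeautifulSoup object, and the type of .string tells a comment apart from ordinary text.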

Next, consider how the BeautifulSoup library traverses the tag tree. The traversal attributes that return iterators can be used in a for ... in loop.
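As an illustration, the sketch below loops over a few of the iterator attributes that bs4 provides (.children, .descendants, .parents, .next_siblings). Which attributes the original figure highlighted is not known, so this selection is an assumption, and the HTML string is invented.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical sample document with no whitespace between tags.
html = "<html><body><div><p>one</p><p>two</p><p>three</p></div></body></html>"
soup = BeautifulSoup(html, "html.parser")
div = soup.div

# Downward: direct children only.
child_texts = [child.string for child in div.children]

# Downward: every descendant, keeping only Tag objects (not text nodes).
descendant_tags = [d.name for d in div.descendants if isinstance(d, Tag)]

# Upward: every ancestor of the first <p>, ending at the document itself.
parent_names = [parent.name for parent in soup.p.parents]

# Sideways: siblings that follow the first <p>.
sibling_texts = [sib.string for sib in soup.p.next_siblings]

print(child_texts, descendant_tags, parent_names, sibling_texts)
```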

4. HTML search method of the beautifulsoup4 library

find_all(name, attrs, recursive, string, **kwargs)

The find_all() method searches the descendants of the current tag and returns those that satisfy the filter conditions.

The name parameter matches tags by tag name.

The attrs parameter matches tags by their attribute values.

The recursive parameter indicates whether to search all descendants. The default is True. To search only the direct children of the current node, set it to False.

The string parameter matches the string content inside tags.
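The four parameters can be tried out on a small document. This is a sketch under assumed data: the HTML string, the id "box", and the class names are all invented for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: one link directly inside the div, one nested in a <p>.
html = (
    '<div id="box">'
    '<a class="nav" href="/home">Home</a>'
    '<p><a class="ext" href="http://example.com">Example</a></p>'
    "</div>"
)
soup = BeautifulSoup(html, "html.parser")
box = soup.find(id="box")

# name: every <a> tag among the descendants of the div.
all_links = box.find_all("a")

# attrs: only <a> tags whose class attribute is "nav".
nav_links = box.find_all("a", attrs={"class": "nav"})

# recursive=False: only direct children of the div are examined.
direct_links = box.find_all("a", recursive=False)

# string: matches text nodes rather than tags.
text_hits = box.find_all(string="Example")

print(len(all_links), nav_links, direct_links, text_hits)
```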

5. JSON (JavaScript Object Notation)

Anyone who has learned JavaScript or Java should already be familiar with JSON.

JSON represents data as typed key-value pairs.

Keys must be enclosed in double quotes, and so must string values. A numeric value such as an integer is written without quotes.

If a key has several values, put them in square brackets: [ , ]. If the value is itself a set of key-value pairs, put it in braces: { : , : }. These structures can be nested.
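To make these rules concrete, the sketch below parses a small JSON text with Python's standard json module. The keys and values are invented for illustration.

```python
import json

# Hypothetical JSON text: quoted keys, a quoted string value, an unquoted
# integer, a list of values in [ ], and a nested object in { }.
text = (
    '{"name": "Beijing", "population": 2000, '
    '"tags": ["capital", "city"], '
    '"location": {"lat": 39.9, "lon": 116.4}}'
)

data = json.loads(text)  # JSON text -> Python dict
print(type(data["name"]), type(data["population"]))
print(data["tags"][0], data["location"]["lat"])

# Python dict -> JSON text again.
round_trip = json.dumps(data, sort_keys=True)
print(round_trip)
```

The quoted value comes back as a Python str and the unquoted one as an int, which is what "typed" means here.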

JSON is generally used for interfaces, while YAML, which writes key-value pairs without explicit type markup, is generally used for configuration files.

That is all for "How to use the beautifulsoup4 library in PyCharm." Thank you for reading! I hope you now have a working understanding of the library and that this article has been helpful.
