What is the use of bs4 in Python

2025-02-23 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/03 Report --

This article explains what bs4 is used for in Python. It should be a useful reference for interested readers, so let's go through it together.

Understanding bs4

bs4 (Beautiful Soup 4) is a Python library for extracting data from HTML or XML files. Working with the parser of your choice, it provides idiomatic ways to navigate, search, and modify a document, which can save hours or even days of work. BeautifulSoup is the most important class in bs4: it builds an object from an HTML or XML document, and the methods and attributes of that object give quick access to the data you need.

Usage: BeautifulSoup(html, parser)

Parser selection: choose the parser that suits your requirements. Common choices are Python's built-in 'html.parser' and the third-party 'lxml' and 'html5lib'.
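As a minimal sketch of the usage pattern above, the snippet below builds a BeautifulSoup object from a small HTML string with the built-in 'html.parser'. The sample markup and the variable names are made up for illustration and are not from the original article.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html = "<html><body><h1>Title</h1><p>First paragraph</p></body></html>"

# The second argument names the parser; "html.parser" ships with Python,
# so no extra parser package is needed.
soup = BeautifulSoup(html, "html.parser")

heading = soup.h1.string    # text of the first <h1>
paragraph = soup.p.string   # text of the first <p>
print(heading, "|", paragraph)
```

If lxml or html5lib is installed, its name can be passed in place of "html.parser"; the rest of the code stays the same.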

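Types of object in bs4

First, look at a piece of test code:

    import bs4

    # HTML test string:
    # it contains only a div tag;
    # the div contains a p tag, a piece of text, and a comment
    # (markup rebuilt from this description; the comment text is a placeholder)
    html = '<div><p>p-tag text</p>text in div<!-- comment --></div>'

    # Instantiate a BeautifulSoup object from the HTML string
    soup = bs4.BeautifulSoup(html, 'html.parser')

    # Access the node with .div
    print(type(soup.div), '--->', soup.div)

    # Attributes such as class are read with soup.div.get('class')

    # Print each element inside the div node together with its type
    for item in soup.div:
        print(type(item), '--->', item)

Display results: the div node is printed as a Tag, and the three elements inside it are a Tag (the p tag), a NavigableString (the text), and a Comment.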

Beautiful Soup transforms a complex HTML document into a tree structure in which every node is a Python object. All objects fall into four types: Tag, NavigableString, BeautifulSoup, and Comment.

> Tag object

Each tag in the HTML document appears as a Tag in Beautiful Soup. A Tag can contain other tags and other kinds of object. A Tag has many attributes that give access to the attributes and text of the tag or of its child tags. For example:

Tag.name returns the tag name.
Tag.string returns the text in the tag (when the Tag contains only one piece of text; otherwise use strings).

Attributes of a Tag are modified by assignment, which changes the tag in the corresponding HTML document (the parsed tree, not a file on disk).

> NavigableString object

The string inside a Tag is wrapped in a NavigableString object. It cannot be edited in place, but it can be replaced with replace_with(). To use it outside Beautiful Soup, convert it with str() (unicode() in Python 2).

> BeautifulSoup object

Contains the entire contents of a document. In most cases it can be treated as one large Tag object, and it supports traversing and searching the document tree.

> Comment object

The Comment object is a special kind of NavigableString, used to wrap comments and other special strings in a document.
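The object types described above can be checked with a short, hedged example. The markup, the class value "box", and the variable names are assumptions chosen for illustration; only the bs4 calls themselves come from the library.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# Hypothetical markup: a div holding a p tag, some text, and a comment.
html = '<div class="box"><p>p-tag text</p>text in div<!-- a comment --></div>'
soup = BeautifulSoup(html, "html.parser")

div = soup.div
tag_name = div.name        # name of the tag
p_text = div.p.string      # the single NavigableString inside <p>

# Collect the type name of each direct child of the div.
child_types = [type(child).__name__ for child in div.children]

# Assigning to an attribute changes the tag in the parsed tree.
div["class"] = "renamed"

# A NavigableString is not edited in place; it is swapped for a new one.
div.p.string.replace_with("new text")

# str() yields a plain Python string for use outside Beautiful Soup.
plain = str(div.p.string)

print(tag_name, child_types, plain)
print(div)
```

Because Comment is a subclass of NavigableString, an isinstance check against NavigableString also matches comments; comparing type names, as above, keeps the two apart.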
Finding nodes: the find and find_all methods

Apply to: Tag, BeautifulSoup

find returns the first node that matches the condition.
find_all returns all nodes that match the condition, as a list.

    Tag.find('a')                         # returns the first a tag in Tag
    Tag.find('a', class_='hello')         # returns the first a tag with class "hello" in Tag
    Tag.find_all('p', text='p-tag text')  # returns the p tags in Tag whose string is 'p-tag text'
                                          # (recent releases name this argument string)

Getting text information

The string, strings, and text attributes and the get_text() method can all be used to get text.

Apply to:
string: all objects in bs4 (Tag, BeautifulSoup, NavigableString, Comment)
strings, text, get_text(): Tag, BeautifulSoup

Differences:

tag.strings returns a generator that iterates over all NavigableString nodes in the tag and its descendants. Each element is a NavigableString.

tag.string depends on the children of the tag. When the tag has exactly one child and that child is a NavigableString, it returns that node. When the tag has exactly one child and that child is a tag, it returns the string attribute of the child. In all other cases it returns None.

tag.get_text() returns a single string containing the non-comment text of all descendants of the node (that is, the values of the NavigableString nodes joined together).

tag.text is a property that returns the value of get_text(), so it is equivalent to tag.get_text().
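To see how find, find_all, and the text accessors described above behave side by side, here is a small sketch. The markup, the href values, and the variable names are hypothetical. The example passes string= to find_all, which is the current name of the argument written as text= in older code.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only.
html = (
    '<div>'
    '<a class="hello" href="/one">first link</a>'
    '<a href="/two">second link</a>'
    '<p>p-tag text</p>'
    '<p>other <b>bold</b> text<!-- hidden note --></p>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")

# find returns the first match; find_all returns every match.
first_a = soup.find("a")
hello_a = soup.find("a", class_="hello")
all_p = soup.find_all("p")
matching_p = soup.find_all("p", string="p-tag text")

first_p, second_p = all_p

# .string works only when there is a single piece of text to return.
single = first_p.string      # the one NavigableString in the first <p>
ambiguous = second_p.string  # None: the second <p> has several children

# .strings walks every text node below the tag and skips the comment.
pieces = list(second_p.strings)

# get_text() joins those pieces into one plain string; .text is the same.
joined = second_p.get_text()

print(first_a["href"], hello_a["href"], len(all_p), len(matching_p))
print(repr(single), ambiguous, pieces, repr(joined))
```

A practical rule of thumb that follows from this: use string when the tag is known to hold one piece of text, and get_text() when the tag may contain nested markup.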
