The method of built-in xml parsing Interface in python 07/12 Update SLTechnology News&Howtos

The method of built-in xml parsing Interface in python

2025-07-12 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Shulou(Shulou.com)06/01 Report--

The knowledge of this article "python built-in xml parsing interface method" is not understood by most people, so the editor summarizes the following, detailed content, clear steps, and has a certain reference value. I hope you can get something after reading this article. Let's take a look at this "python built-in xml parsing interface method" article.

Many friends have heard of HTML, python and the names of JavaScript, but they may be a little unfamiliar with the name XML, and they don't know where and what use it should be used. In fact, XML is widely used, and the most common role is to store some settings data as a project deployment file, and another use is to transfer data in Ajax. In addition, HTML is essentially a branch of XML.

XML introduction

The full name of XML is Extensible markup language (eXtensible Markup Language), which uses tags to mark text to make it structural. Its most common use is mainly for data storage and transmission. Data transmission is generally the application of Ajax, while data storage generally stores some structured data or application configuration.

Its close relative HTML generally stores and expresses the text, and determines the display style of the text through the label. Because XML is customizable, HTML can actually be thought of as a special XML language.

With regard to the actual case of the application configuration of XML, java web is a good example. There is a web.xml file in the webapp of java web, which can configure functions such as servlet mapping, which is a typical application case of XML.

XML parsing

From the above, we can know that XML is used to store data, and the process of obtaining data in XML is called XML parsing. There are two common ways to parse XML, which are DOM and SAX, and python also provides a third parsing method (ElementTree). Let's take the following XML code as an example to describe these three methods in detail.

War, Thriller DVD 2003 PG 10 Talk about a US-Japan war Anime, Science Fiction DVD 1989 R 8 A schientific fiction Anime, Action DVD 4 PG 10 Vash the Stampede! Comedy VHS PG 2 Viewable boredomSAX parsing

The Python standard library contains SAX parsers. SAX uses an event-driven model to process XML files by triggering individual events and calling user-defined callback functions during parsing XML.

Import xml.saxclass MovieHandler (xml.sax.ContentHandler): def _ init__ (self): self.CurrentData = "" self.type = "" self.format = "" self.year = "" self.rating = "" self.stars = "" self.description = "" # element starts calling def startElement (self, tag Attributes): self.CurrentData = tag if tag = = "movie": print ("* Movie*") title = attributes ["title"] print ("Title:", title) # element ends calling def endElement (self, tag): if self.CurrentData = = "type": print ("Type:" Self.type) elif self.CurrentData = = "format": print ("Format:", self.format) elif self.CurrentData = = "year": print ("Year:", self.year) elif self.CurrentData = = "rating": print ("Rating:", self.rating) elif self.CurrentData = = "stars": print ("Stars:" Self.stars) elif self.CurrentData = = "description": print ("Description:", self.description) self.CurrentData = "" # call def characters (self) when reading characters Content): if self.CurrentData = = "type": self.type = content elif self.CurrentData = = "format": self.format = content elif self.CurrentData = = "year": self.year = content elif self.CurrentData = = "rating": self.rating = content elif self.CurrentData = = "stars": self.stars = content elif self.CurrentData = = "description": self .description = content if (_ _ name__ = = "_ _ main__"): # create a XMLReader parser = xml.sax.make_parser () # close the namespace parser.setFeature (xml.sax.handler.feature_namespaces 0) # rewrite ContextHandler Handler = MovieHandler () parser.setContentHandler (Handler) parser.parse ("movies.xml") DOM parsing

The full name of DOM is File object Model (Document Object Model), which is a standard programming interface recommended by W3C to deal with extensible markup language.

When parsing an XML document, a DOM parser reads the entire document at once and saves all the elements in the document in a tree structure in memory, then you can use different functions provided by DOM to read or modify the content and structure of the document, or you can write the modified content to the xml file.

From xml.dom.minidom import parseimport xml.dom.minidom# uses the minidom parser to open the XML document DOMTree = xml.dom.minidom.parse ("movies.xml") collection = DOMTree.documentElementif collection.hasAttribute ("shelf"): print ("Root element:% s"% collection.getAttribute ("shelf")) # get all movies movies = collection.getElementsByTagName ("movie") # print details of each movie for movie in movies: print ("* Movie*") if movie.hasAttribute ("title"): print ("Title:% s"% movie.getAttribute ("title") type = movie.getElementsByTagName ('type') [0] print ("Type:% s"% type.childNodes [0] .data) format = movie.getElementsByTagName (' format') [0] print ("Format:% s"% format.childNodes [0] .data) Rating = movie.getElementsByTagName ('rating') [0] print ("Rating:% s"% rating.childNodes [0] .data) description = movie.getElementsByTagName (' description') [0] print ("Description:% s"% description.childNodes [0] .data)

HTML should be the most popular application for DOM, because the DOM of HTML is basically what every front-end must learn. JavaScript also uses the DOM method when operating HTML, and DOM is also used for browser rendering. So the editor, who is interested in the front end, suggests learning the DOM method (although it is python, the DOM method and usage of JavaScript is similar to python).

ElementTree parsing

This is a unique XML parsing method of python. It parses a xml file by generating an element tree, and then manipulates the data through tree-like access. It is a bit similar to DOM in the way it is built. Generate operations that are closer to the tree in the way it is used.

Import xml.etree.ElementTree as ETtree = ET.parse ("movies.xml") root = tree.getroot () print (root.tag, ":", root.attrib) # print the tag and attribute of the root element # traverse the second layer of the xml document for child in root: # tag name and attribute print of the second layer node (child.tag, ":" Child.attrib) # traversing layer 3 for children in child of xml documents: # Application of print (children.tag, ":", children.attrib) XML parsing of layer 3 nodes' tag names and attributes

The application of XML parsing is mainly in three directions:

In Ajax, XML can be used for data transmission, which requires the receiver to be able to parse the XML.

When XML is used as a data storage medium or deployment file, the data can not be obtained without the parsing of XML.

In fact, the purpose of writing this article is to use this third point, that is, to parse HTML web pages. As we said before, HTML and XML have a certain degree of similarity. In fact, parsing XML can also be used in HTML parsing, the editor here is not to implement a browser rendering engine, but another place where HTML files need to be parsed-crawlers. The crawler needs to crawl the data of the web page and obtain specific HTML elements, which requires the crawler to be able to manipulate HTML documents, the best way of operation is through XML.

The above is the content of this article on "the method of python built-in xml parsing interface". I believe we all have some understanding. I hope the content shared by the editor will be helpful to you. If you want to know more about the relevant knowledge, please follow the industry information channel.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.