How to parse XML files in Python

2025-07-08 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article shows how to parse XML files in Python in a concise, easy-to-follow way. I hope the detailed walkthrough below gives you something useful.

What is XML?

XML stands for eXtensible Markup Language.

XML is designed to transfer and store data.

XML is a set of rules for defining semantic tags that divide a document into parts and identify those parts.

It is also a meta-markup language, that is, a syntactic language used to define other domain-specific, semantic, structured markup languages.

Parsing XML with Python

The common XML programming interfaces are DOM and SAX. They process XML files in different ways and therefore suit different situations.

Python offers three ways to parse XML: SAX, DOM, and ElementTree.

1. SAX (Simple API for XML)

The Python standard library includes a SAX parser. SAX uses an event-driven model: while parsing the XML it fires events one by one and calls user-defined callback functions to process the file.

2. DOM (Document Object Model)

DOM parses the XML data into a tree in memory, and the XML is manipulated by operating on that tree.

3. ElementTree (element tree)

ElementTree is like a lightweight DOM with a convenient, friendly API. It is easy to use, fast, and light on memory.

Note: because DOM has to map the XML data into a tree in memory, it is relatively slow and memory-hungry. SAX reads the XML file as a stream, so it is faster and uses less memory, but it requires the user to implement callback functions (handlers).
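To make the callback model of SAX concrete, here is a minimal sketch. The handler class RankHandler, the in-memory SAMPLE document, and its attribute values are assumptions made up for illustration; only xml.sax.parseString and xml.sax.ContentHandler come from the standard library.

```python
import xml.sax


class RankHandler(xml.sax.ContentHandler):
    """Hypothetical handler that collects the text of every <rank> element."""

    def __init__(self):
        super().__init__()
        self.in_rank = False
        self.buffer = []
        self.ranks = []

    def startElement(self, name, attrs):
        # Called when an opening tag is seen.
        if name == "rank":
            self.in_rank = True
            self.buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self.in_rank:
            self.buffer.append(content)

    def endElement(self, name):
        # Called when a closing tag is seen.
        if name == "rank":
            self.ranks.append("".join(self.buffer))
            self.in_rank = False


# Assumed sample data, not taken from the article.
SAMPLE = (
    b"<data>"
    b"<country name='A'><rank>1</rank></country>"
    b"<country name='B'><rank>4</rank></country>"
    b"</data>"
)

handler = RankHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.ranks)
```

The parser never builds a tree here; the handler keeps only the state it needs, which is why SAX stays light on memory for large files.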

xml.etree

The country_data.xml file is as follows:

1 2008 141100

First, a few simple concepts:

1. The first line is the declaration of the XML file. It defines the XML version (1.0) and the encoding used (UTF-8).

2. Then comes the content of the XML file, which is organized as a tree. Each pair of opening and closing tags, together with what it encloses, is called a node, also known as an element. Nodes can be nested or placed side by side. In a nested structure the outer and inner nodes have a parent-child relationship, and the outermost node is called the root node. Nodes placed side by side are siblings.

3. Each node generally consists of three parts: tag, attribute, and text. In country_data.xml, data, country, rank, and year are all tags. An attribute sits to the right of the tag inside the same angle brackets, for example name. Text is the content between the opening and closing tags.
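The three parts of a node map directly onto attributes of an ElementTree element. The sketch below uses a small in-memory document that is only assumed to resemble country_data.xml: the tag names data, country, rank, and year and the attribute name come from the text above, while the attribute value "Example" is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the attribute value is made up for illustration.
SAMPLE = """<data>
    <country name="Example">
        <rank>1</rank>
        <year>2008</year>
    </country>
</data>"""

root = ET.fromstring(SAMPLE)      # root node
country = root.find("country")    # child of the root
rank = country.find("rank")       # child of country, sibling of year

print(root.tag)                   # tag of the root node
print(country.attrib)             # attributes as a dictionary
print(rank.text)                  # text between <rank> and </rank>
print([child.tag for child in country])  # sibling nodes under country
```

Iterating over an element yields its direct children, which is how the parent-child and sibling relationships described above show up in code.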

The information in an XML file is stored in its nodes. There are several ways to traverse the nodes; here we use ElementTree. The code is as follows:

    # -*- coding: utf-8 -*-
    import os.path
    import sys
    import xml.etree.ElementTree as ET


    # reading method
    def read_xml(xml_file=''):
        # read the xml file
        tree = ET.parse(xml_file)
        # get the root node
        root = tree.getroot()
        print("root.tag:", root.tag)
        # get all child country nodes under the current root node
        child_node_list = root.findall('country')
        # print the attributes, and the tag and text of each child, of each country node
        for child_node in child_node_list:
            print('attrib:%s' % child_node.attrib)
            # iterate over the element directly; getchildren() was removed in Python 3.9
            for child in child_node:
                print('tag:%s;text:%s' % (child.tag, child.text))
        # get the text of a node; find() returns None when the path does not match
        node = root.find('country2/rank')
        if node is not None:
            print(node.text)


    class XmlParse:
        def __init__(self, file_path):
            self.tree = None
            self.root = None
            self.xml_file_path = file_path

        def ReadXml(self):
            try:
                print("xmlfile:", self.xml_file_path)
                # read the xml file
                self.tree = ET.parse(self.xml_file_path)
                # get the root node
                self.root = self.tree.getroot()
                print("root.tag:", self.root.tag)
            except Exception as e:
                print("parse xml failed!", e)
                # no return inside finally here: it would swallow the exit
                sys.exit(1)
            print("parse xml success!")
            return self.tree, self.root

        def CreateNode(self, tag, attrib, text):
            element = ET.Element(tag, attrib)
            element.text = text
            print("tag:%s;attrib:%s;text:%s" % (tag, attrib, text))
            return element

        def AddNode(self, Parent, tag, attrib, text):
            element = self.CreateNode(tag, attrib, text)
            # compare with None: an element without children is falsy
            if Parent is not None:
                Parent.append(element)
                el = Parent.find(tag)
                print(el.tag, "--", el.attrib, "--", el.text)
            else:
                print("parent is none")

        def WriteXml(self, destfile):
            dest_xml_file = os.path.abspath(destfile)
            self.tree.write(dest_xml_file, encoding="utf-8", xml_declaration=True)


    def method_1(xml_file):
        # add a node and write a new xml file
        parse = XmlParse(xml_file)
        tree, root = parse.ReadXml()
        parse.AddNode(root, "Python", {"age": "22", "hello": "world"}, "YES")
        parse.WriteXml("./xml/country_data_added.xml")


    if __name__ == "__main__":
        xml_file = os.path.abspath("./xml/country_data.xml")
        read_xml(xml_file=xml_file)  # read detailed method
        # method_1(xml_file)

xml.dom

xml.dom.minidom official documentation
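The minidom listing in this section reads a customer.xml file whose contents are not shown. As a rough orientation, the sketch below parses a hypothetical in-memory document with the same element names (customer, name, phone, comments) and the ID attribute; the root tag customers and all values except "Acme Inc." are assumptions.

```python
from xml.dom.minidom import parseString

# Hypothetical stand-in for customer.xml; structure and values are assumed.
SAMPLE = """<customers>
  <customer ID="C001">
    <name>Acme Inc.</name>
    <phone>12345</phone>
    <comments><![CDATA[Regular customer.]]></comments>
  </customer>
</customers>"""

dom = parseString(SAMPLE)
root = dom.documentElement  # the document's root element

customer = root.getElementsByTagName("customer")[0]
name = customer.getElementsByTagName("name")[0]
comments = customer.getElementsByTagName("comments")[0]

customer_id = customer.getAttribute("ID")   # attribute value
name_text = name.childNodes[0].data         # text node under <name>
comments_text = comments.childNodes[0].data # CDATA section under <comments>
parent_name = name.parentNode.nodeName      # parent of <name>

print(customer_id, name_text, comments_text, parent_name)
```

In minidom the text of an element is itself a child node, which is why the value is read through childNodes[0].data rather than from the element directly.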

    # -*- coding: utf-8 -*-
    from xml.dom.minidom import parse


    def read_xml(xml_file=''):
        domTree = parse(xml_file)
        # document root element
        rootNode = domTree.documentElement
        # node name, value, and type of the DOM object
        print('attribute value:', rootNode.nodeName, rootNode.nodeValue, rootNode.nodeType)
        customers = rootNode.getElementsByTagName("customer")
        for customer in customers:
            if customer.hasAttribute("ID"):
                print("ID:", customer.getAttribute("ID"))
            # name element
            name = customer.getElementsByTagName("name")[0]
            print(name.nodeName, ":", name.childNodes[0].data)
            # phone element
            phone = customer.getElementsByTagName("phone")[0]
            print(phone.nodeName, ":", phone.childNodes[0].data)
            # comments element
            comments = customer.getElementsByTagName("comments")[0]
            print(comments.nodeName, ":", comments.childNodes[0].data)


    def write_xml(xml_file=''):
        domTree = parse(xml_file)
        # document root element
        rootNode = domTree.documentElement
        # new customer node
        customer_node = domTree.createElement("customer")
        customer_node.setAttribute("ID", "C003")
        # create the name node and set its text value
        name_node = domTree.createElement("name")
        name_text_value = domTree.createTextNode("kavin")
        name_node.appendChild(name_text_value)  # hang the text node on name_node
        customer_node.appendChild(name_node)
        # create the phone node and set its text value
        phone_node = domTree.createElement("phone")
        phone_text_value = domTree.createTextNode("32467")
        phone_node.appendChild(phone_text_value)  # hang the text node on phone_node
        customer_node.appendChild(phone_node)
        # create the comments node; its content is a CDATA section
        comments_node = domTree.createElement("comments")
        cdata_text_value = domTree.createCDATASection("A small but healthy company.")
        comments_node.appendChild(cdata_text_value)
        customer_node.appendChild(comments_node)
        rootNode.appendChild(customer_node)
        # open the file as UTF-8 so it matches the encoding written in the declaration
        with open('./xml/customer_added.xml', 'w', encoding='utf-8') as f:
            # indent - newline - encoding
            domTree.writexml(f, addindent='', newl='', encoding='utf-8')


    def update_xml(xml_file=''):
        domTree = parse(xml_file)
        # document root element
        rootNode = domTree.documentElement
        names = rootNode.getElementsByTagName("name")
        for name in names:
            if name.childNodes[0].data == "Acme Inc.":
                # get the name node's parent node
                pn = name.parentNode
                # the phone node of the parent node is in fact a sibling of name
                phone = pn.getElementsByTagName("phone")[0]
                # update the value of phone; text data must be a string
                phone.childNodes[0].data = "99999"
        with open('./xml/customer_updated.xml', 'w', encoding='utf-8') as f:
            # indent - newline - encoding
            domTree.writexml(f, addindent='', encoding='utf-8')


    if __name__ == '__main__':
        xml_file = './xml/customer.xml'
        read_xml(xml_file=xml_file)
        write_xml(xml_file=xml_file)
        update_xml(xml_file=xml_file)

ValueError: multi-byte encodings are not supported

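Python reported this error when parsing XML.

It is an encoding problem: the XML declaration at the head of the file names a multi-byte encoding that the parser does not support.

Change the declaration to an encoding the parser supports, such as UTF-8, and make sure the file is actually saved in that encoding.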
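When editing the file by hand is not practical, one possible workaround is to decode the bytes in Python and parse the resulting text with the declaration removed. This is only a sketch under assumptions: the helper parse_with_encoding is hypothetical, and GB2312 is used as an example of a multi-byte encoding, not necessarily the one that caused the error.

```python
import re
import xml.etree.ElementTree as ET


def parse_with_encoding(raw: bytes, source_encoding: str) -> ET.Element:
    """Hypothetical helper: decode the bytes ourselves, then parse the text."""
    text = raw.decode(source_encoding)
    # Drop the XML declaration so the parser never sees the old encoding name.
    text = re.sub(r"^\s*<\?xml[^>]*\?>", "", text, count=1)
    return ET.fromstring(text)


# Assumed sample: a document declared and stored as GB2312.
raw = (
    '<?xml version="1.0" encoding="gb2312"?>'
    "<data><name>\u4e2d\u6587</name></data>"
).encode("gb2312")

root = parse_with_encoding(raw, "gb2312")
name_text = root.find("name").text
print(root.tag, ascii(name_text))
```

The caller has to know the real encoding of the file; the helper does not detect it.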

The above is how to parse XML files in Python.