What are the ways to parse XML files

Updated 2025-04-05. From: SLTechnology News&Howtos (shulou)

Shulou(Shulou.com)06/03 Report--

This article explains in detail the common ways to parse XML files in Java: DOM, SAX, JDOM, and DOM4J. It is shared here as a practical reference.

Features:

DOM loads the entire XML file into memory at once.

SAX does not need to load the whole file at once; parsing can begin immediately, without waiting for all of the data to arrive.

JDOM makes extensive use of Java's collection classes, which makes it convenient and productive for Java programmers.

DOM4J is currently the most widely used, and our project also uses DOM4J for parsing.

1) DOM (JAXP Crimson parser)

DOM is the official W3C standard for representing XML documents in a platform- and language-independent manner. A DOM is a collection of nodes, or pieces of information, organized in a hierarchy. This hierarchy lets developers search the tree for specific information. Analyzing the structure usually requires loading the entire document and constructing the hierarchy before any work can be done. Because it is based on an information hierarchy, DOM is considered tree-based or object-based. DOM, and tree-based processing in general, has several advantages. First, because the tree persists in memory, the application can modify it and so change both the data and the structure. The application can also navigate up and down the tree at any time, rather than making a single pass as with SAX. DOM is also much easier to use.
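The following is a minimal sketch of the two advantages just described (navigating in both directions and modifying the in-memory tree), using only the JDK's JAXP and org.w3c.dom classes. The class name DomTreeSketch, the helper names, and the OWNER element are hypothetical and exist only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomTreeSketch {
    // Parses XML text into an in-memory DOM tree (the whole document is loaded).
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Replaces the text of the first <NO> and appends a hypothetical <OWNER> sibling.
    // Returns the name of the parent element reached by navigating upward.
    static String editFirstRecord(Document doc, String newNo, String owner) {
        Element no = (Element) doc.getElementsByTagName("NO").item(0); // navigate down
        Node parent = no.getParentNode();                               // navigate up
        no.setTextContent(newNo);                                       // change data in place
        Element ownerEl = doc.createElement("OWNER");                   // change structure
        ownerEl.setTextContent(owner);
        parent.appendChild(ownerEl);
        return parent.getNodeName();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<RESULT><VALUE><NO>A1234</NO><ADDR>XX</ADDR></VALUE></RESULT>");
        System.out.println(editFirstRecord(doc, "C5678", "example owner"));
        System.out.println(doc.getElementsByTagName("NO").item(0).getTextContent());
    }
}
```

Because the tree stays in memory, the same Document can be edited and traversed repeatedly; the cost is that the whole document must be parsed first.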

2) SAX

The advantages of SAX processing are very similar to those of streaming media. Analysis can begin immediately instead of waiting for all of the data to be processed. Also, because the application only examines the data as it is read, it does not need to store the data in memory. This is a huge advantage for large documents. In fact, the application does not even have to parse the entire document; it can stop parsing as soon as some condition is met. Generally speaking, SAX is also much faster than its alternative, DOM.
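One common way to stop a SAX parse once a condition is met is to throw an exception from a callback and catch it around the parse call. The sketch below assumes that approach; the class name SaxEarlyStop and the StopParsing marker exception are hypothetical, and only JDK classes are used.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEarlyStop {
    // Hypothetical marker exception, used only to abort parsing once we have our answer.
    static class StopParsing extends SAXException {
        StopParsing() {
            super("stop");
        }
    }

    // Returns the text of the first <NO> element and abandons the rest of the parse.
    static String firstNo(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            boolean inNo = false;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (qName.equals("NO")) {
                    inNo = true;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inNo) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) throws SAXException {
                if (qName.equals("NO")) {
                    throw new StopParsing(); // condition met: stop here
                }
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        try {
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (StopParsing expected) {
            // Parsing was aborted on purpose; nothing to report.
        }
        return text.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<RESULT><VALUE><NO>A1234</NO></VALUE><VALUE><NO>B1234</NO></VALUE></RESULT>";
        System.out.println(firstNo(xml));
    }
}
```

Only the text of the first match is kept, so memory use stays small regardless of how long the document is.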

DOM or SAX? For developers who need to write their own code to process XML documents, choosing between the DOM and SAX parsing models is a very important design decision. DOM accesses an XML document by building a tree structure, while SAX uses an event model.

The DOM parser converts the XML document into a tree that holds its contents and can be traversed. The advantage of the DOM model is that it is easy to program against. Developers only need to call the routines that build the tree and then use the navigation APIs to reach the tree nodes they need. Elements in the tree can easily be added and modified. However, because the DOM parser must process the entire XML document, its performance and memory costs are high, especially for large XML files. Because the tree can be traversed and edited, DOM parsers are often used in services where XML documents need to change frequently.

The SAX parser uses an event-based model. It fires a series of events while parsing an XML document, and when a given tag is found it invokes a callback method to report that the tag has been found. SAX usually has low memory requirements because it leaves it to the developer to decide which tags to process and keep. This makes SAX scale better, especially when developers only need part of the data contained in the document. However, coding against a SAX parser can be difficult, and it is hard to access several different pieces of data in the same document at the same time.
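To illustrate the callback model and the bookkeeping it requires, here is a small sketch in which the handler itself must remember the NO and ADDR values until the enclosing VALUE element closes. The class name SaxRecordCollector and the "NO @ ADDR" output format are hypothetical choices for this example; only JDK classes are used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxRecordCollector extends DefaultHandler {
    private final List<String> records = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private String no;
    private String addr;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0); // start collecting text for the element that just opened
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times for one text node
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        switch (qName) {
            case "NO":
                no = text.toString().trim();
                break;
            case "ADDR":
                addr = text.toString().trim();
                break;
            case "VALUE":
                // Both fields are known only now, so the handler had to hold them as state.
                records.add(no + " @ " + addr);
                no = null;
                addr = null;
                break;
            default:
                break;
        }
    }

    // Parses the XML text and returns one "NO @ ADDR" string per VALUE element.
    static List<String> collect(String xml) throws Exception {
        SaxRecordCollector handler = new SaxRecordCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.records;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<RESULT><VALUE><NO>A1234</NO><ADDR>XX</ADDR></VALUE>"
                + "<VALUE><NO>B1234</NO><ADDR>XX Group</ADDR></VALUE></RESULT>";
        System.out.println(collect(xml));
    }
}
```

The parser never holds the document; whatever must be correlated across tags has to be tracked by hand in fields like no and addr, which is where the extra coding effort comes from.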

3) JDOM www.jdom.org

The goal of JDOM is to be a Java-specific document model that simplifies interaction with XML and is faster than using DOM. As the first Java-specific model, JDOM has been heavily promoted. It is being considered for eventual use as a "Java Standard Extension" through Java Specification Request JSR-102. JDOM development began in early 2000.

There are two main differences between JDOM and DOM. First, JDOM uses only concrete classes and no interfaces. This simplifies the API in some ways, but also limits flexibility. Second, the API makes heavy use of the Collections classes, which simplifies its use for Java developers who are already familiar with them.

The JDOM documentation states that its purpose is to "use 20% (or less) of effort to solve 80% (or more) Java/XML problems" (with the 20% based on the learning curve). JDOM is certainly useful for most Java/XML applications, and most developers find its API much easier to understand than DOM. JDOM also includes fairly extensive checks on program behavior to prevent users from doing anything that does not make sense in XML. However, it still requires a solid understanding of XML in order to do anything beyond the basics (or even, in some cases, to understand the errors). Learning XML itself may be more worthwhile than learning the DOM or JDOM interfaces.

JDOM itself does not contain a parser. It usually uses a SAX2 parser to parse and validate the input XML document (although it can also take a previously constructed DOM representation as input). It includes converters that output a JDOM representation as a SAX2 event stream, a DOM model, or an XML text document. JDOM is open source, released under a variant of the Apache license.

4) DOM4J dom4j.sourceforge.net

Although DOM4J is a completely independent development effort, it began as a kind of intellectual fork of JDOM. It incorporates many features that go beyond basic XML document representation, including integrated XPath support, XML Schema support, and event-based processing for large or streaming documents. It also offers the option of building a document representation that can be accessed in parallel through both the DOM4J API and the standard DOM interfaces. It has been under development since the second half of 2000.
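To give a sense of what an XPath query over an XML document looks like, here is a minimal sketch. DOM4J is a third-party library, so this example deliberately swaps in the JDK's own javax.xml.xpath API over a W3C DOM tree rather than DOM4J's integrated XPath support; the class name XPathSketch and the sample expression are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathSketch {
    // Evaluates an XPath expression against the XML text and returns the result as a string.
    // Note: this uses the JDK's XPath engine on a DOM tree, not DOM4J.
    static String evaluate(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<RESULT><VALUE><NO>A1234</NO><ADDR>XX</ADDR></VALUE>"
                + "<VALUE><NO>B1234</NO><ADDR>XX Group</ADDR></VALUE></RESULT>";
        // Select the ADDR of the VALUE whose NO equals B1234.
        System.out.println(evaluate(xml, "/RESULT/VALUE[NO='B1234']/ADDR"));
    }
}
```

A single expression replaces the explicit loop and comparison that tree navigation would otherwise need, which is the convenience that integrated XPath support is meant to provide.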

To support all of these features, DOM4J uses an approach based on interfaces and abstract base classes. DOM4J makes heavy use of the Collections classes in its API, but in many cases it also provides alternatives that allow better performance or more direct coding. The net effect is that, although DOM4J pays the price of a more complex API, it provides much more flexibility than JDOM.

While adding flexibility, XPath integration, and the ability to work with large documents, DOM4J shares JDOM's goals: ease of use and intuitive operation for Java developers. It also aims to be a more complete solution than JDOM, with the goal of handling essentially all Java/XML problems. In pursuing that goal, it places less emphasis than JDOM on preventing incorrect application behavior.

DOM4J is an excellent Java XML API: it performs very well, is powerful, and is extremely easy to use. It is also open source software. More and more Java software now uses DOM4J to read and write XML. It is especially worth mentioning that even Sun's JAXM uses DOM4J.

2. Comparison

1) DOM4J has the best performance, and even Sun's JAXM uses it. DOM4J is widely used in many open source projects; for example, the well-known Hibernate also uses DOM4J to read its XML configuration files. If portability is not a concern, use DOM4J.

2) JDOM and DOM performed poorly in performance testing, and both ran out of memory when tested with 10 MB documents. For small documents, DOM and JDOM are still worth considering. Although the JDOM developers have said that they expect to focus on performance before the official release, from a performance point of view it has little to recommend it. DOM, on the other hand, remains a very good choice. DOM implementations are widely available in many programming languages. It is also the basis for many other XML-related standards, and because it is formally recommended by the W3C (unlike the non-standard Java-specific models), it may be required in some types of projects (for example, when using DOM in JavaScript).

3) SAX performs well, thanks to its event-driven parsing approach. A SAX parser inspects the incoming XML stream without loading it into memory (although, of course, parts of the document are temporarily buffered in memory while the stream is read).

3. Basic usage of the four XML parsing approaches

XML file:

<RESULT>
  <VALUE>
    <NO>A1234</NO>
    <ADDR>XX</ADDR>
  </VALUE>
  <VALUE>
    <NO>B1234</NO>
    <ADDR>XX Group</ADDR>
  </VALUE>
</RESULT>

1) DOM

import java.io.*;
import java.util.*;
import org.w3c.dom.*;
import javax.xml.parsers.*;

public class MyXMLReader {
    public static void main(String[] args) {
        long lasting = System.currentTimeMillis();
        try {
            File f = new File("data_10k.xml");
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(f);
            // Each VALUE holds one NO and one ADDR, so index i lines up across the lists.
            NodeList nl = doc.getElementsByTagName("VALUE");
            for (int i = 0; i < nl.getLength(); i++) {
                System.out.print("license plate number: "
                        + doc.getElementsByTagName("NO").item(i).getFirstChild().getNodeValue());
                System.out.println(" owner address: "
                        + doc.getElementsByTagName("ADDR").item(i).getFirstChild().getNodeValue());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

2) SAX

import java.util.Stack;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import javax.xml.parsers.*;

public class MyXMLReader extends DefaultHandler {
    // Names of the elements that are currently open, innermost on top.
    Stack<String> tags = new Stack<>();

    public MyXMLReader() {
        super();
    }

    public static void main(String[] args) {
        long lasting = System.currentTimeMillis();
        try {
            SAXParserFactory sf = SAXParserFactory.newInstance();
            SAXParser sp = sf.newSAXParser();
            MyXMLReader reader = new MyXMLReader();
            sp.parse(new InputSource("data_10k.xml"), reader);
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("run time: " + (System.currentTimeMillis() - lasting) + " milliseconds");
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (tags.isEmpty()) {
            return;
        }
        String tag = tags.peek();
        if (tag.equals("NO")) {
            System.out.print("license plate number: " + new String(ch, start, length));
        }
        if (tag.equals("ADDR")) {
            System.out.println(" address: " + new String(ch, start, length));
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        tags.push(qName);
    }

    // Pop on close so that text between elements is not attributed to the previous tag.
    @Override
    public void endElement(String uri, String localName, String qName) {
        tags.pop();
    }
}

3) JDOM

import java.io.*;
import java.util.*;
import org.jdom.*;
import org.jdom.input.*;

public class MyXMLReader {
    public static void main(String[] args) {
        long lasting = System.currentTimeMillis();
        try {
            SAXBuilder builder = new SAXBuilder();
            Document doc = builder.build(new File("data_10k.xml"));
            Element foo = doc.getRootElement();
            // The children of the root are the VALUE elements.
            List allChildren = foo.getChildren();
            for (int i = 0; i < allChildren.size(); i++) {
                System.out.print("license plate number: "
                        + ((Element) allChildren.get(i)).getChild("NO").getText());
                System.out.println(" owner's address: "
                        + ((Element) allChildren.get(i)).getChild("ADDR").getText());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

4) DOM4J

import java.io.*;
import java.util.*;
import org.dom4j.*;
import org.dom4j.io.*;

public class MyXMLReader {
    public static void main(String[] args) {
        long lasting = System.currentTimeMillis();
        try {
            File f = new File("data_10k.xml");
            SAXReader reader = new SAXReader();
            Document doc = reader.read(f);
            Element root = doc.getRootElement();
            Element foo;
            // Iterate over the VALUE children of the root element.
            for (Iterator i = root.elementIterator("VALUE"); i.hasNext();) {
                foo = (Element) i.next();
                System.out.print("license plate number: " + foo.elementText("NO"));
                System.out.println(" owner address: " + foo.elementText("ADDR"));
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
