XML Technology: Schema Constraints, Dom4j, and XPath Explained in Detail

Updated 2025-04-01. From: SLTechnology News&Howtos (shulou)

I can't do much in my life, so everything has to be brilliant.

Reference for this document: the w3cschool.CHM API tutorial, available as a free download at http://down.51cto.com/data/2300287

XML technology

1. What is XML?

XML stands for EXtensible Markup Language.

XML is a markup language very similar to HTML.

XML is designed to transmit data rather than display it.

XML tags are not predefined; you define your own tags.

XML is designed to be self-descriptive.

XML is a W3C Recommendation.

The difference between HTML and XML

HTML (Hypertext Markup Language) is mainly used to encapsulate the data to be displayed on a page; the browser parses the HTML file and displays the data. We can also use JS and DOM technology to parse and manipulate HTML files.

XML (Extensible Markup Language) was originally intended to replace HTML, but it did not succeed because pages written in HTML already held such a large share of the web.

Later, XML files came to be used as software configuration files, data storage files, and data transfer files.

2. What XML is used for

Storing and transferring data with complex relationships

Serving as a configuration file in a software system, which is its main use

To improve a system's flexibility, the modules it starts are usually determined by its configuration file.

For example, suppose that when a piece of software starts, it needs to start two modules, A and B, and that these modules need the support of modules A1, A2 and B1, B2 respectively. An XML file is the most appropriate way to describe this relationship accurately.
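The module relationship described above could be written down and read back roughly as follows. This is only a sketch: the element and attribute names (modules, module, requires, name) are invented for illustration, and it uses the DOM parser built into the JDK rather than any particular framework's loader.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ModuleConfigDemo {
    // Hypothetical configuration file content; the tag names are assumptions.
    static final String CONFIG =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<modules>"
        + "<module name=\"A\"><requires name=\"A1\"/><requires name=\"A2\"/></module>"
        + "<module name=\"B\"><requires name=\"B1\"/><requires name=\"B2\"/></module>"
        + "</modules>";

    // Maps each module to the list of modules it depends on, in document order.
    static Map<String, List<String>> readDependencies(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, List<String>> result = new LinkedHashMap<>();
        NodeList modules = doc.getElementsByTagName("module");
        for (int i = 0; i < modules.getLength(); i++) {
            Element module = (Element) modules.item(i);
            List<String> deps = new ArrayList<>();
            NodeList requires = module.getElementsByTagName("requires");
            for (int j = 0; j < requires.getLength(); j++) {
                deps.add(((Element) requires.item(j)).getAttribute("name"));
            }
            result.put(module.getAttribute("name"), deps);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readDependencies(CONFIG)); // prints {A=[A1, A2], B=[B1, B2]}
    }
}
```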

3. XML syntax

An XML file consists of the following parts:

Document declaration

Elements

Attributes

Comments

CDATA sections and special characters

Processing instructions

3.1 Document declaration

When writing an XML document, you first use a document declaration to declare the type of the document, that is, to tell parsing software that the document is an XML document.

The simplest declaration syntax: <?xml version="1.0"?>

Use the encoding attribute to state the character encoding of the document, for example <?xml version="1.0" encoding="UTF-8"?>

Use the standalone attribute to indicate whether the document is independent, for example <?xml version="1.0" standalone="yes"?>

To check a document, drag it into a browser and let the browser parse it.

Note: if you edit the file with Notepad, Chinese characters may come out garbled or the document may fail to parse. This typically happens when the encoding Notepad saves the file in differs from the encoding given in the declaration.

Common mistakes

1. Attribute values without quotation marks

2. Full-width spaces

3. Encoding errors
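Instead of dragging a file into a browser, the same check can be done in code. The sketch below assumes the JDK's built-in parser and a hypothetical user element; it reports whether a string is well-formed and shows the first two mistakes from the list being rejected.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    // Returns true when the parser accepts the text as well-formed XML.
    static boolean isWellFormed(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isWellFormed("<user name=\"Jack\"/>"));       // true
        System.out.println(isWellFormed("<user name=Jack/>"));           // false: value not quoted
        System.out.println(isWellFormed("<user\u3000name=\"Jack\"/>"));  // false: full-width space
    }
}
```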

3.2 Elements

An XML element refers to a tag that appears in the XML file. A tag consists of an opening tag and a closing tag, and can be written in the following forms:

With a tag body: an opening tag, the content (such as wyait.blog.51cto.com/), and a closing tag.

Without a tag body: the opening and closing tags can be abbreviated to a single self-closing tag.

A tag can contain any number of nested child tags, but all tags must be properly nested. Overlapping (cross) nesting is never allowed, for example closing an outer tag before the inner tag that wraps part of its text:

Welcome to wyait.blog.51cto.com/

A well-formed XML document must have exactly one root tag; all other tags are descendants of that root tag.

The XML parser treats all spaces and line breaks that appear inside XML tags as tag content. For example, the following two fragments have different meanings.

The first paragraph

Wyait.blog.51cto.com

The second paragraph

Wyait.blog.51cto.com

Because spaces and line breaks are both treated as raw content in XML, you may have to give up the usual "good" habit of using line breaks and indentation to make a file's contents easy to read.
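A small sketch of the difference, using the JDK's DOM parser. The tag name site is an assumption made for this example; the point is only that the indented form yields different text content from the compact form.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class WhitespaceDemo {
    // Returns the text content of the root element exactly as the parser sees it.
    static String textOf(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String compact = textOf("<site>wyait.blog.51cto.com</site>");
        String indented = textOf("<site>\n    wyait.blog.51cto.com\n</site>");
        System.out.println(compact.length());         // 20
        System.out.println(indented.length());        // 26: line breaks and spaces are content
        System.out.println(compact.equals(indented)); // false
    }
}
```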

Naming convention

An XML element name can contain letters, digits, and some other visible characters, but must follow these rules:

Names are case-sensitive; for example, <P> and <p> are two different tags.

A name cannot start with a digit.

A name cannot contain spaces.

A name should not contain a colon (:), because the colon conflicts with Schema constraints (it separates a namespace prefix from the name).

Starting a name with "_" (underscore) is not recommended.

3.3 Attributes

A tag can have multiple attributes, each with its own name and value.

Attribute values must be enclosed in double or single quotation marks.

Attribute names must follow the same naming convention as tags.

In XML, the information carried by a tag attribute can also be expressed as a child element instead, with the attribute value becoming the child element's text.

3.4 Comments

Comments in an XML file have the format "<!-- comment -->".

Note:

No comment may appear before the XML declaration.

Comments cannot be nested, for example

……

3.5 CDATA sections

When writing an XML file, there may be content that you do not want the parser to interpret but to treat as raw text.

In that case, put the content in a CDATA section. The parser does not process the content of a CDATA section; it passes it through unchanged.

Syntax: <![CDATA[ content ]]>

For example, a CDATA section can hold text that describes HTML markup literally, such as a sentence naming the title tag or the line-break tag, without the parser treating those tags as markup.
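The effect can be sketched with the JDK's DOM parser. The note element is an assumption for this example; it shows that markup-like text inside a CDATA section arrives as plain text, exactly as it would if the angle brackets had been written with the predefined entities &lt; and &gt;.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CdataDemo {
    // Returns the text content of the root element.
    static String textOf(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String viaCdata = textOf("<note><![CDATA[<br/> is a line break in HTML]]></note>");
        String viaEscapes = textOf("<note>&lt;br/&gt; is a line break in HTML</note>");
        System.out.println(viaCdata);                    // <br/> is a line break in HTML
        System.out.println(viaCdata.equals(viaEscapes)); // true
    }
}
```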

3.5.1 Escape characters

For individual characters, if you want them to appear literally, you can also escape them.

Common escape characters and how whitespace is handled:

Spaces: a space in the document is still a space after parsing.

Line breaks: in "Hello!\nThe world!", the \n represents a line break.

Indentation: in "Hello!\tThe world!", the \t represents the whitespace produced by pressing the Tab key once.

Note that the width of one indentation unit differs between systems: some display it as four half-width characters and some as eight, so the result may look different.

If line breaks, spaces, or indentation are used inside values in an XML configuration file, they are parsed as \n, space, and \t characters and become part of the value.

In code, this can mean that a release path you configured yourself does not match when it is parsed, so the request is not released.
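One way to guard against this is to trim configured values before comparing them. The following sketch assumes a hypothetical filter configuration with a release-path element and a /login path; none of these names come from a real framework.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ReleasePathDemo {
    // Hypothetical configuration written with line breaks and indentation for readability.
    static final String CONFIG =
        "<filter>\n"
        + "    <release-path>\n"
        + "        /login\n"
        + "    </release-path>\n"
        + "</filter>";

    // Returns the configured path exactly as parsed, whitespace included.
    static String rawPath(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("release-path").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String raw = rawPath(CONFIG);
        System.out.println("/login".equals(raw));        // false: value carries \n and spaces
        System.out.println("/login".equals(raw.trim())); // true once whitespace is trimmed
    }
}
```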

3.6 Processing instructions

Processing instructions, abbreviated PI, tell the parsing engine how to handle the contents of an XML document.

For example, an XML document can use the xml-stylesheet instruction to tell the XML parsing engine to apply a CSS file when displaying the document's contents.

A processing instruction must start with "<?" and end with "?>". The XML declaration uses the same form and is the most commonly seen statement of this kind.
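As a rough illustration, the JDK's DOM parser exposes processing instructions as nodes with a target and a data string. The stylesheet name style.css below is a made-up value.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

class ProcessingInstructionDemo {
    // Collects "target -> data" for every processing instruction at the top level of the document.
    static List<String> instructions(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        List<String> found = new ArrayList<>();
        NodeList children = doc.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node instanceof ProcessingInstruction) {
                ProcessingInstruction pi = (ProcessingInstruction) node;
                found.add(pi.getTarget() + " -> " + pi.getData());
            }
        }
        return found;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>"
            + "<?xml-stylesheet type=\"text/css\" href=\"style.css\"?>"
            + "<root/>";
        // The XML declaration is not reported as a processing instruction node.
        System.out.println(instructions(xml));
    }
}
```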

3.7 Summary

All XML elements must have a closing tag

XML tags are case-sensitive

XML tags must be nested correctly

An XML document must have exactly one root element

XML attribute values must be in quotation marks

Special characters must be escaped or placed in a CDATA section

Spaces, carriage returns, and line feeds in XML are retained when parsing

4. XML constraints

What is an XML constraint?

In XML technology, you can write a document that constrains how an XML document may be written; this is called an XML constraint.

Why XML constraints are needed

Commonly used constraint technologies

XML DTD

XML Schema (XSD)

4.1 DTD constraints

Getting started with DTD

1. Create an XML file first

2. Write a DTD file

The extension of the DTD file must be .dtd

In the DTD, write one ELEMENT declaration for each tag that appears in the XML

3. Import the DTD constraints into the XML file

4.1.1 DTD constraint syntax

Combining DTD and XML files

Using an internal DTD

You can write the DTD and the XML in the same file.

An XML file uses a DOCTYPE declaration to indicate which DTD file it follows. The DOCTYPE declaration has two forms.

When the referenced file is local (an external DTD), the form is <!DOCTYPE root-element SYSTEM "DTD file path">

This form is written by hand in the XML file.

When the referenced file is public (a public DTD), the form is <!DOCTYPE root-element PUBLIC "DTD name" "DTD file URL">

4.1.2 Element (ELEMENT) definitions in a DTD

In a DTD, ELEMENT declares a tag name that may appear in the XML; the parentheses () that follow restrict the text or child tags allowed inside that tag.

<!ELEMENT books (book+)> says that the XML may contain a books tag, with one or more book child tags under it.

+ the tag in the parentheses can appear one or more times

? the tag in the parentheses can appear zero times or once

* the tag in the parentheses can appear zero or more times

A comma in the parentheses fixes the order in which the child tags must appear.

<!ELEMENT name (#PCDATA)> says that text can be written inside the name tag.

4.1.3 Attribute (ATTLIST) definitions

<!ATTLIST tag-name

attribute-name attribute-type attribute-constraint

...>

<!ATTLIST book abc CDATA #REQUIRED> says that the book tag has an abc attribute whose value is text, and that the attribute must be written and cannot be omitted.

4.1.4 Entity (ENTITY) definitions

Related tag reference
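The ELEMENT and ATTLIST rules above can be tried out with the validating parser that ships with the JDK. This is a sketch under assumptions: the DTD is embedded as an internal subset, and the books/book/name structure with a required abc attribute is assembled from the examples in this section.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DtdValidationDemo {
    // Internal DTD: books holds one or more book; each book holds a name and requires abc.
    static final String DTD_HEADER =
        "<!DOCTYPE books [\n"
        + "<!ELEMENT books (book+)>\n"
        + "<!ELEMENT book (name)>\n"
        + "<!ELEMENT name (#PCDATA)>\n"
        + "<!ATTLIST book abc CDATA #REQUIRED>\n"
        + "]>\n";

    // Parses DTD_HEADER + body with validation on and returns the validation error messages.
    static List<String> validate(String body) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // turn on DTD validation
        DocumentBuilder builder = factory.newDocumentBuilder();
        final List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors.add(e.getMessage()); // validity errors are recoverable, so collect them
            }
        });
        builder.parse(new InputSource(new StringReader(DTD_HEADER + body)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        // Valid: the required attribute is present.
        System.out.println(validate("<books><book abc=\"x\"><name>XML</name></book></books>").isEmpty());
        // Invalid: abc is missing, so at least one error is reported.
        System.out.println(validate("<books><book><name>XML</name></book></books>").isEmpty());
    }
}
```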

4.2 Schema (XSD) constraints

4.2.1 Schema overview

XML Schema defines the structure and content model of an XML document using a set of predefined XML elements and attributes. A schema built from these elements and attributes specifies the structure of an XML instance document and the data type of each element and attribute.

The obvious advantage of Schema over DTD is that an XML Schema document is itself an XML document, rather than using a special syntax of its own as DTD does.

The difference between Schema and DTD

XML inherited DTD from SGML and uses it to define content models, validate documents, and organize elements. DTD has many limitations:

DTD does not follow XML syntax

DTD is not extensible

DTD does not support namespaces

DTD does not provide strong data type support; it can only represent very simple data types.

Schema overcomes these weaknesses and makes it easier for Web-based applications to exchange XML data. Its new features include:

Schema is based entirely on XML syntax, so there is no special syntax to learn.

Schema can be processed with the same tools that process XML documents, without special tools.

Schema greatly expands the available data types, supporting booleans, numbers, dates and times, URIs, integers, decimals, real numbers, and so on.

Schema supports prototypes, that is, inheritance between element types. For example, after defining a "contact" data type, we can derive two data types from it: "friend contact" and "customer contact".

Schema supports attribute groups. We often declare some common attributes and then apply them to many elements; attribute groups allow the relationships between elements and attributes to be defined and combined externally.

Openness. With DTD, only one DTD can be applied to an XML document; with Schema, multiple schemas can be applied to one XML document.

4.2.2 Basic concepts of Schema

An XML Schema file is itself an XML file, but it usually has the .xsd extension.

Like any XML file, an XML Schema document must have a root node, and the name of that root node is schema.

Development process for XML with schema constraints

W3C predefined elements and attributes -> Schema document (constraint document) -> XML document (instance document)

After writing an XML Schema constraint document, you usually bind the elements declared in it to a URI. This URI is called the namespace, and an XML file can then reference the elements of that namespace through the URI.

Basic structure of an XML Schema document

The W3C XML Schema specification stipulates that all Schema documents use <schema> as their root element.

The <schema> element can carry several attributes. An XML Schema declaration often appears in the following form

4.2.3 Getting started with Schema

1. Define an XML file

2. Write a Schema file

The Schema file must use <schema> as its root tag.

xmlns="http://www.w3.org/2001/XMLSchema"

This means that the current Schema file is itself constrained by the namespace at the specified URL.

targetNamespace="http://www.example.org/books"

This gives the current Schema file a name. An XML file that needs to be constrained by this Schema file refers to it by that name.

The value of the targetNamespace attribute can be any content, for example targetNamespace="http://www.huyouta.com/books".

3. Introduce the Schema file into the XML file

xmlns="http://www.huyouta.com/books"

This introduces the Schema file into the XML by its name.

Treasure Book of Sunflower

Monitor

9.9

4.2.4 Namespaces

Declaring the document's namespace

The targetNamespace attribute specifies which namespace the elements declared in the schema document belong to.

The elementFormDefault attribute specifies whether local elements must be qualified by the namespace given in targetNamespace.

The attributeFormDefault attribute specifies whether local attributes must be qualified by the namespace given in targetNamespace.

Summary

When writing a Schema file, use the targetNamespace attribute to name the current Schema file.

The value of the targetNamespace attribute serves as the namespace of the current Schema file.

In the XML file, use xmlns to introduce Schema files by their names.

If several Schema namespaces are introduced in the same XML file, each namespace needs its own alias.

To give a namespace an alias, write a colon and the alias after xmlns. The alias then shows which Schema file constrains each tag in the XML.
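To see targetNamespace and xmlns working together, the sketch below validates small instance documents against an in-memory schema using javax.xml.validation from the JDK. The schema itself (books, book, name, price) is an assumption put together for illustration; only the two namespace URIs come from the text above.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SchemaValidationDemo {
    // Hypothetical schema: books contains one or more book, each with a name and a decimal price.
    static final String XSD =
        "<schema xmlns=\"http://www.w3.org/2001/XMLSchema\""
        + " targetNamespace=\"http://www.example.org/books\""
        + " elementFormDefault=\"qualified\">"
        + "<element name=\"books\"><complexType><sequence>"
        + "<element name=\"book\" maxOccurs=\"unbounded\"><complexType><sequence>"
        + "<element name=\"name\" type=\"string\"/>"
        + "<element name=\"price\" type=\"decimal\"/>"
        + "</sequence></complexType></element>"
        + "</sequence></complexType></element>"
        + "</schema>";

    // Returns true when the instance document satisfies the schema.
    static boolean isValid(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the default error handler throws on the first validation error
        }
    }

    public static void main(String[] args) throws Exception {
        String ns = "http://www.example.org/books";
        System.out.println(isValid("<books xmlns=\"" + ns + "\"><book><name>XML</name><price>9.9</price></book></books>")); // true
        System.out.println(isValid("<books xmlns=\"" + ns + "\"><book><name>XML</name><price>cheap</price></book></books>")); // false: not a decimal
        System.out.println(isValid("<books><book><name>XML</name><price>9.9</price></book></books>")); // false: namespace not introduced
    }
}
```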

5. XML parsing

DOM (Document Object Model) is the W3C's way of processing XML.

Characteristics

All data is loaded into memory at once.

Each node in the XML document, including elements, text, and attributes, is treated as a Node object.

Document, Element, and Node are in the org.w3c.dom package.

Modifying the document is very convenient.

It is integrated into the JDK as Sun's standard for XML operations.

The disadvantage is that it uses a lot of memory when the document is large.

SAX (Simple API for XML)

Data is analyzed as it is read, through event listeners.

It is fast, but only suitable for reading data: it reads forward and cannot go back.
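The event-listener style can be sketched with the SAX parser in the JDK. The users/user/name structure is assumed here to match the examples used later in the article; the handler collects the text of every name element as the parser streams past it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxDemo {
    // Streams through the document once and returns the text of each <name> element.
    static List<String> userNames(String xml) throws Exception {
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inName;
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("name".equals(qName)) {
                    inName = true;
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inName) {
                    text.append(ch, start, length); // may be called several times per element
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("name".equals(qName)) {
                    names.add(text.toString());
                    inName = false;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<users><user><name>Jack</name></user><user><name>Rose</name></user></users>";
        System.out.println(userNames(xml)); // prints [Jack, Rose]
    }
}
```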

Both HTML files and XML files are tagged documents, and both can be parsed with the DOM technology developed by the W3C.

DOM parsing is specified by the W3C, and each programming language implements it using the features of that language.

Java also implements DOM parsing of tagged documents.

Sun provided DOM parsing early on. It needs to load the entire XML file into memory; the document can then be navigated with methods such as getElementById, getElementsByName, and getElementsByTagName.

Sun later supplemented DOM parsing with SAX parsing and, in JDK 6, StAX parsing.

Sun's parsing technologies are collectively referred to as JAXP.

5.1 Ways to parse XML

JAXP (Java API for XML Processing): Sun's set of APIs for working with XML.

DOM parsing: loads all the data into memory at once.

SAX parsing: parses while reading.

Dom4j (Document For Java): a third-party, open source parsing technology that split off from JDOM and has largely replaced it.

JDOM: the predecessor of Dom4j.

Dom4j is generally regarded as performing better than Sun's parsers and supports XPath for fast lookups. Large frameworks such as Spring and Hibernate use dom4j.

StAX: a new feature of JDK 1.6, integrated into JDK 6 as a new member of JAXP.
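For comparison with SAX, here is a minimal sketch of StAX's cursor interface from the JDK, where the program pulls events instead of receiving callbacks. The users/user/name structure is again an assumption made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StaxCursorDemo {
    // Pulls events from the reader and returns the text of each <name> element.
    static List<String> userNames(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        XMLStreamReader reader = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                int event = reader.next(); // advance the cursor to the next event
                if (event == XMLStreamConstants.START_ELEMENT && "name".equals(reader.getLocalName())) {
                    names.add(reader.getElementText()); // reads up to the matching end tag
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<users><user><name>Jack</name></user><user><name>Rose</name></user></users>";
        System.out.println(userNames(xml)); // prints [Jack, Rose]
    }
}
```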

5.2 Dom4j

Dom4j is an open source, flexible XML API.

Many open source frameworks, such as Struts and Hibernate, use dom4j to parse their XML.

It supports reading and writing documents and fast XPath queries.

Because dom4j is a third-party library rather than part of Sun's technology, you need to download the dom4j jar from the dom4j website to use it.

Copy the dom4j jar into the project:

Create a new lib folder in the project and copy the dom4j jar into it

Add the jar to the project's classpath

5.2.1 Get the document object

// Note: the following classes all come from the org.dom4j package
// (SAXReader is in org.dom4j.io)
import org.dom4j.Document;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

// 1. Instantiate the parser
SAXReader sax = new SAXReader();
// 2. Read the XML document
Document doc = sax.read("./src/xml/a.xml");
// 3. You must get the root node first
Element root = doc.getRootElement();
// 4. Get the name of the first user
String name = root.element("user").element("name").getText();
System.err.println(name);

5.2.2 Get the text values in all tags

// Demonstrates using dom4j to get the data in the tags of an XML file
// (List is java.util.List)
public static void getElement() throws Exception {
    SAXReader reader = new SAXReader();
    // get the dom tree
    Document dom = reader.read("users.xml");
    // get the root tag in the XML
    Element root = dom.getRootElement();
    // get all child tags under the root tag
    List<Element> list = root.elements();
    // iterate through the collection to get each user tag
    for (Element e : list) {
        Element name = e.element("name");
        Element age = e.element("age");
        Element sex = e.element("sex");
        System.out.println(name.getText() + ":" + age.getText() + ":" + sex.getText());
    }
}

5.2.3 Modify the value in a specified tag

// Change the sex of the last user to female
public static void UpdateElement() throws Exception {
    SAXReader reader = new SAXReader();
    // get the dom tree
    Document dom = reader.read("users.xml");
    // get the root tag first
    Element root = dom.getRootElement();
    // get all user tags under users
    List<Element> list = root.elements();
    // get the last user tag
    Element lastUser = list.get(list.size() - 1);
    Element sex = lastUser.element("sex");
    sex.setText("female");
    // write the modified dom tree in memory back to the XML file
    // create a formatter
    OutputFormat format = OutputFormat.createPrettyPrint();
    // set the encoding
    format.setEncoding("gbk");
    // create a stream object for writing out the data; a FileOutputStream (rather than
    // a FileWriter) lets XMLWriter encode the bytes with the encoding set on the format
    XMLWriter writer = new XMLWriter(new FileOutputStream("users.xml"), format);
    // write out the data
    writer.write(dom);
    // close the stream
    writer.close();
}

5.2.4 Deleting tags

// Delete the last user tag
public static void deleteElement() throws Exception {
    SAXReader reader = new SAXReader();
    // get the dom tree
    Document dom = reader.read("users.xml");
    // get the root tag first
    Element root = dom.getRootElement();
    // get all user tags under users
    List<Element> list = root.elements();
    // get the last user tag
    Element lastUser = list.get(list.size() - 1);
    root.remove(lastUser);
    XMLWriter writer = new XMLWriter(new FileOutputStream("users.xml"));
    writer.write(dom);
    // close the stream
    writer.close();
}

5.2.5 Add tags

// Create a new dom tree and write it to a file
public static void addElement() throws Exception {
    // create a dom tree first; this dom tree is in memory
    Document dom = DocumentHelper.createDocument();
    // add a root node to the tree
    Element books = dom.addElement("books");
    // add 2 book tags to the root books
    Element book = books.addElement("book");
    Element book2 = books.addElement("book");
    // add child tags to the first book tag
    Element name = book.addElement("name");
    Element author = book.addElement("author");
    Element price = book.addElement("price");
    // add text to the child tags under the first book
    name.setText("Jiuyin True Sutra");
    author.addText("Li Bai");
    price.addText("1.1");
    // add child tags to the second book tag
    Element name2 = book2.addElement("name");
    Element author2 = book2.addElement("author");
    Element price2 = book2.addElement("price");
    // add text to the child tags under the second book
    name2.setText("Jiuyang Shengong");
    author2.addText("Zhao Min");
    price2.addText("1.2");
    // add attributes to the book tags
    book.addAttribute("addr", "Sutra Collection Pavilion");
    book2.addAttribute("addr", "Peach Blossom Island");
    OutputFormat format = OutputFormat.createPrettyPrint();
    XMLWriter writer = new XMLWriter(new FileOutputStream("books2.xml"), format);
    writer.write(dom);
    // close the stream
    writer.close();
}

5.2.6 Extracting a utility class

/**
 * A utility class whose job is to obtain and save the dom tree
 * @author wyait
 * @version 1.0
 */
public class DomUtils {
    private static Document dom = null;

    static {
        try {
            SAXReader reader = new SAXReader();
            // get the dom tree
            dom = reader.read("users.xml");
        } catch (Exception e) {
            // write the exception to the log file
            System.out.println("Congratulations on your failure to get the dom tree");
        }
    }

    /**
     * Returns the dom tree
     */
    public static Document getDom() {
        return dom;
    }

    /**
     * Saves the dom tree
     */
    public static void saveDom() {
        try {
            OutputFormat format = OutputFormat.createPrettyPrint();
            XMLWriter writer = new XMLWriter(new FileOutputStream("users.xml"), format);
            writer.write(dom);
            // close the stream
            writer.close();
        } catch (Exception e) {
            System.out.println("Congratulations on your failure to save the dom tree");
        }
    }
}

5.2.7 Generating a new XML file with Dom4j

// 1. Create a Document in memory through DocumentHelper
Document doc = DocumentHelper.createDocument();
doc.setXMLEncoding("UTF-8"); // the encoding of the XML
// Generate a node; the first node generated is the root node, so call addElement on the Document only once
Element root = doc.addElement("users");
root.addElement("user").setText("Jack"); // add a child node and set its value at the same time
// Write it out. If the content contains Chinese, use an OutputFormat with an encoding as described earlier.
XMLWriter writer = new XMLWriter(new FileOutputStream("a.xml"));
writer.write(doc);
writer.close();

5.3 XPath technology (theory)

XPath is also a technology developed by the W3C; it is used to quickly locate tags in an XML file.

XPath stands for XML Path Language.

It enables fast queries.

XPath includes:

Path expressions, which XPath uses to navigate through XML documents.

A standard function library.

Jar required for XPath:

jaxen.jar

XPath is used through the following methods:

dom.selectNodes: returns a List object

dom.selectSingleNode: returns a Node object

5.3.1 XPath example

// Select all user nodes. The following queries apply to documents without namespaces.
List<Node> list = doc.selectNodes("//user");
System.err.println(list.size());
// Select all name nodes
list = doc.selectNodes("//name"); // or start with /users//name
System.err.println(list.size());
// Select the user nodes that have a country attribute
list = doc.selectNodes("//user[@country]");
System.err.println(list.size());
// Select the node whose country is EN. A query like this can be used to look up a user at login.
// If country values are unique, you can use selectSingleNode.
// Inside the expression you can use either double or single quotation marks.
Node node = doc.selectSingleNode("//user[@country=\"EN\"]");
System.err.println(node);

In XPath, / means to look for the tag starting from the root, while // ignores the location of the tag and selects it wherever it matches.

//abc[@attribute-name] selects abc tags, but only those that have the specified attribute.
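The same path expressions can be tried without dom4j or jaxen by using the javax.xml.xpath API that ships with the JDK. This is a sketch, not the dom4j code above: the sample users document and its country values are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathBasicsDemo {
    // Hypothetical sample data: three users, two of them with a country attribute.
    static final String XML =
        "<users>"
        + "<user country=\"EN\"><name>Jack</name></user>"
        + "<user country=\"CN\"><name>Li</name></user>"
        + "<user><name>Rose</name></user>"
        + "</users>";

    // Evaluates an XPath 1.0 expression against the sample document and returns the matching nodes.
    static NodeList select(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(select("//user").getLength());           // 3: every user, wherever it is
        System.out.println(select("/users/user").getLength());      // 3: same nodes, path from the root
        System.out.println(select("//user[@country]").getLength()); // 2: only users with the attribute
        System.out.println(select("//user[@country='EN']/name").item(0).getTextContent()); // Jack
    }
}
```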

An exception can occur when using XPath together with dom4j to get tags quickly.

If a class-not-found exception is reported, a jar is usually missing (here, jaxen.jar).

Generally, when a jar is missing, the second or third segment of the class name in the reported exception is the name of the jar.

Case-insensitive queries

The following are all queries based on attributes

// Query the user element whose id attribute is XX and whose name attribute equals the value of name;
// both attributes are converted to uppercase before comparing, so name is uppercased as well
String path = "//user[fn:upper-case(@id)='XX' and fn:upper-case(@name)='" + name.toUpperCase() + "']";
Node n = dom.selectSingleNode(path); // use selectSingleNode because there is certain to be exactly one match

Or convert to lowercase: the following queries the book element whose id attribute is hello

List<Node> list = dom.selectNodes("//book[fn:lower-case(@id)='hello']");

XPath fuzzy queries on attributes

Querying for elements whose attribute contains a value is similar to SQL LIKE

//book[contains(@id,'A001')]

// selects book elements whose id attribute contains the string A001

Since this is a fuzzy query, you will usually also want it to ignore case, so

//book[contains(fn:lower-case(@id),'a001')]

XPath: selecting a parent element by the value of a child element (do not use the @ symbol)

For example, suppose the XML document contains a book element with these values (the first is the text of its name child element):

The basis of Oracle programming

89.99

Query all book elements whose name contains the word Oracle.

//book[name='Oracle'] // exact query: book elements whose name child element has exactly the value Oracle

// the following is a fuzzy query

//book[contains(name,'Oracle')]

// you can also convert the value of the name element to lowercase

//book[contains(fn:lower-case(name),'oracle')]
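The exact and fuzzy queries can be compared side by side with the JDK's javax.xml.xpath API. Note the assumptions: the JDK engine implements XPath 1.0, which has no lower-case function, so this sketch uses the standard translate() function for the case-insensitive match; the second book and the price tag name are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathTextQueryDemo {
    // Hypothetical sample data built around the Oracle book mentioned in the text.
    static final String XML =
        "<books>"
        + "<book id=\"A001\"><name>The basis of Oracle programming</name><price>89.99</price></book>"
        + "<book id=\"b002\"><name>Java basics</name><price>59.00</price></book>"
        + "</books>";

    // XPath 1.0 idiom for lowercasing a value: map each uppercase letter to its lowercase form.
    static String lower(String expr) {
        return "translate(" + expr + ",'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz')";
    }

    // Returns how many nodes the expression selects in the sample document.
    static int count(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(count("//book[name='Oracle']"));           // 0: exact match on the whole value
        System.out.println(count("//book[contains(name,'Oracle')]")); // 1: fuzzy match on the child element
        System.out.println(count("//book[contains(@id,'A001')]"));    // 1: fuzzy match on the attribute
        System.out.println(count("//book[contains(" + lower("name") + ",'oracle')]")); // 1: ignores case
    }
}
```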

XPath with XML documents that use namespaces

Because a namespace is part of an element's identity, typically written as a prefix, the namespace must be set when processing XML documents that use namespaces.

For example, when the root element of the document declares the default namespace http://www.itcast.cn, all elements in the document come from that default namespace.

SAXReader sax = new SAXReader();
// declare a map to hold the namespaces (java.util.Map and java.util.HashMap)
Map<String, String> uris = new HashMap<>();
// give the namespace an alias
uris.put("a", "http://www.itcast.cn");
// set the namespaces first, then read the XML document
sax.getDocumentFactory().setXPathNamespaceURIs(uris);
Document dom = sax.read("./xml2/a.xml");
// then query using the namespace prefix
dom.selectNodes("//a:book");
// queries on attributes work the same as before
dom.selectNodes("//a:book[@id]");
// queries on child elements must also use the namespace prefix
dom.selectNodes("//a:book[a:name='oracle']"); // query the book element whose child element value is oracle
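For readers without dom4j at hand, roughly the same idea can be sketched with the JDK's javax.xml.xpath API, where the prefix-to-URI mapping is supplied through a NamespaceContext. The sample document below is invented; only the prefix a and the URI http://www.itcast.cn are taken from the code above.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathNamespaceDemo {
    // Hypothetical document whose elements all belong to a default namespace.
    static final String XML =
        "<books xmlns=\"http://www.itcast.cn\">"
        + "<book id=\"1\"><name>oracle</name></book>"
        + "<book><name>java</name></book>"
        + "</books>";

    // Evaluates the expression with the prefix "a" bound to the document's namespace.
    static NodeList select(String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required, otherwise namespaces are ignored
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "a".equals(prefix) ? "http://www.itcast.cn" : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return null; // reverse lookup is not needed for evaluation
            }
            public Iterator<String> getPrefixes(String uri) {
                return Collections.<String>emptyIterator();
            }
        });
        return (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(select("//book").getLength());                    // 0: unprefixed name matches nothing
        System.out.println(select("//a:book").getLength());                  // 2
        System.out.println(select("//a:book[@id]").getLength());             // 1: attributes need no prefix
        System.out.println(select("//a:book[a:name='oracle']").getLength()); // 1: child elements do
    }
}
```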

6. Summary

SAX and StAX read quickly. Both are members of JAXP.

StAX offers an Iterator programming interface and a Cursor programming interface.

Dom4j and DOM load all nodes into memory, which makes CRUD operations very convenient.

Dom4j supports XPath.
