What are the basic knowledge points of XHTML 04/17 Update SLTechnology News&Howtos

This article covers the basic knowledge points of XHTML, following the text of the XHTML 1.0 specification.

Abstract

XHTML 1.0 is a reformulation of HTML 4 as an XML 1.0 application. This specification defines XHTML 1.0 and three document type definitions (DTDs) corresponding to the ones defined by HTML 4. The semantics of the elements and their attributes are defined in the W3C Recommendation for HTML 4, and they provide the foundation for future extensibility of XHTML. XHTML documents can be made compatible with existing HTML user agents by following a small set of guidelines.

The status of this document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. The latest status of this document series is maintained at the W3C. This document has been reviewed by W3C Members and other interested parties and has been endorsed by the Director as a W3C Recommendation. It is a stable document and may be used as reference material or cited as a normative reference from another document. W3C's role in making the Recommendation is to draw attention to the specification and to promote its widespread deployment. This enhances the functionality and interoperability of the Web.

1. What is XHTML?

XHTML is a family of current and future document types and modules that reproduce, subset, and extend HTML 4 [HTML]. XHTML family document types are XML based and are ultimately designed to work in conjunction with XML-based user agents. The details of this family and its evolution are discussed in the Future Trends section.

XHTML 1.0 (this specification) is the first document type in the XHTML family. It is a reformulation of the three HTML 4 document types as applications of XML 1.0 [XML]. It is intended to be used as a language for content that is both XML-conforming and, if some simple guidelines are followed, operates in HTML 4 conforming user agents. Developers who migrate their content to XHTML 1.0 will realize the following benefits:

XHTML documents are XML conforming. As such, they are readily viewed, edited, and validated with standard XML tools.

XHTML documents can be written to operate as well as or better than they did before in existing HTML 4 conforming user agents, as well as in new XHTML 1.0 conforming user agents.

XHTML documents can use applications (e.g. scripts and applets) that rely upon either the HTML Document Object Model or the XML Document Object Model [DOM].

As the XHTML family evolves, documents conforming to XHTML 1.0 will be more likely to interoperate within and among various XHTML environments.

The XHTML family is the next step in the evolution of the Internet. By migrating to XHTML today, content developers can enter the XML world with all of its attendant benefits, while still remaining confident in their content's backward and future compatibility.

1.1 What is HTML 4?

HTML 4 [HTML] is an SGML (Standard Generalized Markup Language) application conforming to International Standard ISO 8879, and is widely regarded as the standard publishing language of the World Wide Web.

SGML is a language for describing markup languages, particularly those used in electronic document exchange, document management, and document publishing. HTML is an example of a language defined in SGML.

SGML has been around since the mid-1980s and has remained quite stable. Much of this stability stems from the fact that the language is both feature-rich and flexible. This flexibility, however, comes at a price: a level of complexity that has inhibited its adoption in a diversity of environments, including the World Wide Web.

HTML, as originally conceived, was to be a language for the exchange of scientific and other technical documents, suitable for use by non-document specialists. HTML addressed the problem of SGML complexity by specifying a small set of structural and semantic tags suitable for authoring relatively simple documents. In addition to simplifying the document structure, HTML added support for hypertext. Multimedia capabilities were added later.

In a remarkably short space of time, HTML became wildly popular and rapidly outgrew its original purpose. New elements have been continuously invented for use within HTML and for adapting HTML to vertical, highly specialized markets. This plethora of new elements has led to compatibility problems for documents across different platforms.

As the heterogeneity of both software and platforms rapidly proliferates, it is clear that the suitability of "classic" HTML 4 for use on these platforms is somewhat limited.

1.2 What is XML?

XML is the acronym for Extensible Markup Language [XML]. XML was conceived as a means of regaining the power and flexibility of SGML without most of its complexity. Although it is a restricted form of SGML, XML nonetheless preserves most of SGML's power and richness, and still retains all of SGML's commonly used features.

While retaining these beneficial features, XML removes many of the more complex features of SGML that make the authoring and design of suitable software both difficult and costly.

1.3 Why do I need XHTML?

The benefits of migrating to XHTML 1.0 are described above. Some of the benefits of migrating to XHTML in general are:

Document developers and user agent designers are constantly discovering new ways to express their ideas through new markup. In XML, it is relatively easy to introduce new elements or additional element attributes. The XHTML family is designed to accommodate these extensions through XHTML modules and techniques for developing new XHTML-conforming modules (described in the forthcoming XHTML Modularization specification). These modules will permit the combination of existing and new feature sets when developing content and when designing new user agents.

Alternate ways of accessing the Internet are constantly being introduced. Some estimates indicate that by the year 2002, 75% of Internet document viewing will be carried out on these alternate platforms. The XHTML family is designed with general user agent interoperability in mind. Through a new user agent and document profiling mechanism, servers, proxies, and user agents will be able to perform best-effort content transformation. Ultimately, it will be possible to develop XHTML-conforming content that is usable by any XHTML-conforming user agent.

2. Definitions

2.1 Terminology

The following terms are used in this specification. These terms extend the definitions in [RFC2119] in ways based upon similar definitions in ISO/IEC 9945-1:1990 [POSIX.1]:

Implementation-defined

A value or behavior is implementation-defined when it is left to the implementation to define the corresponding requirements for correct document construction.

May

With respect to implementations, the word "may" is to be interpreted as an optional feature that is not required in this specification but can be provided. With respect to document conformance, the word "may" means that the optional feature must not be used. The term "optional" has the same definition as "may".

Must

In this specification, the word "must" is to be interpreted as a mandatory requirement on the implementation or on Strictly Conforming XHTML Documents, depending upon the context. The term "shall" has the same definition as "must".

Reserved

A value or behavior is unspecified, but it is not allowed to be used by conforming documents nor to be supported by conforming user agents.

Should

With respect to implementations, the word "should" is to be interpreted as an implementation recommendation, but not a requirement. With respect to documents, the word "should" is to be interpreted as recommended programming practice for documents and a requirement for Strictly Conforming XHTML Documents.

Supported

Certain facilities in this specification are optional. If a facility is supported, it behaves as specified by this specification.

Unspecified

When a value or behavior is unspecified, the specification defines no portability requirements for a facility on an implementation, even when faced with a document that uses the facility. A document that requires definite behavior in such an instance, rather than tolerating any behavior when using that facility, is not a Strictly Conforming XHTML Document.

2.2 General terms

Attribute

An attribute is a parameter to an element declared in the DTD. An attribute's type and value range, including a possible default value, are defined in the DTD.

DTD

A DTD, or document type definition, is a collection of XML declarations that, as a collection, defines the legal structure, elements, and attributes that are available for use in a document that complies with the DTD.

Document

A document is a stream of data that, after being combined with any other streams it references, is structured such that it holds information contained within elements that are organized as defined in the associated DTD. See Document Conformance for more information.

Element

An element is a document structuring unit declared in the DTD. The element's content model is defined in the DTD, and additional semantics may be defined in the prose description of the element.

Facilities

Facilities include elements, attributes, and the semantics associated with those elements and attributes. An implementation supporting that functionality is said to provide the necessary facilities.

Implementation

An implementation is a system that provides a collection of facilities and services that supports this specification. See User Agent Conformance for more information.

Parsing

Parsing is the act whereby a document is scanned, and the information contained within the document is filtered into the context of the elements in which the information is structured.

Rendering

Rendering is the act whereby the information in a document is presented. This presentation is done in the form most appropriate to the environment (e.g. aurally, visually, in print).

User agent

A user agent is an implementation that retrieves and processes XHTML documents. See User Agent Conformance for more information.

Validation

Validation is a process whereby documents are verified against the associated DTD, ensuring that the structure, use of elements, and use of attributes are consistent with the definitions in the DTD.

Well-formed

A document is well-formed when it is structured according to the rules defined in Section 2.1 of the XML 1.0 Recommendation [XML]. Basically, this definition states that elements, delimited by their start and end tags, are nested properly within one another.

3. Normative definition of XHTML 1.0

3.1 Document conformance

This version of XHTML provides a definition of strictly conforming XHTML documents, which are restricted to tags and attributes from the XHTML namespace. See Section 3.1.2 for information on using XHTML with other namespaces, for instance to include metadata expressed in RDF within XHTML documents.

3.1.1 Strictly conforming documents

A Strictly Conforming XHTML Document is a document that requires only the facilities described as mandatory in this specification. Such a document must meet all of the following criteria:

It must validate against one of the three DTDs found in Appendix A.

The root element of the document must be html.

The root element of the document must designate the XHTML namespace using the xmlns attribute [XMLNAMES]. The namespace for XHTML is defined to be http://www.w3.org/1999/xhtml.

There must be a DOCTYPE declaration in the document prior to the root element. The public identifier included in the DOCTYPE declaration must reference one of the three DTDs found in Appendix A using the respective Formal Public Identifier. The system identifier may be changed to reflect local system conventions.

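Here is an example of a minimal XHTML document:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
     PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Virtual Library</title>
  </head>
  <body>
    <p>Moved to <a href="http://vlib.org/">vlib.org</a>.</p>
  </body>
</html>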

Note that in this example, the XML declaration is included. An XML declaration is not required in all XML documents, but XHTML document authors are strongly encouraged to use XML declarations in all their documents. Such a declaration is required when the character encoding of the document is other than the default UTF-8 or UTF-16.
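Criteria 2 to 4 above (root element, namespace, DOCTYPE) are mechanical enough to check with a short script. The sketch below assumes Python's standard xml.etree.ElementTree plus a regular expression over the raw bytes. The function name strict_conformance_problems is hypothetical. The sketch does not cover criterion 1, because ElementTree is not a validating parser.

```python
import re
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Formal Public Identifiers of the three XHTML 1.0 DTDs (Appendix A).
XHTML_FPIS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "-//W3C//DTD XHTML 1.0 Transitional//EN",
    "-//W3C//DTD XHTML 1.0 Frameset//EN",
}

# Captures the public identifier of <!DOCTYPE html PUBLIC "..." "...">.
DOCTYPE_RE = re.compile(rb'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]*)"')


def strict_conformance_problems(doc: bytes) -> list:
    """Return a list of problems; an empty list means these checks passed.

    Covers the root element, namespace and DOCTYPE criteria only.
    DTD validation (criterion 1) needs a validating parser.
    """
    try:
        root = ET.fromstring(doc)  # raises ParseError if not well-formed
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    problems = []
    # ElementTree reports namespaced names as "{namespace}localname".
    if root.tag != f"{{{XHTML_NS}}}html":
        problems.append(f"root element is {root.tag!r}, expected html in the XHTML namespace")
    match = DOCTYPE_RE.search(doc)
    if match is None:
        problems.append("no DOCTYPE declaration for html")
    elif match.group(1).decode("latin-1") not in XHTML_FPIS:
        problems.append(f"unknown public identifier {match.group(1)!r}")
    return problems


MINIMAL = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head><title>Virtual Library</title></head>
  <body><p>Moved to <a href="http://vlib.org/">vlib.org</a>.</p></body>
</html>"""

print(strict_conformance_problems(MINIMAL))                  # []
print(strict_conformance_problems(b"<html><body/></html>"))  # two problems
```

ElementTree does not fetch the external DTD named in the system identifier. This keeps the check offline, but it also means undeclared entities such as &nbsp; would be reported as parse errors.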

3.1.2 Using XHTML with other namespaces

The XHTML namespace may be used with other XML namespaces as per [XMLNAMES], although such documents are not strictly conforming XHTML 1.0 documents as defined above. Future work by W3C will address ways to specify conformance for documents involving multiple namespaces.

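The following example shows the way in which XHTML 1.0 could be used in conjunction with the MathML Recommendation:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>A Math Example</title>
  </head>
  <body>
    <p>The following is MathML markup:</p>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <apply> <log/>
        <logbase>
          <cn> 3 </cn>
        </logbase>
        <ci> x </ci>
      </apply>
    </math>
  </body>
</html>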

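The following example shows the way in which XHTML 1.0 markup could be incorporated into another XML namespace:

<?xml version="1.0" encoding="UTF-8"?>
<!-- initially, the default namespace is "books" -->
<book xmlns='urn:loc.gov:books'
    xmlns:isbn='urn:ISBN:0-395-36341-6' xml:lang="en" lang="en">
  <title>Cheaper by the Dozen</title>
  <isbn:number>1568491379</isbn:number>
  <notes>
    <!-- make HTML the default namespace for a hypertext commentary -->
    <p xmlns='http://www.w3.org/1999/xhtml'>
        This is also available <a href="http://www.w3.org/">online</a>.
    </p>
  </notes>
</book>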
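A namespace-aware parser keeps track of which vocabulary each element belongs to. In the book example, the XHTML p and a elements stay distinct from the book vocabulary. The Python sketch below uses only the standard library. The helper split_name is hypothetical and only unpacks ElementTree's "{namespace}local" naming convention.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

BOOK = b"""<?xml version="1.0" encoding="UTF-8"?>
<book xmlns='urn:loc.gov:books'
      xmlns:isbn='urn:ISBN:0-395-36341-6' xml:lang="en" lang="en">
  <title>Cheaper by the Dozen</title>
  <isbn:number>1568491379</isbn:number>
  <notes>
    <p xmlns='http://www.w3.org/1999/xhtml'>
      This is also available <a href="http://www.w3.org/">online</a>.
    </p>
  </notes>
</book>"""


def split_name(tag: str) -> tuple:
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return None, tag


root = ET.fromstring(BOOK)
names = [split_name(elem.tag) for elem in root.iter()]
for namespace, local in names:
    print(f"{local:8} {namespace}")

# Only p and a are in the XHTML namespace; a inherits it from p.
xhtml_elements = [local for namespace, local in names if namespace == XHTML_NS]
print(xhtml_elements)  # ['p', 'a']
```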

3.2 User agent conformance

A conforming user agent must meet all of the following criteria:

In order to be consistent with the XML 1.0 Recommendation [XML], the user agent must parse and evaluate an XHTML document for well-formedness. If the user agent claims to be a validating user agent, it must also validate documents against their referenced DTDs according to [XML].

When the user agent claims to support facilities defined within this specification, it must do so in ways consistent with the facilities' definition.

When a user agent processes an XHTML document as generic XML, it shall only recognize attributes of type ID (e.g. the id attribute on most XHTML elements) as fragment identifiers.

If a user agent encounters an element it does not recognize, it must render the element's content.

If a user agent encounters an attribute it does not recognize, it must ignore the entire attribute specification (i.e., the attribute and its value).

If a user agent encounters an attribute value it does not recognize, it must use the default attribute value.

If it encounters an entity reference (other than one of the predefined entities) for which the user agent has processed no declaration (which could happen if the declaration is in the external subset, which the user agent has not read), the entity reference should be rendered as the characters (starting with & and ending with a semicolon) that make up the entity reference.

When rendering content, user agents that encounter characters or character entity references that are recognized but not renderable should display the document in such a way that it is obvious to the user that normal rendering has not taken place.

The following characters are defined in [XML] as whitespace characters:

Space (&#x0020;)

Tab (&#x0009;)

Carriage return (&#x000D;)

Line feed (&#x000A;)

The XML processor normalizes the line-end codes of different systems into a single line-feed character, which is passed up to the application. In addition, the XHTML user agent must treat the following characters as whitespace:

Form feed (&#x000C;)

Zero-width space (&#x200B;)

In elements where the 'xml:space' attribute is set to 'preserve', the user agent must leave all whitespace characters intact (with the exception of leading and trailing whitespace characters, which should be removed). Otherwise, whitespace is handled according to the following rules:

All whitespace surrounding block elements should be removed.

Comments are removed entirely and do not affect whitespace handling. One whitespace character on either side of a comment is treated as two whitespace characters.

Leading and trailing whitespace inside a block element must be removed.

Line feed characters within a block element must be converted into a space (except when the 'xml:space' attribute is set to 'preserve').

A sequence of whitespace characters must be reduced to a single space character (except when the 'xml:space' attribute is set to 'preserve').
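The rules above are easy to misread, so here is a minimal Python sketch of how a user agent might apply them to the text of a single block element. It assumes comments have already been removed. It does not model whitespace around comments or around neighbouring block elements. The name normalize_block_text is hypothetical and does not come from any browser.

```python
import re

# Whitespace per section 3.2: the four XML whitespace characters plus
# form feed (U+000C) and zero-width space (U+200B).
XHTML_WS = " \t\r\n\f\u200b"
_WS_RUN = re.compile("[" + XHTML_WS + "]+")


def normalize_block_text(text: str, preserve: bool = False) -> str:
    """Apply the block-level whitespace rules to one block's text.

    preserve=True models xml:space="preserve": only leading and
    trailing whitespace is removed.
    """
    text = text.strip(XHTML_WS)  # leading/trailing whitespace is removed
    if preserve:
        return text
    # Line feeds become spaces and every run collapses to a single space.
    return _WS_RUN.sub(" ", text)


print(repr(normalize_block_text("  Hello,\n\t  world!  ")))             # 'Hello, world!'
print(repr(normalize_block_text("  keep   this\n", preserve=True)))     # 'keep   this'
```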

With regard to rendering, the user agent should render the content in a manner appropriate to the language in which the content is written. In languages whose primary script is Latinate, the ASCII space character is typically used to encode both grammatical word boundaries and typographic whitespace. In languages whose script is related to Nagari (e.g. Sanskrit, Thai), grammatical boundaries may be encoded using the ZW "space" character, but are not typically represented by typographic whitespace in rendered output. Languages using Arabiform scripts may encode typographic whitespace using a space character, but may also use the ZW space character to delimit "internal" grammatical boundaries (what look like words in Arabic to an English eye frequently encode several words, e.g. 'kitAbuhum' = 'kitAbu-hum' = 'book them' == their book). Languages in the Chinese script tradition typically neither encode such delimiters nor use typographic whitespace in this way.

Whitespace in attribute values is processed according to [XML].

4. Differences from HTML 4

Because XHTML is an application of XML, some habits that are perfectly legal in SGML-based HTML 4 must be changed in XHTML.

4.1 Documents must be well-formed

Well-formedness is a new concept introduced by [XML]. Essentially, it means that all elements must either have closing tags or be written in a special form (as described below), and that all elements must nest properly.

Although overlapping elements are illegal in SGML, they are widely tolerated in existing browsers.

Correct: nested elements.

<p>here is an emphasized <em>paragraph</em>.</p>

Incorrect: overlapping elements.

<p>here is an emphasized <em>paragraph.</p></em>
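An XML parser enforces this rule directly: the nested form parses and the overlapping form is rejected. Here is a minimal sketch with Python's standard xml.etree.ElementTree. The helper is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if `markup` parses as a well-formed XML fragment."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


nested = "<p>here is an emphasized <em>paragraph</em>.</p>"
overlapping = "<p>here is an emphasized <em>paragraph.</p></em>"

print(is_well_formed(nested))       # True
print(is_well_formed(overlapping))  # False: </p> arrives while <em> is open
```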

4.2 Element and attribute names must be in lower case

XHTML documents must use lower case for all HTML element and attribute names. This difference is necessary because XML is case-sensitive; for example, <li> and <LI> are different tags.
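Case sensitivity can be observed with any XML parser. In the following Python sketch (standard library only), <li> and <LI> produce differently named elements. An end tag that differs from its start tag only in case is a well-formedness error.

```python
import xml.etree.ElementTree as ET

lower = ET.fromstring("<ul><li>one</li></ul>")
upper = ET.fromstring("<ul><LI>one</LI></ul>")
print(lower[0].tag, upper[0].tag)   # li LI
print(upper.find("li") is None)     # True: there is no element named "li"

# Start and end tags that differ only in case do not match in XML.
try:
    ET.fromstring("<ul><li>one</LI></ul>")
    mismatch_rejected = False
except ET.ParseError:
    mismatch_rejected = True
print(mismatch_rejected)            # True
```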

4.3 For non-empty elements, end tags are required

In SGML-based HTML 4, certain elements were permitted to omit the end tag, with the elements that followed implying closure. This omission is not permitted in XML-based XHTML. All elements other than those declared in the DTD as EMPTY must have an end tag.

Correct: terminated elements.

<p>here is a paragraph.</p><p>here is another paragraph.</p>

Incorrect: unterminated elements.

<p>here is a paragraph.<p>here is another paragraph.
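The difference between lenient HTML processing and XML processing shows up when the incorrect example is fed to both. The sketch below uses Python's html.parser, a forgiving tokenizer (not an SGML parser) that reports the tags without complaint and without inferring the missing end tags. It also uses xml.etree.ElementTree, which rejects the same input. TagCollector is a hypothetical helper class.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records the start and end tags reported by the lenient HTML tokenizer."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


unterminated = "<p>here is a paragraph.<p>here is another paragraph."

collector = TagCollector()
collector.feed(unterminated)
collector.close()
print(collector.events)   # [('start', 'p'), ('start', 'p')]: no error raised

try:
    ET.fromstring(unterminated)
    xml_accepts = True
except ET.ParseError:
    xml_accepts = False
print(xml_accepts)        # False: XML requires the </p> end tags
```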

4.4 Attribute values must always be quoted

All attribute values must be quoted, even those that appear to be numeric.

Correct: quoted attribute values.

<table rows="3">

Incorrect: unquoted attribute values.

<table rows=3>

4.5 Attribute minimization

XML does not support attribute minimization. Attribute-value pairs must be written in full. Attribute names such as compact and checked cannot occur in elements without their value being specified.

Correct: unminimized attributes.

<dl compact="compact">

Incorrect: minimized attributes.

<dl compact>

4.6 Empty elements

Empty elements must either have an end tag or the start tag must end with />. For instance, <br/> or <hr></hr>. See the HTML Compatibility Guidelines for information on ways to ensure this is backward compatible with HTML 4 user agents.

Correct: terminated empty tags.

<br/><hr/>

Incorrect: unterminated empty tags.

<br><hr>
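Sections 4.4 to 4.6 are mechanical enough to automate when converting legacy markup. The following Python sketch rewrites HTML start tags into XHTML form:

- names in lower case
- every value quoted
- minimized attributes expanded
- " />" appended to empty elements (the space follows the compatibility advice in Appendix C)

It relies on the standard html.parser and xml.sax.saxutils modules. The class and function names and the EMPTY_ELEMENTS set are assumptions made for illustration. End tags and text content are deliberately ignored.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape

# Elements declared EMPTY in the XHTML 1.0 DTDs (assumed list for this sketch).
EMPTY_ELEMENTS = {
    "area", "base", "basefont", "br", "col", "frame", "hr",
    "img", "input", "isindex", "link", "meta", "param",
}


class StartTagRewriter(HTMLParser):
    """Collects each start tag rewritten in XHTML syntax; ignores everything else."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # html.parser already lower-cases tag and attribute names (section 4.2)
        # and decodes entity references inside attribute values.
        parts = [tag]
        for name, value in attrs:
            if value is None:      # minimized attribute such as <dl compact>
                value = name       # section 4.5: write compact="compact"
            quoted = escape(value, {'"': "&quot;"})  # escapes &, <, > and "
            parts.append(f'{name}="{quoted}"')        # section 4.4: always quote
        closing = " />" if tag in EMPTY_ELEMENTS else ">"  # section 4.6
        self.tags.append("<" + " ".join(parts) + closing)


def to_xhtml_start_tags(html: str) -> list:
    """Return the XHTML form of every start tag found in `html`."""
    rewriter = StartTagRewriter()
    rewriter.feed(html)
    rewriter.close()
    return rewriter.tags


print(to_xhtml_start_tags("<TABLE ROWS=3><DL COMPACT><BR><INPUT TYPE=checkbox CHECKED>"))
# ['<table rows="3">', '<dl compact="compact">', '<br />', '<input type="checkbox" checked="checked" />']
```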

4.7 Whitespace handling in attribute values

When user agents process attributes, they strip leading and trailing whitespace from attribute values and map sequences of one or more whitespace characters (including line breaks) to a single inter-word space (an ASCII space character in Western scripts). See Section 3.3.3 of [XML].
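Attribute whitespace is handled in two steps, which can be seen with Python's expat-based parser. When no DTD has been read, attributes are treated as CDATA. In that case the parser itself (per XML 1.0 section 3.3.3) only replaces each whitespace character in the value with a space. Trimming and collapsing, as described above, is a further step. The helper collapse_attribute_whitespace is hypothetical and sketches that second step.

```python
import re
import xml.etree.ElementTree as ET

elem = ET.fromstring('<p title="  first\n   second  ">text</p>')
raw = elem.get("title")
# The parser turned the line feed into a space but kept every other space.
print(repr(raw))  # '  first    second  '


def collapse_attribute_whitespace(value: str) -> str:
    """Trim and collapse runs of XML whitespace to one space (section 4.7)."""
    return re.sub("[ \t\r\n]+", " ", value).strip(" ")


print(repr(collapse_attribute_whitespace(raw)))  # 'first second'
```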

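4.8 Script and style elements

In XHTML, the script and style elements are declared as having #PCDATA content. As a result, < and & will be treated as the start of markup, and entities such as &lt; and &amp; will be recognized as entity references by the XML processor and expanded to < and &, respectively. Wrapping the content of the script or style element within a CDATA marked section avoids the expansion of these entities:

<script>
 <![CDATA[
 … unescaped script content …
 ]]>
</script>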

CDATA sections are recognized by the XML processor and appear as nodes in the Document Object Model; see Section 1.3 of the DOM Level 1 Recommendation [DOM].

An alternative is to use external script and style documents.
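The effect of the CDATA section can be checked with Python's ElementTree. The same script text fails to parse as raw #PCDATA, but it is passed through unchanged when wrapped. The script body here is made up for the example.

```python
import xml.etree.ElementTree as ET

script_body = "if (a < b && b < c) { run(); }"

# Unwrapped, "<" and "&" are read as markup, so the element is not well-formed.
try:
    ET.fromstring(f"<script>{script_body}</script>")
    raw_parses = True
except ET.ParseError:
    raw_parses = False

# Inside a CDATA section the text reaches the application unchanged.
wrapped = ET.fromstring(f"<script><![CDATA[{script_body}]]></script>")

print(raw_parses)                   # False
print(wrapped.text == script_body)  # True
```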

4.9 SGML exclusions

SGML gives the writer of a DTD the ability to exclude specific elements from being contained within an element. Such prohibitions (called "exclusions") are not possible in XML.

For example, the HTML 4 Strict DTD forbids the nesting of an 'a' element within another 'a' element to any descendant depth. It is not possible to spell out such prohibitions in XML. Even though these prohibitions cannot be defined in the DTD, certain elements should not be nested. A summary of such elements, and of the elements that should not be nested in them, is found in the normative Appendix B.

4.10 Elements with 'id' and 'name' attributes

HTML 4 defined the name attribute for the elements a, applet, form, frame, iframe, img, and map. HTML 4 also introduced the id attribute. Both of these attributes are designed to be used as fragment identifiers.

In XML, fragment identifiers are of type ID, and there can only be a single attribute of type ID per element. Therefore, in XHTML 1.0 the id attribute is defined to be of type ID. To ensure that XHTML 1.0 documents are well-structured XML documents, XHTML 1.0 documents must use the id attribute when defining fragment identifiers, even on elements that historically also had a name attribute. See the HTML Compatibility Guidelines for information on ensuring that such anchors are backward compatible when serving XHTML documents as media type text/html.

Note that in XHTML 1.0, the name attribute of these elements is formally deprecated and will be removed in a subsequent version of XHTML.
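Resolving a fragment identifier in XHTML 1.0 therefore means looking up the id attribute, never name. Here is a minimal Python sketch using ElementTree. The helper find_fragment is hypothetical. A real user agent would likely index ids once rather than scan the tree on every lookup.

```python
import xml.etree.ElementTree as ET


def find_fragment(root, fragment):
    """Return the element whose id attribute equals `fragment`, or None."""
    for elem in root.iter():
        # Unprefixed attributes are in no namespace, so a plain "id" lookup works.
        if elem.get("id") == fragment:
            return elem
    return None


doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<a id="intro" name="intro">Introduction</a>'
    '<a name="legacy">Old-style anchor</a>'
    "</body></html>"
)

print(find_fragment(doc, "intro").text)  # Introduction
print(find_fragment(doc, "legacy"))      # None: only a name attribute
```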

5. Compatibility issues

Although there is no requirement for XHTML 1.0 documents to be compatible with existing user agents, in practice this is easy to accomplish. Guidelines for creating compatible documents can be found in Appendix C.

5.1 Internet media type

At the time of publication of this recommendation, the general recommended MIME labeling for XML-based applications has yet to be resolved.

However, XHTML documents that follow the guidelines set forth in Appendix C may be labeled with the Internet media type "text/html", as they are compatible with most HTML browsers. This document makes no recommendation about MIME labeling of other XHTML documents.

6. Future directions

XHTML 1.0 provides the basis for a family of document types that will extend and subset XHTML in order to support a wide range of new devices and applications. This will be done by defining modules and specifying a mechanism for combining these modules. This mechanism will enable the extension and subsetting of XHTML 1.0 in a uniform way through the definition of new modules.

6.1 Modularizing HTML

As the use of XHTML moves from traditional desktop user agents to other platforms, it is clear that not all of the XHTML elements will be required on all platforms. For example, a handheld device or a cell phone may only support a subset of XHTML elements.

The process of modularization breaks XHTML up into a series of smaller element sets. These elements can then be recombined to meet the needs of different communities.

These modules will be defined in future W3C documents.

6.2 Subsetting and extensibility

Modularization brings with it several advantages:

It provides a formal mechanism for subsetting XHTML.

It provides a formal mechanism for extending XHTML.

It simplifies the transformation between document types.

It promotes the reuse of modules in new document types.

6.3 Document profiles

A document profile specifies the syntax and semantics of a set of documents. Conformance to a document profile provides a basis for interoperability guarantees. The document profile specifies the facilities required to process documents of that type, e.g. which image formats can be used, levels of scripting, style sheet support, and so on.

For product designers, this enables various groups to define their own standard profiles.

For authors, this removes the need to write several different versions of documents for different clients.

For special groups such as chemists, medical doctors, or mathematicians, this allows a special profile to be built using standard HTML elements plus a group of elements geared to the specialists' needs.

Appendix A. DTDs

This appendix is normative.

These DTDs and entity sets form a normative part of this specification. The complete set of DTD files, together with an XML declaration and SGML Open Catalog, is included in the zip file for the specification.

A.1 Document Type Definitions

These DTDs approximate the HTML 4 DTDs. When the DTDs are modularized, a method of DTD construction that corresponds more closely to HTML 4 is likely to be employed.

XHTML-1.0-Strict

XHTML-1.0-Transitional

XHTML-1.0-Frameset

A.2 Entity sets

The XHTML entity sets are the same as for HTML 4, but have been modified to be valid XML 1.0 entity declarations. Note that the entity for the Euro currency sign (&euro; or &#8364; or &#x20AC;) is defined as part of the special characters.

Latin-1 characters

Special characters

Symbols

Appendix B. Element prohibitions

This appendix is normative.

The following elements have prohibitions on which elements they can contain (see Section 4.9). These prohibitions apply to all depths of nesting, i.e. to all descendant elements.

a

Cannot contain other a elements.

pre

Cannot contain the img, object, big, small, sub, or sup elements.

button

Cannot contain the input, select, textarea, label, button, form, fieldset, iframe, or isindex elements.

label

Cannot contain other label elements.

form

Cannot contain other form elements.
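Because a DTD cannot express these prohibitions, a separate check is needed. The Python sketch below walks an ElementTree and reports every ancestor/descendant pair that breaks one of the rules listed above. The PROHIBITIONS table is transcribed from this appendix. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Appendix B: element -> descendants it must not contain, at any depth.
PROHIBITIONS = {
    "a": {"a"},
    "pre": {"img", "object", "big", "small", "sub", "sup"},
    "button": {"input", "select", "textarea", "label", "button",
               "form", "fieldset", "iframe", "isindex"},
    "label": {"label"},
    "form": {"form"},
}


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree puts on element names."""
    return tag.rsplit("}", 1)[-1]


def prohibition_violations(root) -> list:
    """Return (ancestor, descendant) local-name pairs that break Appendix B."""
    violations = []
    for elem in root.iter():
        banned = PROHIBITIONS.get(local_name(elem.tag))
        if not banned:
            continue
        for desc in elem.iter():  # iter() includes elem itself, so skip it
            if desc is not elem and local_name(desc.tag) in banned:
                violations.append((local_name(elem.tag), local_name(desc.tag)))
    return violations


example = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<pre>x<img src="i.png" alt=""/></pre>'
    '<p><a href="#x"><span><a href="#y">y</a></span></a></p>'
    "</body></html>"
)
print(prohibition_violations(example))  # [('pre', 'img'), ('a', 'a')]
```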

Appendix C. HTML compatibility guidelines

This appendix is informative.

This appendix summarizes design guidelines for authors who want their XHTML documents to render on existing HTML user agents.

C.1 Processing instructions

Be aware that processing instructions are rendered on some user agents. Also note that when the XML declaration is not included in a document, the document can only use the default character encodings UTF-8 or UTF-16.

C.2 Empty elements

Include a space before the trailing / and > of empty elements, e.g. <br />, <hr /> and <img src="karen.jpg" alt="Karen" />. Also, use the minimized tag syntax for empty elements, e.g. <br />, because the alternative syntax <br></br> allowed by XML gives uncertain results in many existing user agents.

C.3 Element minimization and empty element content

Given an empty instance of an element whose content model is not EMPTY (for example, an empty title or paragraph), do not use the minimized form (e.g. use <p> </p> and not <p />).

C.4 Embedded style sheets and scripts

Use external style sheets if your style sheet uses < or & or ]]> or --. Use external scripts if your script uses < or & or ]]> or --. Note that XML parsers are permitted to silently remove the contents of comments. Therefore, the historical practice of "hiding" scripts and style sheets within comments to make documents backward compatible is likely not to work as expected in XML-based implementations.
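The guideline above amounts to a substring test. Here is a minimal Python sketch, with a hypothetical function name, that reports which of the listed sequences appear in embedded style or script content.

```python
# Sequences that, per the guideline above, call for an external file.
RISKY_SEQUENCES = ("<", "&", "]]>", "--")


def risky_sequences(content: str) -> list:
    """Return the risky sequences found in embedded script or style content."""
    return [seq for seq in RISKY_SEQUENCES if seq in content]


print(risky_sequences("p { color: red }"))    # []
print(risky_sequences("if (a < b) go();"))    # ['<']
print(risky_sequences("for (;;) { i--; }"))   # ['--']
```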

C.5 Line breaks within attribute values

Avoid line breaks and multiple whitespace characters within attribute values. These are handled inconsistently by user agents.

C.6 Isindex

Do not include more than one isindex element in the document head. The isindex element is deprecated in favor of the input element.

C.7 The lang and xml:lang attributes

Use both the lang and xml:lang attributes when specifying the language of an element. The value of the xml:lang attribute takes precedence.
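When both attributes are present, a processor has to decide which one wins, and it also has to inherit the value from ancestors. The following Python sketch uses ElementTree, which reports xml:lang under the reserved XML namespace. The helper effective_language is hypothetical, and it rebuilds the parent map on every call for simplicity.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_language(root, target):
    """Return the language in effect for `target`, preferring xml:lang over lang.

    Walks from `target` up to `root`; the nearest element carrying either
    attribute decides, and xml:lang wins when both are present.
    """
    parents = {child: parent for parent in root.iter() for child in parent}
    elem = target
    while elem is not None:
        for attr in (XML_LANG, "lang"):
            value = elem.get(attr)
            if value is not None:
                return value
        elem = parents.get(elem)
    return None


doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><body>'
    '<p xml:lang="fr" lang="de">Bonjour</p><p>Hello</p>'
    "</body></html>"
)
paragraphs = doc.findall(".//{http://www.w3.org/1999/xhtml}p")
print(effective_language(doc, paragraphs[0]))  # fr: xml:lang takes precedence
print(effective_language(doc, paragraphs[1]))  # en: inherited from html
```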

C.8 Fragment identifiers

In XML, URIs [RFC2396] that end with fragment identifiers of the form "#foo" do not refer to elements with an attribute name="foo"; rather, they refer to elements with an attribute defined to be of type ID, e.g. the id attribute in HTML 4. Many existing HTML clients do not support the use of ID-type attributes in this way, so identical values may be supplied for both of these attributes to ensure maximum forward and backward compatibility (e.g. <a id="foo" name="foo">...</a>).

Further, because the set of legal values for attributes of type ID is much smaller than for those of type CDATA, the type of the name attribute has been changed to NMTOKEN. This attribute is constrained so that it can only have the same values as type ID, or as the Name production of XML 1.0. Unfortunately, this constraint cannot be expressed in the XHTML 1.0 DTDs. Because of this change, care must be taken when converting existing HTML documents. The values of these attributes must be unique within the document and valid, and any references to these fragment identifiers (both internal and external) must be updated if the values are changed during conversion.

Finally, note that XHTML 1.0 has deprecated the name attribute of the a, applet, form, frame, iframe, img, and map elements, and it will be removed from XHTML in subsequent versions.
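A conversion script can check each value before emitting it in both attributes. The sketch below uses an ASCII-only approximation of the XML 1.0 Name production. The real production also admits many non-ASCII characters, so treat this as a conservative filter. Both function names are hypothetical.

```python
import re

# ASCII-only approximation of the XML 1.0 Name production.
ASCII_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")


def is_valid_fragment_id(value: str) -> bool:
    """True if `value` fits the approximated Name production."""
    return ASCII_NAME.fullmatch(value) is not None


def anchor_start_tag(fragment: str) -> str:
    """Build an <a> start tag that carries the same value in id and name."""
    if not is_valid_fragment_id(fragment):
        raise ValueError(f"{fragment!r} is not a valid ID value")
    return f'<a id="{fragment}" name="{fragment}">'


print(anchor_start_tag("foo"))            # <a id="foo" name="foo">
print(is_valid_fragment_id("section 1"))  # False: contains a space
print(is_valid_fragment_id("1st"))        # False: starts with a digit
```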

C.9 Character encoding

To specify a character encoding in the document, use both the encoding attribute specification on the xml declaration (e.g. <?xml version="1.0" encoding="EUC-JP"?>) and a meta http-equiv statement (e.g. <meta http-equiv="Content-type" content='text/html; charset="EUC-JP"' />). The value of the encoding attribute of the xml processing instruction takes precedence.

C.10 Boolean attributes

Some HTML user agents are unable to interpret boolean attributes when these appear in their full (non-minimized) form, as required by XML 1.0. Note that this problem does not affect user agents compliant with HTML 4. The following attributes are involved: compact, nowrap, ismap, declare, noshade, checked, disabled, readonly, multiple, selected, noresize, defer.

C.11 Document Object Model and XHTML

The Document Object Model Level 1 Recommendation [DOM] defines document object model interfaces for XML and HTML 4. The HTML 4 document object model specifies that HTML element and attribute names are returned in upper case. The XML document object model specifies that element and attribute names are returned in the case they are specified. In XHTML 1.0, elements and attributes are specified in lower case. This apparent difference can be addressed in two ways:

Applications that access XHTML documents served as Internet media type text/html via the DOM can use the HTML DOM, and can rely upon element and attribute names being returned in upper case from those interfaces.

Applications that access XHTML documents served as Internet media types text/xml or application/xml can also use the XML DOM. Elements and attributes will be returned in lower case. Also, some XHTML elements may or may not appear in the object tree because they are optional in the content model (e.g. the tbody element within table). This occurs because in HTML 4 some elements were permitted to be minimized such that both their start and end tags are omitted (an SGML feature). This is not possible in XML. Rather than require document authors to insert extraneous elements, XHTML has made these elements optional. Applications need to adapt to this accordingly.
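The tbody difference is easy to observe with an XML DOM. In this Python sketch using the standard xml.dom.minidom, element names come back exactly as written. A table whose author omitted tbody has no tbody node in the tree.

```python
from xml.dom import minidom

dom = minidom.parseString(
    '<table xmlns="http://www.w3.org/1999/xhtml">'
    "<tr><td>cell</td></tr>"
    "</table>"
)
row = dom.getElementsByTagName("tr")[0]

print(row.tagName)                             # tr: case as written
print(row.parentNode.tagName)                  # table: no tbody was inferred
print(len(dom.getElementsByTagName("tbody")))  # 0
```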

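C.12 Using ampersands in attribute values

When an attribute value contains an ampersand, it must be expressed as a character entity reference (e.g. "&amp;"). For example, when the href attribute of the a element refers to a CGI script that takes parameters, it must be expressed as http://my.site.dom/cgi-bin/myscript.pl?class=guest&amp;name=user rather than as http://my.site.dom/cgi-bin/myscript.pl?class=guest&name=user.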
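Serializers usually take care of this escaping. Here is a minimal Python sketch using xml.sax.saxutils.quoteattr from the standard library. It shows the escaped form that is written into the document and the original URL that a parser hands back.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

url = "http://my.site.dom/cgi-bin/myscript.pl?class=guest&name=user"

# quoteattr escapes &, < and > and wraps the value in quotes.
markup = f"<a href={quoteattr(url)}>guest</a>"
print(markup)
# <a href="http://my.site.dom/cgi-bin/myscript.pl?class=guest&amp;name=user">guest</a>

# The parser turns &amp; back into &, so applications see the original URL.
print(ET.fromstring(markup).get("href") == url)  # True
```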

C.13 Cascading Style Sheets and XHTML

The Cascading Style Sheets Level 2 Recommendation [CSS2] defines style properties that are applied to the parse tree of the HTML or XML document. Differences in parsing will produce different visual or aural results, depending on the selectors used. The following hints will reduce this effect for documents that are served without modification as both media types:

CSS style sheets for XHTML should use lower-case element and attribute names.

In tables, the tbody element will be inferred by the parser of an HTML user agent, but not by the parser of an XML user agent. Therefore, you should always explicitly add a tbody element if it is referred to in a CSS selector.

Within the XHTML namespace, user agents are expected to recognize the "id" attribute as an attribute of type ID. Therefore, style sheets should be able to continue using the shorthand "#" selector syntax even if the user agent does not read the DTD.

Within the XHTML namespace, user agents are expected to recognize the "class" attribute. Therefore, style sheets should be able to continue using the shorthand "." selector syntax.

CSS defines different conformance rules for HTML and XML documents. Be aware that the HTML rules apply to XHTML documents delivered as HTML, and the XML rules apply to XHTML documents delivered as XML.
