
An Example Analysis of XML Markup Semantics


This article presents a sample analysis of XML markup semantics in a form that is meant to be easy to understand and well organized. I hope it helps resolve your doubts; let the editor lead you through the study of XML markup semantics.

1 Introduction

In recent years, with the growth of digital publishing, the explosion of World Wide Web applications and the rapid development of e-commerce, many aspects of daily social, business and cultural life have come to rely on text markup systems based on the Standard Generalized Markup Language (SGML) and the Extensible Markup Language (XML). SGML/XML is a machine-readable technology for defining descriptive markup languages. Apart from some parts that require special treatment, such a language can clearly define the structure of a document and its potential meaning. SGML/XML is developing rapidly, and the technology is widely used to support high-performance document interoperability, processing and publishing.

This hope has been partially realized, and the advantages of SGML/XML have exceeded expectations, but SGML/XML document systems still need improvement in functionality, interoperability, diversity and accessibility. If we do not seize this opportunity, the consequences will be serious: the business community has already incurred high financial costs and lost many opportunities; failures in safety-critical applications could be disastrous; and people with disabilities may be denied equal access to contemporary social, cultural and commercial benefits. In addition, several long-standing problems continue to remind us that even the best digital document model remains flawed, or at least imperfect.

The root of these problems is that although SGML/XML can give a document a meaningful structure, it cannot represent the basic semantic relationships between document components and topics in a machine-processable way. SGML/XML supports machine-readable definitions of "grammar", but it provides no mechanism for stating the semantics of that grammar, so there is no way to formalize the intended meaning of an SGML/XML vocabulary. Current SGML/XML cannot express even very simple semantic facts about a document tagging system, facts that markup language designers usually have in mind from the start but whose realization is left to markup language users and software.

This lack of expressive capability forces SGML/XML users to guess at the semantic relationships that markup language designers had in mind but never formally expressed. Content developers must guess the designer's intentions, rely on those inferences when encoding content, and have no way to state their inferences and intentions clearly to others or to the applications that process the encoded content. Software designers must likewise guess the markup language designer's probable intentions and build that conjecture into tools and applications. Sometimes second-order conjecture is necessary: the software designer guesses the content developer's inference about the intention of the markup language designer.

Clearly, such guesses are incomplete, fallible and unverifiable. Moreover, the resulting development and implementation processes are time-consuming and laborious, and functionality and interoperability suffer. Accompanying an SGML/XML system with ordinary natural language documentation does not solve the problem. Such documentation can give hints to content providers and software engineers, but there are no general conventions for writing it; and in any case ordinary natural language documentation is not machine-readable, which is precisely the problem with the SGML/XML markup systems we are discussing.

No machine-processable semantic description for SGML and XML has yet taken shape; this is the root of current engineering problems and an obstacle to future development. There is still little related semantic research, though many scholars have begun to pay attention to the problem. Work on W3C XML Schema is relevant, but covers only a small part of the problem (such as data types). The W3C "semantic Web" project is also related, but its aim is to develop a general XML-based knowledge representation technology. Our research focuses instead on the semantics of document markup, which today lies hidden inside actual document processing systems. One might say that the essence of the semantic Web is the design of semantic tags; in this paper, however, we argue that solving the problems above requires thinking deeply about what tags essentially mean.

This paper first explains the meaning of tags from a historical perspective (tags have played an interesting role in the development of text processing methods). It then describes in detail what factors generate the requirement for formal tag semantics and what determines those semantic requirements. Finally, it briefly introduces a multi-institution research effort, the BECHAMEL markup semantics project, which attempts to solve the semantic problem of markup.

2 Historical background

Document "tags" can probably be counted as part of the communication system, including early writing, copying, publishing and printing, but with the development of digital text processing and typesetting, the use of tags has become self-conscious and common. at the same time, it has also become an important innovation field in system development. The period from 1960s to 1980s is a period of comprehensive and systematic development of document marking system, and the key work is to improve the effectiveness and functionality of digital typesetting and text processing. In the early 1980s, people are still committed to studying the theoretical framework of tags, and using this framework to support the development of high-performance systems. Some results in this area have been published, but most of the results are only recorded in working documents and various standard forms of products.

One view that emerged at this stage is that a document, as an intellectual product, is better abstracted as an ordered hierarchy of objects (such as chapters, paragraphs and formulas) than as a one-dimensional stream of text characters. Character streams are typically mixed with large numbers of formatting codes, structures describing layout (such as page numbers, columns and printed lines), pixel matrices, and other representations peculiar to different document processing and storage systems. The ordered-hierarchy model distinguishes two essentially different kinds of annotation: annotations that identify and delimit textual objects (titles, chapters, and so on) and annotations that specify layout requirements. The former has borne fruit: document elements such as headings, chapters, paragraphs, equations and citations can be explicitly marked with delimiting tags and then processed indirectly through rules mapped to element types. This separation of content from form achieves indirection and abstraction at a fundamental level, in a common and combinatorially economical way. It has great and varied practical value in every aspect of document processing and, more importantly, it seems to speak to the question of what a document is. The descriptive markup that realizes this function not only marks the extent of an element but also carries the meaning the document model is meant to reveal (for example, "this text is a chapter").

In the early 1980s, the national and international standards bodies (ANSI and then ISO) issued an influential tag meta-grammar for SGML documents, consolidating earlier theoretical and analytical work on markup and document structure. SGML provides a machine-readable form for defining descriptive markup languages. As a meta-grammar, SGML does not itself define a markup language; rather, it specifies machine-readable techniques for developing markup languages. At its core is a formal expression mechanism similar to Backus-Naur Form (BNF). The mechanism carries rules for defining typed attributes and their values, along with other devices for further abstraction and indirection (commentators have noted the similarity between Document Type Definitions (DTDs) and Backus-Naur Form). Structurally, an SGML document is a tree with ordered branches and labeled nodes, and it is a formal product of its corresponding DTD.

After years of analysis and practice, the basic concepts behind SGML became well known. Combining industry-wide standardization at the meta-grammar level with local innovation at the vocabulary level, SGML's distinctive mechanisms (the BNF-like meta-grammar, typed attribute/value pairs, entity references, and so on) were efficiently implemented in applications and tools. The SGML approach seemed to support and optimize ideal workflows for the design, implementation and use of document systems all at once. From the mid-1980s to the early 1990s, a large number of SGML-based tagging systems were developed.

Although SGML's development received much attention, its ideas were sound, and it was successfully deployed in many areas, few people actually used it in its first decade. Many factors contributed to this, but the most important was that SGML itself is too complex. In particular, SGML contains many complicated optional features that most software never needed to implement, which slowed the development of SGML software. To make matters worse, no further analysis is possible unless a document is validated against its DTD: markup minimization means that element boundaries cannot be determined without consulting the document grammar. SGML also contains other features that make existing parsing tools for formal grammars inapplicable and prevent efficient parsing.

In network publishing and communication, the SGML approach reached the world through HTML (the Hypertext Markup Language). The original version of HTML was loosely defined and lacked a formal syntax. Later, when interest turned to giving HTML an SGML DTD, it proved difficult to design a DTD for what had already become "accepted" practice. Worse, vendors freely added procedural tags alongside the key descriptive tags of the original HTML specification, leading developers and users alike to ignore the difference between descriptive and procedural markup. The descriptive part of HTML did not even reflect the hierarchical structure of documents well, and the specification provided no stylesheet language to support indirection. Finally, HTML made no use of SGML's mechanisms for extending and replacing element sets, so HTML documents were processed not by general-purpose SGML processors (which allow extension and replacement of DTDs) but only by specific HTML formatters whose formatting rules for HTML tags were hard-coded.

The subsequent development of HTML can be seen as a gradual transformation of the original loose HTML language into a conforming SGML application. The transformation could have succeeded given sufficient time and resources to apply mature document system design principles. However, the newly formed W3C was under great pressure to adopt new element sets and to bring SGML to the Web, and SGML's shortcomings made it difficult to realize the advantages of SGML and descriptive markup there. The main problems were SGML's large number of optional features, its complex formal syntax, and its dependence on a DTD to determine element boundaries.

To let HTML and related technologies exploit the advantages of a meta-grammar, to let users more easily develop and share new domain-specific element sets, to let documents be parsed into element trees without reference to a DTD, and to let SGML tools and applications develop in harmony, the W3C created a subset of SGML. The goal was a relatively simple standard with no optional features, a comparatively simple syntax, and a way to process unvalidated documents without a DTD. Thus XML was born. After a year and a half of development, XML was officially released in 1998 as a W3C Recommendation.

Since 1998, new XML markup languages have grown explosively, and the rapid growth continues to this day. The reasons for this explosive development are:

(1) the need for new tagging systems in particular fields. With the growth of online electronic publishing in science, medicine, business, law, engineering and their specialized subfields, new tagging systems must be developed.

(2) the reduced cost and complexity of developing new tools and applications, since parsing XML is easier than parsing SGML.

(3) XML tags support publishing-related information processing and dissemination, as well as applications unrelated to publishing.

Fortunately, we have at last developed effective and easily implemented technologies for creating high-performance markup languages, digital documents, and document processing and publishing systems integrated with other information management programs. In particular, the need for deep processing of the underlying intentions in document structure not only drives new system functionality but also creates demand for automatic information processing, with little or no human intervention.

3 Problems

Unfortunately, experience and feedback have made us realize that our understanding of what descriptive markup communicates is incomplete, and that current technologies simply do not meet our expectations.

In the 1980s, the systematization of document markup focused mainly on three aspects:

(1) the conceptualization of a general document model;

(2) the development of formal specifications and of vocabulary- and grammar-related techniques for document markup languages, by which a markup language can define specific document classes and instantiate the model;

(3) the development of the markup languages themselves (such as CALS, AAP, TEI and HTML).

Using a descriptive markup language to identify and label the logical parts of a document makes explicit a "meaning" that previously existed only in latent form. At least by comparison with procedural markup, that meaning can be made clear, explicit and amenable to machine processing.

Many people call XML documents "self-describing data". Although there were dissenting voices early on (see Mamrak and, most notably, Raymond and Tompa), those voices were drowned out by researchers' enthusiasm in the early stages of descriptive markup's development, and most people saw no need to explore more laborious ways of representing documents. A well-defined SGML markup language, it was believed, expresses the latent meaning of document structure and makes it fully and effectively available to machine processing. One of the authors of this article once wrote that, among competing markup systems, descriptive markup "is not only the best approach, but the best imaginable approach."

The experience of the 1990s shows that this confidence was somewhat blind. In practical terms things have improved greatly, but repeated failures of interoperability and functionality show that SGML/XML has not truly succeeded in giving documents their latent meaning in a computer-manageable form. In SGML/XML DTDs, elements and attributes are not precisely aligned with those of other, similar document type definitions; much is left informal; and what must be inferred has no single determinate answer. Still, qualitatively, people's understanding of documents differs from the days before SGML, when the meaning of document structure had to be gleaned from relatively obscure clues.

The essential nature of a DTD explains why this happens: a DTD specifies only a vocabulary and its grammar; it does not represent the semantic relationships among the words. It is not for the DTD to say whether a "title" element expresses the general sense of a title, or anything like what we usually mean by the concept "title". The DTD can only indicate that there is a particular element whose label is the string "title", which may be used with certain other elements, all defined in the same way. Content developers and software designers using the markup language must therefore infer what the tag means from the natural language associations of "title" and from how the tag is used in context. The original language designer may never have defined the meaning systematically and rigorously.
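A minimal DTD fragment (a sketch; the element names are illustrative) makes the point: everything below is pure grammar, and nothing in it says that a title names the chapter that contains it.

    <!-- Grammar only: a chapter starts with a title followed by
         paragraphs; nothing states what 'title' means. -->
    <!ELEMENT chapter (title, p+)>
    <!ELEMENT title   (#PCDATA)>
    <!ELEMENT p       (#PCDATA)>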

Of course, this overstates the actual situation. In a sense, the meaning of each tag can be spelled out in the plain natural language documentation provided by the markup language's developers. But even the best-documented DTDs in industry and academia do not fundamentally solve the problem.

To build software that reflects the semantic relationships in a markup language, the language designer must be able to state the relationships among the parts of a document explicitly; the software engineer must then be able to locate and use those statements and design applications that exploit them. Neither step can be machine-verified, so neither can be fully trusted. Wherever human participation is required, the development of high-performance networked document processing and publishing systems is hindered. We therefore need a mechanism by which markup language designers can specify semantic relationships in detail and formally, in a form that applications can read and process, configuring themselves without case-by-case human intervention.

Let us look at some specific semantic relationships. Each has real potential practical value, but none can currently be used conveniently and systematically, because there is no standard machine-manageable form for expressing it. Indeed, many of these relationships are so critical that software designers routinely infer their presence in documents in ad hoc ways and build special-purpose systems to exploit them.

Class relationships. SGML/XML contains no general construct for expressing class hierarchies or class membership among elements, attributes, or attribute values, even though classes are among the most basic and practical modeling constructs in mainstream software engineering. We cannot say that a paragraph is a structural element (an "is-a" relationship) or that all structural elements are editable elements (an "a-kind-of" relationship). The two basic SGML/XML constructs of attributes and values can sometimes encode classification (typically through "type" and "class" attributes), but the technique is immature, and SGML and XML provide no mechanism to control or constrain its use. In practice, many document type designers do design with class hierarchies in mind. XML Schema provides explicit declaration of class relationships, but it does not say semantically how one complex type differs from another.
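In practice, designers often approximate class membership with "type" or "class" attributes, as in the hypothetical sketch below; to SGML/XML the values are just strings, with nothing to control or interpret them.

    <!-- 'type' and 'class' carry the designer's intended taxonomy,
         but no SGML/XML mechanism constrains or interprets them. -->
    <div type="structural" class="editable">
      <p type="structural">A paragraph that is-a structural element.</p>
    </div>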

Inheritance relationships. In many markup languages, such as TEI and HTML 4.0, some attributes are inherited by contained elements, and in some cases the contained text inherits them too. For example, if an element carries the attribute/value pair lang="de", the text is German, which implies that all of its child elements are German as well. But a DTD provides no formal way to declare which attributes are inheritable. Moreover, such inheritance is not fixed: it can be overridden by a redefinition on a contained element. Inheritance also takes many forms, some involving attributes of elements, some attributes of attributes, and others the text content of elements. If markup indicates that a sentence is German, then all words in the sentence (special cases aside) are German; likewise, all words and phrases within a deletion are deleted, those within an emphasis are emphasized, and marking a stretch of content as a paragraph means that every word (or element) in it belongs to that paragraph. A DTD can specify neither which attributes inherit nor the logic of their inheritance, including exceptions to the rule. Software designers routinely work out these relationships for a particular markup language and then implement them in the tools and applications they build.
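A minimal Python sketch of the inference an application must hard-code (the lang convention and the document are illustrative): the language value propagates down the tree unless a descendant overrides it.

    import xml.etree.ElementTree as ET

    DOC = """<chapter lang="de">
      <p>Ein Absatz.</p>
      <p lang="en">An English aside.</p>
    </chapter>"""

    def effective_lang(root):
        # Walk the tree, propagating 'lang' to descendants unless overridden.
        results = []
        def walk(elem, inherited):
            lang = elem.get("lang", inherited)  # an explicit value overrides the inherited one
            results.append((elem.tag, lang))
            for child in elem:
                walk(child, lang)
        walk(root, None)
        return results

    print(effective_lang(ET.fromstring(DOC)))
    # [('chapter', 'de'), ('p', 'de'), ('p', 'en')]

No DTD records the rule this function embodies; it lives only in the program.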

Contextual and referential relationships. In many markup languages, even where an element type has a fixed general meaning, an element's exact meaning can vary with its context. For example, text marked as a title takes its meaning from its structural location: under a chapter element it names the chapter, while within a section it names that section of the chapter. No machine-readable criterion exists for deciding which kind of title a given title is. The case of a title element inside a bibliographic reference is more complicated still, because the title there names an entity outside the article. Relationships like these cannot be expressed in a DTD, yet software designers must infer them to support efficient, automated text processing. (Giving each meaning its own generic identifier would solve only a small part of the problem, because it would still be necessary to make explicit the binary character of the property and to provide a parsable expression locating the object to which the property applies.)
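A sketch of the situation (the element names are illustrative):

    <chapter>
      <title>The Problem</title>            <!-- names the chapter -->
      <section>
        <title>Class Relationships</title>  <!-- names the section -->
      </section>
    </chapter>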

Shifts of referent. A similar but subtler situation arises when one object carries several properties, each expressed in the same form and apparently pointing at the same referent, yet requiring careful interpretation to keep the references straight. For example, an instance of a particular element may have three properties: it is a theorem, it is written in German, and it is illegible. Do such simple predicate-style descriptions all describe the same thing (the same element instance)? Is knowledge represented this way robust enough? In fact, it is the abstract sentence that is written in German, the proposition it expresses that is the theorem, and the concrete inscription that is illegible. Strictly speaking, no one thing has all of these properties.

Complete and partial synonymy. Complete or partial synonymy across markup languages is a very important semantic relationship, and the lack of any mechanism for describing it causes serious heterogeneity problems. Using a single markup language may eliminate complete synonyms, but as markup languages proliferate, complete and partial synonyms between languages remain both hard to express and important. We currently have no suitable computer-manageable formal method for recording synonymy among elements, attributes and attribute values across markup languages. Architectural forms (see below) can record most complete synonyms, but some are hard to capture, and partial synonymy is far more common in practice. Synonymy problems that amount to class-inclusion relationships leave us a long way from solving heterogeneity.

4 The BECHAMEL project

The BECHAMEL markup semantics project originated in the late 1990s and is carried out by researchers from the Department for Culture, Language and Information Technology at the Bergen University Research Foundation and the Electronic Publishing Research Group of the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign. The project's name is formed from the names of the collaborators' cities (Bergen, Norway; Champaign, Illinois; Española, New Mexico).

The research objectives of the BECHAMEL project are as follows.

(1) define the representation and inference problems closely tied to document markup semantics, and develop a classification and description of the problems that any semantics-aware document processing system must solve or face.

(2) study the properties and semantic relations of common markup languages, and evaluate the applicability of standard knowledge representation techniques, such as semantic networks, frames, logic, formal grammars and production rules. In modeling these relationships and properties we must also weigh the adequacy, elegance, simplicity and computational efficiency of each representation.

(3) develop and test a formal, machine-readable representation framework capable of expressing the semantics of markup languages.

(4) explore applications of semantic representation technology, such as support for transcoding, information retrieval and accessibility enhancement. Our current focus is supporting semantic reasoning over document databases, which we believe is the most promising application of knowledge representation technology here.

(5) cooperate with digital library content-encoding initiatives in humanities computing, and with software tool developers, to test the semantic representation scheme at scale.

An early Prolog testbed has grown into a prototype knowledge representation platform for expressing facts and inference rules about structured documents. The system lets an analyst assert certain facts, such as generic identifiers and attribute values, and keep them separate from inferred facts about semantic entities and properties.

The system also provides an abstraction layer through which the meaning of tags can be expressed in machine-readable, executable form. On that basis it can draw inferences about the components of a document, including elusive structures such as overlapping hierarchical components. We have developed a set of predicates that mimic the node-navigation methods of the W3C Document Object Model and that can retrieve attribute values and related information from the document type definition. In this way we can cleanly distinguish the syntactic information delivered by the parser from the document semantics expressed on top of it.
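A minimal Python sketch of this layering (the fact and rule names are invented for illustration): syntactic facts come straight from the parser, while semantic facts are derived only by an explicit, inspectable rule.

    import xml.etree.ElementTree as ET

    DOC = '<p><title>A familiar structure</title>Body text with a <quote>quotation</quote>.</p>'
    root = ET.fromstring(DOC)

    # Layer 1: syntactic facts, straight from the parse tree.
    syntactic_facts = [("generic-identifier", id(e), e.tag) for e in root.iter()]

    # Layer 2: semantic facts, produced by an explicit rule.
    # Rule (illustrative): a 'title' child of a 'p' expresses a property
    # of the whole paragraph rather than a part of its content.
    semantic_facts = [("title-of", id(child), id(parent))
                      for parent in root.iter()
                      for child in parent
                      if child.tag == "title" and parent.tag == "p"]

    print(len(syntactic_facts), "syntactic facts;", semantic_facts)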

Preliminary results reveal both the complexity of recognizing semantic inferences and the difficulty of interpreting uncertain context. The prototype reasoning system demonstrates that automatic reasoning about markup is feasible and that Prolog rules can handle complications such as non-monotonicity and contextual vagueness. The cited literature offers starting points for further research.

5 Semantic modeling of tags

The semantics of document markup are the abstract structures, properties and relationships that users of a markup language understand its tags to convey. The markup and its syntax merely hint at this semantics. With the help of knowledge representation techniques, tag semantics can be given a computational model that defines those structures, relations and properties.

Consider the following fragment of an XML markup document.
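The fragment here is a representative sketch, assuming the element names p, title and quote used in the discussion below:

    <p><title>A familiar structure</title>Body text that begins after
    the title element, contains an embedded <quote>quotation</quote>,
    and continues to the paragraph's end tag.</p>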

Readers familiar with this style of markup will naturally understand that the tag p marks a paragraph, that the paragraph has a title, and that the content after the title element forms the body of the paragraph, beginning after the title element and ending before the paragraph's end tag. Where the meaning and use of tags are not obvious at a glance, authors and readers can consult the documentation for the tag set.

Such facts are obvious to human readers, but they cannot be extracted from the data structures a document parser provides. As shown in Figure 1, the parse tree (the structure a stylesheet programmer works with) shows the title, the quotation, and the text before and after the quotation each as a separate child node of the paragraph; but the parse tree cannot show that the title is a property of the whole paragraph, that the two runs of text are parts of one content structure, or that the quotation is embedded inside that content.

Indeed, the data structure itself contains nothing that distinguishes paragraphs from quotations, or anything of the kind; it is only a graph of associated information, such as a generic identifier with the value "paragraph". A program should be able to draw inferences consistent with the document's meaning and the use of its tags, and exploit that knowledge when transforming the tree from one form to another. But such transformations (via XSLT, DSSSL, or a programming language like C++) depend on semantic reasoning that is nowhere explicitly encoded.

Figure 2 shows how semantic knowledge can enrich and enhance the syntax tree. Knowledge representation techniques can encode whole/part relationships at a higher level, better suited to machine processing. The figure uses a traditional semantic network representation; other approaches are under development, including frame-based, rule-based, formal-grammar and logic-based representations. Semantic Web developments (Section 8 of this article) may even supply an appropriate representation for the markup language itself. The crux is to build a hierarchy of abstract concepts, relations and constraints that conventional XML/SGML parsers cannot model or enforce.
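A small Python sketch of the kind of enrichment Figure 2 depicts, with relation names invented for illustration: the parse tree records only parent/child arcs, while the semantic layer restates what those arcs actually mean.

    # Parse tree: only ordered parent/child arcs.
    parse_arcs = [("p1", "child", n) for n in ("title1", "text1", "quote1", "text2")]

    # Semantic enrichment: what the arcs mean.
    semantic_arcs = [
        ("title1", "property-of", "p1"),     # the title is a property of the whole paragraph
        ("text1",  "part-of",     "body1"),  # the two text runs form one content structure
        ("text2",  "part-of",     "body1"),
        ("quote1", "embedded-in", "body1"),  # the quotation is embedded inside it
        ("body1",  "content-of",  "p1"),
    ]
    print(parse_arcs, semantic_arcs, sep="\n")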

Knowledge encoded in machine-readable files (on the model of DTDs or schemas) could be used to validate semantic constraints on documents, giving applications a far more powerful document model. Such expressive representations provide strong support for designing and implementing better document processing systems.

6 Applications

In recent years, with the development of many new technologies, conventional structured markup has become more and more widespread. These technologies chiefly stress the following aspects of information management.

Conversion and translation. The most common task facing SGML/XML developers is designing transformations from one application syntax to another, whether to create a new representation of a file or to store it in a database. Sometimes developers must integrate or reconcile large collections of digital documents, each represented in a markup language that cannot interoperate with the others. Whatever the scope of the transformation, the conventional solution is a transformation programming language that operates directly on the parse tree: the tree produced by parsing the source document is transformed into an instance of the target language's tree structure, and the transformed tree is serialized into a new document instance, graphic, or audio rendering.
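A minimal Python sketch of such a transformation (the tag names and mapping are invented for illustration); note that the mapping table encodes by hand a synonymy between vocabularies that neither markup language states.

    import xml.etree.ElementTree as ET

    # Hand-built vocabulary mapping; the claim that these element types
    # are synonymous lives only here, not in either DTD.
    TAG_MAP = {"p": "para", "title": "head", "quote": "q"}

    def transform(elem):
        new = ET.Element(TAG_MAP.get(elem.tag, elem.tag), elem.attrib)
        new.text, new.tail = elem.text, elem.tail
        for child in elem:
            new.append(transform(child))
        return new

    src = ET.fromstring('<p><title>A familiar structure</title>Text with a <quote>quotation</quote>.</p>')
    print(ET.tostring(transform(src), encoding="unicode"))
    # <para><head>A familiar structure</head>Text with a <q>quotation</q>.</para>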

Islands of information. This problem resembles the conversion problem above, but the goal is not to convert one form of document into another; it is to give system users a single transparent interface to documents or document fragments stored in distributed form. Although no document need be converted word for word from one markup language to another, the system must make the content appear seamlessly integrated, however widely the underlying encodings vary.

Accessibility. Authoring tools have gradually embraced structured markup, a boon for visually impaired users of digital documents. Descriptive markup lets people read with screen readers or Braille displays and navigate by structural cues rather than graphical ones. At present, however, such applications must rely on the user's own abilities, or on interface software, to make structural inferences from tag content and syntax alone. How far tag syntax constraints and the documented meaning and use of tags can be trusted depends entirely on the document's author, and unfortunately authors often misuse tags; the worst example is using "header" tags on web pages merely to achieve a particular layout.

Safe processing. Part of the motivation for more expressive markup schema languages, such as the W3C's XML Schema, is the recognition that the consequences of markup errors, misuse and abuse can be far more serious than poorly formatted output. Descriptive markup is used not only in e-commerce but also in safety-critical information, such as medical records and aviation documentation. Developers in these fields must ensure not only that the syntax of their digital documents is well formed but also that the documents obey security protocols governing safe processing, storage, transmission and presentation.

7 Advantages of markup semantics

Current results from the BECHAMEL project suggest that markup semantics can address the problems above in the following ways.

Declarative, machine-readable semantic descriptions. In current practice, designers of structured markup languages express the meaning of tags, and the proper way to use them, in natural language. A formal markup semantics lets those ontological relationships be expressed explicitly, so that computer programs can process them automatically.

Hypothesis testing. For document collections whose tag sets are not formally documented, a system that can interpret tag semantics provides an environment for testing conjectures and verifying hypotheses. A user of an under-documented markup language can state the properties and rules he believes hold throughout a document database, and the document processing software can then retrieve the elements that are consistent, or inconsistent, with the hypothesized rules.
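A Python sketch of the idea (the rule and the document are invented for illustration): hypothesize that every chapter begins with a title, then retrieve the counterexamples.

    import xml.etree.ElementTree as ET

    DOC = ('<book><chapter><title>One</title><p>...</p></chapter>'
           '<chapter><p>No title here.</p></chapter></book>')
    root = ET.fromstring(DOC)

    # Hypothesis: the first child of every chapter is a title.
    counterexamples = [ch for ch in root.iter("chapter")
                       if len(ch) == 0 or ch[0].tag != "title"]
    print(len(counterexamples), "chapter(s) violate the hypothesis")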

Enforcement of semantic constraints. A semantics-aware validating parser could go beyond the syntactic checks of a conventional validating parser and enforce declared semantic constraints as well. The operation is the same as hypothesis testing, except that here the semantic constraints are known and normative.

Optimized, more expressive APIs. Markup semantics is exercised whenever SGML and XML applications transform or render digital documents, but the higher-level properties and relationships appear only inside executing programs. Formal, machine-readable semantics would enrich application programming interfaces and speed software design; and as markup languages develop and change, such software could be maintained more conveniently and safely.

8 Related work

Many other document processing technologies, standards and research projects respond to the challenges and problems above. Here we survey existing ideas that attempt to address them.

Semantic Web. The semantic Web names a large body of interrelated research and standardization work, much of it reflecting current ideas about markup and knowledge representation. At its core is the W3C Resource Description Framework, together with other technologies such as ISO Topic Maps. The semantic Web's scope is broad and its goals ambitious: to improve markup languages with general knowledge representation technology and thereby "promote the full development of human knowledge". Its research and standardization differ from ours in direction: it seeks not to describe the semantics of particular domains but to enable semantic annotation of knowledge in every field, whereas our research targets "document markup semantics" rather than "universal semantic markup". Progress in semantic Web technology may eventually let us encode the semantics of tags in a semantic Web markup language.

W3C Document Object Model. The Document Object Model is an application programming interface to the hierarchical data structure produced by parsing an XML document. One could imagine a system providing analogous interfaces to tag semantics, parallel to the syntactic interfaces DOM provides, eventually yielding a "semantic DOM" that complements the W3C's syntactic one.

W3C Schema. XML Schema is an XML-based language for constraining XML documents that can replace traditional DTDs. Its development was driven by limitations of DTDs similar to the problems we face in the BECHAMEL project. Schema lets document class designers define complex data types, much as in a high-level programming language. But encoding all the relationships and constraints found in tag set documentation requires still more expressive power than current XML Schema offers.

Architectural forms of the Hypermedia/Time-based Structuring Language (HyTime). Architectural forms grew out of the observation that different markup language applications often encode structures that differ in style but are semantically equivalent. They let document class designers map their own specific element instances onto more generic architectural instances, which are easier to map between applications. These mappings do represent a constrained form of semantic knowledge, and they help with the conversion and integration challenges described above. To a degree, the BECHAMEL project aims to build a model that expresses more semantic relationships than architectural forms can.

That concludes "An Example Analysis of XML Markup Semantics". Thank you for reading! I hope the article has been helpful; to learn more, welcome to follow the industry information channel.
