How to use DOMDocument to process HTML and XML documents in PHP

2025-07-12 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article explains how to use DOMDocument to process HTML and XML documents in PHP. It walks through the problem and a solution in detail, in the hope of giving readers who face it a simpler, workable approach.

Use DOMDocument in PHP to process HTML and XML documents

Since PHP 5, PHP has shipped a powerful class for parsing and generating XML: the DOMDocument class covered in this article. Even so, I suspect most people still reach for regular expressions when scraping web pages. After reading this article, try parsing and analyzing pages with the tools built into PHP next time.

Parsing HTML

// parsing HTML
$baidu = file_get_contents('https://www.baidu.com');
$doc = new DOMDocument();
@$doc->loadHTML($baidu);

// Baidu search box
$inputSearch = $doc->getElementById('kw');
var_dump($inputSearch);
// object(DOMElement)#2
// ....
echo $inputSearch->getAttribute('name'), PHP_EOL; // wd

// get links to all images
$allImageLinks = [];
$imgs = $doc->getElementsByTagName('img');
foreach ($imgs as $img) {
    $allImageLinks[] = $img->getAttribute('src');
}
print_r($allImageLinks);
// Array
// (
//     [0] => //www.baidu.com/img/baidu_jgylogo3.gif
//     [1] => //www.baidu.com/img/bd_logo.png
//     [2] => http://s1.bdstatic.com/r/www/cache/static/global/img/gs_237f015b.gif
// )

// analyze links using parse_url
foreach ($allImageLinks as $link) {
    print_r(parse_url($link));
}
// Array
// (
//     [host] => www.baidu.com
//     [path] => /img/baidu_jgylogo3.gif
// )
// Array
// (
//     [host] => www.baidu.com
//     [path] => /img/bd_logo.png
// )
// Array
// (
//     [scheme] => http
//     [host] => s1.bdstatic.com
//     [path] => /r/www/cache/static/global/img/gs_237f015b.gif
// )

Doesn't that feel clear and object-oriented? It is like using an ORM library for database operations for the first time. Let's go through it piece by piece.

$baidu = file_get_contents('https://www.baidu.com');
$doc = new DOMDocument();
@$doc->loadHTML($baidu);

The first step is to load the document content. Here the loadHTML() method loads HTML from a string. The leading @ suppresses the warnings that libxml emits when real-world HTML is not perfectly well-formed. DOMDocument also provides several other loading methods: load() loads XML from a file, loadXML() loads XML from a string, and loadHTMLFile() loads HTML from a file.
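
As a rough illustration of these loading methods, the sketch below loads HTML and XML from strings and XML from a file. The inline documents and the temporary file are hypothetical stand-ins for real input, and libxml_use_internal_errors() is shown as one possible alternative to the @ operator, not as something the original example uses.

```php
<?php
// Hypothetical inline documents standing in for downloaded pages or real files.
$html = '<html><body><p id="greeting">Hello</p></body></html>';
$xmlString = '<root><item>1</item></root>';

// Collect libxml warnings internally instead of printing them
// (an alternative to prefixing each call with "@").
$previous = libxml_use_internal_errors(true);

$htmlDoc = new DOMDocument();
$htmlDoc->loadHTML($html);      // HTML from a string

$xmlDoc = new DOMDocument();
$xmlDoc->loadXML($xmlString);   // XML from a string

// Write the XML to a temporary file so that load() has a file to read.
$tmp = tempnam(sys_get_temp_dir(), 'dom');
file_put_contents($tmp, $xmlString);
$fileDoc = new DOMDocument();
$fileDoc->load($tmp);           // XML from a file
unlink($tmp);

// Discard the collected warnings and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

echo $htmlDoc->getElementsByTagName('p')->item(0)->textContent, PHP_EOL;    // Hello
echo $xmlDoc->documentElement->nodeName, PHP_EOL;                           // root
echo $fileDoc->getElementsByTagName('item')->item(0)->textContent, PHP_EOL; // 1
```

loadHTMLFile() follows the same pattern as load(), but it runs the file through the HTML parser instead of the XML parser.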

// Baidu search box
$inputSearch = $doc->getElementById('kw');
var_dump($inputSearch);
// object(DOMElement)#2
// ....
echo $inputSearch->getAttribute('name'), PHP_EOL; // wd

Next we use the same DOM API as front-end JavaScript to work with the elements in the HTML. This example fetches Baidu's search box: getElementById() returns the DOMElement object whose id matches the given value. From there you can read its value, attributes, and so on.
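
Because the live Baidu page can change at any time, the lookup is easier to try against a fixed string. The markup below is a hypothetical fragment modeled on a search form, not Baidu's actual source; it is only a sketch of reading attributes from the returned DOMElement and of handling a missing id.

```php
<?php
// Hypothetical markup; the id/name values mirror the article's example.
$html = '<html><body><form>'
      . '<input id="kw" name="wd" class="s_ipt" value="">'
      . '</form></body></html>';

$doc = new DOMDocument();
@$doc->loadHTML($html);

$input = $doc->getElementById('kw');
if ($input !== null) {
    echo $input->tagName, PHP_EOL;                // input
    echo $input->getAttribute('name'), PHP_EOL;   // wd
    echo $input->getAttribute('class'), PHP_EOL;  // s_ipt
    var_dump($input->hasAttribute('maxlength'));  // bool(false)
}

// getElementById() returns null when no element carries the requested id,
// so check the result before calling methods on it.
var_dump($doc->getElementById('missing'));        // NULL
```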

// get links to all images
$allImageLinks = [];
$imgs = $doc->getElementsByTagName('img');
foreach ($imgs as $img) {
    $allImageLinks[] = $img->getAttribute('src');
}
print_r($allImageLinks);
// Array
// (
//     [0] => //www.baidu.com/img/baidu_jgylogo3.gif
//     [1] => //www.baidu.com/img/bd_logo.png
//     [2] => http://s1.bdstatic.com/r/www/cache/static/global/img/gs_237f015b.gif
// )

// analyze links using parse_url
foreach ($allImageLinks as $link) {
    print_r(parse_url($link));
}
// Array
// (
//     [host] => www.baidu.com
//     [path] => /img/baidu_jgylogo3.gif
// )
// Array
// (
//     [host] => www.baidu.com
//     [path] => /img/bd_logo.png
// )
// Array
// (
//     [scheme] => http
//     [host] => s1.bdstatic.com
//     [path] => /r/www/cache/static/global/img/gs_237f015b.gif
// )

This example collects the links of all images in the HTML document. Compared with regular expressions it is far more convenient, and the code is self-explanatory, so you do not have to worry about a pattern failing to match. Combined with parse_url(), another function built into PHP, it is also easy to break a link apart and extract the pieces you want.
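
One practical follow-up is turning protocol-relative links such as //www.baidu.com/... into absolute URLs. The sketch below does that with parse_url() on a hypothetical HTML fragment; defaulting to https when no scheme is present is an assumption made for illustration, not something the original code does.

```php
<?php
// Hypothetical fragment reusing two image URLs from the output above,
// plus one <img> without a src attribute.
$html = '<html><body>'
      . '<img src="//www.baidu.com/img/bd_logo.png">'
      . '<img src="http://s1.bdstatic.com/r/www/cache/static/global/img/gs_237f015b.gif">'
      . '<img alt="no source">'
      . '</body></html>';

$doc = new DOMDocument();
@$doc->loadHTML($html);

$links = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    if (!$img->hasAttribute('src')) {
        continue; // skip images that carry no src attribute
    }
    $parts = parse_url($img->getAttribute('src'));
    if ($parts === false || !isset($parts['host'])) {
        continue; // skip malformed URLs and links without a host
    }
    // Assumption: treat protocol-relative links as https.
    $scheme = $parts['scheme'] ?? 'https';
    $links[] = $scheme . '://' . $parts['host'] . ($parts['path'] ?? '');
}

print_r($links);
// Array
// (
//     [0] => https://www.baidu.com/img/bd_logo.png
//     [1] => http://s1.bdstatic.com/r/www/cache/static/global/img/gs_237f015b.gif
// )
```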

Parsing XML works much like parsing HTML: the methods provided by DOMDocument and DOMElement make it straightforward. What if we want to generate well-formed XML? That is simple too. There is no need to concatenate strings; the same class lets you build the document through object operations.
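
To make the parsing side concrete before moving on to generation, here is a minimal sketch that reads an XML string with loadXML() and walks it with the same methods used for HTML. The books payload and its element names are hypothetical.

```php
<?php
// Hypothetical XML payload used only for illustration.
$xmlString = '<books>'
           . '<book id="b1"><title>PHP Basics</title></book>'
           . '<book id="b2"><title>DOM in Depth</title></book>'
           . '</books>';

$doc = new DOMDocument();
$doc->loadXML($xmlString);

$titles = [];
foreach ($doc->getElementsByTagName('book') as $book) {
    // Look up the <title> child of each <book> and key it by the id attribute.
    $title = $book->getElementsByTagName('title')->item(0);
    $titles[$book->getAttribute('id')] = $title->textContent;
}

print_r($titles);
// Array
// (
//     [b1] => PHP Basics
//     [b2] => DOM in Depth
// )
```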

Generate an XML

// generate an XML document
$xml = new DOMDocument('1.0', 'UTF-8');

$node1 = $xml->createElement('First', 'This is First Node.');
$node1->setAttribute('type', '1');

$node2 = $xml->createElement('Second');
$node2->setAttribute('type', '2');

$node2_child = $xml->createElement('Second-Child', 'This is Second Node Child.');
$node2->appendChild($node2_child);

$xml->appendChild($node1);
$xml->appendChild($node2);

print $xml->saveXML();
/*
<?xml version="1.0" encoding="UTF-8"?>
<First type="1">This is First Node.</First>
<Second type="2"><Second-Child>This is Second Node Child.</Second-Child></Second>
*/

Anyone with a little front-end JavaScript background will find this code easy to read. createElement() creates a DOMElement object, to which you can then add attributes and content. appendChild() adds a child node to the current DOMElement or DOMDocument. Finally, saveXML() serializes the document as XML. Note that this example appends two elements directly to the document, whereas a well-formed XML document has exactly one root element, so in practice you would normally wrap them in a single parent element.
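
A slightly more defensive variant of the same idea is sketched below. The Nodes wrapper element and the Note element are hypothetical additions for illustration: the wrapper gives the document a single root, formatOutput asks saveXML() for indented output, and createTextNode() is used where the text may contain characters such as &, because the value passed to createElement() is not escaped.

```php
<?php
$xml = new DOMDocument('1.0', 'UTF-8');
$xml->formatOutput = true; // indent the serialized output

// Hypothetical wrapper so that the document has exactly one root element.
$root = $xml->createElement('Nodes');
$xml->appendChild($root);

$first = $xml->createElement('First', 'This is First Node.');
$first->setAttribute('type', '1');
$root->appendChild($first);

$second = $xml->createElement('Second');
$second->setAttribute('type', '2');
$second->appendChild($xml->createElement('Second-Child', 'This is Second Node Child.'));
$root->appendChild($second);

// Hypothetical element whose text contains "&"; createTextNode() escapes it.
$note = $xml->createElement('Note');
$note->appendChild($xml->createTextNode('Tom & Jerry'));
$root->appendChild($note);

$output = $xml->saveXML();
echo $output;
```

Loading $output back with loadXML() is a quick way to confirm that the generated text is well-formed.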

That covers how to use DOMDocument to process HTML and XML documents in PHP. I hope the content above is of some help.
