What are the differences between text() and string(.) in XPath?


2025-07-09 Update. From: SLTechnology News&Howtos. Shulou (Shulou.com) 06/02 report.

This article explains in detail the differences between text() and string(.) in XPath. The topic comes up often in practice when writing crawlers, so it is shared here as a reference.

When crawling, we often encounter pages like this:

<div>
<em>Hello</em>, Beijing
</div>

The most common case in day-to-day crawling is extracting the text inside a tag, such as the "Hello" here, which can be done with xpath("//div/em/text()").

Now consider the following two extraction requirements:

Requirement 1: extract ", Beijing", the text that sits directly inside the div. Should we use text() or string(.)?

Requirement 2: extract "Hello, Beijing".

Let's initialize the page with the lxml library (if you are using scrapy's xpath selector, you can follow the same steps):

from lxml import etree

# Read the sample page as text.
with open('foo.html', 'r', encoding='utf-8') as f:
    content = f.read()

# Parse the markup into an element tree.
page = etree.HTML(content)
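To try the snippets in this article without creating foo.html, the same page can be parsed from an in-memory string. This is only a minimal sketch: the names html and hello are illustrative, and the markup is assumed to match the sample page.

```python
from lxml import etree

# Assumed markup of the sample page, held in a string instead of foo.html.
html = "<div>\n<em>Hello</em>, Beijing\n</div>"

# etree.HTML() accepts a str (or bytes) and returns the root <html> element.
page = etree.HTML(html)

# The common case: text() on the <em> step returns a list of its text nodes.
hello = page.xpath("//div/em/text()")
print(hello)
```

Note that even for a single match, xpath() with text() returns a list, so the value itself is hello[0].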

First, the solution to requirement 1, extracting ", Beijing":

re = page.xpath("//div/text()")

Here re is a list:

['\n', ', Beijing\n']

This is because there is a newline between the <div> and <em> tags on the page, and "//div/text()" skips the <em> element, leaving the two elements "\n" and ", Beijing\n".

Take the second element of re and strip the trailing newline "\n" (if you are using scrapy's xpath, re may not be a plain list):

re = re[1].strip()

re is now the ", Beijing" we need.
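Indexing with re[1] depends on exactly how the whitespace around the tags is laid out. A slightly more defensive sketch drops whitespace-only text nodes before picking a value. The names nodes and cleaned are illustrative, and the inline markup is an assumed copy of the sample page.

```python
from lxml import etree

page = etree.HTML("<div>\n<em>Hello</em>, Beijing\n</div>")

# text() yields only the text nodes that are direct children of <div>.
nodes = page.xpath("//div/text()")

# Strip each node and discard the ones that were pure whitespace.
cleaned = [t.strip() for t in nodes if t.strip()]
print(cleaned)
```

With this approach the wanted text is cleaned[0] regardless of whether a newline precedes the em tag.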

Now the second requirement, extracting "Hello, Beijing":

This requires the text inside <em> to be extracted as well, so we use string(.):

re = page.xpath("//div")[0].xpath("string(.)")

Now look at the value of re (again, if you use scrapy's selector, the result returned by scrapy_selector.xpath("//div") may not be a plain list, but you only need to take that result and call .xpath("string(.)") on it):

The result is a single string of text, "\nHello, Beijing\n".

With string(.), xpath extracts the text of the node and of everything below it as one string, instead of skipping child elements and splitting the text into a list as text() does above. Note that string(.) has to be evaluated in its own xpath call on the selected node. Writing it as a path step, as in "//div/string(.)", will not work.
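The restriction can be checked directly. In XPath 1.0, which lxml implements, a function call cannot appear as a location step, so the path form is rejected, while wrapping the whole path in string() is accepted. The following is a sketch under the same assumed markup; step_error and whole are illustrative names.

```python
from lxml import etree

page = etree.HTML("<div>\n<em>Hello</em>, Beijing\n</div>")

# A function call used as a path step is not valid XPath 1.0,
# so lxml raises an XPath error instead of returning a result.
try:
    page.xpath("//div/string(.)")
    step_error = None
except etree.XPathError as exc:
    step_error = exc

# Wrapping the path works: string() converts the first matched
# <div> (in document order) to its full text content.
whole = page.xpath("string(//div)").strip()
print(repr(step_error), whole)
```

Because string(//div) only looks at the first match, the two-step form from the article is still the one to use when iterating over several div elements.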

Then, as before, strip the extra spaces and newline characters on both ends:

re = re.strip()

re is now "Hello, Beijing".
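Python's strip() only trims the two ends. If the text may also contain runs of whitespace in the middle, XPath's normalize-space() function trims the ends and collapses inner runs to single spaces in one step. The markup below is a made-up variant of the sample page with extra spaces added for illustration, and the variable names are hypothetical.

```python
from lxml import etree

# Variant of the sample page with three spaces after the comma (assumption).
page = etree.HTML("<div>\n<em>Hello</em>,   Beijing\n</div>")
div = page.xpath("//div")[0]

# strip() removes the surrounding newlines but keeps the inner spaces.
stripped = div.xpath("string(.)").strip()

# normalize-space(.) trims and also collapses inner whitespace runs.
normalized = div.xpath("normalize-space(.)")
print(repr(stripped), repr(normalized))
```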

Summary:

The experiments above show that text() in xpath takes only the text that belongs directly to the node at the current level, splits it at that level's child tags, and returns the pieces as a list. string(.), by contrast, extracts all the text in the current node and in every level below it and returns it as a single string.
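The behaviours can be compared side by side. This sketch also includes .//text(), which the article does not cover: it returns the text nodes of all descendants as a list, so it sits between text() and string(.). The helper non_blank, the variable names, and the inline markup are assumptions for illustration.

```python
from lxml import etree

page = etree.HTML("<div>\n<em>Hello</em>, Beijing\n</div>")
div = page.xpath("//div")[0]


def non_blank(nodes):
    """Strip text nodes and drop the ones that were only whitespace."""
    return [t.strip() for t in nodes if t.strip()]


direct = non_blank(div.xpath("text()"))          # this level only, a list
descendants = non_blank(div.xpath(".//text()"))  # all levels, still a list
joined = div.xpath("string(.)").strip()          # all levels, one string

print(direct, descendants, joined)
```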

Example code:

test.py:

from lxml import etree


class Test(object):
    def __init__(self):
        # Parse foo.html into an lxml element tree.
        with open('foo.html', 'r', encoding='utf-8') as f:
            content = f.read()
        self.page = etree.HTML(content)
        print(self.page)

    def xpath_text(self):
        # text() returns only the div's own text nodes, as a list.
        re = self.page.xpath("//div/text()")
        print(re)
        re = re[1].strip()
        print(re)
        return re

    def xpath_string(self):
        # string(.) joins all text in the div and its descendants.
        re = self.page.xpath("//div")[0].xpath("string(.)")
        print(re)
        # Strip the surrounding newline characters and spaces.
        re = re.strip()
        print(re)
        return re


if __name__ == "__main__":
    t = Test()
    assert t.xpath_text() == ", Beijing"
    assert t.xpath_string() == "Hello, Beijing"

foo.html:

<div>
<em>Hello</em>, Beijing
</div>
