2025-02-27 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article walks through the selector syntax used to write extraction rules for a jsoup crawler. It covers element and attribute selectors, combinators, and pseudo selectors, each with an example.
Rule writing instruction document
The universal selector (*) is implied when no element selector is given (i.e. *.header and .header are equivalent).
Each entry below gives a pattern, followed by what it matches and an example.
*
  Any element. Example: *
tag
  Elements with the given tag name. Example: div
ns|E
  Elements of type E in the namespace ns. Example: fb|name finds <fb:name> elements
#id
  Elements with an ID attribute of "id". Example: div#wrap, #logo
.class
  Elements with a class name of "class". Example: div.left, .result
[attr]
  Elements with an attribute named "attr" (with any value). Example: a[href], [title]
[^attrPrefix]
  Elements with an attribute name starting with "attrPrefix". Use this to find elements with HTML5 datasets. Example: [^data-], div[^data-]
[attr=val]
  Elements with an attribute named "attr" and a value equal to "val". Example: img[width=500], a[rel=nofollow]
[attr="val"]
  Elements with an attribute named "attr" and a value equal to "val". Example: span[hello="Cleveland"][goodbye="Columbus"], a[rel="nofollow"]
[attr^=valPrefix]
  Elements with an attribute named "attr" and a value starting with "valPrefix". Example: a[href^=http:]
[attr$=valSuffix]
  Elements with an attribute named "attr" and a value ending with "valSuffix". Example: img[src$=.png]
[attr*=valContaining]
  Elements with an attribute named "attr" and a value containing "valContaining". Example: a[href*=/search/]
[attr~=regex]
  Elements with an attribute named "attr" and a value matching the regular expression. Example: img[src~=(?i)\\.(png|jpe?g)]
The above may be combined in any order. Example: div.header[title]
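The attribute value operators above can be illustrated with a plain-Java sketch that applies the same tests to a single attribute value. This uses only the standard library and is not jsoup's own code: the class name AttrMatchDemo and its method names are hypothetical, the comparisons here are case-sensitive for simplicity (jsoup's handling of case may differ), and treating [attr~=regex] as "the pattern is found anywhere in the value" is an assumption worth checking against the jsoup documentation.

```java
import java.util.regex.Pattern;

// Hypothetical helper class: mimics the attribute value tests on one string.
class AttrMatchDemo {
    // [attr^=valPrefix]: the value starts with the prefix
    static boolean startsWith(String value, String prefix) {
        return value.startsWith(prefix);
    }

    // [attr$=valSuffix]: the value ends with the suffix
    static boolean endsWith(String value, String suffix) {
        return value.endsWith(suffix);
    }

    // [attr*=valContaining]: the value contains the fragment
    static boolean contains(String value, String fragment) {
        return value.contains(fragment);
    }

    // [attr~=regex]: assumed here to succeed when the regex is found
    // anywhere in the value (a "find", not a full-string match)
    static boolean matchesRegex(String value, String regex) {
        return Pattern.compile(regex).matcher(value).find();
    }

    public static void main(String[] args) {
        String href = "http://example.com/search/?q=jsoup"; // made-up sample value
        System.out.println(startsWith(href, "http:"));      // true
        System.out.println(contains(href, "/search/"));     // true
        System.out.println(endsWith("logo.png", ".png"));   // true
        // (?i) makes the pattern case-insensitive; \\. is a literal dot
        System.out.println(matchesRegex("photo.JPG", "(?i)\\.(png|jpe?g)")); // true
        System.out.println(matchesRegex("photo.gif", "(?i)\\.(png|jpe?g)")); // false
    }
}
```

Note that inside a Java string literal the backslash is doubled, which is why the regular expression is written with `\\.` to mean a literal dot.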
Combinators
E F
  An F element descended from an E element. Example: div a, .logo h2
E > F
  An F element that is a direct child of E. Example: ol > li
E + F
  An F element immediately preceded by sibling E. Example: li + li, div.head + div
E ~ F
  An F element preceded by sibling E. Example: h2 ~ p
E, F, G
  All matching elements E, F, or G. Example: a[href], div, h4
Pseudo selectors
:lt(n)
  Elements whose sibling index is less than n. Example: td:lt(3) finds the first 3 cells of each row
:gt(n)
  Elements whose sibling index is greater than n. Example: td:gt(1) finds cells after skipping the first two
:eq(n)
  Elements whose sibling index is equal to n. Example: td:eq(0) finds the first cell of each row
:has(selector)
  Elements that contain at least one element matching the selector. Example: div:has(p) finds divs that contain p elements
:not(selector)
  Elements that do not match the selector. See also Elements.not(String). Example: div:not(.logo) finds all divs that do not have the "logo" class; div:not(:has(div)) finds divs that do not contain divs
:contains(text)
  Elements that contain the specified text. The search is case insensitive. The text may appear in the found element or in any of its descendants. Example: p:contains(jsoup) finds p elements containing the text "jsoup"
:matches(regex)
  Elements whose text matches the specified regular expression. The text may appear in the found element or in any of its descendants. Example: td:matches(\\d+) finds table cells containing digits; div:matches((?i)login) finds divs containing the text, case insensitively
:containsOwn(text)
  Elements that directly contain the specified text. The search is case insensitive. The text must appear in the found element, not in any of its descendants. Example: p:containsOwn(jsoup) finds p elements with own text "jsoup"
:matchesOwn(regex)
  Elements whose own text matches the specified regular expression. The text must appear in the found element, not in any of its descendants. Example: td:matchesOwn(\\d+) finds table cells directly containing digits; div:matchesOwn((?i)login) finds divs containing the text, case insensitively
The above may be combined in any order and with other selectors. Example: .light:contains(name):eq(0)
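The index-based pseudo selectors count siblings from zero, which is easy to get wrong. The following plain-Java sketch (standard library only, not jsoup code) applies the same comparisons to a list that stands in for the cells of one table row. The class SiblingIndexDemo, its methods, and the sample cell names are hypothetical, and the sketch assumes the row contains nothing but those cells.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical helper class: models :lt(n), :gt(n) and :eq(n) on a list of siblings.
class SiblingIndexDemo {
    // Keep the siblings whose zero-based index satisfies the predicate.
    static List<String> select(List<String> siblings, IntPredicate keep) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < siblings.size(); i++) {
            if (keep.test(i)) {
                out.add(siblings.get(i));
            }
        }
        return out;
    }

    // :lt(n) keeps indexes 0 .. n-1
    static List<String> lt(List<String> siblings, int n) {
        return select(siblings, i -> i < n);
    }

    // :gt(n) keeps indexes n+1 and up
    static List<String> gt(List<String> siblings, int n) {
        return select(siblings, i -> i > n);
    }

    // :eq(n) keeps only index n
    static List<String> eq(List<String> siblings, int n) {
        return select(siblings, i -> i == n);
    }

    public static void main(String[] args) {
        List<String> cells = Arrays.asList("c0", "c1", "c2", "c3");
        System.out.println(lt(cells, 3)); // [c0, c1, c2]  like td:lt(3)
        System.out.println(gt(cells, 1)); // [c2, c3]      like td:gt(1)
        System.out.println(eq(cells, 0)); // [c0]          like td:eq(0)
    }
}
```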
Structural pseudo selectors
:root
  The element that is the root of the document. In HTML, this is the html element. Example: :root
:nth-child(an+b)
  Elements that have an+b-1 siblings before them in the document tree, for any zero or positive integer value of n, and that have a parent element. For values of a and b greater than zero, this effectively divides the element's children into groups of a elements (the last group taking the remainder) and selects the bth element of each group. For example, this lets a selector address every other row in a table, or alternate the color of paragraph text in a cycle of four. The a and b values must be integers (positive, negative, or zero). The index of the first child of an element is 1.
  In addition, :nth-child() can take odd and even as arguments: odd means the same as 2n+1, and even means the same as 2n. Examples: tr:nth-child(2n+1) finds every odd row of a table; :nth-child(10n-1) finds the 9th, 19th, 29th, etc. element; li:nth-child(5) finds the 5th li
:nth-last-child(an+b)
  Elements that have an+b-1 siblings after them in the document tree. Otherwise like :nth-child(). Example: tr:nth-last-child(-n+2) finds the last two rows of a table
:nth-of-type(an+b)
  Elements that have an+b-1 siblings with the same expanded element name before them in the document tree, for any zero or positive integer value of n, and that have a parent element. Example: img:nth-of-type(2n+1)
:nth-last-of-type(an+b)
  Elements that have an+b-1 siblings with the same expanded element name after them in the document tree, for any zero or positive integer value of n, and that have a parent element. Example: img:nth-last-of-type(2n+1)
:first-child
  Elements that are the first child of some other element. Example: div > p:first-child
:last-child
  Elements that are the last child of some other element. Example: ol > li:last-child
:first-of-type
  Elements that are the first sibling of their type in the list of children of their parent element. Example: dl dt:first-of-type
:last-of-type
  Elements that are the last sibling of their type in the list of children of their parent element. Example: tr > td:last-of-type
:only-child
  Elements that have a parent element and whose parent element has no other element children
:only-of-type
  Elements that have a parent element and whose parent element has no other element children with the same expanded element name
:empty
  Elements that have no children at all
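The an+b notation is easier to follow with the arithmetic written out. The sketch below is plain Java (standard library only) and is not how jsoup implements these selectors; the class NthChildDemo and its methods are hypothetical names. It tests whether a 1-based position equals a*n + b for some integer n >= 0, which is the rule described above for :nth-child(). For :nth-last-child(), the same test applies with positions counted from the end.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class: evaluates the an+b rule for 1-based positions.
class NthChildDemo {
    // True when position == a*n + b for some integer n >= 0.
    static boolean matches(int a, int b, int position) {
        if (a == 0) {
            return position == b; // plain :nth-child(b)
        }
        int diff = position - b;
        // n = diff / a must be a whole number and must not be negative
        return diff % a == 0 && diff / a >= 0;
    }

    // All matching positions among `count` siblings (positions start at 1).
    static List<Integer> positions(int a, int b, int count) {
        List<Integer> out = new ArrayList<>();
        for (int p = 1; p <= count; p++) {
            if (matches(a, b, p)) {
                out.add(p);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(positions(2, 1, 6));    // [1, 3, 5]    2n+1, same as odd
        System.out.println(positions(2, 0, 6));    // [2, 4, 6]    2n, same as even
        System.out.println(positions(10, -1, 30)); // [9, 19, 29]  10n-1
        System.out.println(positions(0, 5, 6));    // [5]          nth-child(5)
        // -n+2 matches positions 1 and 2; with positions counted from the
        // end, that is the last two siblings, as in tr:nth-last-child(-n+2)
        System.out.println(positions(-1, 2, 6));   // [1, 2]
    }
}
```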
That concludes this overview of selector rules for a jsoup crawler. If you run into similar questions, the reference above is a good place to start.