2025-01-17 Update From: SLTechnology News&Howtos
This article covers the basics of parsing HTML in C# with Html Agility Pack: how to load a document, the most commonly used members of the HtmlNode class, and the methods for querying nodes.
There are about 28 classes in the Html Agility Pack source code. It is not a very complex class library, but it is far from weak: it provides solid support for parsing the DOM, comparable to manipulating the DOM with jQuery :)
Only a few of those classes are used day to day. For parsing the DOM there are really just two, HtmlDocument and HtmlNode, plus the HtmlNodeCollection collection class.
Before parsing the DOM, you need to load the original html file or html string. The HtmlDocument class encapsulates the methods for this. Here is how to load html.
The HtmlDocument class defines several overloaded Load methods that load html in different ways. They fall into two main groups: those that load html from a Stream and those that load html from a physical path, as shown below:
Method: public void Load(TextReader reader)
Description: loads html from the specified TextReader object
Example:
HtmlDocument doc = new HtmlDocument();
StreamReader sr = File.OpenText("file path");
doc.Load(sr);
Several overloads build on the method above.
The overloads that take a Stream object are:
(1) public void Load(Stream stream) // load html from the specified Stream object
(2) public void Load(Stream stream, bool detectEncodingFromByteOrderMarks) // specify whether to detect the encoding from the byte order marks at the start of the stream
(3) public void Load(Stream stream, Encoding encoding) // specify the encoding
(4) public void Load(Stream stream, Encoding encoding, bool detectEncodingFromByteOrderMarks)
(5) public void Load(Stream stream, Encoding encoding, bool detectEncodingFromByteOrderMarks, int buffersize)
The overloads that take a physical path are:
(1) public void Load(string path)
(2) public void Load(string path, bool detectEncodingFromByteOrderMarks) // specify whether to detect the encoding from the byte order marks at the start of the file
(3) public void Load(string path, Encoding encoding) // specify the encoding
(4) public void Load(string path, Encoding encoding, bool detectEncodingFromByteOrderMarks)
(5) public void Load(string path, Encoding encoding, bool detectEncodingFromByteOrderMarks, int buffersize)
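The two groups of overloads can be compared side by side. The following is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced and the code is compiled as C# top-level statements; the temporary file name and its markup are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using HtmlAgilityPack;

// Hypothetical sample file, written here only so the sketch is self-contained.
string path = Path.Combine(Path.GetTempPath(), "hap-load-demo.html");
File.WriteAllText(path, "<html><body><p id=\"msg\">Hello World!</p></body></html>");

// Path-based overload with an explicit encoding.
var fromPath = new HtmlDocument();
fromPath.Load(path, Encoding.UTF8);

// Stream-based overload; the using block closes the file afterwards.
var fromStream = new HtmlDocument();
using (FileStream fs = File.OpenRead(path))
{
    fromStream.Load(fs, Encoding.UTF8);
}

string textFromPath = fromPath.GetElementbyId("msg").InnerText;
string textFromStream = fromStream.GetElementbyId("msg").InnerText;
Console.WriteLine(textFromPath);
Console.WriteLine(textFromStream);
```

Both documents end up with the same tree; which overload to use mostly depends on whether the caller already holds an open stream.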
The HtmlDocument class also defines a method for loading html directly from an html string:
Method: public void LoadHtml(string html)
Description: loads html from the specified html string
Example:
HtmlDocument doc = new HtmlDocument();
string html = "Hello World!";
doc.LoadHtml(html);
The HtmlDocument class also defines methods for writing to the DOM. They are not covered here and are left for a later chapter on writing the DOM with Html Agility Pack; this article focuses on the details of parsing.
After you load the html through HtmlDocument, what's next? Parsing it, of course, and that brings in the HtmlNode class. The DocumentNode property of HtmlDocument returns the root HtmlNode of the parsed document. To get the node of a specific element, call the GetElementbyId(string id) method of HtmlDocument, which returns the HtmlNode of the element with that id. Before looking at how to access the DOM through an HtmlNode object, it helps to know what the class offers.
The HtmlNode class implements the IXPathNavigable interface, which means it can query the DOM through XPath. If you know the XmlDocument class in the System.Xml namespace, and especially if you have used its SelectNodes() and SelectSingleNode() methods, HtmlNode will feel familiar. Html Agility Pack parses html into an XML-like document structure internally, so it supports the common XML query methods. The following is a brief description of the most commonly used members of HtmlNode.
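As a small illustration of the two entry points just described, the sketch below loads a string and then reads both DocumentNode and GetElementbyId. It assumes the HtmlAgilityPack NuGet package and C# top-level statements; the markup and the ids are hypothetical.

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
// Hypothetical markup for illustration.
doc.LoadHtml("<div id=\"greeting\">Hello World!</div>");

// DocumentNode is the root of the parsed tree.
HtmlNode root = doc.DocumentNode;

// GetElementbyId returns the element with that id, or null when there is none.
HtmlNode greeting = doc.GetElementbyId("greeting");
HtmlNode missing = doc.GetElementbyId("no-such-id");

Console.WriteLine(root.NodeType);
Console.WriteLine(greeting.Name);
Console.WriteLine(missing == null);
```

Because a missing id yields null rather than an exception, callers normally check the result before using it.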
1) Attributes property
Gets the attributes of the current html element as an HtmlAttributeCollection object. A div element, for example, may declare several attributes; if it declares id, name, class and title, the HtmlAttributeCollection returned by Attributes contains those four entries. HtmlAttributeCollection is a collection class that implements IList<HtmlAttribute>, so its members can be accessed as in the following code.
HtmlNode node = doc.GetElementbyId("title");
string titleValue = node.Attributes["title"].Value;
Or
foreach (HtmlAttribute attr in node.Attributes)
{
    Console.WriteLine("{0} = {1}", attr.Name, attr.Value);
}
Attributes["name"] returns null when no attribute with that name exists, so reading .Value on the result without a check throws a NullReferenceException.
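One way to guard against the missing-attribute case is sketched below, assuming the HtmlAgilityPack NuGet package and C# top-level statements. The markup is hypothetical; GetAttributeValue is the HtmlNode helper that takes a fallback value.

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
// Hypothetical markup for illustration: no name attribute is declared.
doc.LoadHtml("<div id=\"title\" class=\"box\" title=\"A title\">Hello</div>");
HtmlNode node = doc.GetElementbyId("title");

// The indexer returns null for a missing attribute, so check before reading Value.
HtmlAttribute nameAttr = node.Attributes["name"];
string name = nameAttr != null ? nameAttr.Value : "(none)";

// GetAttributeValue returns the supplied default when the attribute is absent.
string title = node.GetAttributeValue("title", "(none)");
string lang = node.GetAttributeValue("lang", "(none)");

Console.WriteLine(name);
Console.WriteLine(title);
Console.WriteLine(lang);
```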
2) FirstChild, LastChild, ChildNodes and ParentNode properties
FirstChild property: returns the first of the current node's child nodes. Take the following code:
string html = "Hello Worldinside div";
Here FirstChild returns the "Hello World!" node.
LastChild property: returns the last of the child nodes, which is the "inside div" node for the html above.
ChildNodes property: returns a collection of the direct children of the current node, not deeper descendants. For the html above it returns the "Hello World!" and "inside div" nodes.
ParentNode property: returns the immediate parent of the current node.
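The four navigation properties can be tried on a small tree. This is a sketch under assumptions: the markup below (an h1 followed by a nested div inside a parent div) is hypothetical and only mirrors the "Hello World!" and "inside div" nodes mentioned above; it requires the HtmlAgilityPack NuGet package and C# top-level statements.

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
// Hypothetical markup; there is no whitespace between tags, so no extra text nodes appear.
doc.LoadHtml("<div id=\"parent\"><h1>Hello World!</h1><div>inside div</div></div>");
HtmlNode parent = doc.GetElementbyId("parent");

string first = parent.FirstChild.InnerText;   // text of the h1
string last = parent.LastChild.InnerText;     // text of the nested div
int childCount = parent.ChildNodes.Count;     // direct children only
bool sameParent = parent.FirstChild.ParentNode == parent;

Console.WriteLine(first);
Console.WriteLine(last);
Console.WriteLine(childCount);
Console.WriteLine(sameParent);
```

If the markup contained line breaks or spaces between the tags, those would show up as text nodes in ChildNodes, so FirstChild is not always an element.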
3) Getting the html source and text
The HtmlNode class provides the OuterHtml and InnerHtml properties for getting the html source of the current node. OuterHtml returns the markup of the node itself together with everything inside it, while InnerHtml returns only the markup of the node's children.
To get the text of a node, use the InnerText property, which strips all html tags and returns only the text, as shown below:
Console.WriteLine(node.InnerText); // returns "Hello World!"
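The difference between the three properties is easiest to see on one node. A minimal sketch, assuming the HtmlAgilityPack NuGet package and C# top-level statements, with hypothetical markup:

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
// Hypothetical markup for illustration.
doc.LoadHtml("<div id=\"box\"><h1>Hello World!</h1></div>");
HtmlNode node = doc.GetElementbyId("box");

string outer = node.OuterHtml;  // the div itself plus its content
string inner = node.InnerHtml;  // only the content of the div
string text = node.InnerText;   // content with all tags stripped

Console.WriteLine(outer);
Console.WriteLine(inner);
Console.WriteLine(text);
```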
The HtmlNode class provides plenty of methods for querying the child nodes (elements) under the current node, as well as methods for querying its parent nodes (elements). The main methods are listed below.
Methods for getting ancestor nodes:
1) public IEnumerable<HtmlNode> Ancestors()
Gets all ancestors of the current node (excluding itself).
2) public IEnumerable<HtmlNode> Ancestors(string name)
Gets the ancestors of the current node that have the specified name (excluding itself).
3) public IEnumerable<HtmlNode> AncestorsAndSelf()
Gets all ancestors of the current node (including itself).
4) public IEnumerable<HtmlNode> AncestorsAndSelf(string name)
Gets the ancestors of the current node that have the specified name (including itself).
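The ancestor methods can be sketched as follows, assuming the HtmlAgilityPack NuGet package and C# top-level statements. The markup and ids are hypothetical.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

var doc = new HtmlDocument();
// Hypothetical markup for illustration.
doc.LoadHtml("<div id=\"outer\"><section><span id=\"leaf\">text</span></section></div>");
HtmlNode leaf = doc.GetElementbyId("leaf");

// Names of every ancestor, nearest first.
string chain = string.Join(" > ", leaf.Ancestors().Select(n => n.Name));
Console.WriteLine(chain);

// Filter ancestors by element name.
HtmlNode nearestDiv = leaf.Ancestors("div").FirstOrDefault();
string nearestDivId = nearestDiv != null ? nearestDiv.GetAttributeValue("id", "") : "";

// The AndSelf variant also considers the starting node.
bool selfIncluded = leaf.AncestorsAndSelf("span").Any(n => n == leaf);
bool selfExcluded = !leaf.Ancestors("span").Any();

Console.WriteLine(nearestDivId);
Console.WriteLine(selfIncluded);
Console.WriteLine(selfExcluded);
```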
Methods for getting child nodes:
1) public IEnumerable<HtmlNode> DescendantNodes()
Gets all nodes under the current node at every depth, that is, children and the children of children (excluding itself).
2) public IEnumerable<HtmlNode> DescendantNodesAndSelf()
Gets all nodes under the current node at every depth (including itself).
3) public IEnumerable<HtmlNode> Descendants()
Gets all descendant nodes of the current node, not only the direct children (excluding itself).
4) public IEnumerable<HtmlNode> DescendantsAndSelf()
Gets all descendant nodes of the current node (including itself).
5) public IEnumerable<HtmlNode> Descendants(string name)
Gets the descendant nodes of the current node that have the specified name.
6) public IEnumerable<HtmlNode> DescendantsAndSelf(string name)
Gets the descendant nodes of the current node that have the specified name (including itself).
7) public HtmlNode Element(string name)
Gets the first direct child node that matches the specified name.
8) public IEnumerable<HtmlNode> Elements(string name)
Gets all direct child nodes that match the specified name.
9) public HtmlNodeCollection SelectNodes(string xpath)
Gets the collection of nodes that match the specified XPath expression.
10) public HtmlNode SelectSingleNode(string xpath)
Gets the first node that matches the specified XPath expression.
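The difference between descendant queries, direct-child queries and XPath queries can be sketched as below. It assumes the HtmlAgilityPack NuGet package with its default options and C# top-level statements; the markup is hypothetical.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

var doc = new HtmlDocument();
// Hypothetical markup: one link is a direct child, the other is nested in a span.
doc.LoadHtml("<div id=\"list\"><a href=\"/a\">A</a><span><a href=\"/b\">B</a></span></div>");
HtmlNode list = doc.GetElementbyId("list");

// Descendants searches every depth; Elements looks at direct children only.
int allLinks = list.Descendants("a").Count();
int directLinks = list.Elements("a").Count();

// XPath relative to the current node (note the leading dot).
HtmlNodeCollection byXPath = list.SelectNodes(".//a[@href]");

// With default options, SelectNodes returns null rather than an empty collection.
HtmlNodeCollection none = list.SelectNodes(".//table");

Console.WriteLine(allLinks);
Console.WriteLine(directLinks);
Console.WriteLine(byXPath.Count);
Console.WriteLine(none == null);
```

Because of the null return, code that iterates over SelectNodes results usually checks for null first.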
The ten methods above are the main ways to query nodes. The class also has a series of methods for writing nodes; they are not covered here and will be described in detail later.
Querying nodes with XPath is the most powerful option, and it is as convenient as working with XML.
A simple example
The following example queries the list of featured posts on cnblogs (Blog Park) and writes each entry to the console and to log.txt. The code is as follows:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using HtmlAgilityPack;
namespace DemoCnBlogs
{
    class Program
    {
        static void Main(string[] args)
        {
            HtmlWeb web = new HtmlWeb();
            HtmlDocument doc = web.Load("");
            HtmlNode node = doc.GetElementbyId("post_list");
            StreamWriter sw = File.CreateText("log.txt");
            foreach (HtmlNode child in node.ChildNodes)
            {
                // Skip anything that is not a post entry (text nodes, other elements).
                if (child.Attributes["class"] == null || child.Attributes["class"].Value != "post_item")
                    continue;
                HtmlNode hn = HtmlNode.CreateNode(child.OuterHtml);
                // Calling child.SelectSingleNode("//*[@class=\"titlelnk\"]").InnerText would always query the entire document,
                // which is not what we want. The query should run against the html of the current child node.
                Write(sw, String.Format("recommended: {0}", hn.SelectSingleNode("//*[@class=\"diggnum\"]").InnerText));
                Write(sw, String.Format("title: {0}", hn.SelectSingleNode("//*[@class=\"titlelnk\"]").InnerText));
                Write(sw, String.Format("introduction: {0}", hn.SelectSingleNode("//*[@class=\"post_item_summary\"]").InnerText));
                Write(sw, String.Format("Information: {0}", hn.SelectSingleNode("//*[@class=\"post_item_foot\"]").InnerText));
                Write(sw, "- -");
            }
            sw.Close();
            Console.ReadLine();
        }
        // Writes the same line to the console and to the log file.
        static void Write(StreamWriter writer, string str)
        {
            Console.WriteLine(str);
            writer.WriteLine(str);
        }
    }
}
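The example re-parses each child with HtmlNode.CreateNode so that the "//" query stays inside that child. An alternative, sketched below, is a relative XPath that starts with ".//", which is evaluated from the current node. This is not the original author's approach; it assumes the HtmlAgilityPack NuGet package and C# top-level statements, and the markup is a made-up stand-in for the real page.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

var doc = new HtmlDocument();
// Hypothetical stand-in for the post list; the real page markup is not shown in the article.
doc.LoadHtml(
    "<div id=\"post_list\">" +
    "<div class=\"post_item\"><a class=\"titlelnk\">First</a></div>" +
    "<div class=\"post_item\"><a class=\"titlelnk\">Second</a></div>" +
    "</div>");

var absoluteTitles = new List<string>();
var relativeTitles = new List<string>();

foreach (HtmlNode child in doc.GetElementbyId("post_list").ChildNodes)
{
    // "//" starts at the document root, so every iteration finds the same first match.
    absoluteTitles.Add(child.SelectSingleNode("//*[@class='titlelnk']").InnerText);

    // ".//" starts at the current child, so each iteration finds its own title.
    relativeTitles.Add(child.SelectSingleNode(".//*[@class='titlelnk']").InnerText);
}

Console.WriteLine(string.Join(", ", absoluteTitles));
Console.WriteLine(string.Join(", ", relativeTitles));
```

The relative form avoids re-parsing the markup of every child, at the cost of having to remember the leading dot.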