Today I would like to share some knowledge about how to implement a KDTree in C++. The content is detailed and the logic is clear. Since many people are probably not very familiar with this topic, I am sharing this article for your reference, and I hope you get something out of it. Let's take a look.
Brief introduction
A k-d tree (k-dimensional tree) is a data structure that partitions a k-dimensional data space, i.e. it organizes the data points of a k-dimensional space. It is mainly used for searching key data in multi-dimensional space (such as range search and nearest-neighbor search).
KdTree concept
A kd-tree, or k-dimensional tree, is a data structure used in computer science for organizing a set of points in a k-dimensional space. It is a binary search tree with additional constraints, and it is very useful for range and nearest-neighbor searches. In practice, kd-trees are most commonly used for neighborhood search in three-dimensional space, so every kd-tree in this article is three-dimensional.
An example
The figure above shows a kd-tree; as you can see, a kd-tree is a variant of the binary search tree.
Properties of a kd-tree:
A kd-tree is balanced: the depth difference between any two leaves is at most 1. (The more balanced the tree, the more evenly the space is divided and the less time a search takes.)
Data is stored only in the leaf nodes; the root and internal nodes store space-partitioning information (such as the partition dimension and partition value).
The dimensions of each tuple are numbered from 0 (the first item is dimension 0, the second is 1, the third is 2). At level n of the tree, the item with index n % 3 is shown in bold, and that bold item acts as the key of a binary search tree: for example, the first item of every node in the left subtree of the root is smaller than the first item of the root, the first item of every node in the right subtree is larger, and so on.
How partitioning works in one dimension
For a standard BST, each node has only one key value.
Map the key values onto a one-dimensional axis.
The root node corresponds to 2; the left subtree lies to the left of 2 and the right subtree to the right of 2, so the root splits the whole one-dimensional space into two parts. When searching for node 0, since it lies to the left of 2, we can safely search only the left subtree. The whole search can be seen as repeatedly narrowing the search interval until the target node is found.
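To make the one-dimensional case concrete, here is a small self-contained sketch (node layout and names are illustrative assumptions, not the article's code) of searching such a tree; each comparison discards one side of the remaining interval, just as described above.

    #include <iostream>

    // A plain one-dimensional BST node: one key, two children.
    struct BSTNode {
        double key;
        BSTNode *left;
        BSTNode *right;
    };

    // Search halves the remaining interval at every node: keys smaller than
    // node->key lie entirely in the left subtree, larger keys in the right one.
    bool contains(const BSTNode *node, double target) {
        while (node != nullptr) {
            if (target == node->key) return true;
            node = (target < node->key) ? node->left : node->right;
        }
        return false;
    }

    int main() {
        BSTNode n0{0.0, nullptr, nullptr};
        BSTNode n5{5.0, nullptr, nullptr};
        BSTNode root{2.0, &n0, &n5};
        std::cout << contains(&root, 0.0) << "\n";  // 1: 0 is found to the left of 2
        std::cout << contains(&root, 3.0) << "\n";  // 0: 3 is not in the tree
        return 0;
    }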
Two-dimensional
This kind of partitioning can be extended to two or even more dimensions.
The question is: how do two-dimensional nodes compare with each other?
In a BST a node splits the one-dimensional axis, so in two dimensions a node should split the plane instead, like this:
Take the yellow point as the root node: the points above it belong to the left subtree and the points below it to the right subtree. Splitting again and again in this way finally produces a tree, the well-known BSPTree (binary space partitioning tree). The dividing boundary is called the splitting hyperplane: a point in one dimension, a line in two dimensions, and a plane in three dimensions.
N-dimensional
A KDTree is a BSPTree whose splitting hyperplanes are perpendicular to the coordinate axes. The same dataset, partitioned with a KDTree, looks like this:
The yellow node is the root, the next level is red, the next green, and the next blue. To better understand how a KDTree partitions the space, let's walk through a search in the figure. Suppose we need to find the point in the lower-right corner. First we compare the x coordinate of this point with the x coordinate of the root: because it is greater, we only need to search on the right. Next we compare the appropriate coordinate with the red node on the right, and so on. The whole process is shown below:
(Figures 1-3: the three successive steps of the search described above.)
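As a complement to those figures, here is a compact sketch of the descent itself (the node type and member names are illustrative assumptions, separate from the KDTree class used later in the article): at each level one coordinate is compared, cycling through the axes.

    #include <cstddef>
    #include <vector>

    // Hypothetical node type for this illustration only:
    // each node stores one k-dimensional point.
    struct Node {
        std::vector<double> point;
        Node *left;
        Node *right;
    };

    // Walk down the tree exactly as in the walkthrough above: at depth n the
    // coordinate n % k is compared, so the root splits on x, the next level on y,
    // and so on; smaller goes left, otherwise right, until a leaf is reached.
    Node *descendToLeaf(Node *node, const std::vector<double> &query) {
        std::size_t depth = 0;
        const std::size_t k = query.size();
        while (node != nullptr) {
            const std::size_t axis = depth % k;   // axis compared at this level
            Node *next = (query[axis] < node->point[axis]) ? node->left : node->right;
            if (next == nullptr)
                break;                            // reached a leaf
            node = next;
            ++depth;
        }
        return node;  // leaf whose region contains the query point
    }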
The first important question about kd-trees: building the tree
1. Node data structure
Definition:
Node-data: a data vector, one data point of the data set; an n-dimensional vector (here, k-dimensional)
Range: a space vector, the range of space represented by this node
Split: an integer, the index of the axis perpendicular to the splitting hyperplane
Left: a k-d tree built from all the data points lying in the left subspace of this node's splitting hyperplane
Right: a k-d tree built from all the data points lying in the right subspace of this node's splitting hyperplane
Parent: a k-d tree, the parent node
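Rendered as C++, the definition above could look roughly like the sketch below. This is only an illustration of the fields just listed; the class actually used in the code later in the article keeps the point in a member called root and child links named left_child and right_child.

    #include <vector>

    // Sketch of the node described above (illustrative, not the article's exact class).
    struct KdTreeNode {
        std::vector<double> data;       // Node-data: one k-dimensional data point
        std::vector<double> rangeLow;   // Range: lower corner of the region covered by this node
        std::vector<double> rangeHigh;  // Range: upper corner of that region
        unsigned split;                 // Split: index of the axis perpendicular to the splitting hyperplane
        KdTreeNode *left;               // Left: subtree of points in the left subspace
        KdTreeNode *right;              // Right: subtree of points in the right subspace
        KdTreeNode *parent;             // Parent: the parent node
    };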
2. Optimizations
1. Splitting dimension optimization
Before construction begins, compare the distribution of the data points along each dimension. The larger the variance of the coordinate values in a dimension, the more spread out the points are; the smaller the variance, the more concentrated they are. Splitting first along the dimension with the largest variance gives a good partition and good balance.
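A minimal sketch of this heuristic (the function name and signature are assumptions, not part of the article's code): compute the variance of every coordinate and return the axis with the largest spread. It would replace the simple depth % k cycling used in the basic buildKdTree shown further down.

    #include <cstddef>
    #include <vector>

    // Pick the splitting axis with the largest variance. 'points' holds
    // k-dimensional samples; returns the index of the most spread-out dimension.
    std::size_t axisWithMaxVariance(const std::vector<std::vector<double> > &points) {
        if (points.empty()) return 0;
        const std::size_t k = points[0].size();
        std::size_t bestAxis = 0;
        double bestVar = -1.0;
        for (std::size_t axis = 0; axis < k; ++axis) {
            double mean = 0.0;
            for (const auto &p : points) mean += p[axis];
            mean /= points.size();
            double var = 0.0;
            for (const auto &p : points) var += (p[axis] - mean) * (p[axis] - mean);
            var /= points.size();
            if (var > bestVar) { bestVar = var; bestAxis = axis; }
        }
        return bestAxis;
    }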
2. Median selection optimization
First, before the algorithm starts, sort the original data points once along every dimension and store the result; later median selections then never have to re-sort a subset, which improves performance.
Second, randomly pick a fixed number of points from the original data, sort only those, and use their median as the splitting value each time. In practice this method has been shown to give good performance and good balance.
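The article's code calls a findMiddleValue helper whose body is not shown. Here is a minimal sketch of one way to implement the idea, using std::nth_element; this is an assumption, not the author's implementation.

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Median of the values along one axis in average O(n) time.
    // A possible body for a findMiddleValue-style helper (assumes 'values' is non-empty).
    double medianOf(std::vector<double> values) {
        const std::size_t mid = values.size() / 2;
        std::nth_element(values.begin(), values.begin() + mid, values.end());
        return values[mid];
    }

The same function can be applied to a random sample of the coordinates to obtain the approximate median described in the second optimization.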
The second important question about kd-trees: nearest-neighbor search (Nearest-Neighbor Lookup)
Given a KDTree and a target point, find the node in the KDTree that is closest to it (this node is the nearest neighbor).
The distance here is the Euclidean distance.
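The search code below calls a measureDistance helper that the article does not list. A plausible Euclidean sketch is shown here; this is an assumption, and the third parameter, which the later code always passes as 0, is presumed to select the metric.

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Plausible Euclidean implementation of the measureDistance helper used below;
    // only the Euclidean case (method == 0) is sketched.
    double measureDistance(const std::vector<double> &a,
                           const std::vector<double> &b,
                           unsigned method = 0) {
        (void)method;  // metric selector, unused in this sketch
        double sum = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i)
            sum += (a[i] - b[i]) * (a[i] - b[i]);
        return std::sqrt(sum);
    }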
The basic idea is simple. First, descend the binary tree along a "search path": at each node compare the query point's value in the node's split dimension with the node's split value, go into the left branch if it is less than or equal and into the right branch otherwise, until a leaf is reached. The leaf found this way, which lies in the same subspace as the query point, is an approximate nearest neighbor. Then backtrack along the search path: for each node on it, check whether the subspace of its other child could contain a point closer to the query point, and if so, jump into that subspace and search it as well (adding its nodes to the search path). Repeat until the search path is empty.
A few details deserve attention. In the figure below, suppose the point marked with a star is the test point and the green point is the approximate nearest neighbor found so far. During backtracking, a queue is needed to hold the points still to be examined. To decide whether another child's subspace might contain a closer point, draw a circle centered at the query point with the current nearest distance as its radius; this circle is called the candidate hypersphere. If the circle intersects the splitting axis of the node being backtracked, all the nodes on the other side of that axis must be pushed into the backtracking queue.
How to determine whether the splitting axis intersects the candidate hypersphere is illustrated in the following figure:
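The figure is not reproduced here, but the test is easy to state directly: the splitting hyperplane cuts the candidate hypersphere exactly when the distance from the query point to the hyperplane, measured along the splitting axis, is smaller than the current nearest distance. A minimal sketch (names are illustrative assumptions):

    #include <cmath>
    #include <vector>

    // True if the axis-aligned splitting hyperplane (coordinate 'splitValue' on
    // dimension 'axis') cuts the candidate hypersphere centered at the query
    // point with radius 'bestDistance'. If it does, the other side of the
    // hyperplane may still contain a closer point and must be searched.
    bool hyperplaneIntersectsSphere(const std::vector<double> &query,
                                    unsigned axis,
                                    double splitValue,
                                    double bestDistance) {
        return std::fabs(query[axis] - splitValue) < bestDistance;
    }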
Key code
Build KDTree
    void KDTree::buildKdTree(KDTree *tree, vector<vector<double> > data, unsigned int depth)
    {
        // number of samples
        unsigned long samplesNum = data.size();
        // termination conditions
        if (samplesNum == 0)
            return;
        if (samplesNum == 1)
        {
            tree->root = data[0];
            return;
        }
        // dimension of each sample
        unsigned long k = data[0].size();
        // transpose so that each row holds the values of one axis
        vector<vector<double> > transData = Transpose(data);
        // select the split attribute (axis): cycle through the dimensions
        unsigned splitAttribute = depth % k;
        vector<double> splitAttributeValues = transData[splitAttribute];
        // select the split value (the median along that axis)
        double splitValue = findMiddleValue(splitAttributeValues);
        // cout << "splitValue: " << splitValue << endl;

        // split the data into the two subsets used by the recursive calls below;
        // the sample equal to the median becomes this node's point
        vector<vector<double> > subset1;
        vector<vector<double> > subset2;
        for (unsigned i = 0; i < samplesNum; ++i)
        {
            if (splitAttributeValues[i] == splitValue && tree->root.empty())
                tree->root = data[i];
            else if (splitAttributeValues[i] < splitValue)
                subset1.push_back(data[i]);
            else
                subset2.push_back(data[i]);
        }

        // build the left and right subtrees recursively
        tree->left_child = new KDTree;
        tree->left_child->parent = tree;
        tree->right_child = new KDTree;
        tree->right_child->parent = tree;
        buildKdTree(tree->left_child, subset1, depth + 1);
        buildKdTree(tree->right_child, subset2, depth + 1);
    }
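buildKdTree relies on two helpers whose bodies the article does not list: Transpose and findMiddleValue. The median was sketched earlier; below is a minimal sketch of what Transpose could look like, an assumption matching only how it is used above (transData[axis] holds every sample's value on that axis).

    #include <cstddef>
    #include <vector>
    using std::vector;

    // Possible body for the Transpose helper used by buildKdTree (not listed in
    // the article): turn a samplesNum x k matrix of points into a k x samplesNum
    // matrix, so that transposed[axis] contains all points' values on that axis.
    vector<vector<double> > Transpose(const vector<vector<double> > &data)
    {
        vector<vector<double> > transposed;
        if (data.empty())
            return transposed;
        const std::size_t rows = data.size();
        const std::size_t cols = data[0].size();
        transposed.assign(cols, vector<double>(rows));
        for (std::size_t i = 0; i < rows; ++i)
            for (std::size_t j = 0; j < cols; ++j)
                transposed[j][i] = data[i][j];
        return transposed;
    }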
Query the nearest neighbor of the target point
    vector<double> KDTree::searchNearestNeighbor(vector<double> goal, KDTree *tree)
    {
        /* Step 1: find the leaf of the kd-tree that contains the target point.
           Descend the tree recursively from the root: if the target's coordinate
           in the current split dimension is smaller than the node's coordinate,
           move to the left child, otherwise to the right child, until a leaf is
           reached. That leaf is the "current nearest point". */
        unsigned long k = tree->root.size();   // dimension of the data
        unsigned d = 0;                        // current dimension, starting from 0
        KDTree *currentTree = tree;
        vector<double> currentNearest = currentTree->root;
        while (!currentTree->is_leaf())
        {
            unsigned index = d % k;            // split dimension at this level
            if (currentTree->right_child->is_empty() || goal[index] < currentTree->root[index])
                currentTree = currentTree->left_child;
            else
                currentTree = currentTree->right_child;
            ++d;
        }
        currentNearest = currentTree->root;

        /* Step 2: backtrack recursively, doing the following at every node:
           (a) if the point stored at the node is closer to the target than the
               current nearest point, make it the current nearest point;
           (b) the current nearest point lies in the region of one child of some
               node; check whether the region of the parent's other child could
               contain a closer point, i.e. whether that region intersects the
               sphere centered at the target point whose radius is the distance
               to the current nearest point. If it intersects, move to the other
               child and continue the nearest-neighbor search recursively there;
               if not, keep backtracking. */

        // distance between the current nearest neighbor and the target point
        double currentDistance = measureDistance(goal, currentNearest, 0);

        // if the current subtree's root is the left child of its parent, search
        // the region represented by the parent's right child, and vice versa
        KDTree *searchDistrict;
        if (currentTree->is_left())
        {
            if (currentTree->parent->right_child == nullptr)
                searchDistrict = currentTree;
            else
                searchDistrict = currentTree->parent->right_child;
        }
        else
        {
            searchDistrict = currentTree->parent->left_child;
        }

        // keep backtracking while the search region's subtree is not the root of
        // the whole kd-tree
        while (searchDistrict->parent != nullptr)
        {
            // shortest distance between the search region and the target point
            double districtDistance = fabs(goal[(d + 1) % k] - searchDistrict->parent->root[(d + 1) % k]);

            // if that distance is shorter than the distance to the current nearest
            // neighbor, the search region may contain a point closer to the target
            if (districtDistance < currentDistance) // && !searchDistrict->is_empty()
            {
                double parentDistance = measureDistance(goal, searchDistrict->parent->root, 0);
                if (parentDistance < currentDistance)
                {
                    currentDistance = parentDistance;
                    currentTree = searchDistrict->parent;
                    currentNearest = currentTree->root;
                }

                if (!searchDistrict->is_empty())
                {
                    double rootDistance = measureDistance(goal, searchDistrict->root, 0);
                    if (rootDistance < currentDistance)
                    {
                        currentDistance = rootDistance;
                        currentTree = searchDistrict;
                        currentNearest = currentTree->root;
                    }
                }

                if (searchDistrict->left_child != nullptr)
                {
                    double leftDistance = measureDistance(goal, searchDistrict->left_child->root, 0);
                    if (leftDistance < currentDistance)
                    {
                        currentDistance = leftDistance;
                        currentTree = searchDistrict->left_child;
                        currentNearest = currentTree->root;
                    }
                }

                if (searchDistrict->right_child != nullptr)
                {
                    double rightDistance = measureDistance(goal, searchDistrict->right_child->root, 0);
                    if (rightDistance < currentDistance)
                    {
                        currentDistance = rightDistance;
                        currentTree = searchDistrict->right_child;
                        currentNearest = currentTree->root;
                    }
                }
            } // end if

            // move on to the sibling region one level up, or stop at the root
            if (searchDistrict->parent->parent != nullptr)
            {
                searchDistrict = searchDistrict->parent->is_left() ?
                                 searchDistrict->parent->parent->right_child :
                                 searchDistrict->parent->parent->left_child;
            }
            else
            {
                searchDistrict = searchDistrict->parent;
            }
            ++d;
        } // end while

        return currentNearest;
    }

These are all the contents of the article "How to implement a KDTree in C++". Thank you for reading! I hope you have gained something from it.