Source: Shulou (Shulou.com), SLTechnology News&Howtos, Development section. Reported 06/03; updated 2025-01-18.
This article shows how to write a complete, optimized suffix tree (SuffixTree) implementation. The code is concise and easy to follow, and I hope the detailed walkthrough gives you something useful.
Some friends asked for it, so this time I am releasing the complete SuffixTree code that implements all three optimizations from Ukkonen's paper. If you have read the code in my previous post, which implemented only the suffix-link optimization, this version should be easy to follow.
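To see how the optimizations fit together before reading the full files, here is a compact sketch. It is not the author's implementation. The class name UkkonenTree, its members, the half-open [start, end) edge encoding and the std::map child table are all assumptions. It also uses the common "active point" formulation instead of the TreePos/TreePath helpers in the code that follows. The comments mark where skip/count (the article's "trick 1"), the early end of a phase on rule 3, the shared leaf end ("once a leaf, always a leaf") and suffix links appear.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical compact Ukkonen sketch (not the article's code). Edge labels are
// half-open ranges [start, end) into text; end == -1 marks a leaf whose end is
// the shared globalEnd ("once a leaf, always a leaf").
class UkkonenTree {
public:
    explicit UkkonenTree(const std::string& s) : text(s) {
        nodes.emplace_back(0, 0);                            // node 0 is the root
        for (int i = 0; i < (int)text.size(); ++i) extend(i);
    }

    // Returns one starting index of p in text (not necessarily the first), or -1.
    int find(const std::string& p) const {
        int v = 0, depth = 0;
        size_t k = 0;
        while (k < p.size()) {
            auto it = nodes[v].next.find(p[k]);
            if (it == nodes[v].next.end()) return -1;
            int u = it->second, len = edgeLen(u);
            for (int j = 0; j < len && k < p.size(); ++j, ++k)
                if (text[nodes[u].start + j] != p[k]) return -1;
            depth += len;
            v = u;
        }
        // Any leaf below v spells a suffix text[n - depth, n) that starts with p.
        while (!nodes[v].next.empty()) {
            v = nodes[v].next.begin()->second;
            depth += edgeLen(v);
        }
        return (int)text.size() - depth;
    }

private:
    struct Node {
        int start, end, link = 0;                            // suffix links default to the root
        std::map<char, int> next;
        Node(int s, int e) : start(s), end(e) {}
    };
    std::string text;
    std::vector<Node> nodes;
    int globalEnd = 0, activeNode = 0, activeEdge = 0, activeLen = 0, remainder = 0;

    int newNode(int s, int e) { nodes.emplace_back(s, e); return (int)nodes.size() - 1; }
    int edgeLen(int v) const {
        return (nodes[v].end == -1 ? globalEnd : nodes[v].end) - nodes[v].start;
    }

    void extend(int i) {                                     // phase i: add text[i]
        globalEnd = i + 1;                                   // every leaf grows by one char at once
        ++remainder;
        int lastInternal = -1;
        while (remainder > 0) {
            if (activeLen == 0) activeEdge = i;
            char c = text[activeEdge];
            auto it = nodes[activeNode].next.find(c);
            if (it == nodes[activeNode].next.end()) {        // rule 2: new leaf
                int leaf = newNode(i, -1);
                nodes[activeNode].next[c] = leaf;
                if (lastInternal != -1) { nodes[lastInternal].link = activeNode; lastInternal = -1; }
            } else {
                int nxt = it->second, len = edgeLen(nxt);
                if (activeLen >= len) {                      // skip/count: jump a whole edge
                    activeEdge += len; activeLen -= len; activeNode = nxt;
                    continue;
                }
                if (text[nodes[nxt].start + activeLen] == text[i]) {  // rule 3
                    if (lastInternal != -1 && activeNode != 0) nodes[lastInternal].link = activeNode;
                    ++activeLen;
                    break;                                   // rule 3 ends the phase early
                }
                int split = newNode(nodes[nxt].start, nodes[nxt].start + activeLen);
                int leaf = newNode(i, -1);
                nodes[activeNode].next[c] = split;           // rule 2: split the edge
                nodes[split].next[text[i]] = leaf;
                nodes[nxt].start += activeLen;
                nodes[split].next[text[nodes[nxt].start]] = nxt;
                if (lastInternal != -1) nodes[lastInternal].link = split;
                lastInternal = split;
            }
            --remainder;
            if (activeNode == 0 && activeLen > 0) {
                --activeLen;
                activeEdge = i - remainder + 1;
            } else if (activeNode != 0) {
                activeNode = nodes[activeNode].link;         // follow the suffix link
            }
        }
    }
};
```

For example, `UkkonenTree("banana").find("nan")` returns 2, and `find("nab")` returns -1. No unique terminator is appended, so the result is an implicit suffix tree. That is still enough for substring search, because every substring lies on some path from the root.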
Here is the header file, SuffixTree.h:
#pragma once
#include <cstddef>
#include <string>
#include <vector>
using namespace std;

class SuffixNode {
public:
    vector<SuffixNode*> m_pSons;
    SuffixNode* m_pFarther;
    SuffixNode* m_pSuffixLink;
    int m_iPathPos;
    int m_iEdgeStart;
    int m_iEdgeEnd;
};

class SuffixTree {
public:
    int m_iEdgeEnd;
    SuffixNode* m_pRoot;     // The root of the tree.
    string m_czTreeStr;      // The string that the tree represents.
};

// Means a substring of the suffix tree: string[m_iBegin] .. string[m_iEnd].
class TreePath {
public:
    int m_iBegin;
    int m_iEnd;
};

// Represents the char in a node's incoming edge.
class TreePos {
public:
    TreePos() { m_iEdgePos = 0; m_pNode = NULL; }
    TreePos(int edgePos, SuffixNode* pNode) { m_iEdgePos = edgePos; m_pNode = pNode; }
    int m_iEdgePos;          // The i-th char of the incoming edge.
    SuffixNode* m_pNode;     // The node we are going to search.
};

//===== Function Declarations =====
void SingleCharExtesion(SuffixTree* pTree, TreePos*& pPos, TreePath extendStrPath, int* firstExtensionFlag, bool* rule3Applied, int* iLeafNum);

/* Add s[0..i+1], s[1..i+1], ... to the suffix tree.
   Input: SuffixNode* pNode: when we only use trick 1, pNode is the pointer to the longest leaf, s[0..i].
          phase: equals i+1 in the paper. */
void SinglePhaseExtend(SuffixTree* pTree, TreePos*& pPos, int phase, int* iExtension, int* ruleApplied, bool* lastRule3Applied, int* iLeafNum);

SuffixNode* CreateTreeNode(SuffixNode* pFarther, int iedgeStart, int iedgeEnd, int pathPos);

/* FollowSuffixLink: follows the suffix link of the source node according to Ukkonen's rules
   (jump from s[j-1..i] to s[j..i]).
   Input:   the tree and the node. The node is the last internal node we visited.
   Output:  the destination node that represents the longest suffix of node's path.
   Example: if node represents the path "abcde", it returns the node that represents "bcde". */
void FollowSuffixLink(SuffixTree* pTree, TreePos*& pPos, TreePath strji);

int GetNodeLabelLength(SuffixTree* pTree, SuffixNode* pNode);
int GetNodeLabelEnd(SuffixTree* pTree, SuffixNode* pNode);

/* Find the son node that starts with the char ch. */
SuffixNode* Find_Son(SuffixTree* pTree, SuffixNode* pFarNode, char ch);

bool IsTheLastCharInEdge(SuffixTree* pTree, SuffixNode* pNode, int edge_pos);

SuffixNode* ApplyExtensionRule2(SuffixNode* pNode, int edgeLabelBeg, int edgeLabelEnd, int edgePos, int pathPos, bool newLeafFlag);

/* Trace the substring (TreePath str) from the node (SuffixNode* pNode).
   Input: int* edgePos:    where the last char is found on that edge.
          int* charsFound: how many chars of str have been found.
          bool skipFlag:   use the skip trick or not. */
SuffixNode* TraceString(SuffixTree* pTree, SuffixNode* pNode, TreePath str, int* edgePos, int* charsFound, bool skipFlag);

/* Trace the substring (TreePath strPath) along one single edge out of pNode. */
SuffixNode* TraceSingleEdge(SuffixTree* pTree, SuffixNode* pNode, TreePath strPath, int* charsFound, int* edgePos, bool* searchDone, bool skipFlag);

SuffixNode* CreateFirstCharacter(SuffixTree* pTree);   // Add the first character to the suffix tree.
SuffixTree* CreateSuffixTree(string tStr);

/* For debugging: see if the substring (from the root to pPos) equals
   pTree->m_czTreeStr[subPath.m_iBegin .. subPath.m_iEnd]. */
bool TestPosSubStringEqualPath(SuffixTree* pTree, TreePos* pPos, TreePath subPath);

/* For debugging: we want to see if the suffix link from linkFrom to linkTo is right. */
bool TestSuffixLinkMatch(SuffixTree* pTree, SuffixNode* linkFrom, SuffixNode* linkTo);

int FindSubString(SuffixTree* pTree, string subStr);
//=====
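The "phase" and "extension" vocabulary in these declarations comes from Ukkonen's naive algorithm. SinglePhaseExtend adds s[0..i+1], s[1..i+1], ..., and the rule3Applied and iLeafNum parameters track which rule fired. The sketch below runs that naive loop on an uncompressed suffix trie and counts which extension rule fires at each step. The names TrieNode, RuleCounts, BuildImplicitTrie and Contains are hypothetical. The loop runs in roughly cubic time and is only an illustration, not the author's code.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical sketch: an uncompressed suffix trie built phase by phase.
struct TrieNode {
    std::map<char, std::unique_ptr<TrieNode>> next;
};

struct RuleCounts { int rule1 = 0, rule2 = 0, rule3 = 0; };

// Phase i appends s[i] to every suffix s[j..i-1], j = 0..i (extension j).
RuleCounts BuildImplicitTrie(TrieNode& root, const std::string& s) {
    RuleCounts rc;
    for (int i = 0; i < (int)s.size(); ++i) {              // phase i
        for (int j = 0; j <= i; ++j) {                      // extension j
            TrieNode* node = &root;
            for (int k = j; k < i; ++k)                     // walk down s[j..i-1]
                node = node->next.at(s[k]).get();
            if (node->next.count(s[i])) { ++rc.rule3; continue; }  // rule 3: already present
            if (node != &root && node->next.empty()) ++rc.rule1;   // rule 1: path ends at a leaf
            else ++rc.rule2;                                        // rule 2: branch off
            node->next[s[i]] = std::make_unique<TrieNode>();
        }
    }
    return rc;
}

// True if p spells a path from the root, i.e. p is a substring of s.
bool Contains(const TrieNode& root, const std::string& p) {
    const TrieNode* node = &root;
    for (char c : p) {
        auto it = node->next.find(c);
        if (it == node->next.end()) return false;
        node = it->second.get();
    }
    return true;
}
```

For "abab", the counts are 5, 2 and 3, which covers all 10 extensions. Each rule-3 hit comes at the end of its phase, which is why a phase may stop at the first rule 3. An extension that once reached a leaf keeps applying rule 1 in later phases.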
The implementation file, SuffixTree.cpp, follows:
#include "SuffixTree.h"
#include <string>
using namespace std;

SuffixNode* pNodeNoSuffixLink = NULL;

//===== Function Definitions =====

/* Trace the substring (TreePath strPath) along one single edge going out of pNode.
   Input: int* edgeCharsFound: how many characters we find matched on the outgoing edge of pNode. */
SuffixNode* TraceSingleEdge(SuffixTree* pTree, SuffixNode* pNode, TreePath strPath, int* edgeCharsFound, int* edgePos, bool* searchDone, bool skipFlag)
{
    // Find the outgoing edge of pNode that starts with our first character.
    SuffixNode* nextNode = Find_Son(pTree, pNode, pTree->m_czTreeStr[strPath.m_iBegin]);
    *searchDone = true;
    if (nextNode == NULL) {
        // There is no match among pNode's sons, so we can only return pNode.
        *edgePos = GetNodeLabelLength(pTree, pNode);
        *edgeCharsFound = 0;
        return pNode;
    }
    int edgeLen = GetNodeLabelLength(pTree, nextNode);
    int strLen = strPath.m_iEnd - strPath.m_iBegin + 1;
    if (skipFlag == true) {              // Use trick 1: skip/count
        if (edgeLen < strLen) {
            *searchDone = false;
            *edgeCharsFound = edgeLen;
            *edgePos = edgeLen - 1;
        } else if (edgeLen == strLen) {
            *edgeCharsFound = edgeLen;
            *edgePos = edgeLen - 1;
        } else {
            *edgeCharsFound = strLen;
            *edgePos = strLen - 1;
        }
        return nextNode;
    } else {                             // No skip: match each char one after another
        *edgePos = 0;
        *edgeCharsFound = 0;
        // Find out the min length.
        if (strLen < edgeLen)
            edgeLen = strLen;
        for (*edgeCharsFound = 1, *edgePos = 1; (*edgePos) < edgeLen; (*edgeCharsFound)++, (*edgePos)++) {
            if (pTree->m_czTreeStr[nextNode->m_iEdgeStart + *edgePos] != pTree->m_czTreeStr[strPath.m_iBegin + *edgePos]) {
                (*edgePos)--;
                return nextNode;
            }
        }
        // When it comes here, (*edgePos) is one more than the last matched position.
        (*edgePos)--;
        if (*edgeCharsFound < strLen) {
            *searchDone = false;
        }
        return nextNode;
    }
}

/* Trace the substring (TreePath str) from the node (SuffixNode* pNode).
   Input: int* edgePos:    for output, where the last char is found on that edge.
          int* charsFound: how many chars of str have been found.
          bool skipFlag:   use the skip trick or not. */
SuffixNode* TraceString(SuffixTree* pTree, SuffixNode* pNode, TreePath str, int* edgePos, int* charsFound, bool skipFlag)
{
    bool searchDone = false;
    *charsFound = 0;
    *edgePos = 0;
    int edgeCharsFound = 0;
    while (searchDone == false) {
        edgeCharsFound = 0;
        *edgePos = 0;
        pNode = TraceSingleEdge(pTree, pNode, str, &edgeCharsFound, edgePos, &searchDone, skipFlag);
        str.m_iBegin += edgeCharsFound;
        *charsFound += edgeCharsFound;
    }
    //if (*charsFound == 0)
    //    return NULL;
    return pNode;
}

/* Input:
   (1) pNode:        the node that is going to get a new son, or whose edge is going to be split.
   (2) edgeLabelBeg: when newLeafFlag == true, the edge begin label of the new leaf;
                     when newLeafFlag == false, the edge begin label of the new leaf
                     (the leaf of s[i+1], not s[i]).
   (3) edgeLabelEnd: like above, but the end.
   (4) int edgePos:  where the split is done on pNode's edge if newLeafFlag == false
                     (the 0th position, the 1st position, ...). */
SuffixNode* ApplyExtensionRule2(SuffixNode* pNode, int edgeLabelBeg, int edgeLabelEnd, int edgePos, int pathPos, bool newLeafFlag)
{
    if (newLeafFlag == true) {
        // Add a new leaf.
        SuffixNode* newLeaf = CreateTreeNode(pNode, edgeLabelBeg, edgeLabelEnd, pathPos);
        return newLeaf;
    } else {
        // Add a new internal node and a new leaf.
        // First create the new internal node.
        SuffixNode* nInternalNode = CreateTreeNode(pNode->...

...

        ... < subStr.length() && edgeIndex ... m_czTreeStr[edgeIndex] == subStr[strIndex]) {
            strIndex++;
            edgeIndex++;
        }
        if (strIndex == subStr.length()) {
            // We found it.
            return node->...
        } else if (edgeIndex > node->m_iEdgeEnd) {
            node = Find_Son(pTree, node, subStr[strIndex]);
        } else {
            return -1;
        }
    }
    return -1;
}

/* For debugging: we want to see if the suffix link from linkFrom to linkTo is right. */
bool TestSuffixLinkMatch(SuffixTree* pTree, SuffixNode* linkFrom, SuffixNode* linkTo)
{
    char ch2, ch3;
    // Climb until linkFrom is a child or a grandchild of the root.
    while (linkFrom->m_pFarther != pTree->m_pRoot && linkFrom->m_pFarther->m_pFarther != pTree->m_pRoot) {
        linkFrom = linkFrom->m_pFarther;
    }
    // ch2 = the second char of linkFrom's path.
    if (linkFrom->m_pFarther == pTree->m_pRoot) {
        if (linkFrom->m_iEdgeEnd == linkFrom->m_iEdgeStart) {
            // linkFrom's path is a single char, so its suffix link must point to the root.
            if (linkTo != pTree->m_pRoot) {
                return false;
            } else {
                return true;
            }
        } else {
            ch2 = pTree->m_czTreeStr[linkFrom->m_iEdgeStart + 1];
        }
    } else {
        if (linkFrom->m_pFarther->m_iEdgeStart == linkFrom->m_pFarther->m_iEdgeEnd) {
            ch2 = pTree->m_czTreeStr[linkFrom->m_iEdgeStart];
        } else {
            ch2 = pTree->m_czTreeStr[linkFrom->m_pFarther->m_iEdgeStart + 1];
        }
    }
    // ch3 = the first char of linkTo's path.
    while (linkTo->m_pFarther != pTree->m_pRoot) {
        linkTo = linkTo->m_pFarther;
    }
    ch3 = pTree->m_czTreeStr[linkTo->m_iEdgeStart];
    if (ch2 == ch3)
        return true;
    else
        return false;
}
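TestSuffixLinkMatch compares only one character: the second char of linkFrom's path against the first char of linkTo's path. A stricter debug check could rebuild both path labels and require that dropping the first character of linkFrom's label gives exactly linkTo's label. The sketch below assumes a minimal node type with an inclusive edge label and a parent pointer, mirroring m_iEdgeStart, m_iEdgeEnd and m_pFarther. The names LinkNode, PathLabel and SuffixLinkIsValid are hypothetical.

```cpp
#include <string>

// Hypothetical minimal node: inclusive edge label str[edgeStart..edgeEnd]
// and a parent pointer (nullptr for the root).
struct LinkNode {
    int edgeStart, edgeEnd;
    const LinkNode* father;
};

// Concatenate the edge labels from the root down to node.
std::string PathLabel(const std::string& str, const LinkNode* node) {
    std::string label;
    for (; node != nullptr && node->father != nullptr; node = node->father)
        label.insert(0, str.substr(node->edgeStart, node->edgeEnd - node->edgeStart + 1));
    return label;
}

// A suffix link from the node for x + alpha must reach the node for alpha.
bool SuffixLinkIsValid(const std::string& str, const LinkNode* linkFrom, const LinkNode* linkTo) {
    std::string from = PathLabel(str, linkFrom);
    return !from.empty() && from.substr(1) == PathLabel(str, linkTo);
}
```

For "xabxa", the internal node "xa" (edge [0, 1] under the root) must link to the node "a" (edge [1, 1]). The node "a" in turn links to the root. This check costs time proportional to the path lengths, so it belongs in debug builds only.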
Finally, here is the main function that calls it. It simply checks whether a substring occurs in the string: if subStr occurs in str, it prints the starting position; otherwise it reports that the substring does not exist.
#include "SuffixTree.h"
#include <iostream>
#include <string>
using namespace std;

int main()
{
    string str = "avnaoihvnjovsdvaeveavsfvwevfwa vafjoajv aoja aojfoaj afowajop afoajv afjoajo csvweavefvs fjwojvwoajv haovpjvowj ajovwjavo aojvowajv ajvoajv vnaojv vfdvdfvavaewvavaisadnvioenvoehnvPI"
                 "avnaoihvnjovsdvaeveavsfvwevfwa vafjoajv aoja aojfoaj afowajop afoajv afjoajo csvweavefvs fjwojvwoajv haovpjvowj ajovwjavo aojvowajv ajvoajv vnaojv vfdvdfvavaewvavaisadnvioenvoehnvPI";
    string subStr = "vafjoajv aoja aojfoaj afowajop afoajv afjoajo csvweavefvs fjwojvwoajv haov";
    SuffixTree* pTree = CreateSuffixTree(str);
    int existFlag = FindSubString(pTree, subStr);
    if (existFlag >= 0)
        cout << existFlag << endl;               // starting position of subStr in str
    else
        cout << "does not exist" << endl;
    return 0;
}
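The test string repeats the same text twice, so subStr occurs more than once. The position FindSubString prints therefore need not be the first occurrence that std::string::find reports. One way to sanity-check the output is to compare existence with std::string::find and confirm that any reported position really is an occurrence. A minimal sketch, with the hypothetical helper names IsOccurrence and ReferenceFind:

```cpp
#include <string>

// Hypothetical helper: does subStr really start at position pos of str?
bool IsOccurrence(const std::string& str, const std::string& subStr, int pos) {
    if (pos < 0 || (size_t)pos + subStr.size() > str.size()) return false;
    return str.compare(pos, subStr.size(), subStr) == 0;
}

// Reference answer from the standard library: first occurrence, or -1.
int ReferenceFind(const std::string& str, const std::string& subStr) {
    size_t p = str.find(subStr);
    return p == std::string::npos ? -1 : (int)p;
}
```

In main, one could assert that `existFlag >= 0` agrees with `ReferenceFind(str, subStr) >= 0`. When a position is reported, `IsOccurrence(str, subStr, existFlag)` confirms it. Comparing the exact positions would wrongly fail whenever the tree reports a later occurrence.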