Source: Shulou (Shulou.com), SLTechnology News&Howtos. Updated 2025-04-12 (report dated 06/03).
This article describes the PHPAnalysis Chinese word segmentation class: its most important member variables, its main public methods, and a basic segmentation workflow.
PHPAnalysis is a widely used Chinese word segmentation class. It segments text using a reverse matching approach and is compatible with a fairly wide range of character encodings. Its member variables and commonly used functions are described below:
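To make "reverse matching" concrete, here is a minimal sketch of backward maximum matching in PHP. It is a toy illustration of the general technique, not PHPAnalysis's actual implementation; the function name reverseMaxMatch, the tiny dictionary, and the $maxLen limit are all assumptions made for this example.

```php
<?php
// Toy backward (reverse) maximum matching. NOT PHPAnalysis's own code.
// $dict maps known words to true; $maxLen is the longest word length to try.
function reverseMaxMatch(string $text, array $dict, int $maxLen = 4): array
{
    // Split the UTF-8 string into individual characters.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $words = [];
    $end = count($chars);

    while ($end > 0) {
        $matched = false;
        // Try the longest candidate that ends at position $end first.
        for ($len = min($maxLen, $end); $len > 1; $len--) {
            $candidate = implode('', array_slice($chars, $end - $len, $len));
            if (isset($dict[$candidate])) {
                array_unshift($words, $candidate);
                $end -= $len;
                $matched = true;
                break;
            }
        }
        if (!$matched) {
            // No dictionary word ends here: emit a single character.
            array_unshift($words, $chars[$end - 1]);
            $end--;
        }
    }
    return $words;
}

// Hypothetical four-word dictionary for the demonstration.
$dict = ['研究' => true, '研究生' => true, '生命' => true, '起源' => true];
echo implode('/', reverseMaxMatch('研究生命起源', $dict)), "\n"; // 研究/生命/起源
```

Scanning from the end of the string yields 研究/生命/起源 here, whereas a forward scan over the same dictionary would take 研究生 first and leave 命 stranded.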
I. Key member variables
$resultType = 1: type of word segmentation result to generate (1 for everything; 2 for dictionary words, single Chinese, Japanese, and Korean characters, and English; 3 for dictionary words and English)
This variable is typically set with the SetResultType($rstype) method.
$notSplitLen = 5: minimum length of a sentence to be segmented
$toLower = false: convert all English words to lowercase
$differMax = false: use maximum segmentation mode to disambiguate two-character words
$unitWord = true: try to merge words (that is, new-word recognition)
$differFreq = false: use high-frequency-word priority mode to disambiguate
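The settings above are public member variables, so they can be assigned directly on the object. The sketch below wraps that in a small helper that rejects unknown names and out-of-range $resultType values. applyAnalyzerOptions() is a hypothetical helper written for this article, not part of PHPAnalysis; it only assumes the member variable names listed above.

```php
<?php
// Hypothetical helper: copy settings onto an analyzer object's public
// member variables. Only the names documented above are accepted.
function applyAnalyzerOptions(object $pa, array $options): void
{
    $known = ['resultType', 'notSplitLen', 'toLower', 'differMax', 'unitWord', 'differFreq'];
    foreach ($options as $name => $value) {
        if (!in_array($name, $known, true)) {
            throw new InvalidArgumentException("Unknown option: $name");
        }
        // $resultType is documented as 1, 2, or 3.
        if ($name === 'resultType' && !in_array($value, [1, 2, 3], true)) {
            throw new InvalidArgumentException('resultType must be 1, 2, or 3');
        }
        $pa->$name = $value;
    }
}

// Usage, assuming $pa is a PhpAnalysis instance created elsewhere:
// applyAnalyzerOptions($pa, ['resultType' => 2, 'differMax' => true, 'toLower' => true]);
```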
II. Main member functions
1. public function __construct($source_charset='utf-8', $target_charset='utf-8', $load_all=true, $source='')
Function description: constructor
Parameter list:
$source_charset: source string encoding
$target_charset: target string encoding
$load_all: whether to load the whole dictionary (this parameter no longer has any effect)
$source: source string
If both the input and the output are UTF-8, you can construct the object without any arguments and set the text to process with the SetSource method instead.
2. public function SetSource($source, $source_charset='utf-8', $target_charset='utf-8')
Function description: sets the source string
Parameter list:
$source: source string
$source_charset: source string encoding
$target_charset: target string encoding
Return value: bool
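Because SetSource returns a bool, a caller can check it before starting the analysis. The following is a minimal sketch of that check. loadSource() is a hypothetical wrapper, and 'gbk' in the usage comment is assumed to be an encoding name the library accepts; the source only shows 'utf-8'.

```php
<?php
// Hypothetical wrapper: pass text in a given source encoding, request
// UTF-8 output, and fail loudly if SetSource() reports a problem.
function loadSource(object $pa, string $text, string $sourceCharset = 'utf-8'): void
{
    $ok = $pa->SetSource($text, $sourceCharset, 'utf-8');
    if (!$ok) {
        throw new RuntimeException("SetSource() returned false for charset $sourceCharset");
    }
}

// Usage, assuming $pa is a PhpAnalysis instance and $gbkText holds GBK bytes:
// loadSource($pa, $gbkText, 'gbk');
```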
3. public function StartAnalysis($optimize=true)
Function description: performs the word segmentation
Parameter list:
$optimize: whether to try to optimize the result after segmentation
Return value: void
A basic word segmentation workflow:
//
$pa = new PhpAnalysis();
$pa->SetSource('string to be segmented');
// set the segmentation attributes
$pa->resultType = 2;
$pa->differMax = true;
$pa->StartAnalysis();
// get the results you want
$index = $pa->GetFinallyIndex();
//
4. public function SetResultType($rstype)
Function description: sets the type of result returned
This is simply an assignment to the member variable $resultType.
The parameter $rstype takes one of these values:
1 for everything; 2 for dictionary words, single Chinese, Japanese, and Korean characters, and English; 3 for dictionary words and English
Return value: void
5. public function GetFinallyKeywords($num = 10)
Function description: gets the specified number of entries with the highest frequency (usually used to extract document keywords)
Parameter list:
$num = 10: number of entries to return
Return value: a list of keywords separated by ","
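Since the keywords come back as one comma-separated string, callers usually split it into an array. Here is a minimal sketch of that step, combined with the SetSource and StartAnalysis calls described earlier. extractKeywords() is a hypothetical helper; trimming and dropping empty pieces are defensive choices, not documented behavior of the library.

```php
<?php
// Hypothetical helper: segment $text and return up to $num keywords
// as an array instead of a comma-separated string.
function extractKeywords(object $pa, string $text, int $num = 10): array
{
    $pa->SetSource($text);
    $pa->StartAnalysis(true);
    $list = $pa->GetFinallyKeywords($num);

    // Split on ",", trim whitespace, and discard empty pieces.
    $words = array_map('trim', explode(',', $list));
    return array_values(array_filter($words, fn($w) => $w !== ''));
}

// Usage, assuming $pa is a PhpAnalysis instance created elsewhere:
// $keywords = extractKeywords($pa, $articleBody, 5);
```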
6. public function GetFinallyResult($spword='')
Function description: gets the final word segmentation result
Parameter list:
$spword: separator placed between entries
Return value: string
7. public function GetSimpleResult()
Function description: gets the coarse segmentation result
Return value: array
8. public function GetSimpleResultAll()
Function description: gets the coarse segmentation result, including attribute information
Attributes: 1 for Chinese words and sentences, 2 for ANSI words (including full-width), 3 for ANSI punctuation marks (including full-width), 4 for numbers (including full-width), 5 for Chinese punctuation or unrecognized characters
Return value: array
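The attribute codes are easier to work with as a lookup table. The sketch below encodes the five values listed above. The constant and function names are hypothetical, and the sketch deliberately makes no assumption about how the array returned by GetSimpleResultAll is laid out, since the source does not specify it.

```php
<?php
// Attribute codes as listed above. ROUGH_ATTRIBUTES and
// describeRoughAttribute() are hypothetical names for this example.
const ROUGH_ATTRIBUTES = [
    1 => 'Chinese words and sentences',
    2 => 'ANSI words (including full-width)',
    3 => 'ANSI punctuation marks (including full-width)',
    4 => 'numbers (including full-width)',
    5 => 'Chinese punctuation or unrecognized characters',
];

// Return a readable label for an attribute code.
function describeRoughAttribute(int $code): string
{
    return ROUGH_ATTRIBUTES[$code] ?? 'unknown attribute';
}

echo describeRoughAttribute(4), "\n"; // numbers (including full-width)
```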
9. public function GetFinallyIndex()
Function description: gets the result as an associative (hash) array of words
Return value: array('word' => count, ...), sorted by frequency of occurrence
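An array of this shape can be processed with ordinary PHP array functions. Below is a minimal sketch that keeps the n most frequent words. topWords() is a hypothetical helper and the sample data is made up for illustration; the arsort call is only a defensive re-sort, since the result is described as already sorted by frequency.

```php
<?php
// Hypothetical helper: keep the $n most frequent entries of a
// 'word' => count array such as the one GetFinallyIndex() returns.
function topWords(array $index, int $n): array
{
    arsort($index);                          // highest count first, keys preserved
    return array_slice($index, 0, $n, true); // true keeps numeric-looking words as keys
}

// Made-up sample data standing in for a real GetFinallyIndex() result.
$sample = ['分词' => 3, 'PHP' => 5, '词典' => 1];
foreach (topWords($sample, 2) as $word => $count) {
    echo "$word: $count\n";
}
// prints "PHP: 5" then "分词: 3"
```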
10. public function MakeDict($source_file, $target_file='')
Function description: compiles a word list stored in a text file into a dictionary
Parameter list:
$source_file: source text file
$target_file: target file (the current dictionary if not specified)
Return value: void
11. public function ExportDict($targetfile)
Function description: exports all entries in the current dictionary to a text file
Parameter list:
$targetfile: target file
Return value: void