How to Understand the PHPAnalysis Chinese Word Segmentation Class

Updated: 2025-04-12. From: SLTechnology News&Howtos (shulou)

Shulou (Shulou.com), 06/03

This article explains the PHPAnalysis Chinese word segmentation class: its more important member variables, its main public methods, and a basic usage flow.

PHPAnalysis is a widely used Chinese word segmentation class. It segments text with a reverse matching approach and is compatible with a relatively wide range of character encodings. Its member variables and common functions are described below.
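To make "reverse matching" concrete, the sketch below scans a string from its end and, at each position, takes the longest dictionary word that ends there. It illustrates the general technique (reverse maximum matching) on a toy dictionary and is not PHPAnalysis's actual code; the function name reverseMaxMatch, the dictionary, and the maximum word length are assumptions made for this example. It needs the mbstring extension.

```php
<?php
// Toy reverse maximum matching; NOT the PHPAnalysis implementation.
// $dict maps known words to true; $maxLen is the longest word (in characters) to try.
function reverseMaxMatch(string $text, array $dict, int $maxLen = 4): array
{
    $words = [];
    $end = mb_strlen($text, 'UTF-8');
    while ($end > 0) {
        $matched = false;
        // Try the longest candidate ending at $end first, then shorter ones.
        for ($len = min($maxLen, $end); $len > 1; $len--) {
            $candidate = mb_substr($text, $end - $len, $len, 'UTF-8');
            if (isset($dict[$candidate])) {
                array_unshift($words, $candidate);
                $end -= $len;
                $matched = true;
                break;
            }
        }
        if (!$matched) {
            // No dictionary word ends here: emit a single character.
            array_unshift($words, mb_substr($text, $end - 1, 1, 'UTF-8'));
            $end--;
        }
    }
    return $words;
}

// Made-up dictionary for the example.
$dict = ['研究' => true, '研究生' => true, '生命' => true, '起源' => true];
print_r(reverseMaxMatch('研究生命起源', $dict)); // 研究 / 生命 / 起源
```

Scanning from the end picks 起源, then 生命, then 研究, whereas a forward longest match would first consume 研究生 and leave 命 stranded. This is the usual motivation for matching in reverse.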

I. Important member variables

$resultType = 1: type of data included in the segmentation result (1 = everything; 2 = dictionary words, single Chinese/Japanese/Korean characters, and English; 3 = dictionary words and English)

This variable is usually set through the SetResultType($rstype) method.

$notSplitLen = 5: minimum length of a sentence to be split

$toLower = false: whether to convert all English words to lowercase

$differMax = false: whether to use maximum segmentation mode to disambiguate two-character words

$unitWord = true: whether to try to merge words (that is, new word recognition)

$differFreq = false: whether to use high-frequency-word-first mode to disambiguate

II. Main member functions

1. public function __construct($source_charset='utf-8', $target_charset='utf-8', $load_all=true, $source='')

Function description: constructor

Parameter list:

$source_charset: encoding of the source string

$target_charset: encoding of the target (output) string

$load_all: whether to load the whole dictionary (this parameter is no longer used)

$source: source string

If both input and output are UTF-8, you can construct the object without any arguments and then set the text to process with the SetSource method.

2. public function SetSource($source, $source_charset='utf-8', $target_charset='utf-8')

Function description: set the source string

Parameter list:

$source: source string

$source_charset: encoding of the source string

$target_charset: encoding of the target (output) string

Return value: bool
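A minimal sketch of supplying text through SetSource and checking its boolean return value. It assumes the class lives in a file named phpanalysis.class.php next to the script; that file name, the 'gbk' charset label, and the helper loadText are assumptions for illustration, not something the description above confirms.

```php
<?php
// Hypothetical helper: pass text to an analyzer and fail loudly if SetSource() returns false.
function loadText($pa, string $text, string $sourceCharset = 'utf-8', string $targetCharset = 'utf-8'): void
{
    if (!$pa->SetSource($text, $sourceCharset, $targetCharset)) {
        throw new RuntimeException('SetSource() rejected the input');
    }
}

// Assumed file name; adjust to wherever the class is installed.
$classFile = __DIR__ . '/phpanalysis.class.php';
if (is_file($classFile)) {
    require_once $classFile;

    // UTF-8 in and out: construct with no arguments, then set the text.
    $pa = new PhpAnalysis();
    loadText($pa, '需要分词的字符串');

    // Non-UTF-8 input: pass both encodings explicitly ('gbk' is an assumed label).
    // $gbkText would come from a GBK-encoded file or database column.
    // loadText($pa, $gbkText, 'gbk', 'utf-8');
}
```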

3. public function StartAnalysis($optimize=true)

Function description: run the word segmentation

Parameter list:

$optimize: whether to try to optimize the result after segmentation

Return value: void

A basic word segmentation flow:

// assumes the file that defines PhpAnalysis has already been included
$pa = new PhpAnalysis();
$pa->SetSource('string to be segmented');

// set the segmentation attributes
$pa->resultType = 2;
$pa->differMax = true;

// run the segmentation
$pa->StartAnalysis();

// get the results you want
$index = $pa->GetFinallyIndex();

4. public function SetResultType($rstype)

Function description: set the type of result returned

This simply sets the member variable $resultType.

The parameter $rstype takes one of these values:

1 = everything; 2 = dictionary words, single Chinese/Japanese/Korean characters, and English; 3 = dictionary words and English

Return value: void

5. public function GetFinallyKeywords($num = 10)

Function description: get the specified number of entries with the highest frequency (usually used to extract document keywords)

Parameter list:

$num = 10: number of entries to return

Return value: a list of keywords separated by ","
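Because GetFinallyKeywords returns a single comma-separated string, callers usually want to split it. The sketch below does that; the helper topKeywords and the file name phpanalysis.class.php are hypothetical, and it assumes, following the basic flow shown earlier, that StartAnalysis must run before results are read.

```php
<?php
// Hypothetical helper: return the top keywords of $text as an array.
function topKeywords($pa, string $text, int $num = 10): array
{
    $pa->SetSource($text);
    $pa->StartAnalysis();
    $list = $pa->GetFinallyKeywords($num); // "word1,word2,..."
    // Split on "," and drop empty entries (for example when nothing was found).
    return array_values(array_filter(array_map('trim', explode(',', $list)), 'strlen'));
}

// Assumed file name; adjust to wherever the class is installed.
$classFile = __DIR__ . '/phpanalysis.class.php';
if (is_file($classFile)) {
    require_once $classFile;
    print_r(topKeywords(new PhpAnalysis(), '需要提取关键词的文章内容', 5));
}
```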

6. public function GetFinallyResult($spword='')

Function description: get the final word segmentation result

Parameter list:

$spword: separator placed between entries

Return value: string
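A short sketch of producing a delimited string, for example to feed a full-text index. The helper segmentToString is hypothetical. It passes the separator explicitly because the behavior of the default empty $spword is not described above, and it trims the result on the assumption that the string may carry leading or trailing whitespace.

```php
<?php
// Hypothetical helper: segment $text and join the words with $separator.
function segmentToString($pa, string $text, string $separator = ' '): string
{
    $pa->SetSource($text);
    $pa->StartAnalysis(true); // true = try to optimize the result
    return trim($pa->GetFinallyResult($separator));
}

// Assumed file name; adjust to wherever the class is installed.
$classFile = __DIR__ . '/phpanalysis.class.php';
if (is_file($classFile)) {
    require_once $classFile;
    echo segmentToString(new PhpAnalysis(), '需要分词的字符串'), PHP_EOL;
}
```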

7. public function GetSimpleResult()

Function description: get the coarse segmentation result

Return value: array

8. public function GetSimpleResultAll()

Function description: get the coarse segmentation result together with attribute information

Attributes: 1 = Chinese words and sentences; 2 = ANSI words (including full-width); 3 = ANSI punctuation marks (including full-width); 4 = numbers (including full-width); 5 = Chinese punctuation or unrecognized characters

Return value: array

9. public function GetFinallyIndex()

Function description: get a hash-indexed array of the words

Return value: array('word' => count, ...), sorted by frequency of occurrence
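The array returned by GetFinallyIndex is easy to post-process with ordinary array functions. The sketch below keeps words above a minimum count and returns the top few; the helper frequentWords and the sample data are made up for illustration and stand in for a real call result.

```php
<?php
// Hypothetical post-processing of an index shaped like array('word' => count, ...).
function frequentWords(array $index, int $minCount = 2, int $limit = 5): array
{
    // Keep words that occur at least $minCount times.
    $kept = array_filter($index, function ($count) use ($minCount) {
        return $count >= $minCount;
    });
    arsort($kept);                              // re-sort defensively, highest count first
    return array_slice($kept, 0, $limit, true); // true = preserve the word keys
}

// Sample data standing in for $pa->GetFinallyIndex(); the counts are made up.
$index = ['分词' => 4, '词典' => 3, '编码' => 1, '字符串' => 2];
print_r(frequentWords($index, 2, 2));
```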

10. public function MakeDict($source_file, $target_file='')

Function description: compile a text-file word list into a dictionary

Parameter list:

$source_file: source text file

$target_file: target file (the current dictionary if not specified)

Return value: void

11. public function ExportDict($targetfile)

Function description: export all entries in the current dictionary to a text file

Parameter list:

$targetfile: target file

Return value: void
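ExportDict and MakeDict can be combined into a round trip: export the current dictionary to text, edit it, and compile the edited text. The sketch below is an assumption-laden outline. The helper rebuildDictionary, the file names, and the edit callback are hypothetical, the line format of the exported text file is not described above (inspect an export before editing), and it presumes the analyzer already has its dictionary loaded.

```php
<?php
// Hypothetical helper: dump the current dictionary to text, let a callback edit it,
// then compile the edited text into a separate dictionary file.
function rebuildDictionary($pa, string $textFile, string $compiledFile, callable $edit): void
{
    $pa->ExportDict($textFile);              // current dictionary -> text file
    $edit($textFile);                        // caller adds or removes entries in place
    $pa->MakeDict($textFile, $compiledFile); // text file -> compiled dictionary
}

// Assumed file name; adjust to wherever the class is installed.
$classFile = __DIR__ . '/phpanalysis.class.php';
if (is_file($classFile)) {
    require_once $classFile;
    $tmp = sys_get_temp_dir();
    rebuildDictionary(new PhpAnalysis(), "$tmp/words.txt", "$tmp/custom.dic", function (string $file): void {
        // Edit $file here once the export format has been checked.
    });
}
```

Passing an explicit target file keeps the experiment away from the dictionary in use, since MakeDict writes to the current dictionary when no target is given.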

This concludes the overview of the PHPAnalysis Chinese word segmentation class.
