
How to Use HanLP Chinese Word Segmentation in Java

Updated 2025-02-27. From: SLTechnology News&Howtos (Shulou.com)


This article shows how to use HanLP for Chinese word segmentation in Java. It covers the project structure, the configuration file, and a test program.

Project structure

The project needs three items, all of which can be downloaded from the official HanLP website or GitHub: the HanLP .jar, the data folder, and the hanlp.properties configuration file.

Project configuration

Modify hanlp.properties:

# /Test/src/hanlp.properties:
# Root directory for the paths in this configuration file
# root=E:/SourceCode/eclipsePlace/Test
root=./
...

The configuration file tells HanLP where the data folder is located. The value of root is the parent directory of the data folder, and it can be either an absolute path or a relative path.
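To make the relationship between root and the data folder concrete, the following sketch computes the folder that a given root value points to and checks whether it exists. It uses only the Java standard library. The class and method names (HanLPRootCheck, resolveDataDir) are hypothetical, and resolving a relative root against the current working directory is an assumption made for illustration; the article does not state how HanLP itself resolves relative paths.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class HanLPRootCheck {

    // Returns the data folder implied by the "root" property.
    // Assumption: a relative root is resolved against workingDir;
    // an absolute root is used as-is (Path.resolve returns it unchanged).
    public static Path resolveDataDir(Properties props, Path workingDir) {
        String root = props.getProperty("root", "./").trim();
        return workingDir.resolve(root).resolve("data").normalize();
    }

    public static void main(String[] args) throws IOException {
        // For brevity the property is read from a string here; in a real
        // project it would be loaded from hanlp.properties.
        Properties props = new Properties();
        props.load(new StringReader("root=./\n"));

        Path dataDir = resolveDataDir(props, Paths.get("").toAbsolutePath());
        System.out.println("Expected data folder: " + dataDir);
        System.out.println("Exists: " + Files.isDirectory(dataDir));
    }
}
```

If the second line prints false, root most likely does not point to the parent directory of the data folder.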

Test code

package com.test;

import java.util.List;

import com.hankcs.hanlp.HanLP;
import com.hankcs.hanlp.seg.common.Term;
import com.hankcs.hanlp.suggest.Suggester;
import com.hankcs.hanlp.tokenizer.NLPTokenizer;

public class MainTest {

    public static void main(String[] args) {
        System.out.println("On the first run, HanLP automatically builds the dictionary cache. Please wait.\n");
        // The first run may report a file-not-found error. It does not affect the run
        // and no longer appears once the cache has been built.
        // The sample strings are shown in English here; pass Chinese text to get
        // meaningful segmentation results.

        System.out.println("Standard segmentation:");
        System.out.println(HanLP.segment("Hello, welcome to HanLP!"));
        System.out.println("\n");

        List<Term> termList = NLPTokenizer.segment("Professor Zong Chengqing of the Institute of Computing Technology, Chinese Academy of Sciences is teaching natural language processing courses");
        System.out.println("NLP segmentation:");
        System.out.println(termList);
        System.out.println("\n");

        System.out.println("Smart recommendation:");
        getSegement();
        System.out.println("\n");

        System.out.println("Keyword extraction:");
        getMainIdea();
        System.out.println("\n");

        System.out.println("Automatic summary:");
        getZhaiYao();
        System.out.println("\n");

        System.out.println("Phrase extraction:");
        getDuanYu();
        System.out.println("\n");
    }

    /**
     * Smart recommendation
     */
    public static void getSegement() {
        Suggester suggester = new Suggester();
        // One title per line; each line becomes a candidate sentence.
        String[] titleArray = ("Prince William gives a speech calling for wildlife protection\n"
                + "TIME Person of the Year finalist list announced; Putin and Jack Ma selected\n"
                + "\"Hagupit\" sweeps the Philippines: the Philippines learned from \"Haiyan\" and evacuated early\n"
                + "Japan's secrecy law to take effect; Japanese media say it harms citizens' right to know\n"
                + "British report says air pollution brings a \"public health crisis\"").split("\n");
        for (String title : titleArray) {
            suggester.addSentence(title);
        }
        System.out.println(suggester.suggest("speech", 1));        // semantic match
        System.out.println(suggester.suggest("crisis Public", 1)); // character-level match
        System.out.println(suggester.suggest("mayun", 1));         // pinyin match
    }

    /**
     * Keyword extraction
     */
    public static void getMainIdea() {
        String content = "Programmers (English: Programmer) are professionals engaged in program development and maintenance. Programmers are generally divided into program designers and program coders, but the distinction between them is not very clear, especially in China. Software practitioners are divided into four categories: junior programmers, senior programmers, system analysts and project managers.";
        // Extract the 5 most important keywords.
        List<String> keywordList = HanLP.extractKeyword(content, 5);
        System.out.println(keywordList);
    }

    /**
     * Automatic summary
     */
    public static void getZhaiYao() {
        String document = "Algorithms can be roughly divided into basic algorithms, data structure algorithms, number theory algorithms, computational geometry algorithms, graph algorithms, dynamic programming and numerical analysis, encryption algorithms, sorting algorithms, retrieval algorithms, randomization algorithms, parallel algorithms, Hermitian deformation models, random forest algorithms.\n"
                + "Algorithms can be broadly divided into three categories,\n"
                + "one, finite deterministic algorithms, which terminate within a limited period of time. They may take a long time to perform a specified task, but will still terminate within a certain period of time. The results of such algorithms often depend on input values.\n"
                + "two, finite nondeterministic algorithms, which terminate in a limited time. However, for a given value (or some), the result of the algorithm is not unique or definite.\n"
                + "three, infinite algorithms, which do not stop running because no termination condition is defined, or the defined condition cannot be satisfied by the input data. In general, infinite algorithms arise because the termination condition is not defined.";
        // Extract the 3 most representative sentences.
        List<String> sentenceList = HanLP.extractSummary(document, 3);
        System.out.println(sentenceList);
    }

    /**
     * Phrase extraction
     */
    public static void getDuanYu() {
        String text = "Algorithm engineer\n"
                + "An algorithm (Algorithm) is a series of clear instructions for solving a problem, that is, the required output can be obtained for a certain standard input in a limited time. If an algorithm is flawed or is not suitable for a problem, executing the algorithm will not solve the problem. Different algorithms may use different time, space or efficiency to accomplish the same task. The advantages and disadvantages of an algorithm can be measured by space complexity and time complexity. An algorithm engineer is a person who uses algorithms to deal with things.\n"
                + "\n" + "1 Position profile\n" + "Algorithm engineer is a very high-end position;\n"
                + "Professional requirements: computer, electronics, communications, mathematics and other related majors;\n"
                + "Academic requirements: bachelor's degree or above, mostly master's degree or above;\n"
                + "Language requirements: proficient English, basically able to read foreign professional books and periodicals;\n"
                + "Must master computer related knowledge, be proficient in using simulation tools such as MATLAB, and must know a programming language.\n"
                + "\n" + "2 Research direction\n"
                + "Video algorithm engineer, image processing algorithm engineer, audio algorithm engineer, communication baseband algorithm engineer\n"
                + "\n" + "3 Current situation at home and abroad\n"
                + "At present, there are many engineers engaged in algorithm research in China, but senior algorithm engineers are few and very scarce. According to the research field, algorithm engineers are mainly divided into audio / video algorithm processing, two-dimensional information algorithm processing in image technology and one-dimensional information algorithm processing in communication physical layer, radar signal processing, biomedical signal processing and other fields.\n"
                + "In two-dimensional information processing, such as computer audio and video, graphics and image technology, the relatively advanced video processing algorithms at present are: machine vision has become the core of this kind of algorithm research. In addition, there are 2D to 3D algorithm (2D-to-3D conversion), de-interlacing algorithm (de-interlacing), motion estimation and motion compensation algorithm (Motion estimation/Motion Compensation), denoising algorithm (Noise Reduction), scaling algorithm (scaling), sharpening algorithm (Sharpness), super-resolution algorithm (Super Resolution), gesture recognition (gesture recognition), face recognition (face recognition).\n"
                + "Algorithms commonly used in one-dimensional information fields such as the communication physical layer: RRM and RTT in the wireless field, modulation and demodulation in the transmission field, channel equalization, signal detection, network optimization, signal decomposition, etc.\n"
                + "In addition, data mining and Internet search algorithms have also become popular directions.\n"
                + "Algorithm engineers are gradually developing in the direction of artificial intelligence.";
        // Extract the 10 most salient phrases.
        List<String> phraseList = HanLP.extractPhrase(text, 10);
        System.out.println(phraseList);
    }

}
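The test program prints each result list with a single println call, which produces one bracketed line per list. For longer results such as the phrase list, printing one item per line can be easier to read. The following is a minimal sketch using only the standard library; the class and method names (ResultPrinter, formatNumbered) are hypothetical and not part of HanLP, and the sample values in main are placeholders rather than actual HanLP output.

```java
import java.util.Arrays;
import java.util.List;

public class ResultPrinter {

    // Formats any list as numbered lines: "1. first\n2. second\n" and so on.
    public static String formatNumbered(List<?> items) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < items.size(); i++) {
            sb.append(i + 1).append(". ").append(items.get(i)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder values standing in for a list returned by HanLP.
        List<String> phrases = Arrays.asList("algorithm engineer", "signal processing");
        System.out.print(formatNumbered(phrases));
    }
}
```

Because the helper accepts List<?>, it could be passed the keyword, summary, or phrase lists from the test program without changes.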

Running result

That concludes this walkthrough of using HanLP Chinese word segmentation in Java.
