pyhanlp: Example of Adding a User-Defined Dictionary

Updated 2025-01-28, from SLTechnology News&Howtos (shulou)

pyhanlp is a Python wrapper for HanLP. Project address: https://github.com/hankcs/pyhanlp

In the author's testing, HanLP performed better than NLTK at Chinese word segmentation and entity recognition.

How do you add a custom dictionary to pyhanlp? The steps below use Python 2.7.9 as an example:

1. Install pyhanlp: pip install pyhanlp

2. Add a custom dictionary under the dictionary path. The main CustomDictionary text file is data/dictionary/custom/CustomDictionary.txt. You can add your own words to it directly (not recommended), or create a new text file and append it through the configuration file, for example CustomDictionaryPath=data/dictionary/custom/CustomDictionary.txt; my_dictionary.txt; (recommended).

The absolute paths can be obtained from hanlp --version:

# hanlp --version

jar 1.6.3: /usr/local/lib/python2.7/site-packages/pyhanlp/static/hanlp-1.6.3.jar

data 1.6.2: /usr/local/lib/python2.7/site-packages/pyhanlp/static/data

config: /usr/local/lib/python2.7/site-packages/pyhanlp/static/hanlp.properties

# cat /usr/local/lib/python2.7/site-packages/pyhanlp/static/hanlp.properties | grep "CustomDictionaryPath"
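The lookup and the append described in step 2 could also be scripted. The following is a minimal sketch, assuming Python 3 and assuming that the CustomDictionaryPath value is a semicolon-separated list as in the example line from step 2. The function names and the file name my_dictionary.txt are hypothetical, and the code only transforms text; writing the result back to hanlp.properties is left to the reader.

```python
def split_custom_dictionary_path(properties_text):
    """Return the entries listed under CustomDictionaryPath (empty list if absent)."""
    for line in properties_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "CustomDictionaryPath":
            return [p.strip() for p in value.split(";") if p.strip()]
    return []


def append_dictionary(properties_text, dictionary_file):
    """Return the properties text with dictionary_file appended to CustomDictionaryPath."""
    out = []
    for line in properties_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "CustomDictionaryPath":
            entries = [p.strip() for p in value.split(";") if p.strip()]
            if dictionary_file not in entries:  # avoid adding the same file twice
                entries.append(dictionary_file)
            line = "CustomDictionaryPath=" + "; ".join(entries) + ";"
        out.append(line)
    return "\n".join(out) + "\n"


# Sample line taken from step 2; my_dictionary.txt is a hypothetical file name.
sample = "CustomDictionaryPath=data/dictionary/custom/CustomDictionary.txt;\n"
updated = append_dictionary(sample, "my_dictionary.txt")
print(updated, end="")
print(split_custom_dictionary_path(updated))
```

Because the append is skipped when the file is already listed, running the sketch twice on the same properties text leaves it unchanged.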

3. It is recommended to put your own dictionary file, such as my_dictionary.txt, in the same directory as CustomDictionary.txt and append its name to CustomDictionaryPath in the properties file above.

# cat my dictionary .txt

Codis Cluster nz 1000

Jinri Toutiao nz 1000

Each line has three fields: the entry first, its part of speech second (the default is n), and its word frequency third.

4. Then delete the cache file so that the new dictionary file is loaded on the next run:

# rm -f CustomDictionary.txt.bin
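The same cleanup can be done from Python, which may be convenient when the dictionary is updated by a script. This is a minimal sketch: the function name is hypothetical, and the directory in the commented usage line is assembled from the data path reported in step 2, so it should be adjusted to the local installation.

```python
import os


def remove_dictionary_cache(custom_dir, main_dictionary="CustomDictionary.txt"):
    """Delete <main_dictionary>.bin in custom_dir; return True if a file was removed."""
    cache_path = os.path.join(custom_dir, main_dictionary + ".bin")
    if os.path.isfile(cache_path):
        os.remove(cache_path)
        return True
    return False  # nothing to delete, same outcome as rm -f on a missing file


# Example call (path is an assumption based on the data path shown in step 2):
# remove_dictionary_cache(
#     "/usr/local/lib/python2.7/site-packages/pyhanlp/static/data/dictionary/custom")
```

Like rm -f, the function does not fail when the cache file is already gone; the return value only reports whether anything was deleted.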

5. Test the new dictionary:

Python-c "from pyhanlp import *; print (HanLP.segment ('codis Cluster, Jinri Toutiao')"

May 16, 2018 4:43:14 PM com.hankcs.hanlp.corpus.io.IOUtil readBytes

Warning: while reading /usr/local/lib/python2.7/site-packages/pyhanlp/static/data/dictionary/custom/CustomDictionary.txt.bin, exception java.io.FileNotFoundException: /usr/local/lib/python2.7/site-packages/pyhanlp/static/data/dictionary/custom/CustomDictionary.txt.bin (No such file or directory) occurred

This message can be ignored. It is only a warning that the cache file was missing, and the cache is rebuilt when the dictionary is reloaded.

Note:

For the full HanLP part-of-speech list, see the HanLP project website.

-

Author: Mingyue San Qianli 68
