
How to Use HanLP for Chinese Word Segmentation in a Spark Cluster Environment

2025-01-17 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/01 Report

This article explains how to use HanLP for Chinese word segmentation in a Spark cluster environment. It gathers a simple, easy-to-follow procedure from various sources, in the hope that it answers common questions on this topic.

1. Copy the downloaded hanlp-portable-1.6.2.jar into the spark/jars folder on the cluster nodes.

2. Start spark-shell, connected to the Spark cluster:

spark/bin/spark-shell --executor-memory 6g --driver-memory 1g --executor-cores 2 --num-executors 2 --master spark://master:7077 --jars ansj_seg-5.1.6.jar,hanlp-portable-1.6.2.jar

3. Run the following commands in the shell:

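import com.hankcs.hanlp.tokenizer.StandardTokenizer
// Read the input file from HDFS
val data = sc.textFile("hdfs://master:8020/clob.txt")
// Drop lines containing "BODY", strip square brackets, then segment each line with HanLP
val splits = data.filter(line => !line.contains("BODY")).map(line => line.replace("[", "")).map(line => line.replace("]", "")).map(line => StandardTokenizer.segment(line))
// Fetch the first segmented line back to the driver (this action triggers the job)
splits.first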
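You can check the filter and replace steps above locally before running them on the cluster. The following is a minimal sketch in plain Scala that runs in spark-shell or any Scala script or REPL. It applies the same cleaning to an in-memory list instead of an RDD. The sample lines are invented, and the HanLP segmentation step is left out because it needs the HanLP jar.

```scala
// Local, Spark-free sketch of the per-line cleaning done before segmentation.
// The sample input below is made up; it only mimics lines of clob.txt.

// Drop lines that contain the "BODY" marker, as the filter step does.
def keepLine(line: String): Boolean = !line.contains("BODY")

// Remove square brackets, as the two replace steps do.
def stripBrackets(line: String): String =
  line.replace("[", "").replace("]", "")

val sample = List("<BODY>", "[note] first line", "second line")
val cleaned = sample.filter(keepLine).map(stripBrackets)
println(cleaned) // List(note first line, second line)
```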

The job fails because the task result cannot be serialized:

18/04/06 09:08:25 ERROR TaskSetManager: Task 0.0 in stage 0.0 (TID 0) had a not serializable result: com.hankcs.hanlp.seg.common.Term
Serialization stack:
    - object not serializable (class: com.hankcs.hanlp.seg.common.Term, value: monitor/ng)
    - writeObject data (class: java.util.ArrayList)
    - object (class java.util.ArrayList, [Supervisory/ng, Institute/u, Daily/r, dynamic/n, Jinan/ns, Public Security Bureau/n, Supervision/vn, Detachment/n, Secretariat/n, ...])
    - element of array (index: 0)
    - array (class [Ljava.util.List;, size 1); not retrying
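The error arises because each task's result is serialized on the executor and sent back to the driver. With Spark's default Java serializer, every object in that result must implement java.io.Serializable. The stack shows that com.hankcs.hanlp.seg.common.Term does not. A common workaround is to turn each Term into plain strings before the result leaves the executor.

The sketch below reproduces the effect with plain Java serialization. FakeTerm is a hypothetical stand-in for HanLP's Term, not the real class, and the sample words are illustrative.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}
import java.util.{ArrayList => JArrayList}
import scala.jdk.CollectionConverters._

// Hypothetical stand-in for com.hankcs.hanlp.seg.common.Term: like the class
// named in the error, it does NOT implement java.io.Serializable.
class FakeTerm(val word: String, val nature: String) {
  override def toString: String = s"$word/$nature"
}

// Try to write an object with plain Java serialization, the mechanism behind
// Spark's default JavaSerializer.
def javaSerializable(obj: AnyRef): Boolean =
  try {
    new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
    true
  } catch {
    case _: NotSerializableException => false
  }

// A result shaped like the one in the log: a java.util.ArrayList of terms.
val terms = new JArrayList[FakeTerm]()
terms.add(new FakeTerm("济南", "ns"))
terms.add(new FakeTerm("公安局", "n"))

println(javaSerializable(terms)) // false: FakeTerm is not Serializable

// Workaround: keep only the words (or s"${t.word}/${t.nature}") as plain Strings.
val words: List[String] = terms.asScala.map(_.word).toList
println(javaSerializable(words)) // true
```

In spark-shell, the corresponding change would be to add a step after the segment call, such as `.map(terms => terms.asScala.map(_.word).toList)`. HanLP's Term exposes a public `word` field. On Scala 2.11/2.12 shells, the converter import is `scala.collection.JavaConverters._` instead of `scala.jdk.CollectionConverters._`. Treat this as an untested suggestion.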

This concludes the walkthrough of using HanLP for Chinese word segmentation in a Spark cluster environment. Combining the explanation with hands-on practice is the best way to learn it, so try the steps yourself.


© 2024 shulou.com SLNews company. All rights reserved.
