2025-01-17 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains how to use HanLP for Chinese word segmentation in a Spark cluster environment. The steps below were compiled from various references into a short procedure: deploy the jar, launch spark-shell, run the tokenizer over a file in HDFS, and look at the serialization error that the run produces.
1. Copy the downloaded hanlp-portable-1.6.2.jar to the spark/jar folder on the cluster
2. Start spark-shell against the Spark cluster (note that --jars takes a comma-separated list)
spark/bin/spark-shell --executor-memory 6g --driver-memory 1g --executor-cores 2 --num-executors 2 --master spark://master:7077 --jars ansj_seg-5.1.6.jar,hanlp-portable-1.6.2.jar
3. Execute the following statements in the shell:
import com.hankcs.hanlp.tokenizer.StandardTokenizer
val data = sc.textFile("hdfs://master:8020/clob.txt")
val splits = data.filter(line => !line.contains("BODY")).map(line => line.replace("[", "")).map(line => line.replace("]", "")).map(line => StandardTokenizer.segment(line.toString()))
splits.first
The job fails with an error saying that the task result is not serializable:
18/04/06 09:08:25 ERROR TaskSetManager: Task 0.0 in stage 0.0 (TID 0) had a not serializable result: com.hankcs.hanlp.seg.common.Term
Serialization stack:
- object not serializable (class: com.hankcs.hanlp.seg.common.Term, value: monitor/ng)
- writeObject data (class: java.util.ArrayList)
- object (class java.util.ArrayList, [Supervisory/ng, Institute/u, Daily/r, dynamic/n, Jinan/ns, Public Security Bureau/n, Supervision/vn, Detachment/n, Secretariat/n, ...])
- element of array (index: 0)
- array (class [Ljava.util.List;, size 1); not retrying
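The stack shows that the value sent back to the driver is a java.util.ArrayList of com.hankcs.hanlp.seg.common.Term, and that Term is not serializable in this HanLP build. One possible workaround (a sketch, not necessarily the approach the original author used) is to convert each Term to a plain String on the executors, so that only Scala lists of strings leave the task. The name `cleaned` is introduced here for illustration only. The sketch assumes a Scala 2.11/2.12 spark-shell, where `scala.collection.JavaConverters` is available, and relies on `Term.toString` producing the word/nature form seen in the log.

```scala
// Run inside the same spark-shell session (sc is provided by the shell).
import scala.collection.JavaConverters._
import com.hankcs.hanlp.tokenizer.StandardTokenizer

val data = sc.textFile("hdfs://master:8020/clob.txt")

// Same filtering and bracket stripping as in step 3.
val cleaned = data.filter(line => !line.contains("BODY")).map(line => line.replace("[", "").replace("]", ""))

// Convert every Term to a String before the result leaves the task,
// so the RDD holds List[String] instead of java.util.List[Term].
val splits = cleaned.map(line => StandardTokenizer.segment(line).asScala.map(_.toString).toList)

// first now returns a List[String], which Spark can serialize back to the driver.
splits.first
```

If only the words are needed, mapping each term to its `word` field instead of calling `toString` should drop the part-of-speech suffix.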
That concludes this walkthrough of using HanLP for Chinese word segmentation in a Spark cluster environment. Theory is best learned together with practice, so try the steps yourself.
© 2024 shulou.com SLNews company. All rights reserved.