In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Servers >
Share
Shulou(Shulou.com)05/31 Report--
This article introduces the relevant knowledge of "how to use OpenNLP". In the operation of actual cases, many people will encounter such a dilemma, so let the editor lead you to learn how to deal with these situations. I hope you can read it carefully and be able to achieve something!
# Chapter 1 introduction #
# # Overview # #
The Apache OpenNLP package is a machine learning-based tool for dealing with natural languages. It supports commonly used NLP tasks, such as word segmentation (tokenization), sentence segmentation (sentence segementation), named question extraction (named entity extraction), chunking, syntactic analysis (parsing), coreference resolution. These tasks are usually necessary to build more advanced text processing services. OpenNLP also includes maximum entropy and perception-based machine learning.
The goal of the OpenNLP project is to create a mature toolset for the above tasks. Another goal is to provide a large number of pre-built (pre-build) models for a variety of languages and annotated text resources derived from these models.
# # General package structure # #
OpenNLP includes many components that allow you to build a complete natural language processing pipeline. These components include: sentence detector, tokenizer, name finder, document categorizer, part-of-speech tagger, chunker, parser, coreference resolution. Components contain components that allow you to perform independent natural language processing tasks, such as training a model, or evaluating a model frequently. It is easy to understand each of their tools through his API. In addition, (OpenNLP) provides a command line (CLI) for easy testing and training.
# # General example of API # #
The OpenNLP component has a similar API. Usually, to perform a task, you should provide a model and an input.
Typically, you load a model by providing a FileInputStream with a model for the constructor of a model class.
InputStream modelIn = new FileInputStream ("lang-model-name.bin"); try {SomeModel model = new SomeModel (modelIn);} catch (IOException e) {/ / handle the exception} finally {if (null! = modelIn) {try {modelIn.close ();} catch (IOException e) {}}
After the model is loaded, the component tool itself can be instantiated.
ToolName toolName = new ToolName (model)
After the tool is instantiated, the processing task can be performed. For this tool, the input and output format is specific, but, typically, the output is an String array, and the input is an String or String array.
String output [] = toolName.executeTask ("This is a sample text.")
# # Command Line Interface (CLI) # #
# Overview #
OpenNLP provides a command-line script that serves as a unique entry point for all its tools. The script is located in the bin directory of the OpenNLP binary package. Contains Windows versions: opennlp.bat and Linux and Linux compatible systems: opennlp.
# install #
The OpenNLP script uses the JAVA_CMD and JAVA_HOME variables to decide which command to use to execute the Java virtual machine.
The OpenNLP script uses the "OPENNLP_ HOME" variable to determine the location of the OpenNLP binary package. It is recommended that you use this variable to point to the current OpenNLP version of the binary package, and update the PATH variable to include $OPENNLP _ HOME/bin or% OPENNLP _ HOME%\ bin.
This configuration allows for convenient calls to OpenNLP. The following example assumes such a configuration.
# General example #
Apache OpenNLP provides a general command line script to access all of its tools:
$opennlp
This script prints the version of the current package and lists all available tools:
OpenNLP. Usage: opennlp TOOLwhere TOOL is one of: Doccat learnable document categorizer DoccatTrainer trainer for the learnable document categorizer DoccatConverter converts leipzig data format to native OpenNLP format DictionaryBuilder builds a new dictionary SimpleTokenizer character class tokenizer TokenizerME learnable tokenizer TokenizerTrainer trainer for the learnable tokenizer TokenizerMEEvaluator Evaluator for the learnable tokenizer TokenizerCrossValidator K-fold cross validator for the learnable tokenizer TokenizerConverter converts foreign data formats (namefinder Conllx,pos) to native OpenNLP format DictionaryDetokenizer SentenceDetector learnable sentence detector SentenceDetectorTrainer trainer for the learnable sentence detector SentenceDetectorEvaluator evaluator for the learnable sentence detector SentenceDetectorCrossValidator K-fold cross validator for the learnable sentence detector SentenceDetectorConverter converts foreign data formats (namefinder,conllx Pos) to native OpenNLP format TokenNameFinder learnable name finder TokenNameFinderTrainer trainer for the learnable name finder TokenNameFinderEvaluator Measures the performance of the NameFinder model with the reference data TokenNameFinderCrossValidator K-fold cross validator for the learnable NameFinder TokenNameFinderConverter converts foreign data formats (bionlp2004,conll03,conll02 Ad) to native OpenNLP format CensusDictionaryCreator Converts 1990 US Census names into a dictionary POSTagger learnable part of speech tagger POSTaggerTrainer trains a model for the part-of-speech tagger POSTaggerEvaluator Measures the performance of the POS tagger model with the reference data POSTaggerCrossValidator K-fold cross validator for the learnable POS tagger POSTaggerConverter converts conllx data format to native OpenNLP format ChunkerME Learnable chunker ChunkerTrainerME trainer for the learnable chunker ChunkerEvaluator Measures the performance of the Chunker model with the reference data ChunkerCrossValidator K-fold cross validator for the chunker ChunkerConverter converts ad data format to native OpenNLP format Parser performs full syntactic parsing ParserTrainer trains the learnable parser BuildModelUpdater trains and updates the build model in A parser model CheckModelUpdater trains and updates the check model in a parser model TaggerModelReplacer replaces the tagger model in a parser modelAll tools print help when invoked with help parameterExample: opennlp SimpleTokenizer help
The OpenNLP tool has a similar command line structure and options. To discover the tool option, run it without task parameters:
$opennlp ToolName
This tool will enter two help blocks.
The first block describes the general structure of the command line tool:
Usage: opennlp TokenizerTrainer [.namefinder | .conllx | .pos] [- abbDict path]...-model modelFile.
The common structure of this command line tool includes the required tool name (TokenizerTrainer), optional format parameters ([.namefinder | .conllx | .pos]), optional parameters ([- abbDict path]...), and required parameters (- model modelFile.).
Format parameters enable non-localized data to be processed directly without conversion. Each format has its own parameters, and if the corresponding tool executes without a parameter or a parameter help, they will display:
$opennlp TokenizerTrainer.conllx helpUsage: opennlp TokenizerTrainer.conllx [- abbDict path] [- alphaNumOpt isAlphaNumOpt]... Arguments description:-abbDict path abbreviation dictionary in XML format. ...
To convert the current format and make a special format, add a., and the name of the format after the tool name:
$opennlp TokenizerTrainer.conllx-model en-pos.bin...
The second help block describes the unique parameters:
Arguments description:-type maxent | perceptron | perceptron_sequence The type of the token name finder model. One of maxent | perceptron | perceptron_sequence. -dict dictionaryPath The XML tag dictionary file...
Most tools for processing need to provide at least one model:
$opennlp ToolName lang-model-name.bin
When the tool is executed in this way, the model is loaded and the tool waits for input from the standard input device. The input is processed and printed to a standard output.
Instead, or the most common use is to use the console input and output redirection option to provide an input or output file:
$opennlp ToolName lang-model-name.bin
< input.txt >Output.txt
Most tools for model training need to first provide a model name, optional training options (such as model type, number of iterations), and data.
A model name is a file name.
Training options usually include iterations, interruptions, abbreviations dictionary, and other things. Sometimes, these options can be provided through a training options file. In this case, this option will be ignored and options from the file will be used.
For data, you need to specify the path to the data (file name), usually the language and encoding.
A common example of running a tool training from the command line is probably:
$opennlp ToolNameTrainer-model en-model-name.bin-lang en- data input.train-encoding UTF-8
Or with format:
$opennlp ToolNameTrainer.conll03-model en-model-name.bin-lang en- data input.train\-types per-encoding UTF-8
Most of the tools used for model evaluation are similar to those for task execution, and you first need to provide a model name, optional evaluation options (such as whether to print misclassified examples), and test data. A common example of running an evaluation under the command line is probably:
$opennlp ToolNameEvaluator-model en-model-name.bin-lang en- data input.test-encoding UTF-8
That's all for the content of "how to use OpenNLP". Thank you for reading. If you want to know more about the industry, you can follow the website, the editor will output more high-quality practical articles for you!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.