2025-03-31 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article shows how to tell whether a string is English or Java code. The approach is simple and quick to apply, and it is easy to adjust. Let's take a look.
Consider the following two strings:
1. for (int i = 0; i < b.size(); i++) {
2. do something in English (not necessary to be a sentence).

The first is Java code and the second is English. How can we detect that the first is code and the second is English?

The Java code may not be parseable, because it is not a complete method, statement, or expression. A solution to this problem follows. Because there is sometimes no clear boundary between code and English, the accuracy cannot be 100%. However, you can easily tune the program below to fit your needs.

The basic idea is to convert the string into a sequence of tokens. For example, the line of code above might become "KEY,SEPARATOR,ID,ASSIGN,NUMBER,SEPARATOR,...". Simple rules can then separate code from English.

The Tokenizer class converts a string into a list of tokens.

```java
package lexical;

import java.util.LinkedList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Tokenizer {

    // Pairs a compiled pattern with the numeric token type it produces.
    private class TokenInfo {
        public final Pattern regex;
        public final int token;

        public TokenInfo(Pattern regex, int token) {
            this.regex = regex;
            this.token = token;
        }
    }

    // A recognized token: its numeric type and the matched text.
    public class Token {
        public final int token;
        public final String sequence;

        public Token(int token, String sequence) {
            this.token = token;
            this.sequence = sequence;
        }
    }

    private LinkedList<TokenInfo> tokenInfos;
    private LinkedList<Token> tokens;

    public Tokenizer() {
        tokenInfos = new LinkedList<TokenInfo>();
        tokens = new LinkedList<Token>();
    }

    // Registers a token type; the pattern is anchored to the start of the input.
    public void add(String regex, int token) {
        tokenInfos.add(new TokenInfo(Pattern.compile("^(" + regex + ")"), token));
    }

    // Repeatedly strips the first matching pattern from the front of the input.
    public void tokenize(String str) {
        String s = str.trim();
        tokens.clear();
        while (!s.equals("")) {
            boolean match = false;
            for (TokenInfo info : tokenInfos) {
                Matcher m = info.regex.matcher(s);
                if (m.find()) {
                    match = true;
                    String tok = m.group().trim();
                    s = m.replaceFirst("").trim();
                    tokens.add(new Token(info.token, tok));
                    break;
                }
            }
            if (!match) {
                // No pattern matched: discard the partial result and report it.
                // throw new ParserException("Unexpected character in input: " + s);
                tokens.clear();
                System.out.println("Unexpected character in input: " + s);
                return;
            }
        }
    }

    public LinkedList<Token> getTokens() {
        return tokens;
    }

    // Concatenates the numeric token types into one string.
    public String getTokensString() {
        StringBuilder sb = new StringBuilder();
        for (Tokenizer.Token tok : tokens) {
            sb.append(tok.token);
        }
        return sb.toString();
    }
}
```

We can recognize Java keywords, separators, operators, identifiers, and so on. If we assign a mapped value to each token type, we can convert an English string into a token string.

```java
package lexical;

import greenblocks.javaapiexamples.DB;
import java.io.IOException;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.commons.lang.StringUtils;

import NLP.POSTagger;

public class EnglishOrCode {

    private static Tokenizer tokenizer = null;

    public static void initializeTokenizer() {
        tokenizer = new Tokenizer();

        // key words
        String keyString = "abstract assert boolean break byte case catch "
                + "char class const continue default do double else enum"
                + " extends false final finally float for goto if implements "
                + "import instanceof int interface long native new null "
                + "package private protected public return short static "
                + "strictfp super switch synchronized this throw throws true "
                + "transient try void volatile while todo";
        String[] keys = keyString.split(" ");
        String keyStr = StringUtils.join(keys, "|");

        tokenizer.add(keyStr, 1);
        tokenizer.add("\\(|\\)|\\{|\\}|\\[|\\]|;|,|\\.|=|>| |
```

(The listing is truncated at this point in the source.)
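The token-based idea described above can be sketched in a self-contained form. The classification rules of the article are not shown here, so everything in this sketch is an assumption made for illustration: the class name `CodeOrEnglishSketch`, the `Kind` categories, the shortened keyword list, the helpers `tokenize`, `symbolRatio`, and `looksLikeCode`, and in particular the rule itself (treat a string as code when more than 40% of its tokens are separators or operators). It uses a single regular expression with named groups instead of the `Tokenizer` class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: names, keyword list, and threshold are assumptions.
final class CodeOrEnglishSketch {

    // Token categories; each name doubles as a named group in TOKEN.
    enum Kind { KEYWORD, NUMBER, WORD, SEPARATOR, OPERATOR }

    // Alternatives are tried in order, so keywords win over plain words.
    private static final Pattern TOKEN = Pattern.compile(
          "(?<KEYWORD>\\b(?:for|while|if|else|int|new|return|class|public|static|void)\\b)"
        + "|(?<NUMBER>\\d+)"
        + "|(?<WORD>[A-Za-z_][A-Za-z0-9_]*)"
        + "|(?<SEPARATOR>[(){}\\[\\];,.])"
        + "|(?<OPERATOR>[-+*/=<>!]+)");

    // Converts text into a list of token categories; unmatched characters
    // (whitespace, other punctuation) are skipped by find().
    static List<Kind> tokenize(String text) {
        List<Kind> kinds = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            for (Kind k : Kind.values()) {
                if (m.group(k.name()) != null) {
                    kinds.add(k);
                    break;
                }
            }
        }
        return kinds;
    }

    // Fraction of tokens that are separators or operators (0.0 for no tokens).
    static double symbolRatio(String text) {
        List<Kind> kinds = tokenize(text);
        if (kinds.isEmpty()) {
            return 0.0;
        }
        long symbols = kinds.stream()
                .filter(k -> k == Kind.SEPARATOR || k == Kind.OPERATOR)
                .count();
        return (double) symbols / kinds.size();
    }

    // Assumed rule: code is symbol-heavy. The 0.4 threshold is a guess to tune.
    static boolean looksLikeCode(String text) {
        return symbolRatio(text) > 0.4;
    }

    public static void main(String[] args) {
        // 11 of 19 tokens are symbols: prints true
        System.out.println(looksLikeCode("for (int i = 0; i < b.size(); i++) {"));
        // 3 of 13 tokens are symbols: prints false
        System.out.println(looksLikeCode(
                "do something in English (not necessary to be a sentence)."));
    }
}
```

A ratio-based rule is only one possible choice. It will misjudge short fragments and symbol-heavy English, so the threshold and the token categories would need tuning against real samples.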