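Pinyin utility class (with polyphonic character handling)
A project required Chinese names to be converted to pinyin and stored so that they are easy to search. Because some characters have more than one reading (polyphonic characters), every possible reading has to be kept, so the final result should look like the following.
Example input: 解和景都是多音字(小) ("解 and 景 are both polyphonic characters (small)")
Result: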
[xiehejingdushiduoyinzi(xiao), xiehuojingdoushiduoyinzi(xiao), jiehejingdushiduoyinzi(xiao), xieheyingdoushiduoyinzi(xiao), xiehujingdushiduoyinzi(xiao), jiehuoyingdoushiduoyinzi(xiao), jiehujingdoushiduoyinzi(xiao), jiehejingdoushiduoyinzi(xiao), xiehujingdoushiduoyinzi(xiao), jiehuojingdoushiduoyinzi(xiao), xiehaiyingdushiduoyinzi(xiao), xiehejingdoushiduoyinzi(xiao), xiehuyingdushiduoyinzi(xiao), xiehuoyingdoushiduoyinzi(xiao), jiehuoyingdushiduoyinzi(xiao), jiehuyingdushiduoyinzi(xiao), jieheyingdoushiduoyinzi(xiao), jiehuyingdoushiduoyinzi(xiao), jiehuojingdushiduoyinzi(xiao), xiehaiyingdoushiduoyinzi(xiao), jiehaijingdoushiduoyinzi(xiao), jiehaiyingdoushiduoyinzi(xiao), xiehaijingdushiduoyinzi(xiao), xiehuojingdushiduoyinzi(xiao), jieheyingdushiduoyinzi(xiao), jiehaijingdushiduoyinzi(xiao), xieheyingdushiduoyinzi(xiao), jiehujingdushiduoyinzi(xiao), xiehaijingdoushiduoyinzi(xiao), xiehuyingdoushiduoyinzi(xiao), jiehaiyingdushiduoyinzi(xiao), xiehuoyingdushiduoyinzi(xiao)]
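Each entry in the result picks one reading for every character, so the list as a whole is the Cartesian product of the per-character readings. The following is a minimal sketch of that idea only. It assumes the readings are already known and hard-codes them; the class name `PinyinCombinations` and the method `combine` are hypothetical and not part of pinyin4j or the utility class in this post.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: expands per-character readings into every full-string combination. */
final class PinyinCombinations {

    /**
     * Returns every concatenation that takes exactly one reading per character, in order.
     * Each inner list holds the candidate readings of one character.
     */
    static Set<String> combine(List<List<String>> readingsPerChar) {
        Set<String> results = new LinkedHashSet<>();
        if (readingsPerChar.isEmpty()) {
            return results;
        }
        results.add(""); // seed so the first character has a prefix to extend
        for (List<String> readings : readingsPerChar) {
            Set<String> next = new LinkedHashSet<>();
            for (String prefix : results) {
                for (String reading : readings) {
                    next.add(prefix + reading);
                }
            }
            results = next;
        }
        return results;
    }

    public static void main(String[] args) {
        // Hard-coded readings (an assumption for this sketch; real values would come from pinyin4j).
        List<List<String>> readings = Arrays.asList(
                Arrays.asList("jie", "xie"),    // 解
                Arrays.asList("jing", "ying")); // 景
        System.out.println(combine(readings)); // prints [jiejing, jieying, xiejing, xieying]
    }
}
```

A `LinkedHashSet` is used here so the output order is predictable; with a plain `HashSet` the same entries appear in an arbitrary order, which is why the result list above looks shuffled.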
Enough preamble; here is the implementation.
Add the dependency
<!-- https://mvnrepository.com/artifact/com.belerweb/pinyin4j -->
<dependency>
    <groupId>com.belerweb</groupId>
    <artifactId>pinyin4j</artifactId>
    <version>2.5.0</version>
</dependency>
The utility class code follows
import net.sourceforge.pinyin4j.PinyinHelper;
import net.sourceforge.pinyin4j.format.HanyuPinyinCaseType;
import net.sourceforge.pinyin4j.format.HanyuPinyinOutputFormat;
import net.sourceforge.pinyin4j.format.HanyuPinyinToneType;
import net.sourceforge.pinyin4j.format.exception.BadHanyuPinyinOutputFormatCombination;
import org.apache.log4j.Logger;

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/**
 * Pinyin utility class
 * @author wxm
 */
public class PinYinUtils {

    public static final Logger logger = Logger.getLogger(PinYinUtils.class);

    // Punctuation that is copied through unchanged; compiled once instead of on every call.
    private static final Pattern PUNCTUATION = Pattern.compile(
            "[`~!@#$%^&*()+=|{}':;',\\[\\].<>/?~!@#¥%……&*()——+|{}【】‘;:”“’。,、?]");

    // Lowercase output without tone marks, used by every conversion below.
    private static HanyuPinyinOutputFormat newFormat() {
        HanyuPinyinOutputFormat format = new HanyuPinyinOutputFormat();
        format.setCaseType(HanyuPinyinCaseType.LOWERCASE);
        format.setToneType(HanyuPinyinToneType.WITHOUT_TONE);
        return format;
    }

    // Returns the readings of c, or null when pinyin4j cannot convert the character.
    private static String[] lookup(char c, HanyuPinyinOutputFormat format)
            throws BadHanyuPinyinOutputFormatCombination {
        String[] pinyin = PinyinHelper.toHanyuPinyinStringArray(c, format);
        return (pinyin == null || pinyin.length == 0) ? null : pinyin;
    }

    /**
     * Returns the first letter of the pinyin of each character in the string.
     * @param chinese the text to convert
     * @return the pinyin initials; characters that cannot be converted are kept as they are
     */
    public static String ToFirstChar(String chinese) {
        StringBuilder pinyinStr = new StringBuilder();
        HanyuPinyinOutputFormat defaultFormat = newFormat();
        for (char c : chinese.toCharArray()) { // process one character at a time
            if (c > 128) {
                try {
                    String[] pinyin = lookup(c, defaultFormat);
                    // Without this null check a non-Chinese character above 128 caused a NullPointerException.
                    pinyinStr.append(pinyin == null ? c : pinyin[0].charAt(0));
                } catch (BadHanyuPinyinOutputFormatCombination e) {
                    logger.error("Failed to convert Chinese to pinyin: " + chinese, e);
                }
            } else {
                pinyinStr.append(c);
            }
        }
        return pinyinStr.toString();
    }

    /**
     * Converts Chinese characters to pinyin.
     * Polyphonic characters are not handled: the first reading is always used.
     * @param chinese the text to convert
     * @return the pinyin string
     */
    public static String ToPinyin1(String chinese) {
        StringBuilder pinyinStr = new StringBuilder();
        HanyuPinyinOutputFormat defaultFormat = newFormat();
        try {
            for (char c : chinese.toCharArray()) {
                String[] pinyin = (c > 128 && !regEx(String.valueOf(c))) ? lookup(c, defaultFormat) : null;
                if (pinyin == null) {
                    pinyinStr.append(c); // ASCII, punctuation, or a character pinyin4j does not know
                } else {
                    pinyinStr.append(pinyin[0]);
                }
            }
        } catch (BadHanyuPinyinOutputFormatCombination e) {
            logger.error("Failed to convert Chinese to pinyin: " + chinese, e);
        }
        return pinyinStr.toString();
    }

    /**
     * Converts Chinese characters to pinyin.
     * Polyphonic characters are handled: every combination of readings is produced.
     * @param chinese the text to convert
     * @return the string form of the set of all combinations, e.g. "[a, b]"
     */
    public static String ToPinyin2(String chinese) {
        HanyuPinyinOutputFormat defaultFormat = newFormat();
        List<List<String>> list = new ArrayList<>();
        try {
            for (char c : chinese.toCharArray()) {
                List<String> strs = new ArrayList<>();
                // Look the character up once instead of once per reading.
                String[] pinyin = (c > 128 && !regEx(String.valueOf(c))) ? lookup(c, defaultFormat) : null;
                if (pinyin == null) {
                    strs.add(String.valueOf(c));
                } else {
                    for (String reading : pinyin) {
                        strs.add(reading);
                    }
                }
                list.add(strs);
            }
        } catch (BadHanyuPinyinOutputFormatCombination e) {
            logger.error("Failed to convert Chinese to pinyin: " + chinese, e);
        }
        return strArray(list).toString();
    }

    /** Returns true if the string contains one of the punctuation characters listed above. */
    public static boolean regEx(String s) {
        return PUNCTUATION.matcher(s).find();
    }

    /** Combines the per-character readings into the set of all full strings. */
    public static Set<String> strArray(List<List<String>> list) {
        Set<String> set = new HashSet<>();
        if (list == null || list.isEmpty()) {
            return set;
        }
        for (List<String> item : list) {
            set = splice(item, set);
        }
        return set;
    }

    // Appends every reading in item to every prefix collected so far.
    private static Set<String> splice(List<String> item, Set<String> set) {
        Set<String> result = new HashSet<>();
        for (String str1 : item) {
            if (set.isEmpty()) {
                result.add(str1);
                continue;
            }
            for (String str2 : set) {
                result.add(str2 + str1);
            }
        }
        return result;
    }

    /**
     * Test main method
     * @param args unused
     */
    public static void main(String[] args) {
        System.out.println(ToFirstChar("汉字转换为拼音").toUpperCase()); // initials in upper case
        System.out.println("Result: " + ToPinyin2("解和景都是多音字(小)"));
    }
}
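One thing to keep in mind when storing these results: the number of strings grows multiplicatively with the number of polyphonic characters. The sample output above has 32 entries, which matches 2 (解) × 4 (和) × 2 (景) × 2 (都), with every other character contributing a single reading. The sketch below estimates that size before anything is expanded. It is only an illustration: the class `CombinationCount` and its `count` method are hypothetical names, and the readings are hard-coded from the sample output rather than fetched from pinyin4j.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

/** Hypothetical helper, not part of PinYinUtils or pinyin4j. */
final class CombinationCount {

    /**
     * Upper bound on the number of combined strings: the product of the
     * number of distinct readings of each character.
     */
    static long count(List<List<String>> readingsPerChar) {
        if (readingsPerChar.isEmpty()) {
            return 0;
        }
        long total = 1;
        for (List<String> readings : readingsPerChar) {
            // Repeated readings collapse to one entry, as they do in a HashSet of results.
            int distinct = new HashSet<>(readings).size();
            // multiplyExact throws ArithmeticException instead of silently overflowing.
            total = Math.multiplyExact(total, (long) distinct);
        }
        return total;
    }

    public static void main(String[] args) {
        // Readings read off the sample output; "jie" is repeated on purpose to show deduplication.
        List<List<String>> readings = Arrays.asList(
                Arrays.asList("jie", "jie", "xie"),
                Arrays.asList("he", "huo", "hu", "hai"),
                Arrays.asList("jing", "ying"),
                Arrays.asList("du", "dou"),
                Arrays.asList("shi"));
        System.out.println(count(readings)); // prints 32
    }
}
```

The value is an upper bound rather than an exact count, because two different choices of readings can in principle concatenate to the same string. A caller could compare it against a limit of its own choosing and fall back to the single-reading conversion for very long names.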