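In our systems we often need to sort information by first letter (for example, the alphabetical brand list on Taobao Mall). To do that we need a way to look up the pinyin for a Chinese character and then take the first letter of that pinyin.

We use the open-source pinyin4j package (net.sourceforge.pinyin4j) to do this.

It is simple to use:

The utility class it provides is PinyinHelper.java, shown below. It contains the whole public API, including several methods that convert to different romanization systems. For background on romanization systems, see http://wenku.baidu.com/view/28dda445b307e87101f696f9.html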

/**
 * This file is part of pinyin4j (http://sourceforge.net/projects/pinyin4j/)
 * and distributed under GNU GENERAL PUBLIC LICENSE (GPL).
 *
 * pinyin4j is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * pinyin4j is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with pinyin4j.
 */
package net.sourceforge.pinyin4j;

import net.sourceforge.pinyin4j.format.HanyuPinyinOutputFormat;
import net.sourceforge.pinyin4j.format.exception.BadHanyuPinyinOutputFormatCombination;

/**
 * A class provides several utility functions to convert Chinese characters
 * (both Simplified and Tranditional) into various Chinese Romanization
 * representations
 *
 * @author Li Min (xmlerlimin@gmail.com)
 */
public class PinyinHelper
{
    /**
     * Get all unformmatted Hanyu Pinyin presentations of a single Chinese
     * character (both Simplified and Tranditional)
     *
     * <p>
     * For example, <br/> If the input is '间', the return will be an array with
     * two Hanyu Pinyin strings: <br/> "jian1" <br/> "jian4" <br/> <br/> If the
     * input is '李', the return will be an array with single Hanyu Pinyin
     * string: <br/> "li3"
     *
     * <p>
     * <b>Special Note</b>: If the return is "none0", that means the input
     * Chinese character exists in Unicode CJK talbe, however, it has no
     * pronounciation in Chinese
     *
     * @param ch
     *            the given Chinese character
     *
     * @return a String array contains all unformmatted Hanyu Pinyin
     *         presentations with tone numbers; null for non-Chinese character
     *
     */
    static public String[] toHanyuPinyinStringArray(char ch)
    {
        return getUnformattedHanyuPinyinStringArray(ch);
    }

    /**
     * Get all Hanyu Pinyin presentations of a single Chinese character (both
     * Simplified and Tranditional)
     *
     * <p>
     * For example, <br/> If the input is '间', the return will be an array with
     * two Hanyu Pinyin strings: <br/> "jian1" <br/> "jian4" <br/> <br/> If the
     * input is '李', the return will be an array with single Hanyu Pinyin
     * string: <br/> "li3"
     *
     * <p>
     * <b>Special Note</b>: If the return is "none0", that means the input
     * Chinese character is in Unicode CJK talbe, however, it has no
     * pronounciation in Chinese
     *
     * @param ch
     *            the given Chinese character
     * @param outputFormat
     *            describes the desired format of returned Hanyu Pinyin String
     *
     * @return a String array contains all Hanyu Pinyin presentations with tone
     *         numbers; return null for non-Chinese character
     *
     * @throws BadHanyuPinyinOutputFormatCombination
     *             if certain combination of output formats happens
     *
     * @see HanyuPinyinOutputFormat
     * @see BadHanyuPinyinOutputFormatCombination
     *
     */
    static public String[] toHanyuPinyinStringArray(char ch,
            HanyuPinyinOutputFormat outputFormat)
            throws BadHanyuPinyinOutputFormatCombination
    {
        return getFormattedHanyuPinyinStringArray(ch, outputFormat);
    }

    /**
     * Return the formatted Hanyu Pinyin representations of the given Chinese
     * character (both in Simplified and Tranditional) in array format.
     *
     * @param ch
     *            the given Chinese character
     * @param outputFormat
     *            Describes the desired format of returned Hanyu Pinyin string
     * @return The formatted Hanyu Pinyin representations of the given codepoint
     *         in array format; null if no record is found in the hashtable.
     */
    static private String[] getFormattedHanyuPinyinStringArray(char ch,
            HanyuPinyinOutputFormat outputFormat)
            throws BadHanyuPinyinOutputFormatCombination
    {
        String[] pinyinStrArray = getUnformattedHanyuPinyinStringArray(ch);

        if (null != pinyinStrArray)
        {
            for (int i = 0; i < pinyinStrArray.length; i++)
            {
                pinyinStrArray[i] = PinyinFormatter.formatHanyuPinyin(pinyinStrArray[i], outputFormat);
            }

            return pinyinStrArray;
        } else
            return null;
    }

    /**
     * Delegate function
     *
     * @param ch
     *            the given Chinese character
     * @return unformatted Hanyu Pinyin strings; null if the record is not found
     */
    private static String[] getUnformattedHanyuPinyinStringArray(char ch)
    {
        return ChineseToPinyinResource.getInstance().getHanyuPinyinStringArray(ch);
    }

    /**
     * Get all unformmatted Tongyong Pinyin presentations of a single Chinese
     * character (both Simplified and Tranditional)
     *
     * @param ch
     *            the given Chinese character
     *
     * @return a String array contains all unformmatted Tongyong Pinyin
     *         presentations with tone numbers; null for non-Chinese character
     *
     * @see #toHanyuPinyinStringArray(char)
     *
     */
    static public String[] toTongyongPinyinStringArray(char ch)
    {
        return convertToTargetPinyinStringArray(ch, PinyinRomanizationType.TONGYONG_PINYIN);
    }

    /**
     * Get all unformmatted Wade-Giles presentations of a single Chinese
     * character (both Simplified and Tranditional)
     *
     * @param ch
     *            the given Chinese character
     *
     * @return a String array contains all unformmatted Wade-Giles presentations
     *         with tone numbers; null for non-Chinese character
     *
     * @see #toHanyuPinyinStringArray(char)
     *
     */
    static public String[] toWadeGilesPinyinStringArray(char ch)
    {
        return convertToTargetPinyinStringArray(ch, PinyinRomanizationType.WADEGILES_PINYIN);
    }

    /**
     * Get all unformmatted MPS2 (Mandarin Phonetic Symbols 2) presentations of
     * a single Chinese character (both Simplified and Tranditional)
     *
     * @param ch
     *            the given Chinese character
     *
     * @return a String array contains all unformmatted MPS2 (Mandarin Phonetic
     *         Symbols 2) presentations with tone numbers; null for non-Chinese
     *         character
     *
     * @see #toHanyuPinyinStringArray(char)
     *
     */
    static public String[] toMPS2PinyinStringArray(char ch)
    {
        return convertToTargetPinyinStringArray(ch, PinyinRomanizationType.MPS2_PINYIN);
    }

    /**
     * Get all unformmatted Yale Pinyin presentations of a single Chinese
     * character (both Simplified and Tranditional)
     *
     * @param ch
     *            the given Chinese character
     *
     * @return a String array contains all unformmatted Yale Pinyin
     *         presentations with tone numbers; null for non-Chinese character
     *
     * @see #toHanyuPinyinStringArray(char)
     *
     */
    static public String[] toYalePinyinStringArray(char ch)
    {
        return convertToTargetPinyinStringArray(ch, PinyinRomanizationType.YALE_PINYIN);
    }

    /**
     * @param ch
     *            the given Chinese character
     * @param targetPinyinSystem
     *            indicates target Chinese Romanization system should be
     *            converted to
     * @return string representations of target Chinese Romanization system
     *         corresponding to the given Chinese character in array format;
     *         null if error happens
     *
     * @see PinyinRomanizationType
     */
    private static String[] convertToTargetPinyinStringArray(char ch,
            PinyinRomanizationType targetPinyinSystem)
    {
        String[] hanyuPinyinStringArray = getUnformattedHanyuPinyinStringArray(ch);

        if (null != hanyuPinyinStringArray)
        {
            String[] targetPinyinStringArray = new String[hanyuPinyinStringArray.length];

            for (int i = 0; i < hanyuPinyinStringArray.length; i++)
            {
                targetPinyinStringArray[i] = PinyinRomanizationTranslator.convertRomanizationSystem(hanyuPinyinStringArray[i], PinyinRomanizationType.HANYU_PINYIN, targetPinyinSystem);
            }

            return targetPinyinStringArray;
        } else
            return null;
    }

    /**
     * Get all unformmatted Gwoyeu Romatzyh presentations of a single Chinese
     * character (both Simplified and Tranditional)
     *
     * @param ch
     *            the given Chinese character
     *
     * @return a String array contains all unformmatted Gwoyeu Romatzyh
     *         presentations with tone numbers; null for non-Chinese character
     *
     * @see #toHanyuPinyinStringArray(char)
     *
     */
    static public String[] toGwoyeuRomatzyhStringArray(char ch)
    {
        return convertToGwoyeuRomatzyhStringArray(ch);
    }

    /**
     * @param ch
     *            the given Chinese character
     *
     * @return Gwoyeu Romatzyh string representations corresponding to the given
     *         Chinese character in array format; null if error happens
     *
     * @see PinyinRomanizationType
     */
    private static String[] convertToGwoyeuRomatzyhStringArray(char ch)
    {
        String[] hanyuPinyinStringArray = getUnformattedHanyuPinyinStringArray(ch);

        if (null != hanyuPinyinStringArray)
        {
            String[] targetPinyinStringArray = new String[hanyuPinyinStringArray.length];

            for (int i = 0; i < hanyuPinyinStringArray.length; i++)
            {
                targetPinyinStringArray[i] = GwoyeuRomatzyhTranslator.convertHanyuPinyinToGwoyeuRomatzyh(hanyuPinyinStringArray[i]);
            }

            return targetPinyinStringArray;
        } else
            return null;
    }

    /**
     * Get a string which all Chinese characters are replaced by corresponding
     * main (first) Hanyu Pinyin representation.
     *
     * <p>
     * <b>Special Note</b>: If the return contains "none0", that means that
     * Chinese character is in Unicode CJK talbe, however, it has not
     * pronounciation in Chinese. <b> This interface will be removed in next
     * release. </b>
     *
     * @param str
     *            A given string contains Chinese characters
     * @param outputFormat
     *            Describes the desired format of returned Hanyu Pinyin string
     * @param seperater
     *            The string is appended after a Chinese character (excluding
     *            the last Chinese character at the end of sentence). <b>Note!
     *            Seperater will not appear after a non-Chinese character</b>
     * @return a String identical to the original one but all recognizable
     *         Chinese characters are converted into main (first) Hanyu Pinyin
     *         representation
     *
     * @deprecated DO NOT use it again because the first retrived pinyin string
     *             may be a wrong pronouciation in a certain sentence context.
     *             <b> This interface will be removed in next release. </b>
     */
    static public String toHanyuPinyinString(String str,
            HanyuPinyinOutputFormat outputFormat, String seperater)
            throws BadHanyuPinyinOutputFormatCombination
    {
        StringBuffer resultPinyinStrBuf = new StringBuffer();

        for (int i = 0; i < str.length(); i++)
        {
            String mainPinyinStrOfChar = getFirstHanyuPinyinString(str.charAt(i), outputFormat);

            if (null != mainPinyinStrOfChar)
            {
                resultPinyinStrBuf.append(mainPinyinStrOfChar);
                if (i != str.length() - 1)
                { // avoid appending at the end
                    resultPinyinStrBuf.append(seperater);
                }
            } else
            {
                resultPinyinStrBuf.append(str.charAt(i));
            }
        }

        return resultPinyinStrBuf.toString();
    }

    /**
     * Get the first Hanyu Pinyin of a Chinese character <b> This function will
     * be removed in next release. </b>
     *
     * @param ch
     *            The given Unicode character
     * @param outputFormat
     *            Describes the desired format of returned Hanyu Pinyin string
     * @return Return the first Hanyu Pinyin of given Chinese character; return
     *         null if the input is not a Chinese character
     *
     * @deprecated DO NOT use it again because the first retrived pinyin string
     *             may be a wrong pronouciation in a certain sentence context.
     *             <b> This function will be removed in next release. </b>
     */
    static private String getFirstHanyuPinyinString(char ch,
            HanyuPinyinOutputFormat outputFormat)
            throws BadHanyuPinyinOutputFormatCombination
    {
        String[] pinyinStrArray = getFormattedHanyuPinyinStringArray(ch, outputFormat);

        if ((null != pinyinStrArray) && (pinyinStrArray.length > 0))
        {
            return pinyinStrArray[0];
        } else
        {
            return null;
        }
    }

    // ! Hidden constructor
    private PinyinHelper()
    {
    }
}

The supported romanization systems are listed in the following class:

/**
 * This file is part of pinyin4j (http://sourceforge.net/projects/pinyin4j/)
 * and distributed under GNU GENERAL PUBLIC LICENSE (GPL).
 *
 * pinyin4j is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * pinyin4j is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with pinyin4j.
 */

/**
 *
 */
package net.sourceforge.pinyin4j;

/**
 * The class describes variable Chinese Pinyin Romanization System
 *
 * @author Li Min (xmlerlimin@gmail.com)
 *
 */
class PinyinRomanizationType
{
    /**
     * Hanyu Pinyin system
     */
    static final PinyinRomanizationType HANYU_PINYIN = new PinyinRomanizationType("Hanyu");

    /**
     * Wade-Giles Pinyin system
     */
    static final PinyinRomanizationType WADEGILES_PINYIN = new PinyinRomanizationType("Wade");

    /**
     * Mandarin Phonetic Symbols 2 (MPS2) Pinyin system
     */
    static final PinyinRomanizationType MPS2_PINYIN = new PinyinRomanizationType("MPSII");

    /**
     * Yale Pinyin system
     */
    static final PinyinRomanizationType YALE_PINYIN = new PinyinRomanizationType("Yale");

    /**
     * Tongyong Pinyin system
     */
    static final PinyinRomanizationType TONGYONG_PINYIN = new PinyinRomanizationType("Tongyong");

    /**
     * Gwoyeu Romatzyh system
     */
    static final PinyinRomanizationType GWOYEU_ROMATZYH = new PinyinRomanizationType("Gwoyeu");

    /**
     * Constructor
     */
    protected PinyinRomanizationType(String tagName)
    {
        setTagName(tagName);
    }

    /**
     * @return Returns the tagName.
     */
    String getTagName()
    {
        return tagName;
    }

    /**
     * @param tagName
     *            The tagName to set.
     */
    protected void setTagName(String tagName)
    {
        this.tagName = tagName;
    }

    protected String tagName;
}

Here is a demo of the API as we use it:

package demo;

import net.sourceforge.pinyin4j.PinyinHelper;
import net.sourceforge.pinyin4j.format.HanyuPinyinCaseType;
import net.sourceforge.pinyin4j.format.HanyuPinyinOutputFormat;
import net.sourceforge.pinyin4j.format.HanyuPinyinToneType;
import net.sourceforge.pinyin4j.format.HanyuPinyinVCharType;
import net.sourceforge.pinyin4j.format.exception.BadHanyuPinyinOutputFormatCombination;

public class MyPinyinDemo {

    /**
     * @param args
     * @throws BadHanyuPinyinOutputFormatCombination
     */
    public static void main(String[] args) throws BadHanyuPinyinOutputFormatCombination {
        char chineseCharacter = "绿".charAt(0);
        HanyuPinyinOutputFormat outputFormat = new HanyuPinyinOutputFormat();

        // Tone output: pick one of the three options.
        outputFormat.setToneType(HanyuPinyinToneType.WITH_TONE_NUMBER); // tone as a digit: 1 for first tone, 2 for second, 3 for third, 4 for fourth, e.g. lu:4
//      outputFormat.setToneType(HanyuPinyinToneType.WITHOUT_TONE); // no tone, e.g. lu:
//      outputFormat.setToneType(HanyuPinyinToneType.WITH_TONE_MARK); // tone mark on the letter, e.g. lǜ

        // How 'ü' is written: pick one of the three options.
        outputFormat.setVCharType(HanyuPinyinVCharType.WITH_U_AND_COLON); // 'ü' is output as "u:"
//      outputFormat.setVCharType(HanyuPinyinVCharType.WITH_U_UNICODE); // 'ü' is output as "ü" in Unicode form
//      outputFormat.setVCharType(HanyuPinyinVCharType.WITH_V); // 'ü' is output as "v"

        // Letter case: pick one of the two options.
        outputFormat.setCaseType(HanyuPinyinCaseType.UPPERCASE); // upper-case pinyin
//      outputFormat.setCaseType(HanyuPinyinCaseType.LOWERCASE); // lower-case pinyin

        // Hanyu Pinyin of a single character
        String[] pinyinArray = PinyinHelper.toHanyuPinyinStringArray(chineseCharacter, outputFormat);
        for (String str : pinyinArray) { // a polyphonic character returns every reading
            System.out.println(str);
        }

        // Whole string, readings joined with "|" (this method is deprecated in the library)
        String pinyinstr = PinyinHelper.toHanyuPinyinString("绿色", outputFormat, "|");
        System.out.println(pinyinstr);

        // Output in the other romanization systems
        String[] GwoyeuRomatzyhStringArray = PinyinHelper.toGwoyeuRomatzyhStringArray(chineseCharacter);
        for (String str : GwoyeuRomatzyhStringArray) { // a polyphonic character returns every reading
            System.out.println(str);
        }

        String[] MPS2PinyinStringArray = PinyinHelper.toMPS2PinyinStringArray(chineseCharacter);
        for (String str : MPS2PinyinStringArray) {
            System.out.println(str);
        }

        String[] TongyongPinyinStringArray = PinyinHelper.toTongyongPinyinStringArray(chineseCharacter);
        for (String str : TongyongPinyinStringArray) {
            System.out.println(str);
        }

        String[] WadeGilesPinyinStringArray = PinyinHelper.toWadeGilesPinyinStringArray(chineseCharacter);
        for (String str : WadeGilesPinyinStringArray) {
            System.out.println(str);
        }

        String[] YalePinyinStringArray = PinyinHelper.toYalePinyinStringArray(chineseCharacter);
        for (String str : YalePinyinStringArray) {
            System.out.println(str);
        }
    }
}

Output:

LU:4
LU4
LU:4|SE4
liuh
luh
liu4
lu4
lyu4
lu4
lu:4
lu4
lyu4
lu4
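
Coming back to the original goal of sorting by first letter: once the library has returned readings like the ones above, the initial is simply the first character of the first reading. The sketch below is only an illustration and not part of pinyin4j. PinyinInitials, initialOf and groupByInitial are hypothetical names, and the '#' bucket for entries with no reading is our own convention. It takes the reading arrays as input, so it runs without the library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of pinyin4j.
class PinyinInitials {

    /**
     * Returns the upper-case first letter of the first reading, or '#' when
     * there is no reading (the library returns null for non-Chinese characters).
     */
    static char initialOf(String[] readings) {
        if (readings == null || readings.length == 0 || readings[0].isEmpty()) {
            return '#';
        }
        return Character.toUpperCase(readings[0].charAt(0));
    }

    /**
     * Groups names under their initial. The map value for each name is the
     * reading array of the name's first character. A TreeMap keeps the
     * initials in alphabetical order.
     */
    static Map<Character, List<String>> groupByInitial(Map<String, String[]> readingsByName) {
        Map<Character, List<String>> groups = new TreeMap<>();
        for (Map.Entry<String, String[]> entry : readingsByName.entrySet()) {
            char initial = initialOf(entry.getValue());
            groups.computeIfAbsent(initial, k -> new ArrayList<>()).add(entry.getKey());
        }
        return groups;
    }
}
```

In real code the array for each name would come from PinyinHelper.toHanyuPinyinStringArray(name.charAt(0)). For a polyphonic character such as 绿 the first reading is not always the right one in context, which is the same caveat the library gives for its deprecated toHanyuPinyinString.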

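The package also ships with a demo of its own, Pinyin4jAppletDemo.java.

As for the implementation, it is quite simple: it is built on dictionary files that map Chinese characters to pinyin. unicode_to_hanyu_pinyin.txt maps the Unicode code point of each Chinese character to its pinyin, pinyin_mapping.xml maps Hanyu Pinyin to the other romanization systems, and pinyin_Gwoyeu_mapping.xml maps Hanyu Pinyin to Gwoyeu Romatzyh. Excerpts from the two XML files are shown below. Once this data has been compiled, the rest is easy to implement.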
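
To make the dictionary idea concrete before looking at the XML files, here is a minimal sketch of loading a table like unicode_to_hanyu_pinyin.txt. We assume each line holds a hexadecimal code point followed by comma-separated readings in parentheses, for example "7EFF (lu:4,lu4)"; that format is an assumption and should be checked against the actual file. PinyinTable, addLine and lookup are hypothetical names, not the library's own loader.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical loader; the line format "7EFF (lu:4,lu4)" is an assumption.
class PinyinTable {
    private final Map<Integer, String[]> readings = new HashMap<>();

    /** Parses one line such as "7EFF (lu:4,lu4)"; malformed lines are skipped. */
    void addLine(String line) {
        String trimmed = line.trim();
        int open = trimmed.indexOf('(');
        int close = trimmed.lastIndexOf(')');
        if (open <= 0 || close <= open + 1) {
            return; // no code point, no parentheses, or empty parentheses
        }
        try {
            int codePoint = Integer.parseInt(trimmed.substring(0, open).trim(), 16);
            readings.put(codePoint, trimmed.substring(open + 1, close).split(","));
        } catch (NumberFormatException e) {
            // the text before '(' is not a hex number: skip the line
        }
    }

    /** Returns a copy of the readings for ch, or null when ch is not in the table. */
    String[] lookup(char ch) {
        String[] found = readings.get((int) ch);
        return found == null ? null : found.clone();
    }
}
```

Returning null for an unknown character mirrors what the PinyinHelper Javadoc describes for non-Chinese input. The copy returned by lookup keeps callers from modifying the table's own arrays.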

pinyin_mapping.xml (excerpt):

<?xml version="1.0"?>
<pinyin_mapping>
  <item>
    <Hanyu>a</Hanyu>
    <Wade>a</Wade>
    <MPSII>a</MPSII>
    <Yale>a</Yale>
    <Tongyong>a</Tongyong>
  </item>
  <item>
    <Hanyu>ai</Hanyu>
    <Wade>ai</Wade>
    <MPSII>ai</MPSII>
    <Yale>ai</Yale>
    <Tongyong>ai</Tongyong>
  </item>
  ...

pinyin_Gwoyeu_mapping.xml (excerpt):

<pinyin_gwoyeu_mapping>
  <item>
    <Hanyu>a</Hanyu>
    <Gwoyeu_I>a</Gwoyeu_I>
    <Gwoyeu_II>ar</Gwoyeu_II>
    <Gwoyeu_III>aa</Gwoyeu_III>
    <Gwoyeu_IV>ah</Gwoyeu_IV>
    <Gwoyeu_V>.a</Gwoyeu_V>
  </item>
  <item>
    <Hanyu>ai</Hanyu>
    <Gwoyeu_I>ai</Gwoyeu_I>
    <Gwoyeu_II>air</Gwoyeu_II>
    <Gwoyeu_III>ae</Gwoyeu_III>
    <Gwoyeu_IV>ay</Gwoyeu_IV>
    <Gwoyeu_V>.ai</Gwoyeu_V>
  </item>
  ...
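
With mapping files in this shape, converting a syllable between systems is a lookup: find the item whose Hanyu element matches, then read the element for the target system. Here is a minimal sketch using the DOM parser that ships with the JDK. RomanizationLookup and convert are hypothetical names, the library's own PinyinRomanizationTranslator may well work differently, and the sketch ignores tone digits entirely.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical converter over a complete, well-formed mapping document.
class RomanizationLookup {
    private final Document doc;

    RomanizationLookup(String xml) {
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("mapping XML could not be parsed", e);
        }
    }

    /**
     * Returns the text of targetTag (for example "Wade" or "Gwoyeu_IV") in the
     * item whose Hanyu element equals hanyuSyllable, or null if there is none.
     */
    String convert(String hanyuSyllable, String targetTag) {
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            if (hanyuSyllable.equals(textOf(item, "Hanyu"))) {
                return textOf(item, targetTag);
            }
        }
        return null;
    }

    // Text of the first child element named tag, or null when it is absent.
    private static String textOf(Element item, String tag) {
        NodeList nodes = item.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }
}
```

The target tag names line up with the tagName strings in PinyinRomanizationType ("Hanyu", "Wade", "MPSII", "Yale", "Tongyong"). A linear scan is enough for a sketch; a real implementation would more likely index the items by their Hanyu syllable once at load time.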