Encoding and Decoding with URLEncoder and URLDecoder

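In Java development, URL redirects often run into garbled Chinese characters. If you look closely while browsing, you will notice that URLs frequently contain runs of hexadecimal escapes, for example: http://xxx.com/s?w=%e7%bc

This is percent-encoding of the URL parameters. In Java, the URLEncoder and URLDecoder classes in the java.net package perform this encoding and decoding.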

1. URLDecoder (decoding)

The source code describes it as:

Utility class for HTML form decoding. This class contains static methods for decoding a String from the <CODE>application/x-www-form-urlencoded</CODE> MIME format.

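In other words, this is a utility class for decoding HTML form data. It contains static methods for decoding a string.

The source provides two decoding methods:

(1) Decoding with the default encoding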

    public static String decode(String s) {
        String str = null;
        try {
            str = decode(s, dfltEncName);
        } catch (UnsupportedEncodingException e) {
            // The system should always have the platform default
        }
        return str;
    }
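Because this overload falls back to the platform default encoding, the same input can decode differently from one machine to another (the JDK marks the single-argument overload as deprecated for that reason). The following is a minimal sketch of the effect, not part of the JDK source: the class `DefaultCharsetDecodeDemo` and its helper `decodeWith` are made-up names, and ISO-8859-1 merely stands in for "a platform default that is not UTF-8".

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class DefaultCharsetDecodeDemo {

    // Hypothetical helper: decode the same escape sequence with a caller-chosen charset.
    static String decodeWith(String encoded, String charsetName)
            throws UnsupportedEncodingException {
        return URLDecoder.decode(encoded, charsetName);
    }

    public static void main(String[] args) throws Exception {
        String encoded = "%e7%bc%96"; // the UTF-8 bytes of the single character U+7F16 (编)

        String asUtf8 = decodeWith(encoded, "UTF-8");        // bytes read as UTF-8
        String asLatin1 = decodeWith(encoded, "ISO-8859-1"); // assumed "wrong" default

        System.out.println("UTF-8:      " + asUtf8.length() + " char");
        System.out.println("ISO-8859-1: " + asLatin1.length() + " chars (garbled)");
    }
}
```

Decoded as UTF-8, the three bytes form one character; decoded as ISO-8859-1, each byte becomes its own character, which is the typical garbled output. Passing the charset explicitly avoids depending on the machine.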

(2) Decoding with a specified encoding

    public static String decode(String s, String enc)
            throws UnsupportedEncodingException {
        boolean needToChange = false;
        int numChars = s.length();
        StringBuffer sb = new StringBuffer(numChars > 500 ? numChars / 2 : numChars);
        int i = 0;

        if (enc.length() == 0) {
            throw new UnsupportedEncodingException("URLDecoder: empty string enc parameter");
        }

        char c;
        byte[] bytes = null;
        while (i < numChars) {
            c = s.charAt(i);
            switch (c) {
            case '+':
                sb.append(' ');
                i++;
                needToChange = true;
                break;
            case '%':
                /*
                 * Starting with this instance of %, process all
                 * consecutive substrings of the form %xy. Each
                 * substring %xy will yield a byte. Convert all
                 * consecutive  bytes obtained this way to whatever
                 * character(s) they represent in the provided
                 * encoding.
                 */
                try {
                    // (numChars-i)/3 is an upper bound for the number
                    // of remaining bytes
                    if (bytes == null)
                        bytes = new byte[(numChars-i)/3];
                    int pos = 0;

                    while ( ((i+2) < numChars) &&
                            (c=='%')) {
                        int v = Integer.parseInt(s.substring(i+1,i+3),16);
                        if (v < 0)
                            throw new IllegalArgumentException("URLDecoder: Illegal hex characters in escape (%) pattern - negative value");
                        bytes[pos++] = (byte) v;
                        i+= 3;
                        if (i < numChars)
                            c = s.charAt(i);
                    }

                    // A trailing, incomplete byte encoding such as
                    // "%x" will cause an exception to be thrown
                    if ((i < numChars) && (c=='%'))
                        throw new IllegalArgumentException(
                            "URLDecoder: Incomplete trailing escape (%) pattern");

                    sb.append(new String(bytes, 0, pos, enc));
                } catch (NumberFormatException e) {
                    throw new IllegalArgumentException(
                        "URLDecoder: Illegal hex characters in escape (%) pattern - "
                        + e.getMessage());
                }
                needToChange = true;
                break;
            default:
                sb.append(c);
                i++;
                break;
            }
        }

        return (needToChange? sb.toString() : s);
    }
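The source above throws in three situations: an incomplete trailing escape, non-hex characters after `%`, and an unknown encoding name. A small sketch that exercises each path follows; the class `DecodeErrorDemo` and the helper `tryDecode` are hypothetical names introduced only for illustration, and `no-such-charset` is assumed not to be installed on the JVM.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class DecodeErrorDemo {

    // Hypothetical helper: returns the decoded text, or the name of the exception type thrown.
    static String tryDecode(String s, String enc) {
        try {
            return URLDecoder.decode(s, enc);
        } catch (IllegalArgumentException e) {
            return "IllegalArgumentException";
        } catch (UnsupportedEncodingException e) {
            return "UnsupportedEncodingException";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryDecode("a%20b", "UTF-8"));           // well-formed input
        System.out.println(tryDecode("abc%4", "UTF-8"));           // incomplete trailing escape
        System.out.println(tryDecode("%zz", "UTF-8"));             // non-hex digits after %
        System.out.println(tryDecode("%41", "no-such-charset"));   // unknown charset name
    }
}
```

Since malformed escapes surface as an unchecked `IllegalArgumentException`, code that decodes untrusted query strings may want to catch it explicitly rather than rely only on the checked `UnsupportedEncodingException`.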

Appendix: decoding rules

 * <ul>
 * <li>The alphanumeric characters "{@code a}" through
 *     "{@code z}", "{@code A}" through
 *     "{@code Z}" and "{@code 0}"
 *     through "{@code 9}" remain the same.
 * <li>The special characters "{@code .}",
 *     "{@code -}", "{@code *}", and
 *     "{@code _}" remain the same.
 * <li>The plus sign "{@code +}" is converted into a
 *     space character "   ".
 * <li>A sequence of the form "<i>{@code %xy}</i>" will be
 *     treated as representing a byte where <i>xy</i> is the two-digit
 *     hexadecimal representation of the 8 bits. Then, all substrings
 *     that contain one or more of these byte sequences consecutively
 *     will be replaced by the character(s) whose encoding would result
 *     in those consecutive bytes.
 *     The encoding scheme used to decode these characters may be specified,
 *     or if unspecified, the default encoding of the platform will be used.
 * </ul>

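In other words:

The alphanumeric characters "a" through "z", "A" through "Z", and "0" through "9" remain the same.
The special characters ".", "-", "*", and "_" remain the same.
The plus sign "+" is converted into a space character " ".
A sequence of the form "%xy" is treated as one byte, where xy is the two-digit hexadecimal representation of its 8 bits. Every substring made up of one or more consecutive such sequences is then replaced by the character(s) whose encoding produces those bytes. The encoding used to decode them can be specified; if it is not, the platform's default encoding is used.

Example: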
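The decoding rules can be checked one at a time with a short sketch. The class `DecodeRulesDemo` and the helper `decodeUtf8` are hypothetical names, and UTF-8 is assumed as the encoding throughout.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class DecodeRulesDemo {

    // Hypothetical helper: decode with UTF-8 so the result does not depend on the platform.
    static String decodeUtf8(String s) throws UnsupportedEncodingException {
        return URLDecoder.decode(s, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // Letters, digits and . - * _ pass through unchanged.
        System.out.println(decodeUtf8("Az09.-*_"));   // Az09.-*_

        // A plus sign becomes a space.
        System.out.println(decodeUtf8("a+b"));        // a b

        // A literal plus must arrive as %2B to survive decoding.
        System.out.println(decodeUtf8("a%2Bb"));      // a+b

        // Consecutive %xy escapes are collected into bytes and decoded together:
        // six escapes here become two characters (U+7F16 U+7801).
        System.out.println(decodeUtf8("%E7%BC%96%E7%A0%81").length()); // 2
    }
}
```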

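    public static void main(String[] args) throws Exception {
        // requires: import java.net.URLDecoder;
        String encodedString = "%e7%bc%96%e7%a0%81%e6%a0%bc%e5%bc%8f";
        // Decode the percent-encoded UTF-8 bytes back into text
        String decoded = URLDecoder.decode(encodedString, "UTF-8");
        System.out.println(decoded); // 编码格式
    }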

2. URLEncoder (encoding)

The source code describes it as:

Utility class for HTML form encoding. This class contains static methods for converting a String to the <CODE>application/x-www-form-urlencoded</CODE> MIME format.

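In other words, this is a utility class for encoding HTML form data. It contains static methods for encoding a string.

The source provides two encoding methods:

(1) Encoding with the default encoding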

    public static String encode(String s) {
        String str = null;
        try {
            str = encode(s, dfltEncName);
        } catch (UnsupportedEncodingException e) {
            // The system should always have the platform default
        }
        return str;
    }
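The encoding chosen here determines which bytes end up behind the `%` signs, so relying on the platform default makes the output machine-dependent. The sketch below encodes one character with three charsets that every JVM is required to support; the class `EncodeCharsetDemo` and the helper `encodeWith` are hypothetical names used only for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class EncodeCharsetDemo {

    // Hypothetical helper: encode the same text with a caller-chosen charset.
    static String encodeWith(String text, String charsetName)
            throws UnsupportedEncodingException {
        return URLEncoder.encode(text, charsetName);
    }

    public static void main(String[] args) throws Exception {
        String text = "\u7f16"; // the character 编

        System.out.println(encodeWith(text, "UTF-8"));      // %E7%BC%96
        System.out.println(encodeWith(text, "UTF-16BE"));   // %7F%16
        // ISO-8859-1 cannot represent the character, so it is replaced by '?' (0x3F).
        System.out.println(encodeWith(text, "ISO-8859-1")); // %3F
    }
}
```

The last line shows that a mismatched charset does not just change the escapes; it can lose the original character entirely, which the receiver cannot undo.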

(2) Encoding with a specified encoding

    public static String encode(String s, String enc)
            throws UnsupportedEncodingException {

        boolean needToChange = false;
        StringBuffer out = new StringBuffer(s.length());
        Charset charset;
        CharArrayWriter charArrayWriter = new CharArrayWriter();

        if (enc == null)
            throw new NullPointerException("charsetName");

        try {
            charset = Charset.forName(enc);
        } catch (IllegalCharsetNameException e) {
            throw new UnsupportedEncodingException(enc);
        } catch (UnsupportedCharsetException e) {
            throw new UnsupportedEncodingException(enc);
        }

        for (int i = 0; i < s.length();) {
            int c = (int) s.charAt(i);
            //System.out.println("Examining character: " + c);
            if (dontNeedEncoding.get(c)) {
                if (c == ' ') {
                    c = '+';
                    needToChange = true;
                }
                //System.out.println("Storing: " + c);
                out.append((char)c);
                i++;
            } else {
                // convert to external encoding before hex conversion
                do {
                    charArrayWriter.write(c);
                    /*
                     * If this character represents the start of a Unicode
                     * surrogate pair, then pass in two characters. It's not
                     * clear what should be done if a bytes reserved in the
                     * surrogate pairs range occurs outside of a legal
                     * surrogate pair. For now, just treat it as if it were
                     * any other character.
                     */
                    if (c >= 0xD800 && c <= 0xDBFF) {
                        /*
                          System.out.println(Integer.toHexString(c)
                          + " is high surrogate");
                        */
                        if ( (i+1) < s.length()) {
                            int d = (int) s.charAt(i+1);
                            /*
                              System.out.println("\tExamining "
                              + Integer.toHexString(d));
                            */
                            if (d >= 0xDC00 && d <= 0xDFFF) {
                                /*
                                  System.out.println("\t"
                                  + Integer.toHexString(d)
                                  + " is low surrogate");
                                */
                                charArrayWriter.write(d);
                                i++;
                            }
                        }
                    }
                    i++;
                } while (i < s.length() && !dontNeedEncoding.get((c = (int) s.charAt(i))));

                charArrayWriter.flush();
                String str = new String(charArrayWriter.toCharArray());
                byte[] ba = str.getBytes(charset);
                for (int j = 0; j < ba.length; j++) {
                    out.append('%');
                    char ch = Character.forDigit((ba[j] >> 4) & 0xF, 16);
                    // converting to use uppercase letter as part of
                    // the hex value if ch is a letter.
                    if (Character.isLetter(ch)) {
                        ch -= caseDiff;
                    }
                    out.append(ch);
                    ch = Character.forDigit(ba[j] & 0xF, 16);
                    if (Character.isLetter(ch)) {
                        ch -= caseDiff;
                    }
                    out.append(ch);
                }
                charArrayWriter.reset();
                needToChange = true;
            }
        }

        return (needToChange? out.toString() : s);
    }
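Two details of the method above are easy to miss: a surrogate pair is written to the buffer as a unit so that it is converted to bytes as one code point, and the hex digits are forced to uppercase. A minimal sketch of both, assuming UTF-8; the class `SurrogatePairEncodeDemo` and the helper `encodeUtf8` are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class SurrogatePairEncodeDemo {

    // Hypothetical helper: encode with UTF-8.
    static String encodeUtf8(String s) throws UnsupportedEncodingException {
        return URLEncoder.encode(s, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // U+1F600 is stored in a Java String as the surrogate pair D83D DE00.
        String emoji = "\uD83D\uDE00";

        // The pair is encoded together as one 4-byte UTF-8 sequence,
        // and the hex digits come out in uppercase.
        System.out.println(encodeUtf8(emoji));        // %F0%9F%98%80

        // A safe character, a space and the pair in one string.
        System.out.println(encodeUtf8("a " + emoji)); // a+%F0%9F%98%80
    }
}
```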

Appendix: encoding rules

 * <ul>
 * <li>The alphanumeric characters "{@code a}" through
 *     "{@code z}", "{@code A}" through
 *     "{@code Z}" and "{@code 0}"
 *     through "{@code 9}" remain the same.
 * <li>The special characters "{@code .}",
 *     "{@code -}", "{@code *}", and
 *     "{@code _}" remain the same.
 * <li>The space character "   " is
 *     converted into a plus sign "{@code +}".
 * <li>All other characters are unsafe and are first converted into
 *     one or more bytes using some encoding scheme. Then each byte is
 *     represented by the 3-character string
 *     "<i>{@code %xy}</i>", where <i>xy</i> is the
 *     two-digit hexadecimal representation of the byte.
 *     The recommended encoding scheme to use is UTF-8. However,
 *     for compatibility reasons, if an encoding is not specified,
 *     then the default encoding of the platform is used.
 * </ul>

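In other words:

The alphanumeric characters "a" through "z", "A" through "Z", and "0" through "9" remain the same.
The special characters ".", "-", "*", and "_" remain the same.
The space character " " is converted into a plus sign "+".
All other characters are unsafe, so they are first converted into one or more bytes using some encoding scheme. Each byte is then represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of that byte. The recommended encoding scheme is UTF-8. For compatibility reasons, however, if no encoding is specified, the platform's default encoding is used.

Example: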
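Each encoding rule can be observed directly. The sketch below assumes UTF-8; the class `EncodeRulesDemo` and the helper `encodeUtf8` are hypothetical names introduced for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class EncodeRulesDemo {

    // Hypothetical helper: encode with UTF-8, the scheme the Javadoc recommends.
    static String encodeUtf8(String s) throws UnsupportedEncodingException {
        return URLEncoder.encode(s, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // Letters, digits and . - * _ are left as they are.
        System.out.println(encodeUtf8("Az09.-*_"));  // Az09.-*_

        // A space becomes a plus sign.
        System.out.println(encodeUtf8("a b"));       // a+b

        // Everything else is escaped byte by byte, including characters that
        // have a meaning inside a query string.
        System.out.println(encodeUtf8("a+b"));       // a%2Bb
        System.out.println(encodeUtf8("k=v&x"));     // k%3Dv%26x
        System.out.println(encodeUtf8("~/"));        // %7E%2F
    }
}
```

Because `=`, `&` and `/` are escaped as well, the method is meant for individual parameter names and values, not for a whole URL.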

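    public static void main(String[] args) throws Exception {
        // requires: import java.net.URLEncoder;
        String rawString = "编码格式";
        // Convert the text to UTF-8 bytes and percent-encode each byte
        String encoded = URLEncoder.encode(rawString, "UTF-8");
        System.out.println(encoded); // %E7%BC%96%E7%A0%81%E6%A0%BC%E5%BC%8F
    }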
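Putting the two classes together, the garbled-redirect problem from the introduction is usually avoided by encoding the parameter value before it is appended to the URL and decoding it with the same charset on the receiving side. A rough sketch under those assumptions follows; `QueryParamDemo`, `buildSearchUrl` and `readKeyword` are hypothetical names, the base URL is the placeholder used earlier in this article, and the parsing only handles a single `w` parameter.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class QueryParamDemo {

    // Hypothetical helper: encode only the parameter value, never the whole URL.
    static String buildSearchUrl(String keyword) throws UnsupportedEncodingException {
        return "http://xxx.com/s?w=" + URLEncoder.encode(keyword, "UTF-8");
    }

    // Hypothetical helper: naive extraction of the single "w" parameter, for illustration only.
    static String readKeyword(String url) throws UnsupportedEncodingException {
        String raw = url.substring(url.indexOf("?w=") + 3);
        return URLDecoder.decode(raw, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        String keyword = "\u7f16\u7801 a&b"; // "编码 a&b": Chinese text, a space and an ampersand

        String url = buildSearchUrl(keyword);
        System.out.println(url); // http://xxx.com/s?w=%E7%BC%96%E7%A0%81+a%26b

        // Decoding with the same charset restores the original value.
        System.out.println(readKeyword(url).equals(keyword)); // true
    }
}
```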
