thinking-in-java(13) Strings

【13.1】Immutable String

1）String objects are immutable: they are read-only;

【Example: String objects are immutable】

public class Immutable {
    public static String upcase(String s) {
        return s.toUpperCase();
    }

    public static void main(String[] args) {
        String q = "howdy";
        System.out.println(q); // howdy
        String qq = upcase(q);
        System.out.println(qq); // HOWDY
        System.out.println(q); // howdy (the original String is unchanged)
    }
}
// howdy
// HOWDY
// howdy

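【Code notes】

When q is passed to upcase(), what is actually passed is a copy of the reference, so s starts out pointing at the same String object as q. toUpperCase() then returns a new String, and the object q refers to is never modified;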

// Source of String.toUpperCase(Locale): it returns this when no character needs to change,
// and otherwise builds and returns a new String
public String toUpperCase(Locale locale) {
    if (locale == null) {
        throw new NullPointerException();
    }

    int firstLower;
    final int len = value.length;

    /* Now check if there are any characters that need to be changed. */
    scan: {
        for (firstLower = 0 ; firstLower < len; ) {
            int c = (int)value[firstLower];
            int srcCount;
            if ((c >= Character.MIN_HIGH_SURROGATE)
                    && (c <= Character.MAX_HIGH_SURROGATE)) {
                c = codePointAt(firstLower);
                srcCount = Character.charCount(c);
            } else {
                srcCount = 1;
            }
            int upperCaseChar = Character.toUpperCaseEx(c);
            if ((upperCaseChar == Character.ERROR)
                    || (c != upperCaseChar)) {
                break scan;
            }
            firstLower += srcCount;
        }
        return this;
    }

    /* result may grow, so i+resultOffset is the write location in result */
    int resultOffset = 0;
    char[] result = new char[len]; /* may grow */

    /* Just copy the first few upperCase characters. */
    System.arraycopy(value, 0, result, 0, firstLower);

    String lang = locale.getLanguage();
    boolean localeDependent =
            (lang == "tr" || lang == "az" || lang == "lt");
    char[] upperCharArray;
    int upperChar;
    int srcChar;
    int srcCount;
    for (int i = firstLower; i < len; i += srcCount) {
        srcChar = (int)value[i];
        if ((char)srcChar >= Character.MIN_HIGH_SURROGATE &&
                (char)srcChar <= Character.MAX_HIGH_SURROGATE) {
            srcChar = codePointAt(i);
            srcCount = Character.charCount(srcChar);
        } else {
            srcCount = 1;
        }
        if (localeDependent) {
            upperChar = ConditionalSpecialCasing.toUpperCaseEx(this, i, locale);
        } else {
            upperChar = Character.toUpperCaseEx(srcChar);
        }
        if ((upperChar == Character.ERROR)
                || (upperChar >= Character.MIN_SUPPLEMENTARY_CODE_POINT)) {
            if (upperChar == Character.ERROR) {
                if (localeDependent) {
                    upperCharArray =
                            ConditionalSpecialCasing.toUpperCaseCharArray(this, i, locale);
                } else {
                    upperCharArray = Character.toUpperCaseCharArray(srcChar);
                }
            } else if (srcCount == 2) {
                resultOffset += Character.toChars(upperChar, result, i + resultOffset) - srcCount;
                continue;
            } else {
                upperCharArray = Character.toChars(upperChar);
            }

            /* Grow result if needed */
            int mapLen = upperCharArray.length;
            if (mapLen > srcCount) {
                char[] result2 = new char[result.length + mapLen - srcCount];
                System.arraycopy(result, 0, result2, 0, i + resultOffset);
                result = result2;
            }
            for (int x = 0; x < mapLen; ++x) {
                result[i + resultOffset + x] = upperCharArray[x];
            }
            resultOffset += (mapLen - srcCount);
        } else {
            result[i + resultOffset] = (char)upperChar;
        }
    }
    return new String(result, 0, len + resultOffset);
}

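【13.2】Overloading + vs. StringBuilder

1）Overloading means that an operator is given a special meaning when it is applied to a particular class;

(Attention: + and += for String are the only two overloaded operators in Java, and Java does not allow programmers to overload any operator.)

【Example: the overloaded + operator on strings】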

// For the + operator, the compiler actually creates a StringBuilder,
// and each + becomes a call to its append() method
public class Concatenation {
    public static void main(String[] args) {
        String mango = "mango";
        String s = "abc" + mango + "def" + 47;
        System.out.println(s);
    }
}
/*
abcmangodef47
*/

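【Code notes】

Implemented naively, + would perform very poorly: producing the final String would create many intermediate String objects that then have to be garbage-collected. As the bytecode below shows, the compiler avoids this within a single expression;

2）Disassembling Concatenation with javap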

E:\bench-cluster\spring_in_action_eclipse\AThinkingInJava\src>javap -c chapter13.Concatenation
Compiled from "Concatenation.java"
public class chapter13.Concatenation {
  public chapter13.Concatenation();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return

  public static void main(java.lang.String[]);
    Code:
       0: ldc           #2                  // String mango
       2: astore_1
       3: new           #3                  // class java/lang/StringBuilder
       6: dup
       7: invokespecial #4                  // Method java/lang/StringBuilder."<init>":()V
      10: ldc           #5                  // String abc
      12: invokevirtual #6                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      15: aload_1
      16: invokevirtual #6                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      19: ldc           #7                  // String def
      21: invokevirtual #6                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      24: bipush        47
      26: invokevirtual #8                  // Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;
      29: invokevirtual #9                  // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
      32: astore_2
      33: getstatic     #10                 // Field java/lang/System.out:Ljava/io/PrintStream;
      36: aload_2
      37: invokevirtual #11                 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      40: return
}

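【Code notes】

Offset 3 (new #3): the compiler introduces java.lang.StringBuilder on its own, even though the source code never mentions StringBuilder, because StringBuilder is more efficient. (This is javac's behavior up to JDK 8; since JDK 9, javac compiles string concatenation to an invokedynamic call to StringConcatFactory instead, see JEP 280.)

3）How far can the compiler go in optimizing String handling?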

// The + operator is implemented with StringBuilder.append()
public class WhitherStringBuilder {
    // Approach 1: concatenate String objects with +=
    public String implicit(String[] fields) {
        String result = "";
        for (int i = 0; i < fields.length; i++)
            result += fields[i]; // (inefficient) a StringBuilder is created implicitly
        return result;
    } // The StringBuilder is created inside the loop, so every iteration creates a new StringBuilder object

    // Approach 2: use an explicit StringBuilder, which is more efficient
    public String explicit(String[] fields) {
        StringBuilder result = new StringBuilder(); // (efficient) one StringBuilder, created explicitly
        for (int i = 0; i < fields.length; i++)
            result.append(fields[i]);
        return result.toString();
    }
}

4）StringBuilder tip: if you know how long the final string will be, you can presize the StringBuilder to avoid reallocating its buffer several times;
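As a minimal sketch of presizing (the class PresizedBuilder and its join() method are hypothetical), the capacity can be computed before anything is appended. The int passed to the constructor is only an initial capacity: the builder still grows if more text arrives than expected.

```java
public class PresizedBuilder {
    // Joins the words with single spaces into a builder presized to the exact final
    // length, so its internal buffer never has to be reallocated
    public static String join(String[] words) {
        int length = 0;
        for (String w : words)
            length += w.length();
        length += Math.max(0, words.length - 1); // one space between neighbouring words
        StringBuilder sb = new StringBuilder(length); // initial capacity, not a size limit
        for (int i = 0; i < words.length; i++) {
            if (i > 0)
                sb.append(' ');
            sb.append(words[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(new String[] { "Then", "when", "you" })); // Then when you
    }
}
```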

【StringBuilder example】

5）If toString() uses a loop, it is best to create your own StringBuilder object;

import java.util.Random;

/* Example: building the result of toString() in a loop */
public class UsingStringBuilder {
    public static Random rand = new Random(47);

    public String toString() {
        StringBuilder result = new StringBuilder("[");
        for (int i = 0; i < 25; i++) {
            result.append(rand.nextInt(100));
            result.append(", ");
        }
        result.delete(result.length() - 2, result.length()); // delete the trailing ", "
        result.append("]");
        return result.toString();
    }

    public static void main(String[] args) {
        UsingStringBuilder usb = new UsingStringBuilder();
        System.out.println(usb);
    }
}
/*
[58, 55, 93, 61, 61, 29, 68, 0, 22, 7, 88, 28, 51, 89, 9, 78, 98, 61, 20, 58, 16, 40, 11, 22, 4]
*/

6）StringBuilder methods include insert(), replace(), substring() and reverse(); the most commonly used are append() and toString();
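The methods listed above can be tried on a short string. This is a sketch with a hypothetical BuilderMethods class; the comments show the builder's contents after each call for the input "howdy". Note that substring() returns a new String, while the other methods modify the builder in place.

```java
public class BuilderMethods {
    // Applies several StringBuilder methods in turn; the comments show the
    // builder's contents after each call when s is "howdy"
    public static String transform(String s) {
        StringBuilder sb = new StringBuilder(s); // howdy
        sb.insert(0, '[');                       // [howdy
        sb.append(']');                          // [howdy]
        sb.replace(1, 4, "HOW");                 // [HOWdy]
        sb.reverse();                            // ]ydWOH[
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(transform("howdy")); // ]ydWOH[
        StringBuilder sb = new StringBuilder("[HOWdy]");
        String part = sb.substring(1, 4);       // a new String; sb itself is unchanged
        System.out.println(part + " " + sb);    // HOW [HOWdy]
    }
}
```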

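7）StringBuilder vs. StringBuffer

7.1）StringBuilder: not thread-safe, faster (introduced in Java SE5);

7.2）StringBuffer: thread-safe because its methods are synchronized, and therefore somewhat slower (the class to use before Java SE5);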

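【13.3】Unintended recursion

1）Every Java class ultimately inherits from Object, so every container class has a toString() method. A container's toString() represents the container itself and the objects it holds, by calling toString() on each element;

【An example】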

import java.util.Iterator;
import java.util.Random;

// Minimal versions of Generator and Coffee from earlier chapters of the book,
// included so that this example compiles on its own
interface Generator<T> { T next(); }

class Coffee {
    private static long counter = 0;
    private final long id = counter++;
    public String toString() {
        return getClass().getSimpleName() + " " + id;
    }
}

class Latte extends Coffee {}
class Americano extends Coffee {}
class Cappuccino extends Coffee {}
class Mocha extends Coffee {}
class Breve extends Coffee {}

public class CoffeeGenerator implements Generator<Coffee>, Iterable<Coffee> {
    private Class[] types = { Latte.class, Mocha.class, Cappuccino.class,
            Americano.class, Breve.class, };
    private static Random rand = new Random(47);

    public CoffeeGenerator() {}

    // For iteration:
    private int size = 0;

    public CoffeeGenerator(int sz) {
        size = sz;
    }

    public Coffee next() {
        try {
            return (Coffee) types[rand.nextInt(types.length)].newInstance();
            // Report programmer errors at run time:
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    class CoffeeIterator implements Iterator<Coffee> { // inner iterator class
        int count = size;
        public boolean hasNext() {
            return count > 0;
        }
        public Coffee next() {
            count--;
            return CoffeeGenerator.this.next();
        }
        public void remove() { // Not implemented
            throw new UnsupportedOperationException();
        }
    }

    public Iterator<Coffee> iterator() { // returns an iterator
        return new CoffeeIterator();
    }

    public static void main(String[] args) {
        CoffeeGenerator gen = new CoffeeGenerator();
        for (int i = 0; i < 10; i++)
            System.out.print(gen.next() + " ");
    }
}
/*
Americano 0 Latte 1 Americano 2 Mocha 3 Mocha 4 Breve 5 Americano 6 Latte 7 Cappuccino 8 Cappuccino 9
*/

【Example: printing an object's address from toString()】

import java.util.ArrayList;
import java.util.List;

// Infinite recursion fills up the JVM stack, and then an exception is thrown
public class InfiniteRecursion {
    @Override
    public String toString() {
        // Using the this keyword inside toString() is what causes the infinite recursion:
        // return " InfiniteRecursion address: " + this + "\n"; // Exception in thread "main" java.lang.StackOverflowError
        return " InfiniteRecursion address: " + super.toString() + "\n";
    }

    public static void main(String[] args) {
        List<InfiniteRecursion> v = new ArrayList<InfiniteRecursion>();
        for (int i = 0; i < 10; i++)
            v.add(new InfiniteRecursion());
        System.out.println(v);
    }
}
/*
[ InfiniteRecursion address: chapter13.InfiniteRecursion@15db9742
,  InfiniteRecursion address: chapter13.InfiniteRecursion@6d06d69c
,  InfiniteRecursion address: chapter13.InfiniteRecursion@7852e922
,  InfiniteRecursion address: chapter13.InfiniteRecursion@4e25154f
,  InfiniteRecursion address: chapter13.InfiniteRecursion@70dea4e
,  InfiniteRecursion address: chapter13.InfiniteRecursion@5c647e05
,  InfiniteRecursion address: chapter13.InfiniteRecursion@33909752
,  InfiniteRecursion address: chapter13.InfiniteRecursion@55f96302
,  InfiniteRecursion address: chapter13.InfiniteRecursion@3d4eac69
,  InfiniteRecursion address: chapter13.InfiniteRecursion@42a57993
]
*/

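【Code notes】An automatic type conversion happens here, from InfiniteRecursion to String. The expression starts with a String, so for + this the compiler converts this to a String by calling this.toString(), which is the very method being executed. toString() therefore calls itself without end, the JVM stack fills up, and a StackOverflowError is thrown. Replacing this with super.toString() (Object.toString(), which prints the class name, an @ and the object's hash code in hex) makes it work;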
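The "address" printed above is what Object.toString() produces: the class name, an @, and the object's hash code in hexadecimal (an identity hash code, not necessarily a real memory address). The sketch below (hypothetical class DefaultToString, assumed to be in the default package) rebuilds that string by hand for comparison; the exact hex value varies from run to run.

```java
public class DefaultToString {
    // Rebuilds what Object.toString() is documented to return:
    // getClass().getName() + "@" + Integer.toHexString(hashCode())
    public static String manualToString(Object o) {
        return o.getClass().getName() + "@" + Integer.toHexString(o.hashCode());
    }

    static class Plain {} // does not override toString()

    public static void main(String[] args) {
        Plain p = new Plain();
        System.out.println(p); // DefaultToString$Plain@ followed by a hex hash code
        System.out.println(p.toString().equals(manualToString(p))); // true
    }
}
```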

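【13.4】Operations on Strings

1）String objects come with a set of basic methods, such as length(), charAt(), indexOf(), substring(), replace(), toUpperCase() and trim();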

2）String methods that need to change the contents return a new String object; if no change is needed, they return a reference to the original object;
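A small sketch of point 2 (the class name SameOrNew is hypothetical). It relies only on behavior the String Javadoc documents explicitly: trim() returns this string if there is no leading or trailing space, replace(char, char) returns this string if oldChar does not occur, and concat() returns this string if the argument is empty.

```java
public class SameOrNew {
    public static void main(String[] args) {
        String s = "howdy";
        // No change needed: these calls return the very same object
        System.out.println(s.trim() == s);            // true
        System.out.println(s.replace('x', 'y') == s); // true
        System.out.println(s.concat("") == s);        // true
        // A change is needed: a new String is returned and s is left untouched
        String t = s.replace('h', 'H');
        System.out.println(t);      // Howdy
        System.out.println(t == s); // false
        System.out.println(s);      // howdy
    }
}
```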

【13.5】Formatted output

【13.5.1】printf()

【13.5.2】System.out.format(): format() is available on PrintStream and PrintWriter objects (System.out is a PrintStream);

【Example: System.out.format() output】

// Output formatting with System.out.format()
public class SimpleFormat {
    public static void main(String[] args) {
        int x = 5;
        double y = 5.332542;
        // The old way:
        System.out.println("Row 1: [" + x + " " + y + "]");
        // The new way:
        System.out.format("Row 1: [%d %f]\n", x, y); // format() example
        // or
        System.out.printf("Row 1: [%d %f]\n", x, y); // printf() example
    }
}
/*
Row 1: [5 5.332542]
Row 1: [5 5.332542]
Row 1: [5 5.332542]
*/

【Note】PrintStream.printf() simply calls format()

// Source of PrintStream.printf()
public PrintStream printf(String format, Object ... args) {
    return format(format, args);
}

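// System.out is declared as a public final static PrintStream field; the JVM sets it natively
// during startup (and System.setOut() can replace it), which is why the declaration shows = null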
public final static PrintStream out = null;

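【13.5.3】The java.util.Formatter class

1）All formatting is handled by the java.util.Formatter class;

1.1）A Formatter acts as a translator: it translates a format string plus data into the desired result;

1.2）When you create a Formatter, you tell its constructor where to send the output; the most common destinations are PrintStream, OutputStream and File;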
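Besides streams and files, a Formatter can also write into any Appendable, such as a StringBuilder. The sketch below uses a hypothetical render() helper and passes Locale.ROOT so that numbers are not localized; closing the Formatter leaves the StringBuilder intact, since StringBuilder is not Closeable.

```java
import java.util.Formatter;
import java.util.Locale;

public class FormatterToBuilder {
    // Formats into a StringBuilder instead of a stream
    public static String render(String name, int x, int y) {
        StringBuilder sb = new StringBuilder();
        try (Formatter f = new Formatter(sb, Locale.ROOT)) { // Formatter(Appendable, Locale)
            f.format("%s The Turtle is at (%d,%d)", name, x, y);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render("Tommy", 0, 0)); // Tommy The Turtle is at (0,0)
    }
}
```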

【Formatter example】

import java.io.PrintStream;
import java.util.Formatter;

// Formatter example
public class Turtle {
    private String name;
    private Formatter f;

    public Turtle(String name, Formatter f) {
        this.name = name;
        this.f = f;
    }

    public void move(int x, int y) {
        f.format("%s The Turtle is at (%d,%d)\n", name, x, y);
    }

    public static void main(String[] args) {
        PrintStream outAlias = System.out;
        // new Formatter(dest) sets the output destination
        Turtle tommy = new Turtle("Tommy", new Formatter(System.out));
        Turtle terry = new Turtle("Terry", new Formatter(outAlias));
        tommy.move(0, 0);
        terry.move(4, 8);
        tommy.move(3, 4);
        terry.move(2, 5);
        tommy.move(3, 3);
        terry.move(3, 3);
    }
}
/*
Tommy The Turtle is at (0,0)
Terry The Turtle is at (4,8)
Tommy The Turtle is at (3,4)
Terry The Turtle is at (2,5)
Tommy The Turtle is at (3,3)
Terry The Turtle is at (3,3)
*/

// Formatter constructor (PrintStream overload)
public Formatter(PrintStream ps) {
    this(Locale.getDefault(Locale.Category.FORMAT),
         (Appendable)Objects.requireNonNull(ps));
}

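【13.5.4】Format specifiers (e.g. %d, %s)

1）Controlling padding and alignment: output is right-justified by default, and the - flag left-justifies it;

2）Format specifier syntax: %[argument_index$][flags][width][.precision]conversion

2.1）argument_index: which argument to format, written as a 1-based index followed by $ (e.g. %2$s);

2.2）flags: e.g. - (left-justify) or + (always include a sign for numbers);

2.3）width: the minimum width of the field, padded with spaces as needed;

2.4）precision: for strings, the maximum number of characters to print; for floating-point numbers, the number of digits after the decimal point (6 by default, padded with zeros if there are fewer digits, rounded if there are more); precision cannot be used with integers (that throws an IllegalFormatPrecisionException);

2.5）conversion: the conversion character, such as d, c, b, s, f, e, x, h, %;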
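Each part of the syntax can be checked in isolation. This sketch wraps String.format() in a hypothetical fmt() helper that always uses Locale.ROOT, so the decimal separator stays '.' regardless of the machine's default locale.

```java
import java.util.Locale;

public class SpecifierSyntax {
    // Locale.ROOT keeps number formatting independent of the default locale
    public static String fmt(String format, Object... args) {
        return String.format(Locale.ROOT, format, args);
    }

    public static void main(String[] args) {
        System.out.println(fmt("%2$s %1$s", "world", "hello")); // hello world (argument_index)
        System.out.println(fmt("[%6s]", "ab"));    // [    ab] (width, right-justified by default)
        System.out.println(fmt("[%-6s]", "ab"));   // [ab    ] (- flag: left-justified)
        System.out.println(fmt("%+d", 5));         // +5 (+ flag: always show the sign)
        System.out.println(fmt("%.3s", "abcdef")); // abc (precision on a string: max length)
        System.out.println(fmt("%.2f", 3.14159));  // 3.14 (precision on a float: decimal places)
        System.out.println(fmt("%f", 1.5));        // 1.500000 (default precision is 6)
    }
}
```

Trying fmt("%.2d", 5) instead throws java.util.IllegalFormatPrecisionException, as point 2.4 says.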

【Format specifier example】

import java.util.Formatter;

// Left-justified (-) vs. right-justified (default) output
public class Receipt {
    private double total = 0;
    private Formatter f = new Formatter(System.out);

    public void printTitle() {
        f.format("%-15s %5s %10s\n", "Item", "Qty", "Price"); // - left-justifies
        f.format("%-15s %5s %10s\n", "----", "---", "-----");
    }

    public void print(String name, int qty, double price) {
        f.format("%-15.15s %5d %10.2f\n", name, qty, price);
        total += price;
    }

    public void printTotal() {
        f.format("%-15s %5s %10.2f\n", "Tax", "", total * 0.06);
        f.format("%-15s %5s %10s\n", "", "", "-----");
        f.format("%-15s %5s %10.2f\n", "Total", "", total * 1.06);
    }

    public static void main(String[] args) {
        Receipt receipt = new Receipt();
        receipt.printTitle();
        receipt.print("Jack's Magic Beans", 40, 4.25);
        receipt.print("Princess Peas", 311, 5.1);
        receipt.print("Three Bears Porridge", 1, 14.29);
        receipt.printTotal();
    }
}
/*
Item              Qty      Price
----              ---      -----
Jack's Magic Be    40       4.25
Princess Peas     311       5.10
Three Bears Por     1      14.29
Tax                         1.42
                           -----
Total                      25.06
*/

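【13.5.5】Formatter conversions

1）The most commonly used conversion characters are d, c, b, s, f, e, x, h and %:

d: integral type (as decimal);

c: Unicode character;

b: Boolean value;

s: String;

f: floating point (as decimal);

e: floating point (in scientific notation);

x: integral type (as hex);

h: hash code (as hex);

%: a literal % character, written as %% in the format string (a single % starts a format specifier)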

【Conversion example】

import java.math.BigInteger;
import java.util.Formatter;

/* How Formatter converts various data types */
public class Conversion {
    public static void main(String[] args) {
        Formatter f = new Formatter(System.out);

        char u = 'a';
        System.out.println("u = 'a'"); // u = 'a'
        f.format("%%s: %s\n", u); // %s: a
        f.format("%%c: %c\n", u); // %c: a
        f.format("%%b: %b\n", u); // %b: true
        f.format("%%h: %h\n", u); // %h: 61
        // f.format("d: %d\n", u); // java.util.IllegalFormatConversionException: d != java.lang.Character
        // f.format("f: %f\n", u); // java.util.IllegalFormatConversionException: f != java.lang.Character
        // f.format("e: %e\n", u); // java.util.IllegalFormatConversionException: e != java.lang.Character
        // f.format("x: %x\n", u); // java.util.IllegalFormatConversionException: x != java.lang.Character

        int v = 121;
        System.out.println();
        System.out.println("v = 121"); // v = 121
        f.format("%%d: %d\n", v); // %d: 121
        f.format("%%c: %c\n", v); // %c: y
        f.format("%%b: %b\n", v); // %b: true
        f.format("%%s: %s\n", v); // %s: 121
        f.format("%%x: %x\n", v); // %x: 79
        f.format("%%h: %h\n", v); // %h: 79
        // f.format("f: %f\n", v); // java.util.IllegalFormatConversionException: f != java.lang.Integer
        // f.format("e: %e\n", v); // java.util.IllegalFormatConversionException: e != java.lang.Integer

        BigInteger w = new BigInteger("50000000000000");
        System.out.println();
        System.out.println("w = new BigInteger(\"50000000000000\")"); // w = new BigInteger("50000000000000")
        f.format("%%d: %d\n", w); // %d: 50000000000000
        f.format("%%b: %b\n", w); // %b: true
        f.format("%%s: %s\n", w); // %s: 50000000000000
        f.format("%%x: %x\n", w); // %x: 2d79883d2000
        f.format("%%h: %h\n", w); // %h: 8842a1a7
        // f.format("c: %c\n", w); // java.util.IllegalFormatConversionException: c != java.math.BigInteger
        // f.format("f: %f\n", w); // java.util.IllegalFormatConversionException: f != java.math.BigInteger
        // f.format("e: %e\n", w); // java.util.IllegalFormatConversionException: e != java.math.BigInteger

        double x = 179.543;
        System.out.println();
        System.out.println("x = 179.543"); // x = 179.543
        f.format("%%b: %b\n", x); // %b: true
        f.format("%%s: %s\n", x); // %s: 179.543
        f.format("%%f: %f\n", x); // %f: 179.543000
        f.format("%%e: %e\n", x); // %e: 1.795430e+02 (scientific notation)
        f.format("%%h: %h\n", x); // %h: 1ef462c
        // f.format("d: %d\n", x); // java.util.IllegalFormatConversionException: d != java.lang.Double
        // f.format("c: %c\n", x); // java.util.IllegalFormatConversionException: c != java.lang.Double
        // f.format("x: %x\n", x); // java.util.IllegalFormatConversionException: x != java.lang.Double

        Conversion y = new Conversion();
        System.out.println();
        System.out.println("y = new Conversion()"); // y = new Conversion()
        f.format("%%b: %b\n", y); // %b: true
        f.format("%%s: %s\n", y); // %s: chapter13.Conversion@4aa298b7 (hash varies)
        f.format("%%h: %h\n", y); // %h: 4aa298b7
        // f.format("d: %d\n", y); // java.util.IllegalFormatConversionException: d != chapter13.Conversion
        // f.format("c: %c\n", y); // java.util.IllegalFormatConversionException: c != chapter13.Conversion
        // f.format("f: %f\n", y); // java.util.IllegalFormatConversionException: f != chapter13.Conversion
        // f.format("e: %e\n", y); // java.util.IllegalFormatConversionException: e != chapter13.Conversion
        // f.format("x: %x\n", y); // java.util.IllegalFormatConversionException: x != chapter13.Conversion

        boolean z = false;
        System.out.println();
        System.out.println("z = false"); // z = false
        f.format("%%b: %b\n", z); // %b: false
        f.format("%%s: %s\n", z); // %s: false
        f.format("%%h: %h\n", z); // %h: 4d5
        // f.format("d: %d\n", z); // java.util.IllegalFormatConversionException: d != java.lang.Boolean
        // f.format("c: %c\n", z); // java.util.IllegalFormatConversionException: c != java.lang.Boolean
        // f.format("f: %f\n", z); // java.util.IllegalFormatConversionException: f != java.lang.Boolean
        // f.format("e: %e\n", z); // java.util.IllegalFormatConversionException: e != java.lang.Boolean
        // f.format("x: %x\n", z); // java.util.IllegalFormatConversionException: x != java.lang.Boolean
    }
}
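One conversion in the example above is easy to misread: %b prints true for 'a', 121 and even a BigInteger. For %b, a null argument gives false, a Boolean gives its own value, and any other argument gives true. A short sketch (hypothetical class BooleanConversion):

```java
public class BooleanConversion {
    public static void main(String[] args) {
        // %b: null -> "false", a Boolean -> its own value, anything else -> "true"
        System.out.println(String.format("%b", (Object) null)); // false
        System.out.println(String.format("%b", false));         // false
        System.out.println(String.format("%b", 0));             // true, even though it is zero
        System.out.println(String.format("%b", ""));            // true, even though it is empty
    }
}
```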

【13.5.6】String.format()

1）String.format() takes the same arguments as Formatter.format() but returns a String object;

【Example: String.format()】

public class DatabaseException extends Exception {
    public DatabaseException(int transactionID, int queryID, String message) {
        super(String.format("(t%d, q%d) %s", transactionID, queryID, message));
    }

    /* Source of String.format(): it, too, just creates a Formatter object.
    public static String format(String format, Object... args) {
        return new Formatter().format(format, args).toString();
    }
    */

    public static void main(String[] args) {
        try {
            throw new DatabaseException(3, 7, "Write failed");
        } catch (Exception e) {
            System.out.println(e);
            System.out.println(e.getMessage());
        }
    }
}
/*
chapter13.DatabaseException: (t3, q7) Write failed
(t3, q7) Write failed
*/

2）A hexadecimal dump tool

【Example: using String.format() to print a byte array in readable hexadecimal】

import java.io.File;

// Hex dump tool
public class Hex {
    public static String format(byte[] data) {
        StringBuilder result = new StringBuilder();
        int n = 0;
        for (byte b : data) {
            if (n % 16 == 0)
                result.append(String.format("%05X: ", n)); // offset, 5 hex digits wide
            result.append(String.format("%02X ", b)); // each byte as 2 hex digits
            n++;
            if (n % 16 == 0)
                result.append("\n");
        }
        result.append("\n");
        return result.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0)
            // MyConstant.path is the author's own constant for the class-file directory (not shown)
            System.out.println(format(BinaryFile.read(MyConstant.path + "Hex.class")));
        else
            System.out.println(format(BinaryFile.read(new File(args[0]))));
    }
}
/*
00000: CA FE BA BE 00 00 00 34 00 58 0A 00 05 00 26 07
00010: 00 27 0A 00 02 00 26 08 00 28 07 00 29 0A 00 2A ......
*/

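import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class BinaryFile {
    public static byte[] read(File bFile) throws IOException {
        BufferedInputStream bf = new BufferedInputStream(new FileInputStream(bFile));
        try {
            // Note: available() is only an estimate and read() may return fewer bytes;
            // java.nio.file.Files.readAllBytes() is a more robust alternative
            byte[] data = new byte[bf.available()];
            bf.read(data);
            return data;
        } finally {
            bf.close();
        }
    }

    public static byte[] read(String bFile) throws IOException {
        return read(new File(bFile).getAbsoluteFile());
    }
} // /:~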

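【13.6】Regular expressions (regex)

【13.6.1】Basics

1）Java handles the backslash '\' differently

1.1）In other languages, \\ means "insert a literal backslash '\' into the regex";

1.2）In Java, \\ means "insert a regex backslash", so the character that follows has special meaning (a Java string literal needs \\ to contain a single \ character);

2）Examples of backslashes in Java regex strings:

2.1）a digit: \\d;

2.2）a literal backslash: \\\\;

2.3）newline and tab: \n\t (no double backslash needed);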
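The escaping rules can be checked directly; in this sketch (hypothetical class Backslashes) each comment shows the printed value.

```java
public class Backslashes {
    public static void main(String[] args) {
        // The Java literal "\\d" is the two characters \ and d, i.e. the regex \d
        System.out.println("\\d".length());              // 2
        System.out.println("2024".matches("\\d+"));      // true
        // To match one literal backslash the regex is \\, written "\\\\" in Java
        String oneBackslash = "\\";                      // a single \ character
        System.out.println(oneBackslash.matches("\\\\")); // true
        // \n and \t in the literal are already real newline and tab characters
        System.out.println("a\tb".matches("a\tb"));      // true
    }
}
```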

3）The simplest way to use a regex is the functionality built into the String class, e.g. String.matches(regex);

4）Example of String's built-in regex matching

/* Example of String's built-in regex matching */
public class IntegerMatch {
    public static void main(String[] args) {
        System.out.println("-1234".matches("-?\\d+")); // true
        System.out.println("5678".matches("-?\\d+")); // true
        System.out.println("+911".matches("-?\\d+")); // false
        System.out.println("+911".matches("(-|\\+)?\\d+")); // true
    }
}
/*
true
true
false
true
*/

// Source of String.matches(): it simply calls Pattern.matches()
public boolean matches(String regex) {
    return Pattern.matches(regex, this);
}

【Code notes】(-|\\+)? means the string may start with a - or a + (\\+ escapes the +, turning it into an ordinary character) or with neither (? means zero or one occurrence);

5）String.split(regex) cuts the string apart wherever regex matches; the regex can be as simple as a single space;

【Example: splitting a string with String.split(regex)】

/* Example: splitting a string with String.split(regex) */
public class Splitting {
    public static String knights =
            "Then, when you have found the shrubbery, you must "
            + "cut down the mightiest tree in the forest... "
            + "with... a herring!";

    public static void split(String regex) {
        String[] array = knights.split(regex);
        for (String s : array) {
            System.out.print(s + " ");
        }
    }

    public static void main(String[] args) {
        System.out.println("knights = \"" + knights + "\"\n");
        split(" "); // split on spaces; doesn't have to contain regex chars
        System.out.println();
        split("\\W+"); // (uppercase W) split on runs of non-word characters
        System.out.println();
        split("n\\W+"); // (uppercase W) split on 'n' followed by non-word characters
        // String.split(regex) does not modify the String; it creates new String objects
        System.out.println("\nknights = \"" + knights + "\"\n");
    } // whatever the regex matches is removed from the result
}
/*
knights = "Then, when you have found the shrubbery, you must cut down the mightiest tree in the forest... with... a herring!"

Then, when you have found the shrubbery, you must cut down the mightiest tree in the forest... with... a herring!
Then when you have found the shrubbery you must cut down the mightiest tree in the forest with a herring
The whe you have found the shrubbery, you must cut dow the mightiest tree i the forest... with... a herring!
knights = "Then, when you have found the shrubbery, you must cut down the mightiest tree in the forest... with... a herring!"
*/

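【Code notes】

1）\W: matches a non-word character;

2）\w: matches a word character ([a-zA-Z_0-9]);

6）An overloaded String.split(regex, limit) lets you limit the number of splits;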
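A sketch of the limit parameter (hypothetical class SplitLimit). A positive limit caps the number of pieces, 0 (what the one-argument split() uses) drops trailing empty strings, and a negative limit keeps them.

```java
import java.util.Arrays;

public class SplitLimit {
    public static void main(String[] args) {
        String s = "a,b,c,d";
        System.out.println(Arrays.toString(s.split(",")));    // [a, b, c, d]
        // limit = 2: at most 2 pieces; the last piece keeps the rest of the input
        System.out.println(Arrays.toString(s.split(",", 2))); // [a, b,c,d]
        // limit 0 discards trailing empty strings, a negative limit keeps them
        System.out.println(Arrays.toString("a,b,,".split(",")));     // [a, b]
        System.out.println(Arrays.toString("a,b,,".split(",", -1))); // [a, b, , ]
    }
}
```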

7）Replacing with a regex: replaceFirst() replaces only the first match, while replaceAll() replaces every match;

【Example: regex replacement with replaceFirst() and replaceAll()】

/* Regex replacement with replaceFirst() and replaceAll() */
public class Replacing {
    static String s = Splitting.knights;

    public static void main(String[] args) {
        System.out.println(s);
        // Replace the first word starting with 'y' with "Tom" (only once)
        System.out.println(s.replaceFirst("y\\w+", "Tom")); // String.replaceFirst example
        System.out.println();
        // Replace every shrubbery, tree or herring with banana
        System.out.println(s.replaceAll("shrubbery|tree|herring", "banana")); // String.replaceAll example
    }
}
/*
Then, when you have found the shrubbery, you must cut down the mightiest tree in the forest... with... a herring!
Then, when Tom have found the shrubbery, you must cut down the mightiest tree in the forest... with... a herring!

Then, when you have found the banana, you must cut down the mightiest banana in the forest... with... a banana!
*/

【13.6.2】Creating regular expressions

1）A regex is built from characters, character classes, logical operators and boundary matchers;
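A few of these building blocks in action, as a sketch with a hypothetical RegexBuildingBlocks class; String.matches() requires the whole string to match, while replaceAll() rewrites every match.

```java
public class RegexBuildingBlocks {
    public static void main(String[] args) {
        System.out.println("bat".matches("[bcr]at"));  // true:  character class, one of b, c, r
        System.out.println("mat".matches("[bcr]at"));  // false
        System.out.println("mat".matches("[^bcr]at")); // true:  negated class, anything but b, c, r
        System.out.println("a1".matches("[a-z]\\d"));  // true:  a range, then the predefined class \d
        System.out.println("x y".matches("x\\sy"));    // true:  \s is a whitespace character
        System.out.println("dog".matches("cat|dog"));  // true:  logical operator X|Y
        // Boundary matcher \b: "cat" is replaced only where a word starts with it
        System.out.println("cat scatter".replaceAll("\\bcat", "X")); // X scatter
    }
}
```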

【Example: matching a character sequence against several regexes】

// Pattern matching with regular expressions
public class Rudolph {
    public static void main(String[] args) {
        for (String pattern : new String[] { "Rudolph", "[rR]udolph",
                "[rR][aeiou][a-z]ol.*", "R.*" })
            System.out.println("Rudolph".matches(pattern)); // all true: each pattern matches the whole string
    }
}
/*
Source of String.matches(String regex)
public boolean matches(String regex) {
    return Pattern.matches(regex, this);
}
*/

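【13.6.3】Quantifiers

1）A quantifier describes how a pattern consumes input text:

Greedy (the default, e.g. X*): matches as much as possible;

Reluctant (a trailing ?, e.g. X*?): matches as little as possible;

Possessive (a trailing +, e.g. X*+): matches as much as possible and never backtracks to give characters back;

【Note】The expression X usually has to be wrapped in parentheses for the quantifier to apply to all of it, e.g. (abc)+ rather than abc+;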
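The three quantifier kinds behave differently on the same input. In this sketch (the firstMatch() helper is hypothetical), .+ is greedy, .+? reluctant and .++ possessive; the possessive version finds nothing, because .++ swallows the final > and never backtracks to release it. The last two lines show why X needs parentheses.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Quantifiers {
    // Returns the first match of regex in input, or null if there is none
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<a><b>";
        System.out.println(firstMatch("<.+>", input));  // <a><b>  greedy
        System.out.println(firstMatch("<.+?>", input)); // <a>     reluctant
        System.out.println(firstMatch("<.++>", input)); // null    possessive
        System.out.println("abcabc".matches("abc+"));   // false: + repeats only the c
        System.out.println("abcabc".matches("(abc)+")); // true
    }
}
```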

【CharSequence】The CharSequence interface abstracts a general definition of a character sequence out of the CharBuffer, String, StringBuffer and StringBuilder classes; most regex operations accept CharSequence arguments.

/* Source of the CharSequence interface (Java 8) */
public interface CharSequence {
    int length();

    char charAt(int index);

    CharSequence subSequence(int start, int end);

    public String toString();

    public default IntStream chars() {
        class CharIterator implements PrimitiveIterator.OfInt {
            int cur = 0;

            public boolean hasNext() {
                return cur < length();
            }

            public int nextInt() {
                if (hasNext()) {
                    return charAt(cur++);
                } else {
                    throw new NoSuchElementException();
                }
            }

            @Override
            public void forEachRemaining(IntConsumer block) {
                for (; cur < length(); cur++) {
                    block.accept(charAt(cur));
                }
            }
        }

        return StreamSupport.intStream(() ->
                Spliterators.spliterator(
                        new CharIterator(),
                        length(),
                        Spliterator.ORDERED),
                Spliterator.SUBSIZED | Spliterator.SIZED | Spliterator.ORDERED,
                false);
    }

    public default IntStream codePoints() {
        class CodePointIterator implements PrimitiveIterator.OfInt {
            int cur = 0;

            @Override
            public void forEachRemaining(IntConsumer block) {
                final int length = length();
                int i = cur;
                try {
                    while (i < length) {
                        char c1 = charAt(i++);
                        if (!Character.isHighSurrogate(c1) || i >= length) {
                            block.accept(c1);
                        } else {
                            char c2 = charAt(i);
                            if (Character.isLowSurrogate(c2)) {
                                i++;
                                block.accept(Character.toCodePoint(c1, c2));
                            } else {
                                block.accept(c1);
                            }
                        }
                    }
                } finally {
                    cur = i;
                }
            }

            public boolean hasNext() {
                return cur < length();
            }

            public int nextInt() {
                final int length = length();

                if (cur >= length) {
                    throw new NoSuchElementException();
                }
                char c1 = charAt(cur++);
                if (Character.isHighSurrogate(c1) && cur < length) {
                    char c2 = charAt(cur);
                    if (Character.isLowSurrogate(c2)) {
                        cur++;
                        return Character.toCodePoint(c1, c2);
                    }
                }
                return c1;
            }
        }

        return StreamSupport.intStream(() ->
                Spliterators.spliteratorUnknownSize(
                        new CodePointIterator(),
                        Spliterator.ORDERED),
                Spliterator.ORDERED,
                false);
    }
}
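Because Pattern.matcher() takes a CharSequence, the same regex code works with any of the implementing classes named above. A sketch with a hypothetical containsDigits() helper:

```java
import java.nio.CharBuffer;
import java.util.regex.Pattern;

public class CharSequenceDemo {
    // Accepts any CharSequence, just as Pattern.matcher(CharSequence) does
    public static boolean containsDigits(CharSequence seq) {
        return Pattern.compile("\\d+").matcher(seq).find();
    }

    public static void main(String[] args) {
        System.out.println(containsDigits("abc123"));                     // true  (String)
        System.out.println(containsDigits(new StringBuilder("abc")));     // false (StringBuilder)
        System.out.println(containsDigits(new StringBuffer("x9")));       // true  (StringBuffer)
        System.out.println(containsDigits(CharBuffer.wrap("no digits"))); // false (CharBuffer)
    }
}
```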

【13.6.4】Pattern and Matcher

1）How to build more powerful regex objects:

step1: Pattern.compile(regex) compiles the regex and produces a Pattern object;

step2: pattern.matcher(input) produces a Matcher object for the string to be searched;

2）A Matcher object has many methods, for example:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Testing regular expressions with Pattern and Matcher
public class TestRegularExpression {
    public static void main(String[] args) {
        String[] array = { "aabbcc", "aab", "aab+", "(b+)" };
        for (String arg : array) {
            System.out.println();
            System.out.println("Regular expression: \"" + arg + "\"");
            // step1: a Pattern is the compiled form of the regex
            Pattern p = Pattern.compile(arg);
            // step2: the Pattern searches the input string and produces a Matcher, which has many methods
            Matcher m = p.matcher("aabbcc");
            while (m.find()) {
                System.out.println("Match \"" + m.group() // the matched substring
                        + "\" at positions " + m.start() // index where the match starts
                        + "-" + (m.end() - 1)); // index of the last matched character
            }
        }
    }
}
/*
Regular expression: "aabbcc"
Match "aabbcc" at positions 0-5

Regular expression: "aab"
Match "aab" at positions 0-2

Regular expression: "aab+"
Match "aabb" at positions 0-3

Regular expression: "(b+)"
Match "bb" at positions 2-3
*/

【Code notes】A Pattern object represents a compiled regex; it is a more capable regex object than a plain String;

【Pattern: the compiled regex】

1）Pattern also provides a static method, Pattern.matches(regex, input). Internally it calls Pattern.compile(regex) to produce a Pattern object, calls matcher(input) on it to produce a Matcher object, and finally returns matcher.matches(), i.e. whether the entire input matches the regex;

// Source of Pattern.matches()
public static boolean matches(String regex, CharSequence input) {
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(input);
    return m.matches();
}

// Source of Pattern.compile()
public static Pattern compile(String regex) {
    return new Pattern(regex, 0);
}

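2）Some Pattern methods:

split(): splits the input around the matches of the regex and returns the pieces as a String array;

pattern(): returns the regex string this Pattern was compiled from;

3）Some Matcher methods:

boolean matches(); // does the entire input match the regex?
boolean lookingAt(); // does the beginning of the input (not necessarily all of it) match the regex?
boolean find(); // finds the next match in the input CharSequence; call it repeatedly to find multiple matches
boolean find(int start); // resets the matcher and searches for the next match starting at index start
String group(); // returns the input subsequence matched by the previous match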
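The difference between matches(), lookingAt() and find() shows up clearly on inputs that only partly match. A sketch with a hypothetical MatcherChecks class:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherChecks {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("[a-z]+");
        Matcher m = p.matcher("abc123");
        System.out.println(m.matches());    // false: the digits are not covered
        System.out.println(m.lookingAt());  // true:  "abc" matches at the beginning
        Matcher m2 = p.matcher("123abc");
        System.out.println(m2.lookingAt()); // false: the input starts with digits
        System.out.println(m2.find());      // true:  "abc" is found further on
        System.out.println(m2.group());     // abc
    }
}
```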

【Example: Matcher.find()】

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Finding {
    public static void main(String[] args) {
        // step1: compile the regex into a Pattern object
        // step2: the Pattern searches the input string and produces a Matcher
        Matcher m = Pattern.compile("\\w+") // (lowercase w) matches word characters
                .matcher("Evening is full of the linnet's wings");
        while (m.find())
            System.out.print(m.group() + ", ");
        System.out.println("\n======");
        int i = 0;
        while (m.find(i)) {
            System.out.print(m.group() + " \n");
            i++;
        }
    }
}
/*
Evening, is, full, of, the, linnet, s, wings,
======
Evening
vening
ening
ning
ing
ng
g
is
is
s
full
full
ull
ll
l
of
of
f
the
the
he
e
linnet
linnet
innet
nnet
net
et
t
s
s
wings
wings
ings
ngs
gs
s
*/

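【Code notes】The pattern \\w+ splits the input into words. find() moves forward through the input, each call starting where the previous match ended; find(int start) resets the matcher and starts searching at index start, which is why the second loop prints each word again from every starting index;

4）Groups: a group is a part of the regex enclosed in parentheses, and it can be referenced by its group number. Group 0 is the whole regex match, group 1 is the first parenthesized group, and so on;

【Example: groups】

A(B(C))D has 3 groups:

group 0: ABCD;

group 1: BC;

group 2: C.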
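The A(B(C))D example can be verified directly. Note that groupCount() returns 2, because it does not count group 0. A sketch with a hypothetical NestedGroups class:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NestedGroups {
    public static void main(String[] args) {
        Matcher m = Pattern.compile("A(B(C))D").matcher("ABCD");
        if (m.matches()) {
            System.out.println(m.groupCount()); // 2: group 0 is not counted
            for (int i = 0; i <= m.groupCount(); i++)
                System.out.println("group " + i + ": " + m.group(i));
            // group 0: ABCD
            // group 1: BC
            // group 2: C
        }
    }
}
```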

【Example: regex groups】

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Groups {
    static public final String POEM =
            "Twas brillig, and the slithy toves\n"
            + "Did gyre and gimble in the wabe.\n"
            + "All mimsy were the borogoves,\n"
            + "And the mome raths outgrabe.\n\n"
            + "Beware the Jabberwock, my son,\n"
            + "The jaws that bite, the claws that catch.\n"
            + "Beware the Jubjub bird, and shun\n"
            + "The frumious Bandersnatch.";

    public static void main(String[] args) {
        // \S: non-whitespace, \s: whitespace; the parenthesized parts are groups.
        // Goal: capture the last 3 words of each line, where $ marks the end of a line.
        // (?m) is an embedded flag (MULTILINE) that makes $ match at every line terminator in the input
        Matcher m = Pattern.compile("(?m)(\\S+)\\s+((\\S+)\\s+(\\S+))$")
                .matcher(POEM); // run the regex over the input string POEM
        while (m.find()) {
            for (int j = 0; j <= m.groupCount(); j++)
                System.out.print("[" + m.group(j) + "]");
            System.out.println();
        }
    }
}
/*
[the slithy toves][the][slithy toves][slithy][toves]
[in the wabe.][in][the wabe.][the][wabe.]
[were the borogoves,][were][the borogoves,][the][borogoves,]
[mome raths outgrabe.][mome][raths outgrabe.][raths][outgrabe.]
[Jabberwock, my son,][Jabberwock,][my son,][my][son,]
[claws that catch.][claws][that catch.][that][catch.]
[bird, and shun][bird,][and shun][and][shun]
[The frumious Bandersnatch.][The][frumious Bandersnatch.][frumious][Bandersnatch.]
*/

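5）start() and end():

5.1）Return values: start() returns the index where the previous match began, and end() returns the index of the last matched character plus one;

5.2）If the previous match operation failed (or no match has been attempted yet), calling start() or end() throws IllegalStateException;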
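A sketch of both points (hypothetical class StartEndSketch), using the first input line of the StartEnd example; calling start() after a failed find() triggers the exception.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StartEndSketch {
    public static void main(String[] args) {
        String input = "As long as there is injustice, whenever a";
        Matcher m = Pattern.compile("there").matcher(input);
        if (m.find()) {
            System.out.println(m.start()); // 11: index of the 't'
            System.out.println(m.end());   // 16: index of the last 'e', plus one
        }
        Matcher none = Pattern.compile("xyz").matcher(input);
        System.out.println(none.find()); // false
        try {
            none.start(); // the previous match operation failed
        } catch (IllegalStateException e) {
            System.out.println("IllegalStateException");
        }
    }
}
```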

【Example: Matcher methods】

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StartEnd {
    public static String input =
            "As long as there is injustice, whenever a\n"
            + "Targathian baby cries out, wherever a distress\n"
            + "signal sounds among the stars ... We'll be there.\n"
            + "This fine ship, and this fine crew ...\n"
            + "Never give up! Never surrender!";

    private static class Display {
        private boolean regexPrinted = false;
        private String regex;

        Display(String regex) {
            this.regex = regex;
        }

        void display(String message) {
            if (!regexPrinted) {
                // System.out.println(regex);
                regexPrinted = true;
            }
            System.out.println(message);
        }
    }

    /* Check how the input string s matches regex */
    static void examine(String s, String regex) {
        Display d = new Display(regex);
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(s);
        /* find() scans the input; each search starts where the previous match ended */
        while (m.find())
            /* Matcher.group() returns the substring that matched the regex */
            d.display("find() '" + m.group() + "' start = " + m.start()
                    + " end = " + m.end());
        /* Does the beginning of the input match the regex? */
        if (m.lookingAt()) { // No reset() necessary
            System.out.println("\n m.lookingAt() : ");
            d.display("lookingAt() start = " + m.start() + " end = " + m.end());
        }
        /* Does the entire input match the regex? */
        if (m.matches()) // No reset() necessary
            d.display("matches() start = " + m.start() + " end = " + m.end());
    }

    public static void main(String[] args) {
        int i = 0;
        for (String in : input.split("\n")) {
            System.out.println("[" + ++i + "]====================================");
            System.out.println("input : " + in);
            int j = 0;
            for (String regex : new String[] { "\\w*ere\\w*", "\\w*ever",
                    "T\\w+", "Never.*?!" }) {
                System.out.println("regex" + ++j + " = " + regex);
                examine(in, regex);
            }
        }
    }
}
/*
[1]====================================
input : As long as there is injustice, whenever a
regex1 = \w*ere\w*
find() 'there' start = 11 end = 16
regex2 = \w*ever
find() 'whenever' start = 31 end = 39
regex3 = T\w+
regex4 = Never.*?!
[2]====================================
input : Targathian baby cries out, wherever a distress
regex1 = \w*ere\w*
find() 'wherever' start = 27 end = 35
regex2 = \w*ever
find() 'wherever' start = 27 end = 35
regex3 = T\w+
find() 'Targathian' start = 0 end = 10

 m.lookingAt() :
lookingAt() start = 0 end = 10
regex4 = Never.*?!
[3]====================================
input : signal sounds among the stars ... We'll be there.
regex1 = \w*ere\w*
find() 'there' start = 43 end = 48
regex2 = \w*ever
regex3 = T\w+
regex4 = Never.*?!
[4]====================================
input : This fine ship, and this fine crew ...
regex1 = \w*ere\w*
regex2 = \w*ever
regex3 = T\w+
find() 'This' start = 0 end = 4

 m.lookingAt() :
lookingAt() start = 0 end = 4
regex4 = Never.*?!
[5]====================================
input : Never give up! Never surrender!
regex1 = \w*ere\w*
regex2 = \w*ever
find() 'Never' start = 0 end = 5
find() 'Never' start = 15 end = 20

 m.lookingAt() :
lookingAt() start = 0 end = 5
regex3 = T\w+
regex4 = Never.*?!
find() 'Never give up!' start = 0 end = 14
find() 'Never surrender!' start = 15 end = 31

 m.lookingAt() :
lookingAt() start = 0 end = 14
matches() start = 0 end = 31
*/

【代码解说-Matcher方法列表】

1）find()： 从输入字符串的任意位置匹配 regex；而 find(int start) ：从输入字符串的第start字符开始匹配 regex；

2）lookingAt()： 判断输入字符串是否从最开始处就匹配 regex；

3）matches()： 判断整个输入字符串是否匹配 regex；

【Pattern标记】

1）Pattern.compile() 方法的重载版本：该方法可以调整 regex 的匹配行为：

// Pattern.compile(String, int) 源码public static Pattern compile(String regex, int flags) {return new Pattern(regex, flags);}

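2）The flags argument selects the matching behavior and must be built from the Pattern class constants. Several flags can be combined with the bitwise OR operator |;

3）Commonly used Pattern flags:

3.1）Pattern.CASE_INSENSITIVE: case-insensitive matching (by default only US-ASCII letters are case-folded);

3.2）Pattern.MULTILINE: multiline mode. ^ and $ match at the start and end of each line (just after or before a line terminator), instead of only at the start and end of the whole input;

3.3）Pattern.COMMENTS: whitespace in the pattern is ignored, and embedded comments starting with # are ignored up to the end of the line.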
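Here is a minimal sketch of COMMENTS and of the embedded-flag syntax. `(?i)`, `(?m)` and `(?x)` are the inline equivalents of CASE_INSENSITIVE, MULTILINE and COMMENTS, so `(?im)` turns on the first two from inside the pattern string itself. The class and method names are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch of the COMMENTS flag and of embedded flag expressions.
public class FlagsDemo {
    // With COMMENTS, whitespace in the pattern is ignored and '#' starts a comment,
    // so this pattern is equivalent to "\\d{4}-\\d{2}".
    static final Pattern YEAR_MONTH = Pattern.compile(
            "\\d{4}   # year\n" +
            "-        # separator\n" +
            "\\d{2}   # month",
            Pattern.COMMENTS);

    static boolean isYearMonth(String s) {
        return YEAR_MONTH.matcher(s).matches();
    }

    // Counts the matches of p in text.
    static int count(Pattern p, String text) {
        Matcher m = p.matcher(text);
        int n = 0;
        while (m.find()) n++;
        return n;
    }

    public static void main(String[] args) {
        String text = "java has regex\nJava has regex\n"
                + "JAVA has pretty good regular expressions\n"
                + "Regular expressions are in Java";
        System.out.println(isYearMonth("2017-11"));   // true
        System.out.println(isYearMonth("2017 - 11")); // false: whitespace in the INPUT still counts
        // (?im) = CASE_INSENSITIVE | MULTILINE written inside the pattern
        System.out.println(count(Pattern.compile("(?im)^java"), text)); // 3
        // Without MULTILINE, ^ only matches at the very start of the input
        System.out.println(count(Pattern.compile("(?i)^java"), text));  // 1
    }
}
```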

【Example: Pattern flags】

/* Example of Pattern flags */
import java.util.regex.*;

public class ReFlags {
    public static void main(String[] args) {
        // Pattern.CASE_INSENSITIVE: case-insensitive matching
        // | Pattern.MULTILINE: ^ matches at the start of every line, not only at the start of the input
        Pattern p = Pattern.compile("^java", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
        Matcher m = p.matcher("java has regex\nJava has regex\n"
                + "JAVA has pretty good regular expressions\n"
                + "Regular expressions are in Java");
        /* Matcher.find() searches for the next match anywhere in the input */
        while (m.find())
            System.out.println(m.group()); // m.group() returns the substring that matched regex
    }
}
/*
java
Java
JAVA
*/

【Note】A Pattern object represents a compiled regex.
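Compiling is the expensive step, so a Pattern is usually compiled once and reused. According to the Pattern documentation, Pattern objects are immutable and safe to share between threads, while each Matcher holds per-search state and is not. The static `Pattern.matches(regex, input)` is a shortcut that compiles the regex on every call. Here is a sketch with a hypothetical `isIdentifier` helper:

```java
import java.util.regex.Pattern;

// Illustrative sketch: compile once, create a fresh Matcher per input.
public class CompiledPatternDemo {
    // Compiled a single time and shared; Pattern is immutable.
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    static boolean isIdentifier(String s) {
        return IDENTIFIER.matcher(s).matches(); // a new Matcher for each call
    }

    public static void main(String[] args) {
        System.out.println(isIdentifier("regexPrinted")); // true
        System.out.println(isIdentifier("2fast"));        // false
        // One-off convenience: compiles the regex on every call
        System.out.println(Pattern.matches("[A-Za-z_][A-Za-z0-9_]*", "abc")); // true
    }
}
```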

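【13.6.5】The Pattern.split() method

1）Pattern.split() splits the input string into an array of String objects. The split boundaries are the matches of regex, and they are removed from the result;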

// Pattern.split(CharSequence input) source
public String[] split(CharSequence input) {
    return split(input, 0);
}
// Pattern.split(CharSequence input, int limit) source
public String[] split(CharSequence input, int limit) {
    int index = 0;
    boolean matchLimited = limit > 0;
    ArrayList<String> matchList = new ArrayList<>();
    Matcher m = matcher(input);

    // Add segments before each match found
    while(m.find()) {
        if (!matchLimited || matchList.size() < limit - 1) {
            if (index == 0 && index == m.start() && m.start() == m.end()) {
                // no empty leading substring included for zero-width match
                // at the beginning of the input char sequence.
                continue;
            }
            String match = input.subSequence(index, m.start()).toString();
            matchList.add(match);
            index = m.end();
        } else if (matchList.size() == limit - 1) { // last one
            String match = input.subSequence(index,
                                             input.length()).toString();
            matchList.add(match);
            index = m.end();
        }
    }

    // If no match was found, return this
    if (index == 0)
        return new String[] {input.toString()};

    // Add remaining segment
    if (!matchLimited || matchList.size() < limit)
        matchList.add(input.subSequence(index, input.length()).toString());

    // Construct result
    int resultSize = matchList.size();
    if (limit == 0)
        while (resultSize > 0 && matchList.get(resultSize-1).equals(""))
            resultSize--;
    String[] result = new String[resultSize];
    return matchList.subList(0, resultSize).toArray(result);
}

【Example: splitting an input string with Pattern.split()】

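// Test case for Pattern.split()
import java.util.Arrays;
import java.util.regex.Pattern;
import static net.mindview.util.Print.*;

public class SplitDemo {
    public static void main(String[] args) {
        String input = "This!!unusual use!!of exclamation!!points";
        // split(input) is split(input, 0): no limit on the number of pieces.
        // Note: the delimiters are removed from the result.
        print(Arrays.toString(Pattern.compile("!!").split(input)));
        // limit = 3 caps the length of the resulting array, so only the first 2 "!!" are used
        print(Arrays.toString(Pattern.compile("!!").split(input, 3)));
    }
}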
/*
[This, unusual use, of exclamation, points]
[This, unusual use, of exclamation!!points]
*/
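The `limit` argument behaves differently depending on its sign. The source above encodes this in `matchLimited` and in the final loop that trims trailing empty strings only when `limit == 0`. Here is a small sketch; the input string and class name are made up:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Illustrative sketch of Pattern.split(input, limit) for zero, negative and positive limits.
public class SplitLimitDemo {
    private static final Pattern COMMA = Pattern.compile(",");

    static String[] split(String input, int limit) {
        return COMMA.split(input, limit);
    }

    public static void main(String[] args) {
        String input = "a,b,,c,,";
        // limit 0: as many pieces as possible, trailing empty strings dropped
        System.out.println(Arrays.toString(split(input, 0)));  // [a, b, , c]
        // negative limit: as many pieces as possible, trailing empty strings kept
        System.out.println(Arrays.toString(split(input, -1))); // [a, b, , c, , ]
        // positive limit n: at most n pieces, the last one holds the rest of the input
        System.out.println(Arrays.toString(split(input, 2)));  // [a, b,,c,,]
    }
}
```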

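【13.6.6】Replace operations

1）Example of Matcher.appendReplacement() and Matcher.appendTail(). The program reads its own source file. Using DOTALL and a greedy (.*), it extracts the text between the first /*! and the last !*/ in that file, which is the two comment lines right after TextFile.read(). Those two lines are the program's sample input, so they are kept in the original Chinese to match the output below. They read "match all text between /*! and !*/" and "e.g. /*! today, 26 November 2017, i love you. !*/".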

import java.util.regex.*;
import net.mindview.util.TextFile;
import static net.mindview.util.Print.*;

// MyConstant.path is the author's own constant holding the directory of the source files.
public class TheReplacements {
    public static void main(String[] args) throws Exception {
        String s = TextFile.read(MyConstant.path + "TheReplacements.java");
        //  匹配在 /*! 和 !*/ 之间的所有文字。
        //              如 /*! 今天     2017    年11月26日 , i love you. !*/
        Matcher mInput = Pattern.compile("/\\*!(.*)!\\*/", Pattern.DOTALL).matcher(s);
        if (mInput.find()) {
            s = mInput.group(1); // Captured by parentheses
            System.out.println("matched.");
            System.out.println("s1 = " + s);
        }
        // Replace two or more spaces with a single space (tab characters \t are not affected):
        s = s.replaceAll(" {2,}", " ");
        System.out.println("after s.replaceAll(\" {2,}\", \" \"), s2 = " + s);
        // Replace one or more spaces at the beginning of each line with no spaces. Must enable MULTILINE mode:
        s = s.replaceAll("(?m)^ +", "");
        System.out.println("after s = s.replaceAll(\"(?m)^ +\", \"\"), s3 = " + s);
        // Replace the first vowel matched with "(Tr)"; this calls String.replaceFirst()
        s = s.replaceFirst("[aeiou]", "(Tr)");
        System.out.println("after s.replaceFirst(\"[aeiou]\", \"(Tr)\"), s4 = " + s);
        /* Build the pattern, i.e. the compiled regex */
        StringBuffer sbuf = new StringBuffer();
        Pattern p = Pattern.compile("[aeiou]");
        Matcher m = p.matcher(s);
        // Process the find information as you perform the replacements:
        while (m.find())
            m.appendReplacement(sbuf, m.group().toUpperCase()); // upper-case each vowel found
        print("s = " + s); // s itself is unchanged
        // Put in the remainder of the text:
        m.appendTail(sbuf); // copy the rest of the input into sbuf
        print(sbuf); // sbuf now holds s with every matched vowel upper-cased
    }
}

// Output:
matched.
s1 =  和 !*/ 之间的所有文字。//              如 /*! 今天     2017    年11月26日 , i love you.
after s.replaceAll(" {2,}", " "), s2 =  和 !*/ 之间的所有文字。// 如 /*! 今天 2017 年11月26日 , i love you.
after s = s.replaceAll("(?m)^ +", ""), s3 = 和 !*/ 之间的所有文字。// 如 /*! 今天 2017 年11月26日 , i love you.
after s.replaceFirst("[aeiou]", "(Tr)"), s4 = 和 !*/ 之间的所有文字。// 如 /*! 今天 2017 年11月26日 , (Tr) love you.
s = 和 !*/ 之间的所有文字。// 如 /*! 今天 2017 年11月26日 , (Tr) love you.
和 !*/ 之间的所有文字。// 如 /*! 今天 2017 年11月26日 , (Tr) lOvE yOU.

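【Code notes】Methods used:

1）Matcher.appendReplacement(StringBuffer sb, String replacement): appends to sb the input text between the end of the previous match and the start of the current one, followed by replacement. The replacement can be computed from the current match, and inside it $n refers to captured group n;

2）Matcher.appendTail(StringBuffer sb): after one or more calls to appendReplacement(), call appendTail() to copy the rest of the input string into sb;

3）Matcher.replaceFirst(String replacement): implemented as one call to appendReplacement() followed by one call to appendTail():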

// Matcher.replaceFirst() source
public String replaceFirst(String replacement) {
    if (replacement == null)
        throw new NullPointerException("replacement");
    reset();
    if (!find())
        return text.toString();
    StringBuffer sb = new StringBuffer();
    appendReplacement(sb, replacement);
    appendTail(sb);
    return sb.toString();
}

4）Matcher.replaceAll(String replacement): replaces every part of the input that matches regex with replacement. It is implemented as repeated calls to appendReplacement() followed by one call to appendTail():

// Matcher.replaceAll() source
public String replaceAll(String replacement) {
    reset();
    boolean result = find();
    if (result) {
        StringBuffer sb = new StringBuffer();
        do {
            appendReplacement(sb, replacement);
            result = find();
        } while (result);
        appendTail(sb);
        return sb.toString();
    }
    return text.toString();
}

5）The statement s = s.replaceFirst("[aeiou]", "(Tr)"); calls String.replaceFirst(). As its source shows, that method simply delegates to Matcher.replaceFirst():

// String.replaceFirst() source
public String replaceFirst(String regex, String replacement) {
    return Pattern.compile(regex).matcher(this).replaceFirst(replacement);
}
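The appendReplacement()/appendTail() loop earns its keep when each replacement is computed from the match, which replaceAll() with a fixed string cannot do. One caveat: inside the replacement string, `$` refers to a captured group and `\` is an escape character. Literal text containing them should therefore go through `Matcher.quoteReplacement()`. Here is a sketch with hypothetical helper names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: computed replacements with appendReplacement()/appendTail().
public class AppendReplacementDemo {
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Doubles every integer in the text.
    static String doubleNumbers(String text) {
        Matcher m = NUMBER.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int doubled = Integer.parseInt(m.group()) * 2;
            m.appendReplacement(sb, String.valueOf(doubled));
        }
        m.appendTail(sb); // copy the text after the last match
        return sb.toString();
    }

    // Prefixes every integer with a literal '$'. Without quoteReplacement(),
    // "$3" would be read as a reference to capturing group 3 and fail.
    static String addDollarSigns(String text) {
        Matcher m = NUMBER.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, Matcher.quoteReplacement("$" + m.group()));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("3 apples and 12 pears")); // 6 apples and 24 pears
        System.out.println(addDollarSigns("tea 3, cake 12"));       // tea $3, cake $12
    }
}
```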

【13.6.7】reset()

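1）reset(CharSequence input) applies an existing Matcher object to a new character sequence. reset() with no arguments moves it back to the beginning of its current sequence;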

// reset() sets a new input string for the Matcher
import java.util.regex.*;

public class Resetting {
    public static void main(String[] args) throws Exception {
        // Input string: "fix the rug with bags"
        Matcher m = Pattern.compile("[frb][aiu][gx]").matcher("fix the rug with bags");
        while (m.find()) {
            System.out.println(m.group() + " " + m.start() + " to " + m.end());
        }
        System.out.println("\nafter m.reset(\"fix the rig with rags\") by regex [frb][aiu][gx]");
        /* Matcher.reset(CharSequence) discards the matcher's state and points it at the new input */
        m.reset("fix the rig with rags");
        while (m.find())
            System.out.println(m.group() + " " + m.start() + " to " + m.end());
    }
}
/*
fix 0 to 3
rug 8 to 11
bag 17 to 20

after m.reset("fix the rig with rags") by regex [frb][aiu][gx]
fix 0 to 3
rig 8 to 11
rag 17 to 20
*/
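The no-argument `reset()` is useful when the same input has to be scanned again, because once `find()` has returned false, further `find()` calls keep returning false. Here is a sketch; the helper `countRemaining` is made up:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: reset() versus reset(CharSequence).
public class ResetDemo {
    // Counts the matches left from the matcher's current position.
    static int countRemaining(Matcher m) {
        int n = 0;
        while (m.find()) n++;
        return n;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("[frb][aiu][gx]").matcher("fix the rug with bags");
        System.out.println(countRemaining(m)); // 3
        System.out.println(countRemaining(m)); // 0: already exhausted
        m.reset();                             // back to the start of the same input
        System.out.println(countRemaining(m)); // 3
        m.reset("big bug fog");                // a new input sequence
        System.out.println(countRemaining(m)); // 2 ("big", "bug"; "fog" has no a/i/u)
    }
}
```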

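【13.6.8】Regular expressions and Java I/O

1）How do you apply a regex to search for matches in a file?

2）The Unix grep command takes a regex and a file name and prints the lines that match. The JGrep example follows the same idea: it takes two arguments, a file name and a regex, and prints each match together with its position in the line;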

// Example of Matcher.reset(str)
import java.util.regex.*;
import net.mindview.util.TextFile;

public class JGrep {
    public static void main(String[] args) throws Exception {
        // The command-line arguments are overwritten here for demonstration purposes
        args = new String[2];
        args[1] = "[a-z]+"; // the regex
        Pattern p = Pattern.compile(args[1]);
        // Iterate through the lines of the input file:
        int index = 1;
        Matcher m = p.matcher(""); // start with an empty input string
        // Path of the file to search (MyConstant is the author's own helper class)
        args[0] = MyConstant.path + "JGrep.java";
        for (String line : new TextFile(args[0])) {
            /* Matcher.reset(line) points the same Matcher at the next line */
            m.reset(line);
            while (m.find())
                System.out.println(index++ + " , " + m.group() + " , start = " + m.start());
        }
    }
}

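【Code notes】A single Matcher with an empty input is created before the loop. Inside the for loop, Matcher.reset() loads one line of input at a time. Reusing one Matcher this way avoids creating a new Matcher object for every line, which gives some performance benefit.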
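The same one-Matcher-per-search approach can be tried without the book's `TextFile` helper or a file on disk. This sketch (the class `MiniGrep` and its "line:start:match" output format are hypothetical) reads lines from a `BufferedReader` over an in-memory string. It re-targets one Matcher at each line with `reset(line)`:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative grep-like sketch: one Matcher, re-targeted at each line with reset(line).
public class MiniGrep {
    // Returns "lineNumber:start:match" for every match; line numbers start at 1.
    static List<String> grep(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(""); // created once, outside the loop
        List<String> hits = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String line;
            int lineNo = 0;
            while ((line = in.readLine()) != null) {
                lineNo++;
                m.reset(line); // load the next line into the same Matcher
                while (m.find()) {
                    hits.add(lineNo + ":" + m.start() + ":" + m.group());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with an in-memory StringReader
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "fix the rug\nno match here\nbags and rags";
        for (String hit : grep("[frb][aiu][gx]", text)) {
            System.out.println(hit); // 1:0:fix, 1:8:rug, 3:0:bag, 3:9:rag (one per line)
        }
    }
}
```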

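【13.7】Scanning input

1）How do you read data from a file or from standard input? The traditional approach is to read a line of text, split it into tokens, and then parse the tokens with the parsing methods of Integer, Double, and so on;

【Example: traditional input scanning with BufferedReader】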

// Example of traditional input scanning
import java.io.*;

public class SimpleRead {
    public static BufferedReader input = new BufferedReader(
            new StringReader("Sir Robin of Camelot\n22 1.61803"));

    public static void main(String[] args) {
        try {
            System.out.println("\n1.What is your name?");
            String name = input.readLine();
            System.out.println(name); // Sir Robin of Camelot
            System.out.println("\n2.input: <age> <double>");
            String numbers = input.readLine();
            System.out.println("input.readLine() = " + numbers); // 22 1.61803
            String[] numArray = numbers.split(" ");
            int age = Integer.parseInt(numArray[0]); // 22
            double favorite = Double.parseDouble(numArray[1]); // 1.61803
            System.out.format("Hi %s.\n", name);
            System.out.format("In 5 years you will be %d.\n", age + 5);
            System.out.format("My favorite double is %f.", favorite / 2);
        } catch (IOException e) {
            System.err.println("I/O exception");
        }
    }
}
/*

1.What is your name?
Sir Robin of Camelot

2.input: <age> <double>
input.readLine() = 22 1.61803
Hi Sir Robin of Camelot.
In 5 years you will be 27.
My favorite double is 0.809015.
*/

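【Code notes】The problem with this code is that when an integer and a double appear on the same line, the string must first be split by hand before each part can be parsed;

That is why Java SE5 added the Scanner class, which takes much of the burden out of scanning input;

【Example: scanning input with Scanner, introduced in Java SE5】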

// Scanning input with Scanner
import java.util.Scanner;

public class BetterRead {
    public static void main(String[] args) {
        // Scanner accepts any kind of Readable input object
        Scanner stdin = new Scanner(SimpleRead.input);
        System.out.println("What is your name?");
        // All of the input, tokenizing and parsing is hidden behind the various next methods
        String name = stdin.nextLine(); // nextLine() returns a String
        System.out.println(name);
        System.out.println("How old are you? What is your favorite double?");
        System.out.println("(input: <age> <double>)");
        // Scanner reads the int and the double directly
        int age = stdin.nextInt();
        double favorite = stdin.nextDouble();
        System.out.println(age);
        System.out.println(favorite);
        System.out.format("Hi %s.\n", name);
        System.out.format("In 5 years you will be %d.\n", age + 5);
        System.out.format("My favorite double is %f.", favorite / 2);
    }
}
/*
What is your name?
Sir Robin of Camelot
How old are you? What is your favorite double?
(input: <age> <double>)
22
1.61803
Hi Sir Robin of Camelot.
In 5 years you will be 27.
My favorite double is 0.809015.
*/

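【Code notes】

1）The Scanner constructors accept almost any kind of input source. This includes any Readable object (Readable is an interface introduced in Java SE5 and implemented by BufferedReader, StringReader and other Readers), as well as File, InputStream, String, and so on;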
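Here is a quick sketch of the same parsing code fed from three different sources: a String, a Reader (which is a `Readable`), and an InputStream. The class and helper names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

// Illustrative sketch: one parsing routine, three kinds of Scanner input.
public class ScannerSources {
    // Sums the whitespace-separated integers until the first non-integer token.
    static int sum(Scanner sc) {
        int total = 0;
        while (sc.hasNextInt()) {
            total += sc.nextInt();
        }
        sc.close();
        return total;
    }

    static int sumFromString(String s) {
        return sum(new Scanner(s));                      // Scanner(String)
    }

    static int sumFromReader(String s) {
        return sum(new Scanner(new StringReader(s)));    // Scanner(Readable)
    }

    static int sumFromStream(String s) {
        return sum(new Scanner(                          // Scanner(InputStream, String charsetName)
                new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8)), "UTF-8"));
    }

    public static void main(String[] args) {
        String data = "1 2 3\n4";
        System.out.println(sumFromString(data)); // 10
        System.out.println(sumFromReader(data)); // 10
        System.out.println(sumFromStream(data)); // 10
    }
}
```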

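【13.7.1】Scanner delimiters (a delimiter is a separator; this concept is very important)

1）By default, a Scanner splits its input into tokens on whitespace, but you can also specify your own regex as the delimiter;

2）Example of Scanner.useDelimiter(regex) with a custom regex as the delimiter: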

// Use a regex to specify the delimiter the Scanner needs (lowercase \s = whitespace, uppercase \S = non-whitespace)
import java.util.Scanner;

public class ScannerDelimiter {
    public static void main(String[] args) {
        Scanner scanner = new Scanner("12, 42, 78, 99, 42");
        scanner.useDelimiter("\\s*,\\s*"); // a comma, with optional whitespace around it, is the delimiter
        while (scanner.hasNextInt())
            System.out.print(scanner.nextInt() + " ");
    }
}
/* Output:
12 42 78 99 42
*///:~
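Custom delimiters combine naturally with the typed next methods. This sketch (a hypothetical "name, age, height" record parsed in a class named `DelimiterDemo`) reads a comma-separated String, int and double, and reports the delimiter currently in use via `delimiter()`. It also pins the locale with `useLocale(Locale.US)`. `nextDouble()` parses according to the Scanner's locale, so a default locale that uses a decimal comma would reject `1.75`.

```java
import java.util.Locale;
import java.util.Scanner;

// Illustrative sketch: a custom delimiter with mixed token types.
public class DelimiterDemo {
    // Parses "name, age, height" and returns "name|age|height".
    static String parse(String record) {
        try (Scanner sc = new Scanner(record)) {
            sc.useDelimiter("\\s*,\\s*");
            sc.useLocale(Locale.US); // make nextDouble() expect '.' as the decimal separator
            String name = sc.next();
            int age = sc.nextInt();
            double height = sc.nextDouble();
            return name + "|" + age + "|" + height;
        }
    }

    // Returns the delimiter pattern after setting a custom one.
    static String delimiterAfterSetting(String regex) {
        try (Scanner sc = new Scanner("")) {
            sc.useDelimiter(regex);
            return sc.delimiter().pattern();
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("Robin, 22, 1.75"));           // Robin|22|1.75
        System.out.println(delimiterAfterSetting("\\s*,\\s*")); // \s*,\s*
    }
}
```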

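【13.7.2】Scanning with regular expressions

1）Besides scanning for primitive types, you can also scan for tokens that match your own regex;

【Example: using a regex to scan threat data recorded in a log file】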

// Scanning an input string with Scanner combined with regular expressions
import java.util.Scanner;
import java.util.regex.MatchResult;

public class ThreatAnalyzer {
    static String threatData =
            "58.27.82.161@02/10/2005\n" +
            "204.45.234.40@02/11/2005\n" +
            "58.27.82.161@02/11/2005\n" +
            "58.27.82.161@02/12/2005\n" +
            "58.27.82.161@02/12/2005\n" +
            "[Next log section with different data format]";

    public static void main(String[] args) {
        Scanner scanner = new Scanner(threatData);
        String pattern = "(\\d+[.]\\d+[.]\\d+[.]\\d+)@(\\d{2}/\\d{2}/\\d{4})"; // the regex
        /* Note the methods used here: hasNext(pattern), next(pattern), match(), MatchResult.group() */
        while (scanner.hasNext(pattern)) {
            scanner.next(pattern);
            MatchResult match = scanner.match();
            String ip = match.group(1);
            String date = match.group(2);
            // PrintStream.format() writes the formatted text to the stream (here the console) and returns that same PrintStream
            System.out.format("Threat on %s from %s\n", date, ip);
        }
    }
}
/*
Threat on 02/10/2005 from 58.27.82.161
Threat on 02/11/2005 from 204.45.234.40
Threat on 02/11/2005 from 58.27.82.161
Threat on 02/12/2005 from 58.27.82.161
Threat on 02/12/2005 from 58.27.82.161*/

// PrintStream.format() source
public PrintStream format(String format, Object ... args) {
    try {
        synchronized (this) {
            ensureOpen();
            if ((formatter == null)
                || (formatter.locale() != Locale.getDefault()))
                formatter = new Formatter((Appendable) this);
            formatter.format(Locale.getDefault(), format, args);
        }
    } catch (InterruptedIOException x) {
        Thread.currentThread().interrupt();
    } catch (IOException x) {
        trouble = true;
    }
    return this;
}

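【Code notes】

Scanner.next(pattern): returns the next token if it matches the pattern;

Scanner.match(): returns the MatchResult of the last scanning operation;

Note: when scanning with a Scanner and a regex, the pattern is matched only against the next input token. If your regex contains the delimiter, it will never match.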
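The last point is easy to demonstrate. With the default whitespace delimiter, the next token of `"John Smith 42"` is just `"John"`, so a pattern containing a space can never match it. `findInLine()` searches the current line while ignoring delimiters, so it does find the two-word match. Here is a sketch; the class and method names are hypothetical:

```java
import java.util.Scanner;

// Illustrative sketch: token-based matching versus findInLine().
public class TokenVsFindInLine {
    // hasNext(regex) tests the regex against the next complete token only.
    static boolean nextTokenMatches(String input, String regex) {
        try (Scanner sc = new Scanner(input)) {
            return sc.hasNext(regex);
        }
    }

    // findInLine(regex) ignores delimiters and returns the matched text, or null.
    static String findIgnoringDelimiters(String input, String regex) {
        try (Scanner sc = new Scanner(input)) {
            return sc.findInLine(regex);
        }
    }

    public static void main(String[] args) {
        String input = "John Smith 42";
        System.out.println(nextTokenMatches(input, "\\w+ \\w+"));       // false
        System.out.println(nextTokenMatches(input, "\\w+"));            // true
        System.out.println(findIgnoringDelimiters(input, "\\w+ \\w+")); // John Smith
    }
}
```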

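【13.8】StringTokenizer (a string tokenizer)

1）Regular expressions were introduced in J2SE 1.4;

2）The Scanner class was introduced in Java SE5;

Before regular expressions and Scanner were introduced, the only way to split a string into tokens was StringTokenizer;

Regex and Scanner can split a string using much more complex patterns, so StringTokenizer can be considered obsolete;

【Still, here is an example of StringTokenizer】

Comparing the tokens produced by StringTokenizer, regex, and Scanner: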

import java.util.Arrays;
import java.util.Scanner;
import java.util.StringTokenizer;

public class ReplacingStringTokenizer {
    public static void main(String[] args) {
        String input = "But I'm not dead yet! I feel happy!";
        StringTokenizer stoke = new StringTokenizer(input);
        /* Tokenizing with StringTokenizer (default delimiters: whitespace) */
        System.out.print("Tokenizing with StringTokenizer: ");
        while (stoke.hasMoreElements())
            System.out.print(stoke.nextToken() + " ");
        /* Tokenizing with String.split() + regex */
        System.out.print("\nTokenizing with String.split() + regex: ");
        System.out.println(Arrays.toString(input.split("\\W+")));
        /* Tokenizing with Scanner (the default delimiter is whitespace) */
        System.out.print("Tokenizing with Scanner, delimiter is whitespace: ");
        Scanner scanner = new Scanner(input);
        scanner.useDelimiter("\\s+"); // explicitly set whitespace as the delimiter
        while (scanner.hasNext())
            System.out.print(scanner.next() + ";");
    }
}
/*
Tokenizing with StringTokenizer: But I'm not dead yet! I feel happy!
Tokenizing with String.split() + regex: [But, I, m, not, dead, yet, I, feel, happy]
Tokenizing with Scanner, delimiter is whitespace: But;I'm;not;dead;yet!;I;feel;happy!;
*/
