An Analysis of the String Class Source Code

hulingF

2023-09-08

JDK Source Code

1. Introduction to the String Class

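The String class represents character strings. All string literals in Java programs, such as "abc", are implemented as instances of this class. Strings are constant: their values cannot be changed after they are created. String buffers (StringBuffer or StringBuilder) support mutable strings. Because String objects are immutable, they can be shared safely, including across threads. Here are some more examples of how strings can be used: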

System.out.println("abc");
String cde = "cde";
System.out.println("abc" + cde);
String c = "abc".substring(2,3);
String d = cde.substring(1, 2);

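The rest of this post reads through the String class in detail. Pay particular attention to string concatenation and to the conversion of objects to strings. For example, the javac compiler may implement concatenation with StringBuffer, StringBuilder, or StringConcatFactory, while string conversion is usually implemented through the toString method inherited from Object.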

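When a String is encoded as UTF-16, a character in a supplementary plane is represented by a surrogate pair. Index values in String refer to char code units, so a supplementary character occupies two index positions (two code units) in a String. The String class provides methods for working with code points as well as methods for working with code units (char values). For more on encodings, see the Character class and the post "Java基础-实用类 | 大军的秘密花园" (hulingf.github.io).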
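To make the difference between code units and code points concrete, here is a minimal sketch. The class name CodeUnitDemo and its helpers are made up for illustration, and U+1F600 is just a sample supplementary code point.

```java
class CodeUnitDemo {
    // Number of UTF-16 code units (char values), as reported by length().
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the whole string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // 'a' followed by U+1F600, which lies outside the BMP.
        String s = "a" + new String(Character.toChars(0x1F600));
        System.out.println(codeUnits(s));   // 3: one unit for 'a', two for the surrogate pair
        System.out.println(codePoints(s));  // 2
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true
    }
}
```

Index-based methods such as charAt and substring count code units, so code that must not split a surrogate pair should work with the code point methods instead.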

2. Important Fields of the String Class

public final class String implements Serializable, Comparable<String>, CharSequence {
   // Byte array holding the string's contents. In older source versions this field
   // is a char[]; the switch to byte[] is an optimization introduced in JDK 9.
   @Stable
   private final byte[] value;
   // Also part of the compact-strings optimization: tells whether value is encoded as LATIN1 or UTF16
   private final byte coder;
   // Cached hash code
   private int hash;
   // Serialization version number
   private static final long serialVersionUID = -6849794470754667710L;
   // Compact-strings switch, enabled by default
   static final boolean COMPACT_STRINGS = true;
   private static final ObjectStreamField[] serialPersistentFields = new ObjectStreamField[0];
   static final byte LATIN1 = 0;
   static final byte UTF16 = 1;
   // ... remaining code omitted
 }

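Looking at String from its declaration and the interfaces it implements:

String is declared final, so it cannot be subclassed.
The value field is declared final, so the reference can never be reassigned; since String also exposes no method that modifies the array's contents, instances are immutable and thread-safe.
String implements Serializable, so it can be serialized.
String implements Comparable, so strings can be ordered.
String implements CharSequence. Under the hood a String is an array: a char array in older versions, optimized to a byte array since JDK 9, as the value field shows.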

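About the value field: the byte array used for character storage. This field is trusted by the VM and is subject to constant folding if the String instance is a constant.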

About the coder field: the identifier of the encoding used to encode the bytes in value. The values supported by this implementation are LATIN1 and UTF16.

About the hash field: the cached hash code of the string; it defaults to 0.

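About the serialVersionUID field: a serialization version number. During deserialization Java compares the UID in the byte stream with that of the local class; if they match, deserialization proceeds, otherwise an InvalidClassException is thrown.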

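About the serialPersistentFields field: it declares which fields are serialized. By default every non-transient, non-static field is serialized, but this array can be used to select the fields explicitly.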

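About the COMPACT_STRINGS field: if string compaction is disabled, the bytes in value are always encoded in UTF16. For methods with several possible implementation paths, only one code path is taken when compaction is disabled. Instance field values are generally opaque to an optimizing JIT compiler. In performance-sensitive places, therefore, the static boolean COMPACT_STRINGS should be checked explicitly before the coder field, because the static boolean is constant-folded by an optimizing JIT compiler. Code such as if (coder == LATIN1) { … } is better written as if (coder() == LATIN1) { … } or if (COMPACT_STRINGS && coder == LATIN1) { … }. An optimizing JIT compiler can then fold the condition: when COMPACT_STRINGS == true it becomes if (coder == LATIN1) { … }, and when COMPACT_STRINGS == false it becomes if (false) { … }.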

3. Constructors of the String Class

The first constructor takes no arguments. Because String is immutable, there is rarely any need to create an empty String this way.

public String() {
    this.value = "".value;
    this.coder = "".coder;
}

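The second is the copy constructor. Note that value is a byte array, so plain assignment makes the value references of both String objects point to the same array. Sharing is safe here because the field is final and String exposes no method that modifies the array's contents.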

public String(String original) {
    this.value = original.value;
    this.coder = original.coder;
    this.hash = original.hash;
}

The third constructor creates a String from a given char array. The array's contents are copied, so later changes to the char array do not affect the String.

public String(char value[]) {
    this(value, 0, value.length, null);
}
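The defensive copy can be observed from the outside. The sketch below (the class name CharArrayCopyDemo is invented for this example) mutates the source array after construction and checks that the String is unaffected.

```java
class CharArrayCopyDemo {
    // Builds a String from a char[] and then mutates the array.
    static String buildThenMutate() {
        char[] chars = {'a', 'b', 'c'};
        String s = new String(chars);
        chars[0] = 'z';   // changes the caller's array only
        return s;         // the String still holds its own copy
    }

    public static void main(String[] args) {
        System.out.println(buildThenMutate()); // abc
    }
}
```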

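The fourth constructor is similar but more flexible: it lets the caller choose the start offset and the number of chars to copy, and a helper method validates the arguments.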

public String(char value[], int offset, int count) {
    this(value, offset, count, rangeCheck(value, offset, count));
}

private static Void rangeCheck(char[] value, int offset, int count) {
    checkBoundsOffCount(offset, count, value.length);
    return null;
}

static void checkBoundsOffCount(int offset, int count, int length) {
    if (offset < 0 || count < 0 || offset > length - count) {
        throw new StringIndexOutOfBoundsException(
            "offset " + offset + ", count " + count + ", length " + length);
    }
}
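One detail of checkBoundsOffCount is worth spelling out: the condition is written as offset > length - count rather than offset + count > length. A plausible reason is integer overflow, which the following sketch illustrates. Both helper methods are hypothetical and exist only for this comparison.

```java
class BoundsCheckDemo {
    // Overflow-prone variant: offset + count can wrap around to a negative number.
    static boolean naiveInBounds(int offset, int count, int length) {
        return offset >= 0 && count >= 0 && offset + count <= length;
    }

    // Same condition as in checkBoundsOffCount, negated. With count >= 0 and
    // length >= 0 the subtraction cannot overflow.
    static boolean safeInBounds(int offset, int count, int length) {
        return !(offset < 0 || count < 0 || offset > length - count);
    }

    public static void main(String[] args) {
        int offset = Integer.MAX_VALUE, count = 1, length = 10;
        System.out.println(naiveInBounds(offset, count, length)); // true (wrong)
        System.out.println(safeInBounds(offset, count, length));  // false (correct)
    }
}
```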

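The fifth constructor is package-private; the trailing Void parameter exists only to disambiguate it from the other (public) constructors. COMPACT_STRINGS is enabled by default and can be turned off with the -XX:-CompactStrings option. If every char in the array falls within the Latin1 range (0x00~0xFF), each char code unit is stored in one byte. Otherwise UTF-16 is used and each char code unit takes two bytes (a BMP character is one code unit; a supplementary-plane character needs two code units, i.e. 4 bytes).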

String(char[] value, int off, int len, Void sig) {
    if (len == 0) {
        this.value = "".value;
        this.coder = "".coder;
        return;
    }
    if (COMPACT_STRINGS) {
        byte[] val = StringUTF16.compress(value, off, len);
        if (val != null) {
            this.value = val;
            this.coder = LATIN1;
            return;
        }
    }
    this.coder = UTF16;
    this.value = StringUTF16.toBytes(value, off, len);
}

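Look at StringUTF16.compress: it checks each element of the char array against 0xFF, the largest value Latin1 can represent. If no element exceeds it, each char is cast to a byte; otherwise the method returns null and the caller falls back to UTF-16.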

public static byte[] compress(char[] val, int off, int len) {
        byte[] ret = new byte[len];
        if (compress(val, off, ret, 0, len) == len) {
            return ret;
        }
        return null;
    }

// compressedCopy char[] -> byte[]
@HotSpotIntrinsicCandidate
public static int compress(char[] src, int srcOff, byte[] dst, int dstOff, int len) {
    for (int i = 0; i < len; i++) {
        char c = src[srcOff];
        if (c > 0xFF) {
            len = 0;
            break;
        }
        dst[dstOff] = (byte)c;
        srcOff++;
        dstOff++;
    }
    return len;
}
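StringUTF16 is package-private, so the compression step cannot be called directly from user code. The following is a simplified stand-alone sketch of the same idea, not the JDK implementation; the name tryCompress is an assumption made for this example.

```java
class CompressSketch {
    // Returns a one-byte-per-char array, or null if any char is above 0xFF
    // and therefore cannot be stored as Latin1.
    static byte[] tryCompress(char[] src) {
        byte[] dst = new byte[src.length];
        for (int i = 0; i < src.length; i++) {
            char c = src[i];
            if (c > 0xFF) {
                return null;
            }
            dst[i] = (byte) c;
        }
        return dst;
    }

    public static void main(String[] args) {
        // U+00E9 fits in Latin1, so four chars become four bytes.
        System.out.println(tryCompress("caf\u00e9".toCharArray()).length); // 4
        // U+4E2D does not fit, so compression is refused.
        System.out.println(tryCompress("\u4e2d".toCharArray()) == null);  // true
    }
}
```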

Look at StringUTF16.toBytes: its job is to convert the char array into a byte array; see the code comments for details:

public static byte[] toBytes(char[] value, int off, int len) {
    // Compute the byte array length and allocate it
    byte[] val = newBytesFor(len);
    // Fill in the byte array
    for (int i = 0; i < len; i++) {
        putChar(val, i, value[off]);
        off++;
    }
    return val;
}

public static byte[] newBytesFor(int len) {
    // Validate the len argument
    if (len < 0) {
        throw new NegativeArraySizeException();
    }
    if (len > MAX_LENGTH) {
        throw new OutOfMemoryError("UTF16 String size is " + len +
                                   ", should be less than " + MAX_LENGTH);
    }
    // Two bytes represent one char code unit
    return new byte[len << 1];
}

static void putChar(byte[] val, int index, int c) {
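    // Internal method: no bounds check is performed beyond this assert
    assert index >= 0 && index < length(val) : "Trusted caller missed bounds check";
    // Two bytes represent one char code unit; their order follows the platform's byte order (see HI_BYTE_SHIFT and LO_BYTE_SHIFT)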
    index <<= 1;
    val[index++] = (byte)(c >> HI_BYTE_SHIFT);
    val[index]   = (byte)(c >> LO_BYTE_SHIFT);
}

public static int length(byte[] value) {
    return value.length >> 1;
}

static final int HI_BYTE_SHIFT;
static final int LO_BYTE_SHIFT;
static {
    if (isBigEndian()) {
        HI_BYTE_SHIFT = 8;
        LO_BYTE_SHIFT = 0;
    } else {
        HI_BYTE_SHIFT = 0;
        LO_BYTE_SHIFT = 8;
    }
}
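The effect of the two shift constants is easier to see with the byte order passed in explicitly. This is a stand-alone sketch rather than the JDK code: the bigEndian parameter replaces the isBigEndian() lookup, and the encode helper is added purely for the demonstration.

```java
class Utf16BytesSketch {
    // Stores one 16-bit code unit as two bytes at code-unit position index.
    static void putChar(byte[] val, int index, int c, boolean bigEndian) {
        int hiShift = bigEndian ? 8 : 0;
        int loShift = bigEndian ? 0 : 8;
        index <<= 1;
        val[index++] = (byte) (c >> hiShift);
        val[index] = (byte) (c >> loShift);
    }

    // Reads the code unit back using the same byte order.
    static char getChar(byte[] val, int index, boolean bigEndian) {
        int hiShift = bigEndian ? 8 : 0;
        int loShift = bigEndian ? 0 : 8;
        index <<= 1;
        return (char) (((val[index++] & 0xff) << hiShift)
                     | ((val[index] & 0xff) << loShift));
    }

    // Hypothetical helper: encodes a single code unit into a fresh two-byte array.
    static byte[] encode(char c, boolean bigEndian) {
        byte[] val = new byte[2];
        putChar(val, 0, c, bigEndian);
        return val;
    }

    public static void main(String[] args) {
        byte[] be = encode('\u4E2D', true);
        byte[] le = encode('\u4E2D', false);
        System.out.printf("%02X %02X%n", be[0], be[1]); // 4E 2D
        System.out.printf("%02X %02X%n", le[0], le[1]); // 2D 4E
        System.out.println(getChar(le, 0, false) == '\u4E2D'); // true
    }
}
```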

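The sixth constructor creates a String from an array of int code points. It resembles the previous constructor, but with UTF-16 the length of the byte array corresponding to the code point array has to be computed first, because a code point in a supplementary plane needs two char code units.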

public String(int[] codePoints, int offset, int count) {
    // Validate the arguments
    checkBoundsOffCount(offset, count, codePoints.length);
    // Empty string
    if (count == 0) {
        this.value = "".value;
        this.coder = "".coder;
        return;
    }
    // Compact strings: try to create the String with Latin1 encoding first
    if (COMPACT_STRINGS) {
        byte[] val = StringLatin1.toBytes(codePoints, offset, count);
        if (val != null) {
            this.coder = LATIN1;
            this.value = val;
            return;
        }
    }
    this.coder = UTF16;
    this.value = StringUTF16.toBytes(codePoints, offset, count);
}
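A short usage sketch of this public constructor follows. The class CodePointCtorDemo and its helpers are illustrative only; 0x110000 is chosen because it is the first value above the valid code point range.

```java
class CodePointCtorDemo {
    // Builds a String from the given code points via String(int[], int, int).
    static String fromCodePoints(int... cps) {
        return new String(cps, 0, cps.length);
    }

    // Reports whether the constructor rejects the value as an invalid code point.
    static boolean rejects(int cp) {
        try {
            fromCodePoints(cp);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // 'A' needs one code unit, U+1F600 needs a surrogate pair.
        System.out.println(fromCodePoints(0x41, 0x1F600).length()); // 3
        System.out.println(rejects(0x110000));                      // true
    }
}
```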

Look at StringLatin1.toBytes: it stores each code point in a single byte, which halves the space the String's contents occupy compared with UTF-16:

public static byte[] toBytes(int[] val, int off, int len) {
    // Allocate a byte array of the same length
    byte[] ret = new byte[len];
    for (int i = 0; i < len; i++) {
        int cp = val[off++];
        // Outside the Latin1 range: return null right away
        if (!canEncode(cp)) {
            return null;
        }
        // One byte per code point
        ret[i] = (byte)cp;
    }
    return ret;
}

public static boolean canEncode(int cp) {
    // cp lies in the range 0x00~0xFF (Latin1)
    return cp >>> 8 == 0;
}

Look at StringUTF16.toBytes; the logic is explained in the code comments:

public static byte[] toBytes(int[] val, int index, int len) {
    final int end = index + len;
    // Pass 1: Compute precise size of char[]
    int n = len;
    for (int i = index; i < end; i++) {
        int cp = val[i];
        // Code point in the Basic Multilingual Plane: one char code unit is enough
        if (Character.isBmpCodePoint(cp))
            continue;
        // Code point in a supplementary plane: two char code units are needed
        else if (Character.isValidCodePoint(cp))
            n++;
        // Invalid code point: throw an exception
        else throw new IllegalArgumentException(Integer.toString(cp));
    }
    // Pass 2: Allocate and fill in <high, low> pair
    byte[] buf = newBytesFor(n);
    for (int i = index, j = 0; i < end; i++, j++) {
        int cp = val[i];
        if (Character.isBmpCodePoint(cp)) {
            // Code point in the Basic Multilingual Plane
            putChar(buf, j, cp);
        } else {
            // Supplementary code point: write a surrogate pair (high surrogate, then low surrogate)
            putChar(buf, j++, Character.highSurrogate(cp));
            putChar(buf, j, Character.lowSurrogate(cp));
        }
    }
    return buf;
}

// len is the number of char code units; the byte array needs twice as many bytes
public static byte[] newBytesFor(int len) {
    if (len < 0) {
        throw new NegativeArraySizeException();
    }
    if (len > MAX_LENGTH) {
        throw new OutOfMemoryError("UTF16 String size is " + len +
                                   ", should be less than " + MAX_LENGTH);
    }
    return new byte[len << 1];
}

public static boolean isBmpCodePoint(int codePoint) {
    return codePoint >>> 16 == 0;
    // Optimized form of:
    //     codePoint >= MIN_VALUE('\u0000') && codePoint <= MAX_VALUE('\uFFFF')
    // We consistently use logical shift (>>>) to facilitate
    // additional runtime optimizations.
}

public static boolean isValidCodePoint(int codePoint) {
    // Optimized form of:
    //     codePoint >= MIN_CODE_POINT(0x000000) && codePoint <= MAX_CODE_POINT(0X10FFFF)
    int plane = codePoint >>> 16;
    return plane < ((MAX_CODE_POINT + 1) >>> 16);
}

public static char highSurrogate(int codePoint) {
    // public static final char MIN_HIGH_SURROGATE = '\uD800';
    // public static final int MIN_SUPPLEMENTARY_CODE_POINT = 0x010000;
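    // Supplementary code points range over 0x010000~0x10FFFF, so (code point - 0x010000) ranges over 0x00000~0xFFFFF and fits in 20 bits
    // Equivalent to: high surrogate = (upper 10 bits of (code point - 0x010000)) + 0xD800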
    return (char) ((codePoint >>> 10)
                   + (MIN_HIGH_SURROGATE - (MIN_SUPPLEMENTARY_CODE_POINT >>> 10)));
}

public static char lowSurrogate(int codePoint) {
    // public static final char MIN_LOW_SURROGATE  = '\uDC00';
    // Equivalent to: low surrogate = (lower 10 bits of (code point - 0x010000)) + 0xDC00
    return (char) ((codePoint & 0x3ff) + MIN_LOW_SURROGATE);
}
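The surrogate arithmetic can be checked by hand with the unoptimized form of the two formulas. The sketch below uses U+1F600 as a sample: subtracting 0x10000 gives 0xF600, whose upper 10 bits are 0x3D and whose lower 10 bits are 0x200. The class and method names are made up for this example.

```java
class SurrogateMath {
    // High surrogate = 0xD800 + upper 10 bits of (cp - 0x10000).
    static char high(int cp) {
        return (char) (0xD800 + ((cp - 0x10000) >>> 10));
    }

    // Low surrogate = 0xDC00 + lower 10 bits of (cp - 0x10000).
    static char low(int cp) {
        return (char) (0xDC00 + ((cp - 0x10000) & 0x3FF));
    }

    public static void main(String[] args) {
        int cp = 0x1F600;
        System.out.println(Integer.toHexString(high(cp))); // d83d
        System.out.println(Integer.toHexString(low(cp)));  // de00
        // The straightforward form agrees with the optimized JDK methods.
        System.out.println(high(cp) == Character.highSurrogate(cp)); // true
        System.out.println(low(cp) == Character.lowSurrogate(cp));   // true
    }
}
```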

static void putChar(byte[] val, int index, int c) {
    assert index >= 0 && index < length(val) : "Trusted caller missed bounds check";
    index <<= 1;
    val[index++] = (byte)(c >> HI_BYTE_SHIFT);
    val[index]   = (byte)(c >> LO_BYTE_SHIFT);
}

4. Some Methods of the String Class

The length() method returns the number of char code units in the String:

public int length() {
    return value.length >> coder();
}
byte coder() {
    return COMPACT_STRINGS ? coder : UTF16;
}
@Native static final byte LATIN1 = 0;
@Native static final byte UTF16  = 1;

The isEmpty() method tells whether the String is the empty string:

public boolean isEmpty() {
    return value.length == 0;
}

The isLatin1() method tells whether the String uses Latin1 encoding:

private boolean isLatin1() {
    return COMPACT_STRINGS && coder == LATIN1;
}

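The charAt(int index) method returns the char value at the specified index. An index ranges from 0 to length() - 1. The first char value of the sequence is at index 0, the next at index 1, and so on, as for array indexing. If the char value at the index is a surrogate, the surrogate value itself is returned.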

public char charAt(int index) {
    if (isLatin1()) {
        return StringLatin1.charAt(value, index);
    } else {
        return StringUTF16.charAt(value, index);
    }
}

// StringLatin1.charAt
public static char charAt(byte[] value, int index) {
    if (index < 0 || index >= value.length) {
        throw new StringIndexOutOfBoundsException(index);
    }
    return (char)(value[index] & 0xff);
}

// StringUTF16.charAt
public static char charAt(byte[] value, int index) {
    // Validate the index argument
    checkIndex(index, value);
    // Read the char code unit at the given index
    return getChar(value, index);
}

public static void checkIndex(int off, byte[] val) {
    String.checkIndex(off, length(val));
}

// Number of code units = byte array length / 2
public static int length(byte[] value) {
    return value.length >> 1;
}

// Throws StringIndexOutOfBoundsException if index is negative or not less than length
static void checkIndex(int index, int length) {
    if (index < 0 || index >= length) {
        throw new StringIndexOutOfBoundsException("index " + index +
                                                  ",length " + length);
    }
}

static char getChar(byte[] val, int index) {
    assert index >= 0 && index < length(val) : "Trusted caller missed bounds check";
    index <<= 1;
    return (char)(((val[index++] & 0xff) << HI_BYTE_SHIFT) |
                  ((val[index]   & 0xff) << LO_BYTE_SHIFT));
}

public int codePointAt(int index) {
    if (isLatin1()) {
        checkIndex(index, value.length);
        return value[index] & 0xff;
    }
    int length = value.length >> 1;
    checkIndex(index, length);
    return StringUTF16.codePointAt(value, index, length);
}

public static int codePointAt(byte[] value, int index, int end) {
    return codePointAt(value, index, end, false /* unchecked */);
}

private static int codePointAt(byte[] value, int index, int end, boolean checked) {
    assert index < end;
    if (checked) {
        checkIndex(index, value);
    }
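    // Read the code unit at the given index
    char c1 = getChar(value, index);
    // If it is a high surrogate and a following code unit exists
    if (Character.isHighSurrogate(c1) && ++index < end) {
        if (checked) {
            checkIndex(index, value);
        }
        // Read the code unit at the next index
        char c2 = getChar(value, index);
        // If that one is a low surrogate, return the full supplementary code point (encoded by the high/low pair)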
        if (Character.isLowSurrogate(c2)) {
            return Character.toCodePoint(c1, c2);
        }
    }
    return c1;
}
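The behaviour of codePointAt at different positions of a surrogate pair follows directly from the code above. A small sketch with a sample string (the class name CodePointAtDemo is illustrative):

```java
class CodePointAtDemo {
    // 'a' followed by U+1F600, written as its surrogate pair D83D DE00.
    static final String S = "a\uD83D\uDE00";

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(S.codePointAt(0))); // 61
        // Index 1 is a high surrogate followed by a low surrogate: the pair is combined.
        System.out.println(Integer.toHexString(S.codePointAt(1))); // 1f600
        // Index 2 is a low surrogate on its own: it is returned unchanged.
        System.out.println(Integer.toHexString(S.codePointAt(2))); // de00
        // charAt never combines surrogates.
        System.out.println(Integer.toHexString(S.charAt(1)));      // d83d
    }
}
```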

// Compute a supplementary code point from a surrogate pair
public static int toCodePoint(char high, char low) {
    // public static final int MIN_SUPPLEMENTARY_CODE_POINT = 0x010000;
    // public static final char MIN_HIGH_SURROGATE = '\uD800';
    // public static final char MIN_LOW_SURROGATE  = '\uDC00';
    // Optimized form of:
    // return ((high - MIN_HIGH_SURROGATE) << 10)
    //         + (low - MIN_LOW_SURROGATE)
    //         + MIN_SUPPLEMENTARY_CODE_POINT;
    return ((high << 10) + low) + (MIN_SUPPLEMENTARY_CODE_POINT
                                   - (MIN_HIGH_SURROGATE << 10)
                                   - MIN_LOW_SURROGATE);
}

public void getChars(int srcBegin, int srcEnd, char dst[], int dstBegin) {
    checkBoundsBeginEnd(srcBegin, srcEnd, length());
    checkBoundsOffCount(dstBegin, srcEnd - srcBegin, dst.length);
    if (isLatin1()) {
        StringLatin1.getChars(value, srcBegin, srcEnd, dst, dstBegin);
    } else {
        StringUTF16.getChars(value, srcBegin, srcEnd, dst, dstBegin);
    }
}

static void checkBoundsBeginEnd(int begin, int end, int length) {
    if (begin < 0 || begin > end || end > length) {
        throw new StringIndexOutOfBoundsException(
            "begin " + begin + ", end " + end + ", length " + length);
    }
}

static void checkBoundsOffCount(int offset, int count, int length) {
    if (offset < 0 || count < 0 || offset > length - count) {
        throw new StringIndexOutOfBoundsException(
            "offset " + offset + ", count " + count + ", length " + length);
    }
}

// StringLatin1.getChars
public static void getChars(byte[] value, int srcBegin, int srcEnd, char dst[], int dstBegin) {
    inflate(value, srcBegin, dst, dstBegin, srcEnd - srcBegin);
}

// inflatedCopy byte[] -> char[]
@HotSpotIntrinsicCandidate
public static void inflate(byte[] src, int srcOff, char[] dst, int dstOff, int len) {
    for (int i = 0; i < len; i++) {
        dst[dstOff++] = (char)(src[srcOff++] & 0xff);
    }
}

// StringUTF16.getChars
public static void getChars(byte[] value, int srcBegin, int srcEnd, char dst[], int dstBegin) {
    // We need a range check here because 'getChar' has no checks
    if (srcBegin < srcEnd) {
        checkBoundsOffCount(srcBegin, srcEnd - srcBegin, value);
    }
    for (int i = srcBegin; i < srcEnd; i++) {
        dst[dstBegin++] = getChar(value, i);
    }
}

static char getChar(byte[] val, int index) {
    assert index >= 0 && index < length(val) : "Trusted caller missed bounds check";
    index <<= 1;
    return (char)(((val[index++] & 0xff) << HI_BYTE_SHIFT) |
                  ((val[index]   & 0xff) << LO_BYTE_SHIFT));
}

5. The equals Method of the String Class

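Compares this string to the specified object. The result is true if and only if the argument is not null and is a String object that represents the same sequence of characters as this object.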

public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String aString = (String)anObject;
        if (coder() == aString.coder()) {
            return isLatin1() ? StringLatin1.equals(value, aString.value)
                : StringUTF16.equals(value, aString.value);
        }
    }
    return false;
}

// StringLatin1.equals
public static boolean equals(byte[] value, byte[] other) {
    if (value.length == other.length) {
        for (int i = 0; i < value.length; i++) {
            if (value[i] != other[i]) {
                return false;
            }
        }
        return true;
    }
    return false;
}

// StringUTF16.equals
public static boolean equals(byte[] value, byte[] other) {
    if (value.length == other.length) {
        int len = value.length >> 1;
        for (int i = 0; i < len; i++) {
            if (getChar(value, i) != getChar(other, i)) {
                return false;
            }
        }
        return true;
    }
    return false;
}
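A brief illustration of the two early exits in equals: the identity check and the instanceof check. The class EqualsDemo and its method names are invented for this sketch.

```java
class EqualsDemo {
    // Two distinct objects with the same characters: == is false, equals is true.
    static boolean sameContentDifferentObject() {
        String a = "abc";
        String b = new String("abc");
        return a != b && a.equals(b);
    }

    // A non-String argument never compares equal, even with identical characters.
    static boolean equalsStringBuilder() {
        Object other = new StringBuilder("abc");
        return "abc".equals(other);
    }

    public static void main(String[] args) {
        System.out.println(sameContentDifferentObject()); // true
        System.out.println(equalsStringBuilder());        // false
        System.out.println("abc".equals(null));           // false
    }
}
```

To compare a String with a StringBuilder or another CharSequence by content, contentEquals is the method to use.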

6. The compareTo Method of the String Class

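String implements the Comparable interface and overrides compareTo. Comparison proceeds lexicographically from front to back, one char code unit at a time, and returns the difference at the first mismatch; if one string is a prefix of the other, the difference in length decides.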

public int compareTo(String anotherString) {
    byte v1[] = value;
    byte v2[] = anotherString.value;
    if (coder() == anotherString.coder()) {
        return isLatin1() ? StringLatin1.compareTo(v1, v2)
            : StringUTF16.compareTo(v1, v2);
    }
    return isLatin1() ? StringLatin1.compareToUTF16(v1, v2)
        : StringUTF16.compareToLatin1(v1, v2);
}

// StringLatin1.compareTo
public static int compareTo(byte[] value, byte[] other) {
    int len1 = value.length;
    int len2 = other.length;
    return compareTo(value, other, len1, len2);
}

public static int compareTo(byte[] value, byte[] other, int len1, int len2) {
    int lim = Math.min(len1, len2);
    for (int k = 0; k < lim; k++) {
        if (value[k] != other[k]) {
            return getChar(value, k) - getChar(other, k);
        }
    }
    return len1 - len2;
}

// StringUTF16.compareTo
public static int compareTo(byte[] value, byte[] other) {
    int len1 = length(value);
    int len2 = length(other);
    return compareValues(value, other, len1, len2);
}

private static int compareValues(byte[] value, byte[] other, int len1, int len2) {
    int lim = Math.min(len1, len2);
    for (int k = 0; k < lim; k++) {
        char c1 = getChar(value, k);
        char c2 = getChar(other, k);
        if (c1 != c2) {
            return c1 - c2;
        }
    }
    return len1 - len2;
}

// StringLatin1.compareToUTF16
public static int compareToUTF16(byte[] value, byte[] other) {
    int len1 = length(value);
    int len2 = StringUTF16.length(other);
    return compareToUTF16Values(value, other, len1, len2);
}

private static int compareToUTF16Values(byte[] value, byte[] other, int len1, int len2) {
    int lim = Math.min(len1, len2);
    for (int k = 0; k < lim; k++) {
        char c1 = getChar(value, k);
        char c2 = StringUTF16.getChar(other, k);
        if (c1 != c2) {
            return c1 - c2;
        }
    }
    return len1 - len2;
}

// StringUTF16.compareToLatin1
public static int compareToLatin1(byte[] value, byte[] other) {
    return -StringLatin1.compareToUTF16(other, value);
}
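The return values follow directly from the loops above: the difference of the first mismatching code units, or else the difference in length. A few worked cases, with an illustrative wrapper class:

```java
class CompareToDemo {
    static int cmp(String a, String b) {
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        // First mismatch at index 4: 'e' (101) - 'y' (121).
        System.out.println(cmp("apple", "apply")); // -20
        // One string is a prefix of the other: difference in length.
        System.out.println(cmp("abc", "abcd"));    // -1
        System.out.println(cmp("abc", "abc"));     // 0
        // Mixed encodings still compare code unit by code unit: 'b' - 'a'.
        System.out.println(cmp("b", "a\u4e2d"));   // 1
    }
}
```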

7. The startsWith Method of the String Class

public boolean startsWith(String prefix, int toffset) {
    // Note: toffset might be near -1>>>1.
    if (toffset < 0 || toffset > length() - prefix.length()) {
        return false;
    }
    byte ta[] = value;
    byte pa[] = prefix.value;
    int po = 0;
    int pc = pa.length;
    if (coder() == prefix.coder()) {
        int to = isLatin1() ? toffset : toffset << 1;
        while (po < pc) {
            if (ta[to++] != pa[po++]) {
                return false;
            }
        }
    } else {
        if (isLatin1()) {  // && pcoder == UTF16
            return false;
        }
        // coder == UTF16 && pcoder == LATIN1)
        while (po < pc) {
            if (StringUTF16.getChar(ta, toffset++) != (pa[po++] & 0xff)) {
                return false;
            }
        }
    }
    return true;
}

public boolean startsWith(String prefix) {
    return startsWith(prefix, 0);
}

public boolean endsWith(String suffix) {
    return startsWith(suffix, length() - suffix.length());
}
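Since endsWith is defined in terms of startsWith with an offset, a few calls are enough to see how the offset check behaves. The class name is illustrative.

```java
class StartsWithDemo {
    static final String S = "hello";

    public static void main(String[] args) {
        System.out.println(S.startsWith("ll", 2));  // true: "ll" begins at index 2
        // A negative offset, or a prefix that cannot fit, returns false instead of throwing.
        System.out.println(S.startsWith("he", -1)); // false
        System.out.println(S.startsWith("hello!")); // false
        // endsWith(suffix) is startsWith(suffix, length() - suffix.length()).
        System.out.println(S.endsWith("llo"));      // true
        System.out.println(S.endsWith(""));         // true: offset 5, empty prefix
    }
}
```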

8. The hashCode Method of the String Class

public int hashCode() {
    int h = hash;
    // The hash of the empty string is 0
    if (h == 0 && value.length > 0) {
        hash = h = isLatin1() ? StringLatin1.hashCode(value)
            : StringUTF16.hashCode(value);
    }
    return h;
}

// StringLatin1.hashCode
public static int hashCode(byte[] value) {
    int h = 0;
    for (byte v : value) {
        h = 31 * h + (v & 0xff);
    }
    return h;
}

// StringUTF16.hashCode
public static int hashCode(byte[] value) {
    int h = 0;
    int length = value.length >> 1;
    for (int i = 0; i < length; i++) {
        h = 31 * h + getChar(value, i);
    }
    return h;
}
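Both variants compute the same polynomial with base 31 over the char code units, so the result does not depend on the internal encoding. A stand-alone re-implementation (the name polyHash is hypothetical) makes this easy to verify against String.hashCode():

```java
class HashSketch {
    // h = s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], with int overflow.
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // (97 * 31 + 98) * 31 + 99 = 96354
        System.out.println(polyHash("abc"));                     // 96354
        System.out.println(polyHash("abc") == "abc".hashCode()); // true
        // Collisions are easy to construct: 65*31 + 97 == 66*31 + 66 == 2112.
        System.out.println(polyHash("Aa") == polyHash("BB"));    // true
    }
}
```

Because 0 doubles as the "not yet computed" marker in the hashCode shown above, a non-empty string whose hash happens to be 0 is rehashed on every call in this version of the code.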

9. The substring Method of the String Class

The substring method extracts a substring; the begin index may range from 0 to length() inclusive.

public String substring(int beginIndex) {
    // The begin index must not be negative
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    // The substring length must not be negative
    int subLen = length() - beginIndex;
    if (subLen < 0) {
        throw new StringIndexOutOfBoundsException(subLen);
    }
    // Fast path: return this same reference
    if (beginIndex == 0) {
        return this;
    }
    return isLatin1() ? StringLatin1.newString(value, beginIndex, subLen)
        : StringUTF16.newString(value, beginIndex, subLen);
}

// StringLatin1.newString
public static String newString(byte[] val, int index, int len) {
    return new String(Arrays.copyOfRange(val, index, index + len),
                      LATIN1);
}

// StringUTF16.newString
public static String newString(byte[] val, int index, int len) {
    // Check whether Latin1 encoding is feasible
    if (String.COMPACT_STRINGS) {
        byte[] buf = compress(val, index, len);
        if (buf != null) {
            return new String(buf, LATIN1);
        }
    }
    int last = index + len;
    return new String(Arrays.copyOfRange(val, index << 1, last << 1), UTF16);
}

public static byte[] compress(byte[] val, int off, int len) {
    byte[] ret = new byte[len];
    if (compress(val, off, ret, 0, len) == len) {
        return ret;
    }
    return null;
}

public static int compress(byte[] src, int srcOff, byte[] dst, int dstOff, int len) {
    // We need a range check here because 'getChar' has no checks
    checkBoundsOffCount(srcOff, len, src);
    for (int i = 0; i < len; i++) {
        char c = getChar(src, srcOff);
        // Check whether the char code unit exceeds the largest value Latin1 can represent
        if (c > 0xFF) {
            len = 0;
            break;
        }
        dst[dstOff] = (byte)c;
        srcOff++;
        dstOff++;
    }
    return len;
}

// In UTF-16 encoding, two bytes represent one 16-bit char code unit
static char getChar(byte[] val, int index) {
    assert index >= 0 && index < length(val) : "Trusted caller missed bounds check";
    index <<= 1;
    return (char)(((val[index++] & 0xff) << HI_BYTE_SHIFT) |
                  ((val[index]   & 0xff) << LO_BYTE_SHIFT));
}

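As the code above shows, many paths end up calling Arrays.copyOfRange for the actual array copy. Note that when the requested length, to - from, exceeds what remains of the original array, original.length - from, the rest of the new array is padded with zeros.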

public static byte[] copyOfRange(byte[] original, int from, int to) {
        int newLength = to - from;
        if (newLength < 0)
            throw new IllegalArgumentException(from + " > " + to);
        byte[] copy = new byte[newLength];
        System.arraycopy(original, from, copy, 0,
                         Math.min(original.length - from, newLength));
        return copy;
    }

public static native void arraycopy(Object src,  int  srcPos,
                                        Object dest, int destPos,
                                        int length);
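The zero padding is easy to observe by calling Arrays.copyOfRange directly with an end index past the end of the source array. The class and helper names below are illustrative.

```java
import java.util.Arrays;

class CopyOfRangeDemo {
    // Copies original[1..5) from a three-element array and renders the result.
    static String padded() {
        byte[] original = {1, 2, 3};
        byte[] copy = Arrays.copyOfRange(original, 1, 5);
        return Arrays.toString(copy);
    }

    public static void main(String[] args) {
        // Two real elements are copied; the remaining two slots stay zero.
        System.out.println(padded()); // [2, 3, 0, 0]
    }
}
```

In the substring paths shown earlier the indices have already been range-checked, so the padding case should not arise there; it is a property of copyOfRange itself.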

substring also has an overload that takes both the begin and the end index of the substring:

public String substring(int beginIndex, int endIndex) {
    int length = length();
    checkBoundsBeginEnd(beginIndex, endIndex, length);
    int subLen = endIndex - beginIndex;
    // Equivalent to substring(0)
    if (beginIndex == 0 && endIndex == length) {
        return this;
    }
    return isLatin1() ? StringLatin1.newString(value, beginIndex, subLen)
        : StringUTF16.newString(value, beginIndex, subLen);
}

static void checkBoundsBeginEnd(int begin, int end, int length) {
    if (begin < 0 || begin > end || end > length) {
        throw new StringIndexOutOfBoundsException(
            "begin " + begin + ", end " + end + ", length " + length);
    }
}
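The fast paths and the bounds checks of both overloads can be seen in a few calls. This is a usage sketch; the class and helper names are made up for the example.

```java
class SubstringDemo {
    static final String S = "hello";

    // Reports whether substring(begin) rejects the index.
    static boolean throwsOutOfBounds(String s, int begin) {
        try {
            s.substring(begin);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Covering the whole string returns the very same object.
        System.out.println(S.substring(0) == S);           // true
        System.out.println(S.substring(0, S.length()) == S); // true
        System.out.println(S.substring(1, 3));              // el
        // beginIndex == length() is allowed and yields the empty string.
        System.out.println(S.substring(5).isEmpty());       // true
        System.out.println(throwsOutOfBounds(S, 6));        // true
    }
}
```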

10. The concat Method of the String Class

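Concatenates the specified string to the end of this string. If the length of the argument string is 0, this String object is returned. Otherwise, a String object is returned that represents the character sequence formed by concatenating the character sequence of this String with that of the argument string.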

public String concat(String str) {
    // The argument is the empty string: return this String as is
    if (str.isEmpty()) {
        return this;
    }
    // Same encoding: the byte arrays can be joined directly
    if (coder() == str.coder()) {
        byte[] val = this.value;
        byte[] oval = str.value;
        int len = val.length + oval.length;
        byte[] buf = Arrays.copyOf(val, len);
        System.arraycopy(oval, 0, buf, val.length, oval.length);
        return new String(buf, coder);
    }
    // Different encodings: convert both to UTF-16
    int len = length();
    int olen = str.length();
    byte[] buf = StringUTF16.newBytesFor(len + olen);
    getBytes(buf, 0, UTF16);
    str.getBytes(buf, len, UTF16);
    return new String(buf, UTF16);
}

void getBytes(byte dst[], int dstBegin, byte coder) {
    if (coder() == coder) {
        System.arraycopy(value, 0, dst, dstBegin << coder, value.length);
    } else {    // this.coder == LATIN && coder == UTF16
        StringLatin1.inflate(value, 0, dst, dstBegin, value.length);
    }
}
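A short usage sketch of the three paths through concat: the empty argument, matching encodings, and mixed encodings. Which internal path is taken cannot be observed through the public API, so the comments on encodings are inferences from the code above; the class name is illustrative.

```java
class ConcatDemo {
    static final String S = "abc";

    public static void main(String[] args) {
        // Empty argument: the receiver itself is returned, no copy is made.
        System.out.println(S.concat("") == S);           // true
        // Both operands contain only Latin1 characters.
        System.out.println(S.concat("def"));             // abcdef
        // U+4E2D is outside Latin1, so the result has to be stored as UTF-16.
        String mixed = S.concat("\u4e2d");
        System.out.println(mixed.length());              // 4
        System.out.println(mixed.charAt(3) == '\u4e2d'); // true
    }
}
```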