AbstractStringBuilder类源码剖析

hulingF

2023-09-09

JDK源码

AbstractStringBuilder类源码剖析

一、AbstractStringBuilder类的简介

AbstractStringBuilder类代表可变字符序列，实现可修改的字符串，在任何时间点，它都包含一些特定的字符序列，但序列的长度和内容可以通过某些方法调用来更改。除非另有说明，否则将null参数传递给此类中的构造函数或方法将导致引发NullPointerException。

AbstractStringBuilder类实现了两个接口Appendable和CharSequence，Appendable接口表明可以往该类添加字符序列，Appendable接口本身不保证多线程访问的安全性，具体内容由继承的子类选择是否实现，CharSequence接口表明该类是一个可读的字符序列，并且为不同类型的字符序列(具体的实现类)提供了通用的、只读的访问方法，其中中的字符值(char码元)表示BMP内的一个码位或者辅助平面的一个高/低位代理项(surrogate)。

注意CharSequence接口本身没有完善equals和hashCode方法的一般规约，因此继承了CharSequence接口的两个不同类直接的相等性比较是未定义的，因此，将任意CharSequence实例用作set中的元素或map中的键是不合适的。

二、AbstractStringBuilder类的字段

// 存储字符序列的实际字节数组
byte[] value;
// 针对value的编码字段
byte coder;
// 已经使用的字符数量
int count;
// 预定义的空字节数组
private static final byte[] EMPTYVALUE = new byte[0];
// 待分配的数组的最大长度，之所以是Integer.MAX_VALUE-8是因为[某些]VM会在数组中保留一些标头字。尝试分配更大的数组可能会导致OutOfMemoryError即请求的数组大小超过VM限制
private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

作为可变字符串的经典代表，AbstractStringBuilder的字段跟String的非常相似，具体含义也是基本保持一致。==尤其注意，count字段表示使用的字符数量，coder采用LATIN1时单个字节表示一个字符，coder采用UTF-16时双字节表示一个字符==。

三、AbstractStringBuilder类的构造方法

第一个是无参构造方法，源代码注释里面提到这个无参构造函数对于子类的序列化来说是必要的。这很重要，因为如果父类既没有实现Serializable接口也没有无参构造方法，那么当子类实现了Serializable接口后进行序列化与反序列化时会抛出异常（具体来说序列化时忽略父类，反序列化时因为缺少父类的无参构造函数导致InvalidClassException）。

1
2
3

AbstractStringBuilder() {
    value = EMPTYVALUE;
}

第二个是含有容量参数的构造方法，当COMPACT_STRINGS选项开启时(默认开启)，AbstractStringBuilder采用LATIN1编码方式存储字符序列数据，否则转而使用UTF16编码方式。

AbstractStringBuilder(int capacity) {
    if (COMPACT_STRINGS) {
        value = new byte[capacity];
        coder = LATIN1;
    } else {
        value = StringUTF16.newBytesFor(capacity);
        coder = UTF16;
    }
}

四、AbstractStringBuilder类的一些方法

compareTo也是非常简单的一个方法，就是按照字典顺序比较两个AbstractStringBuilder对象的大小，主要需要考虑比较的两个对象是否采用一致的编码方式，如果一致（都是LATIN1或者UTF16）直接从前往后依次取字符比较即可，对于LATIN1来说单字节表示一个字符，对于UTF16来说双字节表示一个char码元（基本多语言平面的字符或者辅助平面的代理字符项）。如果编码方式不一样的话，需要根据编码方式的特点从前往后分别取出相应位置的字符比较，其中StringUTF16.compareToLatin1复用StringLatin1.compareToUTF16的实现。

int compareTo(AbstractStringBuilder another) {
    if (this == another) {
        return 0;
    }

    byte val1[] = value;
    byte val2[] = another.value;
    int count1 = this.count;
    int count2 = another.count;

    if (coder == another.coder) {
        return isLatin1() ? StringLatin1.compareTo(val1, val2, count1, count2)
            : StringUTF16.compareTo(val1, val2, count1, count2);
    }
    return isLatin1() ? StringLatin1.compareToUTF16(val1, val2, count1, count2)
        : StringUTF16.compareToLatin1(val1, val2, count1, count2);
}

public AbstractStringBuilder reverse() {
    byte[] val = this.value;
    int count = this.count;
    int coder = this.coder;
    int n = count - 1;
    if (COMPACT_STRINGS && coder == LATIN1) {
        for (int j = (n-1) >> 1; j >= 0; j--) {
            int k = n - j;
            byte cj = val[j];
            val[j] = val[k];
            val[k] = cj;
        }
    } else {
        StringUTF16.reverse(val, count);
    }
    return this;
}

五、AbstractStringBuilder类的查找方法

length方法返回此对象当前表示的字符序列的长度。

@Override
public int length() {
    return count;
}

capacity方法返回当前容量。容量是可用于新插入字符的存储量，超出该存储量将进行分配。

1
2
3

public int capacity() {
    return value.length >> coder;
}

charAt方法返回指定索引位置的字符（如果是UTF16的代理项也要返回）。

public char charAt(int index) {
    checkIndex(index, count);
    if (isLatin1()) {
        return (char)(value[index] & 0xff);
    }
    return StringUTF16.charAt(value, index);
}

// StringUTF16.charAt
public static char charAt(byte[] value, int index) {
    checkIndex(index, value);
    return getChar(value, index);
}

codePointAt与charAt类似，不过该方法返回的是码位，如果指定索引位置的字符码元是高代理项，并且后一个字符码元是低代理项同时未超出字符序列的长度，则返回由高、低位代理对共同组成的增补字符码位。

public int codePointAt(int index) {
    int count = this.count;
    byte[] value = this.value;
    checkIndex(index, count);
    // Lantin1编码直接返回数组索引处的字符码位
    if (isLatin1()) {
        return value[index] & 0xff;
    }
    // UTF16编码针对基本/辅助多语言平面内的字符码位作不同的处理
    return StringUTF16.codePointAtSB(value, index, count);
}

// 最终调用到StringUTF16的这个方法
private static int codePointAt(byte[] value, int index, int end, boolean checked) {
    assert index < end;
    if (checked) {
        checkIndex(index, value);
    }
    char c1 = getChar(value, index);
    if (Character.isHighSurrogate(c1) && ++index < end) {
        if (checked) {
            checkIndex(index, value);
        }
        char c2 = getChar(value, index);
        if (Character.isLowSurrogate(c2)) {
            return Character.toCodePoint(c1, c2);
        }
    }
    return c1;
}

getChars方法返回由srcBegin、srcEnd确定的字符序列范围[srcBegin,srcEnd)并拷贝至目标字符数组中。

public void getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin)
{
    checkRangeSIOOBE(srcBegin, srcEnd, count);  // compatible to old version
    int n = srcEnd - srcBegin;
    checkRange(dstBegin, dstBegin + n, dst.length);
    if (isLatin1()) {
        StringLatin1.getChars(value, srcBegin, srcEnd, dst, dstBegin);
    } else {
        StringUTF16.getChars(value, srcBegin, srcEnd, dst, dstBegin);
    }
}

// StringLatin1.getChars
public static void getChars(byte[] value, int srcBegin, int srcEnd, char dst[], int dstBegin) {
    inflate(value, srcBegin, dst, dstBegin, srcEnd - srcBegin);
}

// StringLatin1.inflate
public static void inflate(byte[] src, int srcOff, char[] dst, int dstOff, int len) {
    for (int i = 0; i < len; i++) {
        dst[dstOff++] = (char)(src[srcOff++] & 0xff);
    }
}

// StringUTF16.getChars
public static void getChars(byte[] value, int srcBegin, int srcEnd, char dst[], int dstBegin) {
    // We need a range check here because 'getChar' has no checks
    if (srcBegin < srcEnd) {
        checkBoundsOffCount(srcBegin, srcEnd - srcBegin, value);
    }
    for (int i = srcBegin; i < srcEnd; i++) {
        dst[dstBegin++] = getChar(value, i);
    }
}

六、AbstractStringBuilder类的添加方法

append方法有很多重载，下面这个就是把String.valueOf(obj)追加到该类表示的字符序列。

public AbstractStringBuilder append(Object obj) {
    return append(String.valueOf(obj));
}

public AbstractStringBuilder append(String str) {
    if (str == null) {
        return appendNull();
    }
    int len = str.length();
    // 需要保证字符容量能够承载下追加字符串后的字符总数
    ensureCapacityInternal(count + len);
    // 在count字符索引处追加字符串
    putStringAt(count, str);
    // 修改已使用字符的数量
    count += len;
    return this;
}

private final void putStringAt(int index, String str) {
    // 可变字符序列的编码方案与添加的字符串编码方案不一致，需要先把可变字符序列编码方案升级为UTF16
    if (getCoder() != str.coder()) {
        inflate();
    }
    // 添加的字符串的内容拷贝至可变字符序列中
    str.getBytes(value, index, coder);
}

// 可变字符串的编码方案修改为UTF-16
private void inflate() {
    if (!isLatin1()) {
        return;
    }
    byte[] buf = StringUTF16.newBytesFor(value.length);
    StringLatin1.inflate(value, 0, buf, 0, count);
    this.value = buf;
    this.coder = UTF16;
}

append方法的另一种重载可以追加AbstractStringBuilder对象，具体逻辑也很简单，见下面代码：

// Documentation in subclasses because of synchro difference
public AbstractStringBuilder append(StringBuffer sb) {
    return this.append((AbstractStringBuilder)sb);
}

AbstractStringBuilder append(AbstractStringBuilder asb) {
    // 如果传入的asb参数为null，则追加"null"四个字符
    if (asb == null) {
        return appendNull();
    }
    int len = asb.length();
    // 确保容量足够
    ensureCapacityInternal(count + len);
    // 如果编码方式不一致，则把该类的字符序列改为UTF16编码（如果本来就是UTF16则忽略）
    if (getCoder() != asb.getCoder()) {
        inflate();
    }
    // 追加字符
    asb.getBytes(value, count, coder);
    // 修改已使用字符数
    count += len;
    return this;
}

private AbstractStringBuilder appendNull() {
    ensureCapacityInternal(count + 4);
    int count = this.count;
    byte[] val = this.value;
    if (isLatin1()) {
        val[count++] = 'n';
        val[count++] = 'u';
        val[count++] = 'l';
        val[count++] = 'l';
    } else {
        count = StringUTF16.putCharsAt(val, count, 'n', 'u', 'l', 'l');
    }
    this.count = count;
    return this;
}

void getBytes(byte dst[], int dstBegin, byte coder) {
    if (this.coder == coder) {
        System.arraycopy(value, 0, dst, dstBegin << coder, count << coder);
    } else {        // this.coder == LATIN && coder == UTF16
        StringLatin1.inflate(value, 0, dst, dstBegin, count);
    }
}

append追加CharSequence对象时，如果对象为null则追加”null”四个字符，如果对象是String类的实例则调用其重载方法append(String str)，如果对象是AbstractStringBuilder类的实例则调用其重载方法append(AbstractStringBuilder asb)，否则调用重载方法append(CharSequence s, int start, int end)。

// Documentation in subclasses because of synchro difference
@Override
public AbstractStringBuilder append(CharSequence s) {
    if (s == null) {
        return appendNull();
    }
    if (s instanceof String) {
        return this.append((String)s);
    }
    if (s instanceof AbstractStringBuilder) {
        return this.append((AbstractStringBuilder)s);
    }
    return this.append(s, 0, s.length());
}

public AbstractStringBuilder append(CharSequence s, int start, int end) {
    if (s == null) {
        s = "null";
    }
    checkRange(start, end, s.length());
    int len = end - start;
    ensureCapacityInternal(count + len);
    appendChars(s, start, end);
    return this;
}

private final void appendChars(CharSequence s, int off, int end) {
    if (isLatin1()) {
        byte[] val = this.value;
        for (int i = off, j = count; i < end; i++) {
            char c = s.charAt(i);
            // 待添加的字符位于Lantin1范围，可以直接追加
            if (StringLatin1.canEncode(c)) {
                val[j++] = (byte)c;
            } else {
                // j表示当前的字符序列的长度
                count = j;
                // 超出Latin1范围的话需要先把Latin1编码的字符序列转化为UTF16编码
                inflate();
                // 在字符索引j处继续追加剩余的字符[i,end)
                StringUTF16.putCharsSB(this.value, j, s, i, end);
                // 修改有效字符数
                count += end - i;
                return;
            }
        }
    } else {
        StringUTF16.putCharsSB(this.value, count, s, off, end);
    }
    count += end - off;
}

append追加基本数据类型的几个重载函数如下，仅仅是举了几个例子：

public AbstractStringBuilder append(boolean b) {
    ensureCapacityInternal(count + (b ? 4 : 5));
    int count = this.count;
    byte[] val = this.value;
    if (isLatin1()) {
        if (b) {
            val[count++] = 't';
            val[count++] = 'r';
            val[count++] = 'u';
            val[count++] = 'e';
        } else {
            val[count++] = 'f';
            val[count++] = 'a';
            val[count++] = 'l';
            val[count++] = 's';
            val[count++] = 'e';
        }
    } else {
        if (b) {
            count = StringUTF16.putCharsAt(val, count, 't', 'r', 'u', 'e');
        } else {
            count = StringUTF16.putCharsAt(val, count, 'f', 'a', 'l', 's', 'e');
        }
    }
    this.count = count;
    return this;
}

@Override
public AbstractStringBuilder append(char c) {
    ensureCapacityInternal(count + 1);
    if (isLatin1() && StringLatin1.canEncode(c)) {
        value[count++] = (byte)c;
    } else {
        if (isLatin1()) {
            inflate();
        }
        StringUTF16.putCharSB(value, count++, c);
    }
    return this;
}

public AbstractStringBuilder append(int i) {
    int count = this.count;
    int spaceNeeded = count + Integer.stringSize(i);
    ensureCapacityInternal(spaceNeeded);
    if (isLatin1()) {
        Integer.getChars(i, spaceNeeded, value);
    } else {
        StringUTF16.getChars(i, count, spaceNeeded, value);
    }
    this.count = spaceNeeded;
    return this;
}

七、AbstractStringBuilder类的删除方法

delete操作用于删除字符序列的某一部分，详细解释见源代码注释：

public AbstractStringBuilder delete(int start, int end) {
        int count = this.count;
    	// end最多是字符序列的长度count
        if (end > count) {
            end = count;
        }
    	// 检查参数有效性
        checkRangeSIOOBE(start, end, count);
    	// 待删除的字符数量
        int len = end - start;
        if (len > 0) {
            // 删除字符序列的某一部分
            shift(end, -len);
            // 减少字符序列的长度
            this.count = count - len;
        }
        return this;
    }

private static void checkRangeSIOOBE(int start, int end, int len) {
        if (start < 0 || start > end || end > len) {
            throw new StringIndexOutOfBoundsException(
                "start " + start + ", end " + end + ", length " + len);
        }
    }

//  注意：这里的n是负数，因此下面的代码含义是把删除序列后面的子字符序列前移拼接
private void shift(int offset, int n) {
        System.arraycopy(value, offset << coder,
                         value, (offset + n) << coder, (count - offset) << coder);
    }

public static native void arraycopy(Object src,  int  srcPos,
                                        Object dest, int destPos,
                                        int length);

deleteCharAt方法删除此序列中指定位置的字符，此序列缩短了一个字符。如果给定索引处的字符是增补字符，则此方法不会删除整个字符。如果需要正确处理增补字符，请通过调用Character.charCount(thisSequence.codePointAt(index))来确定要删除的字符数，其中thisSequence是此序列。

public AbstractStringBuilder deleteCharAt(int index) {
    checkIndex(index, count);
    shift(index + 1, -1);
    count--;
    return this;
}

八、AbstractStringBuilder类的修改方法

ensureCapacity方法确保容量至少等于指定的最小值。如果当前容量小于参数值，则会分配具有更大容量的新内部数组。新容量是以下容量中较大的一个：minimumCapacity参数和原来的capacity参数*2+2。如果minimumCapacity参数为非正数，则此方法不执行任何操作仅返回。请注意，对此对象的后续操作可能会使实际容量低于此处请求的容量。

public void ensureCapacity(int minimumCapacity) {
    if (minimumCapacity > 0) {
        ensureCapacityInternal(minimumCapacity);
    }
}

ensureCapacityInternal方法的行为与ensureCapacity类似，不过不保证线程同步。如果minimumCapacity参数小于当前容量，则不做任何改动。

private void ensureCapacityInternal(int minimumCapacity) {
    // overflow-conscious code
    int oldCapacity = value.length >> coder;
    if (minimumCapacity - oldCapacity > 0) {
        value = Arrays.copyOf(value,
                              newCapacity(minimumCapacity) << coder);
    }
}

newCapacity返回至少与给定minCapacity参数值一样大的容量。返回原容量*2+2（如果足够）。不会返回大于MAX_ARRAY_SIZE>>coder（绝对安全容量界限）的容量，除非给定的minCapacity参数值大于该容量（即使如此，也不会返回大于Integer.MAX_VALUE>>coder的容量）。

private int newCapacity(int minCapacity) {
    // overflow-conscious code
    // 当前容量
    int oldCapacity = value.length >> coder;
    // 计算新容量=当前容量*2+2
    int newCapacity = (oldCapacity << 1) + 2;
    // 计算新容量=max(当前容量*2+2,minCapacity)
    if (newCapacity - minCapacity < 0) {
        newCapacity = minCapacity;
    }
    // 绝对安全容量界限
    int SAFE_BOUND = MAX_ARRAY_SIZE >> coder;
    // 未到达SAFE_BOUND之前直接使用计算的新容量，否则使用hugeCapacity重新计算修正
    return (newCapacity <= 0 || SAFE_BOUND - newCapacity < 0)
        ? hugeCapacity(minCapacity)
        : newCapacity;
}

private int hugeCapacity(int minCapacity) {
    // 绝对安全容量界限
    int SAFE_BOUND = MAX_ARRAY_SIZE >> coder;
    // 不安全容量界限,已经是极限了
    int UNSAFE_BOUND = Integer.MAX_VALUE >> coder;
    // 请求的minCapacity参数值过大，甚至超出了UNSAFE_BOUND，数组下标用整数类型都无法覆盖
    if (UNSAFE_BOUND - minCapacity < 0) { // overflow
        throw new OutOfMemoryError();
    }
    // 返回的容量范围位于[SAFE_BOUND,UNSAFE_BOUND]
    return (minCapacity > SAFE_BOUND)
        ? minCapacity : SAFE_BOUND;
}

inflate方法把原来由LATIN1编码的value数组扩展为UTF16编码。

private void inflate() {
    if (!isLatin1()) {
        return;
    }
    byte[] buf = StringUTF16.newBytesFor(value.length);
    StringLatin1.inflate(value, 0, buf, 0, count);
    this.value = buf;
    this.coder = UTF16;
}

// StringLatin1.inflate
public static void inflate(byte[] src, int srcOff, byte[] dst, int dstOff, int len) {
    StringUTF16.inflate(src, srcOff, dst, dstOff, len);
}

// StringUTF16.inflate
public static void inflate(byte[] src, int srcOff, byte[] dst, int dstOff, int len) {
    // We need a range check here because 'putChar' has no checks
    checkBoundsOffCount(dstOff, len, dst);
    for (int i = 0; i < len; i++) {
        putChar(dst, dstOff++, src[srcOff++] & 0xff);
    }
}

trimToSize方法尝试减少用于存储字符序列的空间。如果缓冲区大于保存当前字符序列所需的空间，则可以调整其大小以提高空间效率。调用此方法可能会影响后续调用capacity方法的返回值，但不是必需的。

public void trimToSize() {
    int length = count << coder;
    if (length < value.length) {
        value = Arrays.copyOf(value, length);
    }
}

setLength方法设置AbstractStringBuilder的长度即count，根据newLength参数的大小进行value数组的补空字符或截断操作。具体源代码注释如下：

public void setLength(int newLength) {
    if (newLength < 0) {
        throw new StringIndexOutOfBoundsException(newLength);
    }
    // 首先需要保证容量足够即不低于newLength
    ensureCapacityInternal(newLength);
    // 如果newLength大于当前容量，多余的填充空字符(LANTIN1==>'\u00'或UTF16==>'\u0000')
    if (count < newLength) {
        if (isLatin1()) {
            StringLatin1.fillNull(value, count, newLength);
        } else {
            StringUTF16.fillNull(value, count, newLength);
        }
    }
    // 设置当前长度，如果newLength小于或等于当前容量，多余的直接截断
    count = newLength;
}

// StringLatin1.fillNull
public static void fillNull(byte[] val, int index, int end) {
    Arrays.fill(val, index, end, (byte)0);
}

// StringUTF16.fillNull
public static void fillNull(byte[] val, int index, int end) {
    Arrays.fill(val, index << 1, end << 1, (byte)0);
}

public static void fill(byte[] a, int fromIndex, int toIndex, byte val) {
    rangeCheck(a.length, fromIndex, toIndex);
    for (int i = fromIndex; i < toIndex; i++)
        a[i] = val;
}

setCharAt方法指定index索引处的字符设置为ch，当然需要考虑字符编码的问题，具体见代码注释：

public void setCharAt(int index, char ch) {
    checkIndex(index, count);
    // 该类采用Latin1编码方式，并且待设置的字符位于Latin1表示范围内
    if (isLatin1() && StringLatin1.canEncode(ch)) {
        value[index] = (byte)ch;
    } else {
        // 该类采用Latin1编码方式，但是因为待设置的字符超出Latin1表示范围，因此需要扩展为UTF16编码方式
        if (isLatin1()) {
            inflate();
        }
        // 设置指定索引处的字符码元
        StringUTF16.putCharSB(value, index, ch);
    }
}

// Lantin1表示的范围是[0x00,0xFF]
public static boolean canEncode(int cp) {
    return cp >>> 8 == 0;
}

public static void putCharSB(byte[] val, int index, int c) {
    checkIndex(index, val);
    putChar(val, index, c);
}

static void putChar(byte[] val, int index, int c) {
    assert index >= 0 && index < length(val) : "Trusted caller missed bounds check";
    index <<= 1;
    val[index++] = (byte)(c >> HI_BYTE_SHIFT);
    val[index]   = (byte)(c >> LO_BYTE_SHIFT);
}

replace方法用于替换给定区间[start,end)字符序列为目标字符串str表示的字符序列

public AbstractStringBuilder replace(int start, int end, String str) {
    int count = this.count;
    if (end > count) {
        end = count;
    }
    checkRangeSIOOBE(start, end, count);
    int len = str.length();
    int newCount = count + len - (end - start);
    ensureCapacityInternal(newCount);
    // 这一步是关键（把删除区间后的子序列[end,count)移动到正确的位置[start+len,newCount)，给str腾出空间），newCount-count其实就是len-(end-start)
    shift(end, newCount - count);
    this.count = newCount;
    // 填充目标字符串，填充区间是[start,start+len)
    putStringAt(start, str);
    return this;
}

private final void putStringAt(int index, String str) {
    if (getCoder() != str.coder()) {
        inflate();
    }
    str.getBytes(value, index, coder);
}