

  • GitHub repository
  • Paper: "Better bitmap performance with Roaring bitmaps"
  • Paper: "Consistently faster and smaller compressed bitmaps with Roaring"




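Given 4 billion distinct, unsorted unsigned int values, how can a PC with 2 GB of memory determine whether a particular number is among them?

First, estimate the minimum memory needed to hold the 4 billion integers directly: an int takes 4 bytes, so 4,000,000,000 × 4 / 1024 / 1024 / 1024 ≈ 14.9 GB, far more than the 2 GB available. Storing plain ints clearly cannot work within 2 GB, and this is where a bitmap comes in: it spends one bit per possible value, so the whole unsigned 32-bit range takes 2^32 bits = 512 MB.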
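
A minimal sketch of such a flat bitmap, to make the arithmetic concrete. The class name `FlatBitmap` and its methods are made up for illustration; the universe size is a constructor parameter so that the example can be tried on a small range instead of allocating the full 512 MiB.

```java
/** Illustrative flat bitmap: one bit per possible value in [0, universeSize). */
final class FlatBitmap {
    private final long[] words;

    FlatBitmap(long universeSize) {
        // One long holds 64 bits; round up so every value has a slot.
        this.words = new long[(int) ((universeSize + 63) / 64)];
    }

    /** Bytes needed to cover universeSize values at one bit each. */
    static long bytesNeeded(long universeSize) {
        return (universeSize + 7) / 8;
    }

    void add(long value) {
        words[(int) (value >>> 6)] |= 1L << (value & 63);
    }

    boolean contains(long value) {
        return (words[(int) (value >>> 6)] & (1L << (value & 63))) != 0;
    }

    public static void main(String[] args) {
        // The full unsigned 32-bit range needs 2^32 / 8 bytes.
        System.out.println(bytesNeeded(1L << 32) / (1024 * 1024) + " MiB");
        FlatBitmap small = new FlatBitmap(1_000);
        small.add(42);
        System.out.println(small.contains(42) + " " + small.contains(43));
    }
}
```

The cost is fixed regardless of how many values are actually stored, which is the weakness that compressed bitmaps address.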







How does Roaring compare with the alternatives?
Most alternatives to Roaring are part of a larger family of compressed bitmaps that are run-length-encoded bitmaps. They identify long runs of 1s or 0s and they represent them with a marker word. If you have a local mix of 1s and 0s, you use an uncompressed word.
There are many formats in this family:

  • Oracle’s BBC is an obsolete format at this point: though it may provide good compression, it is likely much slower than more recent alternatives due to excessive branching.
  • WAH is a patented variation on BBC that provides better performance.
  • Concise is a variation on the patented WAH. In some specific instances, it can compress much better than WAH (up to 2x better), but it is generally slower.
  • EWAH is both free of patent, and it is faster than all the above. On the downside, it does not compress quite as well. It is faster because it allows some form of “skipping” over uncompressed words. So though none of these formats are great at random access, EWAH is better than the alternatives.

There is a big problem with these formats however that can hurt you badly in some cases: there is no random access. If you want to check whether a given value is present in the set, you have to start from the beginning and “uncompress” the whole thing. This means that if you want to intersect a big set with a large set, you still have to uncompress the whole big set in the worst case…
Roaring solves this problem. It works in the following manner. It divides the data into chunks of 2^16 integers (e.g., [0, 2^16), [2^16, 2 x 2^16), …). Within a chunk, it can use an uncompressed bitmap, a simple list of integers, or a list of runs. Whatever format it uses, they all allow you to check for the presence of any one value quickly (e.g., with a binary search). The net result is that Roaring can compute many operations much faster than run-length-encoded formats like WAH, EWAH, Concise… Maybe surprisingly, Roaring also generally offers better compression ratios.
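
The chunking idea can be sketched with standard-library types. This is a toy model, not the library's implementation: `ChunkedSet` is a hypothetical name, a `TreeMap` stands in for the sorted key array, and `java.util.BitSet` stands in for all three container kinds.

```java
import java.util.BitSet;
import java.util.TreeMap;

/** Toy model of the layout: high 16 bits select a chunk, low 16 bits live inside it. */
final class ChunkedSet {
    // Key: high 16 bits (0..65535). Value: which of the 65536 low values are present.
    private final TreeMap<Integer, BitSet> chunks = new TreeMap<>();

    void add(int x) {
        chunks.computeIfAbsent(x >>> 16, k -> new BitSet()).set(x & 0xFFFF);
    }

    boolean contains(int x) {
        // Random access: jump straight to the chunk, no decompression of earlier data.
        BitSet chunk = chunks.get(x >>> 16);
        return chunk != null && chunk.get(x & 0xFFFF);
    }

    int chunkCount() {
        return chunks.size();
    }

    public static void main(String[] args) {
        ChunkedSet s = new ChunkedSet();
        s.add(7);
        s.add(65536 + 7);
        s.add(-1); // read as unsigned: chunk 65535, low value 65535
        System.out.println(s.contains(65536 + 7) + " " + s.contains(8) + " " + s.chunkCount());
    }
}
```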






  RoaringArray highLowContainer = null;

  /**
   * Create an empty bitmap
   */
  public RoaringBitmap() {
    highLowContainer = new RoaringArray();
  }


  static final int INITIAL_CAPACITY = 4;

  // The high 16 bits of each value, stored as the index
  short[] keys = null;

  // The low 16 bits, stored in one of several Container types
  Container[] values = null;

  int size = 0;

  protected RoaringArray() {
    this(INITIAL_CAPACITY);
  }





  • ArrayContainer
  • BitmapContainer
  • RunContainer


  // Maximum number of values an ArrayContainer may hold
  static final int DEFAULT_MAX_SIZE = 4096;// containers with DEFAULT_MAX_SZE or less integers
                                           // should be ArrayContainers

  // The cardinality
  protected int cardinality = 0;

  // The values, stored in a short array
  short[] content;


  // Maximum capacity
  protected static final int MAX_CAPACITY = 1 << 16;

  // A fixed-length long array that stores the values bit by bit
  final long[] bitmap;

  // The cardinality
  int cardinality;


  private short[] valueslength;// we interleave values and lengths, so
  // that if you have the values 11,12,13,14,15, you store that as 11,4 where 4 means that beyond 11
  // itself, there are
  // 4 contiguous values that follows.
  // Other example: e.g., 1, 10, 20,0, 31,2 would be a concise representation of 1, 2, ..., 11, 20,
  // 31, 32, 33

  int nbrruns = 0;// how many runs, this number should fit in 16 bits.
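
To make the interleaved layout concrete, here is a small decoder that expands (start, length) pairs back into the values they represent. It is only a sketch: `RunDecode` is a hypothetical helper, and it uses `int[]` rather than the library's `short[]` to keep the unsigned handling out of the way.

```java
import java.util.ArrayList;
import java.util.List;

final class RunDecode {
    /** Expands interleaved (start, length) pairs; a length of n means n more values follow start. */
    static List<Integer> decode(int[] valuesLength) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i + 1 < valuesLength.length; i += 2) {
            int start = valuesLength[i];
            int length = valuesLength[i + 1];
            for (int v = start; v <= start + length; v++) {
                out.add(v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(decode(new int[] {11, 4}));
        System.out.println(decode(new int[] {1, 10, 20, 0, 31, 2}));
    }
}
```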



        RoaringBitmap roaringBitmap = new RoaringBitmap();
        roaringBitmap.add(1);
        roaringBitmap.add(10);
        roaringBitmap.add(100);
        roaringBitmap.add(1000);
        roaringBitmap.add(10000);
        for (int i = 65536; i < 65536*2; i+=2) {
            roaringBitmap.add(i);
        }
        roaringBitmap.add(65536L*3, 65536L*4);
        roaringBitmap.runOptimize();





  /**
   * Add the value to the container (set the value to "true"), whether it already appears or not.
   *
   * Java lacks native unsigned integers but the x argument is considered to be unsigned.
   * Within bitmaps, numbers are ordered according to {@link Integer#compareUnsigned}.
   * We order the numbers like 0, 1, ..., 2147483647, -2147483648, -2147483647,..., -1.
   *
   * @param x integer value
   */
  @Override
  public void add(final int x) {
    // Take the high 16 bits of the value x being inserted
    final short hb = Util.highbits(x);
    // Find the position of that high-16-bit key in the index
    final int i = highLowContainer.getIndex(hb);
    // A non-negative position means the key already exists and has a Container,
    // so the low 16 bits are added to that Container
    if (i >= 0) {
      highLowContainer.setContainerAtIndex(i,
          highLowContainer.getContainerAtIndex(i).add(Util.lowbits(x)));
    // A negative position means the key does not exist yet: create an ArrayContainer
    // and put the low 16 bits into it
    } else {
      final ArrayContainer newac = new ArrayContainer();
      highLowContainer.insertNewKeyValueAt(-i - 1, hb, newac.add(Util.lowbits(x)));
    }
  }



Java lacks native unsigned integers but integers are still considered to be unsigned within Roaring and ordered according to Integer.compareUnsigned. This means that Java will order the numbers like so 0, 1, …, 2147483647, -2147483648, -2147483647,…, -1. To interpret correctly, you can use Integer.toUnsignedLong and Integer.toUnsignedString.

In other words, every int added to an RBM is treated as unsigned: a negative Java int such as -1 is interpreted as 4294967295 and therefore sorts after all non-negative values.
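
The unsigned ordering can be checked with the standard `Integer` helpers alone. The class and method names below are illustrative; only `Integer.compareUnsigned`, `Integer.toUnsignedLong` and `Integer.toUnsignedString` come from the JDK.

```java
import java.util.Arrays;

final class UnsignedOrder {
    /** Returns a copy of the input sorted as unsigned 32-bit values. */
    static int[] sortUnsigned(int[] values) {
        return Arrays.stream(values)
                .boxed()
                .sorted(Integer::compareUnsigned)
                .mapToInt(Integer::intValue)
                .toArray();
    }

    public static void main(String[] args) {
        int[] sorted = sortUnsigned(new int[] {-1, 5, Integer.MIN_VALUE, 0});
        // Negative ints come last because they are the largest unsigned values.
        System.out.println(Arrays.toString(sorted));
        for (int v : sorted) {
            System.out.println(v + " -> " + Integer.toUnsignedString(v));
        }
    }
}
```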


  // Shift x right by 16 bits and cast to short, i.e. take the high 16 bits of x
  protected static short highbits(int x) {
    return (short) (x >>> 16);
  }


  // involves a binary search
  protected int getIndex(short x) {
    // before the binary search, we optimize for frequent cases
    // Two common cases are answered without a binary search:
    // 1. the RoaringArray is empty: return -1 (size - 1 with size == 0)
    // 2. x equals the largest key: return size - 1; this works because keys is sorted
    if ((size == 0) || (keys[size - 1] == x)) {
      return size - 1;
    }
    // no luck we have to go through the list
    // Every other case goes through the binary search
    return this.binarySearch(0, size, x);
  }

  private int binarySearch(int begin, int end, short key) {
    return Util.unsignedBinarySearch(keys, begin, end, key);
  }

  /**
   * Look for value k in array in the range [begin,end). If the value is found, return its index. If
   * not, return -(i+1) where i is the index where the value would be inserted. The array is assumed
   * to contain sorted values where shorts are interpreted as unsigned integers.
   *
   * @param array array where we search
   * @param begin first index (inclusive)
   * @param end last index (exclusive)
   * @param k value we search for
   * @return count
   */
  public static int unsignedBinarySearch(final short[] array, final int begin, final int end,
      final short k) {
    // Hybrid search: a binary search finished by a sequential scan; this strategy is always used
    if (USE_HYBRID_BINSEARCH) {
      return hybridUnsignedBinarySearch(array, begin, end, k);
    } else {
      return branchyUnsignedBinarySearch(array, begin, end, k);
    }
  }

  // starts with binary search and finishes with a sequential search
  protected static int hybridUnsignedBinarySearch(final short[] array, final int begin,
      final int end, final short k) {
    int ikey = toIntUnsigned(k);
    // next line accelerates the possibly common case where the value would
    // be inserted at the end
    if ((end > 0) && (toIntUnsigned(array[end - 1]) < ikey)) {
      return -end - 1;
    }
    int low = begin;
    int high = end - 1;
    // 32 in the next line matches the size of a cache line
    while (low + 32 <= high) {
      final int middleIndex = (low + high) >>> 1;
      final int middleValue = toIntUnsigned(array[middleIndex]);
      if (middleValue < ikey) {
        low = middleIndex + 1;
      } else if (middleValue > ikey) {
        high = middleIndex - 1;
      } else {
        return middleIndex;
      }
    }
    // we finish the job with a sequential search
    int x = low;
    for (; x <= high; ++x) {
      final int val = toIntUnsigned(array[x]);
      if (val >= ikey) {
        if (val == ikey) {
          return x;
        }
        break;
      }
    }
    return -(x + 1);
  }

  // The unsigned interpretation mentioned above: non-negative shorts are unchanged,
  // negative shorts are effectively increased by 2^16
  protected static int toIntUnsigned(short x) {
    return x & 0xFFFF;
  }
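
The return convention is the part callers depend on: a non-negative result is the index of the key, and a negative result r encodes the insertion point as -r - 1, which is exactly what `insertNewKeyValueAt(-i - 1, ...)` relies on. The sketch below uses a plain binary search rather than the hybrid variant, and `UnsignedSearch` is a hypothetical name.

```java
final class UnsignedSearch {
    /** Binary search over shorts read as unsigned; returns the index, or -(insertionPoint + 1). */
    static int search(short[] sorted, int begin, int end, short key) {
        int k = key & 0xFFFF;
        int low = begin;
        int high = end - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            int v = sorted[mid] & 0xFFFF;
            if (v < k) {
                low = mid + 1;
            } else if (v > k) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        return -(low + 1);
    }

    public static void main(String[] args) {
        short[] keys = {1, 10, 100, (short) 0xFFFE};
        int r = search(keys, 0, keys.length, (short) 50);
        System.out.println("result " + r + ", insertion point " + (-r - 1));
    }
}
```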


  • How ArrayContainer adds a value
  /**
   * running time is in O(n) time if insert is not in order.
   */
  @Override
  public Container add(final short x) {
    // Two cases skip the binary search:
    // 1. the cardinality is 0
    // 2. x is greater than the largest value in the container; content is sorted,
    //    so its last element is the maximum
    if (cardinality == 0 || (cardinality > 0
        && toIntUnsigned(x) > toIntUnsigned(content[cardinality - 1]))) {
      // At the 4096 threshold, convert to a BitmapContainer and add the value there
      // (the conversion is explained below)
      if (cardinality >= DEFAULT_MAX_SIZE) {
        return toBitmapContainer().add(x);
      }
      // Grow content if the cardinality has reached its length
      if (cardinality >= this.content.length) {
        increaseCapacity();
      }
      // Append the value
      content[cardinality++] = x;
    } else {
      // Binary search for the insertion position
      int loc = Util.unsignedBinarySearch(content, 0, cardinality, x);
      // Negative: not present, so insert. Otherwise return without changes (deduplication)
      if (loc < 0) {
        // Transform the ArrayContainer to a BitmapContainer
        // when cardinality = DEFAULT_MAX_SIZE
        if (cardinality >= DEFAULT_MAX_SIZE) {
          return toBitmapContainer().add(x);
        }
        // As above, grow content if the cardinality has reached its length
        if (cardinality >= this.content.length) {
          increaseCapacity();
        }
        // insertion : shift the elements > x by one position to
        // the right
        // and put x in it's appropriate place
        System.arraycopy(content, -loc - 1, content, -loc, cardinality + loc + 1);
        content[-loc - 1] = x;
        ++cardinality;
      }
    }
    return this;
  }


  /**
   * Copies the data in a bitmap container.
   *
   * @return the bitmap container
   */
  @Override
  public BitmapContainer toBitmapContainer() {
    BitmapContainer bc = new BitmapContainer();
    bc.loadData(this);
    return bc;
  }

  /**
   * Create a bitmap container with all bits set to false
   */
  public BitmapContainer() {
    this.cardinality = 0;
    // The length is fixed at 1024
    this.bitmap = new long[MAX_CAPACITY / 64];
  }

  protected void loadData(final ArrayContainer arrayContainer) {
    this.cardinality = arrayContainer.cardinality;
    for (int k = 0; k < arrayContainer.cardinality; ++k) {
      final short x = arrayContainer.content[k];
      // Set the bits one by one; the bit arithmetic is explained
      // in the BitmapContainer add section
      bitmap[Util.toIntUnsigned(x) / 64] |= (1L << x);
    }
  }
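
The conversion can be tried in isolation. This sketch mirrors `loadData` with plain arrays; `ArrayToBitmap` and its two methods are hypothetical names, and the cardinality check via `Long.bitCount` is added here only to verify the result.

```java
final class ArrayToBitmap {
    /** Sets one bit per value of content[0..cardinality) in a 65536-bit bitmap. */
    static long[] toBitmap(short[] content, int cardinality) {
        long[] bitmap = new long[(1 << 16) / 64]; // 1024 words
        for (int k = 0; k < cardinality; k++) {
            int x = content[k] & 0xFFFF;  // read the short as unsigned
            bitmap[x >>> 6] |= 1L << x;   // for a long, the shift distance is x mod 64
        }
        return bitmap;
    }

    /** Number of set bits, which should equal the source cardinality. */
    static int cardinality(long[] bitmap) {
        int total = 0;
        for (long word : bitmap) {
            total += Long.bitCount(word);
        }
        return total;
    }

    public static void main(String[] args) {
        short[] content = {0, 63, 64, (short) 65535};
        long[] bitmap = toBitmap(content, content.length);
        System.out.println(bitmap.length + " words, cardinality " + cardinality(bitmap));
    }
}
```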


  // temporarily allow an illegally large size, as long as the operation creating
  // the illegal container does not return it.
  // The growth factor depends on the current length: double below 64,
  // then 1.5x below 1067, then 1.25x
  private void increaseCapacity(boolean allowIllegalSize) {
    int newCapacity = (this.content.length == 0) ? DEFAULT_INIT_SIZE
        : this.content.length < 64 ? this.content.length * 2
            : this.content.length < 1067 ? this.content.length * 3 / 2
                : this.content.length * 5 / 4;
    // never allocate more than we will ever need
    if (newCapacity > ArrayContainer.DEFAULT_MAX_SIZE && !allowIllegalSize) {
      newCapacity = ArrayContainer.DEFAULT_MAX_SIZE;
    }
    // if we are within 1/16th of the max, go to max
    if (newCapacity > ArrayContainer.DEFAULT_MAX_SIZE - ArrayContainer.DEFAULT_MAX_SIZE / 16
        && !allowIllegalSize) {
      newCapacity = ArrayContainer.DEFAULT_MAX_SIZE;
    }
    this.content = Arrays.copyOf(this.content, newCapacity);
  }
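
To see what the growth policy produces, the capacity formula can be lifted into a pure function and iterated. This is a sketch under two assumptions: `DEFAULT_INIT_SIZE` is taken to be 4 (its value is not shown in the excerpt), and the `allowIllegalSize` escape hatch is left out. `GrowthPolicy` is a hypothetical name.

```java
final class GrowthPolicy {
    static final int MAX_SIZE = 4096; // DEFAULT_MAX_SIZE in the excerpt
    static final int INIT_SIZE = 4;   // assumed value of DEFAULT_INIT_SIZE

    /** Next capacity for an array of the given length, capped at MAX_SIZE. */
    static int nextCapacity(int current) {
        int next = (current == 0) ? INIT_SIZE
                : current < 64 ? current * 2
                : current < 1067 ? current * 3 / 2
                : current * 5 / 4;
        if (next > MAX_SIZE) {
            next = MAX_SIZE;
        }
        // Within 1/16th of the maximum, jump straight to the maximum.
        if (next > MAX_SIZE - MAX_SIZE / 16) {
            next = MAX_SIZE;
        }
        return next;
    }

    public static void main(String[] args) {
        // Print the capacities visited while growing from empty to the cap.
        for (int c = 0; c < MAX_SIZE; c = nextCapacity(c)) {
            System.out.print(nextCapacity(c) + " ");
        }
        System.out.println();
    }
}
```

Growth is aggressive while the array is small and gentler as it approaches 4096, which limits both the number of copies and the wasted slack.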

  • How BitmapContainer adds a value
  @Override
  public Container add(final short i) {
    final int x = Util.toIntUnsigned(i);
    final long previous = bitmap[x / 64];
    long newval = previous | (1L << x);
    bitmap[x / 64] = newval;
    if (USE_BRANCHLESS) {
      cardinality += (previous ^ newval) >>> x;
    } else if (previous != newval) {
      ++cardinality;
    }
    return this;
  }

Integer division x / 64 gives the index into the long array, and final long previous = bitmap[x / 64] reads the old value of that long. 1L << x is equivalent to 1L << (x % 64), i.e. a mask with the corresponding bit set to 1; ORing it with the old value yields the new value.
Why is 1L << x equivalent to 1L << (x % 64)? See the Java Language Specification, §15.19 Shift Operators:

If the promoted type of the left-hand operand is int, then only the five lowest-order bits of the right-hand operand are used as the shift distance. It is as if the right-hand operand were subjected to a bitwise logical AND operator & (§15.22.1) with the mask value 0x1f (0b11111). The shift distance actually used is therefore always in the range 0 to 31, inclusive.
If the promoted type of the left-hand operand is long, then only the six lowest-order bits of the right-hand operand are used as the shift distance. It is as if the right-hand operand were subjected to a bitwise logical AND operator & (§15.22.1) with the mask value 0x3f (0b111111). The shift distance actually used is therefore always in the range 0 to 63, inclusive.
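
Both the shift masking and the branchless cardinality update can be verified with a few lines. `BitmapAddSketch` is a hypothetical stand-in for the container, taking an already-unsigned `int` in [0, 65535] instead of a `short`.

```java
final class BitmapAddSketch {
    final long[] bitmap = new long[1024];
    int cardinality;

    /** Sets bit x (0..65535) and updates the cardinality without branching. */
    void add(int x) {
        long previous = bitmap[x / 64];
        long updated = previous | (1L << x); // same as 1L << (x % 64)
        bitmap[x / 64] = updated;
        // previous ^ updated has bit (x % 64) set only if the bit was newly set;
        // shifting right by x (again taken mod 64) moves that bit to position 0.
        cardinality += (int) ((previous ^ updated) >>> x);
    }

    public static void main(String[] args) {
        System.out.println((1L << 70) == (1L << 6)); // long shift: distance masked with 0x3f
        System.out.println((1 << 33) == (1 << 1));   // int shift: distance masked with 0x1f
        BitmapAddSketch c = new BitmapAddSketch();
        c.add(70);
        c.add(70); // duplicate, cardinality must not change
        c.add(5);
        System.out.println(c.cardinality);
    }
}
```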



  • How RunContainer adds a value
  @Override
  public Container add(short k) {
    // TODO: it might be better and simpler to do return
    // toBitmapOrArrayContainer(getCardinality()).add(k)
    // but note that some unit tests use this method to build up test runcontainers without calling
    // runOptimize
    // Same binary-plus-sequential search, except that it steps through every second
    // entry so that only run start values are compared
    int index = unsignedInterleavedBinarySearch(valueslength, 0, nbrruns, k);
    // Non-negative: k is the start value of some run, so it is already present
    if (index >= 0) {
      return this;// already there
    }
    // Negative: k is not a start value and needs further checks.
    // Point index at the preceding run (the last one whose start is smaller than k)
    index = -index - 2;// points to preceding value, possibly -1
    // A non-negative index means k is not smaller than every run start
    if (index >= 0) {// possible match
      // Offset of k from the preceding start value
      int offset = toIntUnsigned(k) - toIntUnsigned(getValue(index));
      // Length of the preceding run
      int le = toIntUnsigned(getLength(index));
      // If the offset does not exceed the run length, k lies inside that run
      if (offset <= le) {
        return this;
      }
      // If the offset equals the run length + 1, k is the preceding run's maximum + 1
      if (offset == le + 1) {
        // we may need to fuse
        // The preceding run is not the last one, so the two neighbouring runs may need fusing
        if (index + 1 < nbrruns) {
          // If the next run starts at k + 1, fuse the two adjacent runs
          if (toIntUnsigned(getValue(index + 1)) == toIntUnsigned(k) + 1) {
            // indeed fusion is needed
            // Reset the run length
            setLength(index,
                (short) (getValue(index + 1) + getLength(index + 1) - getValue(index)));
            // Remove the now-redundant run by array copy and decrement nbrruns
            recoverRoomAtIndex(index + 1);
            return this;
          }
        }
        // No fusion: just increment the preceding run's length
        incrementLength(index);
        return this;
      }
      // If a run follows k, k may need to be fused with it
      if (index + 1 < nbrruns) {
        // we may need to fuse
        // If the next run starts at k + 1, prepend k to it
        if (toIntUnsigned(getValue(index + 1)) == toIntUnsigned(k) + 1) {
          // indeed fusion is needed
          // Reset the start value and the run length
          setValue(index + 1, k);
          setLength(index + 1, (short) (getLength(index + 1) + 1));
          return this;
        }
      }
    }
    // An index of -1 means k is smaller than every run start
    if (index == -1) {
      // we may need to extend the first run
      // If a run exists and its start equals k + 1, reset its start value and length
      if (0 < nbrruns) {
        if (getValue(0) == k + 1) {
          incrementLength(0);
          decrementValue(0);
          return this;
        }
      }
    }
    // General case for everything else
    makeRoomAtIndex(index + 1);
    setValue(index + 1, k);
    setLength(index + 1, (short) 0);
    return this;
  }





Thus, when first creating a Roaring bitmap, it is usually made of array and bitmap containers.
Runs are not compressed. Upon request, the storage of the Roaring bitmap can be optimized using
the runOptimize function. This triggers a scan through the array and bitmap containers that
converts them, if helpful, to run containers. In a given application, this might be done prior to
storing the bitmaps as immutable objects to be queried. Run containers may also arise from calling
a function to add a range of values.
To decide the best container type, we are motivated to minimize storage. In serialized form, a run
container uses 2 + 4r bytes given r runs, a bitmap container always uses 8192 bytes and an array
container uses 2c + 2 bytes, where c is the cardinality.
Therefore, we apply the following rules:

  • All array containers are such that they use no more space than they would as a bitmap
    container: they contain no more than 4096 values.
  • Bitmap containers use less space than they would as array containers: they contain more than
    4096 values.
  • A run container is only allowed to exist if it is smaller than either the array container or
    the bitmap container that could equivalently store the same values. If the run container has
    cardinality greater than 4096 values, then it must contain no more than ⌊(8192 − 2)/4⌋ =
    2047 runs. If the run container has cardinality no more than 4096, then the number of runs
    must be less than half the cardinality.
  /**
   * Use a run-length encoding where it is more space efficient
   *
   * @return whether a change was applied
   */
  public boolean runOptimize() {
    boolean answer = false;
    for (int i = 0; i < this.highLowContainer.size(); i++) {
      Container c = this.highLowContainer.getContainerAtIndex(i).runOptimize();
      if (c instanceof RunContainer) {
        answer = true;
      }
      this.highLowContainer.setContainerAtIndex(i, c);
    }
    return answer;
  }
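
The three rules quoted above reduce to a size comparison. The sketch below applies the serialized-size formulas from that passage (2c + 2, 8192 and 2 + 4r bytes) to pick a container kind from a cardinality and a run count. `ContainerChooser` and its `Kind` enum are hypothetical; the Java code in the sections below compares slightly different in-memory sizes, so its borderline cases may differ from this sketch.

```java
final class ContainerChooser {
    enum Kind { ARRAY, BITMAP, RUN }

    static int arrayBytes(int cardinality) { return 2 * cardinality + 2; }
    static int bitmapBytes() { return 8192; }
    static int runBytes(int runs) { return 2 + 4 * runs; }

    /** Picks the smallest representation; ties go to the array or bitmap container. */
    static Kind choose(int cardinality, int runs) {
        boolean small = cardinality <= 4096;
        int nonRunBytes = small ? arrayBytes(cardinality) : bitmapBytes();
        if (runBytes(runs) < nonRunBytes) {
            return Kind.RUN;
        }
        return small ? Kind.ARRAY : Kind.BITMAP;
    }

    public static void main(String[] args) {
        // The three chunks built in the earlier example, after runOptimize():
        System.out.println(choose(5, 5));         // five isolated values
        System.out.println(choose(32768, 32768)); // every other value of a chunk
        System.out.println(choose(65536, 1));     // one full chunk
    }
}
```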


  • ArrayContainer
  @Override
  public Container runOptimize() {
    // TODO: consider borrowing the BitmapContainer idea of early
    // abandonment
    // with ArrayContainers, when the number of runs in the arrayContainer
    // passes some threshold based on the cardinality.
    int numRuns = numberOfRuns();
    int sizeAsRunContainer = RunContainer.serializedSizeInBytes(numRuns);
    if (getArraySizeInBytes() > sizeAsRunContainer) {
      return new RunContainer(this, numRuns); // this could be maybe
                                              // faster if initial
                                              // container is a bitmap
    } else {
      return this;
    }
  }


  @Override
  int numberOfRuns() {
    if (cardinality == 0) {
      return 0; // should never happen
    }
    int numRuns = 1;
    int oldv = toIntUnsigned(content[0]);
    // Walk through all values; each break in continuity starts a new run,
    // so the run count is incremented
    for (int i = 1; i < cardinality; i++) {
      int newv = toIntUnsigned(content[i]);
      if (oldv + 1 != newv) {
        ++numRuns;
      }
      oldv = newv;
    }
    return numRuns;
  }


  protected static int serializedSizeInBytes(int numberOfRuns) {
    return 2 + 2 * 2 * numberOfRuns; // each run requires 2 2-byte entries.
  }


  @Override
  protected int getArraySizeInBytes() {
    return cardinality * 2;
  }
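
Put together, the three methods above decide whether an array container is worth converting. A self-contained sketch of that decision, using `int[]` instead of the container's `short[]`; `ArrayRunCheck` is a hypothetical name.

```java
final class ArrayRunCheck {
    /** Counts maximal groups of consecutive values in a sorted, duplicate-free array. */
    static int numberOfRuns(int[] sorted) {
        if (sorted.length == 0) {
            return 0;
        }
        int runs = 1;
        for (int i = 1; i < sorted.length; i++) {
            if (sorted[i - 1] + 1 != sorted[i]) {
                runs++;
            }
        }
        return runs;
    }

    /** True when the run encoding is strictly smaller than the array encoding. */
    static boolean shouldConvert(int[] sorted) {
        int asArray = 2 * sorted.length;              // getArraySizeInBytes()
        int asRun = 2 + 2 * 2 * numberOfRuns(sorted); // serializedSizeInBytes(numRuns)
        return asArray > asRun;
    }

    public static void main(String[] args) {
        System.out.println(shouldConvert(new int[] {1, 2, 3, 4, 5, 6})); // one long run
        System.out.println(shouldConvert(new int[] {1, 3, 5}));          // three isolated values
    }
}
```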


  protected RunContainer(ArrayContainer arr, int nbrRuns) {
    this.nbrruns = nbrRuns;
    // Twice the number of runs: one start value and one length per run
    valueslength = new short[2 * nbrRuns];
    if (nbrRuns == 0) {
      return;
    }
    int prevVal = -2;
    int runLen = 0;
    int runCount = 0;
    // Walk through every element, check whether it continues the previous one,
    // and set the start values and run lengths accordingly
    for (int i = 0; i < arr.cardinality; i++) {
      int curVal = toIntUnsigned(arr.content[i]);
      if (curVal == prevVal + 1) {
        ++runLen;
      } else {
        if (runCount > 0) {
          setLength(runCount - 1, (short) runLen);
        }
        setValue(runCount, (short) curVal);
        runLen = 0;
        ++runCount;
      }
      prevVal = curVal;
    }
    setLength(runCount - 1, (short) runLen);
  }
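
The same encoding loop as a standalone function, so that its output can be compared with the examples in the `valueslength` comment. `RunEncode` is a hypothetical helper working on `int[]`; unlike the constructor it does not need the run count up front and trims the result at the end.

```java
import java.util.Arrays;

final class RunEncode {
    /** Encodes a sorted, duplicate-free array as interleaved (start, length) pairs. */
    static int[] encode(int[] sorted) {
        int[] out = new int[2 * sorted.length]; // worst case: every value is its own run
        int runCount = 0;
        for (int i = 0; i < sorted.length; i++) {
            if (runCount > 0 && sorted[i] == sorted[i - 1] + 1) {
                out[2 * (runCount - 1) + 1]++;  // extend the current run
            } else {
                out[2 * runCount] = sorted[i];  // start a new run of length 0
                out[2 * runCount + 1] = 0;
                runCount++;
            }
        }
        return Arrays.copyOf(out, 2 * runCount);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode(new int[] {11, 12, 13, 14, 15})));
    }
}
```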

  • BitmapContainer
  @Override
  public Container runOptimize() {
    int numRuns = numberOfRunsLowerBound(MAXRUNS); // decent choice
    int sizeAsRunContainerLowerBound = RunContainer.serializedSizeInBytes(numRuns);
    if (sizeAsRunContainerLowerBound >= getArraySizeInBytes()) {
      return this;
    }
    // else numRuns is a relatively tight bound that needs to be exact
    // in some cases (or if we need to make the runContainer the right
    // size)
    numRuns += numberOfRunsAdjustment();
    int sizeAsRunContainer = RunContainer.serializedSizeInBytes(numRuns);
    if (getArraySizeInBytes() > sizeAsRunContainer) {
      return new RunContainer(this, numRuns);
    } else {
      return this;
    }
  }


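  // nruns value for which RunContainer.serializedSizeInBytes ==
  // BitmapContainer.getArraySizeInBytes()
  private final int MAXRUNS = (getArraySizeInBytes() - 2) / 4;

  @Override
  protected int getArraySizeInBytes() {
    return MAX_CAPACITY / 8;
  }

  /**
   * Counts how many runs there is in the bitmap, up to a maximum
   *
   * @param mustNotExceed maximum of runs beyond which counting is pointless
   * @return estimated number of courses
   */
  public int numberOfRunsLowerBound(int mustNotExceed) {
    int numRuns = 0;
    for (int blockOffset = 0; blockOffset + BLOCKSIZE <= bitmap.length; blockOffset += BLOCKSIZE) {
      for (int i = blockOffset; i < blockOffset + BLOCKSIZE; i++) {
        long word = bitmap[i];
        numRuns += Long.bitCount((~word) & (word << 1));
      }
      if (numRuns > mustNotExceed) {
        return numRuns;
      }
    }
    return numRuns;
  }

MAXRUNS: a BitmapContainer has a fixed size of 8 KB, i.e. 8192 bytes, so we can set up an equation for the largest number of runs worth considering; beyond it a RunContainer would take more space and converting is pointless. Solving 2 + 2 * 2 * runs = 8192 gives runs = 2047.5, which the integer division truncates to the threshold 2047.
Computing the lower bound is not complicated, but it relies on a neat trick: numRuns += Long.bitCount((~word) & (word << 1)), which counts runs. Each 1 among the 64 bits of a word represents one value, so counting runs means counting the groups of consecutive 1s in the word, which in turn means counting the places where a 1 is immediately followed by a 0. Shifting word left by one bit and ANDing it with the complement of word leaves a 1 at exactly those places: bit i of the result is set when bit i-1 of word is 1 and bit i is 0. Long.bitCount(long i) then counts the 1s in the result, which is the number of runs that end inside the word. A run that reaches bit 63 has no following 0 inside the word and is not counted, which is why the result is only a lower bound.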
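
The trick is easy to check on a single word. In the sketch below (`RunEndsInWord` is a hypothetical name), the word 0b0110_1110 holds two runs, bits 1 to 3 and bits 5 to 6, and both end inside the word; a run touching bit 63 is deliberately not counted.

```java
final class RunEndsInWord {
    /** Number of runs of 1s that end strictly inside the word (a 1 followed by a 0). */
    static int runEndsInside(long word) {
        return Long.bitCount(~word & (word << 1));
    }

    public static void main(String[] args) {
        System.out.println(runEndsInside(0b0110_1110L)); // two runs, both end inside
        System.out.println(runEndsInside(1L << 63));     // the run reaches bit 63: not counted
        System.out.println(runEndsInside(-1L));          // all ones: no 1-to-0 transition
    }
}
```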


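  /**
   * Computes the number of runs
   *
   * @return the number of runs
   */
  public int numberOfRunsAdjustment() {
    int ans = 0;
    long nextWord = bitmap[0];
    for (int i = 0; i < bitmap.length - 1; i++) {
      final long word = nextWord;
      nextWord = bitmap[i + 1];
      ans += ((word >>> 63) & ~nextWord);
    }
    final long word = nextWord;
    if ((word & 0x8000000000000000L) != 0) {
      ans++;
    }
    return ans;
  }

ans += ((word >>> 63) & ~nextWord);: this is easy to follow. An adjustment is needed when bit 63 of the current word is 1 and bit 0 of the next word is 0, that is, when a run ends exactly on a word boundary; in that case the run count is incremented by one. The check after the loop does the same for a run that ends at bit 63 of the last word.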
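
Lower bound plus adjustment should give the exact run count, which can be confirmed against a bit-by-bit count. This is a sketch: `BitmapRunCount` is a hypothetical name, and the block-wise early exit of `numberOfRunsLowerBound` is left out.

```java
final class BitmapRunCount {
    /** Exact run count: run ends inside words, plus runs ending on a word boundary. */
    static int countRuns(long[] bitmap) {
        int runs = 0;
        for (long word : bitmap) {
            runs += Long.bitCount(~word & (word << 1));
        }
        for (int i = 0; i + 1 < bitmap.length; i++) {
            // 1 when bit 63 of this word is set and bit 0 of the next word is clear
            runs += (int) ((bitmap[i] >>> 63) & ~bitmap[i + 1]);
        }
        if (bitmap.length > 0 && (bitmap[bitmap.length - 1] & 0x8000000000000000L) != 0) {
            runs++; // a run that reaches the very last bit
        }
        return runs;
    }

    /** Reference implementation: scan every bit and count 0-to-1 transitions. */
    static int countRunsNaive(long[] bitmap) {
        int runs = 0;
        boolean previous = false;
        for (int bit = 0; bit < bitmap.length * 64; bit++) {
            boolean current = (bitmap[bit >>> 6] & (1L << bit)) != 0;
            if (current && !previous) {
                runs++;
            }
            previous = current;
        }
        return runs;
    }

    public static void main(String[] args) {
        long[] spanning = {1L << 63, 1L}; // bits 63 and 64: one run across the boundary
        System.out.println(countRuns(spanning) + " " + countRunsNaive(spanning));
    }
}
```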


  // convert a bitmap container to a run container somewhat efficiently.
  protected RunContainer(BitmapContainer bc, int nbrRuns) {
    this.nbrruns = nbrRuns;
    valueslength = new short[2 * nbrRuns];
    if (nbrRuns == 0) {
      return;
    }
    int longCtr = 0; // index of current long in bitmap
    long curWord = bc.bitmap[0]; // its value
    int runCount = 0;
    while (true) {
      // potentially multiword advance to first 1 bit
      while (curWord == 0L && longCtr < bc.bitmap.length - 1) {
        curWord = bc.bitmap[++longCtr];
      }
      if (curWord == 0L) {
        // wrap up, no more runs
        return;
      }
      int localRunStart = Long.numberOfTrailingZeros(curWord);
      int runStart = localRunStart + 64 * longCtr;
      // stuff 1s into number's LSBs
      long curWordWith1s = curWord | (curWord - 1);
      // find the next 0, potentially in a later word
      int runEnd = 0;
      while (curWordWith1s == -1L && longCtr < bc.bitmap.length - 1) {
        curWordWith1s = bc.bitmap[++longCtr];
      }
      if (curWordWith1s == -1L) {
        // a final unterminated run of 1s (32 of them)
        runEnd = 64 + longCtr * 64;
        setValue(runCount, (short) runStart);
        setLength(runCount, (short) (runEnd - runStart - 1));
        return;
      }
      int localRunEnd = Long.numberOfTrailingZeros(~curWordWith1s);
      runEnd = localRunEnd + longCtr * 64;
      setValue(runCount, (short) runStart);
      setLength(runCount, (short) (runEnd - runStart - 1));
      runCount++;
      // now, zero out everything right of runEnd.
      curWord = curWordWith1s & (curWordWith1s + 1);
      // We've lathered and rinsed, so repeat...
    }
  }
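
The constructor leans on three bit tricks: `Long.numberOfTrailingZeros(w)` finds where a run starts, `w | (w - 1)` fills every bit below that start with 1s, and `f & (f + 1)` clears the run together with the filler. Restricted to a single word they look as follows; `WordRuns` is a hypothetical helper, and the multi-word handling of the original is omitted.

```java
import java.util.ArrayList;
import java.util.List;

final class WordRuns {
    /** Extracts the runs of 1s in one word as {start, length} pairs (length n covers n + 1 bits). */
    static List<int[]> runsOf(long word) {
        List<int[]> runs = new ArrayList<>();
        while (word != 0) {
            int start = Long.numberOfTrailingZeros(word);
            long filled = word | (word - 1);               // set all bits below the run start
            int end = Long.numberOfTrailingZeros(~filled); // first 0 above the run, 64 if none
            runs.add(new int[] {start, end - start - 1});
            if (end == 64) {
                break;                                     // the run reaches bit 63
            }
            word = filled & (filled + 1);                  // clear the run and the filler bits
        }
        return runs;
    }

    public static void main(String[] args) {
        for (int[] run : runsOf(0b0110_1110L)) {
            System.out.println("start " + run[0] + ", length " + run[1]);
        }
    }
}
```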

  • RunContainer



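// Add a single value
public void add(final int x)
// Add a range of values
public void add(final long rangeStart, final long rangeEnd)
// Remove a value
public void remove(final int x)
// Iterate over the RBM
public void forEach(IntConsumer ic)
// Test whether a value is present
public boolean contains(final int x)
// Get the cardinality
public int getCardinality()
// Bitwise AND: intersect with x2; the current RBM is modified
public void and(final RoaringBitmap x2)
// Same, but returns a new RBM and leaves x1 and x2 unmodified (safe for bitmaps shared read-only across threads)
public static RoaringBitmap and(final RoaringBitmap x1, final RoaringBitmap x2)
// Bitwise OR: union with x2; the current RBM is modified
public void or(final RoaringBitmap x2)
// Same, but returns a new RBM and leaves x1 and x2 unmodified (safe for bitmaps shared read-only across threads)
public static RoaringBitmap or(final RoaringBitmap x1, final RoaringBitmap x2)
// XOR: symmetric difference with x2; the current RBM is modified
public void xor(final RoaringBitmap x2)
// Same, but returns a new RBM and leaves x1 and x2 unmodified (safe for bitmaps shared read-only across threads)
public static RoaringBitmap xor(final RoaringBitmap x1, final RoaringBitmap x2)
// Difference: remove the values of x2 from the current RBM, which is modified
public void andNot(final RoaringBitmap x2)
// Same, but returns a new RBM and leaves x1 and x2 unmodified (safe for bitmaps shared read-only across threads)
public static RoaringBitmap andNot(final RoaringBitmap x1, final RoaringBitmap x2)
// Serialization
public void serialize(DataOutput out) throws IOException
public void serialize(ByteBuffer buffer)
// Deserialization
public void deserialize(DataInput in) throws IOException
public void deserialize(ByteBuffer bbf) throws IOException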

FastAggregation: fast aggregate operations over many bitmaps

public static RoaringBitmap and(Iterator<? extends RoaringBitmap> bitmaps)
public static RoaringBitmap or(Iterator<? extends RoaringBitmap> bitmaps)
public static RoaringBitmap xor(Iterator<? extends RoaringBitmap> bitmaps)



 RoaringBitmap roaringBitmap = new RoaringBitmap();
 roaringBitmap.add(1L, 100L);
 int size = roaringBitmap.serializedSizeInBytes();
 ByteBuffer byteBuffer = ByteBuffer.allocate(size);
 roaringBitmap.serialize(byteBuffer);
 return byteBuffer.array();



    private RoaringBitmap deSerializeRoaringBitmap(Blob blob) throws SQLException {
        byte[] content = blob.getBytes(1, (int) blob.length());
        ByteBuffer byteBuffer = ByteBuffer.wrap(content);
        return new RoaringBitmap(new ImmutableRoaringBitmap(byteBuffer));
    }


Many applications use Kryo for serialization/deserialization. One can use Roaring bitmaps with Kryo efficiently thanks to a custom serializer (Kryo 5):

public class RoaringSerializer extends Serializer<RoaringBitmap> {
    @Override
    public void write(Kryo kryo, Output output, RoaringBitmap bitmap) {
        try {
            bitmap.serialize(new KryoDataOutput(output));
        } catch (IOException e) {
            e.printStackTrace();
            throw new RuntimeException();
        }
    }

    @Override
    public RoaringBitmap read(Kryo kryo, Input input, Class<? extends RoaringBitmap> type) {
        RoaringBitmap bitmap = new RoaringBitmap();
        try {
            bitmap.deserialize(new KryoDataInput(input));
        } catch (IOException e) {
            e.printStackTrace();
            throw new RuntimeException();
        }
        return bitmap;
    }
}




64-bit integers (long)
Though Roaring Bitmaps were designed with the 32-bit case in mind, we have an extension to 64-bit integers:

      import org.roaringbitmap.longlong.*;

      LongBitmapDataProvider r = Roaring64NavigableMap.bitmapOf(1,2,100,1000);
      r.addLong(1234);
      System.out.println(r.contains(1)); // true
      System.out.println(r.contains(3)); // false
      LongIterator i = r.getLongIterator();
      while(i.hasNext()) System.out.println(i.next());

2. Version 0.8.12 replaced all unsigned shorts with chars


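A contributor opened an issue proposing to replace every unsigned short with char and to remove all the toIntUnsigned and compareUnsigned methods. Since char is Java's only unsigned 16-bit primitive type, the conversions become unnecessary. The author approved the idea and merged it into the main branch.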



RoaringBitmap on GitHub
"Better bitmap performance with Roaring bitmaps"
"Consistently faster and smaller compressed bitmaps with Roaring"
"精确去重和Roaring BitMap (咆哮位图)" (Exact deduplication and Roaring Bitmap)

