Java performance optimization in 15 minutes, and everything else you wanted to know

From http://java-performance.com/?utm_campaign=Manong_Weekly_Issue_13&utm_medium=EDM&utm_source=Manong_Weekly

Last updated: 2013-12-07

JDK classes

NEW: Java 7 changes: a list of changes in the JDK 7 releases

Tags: Java 7, changes.

I will keep tracking all performance-related changes in JDK 7. The following JDK versions are covered:

Java 7u25 -> Java 7u45

Using double/long vs BigDecimal for monetary calculations: double, long, java.math.BigDecimal, java.lang.String:

Tags: finance, money, HFT, low latency.

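If you want to implement fast and correct monetary calculations in Java, stick to the following rules:
1. Store monetary values in long variables, in the smallest currency unit (for example, cents)
2. Avoid non-integral values when using double (calculate in the smallest currency unit)
3. Add/subtract using long
4. Round any multiplication/division result with Math.round/rint/ceil/floor (depending on your system requirements)
5. Your calculations should fit into 52 bits (double precision)
Always use a MathContext for BigDecimal multiplication and division in order to avoid an ArithmeticException for infinitely long decimal results. Do not use MathContext.UNLIMITED: it is equivalent to no context at all.
Do not convert double to BigDecimal. Convert String to BigDecimal instead whenever possible.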
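The rules above can be sketched as follows. This is a minimal illustration rather than a complete money library: the class and method names (`MoneyMath`, `addCents`, `applyRate`, `divide`, `parse`) are hypothetical, and `MathContext.DECIMAL64` is just one reasonable choice of context.

```java
import java.math.BigDecimal;
import java.math.MathContext;

class MoneyMath {
    // Amounts are kept as long values in the smallest currency unit (cents).
    static long addCents(long a, long b) {
        return a + b;
    }

    // Multiplies an amount in cents by a rate and rounds to the nearest cent.
    static long applyRate(long cents, double rate) {
        return Math.round(cents * rate);
    }

    // An explicit MathContext bounds the precision, so 1/3 does not throw.
    static BigDecimal divide(BigDecimal a, BigDecimal b) {
        return a.divide(b, MathContext.DECIMAL64);
    }

    // Builds a BigDecimal from text, so "0.1" stays exactly 0.1.
    static BigDecimal parse(String amount) {
        return new BigDecimal(amount);
    }
}
```

Calling `new BigDecimal("1").divide(new BigDecimal("3"))` without a context throws an ArithmeticException, while `MoneyMath.divide` returns a value rounded to 16 significant digits.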

UPDATED: Changes to String internal representation made in Java 1.7.0_06: java.lang.String, java.util.HashMap, java.util.Hashtable, java.util.HashSet, java.util.LinkedHashMap, java.util.LinkedHashSet, java.util.WeakHashMap and java.util.concurrent.ConcurrentHashMap:

Tags: String.substring, Java 1.7, memory consumption, low latency.

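Starting from Java 1.7.0_06, String.substring always creates a new underlying char[] for every String it returns. This means that the method now has linear complexity instead of the previous constant complexity. The advantages of this change are a slightly smaller memory footprint of a String (8 bytes less than before) and a guarantee that String.substring can no longer cause memory leaks (see String packing part 1: converting characters to bytes for more details on Java object memory layout).
Java 7u6+ functionality, removed in Java 8. Starting from the same Java update, the String class has a second hashing method called hash32. This method is currently not public and can be accessed without reflection only via a sun.misc.Hashing.stringHash32(String) call. It is used by the JDK 7 hash-based collections if their size exceeds the jdk.map.althashing.threshold system property. This is an experimental feature, and currently I do not recommend using it in your code.
Java 7u6 (inclusive) to Java 7u40 (exclusive) functionality, not applicable to Java 8. All standard JDK non-concurrent maps and sets in all Java versions between Java 7u6 (inclusive) and Java 7u40 (exclusive) are affected by a performance bug caused by the new hashing implementation. This bug affects only multithreaded applications that create heaps of maps per second. See the linked article for more details. This problem was fixed in Java 7u40.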

Performance of various methods of binary serialization in Java: java.nio.ByteBuffer, sun.misc.Unsafe, java.io.DataInputStream, java.io.DataOutputStream, java.io.ByteArrayInputStream, java.io.ByteArrayOutputStream: a comparison of binary serialization performance using various classes:

Tags: serialization in Java, unsafe memory access in Java, high throughput, low latency.

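Writing single bytes to a direct byte buffer is extremely slow. Avoid using direct byte buffers to write records one byte at a time.
If you have primitive array fields, always use bulk methods to process them. The performance of ByteBuffer bulk methods is close to that of Unsafe (though the ByteBuffer methods are always a little slower). If you need to store/load any primitive array other than byte[], use a ByteBuffer.as[YourType]Buffer().put(array) call followed by an update of the byte buffer position. Do not call the ByteBuffer.put[YourType] method in a loop!
The longer the blocks you write, the slower a heap buffer becomes and the faster a direct byte buffer becomes. Unsafe access is still faster, even for separate fields.
In Java 7, many types of ByteBuffer access were optimized compared to Java 6.
Always try to serialize primitive arrays using a direct byte buffer with your platform's native byte order: its performance is very close to that of Unsafe, and it is portable, unlike Unsafe code.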
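A minimal sketch of the bulk-write idiom described above, assuming an int[] payload. The class and method names (`BulkBufferWrite`, `writeInts`, `readInts`, `newNativeDirectBuffer`) are hypothetical; the calls themselves are standard java.nio API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class BulkBufferWrite {
    // Writes an int[] with one bulk call instead of a putInt loop.
    static void writeInts(ByteBuffer buffer, int[] values) {
        // The int view starts at the byte buffer's current position.
        buffer.asIntBuffer().put(values);
        // The view has its own position, so advance the byte buffer manually.
        buffer.position(buffer.position() + values.length * 4);
    }

    // Reads count ints with one bulk call and advances the byte buffer.
    static int[] readInts(ByteBuffer buffer, int count) {
        int[] values = new int[count];
        buffer.asIntBuffer().get(values);
        buffer.position(buffer.position() + count * 4);
        return values;
    }

    // Direct buffer using the platform's native byte order.
    static ByteBuffer newNativeDirectBuffer(int capacity) {
        return ByteBuffer.allocateDirect(capacity).order(ByteOrder.nativeOrder());
    }
}
```

The view buffer inherits the byte order that the byte buffer has at the moment the view is created, so the order must be set before calling asIntBuffer.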

Java collections overview: all JDK 1.6/1.7 standard collections are described and categorized in this overview.

Tags: Java 1.6 collections, Java 1.7 collections, Java collections guide, overview.

Here is a very brief summary of all JDK collections:

	Single threaded	Concurrent
Lists	`ArrayList` - generic array-based; `LinkedList` - do not use; `Vector` - deprecated	`CopyOnWriteArrayList` - seldom updated, often traversed
Queues / deques	`ArrayDeque` - generic array-based; `Stack` - deprecated; `PriorityQueue` - sorted retrieval operations	`ArrayBlockingQueue` - bounded blocking queue; `ConcurrentLinkedDeque` / `ConcurrentLinkedQueue` - unbounded linked queue (CAS); `DelayQueue` - queue with delays on each element; `LinkedBlockingDeque` / `LinkedBlockingQueue` - optionally bounded linked queue (locks); `LinkedTransferQueue` - may transfer elements w/o storing; `PriorityBlockingQueue` - concurrent `PriorityQueue`; `SynchronousQueue` - `Exchanger` with `Queue` interface
Maps	`HashMap` - generic map; `EnumMap` - `enum` keys; `Hashtable` - deprecated; `IdentityHashMap` - keys compared with `==`; `LinkedHashMap` - keeps insertion order; `TreeMap` - sorted keys; `WeakHashMap` - useful for caches	`ConcurrentHashMap` - generic concurrent map; `ConcurrentSkipListMap` - sorted concurrent map
Sets	`HashSet` - generic set; `EnumSet` - set of `enum`s; `BitSet` - set of bits/dense integers; `LinkedHashSet` - keeps insertion order; `TreeSet` - sorted set	`ConcurrentSkipListSet` - sorted concurrent set; `CopyOnWriteArraySet` - seldom updated, often traversed

java.util.ArrayList performance guide: java.util.ArrayList:

Tags: low latency, high throughput, CPU cache friendly, Java collections, CPU optimization, memory optimization.

Try to follow these rules while using ArrayList:

Add elements to the end of the list
Remove elements from the end too
Avoid contains, indexOf and remove(Object) methods
Even more avoid removeAll and retainAll methods
Use subList(int, int).clear() idiom to quickly clean a part of the list
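The last two rules can be illustrated with a short sketch. The helper names (`ArrayListRanges`, `removeRange`, `removeLast`) are hypothetical; `subList(from, to).clear()` is the standard idiom the list refers to.

```java
import java.util.List;

class ArrayListRanges {
    // Removes elements [from, to) through the sub-list view in one operation.
    static <T> void removeRange(List<T> list, int from, int to) {
        list.subList(from, to).clear();
    }

    // Removing from the end does not shift any remaining elements.
    static <T> T removeLast(List<T> list) {
        return list.remove(list.size() - 1);
    }
}
```

For an ArrayList the range removal shifts the tail of the backing array once, instead of once per removed element as a loop of remove(int) calls would.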

java.util.LinkedList performance: java.util.LinkedList, java.util.ArrayDeque:

Tags: Java collections, CPU optimization, avoid it.

If you need to write fast LinkedList code, try to stick to these rules:

Consider using ArrayDeque for queue-based algorithms
Use ListIterator with LinkedList
Avoid any LinkedList methods which accept or return the index of an element in the list - they have to walk the list and are therefore slow
Check if you have a reason to use LinkedList.remove/removeFirst/removeLast methods, use pollFirst/pollLast instead
Try batch processing LinkedList
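A small sketch of two of these rules: queue processing on an ArrayDeque with pollFirst, and a single-pass LinkedList cleanup with a ListIterator. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.LinkedList;
import java.util.ListIterator;

class QueueSketch {
    // FIFO processing on an ArrayDeque; pollFirst returns null when empty.
    static int sumAndDrain(ArrayDeque<Integer> queue) {
        int sum = 0;
        Integer head;
        while ((head = queue.pollFirst()) != null) {
            sum += head;
        }
        return sum;
    }

    // Removes odd values in one pass; no index-based access is involved.
    static void removeOdd(LinkedList<Integer> list) {
        ListIterator<Integer> it = list.listIterator();
        while (it.hasNext()) {
            if (it.next() % 2 != 0) {
                it.remove();
            }
        }
    }
}
```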

Bit sets: java.util.BitSet, java.util.Set<Integer>: representing a set of integers in the most compact form, using bit sets to store a set of Long/long values:

Tags: low latency, high throughput, CPU cache friendly, Java collections, CPU optimization, memory optimization.

Do not forget about bit sets when you need to map a large number of integer keys to boolean flags.
Sets of integer values should be replaced with bit sets in a lot of cases in order to save a lot of memory.
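As a rough illustration, a set of non-negative int ids can be kept in a BitSet instead of a Set<Integer>. The `SeenIds` class below is a hypothetical wrapper, not something taken from the original article.

```java
import java.util.BitSet;

class SeenIds {
    // One bit per possible id instead of one boxed Integer per stored id.
    private final BitSet seen = new BitSet();

    // Marks a non-negative id as seen; returns true if it was new.
    boolean markSeen(int id) {
        boolean wasSet = seen.get(id);
        seen.set(id);
        return !wasSet;
    }

    boolean contains(int id) {
        return seen.get(id);
    }

    // Number of ids currently stored.
    int size() {
        return seen.cardinality();
    }
}
```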

java.util.IdentityHashMap: a discussion of why an IdentityHashMap is so special and what alternatives it has.

Tags: Java collections, object graph, avoid it.

java.util.IdentityHashMap uses System.identityHashCode to get an object's identity hash code. Avoid using IdentityHashMap if your objects have a primary key field (use it as a key for an ordinary HashMap). If you need to add your own equals and hashCode methods but can't update the objects you are working on, use Trove maps with a custom hashing strategy.
Do not try to iterate IdentityHashMap contents, because iteration order will be different on every run of your program, thus making your program results inconsistent.
Accessing the object identity hash code is a very cheap Java intrinsic operation.
Beware that an object with a calculated identity hash code cannot be used for biased locking. While this is very rare in normal circumstances, you may end up in this situation if your lock is accessed by any Java object graph traversal algorithm (serialization, for example).

Regexp-related methods of String: java.util.regex.Pattern, java.util.regex.Matcher, java.lang.String: pattern/matcher logic:

Tags: low latency, high throughput, CPU optimization.

Always (or nearly always) replace String.matches, split, replaceAll, replaceFirst methods with Matcher and Pattern methods - it will save you from unnecessary pattern compilation.
In Java 7, String.split is optimized for splitting by a single character that has no special meaning in regular expressions. Always use String.split to split by such strings in Java 7.
In all other simple cases, consider handwritten parsing methods in time-critical code. You can easily gain a 10 times speedup by replacing Pattern methods with handcrafted methods.
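A minimal sketch of the first rule: the pattern is compiled once and reused, instead of being recompiled on every String.matches or String.split call. The class name, constants and the two regular expressions are illustrative assumptions.

```java
import java.util.regex.Pattern;

class RegexReuse {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern DIGITS = Pattern.compile("\\d+");
    private static final Pattern SEPARATORS = Pattern.compile("\\s*[;,]\\s*");

    // Equivalent to s.matches("\\d+") without the per-call compilation.
    static boolean isNumber(String s) {
        return DIGITS.matcher(s).matches();
    }

    // Equivalent to line.split("\\s*[;,]\\s*") without the per-call compilation.
    static String[] splitFields(String line) {
        return SEPARATORS.split(line);
    }
}
```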

java.util.Date, java.util.Calendar and java.text.SimpleDateFormat performance: java.util.Date, java.util.Calendar, java.text.SimpleDateFormat: date storage, parsing and converting back to string:

Tags: low latency, high throughput, finance, CPU optimization, memory optimization.

Do not use java.util.Date unless you have to use it. Use an ordinary long instead.
java.util.Calendar is useful for all sorts of date calculations and i18n, but avoid either storing a lot of such objects or extensively creating them - they consume a lot of memory and are expensive to create.
java.text.SimpleDateFormat is useful for general case datetime parsing, but it is better to avoid it if you have to parse a lot of dates in the same format (especially dates without time). Implement a parser manually instead.

Joda Time library performance: org.joda.time.DateTime, org.joda.time.format.DateTimeFormat, org.joda.time.format.DateTimeFormatter.
This is a comparison of Joda Time library classes performance with standard JDK classes performance (java.util.Date, java.util.Calendar, java.text.SimpleDateFormat). I advise you to read this article in conjunction with the java.util.Date, java.util.Calendar and java.text.SimpleDateFormat performance article.

Tags: low latency, high throughput, finance, CPU optimization, memory optimization.

All Joda Time date/time objects are built on top of a long timestamp, so it is cheap to create those objects from a long.
In Joda Time ver 2.1 creating a date/time object from date/time components (year, month, day, hour, min, sec) is ~3.5 times slower than the same operation for JDK GregorianCalendar.
Date components addition/subtraction is 3.5 times slower in Joda Time than in GregorianCalendar. In contrast, time components operations are faster than in the GregorianCalendar implementation by about the same factor of 3.5.
Date/time parsing works at about the same speed as in JDK SimpleDateFormat. The advantage of Joda parsing is that creating a parser (a DateTimeFormatter object) is extremely cheap, unlike an expensive SimpleDateFormat, so you don't have to cache parsers anymore.

java.io.ByteArrayOutputStream: java.io.ByteArrayOutputStream, java.nio.ByteBuffer: why you should not use ByteArrayOutputStream in the performance critical code.

Tags: Java IO, avoid it.

For performance critical code try to use ByteBuffer instead of ByteArrayOutputStream. If you still want to use ByteArrayOutputStream - get rid of its synchronization.
If you are working on a method which writes some sort of message to an unknown OutputStream, always write your message to a ByteArrayOutputStream first and use its writeTo(OutputStream) method after that. In some rare cases when you are building a String from its byte representation, do not forget about the ByteArrayOutputStream.toString methods.
In most cases avoid the ByteArrayOutputStream.toByteArray method - it creates a copy of the internal byte array. Garbage collecting these copies may take a noticeable time if your application is using a few gigabytes of memory (see the Inefficient byte[] to String constructor article for another example).
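The writeTo advice can be sketched like this. The `MessageWriter` class and its header/body message layout are hypothetical; `ByteArrayOutputStream.writeTo(OutputStream)` is the standard JDK method the text refers to.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class MessageWriter {
    // Builds the message in memory, then hands it to the target in one call.
    static void writeMessage(OutputStream target, String header, String body) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream(256);
        buffer.write(header.getBytes(StandardCharsets.UTF_8));
        buffer.write('\n');
        buffer.write(body.getBytes(StandardCharsets.UTF_8));
        // writeTo writes from the internal array, avoiding the toByteArray copy.
        buffer.writeTo(target);
    }
}
```

The target stream receives a single write call, which matters when it is an unbuffered file or socket stream.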

java.io.BufferedInputStream and java.util.zip.GZIPInputStream: java.io.BufferedInputStream, java.util.zip.GZIPInputStream, java.nio.channels.FileChannel: some minor performance pitfalls in these two streams.

Tags: high throughput, CPU optimization, memory optimization, data compression.

Both BufferedInputStream and GZIPInputStream have internal buffers. The default size is 8192 bytes for the former and 512 bytes for the latter. Generally it is worth increasing either of these sizes to at least 65536.
Do not use a BufferedInputStream as an input for a GZIPInputStream; instead, explicitly set the GZIPInputStream buffer size in the constructor. Keeping a BufferedInputStream is still safe, though.
If you have a new BufferedInputStream( new FileInputStream( file ) ) object and you call its available method rather often (for example, once or twice per input message), consider overriding the BufferedInputStream.available method. It will greatly speed up file reading.
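A short sketch of the second rule, using the `GZIPInputStream(InputStream, int)` constructor to set the buffer size directly. The class name, method names and the 65536 constant are illustrative choices.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

class GzipOpen {
    private static final int BUFFER_SIZE = 65536;

    // The enlarged GZIP buffer replaces a separate BufferedInputStream layer.
    static InputStream gunzip(InputStream raw) throws IOException {
        return new GZIPInputStream(raw, BUFFER_SIZE);
    }

    // Opens a gzip-compressed file for reading decompressed bytes.
    static InputStream openGzipFile(String path) throws IOException {
        return gunzip(new FileInputStream(path));
    }
}
```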

java.lang.Byte, Short, Integer, Long, Character (boxing and unboxing): java.lang.Byte, java.lang.Short, java.lang.Integer, java.lang.Long, java.lang.Character:

Tags: low latency, high throughput, CPU optimization, memory optimization.

Never call the valueOf(String) methods of java.lang.Number subclasses. If you need a primitive value - call parse[Type]. If you want an instance of a wrapper class, still call the parse[Type] method and rely on the JVM-implemented boxing. It supports caching of the most frequently used values. Never call wrapper class constructors - they always return a new Object, thus bypassing the caching support. Here is a summary of the caching support for the primitive wrapper classes:

Byte, Short, Long	Character	Integer	Float, Double
From -128 to 127	From 0 to 127	From -128 to java.lang.Integer.IntegerCache.high or 127, whichever is bigger	No caching
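A minimal sketch of the parse-then-box advice. The class and method names are hypothetical; the behaviour relies on autoboxing going through the wrapper cache described in the table above.

```java
class BoxingSketch {
    // Use this when a primitive is all that is needed.
    static int parsePrimitive(String s) {
        return Integer.parseInt(s);
    }

    // parseInt yields an int; returning it as Integer autoboxes via the cache.
    static Integer parseBoxed(String s) {
        return Integer.parseInt(s);
    }
}
```

For values inside the cached range, two `parseBoxed` calls return the same Integer instance, so no new object is allocated.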

Map.containsKey/Set.contains: java.util.Map, java.util.Set and most of their implementations:

Tags: low latency, high throughput, CPU optimization, Java collections.

For sets, contains+add/remove call pairs should be replaced with single add/remove calls even if some extra logic was guarded by the contains call.
For maps, a contains+get pair should always be replaced with a get followed by a null-check of its result. A contains+remove pair should be replaced with a single remove call and a check of its result.
Same ideas are applicable to Trove maps and sets too.
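These replacements can be sketched as follows. The class and method names are hypothetical, and the map helpers assume that the map never stores null values (otherwise a null result is ambiguous).

```java
import java.util.Map;
import java.util.Set;

class SingleLookup {
    // add returns false when the element was already present.
    static boolean registerOnce(Set<String> seen, String key) {
        return seen.add(key);
    }

    // One get plus a null check instead of containsKey followed by get.
    static int countOrZero(Map<String, Integer> counts, String key) {
        Integer value = counts.get(key);
        return value == null ? 0 : value;
    }

    // remove returns the previous value, or null if the key was absent.
    static boolean drop(Map<String, Integer> counts, String key) {
        return counts.remove(key) != null;
    }
}
```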

java.util.zip.CRC32 and java.util.zip.Adler32 performance: java.util.zip.CRC32, java.util.zip.Adler32 and java.util.zip.Checksum:

Tags: CPU optimization, checksum.

If you can choose which checksum implementation to use - try Adler32 first. If its quality is sufficient for you, use it instead of CRC32. In any case, use the Checksum interface to access the Adler32/CRC32 logic.
Try to update checksum by at least 500 byte blocks. Shorter blocks will require a noticeable time to be spent in JNI calls.
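A small sketch of both points: the code depends only on the Checksum interface, and it feeds the data in caller-chosen blocks. The `ChecksumSketch` class and its `blockSize` parameter are hypothetical.

```java
import java.util.zip.Checksum;

class ChecksumSketch {
    // Works with any Checksum implementation (Adler32, CRC32, ...).
    static long checksum(Checksum algorithm, byte[] data, int blockSize) {
        algorithm.reset();
        // Each update call covers a whole block rather than a single byte.
        for (int offset = 0; offset < data.length; offset += blockSize) {
            int length = Math.min(blockSize, data.length - offset);
            algorithm.update(data, offset, length);
        }
        return algorithm.getValue();
    }
}
```

Switching algorithms is then a one-argument change, for example `checksum(new java.util.zip.Adler32(), data, 4096)` versus `checksum(new java.util.zip.CRC32(), data, 4096)`.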

hashCode method performance tuning: java.lang.String, java.util.HashMap, java.util.HashSet, java.util.Arrays:

Tags: low latency, high throughput, CPU optimization, memory optimization.

Try to improve the distribution of results of your hashCode method. This is far more important than optimizing the speed of that method. Never write a hashCode method which returns a constant.
String.hashCode results distribution is nearly perfect, so you can sometimes substitute Strings with their hash codes. If you are working with sets of strings, try to end up with BitSets, as described in this article. Performance of your code will greatly improve.

Throwing an exception in Java is very slow: why it is too expensive to throw exceptions in Java: java.lang.Throwable, java.lang.Exception, java.lang.RuntimeException, sun.misc.BASE64Decoder, java.lang.NumberFormatException:

Tags: low latency, high throughput, CPU optimization.

Never use exceptions as return code replacement or for any likely to happen events. Throwing an exception is too expensive - you may experience 100 times slowdown for simple methods.
Avoid using any Number subclass parse*/valueOf methods if you call them for each piece of your data and you expect a lot of non-numerical data. Parse such values manually for top performance.

Java logging performance pitfalls: how to lose as little performance as possible while writing log messages: java.util.logging.Logger, java.util.logging.Handler, java.util.logging.Formatter, java.text.MessageFormat:

Tags: low latency, high throughput, CPU optimization, logging.

If you make expensive calculations while preparing data for log messages, either use Logger.isLoggable and do all data preparation inside, or write an object which does all calculations in its toString method.
Never call the Object.toString method in order to obtain a log message argument - just pass the original object. The logging framework will call the toString method on your object.
Do not mix format string concatenation with log arguments - a malicious concatenated string would allow an application user to break your logging or to access data which was not meant for user access.
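A minimal java.util.logging sketch of these rules. The `LoggingSketch` class is hypothetical, and the logger is passed in as a parameter only to keep the example self-contained; the summing loop stands in for any expensive data preparation.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class LoggingSketch {
    static void logState(Logger log, Object state, int[] samples) {
        // Guard the expensive preparation; it runs only when FINE is enabled.
        if (log.isLoggable(Level.FINE)) {
            long total = 0;
            for (int sample : samples) {
                total += sample;
            }
            // Constant format string; objects are passed as parameters and
            // converted to text by the formatter only when it needs them.
            log.log(Level.FINE, "state={0}, total={1}", new Object[] {state, total});
        }
    }
}
```

Because the format string is a constant and user-controlled data travels only as parameters, this layout also follows the last rule about not concatenating into the format string.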

Base64 encoding and decoding performance: an overview of several well-known Base64 Java implementations from the performance perspective: sun.misc.BASE64Encoder, sun.misc.BASE64Decoder, org.apache.commons.codec.binary.Base64, http://iharder.net/base64:

Tags: low latency, high throughput, CPU optimization, serialization in Java.

Definitely don't use sun.misc classes for Base64 encoding/decoding. They are both very slow and not public. Use any 3rd party decoder, but test its performance first.

A possible memory leak in the manual MultiMap implementation: an overview of multimap implementations in Java 8, Google Guava and Scala 2.10 as well as a description of a possible memory leak you can have while manually implementing a multimap using Java 6 or 7.

Tags: Java collections, Java 8, Scala, Google Guava.

As you have seen, it is quite easy to miss a memory leak while implementing a multilevel map. You need to be careful and split read and write accesses to the outer map.
Newer frameworks and languages, like Google Guava, Java 8 and Scala already provide you more convenient syntax and wider choice of collections thus allowing you to avoid possible memory leaks in the multilevel maps.

Memory optimization

An overview of memory saving techniques in Java: this article will give you basic advice on memory optimization in Java. Most other Java memory optimization techniques are based on that advice.

Tags: low latency, high throughput, finance, CPU optimization, memory optimization.

Prefer primitive types to their Object wrappers. The main cause of wrapper type usage is JDK collections, so consider using one of the primitive type collection frameworks like Trove.
Try to minimize the number of Objects you have. For example, prefer array-based structures like ArrayList/ArrayDeque to pointer based structures like LinkedList.

Memory consumption of popular Java data types - part 1: this article will describe the memory consumption of enums and EnumMap / EnumSet / BitSet / ArrayList / LinkedList / ArrayDeque JDK classes in Java 7.

Tags: low latency, high throughput, finance, CPU optimization, memory optimization, collections.

The following table summarizes the storage occupied per stored value assuming that a Java object reference occupies 4 bytes. Note that you must spend 4 bytes per Object reference in any case, so subtract 4 bytes from the values in the following table to find out the storage overhead.

`EnumSet`, `BitSet`	1 bit per value
`EnumMap`	4 bytes (for value, nothing for key)
`ArrayList`	4 bytes (but may be more if `ArrayList` capacity is seriously more than its size)
`LinkedList`	24 bytes (fixed)
`ArrayDeque`	4 to 8 bytes, 6 bytes on average

Memory consumption of popular Java data types - part 2: this article will describe the memory consumption of HashMap / HashSet, LinkedHashMap / LinkedHashSet, TreeMap / TreeSet and PriorityQueue JDK classes in Java 7 as well as their Trove replacements.

Tags: low latency, high throughput, finance, CPU optimization, memory optimization, collections, primitive collections.

Always try to replace HashMap with Trove THashMap, HashSet with a THashSet and finally, LinkedHashSet with a Trove TLinkedHashSet. Such a replacement requires adding a single letter to your code (the letter 'T') and no other code changes except the import statement. It will give you significant memory savings - see the table below.
The following table summarizes the storage occupied per stored value assuming that a reference occupies 4 bytes. Note that you must spend 4 bytes per Object reference in any case, so subtract 4 bytes from the values in the following table to find out the storage overhead (subtract 8 bytes for maps, because there is a key as well as a value).

JDK collection	Size	Possible Trove substitution	Size
`HashMap`	32 * SIZE + 4 * CAPACITY bytes	`THashMap`	8 * CAPACITY bytes
`HashSet`	32 * SIZE + 4 * CAPACITY bytes	`THashSet`	4 * CAPACITY bytes
`LinkedHashMap`	40 * SIZE + 4 * CAPACITY bytes	None
`LinkedHashSet`	32 * SIZE + 4 * CAPACITY bytes	`TLinkedHashSet`	8 * CAPACITY bytes
`TreeMap, TreeSet`	40 * SIZE bytes	None
`PriorityQueue`	4 * CAPACITY bytes	None

A few more memory saving techniques in Java: this article describes the advantages of static inner classes, string pooling, boolean flag collections as well as special classes for tiny collections in JDK.

Tags: low latency, high throughput, finance, CPU optimization, memory optimization, collections.

Make all your inner classes static by default. Remove static qualifier only when you have to.
If you have a collection of generally small collections, try to use java.util.Collections.empty*/singleton* methods for memory-efficient storage of tiny collections.
Prefer a BitSet to arrays/lists of boolean or dense sets of any integer types: bit sets are both memory and CPU cache friendly.
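The first two tips can be sketched together. The `TinyCollections` class, its nested `Entry` class and the `compact` helper are hypothetical names used only for illustration.

```java
import java.util.Collections;
import java.util.List;

class TinyCollections {
    // A static nested class holds no hidden reference to an outer instance.
    static final class Entry {
        final String key;
        final List<String> values;

        Entry(String key, List<String> values) {
            this.key = key;
            this.values = values;
        }
    }

    // Picks the most compact immutable list for 0 or 1 elements.
    static List<String> compact(List<String> values) {
        switch (values.size()) {
            case 0:
                return Collections.emptyList();
            case 1:
                return Collections.singletonList(values.get(0));
            default:
                return values;
        }
    }
}
```

Note that the lists returned for sizes 0 and 1 are immutable, so this fits data that is built once and then only read.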

String.intern in Java 6, 7 and 8 - string pooling: This article will describe how the String.intern() method was implemented in Java 6 and what changes were made in it in Java 7 and Java 8 (which finally made it extremely useful).

Tags: CPU optimization, memory optimization.

Stay away from String.intern() method on Java 6 due to a fixed size memory area (PermGen) used for JVM string pool storage.
Java 7 and 8 implement the string pool in the heap memory. It means that you are limited by the whole application memory for string pooling in Java 7 and 8.
Use the -XX:StringTableSize JVM parameter in Java 7 and 8 to set the string pool map size. It is fixed, because the pool is implemented as a hash map with lists in the buckets. Approximate the number of distinct strings in your application (which you intend to intern) and set the pool size equal to some prime number close to this value. It will allow String.intern to run in constant time and require rather small memory consumption per interned string (an explicitly used Java WeakHashMap will consume 4-5 times more memory for the same task).
The default value of -XX:StringTableSize parameter is 1009 in Java 7 and around 25-50K in Java 8.

String.intern in Java 6, 7 and 8 - multithreaded access: This article describes the performance impact of multithreaded calls to String.intern().

Tags: CPU optimization, memory optimization.

Feel free to use String.intern() in multithreaded code. The "8 writers" scenario has only 17% overhead compared to the "1 writer" (single-threaded) scenario. The "1 writer, 7 readers" scenario has 9% overhead in my test compared to the single-threaded results.
JVM string pool is NOT thread local. Each string added to the pool will be available to all other threads in the JVM thus further improving the program memory consumption.

String.intern in Java 6, 7 and 8 - part 3: String.intern() usage best practices.

Tags: CPU optimization, memory optimization.

Despite serious optimizations done in the String.intern() implementation in Java 7+, it still takes a noticeable time to run (noticeable for CPU sensitive applications). The simple example in this article runs 3.5 times faster without calls to String.intern(). You should not use String.intern() as a safety net, passing every long-living string into it. Instead, process only fields with a limited number of possible distinct values (for example, states/provinces if processing addresses) - memory savings in this situation will definitely pay off the initial CPU costs of String.intern().
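A minimal sketch of selective interning, following the address example from the text. The `AddressRecord` class and its fields are hypothetical.

```java
class AddressRecord {
    // Many distinct values: kept as is, not interned.
    final String street;
    // Few distinct values: interned so equal states share one instance.
    final String state;

    AddressRecord(String street, String state) {
        this.street = street;
        this.state = state.intern();
    }
}
```

Records created with equal state strings end up referencing the same pooled String instance, while street values stay out of the pool.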

Trove library: using primitive collections for performance: an overview of Trove library, which is a primitive type collection library.

Tags: low latency, high throughput, finance, CPU optimization, memory optimization, primitive collections, CPU cache friendly.

The main reason to use Trove maps is reduced memory consumption. If there is a large array list/set/map with keys or values that could be a primitive type, it is worth replacing it with a Trove collection. Maps from a primitive type to a primitive type are especially worth replacing.
Trove maps and sets support custom hashing strategies which allow you to implement map/set specific equals and hashCode, for example to implement an identity set or map.
Trove collections implement several additional methods, like grep, retainEntries or adjustOrPutValue. They reduce the code required for many common tasks.

Various types of memory allocation in Java: how to allocate a large memory buffer in Java and how to write any Java types into such buffer.

Tags: low latency, high throughput, finance, CPU optimization, low level memory access in Java.

Array size in Java is limited by the biggest int value = 2^31 - 1. On the other hand, you are not limited by 2Gb - 1 bytes as the size of your array - you may allocate a long[], which occupies 8 times more memory (16Gb - 8 bytes).
You may use sun.misc.Unsafe.allocateMemory(long size) for allocating a buffer longer than 2Gb, but you will have to free such buffers yourself.
You can use sun.misc.Unsafe memory access methods for reading/writing any Java datatype from/to both Java arrays and Unsafe buffers in a uniform manner.

Memory introspection using sun.misc.Unsafe and reflection: how to find out Java object memory layout using sun.misc.Unsafe and reflection.

Tags: memory usage in Java, memory allocation in Java.

You can use the following sun.misc.Unsafe methods for obtaining Java object layout information: objectFieldOffset, arrayBaseOffset and arrayIndexScale.
Java Object reference size depends on your environment. It may be equal to 4 or 8 bytes depending on your JVM settings and on the amount of memory you have given to your JVM. It is always 8 bytes for heaps over 32G, but for smaller heaps it is 4 bytes unless you turn off compressed references with the -XX:-UseCompressedOops JVM setting.

Protobuf data encoding for numeric datatypes: what type of numeric data encoding is used in Google Protobuf, how it impacts the compressed data size and how fast it is.

Tags: low latency, high throughput, finance, CPU optimization, memory optimization, serialization in Java.

Always try to upgrade to integer datatypes if you have found that some double/float fields in your serializable collections contain only integer values. This will increase the compression ratio of the general purpose algorithms on your binary data.
Try to use Google protobuf or any other similar encoding for your integer data if a noticeable part of your values happen to be small (either non-negative or by absolute value). You will get a noticeable data size reduction at a very low CPU cost, which will help you to store and later read a higher number of messages per time unit.
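To make the encoding concrete, here is a rough sketch of protobuf-style base-128 varint and ZigZag encoding. It is an illustration only, not the protobuf library API: the `VarintSketch` class and its methods are hypothetical, and the ByteArrayOutputStream is used for brevity rather than speed.

```java
import java.io.ByteArrayOutputStream;

class VarintSketch {
    // Encodes a value in 7-bit groups, low bits first; the high bit of each
    // byte signals that more bytes follow.
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(10);
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Decodes a byte array that holds exactly one varint.
    static long decode(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            shift += 7;
        }
        return result;
    }

    // ZigZag maps values that are small by absolute value to small
    // non-negative ones, so that negative numbers also encode compactly.
    static long zigZag(long value) {
        return (value << 1) ^ (value >> 63);
    }
}
```

With this scheme 1 takes one byte and 300 takes two bytes (0xAC 0x02), compared with the fixed 8 bytes of a long.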

Use case: compacting price field disk representation: double, short, java.math.BigDecimal: an example of compacting your data:

Tags: high throughput, finance, memory optimization.

Try to avoid storing double values in your disk data structures. Often the same information may be represented in a smaller number of bytes (compare, for example, 0.01 converted with writePriceUnsigned and 3f 84 7a e1 47 ae 14 7b - the binary representation of 0.01).
Analyze properties of the data you have to store in the largest data structures in your programs. Try to identify cases when most of your data can fit into a more compact data type than an original one.
See Use case: how to compact a long-to-long mapping for other ideas of compacting your data.

Use case: how to compact a long-to-long mapping: a use case where we try to identify some long-2-long mapping properties in order to represent it in the most compact form.

Tags: low latency, high throughput, CPU optimization, memory optimization, data compression.

Analyze properties of the data you have to store in the largest data structures in your programs. Try to identify cases when most of your data can fit into a more compact data type than an original one. Number-to-number maps can be especially effectively compacted if you can notice that keys are nearly consecutive values. In this case a map can be converted into the array.
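The array conversion mentioned above can be sketched for the simplest case, where keys form one consecutive range. The `DenseLongMap` class is hypothetical and assumes that every key lies in [firstKey, firstKey + size).

```java
class DenseLongMap {
    private final long firstKey;
    private final long[] values;

    // Assumes all keys fall into the range [firstKey, firstKey + size).
    DenseLongMap(long firstKey, int size) {
        this.firstKey = firstKey;
        this.values = new long[size];
    }

    // The key is turned into an array index; no key objects are stored.
    void put(long key, long value) {
        values[(int) (key - firstKey)] = value;
    }

    // Returns 0 for keys in range that were never put.
    long get(long key) {
        return values[(int) (key - firstKey)];
    }
}
```

Each entry costs 8 bytes for the value and nothing for the key, at the price of reserving slots for any missing keys in the range.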

String packing part 1: converting characters to bytes: we discuss Java objects memory layout and consumption. After that we try to pack a String into a more compact representation, trying to minimize using any Objects.

Tags: low latency, high throughput, finance, CPU optimization, memory optimization, data compression.

java.lang.String objects were designed to be fast and flexible. That's why they can share an internal char[] with other strings. They also cache the calculated hash code value, because strings are often used as HashMap keys or HashSet values. But these properties add a great penalty to the memory footprint of short strings. We can avoid this penalty by implementing our own string replacement objects.
Oracle developers tried to solve the same problem in late Java 6 releases by introducing the -XX:+UseCompressedStrings option. Unfortunately, it is not supported anymore in Java 7, maybe because the memory savings were not as big as one might expect.

String packing part 2: converting Strings to any other objects: we discuss how and when to convert a String into various more compact Java objects for temporary string representation.

Tags: low latency, high throughput, finance, CPU optimization, memory optimization, data compression.

If you have a big collection of strings or objects with String fields in memory and you know that in some cases (in at least 10% of cases, for example) these strings may actually be converted to primitive type values, you may replace your String fields with Object fields and use the provided pack/unpack methods to convert Strings to Objects and back, thus saving memory.
If you couldn't convert a string to a primitive, consider converting your strings into byte[] in UTF-8 encoding. This is a lossless conversion, so you can always convert your binary byte[] back into the original string.
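The UTF-8 fallback can be sketched as below. The `Utf8Packing` class and its `pack`/`unpack` methods are hypothetical illustrations, not the pack/unpack methods provided by the original article.

```java
import java.nio.charset.StandardCharsets;

class Utf8Packing {
    // ASCII-only text takes one byte per character instead of a 2-byte char.
    static byte[] pack(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Restores the original string from its UTF-8 bytes.
    static String unpack(byte[] packed) {
        return new String(packed, StandardCharsets.UTF_8);
    }
}
```

The round trip is exact for well-formed strings; a string containing unpaired surrogate characters would not survive it unchanged.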

Small tricks

I/O bound algorithms: SSD vs HDD: This article will investigate an impact of modern SSDs on the I/O bound algorithms of HDD era.

Tags: low latency, high throughput, finance, CPU optimization, hardware, Java IO.

Replacing HDD storage with an SSD can turn your application from being I/O bound to being CPU-bound. This especially applies to pure stream data read/write operations, because modern SSDs are capable of processing data at speeds over 300Mb/sec (few applications can produce data at such speed).
Modern operating systems try to write data in the background, not blocking your application. Your write requests will be blocked only if OS can't write data faster than your application produces it.
SSD seek time has dramatically decreased compared to HDD seek time (~100 seeks/sec for HDD -> over 2000 seeks/sec for SSD). On the other hand, even SSD seek is too slow compared to modern CPU speed. On my laptop, CPU can execute about a million commands while an SSD executes a seek operation. Always try to arrange your data as a stream.

Forbidden Java actions: object assignments, type conversions etc on the low level in Java: This article will reveal a few details about the low level Java memory layout: we will see how to implement Object assignments using just primitive types. Then we will see what's hidden in the array header and will convert an array of one type into an array of another type.

Tags: memory usage in Java, memory allocation in Java, unsafe memory access in Java.

All Java object references occupy 4 bytes for under 32G heaps. You can use sun.misc.Unsafe in order to treat such references as int fields.
Java arrays contain element type as int at the offset=8 in the array header. Length (int) is stored at offset=12. Changing these values is possible, but care must be taken in order not to extend an updated array outside of initially allocated memory.

Forbidden Java actions: updating final and static final fields: This article will discuss how you can update final or static final fields in Java using reflection and sun.misc.Unsafe.

Tags: memory usage in Java, memory allocation in Java, unsafe memory access in Java.

If you want to update a private and/or final field using Java reflection - make a Method or Field accessible via Method/Field.setAccessible( true ) and then set a new field value.
If you want to update a final static field using reflection - you will need to take 2 steps: make the field itself accessible, and then make the modifiers field of the Field you want to update accessible and remove the final flag from the Field modifiers. Such updates to static final fields of primitive/String types initialized with compile-time expressions will not be visible to clients, because static fields initialized with constant expressions (JLS 15.28) are inlined.
You may also update final and static final fields using sun.misc.Unsafe. Use Unsafe.putN( base, offset, newValue ) methods for updates. base is the owner of a field for instance fields and the Class object for static fields. offset could be obtained with Unsafe.objectFieldOffset( Field ) for instance fields and Unsafe.staticFieldOffset( Field ) for static fields.

Forbidden Java actions: not declaring a checked exception; avoiding a constructor while creating an object: In this article we will see how to throw a checked exception in Java without declaring it in the method throws clause and how to create an object without calling any of its constructors.

Tags: memory usage in Java, memory allocation in Java, unsafe memory access in Java.

There are several ways to avoid declaring a checked exception in a method which throws it. You can use Thread.stop(Throwable), Class.newInstance (and throw an exception in the constructor), sun.misc.Unsafe.throwException or generic type erasure in order to avoid a checked exception declaration in the throws clause. We do not recommend using any of these practices :)
In Java you may create an object without calling any of its constructors. There are 2 legal ways to do it: cloning and deserialization. sun.misc.Unsafe also allows you to create an uninitialized instance of an object, bypassing its constructors, with the Unsafe.allocateInstance method.
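Of the techniques listed above, the generic type erasure trick needs no internal APIs, so it is the easiest one to show. The following is a minimal sketch; the class and method names are made up for this example, and it is here to explain the mechanism, not to recommend it.

```java
import java.io.IOException;

final class SneakyThrow {
    // After type erasure T is just Throwable, so the cast is unchecked and
    // never fails at run time. The caller picks T = RuntimeException, which
    // satisfies the compiler without a throws clause.
    @SuppressWarnings("unchecked")
    private static <T extends Throwable> void throwAs(Throwable t) throws T {
        throw (T) t;
    }

    // Note: there is no 'throws IOException' clause on this method.
    static void failWithoutDeclaring() {
        SneakyThrow.<RuntimeException>throwAs(new IOException("not declared"));
    }

    public static void main(String[] args) {
        try {
            failWithoutDeclaring();
        } catch (Exception e) {
            // The checked exception arrives unchanged.
            System.out.println(e.getClass().getName()); // prints java.io.IOException
        }
    }
}
```

The caller cannot write catch (IOException e) around failWithoutDeclaring(), because the compiler believes that exception can never be thrown there. This is one more reason to avoid the trick in real code.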

Static constructor code is not JIT-optimized in a lot of cases: Static constructor code is generally executed in interpreted mode, even if it contains heavy calculations. But there is a way to force it to run in compiled mode.

Tags: Java pitfalls, avoid it.

If you need to execute CPU-expensive logic in your class static constructor, check whether it takes excessive time to execute. If it does, try moving that logic into a separate helper class.
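A minimal sketch of that refactoring, with hypothetical class names: the expensive loop lives in an ordinary static method of a helper class, and the static initializer of the main class only calls it. Whether this actually speeds things up on your JVM should be measured, as advised above.

```java
// Hypothetical helper: the heavy loop is a regular static method,
// which the JIT can compile like any other hot method.
final class PrimeTableBuilder {
    static boolean[] sieve(int limit) {
        boolean[] composite = new boolean[limit + 1];
        for (int i = 2; (long) i * i <= limit; i++) {
            if (!composite[i]) {
                for (int j = i * i; j <= limit; j += i) {
                    composite[j] = true;
                }
            }
        }
        return composite;
    }
}

final class Primes {
    private static final int LIMIT = 1_000_000;

    // The static initializer itself stays trivial: one call into the helper.
    private static final boolean[] COMPOSITE = PrimeTableBuilder.sieve(LIMIT);

    static boolean isPrime(int n) {
        return n >= 2 && n <= LIMIT && !COMPOSITE[n];
    }

    public static void main(String[] args) {
        System.out.println(isPrime(97)); // prints true
    }
}
```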

Inefficient byte[] to String constructor: be careful when using the public String(byte bytes[], int offset, int length, Charset charset) constructor in Java 6.

Tags: Java pitfalls, avoid it.

Always copy the part of your byte array you want to convert into a String first, otherwise this constructor will make a temporary copy of your full original buffer.
Try to avoid unnecessary memory allocations in your program, because they may hurt performance when the program is already using a lot of memory (1G+).
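The workaround from the first point can be sketched as follows. The helper name is hypothetical, and UTF-8 is only an assumed charset for the example.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

final class SliceToString {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Copies only the requested slice and decodes the copy, so the String
    // constructor never sees the (possibly huge) original buffer.
    static String decode(byte[] buffer, int offset, int length) {
        byte[] slice = Arrays.copyOfRange(buffer, offset, offset + length);
        return new String(slice, UTF8);
    }

    public static void main(String[] args) {
        byte[] buffer = "xxHELLOyy".getBytes(UTF8);
        System.out.println(decode(buffer, 2, 5)); // prints HELLO
    }
}
```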

Java varargs performance issues: a short review of the actual varargs implementation in Java 5+.

Varargs are great for most application code because they shorten program code, but they should be replaced with arrays created once in advance when all members of the varargs are known constants.
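To illustrate: every call with a literal argument list makes the compiler allocate a fresh array, while a constant array can be created once and reused. The names below are hypothetical.

```java
final class VarargsDemo {
    static int sum(int... values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Created once; safe to share only because sum() never modifies it.
    private static final int[] WEIGHTS = {1, 2, 3, 4};

    // The compiler turns this call into sum(new int[] {1, 2, 3, 4}),
    // so a new array is allocated on every invocation.
    static int withLiteralArguments() {
        return sum(1, 2, 3, 4);
    }

    // Passes the existing array straight through; no allocation per call.
    static int withConstantArray() {
        return sum(WEIGHTS);
    }

    public static void main(String[] args) {
        System.out.println(withLiteralArguments() + " " + withConstantArray()); // prints 10 10
    }
}
```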

Primitive types to String conversion and String concatenation: a description of various types of string concatenation in Java as well as a few JVM options helping us to make the string concatenation even faster.

Never use concatenation with an empty string "" as a "to string conversion". Use the appropriate String.valueOf or wrapper type toString(value) methods instead.
Whenever possible, use StringBuilder for string concatenation. Check old code and get rid of StringBuffer if possible.
Use the -XX:+OptimizeStringConcat option introduced in Java 6 update 20 in order to improve string concatenation performance. It is turned on by default in most Java 7 releases, but it is still turned off in Java 6u41.
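The first two recommendations, shown as a short sketch with hypothetical method names:

```java
final class ToStringDemo {
    // Instead of "" + n
    static String fromInt(int n) {
        return String.valueOf(n);
    }

    // Wrapper type toString(value) is an equivalent alternative.
    static String fromLong(long n) {
        return Long.toString(n);
    }

    // StringBuilder (unsynchronized) instead of StringBuffer or repeated '+'.
    static String join(int[] values) {
        StringBuilder sb = new StringBuilder(values.length * 4);
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(values[i]); // append(int) needs no intermediate String
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(fromInt(42) + " " + fromLong(7L) + " " + join(new int[] {1, 2, 3}));
    }
}
```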

Use cases

In this set of articles we try to apply the principles discussed in the other articles to "real world" problems.

Use case: FIX messages processing. Part 1: Writing a simple FIX parser and Use case: FIX messages processing. Part 2: Composing a message out of fields: possible gateway implementation: tag-based FIX message parsing and composing is described in these two articles. In essence, we parse a 0x0001-separated string into a list of name=value tags, which are then converted to actual datatypes. In the second part we discuss the best way to compose these messages back into String format as a part of a gateway implementation.

Tags: low latency, high throughput, finance, CPU optimization.

When processing messages, always try to cache parsed dates that have no time component: the number of distinct dates in modern financial data is very low.
String.split should usually be avoided. The only exception is a single character pattern in Java 7. You can still write faster code even in this case, but you should add some parsing logic into a splitting loop.
Never parse a "field=value" pair with String.split. String.indexOf(char) with separator character is a far better alternative.
Always try to avoid "binary/string -> Java type -> binary/string" conversions for short-lived objects. It is always better to store the original message (if you don't modify it) or the original fields (if you modify only some fields) and reuse them when you have to compose an output message, rather than converting back from Java types into a binary/text message. Besides saving CPU cycles on data conversions, you will also avoid unnecessary memory allocations for converted values.
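The String.indexOf advice above could be applied roughly as in the sketch below. It is not the parser from the articles. The class names are hypothetical, and the message is assumed to be a sequence of tag=value fields separated by the 0x0001 character.

```java
import java.util.ArrayList;
import java.util.List;

final class FixFieldParser {
    private static final char SEPARATOR = '\u0001';

    // Hypothetical holder for one tag=value pair.
    static final class TagValue {
        final int tag;
        final String value;

        TagValue(int tag, String value) {
            this.tag = tag;
            this.value = value;
        }
    }

    // Walks the message once with indexOf(char); no regex, no String.split.
    static List<TagValue> parse(String message) {
        List<TagValue> fields = new ArrayList<TagValue>();
        final int length = message.length();
        int start = 0;
        while (start < length) {
            int end = message.indexOf(SEPARATOR, start);
            if (end < 0) {
                end = length; // tolerate a missing trailing separator
            }
            int eq = message.indexOf('=', start);
            if (eq < 0 || eq > end) {
                throw new IllegalArgumentException("Malformed field at offset " + start);
            }
            int tag = Integer.parseInt(message.substring(start, eq));
            fields.add(new TagValue(tag, message.substring(eq + 1, end)));
            start = end + 1;
        }
        return fields;
    }

    public static void main(String[] args) {
        List<TagValue> fields = parse("8=FIX.4.2\u00019=12\u000135=D\u0001");
        for (TagValue f : fields) {
            System.out.println(f.tag + " -> " + f.value);
        }
    }
}
```

Values are kept as Strings here on purpose: converting them to Java types only when a field is actually needed follows the last recommendation above.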

Use case: Optimizing memory footprint of a read only csv file (Trove, Unsafe, ByteBuffer, data compression): we will see how to optimize memory consumption of a Java program which has to store a large set of read-only data in memory (using ByteBuffer or sun.misc.Unsafe). We will also try replacing index fields with their hash codes, still supporting the case of hash code collisions.

Tags: low latency, high throughput, finance, CPU optimization, memory optimization.

If you want to keep a large number of read-only records in memory, consider storing them in the most compact form possible in a ByteBuffer (or using sun.misc.Unsafe).
Use Trove maps for indices.
Check whether you need to keep your ID field values as map keys or whether a hash code is sufficient as a key (you can still check the ID field stored in the ByteBuffer).
If you know the possible query keys in advance, don't store IDs at all: a hash code is a sufficient replacement for map keys in a lot of situations.
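A simplified sketch of the idea follows; it is not the implementation from the article. Records are packed into one ByteBuffer and indexed by the hash code of the ID, and each candidate is checked against the ID bytes stored in the buffer to resolve collisions. The record layout (1-byte ID length, ASCII ID, 8-byte value) and all names are assumptions made for this example. A java.util.HashMap stands in for a Trove primitive map only to keep the example free of external dependencies.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class CompactStore {
    private static final Charset ASCII = Charset.forName("US-ASCII");

    private final ByteBuffer data;
    // ID hash code -> offsets of all records whose ID has that hash code.
    private final Map<Integer, int[]> index = new HashMap<Integer, int[]>();

    // Assumes ASCII IDs shorter than 128 characters.
    CompactStore(String[] ids, long[] values) {
        int size = 0;
        for (String id : ids) {
            size += 1 + id.length() + 8;
        }
        data = ByteBuffer.allocate(size);
        for (int i = 0; i < ids.length; i++) {
            int offset = data.position();
            byte[] idBytes = ids[i].getBytes(ASCII);
            data.put((byte) idBytes.length).put(idBytes).putLong(values[i]);
            addOffset(ids[i].hashCode(), offset);
        }
    }

    private void addOffset(int hash, int offset) {
        int[] old = index.get(hash);
        int[] updated = old == null ? new int[1] : Arrays.copyOf(old, old.length + 1);
        updated[updated.length - 1] = offset;
        index.put(hash, updated);
    }

    // Returns the stored value, or null if the ID is unknown.
    Long get(String id) {
        int[] offsets = index.get(id.hashCode());
        if (offsets == null) {
            return null;
        }
        byte[] wanted = id.getBytes(ASCII);
        for (int offset : offsets) {
            if (idMatches(offset, wanted)) {
                return data.getLong(offset + 1 + wanted.length);
            }
        }
        return null;
    }

    // Compares the ID stored in the buffer with the requested one.
    private boolean idMatches(int offset, byte[] wanted) {
        if (data.get(offset) != wanted.length) {
            return false;
        }
        for (int i = 0; i < wanted.length; i++) {
            if (data.get(offset + 1 + i) != wanted[i]) {
                return false;
            }
        }
        return true;
    }
}
```

For example, "Aa" and "BB" have the same String hash code, so new CompactStore(new String[] {"Aa", "BB"}, new long[] {1, 2}).get("BB") has to walk two candidate offsets before it returns 2.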

Single file vs multi file storage: a short study of the file system cache implementation in Windows 7 and Linux.

Tags: hardware, file system, CPU optimization.

In general, avoid storing your data in a large number of data files. At least, limit the number of such files, so that the growth of your data will not linearly impact the number of files you have to process.
If you still need to handle a large number of small files, use Linux and any of its native file systems which allow you to use 4K sectors (thus limiting the storage and read/write overhead).
Once you have read the files (at least in Linux), you may expect to find them in the file cache, especially if the global memory consumption in your system is not too high. The same applies to the case of one application writing these files to disk and another one reading them - chances are high that the second program will read these files from OS file cache instead of actually reading them from the disk.

NEW: Static code compilation in Groovy 2.0: we will see how static compilation in Groovy makes it as fast as Java.

Tags: Groovy, dynamic languages, CPU optimization.

Groovy is a dynamic JVM language that uses dynamic dispatch for its method calls. Dynamic dispatch in Groovy 2.1.9 is approximately 3 times slower than a normal Java method call, because Groovy has to obtain the method name and argument types (the method signature) and match them to the cached java.lang.reflect.Method.
Groovy 2.0 added the static compilation feature via the @CompileStatic annotation, which compiles most Groovy method calls into direct JVM bytecode method calls, thus avoiding all the dynamic dispatch overhead. Besides performance improvements, static compilation also type-checks your Groovy code, letting you discover a lot of typos/mistakes at compile time and reducing the need for extensive unit test coverage.