Original article: http://www.ibm.com/developerworks/java/library/j-threads2/index.html

When we say a program is "too slow," we are generally referring to one of two performance attributes -- latency or scalability. Latency describes how long it takes for a given task to complete, whereas scalability describes how a program's performance varies under increasing load or given increased computing resources. A high degree of contention is bad for both latency and scalability.

Why contention is such a problem

Contended synchronizations are slow because they involve multiple thread switches and system calls. When multiple threads contend for the same monitor, the JVM has to maintain a queue of threads waiting for that monitor (and this queue must be synchronized across processors), which means more time spent in the JVM or OS code and less time spent in your program code. Moreover, contention impairs scalability because it forces the scheduler to serialize operations, even if a free processor is available. When one thread is executing a synchronized block, any thread waiting to enter that block is stalled. If no other threads are available for execution, then processors may sit idle.

If we want to write scalable multithreaded programs, we must reduce contention for critical resources. There are a number of techniques for doing so, but before you can apply any of them, you need to take a good look at your code and figure out under what conditions you will be synchronizing on common monitors. Determining what locks are bottlenecks can be quite difficult; sometimes locks are hidden within class libraries or implicitly specified through synchronized methods, and therefore are less obvious when reviewing the code. Moreover, the current state of tools for detecting contention is quite poor.

Technique 1: Get in, get out

Don't miss the rest of this series

Part 1, "Synchronization is not the enemy" (July 2001)

Part 3, "Sometimes it's best not to share" (October 2001)

One obvious technique for reducing the likelihood of contention is to make synchronized blocks as short as possible. The shorter the time a thread holds a given lock, the lower the probability that another thread will request it while the first thread is holding it. So while you should use synchronization to access or update shared variables, it is usually better to do any thread-safe pre-processing or post-processing outside of the synchronized block.

Listing 1 demonstrates this technique. Our application maintains a HashMap for representing attributes of various entities; one such attribute is the list of access rights that a given user has. The access rights are stored as a comma-separated list of rights. The method userHasAdminAccess() looks up a user's access rights in the global attributes table, and looks to see if the user has the access called "ADMIN".

Listing 1. Spending more time in a synchronized block than necessary

public boolean userHasAdminAccess(String userName) {
    synchronized (attributesMap) {
        String rights = (String) attributesMap.get("users." + userName + ".accessRights");
        if (rights == null)
            return false;
        else
            return (rights.indexOf("ADMIN") >= 0);
    }
}

This version of userHasAdminAccess is thread-safe, but holds the lock for much longer than necessary. To create the concatenated string "users.brian.accessRights", the compiler will create a temporary StringBuffer object, call StringBuffer.append three times, and then call StringBuffer.toString, which means at least two object creations and several method calls. It will then call HashMap.get to retrieve the string, and then call String.indexOf to extract the desired rights identifier. As a percentage of the total work done in this method, the pre- and post-processing are significant; because they are thread-safe, it makes sense to move them out of the synchronized block, as shown in Listing 2.
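For reference, the concatenation in Listing 1 desugars to roughly the following (a sketch -- the exact code varies by compiler, but the extra allocations and method calls are the point):

// Roughly what the compiler generates for
// "users." + userName + ".accessRights":
// one StringBuffer plus the resulting String -- two object creations.
public String buildKey(String userName) {
    return new StringBuffer()
            .append("users.")
            .append(userName)
            .append(".accessRights")
            .toString();
}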

Listing 2. Reducing the time spent in a synchronized block

public boolean userHasAdminAccess(String userName) {
    String key = "users." + userName + ".accessRights";
    String rights;
    synchronized (attributesMap) {
        rights = (String) attributesMap.get(key);
    }
    return ((rights != null) && (rights.indexOf("ADMIN") >= 0));
}

On the other hand, it's possible to take this technique too far. If you have two operations that require synchronization on the same lock, separated only by a small block of thread-safe code, you are generally better off just using a single synchronized block.
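As an illustration (a hypothetical sketch, not from the original article), consider two critical sections on the same lock separated by a cheap thread-safe computation. Collapsing them trades a negligible increase in lock hold time for one fewer lock acquisition per call:

public class FeeAccount {
    private final Object lock = new Object();
    private int balance;
    private int feesCollected;

    // Two short synchronized blocks, separated by cheap thread-safe work.
    public void withdrawSplit(int amount) {
        synchronized (lock) {
            balance -= amount;
        }
        int fee = amount / 100;    // thread-safe, but trivial
        synchronized (lock) {
            feesCollected += fee;
        }
    }

    // Usually better: pay for the lock acquisition only once.
    public void withdrawCollapsed(int amount) {
        synchronized (lock) {
            balance -= amount;
            feesCollected += amount / 100;
        }
    }
}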

Technique 2: Reducing lock granularity

Another valuable technique for reducing contention is to spread your synchronizations over more locks. For example, suppose that you have a class that stores user information and service information in two separate hash tables, as shown in Listing 3.

Listing 3. An opportunity for reducing lock granularity

public class AttributesStore {
    private HashMap usersMap = new HashMap();
    private HashMap servicesMap = new HashMap();

    public synchronized void setUserInfo(String user, UserInfo userInfo) {
        usersMap.put(user, userInfo);
    }

    public synchronized UserInfo getUserInfo(String user) {
        return (UserInfo) usersMap.get(user);
    }

    public synchronized void setServiceInfo(String service, ServiceInfo serviceInfo) {
        servicesMap.put(service, serviceInfo);
    }

    public synchronized ServiceInfo getServiceInfo(String service) {
        return (ServiceInfo) servicesMap.get(service);
    }
}

Here, the accessor methods for user and service data are synchronized, which means that they are synchronizing on the AttributesStore object. While this is perfectly thread-safe, it increases the likelihood of contention for no real benefit. If a thread is executing setUserInfo, it means that not only will other threads be locked out of setUserInfo and getUserInfo, as is desired, but they will also be locked out of getServiceInfo and setServiceInfo.

This problem can be avoided by having the accessors simply synchronize on the actual objects being shared (the usersMap and servicesMap objects), as shown in Listing 4.

Listing 4. Reducing lock granularity

public class AttributesStore {
    private HashMap usersMap = new HashMap();
    private HashMap servicesMap = new HashMap();

    public void setUserInfo(String user, UserInfo userInfo) {
        synchronized (usersMap) {
            usersMap.put(user, userInfo);
        }
    }

    public UserInfo getUserInfo(String user) {
        synchronized (usersMap) {
            return (UserInfo) usersMap.get(user);
        }
    }

    public void setServiceInfo(String service, ServiceInfo serviceInfo) {
        synchronized (servicesMap) {
            servicesMap.put(service, serviceInfo);
        }
    }

    public ServiceInfo getServiceInfo(String service) {
        synchronized (servicesMap) {
            return (ServiceInfo) servicesMap.get(service);
        }
    }
}

Now threads accessing the services map will not contend with threads trying to access the users map. (In this case, the same effect could also be obtained by creating the maps using the synchronized wrapper mechanism provided by the Collections framework, Collections.synchronizedMap.) Assuming that requests against the two maps are evenly distributed, this technique would cut the number of potential contentions in half.
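A minimal sketch of that alternative, reusing the same UserInfo and ServiceInfo types as the listings -- each wrapper synchronizes on its own internal mutex, so the two maps still never contend with each other:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class AttributesStore {
    // Each synchronized wrapper guards only its own map.
    private final Map usersMap = Collections.synchronizedMap(new HashMap());
    private final Map servicesMap = Collections.synchronizedMap(new HashMap());

    public void setUserInfo(String user, UserInfo userInfo) {
        usersMap.put(user, userInfo);      // wrapper acquires usersMap's lock
    }

    public UserInfo getUserInfo(String user) {
        return (UserInfo) usersMap.get(user);
    }

    public void setServiceInfo(String service, ServiceInfo serviceInfo) {
        servicesMap.put(service, serviceInfo);
    }

    public ServiceInfo getServiceInfo(String service) {
        return (ServiceInfo) servicesMap.get(service);
    }
}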

Applying Technique 2 to HashMap

One of the most common contention bottlenecks in server-side Java applications is the HashMap. Applications use HashMap to cache all sorts of critical shared data (user profiles, session information, file contents), and the HashMap.get method may correspond to many bytecode instructions. For example, if you are writing a Web server, and all your cached pages are stored in a HashMap, every request will want to acquire and hold the lock on that map, and it will become a bottleneck.

We can extend the lock granularity technique to handle this situation, although we must be careful as there are some potential Java Memory Model (JMM) hazards associated with this approach. The LockPoolMap in Listing 5 exposes thread-safe get() and put() methods, but spreads the synchronization over a pool of locks, reducing contention substantially.

LockPoolMap is thread-safe and functions like a simplified HashMap, but has more attractive contention properties. Instead of synchronizing on the entire map for each get() or put() operation, the synchronization is done at the bucket level. For each bucket there is a lock, and that lock is acquired when traversing a bucket for either read or write. The locks are created when the map is created (there would be JMM problems if they were not).

If you create a LockPoolMap with many buckets, many threads will be able to use the map concurrently with a much lower likelihood of contention. However, the reduced contention does not come for free. By not synchronizing on a global lock, it becomes much more difficult to perform operations that act on the map as a whole, such as the size() method. An implementation of size() would have to sequentially acquire the lock for each bucket, count the number of nodes in that bucket, then release the lock and move on to the next bucket. But once the previous lock is released, other threads are free to modify the previous bucket. By the time size() finishes calculating the number of elements, the result could well be wrong. However, the LockPoolMap technique works quite well in some situations, such as shared caches.

Listing 5. Reducing locking granularity on a HashMap

import java.util.*;

/**
 * LockPoolMap implements a subset of the Map interface (get, put, clear)
 * and performs synchronization at the bucket level, not at the map
 * level.  This reduces contention, at the cost of losing some Map
 * functionality, and is well suited to simple caches.  The number of
 * buckets is fixed and does not increase.
 */
public class LockPoolMap {
    private Node[] buckets;
    private Object[] locks;

    private static final class Node {
        public final Object key;
        public Object value;
        public Node next;

        public Node(Object key) { this.key = key; }
    }

    public LockPoolMap(int size) {
        buckets = new Node[size];
        locks = new Object[size];
        for (int i = 0; i < size; i++)
            locks[i] = new Object();
    }

    private final int hash(Object key) {
        int hash = key.hashCode() % buckets.length;
        if (hash < 0)
            hash *= -1;
        return hash;
    }

    public void put(Object key, Object value) {
        int hash = hash(key);
        synchronized (locks[hash]) {
            Node m;
            for (m = buckets[hash]; m != null; m = m.next) {
                if (m.key.equals(key)) {
                    m.value = value;
                    return;
                }
            }
            // We must not have found it, so put it at the beginning of the chain
            m = new Node(key);
            m.value = value;
            m.next = buckets[hash];
            buckets[hash] = m;
        }
    }

    public Object get(Object key) {
        int hash = hash(key);
        synchronized (locks[hash]) {
            for (Node m = buckets[hash]; m != null; m = m.next)
                if (m.key.equals(key))
                    return m.value;
        }
        return null;
    }
}
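To make the size() caveat above concrete, a hypothetical size() for LockPoolMap (not part of the original listing) would have to visit the buckets one lock at a time, which is why its result can be stale by the time it returns:

// Hypothetical addition to LockPoolMap: an approximate size().
// Each bucket lock is released before the next is acquired, so
// concurrent put()s can change earlier buckets while later ones
// are still being counted.
public int size() {
    int count = 0;
    for (int i = 0; i < buckets.length; i++) {
        synchronized (locks[i]) {
            for (Node m = buckets[i]; m != null; m = m.next)
                count++;
        }
    }
    return count;
}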

Table 1 compares the performance of three shared map implementations: a synchronized HashMap, an unsynchronized HashMap (not thread-safe), and a LockPoolMap. The unsynchronized version is present only to show the overhead of contention. A test that does random put() and get() operations on the map was run with a variable number of threads, on a dual-processor Linux system using the Sun 1.3 JDK. The table shows the run time for each combination. This test is somewhat of an extreme case: the test programs do nothing but access the map, so there will be many more contentions than there would be in a realistic program, but it is designed to illustrate the performance penalty of contention.

Table 1. Scalability comparison between HashMap and LockPoolMap

Threads   Unsynchronized HashMap (unsafe)   Synchronized HashMap   LockPoolMap
   1                  1.1                            1.4               1.6
   2                  1.1                           57.6               3.7
   4                  2.1                          123.5               7.7
   8                  3.7                          272.3              16.7
  16                  6.8                          577.0              37.9
  32                 13.5                         1233.3              80.5

While all the implementations exhibit similar scaling characteristics for large numbers of threads, the HashMap implementation exhibits a huge performance penalty when going from one thread to two, because there will be contention on every single put() and get() operation. With more than one thread, the LockPoolMap technique is approximately 15 times faster than the HashMap technique. This difference reflects the time lost to scheduling overhead and to idle time spent waiting to acquire locks. The advantage of LockPoolMap would be even larger on a system with more processors.
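For the curious, the shape of such a stress test is easy to reproduce. The sketch below is an assumption about the harness (the thread count argument, key range, and iteration count are arbitrary choices), not the article's original test code:

import java.util.Random;

public class MapStressTest implements Runnable {
    private final LockPoolMap map;
    private final int iterations;

    public MapStressTest(LockPoolMap map, int iterations) {
        this.map = map;
        this.iterations = iterations;
    }

    // Each thread performs a random mix of put() and get() operations.
    public void run() {
        Random random = new Random();
        for (int i = 0; i < iterations; i++) {
            Integer key = new Integer(random.nextInt(1000));
            if (random.nextBoolean())
                map.put(key, key);
            else
                map.get(key);
        }
    }

    // Usage: java MapStressTest <nThreads>
    public static void main(String[] args) throws InterruptedException {
        int nThreads = Integer.parseInt(args[0]);
        LockPoolMap map = new LockPoolMap(64);
        Thread[] threads = new Thread[nThreads];
        long start = System.currentTimeMillis();
        for (int i = 0; i < nThreads; i++) {
            threads[i] = new Thread(new MapStressTest(map, 1000000));
            threads[i].start();
        }
        for (int i = 0; i < nThreads; i++)
            threads[i].join();
        System.out.println(nThreads + " threads: "
                + (System.currentTimeMillis() - start) + " ms");
    }
}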

Technique 3: Lock collapsing

Another technique that may improve performance is called "lock collapsing" (see Listing 6). Recall that the methods of the Vector class are nearly all synchronized. Imagine that you have a Vector of String values, and you are searching for the longest String. Suppose further that you know that elements will be added only at the end, and that they will not be removed, making it (mostly) safe to access the data as shown in the getLongest() method, which simply loops through the elements of the Vector, calling elementAt() to retrieve each one.

The getLongest2() method is very similar, except that it obtains the lock on the Vector before starting the loop. The result is that when elementAt() attempts to acquire the lock, the JVM sees that the current thread already holds it and does not contend. This lengthens the synchronized block, which appears to be in opposition to the "get in, get out" principle, but because it avoids so many potential synchronizations, it can be considerably faster, as less time is lost to scheduling overhead.

On a dual-processor Linux system running the Sun 1.3 JDK, a test program with two threads that simply looped calling getLongest2() was more than 10 times faster than one that called getLongest(). While both programs had the same degree of serialization, much less time was lost to scheduling overhead. Again, this is an extreme example, but it shows that the scheduling overhead of contention is not trivial. Even when run with a single thread, the collapsed version was some 30 percent faster: it is much faster to acquire a lock that you already hold than one that nobody holds.

Listing 6. Lock collapsing

Vector v;
...
public String getLongest() {
    int maxLen = 0;
    String longest = null;
    for (int i = 0; i < v.size(); i++) {
        String s = (String) v.elementAt(i);
        if (s.length() > maxLen) {
            maxLen = s.length();
            longest = s;
        }
    }
    return longest;
}

public String getLongest2() {
    int maxLen = 0;
    String longest = null;
    synchronized (v) {
        for (int i = 0; i < v.size(); i++) {
            String s = (String) v.elementAt(i);
            if (s.length() > maxLen) {
                maxLen = s.length();
                longest = s;
            }
        }
        return longest;
    }
}

Conclusion

Contended synchronization can have a serious impact on the scalability of your programs. Even worse, unless you perform realistic load testing, contention-related performance problems do not always present themselves during the development and testing process. The techniques presented in this article are effective for reducing the cost of contention in your programs, and increasing the load they can bear before exhibiting nonlinear scaling behavior. But before you can apply these techniques, you first have to analyze your program to determine where contention is likely to occur.

In the last installment in this series, we'll examine ThreadLocal, an oft-neglected facility of the Thread API. By using ThreadLocal, we can reduce contention by giving each thread its own copy of certain critical objects. Stay tuned!
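As a tiny preview (a sketch, not from this article): each thread that touches a ThreadLocal gets its own independently initialized value, so no synchronization is needed at all.

import java.text.SimpleDateFormat;
import java.util.Date;

public class PerThreadFormatter {
    // SimpleDateFormat is not thread-safe; giving each thread its own
    // instance avoids both corruption and lock contention.
    private static final ThreadLocal formatter = new ThreadLocal() {
        protected Object initialValue() {
            return new SimpleDateFormat("yyyy-MM-dd");
        }
    };

    public static String formatToday() {
        return ((SimpleDateFormat) formatter.get()).format(new Date());
    }
}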

Resources

  • Java Performance Tuning by Jack Shirazi (O'Reilly & Associates, 2000) provides guidance on eliminating performance issues in the Java platform.
  • Java Platform Performance: Strategies and Tactics by Steve Wilson and Jeff Kesselman (Addison-Wesley, 2000) offers techniques for the experienced Java programmer to build speedy and efficient Java code.
  • Java Performance and Scalability, Volume 1: Server-Side Programming Techniques by Dov Bulka (Addison-Wesley, 2000) provides a wealth of tips and tricks designed to help you increase the performance of your apps.
  • Brian Goetz' recent article "Double-checked locking: Clever, but broken" (JavaWorld, February 2001) explores the JMM in detail and the surprising consequences of failing to synchronize in certain situations.
  • Doug Lea's Concurrent Programming in Java, Second Edition (Addison-Wesley, 1999) is a masterful book on the subtle issues surrounding multithreaded programming in Java.
  • In his article "Writing multithreaded Java applications" (developerWorks, February 2001), Alex Roetter introduces the Java Thread API, outlines issues involved in multithreading, and offers solutions to common problems.
  • The performance modeling and analysis team at IBM Thomas J. Watson Research Center is researching several projects in the areas of performance and performance management.
  • Find more Java technology resources on the developerWorks Java technology zone.
