
https://dzone.com/articles/high-concurrency-http-clients-on-the-jvm

HTTP is probably the most popular application-level protocol, and many libraries implement it on top of network I/O, which is a special (stream-oriented) case of general I/O. Since all I/O has much in common, let’s start with some discussion about it.

I’ll concentrate on I/O cases with lots of concurrent HTTP requests, for example in micro-services, where a set of higher-level HTTP services invokes several lower-level ones, some concurrently and some sequentially due to data dependencies.

When serving many such requests, the total number of concurrently open connections can become big at times, especially if there are data dependencies or if the lower-level services are slow (or slowed down due to exceptional conditions). So microservice layers tend to require many concurrent, potentially long-lived connections. To see how many open connections we are required to support without crashing, let’s recall Little’s Law, with Ψ being the average in-progress request count, ρ the average arrival rate and τ the average completion time:

Ψ = ρ τ

The number of in-progress requests we can support depends on the language runtime, the OS and the hardware; the average request completion time (or latency) depends on what we have to do in order to fulfill the requests, including of course the calls to any lower-level services, access to storage etc.
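
For example (the figures are made up for illustration):

```java
// Little's Law: average in-progress requests = arrival rate * completion time.
public class LittlesLaw {
    static double inProgress(double arrivalRatePerSec, double completionTimeSec) {
        return arrivalRatePerSec * completionTimeSec; // Ψ = ρ τ
    }

    public static void main(String[] args) {
        // Hypothetical service: 1000 requests/s, each taking 500 ms on average.
        double psi = inProgress(1000, 0.5);
        System.out.println((int) psi); // 500 requests in progress on average
    }
}
```

So at 1,000 rps and half a second of latency, we must be able to hold around 500 requests (and their connections) open at any given moment.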

How many concurrent HTTP requests can we support? Each will need an open connection and some runnable primitive that can read/write on it using syscalls. If the memory, I/O subsystem and network bandwidth can keep up, modern OSes can support hundreds of thousands of open TCP connections; the runnable primitives they provide to work on sockets are threads. Threads are much more heavyweight than sockets: a single box running a modern OS can only support 5,000-15,000 of them.
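
As a concrete illustration of the “thread-per-connection” model, here is a minimal, self-contained sketch (the echo-style exchange over an ephemeral local port is made up for illustration; a real server would accept in a loop):

```java
import java.io.*;
import java.net.*;

// "Thread-per-connection": each accepted socket gets its own thread that
// performs plain blocking reads/writes on it.
public class ThreadPerConnection {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) { // ephemeral port
            Thread acceptor = new Thread(() -> {
                try {
                    Socket conn = server.accept();        // blocks this thread
                    new Thread(() -> {                    // one thread per connection
                        try (conn;
                             BufferedReader in = new BufferedReader(
                                     new InputStreamReader(conn.getInputStream()));
                             PrintWriter out = new PrintWriter(conn.getOutputStream(), true)) {
                            out.println("Hello " + in.readLine()); // blocking read/write
                        } catch (IOException ignored) { }
                    }).start();
                } catch (IOException ignored) { }
            });
            acceptor.start();

            // Client side: a plain blocking request/response round-trip.
            try (Socket client = new Socket("localhost", server.getLocalPort());
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream()))) {
                out.println("world");
                System.out.println(in.readLine()); // "Hello world"
            }
            acceptor.join();
        }
    }
}
```

This is pleasant to write and read, but every connection pins an OS thread for its whole lifetime, which is exactly what stops scaling at a few thousand connections.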

From 10,000 Feet: I/O Performance on the JVM

Nowadays JDK threads are OS threads on most platforms, but if at any time there are only a few concurrent connections, then the “thread-per-connection” model is perfectly fine.

What if not? The answer to this question has changed over time:

  • JDK pre-1.4 only had libraries calling into the OS’ thread-blocking I/O (the java.io packages), so only the “thread-per-connection” model or thread-pools could be used. If you wanted something better, you’d tap into your OS’ additional features through JNI.
  • JDK 1.4 added non-blocking I/O, or NIO (the java.nio packages), to read/write from connections only if it can be done immediately, without putting the thread to sleep. Even more importantly, it added a way for a single thread to work effectively on many channels with socket selection, which means asking the OS to block the current thread and unblock it when it is possible to receive/send data immediately on at least one socket of a set.
  • JDK 1.7 added NIO.2, also known as asynchronous I/O (still java.nio packages). This means asking the OS to perform I/O tasks completely in the background and wake up a thread with a notification later on, only when the I/O has finished.
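
The selection mechanism from JDK 1.4 can be sketched as follows (a hypothetical, minimal example: one selector, one server channel registered for OP_ACCEPT, and a throwaway client connection to make it fire):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.*;

// One thread servicing many channels: register interest in readiness events
// and block in select() until at least one channel can make progress.
public class SelectorSketch {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(0));   // ephemeral port
        server.configureBlocking(false);         // selectable channels must be non-blocking
        server.register(selector, SelectionKey.OP_ACCEPT);

        // Connect a client so the server channel becomes "acceptable".
        SocketChannel client = SocketChannel.open(
                new InetSocketAddress("localhost", server.socket().getLocalPort()));

        selector.select();                       // blocks until something is ready
        for (SelectionKey key : selector.selectedKeys()) {
            if (key.isAcceptable()) {
                SocketChannel accepted = ((ServerSocketChannel) key.channel()).accept();
                System.out.println("accepted: " + (accepted != null));
                if (accepted != null) accepted.close();
            }
        }
        client.close();
        server.close();
        selector.close();
    }
}
```

A real server would keep looping over select(), registering accepted channels for OP_READ/OP_WRITE on the same selector; that is how one thread can drive thousands of connections.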

Calling HTTP From the JVM Either Easily or Efficiently: the Thread-blocking and Async Toolboxes

There’s a wide selection of open-source HTTP client libraries available for the JVM. The thread-blocking APIs are easy to use and to maintain but potentially less efficient with many concurrent requests, while the async ones are efficient but harder to use. Asynchronous APIs also virally affect your code with asynchrony: any method consuming asynchronous data must be asynchronous itself, or block and nullify the advantages of asynchrony.
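
The “viral” nature of asynchrony can be sketched with CompletableFuture (the fetch call is a made-up stand-in for an HTTP request):

```java
import java.util.concurrent.CompletableFuture;

// Asynchrony is "viral": once fetchAsync returns a future, every caller that
// wants to stay non-blocking must return a future too.
public class ViralAsync {
    // Hypothetical lower-level call standing in for an HTTP fetch.
    static String fetch(String url) { return "body of " + url; }

    static CompletableFuture<String> fetchAsync(String url) {
        return CompletableFuture.supplyAsync(() -> fetch(url));
    }

    // Any method consuming async data must itself become async...
    static CompletableFuture<Integer> bodyLengthAsync(String url) {
        return fetchAsync(url).thenApply(String::length);
    }

    public static void main(String[] args) {
        // ...or block with join()/get(), nullifying the advantage.
        System.out.println(bodyLengthAsync("http://example.invalid/").join());
    }
}
```

Note how bodyLengthAsync had to return a future even though it does nothing asynchronous itself; that is the viral effect the text describes.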

Here’s a selection of open-source HTTP clients for Java and Clojure:

  • JDK’s URLConnection uses traditional thread-blocking I/O.
  • Apache HTTP Client uses traditional thread-blocking I/O with thread-pools.
  • Apache Async HTTP Client uses NIO.
  • Jersey is a ReST client/server framework; the client API can use several HTTP client backends including URLConnection and Apache HTTP Client.
  • OkHttp uses traditional thread-blocking I/O with thread-pools.
  • Retrofit turns your HTTP API into a Java interface and can use several HTTP client backends including Apache HTTP Client.
  • Grizzly is a network framework with low-level HTTP support; it was using NIO but switched to AIO.
  • Netty is a network framework with HTTP support (low-level), multi-transport, includes NIO and native (the latter uses epoll on Linux).
  • Jetty Async HTTP Client uses NIO.
  • Async HTTP Client wraps either Netty, Grizzly or JDK’s HTTP support.
  • clj-http wraps the Apache HTTP Client.
  • http-kit is an async subset of clj-http implemented partially in Java directly on top of NIO.
  • http async client wraps the Async HTTP Client for Java.

From 10,000 Feet: Making it Easy

Since Java threads are heavy on resources, if we want to perform I/O and scale to many concurrent connections we have to use either NIO or async NIO; on the other hand they are much more difficult to code and maintain. Is there a solution to this dilemma?

If threads weren’t heavy we could just use straightforward blocking I/O, so our question really is: can we have cheap enough threads that could be created in much larger numbers than OS threads?

At present the JVM itself doesn’t provide lightweight threads, but Quasar comes to the rescue with fibers, which are very efficient threads implemented in userspace.

Calling HTTP From the JVM Both Easily and Efficiently: the Comsat Fiber-blocking Toolbox

Comsat integrates some of the existing libraries with Quasar fibers. The Comsat APIs are identical to the original ones, and the HTTP clients section explains how to hook them in; for the rest, simply ensure you’re running Quasar properly, fire up your fibers when you need to perform a new HTTP call, and use one (or more) of the following fiber-blocking APIs (or take inspiration from the templates and examples):

  • Java:

    • An extensive subset of the Apache HTTP Client API, integrated by bridging the async one. Apache HTTP Client is mature, efficient, feature-complete and very widely used.
    • The fiber-blocking Retrofit API wraps the Apache client. Retrofit is a modern and high-level HTTP client toolkit that has been drawing a lot of interest also for ReST.
    • The JAX-RS synchronous HTTP client API, integrated by bridging Jersey’s async one. Jersey is a very popular JAX-RS-compliant framework for ReST, so several micro-services could decide to use both its server and client APIs.
    • The OkHttp synchronous API, integrated by bridging the OkHttp async API. OkHttp performs very well, is cheap on resources and feature-rich, yet at the same time has a very straightforward API for common cases; it also supports HTTP/2 and SPDY.
  • Clojure:
    • An extensive subset of the clj-http API, integrated by bridging the async API of http-kit. clj-http is probably the most popular HTTP client API in the Clojure ecosystem.

New integrations can be added easily and of course contributions are always welcome.

Some Load Tests with JBender

JBender is Pinterest’s Quasar-based network load-testing framework. It’s efficient and flexible, and thanks to Quasar fiber-blocking its source code is tiny and readable; using it is just as straightforward as using traditional thread-blocking I/O.

Consider this project, which builds on JBender and with a tiny amount of code implements HTTP load test clients for all the Comsat-integrated libraries, both in their original thread-blocking version and in Comsat’s fiber-blocking one.

JBender can use either (plain, heavyweight, OS) threads or fibers to perform requests; both are abstracted by Quasar into a shared abstract class called Strand, so the thread-blocking and fiber-blocking versions share the HTTP code: this proves that the Comsat-integrated APIs are exactly the same as the original ones and that fibers and threads are used in exactly the same way.

The load-test clients accept parameters to customize pretty much every aspect of their run but the test cases we’ll consider are the following:

  1. 41000 long-lived HTTP connections fired at the highest possible rate.
  2. Executing 10000 requests (plus 1000 of initial client and server warmup) lasting 1 second each with a target rate of 1000 rps.
  3. Executing 10000 requests (plus 1000 of initial client and server warmup) lasting 100 milliseconds each with a target rate of 10000 rps.
  4. Executing 10000 requests (plus 1000 of initial client and server warmup) with an immediate reply and a target rate of 100000 rps.

All of the tests have been fired against a server running Dropwizard, optimized to employ fibers on the HTTP server-side with comsat-dropwizard for maximum concurrency. The server simply replies to any request with “Hello!”

Here’s some information about our load test environment:

| Setting | Value |
|---|---|
| Parallel Universe stack | Quasar 0.7.4-SNAPSHOT, Comsat 0.5.0 |
| Fiber server (comsat-dropwizard 0.5.0, Jetty 9.2.9) | AWS EC2 Linux m4.xlarge (16 GB, 4 vcpus, high net perf) |
| Client (GET “/” -> 204 “Hello”) | AWS EC2 Linux t2.medium (4 GB, 2 vcpus, moderate-to-low net perf) |
| OS settings | https://github.com/circlespainter/jbender |
| JBender load test suite (server + clients) | https://github.com/circlespainter/comsat-http-client-bench |
| CPU/RAM monitoring method | JFR |
| CPU/RAM sampling interval | JFR’s default |
| JVM | Oracle 1.8.0_b66 |
| JVM settings | -XX:+AggressiveOpts |
| HTTP client settings | No retries, maximum-sized connection pool, I/O threads (async only) = <cpus> (= 2 for m4.medium), connect/read/write/TTL timeout = 1 h |
| Warmup | 1000 reqs, both server and client |
| Request generator | buffer = # reqs, with pre-generation, for throughput tests; = 1 for concurrency tests |
| Request completion events | buffer = # reqs for throughput tests; = 1 for concurrency tests |

The first important result is that the Comsat-based clients win hands-down, each compared to its respective non-fiber mode: Apache’s for many long-lived connections and OkHttp’s for lots of short-lived requests with a very high target rate, both with a small and a bigger heap (990 MiB and 3 GiB respectively; only the first one is shown for brevity):

| Load test | Metric | Apache 4.4.1 blocking (BIO) | Comsat Apache async 4.1 (NIO) + fibers | OkHttp 2.4.0 blocking (BIO) | Comsat OkHttp 2.4.0 (async, BIO) + fibers | Jersey 2.19 blocking (JDK connector) | Comsat Jersey 2.19 async + fibers |
|---|---|---|---|---|---|---|---|
| Long-lived concurrent 41k (maximum possible rate) | Max | 16715 | 41k | 16358 | 16608 | 16713 | 16713 |
| | Error | OOM - thread | - | OOM - thread | OOM - thread | OOM - thread | OOM - thread |
| | Time (s) | 8.8 | 8.8 | 16.6 | 16.7 | 16.5 | 20.2 |
| | Heap max (MiB) | N/A | 702 | N/A | N/A | N/A | N/A |
| | Heap avg (MiB) | | 246 | | | | |
| | Threads max | | 16 | | | | |
| Throughput, target rate 1k (response after 1 s) | Time max (ms) | 7139 | 1138 | 10209 | 1301 | 6341 | 4370 |
| | Time avg (ms) | 2359 | 1002 | 3031 | 1008 | 1902 | 1477 |
| | Heap max (MiB) | 227 | 110 | 125 | 119 | 330 | 342 |
| | Heap avg (MiB) | 61 | 29.8 | 34.9 | 30 | 76.8 | 73.7 |
| | Threads max | 4000+ | 15 | 4000+ | 1900+ | 4300+ | 2600+ |
| Throughput, target rate 10k (response after 100 ms) | Time max (ms) | 4898 | 4085 | 7939 | 7079 | 45198 | 14512 |
| | Time avg (ms) | 2479 | 2717 | 3423 | 2125 | 25885 | 7594 |
| | Heap max (MiB) | 338 | 192 | 179 | 165 | 495 | 489 |
| | Heap avg (MiB) | 91.3 | 67.2 | 40.9 | 38.5 | 147 | 155 |
| | Threads max | 7500+ | 16 | 4900+ | 3900+ | 11000+ | 6900+ |
| Throughput, target rate 100k (immediate response) | Time max (ms) | 6937 | 3590 | 4668 | 1793 | 9303 | 9840 |
| | Time avg (ms) | 1468 | 1821 | 1287 | 826 | 1659 | 3442 |
| | Heap max (MiB) | 226 | 188 | 130 | 113 | 354 | 398 |
| | Heap avg (MiB) | 62.2 | 66 | 36.1 | 33.2 | 79.5 | 122 |
| | Threads max | 3500+ | 16 | 2600+ | 2000+ | 4000+ | 4000+ |
| Notes | | | | OkHttp doesn’t use NIO but regular blocking I/O under the hood | | Jersey uses one thread per connection even in the async case | |

OkHttp excels in speed and memory utilization for fast requests. The fiber version of OkHttp uses its async API and performs significantly better, even though the underlying mechanism is traditional blocking I/O served by a thread pool.

Even more impressive is the margin by which the http-kit-based fiber-blocking comsat-httpkit wins against a traditional clj-http client (still showing only the small-heap run):

| Load test | Metric | clj-http (Apache blocking, BIO) | comsat-httpkit (http-kit async, NIO + fibers) |
|---|---|---|---|
| Long-lived concurrent 41k (maximum possible rate) | Max | 15715 | 41k |
| | Error | OOM - thread | - |
| | Time (s) | | 5 |
| | Heap max (MiB) | N/A | 511 |
| | Heap avg (MiB) | | 127 |
| | Threads max | | 14 |
| Throughput, target rate 1k (response after 1 s) | Time max (ms) | 19059 | 1102 |
| | Time avg (ms) | 8720 | 1003 |
| | Heap max (MiB) | 405 | 331 |
| | Heap avg (MiB) | 94.4 | 64.3 |
| | Threads max | 9000+ | 16 |
| Throughput, target rate 10k (response after 100 ms) | Time max (ms) | 22045 | 5545 |
| | Time avg (ms) | 8960 | 4102 |
| | Heap max (MiB) | 406 | 250 |
| | Heap avg (MiB) | 117 | 52.1 |
| | Threads max | 7000+ | 15 |
| Throughput, target rate 100k (immediate response) | Time max (ms) | 42849 | 3438 |
| | Time avg (ms) | 34750 | 4698 |
| | Heap max (MiB) | 523 | 259 |
| | Heap avg (MiB) | 481 | 50.2 |
| | Threads max | 11000+ | 16 |

There are other Jersey providers as well (Grizzly, Jetty and Apache), but Jersey proved the worst of the bunch, with a generally higher footprint and an async interface (used by Comsat’s fiber-blocking integration) that unfortunately spawns and blocks a thread for each and every request; for this reason (and probably also due to each provider’s implementation strategy) the fiber version sometimes provides clear performance benefits and sometimes doesn’t. In any case, these numbers are not as interesting as the Apache, OkHttp and http-kit ones, so I’m not including them here, but let me know if you’d like to see them.

(Optional) From 100 < 10,000 Feet: More About I/O Performance on the JVM

So you want to know why fibers are better than threads in highly concurrent scenarios.

When only a few concurrent sockets are open, the OS kernel can wake up blocked threads with very low latency. But OS threads are general-purpose and add considerable overhead for many use cases: they consume a lot of kernel memory for bookkeeping, synchronization syscalls can be orders of magnitude slower than procedure calls, context switching is expensive, and the scheduling algorithm is too generalist. All of this means that at present OS threads are just not the best choice for fine-grained concurrency with significant communication and synchronization, nor for highly concurrent systems in general.

Blocking I/O syscalls can indeed block expensive OS threads indefinitely, so a “thread-per-connection” approach will tear your system down very fast when you’re serving lots of concurrent connections; on the other hand, using a thread-pool will probably make the “accepted” connection queue overflow, because we can’t keep up with the arrival rate, or will at the very least cause unacceptable latencies. A “fiber-per-connection” approach instead is perfectly sustainable, because fibers are so lightweight.

Summing it up: threads can be better at latency with few concurrent connections and fibers are better at throughput with many concurrent connections.

Of course fibers need to run on top of active OS threads, because the OS knows nothing about fibers, so fibers are scheduled on a thread pool by Quasar. Quasar is just a library and runs entirely in user-space, which means that a fiber performing a syscall will block its underlying JVM thread for the entire call duration, making it unavailable to other fibers. That’s why it’s important that such calls are as short as possible, and especially that they don’t wait for a long time or, even worse, indefinitely: in practice, fibers should only perform non-blocking syscalls. So how can we make blocking HTTP clients run so well on fibers? Since those libraries provide a non-blocking (but inconvenient) API as well, we convert those async APIs into fiber-blocking ones and use them to implement the original blocking API. The new implementation (which is very short and is little more than a wrapper) will:

  1. Block the current fiber.
  2. Start an equivalent asynchronous operation, and pass in a completion handler that will unblock the fiber when finished.

From the fiber’s (and programmer’s) perspective the execution will restart after the library call when I/O completes, just like when using a thread and a regular thread-blocking call.
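
On plain threads, the same wrapper shape can be sketched with a CompletableFuture standing in for “block the current strand” (the async API below is a made-up stand-in; Quasar’s FiberAsync applies this pattern to fibers, parking a fiber instead of an OS thread):

```java
import java.util.concurrent.*;

// The async-to-blocking wrapper pattern: start the async operation with a
// completion handler, then block the current strand until the handler fires.
public class AsyncToBlocking {
    // Hypothetical async API: invokes the callback on another thread when done.
    interface Callback { void completed(String result); }
    static void asyncGet(String url, Callback cb) {
        CompletableFuture.runAsync(() -> cb.completed("response from " + url));
    }

    // Blocking facade built on top of the async API.
    static String blockingGet(String url) throws Exception {
        CompletableFuture<String> done = new CompletableFuture<>();
        asyncGet(url, done::complete); // start the async op with a completion handler
        return done.get();             // "block the current strand" until it completes
    }

    public static void main(String[] args) throws Exception {
        System.out.println(blockingGet("http://example.invalid/"));
    }
}
```

With fibers, done.get() is replaced by parking the fiber, so no OS thread is tied up while the I/O is in flight; the caller still sees a plain, blocking-style API.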

Wrap-up

With Quasar and Comsat you can easily write and maintain highly concurrent and HTTP-intensive code in Java, Clojure or Kotlin and you can even choose your favorite HTTP client library, without any API lock-ins. Do you want to use something else? Let us know, or integrate it with Quasar yourself.

  1. …and much not in common, for example file I/O (which is block-oriented) supports memory-mapped I/O which doesn’t make sense with stream-oriented I/O.
  2. Read this blog post for further discussion.
  3. Not so before 1.2, when it had (only) Green Threads.
  4. Using thread-pools means dedicating a limited, or anyway managed, number (or pool) of threads to fulfill a certain type of task, in this case serving HTTP requests: incoming connections are queued until a thread in the pool is free to serve them (as an aside, “connection pooling” is something entirely different: it’s most often about reusing DB connections).
  5. Have a look at this intro for more information.
  6. Read for example this, this and this for more information and benchmarks as well as this guest post on ZeroTurnaround RebelLabs’s blog if you want more insight about why and how fibers are implemented.

Reprinted from: https://my.oschina.net/qiangzigege/blog/792625
