How to Modify LOWMEM in the Linux Kernel: Linux Kernel Tuning for C500k

Note: Concurrency, as defined in this article, is the same as it is for The C10k problem: concurrent clients (or sockets).

At Urban Airship we recently published a blog post about scaling beyond 500,000 concurrent socket connections. Hitting these numbers was not a trivial exercise so we’re going to share what we’ve come across during our testing. This guide is specific to Linux and has some information related to Amazon EC2, but it is not EC2-centric. These principles should apply to just about any Linux platform.

For our usage, squeezing as many socket connections as possible out of each server is valuable. Instead of running 100 servers with 10,000 connections each, we’d rather run 2 servers with 500,000 connections apiece. To do this we made the socket servers pretty much just socket servers. Any communication between the client and server is passed through a queue and processed by a worker. Having less for the socket server to do means less code, less CPU usage, and less RAM usage.

To get to these numbers we must consider the Linux kernel itself. A number of configurations needed tweaking. But first, an anecdote.

The Kernel, OOM, LOWMEM, and You

We first tested our code on a local Linux box that had Ubuntu 64-bit with 6GB of RAM, connecting with several Ubuntu VMs per client using bridged network adapters so we could ramp up our connections. We’d fire up the server and run our clients locally to see just how many connections we could hit. We noticed that we could hit 512,000 with our Java server not even breaking a sweat.

The next step was to test on EC2. We first wanted to see what sort of numbers we could get on “Small” instances, which are 1.7GB 32-bit VMs. We also had to fire up a number of other EC2 instances to act as clients.

We watched the numbers go up and up without a hitch until, seemingly randomly, the Java server fell over. It didn’t print any exceptions or die gracefully—it was killed.

We tried the same process again to see if we could replicate the behavior. Killed again.

Grepping through syslog, we found this line:

Out of Memory: Killed process 2178 java

The OOM-killer killed the Java process. We had been watching the free RAM closely, so this was odd: we had at least 500MB free at the time of the kill.

The next time we ran it we watched the contents of /proc/meminfo. What we noticed was a steady decline of the field “LowFree”, the amount of LOWMEM that is available. LOWMEM is the kernel-addressable RAM space used for kernel data. Data like socket buffers.
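
A minimal sketch of how that kind of watching could be scripted, assuming the usual "Name: value kB" layout of /proc/meminfo. The function names and the sample text are hypothetical, not taken from our tooling.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def low_free_kb(text):
    """Return the LowFree value in kB, or None if the kernel does not report it."""
    return parse_meminfo(text).get("LowFree")


# Hypothetical sample; on a real 32-bit box the text would come from /proc/meminfo.
SAMPLE = "MemTotal:  1747660 kB\nLowTotal:   734208 kB\nLowFree:    51200 kB\n"
print("LowFree kB:", low_free_kb(SAMPLE))
```

On a live machine, passing `open("/proc/meminfo").read()` to `low_free_kb` every few seconds would show the decline. A result of `None` means the kernel does not report the field at all.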

As we increased the number of sockets, each socket’s buffers added to the amount of LOWMEM used. Once LOWMEM was full, the kernel (instead of simply panicking) found the user process responsible for the usage and promptly killed it so it could continue to function.

On a standard EC2 Small, the configuration is such that the LOWMEM is around 717MB and the rest is “given” to the user. The kernel is smart about reallocating LOWMEM for the user, but not the other way around. The assumption is that the kernel will use very little ram, or at least a predictable finite amount, and the user should be allowed to go crazy. What we needed with our socket server was just the opposite. We needed the kernel to use all the ram it needed—our Java server rarely uses above a few hundred MB.

(For an in-depth rundown, take a look at High Memory In The Linux Kernel)

On a 32-bit system the entire addressable space is only 4GB, and the kernel gets just a fraction of it (1GB under the common 3GB/1GB split), so making sure the proper space is reserved for the kernel is important. But on 64-bit (x86-64) Linux the kernel-addressable space is 64TB (terabytes). At the current state of computing this is effectively limitless, and as such you will not even see the LowTotal or LowFree fields in /proc/meminfo because it is all LOWMEM.

So we created some EC2 Large instances (each of which is 64-bit with 7.5GB of RAM) and ran our tests again, this time without any surprises. The sockets were added happily and the kernel took all the RAM it needed.

Long story short, you can only scale to so many sockets on a 32-bit platform.

Kernel Options

Several parameters exist to allow for tuning and tweaking of socket-related parameters. In /etc/sysctl.conf there are a few options we’ve modified.

First is fs.file-max, the maximum file descriptor limit. The default is quite low so this should be adjusted. Be careful if you’re not ready to go super high.

Second, we have the socket buffer parameters net.ipv4.tcp_rmem and net.ipv4.tcp_wmem. These are the buffers for reads and writes respectively. Each takes three integers: min, default, and max, given as the number of bytes that may be buffered for a socket. Set the min and default low, with a tolerant max, to reduce the amount of RAM used for each socket.

The relevant portions of our config look like this:

fs.file-max = 999999
net.ipv4.tcp_rmem = 4096 4096 16777216
net.ipv4.tcp_wmem = 4096 4096 16777216

This means that the kernel allows for 999,999 open file descriptors, and each socket buffer has a minimum and default size of 4096 bytes, with a sensible max of 16MB.
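
To see why the default size matters, here is a rough back-of-the-envelope sketch. It assumes every socket fills both its read and write buffer to the 4096-byte default. In practice the kernel allocates buffer memory on demand and also keeps per-socket structures that are not counted here, so treat the result as an illustration rather than a measurement. The function name is hypothetical.

```python
def default_buffer_bytes(connections, rmem_default=4096, wmem_default=4096):
    """Bytes needed if every socket fills its default read and write buffers."""
    return connections * (rmem_default + wmem_default)


total = default_buffer_bytes(500_000)
print(f"{total:,} bytes = {total / 2**30:.2f} GiB")
```

With these defaults, 500,000 sockets come to roughly 3.8 GiB of buffer space in that worst case, which fits comfortably on a 7.5GB 64-bit instance but could never fit in the LOWMEM of a 32-bit one.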

We also modified /etc/security/limits.conf to allow for 999,999 open file descriptors for all users:

* - nofile 999999

You may want to look at the limits.conf manpage for more information.
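
A quick way to confirm the limit actually applies to a process is to ask from inside it. The sketch below uses Python's standard `resource` module. The helper names and the headroom figure are assumptions for illustration only.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def can_hold(connections, headroom=1024):
    """True if the soft limit leaves room for `connections` sockets plus headroom.

    The headroom (an assumed figure) covers log files, pipes, and other descriptors.
    """
    soft, _hard = nofile_limits()
    return soft == resource.RLIM_INFINITY or soft >= connections + headroom


soft, hard = nofile_limits()
print("soft:", soft, "hard:", hard, "enough for 500k:", can_hold(500_000))
```

Note that limits.conf is applied at login, so a shell that was already open before the change will still report the old values.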

Testing

When testing, we were able to get about 64,000 connections per client by increasing the number of ephemeral ports allowed on both the client and the server:

echo "1024 65535" > /proc/sys/net/ipv4/ip_local_port_range

This effectively allows every port from 1024 up to be used as an ephemeral port, instead of the default range, which is much narrower (and typically more sane).
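
The "about 64,000" figure falls straight out of the range. Here is a small sketch that counts the ports in a range string of the form used by ip_local_port_range. The function names are hypothetical, and the second range is only an example of a commonly seen default, which may differ on your kernel.

```python
def parse_port_range(text):
    """Parse a 'low high' string as found in ip_local_port_range."""
    low, high = (int(part) for part in text.split())
    if not (0 < low <= high <= 65535):
        raise ValueError(f"invalid port range: {text!r}")
    return low, high


def port_count(text):
    """Number of ports in the inclusive range described by `text`."""
    low, high = parse_port_range(text)
    return high - low + 1


print(port_count("1024 65535"))   # the widened range from this article
print(port_count("32768 60999"))  # an example default range (assumption)
```

The widened range yields 64,512 usable ports per client IP, against 28,232 for the example default.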

The 64k Connection Myth

It’s a common misconception that you can only accept 64,000 connections per IP address and the only way around it is to add more IPs. This is absolutely false.

The misconception begins with the premise that there are only so many ephemeral ports per IP. The truth is that the limit is based on the IP pair, or said another way, the client and server IPs together: a TCP connection is identified by the combination of client IP, client port, server IP, and server port. A single client IP can connect to a server IP 64,000 times and so can another client IP.
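
The point can be illustrated by modelling connections as tuples. This sketch builds every possible connection from two clients to the same server IP and port and checks that none of them collide. The IP addresses, port 8080, and the names used are all hypothetical.

```python
from collections import namedtuple

# A TCP connection is identified by all four of these values together.
Conn = namedtuple("Conn", "client_ip client_port server_ip server_port")


def connections_from(client_ip, server_ip, server_port, low=1024, high=65535):
    """Every distinct connection one client IP can open to one server endpoint."""
    return {Conn(client_ip, port, server_ip, server_port)
            for port in range(low, high + 1)}


a = connections_from("10.0.0.1", "10.0.0.100", 8080)
b = connections_from("10.0.0.2", "10.0.0.100", 8080)

print(len(a), len(b), len(a | b), a.isdisjoint(b))
```

Each client contributes its own full set of connections, so the server-side total grows with the number of client IPs rather than being capped by the port range.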

Were this myth true it would be a significant and easy-to-exploit DDoS vector.

Scaling for Everyone

When we set out to establish half a million connections on a single server we were diving deep into water that wasn’t well documented. Sure, we know that C10k is relatively trivial, but how about an order of magnitude (and then some) above that?

Fortunately we’ve been able to achieve success without too many serious problems. Hopefully our solutions can help save time for those out there looking to solve the same problems.
