March 31, 2016

In this post I’m going to do a silly, but interesting, exercise that should never be done in any program that actually matters. I’m going to write a program that changes one of its function definitions while it’s actively running and using that function. Unlike last time, this won’t involve shared libraries, but it will require x86_64 and GCC. Most of the time it will work with Clang, too, but Clang is missing an important compiler option that makes it stable.

If you want to see it all up front, here’s the full source: hotpatch.c

Here’s the function that I’m going to change:

void
hello(void)
{
    puts("hello");
}

It’s dead simple, but that’s just for demonstration purposes. This will work with any function of arbitrary complexity. The definition will be changed to this:

void
hello(void)
{
    static int x;
    printf("goodbye %d\n", x++);
}

I was only going to change the string, but I figured I should make it a little more interesting.

Here’s how it’s going to work: I’m going to overwrite the beginning of the function with an unconditional jump that immediately moves control to the new definition of the function. It’s vital that the function prototype does not change, since that would be a far more complex problem.

But first there’s some preparation to be done. The target needs to be augmented with some GCC function attributes to prepare it for its redefinition. As is, there are three possible problems that need to be dealt with:

  • I want to hotpatch this function while it is being used by another thread without any synchronization. That thread may even be executing the function at the same time I clobber its first instructions with my jump. If it’s in between these instructions, disaster will strike.

The solution is the ms_hook_prologue function attribute. This tells GCC to put a hotpatch prologue on the function: a big, fat, 8-byte NOP that I can safely clobber. This idea originated in Microsoft’s Win32 API, hence the “ms” in the name.

  • The prologue NOP needs to be updated atomically. I can’t let the other thread see a half-written instruction or, again, disaster. On x86 this means I have an alignment requirement. Since I’m overwriting an 8-byte instruction, I’m specifically going to need 8-byte alignment to get an atomic write.

The solution is the aligned function attribute, ensuring the hotpatch prologue is properly aligned.

  • The final problem is that there must be exactly one copy of this function in the compiled program. It must never be inlined or cloned, since these won’t be hotpatched.

As you might have guessed, this is primarily fixed with the noinline function attribute. GCC may also clone the function and call the clone instead, so it also needs the noclone attribute.

Even further, if GCC determines there are no side effects, it may cache the return value and only ever call the function once. To convince GCC that there’s a side effect, I added an empty inline assembly string (__asm("")). Since puts() has a side effect (output), this isn’t truly necessary for this particular example, but I’m being thorough.

What does the function look like now?

__attribute__ ((ms_hook_prologue))
__attribute__ ((aligned(8)))
__attribute__ ((noinline))
__attribute__ ((noclone))
void
hello(void)
{
    __asm("");
    puts("hello");
}

And what does the assembly look like?

$ objdump -Mintel -d hotpatch
0000000000400848 <hello>:
  400848:       48 8d a4 24 00 00 00    lea    rsp,[rsp+0x0]
  40084f:       00
  400850:       bf d4 09 40 00          mov    edi,0x4009d4
  400855:       e9 06 fe ff ff          jmp    400660 <puts@plt>

It’s 8-byte aligned and it has the 8-byte NOP: that lea instruction does nothing. It copies rsp into itself and changes no flags. Why not 8 1-byte NOPs? I need to replace exactly one instruction with exactly one other instruction. I can’t have another thread in between those NOPs.

Hotpatching

Next, let’s take a look at the function that will perform the hotpatch. I’ve written a generic patching function for this purpose. This part is entirely specific to x86.

void
hotpatch(void *target, void *replacement)
{
    assert(((uintptr_t)target & 0x07) == 0); // 8-byte aligned?
    void *page = (void *)((uintptr_t)target & ~0xfff);
    mprotect(page, 4096, PROT_WRITE | PROT_EXEC);
    uint32_t rel = (char *)replacement - (char *)target - 5;
    union {
        uint8_t bytes[8];
        uint64_t value;
    } instruction = { {0xe9, rel >> 0, rel >> 8, rel >> 16, rel >> 24} };
    *(uint64_t *)target = instruction.value;
    mprotect(page, 4096, PROT_EXEC);
}

It takes the address of the function to be patched and the address of its replacement. As mentioned, the target must be 8-byte aligned (enforced by the assert). It’s also important that this function is only called by one thread at a time, even on different targets. If that were a concern, I’d wrap it in a mutex to create a critical section.
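A minimal sketch of that critical section, assuming POSIX threads. The names patch_fn, patch_lock, locked_patch, and the call-counting stand-in count_patch are hypothetical and not part of hotpatch.c. In real use, hotpatch() itself would be passed as the first argument.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Signature shared by hotpatch() and any stand-in for it. */
typedef void patch_fn(void *target, void *replacement);

/* One process-wide lock: patches are serialized even on different targets. */
static pthread_mutex_t patch_lock = PTHREAD_MUTEX_INITIALIZER;

/* Run a single patch operation inside the critical section.
 * Returns 0 on success or the pthread error code on failure. */
int
locked_patch(patch_fn *patch, void *target, void *replacement)
{
    assert(patch != NULL);
    int err = pthread_mutex_lock(&patch_lock);
    if (err != 0)
        return err;
    patch(target, replacement);
    return pthread_mutex_unlock(&patch_lock);
}

/* Hypothetical stand-in that only counts calls, for trying the wrapper
 * without touching any code pages. */
int patch_calls;

void
count_patch(void *target, void *replacement)
{
    (void)target;
    (void)replacement;
    patch_calls++;
}
```

A call would look like locked_patch(hotpatch, hello, new_hello). The lock only serializes the patching threads. Threads executing the target function still run without synchronization, which is the whole point of the exercise.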

There are a number of things going on in hotpatch(), so let’s go through them one at a time:

Make the function writeable

The .text segment will not be writeable by default. This is for both security and safety. Before I can hotpatch the function I need to make the function writeable. To make the function writeable, I need to make its page writeable. To make its page writeable I need to call mprotect(). If there were another thread monkeying with the page attributes of this page at the same time (another thread calling hotpatch()) I’d be in trouble.

hotpatch() finds the page by rounding the target address down to the nearest multiple of 4096, the assumed page size (sorry hugepages). Warning: I’m being a bad programmer and not checking the result of mprotect(). If it fails, the program will crash and burn. It will always fail on systems with W^X enforcement, which will likely become the standard in the future. Under W^X (“write XOR execute”), memory can either be writeable or executable, but never both at the same time.

What if the function straddles pages? Well, I’m only patching the first 8 bytes, which, thanks to alignment, will sit entirely inside the page I just found. It’s not an issue.

At the end of the function, I mprotect() the page back to non-writeable.

Create the instruction

I’m assuming the replacement function is within 2GB of the original in virtual memory, so I’ll use a 32-bit relative jmp instruction. There’s no 64-bit relative jump, and I only have 8 bytes to work within anyway. The Intel manual lists it as the near relative jump, JMP rel32.

Fortunately it’s a really simple instruction. It’s opcode 0xE9 and it’s followed immediately by the 32-bit displacement. The instruction is 5 bytes wide.

To compute the relative jump, I take the difference between the functions, minus 5. Why the 5? The jump address is computed from the position after the jump instruction and, as I said, it’s 5 bytes wide.
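The arithmetic can be sketched as a small helper that also checks the 2GB assumption. The names jmp_rel32 and JMP_REL32_SIZE are hypothetical, and addresses are passed as 64-bit integers since this is x86_64 only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JMP_REL32_SIZE 5  /* opcode 0xE9 plus a 4-byte displacement */

/* Compute the displacement for a jmp located at `from` that should land
 * on `to`. Returns 0 and stores the result in *rel on success, or -1 if
 * the destination is out of reach of a signed 32-bit displacement. */
int
jmp_rel32(uint64_t from, uint64_t to, int32_t *rel)
{
    assert(rel != NULL);
    /* The CPU adds the displacement to the address of the NEXT
     * instruction, which is why the jump's own size is subtracted. */
    int64_t diff = (int64_t)(to - (from + JMP_REL32_SIZE));
    if (diff < INT32_MIN || diff > INT32_MAX)
        return -1;
    *rel = (int32_t)diff;
    return 0;
}
```

The disassembly earlier in the post gives a worked example: the jmp at 0x400855 targets puts@plt at 0x400660, so the displacement is 0x400660 - 0x40085a = -0x1fa. As an unsigned 32-bit value that is 0xfffffe06, which matches the 06 fe ff ff bytes after the e9 opcode in the listing.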

I put 0xE9 in a byte array, followed by the little-endian displacement. The astute may notice that the displacement is signed (it can go “up” or “down”) and I used an unsigned integer. That’s because unsigned arithmetic wraps around to the right two’s complement bit pattern and keeps those shifts clean.
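Spelled out byte by byte, the encoding looks roughly like this. It is a sketch with a hypothetical name, encode_jmp, that fills a plain 8-byte buffer rather than the union used in hotpatch().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write a 5-byte `jmp rel32` into the 8-byte buffer out and zero the
 * remaining 3 bytes. target and replacement are function addresses. */
void
encode_jmp(uint8_t out[8], uint64_t target, uint64_t replacement)
{
    assert(out != NULL);
    /* The unsigned subtraction wraps instead of going negative, and
     * truncating to 32 bits keeps the low bits: exactly the two's
     * complement pattern of the signed displacement. */
    uint32_t rel = (uint32_t)(replacement - target - 5);
    out[0] = 0xe9;                 /* opcode */
    out[1] = (uint8_t)(rel >> 0);  /* least significant byte first */
    out[2] = (uint8_t)(rel >> 8);
    out[3] = (uint8_t)(rel >> 16);
    out[4] = (uint8_t)(rel >> 24);
    out[5] = out[6] = out[7] = 0;  /* sits after the jump, never executed */
}
```

For a replacement that sits 0x1000 bytes after the target, the displacement is 0xffb and the buffer holds e9 fb 0f 00 00 followed by three zero bytes. A replacement at a lower address than the target produces high bytes of ff, with no special casing.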

Finally, the instruction byte array I just computed is written over the hotpatch NOP as a single, atomic, 64-bit store.

    *(uint64_t *)target = instruction.value;

Other threads will see either the NOP or the jump, nothing in between. There’s no synchronization, so other threads may continue to execute the NOP for a brief moment even though I’ve clobbered it, but that’s fine.
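The plain assignment relies on the compiler emitting a single 8-byte store. A sketch that states the intent explicitly, assuming the __atomic_store_n builtin that GCC and Clang provide; store_instruction is a hypothetical name. It only rules out the compiler splitting the store, and claims nothing beyond that about how the processor fetches instructions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy 8 instruction bytes over target with one 64-bit store.
 * target must be 8-byte aligned so the store is a single access. */
void
store_instruction(void *target, const uint8_t bytes[8])
{
    assert(((uintptr_t)target & 0x07) == 0);
    uint64_t value;
    memcpy(&value, bytes, sizeof(value));  /* bytes -> integer, no pointer cast on the source */
    __atomic_store_n((uint64_t *)target, value, __ATOMIC_RELAXED);
}
```

Relaxed ordering is used because, as in the original, nothing else is being published alongside the instruction. The only requirement is that the 8 bytes change as one unit.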

Trying it out

Here’s what my test program looks like:

void *
worker(void *arg)
{
    (void)arg;
    for (;;) {
        hello();
        usleep(100000);
    }
    return NULL;
}

int
main(void)
{
    pthread_t thread;
    pthread_create(&thread, NULL, worker, NULL);
    getchar();
    hotpatch(hello, new_hello);
    pthread_join(thread, NULL);
    return 0;
}

I fire off the other thread to keep it pinging at hello(). The main thread waits until I hit enter to give the program input, after which it calls hotpatch() and changes the function called by the “worker” thread. Here new_hello() is the replacement definition shown at the top of the post. I’ve now changed the behavior of the worker thread without its knowledge. In a more practical situation, this could be used to update parts of a running program without restarting or even synchronizing.

Further Reading

These related articles have been shared with me since publishing this article:

  • Why do Windows functions all begin with a pointless MOV EDI, EDI instruction?
  • x86 API Hooking Demystified
  • Living on the edge: Rapid-toggling probes with cross modification on x86
  • arm64: alternatives runtime patching
