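Using GDB to Track Down a Strange FreeSWITCH Compilation Problem
【Symptoms】
In the FreeSWITCH console, checking the current sessions with the show channels command revealed stale entries: the calls had already ended, but the system still held their session information.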
freeswitch@internal> show channels
uuid,direction,created,created_epoch,name,state,cid_name,cid_num,ip_addr,dest,application,application_data,dialplan,context,read_codec,read_rate,read_bit_rate,write_codec,write_rate,write_bit_rate,secure,hostname,presence_id,presence_data,callstate,callee_name,callee_num,callee_direction,call_uuid,sent_callee_name,sent_callee_num,initial_cid_name,initial_cid_num,initial_ip_addr,initial_dest,initial_dialplan,initial_context
46a38a98-0abe-11e7-9a47-bbab893a85bc,inbound,2017-03-17 11:03:20,1489719800,sofia/internal/138xxxxxxxx@172.xx.xx.xx,CS_EXECUTE,138xxxxxxxx,138xxxxxxxx,172.xx.xx.xx,188xxxxxxxx,bridge,user/188xxxxxxxx@172.xx.xx.xx,XML,public,opus,16000,0,opus,16000,0,,vhost212.test.quality,138xxxxxxxx@172.xx.xx.xx,,ACTIVE,OutboundCall,188xxxxxxxx,SEND,46a38a98-0abe-11e7-9a47-bbab893a85bc,OutboundCall,188xxxxxxxx,138xxxxxxxx,138xxxxxxxx,172.xx.xx.xx,188xxxxxxxx,XML,public
46afc984-0abe-11e7-9a56-bbab893a85bc,outbound,2017-03-17 11:03:20,1489719800,sofia/internal/188xxxxxxxx@172.xx.xx.xx,CS_EXCHANGE_MEDIA,138xxxxxxxx,138xxxxxxxx,172.xx.xx.xx,188xxxxxxxx,,,XML,public,opus,16000,0,opus,16000,0,,vhost212.test.quality,188xxxxxxxx@172.xx.xx.xx,,ACTIVE,OutboundCall,188xxxxxxxx,SEND,46a38a98-0abe-11e7-9a47-bbab893a85bc,138xxxxxxxx,138xxxxxxxx,138xxxxxxxx,138xxxxxxxx,172.xx.xx.xx,188xxxxxxxx,XML,public
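【Investigation】
1. The output shows that the session was initiated by 138xxxxxxxx and that the callee is 188xxxxxxxx. Based on past experience, the likely cause is a thread deadlock or an infinite loop.
2. To avoid disrupting normal use of the test environment, I used gcore to dump a core file and debugged that instead: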
gdb freeswitch core.xxx
(gdb) thread apply all bt
3. Among the many threads, find the main thread of the call (LWP 338):
Thread 47 (Thread 0x7f3c89c96700 (LWP 338)):
#0 0x0000003211203f94 in clock_nanosleep () from /lib64/librt.so.1
#1 0x00007f3d473230fb in do_sleep (t=<value optimized out>) at src/switch_time.c:173
#2 0x00007f3d472e7b35 in switch_ivr_multi_threaded_bridge (session=0x7f3c94063858, peer_session=0x7f3ce0095d68, input_callback=0x7f3c89c94b78, session_data=0x1, peer_session_data=0x0) at src/switch_ivr_bridge.c:1479
#3 0x00007f3cf8b921ae in audio_bridge_function (session=<value optimized out>, data=<value optimized out>) at mod_dptools.c:3311
#4 0x00007f3d472876f0 in switch_core_session_exec (session=0x7f3c94063858, application_interface=0xb35298, arg=0x7f3ce001ea38 "user/188xxxxxxxx@${domain_name}") at src/switch_core_session.c:2888
#5 0x00007f3d47287c42 in switch_core_session_execute_application_get_flags (session=0x7f3c94063858, app=0x7f3ce0030c48 "bridge", arg=0x7f3ce001ea38 "user/188xxxxxxxx@${domain_name}", flags=<value optimized out>) at src/switch_core_session.c:2758
#6 0x00007f3d4734070b in CoreSession::execute (this=0x7f3ce002c630, app=<value optimized out>, data=<value optimized out>) at src/switch_cpp.cpp:734
#7 0x00007f3c8a1fbe6f in _wrap_CoreSession_execute (L=0x7f3ce0006350) at mod_lua_wrap.cpp:6254
#8 0x00007f3c8a20b33d in luaD_precall (L=0x7f3ce0006350, func=<value optimized out>, nresults=0) at lua/ldo.c:318
#9 0x00007f3c8a2168d6 in luaV_execute (L=<value optimized out>) at lua/lvm.c:709
#10 0x00007f3c8a20b539 in luaD_call (L=0x7f3ce0006350, func=<value optimized out>, nResults=<value optimized out>, allowyield=0) at lua/ldo.c:395
#11 0x00007f3c8a20a738 in luaD_rawrunprotected (L=0x7f3ce0006350, f=0x7f3c8a206ad0 <f_call>, ud=0x7f3c89c95280) at lua/ldo.c:131
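4. Look at the code that corresponds to this stack.
The code shows that this thread has not ended because the callee channel, peer_channel, stays in the CS_EXCHANGE_MEDIA state.
5. Switch to that thread and look up the ID of the thread that runs the callee: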
(gdb) thread 47
(gdb) frame 2
(gdb) p/x peer_session->thread_id
$3 = 0x7f3c89c1e700
This identifies the thread that runs the b_leg: 0x7f3c89c1e700
Thread 48 (Thread 0x7f3c89c1e700 (LWP 340)):
#0 0x0000003210e082ad in pthread_join () from /lib64/libpthread.so.0
#1 0x00007f3d47352255 in apr_thread_join (retval=0x7f3c89c1da5c, thd=0x7f3d380ebad0) at threadproc/unix/thread.c:229
#2 0x00007f3d472e5864 in audio_bridge_thread (thread=<value optimized out>, obj=0x7f3d380eb898) at src/switch_ivr_bridge.c:650
#3 0x00007f3d472e6cfc in audio_bridge_on_exchange_media (session=0x7f3ce0095d68) at src/switch_ivr_bridge.c:723
#4 0x00007f3d4728b6f7 in switch_core_session_run (session=0x7f3ce0095d68) at src/switch_core_state_machine.c:538
#5 0x00007f3d4728645e in switch_core_session_thread (thread=<value optimized out>, obj=0x7f3ce0095d68) at src/switch_core_session.c:1606
#6 0x00007f3d47282e15 in switch_core_session_thread_pool_worker (thread=0x7f3d40027dc0, obj=<value optimized out>) at src/switch_core_session.c:1698
#7 0x0000003210e07a51 in start_thread () from /lib64/libpthread.so.0
#8 0x0000003210ae896d in clone () from /lib64/libc.so.6
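Matching a pthread ID such as 0x7f3c89c1e700 against dozens of thread headers by eye is easy to get wrong. The following is a minimal sketch, not part of the original investigation, that indexes the headers of saved thread apply all bt output by LWP and by pthread ID. The function name index_threads and the idea of saving the output to text first are assumptions made for illustration.

```python
import re

# Header line printed by GDB before each backtrace, for example:
#   Thread 48 (Thread 0x7f3c89c1e700 (LWP 340)):
HEADER_RE = re.compile(r"^Thread (\d+) \(Thread (0x[0-9a-f]+) \(LWP (\d+)\)\):")


def index_threads(bt_text):
    """Map LWP -> GDB thread number and pthread ID -> GDB thread number.

    bt_text is the saved text of `thread apply all bt` (hypothetical input).
    """
    by_lwp = {}
    by_tid = {}
    for line in bt_text.splitlines():
        match = HEADER_RE.match(line.strip())
        if match is None:
            continue  # a frame line or other output, not a thread header
        number = int(match.group(1))
        by_tid[int(match.group(2), 16)] = number
        by_lwp[int(match.group(3))] = number
    return by_lwp, by_tid
```

With the headers from this article, looking up 0x7f3c89c1e700 in the second mapping gives thread 48. If the GDB build has Python support, the text can be captured inside GDB with gdb.execute("thread apply all bt", to_string=True) rather than copied by hand.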
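6. Look at the code that corresponds to this stack.
The code shows that the b_leg thread is stuck because it is waiting for its child thread, vid_thread, to finish.
7. Switch to that thread and find the ID of its child thread: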
(gdb) thread 48
[Switching to thread 48 (Thread 0x7f3c89c1e700 (LWP 340))]#0 0x0000003210e082ad in pthread_join () from /lib64/libpthread.so.0
(gdb) frame 1
#1 0x00007f3d47352255 in apr_thread_join (retval=0x7f3c89c1da5c, thd=0x7f3d380ebad0) at threadproc/unix/thread.c:229
229 threadproc/unix/thread.c: No such file or directory.
in threadproc/unix/thread.c
(gdb) p/x *thd->td
$6 = 0x7f3c89ba6700
This identifies the child thread: 0x7f3c89ba6700
Thread 50 (Thread 0x7f3c89ba6700 (LWP 378)):
#0 0x0000003211203f94 in clock_nanosleep () from /lib64/librt.so.1
#1 0x00007f3d473230fb in do_sleep (t=<value optimized out>) at src/switch_time.c:173
#2 0x00007f3d472e40df in video_bridge_thread (thread=<value optimized out>, obj=0x7f3c89c1da00) at src/switch_ivr_bridge.c:89
#3 0x0000003210e07a51 in start_thread () from /lib64/libpthread.so.0
#4 0x0000003210ae896d in clone () from /lib64/libc.so.6
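8. Look at the code that corresponds to this part of the stack: it is a very simple while loop.
9. Switch to that thread and inspect the values in memory to determine whether this while loop is spinning forever: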
(gdb) thread 50
[Switching to thread 50 (Thread 0x7f3c89ba6700 (LWP 378))]#0 0x0000003211203f94 in clock_nanosleep () from /lib64/librt.so.1
(gdb) bt full
#0 0x0000003211203f94 in clock_nanosleep () from /lib64/librt.so.1
No symbol table info available.
#1 0x00007f3d473230fb in do_sleep (t=<value optimized out>) at src/switch_time.c:173
ts = {tv_sec = 0, tv_nsec = 1000000}
#2 0x00007f3d472e40df in video_bridge_thread (thread=<value optimized out>, obj=0x7f3c89c1da00) at src/switch_ivr_bridge.c:89
vh = 0x7f3c89c1da00
channel = 0x7f3ce00a4a50
b_channel = 0x7f3c94061930
status = <value optimized out>
read_frame = 0x7f3ce00b48a0
source = <value optimized out>
b_source = 0x7f3ce00a34f8 "mod_sofia"
__func__ = "video_bridge_thread"
#3 0x0000003210e07a51 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4 0x0000003210ae896d in clone () from /lib64/libc.so.6
No symbol table info available.
(gdb) frame 2
#2 0x00007f3d472e40df in video_bridge_thread (thread=<value optimized out>, obj=0x7f3c89c1da00) at src/switch_ivr_bridge.c:89
89 src/switch_ivr_bridge.c: No such file or directory.
in src/switch_ivr_bridge.c
(gdb) p channel->state
$7 = CS_EXCHANGE_MEDIA
(gdb) p channel->state
$8 = CS_EXCHANGE_MEDIA
(gdb) p vh->up
$9 = -1
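The last value shows that vh->up is -1, which clearly fails the condition of the while loop, so by rights this thread should already have exited.
10. Having got this far, I spent the next day checking every function used inside the while loop (switch_channel_up_nosig, switch_channel_media_up, switch_core_session_write_video_frame, switch_cond_next), and at one point even suspected clock_nanosleep itself.
11. After getting nowhere, I picked a lunchtime when nobody was using the test environment and debugged the running freeswitch process: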
gdb
attach 26409
thread apply all bt
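12. GDB's thread numbers differ from one debugging session to the next, but the LWP number does not change, so use LWP 378 to find the corresponding thread.
13. Switch to that thread, set a breakpoint, lock the other threads (needed when stepping through a multithreaded process), and single-step: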
(gdb) thread 4
(gdb) break switch_ivr_bridge.c:89
(gdb) set scheduler-locking on
(gdb) n
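14. The result was surprising: inside the while loop, after line 89 executed, line 88 executed again, and the thread was stuck in an infinite loop.
15. At this point everything found earlier made sense: because of this infinite loop, both the caller and the callee threads stayed hung. But how could this spot loop forever when lines 88 and 89 do not even contain a loop of their own?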
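The wait chain described above can be reproduced with a toy model. This is a sketch under stated assumptions, not FreeSWITCH code: run_chain, a_leg, b_leg, video_thread and the abort flag are hypothetical names, and threading.Event objects stand in for vh->up and for the callee channel leaving CS_EXCHANGE_MEDIA. The video_checks_flag parameter switches between a loop that re-evaluates its condition and one that never does.

```python
import threading
import time


def run_chain(video_checks_flag, timeout=1.0):
    """Toy model of the wait chain; reports which threads finished within `timeout`."""
    up = threading.Event()          # stands in for vh->up == 1
    up.set()
    b_leg_done = threading.Event()  # stands in for the callee leaving CS_EXCHANGE_MEDIA
    abort = threading.Event()       # demo-only escape hatch so the script always exits

    def video_thread():
        while not abort.is_set():
            if video_checks_flag and not up.is_set():
                return              # healthy case: the loop condition is re-evaluated
            time.sleep(0.001)       # plays the role of the short sleep in the real loop

    vid = threading.Thread(target=video_thread)

    def b_leg():
        vid.start()
        up.clear()                  # ask the video thread to stop
        vid.join()                  # blocks for as long as the video thread keeps looping
        b_leg_done.set()

    def a_leg():
        # The caller leg only ends once the callee leg has finished.
        while not b_leg_done.is_set() and not abort.is_set():
            time.sleep(0.001)

    a = threading.Thread(target=a_leg)
    b = threading.Thread(target=b_leg)
    a.start()
    b.start()
    a.join(timeout)
    b.join(timeout)
    finished = {
        "a_leg": not a.is_alive(),
        "b_leg": not b.is_alive(),
        "video": not vid.is_alive(),
    }
    abort.set()                     # release anything still stuck before returning
    a.join()
    b.join()
    vid.join()
    return finished
```

Calling run_chain(True) should report all three threads as finished, while run_chain(False, timeout=0.2) should report none of them finished, which mirrors the three hung threads seen in the core file: one loop that never exits is enough to keep both call legs alive.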
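16. With no other option left, I disassembled the whole video_bridge_thread function and read the assembly instructions (irrelevant parts are omitted here):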
(gdb) disassemble /m video_bridge_thread
Dump of assembler code for function video_bridge_thread:
50 in src/switch_ivr_bridge.c
0x00007f3d472e3ec0 <+0>: push %r14
0x00007f3d472e3ec2 <+2>: push %r13
0x00007f3d472e3ec4 <+4>: push %r12
0x00007f3d472e3ec6 <+6>: push %rbp
0x00007f3d472e3ec7 <+7>: push %rbx
0x00007f3d472e3ec8 <+8>: mov %rsi,%rbx
0x00007f3d472e3ecb <+11>: sub $0x20,%rsp
…
88 in src/switch_ivr_bridge.c
0x00007f3d472e40c0 <+512>: mov 0x8(%rbx),%rdi
0x00007f3d472e40c4 <+516>: mov 0x18(%rsp),%rsi
0x00007f3d472e40c9 <+521>: xor %ecx,%ecx
0x00007f3d472e40cb <+523>: xor %edx,%edx
0x00007f3d472e40cd <+525>: callq 0x7f3d47251bb8 <switch_core_session_write_video_frame@plt>
0x00007f3d472e40d2 <+530>: test %eax,%eax
0x00007f3d472e40d4 <+532>: je 0x7f3d472e3f70 <video_bridge_thread+176>
89 in src/switch_ivr_bridge.c
=> 0x00007f3d472e40da <+538>: callq 0x7f3d47252c28 <switch_cond_next@plt>
0x00007f3d472e40df <+543>: nop
0x00007f3d472e40e0 <+544>: jmp 0x7f3d472e40c0 <video_bridge_thread+512>
0x00007f3d472e40e2 <+546>: nopw 0x0(%rax,%rax,1)
…
End of assembler dump.
The assembly shows that after switch_cond_next() on line 89 returns, a no-op instruction executes and control then jumps straight back to line 88.
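Spotting such a back edge by reading a long listing is tedious. The following is a minimal sketch, not a tool used in the original investigation, that scans disassemble output for unconditional jumps to a lower address and lists the conditional branches that leave each loop body. The function names parse_listing, jump_target_offset and find_loops are hypothetical, and the parser assumes the AT&T-syntax line format shown in this article.

```python
import re

# One instruction line of `disassemble` output, for example:
#   => 0x00007f3d472e40da <+538>: callq 0x7f3d47252c28 <switch_cond_next@plt>
INSN_RE = re.compile(r"^(?:=>\s*)?0x([0-9a-f]+)\s+<\+(\d+)>:\s+(\S+)\s*(.*)$")
TARGET_RE = re.compile(r"^0x([0-9a-f]+)")
UNCONDITIONAL = ("jmp", "jmpq")


def parse_listing(text):
    """Return a list of (address, offset, mnemonic, operands) tuples."""
    insns = []
    for line in text.splitlines():
        m = INSN_RE.match(line.strip())
        if m:
            insns.append((int(m.group(1), 16), int(m.group(2)), m.group(3), m.group(4)))
    return insns


def jump_target_offset(addr, offset, operands):
    """Offset of a direct jump's target within the function, or None."""
    m = TARGET_RE.match(operands)
    if m is None:
        return None  # indirect jump such as `jmpq *%rax`
    return offset + (int(m.group(1), 16) - addr)


def find_loops(insns):
    """List each unconditional backward jump with the conditional exits in its body."""
    loops = []
    for addr, off, mnem, ops in insns:
        if mnem not in UNCONDITIONAL:
            continue
        head = jump_target_offset(addr, off, ops)
        if head is None or head >= off:
            continue  # forward or indirect jump: not a back edge
        exits = []
        for addr2, off2, mnem2, ops2 in insns:
            in_body = head <= off2 < off
            conditional = mnem2.startswith("j") and mnem2 not in UNCONDITIONAL
            if in_body and conditional:
                target = jump_target_offset(addr2, off2, ops2)
                if target is not None and not (head <= target <= off):
                    exits.append((off2, target))
        loops.append({"jump": off, "head": head, "exits": exits})
    return loops
```

Applied to the excerpt above, this reports a single loop from +544 back to +512 whose only exit is the je at +532. That branch tests the return value of switch_core_session_write_video_frame, and nothing between +512 and +544 compares vh->up, which is consistent with the thread spinning even though vh->up is -1.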
17. That concludes the investigation: it was a compilation problem. After recompiling, the problem did not recur.