【Symptom】

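While checking the current sessions with the show channels command in the FreeSWITCH console, I found some stale entries: the calls had already ended, but the system was still holding their session information.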

freeswitch@internal> show channels
uuid,direction,created,created_epoch,name,state,cid_name,cid_num,ip_addr,dest,application,application_data,dialplan,context,read_codec,read_rate,read_bit_rate,write_codec,write_rate,write_bit_rate,secure,hostname,presence_id,presence_data,callstate,callee_name,callee_num,callee_direction,call_uuid,sent_callee_name,sent_callee_num,initial_cid_name,initial_cid_num,initial_ip_addr,initial_dest,initial_dialplan,initial_context
46a38a98-0abe-11e7-9a47-bbab893a85bc,inbound,2017-03-17 11:03:20,1489719800,sofia/internal/138xxxxxxxx@172.xx.xx.xx,CS_EXECUTE,138xxxxxxxx,138xxxxxxxx,172.xx.xx.xx,188xxxxxxxx,bridge,user/188xxxxxxxx@172.xx.xx.xx,XML,public,opus,16000,0,opus,16000,0,,vhost212.test.quality,138xxxxxxxx@172.xx.xx.xx,,ACTIVE,OutboundCall,188xxxxxxxx,SEND,46a38a98-0abe-11e7-9a47-bbab893a85bc,OutboundCall,188xxxxxxxx,138xxxxxxxx,138xxxxxxxx,172.xx.xx.xx,188xxxxxxxx,XML,public
46afc984-0abe-11e7-9a56-bbab893a85bc,outbound,2017-03-17 11:03:20,1489719800,sofia/internal/188xxxxxxxx@172.xx.xx.xx,CS_EXCHANGE_MEDIA,138xxxxxxxx,138xxxxxxxx,172.xx.xx.xx,188xxxxxxxx,,,XML,public,opus,16000,0,opus,16000,0,,vhost212.test.quality,188xxxxxxxx@172.xx.xx.xx,,ACTIVE,OutboundCall,188xxxxxxxx,SEND,46a38a98-0abe-11e7-9a47-bbab893a85bc,138xxxxxxxx,138xxxxxxxx,138xxxxxxxx,138xxxxxxxx,172.xx.xx.xx,188xxxxxxxx,XML,public

【Investigation】

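1. The output shows that the call was placed by 138xxxxxxxx to the callee 188xxxxxxxx: the inbound A-leg is still in CS_EXECUTE running bridge, and the outbound B-leg is still in CS_EXCHANGE_MEDIA. From past experience, the likely causes were a thread deadlock or an infinite loop.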

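2. To avoid disrupting normal use of the test environment, I used gcore to dump a core file from the running process and debugged that instead: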

gdb freeswitch core.xxx
(gdb) thread apply all bt

3. Among the many threads, find the main call thread (LWP 338):

Thread 47 (Thread 0x7f3c89c96700 (LWP 338)):
#0  0x0000003211203f94 in clock_nanosleep () from /lib64/librt.so.1
#1  0x00007f3d473230fb in do_sleep (t=<value optimized out>) at src/switch_time.c:173
#2  0x00007f3d472e7b35 in switch_ivr_multi_threaded_bridge (session=0x7f3c94063858, peer_session=0x7f3ce0095d68, input_callback=0x7f3c89c94b78, session_data=0x1, peer_session_data=0x0) at src/switch_ivr_bridge.c:1479
#3  0x00007f3cf8b921ae in audio_bridge_function (session=<value optimized out>, data=<value optimized out>) at mod_dptools.c:3311
#4  0x00007f3d472876f0 in switch_core_session_exec (session=0x7f3c94063858, application_interface=0xb35298, arg=0x7f3ce001ea38 "user/188xxxxxxxx@${domain_name}") at src/switch_core_session.c:2888
#5  0x00007f3d47287c42 in switch_core_session_execute_application_get_flags (session=0x7f3c94063858, app=0x7f3ce0030c48 "bridge", arg=0x7f3ce001ea38 "user/188xxxxxxxx@${domain_name}", flags=<value optimized out>) at src/switch_core_session.c:2758
#6  0x00007f3d4734070b in CoreSession::execute (this=0x7f3ce002c630, app=<value optimized out>, data=<value optimized out>) at src/switch_cpp.cpp:734
#7  0x00007f3c8a1fbe6f in _wrap_CoreSession_execute (L=0x7f3ce0006350) at mod_lua_wrap.cpp:6254
#8  0x00007f3c8a20b33d in luaD_precall (L=0x7f3ce0006350, func=<value optimized out>, nresults=0) at lua/ldo.c:318
#9  0x00007f3c8a2168d6 in luaV_execute (L=<value optimized out>) at lua/lvm.c:709
#10 0x00007f3c8a20b539 in luaD_call (L=0x7f3ce0006350, func=<value optimized out>, nResults=<value optimized out>, allowyield=0) at lua/ldo.c:395
#11 0x00007f3c8a20a738 in luaD_rawrunprotected (L=0x7f3ce0006350, f=0x7f3c8a206ad0 <f_call>, ud=0x7f3c89c95280) at lua/ldo.c:131

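4. Check the source behind this stack (src/switch_ivr_bridge.c:1479).

The code shows that this thread has not exited because the callee's channel, peer_channel, is still in the CS_EXCHANGE_MEDIA state.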
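The waiting pattern can be modeled as below. This is a minimal sketch, not FreeSWITCH code: the sketch_* names and the three-value state enum are hypothetical, and the 1 ms sleep only mirrors the do_sleep()/clock_nanosleep() frames at the top of the backtrace. The A-leg polls the peer's state and returns only once the peer leaves CS_EXCHANGE_MEDIA, so a peer that never changes state keeps the A-leg thread alive indefinitely.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdatomic.h>
#include <time.h>

/* Hypothetical, simplified channel states (not the real FreeSWITCH enum). */
typedef enum {
    SKETCH_CS_EXECUTE,
    SKETCH_CS_EXCHANGE_MEDIA,
    SKETCH_CS_HANGUP
} sketch_state_t;

/* Hypothetical channel: only the state matters here; the peer leg's own
 * thread is expected to update it. */
typedef struct {
    _Atomic sketch_state_t state;
} sketch_channel_t;

/* Sleep ~1 ms, mirroring the do_sleep()/clock_nanosleep() frames. */
static void sketch_sleep_1ms(void) {
    struct timespec ts = { 0, 1000000L };
    nanosleep(&ts, NULL);
}

/* A-leg side: poll until the peer leaves CS_EXCHANGE_MEDIA and return how
 * many times it slept. If the peer never changes state, this never returns. */
static unsigned long sketch_wait_for_peer(sketch_channel_t *peer) {
    unsigned long polls = 0;
    while (atomic_load(&peer->state) == SKETCH_CS_EXCHANGE_MEDIA) {
        sketch_sleep_1ms();
        polls++;
    }
    return polls;
}
```

Because the wait is a poll with a short sleep, a hang like this shows up in gdb as a thread parked in clock_nanosleep, which is consistent with Thread 47 above.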

5. Switch to that thread and look up the thread ID of the callee's session:

(gdb) thread 47
(gdb) frame 2
(gdb) p/x peer_session->thread_id
$3 = 0x7f3c89c1e700

This identifies the B-leg's thread as 0x7f3c89c1e700:

Thread 48 (Thread 0x7f3c89c1e700 (LWP 340)):
#0  0x0000003210e082ad in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f3d47352255 in apr_thread_join (retval=0x7f3c89c1da5c, thd=0x7f3d380ebad0) at threadproc/unix/thread.c:229
#2  0x00007f3d472e5864 in audio_bridge_thread (thread=<value optimized out>, obj=0x7f3d380eb898) at src/switch_ivr_bridge.c:650
#3  0x00007f3d472e6cfc in audio_bridge_on_exchange_media (session=0x7f3ce0095d68) at src/switch_ivr_bridge.c:723
#4  0x00007f3d4728b6f7 in switch_core_session_run (session=0x7f3ce0095d68) at src/switch_core_state_machine.c:538
#5  0x00007f3d4728645e in switch_core_session_thread (thread=<value optimized out>, obj=0x7f3ce0095d68) at src/switch_core_session.c:1606
#6  0x00007f3d47282e15 in switch_core_session_thread_pool_worker (thread=0x7f3d40027dc0, obj=<value optimized out>) at src/switch_core_session.c:1698
#7  0x0000003210e07a51 in start_thread () from /lib64/libpthread.so.0
#8  0x0000003210ae896d in clone () from /lib64/libc.so.6
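The thread_id value printed in step 5 (0x7f3c89c1e700) is the same number gdb shows in "Thread 0x7f3c89c1e700 (LWP 340)". On Linux with glibc (assumed here), a pthread_t is the address of the thread's control block, which is what gdb displays as "Thread 0x..."; "LWP n" is the kernel thread ID. The match suggests session->thread_id holds a pthread_t. The sketch below (sketch_* names are hypothetical) records both identifiers from inside a thread so they can be compared with gdb's thread list.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* The two identifiers gdb prints for each thread. */
typedef struct {
    pthread_t handle; /* glibc: shown by gdb as "Thread 0x..." */
    pid_t lwp;        /* kernel thread ID: shown by gdb as "LWP n" */
} sketch_thread_ids_t;

/* Kernel thread ID of the calling thread. syscall(SYS_gettid) also works on
 * older glibc versions that lack a gettid() wrapper. */
static pid_t sketch_current_lwp(void) {
    return (pid_t)syscall(SYS_gettid);
}

/* Thread entry point: record both IDs and print them in gdb's format. */
static void *sketch_record_ids(void *arg) {
    sketch_thread_ids_t *ids = arg;
    ids->handle = pthread_self();
    ids->lwp = sketch_current_lwp();
    printf("Thread 0x%lx (LWP %d)\n", (unsigned long)ids->handle, (int)ids->lwp);
    return NULL;
}
```

Running a program that starts sketch_record_ids in a thread under gdb, the printed line should agree with the corresponding entry in info threads.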

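6. Check the source behind this stack (src/switch_ivr_bridge.c:650).

The code shows that the B-leg thread is stuck because it is waiting for its child thread, vid_thread, to finish.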
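A pthread_join()-style wait has no timeout, so the parent stays blocked for as long as the child keeps running. The following sketch models the B-leg/vid_thread relationship with plain pthreads; the context struct and its up flag are hypothetical stand-ins (the flag mirrors the vh->up field inspected later), not FreeSWITCH's actual structures.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical shared state between the B-leg thread and its video helper. */
typedef struct {
    atomic_int up; /* 1 = keep running; any other value = stop (mirrors vh->up) */
} sketch_video_ctx_t;

/* Child (vid_thread stand-in): loop while up == 1, sleeping ~1 ms per pass. */
static void *sketch_video_child(void *arg) {
    sketch_video_ctx_t *ctx = arg;
    struct timespec ts = { 0, 1000000L };
    while (atomic_load(&ctx->up) == 1)
        nanosleep(&ts, NULL);
    return NULL;
}

/* Parent (B-leg stand-in): start the child, let it run for run_ms, ask it to
 * stop, then join. pthread_join() has no timeout, so if the child ignored the
 * stop request the parent would block here forever, as Thread 48 does. */
static int sketch_run_and_join(sketch_video_ctx_t *ctx, long run_ms) {
    pthread_t child;
    atomic_init(&ctx->up, 1); /* safe: no other thread touches ctx yet */
    if (pthread_create(&child, NULL, sketch_video_child, ctx) != 0)
        return -1;
    struct timespec run = { run_ms / 1000, (run_ms % 1000) * 1000000L };
    nanosleep(&run, NULL);
    atomic_store(&ctx->up, -1); /* stop request, like setting vh->up to -1 */
    return pthread_join(child, NULL);
}
```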

7. Switch to that thread and find the ID of the child thread it is joining:

(gdb) thread 48
[Switching to thread 48 (Thread 0x7f3c89c1e700 (LWP 340))]
#0  0x0000003210e082ad in pthread_join () from /lib64/libpthread.so.0
(gdb) frame 1
#1  0x00007f3d47352255 in apr_thread_join (retval=0x7f3c89c1da5c, thd=0x7f3d380ebad0) at threadproc/unix/thread.c:229
229     threadproc/unix/thread.c: No such file or directory.
        in threadproc/unix/thread.c
(gdb) p/x *thd->td
$6 = 0x7f3c89ba6700

This identifies the child thread as 0x7f3c89ba6700:

Thread 50 (Thread 0x7f3c89ba6700 (LWP 378)):
#0  0x0000003211203f94 in clock_nanosleep () from /lib64/librt.so.1
#1  0x00007f3d473230fb in do_sleep (t=<value optimized out>) at src/switch_time.c:173
#2  0x00007f3d472e40df in video_bridge_thread (thread=<value optimized out>, obj=0x7f3c89c1da00) at src/switch_ivr_bridge.c:89
#3  0x0000003210e07a51 in start_thread () from /lib64/libpthread.so.0
#4  0x0000003210ae896d in clone () from /lib64/libc.so.6

8. The source behind this stack is a very simple while loop.

9. Switch to that thread and inspect the values in memory to determine whether this while loop is stuck looping:

(gdb) thread 50
[Switching to thread 50 (Thread 0x7f3c89ba6700 (LWP 378))]
#0  0x0000003211203f94 in clock_nanosleep () from /lib64/librt.so.1
(gdb) bt full
#0  0x0000003211203f94 in clock_nanosleep () from /lib64/librt.so.1
No symbol table info available.
#1  0x00007f3d473230fb in do_sleep (t=<value optimized out>) at src/switch_time.c:173
        ts = {tv_sec = 0, tv_nsec = 1000000}
#2  0x00007f3d472e40df in video_bridge_thread (thread=<value optimized out>, obj=0x7f3c89c1da00) at src/switch_ivr_bridge.c:89
        vh = 0x7f3c89c1da00
        channel = 0x7f3ce00a4a50
        b_channel = 0x7f3c94061930
        status = <value optimized out>
        read_frame = 0x7f3ce00b48a0
        source = <value optimized out>
        b_source = 0x7f3ce00a34f8 "mod_sofia"
        __func__ = "video_bridge_thread"
#3  0x0000003210e07a51 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4  0x0000003210ae896d in clone () from /lib64/libc.so.6
No symbol table info available.
(gdb) frame 2
#2  0x00007f3d472e40df in video_bridge_thread (thread=<value optimized out>, obj=0x7f3c89c1da00) at src/switch_ivr_bridge.c:89
89      src/switch_ivr_bridge.c: No such file or directory.
        in src/switch_ivr_bridge.c
(gdb) p channel->state
$7 = CS_EXCHANGE_MEDIA
(gdb) p channel->state
$8 = CS_EXCHANGE_MEDIA
(gdb) p vh->up
$9 = -1

The output shows that vh->up is -1, which clearly fails the while loop condition, so in principle this thread should already have exited.
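To make that expectation concrete: in C, a while condition is re-evaluated before every pass, including after continue. The model below is a hypothetical, simplified loop built from the pieces named in this article (two channel-up checks, vh->up, a video write that may fail followed by a yield); it is not the verbatim FreeSWITCH source. With up already at -1 the body never runs, and if up is flipped while writes are still failing, the loop still exits at the next check.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for what the loop checks. */
typedef struct {
    bool a_up;          /* switch_channel_up_nosig(channel) */
    bool b_up;          /* switch_channel_up_nosig(b_channel) */
    int up;             /* vh->up */
    int write_fails;    /* writes still to fail; negative = fail forever */
    int stop_after;     /* pass at which "another thread" sets up = -1; 0 = never */
    int frames_written; /* successful writes */
} sketch_vh_t;

/* Stand-in for switch_core_session_write_video_frame(): 0 = success. */
static int sketch_write_video(sketch_vh_t *vh) {
    if (vh->write_fails == 0)
        return 0;
    if (vh->write_fails > 0)
        vh->write_fails--;
    return -1;
}

/* The loop as C semantics say it should run: the condition, including
 * vh->up == 1, is checked before every pass. Returns the number of passes. */
static int sketch_video_loop(sketch_vh_t *vh) {
    int passes = 0;
    while (vh->a_up && vh->b_up && vh->up == 1) {
        passes++;
        if (passes == vh->stop_after)
            vh->up = -1; /* models the bridge being torn down meanwhile */
        if (sketch_write_video(vh) != 0) {
            /* real code would yield first, e.g. switch_cond_next() */
            continue; /* goes back to the while condition */
        }
        vh->frames_written++;
    }
    return passes;
}
```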

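10. Having got this far, I spent the whole next day checking every function used inside the while loop: switch_channel_up_nosig, switch_channel_media_up, switch_core_session_write_video_frame and switch_cond_next. At one point I even suspected clock_nanosleep itself.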

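11. After half a day of this with no result, I waited for a lunch break, when nobody was using the test environment, and debugged the live FreeSWITCH process: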

gdb
attach 26409
thread apply all bt

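12. gdb's thread numbers change between debugging sessions, but the LWP (kernel thread ID) does not, so I used LWP 378 to locate the corresponding thread.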
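gdb takes LWP numbers from the kernel, and on Linux the same IDs are listed under /proc/<pid>/task. A minimal Linux-only sketch (sketch_list_lwps is a hypothetical helper) that reads them, which could be used to confirm that LWP 378 still exists in process 26409 before attaching:

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Lists the kernel thread IDs (LWPs) of process 'pid' from /proc/<pid>/task.
 * Writes up to 'max' IDs into 'out'; returns how many, or -1 on error. */
static int sketch_list_lwps(pid_t pid, pid_t *out, int max) {
    char path[64];
    snprintf(path, sizeof path, "/proc/%d/task", (int)pid);
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;
    int n = 0;
    struct dirent *ent;
    while (n < max && (ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] == '.')
            continue; /* skip "." and ".." */
        out[n++] = (pid_t)strtol(ent->d_name, NULL, 10);
    }
    closedir(dir);
    return n;
}
```

For example, calling sketch_list_lwps(26409, buf, 256) while the stale call is still present should return a list that includes 378.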

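13. Switch to that thread, set a breakpoint, lock the scheduler so that other threads do not run (needed when stepping through multithreaded code), and single-step: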

(gdb) thread 4
(gdb) break switch_ivr_bridge.c:89
(gdb) set scheduler-locking on
(gdb) n

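14. The result was surprising: inside the while loop, after executing line 89 the thread went straight back to line 88, over and over, in an infinite loop.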

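15. At this point everything found earlier made sense: because this code loops forever, both the caller's and the callee's threads stay hung. But why would it loop forever here, when there is no loop between these two lines in the source?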

16. Out of options, I disassembled the whole video_bridge_thread function and read the assembly (irrelevant parts omitted):

(gdb) disassemble /m video_bridge_thread
Dump of assembler code for function video_bridge_thread:
50      in src/switch_ivr_bridge.c
   0x00007f3d472e3ec0 <+0>:     push   %r14
   0x00007f3d472e3ec2 <+2>:     push   %r13
   0x00007f3d472e3ec4 <+4>:     push   %r12
   0x00007f3d472e3ec6 <+6>:     push   %rbp
   0x00007f3d472e3ec7 <+7>:     push   %rbx
   0x00007f3d472e3ec8 <+8>:     mov    %rsi,%rbx
   0x00007f3d472e3ecb <+11>:    sub    $0x20,%rsp
…
88      in src/switch_ivr_bridge.c
   0x00007f3d472e40c0 <+512>:   mov    0x8(%rbx),%rdi
   0x00007f3d472e40c4 <+516>:   mov    0x18(%rsp),%rsi
   0x00007f3d472e40c9 <+521>:   xor    %ecx,%ecx
   0x00007f3d472e40cb <+523>:   xor    %edx,%edx
   0x00007f3d472e40cd <+525>:   callq  0x7f3d47251bb8 <switch_core_session_write_video_frame@plt>
   0x00007f3d472e40d2 <+530>:   test   %eax,%eax
   0x00007f3d472e40d4 <+532>:   je     0x7f3d472e3f70 <video_bridge_thread+176>

89      in src/switch_ivr_bridge.c
=> 0x00007f3d472e40da <+538>:   callq  0x7f3d47252c28 <switch_cond_next@plt>
   0x00007f3d472e40df <+543>:   nop
   0x00007f3d472e40e0 <+544>:   jmp    0x7f3d472e40c0 <video_bridge_thread+512>
   0x00007f3d472e40e2 <+546>:   nopw   0x0(%rax,%rax,1)
…
End of assembler dump.

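The assembly shows that after switch_cond_next() on line 89 returns, the code executes a nop and then jumps (jmp to +512) straight back to the start of line 88. The only way out of this cycle is the je at +532, taken when switch_core_session_write_video_frame() returns 0 (SWITCH_STATUS_SUCCESS). The while condition, including the vh->up check, is never re-evaluated, so as long as the video write keeps failing, the thread cycles between lines 88 and 89 forever.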
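Written back as C with goto, the compiled instructions at +512 to +544 in the dump above behave like the sketch below. It is a hypothetical model for illustration only: the write stub stands in for switch_core_session_write_video_frame(), and the retry budget exists only so the demo terminates (the binary has no such guard). The up field is deliberately never read, which is why vh->up == -1 cannot stop the cycle.

```c
/* Hypothetical model of the compiled flow at +512..+544. */
typedef struct {
    int up;          /* vh->up: note that nothing below ever reads it */
    int write_fails; /* writes still to fail; negative = fail forever */
} sketch_vh_bin_t;

/* Stand-in for switch_core_session_write_video_frame(): 0 = success. */
static int sketch_write_video_bin(sketch_vh_bin_t *vh) {
    if (vh->write_fails == 0)
        return 0;
    if (vh->write_fails > 0)
        vh->write_fails--;
    return -1;
}

/* Returns how many writes ran before the je at +532 was taken, or -1 if
 * 'budget' writes all failed (demo-only guard). */
static int sketch_compiled_flow(sketch_vh_bin_t *vh, int budget) {
    int writes = 0;
line88:                                  /* +512: load args, call write */
    writes++;
    if (sketch_write_video_bin(vh) == 0) /* +530/+532: test %eax; je +176 */
        return writes;                   /* leaves the cycle (target outside the shown fragment) */
    /* line 89, +538: switch_cond_next() would yield here */
    if (writes >= budget)
        return -1;
    goto line88;                         /* +544: jmp +512, while condition skipped */
}
```

With up = -1 and a write that always fails, sketch_compiled_flow never leaves the cycle on its own, while a write that fails twice and then succeeds exits after three attempts.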

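17. That concluded the investigation: it was a build (compilation) problem. After recompiling FreeSWITCH, the issue did not recur.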
