问题背景

对项目中的关键应用调用链日志需要结构化得统一吐出到mongodb中,同时项目中日志输出使用log4j,故准备使用log4j的Appender直接集成mongodb的输出,同时mongodb采用集群模式。因为对于日志有特殊结构化要求,所以没有使用log4mongo库。

mongodb驱动版本:3.5.0

log4j版本:1.2.17

问题现象

error和warn级别的日志输出没有任何问题,但是当输出info级别日志的时候,保存失败,进程被阻塞,过一阵子后控制台报出如下:

Exception in thread "main" com.mongodb.MongoTimeoutException: Timed out after 30000 ms while waiting for a server that matches WritableServerSelector. Client view of cluster state is {type=UNKNOWN, servers=[{address=127.0.0.1:27017, type=UNKNOWN, state=CONNECTING}]

排查过程

首先查看mongodb进程和端口全部正常,并且切回warn和error级别再次输出后,一切正常,mongodb数据成功插入,排除了mongodb进程的原因。此步直接确定是log4j与mongodb集成驱动的原因

再次执行info级别输出,在进程无响应的时候,通过jstack拿取线程栈,发现尽然后线程阻塞引发的死锁。如下:main线程锁住了0x0000000797b8f4d0,而cluster-ClusterId再等待该锁的释放。

"cluster-ClusterId{value='59a3c74a936270211a375fc6', description='null'}-127.0.0.1:27017" #12 daemon prio=5 os_prio=31 tid=0x00007f97c2263800 nid=0x570f waiting for monitor entry [0x0000700001452000]

java.lang.Thread.State: BLOCKED (on object monitor)

at org.apache.log4j.Category.callAppenders(Category.java:202)

- waiting to lock <0x0000000797b8f4d0> (a org.apache.log4j.spi.RootLogger)

at org.apache.log4j.Category.forcedLog(Category.java:388)

at org.apache.log4j.Category.log(Category.java:853)

at org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:300)

at com.mongodb.diagnostics.logging.SLF4JLogger.info(SLF4JLogger.java:71)

at com.mongodb.connection.InternalStreamConnection.open(InternalStreamConnection.java:110)

at com.mongodb.connection.DefaultServerMonitor$ServerMonitorRunnable.run(DefaultServerMonitor.java:111)

- locked <0x0000000797b01148> (a com.mongodb.connection.DefaultServerMonitor$ServerMonitorRunnable)

"main" #1 prio=5 os_prio=31 tid=0x00007f97c202f000 nid=0x1003 waiting on condition [0x0000700000182000]

java.lang.Thread.State: TIMED_WAITING (parking)

at sun.misc.Unsafe.park(Native Method)

- parking to wait for <0x0000000797b03588> (a java.util.concurrent.CountDownLatch$Sync)

at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)

at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)

at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)

at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)

at com.mongodb.connection.BaseCluster.selectServer(BaseCluster.java:114)

at com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.(ClusterBinding.java:75)

at com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.(ClusterBinding.java:71)

at com.mongodb.binding.ClusterBinding.getWriteConnectionSource(ClusterBinding.java:68)

at com.mongodb.operation.OperationHelper.withConnection(OperationHelper.java:413)

at com.mongodb.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:168)

at com.mongodb.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:74)

at com.mongodb.Mongo.execute(Mongo.java:819)

at com.mongodb.Mongo$2.execute(Mongo.java:802)

at com.mongodb.MongoCollectionImpl.executeSingleWriteRequest(MongoCollectionImpl.java:550)

at com.mongodb.MongoCollectionImpl.insertOne(MongoCollectionImpl.java:317)

at com.mongodb.MongoCollectionImpl.insertOne(MongoCollectionImpl.java:307)

at com.test.mongo.core.MongoLogAgent.saveLog(MongoLogAgent.java:52)

at com.test.log.log4j.MongoAppender.append(MongoAppender.java:86)

at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:230)

- locked <0x0000000797b8f3e8> (a com.test.log.log4j.MongoAppender)

at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:65)

at org.apache.log4j.Category.callAppenders(Category.java:203)

- locked <0x0000000797b8f4d0> (a org.apache.log4j.spi.RootLogger)

at org.apache.log4j.Category.forcedLog(Category.java:388)

at org.apache.log4j.Category.log(Category.java:853)

3.测试代码及log4j.properties配置

private final static Logger logger = LoggerFactory.getLogger(Test.class);

public static void main(String[] args) throws Exception {

logger.info("测试",new Exception("出错了"));

}

log4j.rootLogger=info,console,file,mongo

log4j.appender.mongo=com.test.log.log4j.MongoAppender

PS:mongo就是自己实现的Appender,因为没有实质性内容,就不再贴出,主要就是拿到日志后开始写入mongodb

4.既然找到了问题原因,直接看源码分析

log4j的Category对象中,会迭代循环Category及父节点,拿到之后对该对象进行加锁。synchronized(c)就是造成了上述线程栈中main线程(- locked <0x0000000797b8f4d0> (a org.apache.log4j.spi.RootLogger)的原因)

AppenderAttachableImpl aai;

//中间代码省略

public void callAppenders(LoggingEvent event) {

int writes = 0;

for(Category c = this; c != null; c = c.parent) {

synchronized(c) {

if(c.aai != null) {

writes += c.aai.appendLoopOnAppenders(event);

}

if(!c.additive) {

break;

}

}

}

if(writes == 0) {

this.repository.emitNoAppenderWarning(this);

}

}

根据log4j.properties的配置,MongoAppender对象,会在Log4j读取配置文件的时候加入到Category的aai对象中,此处appendLoopOnAppenders方法内就是循环执行console,file,mongo三个Appender的写方法,用来记录日志,其中MongoAppender中通过获取MongoClient进行insertDocument操作。

那cluster-ClusterId线程又是干什么的呢?

这个线程就是mongodb客户端驱动用来启动并且监听Socket链接并且建立连接池的一个独立线程,mongodb的客户端驱动在传入多个mongodb服务端ip并创建MongoClient对象时会构造MultiServerCluster对象及DefaultServer对象,该对象会绑定一个缺省的监听对象DefaultServerMonitor,DefaultServerMonitor对象在构造函数中,会初始化一个线程,这个线程就是真正建立mongodb数据库链接的地方。而DefaultServer对象在构造函数中负责启动该线程。当建立好链接后会将链接放入池中

DefaultServer(final ServerId serverId, final ClusterConnectionMode clusterConnectionMode, final ConnectionPool connectionPool,

final ConnectionFactory connectionFactory, final ServerMonitorFactory serverMonitorFactory,

final ServerListener serverListener, final CommandListener commandListener) {

this.serverListener = notNull("serverListener", serverListener);

this.commandListener = commandListener;

notNull("serverAddress", serverId);

notNull("serverMonitorFactory", serverMonitorFactory);

this.clusterConnectionMode = notNull("clusterConnectionMode", clusterConnectionMode);

this.connectionFactory = notNull("connectionFactory", connectionFactory);

this.connectionPool = notNull("connectionPool", connectionPool);

this.serverStateListener = new DefaultServerStateListener();

this.serverId = serverId;

serverListener.serverOpening(new ServerOpeningEvent(this.serverId));

description = ServerDescription.builder().state(CONNECTING).address(serverId.getAddress()).build();

//构造并启动监控连接线程

serverMonitor = serverMonitorFactory.create(serverStateListener);

serverMonitor.start();

}

//无关代码已注释

class DefaultServerMonitor implements ServerMonitor {

DefaultServerMonitor(final ServerId serverId, final ServerSettings serverSettings,

final ChangeListener serverStateListener,

final InternalConnectionFactory internalConnectionFactory, final ConnectionPool connectionPool) {

this.serverSettings = serverSettings;

this.serverId = serverId;

this.serverMonitorListener = getServerMonitorListener(serverSettings);

this.serverStateListener = serverStateListener;

this.internalConnectionFactory = internalConnectionFactory;

this.connectionPool = connectionPool;

monitor = new ServerMonitorRunnable();

monitorThread = new Thread(monitor, "cluster-" + this.serverId.getClusterId() + "-" + this.serverId.getAddress());

monitorThread.setDaemon(true);

isClosed = false;

}

//一个内部类的线程,用于建立连接

class ServerMonitorRunnable implements Runnable {

private final ExponentiallyWeightedMovingAverage averageRoundTripTime = new ExponentiallyWeightedMovingAverage(0.2);

@Override

@SuppressWarnings("unchecked")

public synchronized void run() {

InternalConnection connection = null;

try {

ServerDescription currentServerDescription = getConnectingServerDescription(null);

while (!isClosed) {

ServerDescription previousServerDescription = currentServerDescription;

try {

if (connection == null) {

connection = internalConnectionFactory.create(serverId);

try {

connection.open();

} catch (Throwable t) {

connection = null;

throw t;

}

}

try {

currentServerDescription = lookupServerDescription(connection);

} catch (MongoSocketException e) {

connectionPool.invalidate();

connection.close();

connection = null;

connection = internalConnectionFactory.create(serverId);

try {

connection.open();

} catch (Throwable t) {

connection = null;

throw t;

}

try {

currentServerDescription = lookupServerDescription(connection);

} catch (MongoSocketException e1) {

connection.close();

connection = null;

throw e1;

}

}

} catch (Throwable t) {

averageRoundTripTime.reset();

currentServerDescription = getConnectingServerDescription(t);

}

if (!isClosed) {

try {

logStateChange(previousServerDescription, currentServerDescription);

serverStateListener.stateChanged(new ChangeEvent(previousServerDescription,

currentServerDescription));

} catch (Throwable t) {

LOGGER.warn("Exception in monitor thread during notification of server description state change", t);

}

waitForNext();

}

}

} finally {

if (connection != null) {

connection.close();

}

}

}

}

}

可以看到mongodb驱动客户端是通过独立线程来建立连接的,而main线程中是操作的MongoClient对象,当主线程运行较快,MongoClient在执行insert获取不到连接时,会在BaseCluster类的selectServer方法中循环等待是否可以建立连接,若循环多次后时间超过了maxWaitTimeNanos,则直接抛出MongoTimeoutException,这就是为什么造成了超时现象,而mongodb驱动将真实原因屏蔽的地方,这样有个好处就是不会造成线程僵死,影响应用系统,但同时对人造成了误导。

public Server selectServer(final ServerSelector serverSelector) {

isTrue("open", !isClosed());

try {

CountDownLatch currentPhase = phase.get();

ClusterDescription curDescription = description;

ServerSelector compositeServerSelector = getCompositeServerSelector(serverSelector);

Server server = selectRandomServer(compositeServerSelector, curDescription);

boolean selectionFailureLogged = false;

long startTimeNanos = System.nanoTime();

long curTimeNanos = startTimeNanos;

long maxWaitTimeNanos = getMaxWaitTimeNanos();

while (true) {

throwIfIncompatible(curDescription);

if (server != null) {

return server;

}

if (curTimeNanos - startTimeNanos > maxWaitTimeNanos) {

throw createTimeoutException(serverSelector, curDescription);

}

if (!selectionFailureLogged) {

logServerSelectionFailure(serverSelector, curDescription);

selectionFailureLogged = true;

}

connect();

currentPhase.await(Math.min(maxWaitTimeNanos - (curTimeNanos - startTimeNanos), getMinWaitTimeNanos()), NANOSECONDS);

curTimeNanos = System.nanoTime();

currentPhase = phase.get();

curDescription = description;

server = selectRandomServer(compositeServerSelector, curDescription);

}

} catch (InterruptedException e) {

throw new MongoInterruptedException(format("Interrupted while waiting for a server that matches %s", serverSelector), e);

}

}

死锁的产生

因为在log4j.propertis配置中,root上配置了所有的Appender。通过线程dump的分析和源码的解读,很容易发现是因为main线程在执行mongodb的Appender是,获取到的是rootLogger的锁,同时,BaseCluster类中每一次获取链接时利用了CountDownLatch方式将线程挂起,导致了主线程的等待,但此时main线程的锁并没有释放。而cluster-ClusterId线程中执行的mongodb代码中,通过slf4j也有Info级别的日志输出,而此时在执行Appender时,获取到的还是rootLogger对象,并又去申请该锁,因为是两个不同的线程,造成了cluster-ClusterId线程的阻塞

问题的解决

将mongodb包中的logger对象指向变为不是rootLogger即可,这样,cluster-ClusterId线程在获取Category对象锁时,与main线程已经申请到的对象锁不是同一个即可。

log4j.rootLogger=debug,console,file,mongo

#此配置是让org.mongodb的logger不继承父对象级别,即root,

log4j.additivity.org.mongodb=false

log4j.logger.org.mongodb=info,console

通过以上配置,所有org.mongodb包的日志,都会使用一个新的Logger对象,该对象只绑定了console一个Appender,此Logger对象的parent节点是rootLogger对象

AppenderAttachableImpl aai;

//中间代码省略

public void callAppenders(LoggingEvent event) {

int writes = 0;

for(Category c = this; c != null; c = c.parent) {

//cluster-ClusterId线程中,此时加锁对象即为新的logger对象

synchronized(c) {

if(c.aai != null) {

writes += c.aai.appendLoopOnAppenders(event);

}

if(!c.additive) {

break;

}

}

}

if(writes == 0) {

this.repository.emitNoAppenderWarning(this);

}

}

总结

网上也发现有log4j的死锁问题,其实跟多线程并发没什么关系,主要是一个线程在输出日志时,启动了另一个线程,但此时主线程又不释放锁,而子线程又去申请该锁导致的。

此时若是同一个线程,在synchronized(c)时,直接就是重入锁,不会产生死锁阻塞的问题。

mongodb java 日志分析_记一次log4j与mongodb集成引发的问题分析相关推荐

  1. java日志使用_使用Java 8防止日志过宽

    java日志使用 一些日志将被机器消耗并永久保存. 其他日志仅用于调试和供人类使用. 在后一种情况下,您通常要确保您不会产生太多的日志,尤其是不会产生太宽的日志,因为一旦行长超过一定大小( 例如,此E ...

  2. jmeter性能分析_使用JMeter和Yourkit进行REST / HTTP服务的性能分析

    jmeter性能分析 我的上一篇文章描述了如何使用JMeter完成异步REST / HTTP服务的压力测试或负载测试. 但是,运行这样的测试通常表明被测系统不能很好地应对增加的负载. 现在的问题是如何 ...

  3. mongodb java 日志分析_如何禁用mongoDB Java驱动程序日志记录?

    我正在尝试禁用的日志输出mongo-java-driver-3.0.0. 在加载mongo驱动程序之前,我曾尝试在应用程序的开头设置这些设置,但没有帮助. // Enable MongoDB logg ...

  4. java mysql死锁_记一次线上mysql死锁分析(一)

    记录一次比较诡异的mysql死锁日志.系统运行几个月来,就在前几天发生了一次死锁,而且就只发生了一次死锁,整个排查过程耗时将近一天,最后感谢我们的DBA大神和老大一起分析找到原因. 诊断死锁 借助于我 ...

  5. jvm 崩溃日志设置_记一次JVM疯狂GC导致CPU飙高的问题解决

    背景 线上web服务器不时的出现非常卡的情况,登录服务器top命令发现服务器CPU非常的高,重启tomcat之后CPU恢复正常,半天或者一天之后又会出现同样的问题.解决问题首先要找到问题的爆发点,对于 ...

  6. java 日志乱码_【开发者成长】JAVA 线上故障排查完整套路!

    云栖号资讯:[点击查看更多行业资讯] 在这里您可以找到不同行业的第一手的上云资讯,还在等什么,快来! 线上故障主要会包括 CPU.磁盘.内存以及网络问题,而大多数故障可能会包含不止一个层面的问题,所以 ...

  7. java 头尾 队列_记一次腾讯面试,我挂在了最熟悉不过的队列上

    前记 面试官问:你了解队列和链表的区别吗? 我:了解,blabla 面试官又问:你能自己实现队列吗?具体讲讲怎么实现? 我当时说了用链表来实现队列的存储,并实现push和pop的操作,但回答的不具体, ...

  8. java 日志 生成器_自动生成 java 测试 mock 对象框架 DataFactory-01-入门使用教程

    项目简介 Data-Factory 用于根据对象,随机自动生成初始化信息,避免了手动创建对象的繁琐,便于测试. 特性 8 大基本类型的支持 String.Date.金额,日期等常见类型的支持 java ...

  9. logback修改日志内容_记一次Logstash日志丢失问题

    某系统日志架构是在项目中通过配置logback.xml配置双写 写本地日志文件 写到远程logstash 本地没有问题,有问题是logstash,在Kibana上看到有些日志没显示. 开始是怀疑是不是 ...

最新文章

  1. C#弹窗提示输入密码
  2. python装饰器实现aop_python中面向切片编程(AOP)和装饰器
  3. char[],char *,string之间转换
  4. 判断N!阶乘中末尾0的个数
  5. Qt中moc问题(qt moc 处理 cpp)
  6. linux中postscript如何生成,【转载】如何为Linux生成和打上patch
  7. 14.cookie与自动登陆
  8. linux内核源码目录分析
  9. 如何在Windows上安装Angular:Angular CLI,Node.js和构建工具指南
  10. SQL Sever2008 无法启动
  11. OCR 脱机手写汉字识别与印刷汉字识别
  12. java基础练习题及答案_java基础测试题含答案.docx
  13. 升级到ckplayerX
  14. mysql jdbc dao_MYSQL 之 JDBC(九):增删改查(七)DAO的补充和重构
  15. 新版itunes添加铃声
  16. 总结:linux笔记-001
  17. 爬取钉钉在B站卑微道歉视频弹幕,做成词云
  18. js 对象的键名 可以是数字么
  19. http-invoker插件原理解析
  20. 知识图谱评价指标的学习笔记——MAP,MRR

热门文章

  1. 硬件专业化和软件映射的敏捷框架
  2. MegEngine计算图、MatMul优化解析
  3. Error:org.gradle.api.internal.tasks.DefaultTaskInputs$TaskInputUnionFileCollection cannot be cast to
  4. greendao3出现 Error:Execution failed for task ':app:greendao'
  5. ClassCastException
  6. django-debug-toolbar使用指南
  7. php substr cnblog,php中substr用法示例
  8. Python2 与 Python3 共存,pip2 与 pip3 共存切换设置
  9. Glide二次封装库的使用
  10. try-catch-finally对返回值的影响