Original: ibv_get_async_event() - RDMAmojo

Description

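ibv_get_async_event() reads the next asynchronous event of the RDMA device context.
After ibv_open_device() is called, all asynchronous events are queued to this context, and calls to ibv_get_async_event() read them one by one in the order in which they were generated. Even if ibv_get_async_event() is called long after an event was generated, it still reads the older events first. Unfortunately, events carry no notion of time, so the user cannot know when an event occurred.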

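By default, ibv_get_async_event() is a blocking function: if there is no asynchronous event to read, it waits until the next event is generated. It is preferable to have a dedicated thread that waits for the next event. However, events can also be read in a non-blocking way: use fcntl() to configure the file descriptor of the event file in the device context as non-blocking, and then use read()/poll()/epoll()/select() on that file descriptor to determine whether there is an event waiting to be read. There is an example of how to do this in this post.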
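The non-blocking approach can be sketched with two small helpers written against a generic file descriptor. This is only an illustration: the names set_nonblocking() and wait_readable() are hypothetical, and in real code the descriptor passed in would be ctx->async_fd. Retrying poll() when it is interrupted by a signal is a design choice of this sketch, not something the verbs API requires.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Switch a descriptor to non-blocking mode.
 * Returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags;

    assert(fd >= 0);

    flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;

    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Wait up to timeout_ms milliseconds for the descriptor to become readable.
 * Returns 1 if there is something to read, 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int ret;

    /* retry if poll() was interrupted by a signal */
    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret < 0)
        return -1;
    if (ret == 0)
        return 0;

    return (pfd.revents & POLLIN) ? 1 : -1;
}
```

With these helpers, a caller would run set_nonblocking(ctx->async_fd) once and call ibv_get_async_event() only after wait_readable(ctx->async_fd, timeout) returns 1.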

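Calling ibv_get_async_event() is atomic: even if it is called from multiple threads, it is guaranteed that the same event will not be read by more than one thread.
Every event received using ibv_get_async_event() must be acknowledged using ibv_ack_async_event().
Here is the full description of the struct ibv_async_event: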

Name          Description
element       A union of several fields, of which only one is valid, depending on the event type:
                CQ events: element.cq is valid
                QP events: element.qp is valid
                SRQ events: element.srq is valid
                Port events: element.port_num is valid
                RDMA device events: no field is valid
event_type    Enumerated value which describes the type of the event

Here is a full description of the possible events:

QP events

Here is the description of the affiliated events that may occur for QPs. For those events, the field event->element.qp contains the handle of the QP that got this asynchronous event. Those events will be generated only in the context of the code that this QP belongs to.

IBV_EVENT_COMM_EST

A QP whose state is IBV_QPS_RTR received the first packet in its Receive Queue, and the packet was processed without any error.

This event is mainly relevant to connection-oriented QPs, i.e. RC and UC QPs. It may be generated for UD QPs as well; this is driver implementation specific.

IBV_EVENT_SQ_DRAINED

A QP whose state was changed from IBV_QPS_RTS to IBV_QPS_SQD has completed sending all of the outstanding messages that were in progress in its Send Queue when the state change was requested. For RC QPs, this means that all of those messages received acknowledgments, if applicable.

Most of the time, this event is generated when the (internal) QP state changes from SQD.draining to SQD.drained. However, it may also be generated if the transition to the IBV_QPS_SQD state was aborted because of a transition (either by the RDMA device or by the user) into the IBV_QPS_SQE, IBV_QPS_ERR or IBV_QPS_RESET QP states.

After this event, if the QP is in the IBV_QPS_SQD state, it is safe for the user to start modifying the Send Queue attributes, since there aren't any messages being sent.

IBV_EVENT_PATH_MIG

Indicates the connection has migrated to the alternate path. This event is relevant only to connection oriented QPs, i.e. RC and UC QPs.

This means that the alternate path attributes are now being used as the primary path attributes. If another alternate path needs to be loaded, the user can now set its attributes.

IBV_EVENT_QP_LAST_WQE_REACHED

A QP, which is associated with an SRQ, was transitioned to the IBV_QPS_ERR state, either automatically by the RDMA device or explicitly by the user, and one of the following occurred:

  • A completion with error was generated for the last WQE
  • The QP transitioned to the IBV_QPS_ERR state and there are no more WQEs on the Receive Queue of that QP

This event actually means that WQEs won't be consumed anymore from the SRQ by this QP.

If an error occurred on a QP and this event wasn't generated, the user must destroy all of the QPs that are associated with this SRQ, and the SRQ itself, in order to reclaim all of the WQEs associated with that QP.

IBV_EVENT_QP_FATAL

A QP experienced an error that prevents the generation of completions while accessing or processing the Work Queue, either Send or Receive Queue.

If the problem that caused this event is in the CQ of that Work Queue, the appropriate CQ will get the IBV_EVENT_CQ_ERR event too.

IBV_EVENT_QP_REQ_ERR

The transport layer of the RDMA device detected a transport error violation on the responder side. This error may be one of the following:

  • Unsupported or reserved opcode
  • Out of sequence opcode

These errors are rare and may happen when there are problems in the subnet or when an RDMA device sends illegal packets.

When this happens, the QP is automatically transitioned to the IBV_QPS_ERR state by the RDMA device.

This event is relevant only to RC QPs.

IBV_EVENT_QP_ACCESS_ERR

The transport layer of the RDMA device detected a request error violation on the responder side. This error may be one of the following:

  • Misaligned atomic request
  • Too many RDMA Read or Atomic requests
  • R_Key violation
  • Length errors without immediate data

These errors usually happen due to bugs in the user code.

When this happens, the QP is automatically transitioned to the IBV_QPS_ERR state by the RDMA device.

This event is relevant only to RC QPs.

IBV_EVENT_PATH_MIG_ERR

A QP that has alternate path attributes loaded tried to perform a path migration, initiated either by the RDMA device or explicitly by the user, and an error prevented it from moving to that alternate path.

This error usually happens when the alternate path attributes on the two sides aren't consistent.

CQ events

Here is the description of the affiliated events that may occur for CQs. For those events, the field event->element.cq contains the handle of the CQ that got this asynchronous event. Those events will be generated only in the context of the code that this CQ belongs to.

IBV_EVENT_CQ_ERR

An error occurred when writing a completion to the CQ. This event may occur when there is a protection error (a rare condition) or when there is a CQ overrun (most likely).

When the CQ has an error, it isn't guaranteed that completions can be pulled from that CQ. All of the QPs that are associated with this CQ, either in their RQ or in their SQ, will get the IBV_EVENT_QP_FATAL event too.

SRQ events

Here is the description of the affiliated events that may occur for SRQs. For those events, the field event->element.srq contains the handle of the SRQ that got this asynchronous event. Those events will be generated only in the context of the code that this SRQ belongs to.

IBV_EVENT_SRQ_LIMIT_REACHED

An SRQ was armed, and the number of RRs in that SRQ dropped below the limit value of that SRQ. When this event is generated, the limit value of the SRQ is set to zero.

Most likely, when this event happens, the user will post more RRs to that SRQ and rearm it.

IBV_EVENT_SRQ_ERR

An error occurred that prevents the RDMA device from dequeuing RRs from that SRQ and from reporting receive completions.

If an SRQ experiences an error, all of the QPs that are associated with this SRQ will be transitioned to the IBV_QPS_ERR state, and the IBV_EVENT_QP_FATAL asynchronous event will be generated for them.

Port events

Here is the description of the unaffiliated events that may occur for RDMA device ports. For those events, the field event->element.port_num contains the number of the port that got this asynchronous event. Those events will be generated for all of the contexts that use the RDMA device whose port got the event.

IBV_EVENT_PORT_ACTIVE

The link became active and is now available to send/receive packets.

The port_attr.state was in one of the following states: IBV_PORT_DOWN, IBV_PORT_INIT, IBV_PORT_ARMED, and it moved to one of the following states: IBV_PORT_ACTIVE or IBV_PORT_ACTIVE_DEFER. This can happen when the SM configures the port.

This event will be generated by the device only if IBV_DEVICE_PORT_ACTIVE_EVENT is set in dev_cap.device_cap_flags.

IBV_EVENT_LID_CHANGE

The LID was changed on a port by the SM. If this is not the first time that the SM configures the port LID, this may indicate that there is a new SM in the subnet, or that the SM is reconfiguring the subnet. QPs which send/receive data may experience connection failures (if the LIDs in the subnet were changed).

IBV_EVENT_PKEY_CHANGE

The P_Key table was changed on a port by the SM. Since QPs use P_Key table indexes rather than absolute values, it is suggested that the client check that the P_Key indexes which its QPs use weren't changed.

IBV_EVENT_GID_CHANGE

The GID table was changed on a port by the SM. Since QPs use GID table indexes rather than absolute values (as the source GID), it is suggested that the client check that the GID indexes which its QPs use weren't changed.

IBV_EVENT_SM_CHANGE

There is a new SM in the subnet that the port belongs to, and the client should reregister to all subscriptions previously requested from this port, for example (but not limited to) joining a multicast group.

IBV_EVENT_CLIENT_REREGISTER

The SM requests that the client reregister to all subscriptions previously requested from this port, for example (but not limited to) joining a multicast group. This event may be generated when the SM suffered a failure which caused it to lose its records, or when there is a new SM in the subnet.

This event will be generated by the device only if the bit that indicates that client reregister is supported is set in port_attr.port_cap_flags.

IBV_EVENT_PORT_ERR

The link became inactive and is now unavailable to send/receive packets.

The port_attr.state was in either the IBV_PORT_ACTIVE or the IBV_PORT_ACTIVE_DEFER state and it moved to one of the following states: IBV_PORT_DOWN, IBV_PORT_INIT, IBV_PORT_ARMED. This can happen when there are problems with the link (for example: the cable was removed).

This will not affect the states of the QPs that are associated with this port. However, if they are reliable and try to send data, they may experience retry exceeded errors.

Device events

Here are the unaffiliated events that may occur in RDMA devices. Those events will be generated for all of the contexts that use the RDMA device that got the events.

IBV_EVENT_DEVICE_FATAL

The RDMA device suffered an error which isn't related to any of the above asynchronous events. When this event occurs, the behavior of the RDMA device is undetermined, and it is highly recommended to close the process immediately, since attempts to destroy the RDMA resources may fail.

Summary

The following table summarizes the behavior of the asynchronous events:

Event name                      Element type  Event type  Protocol
IBV_EVENT_COMM_EST              QP            Info        IB, RoCE
IBV_EVENT_SQ_DRAINED            QP            Info        IB, RoCE
IBV_EVENT_PATH_MIG              QP            Info        IB, RoCE
IBV_EVENT_QP_LAST_WQE_REACHED   QP            Info        IB, RoCE
IBV_EVENT_QP_FATAL              QP            Error       IB, RoCE, iWARP
IBV_EVENT_QP_REQ_ERR            QP            Error       IB, RoCE, iWARP
IBV_EVENT_QP_ACCESS_ERR         QP            Error       IB, RoCE, iWARP
IBV_EVENT_PATH_MIG_ERR          QP            Error       IB, RoCE
IBV_EVENT_CQ_ERR                CQ            Error       IB, RoCE, iWARP
IBV_EVENT_SRQ_LIMIT_REACHED     SRQ           Info        IB, RoCE, iWARP
IBV_EVENT_SRQ_ERR               SRQ           Error       IB, RoCE, iWARP
IBV_EVENT_PORT_ACTIVE           Port          Info        IB, RoCE, iWARP
IBV_EVENT_LID_CHANGE            Port          Info        IB
IBV_EVENT_PKEY_CHANGE           Port          Info        IB
IBV_EVENT_GID_CHANGE            Port          Info        IB, RoCE
IBV_EVENT_SM_CHANGE             Port          Info        IB
IBV_EVENT_CLIENT_REREGISTER     Port          Info        IB
IBV_EVENT_PORT_ERR              Port          Error       IB, RoCE, iWARP
IBV_EVENT_DEVICE_FATAL          Device        Error       IB, RoCE, iWARP

Parameters

Name     Direction  Description
context  in         RDMA device context that was returned from ibv_open_device()
event    out        The asynchronous event that occurred

Return value

Value  Description
0      On success
-1     If in blocking mode: there was an error
       If in non-blocking mode: there isn't any async event to read

Examples

1) Read asynchronous events (in blocking mode) and print their content:

/* helper function to print the content of the async event */
static void print_async_event(struct ibv_context *ctx,
                              struct ibv_async_event *event)
{
    switch (event->event_type) {
    /* QP events */
    case IBV_EVENT_QP_FATAL:
        printf("QP fatal event for QP with handle %p\n", event->element.qp);
        break;
    case IBV_EVENT_QP_REQ_ERR:
        printf("QP Requestor error for QP with handle %p\n", event->element.qp);
        break;
    case IBV_EVENT_QP_ACCESS_ERR:
        printf("QP access error event for QP with handle %p\n", event->element.qp);
        break;
    case IBV_EVENT_COMM_EST:
        printf("QP communication established event for QP with handle %p\n", event->element.qp);
        break;
    case IBV_EVENT_SQ_DRAINED:
        printf("QP Send Queue drained event for QP with handle %p\n", event->element.qp);
        break;
    case IBV_EVENT_PATH_MIG:
        printf("QP Path migration loaded event for QP with handle %p\n", event->element.qp);
        break;
    case IBV_EVENT_PATH_MIG_ERR:
        printf("QP Path migration error event for QP with handle %p\n", event->element.qp);
        break;
    case IBV_EVENT_QP_LAST_WQE_REACHED:
        printf("QP last WQE reached event for QP with handle %p\n", event->element.qp);
        break;

    /* CQ events */
    case IBV_EVENT_CQ_ERR:
        printf("CQ error for CQ with handle %p\n", event->element.cq);
        break;

    /* SRQ events */
    case IBV_EVENT_SRQ_ERR:
        printf("SRQ error for SRQ with handle %p\n", event->element.srq);
        break;
    case IBV_EVENT_SRQ_LIMIT_REACHED:
        printf("SRQ limit reached event for SRQ with handle %p\n", event->element.srq);
        break;

    /* Port events */
    case IBV_EVENT_PORT_ACTIVE:
        printf("Port active event for port number %d\n", event->element.port_num);
        break;
    case IBV_EVENT_PORT_ERR:
        printf("Port error event for port number %d\n", event->element.port_num);
        break;
    case IBV_EVENT_LID_CHANGE:
        printf("LID change event for port number %d\n", event->element.port_num);
        break;
    case IBV_EVENT_PKEY_CHANGE:
        printf("P_Key table change event for port number %d\n", event->element.port_num);
        break;
    case IBV_EVENT_GID_CHANGE:
        printf("GID table change event for port number %d\n", event->element.port_num);
        break;
    case IBV_EVENT_SM_CHANGE:
        printf("SM change event for port number %d\n", event->element.port_num);
        break;
    case IBV_EVENT_CLIENT_REREGISTER:
        printf("Client reregister event for port number %d\n", event->element.port_num);
        break;

    /* RDMA device events */
    case IBV_EVENT_DEVICE_FATAL:
        printf("Fatal error event for device %s\n", ibv_get_device_name(ctx->device));
        break;

    default:
        printf("Unknown event (%d)\n", event->event_type);
    }
}

/* the actual code that reads the events in a loop and prints them */
struct ibv_async_event event;
int ret;

while (1) {
    /* wait for the next async event */
    ret = ibv_get_async_event(ctx, &event);
    if (ret) {
        fprintf(stderr, "Error, ibv_get_async_event() failed\n");
        return -1;
    }

    /* print the event */
    print_async_event(ctx, &event);

    /* ack the event */
    ibv_ack_async_event(&event);
}

2) Read asynchronous events (in non-blocking mode) and print their content:

struct ibv_async_event event;
int flags;
int ret;

printf("Changing the mode of events read to be non-blocking\n");

/* change the blocking mode of the async event queue */
flags = fcntl(ctx->async_fd, F_GETFL);
ret = fcntl(ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
if (ret < 0) {
    fprintf(stderr, "Error, failed to change file descriptor of async event queue\n");
    return -1;
}

while (1) {
    struct pollfd my_pollfd;
    int ms_timeout = 100;

    /*
     * poll the queue until it has an event and sleep ms_timeout
     * milliseconds between any iteration
     */
    my_pollfd.fd      = ctx->async_fd;
    my_pollfd.events  = POLLIN;
    my_pollfd.revents = 0;
    do {
        ret = poll(&my_pollfd, 1, ms_timeout);
    } while (ret == 0);
    if (ret < 0) {
        fprintf(stderr, "poll failed\n");
        return -1;
    }

    /* we know that there is an event, so we just need to read it */
    ret = ibv_get_async_event(ctx, &event);
    if (ret) {
        fprintf(stderr, "Error, ibv_get_async_event() failed\n");
        return -1;
    }

    /* print the event */
    print_async_event(ctx, &event);

    /* ack the event */
    ibv_ack_async_event(&event);
}

async_event.c, async_event_nonblocking.c

FAQs

Do I have to read the asynchronous events?

No. The asynchronous events mechanism is a way to provide extra information about things that happen in the CQs, QPs, SRQs, ports and devices. The user doesn't have to use it, but it is highly recommended to do so.

Can I read the events only once in a while (for example, every few minutes)?

Yes, you can. The downside is that you won't know when the event happened, and the information may no longer be relevant.

Is this verb thread-safe?

Yes, this verb is thread-safe (just like the rest of the verbs).

I got a QP/CQ/SRQ event. Will other processes get this event too?

No. Affiliated events are generated only for the context that the resource belongs to. Other contexts won't even know that the event occurred.
