Author

QQ group: 852283276
WeChat: arm80x86
WeChat official account: 青儿创客基地
Bilibili: homepage https://space.bilibili.com/208826118

References

Exploring the Complexities of PCIe Connectivity and Peer-to-Peer Communication
The mystery of peer-to-peer transfer
pcie-peer-to-peer-communication
PCIe Switch高级功能及应用 (Advanced Features and Applications of PCIe Switches)
SOFTWARE DEVELOPMENT KITS
PCI EXPRESS SWITCHES
PCIE switch 非透明桥 (PCIe switch non-transparent bridge)

P2P

Test platform: FT1500A/16 CPU, Starblaze 1TB disks, FPGA1 on PCIe 1.0 x8, FPGA2 on PCIe 3.0 x8.

storage@kylin:~$ sudo ./nvmel_benchmark -i /dev/nvme0n1:/dev/nvme1n1:/dev/nvme2n1:/dev/nvme3n1
disk[0]: /dev/nvme0n1 size=[1022545821696Bytes 975175MB 952GB] max_hw_sectors=1024 online_queues=17 stripe_size=0 lba_shift=9 name=nvme0 serial=C1PSMDCP00038 model=S1200ITT1-T1M21T fw_rev=1.2.0.2
disk[1]: /dev/nvme1n1 size=[1022545821696Bytes 975175MB 952GB] max_hw_sectors=1024 online_queues=17 stripe_size=0 lba_shift=9 name=nvme1 serial=C1PSMDCP00015 model=S1200ITT1-T1M21T fw_rev=1.2.0.2
disk[2]: /dev/nvme2n1 size=[1022545821696Bytes 975175MB 952GB] max_hw_sectors=1024 online_queues=17 stripe_size=0 lba_shift=9 name=nvme2 serial=C1PSMDCP00049 model=S1200ITT1-T1M21T fw_rev=1.2.0.2
disk[3]: /dev/nvme3n1 size=[1022545821696Bytes 975175MB 952GB] max_hw_sectors=1024 online_queues=17 stripe_size=0 lba_shift=9 name=nvme3 serial=C1PSMDCP00047 model=S1200ITT1-T1M21T fw_rev=1.2.0.2

Read/write speed against host (PC) memory:

storage@kylin:~$ sudo ./nvmel_benchmark -w /dev/nvme0n1:/dev/nvme1n1:/dev/nvme2n1:/dev/nvme3n1 -p malloc -s 0x4000000 -l auto -n 512 -m 16
auto rwLength=0x800000
disk /dev/nvme0n1 cmd 0x4, speed 885.43MB/s, cost times 4626ms
disk /dev/nvme1n1 cmd 0x4, speed 891.79MB/s, cost times 4593ms
disk /dev/nvme2n1 cmd 0x4, speed 892.76MB/s, cost times 4588ms
disk /dev/nvme3n1 cmd 0x4, speed 898.25MB/s, cost times 4560ms
storage@kylin:~$ sudo ./nvmel_benchmark -r /dev/nvme0n1:/dev/nvme1n1:/dev/nvme2n1:/dev/nvme3n1 -p malloc -s 0x4000000 -l auto -n 512 -m 16
auto rwLength=0x800000
disk /dev/nvme0n1 cmd 0x2, speed 2147.88MB/s, cost times 1907ms
disk /dev/nvme1n1 cmd 0x2, speed 2169.49MB/s, cost times 1888ms
disk /dev/nvme2n1 cmd 0x2, speed 2142.26MB/s, cost times 1912ms
disk /dev/nvme3n1 cmd 0x2, speed 2140.02MB/s, cost times 1914ms
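
The reported speeds appear consistent with total bytes moved divided by elapsed time, expressed in MiB/s. The sketch below checks that reading against the log. It assumes that -n is the number of transfers and that rwLength is the size of each transfer in bytes; this interpretation of the flags is inferred from the numbers, not from nvmel_benchmark documentation, and the helper name is hypothetical.

```python
def throughput_mib_s(count: int, length: int, elapsed_ms: int) -> float:
    """Total bytes moved divided by elapsed time, in MiB/s."""
    total_mib = count * length / (1024 * 1024)
    return total_mib / (elapsed_ms / 1000)


# Assumed reading of the flags: 512 transfers of rwLength=0x800000 bytes each.
print(f"{throughput_mib_s(512, 0x800000, 4626):.2f}")  # nvme0n1 write, log reports 885.43
print(f"{throughput_mib_s(512, 0x800000, 1907):.2f}")  # nvme0n1 read, log reports 2147.88
```

Under the same assumption, 512 transfers of 8 MiB are 4 GiB per disk per run, so the "MB/s" in the log would actually be MiB/s.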

Single-disk P2P speed:

storage@kylin:~$ sudo ./nvmel_benchmark -r /dev/nvme0n1:/dev/nvme1n1:/dev/nvme2n1:/dev/nvme3n1 -p 0x6c000000 -s 0x4000000 -l auto -n 512 -m 16
auto rwLength=0x800000
disk /dev/nvme0n1 cmd 0x2, speed 1478.17MB/s, cost times 2771ms
disk /dev/nvme1n1 cmd 0x2, speed 1481.91MB/s, cost times 2764ms
disk /dev/nvme2n1 cmd 0x2, speed 1463.38MB/s, cost times 2799ms
disk /dev/nvme3n1 cmd 0x2, speed 1465.47MB/s, cost times 2795ms
storage@kylin:~$ sudo ./nvmel_benchmark -r /dev/nvme0n1:/dev/nvme1n1:/dev/nvme2n1:/dev/nvme3n1 -p 0x74000000 -s 0x4000000 -l auto -n 512 -m 16
auto rwLength=0x800000
disk /dev/nvme0n1 cmd 0x2, speed 2036.80MB/s, cost times 2011ms
disk /dev/nvme1n1 cmd 0x2, speed 2016.74MB/s, cost times 2031ms
disk /dev/nvme2n1 cmd 0x2, speed 2054.16MB/s, cost times 1994ms
disk /dev/nvme3n1 cmd 0x2, speed 2039.84MB/s, cost times 2008ms
storage@kylin:~/nvmelite$
storage@kylin:~/nvmelite$ sudo ./nvmel_benchmark -w /dev/nvme0n1:/dev/nvme1n1:/dev/nvme2n1:/dev/nvme3n1 -p 0x6c000000 -s 0x4000000 -l auto -n 512 -m 16
auto rwLength=0x800000
disk /dev/nvme0n1 cmd 0x4, speed 885.05MB/s, cost times 4628ms
disk /dev/nvme1n1 cmd 0x4, speed 894.52MB/s, cost times 4579ms
disk /dev/nvme2n1 cmd 0x4, speed 902.40MB/s, cost times 4539ms
disk /dev/nvme3n1 cmd 0x4, speed 891.79MB/s, cost times 4593ms
storage@kylin:~$ sudo ./nvmel_benchmark -w /dev/nvme0n1:/dev/nvme1n1:/dev/nvme2n1:/dev/nvme3n1 -p 0x74000000 -s 0x4000000 -l auto -n 512 -m 16
auto rwLength=0x800000
disk /dev/nvme0n1 cmd 0x4, speed 892.18MB/s, cost times 4591ms
disk /dev/nvme1n1 cmd 0x4, speed 891.02MB/s, cost times 4597ms
disk /dev/nvme2n1 cmd 0x4, speed 899.23MB/s, cost times 4555ms
disk /dev/nvme3n1 cmd 0x4, speed 900.22MB/s, cost times 4550ms

Parallel P2P speed:

storage@kylin:~$ ./tbw.sh  # 1.2GB/s + 1.2GB/s
disk /dev/nvme0n1 cmd 0x4, speed 641.91MB/s, cost times 6381ms # FPGA1
disk /dev/nvme1n1 cmd 0x4, speed 640.80MB/s, cost times 6392ms # FPGA1
disk /dev/nvme2n1 cmd 0x4, speed 628.80MB/s, cost times 6514ms # FPGA2
disk /dev/nvme3n1 cmd 0x4, speed 626.59MB/s, cost times 6537ms # FPGA2
storage@kylin:~$ ./tbr.sh  # 2.1GB/s + 1.5GB/s
disk /dev/nvme2n1 cmd 0x2, speed 1069.73MB/s, cost times 3829ms # FPGA2
disk /dev/nvme3n1 cmd 0x2, speed 1068.06MB/s, cost times 3835ms # FPGA2
disk /dev/nvme1n1 cmd 0x2, speed 786.94MB/s, cost times 5205ms # FPGA1
disk /dev/nvme0n1 cmd 0x2, speed 787.24MB/s, cost times 5203ms # FPGA1
storage@kylin:~$ ./tbr_fpga2.sh # 2.2GB/s
disk /dev/nvme2n1 cmd 0x2, speed 562.95MB/s, cost times 7276ms # FPGA2
disk /dev/nvme3n1 cmd 0x2, speed 559.87MB/s, cost times 7316ms # FPGA2
disk /dev/nvme0n1 cmd 0x2, speed 548.62MB/s, cost times 7466ms # FPGA2
disk /dev/nvme1n1 cmd 0x2, speed 548.84MB/s, cost times 7463ms # FPGA2
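
The per-FPGA totals in the comments (for example "2.1GB/s + 1.5GB/s") can be reproduced by summing the per-disk speeds that share a target tag. A minimal sketch, assuming the trailing "# FPGAn" comment on each line identifies the P2P target; the function name and regular expression are illustrative only.

```python
import re
from collections import defaultdict

# Output of ./tbr.sh as shown above.
LOG = """\
disk /dev/nvme2n1 cmd 0x2, speed 1069.73MB/s, cost times 3829ms # FPGA2
disk /dev/nvme3n1 cmd 0x2, speed 1068.06MB/s, cost times 3835ms # FPGA2
disk /dev/nvme1n1 cmd 0x2, speed 786.94MB/s, cost times 5205ms # FPGA1
disk /dev/nvme0n1 cmd 0x2, speed 787.24MB/s, cost times 5203ms # FPGA1
"""

LINE = re.compile(r"speed ([\d.]+)MB/s.*#\s*(\w+)")


def aggregate(log: str) -> dict:
    """Sum the reported speed of every disk, grouped by its target tag."""
    totals = defaultdict(float)
    for line in log.splitlines():
        match = LINE.search(line)
        if match:
            totals[match.group(2)] += float(match.group(1))
    return dict(totals)


totals = aggregate(LOG)
for target, mb_s in sorted(totals.items()):
    print(f"{target}: {mb_s:.2f} MB/s")
```

For this run the sums come to about 1574 MB/s for FPGA1 and 2138 MB/s for FPGA2, which matches the rounded figures in the comment.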

x86 platform with a PEX8747:

$ sudo ./dma_benchmark -i xaxicdma -c c2h -s cmem -d 0x100000000 -m cdev:xaxicdma2@cca00000 -l 0x400000 -n 64
global params: dma type: xaxicdma read 0x400000 bytes from 0x34400000 to 0x100000000
cmd type: read
poll time: 0
test cycle: 64
xaxicdma type: cdev name: /dev/xaxicdma2@cca00000
speed: 6095.24MB/s, cost times: 42ms
$ sudo ./dma_benchmark -i xaxicdma -c h2c -s cmem -d 0x100000000 -m cdev:xaxicdma2@cca00000 -l 0x400000 -n 64
global params: dma type: xaxicdma write 0x400000 bytes from 0x34400000 to 0x100000000
cmd type: write
poll time: 0
test cycle: 64
xaxicdma type: cdev name: /dev/xaxicdma2@cca00000
speed: 5565.22MB/s, cost times: 46ms
qe@qe-pc:~/software/hw$ sudo ./dma_benchmark -i xaxicdma -c c2h -s 0xd0000000 -d 0x100000000 -m cdev:xaxicdma2@cca00000 -l 0x400000 -n 64
global params: dma type: xaxicdma read 0x400000 bytes from 0xd0000000 to 0x100000000
cmd type: read
poll time: 0
test cycle: 64
xaxicdma type: cdev name: /dev/xaxicdma2@cca00000
speed: 4196.72MB/s, cost times: 61ms
qe@qe-pc:~/software/hw$ sudo ./dma_benchmark -i xaxicdma -c h2c -s 0xd0000000 -d 0x100000000 -m cdev:xaxicdma2@cca00000 -l 0x400000 -n 64
global params: dma type: xaxicdma write 0x400000 bytes from 0xd0000000 to 0x100000000
cmd type: write
poll time: 0
test cycle: 64
xaxicdma type: cdev name: /dev/xaxicdma2@cca00000
speed: 4654.55MB/s, cost times: 55ms

Software architecture

PCIe bus architecture:

Port numbering: how switch port numbers map to PCIe device numbers.

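The PEX8724 has 2 Stations and at most 6+1 Ports (Port 8 is a software virtual Port). The PEX8764 has 4 Stations and at most 16 Ports. Comparing a PEX8724 system with a PEX8764 system: to map the topology that software sees onto the actual hardware, first determine which Station holds the upstream port. If it is in Station 0, the mapping is simple. In the PEX8764 system below, all 15 downstream Ports are reserved. Devices 1 and 2 have no EP behind them and correspond to Station 0 Port 1/2. Devices 6 and 7 are absent because Ports 4 and 5 are configured as x8.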

root@t2080rdb:~# lspci -tv
-[0000:00]---00.0-[01-0f]----00.0-[02-0f]--+-01.0-[03]--
                                           +-02.0-[04]--
                                           +-03.0-[05]----00.0  Samsung Electronics Co Ltd Device a804
                                           +-04.0-[06]----00.0  Device 0731:8000
                                           +-05.0-[07]----00.0  Device 0731:8000
                                           +-08.0-[08]----00.0  Samsung Electronics Co Ltd Device a804
                                           +-09.0-[09]----00.0  Samsung Electronics Co Ltd Device a804
                                           +-0a.0-[0a]----00.0  Samsung Electronics Co Ltd Device a804
                                           +-0b.0-[0b]--
                                           +-0c.0-[0c]----00.0  Samsung Electronics Co Ltd Device a804
                                           +-0d.0-[0d]----00.0  Samsung Electronics Co Ltd Device a804
                                           +-0e.0-[0e]----00.0  Samsung Electronics Co Ltd Device a804
                                           \-0f.0-[0f]----00.0  Samsung Electronics Co Ltd Device a804

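In the PEX8724 system below, the upstream port is Station 1 Port 9 in x8 mode. Station 1 also contains Port 8, and Station 0 has four x4 Ports, so the device numbers are still 0, 1, 2, 3 and 8. However, Port 8 sits in the same Station as the upstream port, so it is identified first and takes number 0. The remaining numbers 1, 2, 3 and 8 go to the four Ports of Station 0. When the upstream port is not in Station 0, the one-to-one correspondence is therefore less obvious and has to be worked out case by case.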

root@zynqmp:~# lspci -tv
-[0000:00]---00.0-[01-07]----00.0-[02-07]--+-00.0-[03]----00.0  PLX Technology, Inc. Device 87b1
                                           +-01.0-[04]----00.0  Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961
                                           +-02.0-[05]----00.0  Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961
                                           +-03.0-[06]----00.0  Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961
                                           \-08.0-[07]--
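
When matching device numbers to physical ports, it can help to pull the downstream-port list out of the lspci -tv output programmatically. The following is a rough sketch, not a general lspci parser: it assumes one branch per line, function 0 only, and a single switch level as in the PEX8724 output. The function name and the regular expression are made up for illustration.

```python
import re

# lspci -tv output of the PEX8724 system, one downstream port per line.
TREE = r"""
-[0000:00]---00.0-[01-07]----00.0-[02-07]--+-00.0-[03]----00.0  PLX Technology, Inc. Device 87b1
                                           +-01.0-[04]----00.0  Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961
                                           +-02.0-[05]----00.0  Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961
                                           +-03.0-[06]----00.0  Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961
                                           \-08.0-[07]--
"""

# Branch marker, device number, secondary bus, optional endpoint description.
BRANCH = re.compile(
    r"[+\\]-([0-9a-f]{2})\.0-\[([0-9a-f]{2})(?:-[0-9a-f]{2})?\]-+(?:00\.0\s+(.+))?\s*$"
)


def downstream_ports(tree: str) -> dict:
    """Map device number -> (secondary bus, endpoint description or None)."""
    ports = {}
    for line in tree.splitlines():
        match = BRANCH.search(line)
        if match:
            dev, bus, desc = match.groups()
            ports[int(dev, 16)] = (int(bus, 16), desc)
    return ports


for dev, (bus, desc) in sorted(downstream_ports(TREE).items()):
    print(f"device {dev:#04x} -> bus {bus:#04x}: {desc or 'no endpoint'}")
```

The resulting device numbers are 0, 1, 2, 3 and 8, with device 8 having no endpoint behind it.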

PCI memory-mapped configuration space:

Switch Routing

As noted, a Memory Write transaction requires a valid recipient address and data, while a Memory Read transaction requires a valid address and a data size. For requests arriving from a downstream port, the switch routes them either upstream (toward the Root) or to another downstream port (peer-to-peer) by comparing the address with its Base and Limit registers. Switch routing works as follows. First, the address is checked against the port's own BARs; on a match the port consumes the TLP (each switch port, as a Type 1 bridge, has two BARs). Otherwise the port checks its IO, P-MMIO or NP-MMIO Base and Limit register pair, depending on the request type. If a TLP travelling upstream matches the Base and Limit registers of the port it arrived on, it is handled as an "Unsupported Request" on the secondary interface. It may, however, be forwarded to a downstream port other than the one it arrived on, since it may be peer-to-peer traffic. If it matches no interface, it is forwarded to the primary interface, because it targets neither the bridge nor any function beneath it.
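
The routing decision described above can be sketched as a small model. This is a simplified illustration only: it tracks a single memory window per port instead of separate IO, P-MMIO and NP-MMIO pairs, and the class, function, port names and window ranges are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Port:
    name: str
    bars: List[Tuple[int, int]] = field(default_factory=list)  # (base, size)
    window: Optional[Tuple[int, int]] = None  # (base, limit), inclusive

    def hits_bar(self, addr: int) -> bool:
        return any(base <= addr < base + size for base, size in self.bars)

    def hits_window(self, addr: int) -> bool:
        return self.window is not None and self.window[0] <= addr <= self.window[1]


def route_from_downstream(addr: int, ingress: Port, downstream: List[Port]) -> str:
    """Route a memory TLP that entered the switch through downstream port `ingress`."""
    if ingress.hits_bar(addr):
        return f"consumed by {ingress.name}"
    if ingress.hits_window(addr):
        # The target lies below the port the TLP came from; it cannot go back.
        return "unsupported request"
    for port in downstream:
        if port is ingress:
            continue
        if port.hits_bar(addr):
            return f"consumed by {port.name}"
        if port.hits_window(addr):
            return f"forward to {port.name} (peer-to-peer)"
    return "forward upstream"


# Hypothetical windows; only the base addresses are borrowed from the tests above.
port_a = Port("port_a", window=(0x6C000000, 0x6FFFFFFF))
port_b = Port("port_b", window=(0x74000000, 0x77FFFFFF))
ports = [port_a, port_b]

print(route_from_downstream(0x74000000, port_a, ports))
print(route_from_downstream(0x6C000000, port_a, ports))
print(route_from_downstream(0x10000000, port_a, ports))
```

The three calls cover the peer-to-peer case, the Unsupported Request case, and the fall-through to the upstream port.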

Hardware architecture

According to Section 4.4, Hardware Architecture, the PEX8724 contains two Stations. Station 0 covers Port 0 to Port 3, and Station 1 covers Port 8 to Port 10, of which Port 8 is a software virtual port.

Port configuration

Chip reset

I2C

Register access method:

Write command format:

The read command format differs slightly: the Command field is 100b. The write sequence is:

start > addr > cmd > data > stop

The read sequence is:

start > addr > cmd > stop > start > addr > data > stop
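
The two sequences differ in that a read ends the command phase with a stop and then opens a second transaction to fetch the data. The sketch below models only the order of bus phases with a recording stand-in for an I2C master. It does not talk to hardware, the command bytes are opaque placeholders (the real bit layout must come from the data book), and every name in it is hypothetical.

```python
class RecordingBus:
    """Stand-in for an I2C master that only records the bus phases it is asked for."""

    def __init__(self):
        self.trace = []

    def start(self):
        self.trace.append("start")

    def address(self, read: bool):
        self.trace.append("addr")

    def send(self, phase: str, payload: bytes):
        self.trace.append(phase)

    def receive(self, length: int) -> bytes:
        self.trace.append("data")
        return bytes(length)  # dummy data, no device attached

    def stop(self):
        self.trace.append("stop")


def register_write(bus: RecordingBus, cmd: bytes, data: bytes) -> None:
    """start > addr > cmd > data > stop"""
    bus.start()
    bus.address(read=False)
    bus.send("cmd", cmd)
    bus.send("data", data)
    bus.stop()


def register_read(bus: RecordingBus, cmd: bytes, length: int = 4) -> bytes:
    """start > addr > cmd > stop > start > addr > data > stop"""
    bus.start()
    bus.address(read=False)
    bus.send("cmd", cmd)
    bus.stop()
    bus.start()
    bus.address(read=True)
    data = bus.receive(length)
    bus.stop()
    return data


PLACEHOLDER_CMD = bytes(4)  # placeholder only, not a real command encoding

write_bus = RecordingBus()
register_write(write_bus, PLACEHOLDER_CMD, bytes(4))
print(" > ".join(write_bus.trace))

read_bus = RecordingBus()
register_read(read_bus, PLACEHOLDER_CMD)
print(" > ".join(read_bus.trace))
```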

Serial EEPROM

The EEPROM uses an SPI interface. Initially the status reads 11, which means an EEPROM was found but it holds no valid configuration data.

REG BYTE COUNT is the total number of bytes, not the number of registers.
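
A short illustration of the difference between the two counts. It assumes each EEPROM entry occupies 6 bytes (a 2-byte register address followed by a 4-byte value); that entry size is an assumption here and should be checked against the data book. The function name is hypothetical.

```python
ENTRY_SIZE = 6  # assumed: 2-byte register address + 4-byte register value


def reg_byte_count(num_registers: int, entry_size: int = ENTRY_SIZE) -> int:
    """Total bytes occupied by the register entries, which is what the field holds."""
    return num_registers * entry_size


# Three register entries give a byte count of 18, not 3.
print(reg_byte_count(3))
```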

Read Pacing

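Read Pacing coordinates resource allocation: while high-traffic devices are active, it keeps low-traffic devices from losing arbitration and being starved.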

ACS Extended Capability Registers

Egress/Ingress Control
