A web-based solution to control a swarm of Raspberry Pis, featuring a real-time dashboard, a deep learning inference engine, 1-click Cloud deployment, and dataset labeling tools.


This is the first article of the three-part SorterBot series.


  • Part 1 — General project description and the Web Application
  • Part 2 — Controlling the Robotic Arm
  • Part 3 — Transfer Learning and Cloud Deployment (coming soon)

Source code on GitHub:


  • Control Panel: Django backend and React frontend, running on EC2
  • Inference Engine: Object Recognition with PyTorch, running on ECS
  • Raspberry: Python script to control the Robotic Arm
  • Installer: AWS CDK, GitHub Actions and a bash script to deploy the solution
  • LabelTools: Dataset labeling tools with Python and OpenCV

I recently completed an AI mentorship program at SharpestMinds, whose central element was building a project, or even better, a complete product. I chose the latter, and in this article, I write about what I built, how I built it, and what I learned along the way. Before we get started, I would like to send a special thanks to my mentor, Tomas Babej (CTO @ ProteinQure) for his invaluable help during this journey.


When thinking about what to build, I came up with an idea of a web-based solution to control a swarm of Raspberry Pis, featuring a real-time dashboard, a deep learning inference engine, 1-click Cloud deployment, and dataset labeling tools. The Raspberry Pis can have any sensors and actuators attached to them. They collect data, send it to the inference engine, which processes it and turns it into commands that the actuators can execute. A control panel is also included to manage and monitor the system, while the subsystems communicate with each other using either WebSockets or REST API calls.


As an implementation of this general idea, I built SorterBot, where the sensor is a camera, and the actuators are a robotic arm and an electromagnet. This solution automatically sorts metal objects based on how they look. When the user starts a session, the arm scans the area in front of it, locates the objects and containers within its reach, then automatically divides the objects into as many groups as there are containers. Finally, it moves the objects to their corresponding containers.


SorterBot automatically picks up objects

To process the images taken by the arm’s camera, I built an inference engine based on Facebook AI’s Detectron2 framework. When a picture arrives for processing, it localizes the items and containers on that image, then saves the bounding boxes to the database. After the last picture in a given session is processed, the items are clustered into as many groups as there are containers. Finally, the inference engine generates commands instructing the arm to move similar-looking items into the same container.


To make it easier to control and monitor the system, I built a control panel, using React for the front-end and Django for the back-end. The front end shows a list of registered arms, allows the user to start a session, and also shows existing sessions with their statuses. Under each session, the user can access the logically grouped logs, as well as before and after overview images of the working area. To avoid paying for AWS resources unnecessarily, the user also has the option to start and stop the ECS cluster where the inference engine runs, using a button in the header.


User Interface of the Control Panel

To make it easier for the user to see what the arm is doing, I used OpenCV to stitch together the pictures that the camera took during the session. Additionally, another set of pictures is taken after the arm has moved the objects to the containers, so the user can see a before/after overview of the area and verify that the arm actually moved the objects to the containers.


Overview image made of the session images stitched together

The backend communicates with the Raspberry Pis via WebSockets and REST calls, handles the database, and controls the inference engine. To receive real-time updates from the backend as they happen, the front end also communicates with it via WebSockets.


Since the solution consists of many different AWS resources and it is very tedious to manually provision them, I automated the deployment process utilizing AWS CDK and a lengthy bash script. To deploy the solution, 6 environment variables have to be set, and a single bash script has to be run. After the process finishes (which takes around 30 minutes), the user can log in to the control panel from any web browser and start using the solution.


The Web Application

Conceptually the communication protocol has two parts. The first part is a repeated heartbeat sequence that the arm runs at regular intervals to check if everything is ready for a session to be started. The second part is the session sequence, responsible for coordinating the execution of the whole session across subsystems.


Diagram illustrating how the different parts of the solution communicate with each other

Heartbeat Sequence

The point where the execution of the first part starts is marked with a green rectangle. As the first step, the Raspberry Pi pings the WebSocket connection to the inference engine. If the connection is healthy, it skips over to the next part. If the inference engine appears to be offline, it requests its IP address from the control panel. After the control panel returns the IP (or ‘false’ if the inference engine is actually offline), it tries to establish a connection with the new address. This behavior enables the inference engine to be turned off when it’s not in use, which lowers costs significantly. It also simplifies setting up the arms, which is especially important when multiple arms are used.

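The heartbeat logic reduces to a single repeated step. Here is a minimal sketch with the three network operations injected as callables; all names are illustrative, not taken from the Raspberry repo.

```python
def heartbeat_step(ws_ping, get_engine_ip, report, arm_id="arm-1"):
    """One iteration of the heartbeat sequence.

    ws_ping(ip)      -- pings the current WebSocket connection (ip=None)
                        or tries to connect to a new address; returns bool.
    get_engine_ip()  -- asks the control panel for the engine's IP,
                        or False if the engine is actually offline.
    report(id, ok)   -- reports the result to the control panel.
    """
    connected = ws_ping(None)            # ping the existing connection
    if not connected:
        ip = get_engine_ip()             # fetch a (possibly new) address
        connected = bool(ip) and ws_ping(ip)
    report(arm_id, connected)            # reported regardless of outcome
    return connected
```

On the real arm this step would run in a loop at a fixed interval, with `ws_ping` and `report` backed by the WebSocket client and REST calls.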

Regardless of whether the connection with the new IP succeeds, the result gets reported to the control panel alongside the arm’s ID. When the control panel receives the connection status, it first checks whether the arm ID is already registered in the database, and registers it if needed. After that, the connection status is pushed to the UI, where a status LED lights up in green or orange, depending on whether the connection succeeded.


An arm as it appears on the UI, with the start button and status light

On the UI, next to the status LED, there is a ‘play’ button. When the user clicks this button, the arm’s ID is added to a list in the database that contains the IDs of the arms that should start a session. When an arm checks in with the connection status, and that status is green, it checks if its ID is in that list. If it is, the ID gets removed and a response is sent back to the arm to start a session. If it isn’t, a response is sent back to restart the heartbeat sequence without starting a session.

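The play-button bookkeeping comes down to a small piece of state. A minimal sketch, with an in-memory set standing in for the database list (names and return values are illustrative):

```python
arms_to_start = set()   # IDs of arms that should start a session

def press_play(arm_id):
    # Called when the user clicks the play button on the UI.
    arms_to_start.add(arm_id)

def handle_checkin(arm_id, status_green):
    # Called when an arm checks in with its connection status.
    if status_green and arm_id in arms_to_start:
        arms_to_start.discard(arm_id)    # consume the start request
        return "start_session"
    return "restart_heartbeat"           # no session requested
```

Removing the ID on consumption is what makes a single click start exactly one session, even though the arm checks in repeatedly.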

Session Sequence

The first task of the arm is to take pictures for inference. To do that, the arm moves to its inference position, then starts to rotate at its base. It stops at certain intervals, and the camera takes a picture, which is sent directly to the inference engine as bytes over the WebSocket connection.


High-level diagram of the Inference Engine

When the image data is received from the Raspberry Pi, the image processing begins. First, the image is decoded from bytes, then the resulting NumPy array is used as the input of the Detectron2 object recognizer. The model outputs bounding box coordinates of the recognized objects alongside their classes. The coordinates are relative distances from the top-left corner of the image measured in pixels. Only binary classification is done here, meaning an object can be either an item or a container. Further clustering of items is done in a later step. At the end of the processing, the results are saved to the PostgreSQL database, then the images are written to disk to be used later by the vectorizer, and archived to S3 for later reference. Saving and uploading the image is not in the critical path, so they are executed in a separate thread. This lowers execution time as the sequence can continue before the upload finishes.


When evaluating models in Detectron2’s model zoo, I chose Faster R-CNN R-50 FPN, as it provides the lowest inference time (43 ms), lowest training time (0.261 s/iteration), and lowest training memory consumption (3.4 GB), without giving up too much accuracy (41.0 box AP, which is 92.5% of the best network’s box AP), compared to the other available architectures.


High-level diagram of the Vectorizer

After all of the session images have been processed and the signal to generate session commands has arrived, stitching these pictures together starts in a separate process, providing a ‘before’ overview for the user. In parallel, all the image processing results belonging to the current session are loaded from the database. First, the coordinates are converted to absolute polar coordinates using an arm-specific constant sent with the request. The constant, r, represents the distance between the center of the image and the arm’s base axis. The relative coordinates (x and y on the drawing below) are pixel distances from the top-left corner of the image. The angle where the image was taken is denoted by γ. Δγ represents the difference between the angle of the given item and the image’s center, and can be calculated using equation 1) on the drawing below. The first absolute polar coordinate of the item (the angle, γ’) is then simply γ’ = γ + Δγ. The second coordinate (the radius, r’) can be calculated using equation 2) on the drawing.

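Since the drawing with equations 1) and 2) is not reproduced here, the following is one plausible reconstruction of the conversion, assuming a flat image plane whose center sits at distance r from the base axis, with the image's vertical axis pointing along the radial direction; the pixel-to-distance scale factor is an added assumption.

```python
import math

def to_polar(x, y, gamma, r, img_w, img_h, scale):
    """Convert a detection's pixel coordinates (x, y, from the image's
    top-left corner) to absolute polar coordinates around the arm's
    base axis. gamma is the angle at which the image was taken, r the
    distance from the image center to the base axis, and scale the
    pixel-to-distance factor (an assumption of this sketch)."""
    dx = (x - img_w / 2) / scale   # tangential offset from image center
    dy = (img_h / 2 - y) / scale   # radial offset from image center
    delta_gamma = math.atan2(dx, r + dy)   # reconstruction of equation 1)
    r_prime = math.hypot(r + dy, dx)       # reconstruction of equation 2)
    return gamma + delta_gamma, r_prime
```

A sanity check: a detection at the exact image center maps back to (γ, r) unchanged.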

Drawing and equations used to convert relative coordinates to absolute polar coordinates

After the conversion of the coordinates, the bounding boxes belonging to the same physical objects are replaced by their averaged absolute coordinates.

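A minimal sketch of that de-duplication: detections whose polar coordinates fall within a tolerance are treated as the same physical object and replaced by their average. The greedy grouping strategy and the tolerance values are assumptions of this sketch, not from the article.

```python
def merge_detections(dets, angle_tol=0.05, radius_tol=20.0):
    """dets: list of (gamma, r) absolute polar coordinates.
    Returns one averaged (gamma, r) per physical object."""
    groups = []
    for g, r in dets:
        for grp in groups:
            # compare against the group's current mean position
            mg = sum(p[0] for p in grp) / len(grp)
            mr = sum(p[1] for p in grp) / len(grp)
            if abs(g - mg) < angle_tol and abs(r - mr) < radius_tol:
                grp.append((g, r))
                break
        else:
            groups.append([(g, r)])
    return [(sum(p[0] for p in grp) / len(grp),
             sum(p[1] for p in grp) / len(grp)) for grp in groups]
```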

In the preprocessing step for the vectorizer, the images saved to disk during the previous step are loaded, then cropped around the bounding boxes of each object, resulting in a small picture of every item.


Example of an object cropped around its bounding box

These pictures are converted to tensors, then added to a PyTorch dataloader. Once all the images are cropped, the resulting batch is processed by the vectorizer network. The chosen architecture is a ResNet18 model, which is appropriate for these small images. A PyTorch hook is inserted before the final fully connected layer, so in each inference step the 512-dimensional feature vector produced there is copied to a tensor outside of the network. After the vectorizer has processed all of the images, the resulting tensor is used directly as the input of the K-Means clustering algorithm. The other required input, the number of clusters to compute, is simply the count of recognized containers taken from the database. This step outputs a set of pairings, representing which item goes to which container. Lastly, these pairings are replaced with absolute coordinates that are sent to the robotic arm.


The commands are pairs of coordinates representing items and containers. The arm executes these one by one, moving the objects to the containers using the electromagnet.


After the objects have been moved, the arm takes another set of pictures to be stitched, providing an overview of the working area after the operation. Finally, the arm resets to its initial position and the session is complete.


To be continued in Part 2…


Originally published at: https://medium.com/swlh/web-application-to-control-a-swarm-of-raspberry-pis-with-an-ai-enabled-inference-engine-b3cb4b4c9fd

