2021SC@SDUSC-Zxing (11): Locating a QR Code (Detector) and the Position-Correction Algorithms

2021SC@SDUSC

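Table of Contents

1. Summary of the previous posts
2. Outline of the upcoming posts
3. Detector
- 1. The Detector directory
- 2. Localization flow
- 3. Related algorithms
  - Finder pattern search algorithm
  - QR code correction algorithm: perspective transformation
  - QR code size estimation: Bresenham's line algorithm
4. Summary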

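1. Summary of the previous posts

In the previous posts we walked through the five main classes involved in the decoding flow, gave step-by-step decoding examples, and explained problems that can come up when scanning real codes, which deepened our understanding of the decoding flow. That material is largely enough for calling the methods and decoding in a real application. But the Zxing decoding code also contains an important part that involves deeper decoding knowledge. Starting with this post, we cover that material.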

2. Outline of the upcoming posts

First, look at the zxing directory structure:

├─aztec  2D code
│  ├─decoder decoding
│  ├─detector localization
│  └─encoder encoding
├─client
│  └─result
├─common
│  ├─detector
│  └─reedsolomon
├─datamatrix 2D code
│  ├─decoder decoding
│  ├─detector localization
│  └─encoder encoding
├─maxicode  2D code
│  └─decoder decoding
├─multi  recognizes multiple codes in one image
│  └─qrcode
│      └─detector localization
├─oned  1D codes
│  └─rss composite of 1D and 2D codes
│      └─expanded supports rss
│          └─decoders decoding
├─pdf417  2D code
│  ├─decoder decoding
│  │  └─ec
│  ├─detector localization
│  └─encoder encoding
└─qrcode  2D code
    ├─decoder decoding
    ├─detector localization
    └─encoder encoding

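The 2D code packages (qrcode, pdf417, maxicode, datamatrix, aztec) mostly contain three folders: decoder, detector, and encoder. oned holds all the 1D barcodes that Zxing supports; common holds shared utilities; client.result holds the parsed decoding results for different real-world scenarios (for example, results parsed from encoded contact information or encoded e-mail information). The upcoming posts will cover the decoding and localization algorithms for 2D codes, 1D codes, and composite codes in turn, and then briefly look at the remaining classes and methods.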

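3. Detector

When covering QRCodeReader we introduced the decode method. decode is the core method of every Reader: it converts an image into a decoding result (Result). Inside the Reader, however, this method hands the key operations (such as locating the code and running the decoding algorithm) to other classes; it only calls the relevant methods and ties the steps together. Localization is handled by Detector, which is introduced below.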

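1. The Detector directory

AlignmentPattern: encapsulates an alignment pattern (4.2, the alignment mark, in the figure below). Alignment patterns are the smaller square patterns found in all but the simplest QR codes.
AlignmentPatternFinder: tries to find alignment patterns (4.2, the alignment mark, in the figure below) in a QR code. Currently it only looks for the bottom-right alignment pattern.
Detector: encapsulates the logic that detects a QR code in an image, even if the QR code is rotated, skewed, or partially obscured.
FinderPattern: encapsulates a finder pattern (4.1, the finder mark, in the figure below), one of the three square patterns found in the corners of a QR code.
FinderPatternFinder: tries to find the finder patterns (4.1, the finder mark, in the figure below) in a QR code. Finder patterns are the square marks at three corners of the QR code.
FinderPatternInfo: encapsulates information about the finder patterns (4.1, the finder mark, in the figure below) in an image, including the positions of the three finder patterns and their estimated module size.

From this we can see that Detector looks for four black-and-white squares (three finder patterns plus one alignment pattern), which matches the four locator points introduced in SDUSC-Zxing (9).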

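2. Localization flow

The Detector class encapsulates the logic that detects a QR code in an image, even if the QR code is rotated, skewed, or partially obscured.

Relevant code
The detect part: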

  public final DetectorResult detect(Map<DecodeHintType,?> hints) throws NotFoundException, FormatException {

    resultPointCallback = hints == null ? null :
        (ResultPointCallback) hints.get(DecodeHintType.NEED_RESULT_POINT_CALLBACK);

    // Try to find the finder patterns in the QR code.
    // Finder patterns are the square marks at three corners of the code.
    FinderPatternFinder finder = new FinderPatternFinder(image, resultPointCallback);
    FinderPatternInfo info = finder.find(hints);

    return processFinderPatternInfo(info);
  }

  protected final DetectorResult processFinderPatternInfo(FinderPatternInfo info)
      throws NotFoundException, FormatException {

    // Get the positions of the three finder patterns
    FinderPattern topLeft = info.getTopLeft();
    FinderPattern topRight = info.getTopRight();
    FinderPattern bottomLeft = info.getBottomLeft();

    // Compute an average estimated module size from the estimates
    // derived from the positions of the three finder patterns.
    float moduleSize = calculateModuleSize(topLeft, topRight, bottomLeft);
    if (moduleSize < 1.0f) {
      throw NotFoundException.getNotFoundInstance();
    }

    // Estimate the dimension and derive a provisional version from it
    int dimension = computeDimension(topLeft, topRight, bottomLeft, moduleSize);
    Version provisionalVersion = Version.getProvisionalVersionForDimension(dimension);
    int modulesBetweenFPCenters = provisionalVersion.getDimensionForVersion() - 7;

    AlignmentPattern alignmentPattern = null;
    // Anything above version 1 has an alignment pattern
    // (the alignment mark near the bottom-right corner of the QR code)
    if (provisionalVersion.getAlignmentPatternCenters().length > 0) {

      // Guess where a "bottom right" finder pattern would have been
      float bottomRightX = topRight.getX() - topLeft.getX() + bottomLeft.getX();
      float bottomRightY = topRight.getY() - topLeft.getY() + bottomLeft.getY();

      // The alignment pattern sits 3 modules in from where that
      // bottom-right finder pattern center would be
      float correctionToTopLeft = 1.0f - 3.0f / modulesBetweenFPCenters;
      int estAlignmentX = (int) (topLeft.getX() + correctionToTopLeft * (bottomRightX - topLeft.getX()));
      int estAlignmentY = (int) (topLeft.getY() + correctionToTopLeft * (bottomRightY - topLeft.getY()));

      // Kind of arbitrary: expand the search radius before giving up on the alignment pattern
      for (int i = 4; i <= 16; i <<= 1) {
        try {
          alignmentPattern = findAlignmentInRegion(moduleSize,
                                                   estAlignmentX,
                                                   estAlignmentY,
                                                   i);
          break;
        } catch (NotFoundException re) {
          // try the next round
        }
      }
      // If no alignment pattern was found, carry on without it
    }

    // Build the perspective transform and sample the corrected (unwarped) grid
    PerspectiveTransform transform =
        createTransform(topLeft, topRight, bottomLeft, alignmentPattern, dimension);

    BitMatrix bits = sampleGrid(image, transform, dimension);

    ResultPoint[] points;
    if (alignmentPattern == null) {
      points = new ResultPoint[]{bottomLeft, topLeft, topRight};
    } else {
      points = new ResultPoint[]{bottomLeft, topLeft, topRight, alignmentPattern};
    }
    return new DetectorResult(bits, points);
  }
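To make the alignment-pattern estimate in processFinderPatternInfo more concrete, here is a small worked sketch of just that arithmetic. The class name AlignmentEstimateSketch, the method estimateAlignmentCenter, and the sample coordinates are made up for illustration; they are not part of Zxing. The sample assumes an undistorted version 2 symbol (25 modules per side) drawn at 10 pixels per module.

```java
/** Illustrative sketch (not Zxing code) of the alignment-pattern position estimate. */
final class AlignmentEstimateSketch {

  /** Returns {estX, estY} from the three finder-pattern centers. */
  static int[] estimateAlignmentCenter(float tlX, float tlY,
                                       float trX, float trY,
                                       float blX, float blY,
                                       int modulesBetweenFPCenters) {
    // Complete the parallelogram: where a fourth, bottom-right finder pattern would be.
    float bottomRightX = trX - tlX + blX;
    float bottomRightY = trY - tlY + blY;

    // Step back 3 modules from that corner, measured along the diagonal from top-left.
    float correctionToTopLeft = 1.0f - 3.0f / modulesBetweenFPCenters;
    int estX = (int) (tlX + correctionToTopLeft * (bottomRightX - tlX));
    int estY = (int) (tlY + correctionToTopLeft * (bottomRightY - tlY));
    return new int[] {estX, estY};
  }

  public static void main(String[] args) {
    // Hypothetical input: finder centers sit 3.5 modules in from each edge,
    // so at 10 px per module they are at 35 px and 215 px.
    int dimension = 25;                          // version 2
    int modulesBetweenFPCenters = dimension - 7; // 18
    int[] est = estimateAlignmentCenter(35, 35, 215, 35, 35, 215, modulesBetweenFPCenters);
    // The estimate should land at or within a pixel of (185, 185),
    // i.e. the center of module (18, 18).
    System.out.println(est[0] + ", " + est[1]);
  }
}
```

Because the result is only an estimate (and is truncated to an int), the real code then searches a region around it, widening the radius from 4 to 8 to 16 modules before giving up.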

The getBits and getPoints methods of the returned DetectorResult

  public final BitMatrix getBits() {
    return bits;
  }

  public final ResultPoint[] getPoints() {
    return points;
  }

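3. Related algorithms

Finder pattern search algorithm

FinderPatternFinder.find(Map<DecodeHintType,?> hints):
When covering the Reader we mentioned that the nested-square ("回"-shaped) finder marks of a QR code must follow a fixed ratio. That ratio exists to make the localization here possible.

The find method first looks for black/white/black/white/black runs whose widths are in the ratio 1:1:3:1:1, and it tracks every such candidate it finds. The algorithm is two nested for loops that traverse the image, but because many cases have to be distinguished, this part contains a lot of if-else checks: whether the current pixel is black or white, how far the scan of the current 1:1:3:1:1 pattern has progressed, whether several candidate points have been found, and so on.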
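The ratio test is easier to see on a single row of pixels. The following is a simplified sketch of the idea only, not Zxing's implementation: the class FinderRatioSketch, its method names, and the 50% tolerance per module are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch (not Zxing code) of a 1:1:3:1:1 run-length check on one row. */
final class FinderRatioSketch {

  /** True if five run lengths (black, white, black, white, black) are roughly 1:1:3:1:1. */
  static boolean looksLikeFinderPattern(int[] runs) {
    int total = 0;
    for (int run : runs) {
      if (run == 0) {
        return false;
      }
      total += run;
    }
    if (total < 7) {
      return false; // a finder pattern is 7 modules wide, so it needs at least 7 pixels
    }
    float moduleSize = total / 7.0f;
    float maxVariance = moduleSize / 2.0f; // assumed tolerance: 50% of one module
    return Math.abs(moduleSize - runs[0]) < maxVariance
        && Math.abs(moduleSize - runs[1]) < maxVariance
        && Math.abs(3.0f * moduleSize - runs[2]) < 3.0f * maxVariance
        && Math.abs(moduleSize - runs[3]) < maxVariance
        && Math.abs(moduleSize - runs[4]) < maxVariance;
  }

  /** Scans a row (true = black) and returns the center x of the first match, or -1. */
  static int findInRow(boolean[] row) {
    // Split the row into runs of equal color: {start, length, 1 if black else 0}.
    List<int[]> runs = new ArrayList<>();
    int start = 0;
    for (int x = 1; x <= row.length; x++) {
      if (x == row.length || row[x] != row[start]) {
        runs.add(new int[] {start, x - start, row[start] ? 1 : 0});
        start = x;
      }
    }
    // Slide a window of five runs that begins on a black run.
    for (int i = 0; i + 5 <= runs.size(); i++) {
      if (runs.get(i)[2] != 1) {
        continue;
      }
      int[] lengths = new int[5];
      for (int k = 0; k < 5; k++) {
        lengths[k] = runs.get(i + k)[1];
      }
      if (looksLikeFinderPattern(lengths)) {
        int first = runs.get(i)[0];
        int[] last = runs.get(i + 4);
        return (first + last[0] + last[1]) / 2; // midpoint of the five runs
      }
    }
    return -1;
  }
}
```

A real finder has to repeat this check vertically (and cope with noise and several candidates), which is where most of the if-else branching mentioned above comes from.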

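QR code correction algorithm: perspective transformation

As the figure above shows, the purpose of this algorithm is to recognize a QR code even when it is distorted. It relies on the perspective transformation used in computer graphics. A perspective transformation projects an image onto a new viewing plane and is also called a projective mapping, as in the figure below:

The idea:
(u, v) are coordinates in the original image and (x, y) are the corresponding coordinates in the transformed image, where $x = x'/w'$ and $y = y'/w'$. We normally work with 2D images, so the source coordinate w is always 1. The general transformation formula is:

$$
\begin{bmatrix} x' & y' & w' \end{bmatrix} =
\begin{bmatrix} u & v & w \end{bmatrix}
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
$$

The transformation matrix can be split into four parts. The block $\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$ is the linear transformation, such as scaling, shearing, and rotation. The row $\begin{bmatrix} a_{31} & a_{32} \end{bmatrix}$ handles translation. The column $\begin{bmatrix} a_{13} & a_{23} \end{bmatrix}^T$ produces the perspective effect. The fourth part is the scalar $a_{33}$. An affine transformation (linear transformation plus translation) can therefore be seen as a special case of the perspective transformation. After a perspective transformation the image is usually no longer a parallelogram (unless the target viewing plane is parallel to the original plane).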

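In the code, the core of the whole algorithm is the squareToQuadrilateral method, so we focus on it. It maps the unit square with vertices (0,0), (1,0), (1,1), (0,1) onto the quadrilateral with vertices $(x_0,y_0)$, $(x_1,y_1)$, $(x_2,y_2)$, $(x_3,y_3)$.
squareToQuadrilateral defines several auxiliary variables ($\Delta x_1$ corresponds to dx1 in the code):

$$
\begin{aligned}
\Delta x_1 &= x_1 - x_2, & \Delta x_2 &= x_3 - x_2, & \Delta x_3 &= x_0 - x_1 + x_2 - x_3 \\
\Delta y_1 &= y_1 - y_2, & \Delta y_2 &= y_3 - y_2, & \Delta y_3 &= y_0 - y_1 + y_2 - y_3
\end{aligned}
$$

① When $\Delta x_3$ and $\Delta y_3$ are both 0, the transformed plane is parallel to the original one (the mapping is affine), and we get:

$$
\begin{aligned}
a_{11} &= x_1 - x_0, & a_{21} &= x_2 - x_1, & a_{31} &= x_0 \\
a_{12} &= y_1 - y_0, & a_{22} &= y_2 - y_1, & a_{32} &= y_0 \\
a_{13} &= 0, & a_{23} &= 0, & a_{33} &= 1
\end{aligned}
$$

② When they are not both 0, we get:

$$
a_{13} = \frac{\Delta x_3 \Delta y_2 - \Delta x_2 \Delta y_3}{\Delta x_1 \Delta y_2 - \Delta x_2 \Delta y_1},
\qquad
a_{23} = \frac{\Delta x_1 \Delta y_3 - \Delta x_3 \Delta y_1}{\Delta x_1 \Delta y_2 - \Delta x_2 \Delta y_1}
$$

$$
\begin{aligned}
a_{11} &= x_1 - x_0 + a_{13} x_1, & a_{21} &= x_3 - x_0 + a_{23} x_3, & a_{31} &= x_0 \\
a_{12} &= y_1 - y_0 + a_{13} y_1, & a_{22} &= y_3 - y_0 + a_{23} y_3, & a_{32} &= y_0 \\
& & & & a_{33} &= 1
\end{aligned}
$$

The resulting matrix transforms a square into a quadrilateral. The reverse, quadrilateral to square, works the same way.
Rewriting the earlier transformation formula with w = 1 gives:

$$
x = \frac{a_{11} u + a_{21} v + a_{31}}{a_{13} u + a_{23} v + a_{33}},
\qquad
y = \frac{a_{12} u + a_{22} v + a_{32}}{a_{13} u + a_{23} v + a_{33}}
$$

We know the target coordinates (x, y) and the values a11 to a33, so recovering the original coordinates (u, v) only requires solving a system of two linear equations in two unknowns:

$$
\begin{aligned}
(a_{11} - a_{13} x)\,u + (a_{21} - a_{23} x)\,v &= a_{33} x - a_{31} \\
(a_{12} - a_{13} y)\,u + (a_{22} - a_{23} y)\,v &= a_{33} y - a_{32}
\end{aligned}
$$

So by chaining two transformations, quadrilateral to square and then square to quadrilateral, we can map any quadrilateral onto any other quadrilateral.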
Algorithm code:

// This code is based directly on section 3.4.2 of George Wolberg's "Digital Image Warping"; see pages 54-56
public final class PerspectiveTransform {

  private final float a11;
  private final float a12;
  private final float a13;
  private final float a21;
  private final float a22;
  private final float a23;
  private final float a31;
  private final float a32;
  private final float a33;

  // Define the matrix (note the argument order: column by column)
  private PerspectiveTransform(float a11, float a21, float a31,
                               float a12, float a22, float a32,
                               float a13, float a23, float a33) {
    this.a11 = a11;
    this.a12 = a12;
    this.a13 = a13;
    this.a21 = a21;
    this.a22 = a22;
    this.a23 = a23;
    this.a31 = a31;
    this.a32 = a32;
    this.a33 = a33;
  }

  // (x0,y0)-(x3,y3) are the four vertices of the source quadrilateral (the ideal
  // square grid when called from Detector); (x0p,y0p)-(x3p,y3p) are the four vertices
  // of the destination quadrilateral. Both sets are listed in the same order around the shape.
  public static PerspectiveTransform quadrilateralToQuadrilateral(float x0, float y0,
                                                                  float x1, float y1,
                                                                  float x2, float y2,
                                                                  float x3, float y3,
                                                                  float x0p, float y0p,
                                                                  float x1p, float y1p,
                                                                  float x2p, float y2p,
                                                                  float x3p, float y3p) {
    // Quadrilateral to square
    PerspectiveTransform qToS = quadrilateralToSquare(x0, y0, x1, y1, x2, y2, x3, y3);
    // Square to quadrilateral
    PerspectiveTransform sToQ = squareToQuadrilateral(x0p, y0p, x1p, y1p, x2p, y2p, x3p, y3p);
    return sToQ.times(qToS);
  }

  public static PerspectiveTransform squareToQuadrilateral(float x0, float y0,
                                                           float x1, float y1,
                                                           float x2, float y2,
                                                           float x3, float y3) {
    float dx3 = x0 - x1 + x2 - x3;
    float dy3 = y0 - y1 + y2 - y3;
    if (dx3 == 0.0f && dy3 == 0.0f) {
      // Case ①: affine
      return new PerspectiveTransform(x1 - x0, x2 - x1, x0,
                                      y1 - y0, y2 - y1, y0,
                                      0.0f,    0.0f,    1.0f);
    } else {
      // Case ②: general perspective
      float dx1 = x1 - x2;
      float dx2 = x3 - x2;
      float dy1 = y1 - y2;
      float dy2 = y3 - y2;
      float denominator = dx1 * dy2 - dx2 * dy1;
      float a13 = (dx3 * dy2 - dx2 * dy3) / denominator;
      float a23 = (dx1 * dy3 - dx3 * dy1) / denominator;
      return new PerspectiveTransform(x1 - x0 + a13 * x1, x3 - x0 + a23 * x3, x0,
                                      y1 - y0 + a13 * y1, y3 - y0 + a23 * y3, y0,
                                      a13,                a23,                1.0f);
    }
  }

  public static PerspectiveTransform quadrilateralToSquare(float x0, float y0,
                                                           float x1, float y1,
                                                           float x2, float y2,
                                                           float x3, float y3) {
    // Here, the adjoint serves as the inverse
    return squareToQuadrilateral(x0, y0, x1, y1, x2, y2, x3, y3).buildAdjoint();
  }

  PerspectiveTransform buildAdjoint() {
    // The adjoint is the transpose of the cofactor matrix:
    return new PerspectiveTransform(a22 * a33 - a23 * a32,
                                    a23 * a31 - a21 * a33,
                                    a21 * a32 - a22 * a31,
                                    a13 * a32 - a12 * a33,
                                    a11 * a33 - a13 * a31,
                                    a12 * a31 - a11 * a32,
                                    a12 * a23 - a13 * a22,
                                    a13 * a21 - a11 * a23,
                                    a11 * a22 - a12 * a21);
  }

  // Compose two transforms (matrix multiplication)
  PerspectiveTransform times(PerspectiveTransform other) {
    return new PerspectiveTransform(a11 * other.a11 + a21 * other.a12 + a31 * other.a13,
                                    a11 * other.a21 + a21 * other.a22 + a31 * other.a23,
                                    a11 * other.a31 + a21 * other.a32 + a31 * other.a33,
                                    a12 * other.a11 + a22 * other.a12 + a32 * other.a13,
                                    a12 * other.a21 + a22 * other.a22 + a32 * other.a23,
                                    a12 * other.a31 + a22 * other.a32 + a32 * other.a33,
                                    a13 * other.a11 + a23 * other.a12 + a33 * other.a13,
                                    a13 * other.a21 + a23 * other.a22 + a33 * other.a23,
                                    a13 * other.a31 + a23 * other.a32 + a33 * other.a33);
  }
}
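As a sanity check on the two cases in squareToQuadrilateral, the sketch below recomputes the same coefficients with plain doubles and applies them to the corners of the unit square. It is a standalone illustration, not part of Zxing: the class SquareToQuadSketch, the array layout {a11, a21, a31, a12, a22, a32, a13, a23, a33}, and the apply helper are assumptions made here (the excerpt above does not show how PerspectiveTransform applies itself to points).

```java
/** Illustrative sketch (not Zxing code): unit square to quadrilateral, then apply to a point. */
final class SquareToQuadSketch {

  /** Returns {a11, a21, a31, a12, a22, a32, a13, a23, a33}, same order as the constructor above. */
  static double[] squareToQuadrilateral(double x0, double y0, double x1, double y1,
                                        double x2, double y2, double x3, double y3) {
    double dx3 = x0 - x1 + x2 - x3;
    double dy3 = y0 - y1 + y2 - y3;
    if (dx3 == 0.0 && dy3 == 0.0) {
      // Affine case: no perspective terms.
      return new double[] {x1 - x0, x2 - x1, x0, y1 - y0, y2 - y1, y0, 0.0, 0.0, 1.0};
    }
    double dx1 = x1 - x2;
    double dx2 = x3 - x2;
    double dy1 = y1 - y2;
    double dy2 = y3 - y2;
    double denominator = dx1 * dy2 - dx2 * dy1;
    double a13 = (dx3 * dy2 - dx2 * dy3) / denominator;
    double a23 = (dx1 * dy3 - dx3 * dy1) / denominator;
    return new double[] {
        x1 - x0 + a13 * x1, x3 - x0 + a23 * x3, x0,
        y1 - y0 + a13 * y1, y3 - y0 + a23 * y3, y0,
        a13, a23, 1.0};
  }

  /** Maps (u, v) through the transform: divide by the homogeneous coordinate w'. */
  static double[] apply(double[] a, double u, double v) {
    double w = a[6] * u + a[7] * v + a[8];
    double x = (a[0] * u + a[1] * v + a[2]) / w;
    double y = (a[3] * u + a[4] * v + a[5]) / w;
    return new double[] {x, y};
  }

  public static void main(String[] args) {
    // Hypothetical quadrilateral; the unit-square corners should land on its vertices.
    double[] t = squareToQuadrilateral(0, 0, 4, 0, 3, 3, 0, 2);
    double[][] corners = {{0, 0}, {1, 0}, {1, 1}, {0, 1}};
    for (double[] c : corners) {
      double[] p = apply(t, c[0], c[1]);
      System.out.println("(" + c[0] + ", " + c[1] + ") -> (" + p[0] + ", " + p[1] + ")");
    }
  }
}
```

For the sample quadrilateral, working the formulas by hand gives a13 = -0.4 and a23 = 0.2, and the corner (1, 1) maps to (2.4 / 0.8, 2.4 / 0.8) = (3, 3), which is the third vertex as expected.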

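QR code size estimation: Bresenham's line algorithm

In the flowchart we described estimating the size of the QR code from the three finder patterns. In essence this uses the center coordinates of the finder patterns: two points determine a line. On paper, given a start and an end point, drawing the line with a ruler is easy. A computer, however, has to work out the line one pixel at a time.

The idea:

Each point in the figure represents one pixel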
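To show the pixel-by-pixel stepping on its own, here is a minimal integer Bresenham sketch that simply lists the pixels visited between two points. The class BresenhamSketch and its line method are made up for illustration. The error initialization (-dx / 2) and the steep-line swap are the same bookkeeping that appears in Zxing's sizeOfBlackWhiteBlackRun, but this version does not look at pixel colors at all.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch (not Zxing code): pixels visited by an integer Bresenham line. */
final class BresenhamSketch {

  /** Returns the {x, y} pixels from (fromX, fromY) to (toX, toY), both ends included. */
  static List<int[]> line(int fromX, int fromY, int toX, int toY) {
    // For steep lines, swap x and y so that we always step along the longer axis.
    boolean steep = Math.abs(toY - fromY) > Math.abs(toX - fromX);
    if (steep) {
      int temp = fromX;
      fromX = fromY;
      fromY = temp;
      temp = toX;
      toX = toY;
      toY = temp;
    }

    int dx = Math.abs(toX - fromX);
    int dy = Math.abs(toY - fromY);
    int error = -dx / 2; // accumulated deviation from the ideal line
    int xstep = fromX < toX ? 1 : -1;
    int ystep = fromY < toY ? 1 : -1;

    List<int[]> pixels = new ArrayList<>();
    for (int x = fromX, y = fromY; x != toX + xstep; x += xstep) {
      // Undo the swap when recording the pixel.
      pixels.add(steep ? new int[] {y, x} : new int[] {x, y});
      error += dy;
      if (error > 0) {
        // The ideal line has moved more than half a pixel away: step in y.
        y += ystep;
        error -= dx;
      }
    }
    return pixels;
  }

  public static void main(String[] args) {
    for (int[] p : line(0, 0, 5, 2)) {
      System.out.println(p[0] + ", " + p[1]);
    }
  }
}
```

For the line from (0, 0) to (5, 2), tracing the loop by hand gives the pixels (0,0), (1,0), (2,1), (3,1), (4,2), (5,2): x advances every iteration, and y advances only when the accumulated error turns positive.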

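Algorithm code:

  // A mild variant of Bresenham's algorithm: walks from the start point (fromX, fromY)
  // towards the end point (toX, toY) and measures the length of the black-white-black run
  private float sizeOfBlackWhiteBlackRun(int fromX, int fromY, int toX, int toY) {
    // Taking the figure above as an example: steep is true when height > width
    boolean steep = Math.abs(toY - fromY) > Math.abs(toX - fromX);
    if (steep) {
      // temp is a scratch variable used to swap x and y
      int temp = fromX;
      fromX = fromY;
      fromY = temp;
      temp = toX;
      toX = toY;
      toY = temp;
    }

    // Let dx = width, dy = height
    int dx = Math.abs(toX - fromX);
    int dy = Math.abs(toY - fromY);
    int error = -dx / 2;
    int xstep = fromX < toX ? 1 : -1;
    int ystep = fromY < toY ? 1 : -1;

    // In black pixels, looking for white, first or second time.
    int state = 0;
    // Loop up until x == toX, but not beyond
    int xLimit = toX + xstep;
    for (int x = fromX, y = fromY; x != xLimit; x += xstep) {
      int realX = steep ? y : x;
      int realY = steep ? x : y;

      // Does the current pixel mean we have moved from white to black (or black to white)?
      // We scan black in states 0 and 2 and white in state 1, so if we find the wrong
      // color, advance to the next state; if we are already in state 2, we are done.
      if ((state == 1) == image.get(realX, realY)) {
        if (state == 2) {
          return MathUtils.distance(x, y, fromX, fromY);
        }
        state++;
      }

      error += dy;
      if (error > 0) {
        if (y == toY) {
          break;
        }
        y += ystep;
        error -= dx;
      }
    }
    // Found black-white-black; give the benefit of the doubt that the next pixel outside
    // the image is "white", so this last point at (toX+xStep, toY) is the right ending.
    // This is a small approximation; (toX+xStep, toY+yStep) might be the correct one. Ignore this.
    if (state == 2) {
      return MathUtils.distance(toX + xstep, toY, fromX, fromY);
    }
    // If we get here, we did not even find black-white-black; no estimate is really possible.
    return Float.NaN;
  }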

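4. Summary

In this part we completed the search for the QR code. The search relies on the structural features of the QR encoding.

Version: QR code symbols come in 40 sizes, version 1 through version 40. Version 1 is 21×21 modules, version 2 is 25×25 modules, and so on; each version adds 4 modules per side over the previous one, up to version 40 (177×177 modules).
Finder patterns: located at the top-left, top-right, and bottom-left corners of the QR code. Each position detection pattern is made up of 7×7 modules (1:1:3:1:1). The chance of meeting a similar pattern elsewhere in the symbol is very small, so a candidate QR symbol can be recognized quickly in the field of view. Identifying the three position detection patterns that make up the finder pattern unambiguously determines the position and orientation of the symbol in the field of view. The figure below shows the finder patterns for version 1 and version 6; as it shows, the higher the version, the smaller the share of the whole symbol the finder patterns take up.

To make the position detection patterns easier to recognize, a separator one module wide sits between each position detection pattern and the encoding region, shown as the yellow area in the figure below. This area must be entirely light and cannot hold data.
Timing patterns: the horizontal and vertical timing patterns are a row and a column one module wide, made of alternating dark and light modules, starting and ending with a dark module. They lie in row 6 and column 6 respectively (rows and columns counted from 0) and avoid the finder patterns. They determine the symbol's density and version and provide reference positions for module coordinates. The figure below shows version 1 and version 6 with the timing patterns drawn.
Alignment patterns: an alignment pattern is a fixed reference pattern. When the image is damaged to some degree, decoding software can use it to resynchronize the coordinate mapping of the image modules. The number of alignment patterns depends on the version of the QR code.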
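The version rule above (21 modules for version 1, plus 4 per side for each later version) reduces to dimension = 17 + 4 × version. The sketch below shows that relation in both directions; the class QrVersionSketch and its method names are made up for illustration and only mirror the kind of lookup that the detect code performs when it derives a provisional version from the estimated dimension.

```java
/** Illustrative sketch (not Zxing code) of the QR version / dimension relation. */
final class QrVersionSketch {

  /** Modules per side for a version in 1..40. */
  static int dimensionForVersion(int version) {
    if (version < 1 || version > 40) {
      throw new IllegalArgumentException("version must be 1..40: " + version);
    }
    return 17 + 4 * version;
  }

  /** Inverse lookup: which version has this many modules per side. */
  static int versionForDimension(int dimension) {
    // Valid dimensions are 21, 25, ..., 177, i.e. they all leave remainder 1 mod 4.
    if (dimension % 4 != 1) {
      throw new IllegalArgumentException("not a valid QR dimension: " + dimension);
    }
    int version = (dimension - 17) / 4;
    if (version < 1 || version > 40) {
      throw new IllegalArgumentException("not a valid QR dimension: " + dimension);
    }
    return version;
  }

  public static void main(String[] args) {
    System.out.println(dimensionForVersion(1));   // 21
    System.out.println(dimensionForVersion(40));  // 177
    System.out.println(versionForDimension(25));  // 2
  }
}
```

The same arithmetic explains the `- 7` in processFinderPatternInfo: finder pattern centers sit 3.5 modules in from each edge, so the distance between two centers is the dimension minus 7 modules.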

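In addition, because there is no fixed formula for computing the row/column coordinates of the center modules of the alignment patterns, the position search in DetectorResult detect(Map<DecodeHintType,?> hints) is somewhat arbitrary (see the code comments for details).
The next post will cover how Zxing parses a BitMatrix into text.

Comments and suggestions are welcome. Thanks for reading!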
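References:
ZxingAPI
What are we scanning when we scan a QR code? (当我们在扫描二维码时，我们在扫描什么？)
Drawing straight lines with Bresenham's algorithm (Bresenham's algorithm( 布兰森汉姆算法)画直线)
[Image Processing] Perspective Transformation in computer vision (【图像处理】计算机视觉透视变换 Perspective Transformation)
QR Code in Detail, Part 1 (QR码详解（上）)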