Reposted from: https://xgoat.com/wp/2011/05/08/dismantling-artoolkit-part-1-labelling/

Tonight’s PhD creative/productive escape was to continue teasing apart some of the functionality of ARToolkit. I’m pursuing getting some fiducial marker detection, à la ARToolKit, into the core of Student Robotics 2012’s kit (the one that’ll be shipped in October). We won’t be using the exact algorithms found in ARToolKit, as it frequently reports markers that aren’t there, but learning some of what I perceived to be the “magic” of the library’s guts seemed like a good idea.

I first hit Google Scholar to find papers about how ARToolKit and other similar libraries work. Luckily, I’m currently a student at an institution that provides me with access to journal and conference papers. Sadly this is not the case for everyone, which sucks. I read, and skimmed through, a few papers, which gave me an idea of what’s around. Unfortunately I didn’t find a thorough description of ARToolKit’s algorithms. Even ARToolKit’s own description of its vision algorithm left big gaps. There are a few papers out there that compare different marker detection libraries. I’d link to them, but they’re behind a paywall, so I’d rather not.

What I suspect is happening is that people take one look at the more interesting functions within ARToolKit’s source and then run a mile. Take, for example, the function labeling2 found in arLabeling.c. Take a look at it now. To understand what’s going on in there you need to be really determined. You need to wade through some manual loop unrolling, preprocessor fudging, and arrays of ints being treated like they should be arrays of structs. More on this in a bit.

What’s interesting is that this code works well enough for people to use it. Wikipedia says:

“ARToolKit is a very widely used AR tracking library with over 160,000 downloads since 2004”

So, either I’m going crazy and I have lost the ability to read someone else’s code, or the ARToolKit code leaves a lot to be desired. For what are hopefully intuitive reasons, I’m going to opt for the latter explanation. So here I get to apply some rather extended extrapolation (for more of this see Freakonomics) about how we reached a situation in which at least thousands of people are using a library that’s quite impenetrable. I think it’s a pretty good demonstration of two things. First: usability can count for more than implementation detail. ARToolKit functions fine as a library, and has a usable interface. Most users don’t need to care about the internal implementation until it breaks. Secondly, it’s a demonstration that organically developed things can work, and that one doesn’t need to follow software engineering formalisms to the letter to get the job done.

Still, all this mustn’t belittle the achievement of the authors of ARToolKit. They’ve obviously made a significant contribution to a lot of good projects that people have done, and I’m sure that many of the ideas wouldn’t have nucleated had the library not existed at all. So, time to quit whinging about how impenetrable the code is, and get to work on deciphering it! I’ll be doing this over a series of blog posts. This is the first one, and is about the first stage of ARToolKit’s image processing: labelling. I’ll be ignoring the image acquisition itself because this is a pretty run-of-the-mill operation.

Ok, I lied, I’m actually going to cover thresholding and labelling here. The thresholding step is really simple and happens in the same pass as labelling, so it doesn’t really count. Both the thresholding and labelling happen in the labeling2() function in arLabeling.c. I’ve spent several hours working through this function, simplifying it so that I could understand what was going on. Whilst I was doing this, I committed my changes to a git repository:

git clone https://xgoat.com/pubgit/artoolkit-revenge.git

The contents of the above git repository are not intended to be run. The only reason for this code existing is to help me (and maybe others) understand how ARToolKit works. In doing this, I removed several things that I didn’t care about, such as different pixel colour formats. These things weren’t important to understanding how ARToolKit’s algorithms work. At the time of writing, I’ve got labeling2() down from 344 lines of quite unapproachable code to 169 lines of fodder that I find easier to grok.

Thresholding


ARToolKit identifies black-and-white markers in images from the camera, such as the one shown on the right. It converts the colour image that comes from the camera into a black-and-white image using a threshold specified by the user. If one is working with an RGB image, then the sum of the R, G, and B components for each pixel is compared against the threshold that the user specifies like so:

r + g + b < threshold * 3

Pixels that satisfy this threshold are suitable for labelling. Other pixels are ignored.
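As a sketch, that test can be written like this (a hypothetical helper of my own, not part of ARToolKit's API):

```c
/* Sketch of ARToolKit-style thresholding: a pixel qualifies for labelling
 * when the sum of its R, G and B components is below three times the
 * user-supplied threshold (i.e. its mean channel value is below it). */
static int below_threshold(int r, int g, int b, int threshold)
{
    return (r + g + b) < threshold * 3;
}
```

Comparing the sum against `threshold * 3` avoids a division per pixel while meaning the same thing as comparing the average channel value against `threshold`.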

Labelling

The pixels that got through the thresholding step are now grouped into sets of connected pixels. Each group of connected pixels is assigned a label, which is just an integer greater than zero. This is done by iterating through the thresholded image pixels, row-by-row from top-left to bottom-right, and storing each pixel's calculated label number in a second image buffer. A pixel's label is decided as follows:

  1. If the pixel above is labelled, take the same label as it.
  2. Otherwise, if the pixel to the top-right was labelled, then some checks are made to determine if two groups of (what are currently) differently labelled pixels require merging to have the same label. If at least one of the pixels to the left or top-left of the current pixel is labelled, then two labelled regions are indeed connected. ARToolKit makes the labels of the two intersecting groups equivalent, simply by recording the fact that these two labels are equivalent -- it doesn't go renumbering the labels stored in the label image buffer (this'd be quite inefficient).
  3. Otherwise, take the label of the pixel to the top-left if it has one.
  4. Otherwise, take the label of the pixel to the left if it has one.
  5. Finally, if none of the above conditions were met, then a new region has been found. A new label number is assigned to the pixel.
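The five rules above can be sketched roughly like this (a simplified illustration, not ARToolKit's actual code; `labels`, `equiv`, `resolve` and `label_pixel` are names I've made up):

```c
#define W 8
#define H 8
#define MAX_LABELS 64

/* equiv[] maps each label to a representative label, so merging two
 * regions is just a table update rather than a renumbering pass. */
static int equiv[MAX_LABELS];
static int next_label = 1;

/* Follow the equivalence chain to a label's representative. */
static int resolve(int l)
{
    while (equiv[l] != l)
        l = equiv[l];
    return l;
}

/* Decide the label of pixel (x, y), given the labels already assigned
 * above and to the left of it.  0 means "unlabelled". */
static int label_pixel(int labels[H][W], int x, int y)
{
    int above = (y > 0) ? labels[y - 1][x] : 0;
    int tr    = (y > 0 && x + 1 < W) ? labels[y - 1][x + 1] : 0;
    int tl    = (y > 0 && x > 0) ? labels[y - 1][x - 1] : 0;
    int left  = (x > 0) ? labels[y][x - 1] : 0;

    if (above)                                   /* rule 1 */
        return above;
    if (tr) {                                    /* rule 2: possible merge */
        int other = tl ? tl : left;
        if (other && resolve(other) != resolve(tr))
            equiv[resolve(other)] = resolve(tr); /* record equivalence only */
        return tr;
    }
    if (tl)                                      /* rule 3 */
        return tl;
    if (left)                                    /* rule 4 */
        return left;
    equiv[next_label] = next_label;              /* rule 5: new region */
    return next_label++;
}
```

The top-right check is the interesting one: the current pixel may bridge two regions that were labelled separately earlier in the scan, and the sketch, like ARToolKit, just records that their labels are equivalent.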

Whilst all this labelling is going on, statistics about each label are built up as well. The following stats are collected for each label:

  • The number of pixels assigned this label.
  • The sum of the x coordinates, as well as the sum of the y coordinates of each pixel of this label.
  • The minimum and maximum x and y coordinates of the pixels with this label (i.e. the region's bounding box).

The first two of those statistics are used to calculate the centre-point of a labelled region, whilst the other numbers will be passed back to the caller of labeling2.
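A hypothetical struct mirroring those per-label statistics (ARToolKit itself returns them through flat arrays) makes the bookkeeping clear:

```c
/* Running statistics for one labelled region (my own struct for
 * illustration, not ARToolKit's representation). */
struct label_stats {
    int  area;          /* number of pixels with this label      */
    long sum_x, sum_y;  /* coordinate sums, for the centre-point */
    int  min_x, max_x;  /* bounding box                          */
    int  min_y, max_y;
};

/* Fold one pixel at (x, y) into a label's running statistics. */
static void accumulate(struct label_stats *s, int x, int y)
{
    s->area++;
    s->sum_x += x;
    s->sum_y += y;
    if (x < s->min_x) s->min_x = x;
    if (x > s->max_x) s->max_x = x;
    if (y < s->min_y) s->min_y = y;
    if (y > s->max_y) s->max_y = y;
}
```

The centre-point then drops out as `sum_x / area` and `sum_y / area`, with no second pass over the image.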

As I said above, label numbers can be made equivalent to each other during the labelling process. This means that after the labelling is complete, there can be redundancy in label numbers. ARToolKit performs a little bit of label number shuffling to remove this redundancy, and ensures that all label numbers are consecutive.
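That tidy-up step can be sketched like so (again an illustration of the idea, not ARToolKit's code; `rep[]` here stands for the resolved equivalence table, mapping each raw label to its representative):

```c
#define MAX_LABELS 64

/* Given rep[i] = representative of raw label i (for i in 1..n_raw),
 * assign consecutive numbers 1..n to the distinct representatives and
 * write the compacted label of each raw label into out[].  Returns the
 * number of distinct regions. */
static int compact_labels(const int rep[], int n_raw, int out[])
{
    int new_of[MAX_LABELS] = {0};   /* representative -> compact label */
    int next = 0;

    for (int i = 1; i <= n_raw; i++) {
        int r = rep[i];
        if (new_of[r] == 0)
            new_of[r] = ++next;     /* first time we see this region */
        out[i] = new_of[r];
    }
    return next;
}
```

After this pass, merged regions share one label and the labels run 1, 2, 3, … with no gaps, which makes the per-label statistics arrays dense.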

These statistics are passed back along with the labelled image buffer to the caller. I won't go into the precise details here. If you want to know more, have a look at the labeling2 function in the modified sources that I linked to above. I've changed the prototype of the labeling2 function so that it uses arrays of structs that are easier to decipher, so hopefully it'll all make sense.

That's where I'm up to right now with parsing ARToolKit's behaviour. The next instalment will be on the behaviour of arDetectMarker2(), which will use some of the information collected by the labelling process.

Posted at 3:15 am on Sunday 8th May 2011
