Notes on Reading Kaldi's HashList

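Kaldi's decoding algorithms have to keep track of a large number of tokens. Each token is the "head" of one path: tracing back from the token recovers the complete path. Once decoding reaches the last frame, we pick the best-scoring token among all surviving tokens and trace back from it; the outputs along that path are the recognition result (the one-best result).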
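The traceback idea can be sketched in a few lines. This is only an illustration: the Token struct, its fields, and the functions BestToken and Backtrace are hypothetical names made up for this example, and Kaldi's real decoder tokens carry more information.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified token (not Kaldi's actual Token type).
struct Token {
  float cost;          // accumulated path cost, lower is better
  std::string output;  // output label on the arc into this token ("" if none)
  const Token *prev;   // previous token on the path (nullptr at the start)
};

// Returns the lowest-cost token among those alive at the last frame,
// or nullptr if the list is empty.
inline const Token *BestToken(const std::vector<const Token*> &final_tokens) {
  const Token *best = nullptr;
  for (const Token *t : final_tokens)
    if (best == nullptr || t->cost < best->cost) best = t;
  return best;
}

// Follows the prev pointers back to the start and returns the non-empty
// output labels in time order.
inline std::vector<std::string> Backtrace(const Token *tok) {
  std::vector<std::string> outputs;
  for (; tok != nullptr; tok = tok->prev)
    if (!tok->output.empty()) outputs.push_back(tok->output);
  std::reverse(outputs.begin(), outputs.end());
  return outputs;
}
```

Calling Backtrace(BestToken(tokens)) on the tokens of the last frame gives the one-best output sequence under these assumptions.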
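Decoding produces a large number of tokens, so we need a data structure, together with its algorithms, for storing and updating them. The design requirements can be summarized as follows.
1. It must be fast to check whether a given token already exists.
2. It must be fast to insert a token (when it does not exist yet).
3. It must be easy to iterate over all tokens.
Kaldi's authors designed a structure called HashList to store the tokens. HashList is similar to the hash table described in typical data-structure textbooks, but somewhat more complex.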
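Let us first review the hash table as it is usually described in data-structure textbooks.
A hash table uses a hash function to map the elements of some set to slots of a table (usually implemented as an array or similar). With the table we can check whether an element is already present and insert new data quickly.
A common choice of hash function is the division method: take the key modulo some number (usually the size of the hash table) and use the remainder as the index at which the data is stored. Two different keys can hash to the same index, which is called a collision. Collisions are usually resolved either by separate chaining (a linked list per slot) or by open addressing.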

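Here is an example of a hash table.
1. The table has 5 buckets. The hash function is the division method (key mod 5), and collisions are resolved by separate chaining.
2. The keys inserted, in order, are 1, 8, 3, 10, 15, 5, 21, 18.
With these settings the hash table ends up as follows (each chain is listed in insertion order, assuming new keys are appended to the end of their chain).
bucket 0: 10 -> 15 -> 5
bucket 1: 1 -> 21
bucket 2: (empty)
bucket 3: 8 -> 3 -> 18
bucket 4: (empty)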
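The example can be reproduced with a small textbook-style chained hash table. This is a minimal sketch, not Kaldi code: the class ChainedHashTable and the helper BuildExample are names invented for this illustration, and appending at the end of each chain is an assumption (a textbook implementation may just as well prepend).

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Textbook separate-chaining hash table for non-negative int keys.
class ChainedHashTable {
 public:
  explicit ChainedHashTable(std::size_t num_buckets) : buckets_(num_buckets) {}

  // Returns true if key is stored in the table.
  bool Contains(int key) const {
    for (int k : buckets_[Index(key)])
      if (k == key) return true;
    return false;
  }

  // Appends key to the end of its chain; returns false if already present.
  bool Insert(int key) {
    if (Contains(key)) return false;
    buckets_[Index(key)].push_back(key);
    return true;
  }

  // Read-only access to the chain stored in bucket i.
  const std::list<int> &Chain(std::size_t i) const { return buckets_[i]; }

 private:
  // Division method: index = key mod table size.
  std::size_t Index(int key) const {
    return static_cast<std::size_t>(key) % buckets_.size();
  }
  std::vector<std::list<int>> buckets_;
};

// Builds the table from the example: 5 buckets, keys inserted in order.
inline ChainedHashTable BuildExample() {
  ChainedHashTable table(5);
  for (int key : {1, 8, 3, 10, 15, 5, 21, 18}) table.Insert(key);
  return table;
}
```

Note that visiting every key here means walking all five buckets, including the empty ones; this is one of the things HashList avoids by threading all elements onto a single list.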

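Kaldi's HashList is also a hash table. It resembles the chained hash table in the example above, but differs from it in several ways. With the same settings and the same insertion order, HashList ends up in the following state (worked out by tracing Insert() in the source listed later in this post).
list_head_ -> 1 -> 21 -> 8 -> 3 -> 18 -> 10 -> 15 -> 5 -> NULL
bucket 0: last_elem = 5, prev_bucket = 3
bucket 1: last_elem = 21, prev_bucket = -1
bucket 3: last_elem = 18, prev_bucket = 1
buckets 2 and 4: empty (last_elem = NULL)
bucket_list_tail_ = 0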

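As we can see, the structure of HashList differs from an ordinary chained hash table mainly in the following points.
1. The element a hash bucket points to is the last element of that bucket's chain (last_elem).
2. Each hash bucket has an extra field, prev_bucket, which gives the index of the previous bucket. The element that follows the previous bucket's last_elem is the first element of this bucket's chain.
3. There is one additional head pointer (list_head_) from which every element can be reached, because all chains are linked into a single list.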
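The three points can be checked with a cut-down model of the structure. This is a sketch under simplifying assumptions: MiniHashList, kNoBucket and Keys() are names made up for this example, keys are plain ints with no values, and elements live in a std::deque instead of Kaldi's own block allocator. The insertion logic follows HashList::Insert() from the source listed later in this post.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

constexpr std::size_t kNoBucket = static_cast<std::size_t>(-1);

// Simplified model of the HashList layout (illustrative, not Kaldi's class).
struct MiniHashList {
  struct Elem { int key; Elem *tail; };
  struct Bucket { std::size_t prev_bucket; Elem *last_elem; };

  explicit MiniHashList(std::size_t n) : buckets(n, Bucket{0, nullptr}) {}
  MiniHashList(const MiniHashList &) = delete;  // raw pointers into storage

  // The caller asserts that key is not in the structure yet.
  void Insert(int key) {
    std::size_t index = static_cast<std::size_t>(key) % buckets.size();
    Bucket &b = buckets[index];
    storage.push_back(Elem{key, nullptr});
    Elem *elem = &storage.back();  // deque::push_back keeps old addresses valid
    if (b.last_elem == nullptr) {
      // New bucket: its chain goes to the very end of the global list.
      if (bucket_list_tail == kNoBucket) list_head = elem;
      else buckets[bucket_list_tail].last_elem->tail = elem;
      b.last_elem = elem;
      b.prev_bucket = bucket_list_tail;
      bucket_list_tail = index;
    } else {
      // Occupied bucket: splice in right after its current last element.
      elem->tail = b.last_elem->tail;
      b.last_elem->tail = elem;
      b.last_elem = elem;
    }
  }

  // Walks the single list from list_head and returns all keys in list order.
  std::vector<int> Keys() const {
    std::vector<int> keys;
    for (const Elem *e = list_head; e != nullptr; e = e->tail)
      keys.push_back(e->key);
    return keys;
  }

  Elem *list_head = nullptr;
  std::size_t bucket_list_tail = kNoBucket;  // most recently activated bucket
  std::vector<Bucket> buckets;
  std::deque<Elem> storage;
};
```

Inserting 1, 8, 3, 10, 15, 5, 21, 18 into a MiniHashList with 5 buckets and calling Keys() walks the list in the order 1, 21, 8, 3, 18, 10, 15, 5: the elements of each bucket stay contiguous, and the buckets appear in the order in which they first became non-empty.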

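To check whether a key is in the hash table, take the key modulo the table size to get the index. If the bucket at that index is empty, the key is not present. If the bucket is not empty, points 1 and 2 above determine where the bucket's chain starts and ends, and we simply search that segment of the list.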
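The lookup can be sketched as follows. Again this is an illustrative model rather than Kaldi's class: TinyHashList and kInvalidBucket are hypothetical names, keys are ints, values are floats, and elements are stored in a std::deque. Find() mirrors the logic of HashList::Find() in the source listed later in this post.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

constexpr std::size_t kInvalidBucket = static_cast<std::size_t>(-1);

// Hypothetical cut-down HashList: int keys, float values, no memory pool.
class TinyHashList {
 public:
  struct Elem { int key; float val; Elem *tail; };
  struct Bucket { std::size_t prev_bucket; Elem *last_elem; };

  explicit TinyHashList(std::size_t num_buckets)
      : buckets_(num_buckets, Bucket{kInvalidBucket, nullptr}) {}
  TinyHashList(const TinyHashList &) = delete;
  TinyHashList &operator=(const TinyHashList &) = delete;

  // Returns the element with this key, or nullptr if it is not present.
  Elem *Find(int key) {
    const Bucket &b = buckets_[Index(key)];
    if (b.last_elem == nullptr) return nullptr;  // empty bucket
    // First element of this bucket's segment: the list head if this was the
    // first bucket activated, else the successor of the previous bucket's end.
    Elem *head = (b.prev_bucket == kInvalidBucket)
                     ? list_head_
                     : buckets_[b.prev_bucket].last_elem->tail;
    Elem *end = b.last_elem->tail;  // one past the end of the segment
    for (Elem *e = head; e != end; e = e->tail)
      if (e->key == key) return e;
    return nullptr;
  }

  // The caller asserts that key is not present (e.g. Find returned nullptr).
  void Insert(int key, float val) {
    std::size_t index = Index(key);
    Bucket &b = buckets_[index];
    storage_.push_back(Elem{key, val, nullptr});
    Elem *elem = &storage_.back();
    if (b.last_elem == nullptr) {  // activate a new bucket at the list end
      if (bucket_list_tail_ == kInvalidBucket) list_head_ = elem;
      else buckets_[bucket_list_tail_].last_elem->tail = elem;
      b.prev_bucket = bucket_list_tail_;
      bucket_list_tail_ = index;
    } else {                       // append to the bucket's own segment
      elem->tail = b.last_elem->tail;
      b.last_elem->tail = elem;
    }
    b.last_elem = elem;
  }

 private:
  std::size_t Index(int key) const {
    return static_cast<std::size_t>(key) % buckets_.size();
  }
  Elem *list_head_ = nullptr;
  std::size_t bucket_list_tail_ = kInvalidBucket;
  std::vector<Bucket> buckets_;
  std::deque<Elem> storage_;
};
```

With the keys from the earlier example in a 5-bucket table, Find(18) scans only 8, 3, 18, while Find(13) scans the same three elements and stops when it reaches 10, the first element of the next segment. As in Kaldi, the returned element stays owned by the structure but its val may be modified by the caller.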

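For every frame Kaldi processes, elements of the HashList are created and destroyed. Calling the system new and delete for each element would add significant overhead. Instead, HashList allocates memory in blocks (arrays of allocate_block_size_ = 1024 Elems), and creating or deleting an element just hands out or takes back a slot in these arrays. The member variables allocated_ (the list of allocated blocks) and freed_head_ (the head of the list of free elements) are the memory-related state; New() and Delete() are the operations that allocate and recycle elements.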
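The allocation scheme can be sketched as a standalone free-list pool. This is a simplified model: ElemPool, Grow() and NumBlocks() are names invented for this example, the Elem fields are placeholders, and a configurable block size (assumed to be at least 1) replaces the fixed allocate_block_size_. The New()/Delete() logic follows the HashList source listed below.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Placeholder element type; tail doubles as the free-list link.
struct Elem { int key; float val; Elem *tail; };

// Illustrative free-list pool modelled on HashList::New() and Delete().
class ElemPool {
 public:
  // block_size must be at least 1.
  explicit ElemPool(std::size_t block_size = 1024) : block_size_(block_size) {}
  ~ElemPool() { for (Elem *block : allocated_) delete[] block; }
  ElemPool(const ElemPool &) = delete;
  ElemPool &operator=(const ElemPool &) = delete;

  // Pops one element off the free list, allocating a new block if it is empty.
  Elem *New() {
    if (freed_head_ == nullptr) Grow();
    Elem *ans = freed_head_;
    freed_head_ = freed_head_->tail;
    return ans;
  }

  // Pushes e back onto the free list; no memory is returned to the system.
  void Delete(Elem *e) {
    e->tail = freed_head_;
    freed_head_ = e;
  }

  std::size_t NumBlocks() const { return allocated_.size(); }

 private:
  // Allocates one block and chains all of its elements into the free list.
  void Grow() {
    Elem *block = new Elem[block_size_];
    for (std::size_t i = 0; i + 1 < block_size_; i++)
      block[i].tail = block + i + 1;
    block[block_size_ - 1].tail = nullptr;
    freed_head_ = block;
    allocated_.push_back(block);
  }

  std::size_t block_size_;
  Elem *freed_head_ = nullptr;     // head of the list of free elements
  std::vector<Elem*> allocated_;   // every block ever allocated
};
```

Both New() and Delete() are a couple of pointer assignments, and the system allocator is touched only once per block, which is the point of the design.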

Below is the original HashList source code; it is worth reading carefully.

// util/hash-list.h

// Copyright 2009-2011   Microsoft Corporation
//                2013   Johns Hopkins University (author: Daniel Povey)

// See ../../COPYING for clarification regarding multiple authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//  http://www.apache.org/licenses/LICENSE-2.0
//
// THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
// WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
// MERCHANTABLITY OR NON-INFRINGEMENT.
// See the Apache 2 License for the specific language governing permissions and
// limitations under the License.

#ifndef KALDI_UTIL_HASH_LIST_H_
#define KALDI_UTIL_HASH_LIST_H_
#include <vector>
#include <set>
#include <algorithm>
#include <limits>
#include <cassert>
#include "util/stl-utils.h"

/* This header provides utilities for a structure that's used in a decoder (but
   is quite generic in nature so we implement and test it separately).
   Basically it's a singly-linked list, but implemented in such a way that we
   can quickly search for elements in the list.  We give it a slightly richer
   interface than just a hash and a list.  The idea is that we want to separate
   the hash part and the list part: basically, in the decoder, we want to have a
   single hash for the current frame and the next frame, because by the time we
   need to access the hash for the next frame we no longer need the hash for the
   previous frame.  So we have an operation that clears the hash but leaves the
   list structure intact.  We also control memory management inside this object,
   to avoid repeated new's/deletes.

   See hash-list-test.cc for an example of how to use this object.
*/

namespace kaldi {

template<class I, class T> class HashList {
 public:
  struct Elem {
    I key;
    T val;
    Elem *tail;
  };

  /// Constructor takes no arguments.
  /// Call SetSize to inform it of the likely size.
  HashList();

  /// Clears the hash and gives the head of the current list to the user;
  /// ownership is transferred to the user (the user must call Delete()
  /// for each element in the list, at his/her leisure).
  Elem *Clear();

  /// Gives the head of the current list to the user.  Ownership retained in the
  /// class.  Caution: in December 2013 the return type was changed to const
  /// Elem* and this function was made const.  You may need to change some types
  /// of local Elem* variables to const if this produces compilation errors.
  const Elem *GetList() const;

  /// Think of this like delete().  It is to be called for each Elem in turn
  /// after you "obtained ownership" by doing Clear().  This is not the opposite
  /// of Insert, it is the opposite of New.  It's really a memory operation.
  inline void Delete(Elem *e);

  /// This should probably not be needed to be called directly by the user.
  /// Think of it as opposite
  /// to Delete();
  inline Elem *New();

  /// Find tries to find this element in the current list using the hashtable.
  /// It returns NULL if not present.  The Elem it returns is not owned by the
  /// user, it is part of the internal list owned by this object, but the user
  /// is free to modify the "val" element.
  inline Elem *Find(I key);

  /// Insert inserts a new element into the hashtable/stored list.  By calling
  /// this,
  /// the user asserts that it is not already present (e.g. Find was called and
  /// returned NULL).  With current code, calling this if an element already
  /// exists will result in duplicate elements in the structure, and Find()
  /// will find the first one that was added.
  /// [but we don't guarantee this behavior].
  inline void Insert(I key, T val);

  /// Insert inserts another element with same key into the hashtable/
  /// stored list.
  /// By calling this, the user asserts that one element with that key is
  /// already present.
  /// We insert it that way, that all elements with the same key
  /// follow each other.
  /// Find() will return the first one of the elements with the same key.
  inline void InsertMore(I key, T val);

  /// SetSize tells the object how many hash buckets to allocate (should
  /// typically be at least twice the number of objects we expect to go in the
  /// structure, for fastest performance).  It must be called while the hash
  /// is empty (e.g. after Clear() or after initializing the object, but before
  /// adding anything to the hash.
  void SetSize(size_t sz);

  /// Returns current number of hash buckets.
  inline size_t Size() { return hash_size_; }

  ~HashList();

 private:
  struct HashBucket {
    size_t prev_bucket;  // index to next bucket (-1 if list tail).  Note:
    // list of buckets goes in opposite direction to list of Elems.
    Elem *last_elem;  // pointer to last element in this bucket (NULL if empty)
    inline HashBucket(size_t i, Elem *e): prev_bucket(i), last_elem(e) {}
  };

  Elem *list_head_;  // head of currently stored list.
  size_t bucket_list_tail_;  // tail of list of active hash buckets.

  size_t hash_size_;  // number of hash buckets.

  std::vector<HashBucket> buckets_;

  Elem *freed_head_;  // head of list of currently freed elements. [ready for
  // allocation]

  std::vector<Elem*> allocated_;  // list of allocated blocks.

  static const size_t allocate_block_size_ = 1024;  // Number of Elements to
  // allocate in one block.  Must be largish so storing allocated_ doesn't
  // become a problem.
};

}  // end namespace kaldi

#include "util/hash-list-inl.h"

#endif  // KALDI_UTIL_HASH_LIST_H_

The implementation of the class:

// util/hash-list-inl.h

// Copyright 2009-2011   Microsoft Corporation
//                2013   Johns Hopkins University (author: Daniel Povey)

// See ../../COPYING for clarification regarding multiple authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//  http://www.apache.org/licenses/LICENSE-2.0
//
// THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
// WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
// MERCHANTABLITY OR NON-INFRINGEMENT.
// See the Apache 2 License for the specific language governing permissions and
// limitations under the License.

#ifndef KALDI_UTIL_HASH_LIST_INL_H_
#define KALDI_UTIL_HASH_LIST_INL_H_

// Do not include this file directly.  It is included by fast-hash.h

namespace kaldi {

template<class I, class T> HashList<I, T>::HashList() {
  list_head_ = NULL;
  bucket_list_tail_ = static_cast<size_t>(-1);  // invalid.
  hash_size_ = 0;
  freed_head_ = NULL;
}

template<class I, class T> void HashList<I, T>::SetSize(size_t size) {
  hash_size_ = size;
  KALDI_ASSERT(list_head_ == NULL &&
               bucket_list_tail_ == static_cast<size_t>(-1));  // make sure empty.
  if (size > buckets_.size())
    buckets_.resize(size, HashBucket(0, NULL));
}

template<class I, class T>
typename HashList<I, T>::Elem* HashList<I, T>::Clear() {
  // Clears the hashtable and gives ownership of the currently contained list
  // to the user.
  for (size_t cur_bucket = bucket_list_tail_;
       cur_bucket != static_cast<size_t>(-1);
       cur_bucket = buckets_[cur_bucket].prev_bucket) {
    buckets_[cur_bucket].last_elem = NULL;  // this is how we indicate "empty".
  }
  bucket_list_tail_ = static_cast<size_t>(-1);
  Elem *ans = list_head_;
  list_head_ = NULL;
  return ans;
}

template<class I, class T>
const typename HashList<I, T>::Elem* HashList<I, T>::GetList() const {
  return list_head_;
}

template<class I, class T>
inline void HashList<I, T>::Delete(Elem *e) {
  e->tail = freed_head_;
  freed_head_ = e;
}

template<class I, class T>
inline typename HashList<I, T>::Elem* HashList<I, T>::Find(I key) {
  size_t index = (static_cast<size_t>(key) % hash_size_);
  HashBucket &bucket = buckets_[index];
  if (bucket.last_elem == NULL) {
    return NULL;  // empty bucket.
  } else {
    Elem *head = (bucket.prev_bucket == static_cast<size_t>(-1) ?
                  list_head_ :
                  buckets_[bucket.prev_bucket].last_elem->tail),
        *tail = bucket.last_elem->tail;
    for (Elem *e = head; e != tail; e = e->tail)
      if (e->key == key) return e;
    return NULL;  // Not found.
  }
}

template<class I, class T>
inline typename HashList<I, T>::Elem* HashList<I, T>::New() {
  if (freed_head_) {
    Elem *ans = freed_head_;
    freed_head_ = freed_head_->tail;
    return ans;
  } else {
    Elem *tmp = new Elem[allocate_block_size_];
    for (size_t i = 0; i+1 < allocate_block_size_; i++)
      tmp[i].tail = tmp+i+1;
    tmp[allocate_block_size_-1].tail = NULL;
    freed_head_ = tmp;
    allocated_.push_back(tmp);
    return this->New();
  }
}

template<class I, class T>
HashList<I, T>::~HashList() {
  // First test whether we had any memory leak within the
  // HashList, i.e. things for which the user did not call Delete().
  size_t num_in_list = 0, num_allocated = 0;
  for (Elem *e = freed_head_; e != NULL; e = e->tail)
    num_in_list++;
  for (size_t i = 0; i < allocated_.size(); i++) {
    num_allocated += allocate_block_size_;
    delete[] allocated_[i];
  }
  if (num_in_list != num_allocated) {
    KALDI_WARN << "Possible memory leak: " << num_in_list
               << " != " << num_allocated
               << ": you might have forgotten to call Delete on "
               << "some Elems";
  }
}

template<class I, class T>
void HashList<I, T>::Insert(I key, T val) {
  size_t index = (static_cast<size_t>(key) % hash_size_);
  HashBucket &bucket = buckets_[index];
  Elem *elem = New();
  elem->key = key;
  elem->val = val;

  if (bucket.last_elem == NULL) {  // Unoccupied bucket.  Insert at
    // head of bucket list (which is tail of regular list, they go in
    // opposite directions).
    if (bucket_list_tail_ == static_cast<size_t>(-1)) {
      // list was empty so this is the first elem.
      KALDI_ASSERT(list_head_ == NULL);
      list_head_ = elem;
    } else {
      // link in to the chain of Elems
      buckets_[bucket_list_tail_].last_elem->tail = elem;
    }
    elem->tail = NULL;
    bucket.last_elem = elem;
    bucket.prev_bucket = bucket_list_tail_;
    bucket_list_tail_ = index;
  } else {
    // Already-occupied bucket.  Insert at tail of list of elements within
    // the bucket.
    elem->tail = bucket.last_elem->tail;
    bucket.last_elem->tail = elem;
    bucket.last_elem = elem;
  }
}

template<class I, class T>
void HashList<I, T>::InsertMore(I key, T val) {
  size_t index = (static_cast<size_t>(key) % hash_size_);
  HashBucket &bucket = buckets_[index];
  Elem *elem = New();
  elem->key = key;
  elem->val = val;

  KALDI_ASSERT(bucket.last_elem != NULL);  // assume one element is already here
  if (bucket.last_elem->key == key) {  // standard behavior: add as last element
    elem->tail = bucket.last_elem->tail;
    bucket.last_elem->tail = elem;
    bucket.last_elem = elem;
    return;
  }
  Elem *e = (bucket.prev_bucket == static_cast<size_t>(-1) ?
             list_head_ : buckets_[bucket.prev_bucket].last_elem->tail);
  // find place to insert in linked list
  while (e != bucket.last_elem->tail && e->key != key) e = e->tail;
  KALDI_ASSERT(e->key == key);  // not found? - should not happen
  elem->tail = e->tail;
  e->tail = elem;
}

}  // end namespace kaldi

#endif  // KALDI_UTIL_HASH_LIST_INL_H_
