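I have recently been studying multi-pattern matching algorithms, and one of them requires a directed acyclic word graph (DAWG). I could not find anything about it online. After some digging I found the principle behind building the graph and the construction algorithm, and then implemented it myself.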

The principle and its analysis are rather long; email me if you need them. Here I give only the algorithm and its implementation.

builddawg(S)
1. Create a node named source.
2. Let activenode be source.
3. For each word w of S do:
   A. For each letter a of w do:
      Let activenode be update(activenode, a).
   B. Let activenode be source.
4. Return source.
update(activenode, a)
1. If activenode has an outgoing edge labeled a, then
   A. Let newactivenode be the node that this edge leads to.
   B. If this edge is primary, return newactivenode.
   C. Else, return split(activenode, newactivenode).
2. Else
   A. Create a node named newactivenode.
   B. Create a primary edge labeled a from activenode to newactivenode.
   C. Let currentnode be activenode.
   D. Let suffixnode be undefined.
   E. While currentnode isn’t source and suffixnode is undefined do:
      i. Let currentnode be the node pointed to by the suffix pointer of currentnode.
      ii. If currentnode has a primary outgoing edge labeled a, then let suffixnode be the node that this edge leads to.
      iii. Else, if currentnode has a secondary outgoing edge labeled a then
         a. Let childnode be the node that this edge leads to.
         b. Let suffixnode be split(currentnode, childnode).
      iv. Else, create a secondary edge from currentnode to newactivenode labeled a.
   F. If suffixnode is still undefined, let suffixnode be source.
   G. Set the suffix pointer of newactivenode to point to suffixnode.
   H. Return newactivenode.
split(parentnode, childnode)
1. Create a node called newchildnode.
2. Make the secondary edge from parentnode to childnode into a primary edge from parentnode to newchildnode (with the same label).
3. For every primary and secondary outgoing edge of childnode, create a secondary outgoing edge of newchildnode with the same label and leading to the same node.
4. Set the suffix pointer of newchildnode equal to that of childnode.
5. Reset the suffix pointer of childnode to point to newchildnode.
6. Let currentnode be parentnode.
7. While currentnode isn’t source do:
   A. Let currentnode be the node pointed to by the suffix pointer of currentnode.
   B. If currentnode has a secondary edge to childnode, then make it a secondary edge to newchildnode (with the same label).
   C. Else, break out of the while loop.
8. Return newchildnode.
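
The three routines above can be sketched compactly as follows. This is only a minimal illustration under my own assumptions, not the implementation given later in this post: the names `Dawg`, `add_word` and `contains` are hypothetical, nodes are stored in a `std::vector` and referred to by index, and edges live in a `std::map` per node. The `contains` helper relies on the standard property that the finished graph spells out exactly the substrings of the input words as paths from the source.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Illustrative sketch only; every name in this block is hypothetical.
struct Dawg {
    struct Edge { int to; bool primary; };
    struct Node { std::map<char, Edge> out; int suffix = -1; };
    std::vector<Node> nodes;  // nodes[0] is the source

    Dawg() : nodes(1) {}

    // builddawg step 3: feed one word, starting again from the source.
    void add_word(const std::string& w) {
        int active = 0;
        for (char a : w) active = update(active, a);
    }

    // True if s labels a path that starts at the source.
    bool contains(const std::string& s) const {
        int cur = 0;
        for (char c : s) {
            auto it = nodes[cur].out.find(c);
            if (it == nodes[cur].out.end()) return false;
            cur = it->second.to;
        }
        return true;
    }

private:
    int new_node() {
        nodes.emplace_back();
        return static_cast<int>(nodes.size()) - 1;
    }

    int update(int active, char a) {
        auto it = nodes[active].out.find(a);
        if (it != nodes[active].out.end()) {            // step 1
            Edge e = it->second;                        // copy before nodes may grow
            return e.primary ? e.to : split(active, e.to, a);
        }
        int created = new_node();                       // step 2.A
        nodes[active].out[a] = Edge{created, true};     // step 2.B
        int cur = active, suffix = -1;
        while (cur != 0 && suffix == -1) {              // step 2.E
            cur = nodes[cur].suffix;
            assert(cur >= 0);  // every non-source node already has a suffix pointer
            auto e = nodes[cur].out.find(a);
            if (e == nodes[cur].out.end()) {
                nodes[cur].out[a] = Edge{created, false};
            } else if (e->second.primary) {
                suffix = e->second.to;
            } else {
                int child = e->second.to;               // copy before nodes may grow
                suffix = split(cur, child, a);
            }
        }
        nodes[created].suffix = (suffix == -1) ? 0 : suffix;  // steps 2.F and 2.G
        return created;
    }

    int split(int parent, int child, char a) {
        int clone = new_node();                         // step 1
        nodes[parent].out[a] = Edge{clone, true};       // step 2
        for (const auto& kv : nodes[child].out)         // step 3
            nodes[clone].out[kv.first] = Edge{kv.second.to, false};
        nodes[clone].suffix = nodes[child].suffix;      // step 4
        nodes[child].suffix = clone;                    // step 5
        int cur = parent;
        while (cur != 0) {                              // step 7
            cur = nodes[cur].suffix;
            bool redirected = false;
            for (auto& kv : nodes[cur].out) {
                if (kv.second.to == child && !kv.second.primary) {
                    kv.second.to = clone;               // step 7.B
                    redirected = true;
                    break;
                }
            }
            if (!redirected) break;                     // step 7.C
        }
        return clone;
    }
};
```

With this sketch, calling `add_word` once per word and then `contains("cbc")` checks whether "cbc" occurs inside any of the added words. Using node indices instead of pointers is only a convenience so that the vector can grow while nodes still refer to each other.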

Implementation:

CDawg.h

#ifndef _DAWG_GRAPHIC_H
#define _DAWG_GRAPHIC_H
#define ALPHABET_SIZE 256
#define MAX_STATE 100
#define PATTERN_LEN 50
#include <cstring>   // memset, strlen
#include <map>
#include <iostream>
using namespace std;

struct CDawg_node
{
    short node_order;
    short next[ALPHABET_SIZE];
    map<char,char> edge_type;  
    CDawg_node* suffix_node;
    CDawg_node(short order)
    {
        node_order = order;
        memset(next, 0, ALPHABET_SIZE*sizeof(short));
        suffix_node = 0;
    }
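    void set_next_state(char accept_char, int next_state, char is_primary)
    {
        // Index through unsigned char: plain char may be signed, which would give a negative index.
        next[(unsigned char)accept_char] = next_state;
        edge_type[accept_char] = is_primary;   // 2 = primary edge, 1 = secondary edge
    }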
};

class CDawg
{
private:
    CDawg_node* source;
    CDawg_node* state_set[MAX_STATE];
    short assign_state;
public:
    CDawg(char* pats)
    {
        assign_state = 0;
        source = new CDawg_node(assign_state ++);
        state_set[0] = source;
        __setup_dawg(pats);
    }
    void print_dawg();
    bool is_over_state(short state)
    {
        CDawg_node* __temp_dawg_node = state_set[state];
        return __temp_dawg_node->edge_type.size() == 0;
    }

    short get_next_state(short state, char accept_char)
    {
        CDawg_node* _temp_dawg_node = state_set[state];
        short  next_state = _temp_dawg_node->next[(unsigned char)accept_char];
        if( next_state == 0)   // 0 means "no edge": no edge ever leads back to source
        {
            return -1;
        }
        return next_state;
    }

private:
    void __setup_dawg(char* pats);
    CDawg_node* __update_dawg(CDawg_node* active_node, char accept_char);
    CDawg_node* __split_node(CDawg_node* parent, CDawg_node* child, char accept_char);
};
#endif
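
One detail of the header is worth a small illustration: the `next` table has `ALPHABET_SIZE` (256) slots, one per byte value, but plain `char` is signed on many platforms, so a byte above 127 only maps to a valid slot if it is converted to `unsigned char` first. The helper below is a hypothetical sketch of that conversion; `edge_index` is my own name and is not part of the code above.

```cpp
#include <cstddef>

// Hypothetical helper: maps any char value to a table index in [0, 255].
inline std::size_t edge_index(char c) {
    // Converting to unsigned char first avoids a negative index
    // on platforms where plain char is signed.
    return static_cast<unsigned char>(c);
}
```

For ASCII-only patterns the conversion changes nothing; it only matters for bytes such as those in UTF-8 or GBK encoded text.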

CDawg.cpp

#include "CDawg.h"   // the header listed above
#include <cstring>   // strlen

void CDawg::__setup_dawg(char* pats)
{
    CDawg_node* active_node = source;
    int len = strlen(pats);
    for(int i = 0; i < len; i ++)
    {
        if(pats[i] == ' ')
        {
            active_node = source;
        }
        else
        {
            active_node = __update_dawg(active_node, pats[i]);
        }
    }
}

CDawg_node* CDawg::__update_dawg(CDawg_node* active_node, char accept_char)
{
    short next_state = active_node->next[(unsigned char)accept_char];
    CDawg_node* new_active_node;
    if(next_state != 0)
    {
        new_active_node = state_set[next_state];
        if(active_node->edge_type[accept_char] == 2)
        {
            return new_active_node;
        }
        else
        {
            return __split_node(active_node, new_active_node, accept_char);
        }
    }
    else
    {
        CDawg_node* cur_node = active_node;
        CDawg_node* suffix_node = 0;
        CDawg_node* child_node = 0;
        active_node->set_next_state(accept_char, assign_state, 2);
        new_active_node = new CDawg_node(assign_state);
        state_set[assign_state ++] = new_active_node;
        
        while(cur_node != source && suffix_node == 0)
        {
            cur_node = cur_node->suffix_node;   // step E.i: follow the suffix pointer
            
            if(cur_node->edge_type.find(accept_char) != cur_node->edge_type.end() 
                && cur_node->edge_type[accept_char] == 2)
            {
                suffix_node = state_set[cur_node->next[(unsigned char)accept_char]];
            }
            else if(cur_node->edge_type.find(accept_char) != cur_node->edge_type.end()
                && cur_node->edge_type[accept_char] == 1)
            {
                child_node = state_set[cur_node->next[(unsigned char)accept_char]];
                suffix_node = __split_node(cur_node, child_node, accept_char);
            }
            else
            {
                cur_node->set_next_state(accept_char, new_active_node->node_order, 1);
            }
        }

        if(suffix_node == 0)
        {
            suffix_node = source;
        }
        new_active_node->suffix_node = suffix_node;
        return new_active_node;
    }
}

CDawg_node* CDawg::__split_node(CDawg_node* parent, CDawg_node* child, char accept_char)
{
    CDawg_node* new_dawg_node = new CDawg_node(assign_state);
    parent->set_next_state(accept_char, assign_state, 2);
    
    state_set[assign_state ++] = new_dawg_node;
    
    typedef map<char,char>::iterator CB;
    for(CB p = child->edge_type.begin(); p != child->edge_type.end(); p ++)
    {
        char key = p->first;
        short state = child->next[(unsigned char)key];
        new_dawg_node->set_next_state(key, state, 1);
    }
    
    new_dawg_node->suffix_node = child->suffix_node;
    child->suffix_node = new_dawg_node;
    CDawg_node* cur_node = parent;

    while (cur_node != source)
    {
        cur_node = cur_node->suffix_node;
        short __find_state = child->node_order;
        bool redirected = false;

        for(CB q = cur_node->edge_type.begin(); q != cur_node->edge_type.end(); q ++)
        {
            char state_char = q->first;
            if(cur_node->next[(unsigned char)state_char] == __find_state && ( q->second) == 1)
            {
                // Step 7B: retarget the secondary edge from child to the new node.
                cur_node->set_next_state(state_char, new_dawg_node->node_order, 1);
                redirected = true;
                break;
            }
        }
        // Step 7C: leave the loop only when no secondary edge to child was found.
        if(!redirected)
        {
            break;
        }
    }
    return new_dawg_node;
}

void CDawg::print_dawg()
{
    for(int i = 0; i < assign_state; i ++)
    {
        CDawg_node* _temp_dawg_node = state_set[i];
        for(int j = 0; j < ALPHABET_SIZE; j ++)
        {
            if(_temp_dawg_node->next[j] != 0)
            {
                cout<<i<<"--"<<(char)j<<"-->"<<_temp_dawg_node->next[j]<<endl;
            }
        }
        cout<<endl;
    }
}
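
To use the finished graph, walk it from state 0 (the source) with `get_next_state`, which returns -1 when there is no edge for the next character. The sketch below shows that walk as a small template so it works with any object that has the same `get_next_state(short, char)` member. `accepts` and `TinyDawg` are hypothetical names of my own; `TinyDawg` is a hand-written stand-in for the graph of the single word "ab", used only so the example is self-contained.

```cpp
#include <string>

// Walks any automaton exposing get_next_state(state, ch), where a
// return value of -1 means "no edge". State 0 is assumed to be the source.
template <typename Automaton>
bool accepts(Automaton& dawg, const std::string& s) {
    short state = 0;
    for (char c : s) {
        state = dawg.get_next_state(state, c);
        if (state == -1) return false;  // fell off the graph: s does not occur
    }
    return true;
}

// Hypothetical stand-in: hand-built graph for the single word "ab".
// States: 0 = source, 1 = "a", 2 = "ab" and "b".
struct TinyDawg {
    short get_next_state(short state, char c) {
        if (state == 0 && c == 'a') return 1;
        if (state == 0 && c == 'b') return 2;
        if (state == 1 && c == 'b') return 2;
        return -1;
    }
};
```

With the real class, the call would look like `CDawg dawg(pats); accepts(dawg, "some text");`, where `pats` is the space-separated pattern string that the constructor expects.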
