trie树查找前缀串

by Julia Geist

Julia·盖斯特(Julia Geist)

A Trie, (also known as a prefix tree) is a special type of tree used to store associative data structures

Trie (也称为前缀树)是一种特殊类型的树,用于存储关联数据结构

A trie (pronounced try) gets its name from retrieval — its structure makes it a stellar matching algorithm.

Trie(发音为try)的名称取自rev val -其结构使其成为出色的匹配算法。

语境 (Context)

Write your own shuffle method to randomly shuffle characters in a string.
Use the words text file, located at /usr/share/dict/words, and your shuffle method to create an anagram generator that only produces real words.
Given a string as a command line argument, print one of its anagrams.

I was presented with this challenge this week at Make School’s Product Academy.

本周,我在Make School的产品学院里接受了这一挑战。

The words in the text file are separated by new lines. Its formatting makes it a lot easier to put the words into a data structure. For now, I’m storing them in a list — each element being a single word from the file.

文本文件中的单词用新行分隔。 它的格式使将单词放入数据结构变得容易得多。 现在,我将它们存储在列表中-每个元素都是文件中的单个单词。

One approach to this challenge is to:

解决这一挑战的一种方法是:

  • randomly shuffle the characters in the string随机随机播放字符串中的字符
  • then, check it against all words that were in /usr/share/dict/words to verify that it’s a real word.

    然后,对照/ usr / share / dict / words中的所有单词检查它,以确认它是真实单词。

However, this approach requires that I check that the randomly shuffled characters in the new string matches one of 235,887 words in that file — that means 235,887 operations for each string that I want to verify as a real word.

但是,这种方法要求我检查新字符串中随机混洗的字符是否与该文件中的235887个单词匹配-这意味着要验证为真实单词的每个字符串需要进行235887次操作

This was an unacceptable solution for me. I first looked up libraries that had already been implemented to check if words exist in a language, and found pyenchant. I first completed the challenge using the library, in a few lines of code.

对我来说,这是不可接受的解决方案。 我首先查找已经实现的库,以检查语言中是否存在单词,然后找到pyenchant 。 我首先使用库通过几行代码完成了挑战。

def generateAnagram(string, language="en_US"):   languageDict = enchant.Dict(language)    numOfPossibleCombinationsForString = math.factorial(len(string))   for i in range(0, numOfPossibleCombinationsForString):       wordWithShuffledCharacters = shuffleCharactersOf(string)
if languageDict.check(wordWithShuffledCharacters):            return wordWithShuffledCharacters     return "There is no anagram in %s for %s." % (language, string)

Using a couple of library functions in my code was a quick and easy solution. However, I didn’t learn much by finding a library to solve the problem for me.

在我的代码中使用几个库函数是一个快速简便的解决方案。 但是,我没有找到可以为我解决问题的图书馆,因此学不到很多东西。

I was positive that the library wasn’t using the approach I mentioned earlier. I was curious and dug through the source code — I found a trie.

我很肯定图书馆没有使用我前面提到的方法。 我很好奇并仔细研究了源代码,发现了一个特里。

特里 (Trie)

A trie stores data in “steps”. Each step is a node in the trie.

特里树以“步骤”存储数据。 每个步骤都是特里树中的一个节点。

Storing words is a perfect use case for this kind of tree, since there are a finite amount of letters that can be put together to make a string.

对于此类树而言,存储单词是一个完美的用例,因为可以将一定数量的字母放在一起构成一个字符串。

Each step, or node, in a language trie will represent one letter of a word. The steps begin to branch off when the order of the letters diverge from the other words in the trie, or when a word ends.

语言树中的每个步骤或节点将代表一个单词的一个字母。 当字母的顺序与特里中的其他单词不同或单词结束时,步骤开始分支。

I created a trie out of directories on my Desktop to visualize stepping down through nodes. This is a trie that contains two words: apple and app.

我从桌面上的目录中创建了一个Trie,以可视化逐步通过节点。 这是一个trie,包含两个词:apple和app。

Note that the end of a word is denoted with a ‘$’. I’m using ‘$’ because it is a unique character that will not be present in any word in any language.

注意,单词的结尾用“ $”表示。 我使用的是“ $”,因为它是一个唯一字符,不会以任何语言出现在任何单词中。

If I were to add the word ‘aperture’ to this trie, I would loop over the letters in the word ‘aperture’ while simultaneously stepping down the nodes in the trie. If the letter exists as a child of the current node, step down into it. If the letter does not exist as a child of the current node, create it and then step down into it. To visualize these steps using my directories:

如果要在此Trie中添加单词“ aperture”,则将遍历单词“ aperture”中的字母,同时降低Trie中的节点。 如果该字母作为当前节点的子代存在,请下移至该节点。 如果该字母不作为当前节点的子代存在,请创建该字母,然后逐步降低该字母。 要使用我的目录可视化这些步骤:

While stepping down the trie, the first two letters of ‘aperture’ are already present in the trie, so I step down into those nodes.

在下移Trie时,“ aperture”的前两个字母已经存在于该Trie中,因此我下移到这些节点。

The third letter, ‘e’, however, is not a child of the ‘p’ node. A new node is created to represent the letter ‘e’, branching off from the other words in the trie. New nodes for the letters that follow are created as well.

但是,第三个字母“ e”不是“ p”节点的子代。 创建一个新的节点来表示字母“ e”,并从该树中的其他单词分支出来。 还创建了随后字母的新节点。

To generate a trie from a words file, this process will happen for each word, until all combinations for every word are stored.

为了从单词文件生成特里树,将对每个单词进行此过程,直到存储每个单词的所有组合。

You might be thinking: “Wait, won’t it take really long to generate the trie from that text file with 235,887 words in it? What’s the point of looping over every single character in every single word?”

您可能会想:“等等,从包含235,887个单词的文本文件生成trie是否真的需要很长时间? 是什么在每一个字,遍历每一个字符的意义呢?”

Yes, iterating over each character of every word to generate a trie does take some time. However, the time taken to create the trie is well worth it — because to check if a word exists in the text file, it takes at most, as many operations as the length of the word itself. Much better than the 235,887 operations it was going to take before.

是的,遍历每个单词的每个字符以生成特里确实需要一些时间。 但是,创建特里树所花的时间是非常值得的 -因为检查文本文件中是否存在单词,它最多花费与单词本身长度一样多的操作。 比之前要进行的235,887次操作好得多

I wrote the simplest version of a trie, using nested dictionaries. This isn’t the most efficient way to implement one, but it is a good exercise to understand the logic behind a trie.

我使用嵌套词典编写了最简单的trie版本。 这不是实现一个的最有效方法,但是了解一个trie背后的逻辑是一个很好的练习。

endOfWord = "$"
def generateTrieFromWordsArray(words):   root = {}   for word in words:      currentDict = root      for letter in word:         currentDict = currentDict.setdefault(letter, {})      currentDict[endOfWord] = endOfWord   return root
def isWordPresentInTrie(trie, word):   currentDict = trie   for letter in word:      if letter in currentDict:         currentDict = currentDict[letter]      else:          return False   return endOfWord in currentDict

You can see my solution for the anagram generator on my Github. Since exploring this algorithm, I’ve decided to make this blog post one of many — each post covering one algorithm or data structure. The code is available on my Algorithms and Data Structures repo — star it to stay updated!

您可以在我的Github上看到我的字谜生成器解决方案 。 自探索该算法以来,我决定将该博客文章设为众多文章之一-每个文章都涉及一种算法或数据结构。 该代码在我的算法和数据结构存储库中可用-对其加注星标以保持更新!

下一步 (Next Steps)

I suggest checking out Ray Wenderlich’s trie repo. Although written in Swift, it’s a valuable source for explanations of various algorithms.

我建议您查看雷·温德利希(Ray Wenderlich)的trie repo 。 尽管是用Swift编写的,但它是解释各种算法的宝贵资源。

Similar to the trie (but more memory efficient) is a suffix tree, or radix. In short, instead of storing single characters at every node, the end of a word, its suffix, is stored and the paths are created relatively.

后缀树或基数与特里树类似(但具有更高的存储效率)。 简而言之,不是在每个节点上都存储单个字符,而是存储单词的结尾及其后缀,并且相对地创建路径。

However, a radix is more complicated to implement than a trie. I suggest taking a look at Ray Wenderlich’s radix repo if you’re interested.

但是,基数的实现比trie的实现更为复杂。 如果您有兴趣,我建议您看看Ray Wenderlich的基数存储库 。

This is the first post of my algorithm and data structures series. In each post, I’ll present a problem that can be better solved with an algorithm or data structure to illustrate the algorithm/data structure itself.

这是我的算法和数据结构系列的第一篇文章。 在每篇文章中,我将介绍一个可以通过算法或数据结构更好地解决的问题,以说明算法/数据结构本身。

Star my algorithms repo on Github and follow me on Twitter if you’d like to follow along!

在Github上为我的算法存储库加注星标,如果您想跟随我,在Twitter上关注我!

Did you gain value by reading this article? Click here to share it on Twitter! If you’d like to see content like this more often, follow me on Medium and subscribe to my once-a-month newsletter below. Feel free to buy me a coffee too.

您通过阅读本文获得了价值吗? 单击此处在Twitter上分享! 如果您想经常看到这样的内容,请在Medium上关注我,并订阅下面的每月一次的新闻通讯。 也可以给我买杯咖啡 。

翻译自: https://www.freecodecamp.org/news/trie-prefix-tree-algorithm-ee7ab3fe3413/

trie树查找前缀串

trie树查找前缀串_Trie数据结构(前缀树)相关推荐

  1. 数据结构树的基本操作_《数据结构》树的基本操作.doc

    <数据结构>树的基本操作 实验四 课程名称:完成日期:姓名:学号:指导教师:实验名称:实验序号:实验成绩:一.实验目的及要求 二.实验环境.实验内容 求出它的深度. .调试过程及实验结果 ...

  2. 数据结构—2-3树、2-3-4树、B树、B+树等多路查找树的原理详解

    详细介绍了多路查找树中的2-3树.2-3-4树.B树.B+树的概念的区别,以及它们的应用场景. 文章目录 1 多路查找树的概述 1.1 索引概述 1.2 多路查找树的引出 2 2-3树 2.1 2-3 ...

  3. python 树状数组_【算法日积月累】19-高级数据结构:树状数组

    树状数组能解决的问题 树状数组,也称作"二叉索引树"(Binary Indexed Tree)或 Fenwick 树. 它可以高效地实现如下两个操作: 1.数组前缀和的查询: 2. ...

  4. 分门别类刷leetcode——高级数据结构(字典树,前缀树,trie树,并查集,线段树)

    目录 Trie树(字典树.前缀树)的基础知识 字典树的节点表示 字典树构造的例子 字典树的前序遍历 获取字典树中全部单词 字典树的整体功能 字典树的插入操作 字典树的搜索操作 字典树的前缀查询 字典树 ...

  5. 【前缀树】C++ 数据结构 字典树

    之前有段时间力扣天天出字典树,当时写得特别熟练,几个月没做都忘得差不多了--今天又出了相关题目,正好复习一下. 文章目录 一.前缀树是什么? 二.实现前缀树 三.例题:添加与搜索单词 总结 一.前缀树 ...

  6. 数据结构:树状数组:姐来展示下什么叫高端前缀和

    树状数组:花式前缀和 一.位运算:lowbit() 二.树状数组介绍 三.离散化 四.实战题目: 五.树状数组变式:支持区间查找的HashMap 我们知道前缀和是十分高效的,可以以O(1)的速度求出任 ...

  7. 海量路由表能够使用HASH表存储吗-HASH查找和TRIE树查找

    千万别! 非常多人这样说,也包括我. Linux内核早就把HASH路由表去掉了.如今就仅仅剩下TRIE了,只是我还是希望就这两种数据结构展开一些形而上的讨论. 1.hash和trie/radix ha ...

  8. 海量路由表可以使用HASH表存储吗-HASH查找和TRIE树查找

    千万别!很多人这样说,也包括我. Linux内核早就把HASH路由表去掉了,现在就只剩下TRIE了,不过我还是希望就这两种数据结构展开一些形而上的讨论. 1.hash和trie/radix hash和 ...

  9. 【HDOJ6955】Xor sum(异或前缀和,01字典树)

    1006 Xor sum 题意: 给出一个长度为n的序列,求一个最短连续子序列满足异或和大于等于k.n<1e5. 思路: 参考CF665E,求序列a中有多少个异或和大于等于k的子序列,枚举所有的 ...

最新文章

  1. 全面掌握ISO8583报文协议
  2. ((sizeof(n)+sizeof(int)-1)~(sizeof(int)-1))
  3. wxWidgets:wxMemoryOutputStream类用法
  4. Web安装项目创建桌面快捷方式
  5. nginx版本升级到1.10
  6. RabbitMQ——work queue
  7. 5款最适合 Windows 命令行/控制台的替代品
  8. python装饰器代码简洁_Python基础知识之装饰器(示例代码)
  9. Mom and Dad
  10. php 查找文件 替换内容,关于php:文件中查找和替换功能困扰
  11. 都说开完电动汽车,就不想再开回燃油车了,是这样吗?
  12. tomcat使用中出现的问题及其解决之道
  13. 关于大型网站技术演进的思考(十三)--网站静态化处理—CSI(5)
  14. (深度剖析结构)模块化解释矢量控制
  15. Android Studio 关联源码
  16. 淘宝母婴商品分析-Python
  17. codevs 1296
  18. python安装requirement.txt的扩展包
  19. Hark的数据结构与算法练习之珠排序
  20. 直播预告|5月24日下午14:00 AAAI 专场来啦

热门文章

  1. 给后辈的一点建议,面试必会
  2. linux下安装nginx1.10,Linux(RHEL7.0)下安装nginx-1.10.2(示例代码)
  3. java-transaction事件
  4. 一个电脑的重装到java开发环境安装配置的全过程
  5. VC连接mysql数据库错误:libmysql.lib : fatal error LNK1113: invalid machine 解决方法
  6. TGE学习笔记07 - 脚本控制动画
  7. synchronized 和Lock区别
  8. disruptor 介绍
  9. ubuntu10.04+hadoop0.20.2平台配置(完全分布式模式)
  10. 系统滚动条实现的NUD控件Unusable版