Strings: Substring Search
Substring search
The brute-force substring search algorithm
Code implementation
'''
Implementation of the brute-force substring search algorithm
Args:
    pat: pattern string
    txt: text
Return:
    on success, i, the position of pat in txt
    on failure, the length of txt
'''
def search1(pat, txt):
    M = len(pat)  # length of the pattern
    N = len(txt)  # length of the text
    for i in range(N - M + 1):  # i is the candidate start position in txt
        m = 0  # number of pattern characters matched at this position
        for j in range(M):  # j walks through pat
            if txt[i + j] != pat[j]:
                break
            m += 1
        if m == M:
            return i  # match found
    return N  # not found


def search2(pat, txt):
    M = len(pat)  # length of the pattern
    N = len(txt)  # length of the text
    if M == 0:
        return 0  # empty pattern matches at position 0, as in search1
    i = 0  # i walks through txt
    j = 0  # j walks through pat
    while i < N:
        if txt[i] == pat[j]:
            j += 1
            i += 1
        else:
            i -= j - 1  # back up to one past the start of the failed attempt
            j = 0
        if j == M:
            return i - M  # match found
    return N  # not found
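As a quick sanity check, the brute-force contract described above (index of the first match, or the length of the text when there is none) can be compared with Python's built-in `str.find`, which returns -1 on failure. This is only a sketch; the names `brute_force_search` and `same_as_find` are my own and do not appear in the original code.

```python
def brute_force_search(pat, txt):
    """Return the first index of pat in txt, or len(txt) if pat is absent."""
    m, n = len(pat), len(txt)
    for i in range(n - m + 1):
        if txt[i:i + m] == pat:  # compare the whole window at position i
            return i
    return n


def same_as_find(pat, txt):
    """Check the result against str.find, mapping its -1 to len(txt)."""
    expected = txt.find(pat)
    if expected == -1:
        expected = len(txt)
    return brute_force_search(pat, txt) == expected


print(brute_force_search("ABABAC", "ABABCABABAC"))  # 5
print(same_as_find("ABC", "ABABAB"))                # True
```

Each of the N-M+1 windows may cost up to M comparisons, so the worst case is proportional to M*N; that cost is what the automaton-based approach avoids.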
Prelude to the KMP substring search algorithm: deterministic finite automata
The trace below runs the automaton for pat = "ABABAC" over txt = "ABABCABABAC" (both strings are read off the characters listed in the steps). Here i is the position in txt and j is the current state, i.e. the number of pattern characters matched so far.
- i = 0, j = 0, txt[i] = pat[j] = 'A', dfa[ord('A')][j] = 1, so the next j = 1, i++;
- i = 1, j = 1, txt[i] = pat[j] = 'B', dfa[ord('B')][j] = 2, so the next j = 2, i++;
- i = 2, j = 2, txt[i] = pat[j] = 'A', dfa[ord('A')][j] = 3, so the next j = 3, i++;
- i = 3, j = 3, txt[i] = pat[j] = 'B', dfa[ord('B')][j] = 4, so the next j = 4, i++;
- i = 4, j = 4, txt[i] = 'C', pat[j] = 'A', dfa[ord('C')][j] = 0, so the next j = 0, i++;
- i = 5, j = 0, txt[i] = pat[j] = 'A', dfa[ord('A')][j] = 1, so the next j = 1, i++;
- i = 6, j = 1, txt[i] = pat[j] = 'B', dfa[ord('B')][j] = 2, so the next j = 2, i++;
- i = 7, j = 2, txt[i] = pat[j] = 'A', dfa[ord('A')][j] = 3, so the next j = 3, i++;
- i = 8, j = 3, txt[i] = pat[j] = 'B', dfa[ord('B')][j] = 4, so the next j = 4, i++;
- i = 9, j = 4, txt[i] = pat[j] = 'A', dfa[ord('A')][j] = 5, so the next j = 5, i++;
- i = 10, j = 5, txt[i] = pat[j] = 'C', dfa[ord('C')][j] = 6, so the next j = 6, i++;
- i = 11, j = 6, j == len(pat): the match transition has reached the halt state, so return the match position i - j = 5 (0-based).
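- For the mismatch transitions, copy column dfa[][X] into dfa[][j].
- For the match transition, set dfa[ord(pat[j])][j] to j+1, the next position in the pattern.
- Update X, the restart state, i.e. the state the automaton falls back to after a mismatch.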
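The three construction rules and the step-by-step trace can be reproduced with a small sketch. It assumes a non-empty pattern and characters with codes below R = 256; the helper names `build_dfa` and `trace` are mine, chosen for illustration.

```python
def build_dfa(pat, R=256):
    """dfa[c][j] is the state reached from state j on the character with code c."""
    M = len(pat)
    dfa = [[0] * M for _ in range(R)]
    dfa[ord(pat[0])][0] = 1
    X = 0  # restart state
    for j in range(1, M):
        for c in range(R):
            dfa[c][j] = dfa[c][X]      # mismatch: copy the restart state's column
        dfa[ord(pat[j])][j] = j + 1    # match: move on to the next pattern position
        X = dfa[ord(pat[j])][X]        # update the restart state
    return dfa


def trace(pat, txt):
    """Return (match position or len(txt), list of states visited)."""
    dfa = build_dfa(pat)
    j = 0
    states = []
    for i, ch in enumerate(txt):
        j = dfa[ord(ch)][j]
        states.append(j)
        if j == len(pat):
            return i + 1 - len(pat), states
    return len(txt), states


pos, states = trace("ABABAC", "ABABCABABAC")
print(pos)     # 5
print(states)  # [1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6]
```

The list of states matches the successive values of j in the trace, and the text index never moves backwards: every character is examined exactly once.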
Code implementation
'''
Implementation of the Knuth-Morris-Pratt substring search algorithm (DFA version)
ord(): converts an ASCII character to its numeric code
'''
class KMP:
    def __init__(self, pat, txt, R=256):
        self.__pat = pat
        self.__txt = txt
        self.R = R  # alphabet size (extended ASCII)
        # dfa[c][j]: next state when character code c is read in state j
        self.dfa = [[0 for i in range(len(self.__pat))] for j in range(self.R)]

    def __initdfa(self):
        '''
        Build self.dfa from the pattern.
        '''
        self.dfa[ord(self.__pat[0])][0] = 1
        X = 0  # restart state
        for j in range(1, len(self.__pat)):  # compute dfa[][j]
            for c in range(self.R):
                self.dfa[c][j] = self.dfa[c][X]  # mismatch: copy from restart state
            self.dfa[ord(self.__pat[j])][j] = j + 1  # match: advance
            X = self.dfa[ord(self.__pat[j])][X]  # update restart state

    def search(self):
        '''
        Return:
            on success, i-M, the position of pat in txt
            on failure, N, the length of txt
        '''
        M = len(self.__pat)  # length of the pattern
        N = len(self.__txt)  # length of the text
        if M == 0:
            return 0  # empty pattern matches at position 0
        self.__initdfa()
        i = 0  # i walks through txt
        j = 0  # j is the current DFA state
        while i < N:
            j = self.dfa[ord(self.__txt[i])][j]
            i += 1
            if j == M:
                return i - M  # match found
        return N  # not found
The KMP substring search algorithm
The trace below searches for pat = "ABABA" in txt = "ABABCABABA" (both strings are read off the characters listed in the steps). Here d is the current alignment of the pattern (pat[0] sits under txt[d]), dd is the alignment to move to if the next comparison fails, and s[j] is the prefix table given after the trace.
- i = 0, j = 0, d = 0, txt[i] = pat[j] = 'A', so the next j = 1, i++, dd = 0 + (1-s[0]) = 1;
- i = 1, j = 1, d = 0, txt[i] = pat[j] = 'B', so the next j = 2, i++, dd = 0 + (2-s[1]) = 2;
- i = 2, j = 2, d = 0, txt[i] = pat[j] = 'A', so the next j = 3, i++, dd = 0 + (3-s[2]) = 2;
- i = 3, j = 3, d = 0, txt[i] = pat[j] = 'B', so the next j = 4, i++, dd = 0 + (4-s[3]) = 2;
- i = 4, j = 4, d = 0, txt[i] = 'C', pat[j] = 'A', txt[i] != pat[j], so the next d = 2 and the next j = 2;
- i = 4, j = 2, d = 2, txt[i] = 'C', pat[j] = 'A', txt[i] != pat[j], so the next j = 0, i++, and the next d = 5-0 = 5;
- i = 5, j = 0, d = 5, txt[i] = pat[j] = 'A', so the next j = 1, i++, dd = 6;
- i = 6, j = 1, d = 5, txt[i] = pat[j] = 'B', so the next j = 2, i++, dd = 7;
- i = 7, j = 2, d = 5, txt[i] = pat[j] = 'A', so the next j = 3, i++, dd = 7;
- i = 8, j = 3, d = 5, txt[i] = pat[j] = 'B', so the next j = 4, i++, dd = 7;
- i = 9, j = 4, d = 5, txt[i] = pat[j] = 'A', so the next j = M = 5: the halt state is reached and the match succeeds. Return d = 5 (0-based).
j | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
pat[j] | A | B | A | B | A |
s[j] | 0 | 0 | 1 | 2 | 3 |
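The s[j] row can be recomputed with a short sketch. It assumes s[j] means the length of the longest proper prefix of pat[0..j] that is also a suffix of it, which is consistent with the values in the table. The function names `prefix_table` and `naive_prefix_table` are mine; the second is a deliberately slow version written directly from that definition, used only as a cross-check.

```python
def prefix_table(pat):
    """s[j] = length of the longest proper prefix of pat[:j+1] that is also its suffix."""
    s = [0] * len(pat)
    X = 0  # length of the border currently being extended
    for j in range(1, len(pat)):
        while X > 0 and pat[j] != pat[X]:
            X = s[X - 1]  # fall back to the next shorter border
        if pat[j] == pat[X]:
            X += 1
        s[j] = X
    return s


def naive_prefix_table(pat):
    """Same table, computed by trying every candidate length k."""
    s = []
    for j in range(len(pat)):
        head = pat[:j + 1]
        best = 0
        for k in range(1, j + 1):  # proper prefix: k < len(head)
            if head[:k] == head[-k:]:
                best = k
        s.append(best)
    return s


print(prefix_table("ABABA"))        # [0, 0, 1, 2, 3]
print(naive_prefix_table("ABABA"))  # [0, 0, 1, 2, 3]
```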
Code implementation
'''
Implementation of the Knuth-Morris-Pratt substring search algorithm
'''
class KMP:
    def __init__(self, pat, txt):
        self.__pat = pat
        self.__txt = txt
        self.s = [0 for j in range(len(self.__pat))]  # prefix table s

    def __inits(self):
        '''
        For each j, find the length of the longest proper prefix of
        pat[0..j] that is also a suffix of it, and store it in self.s[j].
        '''
        self.s[0] = 0  # a single character has no proper prefix, so s[0] = 0
        X = 0  # restart index
        for j in range(1, len(self.__pat)):  # compute s[j]
            while X > 0 and self.__pat[j] != self.__pat[X]:
                X = self.s[X - 1]  # fall back to the next shorter prefix
            if self.__pat[j] == self.__pat[X]:
                X += 1
            self.s[j] = X

    def search(self):
        '''
        Return:
            on success, d, the position of pat in txt
            on failure, N, the length of txt

        j       0 1 2 3 4
        pat[j]  A B A B A
        s[j]    0 0 1 2 3
        '''
        M = len(self.__pat)  # length of the pattern
        N = len(self.__txt)  # length of the text
        if M == 0:
            return 0  # empty pattern matches at position 0
        self.__inits()
        i = 0  # i walks through txt
        j = 0  # j walks through pat
        d = 0  # current alignment: pat[0] sits under txt[d]
        dd = 0  # alignment to move to if the next comparison fails
        while i < N:
            if self.__txt[i] == self.__pat[j]:
                dd = d + (j + 1 - self.s[j])
                # print(i, j, d, dd)  # uncomment to trace the search
                i += 1
                j += 1
            else:
                d = dd
                j = i - d
                if j < 0:
                    i += 1
                    j = 0
                # Only j characters are matched at this point, so a further
                # mismatch shifts by j - s[j-1], or by 1 when j == 0. The
                # original used j + 1 - s[j] here, which can skip a match
                # (for example pat "AABAAC" in txt "AABAAABAAC").
                dd = d + (j - self.s[j - 1] if j > 0 else 1)
                # print(i, j, d, dd)  # uncomment to trace the search
            if j == M:
                return d  # match found
        return N  # not found
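For comparison, here is a sketch of a common textbook formulation of the same search that tracks only i and j; the alignment d is never stored, because it always equals i - j. The function name `kmp_search` and the empty-pattern convention (return 0) are my assumptions, and the not-found value follows the contract used above (the length of the text).

```python
def kmp_search(pat, txt):
    """Return the first index of pat in txt, or len(txt) if pat is absent."""
    if not pat:
        return 0  # assumed convention for an empty pattern

    # Prefix table: s[j] = longest proper prefix of pat[:j+1] that is also its suffix.
    s = [0] * len(pat)
    X = 0
    for j in range(1, len(pat)):
        while X > 0 and pat[j] != pat[X]:
            X = s[X - 1]
        if pat[j] == pat[X]:
            X += 1
        s[j] = X

    j = 0  # number of pattern characters currently matched
    for i, ch in enumerate(txt):
        while j > 0 and ch != pat[j]:
            j = s[j - 1]  # keep the longest prefix that still fits
        if ch == pat[j]:
            j += 1
        if j == len(pat):
            return i + 1 - j  # start of the match
    return len(txt)


print(kmp_search("ABABA", "ABABCABABA"))   # 5
print(kmp_search("AABAAC", "AABAAABAAC"))  # 4
```

The second call is a case where two fallbacks happen on the same text character, which makes it a useful regression test for any variant of the search loop.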