Problem:
http://acm.swust.edu.cn/#/problem/572/490
Problem description
Problem text taken from: https://en.wikipedia.org/w/index.php?title=Boyer%E2%80%93Moore_string-search_algorithm&oldid=280422137
The Boyer–Moore string search algorithm is a particularly efficient string searching algorithm. It was developed by Bob Boyer and J Strother Moore in 1977. The algorithm preprocesses the target string (key) that is being searched for, but not the string being searched (unlike some algorithms which preprocess the string to be searched, and can then amortize the expense of the preprocessing by searching repeatedly). The execution time of the Boyer-Moore algorithm can actually be sub-linear: it doesn’t need to actually check every character of the string to be searched but rather skips over some of them. Generally the algorithm gets faster as the key being searched for becomes longer. Its efficiency derives from the fact that, with each unsuccessful attempt to find a match between the search string and the text it’s searching in, it uses the information gained from that attempt to rule out as many positions of the text as possible where the string could not match.

How the algorithm works

What people frequently find surprising about the Boyer-Moore algorithm when they first encounter it is that its verifications (its attempts to check whether a match exists at a particular position) work backwards. Suppose it starts a search for the word “ANPANMAN” at the beginning of a text. It first checks the eighth position of the text to see if it contains an “N”. If it finds the “N”, it moves to the seventh position to see if that contains the last “A” of the word, and so on until it checks the first position of the text for an “A”.

The reason for this backward approach becomes clearer when we consider what happens if the verification fails. Suppose that instead of an “N” in the eighth position, we find an “X”. The “X” doesn’t appear anywhere in “ANPANMAN”. So there is no match for the search string at the very start of the text, or at the next seven positions following it, because those alignments would all fall across the “X” as well. After checking just one character, we’re able to skip ahead and start looking for a match at the ninth position of the text, just after the “X”.

This explains why, in the best case, the algorithm needs only about N/M comparisons for a text of length N and a fixed pattern of length M: only one in M characters needs to be checked. It also explains the somewhat counter-intuitive result that the longer the pattern we are looking for, the faster the algorithm will usually be able to find it.

The algorithm precomputes two tables to process the information it obtains in each failed verification. One table calculates how many positions ahead to start the next search, based on the identity of the character that caused the match attempt to fail. The other makes a similar calculation based on how many characters were matched successfully before the match attempt failed.
(Because these two tables return results indicating how far ahead in the text to “jump”, they are sometimes called “jump tables”, which should not be confused with the more common meaning of jump tables in computer science.)
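The following minimal C++ sketch makes the backward verification concrete. It performs a single check at one alignment. The helper name verifyBackward is hypothetical and does not come from the problem or the article. The sketch assumes the caller keeps pos plus the pattern length within the text.

```cpp
#include <string>

// verifyBackward (hypothetical helper): one Boyer-Moore verification at alignment pos.
// Compares the pattern against the text from its last character leftwards and returns
// the pattern index of the first mismatch, or -1 if the whole pattern matches.
// Assumes 0 <= pos and pos + pat.size() <= text.size().
int verifyBackward(const std::string& text, const std::string& pat, int pos) {
    for (int j = static_cast<int>(pat.size()) - 1; j >= 0; --j) {
        if (text[pos + j] != pat[j]) return j;  // first mismatch seen from the right
    }
    return -1;  // every character matched
}
```

Take the text ANPANMAXANPANMAN. Here verifyBackward(text, "ANPANMAN", 0) returns 7 after a single comparison, because the eighth character is X. Since X does not occur in ANPANMAN, the next alignment worth trying is pos = 8, where the call returns -1 (a full match).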

The first table

The first table is easy to calculate. Start at the last character of the sought string and move towards the first character. Each time you move left, if the character you are on is not already in the table, add it; its shift value is its distance from the rightmost character. All other characters receive a shift equal to the length of the search string.

Example: for the string ANPANMAN, the first table would be as shown below (for clarity, entries are shown in the order they would be added to the table). The entry for N is 3 rather than 0. The last character itself is never entered, because only the first m-1 letters are scanned, so N's shift comes from the second N from the right.

Character   A   M   N   P   all other characters
Shift       1   2   3   5   8

The amount of shift calculated by the first table is sometimes called the “bad character shift”[1].
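The first-table rule translates directly into code. Below is a minimal C++ sketch. It assumes 8-bit characters, and the name badCharTable is illustrative rather than taken from the source.

```cpp
#include <array>
#include <string>

// badCharTable (hypothetical name): the "first table", assuming 8-bit characters.
// Walk from the second-to-last character towards the first; the first time a
// character is met, its shift is its distance from the last character.
std::array<int, 256> badCharTable(const std::string& p) {
    const int m = static_cast<int>(p.size());
    std::array<int, 256> shift;
    shift.fill(m);                       // characters not in the pattern: shift by the full length
    std::array<bool, 256> seen{};        // all false initially
    for (int k = m - 2; k >= 0; --k) {   // the last character itself is never entered
        const unsigned char c = static_cast<unsigned char>(p[k]);
        if (!seen[c]) {
            seen[c] = true;
            shift[c] = m - 1 - k;        // distance from the rightmost character
        }
    }
    return shift;
}
```

For ANPANMAN it yields A → 1, M → 2, N → 3, P → 5, and 8 for every other character.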

The second table

The second table is slightly more difficult to calculate. For each value of i less than the length of the search string, we first form a partial pattern: the last i characters of the search string, preceded by a mismatch for the character before them. We then line it up with the search pattern and determine the least number of characters the partial pattern must be shifted left before the two patterns match. Characters that slide past the left end of the search string count as matching. For instance, for the search string ANPANMAN, the table would be as follows (N̄ signifies any character that is not N, and likewise for the other barred letters):

i   Pattern    Shift
0   N̄          1
1   ĀN         8
2   M̄AN        3
3   N̄MAN       6
4   ĀNMAN      6
5   P̄ANMAN     6
6   N̄PANMAN    6
7   ĀNPANMAN   6

The amount of shift calculated by the second table is sometimes called the “good suffix shift”[2] or the “(strong) good suffix rule”. The original published Boyer-Moore algorithm [1] uses a simpler, weaker version of the good suffix rule, in which the entries of the above table do not require a mismatch for the left-most character. This is sometimes called the “weak good suffix rule”, and it is not sufficient for proving that Boyer-Moore runs in linear worst-case time.
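The definition of the second table can be checked by brute force. The C++ sketch below uses the hypothetical name goodSuffixTable and follows the description literally. For each i, it tries shifts s = 1, 2, ..., M and accepts the first one where two conditions hold: the last i characters agree with the pattern wherever they still overlap it, and the mismatched character before them does not agree. Positions that slide past the left end count as agreeing. It runs in O(M³) time, so it is only meant for checking small examples, not for use in a real search.

```cpp
#include <string>
#include <vector>

// goodSuffixTable (hypothetical name): brute-force evaluation of the "second table".
// For each i in [0, m), the partial pattern is the last i characters of p preceded by
// "any character except p[m-1-i]". shift[i] is the least s >= 1 such that, after sliding
// that partial pattern s places left, every position still overlapping p agrees
// (positions past the left end count as agreeing). O(m^3): for small examples only.
std::vector<int> goodSuffixTable(const std::string& p) {
    const int m = static_cast<int>(p.size());
    std::vector<int> shift(m, m);
    for (int i = 0; i < m; ++i) {
        for (int s = 1; s <= m; ++s) {
            bool ok = true;
            // the i matched characters p[m-i..m-1] must equal p[k-s] wherever k-s >= 0
            for (int k = m - i; k < m && ok; ++k)
                if (k - s >= 0 && p[k - s] != p[k]) ok = false;
            // the mismatched character must NOT equal p[m-1-i] at its new position
            const int q = m - 1 - i - s;
            if (ok && q >= 0 && p[q] == p[m - 1 - i]) ok = false;
            if (ok) { shift[i] = s; break; }
        }
    }
    return shift;
}
```

For ANPANMAN it returns 1, 8, 3, 6, 6, 6, 6, 6 for i = 0 through 7. The value 6 reflects that the suffix AN of ANPANMAN is also its prefix.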

Performance of the Boyer-Moore string search algorithm

In the worst case, a search in a text that does not contain the pattern needs approximately 3N comparisons, so the complexity is O(N). The proof is due to Richard Cole; see R. Cole, “Tight bounds on the complexity of the Boyer-Moore algorithm”, Proceedings of the 2nd Annual ACM-SIAM Symposium on Discrete Algorithms (1991), for details. This proof took some years to establish. In 1977, the year the algorithm was devised, the maximum number of comparisons was shown to be no more than 6N. In 1980 it was shown to be no more than 4N, and Cole's result followed in 1991. Reporting all occurrences is a different matter. Without further modification, the algorithm can then take O(NM) time, for example when searching for AAAA in a text consisting only of A's.

References

Hume and Sunday (1991). “Fast String Searching”. Software: Practice and Experience 21(11): 1221–1248, November 1991.
R. S. Boyer and J. S. Moore (1977). “A fast string searching algorithm”. Comm. ACM 20: 762–772. doi:10.1145/359842.359859.
Input
Two lines, containing only the characters A, C, G and T. The first line is the pattern (length at most 102000); the second line is the text (length at most 700000).

Output
The 0-based position of the first occurrence of the pattern in the text, or -1 if it does not occur.

Sample input
GGCCTCATATCTCTCT
CCCATTGGCCTCATATCTCTCTCCCTCCCTCCCCTGCCCAGGCTGCTTGGCATGG
Sample output
6

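Notes:
We wrap the solution in a function named after Boyer–Moore (boyerMore), but the inside of it is plain KMP. For an explanation of the KMP algorithm, I recommend this expert's blog post: https://www.cnblogs.com/zhangtianq/p/5839909.html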

Code:

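#include<bits/stdc++.h>
using namespace std;
const int maxn=700005;
char s[maxn],p[maxn];   // s: text, p: pattern
int nxt[maxn];          // was "next", which is ambiguous with std::next under using namespace std
// nxt[j] = length of the longest proper prefix of p[0..j-1] that is also its suffix; nxt[0] = -1
void getNext(){
    int len=strlen(p),k=-1,j=0; nxt[0]=-1;
    while(j<len-1){
        if(k==-1||p[j]==p[k]){nxt[j+1]=k+1;k++;j++;}
        else k=nxt[k];
    }
}
int boyerMore(){        // KMP despite the name; returns the 0-based start of the first match, or -1
    getNext();
    int i=0,j=0,lens=strlen(s),lenp=strlen(p);
    while(i<lens&&j<lenp){
        if(j==-1||s[i]==p[j]){i++;j++;}
        else j=nxt[j];
    }
    return j==lenp?i-j:-1;
}
int main(){cin>>p>>s;cout<<boyerMore()<<endl;return 0;}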
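For comparison, here is a sketch of Horspool's simplification of Boyer–Moore. It uses only the first (bad character) table, looked up with the text character aligned with the last pattern position, and it verifies right to left. The function name horspoolFind is an assumption, and the sketch assumes 8-bit characters.

```cpp
#include <array>
#include <string>

// horspoolFind (hypothetical name): Boyer-Moore-Horspool search, a sketch for comparison.
// Uses only the bad character table, looked up with the text character aligned with the
// last pattern position. Returns the 0-based start of the first match, or -1.
int horspoolFind(const std::string& text, const std::string& pat) {
    const int n = static_cast<int>(text.size());
    const int m = static_cast<int>(pat.size());
    if (m == 0) return 0;
    std::array<int, 256> shift;
    shift.fill(m);
    for (int k = 0; k + 1 < m; ++k)  // rightmost occurrence in pat[0..m-2] wins
        shift[static_cast<unsigned char>(pat[k])] = m - 1 - k;
    for (int pos = 0; pos + m <= n;) {
        int j = m - 1;
        while (j >= 0 && text[pos + j] == pat[j]) --j;  // verify right to left
        if (j < 0) return pos;
        pos += shift[static_cast<unsigned char>(text[pos + m - 1])];
    }
    return -1;
}
```

For the judge, main would read the pattern and then the text (cin >> p >> t) and print horspoolFind(t, p). Note the argument order. Horspool is often fast in practice, but its worst case is O(NM). For example, a pattern like CAA…A against a text made only of A's costs nearly M comparisons for every one-position shift. With the input limits above, that would be far too slow. This is a reasonable argument for the KMP solution in this post, whose running time is O(N + M).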
