http://blog.csdn.net/v_july_v/article/details/9024123#comments

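Preface

I had long wanted to write about neural network algorithms and the EM algorithm, but both need large uninterrupted blocks of time. I work on weekdays and spend weekends studying in classrooms at Peking University (in passing, here in reading order are the books on the history of mathematics that I read over the past six months and found worthwhile: 《数理统计学简史》, 《微积分概念发展史》, 《微积分的历程:从牛顿到勒贝格》, 《数学恩仇录》, 《数学与知识的探求》, 《古今数学思想》, 《素数之恋》), so I never found the time.

Recently, however, I have been in charge of an online programming challenge platform, http://hero.pongo.cn/ (hero for short; loosely speaking, a Chinese TopCoder that is still being improved. Unlike a typical OJ, which mainly gives ACM contestants a place to practice, hero focuses on serving corporate recruiting and interviews). I posted several programming interview problems there. Some look simple, but as soon as people start coding, many issues surface on hero. So I will resume this Art of Programming for Programmers series, starting from hero's challenge problems.

Besides, a few days ago a friend told me that the thirty or forty new graduates he knows who joined 360 this year, himself included, had nearly all read my blog while job hunting. That gave me more motivation to keep this series going.

OK, this article covers two problems:

  • Chapter 30: converting a string to an integer. We go from settling on an approach, to writing flawed code, to the Microsoft and Linux atoi implementations, to a first fairly complete implementation, and finish with the implementation in Nut/OS. The problem looks simple but is anything but.
  • Chapter 31: string matching.
As always, if you spot a problem, please point it out. Thank you.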

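Chapter 30: Converting a String to an Integer

First, the problem:

Given a string that represents an integer, convert it to an integer and output it. For example, for the input string "345", output the integer 345.
Given the function prototype int StrToInt(const char *str), implement StrToInt so that it converts a string to an integer, without using the library function atoi.

Let us analyze it step by step (9 subsections in all; the key material is in subsection 8 and after) until we arrive at a first correct implementation: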

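1. This problem is really about converting a string to an integer, in other words implementing atoi yourself. How do we correctly convert a string that represents an integer? Take "345" as an example:

  1. When we scan the first character '3', we know it is the first digit, so we get the number 3.
  2. When we scan the second digit '4', we already have a 3 in front of it, so we append 4. The earlier 3 now stands for 30, giving 3*10+4=34.
  3. Next we scan '5'. We already have 34, which now stands for 340; adding the 5 gives 34*10+5=345.

So the approach is: for each character scanned, multiply the number obtained so far by 10 and add the digit that the current character represents.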
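The accumulation rule above can be sketched in a few lines. This is only an illustration of the core loop: AccumulateDigits is a hypothetical name, and the sketch assumes a non-null, digits-only string whose value fits in an int (no sign, whitespace, or overflow handling yet).

```cpp
// Hypothetical helper: accumulates leading decimal digits only.
// Assumes s is non-null and the value fits in an int.
int AccumulateDigits(const char* s) {
    int value = 0;
    for (; *s >= '0' && *s <= '9'; ++s) {
        // The digits seen so far move one decimal place left,
        // then the new digit is added: 3 -> 34 -> 345.
        value = value * 10 + (*s - '0');
    }
    return value;
}
```

Calling AccumulateDigits("345") walks through the intermediate values 3, 34 and 345 described in the three steps.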

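2. With the approach settled, some details need attention. As zhedahht puts it:

  1. "An integer string may contain more than digits: it may start with '+' or '-' to indicate the sign. So the first character needs special handling. If it is '+', nothing needs to be done; if it is '-', the integer is negative and we negate the value at the end.
  2. Next we try to handle invalid input. Since the input is a pointer, the first thing to do before using it is to check whether it is null. Dereferencing a null pointer will inevitably crash the program.
  3. In addition, the string may contain non-digit characters. Whenever we meet such an invalid character, there is no point in converting any further.
  4. The last issue is overflow. Because the number arrives as a string, a very large input may exceed the largest representable integer after conversion and overflow."
For example, have you considered inputs like those shown in the figure on the left? Their correct outputs are shown in the figure on the right (assuming a 32-bit system and a build environment of VS2008 or later):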
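When checking a hand-written StrToInt against tricky inputs, it helps to have a reference to compare with. The following is a minimal sketch of such a test oracle, not a solution to the problem (it relies on the library function std::strtol). ReferenceStrToInt is a hypothetical name, and the sketch assumes the convention that out-of-range values clamp to INT_MAX or INT_MIN and that a null pointer yields 0.

```cpp
#include <climits>
#include <cstdlib>

// Hypothetical test oracle built on std::strtol; for cross-checking only.
// Returns 0 for a null pointer or when no digits are found.
int ReferenceStrToInt(const char* str) {
    if (str == nullptr) return 0;
    // strtol skips leading whitespace, reads an optional sign and the digits,
    // and stops at the first character that is not part of the number.
    long v = std::strtol(str, nullptr, 10);
    // Clamp into the int range. When long itself overflows, strtol already
    // returns LONG_MAX or LONG_MIN, which these tests also cover.
    if (v > INT_MAX) return INT_MAX;
    if (v < INT_MIN) return INT_MIN;
    return static_cast<int>(v);
}
```

Comparing a candidate implementation with this oracle on inputs such as "1a", "+", "" and " 10522545459" quickly shows which edge cases it misses.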
3. Before long, you might write code like this:

    //copyright@zhedahht 2007
    #include <cstddef>  // NULL
    #include <limits>   // std::numeric_limits

    enum Status {kValid = 0, kInvalid};
    int g_nStatus = kValid;

    // Convert a string into an integer
    int StrToInt(const char* str)
    {
        g_nStatus = kInvalid;
        long long num = 0;
        if(str != NULL)
        {
            const char* digit = str;
            // the first char in the string may be '+' or '-'
            bool minus = false;
            if(*digit == '+')
                digit ++;
            else if(*digit == '-')
            {
                digit ++;
                minus = true;
            }
            // the remaining chars in the string
            while(*digit != '\0')
            {
                if(*digit >= '0' && *digit <= '9')
                {
                    num = num * 10 + (*digit - '0');
                    // overflow
                    if(num > std::numeric_limits<int>::max())
                    {
                        num = 0;
                        break;
                    }
                    digit ++;
                }
                // if the char is not a digit, invalid input
                else
                {
                    num = 0;
                    break;
                }
            }
            if(*digit == '\0')
            {
                g_nStatus = kValid;
                if(minus)
                    num = 0 - num;
            }
        }
        return static_cast<int>(num);
    }
Run the program above and you will find that it gives wrong results for the inputs marked with red crosses in the figure below:

Two problems:

  1. When the string contains a non-digit character, for example "1a", the program returns 0 (the correct result is 1):

        // if the char is not a digit, invalid input
        else
        {
            num = 0;
            break;
        }
  2. Overflow is handled incorrectly, because on overflow the program simply returns 0:

        // overflow
        if(num > std::numeric_limits<int>::max())
        {
            num = 0;
            break;
        }
4. Adjust the code slightly, as follows (note: the library atoi assumed here returns the maximum, INT_MAX: 2147483647, for values above the int range and the minimum, INT_MIN: -2147483648, for values below it; the C standard itself leaves atoi's behavior undefined when the result is out of range):

    //copyright@SP_daiyq 2013/5/29
    #include <ctype.h>   // isspace
    #include <limits.h>  // INT_MAX, INT_MIN

    int StrToInt(const char* str)
    {
        int res = 0;       // result
        int i = 0;         // index of str
        int signal = '+';  // sign: '+' or '-'
        int cur;           // current digit
        if (!str)
            return 0;
        // skip whitespace
        while (isspace(str[i]))
            i++;
        // skip sign
        if (str[i] == '+' || str[i] == '-')
        {
            signal = str[i];
            i++;
        }
        // get result
        while (str[i] >= '0' && str[i] <= '9')
        {
            cur = str[i] - '0';
            // check for overflow
            if ( (signal == '+') && (cur > INT_MAX - res*10) )
            {
                res = INT_MAX;
                break;
            }
            else if ( (signal == '-') && (cur -1 > INT_MAX - res*10) )
            {
                res = INT_MIN;
                break;
            }
            res = res * 10 + cur;
            i++;
        }
        return (signal == '-') ? -res : res;
    }

Now the first problem noted at the end of subsection 3 (non-digit characters in the input) is solved.
However, the second problem noted there, overflow, is not. With test data like the following, trouble appears:

String to convert : result of this code : correct result
" 10522545459" : 1932610867 : 2147483647

What goes wrong? Take the string " 10522545459": the correct result is 2147483647, but the program actually produces 1932610867. Clearly the program has not solved the second problem, overflow. Why? Let us look at exactly how the code handles overflow:

    // check for overflow
    if ( (signal == '+') && (cur > INT_MAX - res*10) )
    {
        res = INT_MAX;
        break;
    }
    else if ( (signal == '-') && (cur -1 > INT_MAX - res*10) )
    {
        res = INT_MIN;
        break;
    }

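Continuing the example: the string " 10522545459" has 11 digits once the leading space is skipped, while INT_MAX, 2147483647, has 10. When the scan reaches the last character '9', the program compares 9 with 2147483647 - 1052254545*10.

The problem shows up right there: computing res*10, that is 1052254545*10, already exceeds INT_MAX. The multiplication has overflowed, the program has already gone wrong, and evaluating the following expression is meaningless:

    cur > INT_MAX - res*10

In other words, for "10522545459", when the last character '9' is scanned, the approach from subsection 1 ("for each character scanned, multiply the number obtained so far by 10 and add the digit that the current character represents") requires computing:

    1052254545*10 + 9

But as soon as the program computes 1052254545*10, we have

    1052254545*10 > 2147483647

so overflow has already occurred. Pressing on breaks the program logic, and it is no longer possible to go on and test whether the last digit 9 is greater than 2147483647%10 (readers impatient to see the final correct code can jump straight to subsection 8).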
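Where does the wrong answer 1932610867 come from? A minimal sketch, assuming a typical 32-bit two's-complement int in which the overflowing result keeps only its low 32 bits (the language standards do not guarantee this; signed overflow is undefined behavior). WrapTo32Bits is a hypothetical helper that uses unsigned arithmetic so the wrap-around is well defined.

```cpp
#include <cstdint>

// Hypothetical helper: reduces a 64-bit value modulo 2^32, which is what a
// 32-bit register typically retains after an overflowing computation.
std::uint32_t WrapTo32Bits(std::uint64_t value) {
    return static_cast<std::uint32_t>(value);  // keeps the low 32 bits
}
```

Since 10522545459 - 2 * 4294967296 = 1932610867, WrapTo32Bits(10522545459ULL) gives exactly the value the flawed program printed.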

5. As shown, the program above does not "handle overflow well: because the number arrives as a string, a very large input may exceed the largest representable integer after conversion and overflow". So how should the code be written?
Like this?

    //copyright@fuwutu 2013/5/29
    int StrToInt(const char* str)
    {
        bool negative = false;
        long long result = 0;
        while (*str == ' ' || *str == '\t')
        {
            ++str;
        }
        if (*str == '-')
        {
            negative = true;
            ++str;
        }
        else if (*str == '+')
        {
            ++str;
        }
        while (*str != '\0')
        {
            int n = *str - '0';
            if (n < 0 || n > 9)
            {
                break;
            }
            if (negative)
            {
                result = result * 10 - n;
                if (result < -2147483648LL)
                {
                    result = -2147483648LL;
                }
            }
            else
            {
                result = result * 10 + n;
                if (result > 2147483647LL)
                {
                    result = 2147483647LL;
                }
            }
            ++str;
        }
        return result;
    }
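Run the program and look at the results:
The program appears to pass, but it has not really dealt with the overflow problem. It takes a shortcut by declaring the accumulated value result as long long:

    long long result = 0;
Strictly speaking, then, we still have not written correct, standard-conforming code.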
6. So how should the overflow problem be solved? Let us first look at how Microsoft implements atoi:

    //atol
    //Copyright (c) 1989-1997, Microsoft Corporation. All rights reserved.
    long __cdecl atol(
        const char *nptr
        )
    {
        int c;       /* current char */
        long total;  /* current total */
        int sign;    /* if '-', then negative, otherwise positive */
        /* skip whitespace */
        while ( isspace((int)(unsigned char)*nptr) )
            ++nptr;
        c = (int)(unsigned char)*nptr++;
        sign = c;    /* save sign indication */
        if (c == '-' || c == '+')
            c = (int)(unsigned char)*nptr++;  /* skip sign */
        total = 0;
        while (isdigit(c)) {
            total = 10 * total + (c - '0');   /* accumulate digit */
            c = (int)(unsigned char)*nptr++;  /* get next char */
        }
        if (sign == '-')
            return -total;
        else
            return total;  /* return result, negated if necessary */
    }
Here isspace and isdigit behave essentially as follows (a simplified equivalent for the default "C" locale):

    int isspace(int x)
    {
        if(x==' '||x=='\t'||x=='\n'||x=='\f'||x=='\v'||x=='\r')
            return 1;
        else
            return 0;
    }
    int isdigit(int x)
    {
        if(x<='9'&&x>='0')
            return 1;
        else
            return 0;
    }
atoi then calls the atol shown above:

    //atoi calls the atol above
    int __cdecl atoi(
        const char *nptr
        )
    {
        //Overflow is not detected. Because of this, we can just use
        return (int)atol(nptr);
    }

Unfortunately, the code behind this atoi still accumulates into, and returns, a long:

    long total;  /* current total */
    if (sign == '-')
        return -total;
    else
        return total;  /* return result, negated if necessary */

Moreover, total, declared as long, is multiplied by 10 here, and total*10 can easily overflow:

    long total;  /* current total */
    total = 10 * total + (c - '0');  /* accumulate digit */
Finally, reader meiyuli reported in the comments on this article: "With the test string "-21474836480", the API returns -2147483648, while the code above returns 0." So this Microsoft atoi source has a problem.
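The result 0 reported by the reader can be reproduced with a small experiment. This is a sketch under an assumption: long is 32 bits wide and overflow wraps modulo 2^32, as is typical on the platform discussed (formally, signed overflow is undefined). AccumulateWrapping is a hypothetical function that repeats the accumulate loop with a 32-bit unsigned total so the wrap-around is well defined.

```cpp
#include <cstdint>

// Hypothetical re-creation of the "total = 10 * total + digit" loop with a
// 32-bit unsigned accumulator. Assumes a digits-only string; the sign is
// left to the caller.
std::uint32_t AccumulateWrapping(const char* digits) {
    std::uint32_t total = 0;
    for (; *digits >= '0' && *digits <= '9'; ++digits) {
        // Unsigned arithmetic wraps modulo 2^32 instead of overflowing.
        total = 10u * total + static_cast<std::uint32_t>(*digits - '0');
    }
    return total;
}
```

Because 21474836480 is exactly 5 * 4294967296, the accumulated total wraps to 0, and negating 0 still gives 0, matching the reader's observation.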
7. Since Microsoft's version falls short, readers will naturally think of Linux. So let us look at how the Linux kernel converts a string to an integer. The kernel provides the following functions:
  1. simple_strtol, which converts a string to a signed long;
  2. simple_strtoll, which converts a string to a signed long long;
  3. simple_strtoul, which converts a string to an unsigned long;
  4. simple_strtoull, which converts a string to an unsigned long long.
The relevant source and analysis follow.
First, an atoi built on these functions calls simple_strtol:

    //linux/lib/vsprintf.c
    //Copyright (C) 1991, 1992 Linus Torvalds
    //simple_strtol - convert a string to a signed long
    long simple_strtol(const char *cp, char **endp, unsigned int base)
    {
        if (*cp == '-')
            return -simple_strtoul(cp + 1, endp, base);
        return simple_strtoul(cp, endp, base);
    }
    EXPORT_SYMBOL(simple_strtol);

Next, simple_strtol calls simple_strtoul, which in turn calls simple_strtoull:

    //simple_strtoul - convert a string to an unsigned long
    unsigned long simple_strtoul(const char *cp, char **endp, unsigned int base)
    {
        return simple_strtoull(cp, endp, base);
    }
    EXPORT_SYMBOL(simple_strtoul);

Likewise, the signed long long variant, simple_strtoll, delegates to simple_strtoull:

    //simple_strtoll - convert a string to a signed long long
    long long simple_strtoll(const char *cp, char **endp, unsigned int base)
    {
        if (*cp == '-')
            return -simple_strtoull(cp + 1, endp, base);
        return simple_strtoull(cp, endp, base);
    }
    EXPORT_SYMBOL(simple_strtoll);

Finally, simple_strtoull calls _parse_integer_fixup_radix and _parse_integer to do the actual work:

    //simple_strtoull - convert a string to an unsigned long long
    unsigned long long simple_strtoull(const char *cp, char **endp, unsigned int base)
    {
        unsigned long long result;
        unsigned int rv;
        cp = _parse_integer_fixup_radix(cp, &base);
        rv = _parse_integer(cp, base, &result);
        /* FIXME */
        cp += (rv & ~KSTRTOX_OVERFLOW);
        if (endp)
            *endp = (char *)cp;
        return result;
    }
    EXPORT_SYMBOL(simple_strtoull);

Now for the main event. Let us look at the two functions used by simple_strtoull above, _parse_integer_fixup_radix and _parse_integer. As 鲨鱼 puts it:

  • "The real processing logic is mainly in _parse_integer, which handles overflow quite elegantly,
  • while _parse_integer_fixup_radix automatically determines the radix from the string."
First, _parse_integer:

    //lib/kstrtox.c, line 39
    //Convert non-negative integer string representation in explicitly given radix to an integer.
    //Return number of characters consumed maybe or-ed with overflow bit.
    //If overflow occurs, result integer (incorrect) is still returned.
    unsigned int _parse_integer(const char *s, unsigned int base, unsigned long long *p)
    {
        unsigned long long res;
        unsigned int rv;
        int overflow;
        res = 0;
        rv = 0;
        overflow = 0;
        while (*s) {
            unsigned int val;
            if ('0' <= *s && *s <= '9')
                val = *s - '0';
            else if ('a' <= _tolower(*s) && _tolower(*s) <= 'f')
                val = _tolower(*s) - 'a' + 10;
            else
                break;
            if (val >= base)
                break;
            /*
             * Check for overflow only if we are within range of
             * it in the max base we support (16)
             */
            if (unlikely(res & (~0ull << 60))) {
                if (res > div_u64(ULLONG_MAX - val, base))
                    overflow = 1;
            }
            res = res * base + val;
            rv++;
            s++;
        }
        *p = res;
        if (overflow)
            rv |= KSTRTOX_OVERFLOW;
        return rv;
    }
Two small details deserve explanation:
  1. The code above uses unlikely. unlikely and likely appear frequently in Linux kernel source:

        if(likely(value)){
            //logically, if(likely(value)) is equivalent to if(value)
        }
        else{
        }

    likely says that value is more likely to be true, and unlikely says that value is more likely to be false. They are hints for the compiler and do not change the result of the condition. The two macros are defined as:

        //include/linux/compiler.h
        # ifndef likely
        #  define likely(x) (__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 1))
        # endif
        # ifndef unlikely
        #  define unlikely(x) (__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 0))
        # endif
  2. Here is the code of div_u64:

        //include/linux/math64.h
        //div_u64
        static inline u64 div_u64(u64 dividend, u32 divisor)
        {
            u32 remainder;
            return div_u64_rem(dividend, divisor, &remainder);
        }
        //div_u64_rem
        static inline u64 div_u64_rem(u64 dividend, u32 divisor, u32 *remainder)
        {
            *remainder = dividend % divisor;
            return dividend / divisor;
        }
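The overflow test inside _parse_integer can be tried out in user space. The following is a minimal sketch, not kernel code: WouldOverflowU64 is a hypothetical name, plain division stands in for div_u64, and the sketch assumes a 64-bit unsigned long long with 2 <= base <= 16 and val < base.

```cpp
#include <climits>

// Hypothetical user-space sketch of the test used by _parse_integer.
// Returns true when res * base + val would not fit in unsigned long long.
bool WouldOverflowU64(unsigned long long res, unsigned int base, unsigned int val) {
    // Fast path: if the top 4 bits are clear, res < 2^60, so even
    // res * 16 + 15 stays below 2^64 and no further check is needed.
    if ((res & (~0ull << 60)) == 0) {
        return false;
    }
    // Slow path: res * base + val <= ULLONG_MAX holds exactly when
    // res <= (ULLONG_MAX - val) / base, so no multiplication is required.
    return res > (ULLONG_MAX - val) / base;
}
```

The design point is that the division only runs for values that are already close to the limit, which is why the kernel marks that branch as unlikely.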
Finally, _parse_integer_fixup_radix:

    //lib/kstrtox.c, line 23
    const char *_parse_integer_fixup_radix(const char *s, unsigned int *base)
    {
        if (*base == 0) {
            if (s[0] == '0') {
                if (_tolower(s[1]) == 'x' && isxdigit(s[2]))
                    *base = 16;
                else
                    *base = 8;
            } else
                *base = 10;
        }
        if (*base == 16 && s[0] == '0' && _tolower(s[1]) == 'x')
            s += 2;
        return s;
    }
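To see the radix detection at work outside the kernel, here is a user-space sketch that mirrors the logic above. FixupRadix is a hypothetical name, and std::tolower and std::isxdigit stand in for the kernel's _tolower and isxdigit.

```cpp
#include <cctype>

// Hypothetical user-space mirror of _parse_integer_fixup_radix.
// *base == 0 on entry means "detect the radix from the prefix".
// Returns the position after any "0x" / "0X" prefix.
const char* FixupRadix(const char* s, unsigned int* base) {
    if (*base == 0) {
        if (s[0] == '0') {
            bool hex = std::tolower(static_cast<unsigned char>(s[1])) == 'x' &&
                       std::isxdigit(static_cast<unsigned char>(s[2]));
            *base = hex ? 16 : 8;  // "0x1A" is hex, "017" is octal
        } else {
            *base = 10;            // no leading zero: decimal
        }
    }
    if (*base == 16 && s[0] == '0' &&
        std::tolower(static_cast<unsigned char>(s[1])) == 'x') {
        s += 2;  // skip the hex prefix so only digits remain
    }
    return s;
}
```

With base initialized to 0, "0x1A" yields base 16 and a pointer to "1A", "017" yields base 8, and "42" yields base 10.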
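At my suggestion, reader MJN tested the Linux kernel functions above as an atoi. Here are the results (input : output):
2147483647 : 2147483647
2147483648 : -2147483648
10522545459 : 1932610867
-2147483648 : -2147483648
-2147483649 : -2147483647
-10522545459 : 1932610867
As the output shows, for some overflow cases the behavior does not meet the requirements of this problem. The road ahead is long and we will keep searching, but we are getting close.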

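8. Recall the approach we agreed on in subsection 1: "for each character scanned, multiply the number obtained so far by 10 and add the digit that the current character represents". Readers have probably noticed that when the number accumulated so far is already large, multiplying it by 10 for the next character can easily overflow.

But there is always a way out. Since multiplying a large int by 10 can overflow, and overflow is hard to detect after the fact, we can use division instead. As MJN puts it:
  1. Rather than multiplying n by 10, risking overflow, and then comparing with INT_MAX (if overflow has already happened, the comparison is meaningless),
  2. it is better to compare n with INT_MAX/10 in advance: if n > INT_MAX/10 (the case n == INT_MAX/10 must also be considered), the final integer is certain to overflow, so we can handle the overflow immediately and return the maximum INT_MAX, skipping the computation of n*10 altogether.
That is, before computing n*10, compare n with INT_MAX/10. If n > INT_MAX/10, then n*10 certainly exceeds INT_MAX, so the final integer is bound to overflow. In that case we must not compute n*10; we simply return INT_MAX early.
All along, our real goal has been to handle overflow properly. The key point of this method is that it neatly avoids the multiplication n*10 and uses the division INT_MAX/10 instead, which is quite clever.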
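The comparison described above can be isolated into a small predicate. This is a minimal sketch for the positive case only; AppendWouldExceedIntMax is a hypothetical name, and the sketch assumes n >= 0 and 0 <= digit <= 9.

```cpp
#include <climits>

// Hypothetical helper: true when n * 10 + digit would exceed INT_MAX.
// Only INT_MAX is divided; n itself is never multiplied, so the check
// cannot overflow.
bool AppendWouldExceedIntMax(int n, int digit) {
    return n > INT_MAX / 10 ||
           (n == INT_MAX / 10 && digit > INT_MAX % 10);
}
```

For a 32-bit int, INT_MAX / 10 is 214748364 and INT_MAX % 10 is 7, so appending 7 to 214748364 is still fine while appending 8 is rejected.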

His code follows; if you spot a problem, please point it out:

    //copyright@njnu_mjn 2013
    #include <ctype.h>  // isspace, isdigit

    int StrToDecInt(const char* str)
    {
        static const int MAX = (int)((unsigned)~0 >> 1);
        static const int MIN = -(int)((unsigned)~0 >> 1) - 1;
        unsigned int n = 0;
        int sign = 1;
        int c;
        // cast to unsigned char: passing a negative char to isspace/isdigit is undefined
        while (isspace((unsigned char)*str))
            ++str;
        if (*str == '+' || *str == '-')
        {
            if (*str == '-')
                sign = -1;
            ++str;
        }
        while (isdigit((unsigned char)*str))
        {
            c = *str - '0';
            if (sign > 0 && (n > MAX/10 || (n == MAX/10 && c > MAX%10)))
            {
                n = MAX;
                break;
            }
            else if (sign < 0 && (n > (unsigned)MIN/10
                || (n == (unsigned)MIN/10 && c > (unsigned)MIN%10)))
            {
                n = MIN;
                break;
            }
            n = n * 10 + c;
            ++str;
        }
        return sign > 0 ? n : -n;
    }

Judging from the test results, no problems have been found so far in the code above:

Input : output
10522545459 : 2147483647
-10522545459 : -2147483648

Let us summarize how the code above handles overflow. For a positive number, overflow can happen in two ways:

  1. Inputs such as 2147483650, where n > MAX/10;
  2. Inputs such as 2147483649, where n == MAX/10 && c > MAX%10.

Hence the overflow-handling code above:

    c = *str - '0';
    if (sign > 0 && (n > MAX/10 || (n == MAX/10 && c > MAX%10)))
    {
        n = MAX;
        break;
    }
    else if (sign < 0 && (n > (unsigned)MIN/10
        || (n == (unsigned)MIN/10 && c > (unsigned)MIN%10)))
    {
        n = MIN;
        break;
    }

Even so, some details can be improved, as he notes himself:

  1. n should be declared and defined as

        int n = 0;
  2. Store MAX/10, MAX%10, (unsigned)MIN/10 and (unsigned)MIN%10 in variables to avoid recomputing them.

With these changes, the optimized code is:

    //copyright@njnu_mjn 2013
    #include <ctype.h>  // isspace, isdigit

    int StrToDecInt(const char* str)
    {
        static const int MAX = (int)((unsigned)~0 >> 1);
        static const int MIN = -(int)((unsigned)~0 >> 1) - 1;
        static const int MAX_DIV = (int)((unsigned)~0 >> 1) / 10;
        static const int MIN_DIV = (int)((((unsigned)~0 >> 1) + 1) / 10);
        static const int MAX_R = (int)((unsigned)~0 >> 1) % 10;
        static const int MIN_R = (int)((((unsigned)~0 >> 1) + 1) % 10);
        int n = 0;
        int sign = 1;
        int c;
        while (isspace((unsigned char)*str))
            ++str;
        if (*str == '+' || *str == '-')
        {
            if (*str == '-')
                sign = -1;
            ++str;
        }
        while (isdigit((unsigned char)*str))
        {
            c = *str - '0';
            // >= rather than >: n is now a signed int, so n * 10 + c must
            // never reach MAX + 1; inputs equal to the limit are clamped too
            if (sign > 0 && (n > MAX_DIV || (n == MAX_DIV && c >= MAX_R)))
            {
                n = MAX;
                break;
            }
            else if (sign < 0 && (n > MIN_DIV
                || (n == MIN_DIV && c >= MIN_R)))
            {
                // return directly: storing MIN in n and negating it at the
                // end would overflow a signed int
                return MIN;
            }
            n = n * 10 + c;
            ++str;
        }
        return sign > 0 ? n : -n;
    }
Test results for some inputs:

Input        : output
10522545459  : 2147483647
-10522545459 : -2147483648
2147483648   : 2147483647
-2147483648  : -2147483648

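Is it perfect now? As MJN himself says: "My implementation and the Linux kernel's atoi implementation share a problem: even when an error occurs, the function still returns a value, so the caller believes the argument it passed was valid. This can cause baffling errors elsewhere in the program that are hard to debug."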
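One way to address MJN's concern is to return a status separately from the value. The sketch below is an illustration under stated assumptions, not an implementation from the original discussion: ParseStatus and TryStrToInt are hypothetical names, and the choice of error categories is an assumption. It keeps the division-based overflow check and accumulates the value as a non-positive number so that INT_MIN can be reached without overflow.

```cpp
#include <cctype>
#include <climits>

// Hypothetical status codes, for illustration only.
enum class ParseStatus { kOk, kNullInput, kNoDigits, kTrailingChars, kOutOfRange };

// On kOk and kTrailingChars, *out holds the parsed value; on kOutOfRange it
// holds the clamped value; otherwise *out is left untouched.
ParseStatus TryStrToInt(const char* str, int* out) {
    if (str == nullptr || out == nullptr) return ParseStatus::kNullInput;
    while (std::isspace(static_cast<unsigned char>(*str))) ++str;
    bool negative = false;
    if (*str == '+' || *str == '-') {
        negative = (*str == '-');
        ++str;
    }
    if (!std::isdigit(static_cast<unsigned char>(*str))) return ParseStatus::kNoDigits;

    int n = 0;  // kept <= 0 while scanning
    for (; std::isdigit(static_cast<unsigned char>(*str)); ++str) {
        int digit = *str - '0';
        // Would n * 10 - digit drop below INT_MIN? Checked by division only.
        if (n < INT_MIN / 10 || (n == INT_MIN / 10 && digit > -(INT_MIN % 10))) {
            *out = negative ? INT_MIN : INT_MAX;
            return ParseStatus::kOutOfRange;
        }
        n = n * 10 - digit;
    }
    if (!negative) {
        if (n == INT_MIN) {  // magnitude INT_MAX + 1 has no positive int
            *out = INT_MAX;
            return ParseStatus::kOutOfRange;
        }
        n = -n;
    }
    *out = n;
    return *str == '\0' ? ParseStatus::kOk : ParseStatus::kTrailingChars;
}
```

A caller can now tell "1a" (value 1, trailing characters) apart from "1", and "10522545459" (clamped, out of range) apart from "2147483647", instead of silently receiving a plausible number.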

9. Finally, let us look at the atoi implementation in Nut/OS. The content of this subsection comes mainly from reference 9, MJN's blog:

    #include <compiler.h>
    #include <stdlib.h>

    int atoi(CONST char *str)
    {
        return ((int) strtol(str, (char **) NULL, 10));
    }

The strtol used above follows an idea similar to MJN's approach in subsection 8: division replaces multiplication. The complete code, with a test harness added, is as follows:

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>  /* strcmp */
    #include <ctype.h>
    #include <limits.h>

    #define CONST const

    long mstrtol(CONST char *nptr, char **endptr, int base)
    {
        register CONST char *s;
        register long acc, cutoff;
        register int c;
        register int neg, any, cutlim;
        /*
         * Skip white space and pick up leading +/- sign if any.
         * If base is 0, allow 0x for hex and 0 for octal, else
         * assume decimal; if base is already 16, allow 0x.
         */
        s = nptr;
        do {
            c = (unsigned char) *s++;
        } while (isspace(c));
        if (c == '-') {
            neg = 1;
            c = *s++;
        } else {
            neg = 0;
            if (c == '+')
                c = *s++;
        }
        if ((base == 0 || base == 16) && c == '0' && (*s == 'x' || *s == 'X')) {
            c = s[1];
            s += 2;
            base = 16;
        }
        if (base == 0)
            base = c == '0' ? 8 : 10;
        /*
         * Compute the cutoff value between legal numbers and illegal
         * numbers. That is the largest legal value, divided by the
         * base. An input number that is greater than this value, if
         * followed by a legal input character, is too big. One that
         * is equal to this value may be valid or not; the limit
         * between valid and invalid numbers is then based on the last
         * digit. For instance, if the range for longs is
         * [-2147483648..2147483647] and the input base is 10,
         * cutoff will be set to 214748364 and cutlim to either
         * 7 (neg==0) or 8 (neg==1), meaning that if we have accumulated
         * a value > 214748364, or equal but the next digit is > 7 (or 8),
         * the number is too big, and we will return a range error.
         *
         * Set any if any `digits' consumed; make it negative to indicate
         * overflow.
         */
        cutoff = neg ? LONG_MIN : LONG_MAX;
        cutlim = cutoff % base;
        cutoff /= base;
        if (neg) {
            if (cutlim > 0) {
                cutlim -= base;
                cutoff += 1;
            }
            cutlim = -cutlim;
        }
        for (acc = 0, any = 0;; c = (unsigned char) *s++) {
            if (isdigit(c))
                c -= '0';
            else if (isalpha(c))
                c -= isupper(c) ? 'A' - 10 : 'a' - 10;
            else
                break;
            if (c >= base)
                break;
            if (any < 0)
                continue;
            if (neg) {
                if ((acc < cutoff || acc == cutoff) && c > cutlim) {
                    any = -1;
                    acc = LONG_MIN;
                    errno = ERANGE;
                } else {
                    any = 1;
                    acc *= base;
                    acc -= c;
                }
            } else {
                if ((acc > cutoff || acc == cutoff) && c > cutlim) {
                    any = -1;
                    acc = LONG_MAX;
                    errno = ERANGE;
                } else {
                    any = 1;
                    acc *= base;
                    acc += c;
                }
            }
        }
        if (endptr != 0)
            *endptr = (char *) (any ? s - 1 : nptr);
        return (acc);
    }

    int matoi2(CONST char *str)
    {
        return ((int) mstrtol(str, (char **) NULL, 10));
    }

    int mgetline(char* buf, size_t n) {
        size_t idx = 0;
        int c = 0;
        while (--n > 0 && (c = getchar()) != EOF && c != '\n') {
            buf[idx++] = c;
        }
        buf[idx] = '\0';
        if (c == EOF && idx == 0)
            return -1;  /* end of input, so the loop in main can stop */
        return idx;
    }

    #define MAX_LINE 200

    int main() {
        char buf[MAX_LINE];
        while (mgetline(buf, MAX_LINE) >= 0) {
            if (strcmp(buf, "quit") == 0) break;
            printf("matoi2=%d\n", matoi2(buf));
        }
        return 0;
    }
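The cutoff and cutlim values described in the long comment inside mstrtol can be computed in isolation. The sketch below lifts that setup into a hypothetical helper (Cutoff and ComputeCutoff are names introduced here for illustration) so the numbers can be inspected directly.

```cpp
#include <climits>

// Hypothetical helper mirroring the cutoff/cutlim setup in mstrtol.
struct Cutoff {
    long cutoff;  // the limit divided by the base (negative when neg is true)
    int cutlim;   // largest final digit allowed when the prefix equals cutoff
};

Cutoff ComputeCutoff(bool neg, int base) {
    long cutoff = neg ? LONG_MIN : LONG_MAX;
    int cutlim = static_cast<int>(cutoff % base);
    cutoff /= base;
    if (neg) {
        // Only taken where % gives a positive remainder for a negative
        // dividend; with truncating division (C99, C++11) cutlim is <= 0 here.
        if (cutlim > 0) {
            cutlim -= base;
            cutoff += 1;
        }
        cutlim = -cutlim;
    }
    return {cutoff, cutlim};
}
```

For base 10 this gives cutlim 7 for positive input and 8 for negative input, both with a 32-bit and with a 64-bit long; with a 32-bit long the cutoff values are 214748364 and -214748364, as the comment states.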

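MJN tested this implementation as well (on a platform where long is 32 bits), with these results:

10522545459
matoi2=2147483647
-10522545459
matoi2=-2147483648

The program seems to handle overflow correctly. Does it really? Change the test data to "10522545454" (which differs from "10522545459" only in the last character):

10522545454
matoi2=1932610862
-10522545454
matoi2=-1932610862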

The culprit is this code:

    if (neg) {
        if ((acc < cutoff || acc == cutoff) && c > cutlim) {
            any = -1;
            acc = LONG_MIN;
            errno = ERANGE;
        } else {
            any = 1;
            acc *= base;
            acc -= c;
        }
    } else {
        if ((acc > cutoff || acc == cutoff) && c > cutlim) {
            any = -1;
            acc = LONG_MAX;
            errno = ERANGE;
        } else {
            any = 1;
            acc *= base;
            acc += c;
        }
    }

To get the correct output, two places need to change.

① This line:

[cpp]
if ((acc > cutoff || acc == cutoff) && c > cutlim) {

should become:

[cpp]
if (acc > cutoff || (acc == cutoff && c > cutlim)) {

② Likewise, this line:

[cpp]
if ((acc < cutoff || acc == cutoff) && c > cutlim) {

should become:

[cpp]
if (acc < cutoff || (acc == cutoff && c > cutlim)) {

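Why these changes? Attentive readers will recall the two ways a positive number can overflow, described in section 8 above: "A positive number can overflow in two ways:

  1. cases such as 2147483650, where n > MAX/10;
  2. cases such as 2147483649, where n == MAX/10 && c > MAX%10."

In other words, both "10522545459" and "10522545454" fall under the first case (n > MAX/10, as with 2147483650). Once the accumulated value already exceeds MAX/10, any further digit overflows, so MAX_INT should be returned right away. The digit test c > MAX%10 belongs only to the n == MAX/10 case; it is neither needed nor allowed as an extra condition in the first case. The negative inputs are handled symmetrically, with acc < cutoff.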
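
To make the difference between the two groupings concrete, here is a small sketch that evaluates both conditions for the positive branch. It assumes the 32-bit limits used throughout this chapter (cutoff 214748364, cutlim 7); the function names are hypothetical and exist only for this comparison.

```cpp
#include <cassert>
#include <cstdint>

// Limits for a 32-bit signed value in base 10, as in the article's example.
const int32_t kCutoff = INT32_MAX / 10;  // 214748364
const int kCutlim = INT32_MAX % 10;      // 7

// The condition as grouped in the tested listing:
// (acc >= cutoff) AND (digit > cutlim).
bool overflows_as_written(int32_t acc, int c) {
    assert(c >= 0 && c <= 9);
    return (acc > kCutoff || acc == kCutoff) && c > kCutlim;
}

// The regrouped condition:
// (acc > cutoff) OR (acc == cutoff AND digit > cutlim).
bool overflows_regrouped(int32_t acc, int c) {
    assert(c >= 0 && c <= 9);
    return acc > kCutoff || (acc == kCutoff && c > kCutlim);
}
```

With acc = 1052254545 (the first ten digits of both test inputs), the first grouping reports overflow for the final digit 9 but not for 4, while the regrouped condition reports overflow for both.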

This approach mirrors the overflow handling in the code from section 8 above:

[cpp]
if (sign > 0 && (n > MAX/10 || (n == MAX/10 && c > MAX%10)))
{
    n = MAX;
    break;
}
else if (sign < 0 && (n > (unsigned)MIN/10
                      || (n == (unsigned)MIN/10 && c > (unsigned)MIN%10)))
{
    n = MIN;
    break;
}
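
Putting the pieces together, the following is a minimal, self-contained sketch of a saturating decimal conversion built on the same cutoff test. It is not the Nut/OS code: the name parse_int32_saturating is hypothetical, the target type is fixed at 32 bits so the limits match the figures in this chapter, and it assumes the input is an optional sign followed by digits (no whitespace skipping, no error reporting).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: parse an optionally signed decimal string and
// clamp the result to [INT32_MIN, INT32_MAX] on overflow.
int32_t parse_int32_saturating(const char *s) {
    assert(s != nullptr);
    bool neg = false;
    if (*s == '+' || *s == '-') {
        neg = (*s == '-');
        ++s;
    }
    const uint32_t cutoff = 214748364u;     // INT32_MAX / 10
    const uint32_t cutlim = neg ? 8u : 7u;  // largest last digit at the cutoff
    uint32_t acc = 0;                       // magnitude accumulated so far
    for (; *s >= '0' && *s <= '9'; ++s) {
        uint32_t d = static_cast<uint32_t>(*s - '0');
        // Overflow if already past the cutoff, or at the cutoff with
        // a last digit that is too large.
        if (acc > cutoff || (acc == cutoff && d > cutlim)) {
            return neg ? INT32_MIN : INT32_MAX;
        }
        acc = acc * 10u + d;
    }
    if (!neg) return static_cast<int32_t>(acc);  // acc <= 2147483647 here
    if (acc == 2147483648u) return INT32_MIN;    // magnitude of INT32_MIN
    return -static_cast<int32_t>(acc);
}
```

The magnitude is kept in an unsigned accumulator so that 2147483648, the magnitude of the most negative value, can be represented without signed overflow.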

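With these changes, the code tests correctly:

10522545459
matoi2=2147483647
-10522545459
matoi2=-2147483648
10522545454
matoi2=2147483647
-10522545454
matoi2=-2147483648
quit

OK, the string-to-integer problem is now essentially solved. But what if the interviewer goes on to ask how to convert an integer into a string? You are welcome to show your approach or code in the comments below or on hero.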
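
As one possible starting point for that follow-up question, here is a minimal sketch of the reverse conversion. It is only an illustration, not a reference answer: the name int_to_str is hypothetical, and it returns a std::string for brevity. The one detail worth noticing is the same one that matters for atoi: the most negative value has no positive counterpart, so the magnitude is computed in unsigned arithmetic.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: return the decimal representation of value.
std::string int_to_str(int value) {
    // Take the magnitude as unsigned so that negating the most negative
    // int does not overflow.
    unsigned int mag = value < 0 ? 0u - static_cast<unsigned int>(value)
                                 : static_cast<unsigned int>(value);
    std::string reversed;  // digits, least significant first
    do {
        reversed.push_back(static_cast<char>('0' + mag % 10));
        mag /= 10;
    } while (mag != 0);
    if (value < 0) reversed.push_back('-');
    assert(!reversed.empty());
    return std::string(reversed.rbegin(), reversed.rend());
}
```

The do/while loop guarantees that 0 produces "0" rather than an empty string.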

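Chapter 31. String matching with wildcards

The string matching problem: given a string, match it against a specified rule and store the matches in the output array. Multiple matches are separated by a single space, with no space after the last one.

Requirements:

  1. The rule may contain the wildcards ? and *, where ? matches any single character and * matches any number (>= 0) of characters.
  2. The rule must match the longest possible substring; for example, a*d matches abbdd rather than abbd.
  3. Input that has already been matched is not matched again; matching resumes from the character after the current match.

Implement the function: char* my_find(char input[], char rule[])

Examples

input: abcadefg
rule: a?c
output: abc

input: newsadfanewfdadsf
rule: new
output: new new

input: breakfastfood
rule: f*d
output: fastfood

Notes:

  1. Implement my_find yourself; do not mix output into my_find, and do not use the C or C++ libraries or Java's String object;
  2. Pay attention to time and space complexity, as well as readability and conciseness;
  3. For input=aaa and rule=aa, returning the single result aa is enough.

1. This problem differs from the one in Chapter 30. String-to-integer conversion mainly tests how thoroughly you think and how carefully you handle details, whereas this problem is more about programming technique. Without further ado, here is the code: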

[cpp]
//copyright@cao_peng 2013/4/23
#include <stdlib.h>  // malloc, free

int str_len(char *a) {  // string length
    if (a == 0) {
        return 0;
    }
    char *t = a;
    for (; *t; ++t)
        ;
    return (int) (t - a);
}

void str_copy(char *a, const char *b, int len) {  // copy len chars of b into a, then terminate a
    for (; len > 0; --len, ++b, ++a) {
        *a = *b;
    }
    *a = 0;
}

// Join a and the first lenb chars of b with a space; a is freed and a new buffer is returned
char *str_join(char *a, const char *b, int lenb) {
    char *t;
    if (a == 0) {
        t = (char *) malloc(sizeof(char) * (lenb + 1));
        str_copy(t, b, lenb);
        return t;
    }
    else {
        int lena = str_len(a);
        t = (char *) malloc(sizeof(char) * (lena + lenb + 2));
        str_copy(t, a, lena);
        *(t + lena) = ' ';
        str_copy(t + lena + 1, b, lenb);
        free(a);
        return t;
    }
}

// Return the length of the longest match starting at input; -1 means no match
int canMatch(char *input, char *rule) {
    if (*rule == 0) {  // reached the end of rule
        return 0;
    }
    int r = -1, may;
    if (*rule == '*') {
        r = canMatch(input, rule + 1);  // * matches zero characters
        if (*input) {
            may = canMatch(input + 1, rule);  // * matches one or more characters
            if ((may >= 0) && (++may > r)) {
                r = may;
            }
        }
    }
    if (*input == 0) {  // reached the end of input
        return r;
    }
    if ((*rule == '?') || (*rule == *input)) {
        may = canMatch(input + 1, rule + 1);
        if ((may >= 0) && (++may > r)) {
            r = may;
        }
    }
    return r;
}

char *my_find(char input[], char rule[]) {
    int len = str_len(input);
    // match[i]: longest match length starting at input[i]; -1 if none
    int *match = (int *) malloc(sizeof(int) * len);
    int i, max_pos = -1;
    char *output = 0;
    for (i = 0; i < len; ++i) {
        match[i] = canMatch(input + i, rule);
        if ((max_pos < 0) || (match[i] > match[max_pos])) {
            max_pos = i;
        }
    }
    if ((max_pos < 0) || (match[max_pos] <= 0)) {  // no match: return an empty string
        free(match);
        output = (char *) malloc(sizeof(char));
        *output = 0;  // \0
        return output;
    }
    for (i = 0; i < len;) {
        if (match[i] == match[max_pos]) {  // found a longest match
            output = str_join(output, input + i, match[i]);
            i += match[i];  // skip the matched characters
        }
        else {
            ++i;
        }
    }
    free(match);
    return output;
}
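
The plain recursion above may revisit the same (input position, rule position) pair many times when the rule contains several * characters. The following is a sketch of the same recursion with memoization, under the assumption that library containers are acceptable outside the contest setting; the names longest_match, solve and kUnknown are hypothetical. It returns the length of the longest match that starts at the beginning of input, or -1 if there is none.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

namespace {
const int kUnknown = -2;  // memo slot not computed yet

// Longest match of rule[j..] starting at input[i]; -1 if none.
int solve(const char *input, const char *rule, int i, int j,
          int n, int m, std::vector<std::vector<int> > &memo) {
    if (j == m) return 0;  // rule exhausted: empty match
    if (memo[i][j] != kUnknown) return memo[i][j];
    int best = -1;
    if (rule[j] == '*') {
        best = solve(input, rule, i, j + 1, n, m, memo);  // * matches nothing
        if (i < n) {                                      // * eats input[i]
            int may = solve(input, rule, i + 1, j, n, m, memo);
            if (may >= 0 && may + 1 > best) best = may + 1;
        }
    } else if (i < n && (rule[j] == '?' || rule[j] == input[i])) {
        int may = solve(input, rule, i + 1, j + 1, n, m, memo);
        if (may >= 0) best = may + 1;
    }
    memo[i][j] = best;
    return best;
}
}  // namespace

int longest_match(const char *input, const char *rule) {
    assert(input != nullptr && rule != nullptr);
    int n = static_cast<int>(std::strlen(input));
    int m = static_cast<int>(std::strlen(rule));
    std::vector<std::vector<int> > memo(n + 1, std::vector<int>(m + 1, kUnknown));
    return solve(input, rule, 0, 0, n, m, memo);
}
```

Each of the (n+1) x (m+1) states is computed at most once, so a single call takes O(n*m) time. For example, longest_match("abbdd", "a*d") returns 5.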

2. This problem can also be solved by writing down the DP recurrence directly, as in the following code:

[cpp]
//copyright@chpeih 2013/4/23
char* my_find(char input[], char rule[])
{
    int len1, len2;
    for (len1 = 0; input[len1]; len1++);
    for (len2 = 0; rule[len2]; len2++);

    // dp[i][j]: length of the longest match that ends at input[i-1]
    // and uses up rule[0..j-1]; -1 means there is no such match.
    int **dp = new int *[len1 + 1];
    for (int i = 0; i <= len1; i++)
    {
        dp[i] = new int[len2 + 1];
    }
    dp[0][0] = 0;
    for (int j = 1; j <= len2; j++)  // only a run of leading '*' matches the empty prefix
        dp[0][j] = (rule[j-1] == '*' && dp[0][j-1] == 0) ? 0 : -1;
    for (int i = 1; i <= len1; i++)  // a match may start at any position
        dp[i][0] = 0;

    for (int i = 1; i <= len1; i++)
    {
        for (int j = 1; j <= len2; j++)
        {
            if (rule[j-1] == '*') {
                dp[i][j] = dp[i][j-1];  // '*' matches zero characters
                if (dp[i-1][j-1] != -1 && dp[i][j] < dp[i-1][j-1] + 1)
                {
                    dp[i][j] = dp[i-1][j-1] + 1;  // '*' matches exactly input[i-1]
                }
                if (dp[i-1][j] != -1 && dp[i][j] < dp[i-1][j] + 1)
                {
                    dp[i][j] = dp[i-1][j] + 1;  // '*' absorbs one more character
                }
            } else if (rule[j-1] == '?')
            {
                if (dp[i-1][j-1] != -1) {
                    dp[i][j] = dp[i-1][j-1] + 1;
                } else dp[i][j] = -1;
            }
            else
            {
                if (dp[i-1][j-1] != -1 && input[i-1] == rule[j-1]) {
                    dp[i][j] = dp[i-1][j-1] + 1;
                } else dp[i][j] = -1;
            }
        }
    }

    int m = -1;                 // length of the longest match
    int *ans = new int[len1];   // start offsets of the longest matches
    int count_ans = 0;          // number of candidate answers
    for (int i = 1; i <= len1; i++)
    {
        if (dp[i][len2] > m) {
            m = dp[i][len2];
            count_ans = 0;
            ans[count_ans++] = i - m;
        } else if (dp[i][len2] != -1 && dp[i][len2] == m) {
            ans[count_ans++] = i - m;
        }
    }

    // Non-overlapping matches take at most len1 characters plus the
    // separating spaces, so 2*len1+1 bytes are always enough.
    char *returnans = new char[2 * len1 + 1];
    int count = 0;
    int next = 0;  // first input offset not covered by an emitted match
    for (int j = 0; j < count_ans && m > 0; j++)
    {
        if (ans[j] < next) continue;  // overlaps the previous match: skip it
        if (count > 0) returnans[count++] = ' ';
        for (int i = 0; i < m; i++)
        {
            returnans[count++] = input[i + ans[j]];
        }
        next = ans[j] + m;
    }
    returnans[count] = '\0';

    for (int i = 0; i <= len1; i++)
    {
        delete[] dp[i];
    }
    delete[] dp;
    delete[] ans;
    return returnans;  // the caller releases it with delete[]
}
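
For reference, one way to write the recurrence out explicitly is sketched below. It assumes, as the problem statement requires, that * may match zero characters; the notation is an assumption made for this sketch: dp[i][j] is the length of the longest match ending at the i-th input character and using the first j rule characters, -1 marks "no match", and e(x) extends a match by one character.

```latex
e(x) = \begin{cases} x + 1 & x \neq -1 \\ -1 & x = -1 \end{cases}
\qquad
dp[i][0] = 0,
\qquad
dp[0][j] = \begin{cases} 0 & rule_1, \dots, rule_j \text{ are all } * \\ -1 & \text{otherwise} \end{cases}

dp[i][j] = \begin{cases}
\max\bigl(dp[i][j-1],\; e(dp[i-1][j-1]),\; e(dp[i-1][j])\bigr) & rule_j = * \\
e(dp[i-1][j-1]) & rule_j = {?} \ \text{or}\ rule_j = input_i \\
-1 & \text{otherwise}
\end{cases}
\qquad (i, j \ge 1)
```

The longest match length is then m = max over i of dp[i][len2], and each i attaining it gives a match starting at offset i - m.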

You are welcome to show your code in the comments below or on hero.

References and recommended reading

  1. http://zhedahht.blog.163.com/blog/static/25411174200731139971/;
  2. http://hero.pongo.cn/ (most of the code in this article comes from submissions by participants on hero; you are welcome to take the challenge too);
  3. Full description of the string-to-integer problem: http://hero.pongo.cn/Question/Details?ID=47&ExamID=45;
  4. Full description of the string matching problem: http://hero.pongo.cn/Question/Details?ID=28&ExamID=28;
  5. Overview of the string/integer conversion functions in Linux 3.8.4: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/lib/vsprintf.c?id=refs/tags/v3.9.4;
  6. On likely and unlikely in Linux: http://blog.21ic.com/user1/5593/archives/2010/68193.html;
  7. If you enjoy programming challenges, besides topcoder and hero you should also visit leetcode: http://leetcode.com/onlinejudge;
  8. Implementation of the atoi function: http://blog.csdn.net/njnu_mjn/article/details/9099405;
  9. Implementation of the atoi function: testing the Linux kernel's atoi: http://blog.csdn.net/njnu_mjn/article/details/9104143;
  10. Implementation of atoi in Nut/OS: http://www.ethernut.de/api/atoi_8c_source.html;
  11. A reader's solution report for the "string to integer" problem on hero (tested correct): http://blog.csdn.net/u011070134/article/details/9116831;
