The ANSI C document (C89, http://flash-gordon.me.uk/ansi.c.txt) says:

4.3 CHARACTER HANDLING <ctype.h>

The header <ctype.h> declares several functions useful for testing and mapping characters./89/ In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.

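This says that every character function declared in ctype.h takes its argument as an int, and that the valid values are those representable as unsigned char plus the macro EOF (usually implemented as -1). For any other int value the behavior is undefined.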

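The C standard library specifies 13 character handling functions that must be implemented. They fall into two groups: character testing functions and character case mapping functions.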

Below is the table of C99 character testing functions. It lists 12 testing functions, one more than the original standard: isblank. See http://www.open-std.org/JTC1/SC22/WG14/www/docs/C99RationaleV5.10.pdf, p. 118:

7.4.1.3 The isblank function

A new feature of C99: text processing applications often need to distinguish white space that can occur within lines from white space that separates lines (for example, see §6.10 regarding use of whitespace in the preprocessor). This distinction is also a property of POSIX locale definition files.

ASCII values  characters                                        iscntrl  isblank  isspace  isupper  islower  isalpha  isdigit  isxdigit isalnum  ispunct  isgraph  isprint
0x00 .. 0x08  NUL, (other control codes)                        x        -        -        -        -        -        -        -        -        -        -        -
0x09          tab ('\t')                                        x        x        x        -        -        -        -        -        -        -        -        -
0x0A .. 0x0D  (white-space control codes: '\f','\v','\n','\r')  x        -        x        -        -        -        -        -        -        -        -        -
0x0E .. 0x1F  (other control codes)                             x        -        -        -        -        -        -        -        -        -        -        -
0x20          space (' ')                                       -        x        x        -        -        -        -        -        -        -        -        x
0x21 .. 0x2F  !"#$%&'()*+,-./                                   -        -        -        -        -        -        -        -        -        x        x        x
0x30 .. 0x39  0123456789                                        -        -        -        -        -        -        x        x        x        -        x        x
0x3A .. 0x40  :;<=>?@                                           -        -        -        -        -        -        -        -        -        x        x        x
0x41 .. 0x46  ABCDEF                                            -        -        -        x        -        x        -        x        x        -        x        x
0x47 .. 0x5A  GHIJKLMNOPQRSTUVWXYZ                              -        -        -        x        -        x        -        -        x        -        x        x
0x5B .. 0x60  [\]^_`                                            -        -        -        -        -        -        -        -        -        x        x        x
0x61 .. 0x66  abcdef                                            -        -        -        -        x        x        -        x        x        -        x        x
0x67 .. 0x7A  ghijklmnopqrstuvwxyz                              -        -        -        -        x        x        -        -        x        -        x        x
0x7B .. 0x7E  {|}~                                              -        -        -        -        -        -        -        -        -        x        x        x
0x7F          (DEL)                                             x        -        -        -        -        -        -        -        -        -        -        -

Source of the table above: http://www.cplusplus.com/reference/cctype/

The English text below is the specification of the character testing functions, taken from the C89 standard.

4.3.1 Character testing functions

The functions in this section return nonzero (true) if and only if the value of the argument c conforms to that in the description of the function.

4.3.1.1 The isalnum function

Synopsis

#include <ctype.h>

int isalnum(int c);

Description

The isalnum function tests for any character for which isalpha or isdigit is true.

4.3.1.2 The isalpha function

Synopsis

#include <ctype.h>

int isalpha(int c);

Description

The isalpha function tests for any character for which isupper or islower is true, or any of an implementation-defined set of characters for which none of iscntrl, isdigit, ispunct, or isspace is true. In the C locale, isalpha returns true only for the characters for which isupper or islower is true.

4.3.1.3 The iscntrl function

Synopsis

#include <ctype.h>

int iscntrl(int c);

Description

The iscntrl function tests for any control character.

4.3.1.4 The isdigit function

Synopsis

#include <ctype.h>

int isdigit(int c);

Description

The isdigit function tests for any decimal-digit character (as defined in $2.2.1).

4.3.1.5 The isgraph function

Synopsis

#include <ctype.h>

int isgraph(int c);

Description

The isgraph function tests for any printing character except space (' ').

4.3.1.6 The islower function

Synopsis

#include <ctype.h>

int islower(int c);

Description

The islower function tests for any lower-case letter or any of an implementation-defined set of characters for which none of iscntrl, isdigit, ispunct, or isspace is true. In the C locale, islower returns true only for the characters defined as lower-case letters (as defined in $2.2.1).

4.3.1.7 The isprint function

Synopsis

#include <ctype.h>

int isprint(int c);

Description

The isprint function tests for any printing character including space (' ').

4.3.1.8 The ispunct function

Synopsis

#include <ctype.h>

int ispunct(int c);

Description

The ispunct function tests for any printing character except space (' ') or a character for which isalnum is true.

4.3.1.9 The isspace function

Synopsis

#include <ctype.h>

int isspace(int c);

Description

The isspace function tests for the standard white-space characters or for any of an implementation-defined set of characters for which isalnum is false. The standard white-space characters are the following: space (' '), form feed ('\f'), new-line ('\n'), carriage return ('\r'), horizontal tab ('\t'), and vertical tab ('\v'). In the C locale, isspace returns true only for the standard white-space characters.

4.3.1.10 The isupper function

Synopsis

#include <ctype.h>

int isupper(int c);

Description

The isupper function tests for any upper-case letter or any of an implementation-defined set of characters for which none of iscntrl, isdigit, ispunct, or isspace is true. In the C locale, isupper returns true only for the characters defined as upper-case letters (as defined in $2.2.1).

4.3.1.11 The isxdigit function

Synopsis

#include <ctype.h>

int isxdigit(int c);

Description

The isxdigit function tests for any hexadecimal-digit character (as defined in $3.1.3.2).

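I had assumed that isalpha() would decide whether a character is an English letter with code along the lines of if (c >= 'A' && c <= 'Z'), but after reading the source I found that it is a table lookup. The CRT keeps a static array describing every control and printable character; when a thread's data is initialized, a pointer to that array is stored in the thread-local data, and each of the functions above consults the table to get its answer. Below I first show the table (an array) and examine how it is designed, then list some of the Windows API mechanisms and system knowledge I picked up while tracing the code.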

VC/crt/src/ctype.c contains the following definition:

const unsigned short __newctype[384] = {
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0,
        0,                      /* -1 EOF   */
        _CONTROL,               /* 00 (NUL) */
        _CONTROL,               /* 01 (SOH) */
        _CONTROL,               /* 02 (STX) */
        _CONTROL,               /* 03 (ETX) */
        _CONTROL,               /* 04 (EOT) */
        _CONTROL,               /* 05 (ENQ) */
        _CONTROL,               /* 06 (ACK) */
        _CONTROL,               /* 07 (BEL) */
        _CONTROL,               /* 08 (BS)  */
        _SPACE+_CONTROL,        /* 09 (HT)  */
        _SPACE+_CONTROL,        /* 0A (LF)  */
        _SPACE+_CONTROL,        /* 0B (VT)  */
        _SPACE+_CONTROL,        /* 0C (FF)  */
        _SPACE+_CONTROL,        /* 0D (CR)  */
        _CONTROL,               /* 0E (SI)  */
        _CONTROL,               /* 0F (SO)  */
        _CONTROL,               /* 10 (DLE) */
        _CONTROL,               /* 11 (DC1) */
        _CONTROL,               /* 12 (DC2) */
        _CONTROL,               /* 13 (DC3) */
        _CONTROL,               /* 14 (DC4) */
        _CONTROL,               /* 15 (NAK) */
        _CONTROL,               /* 16 (SYN) */
        _CONTROL,               /* 17 (ETB) */
        _CONTROL,               /* 18 (CAN) */
        _CONTROL,               /* 19 (EM)  */
        _CONTROL,               /* 1A (SUB) */
        _CONTROL,               /* 1B (ESC) */
        _CONTROL,               /* 1C (FS)  */
        _CONTROL,               /* 1D (GS)  */
        _CONTROL,               /* 1E (RS)  */
        _CONTROL,               /* 1F (US)  */
        _SPACE+_BLANK,          /* 20 SPACE */
        _PUNCT,                 /* 21 !     */
        _PUNCT,                 /* 22 "     */
        _PUNCT,                 /* 23 #     */
        _PUNCT,                 /* 24 $     */
        _PUNCT,                 /* 25 %     */
        _PUNCT,                 /* 26 &     */
        _PUNCT,                 /* 27 '     */
        _PUNCT,                 /* 28 (     */
        _PUNCT,                 /* 29 )     */
        _PUNCT,                 /* 2A *     */
        _PUNCT,                 /* 2B +     */
        _PUNCT,                 /* 2C ,     */
        _PUNCT,                 /* 2D -     */
        _PUNCT,                 /* 2E .     */
        _PUNCT,                 /* 2F /     */
        _DIGIT+_HEX,            /* 30 0     */
        _DIGIT+_HEX,            /* 31 1     */
        _DIGIT+_HEX,            /* 32 2     */
        _DIGIT+_HEX,            /* 33 3     */
        _DIGIT+_HEX,            /* 34 4     */
        _DIGIT+_HEX,            /* 35 5     */
        _DIGIT+_HEX,            /* 36 6     */
        _DIGIT+_HEX,            /* 37 7     */
        _DIGIT+_HEX,            /* 38 8     */
        _DIGIT+_HEX,            /* 39 9     */
        _PUNCT,                 /* 3A :     */
        _PUNCT,                 /* 3B ;     */
        _PUNCT,                 /* 3C <     */
        _PUNCT,                 /* 3D =     */
        _PUNCT,                 /* 3E >     */
        _PUNCT,                 /* 3F ?     */
        _PUNCT,                 /* 40 @     */
        _UPPER+_HEX,            /* 41 A     */
        _UPPER+_HEX,            /* 42 B     */
        _UPPER+_HEX,            /* 43 C     */
        _UPPER+_HEX,            /* 44 D     */
        _UPPER+_HEX,            /* 45 E     */
        _UPPER+_HEX,            /* 46 F     */
        _UPPER,                 /* 47 G     */
        _UPPER,                 /* 48 H     */
        _UPPER,                 /* 49 I     */
        _UPPER,                 /* 4A J     */
        _UPPER,                 /* 4B K     */
        _UPPER,                 /* 4C L     */
        _UPPER,                 /* 4D M     */
        _UPPER,                 /* 4E N     */
        _UPPER,                 /* 4F O     */
        _UPPER,                 /* 50 P     */
        _UPPER,                 /* 51 Q     */
        _UPPER,                 /* 52 R     */
        _UPPER,                 /* 53 S     */
        _UPPER,                 /* 54 T     */
        _UPPER,                 /* 55 U     */
        _UPPER,                 /* 56 V     */
        _UPPER,                 /* 57 W     */
        _UPPER,                 /* 58 X     */
        _UPPER,                 /* 59 Y     */
        _UPPER,                 /* 5A Z     */
        _PUNCT,                 /* 5B [     */
        _PUNCT,                 /* 5C \     */
        _PUNCT,                 /* 5D ]     */
        _PUNCT,                 /* 5E ^     */
        _PUNCT,                 /* 5F _     */
        _PUNCT,                 /* 60 `     */
        _LOWER+_HEX,            /* 61 a     */
        _LOWER+_HEX,            /* 62 b     */
        _LOWER+_HEX,            /* 63 c     */
        _LOWER+_HEX,            /* 64 d     */
        _LOWER+_HEX,            /* 65 e     */
        _LOWER+_HEX,            /* 66 f     */
        _LOWER,                 /* 67 g     */
        _LOWER,                 /* 68 h     */
        _LOWER,                 /* 69 i     */
        _LOWER,                 /* 6A j     */
        _LOWER,                 /* 6B k     */
        _LOWER,                 /* 6C l     */
        _LOWER,                 /* 6D m     */
        _LOWER,                 /* 6E n     */
        _LOWER,                 /* 6F o     */
        _LOWER,                 /* 70 p     */
        _LOWER,                 /* 71 q     */
        _LOWER,                 /* 72 r     */
        _LOWER,                 /* 73 s     */
        _LOWER,                 /* 74 t     */
        _LOWER,                 /* 75 u     */
        _LOWER,                 /* 76 v     */
        _LOWER,                 /* 77 w     */
        _LOWER,                 /* 78 x     */
        _LOWER,                 /* 79 y     */
        _LOWER,                 /* 7A z     */
        _PUNCT,                 /* 7B {     */
        _PUNCT,                 /* 7C |     */
        _PUNCT,                 /* 7D }     */
        _PUNCT,                 /* 7E ~     */
        _CONTROL,               /* 7F (DEL) */
        /* and the rest are 0... */
};

For reference, here are the macros (in VC/include/ctype.h):

/* set bit masks for the possible character types */

#define _UPPER          0x1     /* upper case letter */
#define _LOWER          0x2     /* lower case letter */
#define _DIGIT          0x4     /* digit[0-9] */
#define _SPACE          0x8     /* tab, carriage return, newline, */
                                /* vertical tab or form feed */
#define _PUNCT          0x10    /* punctuation character */
#define _CONTROL        0x20    /* control character */
#define _BLANK          0x40    /* space char */
#define _HEX            0x80    /* hexadecimal digit */

#define _LEADBYTE       0x8000                  /* multibyte leadbyte */
#define _ALPHA          (0x0100|_UPPER|_LOWER)  /* alphabetic character */

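So how is it used? The first 128 elements of __newctype[] are all 0, and the elements after them are annotated in the comments with the corresponding ASCII code, which means __newctype+128 is the element for the character '\0'. The VC implementation really does work through the pointer __newctype+128 (indexing that pointer with -1 is valid too, and yields the entry for EOF). The _UPPER, _LOWER and other macros defined above clearly give each property its own bit, so to decide whether a character c is an English letter we only need to evaluate __newctype[128+(int)c] & (_UPPER|_LOWER). The library itself tests against _ALPHA, which adds the 0x0100 bit. _HEX (0x80, i.e. 128) must stay out of the mask, otherwise the digits, which carry _DIGIT+_HEX, would pass as letters. The whole test is a single bitwise AND. That is how the table lookup works. By the same token, the remaining test functions need no separate tracing, haha!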

So where does this data live, and how did I trace my way to it? I learned a few new things along the way, which I lay out step by step below.

Write the simplest possible piece of code:

#include <ctype.h>

int main(void)
{
    int i = 64;
    int kk = isalpha(i);
    return 0;
}

This is the call stack I traced:

     msvcr90d.dll!__set_flsgetvalue()  Line 256  C
     msvcr90d.dll!_getptd_noexit()  Line 578 + 0xb bytes  C
     msvcr90d.dll!_getptd()  Line 641 + 0x5 bytes  C
     msvcr90d.dll!_LocaleUpdate::_LocaleUpdate(localeinfo_struct * plocinfo=0x00000000)  Line 264 + 0x5 bytes  C++
>    msvcr90d.dll!_chvalidator_l(localeinfo_struct * plocinfo=0x00000000, int c=0x00000040, int mask=0x00000103)  Line 68  C++
     msvcr90d.dll!_chvalidator(int c=0x00000040, int mask=0x00000103)  Line 57 + 0xf bytes  C++
     msvcr90d.dll!isalpha(int c=0x00000040)  Line 69 + 0xe bytes  C++
>    ConsoleApp.exe!main()  Line 11 + 0xc bytes  C

First level (_ALPHA is the macro from the table definitions above: #define _ALPHA (0x0100|_UPPER|_LOWER) /* alphabetic character */):

extern __inline int (__cdecl isalpha) (int c)
{
    if (__locale_changed == 0)
    {
        return __fast_ch_check(c, _ALPHA);
    }
    else
    {
        return (_isalpha_l)(c, NULL);
    }
}

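We go into __fast_ch_check. (I could not find a definition that explains what __locale_changed means, but I read the implementation of _isalpha_l and confirmed that it ends up in the same place as __fast_ch_check, so it can be ignored for this analysis.) __fast_ch_check is a macro: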

#ifdef _DEBUG
#define __fast_ch_check(a,b)       _chvalidator(a,b)
#else  /* _DEBUG */
#define __fast_ch_check(a,b)       (__initiallocinfo.pctype[(a)] & (b))
#endif  /* _DEBUG */

While debugging we are using the _DEBUG version:

#if defined (_DEBUG)
extern "C" int __cdecl _chvalidator(int c, int mask)
{
    _ASSERTE((unsigned)(c + 1) <= 256);
    return _chvalidator_l(NULL, c, mask);
}

extern "C" int __cdecl _chvalidator_l(_locale_t plocinfo, int c, int mask)
{
    _LocaleUpdate _loc_update(plocinfo);

    _ASSERTE((unsigned)(c + 1) <= 256);
    if (c >= -1 && c <= 255)
    {
        return (_loc_update.GetLocaleT()->locinfo->pctype[c] & mask);
    }
    else
    {
        return (_loc_update.GetLocaleT()->locinfo->pctype[-1] & mask);
    }
}
#endif  /* defined (_DEBUG) */

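At this point mask is 0x103 (_ALPHA) and c is our input 0x40 (64). Note that this code is C++: it calls the member function GetLocaleT(). If the input is in the range -1 to 255, the return value is pctype[c] ANDed with mask; otherwise the EOF entry, pctype[-1], is used instead. For c = 0x40 ('@') the table entry is _PUNCT (0x10), so the result is 0x10 & 0x103 = 0 and isalpha reports that it is not a letter.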

The questions are: what does the _LocaleUpdate class do, and what does the pctype array contain?

#ifdef __cplusplus
class _LocaleUpdate
{
    _locale_tstruct localeinfo;
    _ptiddata ptd;
    bool updated;
public:
    _LocaleUpdate(_locale_t plocinfo)
        : updated(false)
    {
        if (plocinfo == NULL)
        {
            ptd = _getptd();
            localeinfo.locinfo = ptd->ptlocinfo;
            localeinfo.mbcinfo = ptd->ptmbcinfo;

            __UPDATE_LOCALE(ptd, localeinfo.locinfo);
            __UPDATE_MBCP(ptd, localeinfo.mbcinfo);
            if (!(ptd->_ownlocale & _PER_THREAD_LOCALE_BIT))
            {
                ptd->_ownlocale |= _PER_THREAD_LOCALE_BIT;
                updated = true;
            }
        }
        else
        {
            localeinfo=*plocinfo;
        }
    }
    ~_LocaleUpdate()
    {
        if (updated)
            ptd->_ownlocale = ptd->_ownlocale & ~_PER_THREAD_LOCALE_BIT;
    }
    _locale_t GetLocaleT()
    {
        return &localeinfo;
    }
};
#endif  /* __cplusplus */

It is clear that GetLocaleT returns &localeinfo, and that the pctype member of the locinfo structure inside localeinfo is the data we are looking for. That leads naturally to _getptd() (in tidtable.c):

_ptiddata __cdecl _getptd (void)
{
    _ptiddata ptd = _getptd_noexit();
    if (!ptd) {
        _amsg_exit(_RT_THREAD); /* write message and die */
    }
    return ptd;
}

_ptiddata __cdecl _getptd_noexit (void)
{
    _ptiddata ptd;
    DWORD   TL_LastError;

    TL_LastError = GetLastError();

#ifdef _M_IX86
    /*
     * Initialize FlsGetValue function pointer in TLS by calling __set_flsgetvalue()
     */
    if ( (ptd = (__set_flsgetvalue())(__flsindex)) == NULL ) {
#else  /* _M_IX86 */
    if ( (ptd = FLS_GETVALUE(__flsindex)) == NULL ) {
#endif  /* _M_IX86 */
        /*
         * no per-thread data structure for this thread. try to create
         * one.
         */
#ifdef _DEBUG
        extern void * __cdecl _calloc_dbg_impl(size_t, size_t, int, const char *, int, int *);
        if ((ptd = _calloc_dbg_impl(1, sizeof(struct _tiddata), _CRT_BLOCK, __FILE__, __LINE__, NULL)) != NULL) {
#else  /* _DEBUG */
        if ((ptd = _calloc_crt(1, sizeof(struct _tiddata))) != NULL) {
#endif  /* _DEBUG */
            if (FLS_SETVALUE(__flsindex, (LPVOID)ptd) ) {
                /*
                 * Initialize of per-thread data
                 */
                _initptd(ptd,NULL);
                ptd->_tid = GetCurrentThreadId();
                ptd->_thandle = (uintptr_t)(-1);
            }
            else {
                /*
                 * Return NULL to indicate failure
                 */
                _free_crt(ptd);
                ptd = NULL;
            }
        }
    }

    SetLastError(TL_LastError);
    return(ptd);
}

The only function we need to look at is __set_flsgetvalue:

_CRTIMP PFLS_GETVALUE_FUNCTION __cdecl __set_flsgetvalue()
{
#ifdef _M_IX86
    PFLS_GETVALUE_FUNCTION flsGetValue = FLS_GETVALUE;
    if (!flsGetValue)
    {
        flsGetValue = _decode_pointer(gpFlsGetValue);
        TlsSetValue(__getvalueindex, flsGetValue);
    }
    return flsGetValue;
#else  /* _M_IX86 */
    return NULL;
#endif  /* _M_IX86 */
}

It relies on this macro definition:

#define FLS_GETVALUE    ((PFLS_GETVALUE_FUNCTION)TlsGetValue(__getvalueindex))

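At this point we have reached the Windows API level: TlsGetValue, and I cannot step any further. __getvalueindex is a global variable whose value was 1 in my run. __set_flsgetvalue returns a function pointer, so (__set_flsgetvalue())(__flsindex) amounts to FlsGetValue(__flsindex), or to TlsGetValue(__flsindex) when the Fls functions cannot be found in kernel32.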

So what exactly does TlsGetValue do in order to return a structure that carries pctype? Ask MSDN, of course.

http://msdn.microsoft.com/en-us/library/windows/desktop/ms686812(v=vs.85).aspx

http://msdn.microsoft.com/en-us/library/windows/desktop/ms686749(v=vs.85).aspx

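Well, this gets into how Windows threads manage their data. Put simply, each thread needs a data space of its own: it keeps an array of pointers to its data, and a TLS index selects a slot in that array. Coming back to our case, the data this thread uses is fetched with TlsGetValue(tlsindex), but where was it set? According to MSDN, it must have been set with TlsSetValue.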

Also in tidtable.c there is the following initialization function (I found it by experience and then confirmed the guess with breakpoints):

int __cdecl _mtinit (void)
{
    _ptiddata ptd;

#ifdef _M_IX86
    /*
     * Initialize fiber local storage function pointers.
     */
    HINSTANCE hKernel32 = _crt_wait_module_handle(_KERNEL32);
    if (hKernel32 == NULL) {
        _mtterm();
        return FALSE;       /* fail to load DLL */
    }

    gpFlsAlloc = (PFLS_ALLOC_FUNCTION)GetProcAddress(hKernel32,"FlsAlloc");
    gpFlsGetValue = (PFLS_GETVALUE_FUNCTION)GetProcAddress(hKernel32,"FlsGetValue");
    gpFlsSetValue = (PFLS_SETVALUE_FUNCTION)GetProcAddress(hKernel32,"FlsSetValue");
    gpFlsFree = (PFLS_FREE_FUNCTION)GetProcAddress(hKernel32,"FlsFree");
    if (!gpFlsAlloc || !gpFlsGetValue || !gpFlsSetValue || !gpFlsFree) {
        gpFlsAlloc = (PFLS_ALLOC_FUNCTION)__crtTlsAlloc;
        gpFlsGetValue = (PFLS_GETVALUE_FUNCTION)TlsGetValue;
        gpFlsSetValue = (PFLS_SETVALUE_FUNCTION)TlsSetValue;
        gpFlsFree = (PFLS_FREE_FUNCTION)TlsFree;
    }

    /*
     * Allocate and initialize a TLS index to store FlsGetValue pointer
     * so that the FLS_* macros can work transparently
     */
    if ( (__getvalueindex = TlsAlloc()) == TLS_OUT_OF_INDEXES ||
         !TlsSetValue(__getvalueindex, (LPVOID)gpFlsGetValue) ) {
        return FALSE;
    }
#endif  /* _M_IX86 */

    _init_pointers();       /* initialize global function pointers */

#ifdef _M_IX86
    /*
     * Encode the fiber local storage function pointers
     */
    gpFlsAlloc = (PFLS_ALLOC_FUNCTION) _encode_pointer(gpFlsAlloc);
    gpFlsGetValue = (PFLS_GETVALUE_FUNCTION) _encode_pointer(gpFlsGetValue);
    gpFlsSetValue = (PFLS_SETVALUE_FUNCTION) _encode_pointer(gpFlsSetValue);
    gpFlsFree = (PFLS_FREE_FUNCTION) _encode_pointer(gpFlsFree);
#endif  /* _M_IX86 */

    /*
     * Initialize the mthread lock data base
     */
    if ( !_mtinitlocks() ) {
        _mtterm();
        return FALSE;       /* fail to load DLL */
    }

    /*
     * Allocate a TLS index to maintain pointers to per-thread data
     */
    if ( (__flsindex = FLS_ALLOC(&_freefls)) == FLS_OUT_OF_INDEXES ) {
        _mtterm();
        return FALSE;       /* fail to load DLL */
    }

    /*
     * Create a per-thread data structure for this (i.e., the startup)
     * thread.
     */
    if ( ((ptd = _calloc_crt(1, sizeof(struct _tiddata))) == NULL) ||
         !FLS_SETVALUE(__flsindex, (LPVOID)ptd) )
    {
        _mtterm();
        return FALSE;       /* fail to load DLL */
    }

    /*
     * Initialize the per-thread data
     */
    _initptd(ptd,NULL);
    ptd->_tid = GetCurrentThreadId();
    ptd->_thandle = (uintptr_t)(-1);

    return TRUE;
}

The key part is _initptd():

_CRTIMP void __cdecl _initptd (_ptiddata ptd, pthreadlocinfo ptloci)
{
#ifdef _M_IX86
    HINSTANCE hKernel32 = _crt_wait_module_handle(_KERNEL32);
#endif  /* _M_IX86 */

    ptd->_pxcptacttab = (void *)_XcptActTab;
    ptd->_holdrand = 1L;

#ifdef _M_IX86
    if (hKernel32 != NULL)
    {
        // Initialize the function pointers in the ptd data
        ptd->_encode_ptr = GetProcAddress(hKernel32, _ENCODE_POINTER);
        ptd->_decode_ptr = GetProcAddress(hKernel32, _DECODE_POINTER);
    }
#endif  /* _M_IX86 */

    // It is necessary to always have GLOBAL_LOCALE_BIT set in perthread data
    // because when doing bitwise or, we won't get __UPDATE_LOCALE to work when
    // global per thread locale is set.
    ptd->_ownlocale = _GLOBAL_LOCALE_BIT;

    // Initialize _setloc_data. These are the only valuse that need to be
    // initialized.
    ptd->_setloc_data._cachein[0]='C';
    ptd->_setloc_data._cacheout[0]='C';
    ptd->ptmbcinfo = &__initialmbcinfo;

    _mlock(_MB_CP_LOCK);
    __try
    {
        InterlockedIncrement(&(ptd->ptmbcinfo->refcount));
    }
    __finally
    {
        _munlock(_MB_CP_LOCK);
    }

    // We need to make sure that ptd->ptlocinfo in never NULL, this saves us
    // perf counts when UPDATING locale.
    _mlock(_SETLOCALE_LOCK);
    __try {
        ptd->ptlocinfo = ptloci;
        /*
         * Note that and caller to _initptd could have passed __ptlocinfo, but
         * that will be a bug as between the call to _initptd and __addlocaleref
         * the global locale may have changed and ptloci may be pointing to invalid
         * memory. Thus if the wants to set the locale to global, NULL should
         * be passed.
         */
        if (ptd->ptlocinfo == NULL)
            ptd->ptlocinfo = __ptlocinfo;
        __addlocaleref(ptd->ptlocinfo);
    }
    __finally {
        _munlock(_SETLOCALE_LOCK);
    }
}

From the calls above we know that ptloci is passed in as 0 (NULL), so ptd->ptlocinfo = __ptlocinfo. Its definition is:

pthreadlocinfo __ptlocinfo = &__initiallocinfo;

And then:

typedef struct threadlocaleinfostruct * pthreadlocinfo;

typedef struct threadlocaleinfostruct {
        int refcount;
        unsigned int lc_codepage;
        unsigned int lc_collate_cp;
        unsigned long lc_handle[6]; /* LCID */
        LC_ID lc_id[6];
        struct {
            char *locale;
            wchar_t *wlocale;
            int *refcount;
            int *wrefcount;
        } lc_category[6];
        int lc_clike;
        int mb_cur_max;
        int * lconv_intl_refcount;
        int * lconv_num_refcount;
        int * lconv_mon_refcount;
        struct lconv * lconv;
        int * ctype1_refcount;
        unsigned short * ctype1;
        const unsigned short * pctype;
        const unsigned char * pclmap;
        const unsigned char * pcumap;
        struct __lc_time_data * lc_time_curr;
} threadlocinfo;

There is pctype. It is the same place I reached earlier by going straight to the definition; here I simply worked backwards step by step. As for the data, it can be read off directly:

threadlocinfo __initiallocinfo = {
    1,                                        /* refcount                 */
    _CLOCALECP,                               /* lc_codepage              */
    _CLOCALECP,                               /* lc_collate_cp            */
    {   _CLOCALEHANDLE,                       /* lc_handle[_ALL]          */
        _CLOCALEHANDLE,                       /* lc_handle[_COLLATE]      */
        _CLOCALEHANDLE,                       /* lc_handle[_CTYPE]        */
        _CLOCALEHANDLE,                       /* lc_handle[_MONETARY]     */
        _CLOCALEHANDLE,                       /* lc_handle[_NUMERIC]      */
        _CLOCALEHANDLE                        /* lc_handle[_TIME]         */
    },
    {   {0, 0, 0},                            /* lc_id[LC_ALL]            */
        {0, 0, 0},                            /* lc_id[LC_COLLATE]        */
        {0, 0, 0},                            /* lc_id[LC_CTYPE]          */
        {0, 0, 0},                            /* lc_id[LC_MONETARY]       */
        {0, 0, 0},                            /* lc_id[LC_NUMERIC]        */
        {0, 0, 0}                             /* lc_id[LC_TIME]           */
    },
    {   {NULL, NULL, NULL, NULL},             /* lc_category[LC_ALL]      */
        {__clocalestr, NULL, NULL, NULL},     /* lc_category[LC_COLLATE]  */
        {__clocalestr, NULL, NULL, NULL},     /* lc_category[LC_CTYPE]    */
        {__clocalestr, NULL, NULL, NULL},     /* lc_category[LC_MONETARY] */
        {__clocalestr, NULL, NULL, NULL},     /* lc_category[LC_NUMERIC]  */
        {__clocalestr, NULL, NULL, NULL}      /* lc_category[LC_TIME]     */
    },
    1,                                        /* lc_clike                 */
    1,                                        /* mb_cur_max               */
    NULL,                                     /* lconv_intl_refcount      */
    NULL,                                     /* lconv_num_refcount       */
    NULL,                                     /* lconv_mon_refcount       */
    &__lconv_c,                               /* lconv                    */
    NULL,                                     /* ctype1_refcount          */
    NULL,                                     /* ctype1                   */
    __newctype + 128,                         /* pctype                   */
    __newclmap + 128,                         /* pclmap                   */
    __newcumap + 128,                         /* pcumap                   */
    &__lc_time_c,                             /* lc_time_curr             */
};

Count through the structure members: pctype lines up exactly with __newctype + 128. Now everything is clear!

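Tracing the code is a way of dissecting the implementation, and following it all the way down to the Windows API helps in understanding the Windows interfaces and how they work.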

The English text below is the specification of the character case mapping functions, taken from the C89 standard.

4.3.2 Character case mapping functions

4.3.2.1 The tolower function

Synopsis

#include <ctype.h>

int tolower(int c);

Description

The tolower function converts an upper-case letter to the corresponding lower-case letter.

Returns

If the argument is an upper-case letter, the tolower function returns the corresponding lower-case letter if there is one; otherwise the argument is returned unchanged. In the C locale, tolower maps only the characters for which isupper is true to the corresponding characters for which islower is true.

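Microsoft's implementation, shown here in the debugger's mixed view (the C source lines are interleaved with the assembly generated for them):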

extern "C" int __cdecl tolower (int c)
{
70195760  mov         edi,edi
70195762  push        ebp
70195763  mov         ebp,esp
70195765  push        ecx
    if (__locale_changed == 0)
70195766  cmp         dword ptr [___locale_changed (702362C8h)],0
7019576D  jne         tolower+33h (70195793h)
    {
        return __ascii_towlower(c);
7019576F  cmp         dword ptr [c],41h
70195773  jl          tolower+26h (70195786h)
70195775  cmp         dword ptr [c],5Ah
70195779  jg          tolower+26h (70195786h)
7019577B  mov         eax,dword ptr [c]
7019577E  add         eax,20h
70195781  mov         dword ptr [ebp-4],eax
70195784  jmp         tolower+2Ch (7019578Ch)
70195786  mov         ecx,dword ptr [c]
70195789  mov         dword ptr [ebp-4],ecx
7019578C  mov         eax,dword ptr [ebp-4]
7019578F  jmp         tolower+41h (701957A1h)
    }
    else
70195791  jmp         tolower+41h (701957A1h)
    {
        return _tolower_l(c, NULL);
70195793  push        0
70195795  mov         edx,dword ptr [c]
70195798  push        edx
70195799  call        _tolower_l (70195580h)
7019579E  add         esp,8
    }
}
701957A1  mov         esp,ebp
701957A3  pop         ebp
701957A4  ret

4.3.2.2 The toupper function

Synopsis

#include <ctype.h>

int toupper(int c);

Description

The toupper function converts a lower-case letter to the corresponding upper-case letter.

Returns

If the argument is a lower-case letter, the toupper function returns the corresponding upper-case letter if there is one; otherwise the argument is returned unchanged. In the C locale, toupper maps only the characters for which islower is true to the corresponding characters for which isupper is true.
