Reading the C Standard Library Source (VC9.0): ctype.h
The ANSI C document (C89, http://flash-gordon.me.uk/ansi.c.txt) says:
4.3 CHARACTER HANDLING <ctype.h>
The header <ctype.h> declares several functions useful for testing and mapping characters./89/ In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.
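This says that every function declared in ctype.h takes its argument as an int, and that the valid values are those representable as an unsigned char plus the macro EOF (commonly implemented as -1). For any other int value the behavior is undefined.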
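In practice this constraint matters when the character comes from a plain char, which may be signed. A minimal sketch of the usual precaution follows; count_alpha is a hypothetical helper name, not something from the CRT.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts alphabetic characters in a NUL-terminated string. */
size_t count_alpha(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        /* Cast first: where char is signed, a byte such as 0xE9 would
           otherwise reach isalpha() as a negative int other than EOF. */
        if (isalpha((unsigned char)*s))
            ++n;
    }
    return n;
}
```

Calling count_alpha("a1b2c3") in the default C locale counts the three letters and skips the digits.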
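The C standard library specifies 13 character-handling functions, which fall into two groups: character testing functions and character case-mapping functions.
Below is a table of the C99 character testing functions. It lists 12 of them, one more than the earlier standard: isblank. See the C99 Rationale, http://www.open-std.org/JTC1/SC22/WG14/www/docs/C99RationaleV5.10.pdf, p. 118.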
7.4.1.3 The isblank function
A new feature of C99: text processing applications often need to distinguish white space that can occur within lines from white space that separates lines (for example, see §6.10 regarding use of whitespace in the preprocessor). This distinction is also a property of POSIX locale definition files.
ASCII values | characters | iscntrl | isblank | isspace | isupper | islower | isalpha | isdigit | isxdigit | isalnum | ispunct | isgraph | isprint |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0x00 .. 0x08 | NUL, (other control codes) | x | | | | | | | | | | | |
0x09 | tab ('\t') | x | x | x | | | | | | | | | |
0x0A .. 0x0D | (white-space control codes: '\n','\v','\f','\r') | x | | x | | | | | | | | | |
0x0E .. 0x1F | (other control codes) | x | | | | | | | | | | | |
0x20 | space (' ') | | x | x | | | | | | | | | x |
0x21 .. 0x2F | !"#$%&'()*+,-./ | | | | | | | | | | x | x | x |
0x30 .. 0x39 | 0123456789 | | | | | | | x | x | x | | x | x |
0x3A .. 0x40 | :;<=>?@ | | | | | | | | | | x | x | x |
0x41 .. 0x46 | ABCDEF | | | | x | | x | | x | x | | x | x |
0x47 .. 0x5A | GHIJKLMNOPQRSTUVWXYZ | | | | x | | x | | | x | | x | x |
0x5B .. 0x60 | [\]^_` | | | | | | | | | | x | x | x |
0x61 .. 0x66 | abcdef | | | | | x | x | | x | x | | x | x |
0x67 .. 0x7A | ghijklmnopqrstuvwxyz | | | | | x | x | | | x | | x | x |
0x7B .. 0x7E | {\|}~ | | | | | | | | | | x | x | x |
0x7F | (DEL) | x | | | | | | | | | | | |
Table source: http://www.cplusplus.com/reference/cctype/
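The isblank and isspace columns differ only in the rows for '\n', '\v', '\f' and '\r'. A small sketch of how that difference can be used, assuming a C99 library that provides isblank; count_line_separators is a hypothetical helper name.

```c
#include <ctype.h>

/* Hypothetical helper: counts characters that are white space but not blank,
   i.e. the ones that separate lines rather than words within a line. */
int count_line_separators(const char *s)
{
    int n = 0;
    for (; *s != '\0'; ++s) {
        unsigned char c = (unsigned char)*s;
        if (isspace(c) && !isblank(c))
            ++n;
    }
    return n;
}
```

In the C locale, the space and tab in "a b\tc\n" are skipped and only the trailing new-line is counted.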
The English text below gives the requirements for the character testing functions, quoted from the C89 standard.
4.3.1 Character testing functions
The functions in this section return nonzero (true) if and only if the value of the argument c conforms to that in the description of the function.
4.3.1.1 The isalnum function
Synopsis
#include <ctype.h>
int isalnum(int c);
Description
The isalnum function tests for any character for which isalpha or isdigit is true.
4.3.1.2 The isalpha function
Synopsis
#include <ctype.h>
int isalpha(int c);
Description
The isalpha function tests for any character for which isupper or islower is true, or any of an implementation-defined set of characters for which none of iscntrl, isdigit, ispunct, or isspace is true. In the C locale, isalpha returns true only for the characters for which isupper or islower is true.
4.3.1.3 The iscntrl function
Synopsis
#include <ctype.h>
int iscntrl(int c);
Description
The iscntrl function tests for any control character.
4.3.1.4 The isdigit function
Synopsis
#include <ctype.h>
int isdigit(int c);
Description
The isdigit function tests for any decimal-digit character (as defined in §2.2.1).
4.3.1.5 The isgraph function
Synopsis
#include <ctype.h>
int isgraph(int c);
Description
The isgraph function tests for any printing character except space (' ').
4.3.1.6 The islower function
Synopsis
#include <ctype.h>
int islower(int c);
Description
The islower function tests for any lower-case letter or any of an implementation-defined set of characters for which none of iscntrl, isdigit, ispunct, or isspace is true. In the C locale, islower returns true only for the characters defined as lower-case letters (as defined in §2.2.1).
4.3.1.7 The isprint function
Synopsis
#include <ctype.h>
int isprint(int c);
Description
The isprint function tests for any printing character including space (' ').
4.3.1.8 The ispunct function
Synopsis
#include <ctype.h>
int ispunct(int c);
Description
The ispunct function tests for any printing character except space (' ') or a character for which isalnum is true.
4.3.1.9 The isspace function
Synopsis
#include <ctype.h>
int isspace(int c);
Description
The isspace function tests for the standard white-space characters or for any of an implementation-defined set of characters for which isalnum is false. The standard white-space characters are the following: space (' '), form feed ('\f'), new-line ('\n'), carriage return ('\r'), horizontal tab ('\t'), and vertical tab ('\v'). In the C locale, isspace returns true only for the standard white-space characters.
4.3.1.10 The isupper function
Synopsis
#include <ctype.h>
int isupper(int c);
Description
The isupper function tests for any upper-case letter or any of an implementation-defined set of characters for which none of iscntrl, isdigit, ispunct, or isspace is true. In the C locale, isupper returns true only for the characters defined as upper-case letters (as defined in §2.2.1).
4.3.1.11 The isxdigit function
Synopsis
#include <ctype.h>
int isxdigit(int c);
Description
The isxdigit function tests for any hexadecimal-digit character (as defined in §3.1.3.2).
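Several of these descriptions define one function in terms of others, and those relations can be checked mechanically. The sketch below does so for the 7-bit range in whatever locale is current (the C locale by default); ctype_relations_hold is a hypothetical name, and the ispunct relation is my reading of the quoted text (a printing character that is neither space nor alphanumeric).

```c
#include <ctype.h>

/* Returns 1 if, for every c in 0..127, the library's classification agrees
   with the relations stated in the C89 text; returns 0 otherwise. */
int ctype_relations_hold(void)
{
    int c;
    for (c = 0; c < 128; ++c) {
        int alnum = isalnum(c) != 0;
        int graph = isgraph(c) != 0;
        int punct = ispunct(c) != 0;

        /* isalnum: isalpha or isdigit is true */
        if (alnum != (isalpha(c) || isdigit(c)))
            return 0;
        /* isgraph: any printing character except space */
        if (graph != (isprint(c) && c != ' '))
            return 0;
        /* ispunct: printing, not space, not alphanumeric */
        if (punct != (graph && !alnum))
            return 0;
    }
    return 1;
}
```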
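I originally assumed that isalpha() decided whether a character is a letter with code along the lines of if (c >= 'A' && c <= 'Z'), but after reading the source I found that it is a table lookup. The CRT keeps a static array describing every control and printable character. When a thread's data is initialized, the thread's locale information is pointed at that array, and each of the functions above gets its answer by indexing into it. Below I first show the table (array) and explain how it is designed, then list what I learned about the Windows API and the system while tracing.
In VC/crt/src/ctype.c there is the following definition: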
const unsigned short __newctype[384] = {
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0,
        0,                      /* -1 EOF   */
        _CONTROL,               /* 00 (NUL) */
        _CONTROL,               /* 01 (SOH) */
        _CONTROL,               /* 02 (STX) */
        _CONTROL,               /* 03 (ETX) */
        _CONTROL,               /* 04 (EOT) */
        _CONTROL,               /* 05 (ENQ) */
        _CONTROL,               /* 06 (ACK) */
        _CONTROL,               /* 07 (BEL) */
        _CONTROL,               /* 08 (BS)  */
        _SPACE+_CONTROL,        /* 09 (HT)  */
        _SPACE+_CONTROL,        /* 0A (LF)  */
        _SPACE+_CONTROL,        /* 0B (VT)  */
        _SPACE+_CONTROL,        /* 0C (FF)  */
        _SPACE+_CONTROL,        /* 0D (CR)  */
        _CONTROL,               /* 0E (SI)  */
        _CONTROL,               /* 0F (SO)  */
        _CONTROL,               /* 10 (DLE) */
        _CONTROL,               /* 11 (DC1) */
        _CONTROL,               /* 12 (DC2) */
        _CONTROL,               /* 13 (DC3) */
        _CONTROL,               /* 14 (DC4) */
        _CONTROL,               /* 15 (NAK) */
        _CONTROL,               /* 16 (SYN) */
        _CONTROL,               /* 17 (ETB) */
        _CONTROL,               /* 18 (CAN) */
        _CONTROL,               /* 19 (EM)  */
        _CONTROL,               /* 1A (SUB) */
        _CONTROL,               /* 1B (ESC) */
        _CONTROL,               /* 1C (FS)  */
        _CONTROL,               /* 1D (GS)  */
        _CONTROL,               /* 1E (RS)  */
        _CONTROL,               /* 1F (US)  */
        _SPACE+_BLANK,          /* 20 SPACE */
        _PUNCT,                 /* 21 !     */
        _PUNCT,                 /* 22 "     */
        _PUNCT,                 /* 23 #     */
        _PUNCT,                 /* 24 $     */
        _PUNCT,                 /* 25 %     */
        _PUNCT,                 /* 26 &     */
        _PUNCT,                 /* 27 '     */
        _PUNCT,                 /* 28 (     */
        _PUNCT,                 /* 29 )     */
        _PUNCT,                 /* 2A *     */
        _PUNCT,                 /* 2B +     */
        _PUNCT,                 /* 2C ,     */
        _PUNCT,                 /* 2D -     */
        _PUNCT,                 /* 2E .     */
        _PUNCT,                 /* 2F /     */
        _DIGIT+_HEX,            /* 30 0     */
        _DIGIT+_HEX,            /* 31 1     */
        _DIGIT+_HEX,            /* 32 2     */
        _DIGIT+_HEX,            /* 33 3     */
        _DIGIT+_HEX,            /* 34 4     */
        _DIGIT+_HEX,            /* 35 5     */
        _DIGIT+_HEX,            /* 36 6     */
        _DIGIT+_HEX,            /* 37 7     */
        _DIGIT+_HEX,            /* 38 8     */
        _DIGIT+_HEX,            /* 39 9     */
        _PUNCT,                 /* 3A :     */
        _PUNCT,                 /* 3B ;     */
        _PUNCT,                 /* 3C <     */
        _PUNCT,                 /* 3D =     */
        _PUNCT,                 /* 3E >     */
        _PUNCT,                 /* 3F ?     */
        _PUNCT,                 /* 40 @     */
        _UPPER+_HEX,            /* 41 A     */
        _UPPER+_HEX,            /* 42 B     */
        _UPPER+_HEX,            /* 43 C     */
        _UPPER+_HEX,            /* 44 D     */
        _UPPER+_HEX,            /* 45 E     */
        _UPPER+_HEX,            /* 46 F     */
        _UPPER,                 /* 47 G     */
        _UPPER,                 /* 48 H     */
        _UPPER,                 /* 49 I     */
        _UPPER,                 /* 4A J     */
        _UPPER,                 /* 4B K     */
        _UPPER,                 /* 4C L     */
        _UPPER,                 /* 4D M     */
        _UPPER,                 /* 4E N     */
        _UPPER,                 /* 4F O     */
        _UPPER,                 /* 50 P     */
        _UPPER,                 /* 51 Q     */
        _UPPER,                 /* 52 R     */
        _UPPER,                 /* 53 S     */
        _UPPER,                 /* 54 T     */
        _UPPER,                 /* 55 U     */
        _UPPER,                 /* 56 V     */
        _UPPER,                 /* 57 W     */
        _UPPER,                 /* 58 X     */
        _UPPER,                 /* 59 Y     */
        _UPPER,                 /* 5A Z     */
        _PUNCT,                 /* 5B [     */
        _PUNCT,                 /* 5C \     */
        _PUNCT,                 /* 5D ]     */
        _PUNCT,                 /* 5E ^     */
        _PUNCT,                 /* 5F _     */
        _PUNCT,                 /* 60 `     */
        _LOWER+_HEX,            /* 61 a     */
        _LOWER+_HEX,            /* 62 b     */
        _LOWER+_HEX,            /* 63 c     */
        _LOWER+_HEX,            /* 64 d     */
        _LOWER+_HEX,            /* 65 e     */
        _LOWER+_HEX,            /* 66 f     */
        _LOWER,                 /* 67 g     */
        _LOWER,                 /* 68 h     */
        _LOWER,                 /* 69 i     */
        _LOWER,                 /* 6A j     */
        _LOWER,                 /* 6B k     */
        _LOWER,                 /* 6C l     */
        _LOWER,                 /* 6D m     */
        _LOWER,                 /* 6E n     */
        _LOWER,                 /* 6F o     */
        _LOWER,                 /* 70 p     */
        _LOWER,                 /* 71 q     */
        _LOWER,                 /* 72 r     */
        _LOWER,                 /* 73 s     */
        _LOWER,                 /* 74 t     */
        _LOWER,                 /* 75 u     */
        _LOWER,                 /* 76 v     */
        _LOWER,                 /* 77 w     */
        _LOWER,                 /* 78 x     */
        _LOWER,                 /* 79 y     */
        _LOWER,                 /* 7A z     */
        _PUNCT,                 /* 7B {     */
        _PUNCT,                 /* 7C |     */
        _PUNCT,                 /* 7D }     */
        _PUNCT,                 /* 7E ~     */
        _CONTROL,               /* 7F (DEL) */
        /* and the rest are 0... */
};
Here are the macros as well (in VC/include/ctype.h):
/* set bit masks for the possible character types */
#define _UPPER 0x1 /* upper case letter */
#define _LOWER 0x2 /* lower case letter */
#define _DIGIT 0x4 /* digit[0-9] */
#define _SPACE 0x8 /* tab, carriage return, newline, */
                   /* vertical tab or form feed */
#define _PUNCT 0x10 /* punctuation character */
#define _CONTROL 0x20 /* control character */
#define _BLANK 0x40 /* space char */
#define _HEX 0x80 /* hexadecimal digit */
#define _LEADBYTE 0x8000 /* multibyte leadbyte */
#define _ALPHA (0x0100|_UPPER|_LOWER) /* alphabetic character */
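So how is it used? The first 128 elements of __newctype[] are all 0, and the comments on the elements after them give the ASCII code each one describes, so __newctype + 128 points at the element for the character '\0'. The VC implementation does work through a pointer of the form __newctype + 128, which is why an index of -1 is legal: it selects the element that describes EOF, whose value is 0. The macros _UPPER, _LOWER and so on each occupy a single bit, so one element can carry several properties at once. To test whether a character c is a letter, it is enough to evaluate __newctype[128 + (int)c] & (0x0100 | _UPPER | _LOWER), which is a single bitwise AND against _ALPHA. That is the whole table-lookup principle, and it means the remaining test functions do not each need to be traced.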
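The lookup can be sketched in a few lines of portable C. This is a simplified stand-in rather than the CRT code: the MY_ names are my own, only the letter entries are filled in, and an ASCII execution character set is assumed.

```c
/* Bit masks with the same values as the ctype.h excerpt above;
   the MY_ prefix marks them as this sketch's own names. */
#define MY_UPPER 0x1
#define MY_LOWER 0x2
#define MY_ALPHA (0x0100 | MY_UPPER | MY_LOWER)

/* 128 slots for -128..-1 followed by 256 slots for 0..255. */
static unsigned short my_table[384];

/* Fills in only the letter entries; assumes ASCII code points. */
void my_table_init(void)
{
    int c;
    for (c = 0x41; c <= 0x5A; ++c)
        my_table[128 + c] = MY_UPPER;
    for (c = 0x61; c <= 0x7A; ++c)
        my_table[128 + c] = MY_LOWER;
}

/* Valid for c in -128..255; c == -1 (EOF) lands on a zero entry. */
int my_isalpha(int c)
{
    const unsigned short *pctype = my_table + 128;
    return pctype[c] & MY_ALPHA;
}
```

After my_table_init(), my_isalpha('A') is nonzero while my_isalpha('@') and my_isalpha(-1) are both 0. Offsetting the base pointer is what lets a negative index be used without any branch.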
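Where is this data stored, and how did I trace my way to it? I learned several new things along the way, which I go through step by step below.
Start with the simplest possible program: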
#include <ctype.h>
int main(void)
{
    int i = 64;
    int kk = isalpha(i);
    return 0;
}
This is the call stack I traced:
msvcr90d.dll!__set_flsgetvalue() Line 256 C
msvcr90d.dll!_getptd_noexit() Line 578 + 0xb bytes C
msvcr90d.dll!_getptd() Line 641 + 0x5 bytes C
msvcr90d.dll!_LocaleUpdate::_LocaleUpdate(localeinfo_struct * plocinfo=0x00000000) Line 264 + 0x5 bytes C++
> msvcr90d.dll!_chvalidator_l(localeinfo_struct * plocinfo=0x00000000, int c=0x00000040, int mask=0x00000103) Line 68 C++
msvcr90d.dll!_chvalidator(int c=0x00000040, int mask=0x00000103) Line 57 + 0xf bytes C++
msvcr90d.dll!isalpha(int c=0x00000040) Line 69 + 0xe bytes C++
> ConsoleApp.exe!main() Line 11 + 0xc bytes C
The first layer (recall from the macro list above: #define _ALPHA (0x0100|_UPPER|_LOWER) /* alphabetic character */):
extern __inline int (__cdecl isalpha) (int c)
{
    if (__locale_changed == 0)
    {
        return __fast_ch_check(c, _ALPHA);
    }
    else
    {
        return (_isalpha_l)(c, NULL);
    }
}
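We take the __fast_ch_check branch. (I could not find anything that documents what __locale_changed means, but I read the implementation of _isalpha_l and confirmed that it ends up in the same place as __fast_ch_check, so it can be ignored for this analysis.) __fast_ch_check is a macro: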
#ifdef _DEBUG
#define __fast_ch_check(a,b) _chvalidator(a,b)
#else /* _DEBUG */
#define __fast_ch_check(a,b) (__initiallocinfo.pctype[(a)] & (b))
#endif /* _DEBUG */
When debugging we get the _DEBUG version:
#if defined (_DEBUG)
extern "C" int __cdecl _chvalidator(int c, int mask)
{
    _ASSERTE((unsigned)(c + 1) <= 256);
    return _chvalidator_l(NULL, c, mask);
}

extern "C" int __cdecl _chvalidator_l(_locale_t plocinfo, int c, int mask)
{
    _LocaleUpdate _loc_update(plocinfo);

    _ASSERTE((unsigned)(c + 1) <= 256);
    if (c >= -1 && c <= 255)
    {
        return (_loc_update.GetLocaleT()->locinfo->pctype[c] & mask);
    }
    else
    {
        return (_loc_update.GetLocaleT()->locinfo->pctype[-1] & mask);
    }
}
#endif /* defined (_DEBUG) */
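At this point mask is 0x103 (_ALPHA) and c is our input, 0x40 (64). Note that this is C++: it calls the member function GetLocaleT() on a class object. If the input lies in the range -1 to 255, the return value is pctype[c] ANDed with mask; otherwise the entry pctype[-1] is used instead.
The questions are: what does the _LocaleUpdate class do, and what does the pctype array contain?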
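The range handling in the debug validator can be restated as plain C. The sketch below only mirrors the logic shown above; my_validate and my_in_range are hypothetical names, and the caller is assumed to pass a pointer to the entry for character 0 with at least one valid entry in front of it.

```c
/* Sketch of the lookup in _chvalidator_l, with the locale object replaced
   by a bare table pointer. */
int my_validate(const unsigned short *pctype, int c, int mask)
{
    if (c >= -1 && c <= 255)
        return pctype[c] & mask;
    /* Out-of-range input is classified like EOF. */
    return pctype[-1] & mask;
}

/* The assertion's single comparison: adding 1 maps -1..255 onto 0..256,
   and the unsigned cast turns any smaller value into a very large one. */
int my_in_range(int c)
{
    return (unsigned)(c + 1) <= 256;
}
```

With a 257-entry table t and pctype = t + 1, a call such as my_validate(pctype, 1000, 0x103) reads t[0], the EOF entry, instead of indexing past the table.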
#ifdef __cplusplus
class _LocaleUpdate
{
    _locale_tstruct localeinfo;
    _ptiddata ptd;
    bool updated;
public:
    _LocaleUpdate(_locale_t plocinfo)
        : updated(false)
    {
        if (plocinfo == NULL)
        {
            ptd = _getptd();
            localeinfo.locinfo = ptd->ptlocinfo;
            localeinfo.mbcinfo = ptd->ptmbcinfo;

            __UPDATE_LOCALE(ptd, localeinfo.locinfo);
            __UPDATE_MBCP(ptd, localeinfo.mbcinfo);
            if (!(ptd->_ownlocale & _PER_THREAD_LOCALE_BIT))
            {
                ptd->_ownlocale |= _PER_THREAD_LOCALE_BIT;
                updated = true;
            }
        }
        else
        {
            localeinfo=*plocinfo;
        }
    }
    ~_LocaleUpdate()
    {
        if (updated)
            ptd->_ownlocale = ptd->_ownlocale & ~_PER_THREAD_LOCALE_BIT;
    }
    _locale_t GetLocaleT()
    {
        return &localeinfo;
    }
};
#endif /* __cplusplus */
It is clear that GetLocaleT returns &localeinfo, and that the pctype member of the locinfo structure inside localeinfo is the data we are after. That naturally leads to the _getptd() function (in tidtable.c):
_ptiddata __cdecl _getptd (void)
{
    _ptiddata ptd = _getptd_noexit();
    if (!ptd) {
        _amsg_exit(_RT_THREAD); /* write message and die */
    }
    return ptd;
}

_ptiddata __cdecl _getptd_noexit (void)
{
    _ptiddata ptd;
    DWORD TL_LastError;

    TL_LastError = GetLastError();

#ifdef _M_IX86
    /*
     * Initialize FlsGetValue function pointer in TLS by calling __set_flsgetvalue()
     */
    if ( (ptd = (__set_flsgetvalue())(__flsindex)) == NULL ) {
#else /* _M_IX86 */
    if ( (ptd = FLS_GETVALUE(__flsindex)) == NULL ) {
#endif /* _M_IX86 */
        /*
         * no per-thread data structure for this thread. try to create
         * one.
         */
#ifdef _DEBUG
        extern void * __cdecl _calloc_dbg_impl(size_t, size_t, int, const char *, int, int *);
        if ((ptd = _calloc_dbg_impl(1, sizeof(struct _tiddata), _CRT_BLOCK, __FILE__, __LINE__, NULL)) != NULL) {
#else /* _DEBUG */
        if ((ptd = _calloc_crt(1, sizeof(struct _tiddata))) != NULL) {
#endif /* _DEBUG */
            if (FLS_SETVALUE(__flsindex, (LPVOID)ptd) ) {
                /*
                 * Initialize of per-thread data
                 */
                _initptd(ptd,NULL);
                ptd->_tid = GetCurrentThreadId();
                ptd->_thandle = (uintptr_t)(-1);
            }
            else {
                /*
                 * Return NULL to indicate failure
                 */
                _free_crt(ptd);
                ptd = NULL;
            }
        }
    }

    SetLastError(TL_LastError);
    return(ptd);
}
The only part that needs attention is the __set_flsgetvalue function:
_CRTIMP PFLS_GETVALUE_FUNCTION __cdecl __set_flsgetvalue()
{
#ifdef _M_IX86
    PFLS_GETVALUE_FUNCTION flsGetValue = FLS_GETVALUE;
    if (!flsGetValue)
    {
        flsGetValue = _decode_pointer(gpFlsGetValue);
        TlsSetValue(__getvalueindex, flsGetValue);
    }
    return flsGetValue;
#else /* _M_IX86 */
    return NULL;
#endif /* _M_IX86 */
}
It relies on this macro definition:
#define FLS_GETVALUE ((PFLS_GETVALUE_FUNCTION)TlsGetValue(__getvalueindex))
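At this point we have traced down to the Windows API level, TlsGetValue, and can go no further. __getvalueindex is a global variable, and its value here is 1. __set_flsgetvalue() returns the function pointer that _mtinit (listed further down) stored in that TLS slot: FlsGetValue, or TlsGetValue when the Fls functions are not available. So (__set_flsgetvalue())(__flsindex) amounts to calling FlsGetValue(__flsindex), or TlsGetValue(__flsindex) in the fallback case.
So what does TlsGetValue actually do in order to return a structure that contains the pctype pointer? MSDN is the place to ask: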
http://msdn.microsoft.com/en-us/library/windows/desktop/ms686812(v=vs.85).aspx
http://msdn.microsoft.com/en-us/library/windows/desktop/ms686749(v=vs.85).aspx
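These pages get into how Windows manages per-thread data. Put simply, each thread needs its own data area; the system keeps an array of pointers for it, and a TLS index selects a slot in that array. Coming back to our case, the thread's data is fetched with TlsGetValue(tlsindex), but where was it stored? According to MSDN, it is stored with TlsSetValue.
The initialization function below is also in tidtable.c (I located it by experience and then confirmed the guess with a breakpoint):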
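The slot-array idea, together with the get-or-create pattern seen in _getptd_noexit, can be modelled in portable C. This is only a simulation for a single thread: the my_ functions are hypothetical stand-ins for TlsAlloc, TlsSetValue and TlsGetValue, not the Windows API, and the slot count is an arbitrary choice.

```c
#include <stdlib.h>

#define MY_SLOT_COUNT 64   /* arbitrary size for this sketch */

/* One such array would exist per thread; this sketch models a single thread. */
static void *my_slots[MY_SLOT_COUNT];
static int my_next_index = 0;

/* Models TlsAlloc: hands out the next free index, or -1 when none is left. */
int my_slot_alloc(void)
{
    return my_next_index < MY_SLOT_COUNT ? my_next_index++ : -1;
}

/* Models TlsSetValue / TlsGetValue: store and fetch a pointer by index. */
void my_slot_set(int index, void *value) { my_slots[index] = value; }
void *my_slot_get(int index) { return my_slots[index]; }

/* Models the get-or-create pattern of _getptd_noexit: the first call
   allocates the per-thread block, later calls return the same block. */
void *my_get_thread_data(int index, size_t size)
{
    void *p = my_slot_get(index);
    if (p == NULL) {
        p = calloc(1, size);
        if (p != NULL)
            my_slot_set(index, p);
    }
    return p;
}
```

The index plays the role of __flsindex: it is allocated once at startup, and every later lookup only needs that small integer to find the thread's block.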
int __cdecl _mtinit (void)
{
    _ptiddata ptd;

#ifdef _M_IX86
    /*
     * Initialize fiber local storage function pointers.
     */
    HINSTANCE hKernel32 = _crt_wait_module_handle(_KERNEL32);
    if (hKernel32 == NULL) {
        _mtterm();
        return FALSE; /* fail to load DLL */
    }

    gpFlsAlloc = (PFLS_ALLOC_FUNCTION)GetProcAddress(hKernel32,"FlsAlloc");
    gpFlsGetValue = (PFLS_GETVALUE_FUNCTION)GetProcAddress(hKernel32,"FlsGetValue");
    gpFlsSetValue = (PFLS_SETVALUE_FUNCTION)GetProcAddress(hKernel32,"FlsSetValue");
    gpFlsFree = (PFLS_FREE_FUNCTION)GetProcAddress(hKernel32,"FlsFree");
    if (!gpFlsAlloc || !gpFlsGetValue || !gpFlsSetValue || !gpFlsFree) {
        gpFlsAlloc = (PFLS_ALLOC_FUNCTION)__crtTlsAlloc;
        gpFlsGetValue = (PFLS_GETVALUE_FUNCTION)TlsGetValue;
        gpFlsSetValue = (PFLS_SETVALUE_FUNCTION)TlsSetValue;
        gpFlsFree = (PFLS_FREE_FUNCTION)TlsFree;
    }

    /*
     * Allocate and initialize a TLS index to store FlsGetValue pointer
     * so that the FLS_* macros can work transparently
     */
    if ( (__getvalueindex = TlsAlloc()) == TLS_OUT_OF_INDEXES ||
         !TlsSetValue(__getvalueindex, (LPVOID)gpFlsGetValue) ) {
        return FALSE;
    }
#endif /* _M_IX86 */

    _init_pointers(); /* initialize global function pointers */

#ifdef _M_IX86
    /*
     * Encode the fiber local storage function pointers
     */
    gpFlsAlloc = (PFLS_ALLOC_FUNCTION) _encode_pointer(gpFlsAlloc);
    gpFlsGetValue = (PFLS_GETVALUE_FUNCTION) _encode_pointer(gpFlsGetValue);
    gpFlsSetValue = (PFLS_SETVALUE_FUNCTION) _encode_pointer(gpFlsSetValue);
    gpFlsFree = (PFLS_FREE_FUNCTION) _encode_pointer(gpFlsFree);
#endif /* _M_IX86 */

    /*
     * Initialize the mthread lock data base
     */
    if ( !_mtinitlocks() ) {
        _mtterm();
        return FALSE; /* fail to load DLL */
    }

    /*
     * Allocate a TLS index to maintain pointers to per-thread data
     */
    if ( (__flsindex = FLS_ALLOC(&_freefls)) == FLS_OUT_OF_INDEXES ) {
        _mtterm();
        return FALSE; /* fail to load DLL */
    }

    /*
     * Create a per-thread data structure for this (i.e., the startup)
     * thread.
     */
    if ( ((ptd = _calloc_crt(1, sizeof(struct _tiddata))) == NULL) ||
         !FLS_SETVALUE(__flsindex, (LPVOID)ptd) )
    {
        _mtterm();
        return FALSE; /* fail to load DLL */
    }

    /*
     * Initialize the per-thread data
     */
    _initptd(ptd,NULL);

    ptd->_tid = GetCurrentThreadId();
    ptd->_thandle = (uintptr_t)(-1);

    return TRUE;
}
The key point is _initptd():
_CRTIMP void __cdecl _initptd (_ptiddata ptd,pthreadlocinfo ptloci)
{
#ifdef _M_IX86
    HINSTANCE hKernel32 = _crt_wait_module_handle(_KERNEL32);
#endif /* _M_IX86 */

    ptd->_pxcptacttab = (void *)_XcptActTab;
    ptd->_holdrand = 1L;

#ifdef _M_IX86
    if (hKernel32 != NULL)
    {
        // Initialize the function pointers in the ptd data
        ptd->_encode_ptr = GetProcAddress(hKernel32, _ENCODE_POINTER);
        ptd->_decode_ptr = GetProcAddress(hKernel32, _DECODE_POINTER);
    }
#endif /* _M_IX86 */

    // It is necessary to always have GLOBAL_LOCALE_BIT set in perthread data
    // because when doing bitwise or, we won't get __UPDATE_LOCALE to work when
    // global per thread locale is set.
    ptd->_ownlocale = _GLOBAL_LOCALE_BIT;

    // Initialize _setloc_data. These are the only valuse that need to be
    // initialized.
    ptd->_setloc_data._cachein[0]='C';
    ptd->_setloc_data._cacheout[0]='C';
    ptd->ptmbcinfo = &__initialmbcinfo;

    _mlock(_MB_CP_LOCK);
    __try
    {
        InterlockedIncrement(&(ptd->ptmbcinfo->refcount));
    }
    __finally
    {
        _munlock(_MB_CP_LOCK);
    }

    // We need to make sure that ptd->ptlocinfo in never NULL, this saves us
    // perf counts when UPDATING locale.
    _mlock(_SETLOCALE_LOCK);
    __try {
        ptd->ptlocinfo = ptloci;
        /*
         * Note that and caller to _initptd could have passed __ptlocinfo, but
         * that will be a bug as between the call to _initptd and __addlocaleref
         * the global locale may have changed and ptloci may be pointing to invalid
         * memory. Thus if the wants to set the locale to global, NULL should
         * be passed.
         */
        if (ptd->ptlocinfo == NULL)
            ptd->ptlocinfo = __ptlocinfo;
        __addlocaleref(ptd->ptlocinfo);
    }
    __finally {
        _munlock(_SETLOCALE_LOCK);
    }
}
From the calls above we know that ptloci is passed in as NULL, so ptd->ptlocinfo = __ptlocinfo. Its definition is:
pthreadlocinfo __ptlocinfo = &__initiallocinfo;
And then:
typedef struct threadlocaleinfostruct * pthreadlocinfo;
typedef struct threadlocaleinfostruct {
    int refcount;
    unsigned int lc_codepage;
    unsigned int lc_collate_cp;
    unsigned long lc_handle[6]; /* LCID */
    LC_ID lc_id[6];
    struct {
        char *locale;
        wchar_t *wlocale;
        int *refcount;
        int *wrefcount;
    } lc_category[6];
    int lc_clike;
    int mb_cur_max;
    int * lconv_intl_refcount;
    int * lconv_num_refcount;
    int * lconv_mon_refcount;
    struct lconv * lconv;
    int * ctype1_refcount;
    unsigned short * ctype1;
    const unsigned short * pctype;
    const unsigned char * pclmap;
    const unsigned char * pcumap;
    struct __lc_time_data * lc_time_curr;
} threadlocinfo;
There is pctype. It is the same place I reached earlier by looking up the definition directly; the difference is that here I worked backwards to it one step at a time. The data itself is plainly visible:
threadlocinfo __initiallocinfo = {
    1,                                        /* refcount */
    _CLOCALECP,                               /* lc_codepage */
    _CLOCALECP,                               /* lc_collate_cp */
    {   _CLOCALEHANDLE,                       /* lc_handle[_ALL] */
        _CLOCALEHANDLE,                       /* lc_handle[_COLLATE] */
        _CLOCALEHANDLE,                       /* lc_handle[_CTYPE] */
        _CLOCALEHANDLE,                       /* lc_handle[_MONETARY] */
        _CLOCALEHANDLE,                       /* lc_handle[_NUMERIC] */
        _CLOCALEHANDLE                        /* lc_handle[_TIME] */
    },
    {   {0, 0, 0},                            /* lc_id[LC_ALL] */
        {0, 0, 0},                            /* lc_id[LC_COLLATE] */
        {0, 0, 0},                            /* lc_id[LC_CTYPE] */
        {0, 0, 0},                            /* lc_id[LC_MONETARY] */
        {0, 0, 0},                            /* lc_id[LC_NUMERIC] */
        {0, 0, 0}                             /* lc_id[LC_TIME] */
    },
    {   {NULL, NULL, NULL, NULL},             /* lc_category[LC_ALL] */
        {__clocalestr, NULL, NULL, NULL},     /* lc_category[LC_COLLATE] */
        {__clocalestr, NULL, NULL, NULL},     /* lc_category[LC_CTYPE] */
        {__clocalestr, NULL, NULL, NULL},     /* lc_category[LC_MONETARY] */
        {__clocalestr, NULL, NULL, NULL},     /* lc_category[LC_NUMERIC] */
        {__clocalestr, NULL, NULL, NULL}      /* lc_category[LC_TIME] */
    },
    1,                                        /* lc_clike */
    1,                                        /* mb_cur_max */
    NULL,                                     /* lconv_intl_refcount */
    NULL,                                     /* lconv_num_refcount */
    NULL,                                     /* lconv_mon_refcount */
    &__lconv_c,                               /* lconv */
    NULL,                                     /* ctype1_refcount */
    NULL,                                     /* ctype1 */
    __newctype + 128,                         /* pctype */
    __newclmap + 128,                         /* pclmap */
    __newcumap + 128,                         /* pcumap */
    &__lc_time_c,                             /* lc_time_curr */
};
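Counting through the members of the structure, pctype corresponds exactly to __newctype + 128. With that, everything is clear.
Tracing is a way of dissecting the implementation, and following it down to the Windows API helps in understanding the Windows interfaces and how they work.
The English text below gives the requirements for the character case-mapping functions, quoted from the C89 standard.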
4.3.2 Character case mapping functions
4.3.2.1 The tolower function
Synopsis
#include <ctype.h>
int tolower(int c);
Description
The tolower function converts an upper-case letter to the corresponding lower-case letter.
Returns
If the argument is an upper-case letter, the tolower function returns the corresponding lower-case letter if there is one; otherwise the argument is returned unchanged. In the C locale, tolower maps only the characters for which isupper is true to the corresponding characters for which islower is true.
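Microsoft's implementation, as it appears in the debugger's mixed source and disassembly view (the C source lines are interleaved with the instructions generated for them):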
extern "C" int __cdecl tolower (int c)
{
70195760 mov edi,edi
70195762 push ebp
70195763 mov ebp,esp
70195765 push ecx
    if (__locale_changed == 0)
70195766 cmp dword ptr [___locale_changed (702362C8h)],0
7019576D jne tolower+33h (70195793h)
    {
        return __ascii_towlower(c);
7019576F cmp dword ptr [c],41h
70195773 jl tolower+26h (70195786h)
70195775 cmp dword ptr [c],5Ah
70195779 jg tolower+26h (70195786h)
7019577B mov eax,dword ptr [c]
7019577E add eax,20h
70195781 mov dword ptr [ebp-4],eax
70195784 jmp tolower+2Ch (7019578Ch)
70195786 mov ecx,dword ptr [c]
70195789 mov dword ptr [ebp-4],ecx
7019578C mov eax,dword ptr [ebp-4]
7019578F jmp tolower+41h (701957A1h)
    }
    else
70195791 jmp tolower+41h (701957A1h)
    {
        return _tolower_l(c, NULL);
70195793 push 0
70195795 mov edx,dword ptr [c]
70195798 push edx
70195799 call _tolower_l (70195580h)
7019579E add esp,8
    }
}
701957A1 mov esp,ebp
701957A3 pop ebp
701957A4 ret
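Read back into C, the fast path in the listing compares c against 41h and 5Ah and adds 20h when it falls in between. A hypothetical re-creation of that logic follows; the name my_ascii_tolower is mine, and ASCII code points are assumed.

```c
/* Mirrors the instructions in the listing above:
   cmp c,41h / cmp c,5Ah / add eax,20h; otherwise return c unchanged. */
int my_ascii_tolower(int c)
{
    if (c >= 0x41 && c <= 0x5A)   /* 'A'..'Z' in ASCII */
        return c + 0x20;          /* distance from 'A' to 'a' */
    return c;
}
```

Unlike the character tests, this path needs no table: in the C locale the mapping is a range check plus a constant offset.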
4.3.2.2 The toupper function
Synopsis
#include <ctype.h>
int toupper(int c);
Description
The toupper function converts a lower-case letter to the corresponding upper-case letter.
Returns
If the argument is a lower-case letter, the toupper function returns the corresponding upper-case letter if there is one; otherwise the argument is returned unchanged. In the C locale, toupper maps only the characters for which islower is true to the corresponding characters for which isupper is true.