string wstring

Excerpted from: Stack Overflow

`string`? `wstring`?

std::string is a basic_string templated on a char, and std::wstring on a wchar_t.

`char` vs. `wchar_t`

char is supposed to hold a character, usually a 1-byte character. wchar_t is supposed to hold a wide character, and then things get tricky: on Linux, a wchar_t is 4 bytes, while on Windows, it's 2 bytes.

What about Unicode, then?

The problem is that neither char nor wchar_t is directly tied to Unicode.

On Linux?

Let's take a Linux OS: my Ubuntu system is already Unicode-aware. When I work with a char string, it is natively encoded in UTF-8 (i.e. a Unicode string of chars). The following code:

    #include <cstring>
    #include <cwchar>
    #include <iostream>

    int main(int argc, char* argv[])
    {
        const char text[] = "olé";
        const wchar_t wtext[] = L"olé";

        std::cout << "sizeof(char)    : " << sizeof(char) << std::endl;
        std::cout << "text            : " << text << std::endl;
        std::cout << "sizeof(text)    : " << sizeof(text) << std::endl;
        std::cout << "strlen(text)    : " << std::strlen(text) << std::endl;

        // Print each byte of the narrow string as a number
        std::cout << "text(binary)    :";
        for (std::size_t i = 0, iMax = std::strlen(text); i < iMax; ++i)
        {
            std::cout << " " << static_cast<unsigned int>(
                                    static_cast<unsigned char>(text[i]));
        }
        std::cout << std::endl << std::endl;

        std::cout << "sizeof(wchar_t) : " << sizeof(wchar_t) << std::endl;
        // std::cout << "wtext           : " << wtext << std::endl; <- error
        std::cout << "wtext           : UNABLE TO CONVERT NATIVELY." << std::endl;
        std::wcout << L"wtext          : " << wtext << std::endl;
        std::cout << "sizeof(wtext)   : " << sizeof(wtext) << std::endl;
        std::cout << "wcslen(wtext)   : " << std::wcslen(wtext) << std::endl;

        // Print each wide character of the wide string as a number
        std::cout << "wtext(binary)   :";
        for (std::size_t i = 0, iMax = std::wcslen(wtext); i < iMax; ++i)
        {
            std::cout << " " << static_cast<unsigned int>(
                                    static_cast<unsigned short>(wtext[i]));
        }
        std::cout << std::endl << std::endl;

        return 0;
    }

outputs the following text:

    sizeof(char)    : 1
    text            : olé
    sizeof(text)    : 5
    strlen(text)    : 4
    text(binary)    : 111 108 195 169

    sizeof(wchar_t) : 4
    wtext           : UNABLE TO CONVERT NATIVELY.
    sizeof(wtext)   : 16
    wcslen(wtext)   : 3
    wtext(binary)   : 111 108 233

You'll see that the "olé" text in char is really made of four chars: 111, 108, 195 and 169 (not counting the trailing zero). (I'll let you study the wchar_t code as an exercise.)

So, when working with char on Linux, you will usually end up using Unicode without even knowing it. And as std::string works with char, std::string is already Unicode-ready.

Note that std::string, like the C string API, will consider the "olé" string to have 4 characters, not three. So you should be cautious when truncating or otherwise manipulating Unicode chars, because some combinations of chars are invalid in UTF-8 (for example, cutting between the 195 and the 169 leaves an incomplete sequence).

On Windows?

On Windows, this is a bit different. Win32 had to support a lot of applications working with char and with the different charsets/codepages produced all over the world before the advent of Unicode.

So their solution was an interesting one: if an application works with char, then the char strings are encoded/printed/shown on GUI labels using the local charset/codepage of the machine. For example, "olé" would be "olé" on a French-localized Windows, but would be something different on a Cyrillic-localized Windows ("olй" if you use Windows-1251). Thus, "historical apps" will usually still work the same old way.

For Unicode-based applications, Windows uses wchar_t, which is 2 bytes wide and is encoded in UTF-16, that is, Unicode encoded in 2-byte units (or at the very least in the mostly compatible UCS-2, which is almost the same thing IIRC).

Applications using char are called "multibyte" (because each glyph is composed of one or more chars), while applications using wchar_t are called "widechar" (because each glyph is composed of one or two wchar_t). See the MultiByteToWideChar and WideCharToMultiByte Win32 conversion APIs for more info.

Thus, if you work on Windows, you badly want to use wchar_t (unless you use a framework hiding that, like GTK+ or Qt...). The fact is that behind the scenes, Windows works with wchar_t strings, so even historical applications will have their char strings converted to wchar_t when using an API like SetWindowText (a low-level API function to set the label on a Win32 GUI).

Memory issues?

UTF-32 is 4 bytes per character, so there is not much to add, except that a UTF-8 text and a UTF-16 text will always use less or the same amount of memory as a UTF-32 text (and usually less).

If there is a memory issue, then you should know that for most Western languages, UTF-8 text will use less memory than the same UTF-16 text.

Still, for other languages (Chinese, Japanese, etc.), the memory used will be either the same, or larger for UTF-8 than for UTF-16.

All in all, UTF-16 will mostly use 2 bytes per character (unless you're dealing with some kind of esoteric language glyphs (Klingon? Elvish?)), while UTF-8 will spend from 1 to 4 bytes.

See http://en.wikipedia.org/wiki/UTF-8#Compared_to_UTF-16 for more info.

Conclusion

1. When should I use std::wstring over std::string?

On Linux? Almost never (§).
On Windows? Almost always (§).
On cross-platform code? Depends on your toolkit...

(§) : unless you use a toolkit/framework saying otherwise

2. Can std::string hold all the ASCII character set including special characters?

Notice: a std::string is suitable for holding a 'binary' buffer, whereas a std::wstring is not!

On Linux? Yes.
On Windows? Only special characters available for the current locale of the Windows user.

Edit (After a comment from Johann Gerell): a std::string will be enough to handle all char based strings (each char being a number from 0 to 255). But:

ASCII is supposed to go from 0 to 127. Higher chars are NOT ASCII.
A char from 0 to 127 will be held correctly.
A char from 128 to 255 will have a meaning that depends on your encoding (Unicode, non-Unicode, etc.), but a std::string will be able to hold all Unicode glyphs as long as they are encoded in UTF-8.

3. Is std::wstring supported by almost all popular C++ compilers?

Mostly, with the exception of GCC-based compilers that are ported to Windows.
It works on my g++ 4.3.2 (under Linux), and I used Unicode API on Win32 since Visual C++ 6.

4. What exactly is a wide character?

In C/C++, it's a character type written wchar_t, which is larger than the simple char character type. It is supposed to hold characters whose indices (like Unicode glyphs) are larger than 255 (or 127, depending...).

Reposted from: https://www.cnblogs.com/coolbear/archive/2013/05/24/3096406.html
