Excerpted from: Stack Overflow

string? wstring?

std::string is a basic_string templated on a char, and std::wstring on a wchar_t.

char vs. wchar_t

char is supposed to hold a character, usually a 1-byte character. wchar_t is supposed to hold a wide character, and then things get tricky: on Linux, a wchar_t is 4 bytes, while on Windows it's 2 bytes.

What about Unicode, then?

The problem is that neither char nor wchar_t is directly tied to Unicode.

On Linux?

Let's take a Linux OS: my Ubuntu system is already Unicode-aware. When I work with a char string, it is natively encoded in UTF-8 (i.e. a Unicode string of chars). The following code:

    #include <cstring>
    #include <cwchar>   // wcslen
    #include <iostream>

    int main(int argc, char* argv[])
    {
        const char text[] = "olé";
        const wchar_t wtext[] = L"olé";

        std::cout << "sizeof(char)    : " << sizeof(char) << std::endl;
        std::cout << "text            : " << text << std::endl;
        std::cout << "sizeof(text)    : " << sizeof(text) << std::endl;
        std::cout << "strlen(text)    : " << strlen(text) << std::endl;

        // Print each byte of the char string as a number.
        std::cout << "text(binary)    :";
        for (size_t i = 0, iMax = strlen(text); i < iMax; ++i)
        {
            std::cout << " " << static_cast<unsigned int>(static_cast<unsigned char>(text[i]));
        }
        std::cout << std::endl << std::endl;

        std::cout << "sizeof(wchar_t) : " << sizeof(wchar_t) << std::endl;
        //std::cout << "wtext           : " << wtext << std::endl; <- error
        std::cout << "wtext           : UNABLE TO CONVERT NATIVELY." << std::endl;
        std::wcout << L"wtext          : " << wtext << std::endl;
        std::cout << "sizeof(wtext)   : " << sizeof(wtext) << std::endl;
        std::cout << "wcslen(wtext)   : " << wcslen(wtext) << std::endl;

        // Print each wchar_t of the wide string as a number.
        std::cout << "wtext(binary)   :";
        for (size_t i = 0, iMax = wcslen(wtext); i < iMax; ++i)
        {
            std::cout << " " << static_cast<unsigned int>(static_cast<unsigned short>(wtext[i]));
        }
        std::cout << std::endl << std::endl;

        return 0;
    }

outputs the following text:

    sizeof(char)    : 1
    text            : olé
    sizeof(text)    : 5
    strlen(text)    : 4
    text(binary)    : 111 108 195 169

    sizeof(wchar_t) : 4
    wtext           : UNABLE TO CONVERT NATIVELY.
    sizeof(wtext)   : 16
    wcslen(wtext)   : 3
    wtext(binary)   : 111 108 233

You'll see that the "olé" text in char is really made up of four chars: 111, 108, 195 and 169 (not counting the trailing zero). (I'll let you study the wchar_t code as an exercise.)

So, when working with a char on Linux, you will usually end up using Unicode without even knowing it. And since std::string works with char, std::string is already Unicode-ready.

Note that std::string, like the C string API, will consider the "olé" string to have 4 characters, not 3, because it counts chars (bytes), not Unicode characters. So you should be cautious when truncating or otherwise manipulating Unicode chars: some byte combinations are invalid in UTF-8, for example a multi-byte sequence cut in half.
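To make the byte-versus-character distinction concrete, here is a minimal sketch, assuming the input is valid UTF-8. The helper names (is_continuation_byte, count_code_points, truncate_utf8) are hypothetical and not part of any standard API. It counts code points by skipping continuation bytes, and truncates on a byte budget without splitting a multi-byte sequence.

```cpp
#include <cstddef>
#include <string>

// True if the byte is a UTF-8 continuation byte (bit pattern 10xxxxxx).
inline bool is_continuation_byte(unsigned char byte) {
    return (byte & 0xC0) == 0x80;
}

// Count code points by counting every byte that is NOT a continuation byte.
// Assumption: the input is valid UTF-8.
inline std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if (!is_continuation_byte(byte)) ++count;
    }
    return count;
}

// Keep at most max_bytes bytes, never cutting a multi-byte sequence in half.
inline std::string truncate_utf8(const std::string& utf8, std::size_t max_bytes) {
    if (utf8.size() <= max_bytes) return utf8;
    std::size_t end = max_bytes;
    // Back up while the first dropped byte is a continuation byte.
    while (end > 0 && is_continuation_byte(static_cast<unsigned char>(utf8[end]))) {
        --end;
    }
    return utf8.substr(0, end);
}
```

For the UTF-8 bytes of "olé" (written portably as "ol\xC3\xA9"), size() reports 4 while count_code_points reports 3, and truncating to 3 bytes yields "ol" rather than a dangling 0xC3 byte.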

On Windows?

On Windows, this is a bit different. Win32 had to support a lot of applications working with char and with the different charsets/codepages produced all over the world before the advent of Unicode.

So their solution was an interesting one: if an application works with char, then the char strings are encoded/printed/shown on GUI labels using the local charset/codepage of the machine. For example, "olé" would be "olé" on a French-localized Windows, but would be something different on a Cyrillic-localized Windows ("olй" if you use Windows-1251). Thus, "historical apps" will usually still work the same old way.

For Unicode-based applications, Windows uses wchar_t, which is 2 bytes wide and is encoded in UTF-16, a Unicode encoding built on 2-byte units (or at the very least, in the mostly compatible UCS-2, which is almost the same thing IIRC).

Applications using char are said to be "multibyte" (because each glyph is composed of one or more chars), while applications using wchar_t are said to be "widechar" (because each glyph is composed of one or two wchar_t). See the MultiByteToWideChar and WideCharToMultiByte Win32 conversion APIs for more info.
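The "one or two wchar_t" point can be illustrated with a small sketch of UTF-16 encoding. It uses char16_t rather than wchar_t so that it behaves the same on Linux, where wchar_t is 4 bytes. The function name encode_utf16 is hypothetical, and the code assumes the input is a valid code point (at most 0x10FFFF and not itself a surrogate).

```cpp
#include <string>

// Encode one Unicode code point as UTF-16 code units.
// Assumption: code_point is valid (<= 0x10FFFF and not a surrogate).
inline std::u16string encode_utf16(char32_t code_point) {
    std::u16string units;
    if (code_point < 0x10000) {
        // Basic Multilingual Plane: a single 16-bit unit is enough.
        units.push_back(static_cast<char16_t>(code_point));
    } else {
        // Beyond the BMP: split into a high and a low surrogate.
        const char32_t offset = code_point - 0x10000;
        units.push_back(static_cast<char16_t>(0xD800 + (offset >> 10)));
        units.push_back(static_cast<char16_t>(0xDC00 + (offset & 0x3FF)));
    }
    return units;
}
```

With this sketch, 'é' (U+00E9) needs one unit, while a code point outside the Basic Multilingual Plane such as U+1F600 needs two (0xD83D, 0xDE00).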

Thus, if you work on Windows, you really want to use wchar_t (unless you use a framework hiding that, like GTK+ or Qt...). The fact is that behind the scenes, Windows works with wchar_t strings, so even historical applications will have their char strings converted to wchar_t when using an API like SetWindowText (a low-level API function that sets the label on a Win32 GUI).

Memory issues?

UTF-32 is 4 bytes per character, so there is not much to add, except that UTF-8 text and UTF-16 text will always use less than or the same amount of memory as UTF-32 text (and usually less).

If there is a memory issue, then you should know that for most Western languages, UTF-8 text will use less memory than the same text in UTF-16.

Still, for other languages (Chinese, Japanese, etc.), the memory used will be either the same or larger for UTF-8 than for UTF-16.

All in all, UTF-16 will mostly use 2 bytes per character (unless you're dealing with some kind of esoteric language glyphs (Klingon? Elvish?)), while UTF-8 will spend from 1 to 4 bytes.

See http://en.wikipedia.org/wiki/UTF-8#Compared_to_UTF-16 for more info.
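The size comparison above can be checked with a short sketch that computes how many bytes a sequence of code points takes in each encoding. The names utf8_bytes, utf16_bytes, EncodedSizes and encoded_sizes are hypothetical, and the code assumes every input value is a valid code point.

```cpp
#include <cstddef>
#include <string>

// Bytes needed to encode one code point in UTF-8 (assumes a valid code point).
inline std::size_t utf8_bytes(char32_t cp) {
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

// Bytes needed in UTF-16: one 2-byte unit, or a 4-byte surrogate pair.
inline std::size_t utf16_bytes(char32_t cp) {
    return cp < 0x10000 ? 2 : 4;
}

struct EncodedSizes {
    std::size_t utf8;
    std::size_t utf16;
    std::size_t utf32;
};

// Total bytes for the whole text in each encoding (no terminator counted).
inline EncodedSizes encoded_sizes(const std::u32string& text) {
    EncodedSizes sizes{0, 0, 0};
    for (char32_t cp : text) {
        sizes.utf8 += utf8_bytes(cp);
        sizes.utf16 += utf16_bytes(cp);
        sizes.utf32 += 4;  // UTF-32 is always 4 bytes per code point
    }
    return sizes;
}
```

For "olé" this gives 4, 6 and 12 bytes for UTF-8, UTF-16 and UTF-32 respectively; for the three-character Japanese string U"\u65E5\u672C\u8A9E" it gives 9, 6 and 12, which matches the trade-off described above.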

Conclusion

1. When should I use std::wstring over std::string?

On Linux? Almost never (§).
On Windows? Almost always (§).
On cross-platform code? Depends on your toolkit...

(§) : unless you use a toolkit/framework saying otherwise

2. Can std::string hold all the ASCII character set including special characters?

Notice: A std::string is suitable for holding a 'binary' buffer, where a std::wstring is not!

On Linux? Yes.
On Windows? Only special characters available for the current locale of the Windows user.

Edit (After a comment from Johann Gerell): a std::string will be enough to handle all char based strings (each char being a number from 0 to 255). But:

  1. ASCII is supposed to go from 0 to 127. Higher chars are NOT ASCII.
  2. a char from 0 to 127 will be held correctly
  3. a char from 128 to 255 will have a meaning that depends on your encoding (Unicode, non-Unicode, etc.), but a std::string will be able to hold all Unicode glyphs as long as they are encoded in UTF-8.
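The points above can be sketched in a few lines: a std::string stores arbitrary byte values, including embedded zeros, and only bytes in the range 0 to 127 are ASCII. The helper names make_buffer and is_ascii are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Build a std::string from raw bytes; embedded zeros are kept because the
// length is passed explicitly instead of being found by scanning for '\0'.
inline std::string make_buffer(const unsigned char* bytes, std::size_t length) {
    return std::string(reinterpret_cast<const char*>(bytes), length);
}

// True if every byte is in the ASCII range 0..127.
inline bool is_ascii(const std::string& text) {
    for (unsigned char byte : text) {
        if (byte > 127) return false;
    }
    return true;
}
```

A four-byte buffer such as {0x00, 0xFF, 0x41, 0x00} keeps its size of 4 when stored this way, and is_ascii returns false both for it and for the UTF-8 bytes of "olé", since those contain values above 127.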

3. Is std::wstring supported by almost all popular C++ compilers?

Mostly, with the exception of GCC-based compilers that are ported to Windows.
It works on my g++ 4.3.2 (under Linux), and I used Unicode API on Win32 since Visual C++ 6.

4. What exactly is a wide character?

In C/C++, it's a character type written wchar_t which is larger than the simple char character type. It is supposed to be used to hold characters whose indices (like Unicode glyphs) are larger than 255 (or 127, depending...).

Reposted from: https://www.cnblogs.com/coolbear/archive/2013/05/24/3096406.html
