What's the difference between unicode and utf8?

Is it true that Unicode = UTF-16?

UPDATE

Many are saying Unicode is a standard, not an encoding, but most editors support save as Unicode encoding actually.

As Rasmus states in his article "The difference between UTF-8 and Unicode?" (link fixed):

If asked the question, "What is the difference between UTF-8 and Unicode?", would you confidently reply with a short and precise answer? In these days of internationalization all developers should be able to do that. I suspect many of us do not understand these concepts as well as we should. If you feel you belong to this group, you should read this ultra short introduction to character sets and encodings.

Actually, comparing UTF-8 and Unicode is like comparing apples and oranges:

UTF-8 is an encoding - Unicode is a character set

A character set is a list of characters with unique numbers (these numbers are sometimes referred to as "code points"). For example, in the Unicode character set, the number for A is 65, which is 41 in hexadecimal and is usually written U+0041.
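
To make the idea of a code point concrete, here is a minimal Python sketch (Python is my choice for illustration; the article itself shows no code). It uses the built-in `ord` and `chr` to go between a character and its number. Note that the code point of A is 65 in decimal, which is 41 when written in hexadecimal.

```python
# Look up the Unicode code point of a character, and go back again.
code_point = ord("A")

print(code_point)             # 65
print(hex(code_point))        # 0x41
print(f"U+{code_point:04X}")  # U+0041
print(chr(65))                # A
```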

An encoding, on the other hand, is an algorithm that translates a list of numbers to binary so it can be stored on disk. For example, UTF-8 would translate the number sequence 1, 2, 3, 4 like this:

00000001 00000010 00000011 00000100 

Our data is now translated into binary and can now be saved to disk.
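
The encoding step can be reproduced with a short Python sketch (an illustration of mine, not part of the article). It builds a string from the code points 1, 2, 3, 4 and encodes it with UTF-8. The second part is an extra assumption-free check worth knowing: code points above 127 take more than one byte in UTF-8, so the one-number-one-byte picture only holds for the ASCII range.

```python
# Encode the code points 1, 2, 3, 4 with UTF-8 and show the bytes as bits.
code_points = [1, 2, 3, 4]
text = "".join(chr(n) for n in code_points)
encoded = text.encode("utf-8")
bits = " ".join(f"{byte:08b}" for byte in encoded)
print(bits)  # 00000001 00000010 00000011 00000100

# A code point above 127 needs more than one byte in UTF-8.
e_acute = chr(233)  # U+00E9, LATIN SMALL LETTER E WITH ACUTE
e_bits = " ".join(f"{byte:08b}" for byte in e_acute.encode("utf-8"))
print(e_bits)  # 11000011 10101001
```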

All together now

Say an application reads the following from the disk:

01101000 01100101 01101100 01101100 01101111 

The app knows this data represents a Unicode string encoded with UTF-8 and must show it as text to the user. The first step is to convert the binary data to numbers. The app uses the UTF-8 algorithm to decode the data. In this case, the decoder returns this:

104 101 108 108 111 

Since the app knows this is a Unicode string, it can assume each number represents a character. We use the Unicode character set to translate each number to a corresponding character. The resulting string is "hello".
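
The two steps above (bytes to numbers, numbers to characters) can be sketched in Python as follows. This is only an illustration: the bit string is the example from the text written as full 8-bit bytes, and a real application would read the bytes from a file rather than parse a string of digits.

```python
# The bytes read from disk, written out as bits.
bits = "01101000 01100101 01101100 01101100 01101111"
raw = bytes(int(b, 2) for b in bits.split())

# Decode with UTF-8 to get the text, then list the code points behind it.
text = raw.decode("utf-8")
numbers = [ord(ch) for ch in text]

print(numbers)  # [104, 101, 108, 108, 111]
print(text)     # hello
```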

Conclusion

So when somebody asks you "What is the difference between UTF-8 and Unicode?", you can now confidently give a short and precise answer:

UTF-8 and Unicode cannot be compared. UTF-8 is an encoding used to translate numbers into binary data. Unicode is a character set used to translate characters into numbers.

answered Nov 3 '12 at 19:09 by vikas devde, edited May 2 at 15:42 by Rasmus Rønn Nielsen

 
@vikas...I wish I could upvote you 100 times...but thanks for explaining it very very clearly! – user547453 Dec 28 '12 at 19:04

LOVELY! Thankyou... – OceanBlue Mar 31 '13 at 1:36

Smashing indeed! – MalsR May 1 '13 at 22:56

This is totally correct, and answers the question posed in the title. It does not however answer the actual question, which is based on a misrepresentation of Microsoft using Unicode to refer to UTF-16. – Mark Ransom Feb 13 '14 at 14:07

Feel relaxed after finding this. Thanks vikas – Ramyavjr Mar 2 '14 at 14:56

most editors support save as ‘Unicode’ encoding actually.

This is an unfortunate misnaming perpetrated by Windows.

Because Windows uses UTF-16LE encoding internally as the memory storage format for Unicode strings, it considers this to be the natural encoding of Unicode text. In the Windows world, there are ANSI strings (the system codepage on the current machine, subject to total unportability) and there are Unicode strings (stored internally as UTF-16LE).

This was all devised in the early days of Unicode, before we realised that UCS-2 wasn't enough, and before UTF-8 was invented. This is why Windows's support for UTF-8 is all-round poor.
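
To see why a fixed 16-bit unit like UCS-2 is not enough, here is a small Python sketch (my own illustration, with U+1F600 picked as an arbitrary example of a code point above U+FFFF). UTF-16 has to spend two 16-bit units, a surrogate pair, on such a character.

```python
# U+1F600 lies outside the 16-bit range that UCS-2 could address.
ch = chr(0x1F600)

# Split the UTF-16LE bytes into 16-bit code units.
utf16 = ch.encode("utf-16-le")
units = [int.from_bytes(utf16[i:i + 2], "little") for i in range(0, len(utf16), 2)]
unit_names = [f"{u:04X}" for u in units]
print(unit_names)  # ['D83D', 'DE00']

# The same character takes four bytes in UTF-8.
utf8_len = len(ch.encode("utf-8"))
print(utf8_len)  # 4
```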

This misguided naming scheme became part of the user interface. A text editor that uses Windows's encoding support to provide a range of encodings will automatically and inappropriately describe UTF-16LE as “Unicode”, and UTF-16BE, if provided, as “Unicode big-endian”.

(Other editors that do encodings themselves, like Notepad++, don't have this problem.)
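
A quick way to see what those editor labels actually mean is to encode the same text with each encoding and compare the bytes. The sketch below is an illustration in Python, not anything Windows-specific; `hex_bytes` is a hypothetical helper defined here only for display.

```python
def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated hex pairs (helper for this example)."""
    return " ".join(f"{b:02x}" for b in data)

text = "A"  # code point U+0041

utf8 = hex_bytes(text.encode("utf-8"))           # what "UTF-8" writes
utf16_le = hex_bytes(text.encode("utf-16-le"))   # what Windows labels "Unicode"
utf16_be = hex_bytes(text.encode("utf-16-be"))   # "Unicode big-endian"

print(utf8)      # 41
print(utf16_le)  # 41 00
print(utf16_be)  # 00 41
```

The character and its code point are the same in all three cases; only the byte layout differs. Editors typically also write a byte order mark at the start of a UTF-16 file, which this sketch leaves out.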

If it makes you feel any better about it, ‘ANSI’ strings aren't based on any ANSI standard, either.
