php baseconvert,mb_convert

User comments:

[#1]

Daniel [2015-11-17 16:25:39]

If you are converting "UTF-8" text to "ISO-8859-1" and the result always comes back as "ASCII", place the following line of code before the mb_convert_encoding call:

mb_detect_order(array('UTF-8', 'ISO-8859-1'));

It is necessary to force a specific detection order for the conversion to work.
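
As a minimal sketch of an alternative, assuming the input is already known to be UTF-8: naming the source encoding explicitly means no detection happens at all. The helper name `utf8_to_latin1` is made up for this example.

```php
<?php
// Hypothetical helper: convert a string known to be UTF-8 to ISO-8859-1.
function utf8_to_latin1(string $text): string
{
    // With an explicit source encoding, the detection order is never consulted.
    return mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');
}

$latin1 = utf8_to_latin1("caf\u{E9}");   // "café" written as UTF-8
echo bin2hex($latin1), "\n";             // 636166e9
```

The last byte `e9` is the single-byte ISO-8859-1 form of "é".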

[#2]

jackycms at outlook dot com [2014-07-19 02:48:44]

// mb_convert_encoding($input, 'UTF-8', 'windows-874'); fails with: Illegal character encoding specified

// so to convert Thai to UTF-8 it is better to use iconv instead

iconv("windows-874", "UTF-8", $input);
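
The fallback idea can be sketched as a small wrapper, assuming both the mbstring and iconv extensions are loaded. `convert_to_utf8` is a hypothetical name, and which encodings iconv accepts depends on the platform's iconv library.

```php
<?php
// Hypothetical helper: use mbstring when it lists the source encoding,
// otherwise hand the job to iconv.
function convert_to_utf8(string $input, string $from): string
{
    // Compare case-insensitively against the names mbstring reports.
    $known = array_map('strtolower', mb_list_encodings());
    if (in_array(strtolower($from), $known, true)) {
        return mb_convert_encoding($input, 'UTF-8', $from);
    }
    $out = iconv($from, 'UTF-8', $input);
    if ($out === false) {
        throw new RuntimeException("Cannot convert from $from");
    }
    return $out;
}

echo convert_to_utf8("\xE9", 'ISO-8859-1'), "\n";   // handled by mbstring
```

Note that `mb_list_encodings()` returns canonical names only, so an alias that mbstring would accept may still be routed to iconv in this sketch.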

[#3]

DanielAbbey at Hotmail dot co dot uk [2014-03-05 16:30:52]

When using the Windows Notepad text editor, it is important to note that when you select 'Save As' there is an Encoding selection dropdown. The default encoding is set to ANSI, with the other two options being Unicode and UTF-8. Since most text on the web is in UTF-8 format it could prove vital to save the .txt file with this encoding, since this function does not work on ANSI-encoded text.

[#4]

josip at cubrad dot com [2013-06-27 23:24:17]

For my last project I needed to convert several CSV files from Windows-1250 to UTF-8. After several days of searching around I found a function that partially solved my problem, but it still did not transform all the characters. So I made this:

function w1250_to_utf8($text) {
    // map based on:
    // http://konfiguracja.c0.pl/iso02vscp1250en.html
    // http://konfiguracja.c0.pl/webpl/index_en.html#examp
    // http://www.htmlentities.com/html/entities/
    $map = array(
        // bytes that sit at different positions in Windows-1250 and ISO-8859-2
        chr(0x8A) => chr(0xA9),
        chr(0x8C) => chr(0xA6),
        chr(0x8D) => chr(0xAB),
        chr(0x8E) => chr(0xAE),
        chr(0x8F) => chr(0xAC),
        chr(0x9C) => chr(0xB6),
        chr(0x9D) => chr(0xBB),
        chr(0xA1) => chr(0xB7),
        chr(0xA5) => chr(0xA1),
        chr(0xBC) => chr(0xA5),
        chr(0x9F) => chr(0xBC),
        chr(0xB9) => chr(0xB1),
        chr(0x9A) => chr(0xB9),
        chr(0xBE) => chr(0xB5),
        chr(0x9E) => chr(0xBE),
        // characters ISO-8859-2 lacks or places elsewhere; they travel as
        // HTML entities and are decoded by html_entity_decode() at the end
        chr(0x80) => '&euro;',
        chr(0x82) => '&sbquo;',
        chr(0x84) => '&bdquo;',
        chr(0x85) => '&hellip;',
        chr(0x86) => '&dagger;',
        chr(0x87) => '&Dagger;',
        chr(0x89) => '&permil;',
        chr(0x8B) => '&lsaquo;',
        chr(0x91) => '&lsquo;',
        chr(0x92) => '&rsquo;',
        chr(0x93) => '&ldquo;',
        chr(0x94) => '&rdquo;',
        chr(0x95) => '&bull;',
        chr(0x96) => '&ndash;',
        chr(0x97) => '&mdash;',
        chr(0x99) => '&trade;',
        chr(0x9B) => '&rsaquo;', // 0x9B is the single right angle quote in Windows-1250
        chr(0xA6) => '&brvbar;',
        chr(0xA9) => '&copy;',
        chr(0xAB) => '&laquo;',
        chr(0xAE) => '&reg;',
        chr(0xB1) => '&plusmn;',
        chr(0xB5) => '&micro;',
        chr(0xB6) => '&para;',
        chr(0xB7) => '&middot;',
        chr(0xBB) => '&raquo;',
    );
    return html_entity_decode(mb_convert_encoding(strtr($text, $map), 'UTF-8', 'ISO-8859-2'), ENT_QUOTES, 'UTF-8');
}

[#5]

urko at wegetit dot eu [2012-09-11 18:17:29]

If you are trying to generate a CSV (with extended chars) to be opened in Excel for Mac, the only thing that worked for me was:

I also tried this:

But the first one didn't show extended chars correctly, and the second one didn't separate fields correctly.

[#6]

qdb at kukmara dot ru [2011-11-07 07:44:54]

mb_substr, and probably several other functions, work faster in UCS-2 than in UTF-8, and UTF-16 works slower than UTF-8. Here is a test: UCS-2 is nearly 50 times faster than UTF-8, and UTF-16 is nearly 6 times slower than UTF-8:

header('Content-Type: text/html; charset=utf-8');
mb_internal_encoding('utf-8');
$s = '?????????????????????2049??????????????????.??????????'
   . '?????????????????????034928348539857???????????????????????';
$s .= $s; $s .= $s; $s .= $s; $s .= $s; $s .= $s; $s .= $s; $s .= $s;

$t1 = microtime(true);
$i = 0;
while($i
if ($i == 10) echo $a . '. ';
//echo $a . '. ';
}
echo $i . '. ';
echo (microtime(true) - $t1);
echo "\n";

$s = mb_convert_encoding($s, 'UCS-2', 'utf8');
mb_internal_encoding('UCS-2');
$t1 = microtime(true);
$i = 0;
while($i
if ($i == 10) echo mb_convert_encoding($a, 'utf8', 'ucs2') . '. ';
//echo $a . '. ';
}
echo $i . '. ';
echo (microtime(true) - $t1);
echo "\n";

$s = mb_convert_encoding($s, 'utf-16', 'ucs-2');
mb_internal_encoding('utf-16');
$t1 = microtime(true);
$i = 0;
while($i
if ($i == 10) echo mb_convert_encoding($a, 'utf8', 'utf-16') . '. ';
//echo $a . '. ';
}
echo $i . '. ';
echo (microtime(true) - $t1);

output:

???. 12416. 1.71738100052

???. 12416. 0.0211279392242

???. 12416. 11.2330229282
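
A comparable measurement can be sketched as follows. The loop shape and the two-character step are assumptions, and `time_mb_substr` is a hypothetical helper; it passes the encoding to each call rather than changing the internal encoding. Relative timings depend heavily on the PHP version, so the figures above should not be expected to reproduce.

```php
<?php
// Hypothetical timing helper: walk $text in two-character steps with
// mb_substr() and report how many steps were taken and how long they took.
function time_mb_substr(string $text, string $encoding): array
{
    $length = mb_strlen($text, $encoding);
    $start = microtime(true);
    $steps = 0;
    for ($i = 0; $i < $length; $i += 2) {
        mb_substr($text, $i, 2, $encoding);
        $steps++;
    }
    return [$steps, microtime(true) - $start];
}

// Sample text: three Cyrillic letters and four digits, repeated 200 times.
$utf8 = str_repeat("\u{430}\u{431}\u{432}1234", 200);
foreach (['UTF-8', 'UCS-2', 'UTF-16'] as $enc) {
    $encoded = mb_convert_encoding($utf8, $enc, 'UTF-8');
    [$steps, $seconds] = time_mb_substr($encoded, $enc);
    printf("%-6s %d steps in %.5f s\n", $enc, $steps, $seconds);
}
```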

[#7]

gullevek at gullevek dot org [2010-08-25 00:27:44]

If you want to convert Japanese to ISO-2022-JP, it is highly recommended to use ISO-2022-JP-MS as the target encoding instead. It includes the extended character set and avoids ? substitutions in the text. For example, the often-used "1 in a circle" character (①) will then be converted correctly.

[#8]

regrunge at hotmail dot it [2010-05-14 08:00:29]

I was trying to find the charset of a Norwegian txt file (with a lot of æ, ø, å) written on a Mac, and I found it this way:

$text = "A strange string to pass, maybe with some æ, ø, å characters.";

foreach (mb_list_encodings() as $chr) {
    echo mb_convert_encoding($text, 'UTF-8', $chr) . " : " . $chr . "\n";
}

The line that looks right tells you the encoding the file was written in.

Hope this can help someone.
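
A narrower variant of the same trial-and-error idea, as a sketch: decode the bytes under a short list of likely encodings instead of everything `mb_list_encodings()` returns, which also contains non-text entries such as BASE64. The helper name, the candidate list and the sample bytes are assumptions for illustration.

```php
<?php
// Hypothetical helper: decode $bytes under each candidate encoding so the
// results can be compared by eye.
function decode_candidates(string $bytes, array $candidates): array
{
    $results = [];
    foreach ($candidates as $encoding) {
        $results[$encoding] = mb_convert_encoding($bytes, 'UTF-8', $encoding);
    }
    return $results;
}

$bytes = "bl\xE5b\xE6r";   // "blåbær" as stored in ISO-8859-1
$candidates = ['ISO-8859-1', 'Windows-1252', 'ISO-8859-2'];
foreach (decode_candidates($bytes, $candidates) as $name => $text) {
    echo $name, ' : ', $text, "\n";
}
```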

[#9]

Daniel Trebbien [2009-07-23 11:25:38]

Note that `mb_convert_encoding($val, 'HTML-ENTITIES')` does not escape '\'', '"', '<', '>', or '&'.
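
One way to get both behaviours, sketched here with a swapped-in technique: escape the markup characters with `htmlspecialchars()`, then turn non-ASCII characters into numeric entities with `mb_encode_numericentity()`. Unlike the 'HTML-ENTITIES' target this produces numeric rather than named entities. The helper name `to_ascii_html` is hypothetical and the input is assumed to be UTF-8.

```php
<?php
// Hypothetical helper: escape markup characters first, then convert every
// non-ASCII character into a decimal numeric entity.
function to_ascii_html(string $utf8): string
{
    $escaped = htmlspecialchars($utf8, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    // Conversion map: code points 0x80..0x10FFFF, offset 0, mask 0x1FFFFF.
    return mb_encode_numericentity($escaped, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
}

echo to_ascii_html("<b>caf\u{E9}</b> & \"tea\""), "\n";
// &lt;b&gt;caf&#233;&lt;/b&gt; &amp; &quot;tea&quot;
```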

[#10]

me at gsnedders dot com [2009-06-18 15:06:42]

It appears that when dealing with an unknown "from encoding" the function will both throw an E_WARNING and proceed to convert the string from ISO-8859-1 to the "to encoding".

[#11]

chzhang at gmail dot com [2009-01-05 00:34:31]

Instead of ini_set(), you can try this:

mb_substitute_character("none");

[#12]

francois at bonzon point com [2008-11-10 17:05:38]

aaron, to discard unsupported characters instead of printing a ?, you might as well simply set the configuration directive:

mbstring.substitute_character = "none"

in your php.ini. Be sure to include the quotes around none. Or set it at run time with

ini_set('mbstring.substitute_character', "none");
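
A short sketch of the effect, using the euro sign as a character that ISO-8859-1 cannot represent. The sample string is an assumption; the previous setting is saved and restored so the rest of a script is unaffected.

```php
<?php
// Remember the current setting so it can be restored afterwards.
$previous = mb_substitute_character();

// Drop characters the target encoding cannot represent.
mb_substitute_character('none');
$dropped = mb_convert_encoding("5 \u{20AC} only", 'ISO-8859-1', 'UTF-8');

// Substitute "*" (code point 0x2A) instead of dropping.
mb_substitute_character(0x2A);
$starred = mb_convert_encoding("5 \u{20AC} only", 'ISO-8859-1', 'UTF-8');

mb_substitute_character($previous);

echo $dropped, "\n";   // "5  only" (the euro sign is gone)
echo $starred, "\n";   // "5 * only"
```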

[#13]

aaron at aarongough dot com [2008-11-07 08:24:46]

My solution below was slightly incorrect, so here is the correct version (I posted at the end of a long day, never a good idea!)

Again, this is a quick and dirty solution to stop mb_convert_encoding from filling your string with question marks whenever it encounters an illegal character for the target encoding.

function convert_to($source, $target_encoding) // function name assumed; the original header line was lost
{
    // detect the character encoding of the incoming file
    $encoding = mb_detect_encoding($source, "auto");

    // escape all of the question marks so we can remove artifacts from
    // the unicode conversion process
    $target = str_replace("?", "[question_mark]", $source);

    // convert the string to the target encoding
    $target = mb_convert_encoding($target, $target_encoding, $encoding);

    // remove any question marks that have been introduced because of illegal characters
    $target = str_replace("?", "", $target);

    // replace the token string "[question_mark]" with the symbol "?"
    $target = str_replace("[question_mark]", "?", $target);

    return $target;
}

Hope this helps someone! (Admins should feel free to delete my previous, incorrect, post for clarity)

-A

[#14]

Edward [2008-09-16 03:54:55]

If mb_convert_encoding doesn't work for you, and iconv gives you a headache, you might be interested in this free class I found. It can convert almost any charset to almost any other charset. I think it's wonderful and I wish I had found it earlier. It would have saved me tons of headache.

I use it as a fail-safe, in case mb_convert_encoding is not installed. Download it from http://mikolajj.republika.pl/

This is not my own library, so technically it's not spamming, right? ;)

Hope this helps.

[#15]

StigC [2008-08-13 15:38:47]

For the php-noobs (like me) - working with flash and php.

Here's a simple snippet of code that worked great for me, getting php to show special Danish characters, from a Flash email form:

[#16]

nospam at nihonbunka dot com [2008-05-15 18:51:34]

rodrigo at bb2 dot co dot jp wrote that iconv works better than mb_convert_encoding. I find that when converting from UTF-8 to Shift_JIS,

$conv_str = mb_convert_encoding($str,$toCS,$fromCS);

works while

$conv_str = iconv($fromCS,$toCS.'//IGNORE',$str);

removes tildes from $str.

[#17]

katzlbtjunk at hotmail dot com [2008-01-25 04:36:30]

Clean a string for use as a filename by simply replacing all unwanted characters with underscores (ASCII converts to 7bit). It removes slightly more chars than necessary. Hope it's useful.

$fileName = 'Test:!"$%&/()=???????<

echo strtr(mb_convert_encoding($fileName,'ASCII'),

' ,;:?*#!??$%&/(){}<>=`?|\\\'"',

'____________________________');
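
A related sketch that uses an allow-list instead of listing the unwanted characters. This is a swapped-in technique, not the code above: `safe_filename` is a hypothetical helper, the input is assumed to be UTF-8, and the set of permitted characters is an assumption to adjust as needed.

```php
<?php
// Hypothetical helper: reduce a UTF-8 name to a conservative ASCII filename.
function safe_filename(string $name): string
{
    // Non-ASCII characters become the substitute character ("?" by default).
    $ascii = mb_convert_encoding($name, 'ASCII', 'UTF-8');
    // Keep letters, digits, dot and hyphen; replace everything else.
    return preg_replace('/[^A-Za-z0-9.\-]/', '_', $ascii);
}

echo safe_filename('Test:!"$%&/()=.txt'), "\n";
```

Converting to ASCII first means each non-ASCII character collapses to a single placeholder before the replacement, rather than one underscore per byte.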

[#18]

rodrigo at bb2 dot co dot jp [2008-01-15 03:47:52]

For those who can't use mb_convert_encoding() to convert from one charset to another because of an older version of PHP, try iconv().

I had this problem converting to japanese charset:

$txt=mb_convert_encoding($txt,'SJIS',$this->encode);

And I could fix it by using this:

$txt = iconv('UTF-8', 'SJIS', $txt);

Maybe it's helpful for someone else! ;)

[#19]

mightye at gmail dot com [2007-11-13 09:24:48]

To petruzanauticoyahoo?com!ar

If you don't specify a source encoding, then it assumes the internal (default) encoding. ? is a multi-byte character whose bytes in your configuration default (often iso-8859-1) would actually mean ???. mb_convert_encoding() is upgrading those characters to their multi-byte equivalents within UTF-8.

Try this instead:

Of course this function does no work (for the most part - it can actually be used to strip characters which are not valid for UTF-8).

[#20]

volker at machon dot biz [2007-09-24 21:05:34]

Hey guys. For everybody who's looking for a function that converts an ISO string to UTF-8 or a UTF-8 string to ISO, here's your solution:

public function encodeToUtf8($string) {

return mb_convert_encoding($string, "UTF-8", mb_detect_encoding($string, "UTF-8, ISO-8859-1, ISO-8859-15", true));

}

public function encodeToIso($string) {

return mb_convert_encoding($string, "ISO-8859-1", mb_detect_encoding($string, "UTF-8, ISO-8859-1, ISO-8859-15", true));

}

For me these functions are working fine. Give it a try
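
A variant of the same idea, sketched with `mb_check_encoding()` instead of detection: input that validates as UTF-8 is left alone, and anything else is assumed to be ISO-8859-1. That assumption and the helper name `ensure_utf8` are illustrative only.

```php
<?php
// Hypothetical helper: return $text as UTF-8, assuming that anything which
// is not valid UTF-8 is ISO-8859-1.
function ensure_utf8(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;   // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

echo bin2hex(ensure_utf8("\xE9")), "\n";       // c3a9 (converted)
echo bin2hex(ensure_utf8("\u{E9}")), "\n";     // c3a9 (unchanged)
```

This avoids converting an already valid UTF-8 string a second time.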

[#21]

aofg [2007-08-21 18:49:55]

When converting Japanese strings to ISO-2022-JP or JIS on PHP >= 5.2.1, you can use "ISO-2022-JP-MS" instead.

Kishu-Izon (platform-dependent) characters are converted correctly with this encoding, just as they are with eucJP-win or SJIS-win.

[#22]

David Hull [2006-12-20 10:52:40]

As an alternative to Johannes's suggestion for converting strings from other character sets to a 7bit representation while not just deleting latin diacritics, you might try this:

$text = iconv($from_enc, 'US-ASCII//TRANSLIT', $text);

The only disadvantage is that it does not convert "ä" to "ae", but it handles punctuation and other special characters better.

David

[#23]

phpdoc at jeudi dot de [2006-09-05 06:46:41]

I'd like to share some code to convert latin diacritics to their traditional 7bit representation, like, for example,

- à,ç,é,î,... to a,c,e,i,...

- ß to ss

- ä,Ä,... to ae,Ae,...

- ë,... to e,...

(mb_convert "7bit" would simply delete any offending characters).

I might have missed your country's typographic conventions; correct me then.

<?php

function to7bit($text,$from_enc) {
    // after this step every non-ASCII character is an HTML entity
    $text = mb_convert_encoding($text,'HTML-ENTITIES',$from_enc);
    $text = preg_replace(
        array('/&szlig;/','/&(..)lig;/',
            '/&([aouAOU])uml;/','/&(.)[^;]*;/'),
        array('ss',"$1","$1".'e',"$1"),
        $text);
    return $text;
}

Enjoy :-)

Johannes

[EDIT BY danbrown AT php DOT net: Author provided the following update on 27-FEB-2012.]

An addendum to my "to7bit" function referenced below in the notes.

The function is supposed to solve the problem that some languages require a different 7bit rendering of special (umlauted) characters for sorting or other applications. For example, the German ß ligature is usually written "ss" in 7bit context. Dutch ÿ is typically rendered "ij" (not "y").

The original function works well with word (alphabet) character entities and I've seen it used in many places. But non-word entities cause funny results:

The following version fixes this by converting non-alphanumeric characters (also chains thereof) to '_'.

<?php

function to7bit($text,$from_enc) {
    $text = preg_replace('/\W+/','_',$text);
    $text = mb_convert_encoding($text,'HTML-ENTITIES',$from_enc);
    $text = preg_replace(
        array('/&szlig;/','/&(..)lig;/',
            '/&([aouAOU])uml;/','/&yuml;/','/&(.)[^;]*;/'),
        array('ss',"$1","$1".'e','ij',"$1"),
        $text);
    return $text;
}

Enjoy again,

Johannes

[#24]

mac.com@nemo [2006-07-08 07:38:47]

For those wanting to convert from $set to MacRoman, use iconv():

$string = iconv('UTF-8', 'macintosh', $string);

('macintosh' is the IANA name for the MacRoman character set.)

[#25]

eion at bigfoot dot com [2006-02-20 16:54:52]

Many people below talk about using

mb_convert_encoding($s,'HTML-ENTITIES','UTF-8');

to convert non-ASCII characters into HTML-readable text. Because my web server was out of my control, I could not set the database character set, and whenever PHP made a copy of my $s variable that it had pulled out of the database, it would automatically convert it to nasty latin1 instead of leaving it in its beautiful UTF-8 glory.

So [insert korean characters here] turned into ?????.

I found myself needing to pass by reference (call-time pass-by-reference is of course deprecated or nonexistent in recent versions of PHP), so instead of

mb_convert_encoding(&$s,'HTML-ENTITIES','UTF-8');

which worked perfectly until I upgraded, I had to use

call_user_func_array('mb_convert_encoding', array(&$s,'HTML-ENTITIES','UTF-8'));

Hope it helps someone else out

[#26]

Tom Class [2005-11-11 07:35:53]

Why did you use the PHP HTML encode functions? mbstring has its own encoding which is (as far as I tested it) much more useful:

HTML-ENTITIES

Example:

$text = mb_convert_encoding($text, 'HTML-ENTITIES', "UTF-8");

[#27]

Stephan van der Feest [2005-09-09 04:47:41]

To add to the Flash conversion comment below, here's how I convert back from what I've stored in a database after converting from Flash HTML text field output, in order to load it back into a Flash HTML text field:

function htmltoflash($htmlstr)
{
    return str_replace("&lt;br /&gt;", "\n",
        str_replace("<", "&lt;",
            str_replace(">", "&gt;",
                mb_convert_encoding(html_entity_decode($htmlstr),
                    "UTF-8", "ISO-8859-1"))));
}

[#28]

Stephan van der Feest [2005-09-09 03:50:54]

Here's a tip for anyone using Flash and PHP for storing HTML output submitted from a Flash text field in a database or whatever.

Flash submits its HTML special characters in UTF-8, so you can use the following function to convert those into HTML entity characters:

function utf8html($utf8str)

{

return htmlentities(mb_convert_encoding($utf8str,"ISO-8859-1","UTF-8"));

}

[#29]

jamespilcher1 - hotmail [2004-02-01 19:55:57]

be careful when converting from iso-8859-1 to utf-8.

even if you explicitly specify the character encoding of a page as iso-8859-1(via headers and strict xml defs), windows 2000 will ignore that and interpret it as whatever character set it has natively installed.

for example, i wrote char #128 into a page, with char encoding iso-8859-1, and it displayed in internet explorer (& mozilla) as a euro symbol.

it should have displayed a box, denoting that char #128 is undefined in iso-8859-1. The problem was it was displaying in "Windows: western europe" (my native character set).

this led to confusion when i tried to convert this euro to UTF-8 via mb_convert_encoding()

IE displays UTF-8 correctly- and because PHP correctly converted #128 into a box in UTF-8, IE would show a box.

so all i saw was mb_convert_encoding() converting a euro symbol into a box. It took me a long time to figure out what was going on.
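
The difference can be seen directly by decoding the same byte under both encodings. A minimal sketch; the variable names are arbitrary:

```php
<?php
$byte = "\x80";   // char #128

// ISO-8859-1 maps 0x80 to the control character U+0080.
$asLatin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');

// Windows-1252 maps 0x80 to the euro sign U+20AC.
$asCp1252 = mb_convert_encoding($byte, 'UTF-8', 'Windows-1252');

echo bin2hex($asLatin1), "\n";   // c280
echo bin2hex($asCp1252), "\n";   // e282ac
```

If the text was really typed on a Windows system, naming 'Windows-1252' as the source encoding gives the euro sign the author expected.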

[#30]

lanka at eurocom dot od dot ua [2003-02-07 08:03:56]

Another sample of recoding without MultiByte enabling.

(Russian koi->win, if input in win-encoding already, function recode() returns unchanged string)

// 1 - koi
function detect_encoding($str) {
    $win = 0;
    $koi = 0;
    for ($i = 0; $i < strlen($str); $i++) {
        if (ord($str[$i]) > 224 && ord($str[$i]) < 255) $win++;
        if (ord($str[$i]) > 192 && ord($str[$i]) < 223) $koi++;
    }
    if ($win < $koi) {
        return 1;
    } else return 0;
}

// recodes koi to win
function koi_to_win($string) {
    $kw = array(
        128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,
        144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,
        160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,
        176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,
        254,224,225,246,228,229,244,227,245,232,233,234,235,236,237,238,
        239,255,240,241,242,243,230,226,252,251,231,248,253,249,247,250,
        222,192,193,214,196,197,212,195,213,200,201,202,203,204,205,206,
        207,223,208,209,210,211,198,194,220,219,199,216,221,217,215,218);
    $wk = array(
        128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,
        144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,
        160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,
        176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,
        225,226,247,231,228,229,246,250,233,234,235,236,237,238,239,240,
        242,243,244,245,230,232,227,254,251,253,255,249,248,252,224,241,
        193,194,215,199,196,197,214,218,201,202,203,204,205,206,207,208,
        210,211,212,213,198,200,195,222,219,221,223,217,216,220,192,209);
    $end = strlen($string);
    $pos = 0;
    do {
        $c = ord($string[$pos]);
        if ($c > 128) {
            $string[$pos] = chr($kw[$c - 128]);
        }
    } while (++$pos < $end);
    return $string;
}

function recode($str) {
    $enc = detect_encoding($str);
    if ($enc == 1) {
        $str = koi_to_win($str);
    }
    return $str;
}
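
For comparison, where the mbstring extension is available the same KOI8-R to Windows-1251 recoding is a single call. This is a sketch of that alternative rather than part of the note above; `koi8r_to_cp1251` is a hypothetical wrapper name.

```php
<?php
// Hypothetical helper: recode KOI8-R bytes to Windows-1251 with mbstring.
function koi8r_to_cp1251(string $koi): string
{
    return mb_convert_encoding($koi, 'Windows-1251', 'KOI8-R');
}

// 0xC1 and 0xC0 are two Cyrillic lowercase letters in KOI8-R.
echo bin2hex(koi8r_to_cp1251("\xC1\xC0")), "\n";   // e0fe
```

The result agrees with the lookup table: entries 65 and 64 of the table (bytes 0xC1 and 0xC0) are 224 and 254, that is 0xE0 and 0xFE.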
