Posted on 2009-8-24 16:44:22
Post by d00m3d;2019313
Chinese text, locales and the like are my weak spot, so a humble question: what is the difference between
export LANG=C
and
export LANG=en_US.UTF8
?
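You're probably the only one who has ever thought about this. I normally only switch between Chinese and English, and since Linux defaults to UTF-8 these days, all I know is en_US.UTF8 and zh_CN.UTF8.
I just looked it up on the wiki: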
The UTF-8 encoding is variable-width, ranging from 1-4 bytes. Each byte has 0-4 leading 1 bits followed by a zero bit to indicate its type. N 1 bits indicates the first byte in a N-byte sequence, with the exception that zero 1 bits indicates a one-byte sequence while one 1 bit indicates a continuation byte in a multi-byte sequence (this was done for ASCII compatibility). The scalar value of the Unicode code point is the concatenation of the non-control bits. In this table, zeroes and ones represent control bits, xs represent the lowest 8 bits of the Unicode value, ys represent the next higher 8 bits, and zs represent the bits higher than that.
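The gist is that UTF-8 is a variable-width encoding that uses 1 to 4 bytes per character, so it supports international characters much better. Earlier character sets were built around the English alphabet and had no room to grow, so they could not handle displaying and processing text in an internationalized way.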
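To make the leading-bit rule from the wiki excerpt concrete, here is a small Python sketch. The helper names `leading_ones` and `classify` are made up for this post, and the four sample characters are just ones I picked to cover the 1, 2, 3 and 4 byte cases; treat it as an illustration rather than a proper UTF-8 validator.

```python
def leading_ones(byte):
    """Count the 1 bits at the top of a byte, stopping at the first 0 bit."""
    count = 0
    mask = 0x80
    while mask and (byte & mask):
        count += 1
        mask >>= 1
    return count


def classify(byte):
    """Describe a byte's role according to its number of leading 1 bits."""
    n = leading_ones(byte)
    if n == 0:
        return "one-byte sequence (ASCII)"
    if n == 1:
        return "continuation byte"
    return f"first byte of a {n}-byte sequence"


# Hypothetical sample characters: 'A', e-acute, the Chinese character
# for "middle", and an emoji, written as escapes so the file stays ASCII.
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s)")
    for b in encoded:
        print(f"    {b:08b}  {classify(b)}")
```

Running it prints every byte in binary next to its role, so you can see that only the first byte of a sequence says how long the sequence is, and every following byte starts with `10`.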