How many bytes does a char take in Java?

Source: Internet  Published by: 厦门市大数据管理中心  Editor: 程序博客网  Time: 2024/06/11 22:26

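I used to assume that a char takes one byte, but it turns out to be less simple than that. In memory, a Java char is always one 16-bit UTF-16 code unit (2 bytes); what varies is the number of bytes a character occupies once it is encoded, and that depends on the encoding. In UTF-8, an English (ASCII) letter takes 1 byte and a common Chinese character takes 3 bytes. Here is the result of a test run on Ubuntu: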

    import java.io.IOException;

    public class Utf8LengthTest {
        public static void main(String[] args) throws IOException {
            String chnStr = "中文";
            // Number of bytes produced when the string is encoded as UTF-8
            System.out.println("length of two Chinese character: " + chnStr.getBytes("UTF-8").length);
            String engStr = "en";
            System.out.println("length of two English character: " + engStr.getBytes("UTF-8").length);
        }
    }

Output:

    length of two Chinese character: 6
    length of two English character: 2
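
The byte counts above describe encoded output, not the char type itself. As a minimal sketch (the class name CharSizeDemo and the helper encodedLength are hypothetical names chosen for illustration), the following contrasts the fixed in-memory size of a char with the UTF-8 length of the same two strings. It assumes Java 8 or later for Character.BYTES and uses StandardCharsets, which avoids the checked exception thrown by the string-named getBytes overload.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharSizeDemo {
    // Hypothetical helper: number of bytes produced when s is encoded with cs.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        // A char is always one 16-bit UTF-16 code unit in memory.
        System.out.println("Character.BYTES = " + Character.BYTES);

        // The encoded size depends on the charset and on the character.
        String eng = "en";
        String chn = "\u4e2d\u6587"; // the two Chinese characters from the test above, as escapes
        System.out.println("UTF-8 bytes, two English letters:    " + encodedLength(eng, StandardCharsets.UTF_8));
        System.out.println("UTF-8 bytes, two Chinese characters: " + encodedLength(chn, StandardCharsets.UTF_8));
    }
}
```

Character.BYTES is 2 regardless of which characters are stored, while the two UTF-8 lengths come out as 2 and 6, matching the output above.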

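An interview question that circulates online asks: how many bytes does one Chinese char take in Java? The answer is "it depends" (2, 3, or 4), because the size is determined by the encoding. The following program demonstrates this: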

    import java.io.IOException;

    public class EncodingLengthTest {
        public static void main(String[] args) throws IOException {
            String chnStr = "华";
            System.out.println("length of one Chinese character in gbk: " + chnStr.getBytes("GBK").length);
            System.out.println("length of one Chinese character in UTF-8: " + chnStr.getBytes("UTF-8").length);
            // On common JDKs "UNICODE" is an alias of UTF-16, whose encoder
            // prepends a 2-byte byte order mark to the encoded data.
            System.out.println("length of one Chinese character in Unicode: " + chnStr.getBytes("UNICODE").length);
        }
    }

Output:

    length of one Chinese character in gbk: 2
    length of one Chinese character in UTF-8: 3
    length of one Chinese character in Unicode: 4
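
The 4 bytes reported for "Unicode" deserve a closer look. Java's UTF-16 charset writes a big-endian byte order mark (FE FF) before the data when encoding, so 2 of those 4 bytes are the mark and the character itself still takes 2. The sketch below dumps the actual bytes to show this; the class name BomDemo and the helper hex are hypothetical names for illustration, and the sketch uses the UTF-16 constants from StandardCharsets rather than the "UNICODE" alias.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomDemo {
    // Hypothetical helper: render the encoded bytes as space-separated hex.
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String hua = "\u534E"; // the character used in the test above (U+534E)
        System.out.println("UTF-16   : " + hex(hua, StandardCharsets.UTF_16));   // byte order mark + data
        System.out.println("UTF-16BE : " + hex(hua, StandardCharsets.UTF_16BE)); // data only
        System.out.println("UTF-16LE : " + hex(hua, StandardCharsets.UTF_16LE)); // data only, bytes swapped
        System.out.println("UTF-8    : " + hex(hua, StandardCharsets.UTF_8));
    }
}
```

The UTF-16 line shows FE FF 53 4E, while UTF-16BE shows only 53 4E. When the goal is to measure the character rather than the mark, UTF-16BE or UTF-16LE is the fairer comparison.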


Oracle's official documentation explains the representation of Unicode characters in Java as follows:

Unicode Character Representations

        The char data type (and therefore the value that a Character object encapsulates) are based on the original Unicode specification, which defined characters as fixed-width 16-bit entities. The Unicode Standard has since been changed to allow for characters whose representation requires more than 16 bits. The range of legal code points is now U+0000 to U+10FFFF, known as Unicode scalar value. (Refer to the definition of the U+n notation in the Unicode Standard.)
        The set of characters from U+0000 to U+FFFF is sometimes referred to as the Basic Multilingual Plane (BMP). Characters whose code points are greater than U+FFFF are called supplementary characters. The Java platform uses the UTF-16 representation in char arrays and in the String and StringBuffer classes. In this representation, supplementary characters are represented as a pair of char values, the first from the high-surrogates range, (\uD800-\uDBFF), the second from the low-surrogates range (\uDC00-\uDFFF).
        A char value, therefore, represents Basic Multilingual Plane (BMP) code points, including the surrogate code points, or code units of the UTF-16 encoding. An int value represents all Unicode code points, including supplementary code points. The lower (least significant) 21 bits of int are used to represent Unicode code points and the upper (most significant) 11 bits must be zero. Unless otherwise specified, the behavior with respect to supplementary characters and surrogate char values is as follows:

  • The methods that only accept a char value cannot support supplementary characters. They treat char values from the surrogate ranges as undefined characters. For example, Character.isLetter('\uD840') returns false, even though this specific value if followed by any low-surrogate value in a string would represent a letter.
  • The methods that accept an int value support all Unicode characters, including supplementary characters. For example, Character.isLetter(0x2F81A) returns true because the code point value represents a letter (a CJK ideograph).
        In the Java SE API documentation, Unicode code point is used for character values in the range between U+0000 and U+10FFFF, and Unicode code unit is used for 16-bit char values that are code units of the UTF-16 encoding. For more information on Unicode terminology, refer to the Unicode Glossary.
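
To make the quoted passage concrete, here is a small sketch built around the documentation's own example code point, U+2F81A. The class name SupplementaryDemo and the helper fromCodePoint are hypothetical names for illustration; every other call is from the standard Character and String classes.

```java
import java.nio.charset.StandardCharsets;

public class SupplementaryDemo {
    // Hypothetical helper: build a String holding exactly one code point.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        int codePoint = 0x2F81A; // supplementary CJK ideograph from the quoted text
        String s = fromCodePoint(codePoint);

        // One code point, but two char values (a surrogate pair).
        System.out.println("length()         = " + s.length());
        System.out.println("codePointCount() = " + s.codePointCount(0, s.length()));
        System.out.println("high surrogate?  = " + Character.isHighSurrogate(s.charAt(0)));
        System.out.println("low surrogate?   = " + Character.isLowSurrogate(s.charAt(1)));

        // char-based methods see half of the pair; int-based methods see the whole code point.
        System.out.println("isLetter(char)   = " + Character.isLetter(s.charAt(0)));
        System.out.println("isLetter(int)    = " + Character.isLetter(codePoint));

        // Encoded sizes of this single character.
        System.out.println("UTF-8 bytes      = " + s.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("UTF-16BE bytes   = " + s.getBytes(StandardCharsets.UTF_16BE).length);
    }
}
```

Here length() reports 2 while codePointCount reports 1, and the character takes 4 bytes in both UTF-8 and UTF-16BE. This is one way a single Chinese character can occupy 4 bytes without any byte order mark being involved.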

