Redis in Python: How Chinese-String Encoding Differs Between Python 2 and Python 3

Source: Internet | Published by: 考试经文知乎 | Editor: 程序博客网 | Time: 2024/06/02 12:40

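Preface:

When Python saves a Chinese string to Redis, the string is stored as a byte stream, and the value Python reads back from Redis is also a byte stream, so it has to be decoded before the Chinese text can be displayed. In Python 2 the value returned by Redis is of type str, while in Python 3 it is of type bytes, as the two sessions below show. They also show that the Chinese string "计算机" comes back as '\xbc\xc6\xcb\xe3\xbb\xfa' in Python 2 and as b'\xe8\xae\xa1\xe7\xae\x97\xe6\x9c\xba' in Python 3. The two versions encoded the string differently, so each value has to be decoded differently. (A Python 2 str literal is already a byte string in whatever encoding the console uses, here apparently GB2312, and it is sent to Redis unchanged; a Python 3 str is Unicode, which redis-py encodes as UTF-8 by default.)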

Python 2:

>>> import redis
>>> r = redis.Redis()
>>> r.set('test', '计算机')
True
>>> r.get('test')
'\xbc\xc6\xcb\xe3\xbb\xfa'
>>> a = r.get('test')
>>> type(a)
<type 'str'>

Python 3:

>>> import redis
>>> r = redis.Redis()
>>> r.set('foo', '计算机')
True
>>> r.get('foo')
b'\xe8\xae\xa1\xe7\xae\x97\xe6\x9c\xba'
>>> a = r.get('foo')
>>> type(a)
<class 'bytes'>
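The difference between the two byte strings can be reproduced without a Redis server. The following is a minimal sketch in Python 3 that encodes the same text with both codecs; the variable names are illustrative only, and it assumes the Python 2 session above ran in a GB2312 console.

```python
# The same three characters, encoded two ways.
text = "计算机"

gb_bytes = text.encode("gb2312")   # 2 bytes per character
utf8_bytes = text.encode("utf-8")  # 3 bytes per character

print(gb_bytes)
print(utf8_bytes)

# Each byte string only turns back into the original text
# when it is decoded with the codec that produced it.
same_text = gb_bytes.decode("gb2312") == utf8_bytes.decode("utf-8")
print(same_text)
```

The two results match the values returned by Redis in the Python 2 and Python 3 sessions respectively, which suggests the bytes differ only because the encoding applied before storage differs.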

Installation:

The chardet module is used below. Install it with:

pip install chardet

Decoding Chinese in Python 2:

In Python 2 the string returned by Redis is of type str, so it can be decoded with str.decode, here as GB2312.

>>> import redis
>>> r = redis.Redis()
>>> r.set('test', '我是中文字符串')
True
>>> a = r.get('test')
>>> t = a.decode('GB2312')
>>> t
u'\u6211\u662f\u4e2d\u6587\u5b57\u7b26\u4e32'
>>> print(t)
我是中文字符串

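So how do we know to decode as GB2312 rather than some other encoding? We can detect it with chardet, as in the code below. The return value of chardet.detect shows that the encoding is GB2312 with a confidence of 0.99.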

>>> import redis
>>> r = redis.Redis()
>>> r.set('test', '我是中文字符串')
True
>>> a = r.get('test')
>>> import chardet
>>> chardet.detect(a)
{'confidence': 0.99, 'encoding': 'GB2312'}

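A tip for using chardet.detect:

The longer the string passed to chardet.detect, the more accurate the result; the shorter, the less accurate. In the code below, the Chinese string short contains three characters and is detected as TIS-620 (whatever that is), with a confidence of only about 0.54. The Chinese string long contains 12 characters and is detected as GB2312 with a confidence of 0.99.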

>>> import redis, chardet
>>> r = redis.Redis()
>>> r.set('short', '我很短')
True
>>> r.set('long', '我是个很长的字符串！！！')
True
>>> short = r.get('short')
>>> long = r.get('long')
>>> chardet.detect(short)
{'confidence': 0.5397318180542452, 'encoding': 'TIS-620'}
>>> chardet.detect(long)
{'confidence': 0.99, 'encoding': 'GB2312'}
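When a value is too short for detection to be trusted, one possible alternative (not the approach used in this article) is to try a short list of likely encodings in order and keep the first one that decodes without error. The sketch below is written for Python 3 and uses only the standard library; the helper name decode_with_candidates and the candidate list are assumptions chosen for illustration.

```python
def decode_with_candidates(raw, candidates=("utf-8", "gb18030")):
    """Return (text, encoding) for the first candidate that decodes raw cleanly.

    Hypothetical helper: the candidate list is an assumption about which
    encodings the stored values may use.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # this candidate does not fit, try the next one
    raise ValueError("none of the candidate encodings could decode the value")


# The three-character string from the session above, in both encodings.
short_gb = "我很短".encode("gb2312")
short_utf8 = "我很短".encode("utf-8")

result_gb = decode_with_candidates(short_gb)
result_utf8 = decode_with_candidates(short_utf8)
print(result_gb)
print(result_utf8)
```

UTF-8 is tried first because it rejects most byte sequences that were not produced by a UTF-8 encoder, whereas the GB family accepts many arbitrary byte pairs. GB18030 is used as the second candidate because it is a superset of GB2312. The order matters, and the approach only works when the real encoding is in the list.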

Decoding Chinese in Python 3:

The same method shows that a Chinese string saved to Redis from Python 3 is encoded as utf-8, with a confidence of 0.99.

>>> import redis, chardet
>>> r = redis.Redis()
>>> r.set('foo', '喵小姐，我爱你！！！我想你了。')
True
>>> a = r.get('foo')
>>> chardet.detect(a)
{'language': '', 'encoding': 'utf-8', 'confidence': 0.99}

Once the encoding is known, the rest is easy. Continuing from the code above:

>>> t = a.decode('utf-8')
>>> t
'喵小姐，我爱你！！！我想你了。'

It can also be decoded like this:

>>> t = str(a, encoding='utf-8')
>>> t
'喵小姐，我爱你！！！我想你了。'

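Taking it further:

Everything above decodes within a single environment, either Python 2 only or Python 3 only. Can we save Chinese text in Python 2 and then decode it in Python 3, or the other way round?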

Let's first try saving in Python 2 and decoding in Python 3:

1. First run the following code in a Python 2 environment:

>>> # Py2
>>> import redis
>>> r = redis.Redis()
>>> r.set('test', '我是在Py2保存的中文字符串')
True

2. Then run the following code in Python 3. It fails spectacularly. Didn't the Python 3 section above say the encoding was utf-8?

>>> # Py3
>>> import redis
>>> r = redis.Redis()
>>> a = r.get('test')
>>> a.decode('utf-8')
Traceback (most recent call last):
  File "<pyshell#10>", line 1, in <module>
    a.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xce in position 0: invalid continuation byte

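3. We must have been opening it the wrong way. Indeed: we forgot to check the encoding before decoding. The code below still runs in Python 3, and this time it works: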

>>> # Py3
>>> import redis
>>> r = redis.Redis()
>>> a = r.get('test')
>>> a
b'\xce\xd2\xca\xc7\xd4\xdaPy2\xb1\xa3\xb4\xe6\xb5\xc4\xd6\xd0\xce\xc4\xd7\xd6\xb7\xfb\xb4\xae'
>>> import chardet
>>> chardet.detect(a)
{'language': 'Chinese', 'encoding': 'GB2312', 'confidence': 0.99}
>>> a.decode('GB2312')
'我是在Py2保存的中文字符串'
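The failure in step 2 can be examined more closely by catching the exception instead of letting it propagate. The sketch below, for Python 3, starts from the exact bytes shown in step 3 rather than from a live Redis connection (an assumption made so that it runs without a server), and prints where and why the utf-8 decode stops.

```python
# The bytes that Python 2 stored, copied from the session above.
py2_bytes = (b"\xce\xd2\xca\xc7\xd4\xdaPy2\xb1\xa3\xb4\xe6"
             b"\xb5\xc4\xd6\xd0\xce\xc4\xd7\xd6\xb7\xfb\xb4\xae")

error = None
try:
    py2_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
    # The exception records the codec, the offset of the bad byte and the reason.
    print(exc.encoding, exc.start, hex(py2_bytes[exc.start]), exc.reason)

# Decoding with the encoding that was actually used succeeds.
text = py2_bytes.decode("gb2312")
print(text)
```

The offset and byte value reported by the exception match the traceback in step 2: the very first byte, 0xce, already violates UTF-8, which is a strong hint that the value was never UTF-8 in the first place.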

Saving Chinese in Python 3 and decoding in Python 2:

1. First run the following code in Python 3:

>>> # Py3
>>> import redis
>>> r = redis.Redis()
>>> r.set('test', '我是在Py3保存的中文字符串')
True

2. Then run the following code in Python 2. Having learned the lesson above, we detect the encoding before decoding:

>>> # Py2
>>> import redis, chardet
>>> r = redis.Redis()
>>> a = r.get('test')
>>> chardet.detect(a)
{'confidence': 0.99, 'encoding': 'utf-8'}
>>> t = a.decode('utf-8')
>>> t
u'\u6211\u662f\u5728Py3\u4fdd\u5b58\u7684\u4e2d\u6587\u5b57\u7b26\u4e32'
>>> print(t)
我是在Py3保存的中文字符串

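Summary:

1. Before decoding a string returned by Redis, detect its encoding first, then decode with the detected encoding.

2. chardet.detect is more accurate the longer the string, and less accurate the shorter it is.

3. Decode with the same encoding that was used when saving. For example, a Chinese string saved from Python 2 here was GB2312-encoded, so Python 3 has to decode it as GB2312.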
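Point 3 suggests a way to sidestep detection altogether: encode the text explicitly before saving, so that the stored bytes no longer depend on the interpreter version or the console. The following is a minimal sketch of that idea, not something the sessions above do. A plain dict stands in for Redis so that it runs without a server, and the names fake_redis, STORAGE_ENCODING, save_text and load_text are hypothetical.

```python
# A plain dict stands in for Redis here (an assumption, so the sketch runs
# without a server); with redis-py the calls would be r.set and r.get.
fake_redis = {}

STORAGE_ENCODING = "utf-8"  # hypothetical project-wide convention


def save_text(key, text):
    """Encode explicitly so the stored bytes never depend on the console."""
    fake_redis[key] = text.encode(STORAGE_ENCODING)


def load_text(key):
    """Decode with the same encoding that save_text used."""
    return fake_redis[key].decode(STORAGE_ENCODING)


save_text("test", u"我是中文字符串")
stored = fake_redis["test"]
restored = load_text("test")
print(stored)
print(restored)
```

Because the text is a Unicode string (written with the u prefix, which both Python 2.x and Python 3.3+ accept) and is encoded by hand, writers and readers on either version would agree on the bytes, provided they share the same convention.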
