Posted on 2008-5-5 12:16:57
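Although the file sizes differ, the information content is the same. All one can say is that encoding pure Chinese text in UTF-8 carries more useless or redundant information, and removing exactly that kind of redundancy is what compression is for. That is the theory, at least; with the compression algorithms as actually implemented, the results should still differ somewhat. I ran a real test, with the following results: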
- -rw-rw-r-- 1 yun yun 81806 May 5 12:04 gb18030.txt
- -rw-rw-r-- 1 yun yun 118540 May 5 12:03 utf8.txt
- -rw------- 1 yun yun 40057 May 5 12:07 gb18030.txt.7z
- -rw------- 1 yun yun 43416 May 5 12:07 utf8.txt.7z
- -rw-rw-r-- 1 yun yun 39473 May 5 12:05 gb18030.txt.bz2
- -rw-rw-r-- 1 yun yun 39644 May 5 12:05 utf8.txt.bz2
- -rw-rw-r-- 1 yun yun 43352 May 5 12:04 gb18030.txt.gz
- -rw-rw-r-- 1 yun yun 49430 May 5 12:05 utf8.txt.gz
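To make the listing easier to compare, here is a small sketch that works out how much larger each UTF-8 file is than its GB18030 counterpart. The byte counts are copied from the listing above; the `sizes` table and the `overhead` helper are names made up for this illustration.

```python
# Byte counts taken from the ls listing above: (gb18030, utf8).
sizes = {
    "raw": (81806, 118540),
    "gz": (43352, 49430),
    "7z": (40057, 43416),
    "bz2": (39473, 39644),
}


def overhead(gb: int, utf8: int) -> float:
    """Return how much larger the UTF-8 file is, as a percentage of the GB18030 size."""
    return (utf8 - gb) / gb * 100


for name, (gb, utf8) in sizes.items():
    print(f"{name:>4}: gb18030={gb:>6}  utf8={utf8:>6}  utf8 larger by {overhead(gb, utf8):.1f}%")
```

On these numbers the gap shrinks from roughly 45% uncompressed to about 14% with gzip, about 8% with 7z, and under 1% with bzip2, which fits the point that compression removes most, but not all, of the encoding redundancy.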
The versions of the compression tools used were:
- [yun@localhost shm]$ gzip --version
- gzip 1.3.5
- (2002-09-30)
- Copyright 2002 Free Software Foundation
- Copyright 1992-1993 Jean-loup Gailly
- This program comes with ABSOLUTELY NO WARRANTY.
- You may redistribute copies of this program
- under the terms of the GNU General Public License.
- For more information about these matters, see the file named COPYING.
- Compilation options:
- DIRENT UTIME STDC_HEADERS HAVE_UNISTD_H HAVE_MEMORY_H HAVE_STRING_H HAVE_LSTAT
- Written by Jean-loup Gailly.
- [yun@localhost shm]$ bzip2 --version
- bzip2, a block-sorting file compressor. Version 1.0.2, 30-Dec-2001.
- Copyright (C) 1996-2002 by Julian Seward.
- This program is free software; you can redistribute it and/or modify
- it under the terms set out in the LICENSE file, which is included
- in the bzip2-1.0 source distribution.
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- LICENSE file for more details.
- bzip2: I won't write compressed data to a terminal.
- bzip2: For help, type: `bzip2 --help'.
- [yun@localhost shm]$ 7za
- 7-Zip (A) 4.57 Copyright (c) 1999-2007 Igor Pavlov 2007-12-06
- p7zip Version 4.57 (locale=zh_CN.UTF-8,Utf16=on,HugeFiles=on,1 CPU)
Both gzip and bzip2 were run with -9, and 7za with -mx=9 (explicitly specifying the dictionary size as well might give a better result).
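For anyone who wants to repeat a similar comparison without the command-line tools, the following is a rough sketch using Python's standard library. It is not the procedure used for the test above: `lzma` here writes an .xz stream, not a 7z archive, so it only approximates 7za's LZMA, and the numbers will not match the listing. The function name `compressed_sizes` and the `SAMPLE` text are placeholders for this illustration; a real file's contents should be substituted, since a string this short is too small to give meaningful compressed sizes.

```python
import bz2
import gzip
import lzma


def compressed_sizes(text: str) -> dict:
    """Encode text as GB18030 and UTF-8, then report raw and compressed byte counts."""
    results = {}
    for encoding in ("gb18030", "utf-8"):
        raw = text.encode(encoding)
        results[encoding] = {
            "raw": len(raw),
            # Level 9 mirrors the -9 switch used for gzip and bzip2.
            "gzip": len(gzip.compress(raw, compresslevel=9)),
            "bz2": len(bz2.compress(raw, compresslevel=9)),
            # Assumption: xz preset 9 stands in for 7za -mx=9; it is not the same container.
            "xz": len(lzma.compress(raw, preset=9)),
        }
    return results


# Placeholder sample of common Chinese characters; replace with real file contents.
SAMPLE = "文件大小尽管不一样但是信息量是一样的"

results = compressed_sizes(SAMPLE)
for encoding, row in results.items():
    print(encoding, row)
```

For text made up of common Chinese characters, the raw UTF-8 size comes out at three bytes per character against two for GB18030, which is the same ratio that shows up in the uncompressed file sizes above.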