[Share] gnochm patch for displaying Chinese

thinux · posted 2007-5-14 09:51:11

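1. Background
  gnochm currently displays files that declare a charset reasonably well. Some nonstandard chm files, however, are encoded as gb2312 or gbk but never declare a charset, and gnochm displays those incorrectly. This patch mainly targets that defect.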
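To see why an undeclared charset breaks display, here is a small Python 3 illustration (not gnochm code). Which default a viewer falls back to when no charset is declared is an assumption here; ISO-8859-1 and UTF-8 are shown as two common possibilities.

```python
# The same GB2312 bytes read with the right encoding and with two guesses.
# The "viewer default" encodings are assumptions, not gnochm/GtkHTML facts.

raw = "中文".encode("gb2312")      # bytes as stored in an undeclared page
print(raw)                         # b'\xd6\xd0\xce\xc4'

# With the correct encoding the text comes back intact.
print(raw.decode("gb2312"))        # 中文

# A viewer that falls back to ISO-8859-1 shows classic mojibake.
print(raw.decode("latin-1"))       # ÖÐÎÄ

# A viewer that assumes UTF-8 cannot decode these bytes at all.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```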

2. Using the patch
  Extract the downloaded tarball: tar -xvjf gnochm.tar.bz2
  Method 1. Replace /usr/bin/gnochm with the gnochm-0.9.9-addcharset file (back up the original first):
   sudo cp /usr/bin/gnochm /usr/bin/gnochm.bak
   sudo cp gnochm-0.9.9-addcharset /usr/bin/gnochm
  Method 2. Patch /usr/bin/gnochm in place:
   sudo patch -p0 /usr/bin/gnochm < gnochm-0.9.9-addcharset.patch

3. Test environments
  Ubuntu 7.04 (i686)
  Arch Linux (i686)

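4. What the patch changes
  The patch is quite simple. Before a page is displayed, it checks whether the page declares a charset; if not, it converts the page to UTF-8 and declares the charset as UTF-8.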
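The check-then-convert logic described above can be sketched as a standalone Python 3 function. This is a hypothetical illustration, not gnochm's code: the names `declared_charset` and `ensure_utf8` are made up, and the `gb18030` fallback is an assumption (gnochm's own `to_utf8` may choose the source encoding differently). Like the description, it leaves declared pages alone and puts a UTF-8 meta tag in front of converted ones.

```python
import re
from typing import Optional

# Hypothetical sketch of "if no charset is declared, convert to UTF-8 and
# declare UTF-8". The fallback encoding is an assumption, not gnochm's choice.

CHARSET_RE = re.compile(rb"""charset\s*=\s*["']?([A-Za-z0-9_-]+)""", re.IGNORECASE)
META_UTF8 = b'<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'


def declared_charset(page: bytes) -> Optional[str]:
    """Return the charset the page declares (lower-cased), or None."""
    match = CHARSET_RE.search(page)
    return match.group(1).decode("ascii").lower() if match else None


def ensure_utf8(page: bytes, fallback: str = "gb18030") -> bytes:
    """Return declared pages unchanged; re-encode undeclared ones as UTF-8."""
    if declared_charset(page):
        return page
    try:
        text = page.decode("utf-8")  # already valid UTF-8: nothing to convert
    except UnicodeDecodeError:
        # Assumed legacy encoding; gb18030 is designed to be backward
        # compatible with gb2312 and gbk.
        text = page.decode(fallback, errors="replace")
    return META_UTF8 + text.encode("utf-8")
```

For example, `ensure_utf8("<p>中文</p>".encode("gbk"))` returns the meta tag followed by the UTF-8 bytes of `<p>中文</p>`, while a page that already says `charset=gb2312` comes back untouched.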

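  Since gnochm is written in Python, you can read the source file at any time and fix bugs yourself.
  Feel free to discuss any other remaining defects or bugs in this thread!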

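diony · posted 2007-5-17 04:11:05

Hmm, I have a few files here that used to display correctly. After applying this patch, or running the executable from the package directly, they no longer display correctly (the pages under sub-branches): the right pane is blank.

They're all fairly large, so I can't upload them here... and I've forgotten where I downloaded them from anyway.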

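waynef · posted 2007-5-17 11:09:21

My situation is much like post #2's. Files whose left pane was fine but whose right pane was garbled now show the left pane fine and the right pane blank. Files whose left pane couldn't be displayed still can't be, and right panes that used to be fine no longer are.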

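diony · posted 2007-5-17 14:14:15

-_-~ The post above is even more disheartening for the OP than mine...

Let me sum up: so far this patch hasn't worked correctly for me. Files that displayed fine may stop displaying fine, and files that didn't still don't...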

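supernatural · posted 2007-5-17 17:03:12

See if this one works....
I don't have the gnochm source here, so I modified the original patch directly and changed the default encoding to gbk.
Not sure whether it will apply cleanly.....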

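thinux · posted 2007-5-18 20:33:15

OK, let me be clear: this patch is not meant to fix every Chinese-display problem in gnochm. It only fixes the case described in the background, where no charset is set. Other problems, such as the left-hand list displaying incorrectly, need a separate patch (see the attached screenshot).

to supernatural:
Once gnochm is installed, the source code is the executable /usr/bin/gnochm itself. It's written in Python and easy to follow. It's best not to hard-code gbk, because that only covers Simplified Chinese (and there's also gb18030); utf8 is the right way to go.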
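The gbk-versus-utf8 point can be checked with a few lines of Python 3 (gnochm itself is Python 2; the sample characters are arbitrary choices for illustration):

```python
# Why a hard-coded "gbk" default is narrower than normalizing to UTF-8.

big5_page = "中文".encode("big5")        # Traditional Chinese page stored as Big5
as_gbk = big5_page.decode("gbk", errors="replace")
print(as_gbk == "中文")                  # False: the wrong characters come out

ext_b = "\U00020000"                     # a CJK Extension B ideograph
try:
    ext_b.encode("gbk")
    gbk_ok = True
except UnicodeEncodeError:
    gbk_ok = False
print(gbk_ok)                            # False: GBK has no code for it
print(len(ext_b.encode("gb18030")))      # 4: GB18030 encodes it in four bytes

# UTF-8 represents all of these, so converting to it loses nothing.
for text in ("中文", ext_b):
    assert text.encode("utf-8").decode("utf-8") == text
```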

PS: The code changes are actually tiny, so here is the patch for everyone to review:

```diff
--- /usr/bin/gnochm 2006-12-13 23:33:47.000000000 +0800
+++ gnochm-0.9.9-addcharset 2007-05-18 21:31:37.000000000 +0800
@@ -67,6 +67,7 @@
 #import gc
+html_charset='<meta http-equiv="Content-Type" content="text/html; charset=%s">'
 html_text='<html><head></head><body><center>%s</center></body></html>'
 syn_image_html='<html><head></head><body><img src="%s"></body></html>'
@@ -287,7 +288,7 @@
         self.param = ""
         self.add_level = 0
         self.model = model
-        self.icon = self.ICON_TOPIC
+        self.icon = self.ICON_TOPIC
         self.linklist = {}
         #self.column = 0
@@ -322,7 +323,7 @@
                     #print ' ' * self.column, ' Local=', y
                 elif (self.param == "merge"):
                     self.in_obj = 0
-                elif (self.param == "new"):
+                elif (self.param == "new"):
                     self.icon = self.ICON_NEW
                 #elif (self.param == "imagenumber") and (int(y) % 2 == 0):
                 #    self.icon = self.ICON_NEW
@@ -1046,6 +1047,13 @@
                 print_log('to_utf8: Error converting %s' % text)
                 return text
         return text
+
+    def get_html_charset(self, f):
+        match = re.search(r'charset=["\']?(?P<cs>[a-zA-Z0-9_-]+)', f, re.I)
+        if match:
+            return match.group('cs')
+        else:
+            return None
     def close_all(self):
         if len(self.chmfiles) > 0:
@@ -1114,6 +1122,9 @@
             ftype = mime.split('/')[0]
             if ftype == 'image':
                 f = syn_image_html % pathname
+            if not self.get_html_charset(f):
+                self.document.write_stream(html_charset % 'utf-8')
+                f = self.to_utf8(f)
             self.document.write_stream(f)
             self.document.close_stream()
             self.chmfiles[-1].directory = os.path.dirname(pathname)
@@ -1352,6 +1363,9 @@
             ftype = mime.split('/')[0]
             if ftype == 'image':
                 f = syn_image_html % pathname
+            if not self.get_html_charset(f):
+                self.document.write_stream(html_charset % 'utf-8')
+                f = self.to_utf8(f)
             self.document.write_stream(f)
             self.document.close_stream()
             self.handle_anchors(flink)
@@ -1639,11 +1653,13 @@
         # history stack and have it removed as soon as
         # the original url gets shown
         self.document.open_stream('text/html')
+        self.document.write_stream(html_charset % 'utf-8')
         html_buffer = re.sub(r'&', r'&amp;', f)
         html_buffer = re.sub(r'<', r'&lt;', html_buffer)
         html_buffer = re.sub(r'>', r'&gt;', html_buffer)
         html_buffer = re.sub(r"'", r'&#39;', html_buffer)
         html_buffer = re.sub(r'"', r'&quot;', html_buffer)
+        html_buffer = self.to_utf8(html_buffer)
         self.document.write_stream(html_buffer)
         self.document.close_stream()
         self.statusbar.push(_('Viewing source for %s') % lasturl)
```

In the hunks at @@ -287 and @@ -322 the - and + lines look the same: the original author had accidentally left a \t character there, which the patch removes.
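The view-source hunk (@@ -1639) escapes the page source with five chained `re.sub` calls and then converts it to UTF-8. In Python 3 the escaping step is available as `html.escape`; the sketch below swaps that in and decodes first, since a Python 3 `str` is already Unicode. The function name `view_source_page` and the `<pre>` wrapper are assumptions for illustration, not gnochm's code.

```python
import html

# Sketch of the view-source path: declare UTF-8, then show the page source
# as escaped text. html.escape replaces & first, so nothing is double-escaped.

META_UTF8 = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'


def view_source_page(raw: bytes, encoding: str) -> str:
    """Render raw page bytes as escaped, UTF-8-declared source text."""
    text = raw.decode(encoding, errors="replace")  # bytes -> str first
    escaped = html.escape(text, quote=True)        # & < > " ' -> entities
    return META_UTF8 + "<pre>" + escaped + "</pre>"
```

The design choice is simply to let the standard library handle all five characters in the right order rather than maintaining the substitutions by hand.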

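thinux · posted 2007-5-18 20:52:19

Personally, I think the best way to read Chinese chm files in a GTK environment is still Firefox's chmreader extension. It calls libchm from JavaScript, parses out the book's structure and hands the pages to Firefox for display, and Firefox can detect the encoding automatically or let you pick it manually. It still has some bugs with Chinese display, though, and it doesn't integrate well with the desktop (you can't open a file by double-clicking it).
Many forums claim chmsee has good Chinese support, but unfortunately it also fails to display pages correctly when no charset is set.

As I recall, kchmviewer under KDE is really good: lots of configuration options and automatic encoding detection, so it should handle ordinary chm files without trouble.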

diony · posted 2007-5-19 01:08:57

But for me, the files that were garbled are still garbled, and the ones that were fine may end up with a blank right pane...

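thinux · posted 2007-5-19 09:14:07

Try the new patch I've uploaded. It keeps only the charset-related changes, so it shouldn't affect anything else.
Ideally, press Ctrl+U to view the page source and check whether the garbling is caused by a missing charset.
Also, try opening the file with gnochm -d <file.chm> and see what gets recorded in ~/.gnochm/gnochm.log.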
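To check many pages at once rather than one at a time with Ctrl+U, a hypothetical Python 3 helper can scan an unpacked book for pages with no charset declaration. It assumes the .chm has already been extracted to a directory with some CHM extraction tool, and it only reads the first 4 KB of each file, on the assumption that the meta tag sits in `<head>`.

```python
import os
import re
from typing import List

# Hypothetical helper: lists HTML pages in an extracted CHM directory that
# declare no charset, i.e. the case the patch is meant to handle.

CHARSET_RE = re.compile(rb"charset\s*=", re.IGNORECASE)


def pages_without_charset(root: str) -> List[str]:
    """Return paths (relative to root) of .htm/.html files lacking a charset."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith((".htm", ".html")):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                head = fh.read(4096)  # assumed: charset is declared near the top
            if not CHARSET_RE.search(head):
                missing.append(os.path.relpath(path, root))
    return sorted(missing)
```

For example, `print("\n".join(pages_without_charset("book")))` lists the suspect pages, where `book` is a placeholder for the extraction directory.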

omegao · posted 2007-5-19 19:05:39

Quoting thinux: "Personally, I think the best way to read Chinese chm files in a GTK environment is still Firefox's chmreader extension [...]"

In a GTK environment, the best option for Chinese chm files is still Firefox's chmreader extension.
