LinuxSir.cn, the Linuxsir that travels through time!

[Help] Cannot create a "garbled filename" on an NTFS partition, but can on ext4; wget/aria2c

Posted 2009-8-26 04:27:08
Well, I'm too lazy to translate this again, and I'm afraid the translation would come out garbled ;-)

I'm testing with a file called "柯有伦-零.mp3", which contains Chinese characters.

My locale: en_US.utf8
Downloaders I tested with: wget, aria2c
Target filesystems I tested with: ext4, ntfs

I find it strange that the same filename has two forms in the two URLs:
  %BF%C2%D3%D0%C2%D7-%C1%E3.mp3
  %E6%9F%AF%E6%9C%89%E4%BC%A6-%E9%9B%B6.mp3
I don't know why... It must have something to do with the character set/encoding. Could somebody please explain this to me?
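The two forms look like the same name percent-encoded from two different byte encodings. A minimal Python sketch to check that, assuming the first form is GBK (a guess from the byte values; neither server declares a charset) and the second is UTF-8:

```python
from urllib.parse import unquote_to_bytes

FORM_1 = "%BF%C2%D3%D0%C2%D7-%C1%E3.mp3"
FORM_2 = "%E6%9F%AF%E6%9C%89%E4%BC%A6-%E9%9B%B6.mp3"

# Percent-decoding only gives raw bytes; which charset they are in
# is a separate question that the URL itself does not answer.
raw_1 = unquote_to_bytes(FORM_1)
raw_2 = unquote_to_bytes(FORM_2)

name_1 = raw_1.decode("gbk")        # assumption: form 1 is GBK
name_2 = raw_2.decode("utf-8")      # form 2 is valid UTF-8
mojibake = raw_1.decode("latin-1")  # every byte read as one Latin-1 character

print(name_1)
print(name_2)
print(mojibake)
```

Reading the GBK bytes one byte per character is what produces the '¿ÂÓÐÂ×-Áã.mp3' form seen in the logs.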

---------------------------------------------experiment--with--wget--------------------------------------------------

wget "%BF%C2%D3%D0%C2%D7-%C1%E3.mp3" to ext4 partition:
  $ wget 'http://down.jsharer.com/user/userAction.do?method=download&urlpath=ftp://-552790109:693110381@58.215.91.170:2022/22487/200908/%BF%C2%D3%D0%C2%D7-%C1%E3.mp3'
  --2009-08-25 12:17:41--  http://down.jsharer.com/user/userAction.do?method=download&urlpath=ftp://-552790109:693110381@58.215.91.170:2022/22487/200908/%BF%C2%D3%D0%C2%D7-%C1%E3.mp3
  Resolving down.jsharer.com... 222.73.163.168
  Connecting to down.jsharer.com|222.73.163.168|:80... connected.
  HTTP request sent, awaiting response... 302 Moved Temporarily
  Location: ftp://-552790109:693110381@58.215.91.170:2022/22487/200908/%BF%C2%D3%D0%C2%D7-%C1%E3.mp3 [following]
  --2009-08-25 12:17:41--  ftp://-552790109:*password*@58.215.91.170:2022/22487/200908/%BF%C2%D3%D0%C2%D7-%C1%E3.mp3
             => `¿ÂÓÐÂ×-Áã.mp3'
  Connecting to 58.215.91.170:2022... connected.
  Logging in as -552790109 ... Logged in!
  ==> SYST ... done.    ==> PWD ... done.
  ==> TYPE I ... done.  ==> CWD /22487/200908 ... done.
  ==> SIZE \277\302\323\320\302\327-\301\343.mp3 ... 5211995
  ==> PASV ... done.    ==> RETR \277\302\323\320\302\327-\301\343.mp3 ... done.
  Length: 5211995 (5.0M)
  100%[=====================================================>] 5,211,995   89.1K/s   in 56s
  2009-08-25 12:18:37 (91.2 KB/s) - `¿ÂÓÐÂ×-Áã.mp3' saved [5211995]
wget "%BF%C2%D3%D0%C2%D7-%C1%E3.mp3" to ntfs partition:
  $ wget 'http://down.jsharer.com/user/userAction.do?method=download&urlpath=ftp://1368144520:-1398555672@58.215.91.170:2022/41161/200908/%BF%C2%D3%D0%C2%D7-%C1%E3.mp3'
  --2009-08-25 12:26:29--  http://down.jsharer.com/user/userAction.do?method=download&urlpath=ftp://1368144520:-1398555672@58.215.91.170:2022/41161/200908/%BF%C2%D3%D0%C2%D7-%C1%E3.mp3
  Resolving down.jsharer.com... 222.73.163.168
  Connecting to down.jsharer.com|222.73.163.168|:80... connected.
  HTTP request sent, awaiting response... 302 Moved Temporarily
  Location: ftp://1368144520:-1398555672@58.215.91.170:2022/41161/200908/%BF%C2%D3%D0%C2%D7-%C1%E3.mp3 [following]
  --2009-08-25 12:26:29--  ftp://1368144520:*password*@58.215.91.170:2022/41161/200908/%BF%C2%D3%D0%C2%D7-%C1%E3.mp3
             => `¿ÂÓÐÂ×-Áã.mp3'
  Connecting to 58.215.91.170:2022... connected.
  Logging in as 1368144520 ... Logged in!
  ==> SYST ... done.    ==> PWD ... done.
  ==> TYPE I ... done.  ==> CWD /41161/200908 ... done.
  ==> SIZE \277\302\323\320\302\327-\301\343.mp3 ... 5211995
  ==> PASV ... done.    ==> RETR \277\302\323\320\302\327-\301\343.mp3 ... done.
  ¿ÂÓÐÂ×-Áã.mp3: Invalid or incomplete multibyte or wide character
wget "%E6%9F%AF%E6%9C%89%E4%BC%A6-%E9%9B%B6.mp3" to ext4 partition:
  $ wget 'http://mp3.tktt.com/eec38e543bc6c0e4/15/%E6%9F%AF%E6%9C%89%E4%BC%A6-%E9%9B%B6.mp3'
  --2009-08-25 12:29:59--  http://mp3.tktt.com/eec38e543bc6c0e4/15/%E6%9F%AF%E6%9C%89%E4%BC%A6-%E9%9B%B6.mp3
  Resolving mp3.tktt.com... 58.215.81.44
  Connecting to mp3.tktt.com|58.215.81.44|:80... connected.
  HTTP request sent, awaiting response... 200 OK
  Length: 126229 (123K) [audio/x-ms-wma]
  Saving to: `æ%9F¯æ%9C%89伦-é%9B¶.mp3'
  100%[=====================================================>] 126,229      129K/s   in 1.0s
  2009-08-25 12:30:03 (129 KB/s) - `æ%9F¯æ%9C%89伦-é%9B¶.mp3' saved [126229/126229]
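The odd local name in this run, with some bytes unescaped and others left as %9F, %9C, %89 and %9B, is consistent with wget keeping bytes in the 0x80-0x9F range percent-encoded because it treats them as control characters. The wget manual documents `--restrict-file-names=nocontrol` for UTF-8 names; whether that option fully fixes this case on the wget version used here is untested. The sketch below only approximates that escaping rule in Python (the function name is hypothetical and this is not wget's actual code):

```python
from urllib.parse import unquote_to_bytes

def wget_like_local_name(url_name: str) -> bytes:
    """Unescape a URL file name, but re-escape bytes treated as controls.

    Approximation: bytes 0x00-0x1F and 0x80-0x9F stay percent-encoded.
    """
    out = bytearray()
    for b in unquote_to_bytes(url_name):
        if b < 0x20 or 0x80 <= b <= 0x9F:
            out += b"%%%02X" % (b,)
        else:
            out.append(b)
    return bytes(out)

local = wget_like_local_name("%E6%9F%AF%E6%9C%89%E4%BC%A6-%E9%9B%B6.mp3")
# Shown byte-per-character; a UTF-8 terminal would render the intact
# three-byte sequence E4 BC A6 as one Chinese character instead.
print(local.decode("latin-1"))
```

Only the third character survives intact because none of its three bytes falls in the escaped range.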
wget "%E6%9F%AF%E6%9C%89%E4%BC%A6-%E9%9B%B6.mp3" to ntfs partition:
  $ wget 'http://mp3.tktt.com/eec38e543bc6c0e4/15/%E6%9F%AF%E6%9C%89%E4%BC%A6-%E9%9B%B6.mp3'
  --2009-08-25 12:37:30--  http://mp3.tktt.com/eec38e543bc6c0e4/15/%E6%9F%AF%E6%9C%89%E4%BC%A6-%E9%9B%B6.mp3
  Resolving mp3.tktt.com... 58.215.81.44
  Connecting to mp3.tktt.com|58.215.81.44|:80... connected.
  HTTP request sent, awaiting response... 200 OK
  Length: 126229 (123K) [audio/x-ms-wma]
  æ%9F¯æ%9C%89伦-é%9B¶.mp3: Invalid or incomplete multibyte or wide character
  Cannot write to `æ%9F¯æ%9C%89伦-é%9B¶.mp3' (Invalid or incomplete multibyte or wide character).
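A likely explanation for the ext4/NTFS difference: "Invalid or incomplete multibyte or wide character" is glibc's message for EILSEQ. ext4 stores a file name as opaque bytes, while NTFS stores names as UTF-16, so the driver has to convert the name from the locale's encoding (UTF-8 here) and fails on bytes that are not valid UTF-8. The Python sketch below only checks which of the names from the runs above are valid UTF-8; it does not talk to ntfs-3g, and the helper name is hypothetical.

```python
def is_valid_utf8(name: bytes) -> bool:
    """Return True if the byte string decodes as strict UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

names = {
    "raw GBK bytes (first URL)": b"\xbf\xc2\xd3\xd0\xc2\xd7-\xc1\xe3.mp3",
    "wget's partly escaped name": b"\xe6%9F\xaf\xe6%9C%89\xe4\xbc\xa6-\xe9%9B\xb6.mp3",
    "aria2c's decoded name": "柯有伦-零.mp3".encode("utf-8"),
}

for label, raw in names.items():
    print(label, "->", "valid UTF-8" if is_valid_utf8(raw) else "not valid UTF-8")
```

Under this reading, only the name that aria2c produced from the second URL can be converted, which matches the one NTFS download that succeeded.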
---------------------------------------------let's--try--with--aria2c--------------------------------------------------

aria2c "%BF%C2%D3%D0%C2%D7-%C1%E3.mp3" to ext4 partition:
  $ aria2c 'http://down.jsharer.com/user/userAction.do?method=download&urlpath=ftp://1368144520:-1398555672@58.215.91.170:2022/41161/200908/%BF%C2%D3%D0%C2%D7-%C1%E3.mp3'
  2009-08-25 12:47:52.079592 NOTICE - #1 - Download has already completed: /home/canti/Desktop/¿ÂÓÐÂ×-Áã.mp3
  2009-08-25 12:47:52.080014 NOTICE - Download complete: /home/canti/Desktop/¿ÂÓÐÂ×-Áã.mp3
  Download Results:
  gid|stat|avg speed  |path/URI
  ===+====+===========+===========================================================
    1|  OK|        n/a|/home/canti/Desktop/¿ÂÓÐÂ×-Áã.mp3
  Status Legend:
  (OK):download completed.
aria2c "%BF%C2%D3%D0%C2%D7-%C1%E3.mp3" to ntfs partition:
  $ aria2c 'http://down.jsharer.com/user/userAction.do?method=download&urlpath=ftp://1368144520:-1398555672@58.215.91.170:2022/41161/200908/%BF%C2%D3%D0%C2%D7-%C1%E3.mp3'
  [#1 SIZE:0B/0B CN:1 SPD:0Bs]
  2009-08-25 12:46:06.942979 ERROR - Exception caught
  Exception: [RequestGroup.cc:528] Download aborted.
    -> [AbstractDiskWriter.cc:115] Failed to open the file /media/20G/¿ÂÓÐÂ×-Áã.mp3, cause: Invalid or incomplete multibyte or wide character
  Download Results:
  gid|stat|avg speed  |path/URI
  ===+====+===========+===========================================================
    1| ERR|        n/a|/media/20G/¿ÂÓÐÂ×-Áã.mp3
  Status Legend:
  (ERR):error occurred.
  aria2 will resume download if the transfer is restarted.
  If there are any errors, then see the log file. See '-l' option in help/man page for details.
aria2c "%E6%9F%AF%E6%9C%89%E4%BC%A6-%E9%9B%B6.mp3" to ext4 partition:
  $ aria2c 'http://mp3.tktt.com/eec38e543bc6c0e4/15/%E6%9F%AF%E6%9C%89%E4%BC%A6-%E9%9B%B6.mp3'
  [#1 SIZE:96.0KiB/123.2KiB(77%) CN:1 SPD:135.1KiBs]
  2009-08-25 12:30:17.753443 NOTICE - Download complete: /home/canti/Desktop/柯有伦-零.mp3
  Download Results:
  gid|stat|avg speed  |path/URI
  ===+====+===========+===========================================================
    1|  OK| 136.0KiB/s|/home/canti/Desktop/柯有伦-零.mp3
  Status Legend:
  (OK):download completed.
aria2c "%E6%9F%AF%E6%9C%89%E4%BC%A6-%E9%9B%B6.mp3" to ntfs partition:
  $ aria2c 'http://mp3.tktt.com/eec38e543bc6c0e4/15/%E6%9F%AF%E6%9C%89%E4%BC%A6-%E9%9B%B6.mp3'
  [#1 SIZE:0B/123.2KiB(0%) CN:1 SPD:0Bs]
  2009-08-25 12:42:14.735797 NOTICE - Download complete: /media/20G/柯有伦-零.mp3
  Download Results:
  gid|stat|avg speed  |path/URI
  ===+====+===========+===========================================================
    1|  OK|  20.0MiB/s|/media/20G/柯有伦-零.mp3
  Status Legend:
  (OK):download completed.
---------------------------------------------------------------------------------------------------------------------------

So I'm asking:

1. Why is '%E6%9F%AF%E6%9C%89%E4%BC%A6-%E9%9B%B6.mp3' interpreted correctly by aria2c but not by wget?
2. Why is '%BF%C2%D3%D0%C2%D7-%C1%E3.mp3' interpreted as '¿ÂÓÐÂ×-Áã.mp3' by both aria2c and wget?
3. Can I tune aria2c/wget so that these characters are interpreted correctly? I have some experience with FTP clients such as FileZilla and gFTP; both have a charset option for correctly displaying filenames from servers that use an encoding other than UTF-8.
4. Although wget and aria2c both interpret '%BF%C2%D3%D0%C2%D7-%C1%E3.mp3' incorrectly, they can still write a file called '¿ÂÓÐÂ×-Áã.mp3' to ext4 partitions, but not to NTFS partitions. (For the record, PCManFM displays Chinese/Japanese filenames on NTFS partitions correctly.) How can I tune ntfs-3g's options to fix this?
5. As I asked at the beginning: why does the same filename have two versions in the two URLs?
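One possible workaround for question 3, without relying on any charset option in the downloaders, is to decode the name yourself and pass it as the output file name (`-O` for wget, `-o` for aria2c). The Python sketch below builds such a command line; `local_name` and `download_command` are hypothetical helpers, the charset list is an assumption, and none of this has been tried against the servers in this thread.

```python
from urllib.parse import unquote_to_bytes

def local_name(url: str, charsets=("utf-8", "gbk")) -> str:
    """Guess a readable name from the text after the last '/' in the URL.

    Charsets are tried in order; strict UTF-8 goes first because GBK
    would accept many byte strings that are really UTF-8.
    """
    raw = unquote_to_bytes(url.rsplit("/", 1)[-1])
    for charset in charsets:
        try:
            return raw.decode(charset)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate charsets fits: %r" % (raw,))

def download_command(tool: str, url: str) -> list:
    """Build an argument list that names the output file explicitly."""
    name = local_name(url)
    if tool == "wget":
        return ["wget", "-O", name, url]
    if tool == "aria2c":
        return ["aria2c", "-o", name, url]
    raise ValueError("unsupported tool: " + tool)

url = "http://mp3.tktt.com/eec38e543bc6c0e4/15/%E6%9F%AF%E6%9C%89%E4%BC%A6-%E9%9B%B6.mp3"
print(download_command("wget", url))
# To actually run it: subprocess.run(download_command("wget", url), check=True)
```

Because the resulting name is a proper string, it is written out in the locale's encoding, which should also be acceptable on the NTFS mount under a UTF-8 locale.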
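Posted 2009-8-26 08:53:47
This is clearly related to the locale. In this situation, either change the locale temporarily, or try something like Firefox that supports automatic charset conversion.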
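OP | Posted 2009-8-26 11:07:46
That makes sense. Could you explain in more detail? For example, why can aria2c get a name right that wget gets wrong?

Also, is there a dedicated tool for automatic charset conversion? Ideally something that can be expressed as a single command... so that it translates in the middle~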
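For files that were already saved under a mis-encoded name, the convmv utility is commonly used to re-encode file names in place. As a rough Python sketch of the same idea, assuming the broken names are GBK bytes and the target is UTF-8 (both function names are hypothetical, and the directory function defaults to a dry run):

```python
import os

def plan_renames(names, source_charset="gbk"):
    """Return (old, new) byte-name pairs for names that are not valid UTF-8."""
    plan = []
    for raw in names:
        try:
            raw.decode("utf-8")
            continue  # already valid UTF-8, leave it alone
        except UnicodeDecodeError:
            pass
        try:
            fixed = raw.decode(source_charset).encode("utf-8")
        except UnicodeDecodeError:
            continue  # not in the assumed charset either, skip it
        plan.append((raw, fixed))
    return plan

def rename_to_utf8(directory, source_charset="gbk", dry_run=True):
    """Apply plan_renames to one directory; only renames when dry_run is False."""
    dir_bytes = os.fsencode(directory)
    plan = plan_renames(os.listdir(dir_bytes), source_charset)
    if not dry_run:
        for old, new in plan:
            os.rename(os.path.join(dir_bytes, old), os.path.join(dir_bytes, new))
    return plan

sample = [b"\xbf\xc2\xd3\xd0\xc2\xd7-\xc1\xe3.mp3", "柯有伦-零.mp3".encode("utf-8")]
for old, new in plan_renames(sample):
    print(old, "->", new.decode("utf-8"))
```

Working on byte names avoids depending on how the current locale would decode the broken name.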