Hello, readers of my blog
(What, there are no readers left? Then I'll write it for myself.)
From time to time, you need to extract some data from a "legacy Windows file format" (in my case, it was a JET database). Often the data is encoded in the Windows-specific cp1256 encoding (or something similar), but the tool you use assumes cp1252 (or latin1; those two are mostly compatible, unlike, say, cp1256 and ISO-8859-6), and you end up with text full of "arcane" Latin characters like:
Â ÇáÃáÝ ÊÃúáíÝåÇ ãä åãÒÉ æáÇã æÝÇÁ
While it should read
آ الألف تأْليفها من همزة ولام وفاء
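To make it clearer where such text comes from, here is a small Python 3 sketch that reproduces the garbling in memory. The sample word is taken from the example above; the variable names are my own and purely illustrative.

```python
# Reproduce the garbling: Arabic text stored as cp1256 bytes,
# then read back by a tool that wrongly assumes latin1.
original = "الألف"

raw_bytes = original.encode("cp1256")   # what the legacy file actually holds
garbled = raw_bytes.decode("latin1")    # what the tool shows you

print(raw_bytes)
print(garbled)
```

Every byte decodes to *some* latin1 character, so the tool never complains; it just shows the wrong letters.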
Previously, I thought that text which had gone through such an (incorrect) conversion was mostly damaged beyond repair. But yesterday, when I ran into this problem again, I thought I could try to get something out of it (even if it came out a bit broken), so I fired up IPython and after a few attempts I found this:
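# Python 3: read the garbled UTF-8 text, undo the wrong latin1 decoding, then decode the bytes as cp1256
with open('file.in', encoding='utf8') as src, open('file.out', 'w', encoding='utf8') as dst:
    dst.write(src.read().encode('latin1').decode('cp1256'))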
and got the text in readable form, as you can see above. I hope someone finds this useful (just replace cp1256 with whichever encoding your data really uses).
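If you need this more than once, the same round trip can be wrapped in a small helper. This is only a sketch in Python 3: the function name `repair` and its parameters `assumed` and `actual` are my own invention, and the defaults simply match my case (latin1 assumed, cp1256 actual).

```python
def repair(text, assumed="latin1", actual="cp1256"):
    """Undo a decode done with `assumed` when the bytes were really `actual`."""
    # Encoding with the wrongly assumed codec recovers the original bytes;
    # decoding them with the right codec gives the intended text.
    return text.encode(assumed).decode(actual)


sample = "Â ÇáÃáÝ ÊÃúáíÝåÇ ãä åãÒÉ æáÇã æÝÇÁ"
print(repair(sample))
```

If your tool really used cp1252 rather than latin1, pass `assumed="cp1252"`. Note that `encode` raises `UnicodeEncodeError` when the garbled text contains a character that has no byte in the assumed encoding, so a failure there is a hint that the guess about the tool was wrong.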