Package encodings
Standard "encodings" Package
Standard Python encoding modules are stored in this package
directory.
Codec modules must have names corresponding to normalized encoding
names as defined in the normalize_encoding() function below, e.g.
'utf-8' must be implemented by the module 'utf_8.py'.
Each codec module must export the following interface:
* getregentry() -> codecs.CodecInfo object
The getregentry() API must return a CodecInfo object with encoder, decoder,
incrementalencoder, incrementaldecoder, streamwriter and streamreader
attributes which adhere to the Python Codec Interface Standard.
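As an illustration of the interface described above, here is a minimal sketch of what a codec module might look like. The codec itself (`swapcase_demo`, which swaps ASCII letter case) and every helper name are hypothetical and not part of the package; the sketch assumes Python 3 string semantics. Note that the `codecs.CodecInfo` constructor takes the stateless functions as `encode` and `decode`.

```python
import codecs

# Hypothetical toy codec: swaps ASCII letter case while encoding to bytes.
def _swap_encode(text, errors='strict'):
    return text.swapcase().encode('ascii', errors), len(text)

def _swap_decode(data, errors='strict'):
    data = bytes(data)
    return data.decode('ascii', errors).swapcase(), len(data)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return _swap_encode(input, self.errors)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return _swap_decode(input, self.errors)[0]

class StreamWriter(codecs.StreamWriter):
    def encode(self, input, errors='strict'):
        return _swap_encode(input, errors)

class StreamReader(codecs.StreamReader):
    def decode(self, input, errors='strict'):
        return _swap_decode(input, errors)

def getregentry():
    # The single entry point each codec module is expected to export.
    return codecs.CodecInfo(
        name='swapcase_demo',
        encode=_swap_encode,
        decode=_swap_decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamwriter=StreamWriter,
        streamreader=StreamReader,
    )

info = getregentry()
encoded, consumed = info.encode('Hello')
```

Both stateless functions return a `(result, length_consumed)` pair, which is the shape the codec machinery expects.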
In addition, a module may optionally also define the following
APIs which are then used by the package's codec search function:
* getaliases() -> sequence of encoding name strings to use as aliases
Alias names returned by getaliases() must be normalized encoding
names as defined by normalize_encoding().
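The requirement that aliases be already normalized can be checked mechanically. The sketch below is an assumption-laden illustration: `getaliases()` returns made-up aliases for a made-up codec, and `unnormalized()` is a hypothetical helper that flags ASCII names containing anything other than alphanumerics, dots, and single interior underscores.

```python
import re

# Assumed shape of a normalized ASCII name: runs of alphanumerics or dots,
# joined by single underscores, with no leading or trailing underscore.
_NORMALIZED = re.compile(r'[0-9A-Za-z.]+(?:_[0-9A-Za-z.]+)*')

def getaliases():
    # Hypothetical aliases for a hypothetical codec module.
    return ['swapcase', 'swap_case_demo']

def unnormalized(aliases):
    """Return the aliases that are not in normalized form."""
    return [a for a in aliases if not _NORMALIZED.fullmatch(a)]

problems = unnormalized(getaliases())
```

An empty result means every alias could be registered as-is; a name such as 'swap-case' would be reported because the hyphen has not been normalized.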
Written by Marc-Andre Lemburg (mal@lemburg.com).
(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
search_function(encoding)
_cache = {}
_unknown = '--unknown--'
_import_tail = ['*']
_norm_encoding_map = ' ...
_aliases = {'037': 'cp037', '1026': 'cp1026', '1140': 'cp1140'...
Imports:
codecs,
types,
aliases,
ascii,
base64_codec,
big5,
big5hkscs,
bz2_codec,
charmap,
cp037,
cp1006,
cp1026,
cp1140,
cp1250,
cp1251,
cp1252,
cp1253,
cp1254,
cp1255,
cp1256,
cp1257,
cp1258,
cp424,
cp437,
cp500,
cp737,
cp775,
cp850,
cp852,
cp855,
cp856,
cp857,
cp860,
cp861,
cp862,
cp863,
cp864,
cp865,
cp866,
cp869,
cp874,
cp875,
cp932,
cp949,
cp950,
euc_jis_2004,
euc_jisx0213,
euc_jp,
euc_kr,
gb18030,
gb2312,
gbk,
hex_codec,
hp_roman8,
hz,
idna,
iso2022_jp,
iso2022_jp_1,
iso2022_jp_2,
iso2022_jp_2004,
iso2022_jp_3,
iso2022_jp_ext,
iso2022_kr,
iso8859_1,
iso8859_10,
iso8859_11,
iso8859_13,
iso8859_14,
iso8859_15,
iso8859_16,
iso8859_2,
iso8859_3,
iso8859_4,
iso8859_5,
iso8859_6,
iso8859_7,
iso8859_8,
iso8859_9,
johab,
koi8_r,
koi8_u,
latin_1,
mac_arabic,
mac_centeuro,
mac_croatian,
mac_cyrillic,
mac_farsi,
mac_greek,
mac_iceland,
mac_latin2,
mac_roman,
mac_romanian,
mac_turkish,
palmos,
ptcp154,
punycode,
quopri_codec,
raw_unicode_escape,
rot_13,
shift_jis,
shift_jis_2004,
shift_jisx0213,
string_escape,
tis_620,
undefined,
unicode_escape,
unicode_internal,
utf_16,
utf_16_be,
utf_16_le,
utf_7,
utf_8,
utf_8_sig,
uu_codec,
zlib_codec
normalize_encoding(encoding)
Normalize an encoding name.
Normalization works as follows: all non-alphanumeric characters except
the dot used for Python package names are collapsed and replaced with a
single underscore, e.g. ' -;#' becomes '_'. Leading and trailing
underscores are removed.
Note that encoding names should be ASCII only; if they do use
non-ASCII characters, these must be Latin-1 compatible.
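The rule described above can be restated in a few lines. This is a sketch of the documented behaviour for ASCII names only, not the package's own implementation; `normalize_sketch` is a hypothetical name.

```python
import re

def normalize_sketch(encoding):
    # Split on runs of characters that are neither alphanumeric nor a dot,
    # then rejoin the non-empty pieces with single underscores. Dropping
    # empty pieces removes leading and trailing underscores.
    parts = re.split(r'[^0-9A-Za-z.]+', encoding)
    return '_'.join(p for p in parts if p)

module_name = normalize_sketch('utf-8')
```

With this rule 'utf-8' maps to 'utf_8', matching the module naming requirement, and dotted names such as 'my.pkg.codec' pass through unchanged.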
_norm_encoding_map
- Value:
' . 0123456789 ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz
'
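The value shown reads like a 256-entry translation table in which the dot, digits, and ASCII letters map to themselves and every other position is a space. A minimal sketch of how such a table could be built and used follows; the names `table` and `normalize_with_table` are hypothetical, and the code assumes Python 3, where `str.translate` accepts any object indexable by character ordinal.

```python
import string

KEEP = string.digits + string.ascii_letters + '.'

# One character per ordinal 0..255: kept characters map to themselves,
# everything else maps to a space.
table = ''.join(chr(i) if chr(i) in KEEP else ' ' for i in range(256))

def normalize_with_table(encoding):
    # Translate unwanted characters to spaces, split on whitespace runs,
    # and rejoin the remaining pieces with single underscores.
    return '_'.join(encoding.translate(table).split())

result = normalize_with_table('utf-8')
```

Splitting on whitespace is what collapses a run of several punctuation characters into one underscore and discards leading and trailing separators.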
_aliases
- Value:
{'037': 'cp037',
 '1026': 'cp1026',
 '1140': 'cp1140',
 '1250': 'cp1250',
 '1251': 'cp1251',
 '1252': 'cp1252',
 '1253': 'cp1253',
 '1254': 'cp1254',
...
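To see the alias table in action, the following sketch looks up one of the entries listed above both directly and through the codec registry. It assumes a current CPython, where the same mapping is exposed as `encodings.aliases.aliases`; the exact set of keys may differ between versions, and `resolve_alias` is a hypothetical helper.

```python
import codecs
from encodings import aliases

def resolve_alias(name):
    # Fall back to the name itself when no alias is registered.
    return aliases.aliases.get(name, name)

module_name = resolve_alias('1252')
codec_info = codecs.lookup('1252')
```

Here `module_name` is the codec module the alias points to, and `codecs.lookup()` returns the CodecInfo that module's getregentry() produced.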