Package encodings

Standard "encodings" Package

    Standard Python encoding modules are stored in this package
    directory.

    Codec modules must have names corresponding to normalized encoding
    names as defined in the normalize_encoding() function below, e.g.
    'utf-8' must be implemented by the module 'utf_8.py'.
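The naming rule can be checked against the standard library's own UTF-8 codec. This is a minimal sketch assuming a Python 3 interpreter. It uses `importlib`, which the text above does not mention, to import the module by its normalized name and ask it for its registry entry.

```python
import codecs
import importlib

# 'utf-8' normalizes to 'utf_8', so the codec lives in encodings/utf_8.py.
module = importlib.import_module("encodings.utf_8")

# Every codec module exports getregentry(), which returns a CodecInfo.
info = module.getregentry()
assert isinstance(info, codecs.CodecInfo)
print(info.name)  # utf-8
```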

    Each codec module must export the following interface:

    * getregentry() -> codecs.CodecInfo object
    The getregentry() API must return a CodecInfo object with encode, decode,
    incrementalencoder, incrementaldecoder, streamwriter and streamreader
    attributes which adhere to the Python Codec Interface Standard.
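Below is a minimal sketch of a codec module that satisfies this interface, written for Python 3. The codec name `rot13_ascii` and every function and class name in it are hypothetical. Only the base classes and the `CodecInfo` keyword arguments come from the standard `codecs` module. The codec applies ROT13 to ASCII letters and then encodes the result as plain ASCII.

```python
import codecs
import string

# Hypothetical codec "rot13_ascii". In a real package this would live in a
# file such as encodings/rot13_ascii.py.
_ROT13 = str.maketrans(
    string.ascii_lowercase + string.ascii_uppercase,
    string.ascii_lowercase[13:] + string.ascii_lowercase[:13]
    + string.ascii_uppercase[13:] + string.ascii_uppercase[:13],
)


def rot13_ascii_encode(input, errors="strict"):
    # Stateless encoder: returns (bytes, number of characters consumed).
    return input.translate(_ROT13).encode("ascii", errors), len(input)


def rot13_ascii_decode(input, errors="strict"):
    # Stateless decoder: returns (str, number of bytes consumed).
    return bytes(input).decode("ascii", errors).translate(_ROT13), len(input)


class IncrementalEncoder(codecs.IncrementalEncoder):
    # One character maps to one byte, so no state is carried between calls.
    def encode(self, input, final=False):
        return rot13_ascii_encode(input, self.errors)[0]


class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return rot13_ascii_decode(input, self.errors)[0]


class StreamWriter(codecs.StreamWriter):
    # StreamWriter.write() calls self.encode(obj, errors) -> (bytes, consumed).
    def encode(self, input, errors="strict"):
        return rot13_ascii_encode(input, errors)


class StreamReader(codecs.StreamReader):
    # StreamReader.read() calls self.decode(data, errors) -> (str, consumed).
    def decode(self, input, errors="strict"):
        return rot13_ascii_decode(input, errors)


def getregentry():
    return codecs.CodecInfo(
        name="rot13_ascii",
        encode=rot13_ascii_encode,
        decode=rot13_ascii_decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamwriter=StreamWriter,
        streamreader=StreamReader,
    )
```

ASCII maps each character to exactly one byte, so the incremental and stream classes need no buffering. A multi-byte codec would have to keep partial input between calls.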

    In addition, a module may optionally define the following API,
    which the package's codec search function then uses:

    * getaliases() -> sequence of encoding name strings to use as aliases

    Alias names returned by getaliases() must be normalized encoding
    names as defined by normalize_encoding().
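The following hypothetical sketch shows how a search function could consume `getaliases()`. It assumes Python 3 and an already-normalized lookup name, and it uses a namespace object in place of a real module file. The module name `my_utf8`, its aliases, the `MODULES` table and both helpers are illustrative only and are not the package's own `search_function`. Aliases are recorded only after their module has been found once, and existing entries are never overwritten.

```python
import codecs
import types

# Hypothetical codec module; a namespace object stands in for a file such as
# encodings/my_utf8.py. Its aliases are already normalized names.
my_utf8 = types.SimpleNamespace(
    getregentry=lambda: codecs.lookup("utf-8"),
    getaliases=lambda: ("my_utf_8", "myutf8"),
)

MODULES = {"my_utf8": my_utf8}  # module name -> module (hypothetical)


def register_aliases(modname, module, aliases):
    """Add the module's optional aliases without overwriting existing ones."""
    getaliases = getattr(module, "getaliases", None)
    if getaliases is None:  # getaliases() is optional
        return
    for alias in getaliases():
        aliases.setdefault(alias, modname)


def search(norm_name, aliases):
    """Look up an already-normalized name; return a CodecInfo or None."""
    modname = aliases.get(norm_name, norm_name)
    module = MODULES.get(modname)
    if module is None:
        return None  # codec search functions signal "not found" with None
    register_aliases(modname, module, aliases)
    return module.getregentry()
```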

Written by Marc-Andre Lemburg (mal@lemburg.com).

(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.

Submodules

Classes
CodecRegistryError

Functions

normalize_encoding(encoding)
Normalize an encoding name.

search_function(encoding)

Variables
  _cache = {}
  _unknown = '--unknown--'
  _import_tail = ['*']
  _norm_encoding_map = ' ...
  _aliases = {'037': 'cp037', '1026': 'cp1026', '1140': 'cp1140'...

Imports: codecs, types, aliases, ascii, base64_codec, big5, big5hkscs, bz2_codec, charmap, cp037, cp1006, cp1026, cp1140, cp1250, cp1251, cp1252, cp1253, cp1254, cp1255, cp1256, cp1257, cp1258, cp424, cp437, cp500, cp737, cp775, cp850, cp852, cp855, cp856, cp857, cp860, cp861, cp862, cp863, cp864, cp865, cp866, cp869, cp874, cp875, cp932, cp949, cp950, euc_jis_2004, euc_jisx0213, euc_jp, euc_kr, gb18030, gb2312, gbk, hex_codec, hp_roman8, hz, idna, iso2022_jp, iso2022_jp_1, iso2022_jp_2, iso2022_jp_2004, iso2022_jp_3, iso2022_jp_ext, iso2022_kr, iso8859_1, iso8859_10, iso8859_11, iso8859_13, iso8859_14, iso8859_15, iso8859_16, iso8859_2, iso8859_3, iso8859_4, iso8859_5, iso8859_6, iso8859_7, iso8859_8, iso8859_9, johab, koi8_r, koi8_u, latin_1, mac_arabic, mac_centeuro, mac_croatian, mac_cyrillic, mac_farsi, mac_greek, mac_iceland, mac_latin2, mac_roman, mac_romanian, mac_turkish, palmos, ptcp154, punycode, quopri_codec, raw_unicode_escape, rot_13, shift_jis, shift_jis_2004, shift_jisx0213, string_escape, tis_620, undefined, unicode_escape, unicode_internal, utf_16, utf_16_be, utf_16_le, utf_7, utf_8, utf_8_sig, uu_codec, zlib_codec


Function Details

normalize_encoding(encoding)

Normalize an encoding name.

Normalization works as follows: each run of non-alphanumeric characters, except the dot used in Python package names, is collapsed into a single underscore, e.g. ' -;#' becomes '_'. Leading and trailing underscores are removed.

Note that encoding names should be ASCII only; if they do use non-ASCII characters, these must be Latin-1 compatible.
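The rule above can be expressed as a single regular-expression substitution. This is a Python 3 sketch, not the package's own code. It assumes that "alphanumeric" means ASCII letters and digits, which is what the _norm_encoding_map value below suggests. Case is left unchanged.

```python
import re

# Runs of characters other than ASCII letters, digits and '.' (an assumption
# based on the _norm_encoding_map table) become a single underscore.
_SEPARATORS = re.compile(r"[^A-Za-z0-9.]+")


def normalize_encoding(encoding):
    """Sketch of the documented normalization rule (case is not changed)."""
    return _SEPARATORS.sub("_", encoding).strip("_")


print(normalize_encoding("utf-8"))        # utf_8
print(normalize_encoding(" ISO 8859-1 "))  # ISO_8859_1
```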


Variables Details

_norm_encoding_map

Value:
'                                              . 0123456789       ABCD\
EFGHIJKLMNOPQRSTUVWXYZ      abcdefghijklmnopqrstuvwxyz                \
                                                                      \
                                               '
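The value above appears to be a 256-character translation table. It keeps the dot, the digits and the ASCII letters, and it maps every other character to a space. The Python 3 sketch below rebuilds an equivalent table as `bytes` and normalizes by translating, splitting on blanks and joining with underscores. The names `NORM_MAP` and `normalize_via_map` are hypothetical. The Latin-1 encoding step reflects the docstring's requirement that non-ASCII names be Latin-1 compatible.

```python
import string

# Keep ASCII letters, digits and '.'; every other byte value maps to a space.
_KEEP = frozenset(string.ascii_letters + string.digits + ".")
NORM_MAP = bytes(b if chr(b) in _KEEP else ord(" ") for b in range(256))


def normalize_via_map(encoding):
    """Translate with the 256-entry table, then join the words with '_'."""
    # Latin-1 covers code points 0-255, so every byte has a table entry.
    data = encoding.encode("latin-1").translate(NORM_MAP)
    # split() drops leading/trailing blanks and collapses runs of them.
    return "_".join(data.decode("ascii").split())
```

Splitting on blanks handles both the collapsing of separator runs and the removal of leading and trailing separators in one step.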

_aliases

Value:
{'037': 'cp037',
 '1026': 'cp1026',
 '1140': 'cp1140',
 '1250': 'cp1250',
 '1251': 'cp1251',
 '1252': 'cp1252',
 '1253': 'cp1253',
 '1254': 'cp1254',
...
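_aliases appears to be the alias table from the aliases submodule, which is listed under Imports. It maps normalized alias names to codec module names. Assuming a Python 3 interpreter, the same table is available as `encodings.aliases.aliases`, and `codecs.lookup()` resolves an alias through the package's search function.

```python
import codecs
import encodings.aliases

# The alias table maps normalized alias names to codec module names.
table = encodings.aliases.aliases
print(table["037"])   # cp037
print(table["1252"])  # cp1252

# codecs.lookup() resolves the alias '037' to the module encodings.cp037.
print(codecs.lookup("037").name)  # cp037
```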