Package email :: Module charset

Module charset

Classes
Charset
Map character sets to their email properties.
Functions
 
add_charset(charset, header_enc=None, body_enc=None, output_charset=None)
Add character set properties to the global registry.
 
add_alias(alias, canonical)
Add a character set alias.
 
add_codec(charset, codecname)
Add a codec that maps characters in the given charset to/from Unicode.
Variables
  QP = 1
  BASE64 = 2
  SHORTEST = 3
  MISC_LEN = 7
  DEFAULT_CHARSET = 'us-ascii'
  CHARSETS = {'8bit': (None, 2, 'utf-8'), 'big5': (2, 2, None), ...
  ALIASES = {'ascii': 'us-ascii', 'cp949': 'ks_c_5601-1987', 'eu...
  CODEC_MAP = {'big5': 'big5_tw', 'gb2312': 'eucgb2312_cn', 'us-...

Imports: email, errors, encode_7or8bit


Function Details

add_charset(charset, header_enc=None, body_enc=None, output_charset=None)

 

Add character set properties to the global registry.

charset is the input character set, and must be the canonical name of a character set.

Optional header_enc and body_enc are each either Charset.QP for quoted-printable, Charset.BASE64 for base64 encoding, Charset.SHORTEST for the shorter of quoted-printable or base64 encoding, or None for no encoding. SHORTEST is only valid for header_enc. They describe how message headers and message bodies in the input charset are to be encoded. The default is no encoding.

Optional output_charset is the character set that the output should be in. Conversions will proceed from input charset, to Unicode, to the output charset when the method Charset.convert() is called. The default is to output in the same character set as the input.

Both input_charset and output_charset must have Unicode codec entries in the module's charset-to-codec mapping; use add_codec(charset, codecname) to add codecs the module does not know about. See the codecs module's documentation for more information.
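
The registration described above can be sketched as follows. This is a minimal illustration, not a recommended configuration: the choice of koi8-u and its encodings is an assumption made for the example, and the import assumes a Python version that ships the module under the lowercase name email.charset.

```python
from email import charset as cs

# Assumption for illustration: koi8-u headers use quoted-printable,
# bodies use base64, and no output conversion is requested.
cs.add_charset('koi8-u', header_enc=cs.QP, body_enc=cs.BASE64)

# The registry stores a (header_enc, body_enc, output_charset) tuple.
registered = cs.CHARSETS['koi8-u']

# Charset instances created afterwards pick up the registered properties.
c = cs.Charset('koi8-u')
header_encoding = c.header_encoding
body_encoding = c.body_encoding
```

Because the registry is global to the module, a registration like this affects every Charset created later in the same process.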

add_alias(alias, canonical)

 

Add a character set alias.

alias is the alias name, e.g. latin-1.

canonical is the character set's canonical name, e.g. iso-8859-1.
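
A short sketch of how an alias might be registered and then resolved. The alias name my-latin is hypothetical and exists only for this example.

```python
from email import charset as cs

# 'my-latin' is a made-up alias used purely for illustration.
cs.add_alias('my-latin', 'iso-8859-1')

# Charset resolves the alias to the canonical name on construction.
c = cs.Charset('my-latin')
resolved = c.input_charset
```

After this call, resolved should hold the canonical name iso-8859-1 rather than the alias.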

add_codec(charset, codecname)

 

Add a codec that maps characters in the given charset to/from Unicode.

charset is the canonical name of a character set. codecname is the name of a Python codec, as appropriate for the second argument to the unicode() built-in, or to the encode() method of a Unicode string.
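
As a hedged illustration, the following maps a charset name to a Python codec name and checks that Charset reports it. The pairing of koi8-u with the koi8_u codec is an assumption chosen for the example. It is not an entry the module is known to need.

```python
from email import charset as cs

# Assumed pairing for illustration: canonical charset name -> Python codec name.
cs.add_codec('koi8-u', 'koi8_u')

# The mapping is stored in the module-level CODEC_MAP dictionary.
codec_name = cs.CODEC_MAP['koi8-u']

# Charset instances look up their input codec through the same mapping.
c = cs.Charset('koi8-u')
input_codec = c.input_codec
```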


Variables Details

CHARSETS

Value:
{'8bit': (None, 2, 'utf-8'),
 'big5': (2, 2, None),
 'euc-jp': (2, None, 'iso-2022-jp'),
 'gb2312': (2, 2, None),
 'iso-2022-jp': (2, None, None),
 'iso-8859-1': (1, 1, None),
 'iso-8859-10': (1, 1, None),
 'iso-8859-13': (1, 1, None),
...
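
Each value shown above appears to be a (header_enc, body_enc, output_charset) tuple, matching the arguments of add_charset. The sketch below reads two of the listed entries on that assumption. The helper name describe is hypothetical.

```python
from email import charset as cs

# Map the numeric constants back to readable names.
NAMES = {cs.QP: 'QP', cs.BASE64: 'BASE64', cs.SHORTEST: 'SHORTEST', None: 'None'}

def describe(name):
    """Hypothetical helper: unpack one CHARSETS entry into readable fields."""
    header_enc, body_enc, output_charset = cs.CHARSETS[name]
    return NAMES[header_enc], NAMES[body_enc], output_charset

latin1 = describe('iso-8859-1')
eucjp = describe('euc-jp')

# A non-None third field becomes the Charset's output character set.
eucjp_output = cs.Charset('euc-jp').output_charset
```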

ALIASES

Value:
{'ascii': 'us-ascii',
 'cp949': 'ks_c_5601-1987',
 'euc_jp': 'euc-jp',
 'euc_kr': 'euc-kr',
 'latin-1': 'iso-8859-1',
 'latin-2': 'iso-8859-2',
 'latin-3': 'iso-8859-3',
 'latin-4': 'iso-8859-4',
...

CODEC_MAP

Value:
{'big5': 'big5_tw', 'gb2312': 'eucgb2312_cn', 'us-ascii': None}