Module codecs

codecs -- Python Codec Registry, API and helpers.

Written by Marc-Andre Lemburg (mal@lemburg.com).

(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.

Classes
CodecInfo
Codec
Defines the interface for stateless encoders/decoders.
IncrementalEncoder
An IncrementalEncoder encodes an input in multiple steps.
BufferedIncrementalEncoder
This subclass of IncrementalEncoder can be used as the base class for an incremental encoder if the encoder must keep some of the output in a buffer between calls to encode().
IncrementalDecoder
An IncrementalDecoder decodes an input in multiple steps.
BufferedIncrementalDecoder
This subclass of IncrementalDecoder can be used as the base class for an incremental decoder if the decoder must be able to handle incomplete byte sequences.
StreamWriter
StreamReader
StreamReaderWriter
StreamReaderWriter instances allow wrapping streams which work in both read and write modes.
StreamRecoder
StreamRecoder instances provide a frontend - backend view of encoding data.
Functions
 
open(filename, mode='rb', encoding=None, errors='strict', buffering=1)
Open an encoded file using the given mode and return a wrapped version providing transparent encoding/decoding.
 
EncodedFile(file, data_encoding, file_encoding=None, errors='strict')
Return a wrapped version of file which provides transparent encoding translation.
 
getencoder(encoding)
Look up the codec for the given encoding and return its encoder function.
 
getdecoder(encoding)
Look up the codec for the given encoding and return its decoder function.
 
getincrementalencoder(encoding)
Look up the codec for the given encoding and return its IncrementalEncoder class or factory function.
 
getincrementaldecoder(encoding)
Look up the codec for the given encoding and return its IncrementalDecoder class or factory function.
 
getreader(encoding)
Look up the codec for the given encoding and return its StreamReader class or factory function.
 
getwriter(encoding)
Look up the codec for the given encoding and return its StreamWriter class or factory function.
 
iterencode(iterator, encoding, errors='strict', **kwargs)
Encoding iterator.
 
iterdecode(iterator, encoding, errors='strict', **kwargs)
Decoding iterator.
 
make_identity_dict(rng) -> dict
Return a dictionary where elements of the rng sequence are mapped to themselves.
 
make_encoding_map(decoding_map)
Creates an encoding map from a decoding map.
 
strict_errors(...)
 
ignore_errors(...)
 
replace_errors(...)
 
xmlcharrefreplace_errors(...)
 
backslashreplace_errors(...)
 
lookup(encoding) -> (encoder, decoder, stream_reader, stream_writer)
Looks up a codec tuple in the Python codec registry and returns a tuple of functions.
 
lookup_error(errors) -> handler
Return the error handler for the specified error handling name or raise a LookupError, if no handler exists under this name.
 
register(search_function)
Register a codec search function.
 
register_error(errors, handler)
Register the specified error handler under the name errors.
Variables
  BOM_UTF8 = '\xef\xbb\xbf'
  BOM_UTF16_LE = '\xff\xfe'
  BOM_LE = '\xff\xfe'
  BOM_UTF16_BE = '\xfe\xff'
  BOM_BE = '\xfe\xff'
  BOM_UTF32_LE = '\xff\xfe\x00\x00'
  BOM_UTF32_BE = '\x00\x00\xfe\xff'
  BOM_UTF16 = '\xff\xfe'
  BOM = '\xff\xfe'
  BOM_UTF32 = '\xff\xfe\x00\x00'
  BOM32_LE = '\xff\xfe'
  BOM32_BE = '\xfe\xff'
  BOM64_LE = '\xff\xfe\x00\x00'
  BOM64_BE = '\x00\x00\xfe\xff'
  _false = 0

Imports: __builtin__, sys, encodings, ascii_decode, ascii_encode, charbuffer_encode, charmap_build, charmap_decode, charmap_encode, decode, encode, escape_decode, escape_encode, latin_1_decode, latin_1_encode, raw_unicode_escape_decode, raw_unicode_escape_encode, readbuffer_encode, unicode_escape_decode, unicode_escape_encode, unicode_internal_decode, unicode_internal_encode, utf_16_be_decode, utf_16_be_encode, utf_16_decode, utf_16_encode, utf_16_ex_decode, utf_16_le_decode, utf_16_le_encode, utf_7_decode, utf_7_encode, utf_8_decode, utf_8_encode


Function Details

open(filename, mode='rb', encoding=None, errors='strict', buffering=1)

 

Open an encoded file using the given mode and return a wrapped version providing transparent encoding/decoding.

Note: The wrapped version will only accept the object format defined by the codecs, i.e. Unicode objects for most builtin codecs. Output is also codec dependent and will usually be Unicode as well.

Files are always opened in binary mode, even if no binary mode was specified. This is done to avoid data loss due to encodings using 8-bit values. The default file mode is 'rb' meaning to open the file in binary read mode.

encoding specifies the encoding which is to be used for the file.

errors may be given to define the error handling. It defaults to 'strict' which causes ValueErrors to be raised in case an encoding error occurs.

buffering has the same meaning as for the builtin open() API. It defaults to line buffered.

The returned wrapped file object provides an extra attribute .encoding which allows querying the used encoding. This attribute is only available if an encoding was specified as parameter.
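
The behaviour described above can be sketched as follows. This is a minimal illustration, not part of the original documentation: it assumes a Python 3 interpreter (where encoded data is bytes and decoded data is str), and the temporary file name sample.txt is hypothetical.

```python
import codecs
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")  # hypothetical file name

    # Mode "w" is requested, but the underlying file is opened in binary mode;
    # the wrapper encodes the text to UTF-8 before writing.
    with codecs.open(path, "w", encoding="utf-8") as f:
        f.write("caf\xe9\n")
        encoding_attr = f.encoding  # extra attribute set because encoding was given

    # Reading goes the other way: bytes from disk are decoded to text.
    with codecs.open(path, "r", encoding="utf-8") as f:
        text = f.read()

    # The raw file content is the encoded byte sequence.
    with open(path, "rb") as raw:
        raw_bytes = raw.read()
```

Reading the file back through the builtin open() in binary mode makes the transparent encoding step visible: the caller wrote text, the file holds UTF-8 bytes.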

EncodedFile(file, data_encoding, file_encoding=None, errors='strict')

 

Return a wrapped version of file which provides transparent encoding translation.

Strings written to the wrapped file are interpreted according to the given data_encoding and then written to the original file as string using file_encoding. The intermediate encoding will usually be Unicode but depends on the specified codecs.

Strings are read from the file using file_encoding and then passed back to the caller as string using data_encoding.

If file_encoding is not given, it defaults to data_encoding.

errors may be given to define the error handling. It defaults to 'strict' which causes ValueErrors to be raised in case an encoding error occurs.

The returned wrapped file object provides two extra attributes .data_encoding and .file_encoding which reflect the given parameters of the same name. The attributes can be used for introspection by Python programs.
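
A small sketch of the translation may help here. It assumes Python 3, uses an in-memory io.BytesIO object as a stand-in for a real file, and picks UTF-8 and Latin-1 purely for illustration.

```python
import codecs
import io

backing = io.BytesIO()  # stand-in for a real binary file

# The caller works with UTF-8 data; the file stores Latin-1.
wrapped = codecs.EncodedFile(backing, data_encoding="utf-8", file_encoding="latin-1")

# Written bytes are decoded as UTF-8, then re-encoded as Latin-1 for the file.
wrapped.write("caf\xe9".encode("utf-8"))
stored = backing.getvalue()

# Rewind the underlying stream and read through the wrapper:
# Latin-1 bytes from the file come back as UTF-8 bytes.
backing.seek(0)
round_trip = wrapped.read()

encodings = (wrapped.data_encoding, wrapped.file_encoding)
```

The two-byte UTF-8 sequence for the accented character is stored as a single Latin-1 byte, and expanded again on reading.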

getencoder(encoding)

 

Look up the codec for the given encoding and return its encoder function.

Raises a LookupError in case the encoding cannot be found.

getdecoder(encoding)

 

Look up the codec for the given encoding and return its decoder function.

Raises a LookupError in case the encoding cannot be found.
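
As an illustration of getencoder() and getdecoder(), the sketch below assumes Python 3 and the built-in UTF-8 codec. The returned functions are stateless and return a pair of the converted data and the length of input consumed. The encoding name "no-such-encoding" is a made-up name used only to trigger the LookupError.

```python
import codecs

encode = codecs.getencoder("utf-8")
decode = codecs.getdecoder("utf-8")

# Stateless codec functions return (output, length_consumed).
encoded, chars_consumed = encode("caf\xe9")
decoded, bytes_consumed = decode(encoded)

# An unknown encoding name raises LookupError.
try:
    codecs.getencoder("no-such-encoding")
    lookup_failed = False
except LookupError:
    lookup_failed = True
```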

getincrementalencoder(encoding)

 

Look up the codec for the given encoding and return its IncrementalEncoder class or factory function.

Raises a LookupError in case the encoding cannot be found or the codec doesn't provide an incremental encoder.

getincrementaldecoder(encoding)

 

Look up the codec for the given encoding and return its IncrementalDecoder class or factory function.

Raises a LookupError in case the encoding cannot be found or the codec doesn't provide an incremental decoder.
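
The point of the incremental classes is that input may arrive in pieces. The following sketch assumes Python 3 and the UTF-8 codec, and feeds the decoder a multi-byte character split across two calls.

```python
import codecs

# The lookup functions return a class (or factory); call it to get an instance.
decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")

euro_bytes = "\u20ac".encode("utf-8")  # three bytes: e2 82 ac

# The first byte alone is an incomplete sequence, so nothing is produced yet.
first = decoder.decode(euro_bytes[:1])
# Once the remaining bytes arrive, the buffered byte is combined with them.
second = decoder.decode(euro_bytes[1:], final=True)

encoder = codecs.getincrementalencoder("utf-8")()
out = encoder.encode("ab") + encoder.encode("c", final=True)
```

A stateless decoder function would raise an error on the truncated first chunk; the incremental decoder keeps it until the sequence is complete.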

getreader(encoding)

 

Look up the codec for the given encoding and return its StreamReader class or factory function.

Raises a LookupError in case the encoding cannot be found.

getwriter(encoding)

 

Look up the codec for the given encoding and return its StreamWriter class or factory function.

Raises a LookupError in case the encoding cannot be found.
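
A minimal sketch of getwriter() and getreader(), assuming Python 3 and using io.BytesIO objects in place of real byte streams:

```python
import codecs
import io

# Wrap a byte stream so that text written to it is encoded as UTF-8.
sink = io.BytesIO()
writer = codecs.getwriter("utf-8")(sink)
writer.write("caf\xe9")
written = sink.getvalue()

# Wrap a byte stream so that reads return decoded text.
source = io.BytesIO(written)
reader = codecs.getreader("utf-8")(source)
text = reader.read()
```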

iterencode(iterator, encoding, errors='strict', **kwargs)

 

Encoding iterator.

Encodes the input strings from the iterator using an IncrementalEncoder.

errors and kwargs are passed through to the IncrementalEncoder constructor.

iterdecode(iterator, encoding, errors='strict', **kwargs)

 

Decoding iterator.

Decodes the input strings from the iterator using an IncrementalDecoder.

errors and kwargs are passed through to the IncrementalDecoder constructor.
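
Both iterators can be sketched as follows, assuming Python 3 and UTF-8. The byte chunks passed to iterdecode() deliberately split a two-byte character to show that the underlying IncrementalDecoder carries state between items.

```python
import codecs

text_chunks = ["caf", "\xe9", "!"]
encoded_chunks = list(codecs.iterencode(text_chunks, "utf-8"))
encoded = b"".join(encoded_chunks)

# The two bytes of the accented character land in different chunks.
byte_chunks = [b"caf\xc3", b"\xa9!"]
decoded = "".join(codecs.iterdecode(byte_chunks, "utf-8", errors="strict"))
```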

make_encoding_map(decoding_map)

 

Creates an encoding map from a decoding map.

If a target mapping in the decoding map occurs multiple times, then that target is mapped to None (undefined mapping), causing an exception when encountered by the charmap codec during translation.

One example where this happens is cp875.py, which decodes multiple characters to \u001a.
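
The duplicate-target rule can be seen with a tiny, made-up decoding map (the byte values below are illustrative and not taken from cp875.py):

```python
import codecs

# Hypothetical decoding map: bytes 0x80 and 0x81 both decode to U+001A.
decoding_map = codecs.make_identity_dict(range(0x41, 0x43))  # {0x41: 0x41, 0x42: 0x42}
decoding_map.update({0x80: 0x001A, 0x81: 0x001A})

encoding_map = codecs.make_encoding_map(decoding_map)

# Unique targets are inverted; the ambiguous target U+001A maps to None.
unique_entry = encoding_map[0x41]
ambiguous_entry = encoding_map[0x001A]
```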

register(search_function)

 

Register a codec search function. Search functions are expected to take one argument, the encoding name in all lower case letters, and return a tuple of functions (encoder, decoder, stream_reader, stream_writer).
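
A hedged sketch of a search function follows. It assumes Python 3, where the four functions are normally packaged in a CodecInfo object (a tuple subclass listed under Classes). The encoding name demoutf8 is hypothetical and simply delegates to the built-in UTF-8 codec.

```python
import codecs

def demo_search(name):
    """Search function: receives the lower-cased encoding name."""
    if name != "demoutf8":  # hypothetical encoding name for this sketch
        return None  # let other search functions handle the name
    base = codecs.lookup("utf-8")
    return codecs.CodecInfo(
        encode=base.encode,
        decode=base.decode,
        streamreader=base.streamreader,
        streamwriter=base.streamwriter,
        incrementalencoder=base.incrementalencoder,
        incrementaldecoder=base.incrementaldecoder,
        name="demoutf8",
    )

codecs.register(demo_search)

# The name is lower-cased before the search function sees it.
info = codecs.lookup("DemoUTF8")
encoded, consumed = codecs.getencoder("demoutf8")("caf\xe9")
```

Returning None for unknown names matters: the registry tries each registered search function in turn until one returns a result.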

register_error(errors, handler)

 

Register the specified error handler under the name errors. handler must be a callable object. It is called with an exception instance containing information about the location of the encoding/decoding error, and it must return a (replacement, new position) tuple.
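
As a sketch of the handler protocol, the example below registers a handler under the hypothetical name demo_question. It assumes Python 3 and handles only encoding errors, replacing each unencodable character with a question mark.

```python
import codecs

def question_mark_handler(exc):
    """Replace each unencodable character with '?' and resume after it."""
    if isinstance(exc, UnicodeEncodeError):
        bad = exc.object[exc.start:exc.end]  # the offending slice of the input
        return ("?" * len(bad), exc.end)     # (replacement, new position)
    raise exc  # this sketch does not handle decoding errors

# "demo_question" is a made-up handler name for this illustration.
codecs.register_error("demo_question", question_mark_handler)

result = "caf\xe9".encode("ascii", "demo_question")
registered = codecs.lookup_error("demo_question")
```

The new position tells the codec where to continue; returning exc.end skips exactly the characters that could not be encoded.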