Module pickletools

"Executable documentation" for the pickle module.

Extensive comments about the pickle protocols and pickle-machine opcodes
can be found in this module's source.  Some functions meant for external use:

genops(pickle)
   Generate all the opcodes in a pickle, as (opcode, arg, position) triples.

dis(pickle, out=None, memo=None, indentlevel=4)
   Print a symbolic disassembly of a pickle.

Classes
ArgumentDescriptor
StackObject
OpcodeInfo
_Example
Functions
read_uint1(f)
read_uint2(f)
read_int4(f)
read_stringnl(f, decode=True, stripquotes=True)
   Embedded escapes are undone in the result.
read_stringnl_noescape(f)
read_stringnl_noescape_pair(f)
read_string4(f)
read_string1(f)
read_unicodestringnl(f)
read_unicodestring4(f)
read_decimalnl_short(f)
read_decimalnl_long(f)
   Someday the trailing 'L' will probably go away from this output.
read_floatnl(f)
read_float8(f)
read_long1(f)
read_long4(f)
genops(pickle)
   Generate all the opcodes in a pickle.
dis(pickle, out=None, memo=None, indentlevel=4)
   Produce a symbolic disassembly of a pickle.
_test()
Variables
  UP_TO_NEWLINE = -1
  TAKEN_FROM_ARGUMENT1 = -2
  TAKEN_FROM_ARGUMENT4 = -3
  uint1 = ArgumentDescriptor(name= 'uint1', n= 1, reader= read_u...
  uint2 = ArgumentDescriptor(name= 'uint2', n= 2, reader= read_u...
  int4 = ArgumentDescriptor(name= 'int4', n= 4, reader= read_int...
  stringnl = ArgumentDescriptor(name= 'stringnl', n= UP_TO_NEWLI...
  stringnl_noescape = ArgumentDescriptor(name= 'stringnl_noescap...
  stringnl_noescape_pair = ArgumentDescriptor(name= 'stringnl_no...
  string4 = ArgumentDescriptor(name= "string4", n= TAKEN_FROM_AR...
  string1 = ArgumentDescriptor(name= "string1", n= TAKEN_FROM_AR...
  unicodestringnl = ArgumentDescriptor(name= 'unicodestringnl', ...
  unicodestring4 = ArgumentDescriptor(name= "unicodestring4", n=...
  decimalnl_short = ArgumentDescriptor(name= 'decimalnl_short', ...
  decimalnl_long = ArgumentDescriptor(name= 'decimalnl_long', n=...
  floatnl = ArgumentDescriptor(name= 'floatnl', n= UP_TO_NEWLINE...
  float8 = ArgumentDescriptor(name= 'float8', n= 8, reader= read...
  long1 = ArgumentDescriptor(name= "long1", n= TAKEN_FROM_ARGUME...
  long4 = ArgumentDescriptor(name= "long4", n= TAKEN_FROM_ARGUME...
  pyint = int
  pylong = long
  pyinteger_or_bool = int_or_bool
  pybool = bool
  pyfloat = float
  pystring = str
  pyunicode = unicode
  pynone = None
  pytuple = tuple
  pylist = list
  pydict = dict
  anyobject = any
  markobject = mark
  stackslice = stackslice
  opcodes = [I(name= 'INT', code= 'I', arg= decimalnl_short, sta...
  code2op = {}
  _dis_test = '\n>>> import pickle\n>>> x = [1, 2, (3, 4), {\'ab...
  _memo_test = '\n>>> import pickle\n>>> from StringIO import St...
  __test__ = {'disassembler_memo_test': '\n>>> import pickle\n>>...

Imports: _unpack, decode_long


Function Details

read_stringnl(f, decode=True, stripquotes=True)

>>> import StringIO
>>> read_stringnl(StringIO.StringIO("'abcd'\nefg\n"))
'abcd'
>>> read_stringnl(StringIO.StringIO("\n"))
Traceback (most recent call last):
...
ValueError: no string quotes around ''
>>> read_stringnl(StringIO.StringIO("\n"), stripquotes=False)
''
>>> read_stringnl(StringIO.StringIO("''\n"))
''
>>> read_stringnl(StringIO.StringIO('"abcd"'))
Traceback (most recent call last):
...
ValueError: no newline found when trying to read stringnl

Embedded escapes are undone in the result.
>>> read_stringnl(StringIO.StringIO(r"'a\n\\b\x00c\td'" + "\n'e'"))
'a\n\\b\x00c\td'
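
A minimal Python 3 sketch of the stringnl rules exercised by the doctests above. read_stringnl_sketch is a hypothetical name, not the module's own function. It assumes a binary stream such as io.BytesIO instead of Python 2's StringIO, and it swaps in the 'unicode_escape' codec for Python 2's 'string_escape', so the two may disagree on \u escapes and non-ASCII bytes.

```python
import io


def read_stringnl_sketch(f, decode=True, stripquotes=True):
    """Hypothetical Python 3 reader for the stringnl argument format.

    Reads one newline-terminated, repr-style string from the binary
    stream f, optionally strips the bracketing quotes, and optionally
    undoes backslash escapes.  With decode=False the raw bytes are returned.
    """
    data = f.readline()
    if not data.endswith(b"\n"):
        raise ValueError("no newline found when trying to read stringnl")
    data = data[:-1]  # drop the terminating newline

    if stripquotes:
        for q in (b'"', b"'"):
            if data.startswith(q):
                # A lone quote character does not count as a quoted string.
                if len(data) < 2 or not data.endswith(q):
                    raise ValueError("string quote %r not found at both ends of %r" % (q, data))
                data = data[1:-1]
                break
        else:
            raise ValueError("no string quotes around %r" % data)

    if not decode:
        return data
    # Assumption: 'unicode_escape' stands in for Python 2's 'string_escape'.
    return data.decode("unicode_escape")


print(read_stringnl_sketch(io.BytesIO(b"'abcd'\nefg\n")))  # abcd
print(repr(read_stringnl_sketch(io.BytesIO(rb"'a\n\\b\x00c\td'" + b"\n'e'"))))
```

As in the doctests, only the first line is consumed; anything after the newline stays in the stream for the next opcode.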

read_decimalnl_long(f)

>>> import StringIO
>>> read_decimalnl_long(StringIO.StringIO("1234\n56"))
Traceback (most recent call last):
...
ValueError: trailing 'L' required in '1234'

Someday the trailing 'L' will probably go away from this output.

>>> read_decimalnl_long(StringIO.StringIO("1234L\n56"))
1234L
>>> read_decimalnl_long(StringIO.StringIO("123456789012345678901234L\n6"))
123456789012345678901234L
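
The same checks for decimalnl_long, as a hypothetical Python 3 sketch (read_decimalnl_long_sketch is an illustrative name, not this module's reader). It keeps the trailing-'L' requirement shown above and returns an ordinary int, since Python 3 folds long into int.

```python
import io


def read_decimalnl_long_sketch(f):
    """Hypothetical Python 3 reader for the decimalnl_long argument format.

    Reads one newline-terminated decimal literal from the binary stream f,
    insists on the trailing 'L' carried by Python 2 long reprs, and returns
    a plain int.
    """
    line = f.readline()
    if not line.endswith(b"\n"):
        raise ValueError("no newline found when trying to read decimalnl_long")
    literal = line[:-1].decode("ascii")
    if not literal.endswith("L"):
        raise ValueError("trailing 'L' required in %r" % literal)
    return int(literal[:-1])


f = io.BytesIO(b"123456789012345678901234L\n6")
print(read_decimalnl_long_sketch(f))  # 123456789012345678901234
print(f.read())                       # b'6': the rest of the stream is untouched
```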

genops(pickle)

Generate all the opcodes in a pickle.

'pickle' is a file-like object, or string, containing the pickle.

Each opcode in the pickle is generated, from the current pickle position,
stopping after a STOP opcode is delivered.  A triple is generated for
each opcode:

    opcode, arg, pos

opcode is an OpcodeInfo record, describing the current opcode.

If the opcode has an argument embedded in the pickle, arg is its decoded
value, as a Python object.  If the opcode doesn't have an argument, arg
is None.

If the pickle has a tell() method, pos is the value of pickle.tell()
before reading the current opcode.  If the pickle is a string object,
it's wrapped in a StringIO object, and the latter's tell() result is
used.  Otherwise (the pickle has no tell() method, and there's no
obvious way to query its current position), pos is None.
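
A usage sketch, assuming Python 3's pickletools, whose genops() yields the same (opcode, arg, pos) triples. There a bytes pickle is wrapped in an in-memory stream, so pos is always an int. Protocol 0 is picked only because its opcodes are printable.

```python
import pickle
import pickletools

data = pickle.dumps([1, 2], protocol=0)

for opcode, arg, pos in pickletools.genops(data):
    # opcode is an OpcodeInfo record; arg is None for argument-less opcodes
    # such as MARK, APPEND and STOP.
    print("%3d %-8s %r" % (pos, opcode.name, arg))
```

Iteration ends after the STOP opcode, so a stream holding several pickles can be walked one pickle per genops() call.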

dis(pickle, out=None, memo=None, indentlevel=4)

Produce a symbolic disassembly of a pickle.

'pickle' is a file-like object, or string, containing at least one
pickle.  The pickle is disassembled from the current position through
the first STOP opcode encountered.

Optional arg 'out' is a file-like object to which the disassembly is
printed.  It defaults to sys.stdout.
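
For example, assuming Python 3, the listing can be captured in an io.StringIO instead of going to sys.stdout:

```python
import io
import pickle
import pickletools

buf = io.StringIO()
pickletools.dis(pickle.dumps({"abc": 1}, protocol=2), out=buf)
listing = buf.getvalue()
# Opcode lines show the byte offset, the opcode's code, its name and any argument.
print(listing)
```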

Optional arg 'memo' is a Python dict, used as the pickle's memo.  It
may be mutated by dis(), if the pickle contains PUT or BINPUT opcodes.
Passing the same memo object to another dis() call then allows disassembly
to proceed across multiple pickles that were all created by the same
pickler with the same memo.  Ordinarily you don't need to worry about this.
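
A sketch of the memo case, assuming Python 3's pickle and pickletools. One Pickler writes the same list twice, so the second pickle holds only a BINGET that refers back into the memo built while reading the first.

```python
import io
import pickle
import pickletools

f = io.BytesIO()
p = pickle.Pickler(f, 2)
x = [1, 2, 3]
p.dump(x)  # first pickle stores the list in memo slot 0 (BINPUT)
p.dump(x)  # second pickle only refers back to it (BINGET)
f.seek(0)

memo = {}
out = io.StringIO()
pickletools.dis(f, out=out, memo=memo)  # stops after the first STOP
pickletools.dis(f, out=out, memo=memo)  # BINGET 0 is found in the shared memo
print(out.getvalue())

# With a fresh memo, the second disassembly cannot resolve BINGET 0.
f.seek(0)
pickletools.dis(f, out=io.StringIO(), memo={})
try:
    pickletools.dis(f, out=io.StringIO(), memo={})
except ValueError as exc:
    print("fresh memo rejected:", exc)
```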

Optional arg 'indentlevel' is the number of blanks by which to indent
a new MARK level.  It defaults to 4.

In addition to printing the disassembly, some sanity checks are made:

+ All embedded opcode arguments "make sense".

+ Explicit and implicit pop operations have enough items on the stack.

+ When an opcode implicitly refers to a markobject, a markobject is
  actually on the stack.

+ A memo entry isn't referenced before it's defined.

+ The markobject isn't stored in the memo.

+ A memo entry isn't redefined.
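
Because a failed check raises ValueError (in Python 3's pickletools, which this sketch assumes), dis() can double as a quick validator. check_pickle is a hypothetical helper name, and the listing itself is discarded.

```python
import io
import pickletools


def check_pickle(data):
    """Hypothetical helper: run dis() only for its sanity checks.

    Returns None when the pickle passes, otherwise the ValueError message.
    """
    try:
        pickletools.dis(data, out=io.StringIO())  # discard the listing
    except ValueError as exc:
        return str(exc)
    return None


print(check_pickle(b"I1\n."))  # well formed: prints None
print(check_pickle(b"g0\n."))  # GET of memo key 0, which was never defined
print(check_pickle(b"a."))     # APPEND with nothing on the stack to pop
```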


Variables Details

uint1

Value:
ArgumentDescriptor(name= 'uint1', n= 1, reader= read_uint1, doc= "One-byte unsigned integer.")

uint2

Value:
ArgumentDescriptor(name= 'uint2', n= 2, reader= read_uint2, doc= "Two-byte unsigned integer, little-endian.")

int4

Value:
ArgumentDescriptor(name= 'int4', n= 4, reader= read_int4, doc= "Four-byte signed integer, little-endian, 2's complement.")
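
The three fixed-width integer formats map directly onto struct format codes. A minimal sketch, assuming Python 3 and binary streams; the *_sketch readers and the _read_exact helper are hypothetical names, not this module's functions.

```python
import io
import struct


def _read_exact(f, n):
    """Read exactly n bytes from binary stream f or raise ValueError."""
    data = f.read(n)
    if len(data) != n:
        raise ValueError("not enough data in stream to read %d bytes" % n)
    return data


# Hypothetical readers; the struct formats are the assumed equivalents:
#   uint1 -> "<B"  one unsigned byte
#   uint2 -> "<H"  two bytes, unsigned, little-endian
#   int4  -> "<i"  four bytes, signed 2's complement, little-endian
def read_uint1_sketch(f):
    return struct.unpack("<B", _read_exact(f, 1))[0]


def read_uint2_sketch(f):
    return struct.unpack("<H", _read_exact(f, 2))[0]


def read_int4_sketch(f):
    return struct.unpack("<i", _read_exact(f, 4))[0]


print(read_uint2_sketch(io.BytesIO(b"\x00\x01")))         # 256
print(read_int4_sketch(io.BytesIO(b"\xff\xff\xff\xff")))  # -1
```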

stringnl

Value:
ArgumentDescriptor(name= 'stringnl', n= UP_TO_NEWLINE, reader= read_stringnl, doc= """A newline-terminated string.

                   This is a repr-style string, with embedded escapes, and
                   bracketing quotes.
                   """)

stringnl_noescape

Value:
ArgumentDescriptor(name= 'stringnl_noescape', n= UP_TO_NEWLINE, reader= read_stringnl_noescape, doc= """A newline-terminated string.

                        This is a str-style string, without embedded escapes,
                        or bracketing quotes.  It should consist solely of
                        printable ASCII characters.
...

stringnl_noescape_pair

Value:
ArgumentDescriptor(name= 'stringnl_noescape_pair', n= UP_TO_NEWLINE, reader= read_stringnl_noescape_pair, doc= """A pair of newline-terminated strings.

                             These are str-style strings, without embedded
                             escapes, or bracketing quotes.  They should
...

string4

Value:
ArgumentDescriptor(name= "string4", n= TAKEN_FROM_ARGUMENT4, reader= read_string4, doc= """A counted string.

              The first argument is a 4-byte little-endian signed int giving
              the number of bytes in the string, and the second argument is
              that many bytes.
...

string1

Value:
ArgumentDescriptor(name= "string1", n= TAKEN_FROM_ARGUMENT1, reader= read_string1, doc= """A counted string.

              The first argument is a 1-byte unsigned int giving the number
              of bytes in the string, and the second argument is that many
              bytes.
...
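
The two counted-string layouts differ only in the width and signedness of the leading byte count. A hypothetical Python 3 sketch (the function names are illustrative): it returns bytes where the Python 2 readers return str, and it rejects a negative string4 count.

```python
import io
import struct


def _read_exact(f, n):
    """Read exactly n bytes from binary stream f or raise ValueError."""
    data = f.read(n)
    if len(data) != n:
        raise ValueError("expected %d bytes, got %d" % (n, len(data)))
    return data


def read_string1_sketch(f):
    """Hypothetical string1 reader: 1-byte unsigned count, then that many bytes."""
    (n,) = struct.unpack("<B", _read_exact(f, 1))
    return _read_exact(f, n)


def read_string4_sketch(f):
    """Hypothetical string4 reader: 4-byte little-endian signed count, then bytes."""
    (n,) = struct.unpack("<i", _read_exact(f, 4))
    if n < 0:
        raise ValueError("string4 byte count < 0: %d" % n)
    return _read_exact(f, n)


print(read_string1_sketch(io.BytesIO(b"\x03abcdef")))           # b'abc'
print(read_string4_sketch(io.BytesIO(b"\x02\x00\x00\x00xyz")))  # b'xy'
```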

unicodestringnl

Value:
ArgumentDescriptor(name= 'unicodestringnl', n= UP_TO_NEWLINE, reader= read_unicodestringnl, doc= """A newline-terminated Unicode string.

                      This is raw-unicode-escape encoded, so consists of
                      printable ASCII characters, and may contain embedded
                      escape sequences.
...

unicodestring4

Value:
ArgumentDescriptor(name= "unicodestring4", n= TAKEN_FROM_ARGUMENT4, reader= read_unicodestring4, doc= """A counted Unicode string.

                    The first argument is a 4-byte little-endian signed int
                    giving the number of bytes in the string, and the second
                    argument-- the UTF-8 encoding of the Unicode strin\
...

decimalnl_short

Value:
ArgumentDescriptor(name= 'decimalnl_short', n= UP_TO_NEWLINE, reader= read_decimalnl_short, doc= """A newline-terminated decimal integer literal.

                          This never has a trailing 'L', and the integer fit
                          in a short Python int on the box where the pickle
...

decimalnl_long

Value:
ArgumentDescriptor(name= 'decimalnl_long', n= UP_TO_NEWLINE, reader= read_decimalnl_long, doc= """A newline-terminated decimal integer literal.

                         This has a trailing 'L', and can represent integers
                         of any size.
                         """)

floatnl

Value:
ArgumentDescriptor(name= 'floatnl', n= UP_TO_NEWLINE, reader= read_floatnl, doc= """A newline-terminated decimal floating literal.

              In general this requires 17 significant digits for roundtrip
              identity, and pickling then unpickling infinities, NaNs, and
              minus zero doesn't work across boxes, or on some boxes e\
...
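
The 17-digit figure can be checked directly. This Python 3 sketch assumes the platform float is an IEEE-754 double and uses "%.17g" formatting. Since Python 3.1, repr() of a float already yields the shortest string that round-trips.

```python
# 0.1 + 0.2 lands on the double just above the one nearest to 0.3.
x = 0.1 + 0.2

short = "%.15g" % x  # 15 significant digits
full = "%.17g" % x   # 17 significant digits

print(short, float(short) == x)  # 0.3 False
print(full, float(full) == x)    # 0.30000000000000004 True

# 17 significant digits round-trip every finite double, including extremes.
for value in (0.1, 1.0 / 3.0, 5e-324, 1.7976931348623157e308, -0.0):
    assert float("%.17g" % value) == value
```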

float8

Value:
ArgumentDescriptor(name= 'float8', n= 8, reader= read_float8, doc= """An 8-byte binary representation of a float, big-endian.

             The format is unique to Python, and shared with the struct
             module (format string '>d') "in theory" (the struct and cPickle
             implementations don't share the code -- they should).  It\
...
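
A sketch, assuming Python 3, of reading a float8 argument with struct's '>d' format, cross-checked against a protocol 1 pickle, where a float is stored with the BINFLOAT opcode. read_float8_sketch is a hypothetical name.

```python
import pickle
import pickletools
import struct


def read_float8_sketch(data):
    """Hypothetical float8 reader: 8 bytes, IEEE-754 double, big-endian ('>d')."""
    if len(data) != 8:
        raise ValueError("float8 needs exactly 8 bytes, got %d" % len(data))
    return struct.unpack(">d", data)[0]


print(struct.pack(">d", 1.5).hex())  # 3ff8000000000000

# Protocol 1 stores a float as BINFLOAT followed by its float8 argument.
blob = pickle.dumps(1.5, protocol=1)
opcode, arg, pos = next(pickletools.genops(blob))
print(opcode.name, arg)                           # BINFLOAT 1.5
print(read_float8_sketch(blob[pos + 1:pos + 9]))  # 1.5
```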

long1

Value:
ArgumentDescriptor(name= "long1", n= TAKEN_FROM_ARGUMENT1, reader= read_long1, doc= """A binary long, little-endian, using 1-byte size.

    This first reads one byte as an unsigned size, then reads that
    many bytes and interprets them as a little-endian 2's-complement long.
    If the size is 0, that's taken as a shortcut for the long 0L.
    """)

long4

Value:
ArgumentDescriptor(name= "long4", n= TAKEN_FROM_ARGUMENT4, reader= read_long4, doc= """A binary representation of a long, little-endian.

    This first reads four bytes as a signed size (but requires the
    size to be >= 0), then reads that many bytes and interprets them
    as a little-endian 2's-complement long.  If the size is 0, that's taken
    as a shortcut for the long 0L, although LONG1 should really be use\
...
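
Both long formats decode their payload as little-endian 2's complement, which Python 3's int.from_bytes handles directly, including the empty payload that yields 0. The sketch covers LONG1's 1-byte size (long4 differs only in reading a 4-byte signed size). The function names are hypothetical, and the cross-check relies on Python 3 pickling 2 ** 70 with LONG1 at protocol 2.

```python
import io
import pickle
import pickletools


def decode_long_sketch(data):
    """Little-endian 2's-complement bytes -> int; b'' decodes to 0."""
    return int.from_bytes(data, "little", signed=True)


def read_long1_sketch(f):
    """Hypothetical long1 reader: 1-byte unsigned size, then that many bytes."""
    size = f.read(1)
    if len(size) != 1:
        raise ValueError("not enough data to read long1 size")
    n = size[0]  # indexing bytes gives an int in 0..255
    data = f.read(n)
    if len(data) != n:
        raise ValueError("not enough data to read long1 payload")
    return decode_long_sketch(data)


print(read_long1_sketch(io.BytesIO(b"\x00")))          # 0 (size 0 shortcut)
print(read_long1_sketch(io.BytesIO(b"\x01\xff")))      # -1
print(read_long1_sketch(io.BytesIO(b"\x02\xff\x00")))  # 255

# Protocol 2 pickles a large int with LONG1; its argument decodes identically.
blob = pickle.dumps(2 ** 70, protocol=2)
for opcode, arg, pos in pickletools.genops(blob):
    if opcode.name == "LONG1":
        print(arg == read_long1_sketch(io.BytesIO(blob[pos + 1:])))  # True
```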

opcodes

Value:
[I(name= 'INT', code= 'I', arg= decimalnl_short, stack_before= [], stack_after= [pyinteger_or_bool], proto= 0, doc= """Push an integer or bool.

      The argument is a newline-terminated decimal literal string.

      The intent may have been that this always fit in a short Python int,
...
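
A sketch of looking records up by their one-character code, assuming Python 3, where pickletools still defines this module-level opcodes list even though it is not part of the documented public API. by_code is a hypothetical name; it plays the same role as code2op above.

```python
import pickletools

# Hypothetical lookup table built from the module-level opcodes list:
# one-character opcode code -> OpcodeInfo record.
by_code = {info.code: info for info in pickletools.opcodes}

int_op = by_code["I"]
print(int_op.name, int_op.proto, int_op.arg.name)
print([s.name for s in int_op.stack_before], [s.name for s in int_op.stack_after])
```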

_dis_test

Value:
'''
>>> import pickle
>>> x = [1, 2, (3, 4), {\'abc\': u"def"}]
>>> pkl = pickle.dumps(x, 0)
>>> dis(pkl)
    0: (    MARK
    1: l        LIST       (MARK at 0)
    2: p    PUT        0
...

_memo_test

Value:
'''
>>> import pickle
>>> from StringIO import StringIO
>>> f = StringIO()
>>> p = pickle.Pickler(f, 2)
>>> x = [1, 2, 3]
>>> p.dump(x)
>>> p.dump(x)
...

__test__

Value:
{'disassembler_memo_test': '''
>>> import pickle
>>> from StringIO import StringIO
>>> f = StringIO()
>>> p = pickle.Pickler(f, 2)
>>> x = [1, 2, 3]
>>> p.dump(x)
>>> p.dump(x)
...