Module difflib :: Class Differ

_ClassType Differ


Differ is a class for comparing sequences of lines of text, and
producing human-readable differences or deltas.  Differ uses
SequenceMatcher both to compare sequences of lines, and to compare
sequences of characters within similar (near-matching) lines.
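
The character-level comparison can be sketched by calling SequenceMatcher directly on two near-matching lines. This is only an illustration of the kind of data Differ works from, not a description of its internal code path; the names `old_line`, `new_line`, and `changed_spans` are invented for the example.

```python
from difflib import SequenceMatcher

old_line = 'abcDefghiJkl\n'
new_line = 'abcdefGhijkl\n'

# First argument is the junk filter; None means "treat nothing as junk".
matcher = SequenceMatcher(None, old_line, new_line)

# Each opcode is (tag, i1, i2, j1, j2), where tag is one of
# 'equal', 'replace', 'delete', or 'insert'.
changed_spans = [(tag, old_line[i1:i2], new_line[j1:j2])
                 for tag, i1, i2, j1, j2 in matcher.get_opcodes()
                 if tag != 'equal']
```

Each entry in `changed_spans` pairs a slice of the old line with the slice of the new line that replaces it, which is the information the '? ' guide lines summarize visually.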

Each line of a Differ delta begins with a two-character code:

    '- '    line unique to sequence 1
    '+ '    line unique to sequence 2
    '  '    line common to both sequences
    '? '    line not present in either input sequence

Lines beginning with '? ' attempt to guide the eye to intraline
differences, and were not present in either input sequence.  These lines
can be confusing if the sequences contain tab characters.
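
Because every delta line starts with one of the four codes above, a consumer can dispatch on the first two characters. The following is a minimal sketch under that assumption; the `counts` dictionary and the sample inputs are illustrative only.

```python
from difflib import Differ

old = ['one\n', 'two\n', 'three\n']
new = ['ore\n', 'tree\n', 'emu\n']

# Tally delta lines by their two-character prefix.
counts = {'- ': 0, '+ ': 0, '  ': 0, '? ': 0}
for line in Differ().compare(old, new):
    counts[line[:2]] += 1

# Lines to keep if the '? ' guide lines are unwanted.
without_guides = [line for line in Differ().compare(old, new)
                  if not line.startswith('? ')]
```

Dropping the '? ' lines this way is one option when the inputs contain tabs and the guide lines would be misleading.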

Note that Differ makes no claim to produce a *minimal* diff.  To the
contrary, minimal diffs are often counter-intuitive, because they synch
up anywhere possible, sometimes at accidental matches 100 pages apart.
Restricting synch points to contiguous matches preserves some notion of
locality, at the occasional cost of producing a longer diff.

Example: Comparing two texts.

First we set up the texts, sequences of individual single-line strings
ending with newlines (such sequences can also be obtained from the
`readlines()` method of file-like objects):

>>> text1 = '''  1. Beautiful is better than ugly.
...   2. Explicit is better than implicit.
...   3. Simple is better than complex.
...   4. Complex is better than complicated.
... '''.splitlines(True)
>>> len(text1)
4
>>> text1[0][-1]
'\n'
>>> text2 = '''  1. Beautiful is better than ugly.
...   3.   Simple is better than complex.
...   4. Complicated is better than complex.
...   5. Flat is better than nested.
... '''.splitlines(True)

Next we instantiate a Differ object:

>>> d = Differ()

Note that when instantiating a Differ object we may pass functions to
filter out line and character 'junk'.  See Differ.__init__ for details.

Finally, we compare the two:

>>> result = list(d.compare(text1, text2))

'result' is a list of strings, so let's pretty-print it:

>>> from pprint import pprint as _pprint
>>> _pprint(result)
['    1. Beautiful is better than ugly.\n',
 '-   2. Explicit is better than implicit.\n',
 '-   3. Simple is better than complex.\n',
 '+   3.   Simple is better than complex.\n',
 '?     ++\n',
 '-   4. Complex is better than complicated.\n',
 '?            ^                     ---- ^\n',
 '+   4. Complicated is better than complex.\n',
 '?           ++++ ^                      ^\n',
 '+   5. Flat is better than nested.\n']

As a single multi-line string it looks like this:

>>> print ''.join(result),
    1. Beautiful is better than ugly.
-   2. Explicit is better than implicit.
-   3. Simple is better than complex.
+   3.   Simple is better than complex.
?     ++
-   4. Complex is better than complicated.
?            ^                     ---- ^
+   4. Complicated is better than complex.
?           ++++ ^                      ^
+   5. Flat is better than nested.
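
Since the '- ', '+ ', and '  ' lines together carry every input line, either original text can be recovered from a delta. The sketch below uses the module-level function `difflib.restore()`, which this page does not otherwise describe; the variable names are illustrative.

```python
from difflib import Differ, restore

text1 = '''  1. Beautiful is better than ugly.
  2. Explicit is better than implicit.
  3. Simple is better than complex.
  4. Complex is better than complicated.
'''.splitlines(True)
text2 = '''  1. Beautiful is better than ugly.
  3.   Simple is better than complex.
  4. Complicated is better than complex.
  5. Flat is better than nested.
'''.splitlines(True)

delta = list(Differ().compare(text1, text2))

# restore(delta, 1) yields the lines of sequence 1, restore(delta, 2)
# those of sequence 2, with the two-character codes stripped.
recovered1 = list(restore(delta, 1))
recovered2 = list(restore(delta, 2))
```

The '? ' lines are ignored during recovery, consistent with their not being present in either input.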

Methods:

__init__(linejunk=None, charjunk=None)
    Construct a text differencer, with optional filters.

compare(a, b)
    Compare two sequences of lines; generate the resulting delta.

Instance Methods

__init__(self, linejunk=None, charjunk=None)
    Construct a text differencer, with optional filters.

compare(self, a, b)
    Compare two sequences of lines; generate the resulting delta.

_dump(self, tag, x, lo, hi)
    Generate comparison results for a same-tagged range.

_plain_replace(self, a, alo, ahi, b, blo, bhi)

_fancy_replace(self, a, alo, ahi, b, blo, bhi)
    When replacing one block of lines with another, search the blocks
    for *similar* lines; the best-matching pair (if any) is used as a
    synch point, and intraline difference marking is done on the
    similar pair.

_fancy_helper(self, a, alo, ahi, b, blo, bhi)

_qformat(self, aline, bline, atags, btags)
    Format "?" output and deal with leading tabs.

Method Details

__init__(self, linejunk=None, charjunk=None)
(Constructor)

 

Construct a text differencer, with optional filters.

The two optional keyword parameters are for filter functions:

- `linejunk`: A function that should accept a single string argument,
  and return true iff the string is junk. The module-level function
  `IS_LINE_JUNK` may be used to filter out lines without visible
  characters, except for at most one splat ('#').  It is recommended
  to leave linejunk None; as of Python 2.3, the underlying
  SequenceMatcher class has grown an adaptive notion of "noise" lines
  that's better than any static definition the author has ever been
  able to craft.

- `charjunk`: A function that should accept a string of length 1. The
  module-level function `IS_CHARACTER_JUNK` may be used to filter out
  whitespace characters (a blank or tab; **note**: bad idea to include
  newline in this!).  Use of IS_CHARACTER_JUNK is recommended.
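
The two filters described above can be tried out directly. This is a small sketch that follows the stated recommendation (linejunk left as None, charjunk set to `IS_CHARACTER_JUNK`); the sample lines and variable names are made up for illustration.

```python
from difflib import Differ, IS_CHARACTER_JUNK, IS_LINE_JUNK

# The module-level predicates can be probed on their own.
blank_is_junk = bool(IS_CHARACTER_JUNK(' '))        # blank or tab is junk
newline_is_junk = bool(IS_CHARACTER_JUNK('\n'))     # newline is not junk
splat_line_is_junk = bool(IS_LINE_JUNK('  #  \n'))  # nothing visible but one '#'
text_line_is_junk = bool(IS_LINE_JUNK('hello\n'))   # ordinary text is not junk

# Recommended setup: no line filter, whitespace-aware character filter.
d = Differ(linejunk=None, charjunk=IS_CHARACTER_JUNK)
delta = list(d.compare(['x = 1\n', 'y = 2\n'],
                       ['x = 1\n', 'y  =  22\n']))
```

The junk filters influence how matches are chosen, not which lines are reported: every input line still appears in the delta.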

compare(self, a, b)

 

Compare two sequences of lines; generate the resulting delta.

Each sequence must contain individual single-line strings ending with newlines. Such sequences can be obtained from the `readlines()` method of file-like objects. The delta generated also consists of newline-terminated strings, ready to be printed as-is via the `writelines()` method of a file-like object.

Example:

>>> print ''.join(Differ().compare('one\ntwo\nthree\n'.splitlines(True),
...                                'ore\ntree\nemu\n'.splitlines(True))),
- one
?  ^
+ ore
?  ^
- two
- three
?  -
+ tree
+ emu
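
As an alternative to joining the delta into one string, the lines can be handed straight to the `writelines()` method of a file-like object. The sketch below assumes `sys.stdout` as the destination; any object with a `writelines()` method should work the same way.

```python
import sys
from difflib import Differ

a = 'one\ntwo\nthree\n'.splitlines(True)  # True keeps the line endings
b = 'ore\ntree\nemu\n'.splitlines(True)

# compare() returns a generator; materialize it so it can be reused.
delta = list(Differ().compare(a, b))

# Every delta line already ends with a newline, so no separator is needed.
sys.stdout.writelines(delta)
```

Keeping the delta as a list is a convenience for reuse; passing the generator to `writelines()` directly avoids holding the whole delta in memory.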

_fancy_replace(self, a, alo, ahi, b, blo, bhi)

 

When replacing one block of lines with another, search the blocks for *similar* lines; the best-matching pair (if any) is used as a synch point, and intraline difference marking is done on the similar pair. Lots of work, but often worth it.

Example:

>>> d = Differ()
>>> results = d._fancy_replace(['abcDefghiJkl\n'], 0, 1,
...                            ['abcdefGhijkl\n'], 0, 1)
>>> print ''.join(results),
- abcDefghiJkl
?    ^  ^  ^
+ abcdefGhijkl
?    ^  ^  ^

_qformat(self, aline, bline, atags, btags)

 

Format "?" output and deal with leading tabs.

Example:

>>> d = Differ()
>>> results = d._qformat('\tabcDefghiJkl\n', '\t\tabcdefGhijkl\n',
...                      '  ^ ^  ^      ', '+  ^ ^  ^      ')
>>> for line in results: print repr(line)
...
'- \tabcDefghiJkl\n'
'? \t ^ ^  ^\n'
'+ \t\tabcdefGhijkl\n'
'? \t  ^ ^  ^\n'