
Class GridTableParser

TableParser --+
              |
             GridTableParser


Parse a grid table using parse().

Here's an example of a grid table:

+------------------------+------------+----------+----------+
| Header row, column 1   | Header 2   | Header 3 | Header 4 |
+========================+============+==========+==========+
| body row 1, column 1   | column 2   | column 3 | column 4 |
+------------------------+------------+----------+----------+
| body row 2             | Cells may span columns.          |
+------------------------+------------+---------------------+
| body row 3             | Cells may  | - Table cells       |
+------------------------+ span rows. | - contain           |
| body row 4             |            | - body elements.    |
+------------------------+------------+---------------------+

Intersections use '+', row separators use '-' (except for one optional head/body row separator, which uses '='), and column separators use '|'.

Passing the above table to the parse() method will result in the following data structure:

([24, 12, 10, 10],
 [[(0, 0, 1, ['Header row, column 1']),
   (0, 0, 1, ['Header 2']),
   (0, 0, 1, ['Header 3']),
   (0, 0, 1, ['Header 4'])]],
 [[(0, 0, 3, ['body row 1, column 1']),
   (0, 0, 3, ['column 2']),
   (0, 0, 3, ['column 3']),
   (0, 0, 3, ['column 4'])],
  [(0, 0, 5, ['body row 2']),
   (0, 2, 5, ['Cells may span columns.']),
   None,
   None],
  [(0, 0, 7, ['body row 3']),
   (1, 0, 7, ['Cells may', 'span rows.', '']),
   (1, 1, 7, ['- Table cells', '- contain', '- body elements.']),
   None],
  [(0, 0, 9, ['body row 4']), None, None, None]])

The first item is a list containing column widths (colspecs). The second item is a list of head rows, and the third is a list of body rows. Each row contains a list of cells. Each cell is either None (for a cell unused because of another cell's span), or a tuple. A cell tuple contains four items: the number of extra rows used by the cell in a vertical span (morerows); the number of extra columns used by the cell in a horizontal span (morecols); the line offset of the first line of the cell contents; and the cell contents, a list of lines of text.
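As an illustration of how a caller might read this structure, the following sketch walks the head and body rows and turns each cell tuple into a flat record. It relies only on the layout described above; the helper name describe() and the record layout are hypothetical and not part of docutils.

```python
# The structure shown above, copied verbatim.
table = (
    [24, 12, 10, 10],
    [[(0, 0, 1, ['Header row, column 1']),
      (0, 0, 1, ['Header 2']),
      (0, 0, 1, ['Header 3']),
      (0, 0, 1, ['Header 4'])]],
    [[(0, 0, 3, ['body row 1, column 1']),
      (0, 0, 3, ['column 2']),
      (0, 0, 3, ['column 3']),
      (0, 0, 3, ['column 4'])],
     [(0, 0, 5, ['body row 2']),
      (0, 2, 5, ['Cells may span columns.']),
      None,
      None],
     [(0, 0, 7, ['body row 3']),
      (1, 0, 7, ['Cells may', 'span rows.', '']),
      (1, 1, 7, ['- Table cells', '- contain', '- body elements.']),
      None],
     [(0, 0, 9, ['body row 4']), None, None, None]],
)

colspecs, head_rows, body_rows = table


def describe(rows):
    """Hypothetical helper: list (row, col, rowspan, colspan, offset, text)."""
    records = []
    for r, row in enumerate(rows):
        for c, cell in enumerate(row):
            if cell is None:
                continue  # slot covered by another cell's span
            morerows, morecols, offset, lines = cell
            text = ' '.join(lines).strip()
            # morerows/morecols count *extra* rows/columns, so add 1
            records.append((r, c, morerows + 1, morecols + 1, offset, text))
    return records


for record in describe(head_rows) + describe(body_rows):
    print(record)
```

Adding 1 to morerows and morecols gives the total number of rows and columns a cell covers.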


Method Summary
  check_parse_complete(self)
Each text column should have been completely seen.
  get_cell_block(self, top, left, bottom, right)
Given the corners, extract the text of a cell.
  mark_done(self, top, left, bottom, right)
For keeping track of how much of each text column has been seen.
  parse_table(self)
Start with a queue of upper-left corners, containing the upper-left corner of the table itself.
  scan_cell(self, top, left)
Starting at the top-left corner, start tracing out a cell.
  scan_down(self, top, left, right)
Look for the bottom-right corner of the cell, making note of all row boundaries.
  scan_left(self, top, left, bottom, right)
Noting column boundaries, look for the bottom-left corner of the cell.
  scan_right(self, top, left)
Look for the top-right corner of the cell, and make note of all column boundaries ('+').
  scan_up(self, top, left, bottom, right)
Noting row boundaries, see if we can return to the starting point.
  setup(self, block)
  structure_from_cells(self)
From the data collected by scan_cell(), convert to the final data structure.
    Inherited from TableParser
  find_head_body_sep(self)
Look for a head/body row separator line; store the line index.
  parse(self, block)
Analyze the text block and return a table data structure.

Class Variable Summary
SRE_Pattern head_body_separator_pat = \+=[=\+]+=\+ *$

Method Details

check_parse_complete(self)

Each text column should have been completely seen.

get_cell_block(self, top, left, bottom, right)

Given the corners, extract the text of a cell.
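A minimal sketch of what extracting a cell's text from its corners could look like, using a plain list of strings for the table. The function cell_text() is hypothetical. It assumes that trailing whitespace and the common leading indentation are stripped, which is consistent with the sample output in the class description but is not confirmed here as the method's actual behavior.

```python
# The example grid table from the class description, one string per line.
table_lines = [
    "+------------------------+------------+----------+----------+",
    "| Header row, column 1   | Header 2   | Header 3 | Header 4 |",
    "+========================+============+==========+==========+",
    "| body row 1, column 1   | column 2   | column 3 | column 4 |",
    "+------------------------+------------+----------+----------+",
    "| body row 2             | Cells may span columns.          |",
    "+------------------------+------------+---------------------+",
    "| body row 3             | Cells may  | - Table cells       |",
    "+------------------------+ span rows. | - contain           |",
    "| body row 4             |            | - body elements.    |",
    "+------------------------+------------+---------------------+",
]


def cell_text(lines, top, left, bottom, right):
    """Hypothetical helper: text strictly inside the given border corners."""
    # Take the rectangle between the borders and drop trailing spaces.
    block = [line[left + 1:right].rstrip() for line in lines[top + 1:bottom]]
    # Remove the indentation shared by all non-blank lines (an assumption).
    indents = [len(text) - len(text.lstrip()) for text in block if text]
    indent = min(indents, default=0)
    return [text[indent:] for text in block]


# The "Cells may span rows." cell: its borders are table lines 6 and 10
# and text columns 25 and 38.
print(cell_text(table_lines, 6, 25, 10, 38))
```

In this sketch the first extracted line is line top + 1 of the table, which agrees with the line offset 7 recorded for that cell in the sample data structure.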

mark_done(self, top, left, bottom, right)

For keeping track of how much of each text column has been seen.
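One way such bookkeeping could work, together with the completeness check described under check_parse_complete(), is to remember for each text column the last line already covered by a cell. This is a sketch under that assumption; the class ColumnTracker, its attribute names, and the choice to raise ValueError are hypothetical.

```python
class ColumnTracker:
    """Hypothetical sketch: per-column record of the last line seen."""

    def __init__(self, width, height):
        # width: number of text columns, excluding the final border column
        # height: number of lines in the table block
        self.done = [-1] * width   # -1 means nothing seen yet
        self.last = height - 2     # last line above the bottom border

    def mark_done(self, top, left, bottom, right):
        """Record that the cell with these corners has been processed."""
        for col in range(left, right):
            # Each cell must start right where the previous one ended.
            if self.done[col] != top - 1:
                raise ValueError('gap or overlap in column %d' % col)
            self.done[col] = bottom - 1

    def complete(self):
        """True once every column has been seen down to the last line."""
        return all(seen == self.last for seen in self.done)


# A made-up table 13 columns wide and 5 lines tall, with cells at
# (0, 0)-(2, 6), (0, 6)-(4, 12) and (2, 0)-(4, 6).
tracker = ColumnTracker(width=12, height=5)
tracker.mark_done(0, 0, 2, 6)
tracker.mark_done(0, 6, 4, 12)
before_last_cell = tracker.complete()
tracker.mark_done(2, 0, 4, 6)
after_last_cell = tracker.complete()
print(before_last_cell, after_last_cell)
```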

parse_table(self)

Start with a queue of upper-left corners, containing the upper-left corner of the table itself. Trace out one rectangular cell, remember it, and add its upper-right and lower-left corners to the queue of potential upper-left corners of further cells. Process the queue in top-to-bottom order, keeping track of how much of each text column has been seen.

We'll end up knowing all the row and column boundaries, cell positions and their dimensions.
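The queue-driven traversal described above can be sketched independently of the border tracing. In the sketch below, collect_cells() is a hypothetical function that receives the tracing step as a callable, so the queue logic stands on its own. Two simplifications are assumed: a set of visited corners replaces the per-column bookkeeping, and the tracing step returns only (bottom, right) or None.

```python
def collect_cells(scan_cell, bottom_edge, right_edge):
    """Hypothetical sketch of the corner queue.

    scan_cell(top, left) must return (bottom, right) for a cell whose
    upper-left corner is at (top, left), or None if no cell starts there.
    """
    corners = [(0, 0)]          # start with the table's upper-left corner
    visited = set()
    cells = []
    while corners:
        top, left = corners.pop(0)
        # Corners on the bottom or right border cannot start a cell.
        if top == bottom_edge or left == right_edge or (top, left) in visited:
            continue
        visited.add((top, left))
        result = scan_cell(top, left)
        if result is None:
            continue
        bottom, right = result
        cells.append((top, left, bottom, right))
        # Upper-right and lower-left corners may start further cells.
        corners.extend([(top, right), (bottom, left)])
        corners.sort()          # process in top-to-bottom order
    return cells


# Made-up cell layout for a table 5 lines tall and 13 columns wide;
# a dict lookup stands in for the real tracing step.
known = {(0, 0): (2, 6), (0, 6): (4, 12), (2, 0): (4, 6)}
found = collect_cells(lambda top, left: known.get((top, left)), 4, 12)
print(found)
```

Sorting the queue after each insertion keeps the corners in (line, column) order, which is one straightforward reading of "top-to-bottom order".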

scan_cell(self, top, left)

Starting at the top-left corner, start tracing out a cell.

scan_down(self, top, left, right)

Look for the bottom-right corner of the cell, making note of all row boundaries.

scan_left(self, top, left, bottom, right)

Noting column boundaries, look for the bottom-left corner of the cell. It must line up with the starting point.

scan_right(self, top, left)

Look for the top-right corner of the cell, and make note of all column boundaries ('+').

scan_up(self, top, left, bottom, right)

Noting row boundaries, see if we can return to the starting point.
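Taken together, the scan_* descriptions above amount to walking a cell's border clockwise: right along the top edge, down the right edge, left along the bottom edge, and up the left edge back to the start. The sketch below illustrates that idea on a list of equal-length strings. The function names are hypothetical, the row and column boundaries that the real methods make note of are not recorded, and any head/body '=' separator is assumed to have been replaced by '-' beforehand.

```python
def trace_cell(lines, top, left):
    """Hypothetical sketch: return (bottom, right) of the cell starting at
    the '+' at (top, left), or None if no closed rectangle starts there."""
    if lines[top][left] != '+':
        return None
    top_line = lines[top]
    for right in range(left + 1, len(top_line)):      # scan right
        if top_line[right] == '+':
            bottom = _scan_down(lines, top, left, right)
            if bottom is not None:
                return bottom, right
        elif top_line[right] != '-':
            return None
    return None


def _scan_down(lines, top, left, right):
    """Follow the right edge down to a '+' that closes the rectangle."""
    for bottom in range(top + 1, len(lines)):
        char = lines[bottom][right]
        if char == '+':
            if _closes(lines, top, left, bottom, right):
                return bottom
        elif char != '|':
            return None
    return None


def _closes(lines, top, left, bottom, right):
    """Scan left along the bottom edge, then up the left edge."""
    bottom_edge = lines[bottom][left + 1:right]
    if lines[bottom][left] != '+':
        return False                      # must line up with the start
    if any(char not in '+-' for char in bottom_edge):
        return False
    return all(lines[row][left] in '+|' for row in range(top + 1, bottom))


# A made-up table in which cell "b" spans two rows.
small = [
    "+-----+-----+",
    "| a   | b   |",
    "+-----+     |",
    "| c   |     |",
    "+-----+-----+",
]
print(trace_cell(small, 0, 0), trace_cell(small, 0, 6), trace_cell(small, 2, 6))
```

When a candidate '+' does not close a rectangle, the scan keeps going past it, which is how a cell spanning several columns or rows can still be found.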

structure_from_cells(self)

From the data collected by scan_cell(), convert to the final data structure.

Class Variable Details

head_body_separator_pat

Type:
SRE_Pattern
Value:
\+=[=\+]+=\+ *$
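To show what this pattern accepts, the sketch below compiles the same expression with Python's re module and applies it to each line of a small made-up table. Using match(), which anchors at the start of the line, is an assumption about how the pattern is applied.

```python
import re

# Same expression as the class variable above.
head_body_separator_pat = re.compile(r'\+=[=\+]+=\+ *$')

# A made-up table with one head row and one body row.
sample = [
    "+-----+-----+",
    "| h1  | h2  |",
    "+=====+=====+",
    "| a   | b   |",
    "+-----+-----+",
]

# Indices of lines that look like a head/body separator.
separator_rows = [
    index for index, line in enumerate(sample)
    if head_body_separator_pat.match(line)
]
print(separator_rows)

# Trailing spaces after the final '+' are allowed by the " *$" part.
trailing_ok = bool(head_body_separator_pat.match("+=====+=====+   "))
# A line mixing '=' and '-' is not accepted.
mixed_ok = bool(head_body_separator_pat.match("+=====+-----+"))
print(trailing_ok, mixed_ok)
```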

Generated by Epydoc 2.0 on Tue Jul 22 05:30:59 2003 http://epydoc.sf.net