Module docutils.parsers.rst.states

This is the docutils.parsers.rst.states module, the core of the reStructuredText parser. An overview of the parser comes first, followed by the classes, exceptions, functions, and variables the module defines.

Parser Overview

The reStructuredText parser is implemented as a recursive state machine, examining its input one line at a time. To understand how the parser works, please first become familiar with the docutils.statemachine module. In the description below, references are made to classes defined in this module; please see the individual classes for details.

Parsing proceeds as follows:

  1. The state machine examines each line of input, checking each of the transition patterns of the state Body, in order, looking for a match. The implicit transitions (blank lines and indentation) are checked before any others. The 'text' transition is a catch-all (matches anything).
  2. The method associated with the matched transition pattern is called.
    1. Some transition methods are self-contained, appending elements to the document tree (for example, Body.doctest parses a doctest block). The parser's current line index is advanced to the end of the element, and parsing continues with step 1.
    2. Other transition methods trigger the creation of a nested state machine, whose job is to parse a compound construct ('indent' does a block quote, 'bullet' does a bullet list, 'overline' does a section [first checking for a valid section header], etc.).
      • In the case of lists and explicit markup, a one-off state machine is created and run to parse the contents of the first item.
      • A new state machine is created and its initial state is set to the appropriate specialized state (BulletList in the case of the 'bullet' transition; see SpecializedBody for more detail). This state machine is run to parse the compound element (or series of explicit markup elements), and returns as soon as a non-member element is encountered. For example, the BulletList state machine ends as soon as it encounters an element which is not a list item of that bullet list. This nested state machine is also what allows the blank lines between elements to be omitted.
      • The current line index is advanced to the end of the elements parsed, and parsing continues with step 1.
    3. The result of the 'text' transition depends on the next line of text. The current state is changed to Text, under which the second line is examined. If the second line is:
      • Indented: The element is a definition list item, and parsing proceeds similarly to step 2.2, using the DefinitionList state.
      • A line of uniform punctuation characters: The element is a section header; again, parsing proceeds as in step 2.2, and Body is still used.
      • Anything else: The element is a paragraph, which is examined for inline markup and appended to the parent element. Processing continues with step 1.
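
The steps above can be illustrated with a toy, self-contained sketch. It is not the docutils implementation and does not use docutils.statemachine; the names TRANSITIONS, UNDERLINE, classify, parse_bullet_list, and parse are hypothetical, and the patterns are heavily simplified. It shows ordered transition matching with a catch-all 'text' pattern (step 1), a nested run that returns at the first non-member line (step 2.2), and the second-line lookahead used for text (step 2.3).

```python
import re
import string

# Ordered (name, pattern) pairs: a simplified stand-in for Body's transitions.
# The implicit transitions (blank, indent) come first; 'text' matches anything.
TRANSITIONS = [
    ("blank", re.compile(r"\s*$")),
    ("indent", re.compile(r" +\S")),
    ("bullet", re.compile(r"[-+*] +\S")),
    ("text", re.compile(r"")),
]

# A line made of one repeated punctuation character (toy underline test).
UNDERLINE = re.compile(r"([%s])\1+ *$" % re.escape(string.punctuation))


def classify(line):
    """Return the name of the first transition whose pattern matches."""
    for name, pattern in TRANSITIONS:
        if pattern.match(line):
            return name


def parse_bullet_list(lines, start):
    """Nested run: collect items until a non-member line is reached."""
    bullet = lines[start][0]

    def is_item(line):
        return classify(line) == "bullet" and line[0] == bullet

    items, i = [], start
    while i < len(lines):
        if is_item(lines[i]):
            items.append(lines[i][1:].strip())
            i += 1
        elif (classify(lines[i]) == "blank" and i + 1 < len(lines)
              and is_item(lines[i + 1])):
            i += 1  # blank lines between items are optional
        else:
            break  # non-member element: hand control back to the caller
    return items, i


def parse(lines):
    """Return a flat list of (element_name, content, ...) tuples."""
    elements, i = [], 0
    while i < len(lines):
        kind = classify(lines[i])
        if kind == "blank":
            i += 1
        elif kind == "bullet":
            items, i = parse_bullet_list(lines, i)
            elements.append(("bullet_list", items))
        elif kind == "indent":
            j = i
            while j < len(lines) and classify(lines[j]) == "indent":
                j += 1
            elements.append(("block_quote", [l.strip() for l in lines[i:j]]))
            i = j
        else:  # 'text': the outcome depends on the second line
            second = lines[i + 1] if i + 1 < len(lines) else ""
            if classify(second) == "indent":
                elements.append(("definition_list_item", lines[i], second.strip()))
                i += 2
            elif UNDERLINE.match(second):
                elements.append(("section_title", lines[i]))
                i += 2
            else:
                j = i
                while j < len(lines) and classify(lines[j]) != "blank":
                    j += 1
                elements.append(("paragraph", " ".join(lines[i:j])))
                i = j
    return elements


SAMPLE = ["Title", "=====", "", "A paragraph of text.", "",
          "- one", "- two", "", "- three", "term", "    definition"]
elements = parse(SAMPLE)
```

The sketch returns a flat list rather than a document tree, and it ignores overlined titles, nested indentation, and inline markup. Its only purpose is to make the control flow of the numbered steps concrete.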

Classes
Body                Generic classifier of the first line of a block.
BulletList          Second and subsequent bullet_list list_items.
Definition          Second line of potential definition_list_item.
DefinitionList      Second and subsequent definition_list_items.
EnumeratedList      Second and subsequent enumerated_list list_items.
Explicit            Second and subsequent explicit markup construct.
ExtensionOptions    Parse field_list fields for extension options.
FieldList           Second and subsequent field_list fields.
Inliner             Parse inline markup; call the parse() method.
Line                Second line of over- & underlined section title or transition marker.
NestedStateMachine  StateMachine run from within other StateMachine runs, to parse nested document structures.
OptionList          Second and subsequent option_list option_list_items.
RFC2822Body         RFC2822 headers are only valid as the first constructs in documents.
RFC2822List         Second and subsequent RFC2822-style field_list fields.
RSTState            reStructuredText State superclass.
RSTStateMachine     reStructuredText's master StateMachine.
SpecializedBody     Superclass for second and subsequent compound element members.
SpecializedText     Superclass for second and subsequent lines of Text-variants.
Struct              Stores data attributes for dotted-attribute access.
SubstitutionDef     Parser for the contents of a substitution_definition element.
Text                Classifier of second line of a text block.

Exceptions
InterpretedRoleNotImplementedError  
MarkupError  
MarkupMismatch  
ParserError  
UnknownInterpretedRoleError  

Function Summary
  build_regexp(definition, compile)
Build, compile and return a regular expression based on definition.
  escape2null(text)
Return a string with escape-backslashes converted to nulls.
  unescape(text, restore_backslashes)
Return a string with nulls removed or restored to backslashes.

Variable Summary
tuple state_classes = (<class docutils.parsers.rst.states.Body...

Function Details

build_regexp(definition, compile=1)

Build, compile and return a regular expression based on definition.
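
This page does not describe the layout of definition. The sketch below assumes it is a nested tuple of the form (name, prefix, suffix, parts), where each part is either a regular expression string or another such tuple; that layout, and the name build_regexp_sketch, are assumptions made for illustration only. Under that assumption, the parts are joined as alternatives inside a named group, wrapped in the prefix and suffix, and compiled unless compile is false.

```python
import re


def build_regexp_sketch(definition, compile=True):
    """Build a regular expression from an assumed (name, prefix, suffix, parts) tuple."""
    name, prefix, suffix, parts = definition
    part_strings = []
    for part in parts:
        if isinstance(part, tuple):
            # Nested definition: build its pattern text, but do not compile it.
            part_strings.append(build_regexp_sketch(part, compile=False))
        else:
            part_strings.append(part)
    or_group = "|".join(part_strings)
    regexp = "%s(?P<%s>%s)%s" % (prefix, name, or_group, suffix)
    return re.compile(regexp) if compile else regexp


# Hypothetical definition: a list marker followed by a space.
definition = ("marker", r"^", r" ", [r"\*", r"\+", ("dash", "", "", [r"-"])])
pattern = build_regexp_sketch(definition)
match = pattern.match("- item")
```

Because every level becomes a named group, a match can report both the outer group ("marker") and whichever nested alternative matched ("dash").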

escape2null(text)

Return a string with escape-backslashes converted to nulls.

unescape(text, restore_backslashes=0)

Return a string with nulls removed or restored to backslashes. Backslash-escaped spaces are also removed.
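
The two functions above work as a pair: escapes are first hidden behind null characters so that later pattern matching ignores the escaped character, and the nulls are removed or turned back into backslashes afterwards. The following is a minimal sketch consistent with those descriptions, not the module's source; the names escape2null_sketch and unescape_sketch are hypothetical.

```python
def escape2null_sketch(text):
    """Replace each escaping backslash with a null character."""
    parts = []
    start = 0
    while True:
        found = text.find("\\", start)
        if found == -1:
            parts.append(text[start:])
            return "".join(parts)
        parts.append(text[start:found])
        # Keep the escaped character, preceded by a null instead of a backslash.
        parts.append("\x00" + text[found + 1:found + 2])
        start = found + 2  # skip the character after the escape


def unescape_sketch(text, restore_backslashes=False):
    """Remove nulls, or restore them to backslashes."""
    if restore_backslashes:
        return text.replace("\x00", "\\")
    # A null followed by a space marks a backslash-escaped space: drop both.
    for sep in ("\x00 ", "\x00"):
        text = text.replace(sep, "")
    return text


hidden = escape2null_sketch(r"2 \* 3")
plain = unescape_sketch(hidden)
restored = unescape_sketch(hidden, restore_backslashes=True)
```

Skipping the character after each backslash means a doubled backslash yields one null followed by a literal backslash, so the second backslash is never mistaken for a new escape.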

Variable Details

state_classes

Type: tuple
Value:
(<class docutils.parsers.rst.states.Body at 0x8302934>,
 <class docutils.parsers.rst.states.BulletList at 0x8303254>,
 <class docutils.parsers.rst.states.DefinitionList at 0x82ef5dc>,
 <class docutils.parsers.rst.states.EnumeratedList at 0x83023d4>,
 <class docutils.parsers.rst.states.FieldList at 0x8340b2c>,
 <class docutils.parsers.rst.states.OptionList at 0x82e4c3c>,
 <class docutils.parsers.rst.states.ExtensionOptions at 0x83009dc>,
 <class docutils.parsers.rst.states.Explicit at 0x8301ba4>,
...                                                                    

Generated by Epydoc 2.0 on Tue Jul 22 05:30:36 2003 http://epydoc.sf.net