The reStructuredText parser is implemented as a recursive state machine,
examining its input one line at a time. To understand how the parser works,
please first become familiar with the docutils.statemachine module. In the
description below, references are made to classes defined in this module;
please see the individual classes for details.

Parsing proceeds as follows:
1. The state machine examines each line of input, checking each of the
   transition patterns of the state Body, in order, looking for a match.
   The implicit transitions (blank lines and indentation) are checked before
   any others. The 'text' transition is a catch-all (matches anything).

2. The method associated with the matched transition pattern is called.

   A. Some transition methods are self-contained, appending elements to the
      document tree (Body.doctest parses a doctest block). The parser's
      current line index is advanced to the end of the element, and parsing
      continues with step 1.

   B. Other transition methods trigger the creation of a nested state machine,
      whose job is to parse a compound construct ('indent' does a block quote,
      'bullet' does a bullet list, 'overline' does a section [first checking
      for a valid section header], etc.).

      - In the case of lists and explicit markup, a one-off state machine is
        created and run to parse contents of the first item.

      - A new state machine is created and its initial state is set to the
        appropriate specialized state (BulletList in the case of the
        'bullet' transition; see SpecializedBody for more detail). This
        state machine is run to parse the compound element (or series of
        explicit markup elements), and returns as soon as a non-member element
        is encountered. For example, the BulletList state machine ends as
        soon as it encounters an element which is not a list item of that
        bullet list. The optional omission of inter-element blank lines is
        enabled by this nested state machine.

      - The current line index is advanced to the end of the elements parsed,
        and parsing continues with step 1.

   C. The result of the 'text' transition depends on the next line of text.
      The current state is changed to Text, under which the second line is
      examined. If the second line is:

      - Indented: The element is a definition list item, and parsing proceeds
        similarly to step 2.B, using the DefinitionList state.

      - A line of uniform punctuation characters: The element is a section
        header; again, parsing proceeds as in step 2.B, and Body is still
        used.

      - Anything else: The element is a paragraph, which is examined for
        inline markup and appended to the parent element. Processing
        continues with step 1.
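
The control flow described above can be illustrated with a small, standard-library-only toy. This is a sketch under stated assumptions, not the docutils implementation: the names TRANSITIONS, UNDERLINE, match_transition, parse_bullet_list, classify_text, parse and SAMPLE are all hypothetical, the regular expressions are simplified, and only a handful of transitions are modelled. It shows ordered transition matching with a catch-all 'text' pattern, a nested bullet-list parser that returns at the first non-member line, and the look-ahead at the second line of a text block.

```python
import re

# Toy transition table for a Body-like state, checked in order.
# The implicit transitions (blank, indent) come first; 'text' matches anything.
TRANSITIONS = [
    ("blank", re.compile(r"\s*$")),
    ("indent", re.compile(r" +\S")),
    ("bullet", re.compile(r"[-+*] +\S")),
    ("text", re.compile(r"")),
]

# A line made of one repeated punctuation character, four or more long
# (the minimum length is an arbitrary choice for this sketch).
UNDERLINE = re.compile(r"([^\w\s])\1{3,}\s*$")


def match_transition(line):
    """Return the name of the first transition whose pattern matches."""
    for name, pattern in TRANSITIONS:
        if pattern.match(line):
            return name
    raise AssertionError("unreachable: 'text' matches every line")


def parse_bullet_list(lines, start):
    """Consume items that use the same bullet; stop at the first non-member."""
    bullet = lines[start][0]
    index = start
    items = []
    while index < len(lines) and lines[index].startswith(bullet + " "):
        items.append(lines[index][2:].strip())
        index += 1
    return items, index


def classify_text(lines, index):
    """Decide what a text line is by looking at the line after it."""
    following = lines[index + 1] if index + 1 < len(lines) else ""
    if following.startswith(" ") and following.strip():
        return "definition_list_item"
    if UNDERLINE.match(following):
        return "section"
    return "paragraph"


def parse(lines):
    """Return a flat list of (element_name, content) pairs."""
    elements = []
    index = 0
    while index < len(lines):
        name = match_transition(lines[index])
        if name == "blank":
            index += 1
        elif name == "indent":
            start = index
            while index < len(lines) and match_transition(lines[index]) == "indent":
                index += 1
            text = " ".join(line.strip() for line in lines[start:index])
            elements.append(("block_quote", text))
        elif name == "bullet":
            # Nested parse: the line index jumps to the end of the list.
            items, index = parse_bullet_list(lines, index)
            elements.append(("bullet_list", items))
        else:
            kind = classify_text(lines, index)
            if kind == "paragraph":
                start = index
                while index < len(lines) and lines[index].strip():
                    index += 1
                text = " ".join(line.strip() for line in lines[start:index])
                elements.append((kind, text))
            else:
                # The toy consumes only the first and second lines here.
                elements.append((kind, lines[index].strip()))
                index += 2
    return elements


SAMPLE = [
    "Title", "=====", "",
    "A paragraph", "continued here.", "",
    "- one", "- two", "* other", "",
    "term", "  definition",
]
```

Calling parse(SAMPLE) yields two separate bullet lists, because the line starting with "*" is not a member of the list that uses "-". The items "- one" and "- two" land in the same list even though no blank line separates them, which mirrors the role the nested state machine plays in the description above.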
Classes

| Class | Description |
|---|---|
| Body | Generic classifier of the first line of a block. |
| BulletList | Second and subsequent bullet_list list_items. |
| Definition | Second line of potential definition_list_item. |
| DefinitionList | Second and subsequent definition_list_items. |
| EnumeratedList | Second and subsequent enumerated_list list_items. |
| Explicit | Second and subsequent explicit markup construct. |
| ExtensionOptions | Parse field_list fields for extension options. |
| FieldList | Second and subsequent field_list fields. |
| Inliner | Parse inline markup; call the parse() method. |
| Line | Second line of over- & underlined section title or transition marker. |
| NestedStateMachine | StateMachine run from within other StateMachine runs, to parse nested document structures. |
| OptionList | Second and subsequent option_list option_list_items. |
| RFC2822Body | RFC2822 headers are only valid as the first constructs in documents. |
| RFC2822List | Second and subsequent RFC2822-style field_list fields. |
| RSTState | reStructuredText State superclass. |
| RSTStateMachine | reStructuredText's master StateMachine. |
| SpecializedBody | Superclass for second and subsequent compound element members. |
| SpecializedText | Superclass for second and subsequent lines of Text-variants. |
| Struct | Stores data attributes for dotted-attribute access. |
| SubstitutionDef | Parser for the contents of a substitution_definition element. |
| Text | Classifier of second line of a text block. |
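
As an illustration of what "dotted-attribute access" means for Struct, here is a minimal stand-in. It is a sketch written for this page rather than a copy of the library class, and the attribute names title_styles and section_level are chosen only for the example.

```python
class Struct:
    """Minimal stand-in: keyword arguments become instance attributes."""

    def __init__(self, **keywordargs):
        self.__dict__.update(keywordargs)


# Hypothetical shared state that nested state machines could read and update.
memo = Struct(title_styles=[], section_level=0)
memo.section_level += 1
memo.title_styles.append("=")
```

Because every nested state machine can hold a reference to the same object, an update made at one nesting level is visible at all the others without passing separate arguments around.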