Module htmllib :: Class HTMLParser

_ClassType HTMLParser

markupbase.ParserBase --+    
                        |    
       sgmllib.SGMLParser --+
                            |
                           HTMLParser

This is the basic HTML parser class.

It supports all entity names required by the XHTML 1.0 Recommendation. It also defines handlers for all HTML 2.0 and many HTML 3.0 and 3.2 elements.

Instance Methods
 
__init__(self, formatter, verbose=0)
Creates an instance of the HTMLParser class.
 
error(self, message)
 
reset(self)
Reset this instance.
 
handle_data(self, data)
 
save_bgn(self)
Begins saving character data in a buffer instead of sending it to the formatter object.
 
save_end(self)
Ends buffering character data and returns all data saved since the preceding call to the save_bgn() method.
 
anchor_bgn(self, href, name, type)
This method is called at the start of an anchor region.
 
anchor_end(self)
This method is called at the end of an anchor region.
 
handle_image(self, src, alt, *args)
This method is called to handle images.
 
start_html(self, attrs)
 
end_html(self)
 
start_head(self, attrs)
 
end_head(self)
 
start_body(self, attrs)
 
end_body(self)
 
start_title(self, attrs)
 
end_title(self)
 
do_base(self, attrs)
 
do_isindex(self, attrs)
 
do_link(self, attrs)
 
do_meta(self, attrs)
 
do_nextid(self, attrs)
 
start_h1(self, attrs)
 
end_h1(self)
 
start_h2(self, attrs)
 
end_h2(self)
 
start_h3(self, attrs)
 
end_h3(self)
 
start_h4(self, attrs)
 
end_h4(self)
 
start_h5(self, attrs)
 
end_h5(self)
 
start_h6(self, attrs)
 
end_h6(self)
 
do_p(self, attrs)
 
start_pre(self, attrs)
 
end_pre(self)
 
start_xmp(self, attrs)
 
end_xmp(self)
 
start_listing(self, attrs)
 
end_listing(self)
 
start_address(self, attrs)
 
end_address(self)
 
start_blockquote(self, attrs)
 
end_blockquote(self)
 
start_ul(self, attrs)
 
end_ul(self)
 
do_li(self, attrs)
 
start_ol(self, attrs)
 
end_ol(self)
 
start_menu(self, attrs)
 
end_menu(self)
 
start_dir(self, attrs)
 
end_dir(self)
 
start_dl(self, attrs)
 
end_dl(self)
 
do_dt(self, attrs)
 
do_dd(self, attrs)
 
ddpop(self, bl=0)
 
start_cite(self, attrs)
 
end_cite(self)
 
start_code(self, attrs)
 
end_code(self)
 
start_em(self, attrs)
 
end_em(self)
 
start_kbd(self, attrs)
 
end_kbd(self)
 
start_samp(self, attrs)
 
end_samp(self)
 
start_strong(self, attrs)
 
end_strong(self)
 
start_var(self, attrs)
 
end_var(self)
 
start_i(self, attrs)
 
end_i(self)
 
start_b(self, attrs)
 
end_b(self)
 
start_tt(self, attrs)
 
end_tt(self)
 
start_a(self, attrs)
 
end_a(self)
 
do_br(self, attrs)
 
do_hr(self, attrs)
 
do_img(self, attrs)
 
do_plaintext(self, attrs)
 
unknown_starttag(self, tag, attrs)
 
unknown_endtag(self, tag)

Inherited from sgmllib.SGMLParser: close, convert_charref, convert_codepoint, convert_entityref, feed, finish_endtag, finish_shorttag, finish_starttag, get_starttag_text, goahead, handle_charref, handle_comment, handle_decl, handle_endtag, handle_entityref, handle_pi, handle_starttag, parse_endtag, parse_pi, parse_starttag, report_unbalanced, setliteral, setnomoretags, unknown_charref, unknown_entityref

Inherited from sgmllib.SGMLParser (private): _convert_ref

Inherited from markupbase.ParserBase: getpos, parse_comment, parse_declaration, parse_marked_section, unknown_decl, updatepos

Class Variables

Inherited from sgmllib.SGMLParser: entity_or_charref

Inherited from sgmllib.SGMLParser (private): _decl_otherchars

Imports: entitydefs


Method Details

__init__(self, formatter, verbose=0)
(Constructor)

 

Creates an instance of the HTMLParser class.

The formatter parameter is the formatter instance associated with the parser.

Overrides: markupbase.ParserBase.__init__

error(self, message)

 
Overrides: markupbase.ParserBase.error

reset(self)

 

Resets this instance. Any unprocessed data is lost.

Overrides: markupbase.ParserBase.reset

handle_data(self, data)

 
Overrides: sgmllib.SGMLParser.handle_data

save_bgn(self)

 

Begins saving character data in a buffer instead of sending it to the formatter object.

Retrieve the stored data via the save_end() method. Calls to save_bgn() and save_end() may not be nested.

save_end(self)

 

Ends buffering character data and returns all data saved since the preceding call to the save_bgn() method.

If the nofill flag is false, whitespace is collapsed to single spaces. A call to this method without a preceding call to the save_bgn() method will raise a TypeError exception.
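
The buffering protocol described for save_bgn() and save_end() can be illustrated with a small stand-in class. This is a minimal sketch, not the htmllib source: the class name BufferingSketch and the sent list (standing in for the formatter) are hypothetical, and the TypeError is raised explicitly here only to mirror the documented behaviour.

```python
class BufferingSketch:
    """Hypothetical stand-in for the save_bgn()/save_end() protocol."""

    def __init__(self):
        self.savedata = None  # None means "not currently buffering"
        self.nofill = False   # when true, whitespace is kept as-is
        self.sent = []        # stands in for data sent to the formatter

    def handle_data(self, data):
        # While buffering, data is saved instead of being sent on.
        if self.savedata is not None:
            self.savedata += data
        else:
            self.sent.append(data)

    def save_bgn(self):
        self.savedata = ""

    def save_end(self):
        if self.savedata is None:
            # Assumption: raised explicitly to match the documented behaviour.
            raise TypeError("save_end() called without save_bgn()")
        data, self.savedata = self.savedata, None
        if not self.nofill:
            # Collapse runs of whitespace to single spaces.
            data = " ".join(data.split())
        return data


parser = BufferingSketch()
parser.handle_data("before ")
parser.save_bgn()
parser.handle_data("  My \n  Title ")
title = parser.save_end()
parser.handle_data("after")
```

Because a single buffer is used, a second save_bgn() before save_end() would discard what was saved so far, which is why the pair may not be nested.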

anchor_bgn(self, href, name, type)

 

This method is called at the start of an anchor region.

The arguments correspond to the attributes of the <A> tag with the same names. The default implementation maintains a list of hyperlinks (defined by the HREF attribute for <A> tags) within the document. The list of hyperlinks is available as the data attribute anchorlist.

anchor_end(self)

 

This method is called at the end of an anchor region.

The default implementation adds a textual footnote marker using an index into the list of hyperlinks created by the anchor_bgn() method.
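
The interplay of anchor_bgn() and anchor_end() can be sketched as follows. Since htmllib is not available on Python 3, this sketch builds on html.parser.HTMLParser from the Python 3 standard library instead; the class name AnchorSketch, the text attribute, and the "[n]" marker format are assumptions made for illustration, not a copy of the htmllib implementation.

```python
from html.parser import HTMLParser


class AnchorSketch(HTMLParser):
    """Hypothetical re-creation of the anchor_bgn()/anchor_end() behaviour."""

    def __init__(self):
        super().__init__()
        self.anchor = None    # href of the anchor currently open, if any
        self.anchorlist = []  # hyperlinks seen so far
        self.text = []        # stands in for the formatter's output

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attrs = dict(attrs)
            self.anchor_bgn(attrs.get("href", ""),
                            attrs.get("name", ""),
                            attrs.get("type", ""))

    def handle_endtag(self, tag):
        if tag == "a":
            self.anchor_end()

    def handle_data(self, data):
        self.text.append(data)

    def anchor_bgn(self, href, name, type):
        # Only anchors that carry an HREF are recorded as hyperlinks.
        self.anchor = href
        if href:
            self.anchorlist.append(href)

    def anchor_end(self):
        # Emit a footnote marker that indexes into anchorlist (1-based).
        if self.anchor:
            self.handle_data("[%d]" % len(self.anchorlist))
        self.anchor = None


parser = AnchorSketch()
parser.feed('See <a href="http://example.com/">the site</a> '
            'and <a name="top">here</a>.')
parser.close()
rendered = "".join(parser.text)
```

In this sketch the second anchor has no HREF, so it adds nothing to anchorlist and produces no marker.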

handle_image(self, src, alt, *args)

 

This method is called to handle images.

The default implementation simply passes the alt value to the handle_data() method.
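
A minimal sketch of this default, again built on Python 3's html.parser.HTMLParser rather than htmllib itself. The class name ImageSketch, the text attribute, and the "(image)" fallback for a missing ALT attribute are assumptions for illustration.

```python
from html.parser import HTMLParser


class ImageSketch(HTMLParser):
    """Hypothetical re-creation of the default handle_image() behaviour."""

    def __init__(self):
        super().__init__()
        self.text = []  # stands in for the formatter's output

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attrs = dict(attrs)
            # Assumption: "(image)" is used when no ALT text is given.
            self.handle_image(attrs.get("src", ""),
                              attrs.get("alt", "(image)"))

    def handle_image(self, src, alt, *args):
        # The image itself is ignored; only the ALT text is passed on.
        self.handle_data(alt)

    def handle_data(self, data):
        self.text.append(data)


parser = ImageSketch()
parser.feed('<p>Logo: <img src="logo.png" alt="Company logo"> end</p>')
parser.close()
rendered = "".join(parser.text)
```

A subclass that wants to do something with the image source would override handle_image() and use the src argument.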

unknown_starttag(self, tag, attrs)

 
Overrides: sgmllib.SGMLParser.unknown_starttag

unknown_endtag(self, tag)

 
Overrides: sgmllib.SGMLParser.unknown_endtag