Module robotparser :: Class RobotFileParser

_ClassType RobotFileParser

This class provides a set of methods to read, parse and answer questions about a single robots.txt file.

Instance Methods
 
__init__(self, url='')
 
mtime(self)
Returns the time the robots.txt file was last fetched.
 
modified(self)
Sets the time the robots.txt file was last fetched to the current time.
 
set_url(self, url)
Sets the URL referring to a robots.txt file.
 
read(self)
Reads the robots.txt URL and feeds it to the parser.
 
_add_entry(self, entry)
 
parse(self, lines)
Parses the input lines from a robots.txt file.
 
can_fetch(self, useragent, url)
Using the parsed robots.txt, decides whether useragent can fetch url.
 
__str__(self)
Method Details

mtime(self)

 

Returns the time the robots.txt file was last fetched.

This is useful for long-running web spiders that need to check for new robots.txt files periodically.
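
As a rough illustration of that periodic check, the sketch below compares mtime() against a refresh interval. It assumes the Python 3 module location (urllib.robotparser; under Python 2 the module is imported as robotparser). The helper needs_refresh and the constant MAX_AGE_SECONDS are hypothetical names introduced here, not part of the class.

```python
import time
from urllib import robotparser  # assumption: Python 3; Python 2 used `import robotparser`

MAX_AGE_SECONDS = 3600  # hypothetical refresh interval chosen for this sketch


def needs_refresh(parser, now=None, max_age=MAX_AGE_SECONDS):
    """Return True if robots.txt was never fetched or the copy is older than max_age."""
    if now is None:
        now = time.time()
    last_fetched = parser.mtime()
    # A falsy mtime() means no fetch has been recorded yet.
    return not last_fetched or (now - last_fetched) > max_age


parser = robotparser.RobotFileParser()
stale_before = needs_refresh(parser)  # nothing has been fetched yet

# A real spider would call parser.set_url(...) and parser.read() here;
# modified() is called directly so the sketch runs without network access.
parser.modified()
stale_after = needs_refresh(parser)
```

A long-running spider could call such a helper before each batch of requests and re-read the file only when it returns True.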

parse(self, lines)

 

Parses the input lines from a robots.txt file. A user-agent: line is accepted even when it is not preceded by one or more blank lines.
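
A minimal sketch of this behavior, assuming the Python 3 module location (urllib.robotparser): the second record below starts without a separating blank line, yet both records are applied. The user-agent names, paths, and example.com URLs are made up for illustration.

```python
from urllib import robotparser  # assumption: Python 3; Python 2 used `import robotparser`

# Two records with no blank line between them.
lines = [
    "User-agent: examplebot",
    "Disallow: /private/",
    "User-agent: *",
    "Disallow: /tmp/",
]

parser = robotparser.RobotFileParser()
parser.parse(lines)

# examplebot is matched by the first record only.
bot_private = parser.can_fetch("examplebot", "http://example.com/private/page.html")
bot_tmp = parser.can_fetch("examplebot", "http://example.com/tmp/file")

# Any other agent falls through to the "*" record.
other_private = parser.can_fetch("otherbot", "http://example.com/private/page.html")
other_tmp = parser.can_fetch("otherbot", "http://example.com/tmp/file")
```

Passing lines directly to parse() is also a convenient way to test rules without fetching anything over the network.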