Module urllib

Open an arbitrary URL.

See the following document for more info on URLs:
"Names and Addresses, URIs, URLs, URNs, URCs", at
http://www.w3.org/pub/WWW/Addressing/Overview.html

See also the HTTP spec (from which the error codes are derived):
"HTTP - Hypertext Transfer Protocol", at
http://www.w3.org/pub/WWW/Protocols/

Related standards and specs:
- RFC1808 - the "relative URL" spec. (authoritative status)
- RFC1738 - the "URL standard". (authoritative status)
- RFC1630 - the "URI spec". (informational status)

The object returned by URLopener().open(file) will differ per
protocol.  All you know is that it has methods read(), readline(),
readlines(), fileno(), close() and info().  The read*(), fileno()
and close() methods work like those of open files.
The info() method returns a mimetools.Message object which can be
used to query various info about the object, if available.
(mimetools.Message objects are queried with the getheader() method.)
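
As a rough illustration of the interface described above, the sketch below opens a local file through a file: URL, so no network access is needed. It uses the module-level urlopen() rather than URLopener().open(). The temporary file and its contents are made up for the example, and the fallback import is an assumption that lets the same code run on interpreters where these functions live in urllib.request.

```python
import os
import tempfile

try:
    from urllib import urlopen, pathname2url  # module layout documented here
except ImportError:
    from urllib.request import urlopen, pathname2url  # assumed Python 3 layout

# Create a throwaway file to act as the resource (illustrative only).
fd, path = tempfile.mkstemp()
os.write(fd, b"first line\nsecond line\n")
os.close(fd)

response = urlopen("file:" + pathname2url(path))
try:
    first = response.readline()   # works like readline() on an open file
    rest = response.read()        # reads whatever is left
    headers = response.info()     # message object describing the resource
finally:
    response.close()
    os.remove(path)
```

Which headers are present depends on the protocol, so code that queries the info() object should be prepared for a header to be missing.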


Version: 1.17

Classes
ContentTooShortError
URLopener
Class to open URLs.
FancyURLopener
Derived class with handlers for errors we can handle (perhaps).
ftpwrapper
Class used by open_ftp() for cache of open FTP connections.
addbase
Base class for addinfo and addclosehook.
addclosehook
Class to add a close hook to an open file.
addinfo
Class to add an info() method to an open file.
addinfourl
Class to add info() and geturl() methods to an open file.
Functions
 
basejoin(base, url, allow_fragments=True)
Join a base URL and a possibly relative URL to form an absolute interpretation of the latter.
 
url2pathname(pathname)
OS-specific conversion from a relative URL of the 'file' scheme to a file system path; not recommended for general use.
 
pathname2url(pathname)
OS-specific conversion from a file system path to a relative URL of the 'file' scheme; not recommended for general use.
urlopen(url, data=...)
Return an open file-like object.
 
urlretrieve(url, filename=None, reporthook=None, data=None)
 
urlcleanup()
 
localhost()
Return the IP address of the magic hostname 'localhost'.
 
thishost()
Return the IP address of the current host.
 
ftperrors()
Return the set of errors raised by the FTP class.
 
noheaders()
Return an empty mimetools.Message object.
 
_is_unicode(x)
 
toBytes(url)
toBytes(u"URL") --> 'URL'.
 
unwrap(url)
unwrap('<URL:type://host/path>') --> 'type://host/path'.
 
splittype(url)
splittype('type:opaquestring') --> 'type', 'opaquestring'.
 
splithost(url)
splithost('//host[:port]/path') --> 'host[:port]', '/path'.
 
splituser(host)
splituser('user[:passwd]@host[:port]') --> 'user[:passwd]', 'host[:port]'.
 
splitpasswd(user)
splitpasswd('user:passwd') -> 'user', 'passwd'.
 
splitport(host)
splitport('host:port') --> 'host', 'port'.
 
splitnport(host, defport=-1)
Split host and port, returning numeric port.
 
splitquery(url)
splitquery('/path?query') --> '/path', 'query'.
 
splittag(url)
splittag('/path#tag') --> '/path', 'tag'.
 
splitattr(url)
splitattr('/path;attr1=value1;attr2=value2;...') -> '/path', ['attr1=value1', 'attr2=value2', ...].
 
splitvalue(attr)
splitvalue('attr=value') --> 'attr', 'value'.
 
splitgophertype(selector)
splitgophertype('/Xselector') --> 'X', 'selector'.
 
unquote(s)
unquote('abc%20def') -> 'abc def'.
 
unquote_plus(s)
unquote_plus('%7e/abc+def') -> '~/abc def'.
 
quote(s, safe='/')
quote('abc def') -> 'abc%20def'
 
quote_plus(s, safe='')
Quote the query fragment of a URL, replacing ' ' with '+'.
 
urlencode(query, doseq=0)
Encode a sequence of two-element tuples or dictionary into a URL query string.
 
getproxies_environment()
Return a dictionary of scheme -> proxy server URL mappings.
 
getproxies_internetconfig()
Return a dictionary of scheme -> proxy server URL mappings.
 
getproxies_registry()
Return a dictionary of scheme -> proxy server URL mappings.
 
getproxies()
Return a dictionary of scheme -> proxy server URL mappings.
 
proxy_bypass(host)
 
test1()
 
reporthook(blocknum, blocksize, totalsize)
 
test(args=[])
 
main()
Variables
  MAXFTPCACHE = 10
  _urlopener = None
  ftpcache = {}
  _localhost = None
  _thishost = None
  _ftperrors = None
  _noheaders = None
  _typeprog = None
  _hostprog = None
  _userprog = None
  _passwdprog = None
  _portprog = None
  _nportprog = None
  _queryprog = None
  _tagprog = None
  _valueprog = None
  _hextochr = {'00': '\x00', '01': '\x01', '02': '\x02', '03': '...
  always_safe = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstu...
  _safemaps = {('/', 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnop...

Imports: string, socket, os, time, sys


Function Details

splitnport(host, defport=-1)

 

Split host and port, returning numeric port. Return the given default port if no ':' is found; the default is -1. Return the numerical port if a valid number is found after ':'. Return None if ':' is present but is not followed by a valid number.
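
The three outcomes described above can be sketched with a small stand-alone function. This is not the module's source: split_numeric_port is a hypothetical name, and splitting on the last ':' is an assumption made for the illustration.

```python
def split_numeric_port(host, defport=-1):
    """Hypothetical re-implementation of the behaviour described above."""
    head, sep, tail = host.rpartition(':')
    if not sep:
        # No ':' at all: hand back the caller's default port.
        return host, defport
    try:
        if not tail:
            raise ValueError("no digits after ':'")
        return head, int(tail)
    except ValueError:
        # A ':' was present but what follows is not a number.
        return head, None


with_port = split_numeric_port('www.example.com:8080')
without_port = split_numeric_port('www.example.com')
bad_port = split_numeric_port('www.example.com:http')
```

Callers can therefore tell "no port given" (the default, -1 unless overridden) apart from "port given but unusable" (None).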

quote(s, safe='/')

 
quote('abc def') -> 'abc%20def'

Each part of a URL, e.g. the path info, the query, etc., has a
different set of reserved characters that must be quoted.

RFC 2396 Uniform Resource Identifiers (URI): Generic Syntax lists
the following reserved characters.

reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
              "$" | ","

Each of these characters is reserved in some component of a URL,
but not necessarily in all of them.

By default, the quote function is intended for quoting the path
section of a URL.  Thus, it will not encode '/'.  This character
is reserved, but in typical usage the quote function is being
called on a path where the existing slash characters are used as
reserved characters.
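
A short sketch of how the safe argument changes the result. The sample paths are made up, and the fallback import is an assumption for interpreters where quote lives in urllib.parse.

```python
try:
    from urllib import quote  # module layout documented here
except ImportError:
    from urllib.parse import quote  # assumed Python 3 layout

# Default safe='/': slashes are kept as path separators, the space is encoded.
whole_path = quote('/docs/my file.txt')

# safe='': the slash is data inside a single component, so it is encoded too.
one_segment = quote('a/b c', safe='')
```

With the default, whole_path is '/docs/my%20file.txt'; with safe='', one_segment is 'a%2Fb%20c'.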

urlencode(query, doseq=0)

 

Encode a sequence of two-element tuples or dictionary into a URL query string.

If any values in the query arg are sequences and doseq is true, each sequence element is converted to a separate parameter.

If the query arg is a sequence of two-element tuples, the order of the parameters in the output will match the order of parameters in the input.
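
The two points above, ordering and doseq, might look like this in practice. The parameter names and values are invented for the example, and the fallback import is an assumption for interpreters where urlencode lives in urllib.parse.

```python
try:
    from urllib import urlencode  # module layout documented here
except ImportError:
    from urllib.parse import urlencode  # assumed Python 3 layout

# A sequence of two-element tuples keeps its order in the output.
ordered = urlencode([('q', 'url lib'), ('page', '2')])

# With doseq true, each element of a sequence value becomes its own parameter.
repeated = urlencode([('tag', ['a', 'b'])], doseq=1)
```

Here ordered is 'q=url+lib&page=2' and repeated is 'tag=a&tag=b'.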

getproxies_environment()

 

Return a dictionary of scheme -> proxy server URL mappings.

Scan the environment for variables named <scheme>_proxy; this seems to be the standard convention. If you need a different way, you can pass a proxies dictionary to the [Fancy]URLopener constructor.
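
The scan described above can be sketched as a function over any mapping, which keeps the example independent of the real process environment. scan_proxy_variables is a hypothetical helper, not the module's code; lower-casing the variable names and skipping empty values are assumptions of this sketch.

```python
def scan_proxy_variables(environ):
    """Collect <scheme>_proxy entries from a mapping of environment variables."""
    suffix = '_proxy'
    proxies = {}
    for name, value in environ.items():
        name = name.lower()  # assumption: treat HTTP_PROXY like http_proxy
        if value and name.endswith(suffix):
            proxies[name[:-len(suffix)]] = value
    return proxies


# Made-up environment for illustration.
example_env = {
    'http_proxy': 'http://proxy.example:3128',
    'FTP_PROXY': 'http://proxy.example:3128',
    'https_proxy': '',          # empty value: ignored in this sketch
    'PATH': '/usr/bin',         # not a proxy variable
}
found = scan_proxy_variables(example_env)
```

The resulting dictionary has the same shape as the proxies argument accepted by the [Fancy]URLopener constructor, mapping each scheme to a proxy server URL.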

getproxies_internetconfig()

 

Return a dictionary of scheme -> proxy server URL mappings.

By convention, the Mac uses Internet Config to store proxies. An HTTP proxy, for instance, is stored under the HttpProxy key.

getproxies_registry()

 

Return a dictionary of scheme -> proxy server URL mappings.

Win32 uses the registry to store proxies.

getproxies()

 

Return a dictionary of scheme -> proxy server URL mappings.

Scan the environment for variables named <scheme>_proxy; this seems to be the standard convention. If you need a different way, you can pass a proxies dictionary to the [Fancy]URLopener constructor.


Variables Details

_hextochr

Value:
{'00': '\x00',
 '01': '\x01',
 '02': '\x02',
 '03': '\x03',
 '04': '\x04',
 '05': '\x05',
 '06': '\x06',
 '07': '\x07',
...

always_safe

Value:
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_.-'

_safemaps

Value:
{('/',
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_.-')\
: {'\x00': '%00',
   '\x01': '%01',
   '\x02': '%02',
   '\x03': '%03',
   '\x04': '%04',
   '\x05': '%05',
...