The function testencoding is an end-to-end test for Unicode encodings. It takes a string, writes it to a Python file, and processes that file's documentation. It then generates HTML output from the documentation, extracts all docstrings from that output, and displays them. (To extract and display the docstrings, it monkey-patches the HTMLWriter.docstring_to_html() method.)
>>> from epydoc.test.util import testencoding
>>> from epydoc.test.util import print_warnings
>>> print_warnings()
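The monkey-patching approach mentioned above can be illustrated with a minimal sketch. The Writer class and the capture_docstrings helper are hypothetical stand-ins written for this illustration, not epydoc's actual code; only the method name docstring_to_html comes from the text above.

```python
import contextlib


class Writer:
    """Hypothetical stand-in for an HTML writer (not epydoc's class)."""

    def docstring_to_html(self, docstring):
        # Wrap the docstring in a paragraph, as a real writer might.
        return "<p>%s</p>" % docstring

    def write(self, docstrings):
        # Render a whole page; the docstring HTML is buried in the result.
        body = "".join(self.docstring_to_html(d) for d in docstrings)
        return "<html>%s</html>" % body


@contextlib.contextmanager
def capture_docstrings(cls):
    """Temporarily patch cls.docstring_to_html to record what it renders."""
    captured = []
    original = cls.docstring_to_html

    def recording(self, docstring):
        html = original(self, docstring)
        captured.append(html)
        return html

    cls.docstring_to_html = recording
    try:
        yield captured
    finally:
        # Always restore the original method, even if rendering fails.
        cls.docstring_to_html = original


with capture_docstrings(Writer) as seen:
    Writer().write(["abc ABC 123 \u20ac"])
print(seen)
```

Patching the method rather than parsing the finished page means the test sees each docstring exactly as the writer rendered it, without depending on the surrounding page layout.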
This section tests the output for a variety of different encodings. Note that some encodings (such as cp424) are not supported, because the coding directive is written in ASCII and would be a syntax error when read in that encoding.
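The cp424 limitation can be seen with a short sketch, written here for Python 3 rather than the Python 2 used by the tests. cp424 is an EBCDIC-based code page, so the ASCII bytes of a coding directive do not decode to the same characters. The variable names are illustrative only.

```python
directive = b"# -*- coding: cp424 -*-"

# Under an ASCII-compatible code page the directive reads back unchanged.
as_cp1252 = directive.decode("cp1252")

# cp424 is EBCDIC-based, so the same bytes mean different characters.
# errors="replace" guards against bytes that cp424 leaves undefined.
as_cp424 = directive.decode("cp424", errors="replace")

print(as_cp1252 == directive.decode("ascii"))
print(as_cp424 == directive.decode("ascii"))

# The comment marker itself is stored as a different byte in cp424.
print("#".encode("cp424") == b"#")
```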
Tests for several Microsoft code pages:
>>> testencoding('''# -*- coding: cp874 -*-
... """abc ABC 123 \x80 \x85"""
... ''')
<p>abc ABC 123 € …</p>

>>> testencoding('''# -*- coding: cp1250 -*-
... """abc ABC 123 \x80 \x82 \x84 \x85 \xff"""
... ''')
<p>abc ABC 123 € ‚ „ … ˙</p>

>>> testencoding('''# -*- coding: cp1251 -*-
... """abc ABC 123 \x80 \x81 \x82 \xff"""
... ''')
<p>abc ABC 123 Ђ Ѓ ‚ я</p>

>>> testencoding('''# -*- coding: cp1252 -*-
... """abc ABC 123 \x80 \x82 \x83 \xff"""
... ''')
<p>abc ABC 123 € ‚ ƒ ÿ</p>

>>> testencoding('''# -*- coding: cp1253 -*-
... """abc ABC 123 \x80 \x82 \x83 \xfe"""
... ''')
<p>abc ABC 123 € ‚ ƒ ώ</p>
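The reason these tests reuse the same byte values across code pages is that each code page assigns them different characters. A small Python 3 sketch, independent of epydoc, makes the mapping explicit for three of the code pages tested above; SAMPLE and decoded are names chosen for this illustration.

```python
# The same three high bytes, separated by ASCII spaces.
SAMPLE = b"\x80 \x82 \xff"

# Decode the identical byte string under each code page.
decoded = {codec: SAMPLE.decode(codec)
           for codec in ("cp1250", "cp1251", "cp1252")}

for codec, text in decoded.items():
    # Show the code points so the differences are visible in any terminal.
    print(codec, [hex(ord(ch)) for ch in text if ch != " "])
```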
Unicode tests:
>>> utf8_test = '''\
... """abc ABC 123
...
... 0x80-0x7ff range:
... \xc2\x80 \xc2\x81 \xdf\xbe \xdf\xbf
...
... 0x800-0xffff range:
... \xe0\xa0\x80 \xe0\xa0\x81 \xef\xbf\xbe \xef\xbf\xbf
...
... 0x10000-0x10ffff range:
... \xf0\x90\x80\x80 \xf0\x90\x80\x81
... \xf4\x8f\xbf\xbe \xf4\x8f\xbf\xbf
... """\n'''
>>> utf8_bom = '\xef\xbb\xbf'

>>> # UTF-8 with a coding directive:
>>> testencoding("# -*- coding: utf-8 -*-\n"+utf8_test)
<p>abc ABC 123</p>
<p>0x80-0x7ff range: €  ߾ ߿</p>
<p>0x800-0xffff range: ࠀ ࠁ  </p>
<p>0x10000-0x10ffff range: 𐀀 𐀁  </p>

>>> # UTF-8 with a BOM & a coding directive:
>>> testencoding(utf8_bom+"# -*- coding: utf-8 -*-\n"+utf8_test)
<p>abc ABC 123</p>
<p>0x80-0x7ff range: €  ߾ ߿</p>
<p>0x800-0xffff range: ࠀ ࠁ  </p>
<p>0x10000-0x10ffff range: 𐀀 𐀁  </p>

>>> # UTF-8 with a BOM & no coding directive:
>>> testencoding(utf8_bom+utf8_test)
<p>abc ABC 123</p>
<p>0x80-0x7ff range: €  ߾ ߿</p>
<p>0x800-0xffff range: ࠀ ࠁ  </p>
<p>0x10000-0x10ffff range: 𐀀 𐀁  </p>
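The byte sequences in utf8_test sit at the edges of the two-, three-, and four-byte UTF-8 ranges. The following Python 3 sketch, which does not use epydoc, checks the first and last sequence of each range and shows how the BOM is handled by the standard codecs; the names boundaries, code_points, and bom_source are illustrative.

```python
import codecs

# First and last byte sequence of each UTF-8 length, with the expected
# code point taken from the range labels in the test string.
boundaries = {
    b"\xc2\x80": 0x80,
    b"\xdf\xbf": 0x7FF,
    b"\xe0\xa0\x80": 0x800,
    b"\xef\xbf\xbf": 0xFFFF,
    b"\xf0\x90\x80\x80": 0x10000,
    b"\xf4\x8f\xbf\xbf": 0x10FFFF,
}

# Each sequence decodes to exactly one character.
code_points = {raw: ord(raw.decode("utf-8")) for raw in boundaries}
for raw, point in code_points.items():
    print(raw, hex(point))

# The utf8_bom value in the tests is the standard UTF-8 byte order mark.
bom_source = codecs.BOM_UTF8 + b'"""abc"""\n'
with_bom = bom_source.decode("utf-8")         # keeps U+FEFF at the start
without_bom = bom_source.decode("utf-8-sig")  # strips the BOM
```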
Tests for KOI8-R:
>>> testencoding('''# -*- coding: koi8-r -*-
... """abc ABC 123 \x80 \x82 \x83 \xff"""
... ''')
<p>abc ABC 123 ─ ┌ ┐ Ъ</p>
Tests for 'coding' directive on the second line:
>>> testencoding('''\n# -*- coding: cp1252 -*-
... """abc ABC 123 \x80 \x82 \x83 \xff"""
... ''')
<p>abc ABC 123 € ‚ ƒ ÿ</p>

>>> testencoding('''# comment on the first line.\n# -*- coding: cp1252 -*-
... """abc ABC 123 \x80 \x82 \x83 \xff"""
... ''')
<p>abc ABC 123 € ‚ ƒ ÿ</p>

>>> testencoding("\n# -*- coding: utf-8 -*-\n"+utf8_test)
<p>abc ABC 123</p>
<p>0x80-0x7ff range: €  ߾ ߿</p>
<p>0x800-0xffff range: ࠀ ࠁ  </p>
<p>0x10000-0x10ffff range: 𐀀 𐀁  </p>

>>> testencoding("# comment\n# -*- coding: utf-8 -*-\n"+utf8_test)
<p>abc ABC 123</p>
<p>0x80-0x7ff range: €  ߾ ߿</p>
<p>0x800-0xffff range: ࠀ ࠁ  </p>
<p>0x10000-0x10ffff range: 𐀀 𐀁  </p>
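Python only honors a coding directive on the first or second line of a file, which is why the tests above stop at line two. As a rough illustration, Python 3's standard tokenize.detect_encoding follows the same rule. This is not how epydoc itself finds the encoding, and the declared_encoding helper is a name invented for this sketch.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding Python 3's tokenizer detects for source_bytes."""
    readline = io.BytesIO(source_bytes).readline
    encoding, _lines_read = tokenize.detect_encoding(readline)
    return encoding


first_line = b"# -*- coding: cp1252 -*-\nx = 1\n"
second_line = b"# comment on the first line.\n# -*- coding: cp1252 -*-\nx = 1\n"
third_line = b"# one\n# two\n# -*- coding: cp1252 -*-\nx = 1\n"

print(declared_encoding(first_line))
print(declared_encoding(second_line))
# A directive on the third line is ignored, so the default applies.
print(declared_encoding(third_line))
```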
Tests for Shift-JIS:
>>> testencoding('''# -*- coding: shift_jis -*-
... """abc ABC 123 \xA1 \xA2 \xA3"""
... ''') # doctest: +PYTHON2.4
abc ABC 123 。 「 」
Make sure that we use the coding for both str and unicode docstrings.
>>> testencoding('''# -*- coding: utf-8 -*-
... """abc ABC 123 \xc2\x80 \xdf\xbf \xe0\xa0\x80"""
... ''')
<p>abc ABC 123 € ߿ ࠀ</p>

>>> testencoding('''# -*- coding: utf-8 -*-
... u"""abc ABC 123 \xc2\x80 \xdf\xbf \xe0\xa0\x80"""
... ''')
<p>abc ABC 123 € ߿ ࠀ</p>
Under special circumstances, we may not be able to tell what the proper encoding for a docstring is. This happens if:
Under these circumstances, we issue a warning and treat the docstring as Latin-1. An example of this is a non-Unicode docstring for a property:
>>> testencoding('''# -*- coding: utf-8 -*-
... p=property(doc="""\xc2\x80""")
... ''') # doctest: +ELLIPSIS
<property object at ...>'s docstring is not a unicode string, but it contains non-ascii data -- treating it as latin-1.
€
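The fallback described above can be sketched in Python 3 as follows. The decode_docstring function and its warning text are assumptions made for this illustration and are not epydoc's implementation; the sketch only shows why Latin-1 is a safe last resort and what it costs.

```python
import warnings


def decode_docstring(raw, encoding=None):
    """Decode docstring bytes, falling back to Latin-1 if no encoding is known."""
    if encoding is not None:
        return raw.decode(encoding)
    try:
        # Pure ASCII needs no guess at all.
        return raw.decode("ascii")
    except UnicodeDecodeError:
        warnings.warn("docstring contains non-ASCII data; treating it as latin-1")
        # Latin-1 maps every byte to a code point, so this cannot fail.
        return raw.decode("latin-1")


raw = b"\xc2\x80"
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    guessed = decode_docstring(raw)
known = decode_docstring(raw, "utf-8")

print(len(guessed), len(known), len(caught))
```

With the encoding known, the two bytes decode to a single character. The Latin-1 guess never raises an error, but it turns the same bytes into two characters, which is the kind of mis-decoding the warning is there to flag.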
This section checks to make sure that both introspection & parsing are getting the right results.
>>> testencoding("# -*- coding: utf-8 -*-\n"+utf8_test, introspect=False)
<p>abc ABC 123</p>
<p>0x80-0x7ff range: €  ߾ ߿</p>
<p>0x800-0xffff range: ࠀ ࠁ  </p>
<p>0x10000-0x10ffff range: 𐀀 𐀁  </p>
>>> testencoding(utf8_bom+"# -*- coding: utf-8 -*-\n"+utf8_test, introspect=False)
<p>abc ABC 123</p>
<p>0x80-0x7ff range: €  ߾ ߿</p>
<p>0x800-0xffff range: ࠀ ࠁ  </p>
<p>0x10000-0x10ffff range: 𐀀 𐀁  </p>
>>> testencoding(utf8_bom+utf8_test, introspect=False)
<p>abc ABC 123</p>
<p>0x80-0x7ff range: €  ߾ ߿</p>
<p>0x800-0xffff range: ࠀ ࠁ  </p>
<p>0x10000-0x10ffff range: 𐀀 𐀁  </p>

>>> testencoding("# -*- coding: utf-8 -*-\n"+utf8_test, parse=False)
<p>abc ABC 123</p>
<p>0x80-0x7ff range: €  ߾ ߿</p>
<p>0x800-0xffff range: ࠀ ࠁ  </p>
<p>0x10000-0x10ffff range: 𐀀 𐀁  </p>
>>> testencoding(utf8_bom+"# -*- coding: utf-8 -*-\n"+utf8_test, parse=False)
<p>abc ABC 123</p>
<p>0x80-0x7ff range: €  ߾ ߿</p>
<p>0x800-0xffff range: ࠀ ࠁ  </p>
<p>0x10000-0x10ffff range: 𐀀 𐀁  </p>
>>> testencoding(utf8_bom+utf8_test, parse=False)
<p>abc ABC 123</p>
<p>0x80-0x7ff range: €  ߾ ߿</p>
<p>0x800-0xffff range: ࠀ ࠁ  </p>
<p>0x10000-0x10ffff range: 𐀀 𐀁  </p>
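The two routes being compared here, reading a docstring from the parsed source and reading it from the live object, can be mimicked with Python 3's standard library. This is only an analogy for what the introspect and parse flags exercise; it does not call epydoc, and the variable names are illustrative.

```python
import ast

# A cp1252-encoded module whose function docstring contains byte 0x80.
SOURCE = b'# -*- coding: cp1252 -*-\ndef f():\n    """abc ABC 123 \x80"""\n'

# Parsing: read the docstring from the syntax tree without running the code.
tree = ast.parse(SOURCE)
parsed_doc = ast.get_docstring(tree.body[0], clean=False)

# Introspection: execute the module and inspect the resulting object.
namespace = {}
exec(compile(SOURCE, "<encoding-test>", "exec"), namespace)
introspected_doc = namespace["f"].__doc__

# Both routes honor the coding directive, so they should agree.
print(parsed_doc == introspected_doc)
```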
Make sure that docstrings are rendered correctly in different contexts.
>>> testencoding('''# -*- coding: utf-8 -*-
... """
... @var x: abc ABC 123 \xc2\x80 \xdf\xbf \xe0\xa0\x80
... @group \xc2\x80: x
... """
... ''')
abc ABC 123 € ߿ ࠀ

>>> testencoding('''# -*- coding: utf-8 -*-
... def f(x):
...     """
...     abc ABC 123 \xc2\x80 \xdf\xbf \xe0\xa0\x80
...     @param x: abc ABC 123 \xc2\x80 \xdf\xbf \xe0\xa0\x80
...     @type x: abc ABC 123 \xc2\x80 \xdf\xbf \xe0\xa0\x80
...     @return: abc ABC 123 \xc2\x80 \xdf\xbf \xe0\xa0\x80
...     @rtype: abc ABC 123 \xc2\x80 \xdf\xbf \xe0\xa0\x80
...     @except X: abc ABC 123 \xc2\x80 \xdf\xbf \xe0\xa0\x80
...     """
... ''')
abc ABC 123 € ߿ ࠀ
abc ABC 123 € ߿ ࠀ
<p>abc ABC 123 € ߿ ࠀ</p>
abc ABC 123 € ߿ ࠀ
abc ABC 123 € ߿ ࠀ
abc ABC 123 € ߿ ࠀ
abc ABC 123 € ߿ ࠀ
abc ABC 123 € ߿ ࠀ

>>> testencoding('''# -*- coding: utf-8 -*-
... class A:
...     """
...     abc ABC 123 \xc2\x80 \xdf\xbf \xe0\xa0\x80
...     @ivar x: abc ABC 123 \xc2\x80 \xdf\xbf \xe0\xa0\x80
...     @cvar y: abc ABC 123 \xc2\x80 \xdf\xbf \xe0\xa0\x80
...     @type x: abc ABC 123 \xc2\x80 \xdf\xbf \xe0\xa0\x80
...     """
...
...     z = property(doc=u"abc ABC 123 \xc2\x80 \xdf\xbf \xe0\xa0\x80")
... ''')
abc ABC 123 € ߿ ࠀ
<p>abc ABC 123 € ߿ ࠀ</p>
abc ABC 123 € ߿ ࠀ
abc ABC 123 € ߿ ࠀ
abc ABC 123 € ߿ ࠀ
abc ABC 123 € ߿ ࠀ