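Use Python 2.x for the following exercises. Python 2.x has two kinds of strings: 8-bit strings (type str) and Unicode strings (type unicode, written with a u prefix, as in u"abc").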
Furthermore, the default encoding for strings in Python 2.x is set to "ascii", which means that you will get an error if you try to print anything that's not an ASCII character:
>>> print u"\u1000"
If you want to see interesting Unicode characters, you need to explicitly encode your strings in your terminal's encoding. The default terminal emulator under Gnome uses a "utf-8" encoding. If you just encode, the weird characters show up escaped:
>>> u"\u1000".encode('utf-8')
If you actually want this 8-bit string sent to the console literally, you need to print it explicitly:
>>> print u"\u1000".encode("utf-8")
However, you don't actually need to display any of these strings in order to do the exercises (although you may find it useful).
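Since displaying the strings is optional, the encoding steps described above can also be checked without a terminal, by comparing values and lengths. The following is a minimal sketch; the variable names (s, encoded, decoded, ascii_ok) are illustrative and not part of the exercises.

```python
# U+1000 as a one-character Unicode string.
s = u"\u1000"

# Encoding yields an 8-bit string; in UTF-8 this character takes three bytes.
encoded = s.encode("utf-8")

# Decoding with the same codec recovers the original Unicode string.
decoded = encoded.decode("utf-8")

# The "ascii" codec cannot represent U+1000, so encoding with it fails.
try:
    s.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

Comparing decoded with s confirms that the round trip is lossless, and ascii_ok ends up False because the character lies outside the ASCII range.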
Many Unicode fonts do not have glyphs for all Unicode characters; one font that tries for complete coverage of the Basic Multilingual Plane is the "unifont" font. On Debian and Ubuntu, you can apt-get install ttf-unifont (for other platforms, search for Unifont on Google and follow the instructions). To make a Gnome terminal that uses Unifont, create a new profile and select "unifont" as the font.
You can put extended Unicode characters into string literals using escape sequences such as \uXXXX (four hex digits) and \UXXXXXXXX (eight hex digits).
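The handout's own example of this quoting is not shown here, so the following is only a sketch of the escape forms that Python accepts in Unicode string literals. The variable names are illustrative.

```python
# \xXX takes two hex digits.
latin_a = u"\x41"

# \uXXXX takes exactly four hex digits (Basic Multilingual Plane).
bmp_char = u"\u1000"

# \UXXXXXXXX takes exactly eight hex digits and can name any codepoint,
# including those outside the Basic Multilingual Plane.
non_bmp_char = u"\U00010000"

# \N{...} selects a character by its Unicode name.
alpha_by_name = u"\N{GREEK SMALL LETTER ALPHA}"
alpha_by_number = u"\u03b1"
```

The length of non_bmp_char depends on how the interpreter stores Unicode strings internally, so it is deliberately left unchecked here.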
Write a pair of functions that convert a Python unicode string to a list of integer codepoints and back.
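One possible starting point for this exercise is sketched below. The function names to_codepoints and from_codepoints are hypothetical, and the sketch assumes it is acceptable to work on the characters exactly as Python stores them, which means it does not combine surrogate pairs into single codepoints.

```python
try:
    unichr            # Python 2 builtin: integer -> one-character unicode string
except NameError:
    unichr = chr      # fallback so the sketch also runs on Python 3

def to_codepoints(s):
    """Return the integer value of each character stored in s."""
    return [ord(c) for c in s]

def from_codepoints(codepoints):
    """Build a unicode string from a list of integer codepoints."""
    return u"".join(unichr(cp) for cp in codepoints)
```

For example, to_codepoints(u"a\u1000") gives [97, 4096], and passing that list to from_codepoints returns the original string.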
>>> s = u"\ud800\udc00"
What happens if you use utf-8 instead of utf-16?
Write a pair of functions,
Write a function