Time and Place
- Thu., 15:30 - 17:00 in 48-208
- Start: April 21, 2011
- 11 sessions
- Exercises: Thu., 17:15 - 18:00 in 48-211
Lecturers
- Prof. Dr. Thomas Breuel
- Dr. Faisal Shafait
Further Information
- Semester hours per week (SWS): 2 h / 1 h
- Number of ECTS credits: 4
- Language of instruction: English
- Course ID: INF-13-54-V-7
- Examination number (Prüfungsamt): 61354
- Exercises: Ilya Mezhirov (mezhirov@iupr.com)
- Oral exam: July 27, alternatively October 6 or 13, 2011
- The exam schedule will be published one week in advance.
- In addition to registering with the Prüfungsamt (examination office), register by email to secretary@iupr.com by June 17.
Topics and Applications
Most of the data we interact with
day-to-day does not come in the form of data structures or databases,
but instead in the form of documents and document images. This course
introduces students to the formats, techniques, and algorithms used for
representing, compressing, analyzing, processing, and displaying
documents. Topics covered include:
- document formats and standards (TIFF, JPEG, PDF, PostScript, SVG)
- document image compression (G4, MRC, token-based compression, JPEG2000)
- logical markup (HTML, XML, word processing formats, DocBook)
- writing systems of the world
- character sets and character encodings (ASCII, Unicode, special coding systems)
- text rendering, layout, ligatures, and hyphenation (Pango)
- typesetting and page layout systems (text flow, Word, LaTeX, etc.)
- OCR (character recognition, page segmentation)
- spelling and orthographic variation, statistical language modeling
- document capture, page image dewarping and handheld document capture
- named entity recognition, information extraction, table recognition
- document search and retrieval, text mining, document databases
- reading, psychophysics, and human-document interaction
- document security and forensics