Course

Time and Place

  • Thu., 15:30 - 17:00 in 48-208
  • Start: April 21, 2011
  • 11 sessions

Exercises

  • Thu., 17:15 - 18:00 in 48-211

Lecturers

  • Prof. Dr. Thomas Breuel
  • Dr. Faisal Shafait

Further Information

  • Weekly hours (SWS): 2 h lecture / 1 h exercise
  • ECTS credits: 4
  • Language of instruction: English
  • Course ID: INF-13-54-V-7
  • Examination number at the Prüfungsamt (examination office): 61354
  • Exercises: Ilya Mezhirov (mezhirov@iupr.com)
  • Oral exam: July 27, alternatively October 6 or 13, 2011
  • The exam schedule will be published one week in advance.
  • In addition to registering with the Prüfungsamt, register by email to secretary@iupr.com by June 17.


Topics and Applications

Most of the data we interact with day-to-day does not come in the form of data structures or databases, but instead in the form of documents and document images. This course introduces students to the formats, techniques, and algorithms used for representing, compressing, analyzing, processing, and displaying documents. Topics covered include:
  • document formats and standards (TIFF, JPEG, PDF, PostScript, SVG)
  • document image compression (G4, MRC, token based compression, JPEG2000)
  • logical markup (HTML, XML, word processing formats, DocBook)
  • writing systems of the world
  • character sets and character encodings (ASCII, Unicode, special coding systems)
  • text rendering, layout, ligatures, and hyphenation (Pango)
  • typesetting and page layout systems (text flow, Word, LaTeX, etc.)
  • OCR (character recognition, page segmentation)
  • spelling and orthographic variation, statistical language modeling
  • document capture, page image dewarping, and handheld document capture
  • named entity recognition, information extraction, table recognition
  • document search and retrieval, text mining, document databases
  • reading, psychophysics, and human-document interaction
  • document security and forensics




SS2010 - Document Analysis - schedule/topic plan

