CMPS 340 (File Processing)
Fall 2008
Final Exam Study Guide
- When: Wednesday, December 10, 12:45pm.
- Where: St. Thomas Hall 314? (the usual classroom)
- What to bring: textbook, notes, graded homeworks and programs,
hard copy of course web pages.
- Kinds of questions to expect (the majority of which will pertain
to material not covered on the first test):
- File Organization: (fields, records, buckets)
- Indexing:
- Tradeoffs in the concrete vs. abstract pointer/reference
issue (i.e., using "physical" addresses vs. using
logical addresses in the form of a record key (e.g., SSN))
- B+-trees: insertions and deletions
- Extendible Hashing: insertions and deletions
- Plain Old Hashing: calculate the number of probes required, on
average, for searching (both successful and unsuccessful)
in a (plain) hash table in which linear probing is used
as the collision resolution strategy.
- Tradeoffs between tree-based indexing and hash-based indexing
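A minimal Python sketch for practicing the linear-probing probe counts, assuming non-negative integer keys, h(k) = k mod m, and the convention that the probe landing on the terminating empty slot counts toward an unsuccessful search (texts differ on this point). An unsuccessful search is averaged over all m home slots, each assumed equally likely. For comparison, it also gives Knuth's approximations ½(1 + 1/(1 - α)) (successful) and ½(1 + 1/(1 - α)²) (unsuccessful) for load factor α = n/m. The function names are hypothetical.

```python
from fractions import Fraction

def build_table(keys, m):
    """Insert keys into an m-slot table using h(k) = k mod m and linear probing."""
    table = [None] * m
    for k in keys:
        i = k % m
        while table[i] is not None:  # slot taken: try the next one, wrapping around
            i = (i + 1) % m
        table[i] = k
    return table

def avg_successful(table):
    """Average probes to find each stored key (the probe that finds it counts)."""
    m, total, n = len(table), 0, 0
    for k in table:
        if k is None:
            continue
        i, probes = k % m, 1
        while table[i] != k:
            i = (i + 1) % m
            probes += 1
        total += probes
        n += 1
    return Fraction(total, n)

def avg_unsuccessful(table):
    """Average probes for a miss, over all home slots; the empty slot counts."""
    if None not in table:
        raise ValueError("a full table has no terminating empty slot")
    m, total = len(table), 0
    for home in range(m):
        i, probes = home, 1
        while table[i] is not None:
            i = (i + 1) % m
            probes += 1
        total += probes
    return Fraction(total, m)

def knuth_successful(alpha):
    """Knuth's approximation for a successful search (0 <= alpha < 1)."""
    return 0.5 * (1 + 1 / (1 - alpha))

def knuth_unsuccessful(alpha):
    """Knuth's approximation for an unsuccessful search (0 <= alpha < 1)."""
    return 0.5 * (1 + 1 / (1 - alpha) ** 2)
```

For example, inserting 12, 22, 32, 5, 15 into a 10-slot table puts them in slots 2 through 6, giving exact averages of 9/5 probes (successful) and 5/2 probes (unsuccessful); at α = 0.5 the approximations give 1.5 and 2.5.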
- External Sorting:
- Estimating running time of merging phase of
external sorting (see web page)
- Replacement Selection technique for producing
initial sorted runs larger than available RAM
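A minimal Python sketch of replacement selection, assuming memory holds `memory_size` records and that runs are returned as lists rather than written to disk. Each heap entry is tagged with a run number: a newly read record smaller than the record just output cannot extend the current run, so it is tagged for the next run and sinks below every current-run entry in the heap. The function name is hypothetical.

```python
import heapq

def replacement_selection(records, memory_size):
    """Produce sorted initial runs using a heap holding memory_size records."""
    it = iter(records)
    heap = []
    for _ in range(memory_size):  # fill memory with the first records
        try:
            heap.append((0, next(it)))
        except StopIteration:
            break
    heapq.heapify(heap)
    runs, out, current_run = [], [], 0
    while heap:
        run, key = heapq.heappop(heap)
        if run != current_run:  # every current-run record is gone: start a new run
            runs.append(out)
            out, current_run = [], run
        out.append(key)
        try:
            nxt = next(it)
        except StopIteration:
            continue
        # A smaller record cannot follow `key` in this run, so defer it.
        heapq.heappush(heap, (run if nxt >= key else run + 1, nxt))
    if out:
        runs.append(out)
    return runs
```

With input 5, 3, 8, 1, 9, 2, 7, 4, 6 and room for 3 records, this yields the runs [3, 5, 8, 9] and [1, 2, 4, 6, 7], both longer than memory. On randomly ordered input, runs average about twice the memory size.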
- Sequential File Update
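A minimal Python sketch of sequential file update, assuming the master file is a list of (key, value) pairs sorted by unique key and the transaction file is a list of (key, op, value) triples sorted by key, where the op codes 'A' (add), 'C' (change), and 'D' (delete) are hypothetical. Both files are read once, front to back, as in a merge; several transactions on the same key are applied in order, and invalid ones are reported rather than applied.

```python
def sequential_update(master, transactions):
    """Apply sorted transactions to a sorted master file in one pass.

    Returns (new_master, errors)."""
    new_master, errors = [], []
    i = j = 0
    while i < len(master) or j < len(transactions):
        mkey = master[i][0] if i < len(master) else None
        tkey = transactions[j][0] if j < len(transactions) else None
        # No transaction for this master record: copy it unchanged.
        if tkey is None or (mkey is not None and mkey < tkey):
            new_master.append(master[i])
            i += 1
            continue
        # Start from the existing record with this key, if there is one.
        current = None
        if mkey == tkey:
            current = master[i]
            i += 1
        # Apply every transaction for this key, in order.
        while j < len(transactions) and transactions[j][0] == tkey:
            _, op, value = transactions[j]
            if op == "A":
                if current is None:
                    current = (tkey, value)
                else:
                    errors.append((tkey, op, "add of existing record"))
            elif op == "C":
                if current is None:
                    errors.append((tkey, op, "change of missing record"))
                else:
                    current = (tkey, value)
            elif op == "D":
                if current is None:
                    errors.append((tkey, op, "delete of missing record"))
                else:
                    current = None
            else:
                errors.append((tkey, op, "unknown operation"))
            j += 1
        if current is not None:
            new_master.append(current)
    return new_master, errors
```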
- Database:
- Relational Algebra and SQL queries: evaluating and developing
- Algorithms for Query Processing (see web page)
- Query Optimization
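As one example of a query-processing algorithm, here is a minimal Python sketch of sort-merge equijoin; nested-loop and hash joins are the usual alternatives. It assumes tuples are dicts and that the two relations share no attribute names other than the join attribute (otherwise the merged dict would overwrite values). The relations in the usage line are made up for illustration.

```python
def sort_merge_join(r, s, attr):
    """Equijoin of relations r and s (lists of dicts) on attribute attr."""
    r = sorted(r, key=lambda t: t[attr])
    s = sorted(s, key=lambda t: t[attr])
    i = j = 0
    result = []
    while i < len(r) and j < len(s):
        a, b = r[i][attr], s[j][attr]
        if a < b:
            i += 1
        elif a > b:
            j += 1
        else:
            # Find the group of equal join values on each side.
            i_end = i
            while i_end < len(r) and r[i_end][attr] == a:
                i_end += 1
            j_end = j
            while j_end < len(s) and s[j_end][attr] == a:
                j_end += 1
            # Every pairing within the two groups is a result tuple.
            for rt in r[i:i_end]:
                for st in s[j:j_end]:
                    result.append({**rt, **st})
            i, j = i_end, j_end
    return result
```

For instance, `sort_merge_join(emp, dept, "dno")` with hypothetical EMPLOYEE and DEPARTMENT lists returns one combined tuple per employee whose department number appears in DEPARTMENT.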
- Data Coding/Compression/Decompression:
- coding, unique decipherability, deciphering delay
- Run-length encoding
(be prepared to develop an algorithm for compression or
decompression)
- Huffman compression/decompression
- Encoding a Huffman tree as a bit string
- Canonical codewords (left-skewed Huffman tree)
- LZW compression/decompression
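A minimal Python sketch of two of the compression schemes above. The run-length encoder uses one simple variant, a list of (symbol, run length) pairs, not the escape-character formats some texts use. The LZW coder assumes the dictionary starts with the 256 single-character codes (so input characters must lie in that range) and lets codes grow without bound, whereas real implementations cap the table size. Decompression handles the case where a code is not yet in the table, which happens when the input contains a pattern like KwKwK. All function names are hypothetical.

```python
def rle_encode(data):
    """Encode a string as a list of (symbol, run_length) pairs."""
    runs = []
    for ch in data:
        if runs and runs[-1][0] == ch:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([ch, 1])  # start a new run
    return [(ch, n) for ch, n in runs]

def rle_decode(runs):
    """Expand (symbol, run_length) pairs back into a string."""
    return "".join(ch * n for ch, n in runs)

def lzw_compress(text):
    """LZW compression; the dictionary starts with codes 0..255."""
    dictionary = {chr(i): i for i in range(256)}
    w, codes = "", []
    for c in text:
        wc = w + c
        if wc in dictionary:
            w = wc  # keep extending the longest known prefix
        else:
            codes.append(dictionary[w])
            dictionary[wc] = len(dictionary)  # next unused code
            w = c
    if w:
        codes.append(dictionary[w])
    return codes

def lzw_decompress(codes):
    """Invert lzw_compress, rebuilding the same dictionary as it goes."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            entry = w + w[0]  # code is the one being defined right now
        else:
            raise ValueError(f"bad LZW code {code}")
        out.append(entry)
        dictionary[len(dictionary)] = w + entry[0]
        w = entry
    return "".join(out)
```

For example, `lzw_compress("aaaa")` produces [97, 256, 97], and decompressing it exercises the not-yet-defined-code case.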
- Information Retrieval:
- Notions of Precision and Recall
- Some basic notions regarding measuring the relevance
of a document with respect to a query: term frequency
(within a document), term scarcity (within the set of
documents).
- Case Folding, Stemming, and Stop Words
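A minimal Python sketch tying these notions together: precision and recall for a retrieved set, and a tf-idf weight that multiplies term frequency within a document by log(N / df), where df is the number of documents containing the term. This is one common way to combine frequency and scarcity, and the course's exact formula may differ. Terms are case-folded and filtered against a tiny, hypothetical stop list; stemming is omitted, since real stemmers such as Porter's are considerably more involved.

```python
import math
import re

STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}  # hypothetical list

def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def terms(text):
    """Case-fold, split on non-letters, and drop stop words (no stemming)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def tf_idf(term, doc, docs):
    """Term frequency in doc times log(N / number of docs containing term)."""
    tf = terms(doc).count(term)
    df = sum(1 for d in docs if term in terms(d))
    return tf * math.log(len(docs) / df) if df else 0.0
```

For example, retrieving d1 through d4 when the relevant documents are d2, d4, and d5 gives precision 2/4 = 0.5 and recall 2/3.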