Red-Black Tree Dictionaries

MCS-178: Introduction to Computer Science II
Max Hailperin, Fall 2001

Due: November 14th, at the beginning of class

Prelab

In this lab you will be using the dictionary data-type defined in section 13.6 of the text, which builds on the red-black tree data-type from section 13.5. Therefore, you will need to be familiar with that material. It will also be important to review the material from chapter 8 (page 219) on using ``onto'' list arguments to avoid append in doing tree traversals.

In lab

You will build a dictionary that allows a user to find the first and last names and login name for all users with a given last name, or with a given ``wild-carded'' last name. A wild-carded last name is one that for which only the first part is specified, with a * indicating that more can follow. For example, the wild-carded last name Hans* matches not only the name Hans, but also Hanson and Hansen. The names will always be specified as strings, i.e., between double quote marks.

The lab assignment is to become familiar with the provided code, make the few additions to it that are needed, and make timing comparisons between it and the (provided) alternative of simply filtering through a list. The details of what you are supposed to do are given in a comment at the end of the file containing the provided code. That file is in ~mc28/labs/red-black-dictionaries/rb-dicts.scm. Here is another copy of what you are supposed to do:

  1. You should write the dictionary-insert! procedure and test it by inserting some entries (such as the first few from names) into names-by-last-name and then displaying the dictionary's red-black tree by doing
        (display-ranked-btree (red-black-tree names-by-last-name))
    
    Note that the dictionary-insert! procedure should not be large or complex: it just calls red-black-insert! with the appropriate arguments.

  2. Since you'll want to do tests with varying numbers of names in the red-black tree, you should write a procedure that takes an argument specifying the number of names, n, to be used and does a reset-dictionary! on names-by-last-name and then inserts the first n elements of names into names-by-last-name using the dictionary-insert! procedure you wrote, in conjunction with for-each (as described in section 13.6) and the first-elements-of procedure.

  3. Write binary-search-retrieve as described in 13.6, and then define red-black-retrieve as identical to it, i.e.
        (define red-black-retrieve binary-search-retrieve)
    
    Then define dictionary-retrieve to call red-black-retrieve on the dictionary's red-black tree using the dictionary's key comparator and extractor. Test this using names-by-last-name. You should be able to, for example, find all the users with your last name, or find all users with last names of the form "Hans*", i.e., starting with Hans.

  4. Now do timing tests comparing dictionary-retrieve with list-retrieve as the number of entries being searched is increased. You will get the most distinct results if the search is one that matches very few names. However, you may also want to experimentally see what effect the number of matches has on the time for each of the two retrieval methods.

Postlab

As usual, write up a lab report that explains all of what you did to an audience generally knowledgeable about Scheme. You should present any data graphically (by, for instance, plotting running time as a function of number of entries being searched) to make it easy to interpret. If you want the reader to be able to compare two sets of data, they should be shown in a single graph, using two different styles of marker for the points. (Do not interpolate lines or curves between the points.) If some of the numerical values are much smaller than others, then you should use a logarithmic scale on your graph, so that the small values are not all crammed together at the bottom or left edge of the graph. (If both the x and the y values vary widely, then you should use a logarithmic scale for both axes.) You can use graph paper found in http://www.gustavus.edu/~wolfe/courses/graph-paper/. Be sure not only to address the programming aspect of your work, but also the asymptotic performance comparison you did between retrieval from a list and a dictionary. In particular, be specific about what you think the asymptotic performance is, and back up your assertion with
Analysis:
Why would you suspect that asymptotic performance before collecting data?
Data:
How does (or does not) the data support your analysis?
Note that when we say "asymptotic" performance, we mean performance in the sense of a big-Theta or big-O analysis, where we look at the general order of growth of the running time as the size of the data structure grows larger.