Red-Black Tree Dictionaries
Prelab
In this lab you will be using the dictionary data-type defined in
section 13.6 of the text, which builds on the red-black tree data-type
from section 13.5. Therefore, you will need to be familiar with that
material. It will also be important to review the material from
chapter 8 (page 219) on using ``onto'' list arguments to avoid
append in doing tree traversals.
In lab
You will build a dictionary that allows a user to find the first and
last names and login name for all users with a given last name, or
with a given ``wild-carded'' last name. A wild-carded last name is
one that for which only the first part is specified, with a *
indicating that more can follow. For example, the wild-carded last
name Hans* matches not only the name Hans, but also
Hanson and Hansen. The names will always be specified as strings,
i.e., between double quote marks.
The lab assignment is to become familiar with the provided code, make
the few additions to it that are needed, and make timing comparisons
between it and the (provided) alternative of simply filtering through
a list. The details of what you are supposed to do are given in a
comment at the end of the file containing the provided code. That
file is in
~mc28/labs/red-black-dictionaries/rb-dicts.scm. Here is
another copy of what you are supposed to do:
-
You should write the
dictionary-insert! procedure and test it
by inserting some entries (such as the first few from names)
into names-by-last-name and then displaying the dictionary's
red-black tree by doing
(display-ranked-btree (red-black-tree names-by-last-name))
Note that the dictionary-insert! procedure should not be
large or complex: it just calls red-black-insert! with the
appropriate arguments.
-
Since you'll want to do tests with varying numbers of names
in the red-black tree, you should write a procedure that takes
an argument specifying the number of names, n, to be used and
does a
reset-dictionary! on names-by-last-name and then inserts
the first n elements of names into names-by-last-name using
the dictionary-insert! procedure you wrote, in conjunction with
for-each (as described in section 13.6) and the first-elements-of
procedure.
-
Write
binary-search-retrieve as described in 13.6, and then
define red-black-retrieve as identical to it, i.e.
(define red-black-retrieve binary-search-retrieve)
Then define dictionary-retrieve to call red-black-retrieve
on the dictionary's red-black tree using the dictionary's
key comparator and extractor. Test this using names-by-last-name.
You should be able to, for example, find all the users with your
last name, or find all users with last names of the form "Hans*",
i.e., starting with Hans.
-
Now do timing tests comparing
dictionary-retrieve with list-retrieve
as the number of entries being searched is increased. You will
get the most distinct results if the search is one that matches
very few names. However, you may also want to experimentally see
what effect the number of matches has on the time for each of the two
retrieval methods.
Postlab
As usual, write up a lab report that explains all of what you did to
an audience generally knowledgeable about Scheme. You should present
any data graphically (by, for instance, plotting running time as a
function of number of entries being searched) to make it easy to
interpret. If you want the reader to be able to compare two sets of
data, they should be shown in a single graph, using two different
styles of marker for the points. (Do not interpolate lines or curves
between the points.) If some of the numerical values are much smaller
than others, then you should use a logarithmic scale on your graph, so
that the small values are not all crammed together at the bottom or
left edge of the graph. (If both the x and the
y values vary widely, then you should use a logarithmic
scale for both axes.) You can use graph paper found in http://www.gustavus.edu/~wolfe/courses/graph-paper/.
Be sure not only to address the programming aspect of your work, but
also the asymptotic performance comparison you did between retrieval
from a list and a dictionary. In particular, be specific about what
you think the asymptotic performance is, and back up your assertion
with
- Analysis:
- Why would you suspect that asymptotic performance before
collecting data?
- Data:
- How does (or does not) the data support your analysis?
Note that when we say "asymptotic" performance, we mean performance in
the sense of a big-Theta or big-O analysis, where we look at the
general order of growth of the running time as the size of the data
structure grows larger.