More specifically, I am giving you several files which will help you in your work, all of which are located in the following directory (which you should probably copy to your work space):
~karl/www-docs/courses/mcs394-s00/labs/lab2/decision-trees/
tennis-data.scm
contains the tennis data from Table 3.2 on page 59 of
Mitchell. You should read through this file carefully so you
understand the form the data must be in.
utility.scm
contains various Scheme procedures you should find helpful. In
summary, here are the procedures you will need:
filter and delq.
random-fractional-list, which will be
useful when partitioning the data into training and testing sets. This
procedure is well documented in the final large comment in utility.scm.
house-votes-data.scm
is the Scheme form of the House voting data I briefly described in
class. I created this Scheme data file from
the following two files:
house-votes-84.names
house-votes-84.data
Write a procedure id3 that takes a list of training examples and returns
the decision tree that ID3 should return for that list. In
particular, if you use all of the tennis data in tennis-data.scm,
you should get the tree in Figure 3.1 on page 53 of Mitchell.
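ID3 chooses the attribute to test at each node by information gain, which Mitchell defines in terms of entropy. The following is only a minimal sketch of the entropy computation, assuming the class counts have already been tallied into a list of non-negative integers. The names entropy and log-base-2 are illustrative and do not come from utility.scm.

```scheme
;; Hypothetical helpers, not part of utility.scm.

;; Logarithm base 2, built from the natural logarithm.
(define (log-base-2 x)
  (/ (log x) (log 2)))

;; counts is a list of class counts, e.g. (9 5) for 9 yes and 5 no.
;; Returns the sum over classes of -p * log2(p), skipping empty
;; classes so that a count of 0 contributes 0.
(define (entropy counts)
  (let ((total (apply + counts)))
    (apply +
           (map (lambda (c)
                  (if (= c 0)
                      0
                      (let ((p (/ c total)))
                        (- (* p (log-base-2 p))))))
                counts))))

(entropy '(9 5))   ; approximately 0.940
```

The counts (9 5) correspond to the 9 yes and 5 no examples in the tennis data, so the result can be checked against the entropy value Mitchell gives for that collection.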
One remark: I recommend that you use a fairly simple, concrete representation for your decision tree. For example, my representation for the decision tree in Figure 3.1 is:
(outlook (sunny (humidity (high no)
                          (normal yes)))
         (overcast yes)
         (rain (wind (strong no)
                     (weak yes))))
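To make the list representation above concrete, here is a small sketch of selectors for it, plus a procedure that walks the whole tree. It assumes that an internal node has the form (attribute (value subtree) ...) and that a leaf is a bare classification symbol such as yes or no. All of the names below are hypothetical and are not taken from utility.scm.

```scheme
;; Hypothetical helpers for the list representation shown above.

;; A leaf is anything that is not a pair, e.g. the symbol yes.
(define (leaf? tree) (not (pair? tree)))

;; An internal node is (attribute (value subtree) ...).
(define (node-attribute tree) (car tree))
(define (node-branches tree) (cdr tree))
(define (branch-value branch) (car branch))
(define (branch-subtree branch) (cadr branch))

;; Count the leaves by recursing into the subtree of every branch.
(define (count-leaves tree)
  (if (leaf? tree)
      1
      (apply + (map (lambda (branch)
                      (count-leaves (branch-subtree branch)))
                    (node-branches tree)))))

;; The tree from Figure 3.1, written in this representation.
(define tennis-tree
  '(outlook (sunny (humidity (high no)
                             (normal yes)))
            (overcast yes)
            (rain (wind (strong no)
                        (weak yes)))))

(node-attribute tennis-tree)                    ; outlook
(map branch-value (node-branches tennis-tree))  ; (sunny overcast rain)
(count-leaves tennis-tree)                      ; 5
```

Hiding car, cdr, and cadr behind named selectors like these keeps the rest of a program independent of the exact list layout, which helps if you later decide to change the representation.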
Write a procedure classify, which
takes a decision tree and an example and returns the target
classification that the decision tree gives for the example.
random-fractional-list in utility.scm will be
helpful for this task.