Finding the most frequent line

Your goal in this lab is to work on a program for finding which textual line appears most frequently in an input file; for example, this can be used to process the list of web pages Gustavus's server served in a given month to find the most popular page. The key data structure needed is a table for keeping track of the occurrence counts for the lines. My thanks to Max Hailperin, who wrote the original version of this lab. Any errors were undoubtedly introduced by the current author, David Wolfe.

Feel free to draw diagrams on the board as you and your fellow students work out the details, but try to come up with the actual code individually.

Programming strategy

We recommend that you work through the program in the following order:
  1. Copy over the requisite files by going to the directory where you want them located and typing
       cp -rd ~mc38/labs/mostFrequentLine/files mostFrequentLine
    
    The -r flag says copy the whole directory. The -d flag says copy symbolic links instead of copying whole files.

  2. We have include a compiled version of the program called karls-solution, as well as links to all three data files called small.log, medium.log, and large.log which contain (roughly) 10, 100, and 10000 lines, respectively. These were taken from the monthly log of our web server this month. To try out the program, type
       ./karls-solution < medium.log
    

    Note: since the third file (large.log) is so large, you should be sure that you have linked, rather than copied, the data files, as we instructed you to do.

  3. Read through the code files so that you understand the program's structure. Note that I inserted the words IMPLEMENT THIS as comments in order to indicate where you need to do your coding.

  4. First implement the constructor and destructor for Node.

  5. Next implement the constructor and destructor for StringToIntTable.

  6. Next implement the method findNodePtr in the StringToIntTable class. Done correctly, this will not require much code, so be sure to think before coding.

  7. Finally, implement int& operator[](const string str). Given findNodePtr, this should not be too hard.

  8. Test your program on the data files by typing
       ./mostFrequentLine < small.log
    
    for example (assuming small.log is in your directory).

  9. Optional: if you are looking for something more to do, try to changing mostFrequentLine.cc so that it prints out the 10 most frequent lines.

  10. Even more optional: if you are still looking for something more to do, try to changing the binary search trees to red-black trees. (If you haven't yet taken MCS-178, you will have to ask David or me about this.)