CSci2110 - Problem Set 4

Due: Tuseday March 25th by 11:59pm

You are to implement the Set of String ADT. One natural use of a set is to hold the words in a spelling dictionary. This makes looking up the words very efficient. In this assignment, you will implement and use such a dictionary.

Part 1: Write a class that implements an a closed Hash Table of strings. You class should use the double hashing method with

where

To help you I have created a initial partial implementations that defines a constructor and the hash functions.

For the hash code (ie. String --> Int mapping) you should use the built in Java hashcode() method on Strings. Here is an example of how NOT to generate a hash function.

Your class should be an implementation of the ADT Set of Strings. The class must include:

Be sure to create a JUnit test class and good documentation.

Part 2: Write a main program that uses your hash table class. The main program will behave like the standard Linux utility program ispell, when that program is used interactively to check individual words. To try it, enter the command ispell on the command line of any Linux system. You will be prompted to enter words. When you type a word, there are three possible responses: An output of "ok" means that the word is in the dictionary. An output of "not found" means that the word is not in the dictionary and neither are any near misses. Otherwise, the output is a list of near misses, that is, words that are in the dictionary and are similar to what you typed. The ispell program recognizes several types of near-misses: single-letter omissions, single-letter additions, transpositions of two neighboring letters, substitution of one letter for another, and two words run together. Your program should do the same.

The file 2of12.txt contains about 41236 commonly used words, with one word per line. (This is not the same dictionary used by ispell.) Your program will use this file for its dictionary. It should create a hash table of an appropriate size for the number of words. It should read the words from the file and add them all to the hash table. Then it should prompt the user to enter words. For each word, it should check whether the word is in the dictionary. If so, it should say "ok". If not, it should look for all possible near misses. If the program finds any near misses in the dictionary, it should print them. If not, it should say "not found". One issue that you will have to settle is what to do with upper-case letters. Note that both the words in the dictionary and the user's input can contain upper case letters. I suggest that you simply convert all letters to lower case.

Searching for near misses will require a significant amount of programming. You will have to generate every possible near miss and look for it in the dictionary. Specifically:

You should list the near misses in alphabetical order, as ispell does.

Be sure to create a JUnit test class and good documentation.

Note: No late assignments will be accepted. Discuss ideas with your friends, but write and hand in your own code. Plagiarism will be dealt with severely.


Based on an assignment created by David J. Eck.