Data Structures and Algorithms (NTU, Class 01/02, Spring 2011)

instructor: Hsuan-Tien Lin

RELEASE DATE: 05/19/2011 DUE DATE: 06/30/2011, noon ON CEIBA

Final Project

Unless approved by the instructor in advance, no late submissions will be accepted. Also, gold medals cannot be used on the final project.

Introduction
The main theme of the final project is a spell checker program. The spell checker checks whether each word in an article appears in a prescribed dictionary; if it does not, the word is considered misspelled. Of course, we expect the checker to use the computational and storage resources of your computer effectively, so the data structure (and the associated algorithm) used to represent the dictionary can be crucial.

Problem Description
The spell checker needs the following basic functionality. You may use your own function prototypes to implement it.
• build(text file, dictionary file) converts a text file into a binary dictionary file that matches your data structure
• dictionary = load(dictionary file) loads the dictionary file into memory
• check(dictionary, word) checks whether a given word is in the in-memory dictionary
• words = basic_suggest(dictionary, word) returns a list of suggested words from the dictionary

For the function basic_suggest, consider the following common typos:
• missing one character from a word, like dictionary → dictonary
• adding one character to a word, like dictionary → dicktionary
• replacing one character in a word, like dictionary → dicsionary
• switching the order of two consecutive characters in a word, like dictionary → ditcionary

The function should determine whether the misspelled word could have resulted from one such typo applied to any word in the dictionary. If so, it adds that dictionary word to the suggestion list.
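A hedged sketch of basic_suggest follows. Rather than comparing the misspelled word against every dictionary word, it generates every string one typo away from the misspelled word and keeps those found in the dictionary. It assumes a lowercase a-z alphabet and uses std::unordered_set as a stand-in dictionary (WordSet is a hypothetical name). Returning a std::set yields the lexicographic, duplicate-free order that the competition output requires.

```cpp
#include <set>
#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical stand-in dictionary; any structure with a fast membership
// test could be substituted.
using WordSet = std::unordered_set<std::string>;

// Returns the dictionary words that could have produced `typo` through one
// of the four listed typos. Assumes words use lowercase 'a'..'z' only.
std::set<std::string> basic_suggest(const WordSet& dict, const std::string& typo) {
    std::set<std::string> out;  // sorted and duplicate-free
    auto consider = [&](const std::string& cand) {
        if (dict.count(cand)) out.insert(cand);
    };
    const std::size_t n = typo.size();

    // The typist dropped a character: try inserting one at every position.
    for (std::size_t i = 0; i <= n; ++i)
        for (char c = 'a'; c <= 'z'; ++c)
            consider(typo.substr(0, i) + c + typo.substr(i));

    // The typist added a character: try deleting each one.
    for (std::size_t i = 0; i < n; ++i)
        consider(typo.substr(0, i) + typo.substr(i + 1));

    // The typist replaced a character: try every other letter at each position.
    for (std::size_t i = 0; i < n; ++i)
        for (char c = 'a'; c <= 'z'; ++c) {
            if (c == typo[i]) continue;
            std::string cand = typo;
            cand[i] = c;
            consider(cand);
        }

    // The typist swapped two neighbours: swap them back.
    for (std::size_t i = 0; i + 1 < n; ++i) {
        std::string cand = typo;
        std::swap(cand[i], cand[i + 1]);
        consider(cand);
    }
    return out;
}
```

For a word of length n this generates 26(n+1) + n + 25n + (n-1) = 53n + 25 candidates, so the cost is dominated by the membership test. This is one more reason the choice of dictionary structure matters.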

Survey Report
You are asked to study at least TWO data structures for representing the dictionary. Then, compare those data structures from several perspectives, such as average speed, worst-case speed, space, ease of implementation, and popularity. Based on the results of your comparison, recommend the best one for the spell checker program and discuss the "cons and pros" of that choice. The survey report should be at most ten A4 pages, with readable font sizes and formatting. Criteria for evaluating your survey report include, but are not limited to, clarity, the strength of your reasoning, "correctness" in using the data structures, and the workloads of team members.
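As one example of a structure that could enter the comparison, here is a minimal trie sketch. It assumes lowercase a-z words, and the class and method names are our own. Lookup takes time proportional to the word length regardless of dictionary size. The 26 child pointers per node, however, make its space cost an obvious point to weigh against a hash table or a sorted array in the report.

```cpp
#include <array>
#include <memory>
#include <string>

// Minimal trie over 'a'..'z' (alphabet is an assumption).
class Trie {
    struct Node {
        std::array<std::unique_ptr<Node>, 26> next{};  // one slot per letter
        bool is_word = false;                          // a word ends here
    };
    Node root_;

public:
    // Inserts `word`; returns false and inserts nothing if it contains a
    // character outside 'a'..'z'.
    bool insert(const std::string& word) {
        for (char ch : word)
            if (ch < 'a' || ch > 'z') return false;
        Node* cur = &root_;
        for (char ch : word) {
            auto& child = cur->next[ch - 'a'];
            if (!child) child = std::make_unique<Node>();
            cur = child.get();
        }
        cur->is_word = true;
        return true;
    }

    // Follows one edge per character; a missing edge means "not a word".
    bool contains(const std::string& word) const {
        const Node* cur = &root_;
        for (char ch : word) {
            if (ch < 'a' || ch > 'z') return false;
            cur = cur->next[ch - 'a'].get();
            if (!cur) return false;
        }
        return cur->is_word;
    }
};
```

Prefixes such as "ca" are stored as paths but are not words unless is_word is set. Compressed variants (radix trees, DAWGs) trade implementation effort for a smaller dictionary file.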

Competition
We will hold a mini-competition for the project. Each team must submit the source code of its spell checker, which will be compiled automatically on the CSIE Linux machines for the mini-competition. The details will be announced in the week of May 23, 2011.


The mini-competition tests the basic functionality. In particular, you need to provide a program named project such that

./project -b <text file> <dictionary file>

reads the text file and outputs a dictionary file. Furthermore, the spell checker is invoked with the following command:

./project -d <dictionary file> <article file>
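A small sketch of dispatching on the two invocations above follows. Mode, Command, and parse_args are hypothetical names, and C++17's std::optional is used to signal malformed arguments.

```cpp
#include <optional>
#include <string>
#include <vector>

// The two invocations:
//   ./project -b <text file> <dictionary file>
//   ./project -d <dictionary file> <article file>
enum class Mode { Build, Check };

struct Command {
    Mode mode;
    std::string first;   // text file (-b) or dictionary file (-d)
    std::string second;  // dictionary file (-b) or article file (-d)
};

// `args` holds argv[1] .. argv[argc-1]; returns std::nullopt if malformed.
std::optional<Command> parse_args(const std::vector<std::string>& args) {
    if (args.size() != 3) return std::nullopt;
    if (args[0] == "-b") return Command{Mode::Build, args[1], args[2]};
    if (args[0] == "-d") return Command{Mode::Check, args[1], args[2]};
    return std::nullopt;
}
```

In main, one would call parse_args(std::vector<std::string>(argv + 1, argv + argc)), print a usage message on std::nullopt, and otherwise branch on mode.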

For every occurrence of a word in the article file, your program should check whether the word is in the dictionary. If it is not, output the word and its basic suggestions on one line, like:
aet: at eat pet

Note that the suggestions should be ordered lexicographically, with repeated entries removed. To make things simpler, your program should output the suggestions only for the first occurrence of a misspelled word; for later occurrences, output only the word itself. The exact formats and the sample files will be announced online. Three things will be tested in the competition:
• the accuracy of the spell checker
• if the accuracy is 100%, the size of the dictionary file produced
• if the accuracy is 100%, the speed of your spell checker
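The output rules above (suggestions only on the first occurrence, sorted and deduplicated) can be sketched as follows. The function assumes that the article has already been split into words and takes the dictionary lookup and the suggestion routine as hypothetical callbacks. How to print a misspelled word that has no suggestions is also an assumption (here, the word followed by a bare colon) until the official format is announced.

```cpp
#include <functional>
#include <set>
#include <string>
#include <unordered_set>
#include <vector>

// Builds the output lines for an already-tokenized article, e.g.
// "aet: at eat pet" on the first occurrence and "aet" afterwards.
std::vector<std::string> competition_output(
    const std::vector<std::string>& article_words,
    const std::function<bool(const std::string&)>& in_dict,
    const std::function<std::set<std::string>(const std::string&)>& suggest) {
    std::vector<std::string> lines;
    std::unordered_set<std::string> already_reported;  // misspellings seen so far
    for (const auto& w : article_words) {
        if (in_dict(w)) continue;                 // correctly spelled: no output
        if (!already_reported.insert(w).second) {
            lines.push_back(w);                   // later occurrence: word only
            continue;
        }
        std::string line = w + ":";
        for (const auto& s : suggest(w)) line += " " + s;  // std::set: sorted, unique
        lines.push_back(line);
    }
    return lines;
}
```

Remembering which misspellings were already reported costs one hash-set insertion per misspelled word, which is small next to generating suggestions.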

Every team is asked to submit to the mini-competition at least twice (once with each of the two data structures studied) and to list the results in the survey report. Of course, more submissions are encouraged and welcome.

Submission File
Please upload a single ZIP-compressed file (.zip) to CEIBA. The ZIP file should contain ONLY the following items:
• the source files of your final spell checker, including any package you use (see below); these source files may differ from what you submitted to the mini-competition
• the report, at most ten A4 pages, in PDF format. The report should contain the following items:
(1) the team members' names and school IDs
(2) how you divide the responsibilities among the team members
(3) the data structures you compared, including the results submitted to the mini-competition site
(4) the data structure you recommend
(5) the advantages of the recommendation
(6) the disadvantages of the recommendation
(7) how to compile your code and use the spell checker
(8) the bonus features you implement and why you think they deserve the bonus

You do not need to submit a printed version of your report.

Misc Rules

Report: No, you do not need to submit a hard copy.

Teams: By default, you are asked to work in a team of three. A one-person or two-person team is allowed only if you are willing to perform as well as a three-person team. All team members are expected to share the workload evenly. Any form of unfairness in a multi-person team, such as an attempt by one member to cover for another member's share of the work, is considered a violation of the honesty policy and will cause both members to receive a zero or negative score.
Data Structures and Algorithms: You can use any data structures and algorithms, regardless of whether they were taught in class.

Packages: You can couple your spell checker with any software package (as long as you do not violate any copyright), but you need to clearly cite where you got the code and clearly describe what the source code does in your report.

I/O restrictions: Your program should open only the text file, the dictionary file, and the article file. It must not open any other files or access anything over the Internet.

Platform and Language: As usual, you may use only C/C++ for your main program. If you use packages written in other languages, you still need to call them from C/C++. You may use either Linux or Windows as the running platform for your spell checker, but the submission to the mini-competition must be Linux-compatible and include a Makefile (details to be announced).

Grade: The grading TAs will grade qualitatively with letters: A++[105], A+[98], A[93], B+[88], B[83], C+[78], C[73], D+[68], D[63], F+[58], F[38], F-[18], Z[0]. The team's score will be the average over all grading TAs. We reserve the right to adjust individual scores within the team based on performance/workload if necessary. The final project is equivalent to four usual homework sets; the midterm exam is equivalent to another four. Your raw score in the class will be calculated by

$$\text{raw score} = \frac{1.5 \times \text{best homework} + 0.5 \times \text{worst homework} + \text{other homework} + 4 \times \text{midterm} + 8 \times \text{final}}{14}$$

If your spell checker meets the basic functionality and you reasonably address every item in the survey report, you will get at least a B.

Bonus: We encourage everyone to think about how to make the spell checker better. The range between B[83] and A++[105] is basically reserved for bonus points. To get bonus points, you need to justify in your report that the additional features/functionality of your spell checker deserve a bonus. For instance, you can compare more data structures/algorithms, or apply your creativity to designing good data structures/algorithms for the spell checker. A fancy GUI is another possibility, but it is not the only way and very likely not an important one. After all, in this class we are seeking better data structures and algorithms, not just a better GUI.

Collaboration: The general collaboration policy applies. Despite the competition, we still encourage collaboration and discussion between different teams.
