Data Mining Using Existing Dichotomous Keys


Jalpa P. Trivedi

Oral Defence Date: 

Friday, May 23, 2008 - 16:00


SCI 241


4:00 PM


Professors Marguerite Murphy & Hui Yang


A dichotomous key, also known as an identification key, is used to identify an individual species within a given collection of species. In botany, plants differ from one another in their physical properties, which are used to determine their universal name (family, genus, and species) in the Linnaean system. The goal of the work presented in this report is to design and implement an interactive user interface for a dichotomous key that supports an existing key reduction algorithm and a simple form of data mining. Key reduction is the process of eliminating unnecessary decision nodes from an existing dichotomous key when only a subset of the original organisms is to be distinguished. As part of the key reduction algorithm, all of the properties that the designated subset shares (positive properties) and does not share (negative properties) are identified, as well as properties that may or may not be common (possible properties). This information is the basis for our proposed data mining strategy. The interactive user interface that was designed and implemented for this project allows the user to select a group of organisms from an existing dichotomous key (either based on stored property values or by arbitrary selection from a list) and to generate the associated reduced key along with lists of positive, negative, and possible properties. Subsets and their associated keys and properties can be stored for later reference. The prototype system is fully functional and performs well on small to moderately sized existing dichotomous keys.
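The key reduction idea described above can be illustrated with a minimal Python sketch. It is not the algorithm from the report, whose details are not given here. It assumes the key is a binary tree in which each decision node tests one property, with a "yes" branch and a "no" branch. The names `Leaf`, `Node`, `leaves`, and `reduce_key`, and the toy plant key, are hypothetical. The treatment of possible properties is also an assumption: a property is listed as possible when it is known for only part of the selected subset.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    organism: str


@dataclass
class Node:
    prop: str    # property tested at this decision node
    yes: "Tree"  # branch for organisms that have the property
    no: "Tree"   # branch for organisms that lack it


Tree = Union[Leaf, Node]


def leaves(tree):
    """Return the set of organism names under a (sub)key."""
    if isinstance(tree, Leaf):
        return {tree.organism}
    return leaves(tree.yes) | leaves(tree.no)


def reduce_key(tree, subset, props, shared=True):
    """Return the reduced key for `subset`, or None if no organism remains.

    `props` collects the properties of eliminated decision nodes.
    `shared` is True while every selected organism still lies below the
    current node, i.e. no retained decision node has been passed yet.
    """
    if isinstance(tree, Leaf):
        return tree if tree.organism in subset else None
    in_yes = bool(leaves(tree.yes) & subset)
    in_no = bool(leaves(tree.no) & subset)
    if in_yes and in_no:
        # The node still separates selected organisms, so it is kept.
        return Node(tree.prop,
                    reduce_key(tree.yes, subset, props, False),
                    reduce_key(tree.no, subset, props, False))
    if not in_yes and not in_no:
        return None
    # Only one branch holds selected organisms: the node is unnecessary.
    if shared:
        props["positive" if in_yes else "negative"].append(tree.prop)
    else:
        # Assumption: known for part of the subset only, hence "possible".
        props["possible"].append(tree.prop)
    return reduce_key(tree.yes if in_yes else tree.no, subset, props, shared)


# Hypothetical toy key, invented for illustration only.
key = Node("woody stem",
           Node("needle-like leaves",
                Leaf("pine"),
                Node("lobed leaves", Leaf("oak"), Leaf("willow"))),
           Node("showy flowers", Leaf("daisy"), Leaf("grass")))

props = {"positive": [], "negative": [], "possible": []}
reduced = reduce_key(key, {"oak", "willow"}, props)
```

In this toy key, selecting oak and willow leaves a single decision node ("lobed leaves"), with "woody stem" recorded as a positive property and "needle-like leaves" as a negative one. The sketch recomputes leaf sets at every node for brevity; an implementation meant for larger keys would likely cache them.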

Dichotomous key, subset of organisms, positive, negative and possible properties