Machine rule induction was examined on a difficult categorization problem by applying a Holland-style
classifier system to a complex letter recognition task. A set of 20,000 unique letter images was generated
by randomly distorting pixel images of the 26 uppercase letters from 20 different commercial fonts. The parent
fonts represented a full range of character types including script, italic, serif, and Gothic. The features of each
of the 20,000 characters were summarized in terms of 16 primitive numerical attributes. Our research focused
on machine induction techniques for generating IF-THEN classifiers in which the IF part was a list of values
for each of the 16 attributes and the THEN part was the correct category, i.e., one of the 26 letters of the alphabet.
We examined the effects of different procedures for encoding attributes, deriving new rules, and apportioning
credit among the rules. Binary and Gray-code attribute encodings that required exact matches for rule activation
were compared with integer representations that employed fuzzy matching for rule activation. Random and genetic
methods for rule creation were compared with instance-based generalization. The strength/specificity method
for credit apportionment was compared with a procedure we call "accuracy/utility."
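
To make the contrast between exact and fuzzy matching concrete, the following is a minimal sketch and not the procedure used in the study. It assumes each attribute is an integer that fits in 4 bits, that exact-match conditions use a "#" don't-care symbol, and that fuzzy matching accepts any value within a fixed tolerance of the rule's value. The function names, the bit width, the wildcard, and the tolerance are all illustrative assumptions.

```python
# Illustrative sketch only: bit width, '#' wildcard, and tolerance are assumptions,
# not details taken from the study.

def to_gray(n: int) -> int:
    """Convert a non-negative integer to its reflected Gray code."""
    return n ^ (n >> 1)


def encode_bits(value: int, width: int = 4, gray: bool = False) -> str:
    """Encode an attribute value as a fixed-width bit string (binary or Gray)."""
    code = to_gray(value) if gray else value
    return format(code, f"0{width}b")


def hamming(a: str, b: str) -> int:
    """Count the bit positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def exact_match(condition: str, bits: str) -> bool:
    """A rule fires only if every non-wildcard position agrees exactly."""
    return len(condition) == len(bits) and all(
        c == "#" or c == b for c, b in zip(condition, bits)
    )


def fuzzy_match(condition, attributes, tolerance: int = 1) -> bool:
    """A rule fires if every integer attribute lies within the tolerance."""
    return len(condition) == len(attributes) and all(
        abs(c - a) <= tolerance for c, a in zip(condition, attributes)
    )


# Adjacent values 7 and 8 differ in every bit in plain binary,
# but in only one bit under Gray coding.
binary_distance = hamming(encode_bits(7), encode_bits(8))
gray_distance = hamming(encode_bits(7, gray=True), encode_bits(8, gray=True))

# One wildcard lets a single Gray-coded condition cover both 7 and 8.
covers_both = exact_match("#100", encode_bits(7, gray=True)) and exact_match(
    "#100", encode_bits(8, gray=True)
)
```

Under these assumptions, a Gray-coded condition can generalize over neighboring attribute values with a single wildcard, whereas a plain binary condition cannot, and the integer representation obtains a similar effect directly through the tolerance.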