Review of

Machine Learning

Ryszard S. Michalski

Jaime G. Carbonell

Tom M. Mitchell

Tioga Publishing Company, Palo Alto, CA, 1983, 572 pages, ISBN 0-935382-05-4.

Machine Learning is in the enviable yet difficult position of being the first published collection of papers on the subject. By ``machine learning'' the editors refer to three distinct areas: the engineering of computer systems that learn in particular application areas, the theoretical analysis of learning algorithms, and the simulation of human learning processes. However, there is very little material on the third topic, and almost no mention of neurological models or biological theories of adaptation. The word ``machine'' is as important as ``learning.''

The book is divided into sixteen chapters arranged into six parts. The editors use the first chapter of Part One to present their outline of the field and of the book. The remainder of Part One is an informal introduction by Herb Simon. He calls for more research into human learning, into understanding why it takes humans so long (years or decades) to learn, into natural language processing, and into learning by discovery. Finally, he warns that researchers should have well-defined approaches to learning; working on the right problem for the wrong reason rarely leads to good results. The chapter is definitely a personal viewpoint: in the span of 10 pages, Simon manages to start 20 paragraphs with `I' or `my.'

Part Two consists of two chapters by Michalski on learning by example. The first (co-authored by Dietterich) is an overview that presents and compares Winston's work on the blocks world, Hayes-Roth's and Vere's inductive learning research, and Buchanan's META-DENDRAL program, among others. Each is analyzed and contrasted according to the criteria set forth in the chapter itself, but unfortunately there is no attempt to relate this analysis to that of the introduction. This is true of most of the other chapters as well.

The editors suggest that the book fulfills several needs. As an easily accessible collection of state-of-the-art papers for the AI researcher, it is appropriate and useful. But as a textbook, or an introduction for readers from other fields in the cognitive sciences, I feel it will prove to be difficult reading.

Part Three, the longest part, contains four chapters on learning in problem-solving and planning. The first, by Carbonell, starts from Schank's theory of memory organization and reminding, and concludes that almost all problems can be solved by analogy to similar, previously solved problems, rather than by resorting to general, weak methods. It is one of the more ambitious chapters in its goal of modeling, in a unified manner, human memory, learning, and problem-solving.

The next chapter describes Mitchell et al.'s LEX program, which learns heuristics for solving symbolic integration problems. This work is fairly well known (for example, it was presented as the Computers and Thought lecture at last year's IJCAI), so I will not discuss it further here. The following chapter also presents well-known work, namely John Anderson's ACT theory of learning, this time applied to learning to prove theorems in high school geometry.

In the final chapter of this part, Frederick Hayes-Roth starts from the assumption that real-world knowledge is incomplete and contains errors. He addresses the problem of learning to correct these errors, and of using existing theories to construct better ones.

Part Four is concerned with learning from observation and discovery. The first chapter contains yet another retrospective on AM and EURISKO, but it is presented in Lenat's usual entertaining manner. Lenat ends with a conjecture that biological evolution might be guided by heuristic rules, rather than by random mutations.

Switching from biology to chemistry, Langley et al. describe the BACON production system, which has rediscovered regularities such as Ohm's law. It does this with data-driven heuristics, in contrast to the theory-driven heuristics of AM. In other words, the program can analyze numerical data and derive conclusions from it. This may seem more like statistics than AI. The authors concede that noisy data will lead to longer search times, but they accept that, reasoning that the history of science has also progressed very slowly. However, it seems that when science has stagnated, it is because of the need for a Kuhnian paradigm shift that would introduce new concepts to be measured, not because of inadequate search for relations between existing measured concepts. BACON does not address the problem of deciding what to measure, only the easier problem of deciding what to do with given measurements.
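
To give the flavor of these data-driven heuristics, consider the following sketch. It is my own illustrative Python, not the authors' production system; the tolerance test, the function names, and the measurements are all invented. Two of BACON's core heuristics are: when two variables rise together, propose their ratio as a new term; when one rises as the other falls, propose their product; a proposed term that turns out to be constant is reported as a law.

    # A rough sketch of BACON-style discovery (my own Python, not the
    # authors' production system): propose ratios and products of
    # observed variables until some proposed term is constant.

    def nearly_constant(values, tol=0.01):
        # True if every value is within a relative tolerance of the mean.
        mean = sum(values) / len(values)
        return all(abs(v - mean) <= tol * abs(mean) for v in values)

    def rise_together(xs, ys):
        # Crude monotonicity test: do the endpoints move the same way?
        return (ys[-1] - ys[0]) * (xs[-1] - xs[0]) > 0

    def bacon_search(terms, max_depth=3):
        # terms maps a term's name to its list of observed values.
        for _ in range(max_depth):
            for name, values in terms.items():
                if nearly_constant(values):
                    return name          # an invariant: report it as a law
            names = list(terms)
            for i, a in enumerate(names):
                for b in names[i + 1:]:
                    xs, ys = terms[a], terms[b]
                    if rise_together(xs, ys):
                        # Vary together: try the ratio of the two terms.
                        terms.setdefault("(%s/%s)" % (b, a),
                                         [y / x for x, y in zip(xs, ys)])
                    else:
                        # Vary oppositely: try their product.
                        terms.setdefault("(%s*%s)" % (b, a),
                                         [y * x for x, y in zip(xs, ys)])
        return None

    # Hypothetical readings at fixed resistance:
    data = {"current": [1.0, 2.0, 3.0, 4.0],
            "voltage": [5.0, 10.0, 15.0, 20.0]}
    print(bacon_search(data))            # -> (voltage/current)

On these hypothetical readings the search stops at the constant term voltage/current, which is just Ohm's law at fixed resistance.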

Another chapter by Michalski (co-authored with Stepp) covers the problem of classification, here the task of clustering observations into conceptually meaningful classes without a teacher.

Part Five consists of three chapters on learning from instruction. Mostow discusses operationalization, the refinement of advice into effective procedures. The idea is to develop domain-independent techniques for refining domain-specific knowledge. Although not presented as such, the work can be thought of as an exercise in automatic programming.

Haas and Hendrix answer Simon's call for an extensible natural language interface that can participate in mixed-initiative dialogues with a user to learn new concepts. Rychener gives a retrospective of the instructible production system.

Finally, Part Six discusses two applications of learning techniques. Quinlan, like Michalski and Stepp, treats classification procedures as a theoretical problem. He applies this to the K-R vs. K-N chess endgame, and shows that from most positions it is a win for the side with the rook. Chess experts had previously thought that most such positions would lead to draws, so this is a new result in chess.
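
Quinlan's style of induction can be suggested in a few lines. The following is my own illustrative Python, not Quinlan's program, and the board features and examples are invented: classified examples are split recursively on whichever attribute yields the greatest information gain, until each leaf of the resulting decision tree holds a single class.

    # An illustrative decision-tree inducer (my own Python, not
    # Quinlan's program): split on the attribute with maximum
    # information gain, i.e. minimum residual entropy.

    import math
    from collections import Counter

    def entropy(examples):
        # Shannon entropy of the class labels (last field of each tuple).
        counts = Counter(ex[-1] for ex in examples)
        total = len(examples)
        return -sum(n / total * math.log2(n / total)
                    for n in counts.values())

    def split(examples, attr):
        # Partition the examples by their value of attribute `attr`.
        parts = {}
        for ex in examples:
            parts.setdefault(ex[attr], []).append(ex)
        return parts

    def induce(examples, attrs):
        # Returns a class label, or a tree node (attr, {value: subtree}).
        if len({ex[-1] for ex in examples}) == 1 or not attrs:
            return Counter(ex[-1] for ex in examples).most_common(1)[0][0]
        def residual(a):                 # entropy left after testing `a`
            return sum(len(p) / len(examples) * entropy(p)
                       for p in split(examples, a).values())
        best = min(attrs, key=residual)
        rest = [a for a in attrs if a != best]
        return (best, {v: induce(p, rest)
                       for v, p in split(examples, best).items()})

    # Invented examples: (rook-safety, king-distance, outcome).
    games = [("safe", "near", "WIN"), ("safe", "far", "WIN"),
             ("attacked", "near", "DRAW"), ("attacked", "far", "WIN")]
    print(induce(games, [0, 1]))

In the chess application the attributes would be features of the position and the classes the outcomes, win or draw; once induced, the tree classifies positions without any search.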

Sleeman attacks the problem of defining buggy algorithms. At first glance this may not seem useful, but it is, in the context of discovering why students make particular mistakes in solving algebra problems. The difficulty is in determining what model the student is using, without requiring the student to work through a large number of trials.
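
A toy example may suggest the flavor of the diagnosis problem; it is my own invention and far simpler than Sleeman's system. Represent each candidate student model, correct or buggy, as a procedure for solving a*x = b, and retain the models whose predictions agree with the answers the student actually gave.

    # A toy model-diagnosis filter (my own invention, far simpler than
    # Sleeman's system): keep the candidate student models whose
    # predictions match the student's observed answers.

    CANDIDATE_MODELS = {
        "correct":   lambda a, b: b / a,   # x = b/a, the right rule
        "subtracts": lambda a, b: b - a,   # bug: treats a*x as a + x
        "inverts":   lambda a, b: a / b,   # bug: divides the wrong way
    }

    def diagnose(observed):
        # observed: list of ((a, b), student_answer) for problems a*x = b.
        return [name for name, model in CANDIDATE_MODELS.items()
                if all(model(a, b) == answer
                       for (a, b), answer in observed)]

    # A student answers 3x = 12 with 9, and 5x = 20 with 15:
    print(diagnose([((3, 12), 9), ((5, 20), 15)]))   # -> ['subtracts']

The practical difficulty, as the chapter makes clear, is that the space of plausible buggy models is large, and each observed answer prunes it only a little; hence the tension with keeping the number of trials small.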

The book includes a very complete bibliography of machine learning (572 entries cross-indexed by categories), a glossary, a subject index, and an author index. In addition to the bibliography, each chapter has its own list of references.