Peter Norvig
Abstract
An effective story understander must be able to reason about characters in the story, their affects, actions, plans, and goals, as well as the settings and important points of the story. In many systems this has been done with separate inference mechanisms for each class of knowledge structure. This paper proposes a story understander with a unified frame-based inference component used on each class of knowledge structure. (This work supported by National Science Foundation grant IST-8007045.)
The frame-level inference mechanism is meant to be general enough to be applicable to a variety of tasks, not just story understanding. In fact, one implementation of the inference mechanism was built with Joe Faletti, and was used both by FAUSTUS and by Faletti's planning program, PANDORA [6].
The story-understanding-specific component is essentially an agenda system that decides which frame-based inferences to consider next at a given point. It can be characterized as a set of story understanding principles, as described in [7].
The claim I am making is that many inferences that other systems have made through knowledge-specific inference rules are made ``automatically'' in FAUSTUS, due to the structure of frames in memory, and the primitives that manipulate the frames. Furthermore, more complex inferences needed for story understanding (or planning in Faletti's case) can be specified in terms of constraints on these primitives.
Because the content of the active frame buffer influences the processing of subsequent inputs, FAUSTUS has two processes that try to eliminate irrelevant frames from the buffer. Termination is the opposite of determination; it is the process of discarding an active frame because it is found to be inappropriate in some way. Attrition is the process of discarding a frame that has not been determined after a sufficient passage of time, where sufficient time is a function of activation. An important side-effect of attrition is often the determination of some other frame. Often an input will invoke two or more frames, but no determination can be made among them. However, when all but the most highly activated frame are lost through attrition, that remaining frame becomes instantiated.
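The attrition process can be pictured with a small Python sketch. This is only an illustration, not FAUSTUS's code: the `ActiveFrame` class, the `time_limit` rule, and the activation values are assumptions made for the example. The text says only that the allowed time is some function of activation.

```python
from dataclasses import dataclass


@dataclass
class ActiveFrame:
    name: str
    activation: float        # strength of invocation so far (assumed scale)
    age: int = 0             # processing steps spent in the buffer
    determined: bool = False


def time_limit(frame: ActiveFrame) -> int:
    """Hypothetical rule: more highly activated frames are kept longer."""
    return int(2 * frame.activation)


def attrition_step(buffer: list[ActiveFrame]) -> list[ActiveFrame]:
    """Age every undetermined frame and drop those past their time limit."""
    survivors = []
    for frame in buffer:
        if not frame.determined:
            frame.age += 1
            if frame.age > time_limit(frame):
                continue  # lost through attrition
        survivors.append(frame)
    return survivors


# Two competing frames invoked by the same input, with assumed activations.
buffer = [
    ActiveFrame("automobile-assembly-plant", activation=3.0),
    ActiveFrame("textile-mill", activation=1.0),
]
steps = 0
while len(buffer) > 1:
    buffer = attrition_step(buffer)
    steps += 1
remaining = buffer[0].name
```

With these assumed numbers the weaker competitor drops out first, leaving the most highly activated frame alone in the buffer, which is the side-effect described above.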
Thus, we have a two-step filter; while the inputs may invoke a large number of frames, only some of those are instantiated, and only some of the instantiated frames are determined to actually belong in the construal of the story. The invocation step is driven by activation and has an ``analog'' flavor, while the determination step relies on more ``digital'' evidence. There is no central control mechanism to decide what to do next. The instantiations and determinations are driven solely by associations within the frames themselves.
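As a rough illustration of this two-step filter, the Python sketch below accumulates activation from inputs, instantiates only frames that pass a threshold, and then determines only those backed by explicit evidence. The association weights, the `INSTANTIATE_AT` threshold, and the function names are hypothetical; they are not taken from FAUSTUS.

```python
from collections import defaultdict

INSTANTIATE_AT = 2.0  # assumed threshold, not from the paper

# Hypothetical association strengths between input concepts and frames.
ASSOCIATIONS = {
    "factory": {"automobile-assembly-plant": 1.0, "textile-mill": 1.0},
    "building-cars": {"automobile-assembly-plant": 1.5, "hobby": 0.5},
}


def invoke(inputs):
    """Invocation ("analog"): sum activation for every associated frame."""
    activation = defaultdict(float)
    for concept in inputs:
        for frame, weight in ASSOCIATIONS.get(concept, {}).items():
            activation[frame] += weight
    return dict(activation)


def instantiate(activation):
    """First filter: keep frames whose activation reaches the threshold."""
    return {frame for frame, level in activation.items() if level >= INSTANTIATE_AT}


def determine(instantiated, confirmed):
    """Second filter ("digital"): keep frames with explicit confirming evidence."""
    return {frame for frame in instantiated if frame in confirmed}


activation = invoke(["factory", "building-cars"])
instantiated = instantiate(activation)
determined = determine(instantiated, confirmed={"automobile-assembly-plant"})
```

Three frames are invoked here, one is instantiated, and the same one is determined. Nothing outside the association table decides what happens next, which loosely mirrors the absence of a central control mechanism.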
Frank hated his job at the factory. He wanted a job where he wouldn't have to work so hard. He envied his friends who went to college, and didn't have to work. So Frank quit building cars and enrolled at the local University. However, as a student, he was soon working harder than he ever had in his life.
(factory
  (ako institution)
  (kinds (list textile-mill automobile-assembly-plant ...))
  (instances (list factory-54 ...))
  (purpose (manufacture)))

(automobile-assembly-plant
  (ako factory)
  (instances (list factory-54 ...))
  (purpose (manufacture (object car))))

(automobile-assembly-plant
  (self factory-54)
  (employees (Frank)))
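The way such frames link together can be sketched in Python, assuming that ako ("a kind of") links support the usual slot inheritance. The dictionary encoding, the `instance-of` key, and the `get_slot` helper are assumptions made for this illustration and are not FAUSTUS's actual representation.

```python
# Frames encoded as dictionaries; the slot values mirror the listing above.
FRAMES = {
    "institution": {},
    "factory": {"ako": "institution", "purpose": ("manufacture",)},
    "textile-mill": {"ako": "factory"},
    "automobile-assembly-plant": {
        "ako": "factory",
        "purpose": ("manufacture", ("object", "car")),
    },
    # "instance-of" is a hypothetical stand-in for the (self factory-54) link.
    "factory-54": {
        "instance-of": "automobile-assembly-plant",
        "employees": ["Frank"],
    },
}


def get_slot(name, slot):
    """Return a slot value, climbing instance-of and ako links if needed."""
    while name is not None:
        frame = FRAMES[name]
        if slot in frame:
            return frame[slot]
        name = frame.get("instance-of") or frame.get("ako")
    return None


plant_purpose = get_slot("factory-54", "purpose")
mill_purpose = get_slot("textile-mill", "purpose")
```

Under this encoding, learning that Frank's workplace is factory-54 gives access to the more specific purpose (manufacturing cars) without any rule written for factories in particular.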
Of course, there are other possible construals of the story. Perhaps Frank worked at a textile mill, and he quit his hobby of building cars when he enrolled at the University. Such a construal is consistent both with the facts of the story, and with world knowledge. However, to arrive at such a construal would require spreading invocation to a larger number of sub-frames, and instantiating frames after a single weak invocation. FAUSTUS's instantiation and determination mechanisms guide it away from this and towards a simpler construal.
(goal
  (self goal-23)
  (actor Frank)
  (desired (occupation (actor Frank) (difficulty low)))
  (plan plan-37)
  (outcome (student (actor Frank) (difficulty highest)))
  (status failed))

(plan
  (self plan-37)
  (planner Frank)
  (action (enroll (actor Frank) (institution college-17)))
  (goal goal-23)
  (status executed))
For example, an important point in the story above is a goal-failure frame which says that Frank's goal was avoiding hard work, that he tried to achieve that goal, and that he failed. While this is true, and it is an important part of the story, it seems to miss the main point. More important is the irony in Frank's actions bringing him harder work when he was trying to avoid work. If the story were changed so that the last line read ``However, as a student, he worked just about as hard as he did at the factory,'' then we would still have the same goal-failure point, but the story would not be ironic, and would not be as interesting.
An obvious solution is to introduce a new story point, the ironic-goal-failure frame. This frame would be a kind of goal-failure, with the provision that the outcome of the plan must be the opposite (in some way) of the desired goal state. While this approach would work, it would miss an important processing generalization; knowing about ironic-goal-failure would be of no help in detecting other instances of irony.
The approach I took was to try to detect instances of irony in general, rather than trying to enumerate special cases. Irony occurs when (1) there is a strong expectation, (2) there is a violation of that expectation, and (3) the concept which replaces the expected one is the opposite of the expectation. As it turned out, FAUSTUS was already capable of tracing expectations, so it was easy to have it create an instance of an expectation or expectation-violation frame. Note that these new frames are different in an important way from all the ones we have seen before; they are derived not from concepts mentioned explicitly in the story (like University and factory), nor from concepts inferred by FAUSTUS (like plans and goals), but from FAUSTUS's own internal processing behaviour.
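The three conditions for irony can be written down as a short Python sketch. It is illustrative only: the `OPPOSITES` table, the numeric `strength` value, and the `strong` cutoff are assumptions, and the table is a deliberately crude stand-in for a notion of opposite.

```python
# A very simple, hypothetical notion of "opposite" over difficulty values.
OPPOSITES = {"low": "highest", "highest": "low"}


def classify_outcome(expected, actual, strength, strong=0.5):
    """Classify an outcome against an expectation.

    Irony requires (1) a strong expectation, (2) a violation of it, and
    (3) an actual value that is the opposite of the expected one.
    """
    if actual == expected:
        return "expectation-met"
    if strength >= strong and OPPOSITES.get(expected) == actual:
        return "irony"
    return "expectation-violation"


# Frank expects student life to be easy; it turns out harder than ever.
original_story = classify_outcome("low", "highest", strength=0.9)

# Variant ending: about as hard as the factory job (assumed here to be "high").
variant_story = classify_outcome("low", "high", strength=0.9)
```

Because the test is phrased over expectations rather than over goals, the same check would apply to an expectation that arises from any other kind of frame.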
FAUSTUS makes a distinction between two types of story points: static points, which are found by relating inferences to a stored frame marked as inherently interesting, and dynamic points, which are uncovered as the result of processing events like expectation violation. Irony, humour, and surprise would all be examples of dynamic points. This distinction is orthogonal to Wilensky's [8] categorization of external and content points. Wilensky enumerates some content points, while Schank et al. [9] do the same for external points.
Below are two points detected by FAUSTUS, a static and a dynamic point. Either of them could be used to generate a summary of the story. For example, the first corresponds to ``Frank wanted an easy job. He enrolled in college. It was not easy. He didn't try anything else.'' If this were the main point of the story, the story would be incomplete; the subsequent-action slot of the goal-failure frame has yet to be elaborated. The story is incomplete from the point of view of a problem resolution episode. The second point corresponds to the summary ``Frank enrolled in college. He thought being a student would be easy. Ironically, he ended up working harder than ever.'' This is a better summary because it is a completely elaborated frame; the story is complete from the point of view of an ironic episode.
(goal-failure
  (actor Frank)
  (goal goal-23)
  (plan plan-37)
  (subsequent-action nil))

(expectation-violation
  (expectation (student (actor Frank) (difficulty low)))
  (triggered-by plan-37)
  (violation (student (actor Frank) (difficulty highest))))
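The completeness argument above amounts to checking whether a point frame still has unfilled slots. A minimal Python sketch, assuming frames are encoded as dictionaries with `None` standing for nil (the encoding and the `unelaborated_slots` helper are hypothetical):

```python
goal_failure = {
    "type": "goal-failure",
    "actor": "Frank",
    "goal": "goal-23",
    "plan": "plan-37",
    "subsequent-action": None,  # nil: not yet elaborated by the story
}

expectation_violation = {
    "type": "expectation-violation",
    "expectation": ("student", ("actor", "Frank"), ("difficulty", "low")),
    "triggered-by": "plan-37",
    "violation": ("student", ("actor", "Frank"), ("difficulty", "highest")),
}


def unelaborated_slots(frame):
    """List the slots of a point frame that have no filler yet."""
    return [slot for slot, value in frame.items() if value is None]


# Prefer the point whose frame is most completely elaborated.
points = [goal_failure, expectation_violation]
best_point = min(points, key=lambda frame: len(unelaborated_slots(frame)))
```

On this simple measure the expectation-violation frame is preferred, in line with the judgement that the ironic summary is the better one.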
It was easy to add dynamic points because FAUSTUS's basic mechanism for processing expectations is so simple. Once expectation and expectation-violation frames were defined, they were easily handled by the standard frame manipulation processes. In a system with distinct processing schemes for each of several levels it would be more difficult to add expectations as manipulatable objects.
Of course, the analysis of irony is far from complete. There will surely be non-ironic episodes which fit the expectation-violation frame described above, and ironic episodes which do not. The concept of opposite is surprisingly complicated, and is another source of difficulty. FAUSTUS makes do with a very simple notion of opposite.
First of all, FAUSTUS has a flexible control structure. It is not constrained to making inferences in either a strictly top-down or bottom-up manner. This is important because certain instantiations and determinations can only be made from evidence acquired from several different levels. FAUSTUS is able to find this evidence, while a strictly top-down system, such as a text skimmer (e.g., [10]), must identify the correct script to process an input.
Unified systems have a certain economy that tends to make them easier to understand and to modify. Any improvements to the system immediately propagate to all types of inferencing. In a non-unified mechanism, an improvement to, say, the point-handling mechanism does nothing to improve the goal detection mechanism. Of course, the complexity has not disappeared; it has merely moved from the processor to the knowledge base. I feel that that is where it belongs, for the reasons stated here. The ease with which I was able to add dynamic points and use them to detect irony in stories supports this claim.