A Framework for Model-Based Adaptive Training

R1-SOAR: A Research Experiment in Computer Learning

To examine a possible correspondence between human learning and a computer expert model, this section presents a learning experiment.

This experiment is also reviewed in [Laird et al. 1987]. R1-SOAR is an experiment in the use of a general, learning problem solver as a basis for knowledge-intensive programming.

Experimentation with R1-SOAR demonstrates learning performance for computer configuration on a set of 15 typical (real) orders.

The table below gives an indication of the size of each order (column 2), although it does not show how these items break down into a large number of smaller components for the configuration task.

To make learning more difficult, quite distinct test configurations were chosen.

The columns on the right show the number of decision cycles taken to complete the order.

A decision cycle is a set of steps to change the problem solving state (e.g., proposing and evaluating candidate operators and executing an operator to change the state).
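As a rough illustration of what counting decision cycles means, the following is a minimal sketch of a propose, evaluate and apply loop. It is not the Soar decision procedure: the integer state, the `Operator` class, the two toy operators and the distance-based evaluation are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Operator:
    """A toy operator (hypothetical; not a Soar data structure)."""
    name: str
    applicable: Callable[[int], bool]  # precondition on the state
    apply: Callable[[int], int]        # state transformation


def decision_cycle(state, operators, evaluate):
    """One cycle: propose candidates, evaluate them, execute the best one."""
    candidates = [op for op in operators if op.applicable(state)]  # propose
    if not candidates:
        return state, None
    # Evaluate each candidate by the quality of the state it would produce.
    best = max(candidates, key=lambda op: evaluate(op.apply(state)))
    return best.apply(state), best.name                            # execute


def count_decision_cycles(start, goal, operators, max_cycles=100):
    """Run decision cycles until the goal state is reached."""
    def evaluate(s):
        return -abs(goal - s)  # assumed evaluation: closer to the goal is better

    state, cycles = start, 0
    while state != goal and cycles < max_cycles:
        state, chosen = decision_cycle(state, operators, evaluate)
        if chosen is None:
            break
        cycles += 1
    return state, cycles


GOAL = 10
OPERATORS = [
    Operator("add-1", lambda s: s + 1 <= GOAL, lambda s: s + 1),
    Operator("add-3", lambda s: s + 3 <= GOAL, lambda s: s + 3),
]

print(count_decision_cycles(0, GOAL, OPERATORS))  # (10, 4)
```

In this toy problem the state moves 0, 3, 6, 9, 10, so four decision cycles are counted. The figures in the table count the same kind of unit, but for the real configuration task.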

The number of learned productions is given in brackets.

The 15 configuration tasks were run in sequence. This sequence was run 5 times with the learning mechanism alternating off and on. The total number of productions in R1-SOAR after running each sequence of tasks is shown below each column.

For these tests, a bottom-up learning method was used; that is, chunks were only learned for terminal subgoals (subgoals that do not have any further subgoals). Once all of a goal's subgoals have been chunked, that goal becomes a terminal subgoal and is itself eligible to be chunked.
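The bottom-up rule can be sketched as a simple eligibility test over a goal hierarchy. This is a simplified illustration only: the goal names and the `subgoals` mapping are hypothetical, and the sketch assumes every eligible goal is chunked in each learning pass, which a real run need not do.

```python
def eligible_for_chunking(subgoals, chunked):
    """Goals not yet chunked whose subgoals have all been chunked.

    A goal with no subgoals is terminal from the start.
    """
    return sorted(
        goal
        for goal, children in subgoals.items()
        if goal not in chunked and all(child in chunked for child in children)
    )


def bottom_up_passes(subgoals):
    """List which goals become eligible in each successive learning pass."""
    chunked = set()
    passes = []
    while len(chunked) < len(subgoals):
        batch = eligible_for_chunking(subgoals, chunked)
        if not batch:  # guard against a malformed (cyclic) hierarchy
            break
        passes.append(batch)
        chunked.update(batch)
    return passes


# Hypothetical goal hierarchy: goal -> list of its subgoals.
SUBGOALS = {
    "configure-order": ["configure-cabinet", "configure-backplane"],
    "configure-backplane": ["place-module"],
    "configure-cabinet": [],
    "place-module": [],
}

for number, batch in enumerate(bottom_up_passes(SUBGOALS), start=1):
    print(number, batch)
# 1 ['configure-cabinet', 'place-module']
# 2 ['configure-backplane']
# 3 ['configure-order']
```

The top goal only becomes eligible in the last pass, which is why learning under this method is spread over several trials.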

With the bottom-up method, learning is spread over a number of trials. It takes longer to achieve optimal task performance, but there is a real trade-off as well. As chunking proceeds higher in the goal hierarchy, the chunks acquired become increasingly larger and more specialised [Rosenbloom 1983]. This means that they not only put more of a burden on the production matcher, but are also useful in increasingly fewer situations.

At the extreme, a chunk learned for the top goal is only useful if an essentially identical problem recurs, because a chunked rule for the top goal contains a complete solution to a particular configuration problem. A good indication of how effective the learning of just the low-level rules can be is shown in the third column in the table (just before learning is switched on for the second time).

With the addition of 83 rules, most of the tasks showed an improvement of about 50%. Further improvement is possible after another sequence with learning (the fifth column). With the addition of another 47 rules, most of the tasks could be done with very few decision cycles. Nine decision cycles is the minimum possible for these configuration tasks, because of the major subassemblies involved in the configuration process.


Table 2-1 R1-SOAR Learning [van de Brug, Rosenbloom & Newell 1986]


In these tests the generality of the chunks learned was often limited by the inclusion of exact numeric values in their conditions, rather than less restrictive (but still correct) conditions based on predicates such as greater-than and less-than. The resulting productions could only be used in very similar situations. However, other chunked rules did exhibit their generality by being useful in later problem solving. As shown in [van de Brug, Rosenbloom & Newell 1986] and [Laird et al. 1987], this facilitation occurred in three ways:

  • as across-trial transfer of learning;
  • as within-trial transfer of learning;
  • as across-task transfer of learning.
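The limitation noted before this list, chunks whose conditions test exact numeric values, can be illustrated with a small sketch. The attribute names `slots_free` and `slots_needed` and the sample situations are invented for the example and are not taken from R1-SOAR.

```python
def specific_chunk(situation):
    """Condition built from the exact values seen when the chunk was learned."""
    return situation["slots_free"] == 4 and situation["slots_needed"] == 3


def general_chunk(situation):
    """Less restrictive condition based on a comparison predicate."""
    return situation["slots_free"] >= situation["slots_needed"]


# Hypothetical situations met in later problem solving.
SITUATIONS = [
    {"slots_free": 4, "slots_needed": 3},  # the situation the chunk was learned in
    {"slots_free": 6, "slots_needed": 3},
    {"slots_free": 4, "slots_needed": 2},
    {"slots_free": 2, "slots_needed": 5},  # neither condition should match here
]

specific_hits = sum(specific_chunk(s) for s in SITUATIONS)
general_hits = sum(general_chunk(s) for s in SITUATIONS)
print(specific_hits, general_hits)  # 1 3
```

The exact-value condition only matches the situation it was learned in, whereas the predicate-based condition also matches the two other situations in which the same action would be valid.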

Across-trial transfer is a straightforward caching of results: a chunk learned while solving a problem can be reused when the same task is attempted again. Caching is the term used by [Laird et al. 1987]. Within-trial transfer of learning occurs when a chunked production that is learned during one part of a problem is reused for some other part of the same problem. In R1-SOAR, within-trial transfer occurs most often while evaluating a set of operators that are competing to be the next operator applied, i.e., when there is a choice set of competing operators for achieving a goal. Part of the evaluation process involves a look-ahead search in which each operator is tried to see how good its result is.

During this process many situations are similar, so a production learned when evaluating one operator can be used again when evaluating another. Within-trial transfer of learning occurred between two and eight times for tasks that performed search by using a selection problem space. The search-free configurations exhibited no within-trial transfer of learning. Across-task transfer of learning occurs when a chunked production from one task is used in another task. Had similar configurations been chosen for testing R1-SOAR, across-task transfer of learning would have been greater.

However, for this configuration problem most customer orders are not exactly the same. Across-task transfer was on the order of 4 to 10 uses, counting the number of times a chunk learned during a previous problem-solving session was used.
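The three kinds of transfer can be told apart by comparing where a chunk was learned with where it is reused. The sketch below is an illustration under assumptions: the `Episode` and `ChunkStore` classes, the situation labels and the task names are hypothetical, and a chunk is modelled as a plain cached result rather than a production.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    """Identifies one attempt (trial) at one task. Hypothetical helper."""
    task: str
    trial: int


def classify_reuse(learned_in, used_in):
    """Name the kind of transfer when a chunk is reused."""
    if learned_in.task != used_in.task:
        return "across-task"
    if learned_in.trial != used_in.trial:
        return "across-trial"
    return "within-trial"


class ChunkStore:
    """Caches results per situation and tallies how each reuse arose."""

    def __init__(self):
        self.chunks = {}        # situation -> (result, Episode it was learned in)
        self.reuse = Counter()  # kind of transfer -> number of reuses

    def solve(self, situation, episode):
        if situation in self.chunks:
            result, learned_in = self.chunks[situation]
            self.reuse[classify_reuse(learned_in, episode)] += 1
            return result
        result = f"result-for-{situation}"  # stand-in for real subgoal problem solving
        self.chunks[situation] = (result, episode)
        return result


store = ChunkStore()
runs = [
    (Episode("order-A", 1), ["s1", "s2", "s1"]),  # second s1 is a within-trial reuse
    (Episode("order-B", 1), ["s2", "s3"]),        # s2 was learned in order-A
    (Episode("order-A", 2), ["s1", "s2", "s1"]),  # all three reuse trial-1 chunks
]
for episode, situations in runs:
    for situation in situations:
        store.solve(situation, episode)

print(dict(store.reuse))
# {'within-trial': 1, 'across-task': 1, 'across-trial': 3}
```

Counts such as the "4 to 10" reported above correspond to the across-task tally in this kind of bookkeeping.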

The computer model of learning in R1-SOAR shows a gradual move from a relatively knowledge-lean problem solver to a very skilled problem solver. The search patterns executed during Pass 1 are transformed during Pass 2 into well-coordinated patterns. The coordinated patterns from Pass 3 are transformed during Pass 4 into the fixed patterns of problem solving seen in Pass 5. These patterns of problem solving are called problem-solving methods in this work.