CPDT Study Session #2: Schedules of Reinforcement

I’m in the middle of the two-week period I’ve set aside to study for the learning theory part of the exam. I actually haven’t done much reading yet, both because I’ve been busy and because I’m pretty confident about my knowledge in this section. One thing I did want to firm up was my understanding of the basic schedules of reinforcement. These schedules specify the timing and frequency of reinforcement, and each type can be useful in the right situation.

Continuous Reinforcement Schedules (CRF)

Most training starts here, with the continuous rate of reinforcement. This means that every time the dog does the behavior, he gets reinforced. It works best during the teaching phase, and it helps establish a strong contingency between the behavior and the reinforcer.

If you use a continuous reinforcement schedule, keep in mind that these behaviors are quite susceptible to “extinction,” which means that if you stop reinforcing the behavior, the dog is going to stop doing it. Since it can be difficult to be sure that you reinforce every instance of a behavior, this schedule is a bit impractical. This is why most trainers switch to some kind of variable reinforcement schedule, but it is possible to use a continuous rate for the life of an animal (indeed, it’s what the Baileys, arguably some of the best animal trainers of the 20th century, used most of the time).

Partial (or Intermittent) Schedules (PRF)

There are several types of partial (sometimes called intermittent) reinforcement schedules. Although each type is distinct from the others, they do have several things in common. These are used when a continuous schedule is simply too cumbersome, for whatever reason. They are more resistant to extinction, and they typically feel more “natural” to people. You do need to be cautious that you don’t “thin” the schedule too quickly as this will cause “ratio strain” and degrade the quality of the behavior.

Fixed Ratio (FR)

A fixed ratio is when the reinforcer is given after a certain number of behaviors. The number after the abbreviation informs you how many behaviors need to be done before reinforcement is earned. For example, an FR5 means the dog must do five sits (or whatever) before receiving his treat.
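
For readers who think in code, the FR5 example can be sketched in a few lines of Python. This is only an illustration, not a training tool; the function name fixed_ratio and the idea of calling it once per sit are made up for this sketch.

```python
def fixed_ratio(n):
    """Hypothetical helper: build an FR-n schedule.

    The returned function is called once per behavior and returns
    True when that behavior earns the reinforcer.
    """
    count = 0

    def respond():
        nonlocal count
        count += 1
        if count == n:
            count = 0  # start counting again after the treat
            return True
        return False

    return respond


fr5 = fixed_ratio(5)
# Ten sits in a row: only the 5th and 10th earn a treat.
outcomes = [fr5() for _ in range(10)]
```

Under this sketch, a continuous schedule is just the special case fixed_ratio(1), where every behavior is reinforced.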

Fixed ratios will produce high, steady rates of responding due to their systematic, consistent, and predictable nature. That said, fixed ratios also have a “post-reinforcement pause” where the dog will briefly stop doing the behavior immediately after being reinforced. The response rate then increases as the dog approaches the next opportunity for reinforcement. If your ratio is very high (such as an FR400), the post-reinforcement pause will be longer.

Variable Ratio (VR)

In a variable ratio, the number of responses required for a treat changes from trial to trial and should be unpredictable. It’s typically set around an average number of responses. For example, a VR4 would mean that a treat is given, on average, once every 4 responses. During a series of behaviors, the treat may be given after 2 repetitions, then after 6 more, and then after 4 more, which averages out to 4.
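
A rough Python sketch of a variable ratio follows. It assumes the required number of responses is drawn uniformly between 1 and 2 × average − 1, which is just one simple way to get the right average; the schedule itself does not dictate any particular distribution. The name variable_ratio is hypothetical.

```python
import random


def variable_ratio(mean, rng):
    """Hypothetical helper: build a VR-mean schedule.

    Assumption: each requirement is drawn uniformly from
    1 .. 2*mean - 1, so the long-run average is `mean`.
    """
    def draw():
        return rng.randint(1, 2 * mean - 1)

    required = draw()
    count = 0

    def respond():
        nonlocal required, count
        count += 1
        if count >= required:
            count = 0
            required = draw()  # pick a new, unpredictable requirement
            return True
        return False

    return respond


# Seeded generator so the run is repeatable.
vr4 = variable_ratio(4, random.Random(0))
outcomes = [vr4() for _ in range(4000)]
treats = sum(outcomes)  # about a quarter of the responses
```

Letting a random number generator pick the requirement is one way around the human tendency to fall into a pattern.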

Variable ratios yield high, steady rates of responding, with far fewer post-reinforcement pauses. This schedule is also more resistant to extinction and useful for fading out a fixed ratio schedule. That said, a truly variable ratio is difficult to achieve, as we humans tend to be pattern-dependent.

Fixed Interval (FI)

In a fixed interval, reinforcement is given after a certain period of time. An FI5 would indicate that reinforcement is given for the first correct behavior after 5 seconds (or minutes, depending) have passed since the last reinforcement.
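
Here is a minimal Python sketch of the FI5 rule, assuming time is measured in seconds and passed in by hand rather than read from a real clock. The function name fixed_interval and the list of response times are invented for the example.

```python
def fixed_interval(seconds):
    """Hypothetical helper: build an FI schedule.

    The returned function takes the time of a correct behavior and
    returns True if at least `seconds` have passed since the last
    reinforcement (or since the start of the session).
    """
    last = 0.0

    def respond(now):
        nonlocal last
        if now - last >= seconds:
            last = now  # the interval restarts at each reinforcement
            return True
        return False

    return respond


fi5 = fixed_interval(5)
# Times (in seconds) at which the dog offers the behavior.
times = [1, 3, 5, 6, 9, 10.5, 12]
results = [fi5(t) for t in times]
# Reinforced at 5 s and again at 10.5 s; every other response earns nothing.
```

Note that behaviors offered before the interval is up change nothing, which is part of why responding tends to drop off right after a treat.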

Interval schedules (both fixed and variable) are great for teaching duration behaviors. A fixed interval is prone to extinction, though, and has a pronounced post-reinforcement pause. In this case, the response pattern is “scallop-shaped”: the behavior drops off for the first part of the interval, and then increases in frequency as the time for reinforcement comes due. This is similar to a student checking the clock more frequently when class is almost over.

Variable Interval (VI)

In this schedule, the interval varies around an average amount of time, which means the first correct behavior after an unpredictable amount of time has passed is reinforced. Like the variable ratio, a VI4 would mean that reinforcement becomes available every 4 seconds (minutes, etc.) on average, but that the amount of time elapsed will change from trial to trial.
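
A rough Python sketch of a VI4 follows. It assumes each waiting time is drawn uniformly between 0 and twice the average, which is only one convenient choice, and it simulates a dog that offers the behavior every tenth of a second. The name variable_interval is hypothetical.

```python
import random


def variable_interval(mean_seconds, rng):
    """Hypothetical helper: build a VI schedule.

    Assumption: each wait is drawn uniformly from 0 .. 2*mean_seconds,
    so waits average out to `mean_seconds`.
    """
    def draw():
        return rng.uniform(0, 2 * mean_seconds)

    last = 0.0
    wait = draw()

    def respond(now):
        nonlocal last, wait
        if now - last >= wait:
            last = now
            wait = draw()  # the next wait is unpredictable
            return True
        return False

    return respond


# Seeded generator so the run is repeatable.
vi4 = variable_interval(4, random.Random(0))
# The dog responds every 0.1 s for 4000 s of simulated time.
treats = sum(vi4(i / 10) for i in range(40000))
# Roughly one treat per 4 seconds, despite 10 responses per second.
```

The simulation also shows why interval schedules do not reward fast responding: offering the behavior more often does not bring the next treat any sooner.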

This schedule produces a slow, steady rate of responding, so don’t expect a particularly high rate of behavior. It has good resistance to extinction, making it particularly good for fading out a fixed interval schedule. As with the variable ratio, it can be difficult to be truly unpredictable.

When I get around to it, I’ll post about the differential reinforcement schedules. There are quite a few of these, and they are arguably more interesting than these more basic schedules. But for now, what are you guys studying?

