Thursday, March 10, 2011

Learning Theory 101: Operant Conditioning

As I mentioned earlier in this series, according to the behaviorism branch of learning theory, learning takes place through two types of conditioning. I’ve already told you about classical conditioning, so today it’s time to talk about operant conditioning!

Although the term “operant conditioning” was coined by B.F. Skinner in the 1930s, it actually has its roots much further back. In 1905 (only two years after Pavlov presented his first paper on classical conditioning), Edward Thorndike published his Law of Effect. This law basically says that behaviors that have good consequences are more likely to happen again in the future, while those that have bad consequences are less likely to happen again. Although the Law of Effect really sums up operant conditioning quite well, Skinner expanded so much upon Thorndike’s work that operant conditioning is sometimes called Skinnerian conditioning (it is also sometimes called instrumental conditioning), and Skinner is widely referred to as the father of operant conditioning.

Like classical conditioning, operant conditioning is the process of forming associations between two things. However, while classical conditioning forms a direct and reflexive association between two stimuli, operant conditioning works by forming an association between a voluntary behavior and the subsequent consequences.

In other words, the dog learns that his behavior causes stuff to happen, so he either repeats it or avoids it in the future. Whereas classical conditioning creates an automatic response, operant conditioning implies that the dog thinks about his behavior, and then deliberately and voluntarily acts in his own best interest.

It is called “operant” conditioning because the dog is operating on the environment (the technical term for “doing stuff”).

Consequences for (Almost) Every Action
Since operant conditioning is defined by the idea that consequences drive behavior, we need to know what the possible consequences are. Logically, there are three options: something good will happen, something bad will happen, or nothing will happen at all. It’s rare that nothing happens at all. For example, if you are standing and choose to sit, it may look like nothing has happened as a result, but in reality the stress on your joints has changed. Whether that’s a good thing or a bad thing is another matter entirely, but the point remains: most behaviors are followed by some sort of consequence. Since it’s so rare for nothing to happen, we will focus on the other two consequences today.

This means that the core tools of operant conditioning are the good consequences and the bad consequences that happen. When something good happens following a particular behavior, the dog is more likely to repeat that behavior. This is called reinforcement. When something bad happens after the behavior, the dog is likely to avoid doing that same behavior in the future. This is called punishment. All of dog training- heck, all learning by any species- happens because a particular behavior was either reinforced or punished.

The Good Lord Giveth, and the Good Lord Taketh Away…
Both reinforcement and punishment can happen in two ways: something can be added to the situation, or something can be taken away. These two eventualities are respectively referred to as “positive” and “negative.” Since we tend to think of these words as value judgments, this is probably the hardest part of learning theory to understand. It is therefore important to understand that in this case, “positive” doesn’t mean good, and “negative” doesn’t mean bad. Instead, think of these two terms in the mathematical sense: something has been added or something has been removed from the situation.

As a result, there are four possible consequences following a dog’s behavior:

Positive Reinforcement (R+): This is where something is added to the situation (positive), and the behavior increases as a result (reinforcement). The behavior increases because whatever was added was desirable, and the dog wants it to happen again. You can think of R+ as a reward.  
Example: You ask your dog to sit. He does, so you give him a treat. Next time you ask, he sits again. Because you added a treat and the sitting behavior increased, it is R+.

Positive Punishment (P+): This is where something is added to the situation (positive), and the behavior decreases as a result (punishment). The behavior decreases because the thing that was added was unpleasant, and the dog doesn’t want it to happen again.  
Example: You are walking your dog, and he pulls on the leash, so you give a collar correction. Next time you walk with him, he doesn’t pull. Because you added a collar correction and the pulling behavior decreased, it is P+.

Negative Reinforcement (R-): This is where something is taken away from the situation (negative), and the behavior increases as a result (reinforcement). The behavior increases because whatever was taken away was annoying or unpleasant, and the dog is glad it’s gone. You can think of R- as escape or relief from something unpleasant.  
Example: You ask your dog to sit. When he doesn’t, you pull up on the leash, putting pressure on the collar around his neck. You maintain this pressure until the dog sits, then release it. The next time you ask him to sit, he does. Because you took away the pressure and the sitting behavior increased, it is R-.

Negative Punishment (P-): This is where something is taken away from the situation (negative), and the behavior decreases as a result (punishment). The behavior decreases because whatever was taken away was awesome, and the dog wants to keep it next time. You can think of P- as a fine or penalty.
Example: Your dog jumps up on you because he wants your attention. You ignore him and walk away. Next time your dog wants attention, he doesn’t jump on you. Because you took away your attention and the jumping up behavior decreased, it is P-.
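
If it helps to see the four quadrants as a simple two-by-two lookup, here is a minimal sketch in Python. The names `Quadrant` and `classify` are made up for illustration and don't come from any training library; the only inputs are the two questions described above: was something added or taken away, and did the behavior increase or decrease afterward?

```python
from enum import Enum


class Quadrant(Enum):
    """The four consequences, using the abbreviations from this post."""
    POSITIVE_REINFORCEMENT = "R+"
    POSITIVE_PUNISHMENT = "P+"
    NEGATIVE_REINFORCEMENT = "R-"
    NEGATIVE_PUNISHMENT = "P-"


def classify(something_added: bool, behavior_increased: bool) -> Quadrant:
    """Name the quadrant from two observations (hypothetical helper).

    something_added: True if something was added to the situation,
        False if something was taken away.
    behavior_increased: True if the behavior happened more afterward,
        False if it happened less.
    """
    if something_added:
        if behavior_increased:
            return Quadrant.POSITIVE_REINFORCEMENT
        return Quadrant.POSITIVE_PUNISHMENT
    if behavior_increased:
        return Quadrant.NEGATIVE_REINFORCEMENT
    return Quadrant.NEGATIVE_PUNISHMENT


# The four examples from the list above, expressed as observations.
examples = {
    "treat added, sitting increased": classify(True, True),
    "collar correction added, pulling decreased": classify(True, False),
    "leash pressure removed, sitting increased": classify(False, True),
    "attention removed, jumping decreased": classify(False, False),
}

for description, quadrant in examples.items():
    print(f"{description}: {quadrant.value}")
```

Notice that the function never asks whether the trainer meant the consequence to be good or bad. It only looks at what happened to the behavior. The sketch also assumes the behavior clearly went up or down; it has no answer for a behavior that didn't change at all.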

Putting it all Together
It is important to remember that the dog defines whether the consequence is reinforcing or punishing. While you may think you’re giving the dog something awesome, he may disagree. For example, some dogs love carrots. Maisy doesn’t. If I offered her a carrot as a positive reinforcer, I might be surprised when she walked away instead of doing what I asked. On the other hand, Maisy loves to be squirted in the face with water- something that is often recommended as a punishment. If I squirted her in the face to get her to stop barking, it’s very likely that the barking behavior would actually increase. The ultimate test of whether something was a reinforcer or a punisher is whether the behavior increases or decreases in the future.

I’ve also found that the quadrants tend to work in tandem. For example, the standard advice for pulling on leash is to “be a tree” when the dog pulls. By stopping the forward movement, we are hoping to decrease the pulling by taking away the dog’s ability to get where he wants to go (negative punishment). When he stops pulling and instead moves back to loosen the leash, we start moving forward again, thus increasing the behavior of keeping the leash slack by adding in movement (positive reinforcement).

Because more than one quadrant is being used for the same skill, it can be difficult sometimes to figure out which one is at work. Therefore, while understanding operant conditioning will make you a better trainer, I don’t see the need to over-think things. Don’t worry about which quadrant you’re in, and whether or not it’s “acceptable” to use it. Instead, focus on the results you’re getting.

If you aren’t getting what you want, consider why that is. Are the consequences that you’re providing actually reinforcing or punishing? Is there something else at play? Unfortunately, you cannot control every consequence. Some behaviors are self-reinforcing, and you’ll need to think critically and act creatively. By understanding operant conditioning, you’ll be able to more easily parse out what’s going on and then get the results you want.

Finally, timing counts. No matter which quadrant you're using, your timing must be good. The consequence needs to happen as soon after the behavior as possible, ideally immediately, so that the dog can make the association between the two. This is part of why clicker training has become so popular: it allows the trainer to bridge the time between the behavior and the consequence, making it clear to the dog why he's being rewarded.

Still, no matter how you train, no matter what methods you're using, operant conditioning is at work. If you like what your dog is doing, reward him for it! If you don't, punish him. Just remember that punishment doesn't need to be painful or scary. I much prefer negative punishment- removing things the dog likes- and have found it very effective.

I hope this post has helped you understand operant conditioning better. It's a complicated subject, but I find it fascinating!

In addition to the links in this post, you may find the following websites interesting:
This site has a nice summary of operant conditioning, plus a bonus video of a pigeon in an operant conditioning chamber (also called a Skinner Box).
A Brief Survey of Operant Behavior, by B.F. Skinner.
An in-depth historical look at operant conditioning.
Learn more about Thorndike here.
Two great posts by Patricia McConnell: Are you all positive? and The Positives of Negatives and the Negatives of Positives.


Ettel, Charlie Poodle, and Emma Pitty said...

Again, I love these posts!

I remember having a really hard time keeping R- and P- straight when I first learned the theory. I still get them mixed up sometimes.


Crystal said...

Thanks, Ettel. I agree- the terminology is confusing, and it took me a long time to feel like I understood it.