While researching digestion in dogs, Ivan Pavlov made an interesting discovery: the dogs were learning to pair sounds in their environment with their food, producing natural responses like salivation even when no food was present. Pavlov had stumbled upon the concept of classical conditioning. He investigated it further and made many discoveries that became invaluable in understanding how learning works in both humans and animals. In his experiments, there was an unconditioned stimulus (US), something that naturally elicits a response in subjects. For example, food causes salivation; the salivation is the unconditioned response (UR). By pairing a neutral stimulus, such as a tuning fork, with the unconditioned stimulus, the dogs learnt to associate that sound with food, and began to salivate when the tuning fork was sounded, even without food being present. At this point, the salivation is a conditioned response (CR), and the tuning fork is a conditioned stimulus (CS).
Learning is considered to have taken place when the animal responds to the CS without the US. This initial stage of learning is called acquisition, as the subject acquires a new behaviour. Repeated pairings of a CS and US can produce stronger CRs, but only up to a point. The order and timing of the pairing also affect the strength of the CR. Acquisition is fastest when the tuning fork is sounded and, while it is still ringing, the dogs are presented with food. This is called delayed conditioning. Other arrangements exist, but they have not been shown to be as effective:
Trace conditioning: The CS is presented, followed by a short break, and then the US.
Simultaneous conditioning: The CS and the US are presented at the same time.
Backward conditioning: The US is presented first and is followed by the CS.
The unlearning of a conditioned response is known as extinction: when the CS is repeatedly presented without the US, it eventually stops eliciting the CR. An interesting part of this process is spontaneous recovery, where, after the CR has been extinguished, it briefly reappears when the CS is presented again. The tendency to respond to stimuli similar to the CS is known as generalisation (the dog responds to a bell as well as a tuning fork). Subjects can also learn to discriminate between different stimuli, responding to the CS but not to similar stimuli; this is known as discrimination.
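The acquisition and extinction curves described above can be sketched with a simple error-correction rule, similar in form to the Rescorla-Wagner learning rule: on each trial, the strength of the CS-US association moves a fixed fraction of the way towards a target, which is the maximum strength when the US follows the CS and zero when the CS appears alone. This is a minimal sketch, not Pavlov's own model; the function names, the learning rate of 0.3 and the maximum strength of 1.0 are hypothetical values chosen for illustration.

```python
def update_strength(v, us_present, rate=0.3, max_strength=1.0):
    """One trial of a simple error-correction update (hypothetical parameters).

    The association strength v moves a fraction `rate` of the way towards
    its target: max_strength when the US follows the CS, 0 when it does not.
    """
    target = max_strength if us_present else 0.0
    return v + rate * (target - v)


def simulate(n_acquisition=10, n_extinction=10, rate=0.3):
    """Return CS strength after each acquisition (CS + US) trial,
    followed by each extinction (CS alone) trial."""
    v = 0.0
    history = []
    for _ in range(n_acquisition):
        v = update_strength(v, us_present=True, rate=rate)
        history.append(v)
    for _ in range(n_extinction):
        v = update_strength(v, us_present=False, rate=rate)
        history.append(v)
    return history


if __name__ == "__main__":
    for trial, strength in enumerate(simulate(), start=1):
        print(f"trial {trial:2d}: strength {strength:.3f}")
```

Each acquisition trial adds less strength than the one before, so the CR levels off below the maximum, matching the point that repeated pairings help only up to a certain degree; extinction trials then pull the strength back towards zero. This simple rule cannot produce spontaneous recovery, since strength only changes on trials.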
Many other experiments have examined classical conditioning; one significant example is John Watson and Rosalie Rayner's Little Albert experiment. They presented a white rat to an infant known as Little Albert and paired it with a loud bang, teaching him to be afraid of the rat. Albert then generalised this fear to other white, fluffy things, such as beards and a white rabbit. This is an example of aversive conditioning, in which a negative response, rather than a positive one, is taught to a subject. Another example of aversive conditioning is the use of horrible-tasting nail polish to dissuade nail biters from biting their nails.
Once a CS elicits a CR, it is possible, for a short time, to use the CS as a US to condition a response to a new stimulus. This is called second-order, or higher-order, conditioning.
Biology & Classical Conditioning
Classical conditioning does not always work. Research has shown that humans and animals are biologically prepared to make certain associations more readily than others. A good example of this is learned taste aversion. If you eat or drink too much of something and begin to feel nauseous, you will learn to avoid it. Taste aversions can form after just one bad incident, even though the two events (eating and feeling ill) are often separated by several hours. Taste aversions most commonly form with strong and unusual tastes: the CS (the food) must be salient (noticeable) in order to cause the aversion. John Garcia and Robert Koelling performed an experiment showing that rats formed certain associations more readily than others. The results of that experiment are shown in the table below. The ease with which animals learn taste aversions is known as the Garcia effect.
Operant conditioning is a type of learning based on the association of consequences with behaviour. Edward Thorndike was one of the first to research this phenomenon. In his experiments, a cat was placed in a puzzle box (a cage) next to a bowl of food, and had to escape to get to the food. The time it took the cat to escape decreased over a series of trials. Because the time decreased gradually rather than suddenly, Thorndike concluded that the cat was learning the new behaviour not through reasoning, but simply by connecting a stimulus and a response. This led to Thorndike's law of effect, which states that if the consequences of a behaviour are pleasurable, the stimulus-response connection will strengthen, and the subject will be likely to repeat that behaviour. If the consequences are negative, the likelihood of the behaviour will decrease. He labelled this instrumental learning.
B.F. Skinner coined the term operant conditioning. He invented a contraption called a Skinner box, which delivered food to an animal when it pressed a lever, pushed a button, or pecked a disk. The food is a reinforcer, and giving the food is reinforcement. There are two types of reinforcement: positive and negative. Positive reinforcement is the addition of something pleasant, and negative reinforcement is the removal of something unpleasant. Escape learning allows a subject to terminate an aversive stimulus, while avoidance learning enables them to avoid the stimulus completely. If a child causes a fuss in class so that he is asked to leave, that is escape learning. If he decides to skive off altogether, that's avoidance learning. Behaviour can also be shaped by negative consequences. This is known as punishment. Positive punishment is the addition of something unpleasant, while negative punishment, or omission training, is the removal of something pleasant.
Punishment vs Reinforcement
Punishment is operant conditioning's counterpart to aversive conditioning. It is most effective when it is delivered immediately after the undesired behaviour. Harsh punishments can have unintended consequences: for example, hitting your dog may dissuade it from misbehaving, but may also cause fear or anger in the dog.
To get the rat to press the lever in the box, Skinner used a process known as shaping. Shaping reinforces each successive step towards the desired behaviour. If you've ever trained a dog, you'll know that training requires incremental steps towards the behaviour, instead of expecting the dog to perform it immediately. Rewarding approximations of the behaviour increases the chance that the animal will eventually perform it. Animals can also be taught to perform a number of responses in sequence. This concept is known as chaining.
The vocabulary used for classical conditioning also applies to operant conditioning. Here are the terms in the context of a rat in a Skinner box:
Acquisition: The rat learns to press the lever to get food.
Extinction: The rat stops pressing the lever because it no longer gets food.
Spontaneous Recovery: After the original behaviour has been extinguished, the rat begins to press the lever again without further training.
Generalisation: The rat presses similar things, such as buttons, not just levers, to get food.
Discrimination: The rat learns to press only a specific lever, or to press the lever only when a tone is playing (in this scenario, the tone is a discriminative stimulus).
There are two kinds of reinforcers. Primary reinforcers are naturally rewarding: things like food, water, and rest that we don't need to learn to enjoy. Secondary reinforcers are things we have learnt to value, such as praise or being allowed to play a video game. Money is a generalised reinforcer, as it can be traded for virtually anything. An application of generalised reinforcers is the token economy: every time a person in a token economy does something desired, they receive a token that they can trade for one of a variety of reinforcers. Reinforcers are also not desirable all the time. Try rewarding a teenager who has just stuffed their face with cake with even more cake, and see how willing they are to work for it. The idea that the reinforcing properties of something depend on the situation connects with the Premack principle: of two activities, the preferred one can be used to reinforce the less preferred one.
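The token economy described above can be sketched as a small data structure: tokens act as a single generalised reinforcer that can be exchanged for any item on a menu of backup reinforcers. The class name, the menu items and their token prices below are hypothetical, chosen only to illustrate the idea.

```python
class TokenEconomy:
    """Hypothetical token economy: desired behaviours earn tokens,
    which can be exchanged for any reinforcer on the menu."""

    def __init__(self, menu):
        # Maps each backup reinforcer to its price in tokens (illustrative values).
        self.menu = dict(menu)
        self.balances = {}

    def reward(self, person, tokens=1):
        """Hand out tokens immediately after a desired behaviour."""
        self.balances[person] = self.balances.get(person, 0) + tokens

    def exchange(self, person, reinforcer):
        """Trade tokens for a reinforcer; return True if the person could afford it."""
        price = self.menu[reinforcer]
        if self.balances.get(person, 0) < price:
            return False
        self.balances[person] -= price
        return True


if __name__ == "__main__":
    economy = TokenEconomy({"extra break": 3, "video game time": 5})
    for _ in range(3):  # three desired behaviours, one token each
        economy.reward("Sam")
    print(economy.exchange("Sam", "video game time"))  # not enough tokens yet
    print(economy.exchange("Sam", "extra break"))
    print(economy.balances["Sam"])
```

Because every desired behaviour earns the same kind of token, the person can choose whichever reinforcer is most appealing at the time of exchange, which sidesteps the problem that any single reinforcer (like more cake) is not desirable all the time.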
While a behaviour is first being learnt, continuous reinforcement (reinforcing every response) is best; once it has been learnt, however, a partial reinforcement schedule tends to be ideal. According to the partial-reinforcement effect, behaviours are more resistant to extinction if they have not been reinforced continuously. The types of partial-reinforcement schedule are described in the table below.
When reinforcement arrives on a variable schedule, it is much harder to notice that it has stopped, which is why behaviours learnt on variable schedules are more resistant to extinction.
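The difference between fixed and variable patterns can be illustrated by comparing a fixed-ratio schedule (reinforce every nth response) with a variable-ratio schedule (reinforce after an unpredictable number of responses that averages n). This is a minimal sketch under stated assumptions: the function names, the ratio of 3, the uniform draw of each requirement and the random seed are all illustrative choices, not taken from any particular experiment.

```python
import random


def fixed_ratio(n_responses, ratio):
    """Reinforce every `ratio`-th response (e.g. every 3rd lever press)."""
    return [(i + 1) % ratio == 0 for i in range(n_responses)]


def variable_ratio(n_responses, mean_ratio, seed=0):
    """Reinforce after an unpredictable number of responses.

    Each requirement is drawn uniformly from 1..(2 * mean_ratio - 1),
    so on average it takes `mean_ratio` responses per reinforcement.
    """
    rng = random.Random(seed)
    schedule = []
    required = rng.randint(1, 2 * mean_ratio - 1)
    count = 0
    for _ in range(n_responses):
        count += 1
        if count == required:
            schedule.append(True)
            count = 0
            required = rng.randint(1, 2 * mean_ratio - 1)
        else:
            schedule.append(False)
    return schedule


def longest_unreinforced_run(schedule):
    """Length of the longest streak of responses that went unreinforced."""
    longest = current = 0
    for reinforced in schedule:
        current = 0 if reinforced else current + 1
        longest = max(longest, current)
    return longest


if __name__ == "__main__":
    fr = fixed_ratio(30, 3)
    vr = variable_ratio(30, 3)
    print("FR longest unreinforced run:", longest_unreinforced_run(fr))
    print("VR longest unreinforced run:", longest_unreinforced_run(vr))
```

On the fixed-ratio schedule, the rat never makes more than two unreinforced presses in a row during training, so a third unreinforced press is an immediate sign that something has changed. On the variable-ratio schedule, longer unreinforced runs are a normal part of training, so the start of extinction is much harder to detect.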
Biology and Operant Conditioning
As cool as operant conditioning is, it has its limits. Animals will not perform certain behaviours that go against their natural inclinations; for example, rats will not walk backwards. The tendency of animals to ignore rewards and revert to their typical behavioural patterns is called instinctive drift.
The Pavlovian model of classical conditioning is known as the contiguity model, because it holds that the more times two things are paired, the more learning will take place. Robert Rescorla's research revised the Pavlovian model so that it applies to more complicated scenarios.
In his experiments, he had two dogs, both of which were presented with food and a bell together 10 times. However, the second dog was also given 5 trials where the food was presented without the bell, and 5 trials where the bell was rung without food. Common sense says the first dog would have a stronger response, but the contiguity model predicts that their responses would be the same, since both dogs experienced the same number of pairings.
Enter Rescorla's revision: the contingency model. This model states that A is contingent upon B when A depends on B and vice versa. In other words, the presence of one event reliably predicts the presence of the other. For the first dog, the food is contingent upon the presence of the bell; for the second, the relationship between the CS and the US is less reliable, so the resulting CR is weaker.
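One simplified way to express "the presence of one event reliably predicts the other" is to compute two conditional probabilities from the trial record: how often food follows the bell, and how often the bell accompanies food. The sketch below applies this, together with a simple pairing count for the contiguity model, to the two dogs described above. It is an illustrative assumption rather than Rescorla's own formulation (his analyses compared the probability of the US in the presence and absence of the CS, which would also require trials with neither event); the function names are hypothetical.

```python
from fractions import Fraction


def contiguity_score(trials):
    """Contiguity model: count the trials on which bell and food were paired."""
    return sum(1 for bell, food in trials if bell and food)


def contingency(trials):
    """Simplified contingency measure: how reliably each event predicts the other.

    Returns (P(food | bell), P(bell | food)) as exact fractions.
    """
    paired = contiguity_score(trials)
    bell_trials = sum(1 for bell, _ in trials if bell)
    food_trials = sum(1 for _, food in trials if food)
    return Fraction(paired, bell_trials), Fraction(paired, food_trials)


# The two dogs from the text, recorded as (bell, food) trials.
dog_1 = [(True, True)] * 10
dog_2 = [(True, True)] * 10 + [(False, True)] * 5 + [(True, False)] * 5

if __name__ == "__main__":
    for name, trials in [("dog 1", dog_1), ("dog 2", dog_2)]:
        p_food_given_bell, p_bell_given_food = contingency(trials)
        print(name, contiguity_score(trials), p_food_given_bell, p_bell_given_food)
```

Both dogs score 10 on the contiguity count, so that model predicts equal responses, while the contingency values drop from 1 for the first dog to 2/3 for the second, in line with its weaker CR.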
As children grow up, they learn how to behave by watching how the people in their lives behave. This is known as observational learning, or modelling, and was studied extensively by Albert Bandura while he was forming his social-learning theory.
Modelling has two basic components: observation and imitation. Imagine two brothers: while the older brother plays football outside, the younger brother watches him and imitates his behaviour, playing football as well. However, modelling isn't always positive. Children who grow up in abusive environments are more likely to model that behaviour when they grow up, leading to a cycle of abuse.
Edward Tolman did substantial research into latent learning. Latent learning is learning that takes place without being obvious, and only becomes apparent when reinforcement is given for demonstrating it.
In his experiments, Tolman had three groups of rats go through a maze. The first group was rewarded every time it finished the maze, and its performance improved rapidly. The second group was never rewarded and showed gradual improvement. The third group was not rewarded for the first half of the trials, but received a reward in the second half. In the first half, its performance matched that of the second group; in the second half, however, its performance spiked. This showed that the rats had learnt their way around the maze during the first half of the trials, but their performance hadn't improved drastically because they weren't motivated to show what they had learnt.
Abstract learning is the idea that we can learn general concepts, not just specific behaviours. Some animals used in Skinner boxes, such as pigeons and rats, have shown this ability. Pigeons, for example, have learnt to peck pictures they had never seen before if those pictures were of chairs.
Wolfgang Kohler performed insight learning experiments on chimpanzees. Insight learning occurs when someone suddenly realises how to solve a problem. A moment of insight can happen while you're taking a test, when all of a sudden you realise what the answer is. Kohler argued that learning happened in this sudden fashion because of insight, not because of the gradual strengthening of S-R connections.
In his study with chimpanzees, he put them into scenarios to see how they'd solve problems. In one, he suspended a banana from the ceiling, out of the chimps' reach. He found that the chimps would spend most of their time on unproductive activity rather than gradually working towards the banana. Then, all of a sudden, they would have a moment of insight and stack boxes to reach it.