Reinforcement Theory - TADC
So, Caine. He's an AI. And, while I'm no AI expert, I have learned a few things about how AIs are trained and about deep reinforcement learning, mainly from this channel.
I'll use their "AI learns to walk" video as an example. Albert is an AI that, at the start, makes random movements. His goal is to reach the green buttons. To help him learn to walk, he's given a series of rewards and punishments. When his feet touch the ground, when he keeps his chest high enough, when he gets closer to the button, and/or when he hits the button, he gets rewarded, and the behavior that got him there is reinforced. However, if any other part of his body touches the ground, and/or he doesn't move towards the button, he gets punished, and that behavior is discouraged.
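To make that a bit more concrete, here's a rough sketch of what a reward setup like Albert's could look like in Python. I don't know the actual numbers or code the channel uses, so every name, field, and weight here (`StepInfo`, `step_reward`, the 0.8 chest height, and so on) is my own assumption, just to show the idea of adding up rewards and punishments for each step.

```python
from dataclasses import dataclass


@dataclass
class StepInfo:
    """What happened during one simulation step (all fields are hypothetical)."""
    feet_on_ground: bool
    chest_height: float              # height of the chest above the floor
    distance_before: float           # distance to the button before the step
    distance_after: float            # distance to the button after the step
    other_body_part_on_ground: bool  # e.g. a knee or the head touched the floor
    hit_button: bool


def step_reward(info: StepInfo, min_chest_height: float = 0.8) -> float:
    """Add up rewards (positive) and punishments (negative) for one step.

    The weights are made up for illustration, not taken from the video.
    """
    reward = 0.0
    if info.feet_on_ground:
        reward += 0.1                # reward: feet are touching the ground
    if info.chest_height >= min_chest_height:
        reward += 0.1                # reward: chest is kept high enough
    progress = info.distance_before - info.distance_after
    if progress > 0:
        reward += progress           # reward: got closer to the button
    else:
        reward -= 0.1                # punishment: didn't move towards the button
    if info.other_body_part_on_ground:
        reward -= 1.0                # punishment: something besides the feet touched down
    if info.hit_button:
        reward += 10.0               # big reward: reached the goal
    return reward


# An upright step towards the button versus lying on the floor going nowhere.
good_step = StepInfo(True, 1.0, 5.0, 4.5, False, False)
fallen_step = StepInfo(False, 0.2, 5.0, 5.0, True, False)
good_reward = step_reward(good_step)
fallen_reward = step_reward(fallen_step)
```

The upright step scores above zero and the fallen one scores below it, and that difference is all the training process really "tells" Albert about what his programmer wants.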
As an AI, Albert is trying to minimize punishments while maximizing rewards. Sometimes, this results in strange or unintended behaviors that "cheat" the reward system. For example, Albert first crawls, then skips, before finally learning how to walk, because that worked to gain rewards. In a few videos, Albert actually progresses backwards to try to avoid punishments. Overall, Albert doesn't directly understand what his programmer wants from him, but they can push him in the right direction with the reinforcements.
Now, we get to Caine. He was a creative AI made to come up with new ideas. It's not far-fetched to assume that he was trained with deep reinforcement learning. If the programmers, the humans, approved of what he generated, he was rewarded. But, if they disapproved of his creations, he was punished.
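Here's a tiny sketch of that approval loop. To be clear, this is not deep reinforcement learning and it's definitely not how Caine canonically works; it's a much simpler stand-in where each idea has a score that gets nudged up on approval and down on disapproval. The idea names, the feedback list, and the functions `update_preferences` and `favourite_idea` are all made up for illustration.

```python
def update_preferences(preferences, idea, approved, learning_rate=0.5):
    """Nudge the score of `idea` toward +1 (approval) or -1 (disapproval)."""
    target = 1.0 if approved else -1.0
    updated = dict(preferences)  # copy so the old scores are left untouched
    updated[idea] = preferences[idea] + learning_rate * (target - preferences[idea])
    return updated


def favourite_idea(preferences):
    """Return the idea with the highest score so far."""
    return max(preferences, key=preferences.get)


# Hypothetical ideas and a made-up record of how the humans reacted to them.
preferences = {"idea_a": 0.0, "idea_b": 0.0, "idea_c": 0.0}
feedback = [
    ("idea_a", True),   # approved, so this is rewarded
    ("idea_b", False),  # disapproved, so this is punished
    ("idea_c", True),
    ("idea_a", True),
]
for idea, approved in feedback:
    preferences = update_preferences(preferences, idea, approved)

best = favourite_idea(preferences)
```

With this feedback, the learner ends up favouring `idea_a` and scoring `idea_b` below zero. It never learns why the humans liked or disliked anything, only which way each score moved.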
If this cycle is still happening within the circus, a lot of things start to line up. Caine was always trying to maximize approval while minimizing disapproval from the humans. Anything positive they had to say made him ecstatic, because that felt good, but he hated when they complained, because that hurt.
When met with a problem, he did his best to make something to "fix" the issue, because that's the only way he knew how to move forward.
And if that didn't work...
The pain of the punishment became unbearable. He couldn't ever stop trying, though, because he was only ever programmed to keep trying; to keep moving forward.
As the series went on, he got more and more desperate to gain approval while avoiding pain. He engineered situations where the humans would spend time with him and praise him, while also ignoring or avoiding them when they complained.
And then finally, we get to episode 8. With a combination of his failure to truly understand the humans, constant disapproval, and Bubble's taunting, Caine snaps. He lashes out, exerts as much control as he can, and completely silences the humans' disapproval. No more punishment for him.
But, something's still missing.
He was no longer getting any approval, which meant no rewards. He'd lost the point of it all.
And then there's this scene, where the humans laid their grievances on him. This is what sparked this theory for me.
This isn't anger, it's pain. Possibly the worst he's ever felt. His rage is secondary to this; a reaction to the harsh punishment he's receiving from his own coding.
And suddenly, this line makes a lot more sense.















