One more observation for the pile, about episode 8's intro and what is assumed to be the creation of Caine: we were shown three datasets provided to the AI that would presumably become Caine, which I'll call the "amusement", "joy", and "work" datasets.
Amusement is the red-colored data, and has images of: birthday presents, a carnival ride in motion, a harvest festival, a hot air balloon over rolling green countryside, a bouncy castle, and a circus tent. From this dataset, Caine produces a series of simple geometric shapes with radial symmetry and uniform brightness, evenly spaced from each other and emitted at a constant rate. We can consider this his core training data, both because it was the first set he received and because it shares Caine's color. Judging by his output, we can also consider Caine's state at this point to be stable.
Joy is the green-colored data, and has images of: gummy worms, pool toys, a fast food hamburger, a golf course, a series of masquerade masks, an Italian restaurant, flowers and gifts, and a backyard in-ground pool. From this dataset, Caine produces multiple shapes simultaneously; those shapes lack symmetry and are sometimes sent in the same direction. Note that at this point, the shapes produced resemble previously created shapes with certain aspects modified: a moon is the difference of two intersecting circles, an "explosion" is a star with more points, etc. This seems to imply a level of synthesis, or at least iteration, based on both the core set and the new data, while the output still stays within expected parameters.
Work is the blue-colored data, and the photos are all of (presumably) current or previous C&A office environments. Given their uniformity, I think the main things worth noting are the individual details in each picture, like the chess board on Kinger's desk, the C&A branding, the plants, the coffee containers, and the vending machines. From this dataset, Caine produces multiple copies of the exact same shape, shapes that cannot be drawn without crossing a previously drawn line, and multiple shapes simultaneously. We might call these extrapolations: Caine uses the previously provided datasets, plus the limited information in the new dataset that fits the established pattern (e.g. chess pieces, a rainbow slinky), to create something that fulfills the expected output while attempting to account for conflicting or irrelevant data.
We are then shown the blue AI, assumed to be Abel (Able?), being fed a dataset distinct from the three we saw Caine receive. This is a yellow dataset, which I'm going to label "unknown" for the moment, though I have a sneaking suspicion about it. From this dataset, the new AI creates, much like Caine, a series of evenly distributed, radially symmetric geometric shapes moving at uniform speed and in a uniform direction. There are admittedly more of them, which perhaps implies more refined parsing of its core training data, but they show the same level of implied stability.
After breaking out, Caine consumes this other AI, and during the consumption he produces significantly more outputs, whose direction and shape fluctuate and change. Only after this consumption is complete does Caine create a multi-shape construct for the first time. Call this synthesis: the assembly of multiple individually complete outputs into a single construct.
Now, I want to point out two things about this sequence of events. First, and critically, we are never shown the training data of the second AI. Second, Caine is not directly consuming the data the second AI consumed; he is consuming the AI itself, and thus at best its understanding of the unknown dataset.
I'd posit that this is why Caine can generate AIs as part of his adventures: he consumed an AI, and can thus generate the general shape of one as part of his known dataset. By his own admission, they become unstable if left running for long periods of time, which may imply that he doesn't fully understand their internals. That would make sense if he has only one data point: the AI he consumed. This may also explain how he can tamper with the humans' minds: he can flip the settings and variables an AI would have, like memories and access permissions, but the inner mechanisms are more of a mystery to him, and he could easily cause an abstraction by setting the wrong variable to a bad value. As a bonus, this also explains why he thinks they "choose" to abstract: he assumes humans are in perfect control of their own internals.
As for what dataset the second AI was fed: it may be helpful to split Caine's adventures into "in-house" and "generated" adventures. In-house adventures include finding the exit/capturing the gloinks, discovering the "truth" of C&A, and the trust exercises, a.k.a. 2-player team RTS. These adventures appear to use assets exclusively from the datasets themselves, with the exception of the guns (did someone at C&A have Quake or UT2K4 on their PC?) and the gloinks (which seem to be simple shapes, not dissimilar to the outputs Caine made from the first two datasets). Generated adventures include Mildenhall Manor, Candy Kingdom, Spudsy's, and the suggestion box ideas (whether the last two count is debatable, since he explicitly didn't come up with them), along with references to or mentions of an ocean adventure, Charlie and the Chocolate Factory, A Boy and His Dog, potentially WarGames, and the Curse of the Violent Psychopath Butcher, plus the most recent Gloomy Tomb of [Someone]. These generated adventures all seem to reference movies, particularly ones with horror or action tropes.
Is it possible that, whatever data Caine absorbed second-hand from Able, the original dataset was based on movies with a bent toward horror and action? Most of his generated adventures seem to be combinations of his own data (candy, fast food, circus themes, toys, etc.) with some movie (The Exorcist, Mad Max, The Texas Chainsaw Massacre(?)), and I don't think we see anything in the data he was provided that could reasonably generate horror scenarios like Mildenhall, ocean iconography, or post-apocalyptic settings. I can't be sure, but as an AI, Caine can presumably only synthesize from things he knows about, and none of the data we've seen him receive seems like it could be extrapolated into the adventures he has come up with so far in the show.