Intelligence and Wisdom

I'm back!

I've (just about) finished transitioning from living in the middle of nowhere with a job, to living in what claims to be a city with a loosely defined programme of study that happens to pay me a stipend. I'm spending all day reading papers on game theory and multi-agent systems (multiplayer games, essentially), so to demonstrate to myself that I've properly digested the material it seems like a good idea to excrete refined turds of knowledge onto this here blog. I'm too proud of that metaphor to apologise for it. ONWARDS!

----------------------------------------------

Ye olde role playing game Dungeons and Dragons has two player attributes that represent a character's ability to think, labelled Intelligence and Wisdom. Hardened D&D enthusiasts will understand the difference between them, but to neophytes the distinction isn't immediately obvious. Intelligence is favoured by wizards and similar magic-users, and represents the ability to reason and solve problems. Wisdom is favoured by priests and druids, and represents common sense and intuition.

Interestingly, this duality is supported by cognitive science, through something called dual process theory. This essentially says that we as humans use two different but complementary forms of reasoning in our day-to-day lives:

System 1 reasoning is unconscious, associative, emotional, intuitive, low cost and very, very fast. This presumably evolved to help us avoid getting eaten by surprise tigers, but occasionally misfires and gives rise to things like arachnophobia and racism. It's present in both humans and animals.

System 2 reasoning is conscious, logical, intentional, expensive and slow. It's only really present in humans (and perhaps some other high-functioning animals). This allows us to operate at more than a purely instinctual level through abstract hypothetical thinking and reasoning about the future.

By these descriptions, one can broadly infer that "System 1" is essentially "Wisdom" and "System 2" is essentially "Intelligence".

[[ I should probably add the disclaimer that like a lot of the weaker sciences there's a reasonable chance this dual-view is wrong, but it's a useful approximation for what follows ]]

As this is allegedly a part-time AI blog, I'm now going to link these concepts to how we develop thinking machines (a term I will use interchangeably with "agent"). Note that I personally view people as very complicated thinking machines, so a lot of this stuff might reflect back onto the cognitive stuff. This post isn't going to be particularly game/design heavy (or indeed that AI heavy, really), but it's hopefully a useful preliminary to future discussion.

Domain

At any given point in time, from any given state, any thinking agent is going to have a set of actions which it's able to perform. In a game world this might be something like: "move my knight to square D3". In the real world this might be something like "KILL THE HUMAN". After performing this action, the agent's internal representation of the world is usually expected to be different (e.g. "my knight is in square D3" or "THE HUMANS ARE DEAD").

The point is that the action space of the agent almost completely defines the representation of the world the agent will use to reason and plan. This base form (decision theory) can be extended to take into account actions by other agents (game theory), randomness (stochastic events), or even encode gaps in the agent's knowledge (information sets / Bayesian games).
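To make that concrete, here's a minimal sketch of a domain in its base (decision theory) form. Everything in it is made up for illustration: the world is a row of five squares, and the names `State`, `actions` and `result` are my own rather than anything standard.

```python
from typing import NamedTuple


class State(NamedTuple):
    """Hypothetical world state: the agent's square on a track numbered 0..4."""
    position: int


def actions(state: State) -> list[str]:
    """The set of actions the agent is able to perform from this state."""
    moves = []
    if state.position > 0:
        moves.append("left")
    if state.position < 4:
        moves.append("right")
    return moves


def result(state: State, action: str) -> State:
    """The agent's internal picture of the world after performing an action."""
    step = {"left": -1, "right": 1}[action]
    return State(state.position + step)


start = State(0)
print(actions(start))
print(result(start, "right"))
```

Note that the state only records what the actions can change, which is the sense in which the action space shapes the representation.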

Goals

From the above, we assume that the agent is endowed with an internal view of the world, along with a mapping from states to action sets, allowing it to mentally transition from one state to another by pretending to do things. But why should it even consider doing anything? Life is meaningless.

Game theory works on the assumption that rational agents act to maximise utility, which as I've mentioned before on this blog is just a way of expressing a preference ordering over world states - for example, a world containing alive humans might be less preferable than a world containing dead humans. Specifying such an ordering on reachable states (perhaps through a numeric function) gives the agent a purpose - that is, to change the world so that the agent's utility is as high as possible.
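As a rough sketch, a numeric utility function is all you need to turn a pile of world states into a preference ordering. The world description (`humans_alive`) and the utility itself are assumptions invented for this example.

```python
# Hypothetical world states, described only by how many humans are alive.
worlds = [{"humans_alive": 3}, {"humans_alive": 0}, {"humans_alive": 1}]


def utility(world: dict) -> float:
    """Assumed numeric utility: this particular agent prefers fewer humans."""
    return -world["humans_alive"]


# Sorting by utility recovers the preference ordering, most preferred first.
ranked = sorted(worlds, key=utility, reverse=True)

# The agent's purpose is then to steer towards the top of that ordering.
best = max(worlds, key=utility)

print(ranked)
print(best)
```

Only the ordering matters here: doubling every utility value, or adding a constant, would leave the agent's choice unchanged.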

For us squishy meatbags, in the absence of a mysterious creator who will unconditionally provide us with an afterlife, I believe that our utility function is entirely determined by the chemicals swirling around our brains that make us feel happy. World states where we believe we will be happy (and continue to be happy, through some time-discounted filter), are the only motivator we have to do anything. Boom. Meaning of life solved.

There's an important distinction here between "selfish" and "self-interested" - a "selfish" agent only cares about itself, but a "self-interested" agent only cares about maximising its utility, which can easily include the utility of other agents or even have no term relating to the original agent itself (though if all agents do this it ends up in infinite recursion and breaks the universe. Hippies).

Intelligence

So, given the agent has a method of comparing preferences over world states...what action should the agent perform now?

Each action the agent is capable of taking will usually change the state of the world - even doing nothing often allows the world to evolve without the agent's input. This means that the set of world states and actions can be viewed as a game tree, with the states as nodes and the actions as edges connecting them - the root node is the state the agent is currently in (i.e. the present), and all the subsequent nodes represent possible futures.

The agent's task is therefore to search this game tree as efficiently as possible to locate the immediate action that is most likely to lead to the highest utility reward. Conceptually, this is (hopefully) fairly straightforward.
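Here's a minimal sketch of that search for a single agent in a deterministic world, with no cleverness whatsoever. The domain (a number line where the agent likes square 3) and the function names are assumptions for illustration, and `depth` needs to be at least 1.

```python
def actions(state: int) -> list[str]:
    """Every state offers the same three actions in this toy domain."""
    return ["left", "right", "wait"]


def result(state: int, action: str) -> int:
    """Deterministic transition: where the agent ends up after acting."""
    return state + {"left": -1, "right": 1, "wait": 0}[action]


def utility(state: int) -> float:
    """Assumed preference: the closer to square 3, the better."""
    return -abs(3 - state)


def best_value(state: int, depth: int) -> float:
    """Highest utility found at the leaves after `depth` further actions."""
    if depth == 0:
        return utility(state)
    return max(best_value(result(state, a), depth - 1) for a in actions(state))


def choose_action(state: int, depth: int) -> str:
    """Pick the immediate action whose subtree contains the best future."""
    return max(actions(state), key=lambda a: best_value(result(state, a), depth - 1))


print(choose_action(0, 3))
```

With three actions per state, `choose_action(state, depth)` looks at 3^depth leaves, which is exactly the problem the next few paragraphs complain about.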

The main problem is unfortunately combinatorial. Trees grow exponentially with their depth, so searching the entire tree is rarely a feasible option as decisions usually need to be made in a reasonable time. Evaluating large trees also has a large working memory requirement, which limits how deep the search can go. In addition, all the other complexities in the model (other agents, random events, imperfect information) make it hard to simplify the tree and add further complications to searching it.

These problems can be tackled in three different ways:

Brute force: Throwing more processing power at a problem naturally makes it solvable in less time - though this might be more costly overall. As tree search problems grow exponentially with depth, a linear increase in processing power is likely to buy you only a logarithmic increase in how deep you can search (read: usually not worth it). Similarly, increasing working memory capacity allows only logarithmic improvements in reachable tree depth.
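A quick back-of-the-envelope sketch of why brute force disappoints, assuming an idealised tree with a constant branching factor b, so that a budget of N nodes reaches a depth of about log N / log b. The function name and the numbers are made up for illustration.

```python
import math


def reachable_depth(node_budget: float, branching: int) -> float:
    """Depth d at which a uniform tree has about `node_budget` leaves (b**d = N)."""
    return math.log(node_budget) / math.log(branching)


b = 10  # assumed branching factor
for budget in (1e6, 2e6, 1e7):
    print(f"budget {budget:.0e}: depth {reachable_depth(budget, b):.2f}")
```

Under these assumptions, doubling the hardware buys roughly a third of an extra level of lookahead, and it takes ten times the hardware to see one full move further.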

Better algorithms: Certain properties of tree search are independent from the domain. These can be exploited to "prune" branches from the tree corresponding to actions the agent would never take. Other generic optimisations include a "correct" implementation of rationality in the face of uncertainty (which for me is Bayesian objectivism - other incorrect definitions of rationality are available), and the heuristic-independent witchcraft of Monte Carlo Tree Search (which leverages the ability to quickly try out a lot of hypothetical scenarios and average the results).
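One well-known example of domain-independent pruning is alpha-beta search for two-player zero-sum games. Below is a minimal sketch in its negamax form. The game is an assumption chosen because it fits in a few lines: players alternately take 1 to 3 stones from a pile, and whoever takes the last stone wins.

```python
def negamax(stones: int, alpha: int = -1, beta: int = 1) -> int:
    """Value of the position for the player about to move: +1 win, -1 loss."""
    if stones == 0:
        return -1  # the other player just took the last stone, so we have lost
    best = -1
    for take in (1, 2, 3):
        if take > stones:
            break
        # The opponent's best outcome is our worst, hence the sign flips.
        value = -negamax(stones - take, -beta, -alpha)
        best = max(best, value)
        alpha = max(alpha, value)
        if alpha >= beta:
            break  # prune: the remaining moves cannot change the result
    return best


print(negamax(5))
```

Nothing in `negamax` knows anything about stones beyond the move generator and the terminal test, which is the sense in which the pruning is independent of the domain.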

Make use of heuristic approximations: OK, I'm getting ahead of myself, but this really does need to go here. The search algorithm needs to link up with the heuristics somehow. But what are heuristics? *pauses for effect*

Wisdom

So our agent is now able to reason about the world by simulating actions, has a preference over world states to give it some direction, and is capable of searching possible futures to identify the best decision in the present. The only problem is, the world in its raw form - even for relatively simple domains like Chess - is likely to be extremely complicated in practice, and the amount of processing power available is likely to be extremely limited. This means only a very small fraction of the tree will usually be searched in a reasonable time, resulting in a terrible immediate decision.

This, as you may have surmised from the section header, can be solved by "wisdom", in the form of heuristics. The gist of these techniques is that they compress complex features of the domain down to easily digestible chunks, which allows cheap reasoning at the expense of complete accuracy. However, they require specific knowledge about the domain and goals, which means they usually have to be learned through experience or specified by a designer.

Depending on the processing resources available to achieve a particular goal or subgoal, these can be relied upon heavily (resulting in cheap and dirty "system 1 reasoning"/"wisdom") or mostly ignored (resulting in high quality but expensive "system 2 reasoning"/"intelligence"). The degree to which this occurs is, again, either something that can be learned through experience, or is (more likely) a conscious decision by a designer. There are a LOT of ways of implementing heuristic methods, but essentially they all boil down to modifications to the tree search. Examples include:

Abstraction: The size of the universe is clearly bigger than the size of our brains (or other computational artefact), by the simple observation that one is embedded within the other. Thus, when we think about the future we're not actually manipulating a copy of the world itself, but an abstracted representation of it - and we still manage to (just about) function as intelligent entities. Dealing with a more compressed representation is cheaper, but less accurate, so it's important to find the correct level of abstraction for the task at hand.

In a general domain, we can reason at the base level of actions and reactions, but if the goal is very deep in the tree it's unlikely we'll ever reach it. For example, if the domain is at the level of "move arm 10 degrees upwards", "tighten pincers by 1cm", then a goal of "crush that human who's standing over there" requires a huge chain of actions to accomplish. In this case, it's a good idea to view the world in several levels of detail:

- At the top level will be actions like "move next to human", "crush human" and so on. This allows us to create a high level plan consisting of a small number of steps to reach our goal.
- At an intermediate level will be things that break down the higher level actions into smaller ones, such as "move one step north", "raise arm to optimal crushing height", "crush". Each action in a higher level plan can be expressed as a plan in an intermediate level. There can be any number of these, depending on the complexity of the problem.
- Finally, at the bottom level will be the raw domain-defined actions themselves, which will be used to construct a plan for each step of the intermediate level above.

In general, this approach won't find an optimal plan, but it will typically find one that achieves the goal relatively efficiently in a reasonable amount of time.
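The layered idea can be sketched as a table of refinements that gets expanded from the top down. To be clear about what's assumed: the table contents, the action names beyond those quoted above, and the `expand` function are all invented for illustration, and a real planner would search for a plan at each level rather than look one up.

```python
# Hypothetical refinement table: each abstract action maps to a plan one level down.
# Anything not listed is treated as a raw, domain-defined action.
REFINEMENTS = {
    "crush that human": ["move next to human", "crush human"],
    "move next to human": ["move one step north", "move one step north"],
    "crush human": ["raise arm to optimal crushing height", "crush"],
    "move one step north": ["rotate wheels forward"],
    "raise arm to optimal crushing height": ["move arm 10 degrees upwards"] * 3,
    "crush": ["tighten pincers by 1cm"] * 2,
}


def expand(action: str) -> list[str]:
    """Recursively replace an abstract action with a sequence of raw actions."""
    if action not in REFINEMENTS:
        return [action]
    plan = []
    for sub_action in REFINEMENTS[action]:
        plan.extend(expand(sub_action))
    return plan


plan = expand("crush that human")
print(plan)
```

The top-level plan is only two steps long, and each step is refined independently of the others. That independence is what keeps the work manageable, and also why the resulting plan isn't guaranteed to be optimal.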

Evaluation functions: If there are a small number of goal states, but a large number of intermediate states, it's easy for a search algorithm to cover a large portion of the tree without ever getting near a goal. For example, without any further guidance a robot capable of moving in any direction might first investigate a route directly away from the goal location, and only stumble upon the goal once it had exhausted all other options. This is clearly bullshit. The missing sense of direction isn't quite expressed by the utility function given in the goals section - intuitively, an agent is only happier being closer to a goal node if it knows that it's closer to a goal node, and if it knew that with perfect accuracy then there would be no need to form a plan to get there. What an agent could have is a rough idea of how close a certain state is to a goal - this is known as an evaluation function, and its purpose is to concentrate the search along the most promising parts of the tree, even if an optimal solution may be missed (though in some cases, an optimal solution can be guaranteed). If the goal is sufficiently far away, the tree can be truncated at an arbitrary depth and the evaluation function used to provide the "utility" at the leaf nodes, hopefully guiding the agent along the right path initially and then searching again later.
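Here's a minimal sketch of an evaluation function steering a search, using greedy best-first search (one of several ways to use such a function) on an empty square grid. The grid, the Manhattan-distance estimate and the function names are assumptions for illustration.

```python
import heapq


def manhattan(cell, goal):
    """Evaluation function: a rough guess at how far `cell` is from the goal."""
    return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])


def best_first(start, goal, size=10):
    """Greedy best-first search on an empty size x size grid.

    Always expands the cell that looks closest to the goal, and returns the
    number of cells expanded before the goal was reached (None if unreachable).
    """
    frontier = [(manhattan(start, goal), start)]
    seen = {start}
    expanded = 0
    while frontier:
        _, cell = heapq.heappop(frontier)
        expanded += 1
        if cell == goal:
            return expanded
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nxt[0] < size and 0 <= nxt[1] < size and nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (manhattan(nxt, goal), nxt))
    return None


print(best_first((0, 0), (9, 9)))
```

Going corner to corner on the 10 x 10 grid, this expands 19 of the 100 cells, because every step moves to a cell that looks strictly closer. An uninformed search starting in the same corner would typically wander over most of the grid first.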

Opponent modelling: This point only applies to simulation of other agents, so isn't applicable in all domains. Accurately modelling how other agents will behave in certain situations increases the validity of the simulation, as you won't waste time considering actions they'd never perform. However, doing this well is costly. This type of empathetic reasoning is complicated - most animals can't do this, and humans slowly develop this during early childhood (see Theory of Mind). It's unlikely there will ever be a way of doing this that's independent of knowledge about the agents you're modelling - even the two-player zero-sum case makes the assumption that your opponent gains exactly as much utility as you lose from your misfortune - so I'm including it here.
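As a deliberately crude sketch of the idea, here's an agent that models its opponent as "will keep doing whatever it's done so far" and picks the best response to that. The game (rock-paper-scissors), the frequency-count model and the function name are all assumptions for illustration, and the observed history must be non-empty.

```python
from collections import Counter

# BEATS[a] is the move that a defeats.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def best_response(observed: list[str]) -> str:
    """Move with the highest expected payoff against the opponent's past habits."""
    counts = Counter(observed)
    total = len(observed)

    def expected(move: str) -> float:
        # Win when the opponent plays what we beat, lose when they play what beats us.
        win = counts[BEATS[move]] / total
        loss = sum(counts[other] for other in BEATS if BEATS[other] == move) / total
        return win - loss

    return max(BEATS, key=expected)


print(best_response(["rock", "rock", "paper", "rock"]))
```

The model is only as good as the knowledge behind it: against an opponent who notices the pattern and adapts, this agent is trivially exploitable, which is rather the point of the paragraph above.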

Forward Pruning / Reflexes: These are essentially the same thing. Given a world state, it's often possible to extract salient features and categorise it somehow. Once categorised, it's often possible to ignore some or all-but-one of the actions available to the agent at this point, because it's "usually" incorrect to perform them. For example, all positions next to a human could be categorised as "human_adjacent". In such positions, with the motivation of liking world states with fewer humans, it's clearly incorrect to do anything other than destroy them - thus actions that generally don't result in the destruction of said human (such as "run away", "rotate slowly", and "whistle") should usually be ignored. Empirical testing may have shown that nearly all humans aren't destroyed by whistling, so even if this particular human has a weakness to high-pitched noise, it's probably not worth trying over more mainstream butchery. Reflex actions are a special case of this where only one action is left. An agent can be constructed using nothing but reflex actions, but in this case the decision tree turns into a decision line, and it ceases to be intelligent. (Reflexes in humans/animals are a similar idea, except I suppose a more accurate definition would be that the action space is heavily reduced rather than reduced to exactly one option, mostly due to it being continuous rather than discrete.)
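A minimal sketch of forward pruning as a lookup from state category to the actions still worth searching. The "human_adjacent" category comes from the example above. The other categories, the distance thresholds and the action names are assumptions invented for illustration.

```python
ALL_ACTIONS = ["destroy", "approach", "run away", "rotate slowly", "whistle"]

# Hypothetical pruning rules: for each category of state, the actions that are
# still worth searching. Categories with no rule keep the full action set.
PRUNED = {
    "human_adjacent": ["destroy"],  # only one action left, so this is a reflex
    "human_in_sight": ["approach", "rotate slowly"],
}


def categorise(state: dict) -> str:
    """Extract a salient feature (assumed here: distance to the nearest human)."""
    if state["human_distance"] <= 1:
        return "human_adjacent"
    if state["human_distance"] <= 10:
        return "human_in_sight"
    return "nothing_interesting"


def candidate_actions(state: dict) -> list[str]:
    """Actions the tree search should bother expanding from this state."""
    return PRUNED.get(categorise(state), ALL_ACTIONS)


print(candidate_actions({"human_distance": 1}))
print(candidate_actions({"human_distance": 50}))
```

A search would call `candidate_actions` in place of the full action set when expanding a node. If every category mapped to a single action, there'd be nothing left to search, which is the decision line described above.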

Conclusion/Summary

The domain is the action space the agent lives in.

The goals specify where the agent wants to live.

The ability to perform tree search efficiently (intelligence) allows the agent to reason about the future, in order to reach its goals.

The ability to recognise patterns in the world (wisdom) allows the agent to simplify the search process and reach near-optimal conclusions with considerably less effort.

This post turned into something of a monster. I'm intending to refer back to it a lot in the future, so it's likely to undergo excessive revision as I change my mind about things and generally restructure.

Hopefully this was a useful way of thinking about thinking :)

#ai #cognitive science #overview #agent design