Why is that the definition of a group?
if you’re anything like me, you were probably a little mystified when you first started reading about groups. groups are quite possibly the nicest algebraic objects; they're simple enough that their axioms fit on a postcard, but they're rich enough to have interesting structure theorems (the classification of finite simple groups, or cfsg, for instance). but there's an important question that never seems to get answered in intro resources:
why those axioms over anything else? why is it that a(bc) is equal to (ab)c, and not something like (ba)c?
when you start to develop some mathematical maturity, the answer starts to come into clearer focus. but i don't think you should need to wait! the underlying idea is actually fairly simple, and it teaches some great lessons about how modern mathematics is organised. so that's the point of this post: to explain the point of the group axioms in the purest sense possible.
Groups as symmetries — but not really
alright, so the textbook motivation for groups is that they model the concept of symmetry. but what does this mean, really? the problem is that, in this context, mathematicians mean something much broader than the everyday sense of the word symmetry. here are some examples:
the textbook example is a polygon of some kind. the symmetries of an equilateral triangle are 120° rotations and flips, for instance. when we say symmetry, we are really talking about a function which takes the polygon to itself, and preserves lengths and angles.
now take a line of dots (not necessarily evenly spaced), extending infinitely in either direction. the relevant "symmetries" here are the functions taking dots to dots which preserve the order of the dots.
take a finite collection of objects of any kind whatsoever; i'll call it X. i can define a "symmetry" of this collection to be a function f: X -> X which doesn't take any two objects to the same object. in this situation, the "symmetries" aren't preserving anything but the size of X: by this i mean that f(X) is the same size as X.
for a more complicated example, take a vector space V and consider all the bijective linear maps from V to V. the "symmetries" are preserving not just the size of V, but the addition and scalar multiplication operations in V as well.
with these examples in mind, i'll define the word "symmetry" in the way it's meant here. given some object X, a symmetry of X is a function f: X -> X which "keeps X the same" in a way we don't care to specify. in each of the examples above, the meaning of "keeps X the same" changed to "preserves lengths and angles", "preserves order", "preserves size", and "preserves operations".
so, this is the point of groups! groups model the things you can do to something while "keeping it the same" in some sense. and from this perspective, it might make sense why we'd want to codify and systematically understand this idea; sameness appears a lot in maths, so if we can prove something about sameness that's always true then that should be really useful.
Defining groups (the nonstandard way)
now that we have a proper motivation, a definition of groups is not too far away. however, it turns out to be different to the usual one. it basically follows from these 3 observations:
in all of the examples above, the symmetries took the form of functions from a set to itself. (in the case of polygons, the function took the points of the polygon to itself.) in addition, if a function keeps any set X the same, it should at the very least preserve the size of the set. therefore, a group should be a collection of bijections from a set to itself. the collection of all such bijections is usually denoted Sym(X), so a group G is a subset of Sym(X).
if you have two functions which keep X the same, then obviously doing one after the other should also keep X the same. this means a group should be closed under composition: if f and g are in G, then f o g should also be in G.
finally, whenever you do something that keeps X the same, you should always be able to undo it. this means a group should contain all its inverses: if f is in G, then f^-1 should also be in G.
so that leads us immediately to a definition of a group!
A group is a nonempty subset G of Sym(X) for some set X, which is closed under composition and contains its inverses.
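to make this concrete, here is a minimal sketch that checks the definition by brute force on a three-point set. the names (`compose`, `inverse`, `is_group_of_symmetries`) and the choice of storing a bijection as a tuple are my own assumptions for illustration, not standard notation.

```python
from itertools import permutations

# X = {0, 1, 2}; a bijection f: X -> X is stored as a tuple with f[i] = image of i.
X = range(3)
sym_x = set(permutations(X))  # all 6 bijections of X, i.e. Sym(X)


def compose(f, g):
    """Return f o g, i.e. the map x -> f(g(x))."""
    return tuple(f[g[x]] for x in range(len(g)))


def inverse(f):
    """Return the bijection that undoes f."""
    inv = [0] * len(f)
    for x, y in enumerate(f):
        inv[y] = x
    return tuple(inv)


def is_group_of_symmetries(G):
    """Check that G is closed under composition and contains its inverses."""
    closed = all(compose(f, g) in G for f in G for g in G)
    has_inverses = all(inverse(f) in G for f in G)
    return closed and has_inverses


rotations = {(0, 1, 2), (1, 2, 0), (2, 0, 1)}  # the three cyclic shifts
flip_and_turn = {(1, 0, 2), (1, 2, 0)}         # one flip plus one rotation

print(is_group_of_symmetries(rotations))      # True
print(is_group_of_symmetries(flip_and_turn))  # False
print(is_group_of_symmetries(sym_x))          # True
```

the second collection fails because doing the flip twice gives the "do nothing" map, which isn't in the collection.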
if you know anything about groups, you'll know that this doesn't really look like the textbook definition at all. in fact, the textbook definition doesn't even mention a set X in the first place, just the set G! it turns out that we can massage this definition and show that it's equivalent to the usual one; but first, let’s talk a bit about why you might want a different definition.
put simply, this definition has too much baggage due to the set X. to elaborate, consider the symmetries of a square and of a square pyramid: these correspond to different sets X, so functions will look very different on them, but the underlying geometric operations are the same! eliminating the set X from the definition would "purify" things: symmetries which were genuinely the same could be treated on equal footing.
ideally, then, we'd like to define a group as something independent of a set on which it acts. but how could we do this? well, it comes down to the 3 observations we made when defining them the first time:
"a group is a set of bijections G from a set X to itself." if we want to forget the set X, we can just take G to be a set.
"if two functions in G are composed, then that should also be in G." if G is just a set, then its elements aren't necessarily functions and composition might not make sense. how to fix this? the answer is basically inventing our way out of the problem: we just give G an operation which takes two elements and returns another element (which is exactly what composition does for functions). such an operation is called a binary operation.
"if a function is in G then so is its inverse." again, if G is just a set, inverses don't make sense. so we fix this in the exact same way as before: we just give G an operation which takes an element and outputs another element (like inversion does). this is called a unary operation.
so there we go! a group is just a set G equipped with a binary operation (standing in for composition) and a unary operation (standing in for inversion). done!
That's still not a group!
...except not quite. the problem with our definition is that not all binary operations behave like composition, and not all unary operations behave like inversion. let's think through some examples to show how this might be the case:
consider a rock-paper-scissors binary operation: the set is {r, p, s}, and the binary operation takes in two elements and returns the winning element. for example, r*p = p because paper beats rock, p*s = s because scissors beats paper, r*r = r because that's a tie, and so on. now imagine that r, p, s could be interpreted as functions from some set to itself, and that * could be interpreted as composition. if that were true, then r(p(s(x))) would need to be equal to both (r*p)(s(x)) and r((p*s)(x)), meaning (r*p)*s would need to equal r*(p*s). but this isn't the case: the former is p*s = s, while the latter is r*s = r! this means the rock-paper-scissors operation * can never be interpreted as composition.
consider another three-element set {0, 120, 240} corresponding to rotations by a given angle, and give it the usual operation of composition (so 0 o 120 = 120, 120 o 120 = 240, and so on). now, give this set a unary operation which takes each element and gives back the same element: 0^-1 = 0, 120^-1 = 120, 240^-1 = 240. this doesn't properly represent the "inverse" concept, because it's meant to be the same as undoing; however, doing 120 and then 120^-1 is the same as doing 240, which isn't doing nothing!
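both failures are easy to check by hand, but here is a small sketch that does it mechanically. the function names (`rps`, `rotate`, `fake_inverse`) and the `beats` table are just my own labels for the two examples above.

```python
# rock-paper-scissors: beats[a] is the element that a defeats.
beats = {"r": "s", "p": "r", "s": "p"}


def rps(a, b):
    """Return the winner of a versus b; a tie returns the common element."""
    if a == b:
        return a
    return a if beats[a] == b else b


left = rps(rps("r", "p"), "s")   # (r*p)*s
right = rps("r", rps("p", "s"))  # r*(p*s)
print(left, right)  # s r  -> the two bracketings disagree


def rotate(a, b):
    """Compose rotations by a and b degrees."""
    return (a + b) % 360


def fake_inverse(a):
    """The bogus 'inverse' that hands back the same rotation."""
    return a


print(rotate(120, fake_inverse(120)))  # 240, not the do-nothing rotation 0
```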
luckily, these examples give us ideas on what we should add to our definition to make it work.
the problem with the rps-operation is that function composition is always associative, meaning the bracketing doesn't matter, while the rps-operation isn't. so we'll add to our definition that the binary operation needs to be associative.
the problem with the {0, 120, 240} inversion was that it didn't correspond to undoing. we can make this precise by saying that x composed with x^-1 should be the same as doing nothing. of course, since G is just a set, it's not clear what "doing nothing" is supposed to mean; we can fix this by choosing a particular element of G, say e, to be the symmetry which does nothing. we codify its "doing nothing" property by saying that x composed with e is the same as just x.
and that's it! finally, we have arrived at the textbook definition:
A group is a set G equipped with an associative binary operation *, an identity element e for which x*e = x, and a unary operation ^-1 for which x*x^-1 = e.
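for a finite set, these axioms can be checked by brute force. the following is a rough sketch under that assumption; `is_group` is a hypothetical helper of my own, and it tests exactly the three conditions as stated above (plus the fact that the operation must land back in the set).

```python
from itertools import product


def is_group(elements, op, e, inv):
    """Check associativity, x*e = x, and x*x^-1 = e on a finite set."""
    elements = list(elements)
    # a binary operation on G must return elements of G
    closed = all(op(x, y) in elements for x, y in product(elements, repeat=2))
    associative = all(
        op(op(x, y), z) == op(x, op(y, z))
        for x, y, z in product(elements, repeat=3)
    )
    identity = all(op(x, e) == x for x in elements)
    inverses = all(inv(x) in elements and op(x, inv(x)) == e for x in elements)
    return closed and associative and identity and inverses


angles = [0, 120, 240]


def rotate(a, b):
    """Compose rotations by a and b degrees."""
    return (a + b) % 360


# the honest inverse rotates back the other way
print(is_group(angles, rotate, 0, lambda a: (360 - a) % 360))  # True
# the "inverse" that returns the same rotation fails x*x^-1 = e
print(is_group(angles, rotate, 0, lambda a: a))                # False
```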
all well and good! well, except...
How do we know that's enough axioms?
after all, we just looked at two examples and said "huh, functions do this but they don't do that, so we'll just add it to the definition". how do we know there aren't any examples which fit our new definition, but don't correspond to functions?
the answer is in the form of a beautiful result:
Cayley's theorem: Every group G can be viewed as a subset of Sym(X) with the usual composition and inversion operations, for some set X.
the proof? just take the set X to be G itself! then interpret each element g of G as a function from G to itself by taking any other element h to g*h. each of these functions has an inverse, namely the function taking h to g^-1*h, meaning they are bijections. also no two elements are represented by the same function, since e gets taken to a different element by each g. (it is an instructive exercise to figure out where exactly this argument breaks if you relax associativity!)
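here is the construction from the proof carried out on a small example. this is only a sketch: i've assumed G is the integers mod 4 under addition as a stand-in for a general group, and the names `left_mult` and `compose` are my own. storing a function as a tuple indexed by h only works here because the elements of G happen to be 0, 1, 2, 3.

```python
# G = integers mod 4 under addition, standing in for an arbitrary finite group.
G = [0, 1, 2, 3]


def op(g, h):
    """The group operation g*h."""
    return (g + h) % 4


def left_mult(g):
    """Interpret g as the function h -> g*h, stored as a tuple indexed by h."""
    return tuple(op(g, h) for h in G)


def compose(f, k):
    """Return f o k, i.e. the map h -> f(k(h))."""
    return tuple(f[k[h]] for h in G)


images = [left_mult(g) for g in G]

# each g becomes a bijection of G: every element is hit exactly once
all_bijections = all(sorted(f) == G for f in images)

# no two elements of G are represented by the same function
all_distinct = len(set(images)) == len(G)

# the operation * turns into composition of functions
respects_composition = all(
    left_mult(op(g, h)) == compose(left_mult(g), left_mult(h))
    for g in G for h in G
)

print(all_bijections, all_distinct, respects_composition)  # True True True
```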
in any case, this theorem tells us that our search is done. we have removed the set X from our definition, but we can always put it back and still get nice functions if we like. we've shown that this can be done when X is taken to be G, but the general concept is so important that it has been formalised:
Let X be a set and G a group. A G-action on X is a function φ: G -> Sym(X) such that φ(g*h) = φ(g) o φ(h) for all g and h in G.
in other words, a G-action on a set is just an interpretation of the elements of G as bijections on X, in such a way that composition looks the same as the binary operation. in essence, this is why my initial definition is never presented anymore: we don't want to deal with the extra baggage of X, and we don't even need to because it can be encapsulated as a G-action instead, which we know is always possible by cayley's theorem!
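as a quick illustration, here is a sketch that tests the defining condition of a G-action by brute force. i've assumed G is the group of quarter turns (integers mod 4) and X is the set of corners of a square; `phi`, `psi` and `is_action` are hypothetical names, with `psi` included as a deliberate non-example.

```python
corners = ["A", "B", "C", "D"]  # corners of a square, listed in order around it


def op(g, h):
    """The group of quarter turns: combine by adding mod 4."""
    return (g + h) % 4


def phi(g):
    """Candidate action: g moves each corner g places around the square."""
    return {c: corners[(i + g) % 4] for i, c in enumerate(corners)}


def psi(g):
    """A non-example: every g is sent to the same diagonal flip."""
    return {"A": "A", "B": "D", "C": "C", "D": "B"}


def compose(f, k):
    """Return f o k for bijections stored as dicts."""
    return {x: f[k[x]] for x in k}


def is_action(candidate):
    """Check candidate(g*h) == candidate(g) o candidate(h) for all g, h."""
    return all(
        candidate(op(g, h)) == compose(candidate(g), candidate(h))
        for g in range(4) for h in range(4)
    )


print(is_action(phi))  # True
print(is_action(psi))  # False
```

`psi` does send every element to a perfectly good bijection of X, but doing the flip twice gives the identity rather than the flip, so composition doesn't match the group operation.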
What was the point of all that?
now that that's done, the natural question is this: why present a worse definition first just to arrive at the usual one anyway? and here's the answer: because this kind of thing happens all the time in maths, but it usually only becomes clear once you've been dealing with it for years.
rings? they're just collections of endomorphisms of abelian groups, and their abstract definition allows them to be thought of that way.
banach algebras? they're just collections of bounded linear operators, and their abstract definition allows them to be thought of that way.
lattices can be defined completely algebraically, but once again, this definition essentially serves to abstract away the concrete notion of a lattice in terms of a partial order.
even categories themselves fall on this list! the definition of a category is bluntly abstract, but it's essentially an abstraction of a place where functions from one thing to another can be chained together.
if you are a beginner then you might not understand a whole lot of that list, and that's okay because you'll get to it eventually. but if you're not quite a beginner then i'm sure you'll see what i mean! all of this can be viewed through the lens of the yoneda lemma from category theory if you're interested in that kind of thing.
tl;dr: if you defined groups purely in terms of what they're used for, then they'd be subsets of Sym(X) for some set X. but since the notion of a group shouldn't depend on the set that it acts on, we instead take an abstract set with an operation and insist that the operation is associative with identity and inverses. these axioms are exactly what's needed to prove cayley's theorem, which tells us that we can always think of a group in terms of some Sym(X) if we want to. this pattern of definition-making is all over maths.