>If the dual problem is completely infeasible–there’s no way to satisfy the constraints–then the primal problem is unbounded, which means you can get infinite profit. I’m confused— if you can’t make costs such that you receive >$5 per thing, why can you get infinite profits? Wouldn’t the problem just, not work?
First, I should confess that I did make an error in my original post on linear programming. If the primal is unbounded, then the dual is infeasible. But it’s possible for both to be simultaneously infeasible.
Still, even after that correction, your question is interesting, so I’ll try to answer it. The unhelpful answer is that the Strong Duality Theorem tells us that we can convert an optimal solution to the primal into an optimal solution for the dual, and vice versa.
Proving this would require being a bit more precise about exactly how you construct the dual, but it’s a fundamental fact about the equivalence in the linear case; but basically, because of the way the constraints are set up, a solution that gives you a finite maximum profit corresponds exactly to a solution that gives you a finite minimum cost.
Probably the best way to intuitively understand this is to think about a competition. (This isn’t even really a metaphor; basically any two-player constant-sum game can be translated pretty trivially and straightforwardly into a linear programming problem, and vice versa).
Suppose you’re the primal problem player---you’re trying to maximize your profit, but you have to take into account the fact that your opponent is trying to minimize your profit. The constraints here represent your opponent’s optimal strategy.
Then if you have have a best possible outcome, that means your opponent has a strategy that holds you to that outcome. So if you have a finite feasible solution, so do they.
In this framework, the dual problem is instead asking to find your opponent’s optimal strategy, assuming that you play optimally. And if your opponent has a best possible result, that implies a strategy on your part that means they can’t do any better.
And framed this way, you can see why a solution on one side basically implies a solution on the other. If I want to know what my optimal play is giving optimal play on the part of my opponent, I basically need to know what optimal play on the part of my opponent is.
The thing we’ve lost in moving to the games framework is the possibility of unbounded payouts and infeasible setups---because games always have legal moves, and games with finitely many moves will always have a maximum payout. But not all linear programming problems are set up that way.
If my side has an unbounded payout---if I can pick a number and earn that much---then conceptually, my opponent has no choices at all. This corresponds to their problem being infeasible. (Recall, any solution on their part represents their ability to impose some upper bound on my winnings). Similarly, if they have an unbounded payout, that means I have no valid moves.
The possibility I left out in the original piece is the last one: neither of us has any legal moves at all. I don’t think it comes up much in practice but it’s definitely possible.
@raginrayguns said: So I just learned a geometric trick (in the sense of, a trick you can do by drawing a particular picture) for finding rational points on the unit circle. Is that a good way of thinking of this or should I not be thinking of it as a trick? I mean am I supposed to be thinking “so clearly to find rational numbers, you draw this vertical line, etc etc”
This is a really good question (in response to this post), and so I had to think a while to figure out how to answer it.
I think there are two different "big ideas" you could get out of that post. One is I think the one you're getting at. You can find new points by intersecting lines and curves.
I associate this mostly with algebraic geometry. One of the foundational results in algebraic geometry is Bezout's theorem, which says that (up to some very important technicalities) the intersection of a degree $n$ curve and a degree $m$ curve contains exactly $nm$ points. We can see that in the example I gave; we intersect a line (degree 1) and a circle (degree 2) and get two points.
But $x^4+y^4 =1$ is degree 4, so we expect our lines to intersect it four times. Instead there's only the one intersection. What's wrong?
The answer is that one of the requirements for Bezout's theorem to hold is that we're working over an "algebraically closed" field--i.e. every polynomial equation has a root. The complex numbers are algebraically closed; the reals and the rationals are not. So there are four complex points of intersection between $x^4 + y^4 = 1$ and $y = 2(x+1)$, but there's only one rational intersection.
So this idea is very important to algebraic geometry, which tends to just work over algebraically closed fields. But it's not very helpful in number theory, which wants to ask questions about the rational numbers.
The other idea is the one I want to think of, which is the idea of parametrization. (This is important in both algebraic geometry and in algebraic number theory, and eventually leads to moduli spaces which are really cool and I don't understand very well).
We didn't just prove that there are infinitely many pythagorean triples. We gave a way of finding all of them. Contrast with the case of prime numbers: we can fit a proof that there are infinitely many into a haiku, but we don't have any way of finding more that's much better than brute force.
But we actually have something even better than that. Not only can we generate all the reduced triples, we have them aligned with some other structure--the reduced triples are in a natural correspondence with the rational numbers. And the naturalness of this correspondence means that information goes both ways--e.g. a larger rational number corresponds to a more obtuse measured angle in the corresponding right triangle, which you can see from the picture.
A lot of the work of number theory involves showing that all your points are in correspondence to "something else" or have some "extra structure" on them. For instance, one of the reasons we like elliptic curves so much is that the points form a group, meaning we can add together two points (in a non-obvious geometric way) and get a new one. So we can ask, not only how many points an elliptic curve has, but what their structure is.
So that's the big idea I'd draw from this. If we want to understand something complicated and apparently irregular, it helps if we can use geometry to put them in a useful correspondence with a structure we understand really well, like "a vertical line" (for pythagorean triples) or "a lattice of integer points" (for an ellpitic curve).
I wrote up a thing for my upcoming number theory class, and I thought some people on here might be interested. This is an attempt to give the flavor of how arithmetic geometry (my subfield of number theory) works, and what it looks like. Reminder that the math should render on my blog.
We’ll start with the pythagorean theorem, which states that if a right triangle has side lengths of $a$ and $b$ and a hypotenuse of length $c$, then $a^2 + b^2 = c^2$. Most of you can probably think of a set of integers that solve this equation; two common examples are $(3,4,5)$ and $(5,12,13)$.
Definition: A pythagorean triple is a triple of integers $(a,b,c)$ that satisfies the equation $a^2 + b^2 = c^2$.
However, it’s not actually that easy to find these pythagorean triples. If you pick a random pair of integer side lengths, the hypotenuse will usually not be an integer; famously, the Pythagoreans discovered irrational numbers by considering the right triangle with side lengths $(1,1, \sqrt{2})$—and then found the concept of irrational numbers so religiously disturbing that revealing it to outsiders punishable by death.
So integer solutions definitely exist, but are not trivial to find. So we might ask how many solutions there are.
Question: How many pythagorean triples exist?
It turns out that as phrased, this question is actually really easy to answer. Stop and think about it for a minute. I’ll wait.
We know that $(3,4,5)$ is a pythagorean triple; we can see that so is $(6,8,10)$ and $(9,12,15)$ and so on. In general, if $(a,b,c)$ is a pythagorean triple, and $n$ is an integer, then $(na, nb, nc)$ is also a pythagorean triple.
But this seems like cheating! We really only have one solution, we’re just “rescaling” it a bunch. We’d like to not “double-count” solutions like this.
Definition: A reduced pythagorean triple is a pythagorean triple $(a,b,c)$ of integers with no common factors.
This prevents double-counting: $(3,4,5)$ is a reduced pythagorean triple, but $(6,8,10)$ is not. So we now ask a better question:
Better Question: How many reduced pythagorean triples exist?
Let’s return to the equation $a^2 + b^2 = c^2$. To take into account the “no common factors”, we can divide through by $c^2$, and we get the equivalent equation $(a/c)^2 + (b/c)^2 = 1$. We can no longer ask for integer solutions to this equation since we’ve divided by $c^2$, but we can ask for rational solutions.
Thus we see that every reduced pythagorean triple gives us a rational solution to the equation $x^2 + y^2 =1$ (that is, a solution where $x$ and $y$ are both rational numbers); it also turns out that every rational solution to $x^2 + y^2 = 1$ gives us exactly one reduced pythagorean triple.
(If $(p/q)^2 + (m/n)^2 = 1$, then $(pn)^2 + (qm)^2 = (qn)^2$, and if $p/q$ and $m/n$ are in lowest terms then $pn, qm, qn$ will have no common factors so we get a unique reduced triple).
Even Better Question: How many rational number solutions does the equation $x^2 + y^2 = 1$ have?
You may notice that this equation is the equation for a circle, and thus we can rephrase this by asking how many rational points there are on a circle. (You might think a circle obviously has infinitely many points. It definitely has infinitely many real points, but that doesn’t mean it has infinitely many rational points; we’ll come back to this idea later).
Above, we changed our original question into a question about geometry! (Hence the name “arithmetic geometry”). And since we’re studying geometry, we can draw a picture, with four elements:
The unit circle, centered at zero. (This has the points we’re trying to count).
A point at $(-1,0)$, the leftmost point of the circle.
A vertical line $x = 1$ tangent to the rightmost point of the circle.
A line connecting the point at $(-1,0)$ to a point $(1,p/q)$ on the vertical line.
For any rational number $p/q$, the line will intersect the circle in exactly two points: $(-1,0)$ and one other. Since there are infinitely many rational numbers, this seems like it should answer our question, and tell us that there are infinitely many reduced triples. But we need to make sure that we have a rational point.
There are two ways to see that we do in fact get a rational point. The “fast” way is to see that the set of points where the line intersects the circle are the solutions to a quadratic equation; if a quadratic with rational coefficients has one rational solution, it must have two.
The other method is to crank through the algebra; if you don’t care about the details, go ahead and skip to the end of this section. For everyone who’s still here, let’s see where the second intersection happens. The equation of this line is: $\begin{align} y - y_1 & = \frac{y_2 - y_1}{x_2-x_1} (x-x_1) \\\\ y - 0 & = \frac{p/q}{1 - (-1)} (x- (-1)) \\\\ y & = \frac{p}{2q} (x+1) \end{align} $
We can substitute this in to the equation for the circle and get
One of these possibilities is just $\frac{-p^2 - 4q^2 }{p^2 + 4q^2} = -1$, which makes sense, since we knew $(-1,0)$ was a solution to the original equation. The other solution is $\frac{4q^2 - p^2}{4q^2 + p^2}$, which is clearly a rational number. Further, we can solve for $y$:
which is clearly a rational number. Thus for every rational number in simplest terms $p/q$, we have that $\displaystyle \left( \frac{4q^2 - p^2}{ 4q^2 + p^2 }, \frac{4pq}{4q^2 + p^2} \right)$ is a rational solution to $x^2 + y^2 =1$, and thus we have our result:
Proposition: For every rational number $p/q$ in lowest terms, there is a pythagorean triple $$ \left( 4q^2 - p^2, 4 p q, 4q^2 + p^2 \right) . $$ If $p$ is odd this triple is reduced; if $p$ is even it is reduced after dividing each term by 4. Thus there are infinitely many distinct reduced pythagorean triples.
People often use the formula $$ \left( q^2 - p^2, 2pq, q^2 + p^2 \right) $$ which is equivalent after a substitution of $q$ for $2q$ but is a bit harder to link up with the pictures, I think. Thanks to various responses that kept calling me on sloppiness here until I get it right. Always prove your claims, kids!
Now let’s generalize a bit. What if we instead look at equations of the form $a^3 + b^3 = c^3$? Or $a^4 + b^4 = c^4$, or some other exponent? You might think we could make the same sort of geometric argument, and draw a line and see where it intersects our new shape; here are the corresponding pictures for $x^3 + y^3 = 1$ and $x^4 + y^4 = 1$:
Looking at these pictures, it would be surprising if those curves never hit a rational number. But famously, this is exactly what happens! In fact, this is the content of Fermat’s Last Theorem, which was proven by Andrew Wiles in 1994.
Theorem: Suppose $x$ and $y$ are rational numbers, and $n$ is a positive integer, with $x^n + y^n =1$. Then either $n=1$, $n=2$, $x=0$, or $y=0$.
That is there are no non-trivial solutions to this equation for $n >2$. In other words, the blue curves in the pictures above never hit any rational-valued points. You can stretch them by any integer factor–or, indeed, any rational factor–and they will never pass through any points whose values are integers. Somehow they manage to dodge all of them.
Here is my semi-long promised post on the theory of Linear Optimization or Linear Programming. About three thousand words under the cut, but there are pictures and I think it's readable. Please ask if you have questions!
Summary: Optimization problems are a way of maximizing some output function (say your profit) subject to some constraints. If the output function and all the constraints are linear, solving this is surprisingly easy; I explain some of the method with pictures. We can view the process in a few different ways, and introduce the concept of duality. I talk about generalizations, including convex optimization (which, surprisingly, is almost as easy as linear optimization) and integer programming (which, surprisingly, is not).
Also, there are pictures of cats.
From a mathematical perspective, optimization is the question of maximizing some "objective function" subject to "constraints." The objective function is just a function that describes how much of something we get--it can measure profit or utility or fuzzy bunnies, whatever our goal is. (It's an "objective" function in the sense of "mission objective," not in contrast to a subjective function). The constraints are restrictions that tell us what we're allowed to do. They can describe prices of inputs, or the breeding rate of rabbits, or the rule that we aren't allowed to restrict speech, or any other limitations we want or need to put on things.
An optimization algorithm is a technique for finding this maximum. It's not too hard to state a problem where you can tell that a maximum has to exist, but it's not at all obvious what you need to do to get that best result. (If your boss comes to you and asks what decision will maximize your firm's profit, the answer "some decision, definitely" is not terribly helpful). So we're talking about ways of not just showing that some optimization problem has a solution, but of actually finding the solution.
In this post we're specifically discussing linear optimization, which means that the objective function is a linear function and the constraints are all given by linear inequalities (that is, they look like $f(\mathbf{x}) \leq 0$ where $f$ is a linear function). (We actually can relax this a bit and ask about convex functions and reasons; I'll come back to this thought at the end).
These techniques were originally developed by Leonid Kanotorovich, a Soviet mathematician who was trying to solve the central planning problem. (You can read more about the history in slatestarscratchpad's mainblog post on Red Plenty, and even more if you go read the actual book).
Kantorovich's constraints were the inputs it takes to make each good (at each factory etc); and his objective function was...Uh, well, one of the fundamental problems of central planning is that it's not clear what you should be optimizing for. Kantorovich sidestepped this by actually starting with a fixed output and trying to minimize the amount of inputs necessary. This turns out to be equivalent, and we'll come back to this equivalence later.
(You might worry about whether the actual constraints in the real world are linear. The answer is probably "no, not really." Fortunately, we can generally get pretty good results by approximating non-linear functions by linear functions--this is in fact the main point of derivatives, see my post on linear functions for more. It is true that this only works as long as we stay away from extreme edge cases, but under the quite reasonable assumption that we don't want to devote our entire economy to making men's shoes, size nine, gray, this probably isn't too big a problem).
These techniques are currently used by almost every big logistical organization: FedEx, for instance, has a massive army of people using these and related techniques to optimize their shipping organization. So while the original goal of "saving the Soviet Union's economic planning" didn't really work out, the techniques do have tremendous practical applications. After all, a corporation is just a planned economy with a potentially well-defined objective function.
So how does linear optimization work? Let's start by drawing some simple pictures. Suppose we have an economy that can produce either guns or butter--
Actually, no, wait, that's boring. Suppose you run a company that makes two products: board game versions of Tetris and Rubik's cubes that deliver electric shocks to their users. (No, I don't know why you would make and sell either of those things. Or who's buying them from you. It's your company, you tell me). And suppose you make \$5 of profit for each pointless board game and \$3 of profit for every dangerous math toy.
Your objective, presumably, is to maximize profit. We can write $T$ for the number of tetris games you make, and $R$ for the number of Rubik's Cubes you make. So your objective function is $f(T,R) = 5T + 3R$, and we can check that this is a linear function. (Note: the function stops being linear if you assume the amount you make has an effect on the price, so these techniques are most useful when no one manufacturer is a monopolist or anything).
But with just this information, you obviously maximize your profit by making "as many toys as possible." In order for the question to be interesting, we need some constraints. First of all, you (sadly for everyone else) can't make negative toys, so we have $T \geq 0$ and $R \geq 0$. Then let's say that you can make at most a hundred stupid tchotchkes a month. We can render this with the constraint equation $T + R \leq 100$. At this point the problem is still easy--the Tetris game is more profitable than the Rubik's cube, so we should make 100 of the Tetris games for a total profit of $500.
So let's add another constraint. Maybe both toys require a certain amount of plastic, and it's hard to find sufficiently low-quality plastic so you can only get 300 ounces a month. The Tetris game takes 4 ounces of plastic to make, and the Rubik's cube only takes two ounces. We have a second constraint, now, that $4 T + 2 R \leq 300$. At this point you can probably still solve the problem in your head. We want to make as many Tetris games as possible, so what's the most we can make? A little experimentation or calculus will show that you can make fifty Tetris board games and fifty electrified Rubik's cubes, and we get a total profit of $f(50,50) = 5\cdot 50 + 3 \cdot 50 = 400$ dollars.
Let's add one more contstraint. You probably don't want to work more than, say, 55 hours a month, and it takes forty minutes to make a Tetris game and only thirty to make a Rubik's cube. This gives us a third constraint, which is that $40T + 30 R \leq 3300$. And now this is hard to think about in our heads. (I don't know what the answer is yet). So we need some tools.
The first tool, when you can use it, is to draw a picture. (This is why we're still working in two variables). First let's draw a picture of what solutions the constraints allow, ignoring the feasible region for now. We can draw a line for each constraint, and shade in the set of solutions it allows:
This allows us to sketch a picture of the precise feasible region. This is a picture of all the outputs you could choose to make. So under our constraints you could make zero of either thing, or you could make ten Tetris games and twenty Rubik's cubes. But you can't make seventy of each.
Everything we've done so far works with any functions, not just linear functions. But now we can introduce two facts that make our job much easier. First, that the maximum is always on the boundary. This should be intuitive from the example: if you can make more Rubik's Cubes without making fewer Tetris games, that will obviously give you more money. This is a general theorem about linear functions on regions defined by linear inequalities.
Second, and even better, the maximum is always on a vertex. And furthermore, if two vertices yield the same output, the entire line between them will also yield that same output. So at a first pass we can just check all the vertices. Looking at the picture above, there are five vertices; checking them all we get the following table: $$ \begin{array}{cc} (T,R) & f(T,R) \\\\ (0,0) & 0 \\\\ (0,100) & 300 \\\\ (75,0) & 375 \\\\ (30,70) & 360\\\\ (60,30) & 390 \end{array} $$ So we maximize profit by making sixty terrible Tetris games and thirty wretched Rubik's cubes, for a profit of $390. The new constraint cost us ten dollars.
In that problem we could just check all the vertices; in interesting problems there can be millions of vertices, so that's not terribly efficient. But we can think about the problem in a couple of other ways. The picture I like to have in my head is a picture of moving level sets. For any output of the objective function, we can draw a line that represents "all the solutions that give that output." So for instance this is the line of all solution that have an output of 300:
Just from looking at the picture we can see that we can do better. Here's the line for 400:
And we see that we can't quite do that well. But we almost can! Conceptually we can think about slowly moving the line further and further out until it just barely hits one vertex:
This is the line for 390, and thus it is our solution.
This isn't really a general purpose algorithm, although it's quite useful for seeing and really grasping what's going on. So keep those pictures in mind.
But also think back to how we intuitively solved the easy version of the problem. We could make at most 100 toys. So at any given point we could trade out a Tetris game for a Rubik's cube or vice versa. And it's clear that trading one Rubik's cube for one Tetris game will gain two dollars, so you should keep making that trade until you can't any more.
This is the basic idea behind the simplex algorithm which was invented by George Dantzig in 1947, when he was working for the US Air Force. (If you've ever heard the story about the math PhD student who solved a major open problem because he accidentally thought it was course homework--that was this guy and related to this problem).
The algorithm: start at a vertex of the feasible region, doesn't really matter which one. Compute the gradient (derivative) of the objective function there (this is really easy since it's linear). Figure out which adjacent vertex (that is, a vertex directly connected by a line segment) is in the direction of most rapid increase, and go to that vertex. (This is called a "pivot"). Repeat.
Sometimes there are no adjacent vertices that give better results. In that case you move to one of the vertices that is "just as good". This is called "stalling" and it's fairly normal for it to happen. But there's an obvious risk of winding up in a "cycle" where you get back to a vertex that you already hit earlier--in which case the program will never terminate. But fortunately there are rules that can ensure this never happens and the program always terminates.
Importantly, the entire problem can be encoded in a giant matrix, and then this entire algorithm can be encoded as simple matrix operations. This is great, because matrix operations are fast and efficient and easy to execute.
Consequently, the simplex algorithm is "usually" very efficient. In practice it works on most problems. Under any reasonable method of generating random problems, the simplex method works very well on average. However, there are problems where the simplex method is very slow; with a carefully designed problem (such as the Klee-Minty cube) you can force it to check every vertex.
There is a whole family of so-called "vertex" or "basis-exchange" algorithms that stay on the boundaries of the feasible region. Many of them are efficient on average but all of them are vulnerable to these sorts of pathological problems where they take exponential time.
More recently (starting around 1980) Khachiyan, Karmarkar, and others developed "interior-point" methods that use information from the interior of the feasible region, and have worst-case polynomial time performance. Both types of algorithms are in common use in industry today, in almost any industry that uses large-scale logistics operations.
Another perspective we can use to look at this is the perspective of the dual problem. Duality shows up in a lot of areas of math--the ability to, essentially, take a problem and look at it from the other direction. Or a slightly different problem that gives information about the original problem.
In this case we stated our original, or "primal" problem, as trying to maximize our profit subject to some constraints, which you can think of as resource costs. The number we can set is the "number of units produced." The dual version of the problem is to try to minimize the total cost of the resources used, subject to having certain minimum prices for finished goods. The number we can set in the dual problem is the price of resources.
So in our problem, we would minimize the function $f^\dagger (x,y,z) = 100x + 300 y + 3300 z$ (the total cost of resources) subject to the constraints $x + 4y +40 z \geq 5$ and $x + 2y +30z \geq 3$ (which say that you have to get at least \$5 for every Tetris game and at least \$3 for every Rubik's cube). That is, our dual objective function is the "cost of all the resources used," and the dual constraints reflect the observation that if resource prices are "too low" relative to their value then you'll have resource shortages.
Here's a picture of the feasible region for the dual problem. (You can see why we prefer drawing pictures of two-variable problems. )
The dual problem isn't the same as the primal, but rather a mirror. A feasible solution to the dual problem provides an upper bound for the primal problem--every solution satisfying the constraints will guarantee at least the maximum profit available to the primal problem. Conversely, every feasible solution to the primal gives a lower bound for the dual--it uses as many or more resources than the minimum possible.
This technique actually generalizes to all sorts of optimization problems. But in general, there is a "duality gap": the solution that minimizes the dual problem might not get the solution as low as the maximum to the primal problem. (Recall they are mirrors but not identical). But in the special case we're in, the linear case, that can't happen.
If there is an optimal solution to the primal problem, then there is an identical optimal solution to the dual problem. (If the dual problem is completely infeasible--there's no way to satisfy the constraints--then the primal problem is unbounded, which means you can get infinite profit. And vice versa).
This is highly related to the idea of duality and equilibria in game theory. Supposedly, Dantzig mentioned his work to von Neumann and von Neumann immediately conjectured that duality would hold because linear programming worked just like the game theory he was developing.
In fact, there's a sense in which you can think of a linear optimization problem as a two-player game played between a producer and a planning board--the producer wants to maximize profit, the planning board wants to minimize profit, and the optimum reflects the Nash equilibrium strategy. You can actually prove Nash's theorem using these linear programming techniques; the proof isn't difficult but is long and boring, and essentially reduces to "turn every game into a linear optimization problem, and then show that it has a solution."
I want to finish up by talking about two different ways we can make the problem harder.
One thing we can do is relax the linearity condition. The general problem of "optimize a function subject to constraints" is really hard; it's not that difficult to write down optimization problems that are harder than NP-complete (and thus definitely require exponential time), although most interesting ones are in NP in some form or another.
To keep things manageable, people often only search for approximate solutions--it's much easier to find a solution that's "probably close to optimal" than it is to prove that you definitely have the optimal solution. There's usually a tradeoff that you can get a more exact answer in exchange for a longer computational time.
But we can restrict the functions a bit and still get a manageable problem; this is the domain of convex optimization. A region is "convex" if you can pick any two points in the region, and the line segment connecting them is entirely contained in the region. So for example, a triangle or a circle is convex, but a star or a chevron is not.
It's not too hard to see that the regions defined by linear inequalities that we've been discussing are convex.
We say that a function is convex if the region above its graph is convex. Again, it's not difficult to see that linear functions are convex.
The simplex method does not work on convex regions--it needs to move among vertices, and convex regions (like circles) may not have any vertices, and if they do the vertices aren't necessarily special. But we still know that the optimum has to be somewhere on the boundary, and a simple modification of the interior point methods will still work. In particular, convex optimization problems still have the property that the dual problem will always overlap the primal, so the two problems have a feasible solution in common, which is optimal.
Cosma Shalizi has an excellent post on Red Plenty where he discusses some of the limits of convex optimization. Sadly the entire economy probably is not convex (which would essentially imply, among other things, that there was never increasing returns to scale). But we're fortunate enough that a number of important problems are convex and are tractable to these types of algorithms.
The other way of making this problem harder is more interesting to me personally, and actually why I originally brought this topic up. (Don't worry, you're near the end of this post!) When we solve the problem earlier, it asked for 60 Tetris games. If it had asked for 58, we still would have been happy. But what if it had asked for 58.5? We can't make half a Tetris game. What if it had told us we needed $19 \pi$ Tetris games?
The tools that allow us to find solutions will find the best real-number solution, but we might need to find a solution that's entirely made up of integers. For a lot of problems this isn't a big deal--if the algorithm calls for 1023.7 tons of concrete, you can round that off to 1024 tons without having huge problems.
But it turns out that just rounding off the best fractional solution is not a good way to find the best integer solution! In fact, finding integer solutions is an NP-complete problem that's very important to computer science, and related to problems like the Knapsack Problem. I will talk about this more in a future post.
This post is a quick (and not very computation-heavy) overview of (some basic) linear algebra. I'm making it for two reasons.
First, I promised a couple of posts about linear optimization and integer programming. But to talk about those I needed to talk about linear constraints and linear functions, and I want the post to be accessible to people who haven't taken linear algebra.
So if you read slatestarscratchpad's mainblog post on Red Plenty and are curious about a bit of the math underlying the system, you might want to read this.
Second, I made a post a couple weeks ago (responding to @alexyar that calculus is the art of translating hard problems into linear algebra problems, which are easy. If you were curious about that and wanted to learn more, read on.
So what is a linear function? Well let's start with the one-variable case. As we probably all remember via flashbacks to high school algebra, a line has equation $y = mx +b$. (We say $m$ is the slope and $b$ is the y-intercept). I prefer a variant equation $y = m(x-x_0) + y_0$, which conveys more information directly: the line has slope $m$, and goes through the point $(x_0, y_0)$.
Why is this nice? Well, primarily, it's easy to compute with. We don't have to do anything weird. We can solve it without taking any square roots or doing anything nasty. Assuming the slope isn't zero, it will hit every possible value exactly once, and we can easily tell where that happens.
(We also like lines because they're fundamental to most conceptions of geometry, but that's not going to be super relevant. It will be a little, though).
How do lines come up in calculus? Well, if you've taken calculus you might remember that people define the derivative as the "slope of the tangent line" to a function. There are a number of ways one could define the "tangent line," but relevant to our purposes, the tangent line is the linear function that best approximates your original function.
Basically, we like linear functions because they're easy to work with. When we look for the tangent line to a function $f$ at a point $a$, we're saying "okay, let's pretend $f$ is a line. What's the closest we can get to it being the same as the real $f$?"
And the answer is different depending on where you are on the graph. Below we have a graph with two tangent lines at two different points. We can see that each one looks a lot like the graph in a certain neighborhood, but they're very different from each other.
This means that we need a different derivative for every point on the graph. And the function $f'(x)$ is a function that tells you "If I look at the point $x$, then what is the line that looks the most like the graph of $f$ near that point?" And near a starting point $a$, we get the approximation $$f(x)\approx f'(a) (x-a) + f(a).$$
So with one variable, the derivative $f'(x)$ is a function that takes in a point, and outputs a line. What happens if we have more than one variable?
Well, first we can ask what the analog of a line is. If you ask most people what the two-dimensional version of a line is, after some hemming and hawing they'll probably come up with "a plane." And that's basically right.
And in fact we can see that the equation that gives a plane as its graph is something like $f(x,y) = ax + by +c$, which looks like our line equation except with an extra variable. And we can generalize this. A linear function on three variables is $f(x,y,z) = ax + by + cz + d$, and in general if we have a linear function on $n$ variables it looks like $$f(x_1, \dots, x_n) = a_1 x_1 + a_2 x_2 + \dots + a_n x_n +c= \sum_{i=1}^n a_i x_i +c.$$
Note: we can make things more compact with good notation. I'll use the convention that $\mathbf{x} = (x_1, \dots, x_n)$. You also may or may not remember that we can take the "dot product" of two vectors by $$\mathbf{x} \cdot \mathbf{y} = x_1 y_1 + x_2 y_2 + \dots + x_n y_n = \sum_{i=1}^n x_i y_i.$$ Thus a linear function is one where we have some vector $\mathbf{a} = (a_1, \dots, a_n)$ such that $f(\mathbf{x}) = \mathbf{a} \cdot \mathbf{x} + c.$
This basically just means that a linear function is one where you multiply each variable by some fixed number and then add them all together. You never multiply two variables together and you never do anything weird to them. So things are still really easy to calculate with.
So how does this relate to calculus and the idea of approximation? If you've taken a multivariable calculus course, you've probably spent some time dealing with "partial derivatives" and "directional derivatives" and such. These are all useful computational tools but are a bit beside the point for understanding what's actually going on.
Let's return to the two-variable case, because we can actually draw pictures of graphs and see what's going on. We have a function and we want the equation for a plane that looks the most like our function--the "tangent plane," which you can see below.
If you remember the details of your hypothetical multivariable calculus class, this might point you to the gradient. The gradient of a function at a given point is the normal vector to the tangent plane, and thus tells you the equation. In less technical language, the gradient at a point is a list of all the $a_i$ in the equation I gave above. And thus if the gradient of a function $f$ is $\nabla f$, then our linear approximation is given by $\nabla f(\mathbf{a}) \cdot \textbf{x} + c$.
So the "real" derivative here is the vector $\nabla f(\mathbf{x})$, which for any vector $\mathbf{x}$ gives us all the coefficients for the linear function that approximates our original function near $\mathbf{x}$. In particular, we get the approximation $$f(\mathbf{x}) \approx \nabla f(\mathbf{a}) \cdot (\mathbf{x} - \mathbf{a}) + f(\mathbf{a}).$$ Notice how similar this is to our earlier approximation! And we can use this to turn a hard-to-work-with function into an easy-to-work-with linear approximation.
But we can actually go one step further. Until now I've assumed that while my function might have many inputs, it only has one output. But for full generality we want to let our function have multiple outputs.
If a function has $n$ inputs and $m$ outputs we say it is a function from $\mathbb{R}^n$ to $\mathbb{R}^m$ or write $F: \mathbb{R}^n \to \mathbb{R}^m$. We usually break this up output by output; so we have $F(\mathbf{x}) = (f_1(\mathbf{x}), \dots, f_m(\mathbf{x}))$. Each component has many inputs but just one output, so we can compute the linear approximation just like we did before.
Since $F$ is a list of many-to-one functions, our linear approximation will be a list of vectors. And if you're clever, you might realize that a list of vectors is a matrix. The total derivative of a function $F$ at a point $a$ is a matrix $dF_\mathbb{a}$, and the approximation is $$F(\mathbf{x}) \approx dF_\mathbf{a} \cdot\mathbf{x} + F(\mathbf{a}).$$ So the best linear approximation of any function with many inputs and outputs is going to be given by some matrix the total derivative, or the Jacobian. This is basically the fundamental object of multivariable differential calculus--though it usually isn't taught that way.
Because I'm a mathematician, I can't resist abstracting one more step. (Going up one more meta-level? I dunno, I guess that's endemic around here).
When a mathematician says a function is linear, he generally means that it has the following two properties: $$f(\mathbf{x} + \mathbf{y}) = f(\mathbf{x}) + f(\mathbf{y})$$ $$f(c \mathbf{x}) = c f(\mathbf{x}).$$ In any finite-dimensional setup these functions correspond to matrices, and in the many-to-one variable setup these functions are dot products. (and in the typical single-variable case, every linear function is just scalar multiplication--or multiplication by a $1 \times 1$ matrix).
You might have noticed that in the preceeding exposition, I basically ignored the constant at the end of every equation. And that's because on technical points that makes the equation not linear in this sense. The actual linear function is the number/vector/matrix that we multiply by.
And this means we can give a very general definition of a (Frechet) derivative, for any function between two normed vector spaces. If $f: A \to B$ is a map of normed vector spaces, then $L$ is the derivative at $a$ if it is a (bounded) linear function from $A$ to $B$, and $$\lim_{x \to a} \frac{f(x) - f(a) - L(x-a)}{x-a} = 0.$$ If you're familiar with the usual single-variable derivative definition, you can write it out and see that this is the same thing.
So what's the takeaway? First, a linear function is a function that just looks like matrix multiplication. We like them because they're easy to compute, and just involve multiplying each variable by some fixed scalar. And even in the most general setting, we can view a derivative as being the linear function that approximates our starting function the best--which is what I mean when I say that (differential) calculus is the attempt to turn hard problems into linear problems.