Logo

12. Online Algorithms

An online algorithm is one that can process data one-by-one and make more informed decisions based on past data. Essentially, an adaptive learning algorithm.

Experts

Consider a set of NN experts that each make a prediction about a certain event for TT time steps. Let U\mathcal{U} be the set of possible predictions. The problem proceeds as follows.

  1. At time step tt, each expert makes a prediction; let EtUN\mathcal{E}^{t}\in \mathcal{U}^{N} be the vector of predictions.
  2. The algorithm makes a prediction ata^{t}, and then the actual outcome oto^{t} is revealed.

The goal is to minimize the number of mistakes, i.e. minimize M\lvert M \rvert, where M={tatot}M=\{ t\mid a^{t}\neq o^{t} \}.

Perfect Expert

Let us first consider the situation in which there exists a perfect expert, i.e. an expert that makes zero mistakes.

Claim: There exists an algorithm that log2N\lceil \log_{2}N \rceil mistakes.

Proof.
At each time step tt, the algorithm consider all experts who have yet to make a mistake, and predict the majority prediction amongst their predictions. Note that, for every mistake made, the set of experts that have yet to make a mistake reduces in size by at least a factor of 22. Since there exists a perfect expert, this algorithm can make at most log2N\lceil \log_{2}N \rceil mistakes before it the set of experts who have yet to make a mistake consists solely of the perfect expert; at which point no more mistakes will be made. Thus, at most log2N\lceil \log_{2}N \rceil mistakes will be made by this algorithm.

\square

Bounded Mistakes

Now, we generalize the situation a bit. There no longer exists a perfect expert, but there exists an expert that makes at most mm^{*} mistakes.

Claim: There exists an algorithm that makes at most Mm(log2N+1)+log2NM\leq m^{*}(\lceil \log_{2}N \rceil+1)+\lceil \log_{2}N \rceil.

We can essentially just extend the previous algorithm. We essentially divide the time duration TT into epochs: each epoch, we run the algorithm from the perfect expert scenario; once the set of experts is empty, we start a new epoch with all NN experts.

In each expert, each expert makes at least one mistake. Thus, the number of completed epochs is at most mm^{*}. In each completed epoch, we make at most log2N+1\lceil \log_{2}N \rceil+1 mistakes, and in the last epoch (which does not complete, since at least one expert will no longer make mistakes), at most log2N\lceil \log_{2}N \rceil (to determine the now-perfect expert). Thus, the algorithm makes at most m(log2N+1)+log2Nm^{*}(\lceil \log_{2}N \rceil+1)+\lceil \log_{2} N\rceil mistakes.

\square

Weighted Majority Algorithm

But, we can do better—with an algorithm known as the weighted majority algorithm.

First, we describe the structure of the algorithm. The intuition behind the algorithm is to bias our predictions towards experts that appear to perform better. We assign a weight wiw_{i} to each expert i{1,,N}i\in \{ 1,\dots,N \}. We denote the weight of an expert ii at the beginning of round tt as wi(t)w_{i}^{(t)}. Initially, all weights are 11. Also, let ei(t)e_{i}^{(t)} denote the prediction of expert ii in round tt.

  1. In round tt, predict the outcome that maximizes the sum of the weights of experts that predicted it. Formally, at=arg maxuUi: ei(t)=uwi(t)a^{t}=\mathrm{arg}\ \underset{ u\in \mathcal{U} }{ \mathrm{max} }\underset{ i:\ e_{i}^{(t)}=u }{ \sum } w_{i}^{(t)}.
  2. Then, after the outcome oto^{t} is revealed, we update the weights for the next round. If ei(t)=ote_{i}^{(t)}=o^{t}, wi(t+1)=wi(t)w_{i}^{(t+1)}=w_{i}^{(t)}. Otherwise, if the expert was wrong, wi(t+1)=wi(t)(1ϵ)w_{i}^{(t+1)}=w_{i}^{(t)}\cdot(1-\epsilon), where ϵ\epsilon is a parameter chosen earlier. We explain the choice of ϵ\epsilon in more detail later.

ϵ=1/2\epsilon=1/2

Claim: For this algorithm, if ϵ=12\epsilon=\frac{1}{2}, the number of mistakes made by the weighted majority algorithm (WM) is at most

2.41(m+log2N)2.41(m^{*}+\log_{2}N)

This is also commonly expressed as

2.41m+O(log2N)2.41m^{*}+O(\log_{2}N)

Proof.
Let ZtZ^{t} represent the aggregate weight of the set of experts at time step tt. That is, Zt=iwi(t)Z^{t}=\underset{ i }{ \sum }w_{i}^{(t)}. Note that

Also, consider the event where this algorithm makes a mistake in round tt. Then, the sum of weights of the wrong experts is higher than the sum of the weights of the correct experts. That is,

i: ei(t)otwi(t)i: ei(t)=otwi(t)\underset{ i:\ e_{i}^{(t)}\neq o_{t} }{ \sum } w_{i}^{(t)}\geq \underset{ i:\ e_{i}^{(t)}= o_{t} }{ \sum } w_{i}^{(t)}

Considering that

Zt=i: ei(t)otwi(t)+i: ei(t)=otwi(t)Z^{t}= \underset{ i:\ e_{i}^{(t)}\neq o_{t} }{ \sum } w_{i}^{(t)}+ \underset{ i:\ e_{i}^{(t)}= o_{t} }{ \sum } w_{i}^{(t)}

We can derive

Zt2i: ei(t)otwi(t)Z^{t}\leq 2\underset{ i:\ e_{i}^{(t)}\neq o_{t} }{ \sum } w_{i}^{(t)}

Now, consider the sum of the weights of the next round, Zt+1Z^{t+1}.

Zt+1=i: ei(t)otwi(t+1)+i: ei(t)=otwi(t+1)=12i: ei(t)otwi(t)+i: ei(t)=otwi(t)=Zt12i: ei(t)otwi(t)34Zt\begin{align*} Z^{t+1} &= \underset{ i:\ e_{i}^{(t)}\neq o_{t} }{ \sum } w_{i}^{(t+1)}+ \underset{ i:\ e_{i}^{(t)}= o_{t} }{ \sum } w_{i}^{(t+1)} \\ &= \frac{1}{2} \underset{ i:\ e_{i}^{(t)}\neq o_{t} }{ \sum } w_{i}^{(t)}+ \underset{ i:\ e_{i}^{(t)}= o_{t} }{ \sum } w_{i}^{(t)} \\ &= Z^{t}-\frac{1}{2}\underset{ i:\ e_{i}^{(t)}\neq o_{t} }{ \sum } w_{i}^{(t)} \\ & \leq \frac{3}{4}Z^{t} \end{align*}

Where the last step follows from the inequality we previously derived.

In other words, the sum of all the weights decreases by at least a factor of 25%25\% if the algorithm makes a mistake. Now consider any expert ii, such that it has made mim_{i} mistakes after tt rounds. Let the algorithm have made MM mistakes. Then,

(12)mi=wi(t+1)Zt+1Zt(34)M=N(34)M\left( \frac{1}{2} \right)^{m_{i}}=w_{i}^{(t+1)}\leq Z^{t+1}\leq Z^{t}\left( \frac{3}{4} \right)^{M}=N\left( \frac{3}{4} \right)^{M}

Rearranging gives

Mmi+log2Nlog2(43)2.41(mi+log2N)M\leq \frac{m_{i}+\log_{2}N}{\log_{2}\left( \frac{4}{3} \right)}\leq 2.41(m_{i}+\log_{2}N)

Since we are guaranteed an expert makes at most mm^{*} mistakes, we note that

M2.41(m+log2N)M\leq 2.41(m^{*}+\log_{2}N)

as desired.

\square

Arbitrary ϵ\epsilon

Now, we generalize our claim and proof to arbitrary* ϵ\epsilon. As in, ϵ(0, 1/2)\epsilon \in(0,\ 1/2).

Claim: For the weighted majority algorithm, if ϵ(0, 1/2)\epsilon \in(0,\ 1/2), we guarantee M2(1+ϵ)m+O(logNϵ)M\leq2(1+\epsilon)m^{*}+O\left( \frac{\log N}{\epsilon} \right).

Proof.
We leave out most of the analytical details for brevity, due to similarities with the previous proof. In short, we derive

Zt+1(1ϵ2)ZtZ^{t+1}\leq \left( 1-\frac{\epsilon}{2} \right)Z^{t}

Subsequently, we derive

(1ϵ)miZtZ1(1ϵ2)M=N(1ϵ2)MNexp(ϵM/2)(1-\epsilon)^{m_{i}}\leq Z^{t}\leq Z^{1}\left( 1-\frac{\epsilon}{2} \right)^{M}=N\left( 1-\frac{\epsilon}{2} \right)^{M}\leq N\exp(-\epsilon M/2)

Rearranging gives

Mmilog(1ϵ)+lnNϵ/2M\leq \frac{-m_{i}\log(1-\epsilon)+\ln N}{\epsilon/2}

Then, using the inequality

ln(1ϵ)=ϵ+ϵ22+ϵ33+ϵ+ϵ2, ϵ[0,1]-\ln(1-\epsilon)=\epsilon+\frac{\epsilon^{2}}{2}+\frac{\epsilon^{3}}{3}+\dots \leq \epsilon+\epsilon^{2},\ \epsilon \in[0,1]

gives us

M2mi(ϵ+ϵ2)ϵ+O(logNϵ)M\leq 2 \frac{m_{i}(\epsilon+\epsilon^{2})}{\epsilon} + O\left( \frac{\log N}{\epsilon} \right)

Which simplifies nicely to

M2(1+ϵ)mi+O(logNϵ)M\leq 2(1+\epsilon)m_{i}+O\left( \frac{\log N}{\epsilon} \right)

Given that one expert will make at most mm^{*} mistakes, this naturally gives us our tightest upper bound.

\square

Approximation Bound

Note that the above weighted majority algorithm can have a mistakes bound as close as possible to 2m2m^{*} as desired (by making ϵ\epsilon smaller). In fact, this is as close as we can get; at least, with a deterministic algorithm.

Claim: No deterministic algorithm A\mathcal{A} can do better than a factor of 22, compared to the best expert.

Consider a scenario with two experts E1E_{1} and E2E_{2}. Let the set of choices be U={0,1}\mathcal{U}=\{ 0,1 \}. Let E1E_{1} always predict 00 and E2E_{2} always predict 11. Given that A\mathcal{A} is deterministic, an adversarial set of outcomes may be prepared such that A\mathcal{A}'s predictions are wrong. For instance, if A\mathcal{A} is the weighted majority algorithm, if we always tiebreak by choosing E1E_{1}, a sequence of outcomes 1,0,1,0,1,01,0,1,0,1,0\dots will result in A\mathcal{A} never being right. Regardless, for all deterministic algorithms A\mathcal{A}, at least one of E1E_{1} and E2E_{2} will have an error rate of 12\leq \frac{1}{2}. (More precisely, their error rates should add to 11, since E1E_{1} predicts 00 always and E2E_{2} always predicts 11; therefore, at least one has an error rate 12\leq \frac{1}{2}). Yet, due to the adversarially chosen outcomes, A\mathcal{A}'s error rate is always 11. Therefore, A\mathcal{A} will make, at best, twice as many mistakes as the best expert. Thus, it's not possible to design a deterministic algorithm that, for all possible scenarios of experts and outcomes, makes less than twice as many mistakes as the best expert.

\square

Randomized Weighted Majority

Of course, the above proof does not apply to randomized algorithms. So, can we do better?

The randomized weighted majority algorithm (RWM) was designed precisely for this reason. In short, the weights evolve in the exact same way; however, the prediction at each time step is randomly chosen according to the distribution of the weights of the experts. You may imagine this as randomly choosing a single expert's opinion, weighted by the distribution. Succinctly,

P[at=u]=i: ei(t)=u wi(t)Zt\mathrm{P}[a^{t}=u]= \frac{{ \sum }_{i:\ e_{i}^{(t)}=u}\ w_{i}^{(t)}}{Z^{t}}

Claim: Let ϵ=12\epsilon= \frac{1}{2}. For any fixed sequence of predictions, the expected number of mistakes made by the randomized weighted majority algorithm is at most

E[M](1+ϵ)m+O(logNϵ)\mathrm{E}[M]\leq (1+\epsilon)m^{*}+O\left( \frac{\log N}{\epsilon} \right)
Regret

The gap ϵm+O(logNϵ)\epsilon m^{*}+O\left( \frac{\log N}{\epsilon} \right) between the algorithm's expected performance and that of the best expert is known as the regret of the algorithm. We will see soon that this algorithm has effectively negligible regret.

Proof.
Let FtF^{t} be the fraction of total weight assigned to incorrect experts at time tt, i.e.

Ft=i: ei(t)ot wi(t)ZtF^{t}= \frac{{ \sum }_{i:\ e_{i}^{(t)}\neq o^{t}}\ w_{i}^{(t)}}{Z^{t}}

Also, note that E[M]=t[T]Ft\mathrm{E}[M]=\underset{ t\in[T] }{ \sum } F_{t}, i.e. the expected number of mistakes made by the algorithm is the sum of the fractions of weight for the incorrect experts at each time step tt. This derives itself from linearity of expectation; the expected number of mistakes at time step tt is simply FtF_{t}, the probability of making a mistake at time step tt. Thus, the expected number of mistakes in total is the sum of all such FtF_{t}.

By the weight adjustment procedures, we derive

Zt+1=Zt((1Ft)1+Ft(1ϵ))=Zt(1ϵFt)Z^{t+1}=Z^{t}((1-F^{t})\cdot1+F^{t}\cdot(1-\epsilon))=Z^{t}(1-\epsilon F^{t})

Once again, consider an expert that has made mim_{i} mistakes after tt time steps.

(1ϵ)miZt+1=Z1t=1t(1ϵFt)NeϵFt=NeϵE[M](1-\epsilon)^{m_{i}}\leq Z^{t+1}=Z^{1}\prod_{t'=1}^{t} (1-\epsilon F_{t'})\leq Ne^{ -\epsilon \sum F_{t} }=Ne^{ -\epsilon \mathrm{E}[M] }

Taking logarithms of both sides, we get

miln(1ϵ)lnNϵE[M]m_{i}\ln(1-\epsilon)\leq \ln N-\epsilon E[M]

And using the approximation log(1ϵ)ϵ+ϵ2-\log(1-\epsilon)\leq\epsilon+\epsilon^{2} and rearranging gives

E[M]mi(1+ϵ)+lnNϵE[M]\leq m_{i}(1+\epsilon)+\frac{\ln N}{\epsilon}

And, of course, given that one expert makes at most mm^{*} mistakes, we derive that this algorithm is expected to make at most m(1+ϵ)+lnNϵm^{*}(1+\epsilon)+\frac{\ln N}{\epsilon} mistakes.

\square


Finally, with careful choice of ϵ\epsilon, we can reach "negligible regret." Let ϵ=logNT\epsilon= \sqrt{ \frac{\log N}{T} }. Then,

E[M](1+ϵ)m+logNϵm+ϵm+TlogNm+ϵT+TlogNm+2TlogNE[M]TmT+2logNT\begin{align*} E[M] &\leq (1+\epsilon)m^{*}+ \frac{\log N}{\epsilon} \\ &\leq m^{*}+\epsilon m^{*}+\sqrt{ T\log N } \\ &\leq m^{*} + \epsilon T + \sqrt{ T\log N } \\ &\leq m^{*}+2\sqrt{ T\log N } \\ \frac{E[M]}{T} &\leq \frac{m^{*}}{T} + 2\sqrt{ \frac{\log N}{T} } \\ \end{align*}

And since limT2logNT=0\lim_{ T \to \infty }2\sqrt{ \frac{\log N}{T} }=0, this algorithm, with this choice of ϵ=logNT\epsilon=\sqrt{ \frac{\log N}{T} }, has effectively negligible regret for each time step.

Generalized Experts

Finally, we conclude with the generalization of the expert's problem, and a similar "negligible regret" algorithm that achieves E[Total Regret]O(TlogN)E[\text{Total Regret}]\leq O(\sqrt{ T\log N }).

In the general setting, there exists a set A\mathcal{A} of NN actions you can take at each step (i.e., like NN experts). On day tt, one action iAi\in \mathcal{A} must be chosen; let this action be denoted iti^{t}. Afterwards, the cost ci(t)[0,1]c_{i}^{(t)}\in[0,1] is revealed (i.e. like 11 means a mistaken prediction, 00 means correct; only, in this generalized problem, the cost can be 0.5,0.123,0.5,0.123, etc.). The goal is to minimize the cost function. Or, more specifically, the goal is to minimize the regret—the difference between the cost of the most optimal action (i.e., total cost of choosing this action across all days) and the actual cost.

t=1Tcit(t)=minaA{t=1Tca(t)}+Regret\sum_{t=1}^{T} c_{i^{t}}^{(t)} = \underset{ a\in \mathcal{A} }{ \mathrm{min} }\left\{ \sum_{t=1}^{T} c_{a}^{(t)} \right\} +\text{Regret}

Multiplicative Weights

The Multiplicative Weights algorithm is similar to the Randomized Weighted Majority algorithm. We initialize weights similarly, i.e. wi(1)=1w_{i}^{(1)}=1. For each time step tt,

  1. Let Zt=iwi(t)Z^{t}=\underset{ i }{ \sum } w_{i}^{(t)}.
  2. Select action iAi\in \mathcal{A} with probability wi(t)/Ztw_{i}^{(t)}/Z^{t}.
  3. Costs {c1(t),,cN(t)}\{ c_{1}^{(t)},\dots,c_{N}^{(t)} \} are then revealed.
  4. Set wi(t+1)=wi(t)(1ϵci(t))w_{i}^{(t+1)}=w_{i}^{(t)}\cdot(1-\epsilon c_{i}^{(t)}).

We won't prove this (mainly because the lecture notes don't have a proof...), but we can show that selecting ϵ=logNT\epsilon=\sqrt{ \frac{\log N}{T} }, like before, results in E[Regret]O(TlogN)E[\text{Regret}]\leq O(\sqrt{ T\log N }), or E[Regret]/TO((logN)/T)E[\text{Regret}]/T \leq O(\sqrt{ (\log N)/T }). (In fact, it results in the same upper bound on the number of mistakes made).

Multiplicative Weights: Applications

The generalization of the experts problem, and the corresponding "negligible regret" multiplicative weights algorithm, is powerful and can be applied to many different problems. We briefly discuss a few of these applications.

Minimax Theorem (Zero Sum Games)

Consider a zero sum game with the payoff matrix

M=[M11M1nMn1Mnn]M=\begin{bmatrix} M_{11} & \dots & M_{1n} \\ \vdots & \ddots & \vdots \\ M_{n_{1}} & \dots & M_{nn} \end{bmatrix}

Where 1Mij1-1\leq M_{ij}\leq1.

Provided mixed strategies p=[p1pn]p=\begin{bmatrix}p_{1}\\\vdots \\p_{n}\end{bmatrix} (row) and q=[q1qn]q=\begin{bmatrix}q_{1}\\\vdots\\q_{n}\end{bmatrix} (column), we calculate

Score(p,q)=i,jpiMijqj=pTMq\text{Score}(p,q)=\sum_{i,j}p_{i}M_{ij}q_{j}=p^{T}Mq

The Minimax Theorem states that

minq maxp pTMq=maxp minq pTMq\underset{ q }{ \mathrm{min} }\ \underset{p}{\mathrm{max}}\ p^{T}Mq = \underset{p}{\mathrm{max}}\ \underset{q}{\mathrm{min}}\ p^{T}Mq

In other words, the score of the game is the same, regardless of whether the column player plays first (LHS) or the row player plays first (RHS).

We briefly introduce some notation. Let v1=maxp minq pTMqv_{1}'=\underset{p}{\mathrm{max}}\ \underset{q}{\mathrm{min}}\ p^{T}Mq be known as the gain-floor—the minimum payoff the row player will receive given the column player's attempt to minimize their payoff. Let v2=minq maxp pTMqv_{2}'=\underset{q}{\mathrm{min}}\ \underset{p}{\mathrm{max}}\ p^{T}Mq be known as the loss-ceiling—the maximum loss the column player will experience given the row player's attempt to maximize their payoff.

Proof.
It's easy to show the \geq direction. In natural language, this direction implies that moving first is, at the very least, not an advantage (i.e. the situation of the column player playing first will not produce a higher score than the situation of the row player playing first). This should make sense, intuitively, but we will formalize this intuition.

Consider that, for any pair of strategies (p,q)(p',q'), we have

minq (p)TMq(p)TMqmaxp pTMq\underset{q}{\mathrm{min}}\ (p')^{T}Mq \leq (p')^{T}Mq' \leq \underset{p}{\mathrm{max}}\ p^{T}Mq'

Hopefully this makes sense, intuitively. The LHS is choosing a qq vector such that it minimizes the score, based on the value of pp'. This is clearly \leq the score of the pair of strategies (p,q)(p',q'), since qq is chosen to be score-minimizing. Meanwhile, the score of (p,q)(p',q') is \leq the RHS, since the RHS is choosing a score-maximizing pp vector, based on the value of qq'.

From the RHS of the inequality, we note that

(p)TMqmaxp pTMqminq (p)TMqminq maxp pTMq\begin{align*} (p')^{T}Mq' &\leq \underset{p}{\mathrm{max}}\ p^{T}Mq' \\ \underset{q}{\mathrm{min}}\ (p')^{T}Mq &\leq \underset{q}{\mathrm{min}}\ \underset{p}{\mathrm{max}}\ p^{T}Mq \\ \end{align*}

This inequality holds for all pp', since the previous inequality held for all pairs (p,q)(p',q'). Therefore, it clearly holds if pp is the score-maximizing value, i.e.

maxp minq pTMqminq maxp pTMq\underset{p}{\mathrm{max}}\ \underset{q}{\mathrm{min}}\ p^{T}Mq \leq \underset{q}{\mathrm{min}}\ \underset{p}{\mathrm{max}}\ p^{T}Mq

However, it is comparatively difficult to show the \leq direction. To do so, we must either use the strong duality of LPs, or use the multiplicative weights algorithm.

Intuitively, we can use online learning to allow these players to repeatedly play the game and converge on the optimal solution. Consider the following procedure. For t=1,,Tt=1,\dots,T time steps,

  1. The column player selects picks a strategy q(t)q^{(t)} using multiplicative weights.
  2. The row player chooses the "best response" strategy p(t)=arg maxp(t) {(p(t))TMq(t)}p^{(t)}=\underset{ p^{(t)} }{ \mathrm{arg}\ \mathrm{max} }\ \{ (p^{(t)})^{T}Mq^{(t)} \}.
  3. The cost vector c(t)=p(t)Mc^{(t)}=p^{(t)}\cdot M for the column player is then revealed.
  4. The column player's expected loss is then (t)=(p(t))TMq(t)\ell^{(t)}=(p^{(t)})^{T}Mq^{(t)}.

Let us define p=1Tt=1Tp(t)\overline{p}=\frac{1}{T}\sum_{t=1}^{T}p^{(t)} and q=1Tt=1Tq(t)\overline{q}=\frac{1}{T}\sum_{t=1}^{T}q^{(t)}, i.e. the averages of the chosen strategies over all time steps. We aim to show that p\overline{p} and q\overline{q} are "approximate minimax strategies." In other words, we aim to show that

maxp pTMqminq pTMq+regret (small)\underset{p}{\mathrm{max}}\ p^{T}M\overline{q}\leq \underset{q}{\mathrm{min}}\ \overline{p}^{T}Mq + \text{regret (small)}

Claim: The following inequality is true.

maxp pTMqminq pTMq+O((logN)/T)\underset{p}{\mathrm{max}}\ p^{T}M\overline{q} \leq \underset{q}{\mathrm{min}}\ \overline{p}^{T}Mq+O(\sqrt{ (\log N)/T })

Proof.
(Explanation of certain steps included below.)

maxp pTMq=maxp 1Tt=1TpMq(t)(1)1Tt=1Tmaxp pMq(t)(2)=1Tt=1Tp(t)Mq(t)(3)minq 1Tt=1T(p(t))TMq+O(logNT)(4)=minq pTMq+O(logNT)(5)\begin{align*} \underset{p}{\mathrm{max}}\ p^{T}M\overline{q} &= \underset{p}{\mathrm{max}}\ \frac{1}{T}\sum_{t=1}^{T} pMq^{(t)} && (1) \\ &\leq \frac{1}{T}\sum_{t=1}^{T} \underset{p}{\mathrm{max}}\ pMq^{(t)} && (2) \\ &= \frac{1}{T}\sum_{t=1}^{T} p^{(t)}Mq^{(t)} && (3) \\ &\leq \underset{q}{\mathrm{min}}\ \frac{1}{T}\sum_{t=1}^{T} (p^{(t)})^{T}Mq +O\left( \sqrt{ \frac{\log N}{T} } \right) && (4) \\ &= \underset{q}{\mathrm{min}}\ \overline{p}^{T}Mq+O\left( \sqrt{ \frac{\log N}{T} } \right) && (5) \\ \end{align*}

Finally, we can conclude the proof of the minimax theorem, knowing this inequality is true.

minq maxp pTMqmaxp pMq(1)minq pTMq+O((logN)/T)(2)maxp minq pTMq+O((logN)/T)(3)\begin{align*} \underset{q}{\mathrm{min}}\ \underset{p}{\mathrm{max}}\ p^{T}Mq &\leq \underset{p}{\mathrm{max}}\ pM\overline{q} && (1) \\ &\leq \underset{q}{\mathrm{min}}\ \overline{p}^{T}Mq + O(\sqrt{ (\log N)/T }) && (2) \\ &\leq \underset{p}{\mathrm{max}}\ \underset{q}{\mathrm{min}}\ p^{T}Mq + O(\sqrt{ (\log N)/T }) && (3) \end{align*}

Finally, we note, as before, that limT(logN)/T=0\lim_{ T \to \infty }\sqrt{ (\log N)/T }=0, and therefore

minq maxp pTMqmaxp minq pTMq\underset{q}{\mathrm{min}}\ \underset{p}{\mathrm{max}}\ p^{T}Mq \leq \underset{p}{\mathrm{max}}\ \underset{q}{\mathrm{min}}\ p^{T}Mq

as desired.

\square

Approximate Solutions to LPs

I... got too lazy to take notes on this part. Feel free to take a look at these notes from CMU's 15859 offering from Fall 2011, though, if you're interested.

Sources