Tuesday, November 13, 2012

Feedback

This year's course finished on 13 November 2012. I handed out the standard faculty feedback forms in the lecture on 8 November. I received back 63 completed forms.

You can also give feedback using my equivalent on-line form. You may find it easier to tick the radio buttons and type into this form than to handwrite onto the paper form - and you have space to write interesting comments. 8 of these have been received so far.

This will be sent to my email anonymously. After reading the responses, I will forward printed copies to the Faculty Office.

Notation

Some students say that the notation is one of the most difficult things about this course. I recommend that you make for yourself a one-page crib sheet of all the notation:

\begin{align*}
&(X_n)_{n\geq 0},\ I,\ \text{Markov}(\lambda,P),\ P = (p_{ij}),\ P^{(n)} = (p_{ij}^{(n)}),\\[12pt]
&h_i^A,\ k_i^A,\ H_i,\ T_i,\ V_i,\ V_i^{(n)},\ f_i,\ \lambda,\ \pi,\ \gamma_i^k,\ m_i,\ m_{ij} .
\end{align*}
Write a little explanation for yourself as to what each notation means, and how it is used in our theorems about right-hand equations, recurrence/transience, left-hand equations, existence/uniqueness of invariant measures, aperiodicity/periodicity, positive/null recurrence and detailed balance. It should all seem pretty straightforward and memorable once you summarise it on one page and make some notes to place it in context.

Of course I could easily typeset a page like this for you — but I think that you'll learn more, and it will be more memorable for you personally, if you create this crib sheet yourself!

Lecture 12

The last lecture was on Tuesday November 13. I have enjoyed giving this course and I hope you have enjoyed it too. Thank you for your feedback.

The notes at the end of the course are slightly improved from those posted at the time of Lecture 1. I thank those of you who have spotted some typos.

You can now also look at the overhead projector slides that I sometimes used in lectures, such as those I used today to summarise four Part II courses that you may like to study next year: Probability and Measure, Applied Probability, Optimization and Control, and Stochastic Financial Models.

In my discussion of random walk and electrical networks in Section 12.4 I appealed to Rayleigh's Monotonicity Law: " if some resistances of a circuit are increased (decreased) the resistance between any two points of the circuit can only increase (decrease)." A proof of this "obvious" fact can be constructed by (i) proving Thomson's Principle: "Flows determined by Kirchhoff's Laws minimize energy dissipation", and then (ii) showing that Thomson's Principle implies Rayleigh's Monotonicity Law. You can read the details of this in Doyle and Snell Random walks and electric networks, pages 51–52.

In Section 12.2 I mentioned Burke's output theorem (1956), which says that the output process of an $M/M/1$ queue in equilibrium is a Poisson process with the same rate as the input. In writing "$M/M/1$" the "$M$"s mean Markovian (i.e. a Poisson input process of rate $\lambda$, and i.i.d. exponentially distributed service times with parameter $\mu$, where $\mu > \lambda$), and the "$1$" means a single server. In queueing theory this very useful notation is known as Kendall's notation. For example, a $D/G/m$ queue is one in which arrivals are separated by a deterministic time, service times have a general distribution, and there are $m$ servers working in parallel.

I remarked (just for fun) that queueing is the only common word in the OED with five vowels in a row. Obscure words are ones like "miaoued" (what the cat did).

I once proved a generalization of Burke's output theorem that holds even when the queue has not reached equilibrium (see: The interchangeability of $\cdot/M/1$ queues in series, Weber, 1979). Suppose we have two single-server first-in-first-out queues in series, which we might write as $\cdot/M/1 \to /M/1$. The customers' service times in the first queue are i.i.d. exponentially distributed with parameter $\lambda$ and in the second queue they are i.i.d. exponentially distributed with parameter $\mu$. On finishing service in the first queue a customer immediately joins the second queue. Suppose the system starts with $N$ customers in the first (upstream) queue and no customers in the second (downstream) queue. My theorem says that all statistics that we might measure about the departure process from the second queue are the same if $\lambda$ and $\mu$ are interchanged. Thus by observing the process of departures from the second queue we cannot figure out which way around the two $/M/1$ servers are ordered. For example, the time at which we see the first departure leave the second queue has expected value $1/\lambda + 1/\mu$ (which is symmetric in $\lambda$ and $\mu$). Its moment generating function is $\lambda\mu(\theta-\lambda)^{-1}(\theta-\mu)^{-1}$. The expected time at which the second departure takes place is
$$
\frac{2 \lambda^2+3 \lambda\mu+2 \mu^2}{\lambda^2 \mu+\lambda \mu^2}.
$$
All other statistics are also symmetric in $\lambda$ and $\mu$. It was by computing such quantities that I first formulated the conjecture that this might be true. Burke's theorem is a corollary of my result that can be obtained by thinking about $N$ tending to infinity (can you see how?).
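If you would like to see this numerically, here is a short Mathematica sketch (my own illustration, not code from the paper): it simulates the tandem system assuming $N\geq 2$ initial customers upstream, estimates the expected time of the second departure from the second queue, and checks that swapping $\lambda$ and $\mu$ makes no difference.

(* with N >= 2, queue 1's server is busy throughout its first two services *)
secondDeparture[lam_, mu_] := Module[{t1, t2, d1},
  t1 = RandomVariate[ExponentialDistribution[lam]];           (* 1st departure, queue 1 *)
  t2 = t1 + RandomVariate[ExponentialDistribution[lam]];      (* 2nd departure, queue 1 *)
  d1 = t1 + RandomVariate[ExponentialDistribution[mu]];       (* 1st departure, queue 2 *)
  Max[t2, d1] + RandomVariate[ExponentialDistribution[mu]]];  (* 2nd departure, queue 2 *)

{Mean[Table[secondDeparture[1., 2.], {10^5}]],
 Mean[Table[secondDeparture[2., 1.], {10^5}]]}
(* both estimates are close to (2 + 6 + 8)/(2 + 4) = 8/3, the formula above with lambda = 1, mu = 2 *)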

Markov chains have been studied for about 100 years, but there are still interesting things to discover. An important on-going research area is Markov chain Monte Carlo.

Thursday, November 8, 2012

Fast retries lemma

There are many interesting consequences of reversibility. Here is one that I talked about at the end of today's lecture and which has a very practical application.

Theorem. If $P$ is the transition matrix of a reversible Markov chain on $N$ states then for all $i$,
$$
P_i(X_n=i)=p_{ii}^{(n)} = \sum_{k=1}^N a_k \lambda_k^n,
$$where all $a_k\geq 0$ and all $\lambda_k$ are real and $\lambda_k\leq 1$.

Hint. To prove this, let $\pi$ be the invariant distribution and consider $A=VPV^{-1}$, where $V$ is the diagonal matrix with $V_{ii}=\sqrt{\pi_i}$. Show that $A$ is symmetric, and think about what that implies.
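Here is a quick Mathematica check of the hint, for a 3-state birth-death chain of my own choosing (any birth-death chain is reversible): with $u_k$ the orthonormal eigenvectors of $A$, one gets $a_k = (u_k)_i^2 \geq 0$.

P = {{1/2, 1/2, 0}, {1/4, 1/2, 1/4}, {0, 1/2, 1/2}};   (* birth-death, hence reversible *)
pi = Normalize[First[NullSpace[Transpose[P] - IdentityMatrix[3]]], Total];
V = DiagonalMatrix[Sqrt[pi]];
A = V . P . Inverse[V];
Simplify[A == Transpose[A]]               (* True: detailed balance makes A symmetric *)
{lam, u} = Eigensystem[A];
u = Orthogonalize[u];                     (* orthonormal eigenvectors of A *)
a = u[[All, 1]]^2;                        (* a_k = u_k[1]^2 >= 0, for state i = 1 *)
Table[MatrixPower[P, n][[1, 1]] == Simplify[a . lam^n], {n, 1, 5}]   (* all True *)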

Similarly, for the continuous time version of a Markov chain (a so-called Markov process)
$$
P_i(X(t)=i)=p_{ii}(t) = \sum_{k=1}^N a_k e^{-\lambda_k t}.
$$where all $a_k\geq 0$ and all $\lambda_k$ are real and $\lambda_k\geq 0$.

I have used this fact to prove what I call the fast retries lemma (Theorem 1 in the paper Optimal call routing in VoIP). Suppose that you ring up your friend and find that his phone is engaged. There is a minimum time that it takes to retry the call, say $\tau$, so you could try to reach him at times $0,\tau,2\tau,\dotso$ until you get through. But might the expected time to get through be less if you retry at $\rho,2\rho,\dotso$, for some $\rho>\tau$? You might get through on the second attempt rather than the third, and $2\rho< 3\tau$.

It turns out the answer is no. Your best policy is to attempt contact at $0,\tau,2\tau,\dotso$, provided that the state of your friend can be modelled as a state of a reversible Markov process, and that you get through to him as soon as you ring and find that he is not in the state (engaged) in which he was at time $0$. For example, your friend might be performing a random walk on a graph whose vertices are "engaged on the phone", "cooking", "sleeping", "watching tv", etc.

The expected time to connect is $ET$ where
$$
ET = t + p_{00}(t)\,ET, \quad\text{and hence}\quad ET = \frac{t}{1-p_{00}(t)}.
$$
So the lemma is saying that, subject to the constraint $t\geq \tau$, $ET$ is minimized by $ t=\tau$.
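As a sanity check, here is a Mathematica sketch with a hypothetical 3-state reversible process (a birth-death generator of my own choosing, not one from the paper): $ET$ is increasing in $t$, so subject to $t\geq\tau$ the minimum is indeed at $t=\tau$.

Q = {{-1, 1, 0}, {1/2, -1, 1/2}, {0, 1, -1}};   (* birth-death generator: reversible *)
ETime[t_] := t/(1 - MatrixExp[Q t][[1, 1]]);    (* expected time to connect, retry interval t *)
Table[ETime[t], {t, 0.5, 3.0, 0.5}]
(* increasing in t, as the lemma predicts *)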

Lecture 11

I started the lecture by mentioning the second law of thermodynamics, which says that entropy is a nondecreasing function of time.

For a distribution $p=(p_1,\dotsc, p_k)$ the entropy, $H(p) = -\sum_i p_i \log(p_i)$, is a measure of disorder, or of how surprising, on average, is an outcome chosen according to this distribution. For example, the outcome of the toss of a biased coin is less surprising on average than the outcome of a toss of a fair coin, and this is expressed by the inequality
$$
- p \log_2 p - (1-p) \log_2(1-p) \leq\ - (1/2) \log_2(1/2) - (1/2) \log_2(1/2) = 1.
$$
When the $\log$ is taken base 2 then $H(p)$ is a lower bound on the average number of binary bits that are required to communicate an outcome that is chosen as one of $k$ possible outcomes according to this distribution. The bound can be achieved if and only if every component of $p$ is one of $1/2, 1/4, 1/8,\dotsc$. If that is not the case then we might consider taking $m$ i.i.d. samples from this distribution, and then try to communicate these $m$ results optimally as one block. There exists a coding that does this and needs only $m\cdot H(p)$ bits, asymptotically as $m$ tends to infinity.
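For instance (a one-line check in Mathematica): the dyadic distribution $(1/2,1/4,1/8,1/8)$ has entropy $7/4$, achieved exactly by a prefix code with word lengths $1,2,3,3$.

H[p_] := -p . Log2[p];          (* entropy in bits; assumes no zero components *)
H[{1/2, 1/4, 1/8, 1/8}]         (* 7/4 = (1/2)1 + (1/4)2 + (1/8)3 + (1/8)3 *)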

I gave an example of a 4-state Markov chain which starts in state 1, and thus has entropy $H((1,0,0,0))=0$. As $n$ increases the distribution given by the first row of $P^n$ tends to $(1/4,1/4,1/4,1/4)$, and the entropy increases monotonically to 2 (using log base 2). The point of this example was to motivate the idea that reversibility is only going to make sense once our Markov chain has reached its equilibrium. Otherwise the process will look different when reversed in time, because it will appear that entropy is decreasing.

In fact, I cheated a little here, because it is not always the case that the entropy of $p_1^{(n)}=(p_{1j}^{(n)},\ j=1,\dotsc,k)$ is monotonically increasing in $n$. This is true if and only if the equilibrium distribution of the Markov chain is the uniform distribution (as it was in my example). Nonetheless the same point is valid. Unless we start the chain in equilibrium we will be able to detect some difference between the process run forward in time and the process run backwards in time.
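You can see the monotone case in a couple of lines of Mathematica. The chain below is a lazy random walk on a 4-cycle (my stand-in, not necessarily the chain from the lecture): it is doubly stochastic, so its equilibrium distribution is uniform, and the entropy of the first row of $P^n$ climbs from 0 towards 2.

P = {{1/2, 1/4, 0, 1/4}, {1/4, 1/2, 1/4, 0},
     {0, 1/4, 1/2, 1/4}, {1/4, 0, 1/4, 1/2}};
H[p_] := -Total[If[# == 0, 0, # Log2[#]] & /@ p];   (* convention: 0 log 0 = 0 *)
Table[N[H[MatrixPower[P, n][[1]]]], {n, 0, 10}]     (* increases from 0 towards 2 *)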

There is a Part II course called Coding and Cryptography in which you can learn more about entropy as it relates to efficient communications. Not surprisingly, it also crops up in the Cosmology and Statistical Physics courses.

Tuesday, November 6, 2012

Lecture 10

We only proved the first part of Theorem 10.2 (Ergodic theorem). The second part is a simple corollary of the first part. For details you can look at page 3 in Section 1.10 of James Norris's notes.

In this lecture I gave a demonstration of how Engel's probabilistic abacus can be used to calculate the invariant distribution of a finite-state Markov chain in which all $p_{ij}$ are rational numbers. The puzzle is "why does this algorithm always terminate in a finite number of steps?" There is more about this algorithm in Example Sheet 2, #15 and Section 12.5 of the notes. I first heard about this problem from Laurie Snell circa 1975. At that time he said that the question of why it works was still open.

The material on the random target lemma and Kemeny's constant is non-examinable, but I included it for fun. It is surprising (don't you think?) that the expected time to reach equilibrium (in the sense of this lemma) is independent of the starting state. (As well as being a co-author with Laurie Snell of the book Finite Markov Chains, John Kemeny was President of Dartmouth College, and one of the inventors of the BASIC programming language.)
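If you want to see the random target lemma in action, here is a small Mathematica verification with an arbitrary 3-state chain (my own example): the mean hitting times $m_{ij}$ are found by solving the usual linear equations, and $\sum_j \pi_j m_{ij}$ comes out the same for every starting state $i$.

P = {{0, 1/2, 1/2}, {1/4, 1/2, 1/4}, {1/3, 1/3, 1/3}};
pi = Normalize[First[NullSpace[Transpose[P] - IdentityMatrix[3]]], Total];
m[j_] := Module[{vars = Array[x, 3], eqs},
   eqs = Table[If[i == j, x[i] == 0, x[i] == 1 + P[[i]] . vars], {i, 3}];
   vars /. First[Solve[eqs, vars]]];
M = Transpose[Table[m[j], {j, 3}]];   (* M[[i, j]] = expected time to reach j from i *)
Table[pi . M[[i]], {i, 3}]            (* identical for every i: Kemeny's constant *)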

Of course there are other ways to think about the time that is required for a Markov chain to reach equilibrium. One is the mixing time, $\tau(\epsilon)$, which is defined for $\epsilon > 0$ as the least time such that
\[
\max_i \sum_j | p_{ij}^{(n)} - \pi_j | < \epsilon\quad\text{ for all }n \geq \tau(\epsilon).
]This is closely related to the magnitude of the second-largest eigenvalue of $P$, say $\lambda_2$. The smaller $|\lambda_2|$ is, the smaller is the mixing time. In fact, one can prove bounds such as
\[
|\lambda_2| \frac{\log(1/(2 \epsilon))}{2(1-|\lambda_2|)} \leq \tau(\epsilon) \leq \frac{\log(1/(\pi^* \epsilon))}{1-|\lambda_2|}
\]where $\pi^*$ is the smallest component of $\pi$ (the invariant distribution).
Other interesting quantities are the coupling time (the time for two independent copies of the Markov chain to couple) and the cover time (the time for the Markov chain to visit every state at least once).
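Here is a rough Mathematica sketch of the mixing time definition above, for an arbitrary small chain of my own (and assuming, as holds here, that the distance to $\pi$ is nonincreasing in $n$): note how $\tau(\epsilon)$ scales like $\log(1/\epsilon)$ when $|\lambda_2|=1/2$.

P = {{1/2, 1/2, 0}, {1/4, 1/2, 1/4}, {0, 1/2, 1/2}};
pi = Normalize[First[NullSpace[Transpose[P] - IdentityMatrix[3]]], Total];
lam2 = Sort[Abs[N[Eigenvalues[P]]]][[-2]];        (* second-largest eigenvalue modulus *)
d[n_] := Max[Table[Total[Abs[MatrixPower[P, n][[i]] - pi]], {i, 3}]];
tau[eps_] := Module[{n = 1}, While[d[n] >= eps, n++]; n];
{lam2, tau[0.01], tau[0.001]}                     (* roughly {0.5, 7, 10} *)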

Recreational talk today

I will be giving the following talk today.

Hide and search games with multiple objects

Richard Weber

Talk at Sidney Sussex College (6.30pm, Old Library) November 6, 2012

Slides for this talk

You have mislaid \(k\) objects (such as your keys, wallet, mobile phone, etc). Each is lost in one of \(n\) (\(>k\)) locations (such as coat pocket, desk drawer, under bed, gym locker, etc.). It will cost \(c_i\) to search location \(i\) and you want to minimize the expected cost of finding all your objects. “Sod’s law” predicts that the objects are distributed amongst the locations with whatever probability distribution makes your task most difficult. What is this “Sod’s law” distribution? In what order should you search the locations? I will discuss this and some similar problems.

Another cute one is this: “In how many places should a forgetful squirrel hide his nuts?”

Practical and worthwhile search problems include “rescue at sea”, “searching for a cancer cure”, and “oil exploration”.

Thursday, November 1, 2012

Lecture 9

Today's lecture was a high point in our course: the proof by coupling that for an ergodic Markov chain $P(X_n=i)\to\pi_i$ as $n\to\infty$.

A couple of points raised by questions that students asked me about at the end:

  • If a finite state space Markov chain is aperiodic and irreducible then it is regular. I think I forgot to include the words "finite state space" when lecturing, but they are in the notes. This is not true for an infinite state space chain. Consider the Markov chain on the non-negative integers with $p_{i,i+1}=p_{i,0}=1/2$. Then $p_{100,100}^{(n)}>0$ only if $n\geq 101$. So there is no $n$ for which $P^n>0$, i.e. $p_{ij}^{(n)}>0$ for all $i,j$.
  • In the proof of Theorem 9.8 the assumption of positive recurrence guarantees that there exists an invariant distribution $\pi$. From this we can see that $\pi\times \pi$ is the invariant distribution of the chain with state $W_n=(X_n,Y_n)$, and its existence implies that this chain is recurrent, and hence that $P(T<\infty)=1$. If we only knew that $X_n$ were recurrent (and so possibly null recurrent), we would only be able to claim the existence of an invariant measure $\lambda$, and $\lambda\times\lambda$ would be an invariant measure for $W_n$. But existence of an invariant measure does not imply recurrence (see Example 9.3).

I mentioned Vincent Doblin (1915-40) [also known as Wolfgang Doeblin] to whom is due the coupling proof of Theorem 9.8. There is a good article about his life in a 2001 article in The Telegraph: Revealed: the maths genius of the Maginot line. Some of Doblin's work was only discovered in summer 2000, having been sealed in an envelope for 60 years.

I quoted J. Michael Steele on coupling:

Coupling is one of the most powerful of the "genuinely probabilistic" techniques. Here by "genuinely probabilistic" we mean something that works directly with random variables rather than with their analytical co-travelers (like distributions, densities, or characteristic functions).

I like the label "analytical co-travelers". The coupling proof should convince you that Probability is not merely a branch of Analysis. The above quote is taken from Mike's blog for his graduate course on Advanced Probability at Wharton (see here). Mike has a fascinating and very entertaining web site, that is full of "semi-random rants" and "favorite quotes" that are both funny and wise (such as his "advice to graduate students"). I highly recommend browsing his web site. It was by reading the blogs for his courses that I had the idea of trying something similar myself. So if you have been enjoying this course blog - that is partly due to him. Here is a picture of Mike and me with Thomas Bruss at the German Open Conference in Probability and Statistics 2010, in Leipzig.

The playing cards for the magic trick that I did in the lecture were "dealt" with the help of the playing card shuffler at random.org.

The following is a little sidebar for those of you who like algebra and number theory.

In the proof of Lemma 9.5 we used the fact that if the greatest common divisor of $n_1,\dotsc, n_k$ is $1$ then for all sufficiently large $n$ there exist some non-negative integers $a_1,\dotsc, a_k$ such that
\begin{align}
n &= a_1 n_1 + \cdots + a_k n_k.\tag{*}
\end{align}
Proof. Think about the smallest positive integer $d$ that can be expressed as $d = b_1 n_1 + \cdots + b_k n_k$ for some integers $b_1,\dotsc, b_k$ (which may be negative as well as non-negative). Notice that the remainder of $n_1$ divided by $d$ is also of this form, since it is $r = n_1 - m(b_1 n_1 + \cdots + b_k n_k)$ for $m=\lfloor n_1 /d\rfloor$.

If $d$ does not divide $n_1$ then $0 < r < d$, and $d$ fails to be the smallest positive integer that can be expressed in the form $b_1 n_1 + \cdots + b_k n_k$. So $d$ divides $n_1$. The same must be true for every other $n_j$, and so $d=\text{gcd}(n_1,\dotsc,n_k)=1$.

So now we know that it is possible to write $1 = b_1 n_1 + \cdots + b_k n_k$, and so we can also write $j = j(b_1 n_1 + \cdots + b_k n_k)$, for all $j=1,\dotsc, n_1$. Finally, we can leverage this fact to conclude that for some large $N$ we can write all of $N, N+1, N+2,\dotsc, N+n_1$ in the required form (*), and hence we can also express in form (*) all integers $N + m n_1 + j$, where $m$ and $j$ are non-negative integers, i.e. we can do this for all integers $n\geq N$.
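Mathematica has this built in, if you would like to experiment: FrobeniusNumber gives the largest integer not representable in the form (*), and FrobeniusSolve finds the representations.

FrobeniusNumber[{6, 10, 15}]      (* 29: so every n >= 30 has the form (*) *)
FrobeniusSolve[{6, 10, 15}, 31]   (* {{1, 1, 1}}: 31 = 6 + 10 + 15 *)
(* note gcd(6, 10, 15) = 1 although no pair of these numbers is coprime *)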

Tuesday, October 30, 2012

Lecture 8

In today's lecture we proved that if the Markov chain is irreducible and recurrent then an invariant measure exists and is unique (up to a constant multiple) and is essentially $\gamma^k$, where $\gamma_i^k$ is the expected number of hits on $i$ between successive hits on $k$. The existence of a unique positive left-hand (row) eigenvector is also guaranteed by Perron-Frobenius theory when $P$ is irreducible (see the blog entry on Lecture 2). This again points up the fact that many results in the theory of Markov chains can be proved either by a probabilistic method or by a matrix-algebraic method. Essentially, an invariant measure is a left-hand (row) eigenvector with eigenvalue 1.

In proving Theorem 8.5 today I raised a query as to whether (top of page 31) we should be saying
\[
p_{kj}+\sum_{i_0\neq k}p_{ki_0}p_{i_0,j}+\cdots\ =P_k(X_1=j\text{ and }T_k\geq1)+P_k(X_2=j\text{ and }T_k\geq 2)+\cdots
\]or
\[
p_{kj}+\sum_{i_0\neq k}p_{ki_0}p_{i_0,j}+\cdots\ =P_k(X_1=j\text{ and }T_k> 1)+P_k(X_2=j\text{ and }T_k> 2)+\cdots
\]In fact, the first is correct if we are allowing $j=k$, and the second will be true if we restrict to $j\neq k$ (which is what I was saying in the lecture). But either way of doing it is fine. We are really only trying to see that $\lambda_j\geq \gamma_j^k$ for $j\neq k$, since by assumption $\lambda_k=1=\gamma_k^k$.

I have been thinking a bit more about how better to explain why the assumption of recurrence is used in proving Theorem 8.4. This is rather subtle. Recurrence means that $P_k(T_k<\infty)=1$, which is equivalent to saying $P_k\left(\bigcup_{t=1}^\infty \{T_k=t\}\right)=1$, i.e. that we can cover the whole sample space by disjoint events of the form $\{T_k=t\}$ with $t$ finite. This justifies us writing
\[
\gamma_j^k = E_k \sum_{t=1}^\infty\sum_{n=1}^{t}1_{\{X_n=j\text{ and } T_k=t\}}= E_k \sum_{n=1}^{\infty}\sum_{t=n}^\infty1_{\{X_n=j\text{ and } T_k=t\}}=
E_k\sum_{n=1}^{\infty}1_{\{X_n=j\text{ and } n\leq T_k\}},
\]which is the first line of the multi-line algebra on page 30. I have now rewritten this part of the notes so as to spell out the details of this. Notice that in the case $j=k$ this formula is correct and gives $\gamma_k^k=1$ because $X_{T_k}=k$.

The final bits of the proofs of both Theorems 8.4 and 8.5 follow from Theorem 8.3. In Theorem 8.4, the fact of (iii) follows from Theorem 8.3 once we have proved (i) and (ii). And in Theorem 8.5 the fact that $\mu=0$ follows from Theorem 8.3 because $\mu_k=0$. However, in the notes the proofs of Theorems 8.4 and 8.5 are self-contained and repeat the arguments used in Theorem 8.3. (These proofs are taken directly from James Norris's book.)

I described a very simple model of Google PageRank (about which you can read a bit more at this Wikipedia link). This is just one example of a type of algorithm that has become very important in today's web-based multi-agent systems. Microsoft, Yahoo, and others have proprietary algorithms for their search engines. Similarly, Amazon and eBay run so-called recommender and reputation systems. Ranking, reputation, recommender, and trust systems are all concerned with aggregating agents' reviews of one another, and of external events, into valuable information. Markov chain models can help in designing these systems and to make forecasts of how well they should work. Last year I set a Part III essay topic on Ranking, Reputation and Recommender Systems. Look at this link if you would like to read more about the topic, or get an idea of the sort of thing that students write about in Part III.
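For the flavour of it, here is a minimal PageRank sketch in Mathematica, with a hypothetical 4-page web and the usual damping factor 0.85 (this toy version assumes every page has at least one outgoing link):

A = {{0, 1, 1, 0}, {0, 0, 1, 0}, {1, 0, 0, 1}, {0, 0, 1, 0}};  (* A[[i, j]] = 1 if page i links to j *)
P = #/Total[#] & /@ A;          (* random surfer follows a uniformly chosen link *)
d = 0.85; m = Length[A];
G = d P + (1 - d)/m;            (* with probability 1 - d, teleport to a uniform random page *)
v = ConstantArray[1./m, m];
Do[v = v . G, {50}];            (* power iteration towards the invariant distribution *)
v                               (* the PageRank vector *)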

Thursday, October 25, 2012

Talk today: Maths use in the Computer Systems World


Some Maths of use in the Computer Systems World


  • Dr Milan Vojnovic, Microsoft Research Cambridge
  • Thursday 25 October 2012, 16:00-17:00
  • MR3.

CCA Industrial Seminar
In the talk, I will provide an overview of some of the mathematical problems that arise in the context of the computer software industry. These include the design of online contests, information diffusion, and algorithms for big data. This context calls for a rich set of mathematical techniques, including those from game theory, applied probability, statistical learning, differential equations, and discrete mathematics. The talk should be accessible to anyone, providing a pictorial guide through real-world examples, and giving an idea of the techniques used and some interesting end results.

Lecture 7

Today's lecture was an introduction to the ideas of an invariant measure and invariant distribution. I mentioned that the invariant distribution is also called a stationary distribution, equilibrium distribution, or steady-state distribution.

You might like to think about the implications of today's lecture for cards and card-shuffling (a subject which has fascinated me since childhood, and in my undergraduate days when I was secretary of the Pentacle Club). It is always good to have a few standard models in your mathematical locker (such as card-shuffling, birth-death chains, random walk on $\mathbb{Z}$ and $\mathbb{Z}^2$, etc) with which you can test your intuition and make good guesses about what might or might not be true. The state space of a deck of cards is of size
\[
52! =80,658,175,170,943,878,571,660,636,856,403,766,975,289,505,440,883,277,824,000,000,000,000.
\]The equilibrium distribution is one in which each state has equal probability $\pi_i = 1/52!$.

I mentioned today that $m_i=1/\pi_i$ (which is intuitively obvious, and we prove rigorously in Lecture 9). This means that if you start with a new deck of cards, freshly unwrapped, and then shuffle once every 5 seconds (night and day), it will take you $1.237 \times 10^{61}$ years (on average) before the deck returns to its starting condition. Notice that this result holds for any sensible meaning of a shuffle, provided that it has the effect of turning the state space into one closed class (an irreducible chain).

A very important question in many applications of Markov chains is "how long does it take to reach equilibrium?" (i.e. how large need $n$ be so that the distribution of $X_n$ is almost totally independent of $X_0$?) You might enjoy reading about the work of mathematician/magician Persi Diaconis in answering the question "how many shuffles does it take to randomize a deck of cards?". Here is the paper Trailing the Dovetail Shuffle to its Lair.

On-line bin packing

I talked for a few minutes about my research on on-line bin packing, in the paper "Markov chains, computer proofs, and average-case analysis of best fit bin packing". This provides an example of a multi-dimensional Markov chain in which we are interested in recurrence/transience properties, but such properties are much harder to pin down than they were for the random walks in $\mathbb{Z}^d$ that we considered in the last lecture.

In this research we consider items, of sizes chosen uniformly from the integers $1,2,\dotsc,j$, say $j=8$, that arrive in a stream, and as each item arrives it must be packed in a bin. Initially there are an infinite number of empty bins, of some size $k$, say $k=11$. In the Markov chain that models the process of on-line best-fit bin packing the state can be represented as $(x_1,x_2,\dotsc,x_{10})$, where $x_i$ is the number of bins that we have started, but which are not yet full, and which have a gap of $i$. One way to pack the bins is Best Fit. As each item arrives we place it in the partially full bin that it most nearly completes, or in an empty bin if it fits in no partially full bin. So, for example, in state
$(x_1,x_2,\dotsc,x_{10})=(1,0,3,1,0,0,0,0,0,0)$ an arriving item of size 2 would be placed in the bin with a gap of 3 and take us to state $(2,0,2,1,0,0,0,0,0,0)$. An arrival of size 7 would start a new bin and take us to state $(1,0,3,2,0,0,0,0,0,0)$.

It is interesting to ask whether there is a return, infinitely often, to the state $(0,0,\dotsc,0)$ in which there are no partially-full bins present (i.e. whether the Markov chain is recurrent). It turns out that the Markov chain with $(j,k)=(8,11)$ is transient, but the one with $(j,k)=(8,10)$ is recurrent.
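Here is a rough Mathematica simulation sketch of this chain (my own quick version of Best Fit, not code from the paper), counting visits to the empty state over $10^5$ arrivals for $(j,k)=(8,11)$ and $(8,10)$:

step[x_, s_, k_] := Module[{fit, g},
  fit = Select[Range[k - 1], x[[#]] > 0 && # >= s &];   (* gaps that can take the item *)
  If[fit === {}, Return[ReplacePart[x, k - s -> x[[k - s]] + 1]]];  (* open a new bin *)
  g = Min[fit];                                          (* best fit: tightest adequate gap *)
  If[g == s, ReplacePart[x, g -> x[[g]] - 1],            (* bin completed exactly *)
   ReplacePart[x, {g -> x[[g]] - 1, g - s -> x[[g - s]] + 1}]]];

sim[j_, k_, n_] := Module[{x = ConstantArray[0, k - 1], visits = 0},
  Do[x = step[x, RandomInteger[{1, j}], k];
   If[Total[x] == 0, visits++], {n}];
  {visits, Total[x]}];

{sim[8, 11, 10^5], sim[8, 10, 10^5]}
(* with (8, 11) the returns to the empty state dry up (transient);
   with (8, 10) they keep occurring (recurrent) *)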

You might like to view these seminar slides for more details. (These were for a faculty colloquium and aimed at a general audience of mathematicians, and so it should be well within your knowledge of mathematics to understand these slides.)

There are applications of this theory to the design of algorithms for building packets of Internet data. I also once did a project about on-line packing of chicken pieces into supermarket packages of 4 chicken breasts, under the constraint that each package should weigh at least 500g. The idea is to do this so as to minimize the average amount by which a pack exceeds 500g.

Tuesday, October 23, 2012

Lecture 6

This lecture marks the half way point of the course. We celebrated by proving George Polya's famous theorem (1921): that symmetric random walk is recurrent on $\mathbb{Z}$ and $\mathbb{Z}^2$, but transient on $\mathbb{Z}^d$ ($d>2$). In fact, we proved transience only for $d=3$, but it is easy to see that this implies transience for all $d>2$ (Example Sheet 2, #2).

I mentioned and recommended Polya's famous little book "How to Solve It". I also quoted this story (from A. Motter):

While in Switzerland Polya loved to take afternoon walks in the local garden. One day he met a young couple also walking and chose another path. He continued to do this yet he met the same couple six more times as he strolled in the garden. He mentioned to his wife: how could it be possible to meet them so many times when he randomly chose different paths through the garden?

You can use the results in today's lecture to do Example Sheet 1 #15. It is easy to see from the hint that if we know that symmetric random walk on $\mathbb{Z}^2$ is recurrent then symmetric random walk on a honeycomb lattice must also be recurrent. But proving the converse is harder.

In fact, one can show that random walk on every sort of sensibly-drawn planar graph is recurrent. This includes the honeycomb lattice, and even graphs as strange as the Penrose tiling (which gives us a non-periodic graph). To prove this we must decide what we mean by a "sensibly-drawn planar graph" and then find a way to embed any such graph in some other graph on which we know random walk is recurrent. You can learn more about the details of this in Peter Doyle and Laurie Snell's Random walks and electric networks, 2006 (freely available to download). The title of this book hints at a connection between random walks and electrical networks. I will explain the connection in Lecture 12.

The material in Section 6.5 (Feasibility of wind instruments) was written after a short email correspondence with Peter Doyle. The story about cities which beggar their neighbours is my own fanciful invention, but it makes that same point without needing to brush up on the theory of fluid dynamics. It is topical in the light of recent controversy about companies taking advantage of the low corporation tax rates in Ireland and Luxembourg.

Corrections to notes

This page maintains a list of changes made to the notes since the first lecture, on 4 October, 2012. I continually update the M.pdf file to reflect these changes.

Lecture 11. Page 41. Proof. First we check that $\hat P$ is a stochastic matrix:

Lecture 10. On page 39 (10.2) I have changed a $E$ to $E_i$ and similarly in the next displayed line $E$s to $E_i$ and $E_j$.

Lecture 9. I have streamlined a bit the proof in step 2 on page 36.

Lecture 8. On page 31, $m_i=E(T_i)$ has been corrected to $m_i=E_i(T_i)$. See the blog post for some further comments on this lecture.

Lecture 6. In the final line of page 22 it should be

$p_{00}^{(2n)}=\cdots$

Lecture 4. Subsequent to the lecture I have made some improvements at the middle of page 14, to better explain where and how we use Fubini's theorem to interchange $\sum_{j\in I}$ and $\sum_{t\geq 1}$ in the proof of Theorem 4.3.

I also expanded the explanation of why $P_2(\text{hit 0})=P_1(\text{hit 0})^2.$ Notice that any path which starts in state $2$ and eventually hits state $0$ must at some point hit state $1$. That is why the strong Markov property gives us $P_2(\text{hit 0})=P_2(\text{hit 1})P_1(\text{hit 0})$.

Furthermore, $P_2(\text{hit 1})=P_1(\text{hit 0})$. This is because paths that go from $2$ to $1$ are in one-to-one correspondence with paths that go from $1$ to $0$ (just shifted $1$ to the right).

I made a change to Appendix C. Some students correctly pointed out to me at the end of the lecture that if in state $(2,3,3,0,0)$ we fire a chip from location $2$ then we reach $(3,0,4,1,0)$ and cannot subsequently fire $3$. So I should have said "If we fire 2 then we reach $(3,0,4,1,0)$ and we next must fire $0$ again, but ultimately the final configuration will be the same." This is not at all obvious! Its proof is on page 55 of the notes.

Lecture 3. I have made a few trivial changes at the end of page 12.

Lecture 2. Spotted by a student:

Page 5. for some constants $a_1,\dotsc,a_M$ $\to$ for some constants $a_1,\dotsc,a_m$

Page 7, line 1. indentify $\to$ identity.

Page 7. In Theorem 2.4 (ii) we should say,"either $i=j$, or $\dots$". (The given condition does not cover the case of $i=j$. "Communicates" is supposed to be reflexive, i.e. $i\leftrightarrow i$.)

I have decided to revert Theorem 2.4 to a statement only about $i$ and $j$ that are distinct. So

Theorem 2.4. For distinct states $i$ and $j$ the following are equivalent.
\[
\begin{array}{cl}
(i) & i\rightarrow j;\\[6pt]
(ii) & p_{i_0i_1}p_{i_1i_2}\dotsm p_{i_{n-1}i_n}>0 \text{ for some states }i_0,i_1,\dotsc,i_n, \text{ where }n\geq 1,\ i_0=i\text{ and }i_n=j;\\[6pt]
(iii) & p_{ij}^{(n)}>0\text{ for some }n\geq 1.
\end{array}
\]
Lecture 1. A student has spotted 3 mistakes in the notes for Lecture 1. I have uploaded a corrected file M.pdf. It is listed under RESOURCES in the menu at the right of this page.

Page 1, Question (a): should be $\lim_{n\to\infty} p_{ij}^{(n)}=1/5$ instead of $\to 1/5$.

Page 2, final equation should start $P(X_n = i_n \mid X_0 = i_0, \dotsc, X_{n-1}=i_{n-1})=\dots$

Page 3, section 1.4: The second line should end with: "$\implies J = 1$".

Thursday, October 18, 2012

Tails You Win: The Science of Chance

I remind you about TV tonight. Here is a link where you will be able to see it on BBC iPlayer.

SCIENTIFIC DOCUMENTARY: Tails You Win: The Science of Chance
On: BBC 4 (9)  
Date: Thursday 18th October 2012 (starting this evening)
Time: 21:00 to 22:00 (1 hour long)

Documentary in which Professor David Spiegelhalter tries to pin down what chance is and how it works in the real world. A blend of wit and wisdom, animation, graphics and gleeful nerdery is applied to the joys of chance and the mysteries of probability, the vital branch of mathematics that gives us a handle on what might happen in the future. How can you maximise your chances of living till you're 100? Why do many of us experience so many spooky coincidences? Should I take an umbrella? These are just some of the everyday questions the film tackles as it moves between Cambridge, Las Vegas, San Francisco and Reading. Spiegelhalter discovers One Million Random Digits, a book full of hidden patterns and shapes, introduces us to the unit called the micromort (a one-in-a-million chance of dying), and uses the latest infographics to demonstrate how life expectancy has increased in his lifetime and how it is affected by our lifestyle choices - drinking, obesity, smoking and exercise.
(Stereo, Widescreen, Subtitles)

Lecture 5

Today's lecture was on recurrence and transience. Let me make a few comments and emphasise some points:

1. We make the definition that state $i$ is recurrent if $P_i(V_i = \infty)=1$. It is defined to be transient otherwise, i.e. if $P_i(V_i = \infty) < 1$. Later (in Theorem 5.3) we show that if $i$ is transient then actually $P_i(V_i = \infty)=0$ (but this is a consequence, not part of our starting definition of transience).


2. In the proof of Theorem 5.4 we use the fact that $p_{ii}^{(n+m+r)} \geq p_{ij}^{(n)} p_{jj}^{(r)} p_{ji}^{(m)}$. Please don't think that we are using any summation convention! (We never use the summation convention in this course.) The right hand side of this inequality is simply a product of three terms, and the inequality is a simple consequence of the fact that one way to go $i\to i$ in $n+m+r$ steps is to first take $n$ steps to go $i\to j$, then $r$ steps to go $j\to j$, and finally $m$ steps to go $j\to i$. There is a $\geq$ because there are other ways to go $i\to i$ in $n+m+r$ steps.

3. The proof of Theorem 5.3 can be presented in one line:
\begin{align*}
\sum_{n=0}^\infty p_{ii}^{(n)}&=
\sum_{n=0}^\infty E_i\left(1_{\{X_n=i\}}\right)
=E_i\left(\sum_{n=0}^\infty 1_{\{X_n=i\}}\right)=E_i(V_i)\\[12pt]
&=\sum_{r=0}^\infty P_i(V_i> r)
=\sum_{r=0}^\infty f_i^r=
\left\{\begin{array}{cc}
\infty, & f_i=1\\[6pt]
\frac{1}{1-f_i}, & f_i<1
\end{array}\right.
\end{align*}
Remember that if $A$ is an event then $P(A)=E(1_{\{A\}})$, where $1_{\{A\}}$ is the indicator random variable that $=1$ or $=0$ as $A$ does or does not occur.


4. In Theorem 5.5 we gave an important way to check whether a state is recurrent or transient, in terms of the summability of the $p_{ii}^{(n)}$. This criterion will be used in Lecture 6. There are other ways to check for transience. One other way is explained in Theorem 5.9. This is to find the minimal solution to
\[
y_j = \sum_k p_{jk} y_k,\quad j \neq i,\quad \text{and } y_i=1.
\]Then $y_j = P_j(\text{hit }i)$ for $j\neq i$. Now check the value of $\sum_k p_{ik} y_k$, which is $P_i(\text{return to }i)$. If it is $< 1$ then $i$ is transient. This is essentially the content of Theorem 5.9, which I have put in my published notes (in blue type) but am not going to discuss in lectures. However, you may find it helpful to read the theorem. Its proof is simple.
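To illustrate with a quick worked example (mine, not from the notes): consider the random walk on $\mathbb{Z}$ with $p_{j,j+1}=p$ and $p_{j,j-1}=q=1-p$, where $p>q$, and take $i=0$. The minimal solution has $y_j=(q/p)^j$ for $j\geq 1$ and $y_j=1$ for $j\leq -1$ (the walk drifts upwards, so from below it hits $0$ with probability $1$). Then
\[
\sum_k p_{0k}y_k = p\,y_1+q\,y_{-1} = p(q/p)+q = 2q<1,
\]
which confirms that $0$ is transient.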

Markov chain music

I played you some Markov music, taken from the web site Mathematics, Poetry, and Music by Carlos Pasquali. The tune is very monotonous and simple-sounding because the distribution of the $n$th note depends only on the $(n-1)$th note. More interesting music could be obtained by letting the state be the last $k$ notes, $k>1$. You might enjoy reading the Wikipedia article on Markov chains. It has a nice section about various applications of Markov chains to: physics, chemistry, testing, information sciences, queueing theory, the Internet, statistics, economics and finance, social sciences, mathematical biology, games, music, baseball and text generation.

There are lots of other crazy things that people have done with Markov chains. For example, at the Garkov web site the author attempts to use a Markov chain to re-create Garfield comic strips.

A much more practical use of Markov chains is in the important area of speech recognition. So-called hidden Markov models are used to help accurately turn speech into text, essentially by comparing the sound that is heard at time $t$ with the possibilities that are most likely, given the sound heard at time $t−1$ and the modelling assumption that some underlying Markov chain is creating the sounds.

Tuesday, October 16, 2012

Lecture 4

Section 4.1 in today's lecture presents a general version of Example Sheet 1 #12. When doing #12, make sure you understand how the condition of "minimal non-negative solution" is being used.

You might be amused to know that Question #10 on Example Sheet 1 is actually a tripos question from 1972 (Paper IV, 10C). I took IB in 1973, so I also saw this question as a student.

I made the comment that the proof of Theorem 4.2 needs Fubini's theorem. This theorem says that we can reverse the order of sums (or integrals), i.e.
\[
\sum_i \sum_j a_{ij} = \sum_j \sum_i a_{ij}
\]when these are sums over countable sets and the sum is absolutely convergent. Fubini's theorem is presented in more generality in the Part II course Probability and Measure. Here we are using it to interchange the order of sums in the second line below. For $i\not\in A$,
\begin{align*}
k_i^A&=\sum_{t=1}^\infty P_i(H^A\geq t)=\sum_{t=1}^\infty\sum_{j\in I}P_i(H^A\geq t\mid X_1=j)P_i(X_1=j)\\[6pt]
&=\sum_{j\in I}\sum_{t=1}^\infty P_i(H^A\geq t\mid X_1=j)P_i(X_1=j)\\[6pt]
&=\sum_{j\in I} E_i(H^A \mid X_1=j)P_i(X_1=j)\\[6pt]
&=\sum_{j\in I}p_{ij}(1+k_j^A)\\[6pt]
&=1 + \sum_{j\in I} p_{ij}k_j^A
\end{align*}Notice that we did not need this extra subtlety when proving Theorem 3.4 in Section 3.3.

In today's lecture we had the definition of a stopping time. This brings to my mind a small riddle (which I think I heard from David Kendall). "How do you make perfect toast? Answer: Wait until it smokes – then 10 seconds less."

Stopping times play a large role in probability theory. One very important idea is the following. Consider Example Sheet 1 #10, the gambler's ruin problem played on $\{0,1,\dotsc,10\}$. In the fair game case of $p=q=1/2$, if the gambler has $i$ and bets $j$ ($1\leq j\leq i$), then she is equally likely to next have $i-j$ or $i+j$. So $E(X_{n+1} \mid X_n)=X_n$, no matter how much she bets. This implies that $E(X_{n+1} \mid X_0)=X_0$ no matter how she bets. It is a theorem (Doob's optional sampling theorem) that $E(X_T \mid X_0)=X_0$ for any stopping time $T$ (such that $ET< \infty$, as indeed must be the case for any stopping time in this problem). Thus, we see that there is no way that the gambler can make any expected profit (or loss), no matter how clever a strategy she uses in choosing the sizes of her bets and when to stop gambling.

The optional sampling theorem also gives us a quick way to answer the first part of Example Sheet 1 #10. If $T$ is the first time that $X_n$ hits either $0$ or $10$, then $E(X_T \mid X_0=2)=2$ implies $P_2(\text{hit }0)\cdot 0+P_2(\text{hit }10)\cdot 10 = 2$. Hence $P_2(\text{hit } 10)=1-P_2(\text{hit }0)=1/5$. That's even easier than solving the RHEs!

The probabilistic abacus

I spent the final few minutes of this lecture describing Arthur Engel's probabilistic abacus (a chip-firing game) for calculating absorption probabilities in a finite-state Markov chain in which all entries of $P$ are rational. My slides and commentary are in Appendix C, along with an exposition of Peter Doyle's proof that the algorithm really works. I first heard about this abacus in 1976 when Laurie Snell was visiting the Statistical Laboratory. Snell (1925-2011) was a student of Joe Doob, one of the 'greats' of probability theory, whose optional sampling theorem I have mentioned above. Snell is the author (with John Kemeny) of several classic textbooks (including one called "Finite Markov Chains"). He is the founder of Chance News (which can be fun to browse). A particular memory that I have of Professor Snell is that he liked to go to London to play roulette at the casinos there. This struck me as a very peculiar recreation for an expert in probability. But I think it was for fun - he never claimed to make money this way.

Thursday, October 11, 2012

The Birthday Problem

I was thinking this morning about the well-known Birthday Problem. (This is nothing to do with Markov Chains.) The Wikipedia article discusses a large number of interesting generalizations$^\dagger$, but it does not seem to cover the following.

How many students do we need in a lecture so that it is more likely than not that at least $k$ of them share the same birthday?

It is well-known that 23 suffices when $k=2$. But what about $k=3$ or $4$?

$\dagger\ $ I enjoyed reading this. Suppose $X_1,\ldots,X_n$ are i.i.d. random variables uniformly distributed on the integers from 1 to 1 million. How large must $n$ be so that with probability exceeding $1/2$ there exists a set $S\subset\{1,\dotsc,n\}$ such that the two sums $\sum_{i\in S}X_i$ and $\sum_{i\not \in S}X_i$ are equal (or differ by no more than 1 in the case that $\sum_i X_i$ is odd)?

Apparently the correct answer is surprisingly small, $n=23$.

Lecture 3

You should now be able to do all of Example Sheet 1 (excepting that #12 will be easier after seeing Section 4.1 in the next lecture).

I mentioned that Theorem 3.4 in this lecture is similar to the result that you were taught in Probability IA, regarding the probability of ultimate extinction in a branching process. Remember that in a branching process each individual independently produces offspring in the next generation, according to a distribution in which there are $k$ offspring with probability $p_k$ ($k=0,1,\dots$). Given that we start with one individual, the probability of ultimate extinction, say $u$, is the minimal solution to
\[
u = G(u) = \sum_k p_k u^k
\]where $G$ is the probability generating function of the number of offspring.

Do you remember the proof that $u$ is the minimal solution to $u=G(u)$, and do you see how similar it is to the proof in Theorem 3.4?

I have called the equations of the form $x=Px$, i.e. $x_i = \sum_j p_{ij}x_j$, the right-hand equations, because $x$ appears on the right-hand side of $P$. Later in the course we will find a use for left-hand equations, of the form $x=xP$. So far as I can tell, this terminology does not appear in any modern books on Markov chains. However, it was the language used by David Kendall in the course I took from him in 1972, and I have always found it helpful as a mnemonic.

I hope you enjoy question #13 on Example Sheet 1. It contains a result that many people find surprising. In a Red-Black game in which $p < q$ the strategy of bold play is optimal (but not necessarily uniquely so). This fact is proved in the Part II course Optimization and Control (see Section 4.3 "Optimal gambling" in the Optimization and Control course notes.) Part of that course is about Markov Decision Processes, which are Markov chains in which we have some control over the transitions that occur, and we try to minimize (or maximize) costs (or rewards) that accrue as we move through states.

Question #14 on Examples Sheet 1 extends the idea of a gambling game between 2 players to that of a game amongst 3 players. I think it is remarkable that there exists such a simple formula for the expected number of games that will be played until one of the three players becomes bankrupt. No one has ever discovered a simple formula for this game with $\geq 4$ players.

Tuesday, October 9, 2012

Examples Sheet 1, #7 and Mathematica

In question #7 of Examples Sheet 1 you will explore how the form of $p_{11}^{(n)}$ differs, depending on whether the eigenvalues of the characteristic polynomial are real and distinct, complex, or repeated. Although it is intended that you do this question without the aid of a computer, you might like to check your answers by writing a small program to do the algebra.

I strongly believe that you should not only understand theory and proofs, but also be able to use modern computing tools to quickly work out examples.

I highly recommend that you install Mathematica and use it while you study and work on examples sheets (in many courses). I personally use it on almost a daily basis. You can download a free copy:

http://www.damtp.cam.ac.uk/internal/computing/software/mathematica/

It is very easy to install and learn to use. The time you spend learning to use it (and similarly MATLAB) is a very good investment.

Below are some short Mathematica programs for Examples Sheet 1, #7. I think that a well-motivated student might like to do this question by hand, and then subsequently check the answer using Mathematica. By the way, I would not expect an examiner to set a question in tripos that is as difficult to do by hand as doing all of (a), (b) and (c) in an example as hard as #7.
Clear[P,p,p11]

P={{0,1,0},{0,2/3,1/3},{p,1-p,0}};

(* Solution to (a) *)
mu=Eigenvalues[P /. p->1/16]
p11[n_]=a mu[[1]]^n+ b mu[[2]]^n+ c mu[[3]]^n;
Solve[{p11[0]==1,p11[1]==0,p11[2]==0},{a,b,c}]
p11[n] /. %[[1]] //Expand

Out[1]= {1,-(1/4),-(1/12)}
Out[2]= {{a->1/65,b->-(2/5),c->18/13}}
Out[3]= 1/65-1/5 (-1)^n 2^(1-2 n)+1/13 (-1)^n 2^(1-2 n) 3^(2-n)

(* Solution to (b) *)
mu=Eigenvalues[P /. p->1/6]
p11[n_]=a mu[[1]]^n+ b mu[[2]]^n+ c mu[[3]]^n;
Solve[{p11[0]==1,p11[1]==0,p11[2]==0},{a,b,c}]
p11[n] /. %[[1]] //ComplexExpand

Out[4]= {1,-(1/6)+I/6,-(1/6)-I/6}
Out[5]= {{a->1/25,b->12/25-(9 I)/25,c->12/25+(9 I)/25}}
Out[6]= 1/25+1/25 2^(3-n/2) 3^(1-n) Cos[(3 n \[Pi])/4]
+1/25 2^(1-n/2) 3^(2-n) Sin[(3 n \[Pi])/4]

(* Solution to (c) *)
mu=Eigenvalues[P /. p->1/12]
p11[n_]=a +(b+c n)mu[[2]]^n;
Solve[{p11[0]==1,p11[1]==0,p11[2]==0},{a,b,c}]
p11[n] /. %[[1]]

Out[7]= {1,-(1/6),-(1/6)}
Out[8]= {{a->1/49,b->48/49,c->-(6/7)}}
Out[9]= 1/49+(-(1/6))^n (48/49-(6 n)/7)

Lecture 2

Following this lecture you should be able to do Examples Sheet 1, #1-9 (except 8(b), although you can probably guess what is meant).

This lecture was mostly about how to calculate the elements of $P^{(n)}$ ($=P^n$) by solving recurrence relations, or by tricks using symmetry. We ended the lecture with the definitions of communicating states, and closed and open classes. If the Markov chain consists of only one class (and so every state can be reached from every other) then the Markov chain is said to be irreducible.

The fact that we can find $p_{ij}^{(n)}$ by solving a recurrence relation follows from the fact that if $P$ is $m\times m$ and has characteristic polynomial
\[
q(x)=\det(xI-P) = c_0 +c_1 x+\cdots+c_m x^m
\]then $p_{ij}^{(n)}$ satisfies the recurrence relation
\[
c_0 p_{ij}^{(n)} + c_1 p_{ij}^{(n+1)}+ \cdots + c_m p_{ij}^{(n+m)} =0,\quad n\geq 0.\tag{1}
\]This follows from the Cayley-Hamilton theorem, which states that $q(P)=0$ (see IB Linear Algebra); multiplying $q(P)=0$ by $P^n$ and taking the $(i,j)$ element gives $(1)$. You know how to solve a recurrence relation like $(1)$ (from IA Differential Equations), and you also know that the form of the solution differs depending on whether the roots of this polynomial are real, complex or have any multiplicities greater than 1.

Notice that if $P$ is $m \times m$ and irreducible then
\[
Q=\frac{1}{m}(I+P+P^2+\cdots+P^{m-1})
\] is a transition matrix and all its elements are positive. Can you see why? A hint is the pigeonhole principle.

Here now is a sidebar on some interesting results in matrix algebra that are related to today's topics. We said in this lecture that if $P$ is $m \times m$ and has $m$ distinct eigenvalues, $1, \mu_2,\dotsc, \mu_m$, then
\[
p_{ij}^{(n)} = a_1 + a_2 \mu_2^n + \cdots + a_m\mu_m^n
\]for some constants $a_1, a_2,\dotsc,a_m$.

We would like to know more about the eigenvalues $\mu_2,\dotsc,\mu_m$. In particular, let $|\mu_j|$ denote the modulus of $\mu_j$. If $|\mu_j| < 1$ for all $j > 1$ then $p_{ij}^{(n)}\to a_1$ as $n\to\infty$ (as we see happening in Example 2.1).

In fact, the following are true.
  1. $|\mu_j| \leq 1$ for all $j > 1$.
  2. Suppose there exists $n$ such that $P^n$ is strictly positive. Then $|\mu_j| < 1$ for all $j > 1$.
These facts are consequences of Perron-Frobenius theory. This theory (which is useful in many branches of mathematics) says the following.

Suppose $A$ is a square matrix, which is non-negative and irreducible (in the sense that for all $i,j$, we have $(A^n)_{ij}>0$ for some $n$). Then there exists a positive real number, say $\lambda$, such that

(i) $\lambda$ is an eigenvalue of $A$,
(ii) $\lambda$ has multiplicity $1$,
(iii) both the left and right-hand eigenvectors corresponding to $\lambda$ are strictly positive,
(iv) no other eigenvector of $A$ is strictly positive,
(v) all eigenvalues of $A$ are in modulus no greater than $\lambda$,
(vi) if, moreover, $A$ is strictly positive then all other eigenvalues of $A$ are in modulus strictly less than $\lambda$, and
(vii) $\min_i \sum_j a_{ij} \leq \lambda \leq \max_i \sum_j a_{ij}$, i.e. $\lambda$ lies between the minimum and maximum of row sums of $A$.

So in the case that $A =P$, the transition matrix of an irreducible Markov chain, (vii) implies that $\lambda=1$ (and $1=(1,1,\dotsc,1)$ is the corresponding right-hand eigenvector).

Of the claims made earlier, 1 follows from (v) and (vii). To see 2, we have from (vi) that if $P^n$ is strictly positive then all its eigenvalues different from $1$ have modulus strictly less than $1$. But if $\mu$ is an eigenvalue of $P$ then $\mu^n$ is an eigenvalue of $P^n$. Hence we must have $|\mu| < 1$.
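A quick numerical illustration in Mathematica, with an arbitrary strictly positive stochastic matrix of my own choosing:

P = {{0.2, 0.5, 0.3}, {0.4, 0.4, 0.2}, {0.1, 0.3, 0.6}};
Abs[Eigenvalues[P]]   (* the Perron eigenvalue 1 comes first; the others have modulus < 1 *)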

Thursday, October 4, 2012

MathJax

Thanks to a suggestion of Alex Chan I am now experimenting with MathJax, which claims to achieve "Beautiful math in all browsers". Interestingly, it is also used by the writers of CATAM news.

MathJax seems to work well. But if it does not look right, then you might need to refresh the page. Also, Javascript must be turned on. The only hiccup I have noticed is that the rendering is less good in Google Chrome for Windows than it is in other browsers.

Read beyond this point only if you want to learn more about some of MathJax's additional features and what I did to tweak its installation into Blogger.

I began by adding some code to my Blogger template, so that Mathjax.js is loaded on each page.
<script type='text/x-mathjax-config'>  MathJax.Hub.Config({
    extensions: ["tex2jax.js"],
    jax: ["input/TeX", "output/HTML-CSS"],
    tex2jax: {
      inlineMath: [ ['$','$'], ["\\(","\\)"] ],
      displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
      processEscapes: true
    },
    "HTML-CSS": { availableFonts: ["TeX"] }
  });
</script>

<script src='http://cdn.mathjax.org/mathjax/2.1-beta/MathJax.js?config=TeX-AMS-MML_HTMLorMML-full' type='text/javascript'/>

</head>
I changed the body font for the posts from Arial to Georgia, so that the text fonts and maths fonts are better-matched.

I have also changed the line-spacing to 150% so as to leave more room for subscripts and superscripts in-line. I did that by adding to my template
.post {
  margin: 0 0 $(post.margin.bottom) 0;
}
.post * { line-height: 150%; }
There remains a problem with Google Chrome and Windows 7 in that the mathematics font is displayed too darkly. Firefox and Safari are OK; Chrome on a Mac and iPad looks good. The issue with Chrome is not solved. It seems there is bug in the way Chrome does its font anti-aliasing. There is some discussion of this bug here. We hope Google will eventually fix this bug.

If the maths is not looking good for you then I suggest you clear your browser cache. I would be interested to know if anyone has trouble seeing the maths.

Another thing to try is to right-click on some maths and make sure that the rendering method is selected to be HTML-CSS.

By right-click (or CMD right-click in OSX) you can also choose a setting to zoom-in the maths, or to see the LaTeX code that produced the maths.

Poll: give your view about printed notes

Here is a poll for you to complete. Are printed notes good or bad?

Lecture 1

Lecture 1 was on Thursday October 4 at 10am. It was attended by almost all of the IB students.

Complete course notes are available. This is the second year I have lectured this course, so I hope these notes are mostly free of typos. However, I may make small changes to the notes during this term. I will be copying into the blog that I write this year some things I wrote in last year's blog, so not all of the following is original.

The example of a frog jumping amongst lily pads comes from Ronald A. Howard's classic book, Dynamic Programming and Markov Processes (1960). I read this book long ago and the image has stuck with me and been helpful. It gets you thinking in pictures, which is good.

Markov chains are a type of mathematics that can be highly visual, by which I mean that the problems, and even the proofs (I'll give examples later), can be run through in the mind in a very graphic way --- almost like watching a movie play out.

I mentioned that our course only deals with Markov chains in which the state space is countable. Also, we consider only a discrete-time process, $X_0, X_1, X_2,\dots$ . It is not much different to address Markov processes whose state space is uncountable (such as the non-negative real numbers) and which evolve in continuous time. The mathematics becomes only mildly more tricky. An important and very interesting continuous-time Markov process is Brownian motion. This is a process $(X_t)_{t\geq 0}$ which is continuous and generalizes the idea of random walk. It is very useful in financial mathematics.

Section 1.6 is about the calculation of $P^{(n)}$ ($=P^n$) for the 2-state case of
\[
P=\begin{pmatrix}
1-\alpha & \alpha\\
\beta & 1-\beta
\end{pmatrix}.
\]We found $P^n$ by writing $P=UDU^{-1}$, where $D$ is the diagonal matrix
\[
D=\begin{pmatrix}
1 & 0\\
0 & 1-\alpha-\beta
\end{pmatrix}.
\]So $P^n=UD^nU^{-1}$. We did not need to know $U$. But, as an exercise, we can easily find it. Notice that for any stochastic matrix $P$ there is always a right-hand eigenvector of $1=(1,1,\dotsc,1)^\top$ (a column vector of $1$s). This is because every row of $P$ sums to $1$. So there is always an eigenvalue of $1$. The other right-hand eigenvector of $P$ is $(\alpha,-\beta)^\top$, with eigenvalue $1-\alpha-\beta$. So we may take
\[
U=\begin{pmatrix}
1 & \alpha\\
1 & -\beta
\end{pmatrix},\quad\quad
U^{-1}=\frac{1}{\alpha+\beta}\begin{pmatrix}
\beta & \alpha\\
1 & -1
\end{pmatrix}.
\]The left-hand eigenvectors of $P$ are of course the rows of $U^{-1}$.
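You can confirm all of this symbolically in Mathematica (writing a, b for $\alpha,\beta$):

P = {{1 - a, a}, {b, 1 - b}};
U = {{1, a}, {1, -b}};
d = DiagonalMatrix[{1, 1 - a - b}];
Simplify[P == U . d . Inverse[U]]                                   (* True *)
Simplify[MatrixPower[P, 5] == U . MatrixPower[d, 5] . Inverse[U]]   (* True *)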

If $P$ has repeated eigenvalues we cannot diagonalize it as above. But we can write
\[
P= U J U^{-1}
\]where $J$ is a Jordan matrix. If $\lambda$ is an eigenvalue of $P$ with multiplicity $k$ then $p_{ij}^{(n)}$ will be of a form looking like
\[
p_{ij}^{(n)} = \cdots +\bigl(a_0+a_1 n+\cdots+a_{k-1}n^{k-1}\bigr)\lambda^n.
\] I will say more about this in Lecture 2.

In Section 1.6 I gave two methods of finding $p_{11}^{(n)}$. The first method obviously generalizes to chains with more than 2 states. The second method is specific to this 2-state example, but it is attractive because it gets to the answer so quickly.

If you find the first lecture is too easy-going, then there is something fun for you to think about in Appendix C.

Printed notes: Good or Bad?

I sometimes wonder whether it is helpful or not to publish full course notes. It is helpful in that we can dispense with some tedious copying-out, and you are guaranteed an accurate account. But there are also benefits to hearing and writing down things yourself during a lecture, and so I hope you will still do some of that.

I intend to say some things in every lecture that are not in the notes. In learning mathematics repeated exposure to ideas is helpful, so I hope that a combination of reading, listening, writing and solving problems will work well for you. 


I would be interested to hear from you if you would like to tell me what you think about this, and how you find best to use the notes that I am putting here.

When I was a student I used to take the notes that I had made during lectures and then re-write them in the minimal number of pages that was sufficient for me to remember the essential content of the course. Here, for fun and some historical interest, are my notes for the 1972 course "Markov methods" as taught by David Kendall, and also my summary revision notes. I condensed the course into 4 pages (on the large computer line-printer paper that we had in those days).


Looking at these notes now, I see flaws - and I expect many of you can do much better. Only the first 2 pages are on discrete-time Markov chains. The 1972 course also covered continuous-time Markov processes, which we now cover in the Part II course "Applied Probability" (but the 1972 IB course missed out many things that we are now doing, such as random walks and reversibility.) Incidentally, Question #10 on Example Sheet 1 is a tripos question from 1972, Paper IV, #10C.

Writing mathematics in this blog

What I say in this post is no longer correct, as I am now using Mathjax.

LaTeX is the language in which my lecture notes (and almost all modern mathematical papers) are written. In this blog I will use a pseudo-LaTeX notation for mathematics. So
  • x_i is "x sub i",
  • alpha is the Greek character α, 
and so forth. (I say pseudo-LaTeX, because in actual-LaTeX one has to put \$ signs around things, e.g. \$x_i\$, and put a \ in front of a Greek character name, e.g. \alpha).

When I want to write in this blog some algebra or work with mathematical objects I sometimes use Mathematica's notation. So DiagonalMatrix[{1,1-alpha-beta}] is the 2x2 diagonal matrix which has 1 and 1-alpha-beta on the diagonal. Also, {{1,alpha},{1,-beta}} is the 2x2 matrix whose rows are {1,alpha} and {1,-beta}.

Monday, October 1, 2012

Course starts Thursday 4 October

The 2012 Markov Chains course of 12 lectures takes place on Tuesdays and Thursdays at 10am in Mill Room 3, starting on 4 October.

Course timetable for your calendar

You can add the lecture timetable for the Markov Chains course to your Google calendar. Do

Other Calendars / Add by URL. You will see a pop-up.



Copy and paste the following into the URL box, and then click "Add Calendar":

https://www.google.com/calendar/ical/qkc3tv34usee04hs93mbhngvbc%40group.calendar.google.com/public/basic.ics

Do not simply click on the above link. Or if you do, then save the basic.ics file. You can then import the data in this file to your Google calendar with "Other calendars" / "Import calendar" / "Choose file". You can also import this file to an Outlook or Apple calendar.

Alternatively, here is a complete IB lecture calendar


I am also providing a way for you to put any subset of the IB lecture timetable in your Google calendar. The information in this calendar is taken from http://www.maths.cam.ac.uk/lecturelists/PartIBWeb.pdf.

Step 1. Click on the link below and save the file to your desktop. It is called basic.ics.

https://www.google.com/calendar/ical/ca0kgd53k18ghrpat5l4k4a3m4%40group.calendar.google.com/public/basic.ics

Step 2. If you want to import the timetable to your personal calendar then you can skip this step. If you would like to import to a different calendar, then go to your Google calendar and on the left of the page select "My calendars" /  "Create a new calendar" (giving it an appropriate name, such as "IB Lectures 2012-13").

Step 3. On the left of the page go to  "Other calendars" / "Import calendar", choose the above file and import it to your IB Lectures 2012-13 calendar, or simply to your personal calendar if you skipped Step 2.




You will now have the entries for the dates and times of all the 17 IB lecture courses. With a few mouse clicks you can easily delete the series for the courses that you are not attending.