For this technical aside we will be looking at probability. While entire courses are written about probability and the various ways of examining it, this section is designed to give a brief refresher or training on basic probability. These ideas are used throughout statistics and so it is an important topic to be familiar with.


To start this refresher, we first look at proportions. A proportion is the fraction of the items we are concerned with that have a particular characteristic. As with all population parameters, there is a corresponding statistic we can observe in a sample: the sample proportion can be calculated from the sample and used to infer the population proportion. Probabilities are similar to proportions in that they describe the proportion of times an event happens in the long-term behaviour of a system.

Before going any further we should define some terms that are going to be used in the rest of this aside.

Probability Definitions

Experiment
  • A repeatable process that produces outcomes.
Trial
  • A run of the experiment.
Sample Space (\(\Omega\) or S)
  • All possible outcomes that can be observed during the trials.
Event
  • A subset of the sample space.
  • An event occurs when one of its outcomes is observed.
Probability
  • A number expressing the likelihood that a specific event will occur.
  • Can be thought of as the proportion of the time the event happens in the experiment.

In this training we will be mainly looking at classical or frequentist statistics. The relative frequency probability of any outcome or event A is the long-term proportion of times that A is expected to occur when we observe a random process. That is, as the number of trials of the experiment grows, the relative frequency of occurrence of A should approach \(\Pr(A)\).

Relative frequency probabilities can be determined by either of these methods:

  • Making an assumption about the physical world and using it to define relative frequencies. e.g. coins are fair/balanced.
  • Observing relative frequencies of outcomes over many repetitions of the same situation or measuring a representative sample and observing relative frequencies of possible outcomes.
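The second method can be sketched with a short simulation. The example below assumes a fair six-sided die (an illustrative choice) and uses a hypothetical helper, `relative_frequency`, to show the observed proportion settling towards the assumed probability as the number of trials grows.

```python
import random


def relative_frequency(event, n_trials, seed=0):
    """Proportion of simulated fair-die rolls whose face is in `event`."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(rng.randint(1, 6) in event for _ in range(n_trials))
    return hits / n_trials


# Observed relative frequency of rolling a 4 for increasing numbers of trials.
for n in (100, 10_000, 100_000):
    print(n, relative_frequency({4}, n))
```

With more trials the printed proportion should sit closer to 1/6, although any single run will still show some random variation.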

To illustrate this further, let us consider some examples.

Let’s Roll the Die

Let’s consider an experiment where we take a fair die and roll it, recording the value on its upper face. The sample space for this experiment is
\Omega = \{1,2,3,4,5,6\}
Each of the values on the faces of the die is equally likely. So what is the probability of rolling a 4? Since there is only one way to obtain a 4 out of the six possible outcomes from the die, we get:
\Pr(4) = \frac{1}{6} \approx 0.167

Animal survey

Ten small animal traps (Elliott traps) are set for five sequential nights, and each trap can hold only one individual per night. The aim of the study is to capture native mice so they can be counted. If fourteen native mice are captured in this study, how would we define the trap success rate?

Let’s define Trap success rate = Probability of a trap containing a native mouse.
Total number of trap nights: \(5\times 10 = 50\) trap nights
\Pr(\text{Trap filled with native mouse}) = \frac{14}{50} = 0.28

As a probability is the ratio of the number of outcomes in the event to the number of outcomes in the sample space (when all outcomes are equally likely), it has a restricted scale between 0 and 1.
\Pr(\Omega) = 1
Assume \(\Omega = \{A,B,C\}\), where A, B and C are the only possible outcomes; then \(\Pr(A)+\Pr(B)+\Pr(C) = 1\).

Visualising Probability

Venn Diagram

The intersection \(A\cap B\) is read as “A AND B”.

Restricted Scales

\(\Pr(\Omega)=1\), \(\Pr(A)=1\), \(\Pr(B)=0\)
Therefore \(0\leq \Pr(\text{Event}) \leq 1\)

Addition of Probability

\Pr(A\text{ OR } B) = \Pr(A\cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)

Mutually Exclusive (Disjoint)

\text{As } \Pr(A\cap B) = 0, \quad \Pr(A\cup B) = \Pr(A) + \Pr(B)
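Both forms of the addition rule can be checked by counting outcomes. The sketch below assumes a single fair die; the events `even`, `above_three` and `low_odd` are illustrative choices, and `pr` is a hypothetical helper.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # sample space of one fair die


def pr(event):
    """Classical probability: favourable outcomes over total outcomes."""
    return Fraction(len(OMEGA & set(event)), len(OMEGA))


even = {2, 4, 6}
above_three = {4, 5, 6}
low_odd = {1, 3}  # shares no outcome with `even`

# General rule: subtract the overlap so it is not counted twice.
union_general = pr(even) + pr(above_three) - pr(even & above_three)

# Mutually exclusive events: the overlap is empty, so the probabilities just add.
union_disjoint = pr(even) + pr(low_odd)

print(union_general, pr(even | above_three))  # both 2/3
print(union_disjoint, pr(even | low_odd))     # both 5/6
```

Using `Fraction` keeps the arithmetic exact, so the two sides of each rule can be compared with `==` rather than a tolerance.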



Alternative methods


  • Probability is only one measure of the likelihood of an event.
  • Another application of probability, from Information Theory, is the Surprisal (also called Self-Information).
  • The Surprisal is a measure of how “surprised” we are to get an outcome.
  • The expected value of the Surprisal is the Information Entropy of a process. The Surprisal of an outcome \(w_n\) is:
    I(w_n) = -\log_2(\Pr(w_n)) = -\frac{\log_{base}(\Pr(w_n))}{\log_{base}(2)}


Based on historical data, we can say that 40% of males of a species of insectivorous marsupial die before reaching sexual maturity. Of sexually mature males, 30% successfully breed. What is the surprisal that a newborn male will successfully breed?
\Pr(\text{Mature} \cap \text{Breed}) = 0.6\times 0.3 = 0.18
I(\text{Breed}) = -\log_2(0.18) \approx 2.4739 \text{ bits}
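The marsupial calculation can be reproduced in a few lines. This is a minimal sketch: `surprisal_bits` and `entropy_bits` are hypothetical helper names, and the fair-coin entropy is an extra illustration that is not part of the example above.

```python
import math


def surprisal_bits(p):
    """Surprisal (self-information), in bits, of an outcome with probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)


def entropy_bits(probabilities):
    """Expected surprisal over the probabilities of all outcomes of a process."""
    return sum(p * surprisal_bits(p) for p in probabilities if p > 0)


p_breed = (1 - 0.4) * 0.3  # survive to maturity, then breed
print(round(surprisal_bits(p_breed), 4))  # 2.4739
print(entropy_bits([0.5, 0.5]))           # 1.0 (a fair coin carries one bit)
```

Rarer outcomes give larger surprisal values, and an outcome that is certain has a surprisal of zero.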


  • Alternatively we can calculate the odds of an event occurring.
  • Odds work best for binary variables (yes or no, 0 or 1, true or false)
  • In statistics, the odds of an event are quoted relative to 1.
  • E.g. the odds of getting a five on a fair die are \(0.2:1\), since \(\frac{1/6}{5/6} = 0.2\).
  • Odds range from 0 to \(\infty\) (which can be useful for other techniques, such as regression)
  • We can also transform these odds by taking logarithms to extend the range from \(-\infty\) to \(\infty\).
  • How to Calculate Odds

    To calculate the odds from a known probability \(p\):
      \text{Odds} = \frac{p}{1-p}

    • If odds\(<1\), \(p<0.5\)
    • If odds\(=1\), \(p=0.5\)
    • If odds\(>1\), \(p>0.5\)

    Note, if we know the odds in favour of an event, then we can use them to work out the probability of it occurring via the inverse:
      p = \frac{\text{Odds}}{1+\text{Odds}}


    • Suppose we start out with 100 insects exposed to an experimental pesticide. After 5 days there are 17 insects left alive.
    • What are the odds of an insect dying?
      \text{Odds}(\text{Death}) = \frac{0.83}{1-0.83}:1 = \frac{83}{17}:1 \approx 4.88 : 1
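The conversion between probability and odds, in both directions, can be sketched as below. The function names `odds`, `probability_from_odds` and `log_odds` are hypothetical, and the natural logarithm is an assumption for the log transform, since the text does not fix a base.

```python
import math


def odds(p):
    """Odds in favour of an event with probability p, quoted 'to 1'."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return p / (1 - p)


def probability_from_odds(odds_value):
    """Inverse transform: recover the probability from odds quoted 'to 1'."""
    return odds_value / (1 + odds_value)


def log_odds(p):
    """Log of the odds; ranges over the whole real line for 0 < p < 1."""
    return math.log(odds(p))


print(round(odds(1 / 6), 4))                         # 0.2, a five on a fair die
print(round(odds(0.83), 4))                          # 4.8824, the pesticide example
print(round(probability_from_odds(odds(0.83)), 4))   # 0.83
print(log_odds(0.5))                                 # 0.0, even odds
```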

    Odds Ratio

    • An alternative way of analysing two groups in terms of how likely some outcome is to occur is through an odds ratio.
    • Odds ratios are more commonly quoted in medical research, as opposed to odds.
    • The odds ratio is simply the ratio of the odds in two different categories of an explanatory variable.
      \text{Odds ratio} = \frac{\text{Odds in category 1}}{\text{Odds in category 2}}
                    Nicotine      Placebo       Total
    Reduction       52 (26%)      18 (9%)       70 (17.5%)
    No Reduction    148 (74%)     182 (91%)     330 (82.5%)
    Total           200 (100%)    200 (100%)    400 (100%)

    The odds for a reduction in the nicotine group are
    \frac{0.26}{0.74} = 0.3514\text{ to 1,}
    while in the placebo group the odds are
    \frac{0.09}{0.91} = 0.0989\text{ to 1.}
    This gives an odds ratio of
    OR = \frac{0.3514}{0.0989} = 3.55.
    That is, the odds of sustaining a reduction in smoking after 4 months are 3.55 times higher if someone is using a nicotine inhaler.
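The same odds ratio can be computed directly from the counts in the table, which avoids rounding the percentages first. This is a minimal sketch with hypothetical function names.

```python
def odds_from_counts(events, non_events):
    """Odds of the event, quoted 'to 1', from a count of events and non-events."""
    return events / non_events


def odds_ratio(events_1, non_events_1, events_2, non_events_2):
    """Ratio of the odds in category 1 to the odds in category 2."""
    return odds_from_counts(events_1, non_events_1) / odds_from_counts(events_2, non_events_2)


nicotine_odds = odds_from_counts(52, 148)  # reduction vs no reduction
placebo_odds = odds_from_counts(18, 182)

print(round(nicotine_odds, 4))                      # 0.3514
print(round(placebo_odds, 4))                       # 0.0989
print(round(odds_ratio(52, 148, 18, 182), 2))       # 3.55
```

Note that the counts of "No Reduction" are the denominators here, not the group totals, because odds compare events with non-events.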

    Probability (Function) Axioms

    • Axiom: a self-evident or universally recognised truth.
    • In other words, the rules of probability functions.
    • We can create a function that assigns a probability to an input value, e.g. \(\Pr(\text{Height} > 160\text{cm})\)
    • As we have seen in the Venn diagrams there are certain rules we must follow:
      1. \(\Pr(\Omega)=1\)
      2. \(\Pr(A) \geq 0\) for all \(A\subseteq \Omega\)
      3. \(\Pr(A\cup B) = \Pr(A)+\Pr(B)\) if \(A\cap B=\emptyset\) (the empty set, i.e. A and B are mutually exclusive)

    Conditional Probability

    • Let’s say we want to calculate the probability of Event B happening if Event A has already taken place.
    • We need to calculate the conditional probability of Event B given A.
    • Basically we take the probability of both Event A and Event B happening and divide by the probability of Event A (which must be greater than 0):

      \Pr(B|A)=\frac{\Pr(A\cap B)}{\Pr(A)}=\frac{\Pr(B\cap A)}{\Pr(A)}
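As a sketch of this formula, the counts from the nicotine table earlier can be reused, treating the observed sample proportions as probabilities (an assumption made purely for illustration). The helpers `pr` and `pr_given` are hypothetical names.

```python
from fractions import Fraction

# Counts of (group, outcome) taken from the nicotine/placebo table.
counts = {
    ("nicotine", "reduction"): 52,
    ("nicotine", "no reduction"): 148,
    ("placebo", "reduction"): 18,
    ("placebo", "no reduction"): 182,
}
total = sum(counts.values())


def pr(event):
    """Proportion of all participants whose (group, outcome) satisfies `event`."""
    return Fraction(sum(n for key, n in counts.items() if event(key)), total)


def pr_given(b, a):
    """Pr(B | A) = Pr(A and B) / Pr(A); undefined when Pr(A) = 0."""
    p_a = pr(a)
    if p_a == 0:
        raise ValueError("cannot condition on an event with probability 0")
    return pr(lambda key: a(key) and b(key)) / p_a


def is_nicotine(key):
    return key[0] == "nicotine"


def is_reduction(key):
    return key[1] == "reduction"


print(pr(is_reduction))                     # 7/40, i.e. 17.5% overall
print(pr_given(is_reduction, is_nicotine))  # 13/50, i.e. 26% within the nicotine group
```

Conditioning on the nicotine group restricts attention to those 200 participants, which is why the answer matches the 26% shown in the table.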

    Independent Events

    • Two events are independent if the occurrence of one does not change the probability of the other.
      • For example: you having blue eyes and it raining today.
    • Given two independent events A and B, the conditional probability of A given B is equal to the probability of A:
      \Pr(A|B) = \Pr(A)
    • So if two events are independent, the probability of their intersection is just their probabilities multiplied together:
      \Pr(A\cap B) = \Pr(A|B)\Pr(B) = \Pr(A)\Pr(B)
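The product rule gives a direct test for independence. The sketch below assumes two fair dice rolled together; the events and the helper names `pr` and `independent` are illustrative choices.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
OMEGA = list(product(range(1, 7), repeat=2))


def pr(event):
    """Probability of an event given as a true/false function of an outcome."""
    return Fraction(sum(1 for outcome in OMEGA if event(outcome)), len(OMEGA))


def independent(a, b):
    """True when Pr(A and B) equals Pr(A) * Pr(B)."""
    return pr(lambda o: a(o) and b(o)) == pr(a) * pr(b)


def first_even(o):
    return o[0] % 2 == 0


def second_above_four(o):
    return o[1] > 4


def total_at_least_ten(o):
    return sum(o) >= 10


print(independent(first_even, second_above_four))   # True: the dice do not affect each other
print(independent(first_even, total_at_least_ten))  # False: the total depends on the first die
```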

      The Gambler’s fallacy or Monte Carlo Fallacy

      • The false belief that if deviations from expected behaviour are observed in repeated independent trials of some random process, future deviations in the opposite direction are therefore more likely.
      • e.g. if a fair coin is tossed repeatedly and tails comes up a larger number of times than is expected, someone may incorrectly believe that this means that heads is more likely in future tosses.
      • e.g. a casino gambler playing the big wheel and believing they can come up with a “system” to predict what will fall next. The house always wins!
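A simulation can illustrate why this belief is false. The sketch below assumes a fair coin and looks only at the tosses that immediately follow a run of three tails; the function name, run length and seed are hypothetical choices.

```python
import random


def heads_after_tail_run(n_flips=200_000, run_length=3, seed=1):
    """Proportion of heads among tosses that directly follow `run_length` tails."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    flips = [rng.random() < 0.5 for _ in range(n_flips)]  # True means heads
    followers = [
        flips[i]
        for i in range(run_length, n_flips)
        if not any(flips[i - run_length:i])  # the previous tosses were all tails
    ]
    return sum(followers) / len(followers)


print(heads_after_tail_run())
```

Because the tosses are independent, the printed proportion should be close to 0.5: a streak of tails does not make heads any more likely on the next toss.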

      Sensitive Questions

      • When you ask questions about sensitive topics (such as sex, drug use or criminal history), you can use a Randomised Response Technique to reduce non-response bias.
      • In these techniques, a random process determines how the respondent answers before an answer is given.
      • For example
        • If you flip a coin and get heads you answer truthfully; if you get tails you answer yes regardless of your true answer.
        • Warner in 1965 proposed a method where standard responses were given with a set probability and truthful answers were given otherwise.
      • A group of 120 students were asked the sensitive question “Have you ever cheated on an exam?” using a randomised response technique.
      • Before answering the question each student secretly rolled a die. If the die indicated 1, 2, 3, 4 then they were asked to answer the question honestly. If the die indicated 5 then they were asked to answer “yes” regardless of their true answer. If the die indicated 6 then they were asked to answer “no”.
      • Of the 120 students, a total of 76 answered “yes” using this technique. Based on this sample, give an estimate of the true proportion of students who have cheated on an exam.

      Draw a tree


      • Each stage (START, ROLL, ANSWER) is independent of the other stages
      • However the final probabilities have to be calculated by following the “branches” through each stage.
      • When following through a branch, as the events are independent we multiply the probabilities
      • The branches are mutually exclusive (disjoint) with each other.
      • To combine two branches you add the probabilities.
      • REMEMBER: The total probability of all branches combined should equal 1.

      \text{Number of Yes Answers} = n\left(\frac{1}{6}+\frac{2}{3}p\right)

      Here \(p\) is the true proportion of students who have cheated: a “yes” comes either from rolling a 5 (probability \(\frac{1}{6}\)) or from rolling 1 to 4 (probability \(\frac{2}{3}\)) and truthfully answering “yes”. Writing YA for the number of “yes” answers, with \(n=120\) and \(YA=76\):

      76 = 120 \times\left(\frac{1}{6}+\frac{2}{3}p\right) = 20 + 80p
      56 = 80p
      p = \frac{56}{80} = \frac{7}{10} = 0.7
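The rearrangement above can be wrapped in a small function. This is a sketch under the die design described in the question (a forced "yes" with probability 1/6, an honest answer with probability 2/3); the function and parameter names are hypothetical.

```python
from fractions import Fraction


def estimate_true_proportion(yes_answers, n,
                             p_forced_yes=Fraction(1, 6),
                             p_honest=Fraction(2, 3)):
    """Solve yes_answers = n * (p_forced_yes + p_honest * p) for p."""
    observed_yes_rate = Fraction(yes_answers, n)
    return (observed_yes_rate - p_forced_yes) / p_honest


p_hat = estimate_true_proportion(76, 120)
print(p_hat, float(p_hat))  # 7/10 0.7
```

Because the number of forced answers in a real sample is itself random, this estimate can fall below 0 or above 1 when the observed "yes" count is unusually low or high.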


      • Let’s say we have a strain of wheat that has a probability of germinating of 0.8.
      • Given that the wheat germinates, it has a probability of 0.6 of maturing for harvest.
      • What is the probability of a seed maturing for harvest?

      Let’s draw the tree


      \Pr(\text{Seed reaches maturity}) = \Pr(\text{Seed Germinates}\cap \text{Matures})
      = \Pr(\text{Matures}|\text{Germinates})\Pr(\text{Germinates})
      = 0.6 \times 0.8
      = 0.48
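The whole tree for the wheat example can be written out as a sketch: multiply along each branch to get a leaf, then check that the leaves together account for every seed. The leaf labels are hypothetical wording.

```python
from fractions import Fraction

p_germinate = Fraction(4, 5)               # Pr(Germinates) = 0.8
p_mature_given_germinate = Fraction(3, 5)  # Pr(Matures | Germinates) = 0.6

# Multiply the probabilities along each branch of the tree.
leaves = {
    "germinates and matures": p_germinate * p_mature_given_germinate,
    "germinates but fails to mature": p_germinate * (1 - p_mature_given_germinate),
    "does not germinate": 1 - p_germinate,
}

for label, p in leaves.items():
    print(label, float(p))  # 0.48, 0.32 and 0.2 respectively

# The branches are mutually exclusive, so their probabilities add up to 1.
print(sum(leaves.values()) == 1)  # True
```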

