Foundations of Probability

Probability and Statistics

Ihor Miroshnychenko

Kyiv School of Economics

Flipping a Coin


50% chance of heads

50% chance of tails

Let’s count a head as a success (1) and a tail as a failure (0).

Flipping multiple coins

  • Simulate flipping 10 🪙 \(\times\) 1 time each:
0 0 1 0 1 0 1 1 0 0
  • And again:
1 0 1 0 0 1 0 0 0 0
  • Now, let’s flip 1 🪙 \(\times\) 10 times and count the heads:
5
  • And now 10 🪙 \(\times\) 10 times:
6 6 7 6 4 4 6 7 6 6
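
The head counts above look like draws from R’s `rbinom(n, size, prob)`. A minimal standard-library Python sketch of the same three simulations (the `flip_coins` helper is my own name, not from the slides):

```python
import random

random.seed(42)  # fix the seed so reruns are reproducible

def flip_coins(size, p, n):
    """Return n draws from Binomial(size, p): each draw counts heads in `size` flips."""
    return [sum(random.random() < p for _ in range(size)) for _ in range(n)]

print(flip_coins(1, 0.5, 10))   # 10 coins, one flip each: ten 0/1 outcomes
print(flip_coins(10, 0.5, 1))   # one coin flipped 10 times: a single head count
print(flip_coins(10, 0.5, 10))  # 10 coins, 10 flips each: ten head counts
```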

Unfair coin

  • Let’s simulate flipping 10 biased 🪙 \(\times\) 10 times:
    • 80% chance of heads
9 9 8 9 9 5 9 8 9 9


  • And now 10 🪙 \(\times\) 10 times:
    • 20% chance of heads
1 3 2 2 2 3 4 0 3 0
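
Biasing the coin only changes the success probability. A sketch of both biased runs above, reusing the same sort of helper (names are mine):

```python
import random

random.seed(1)

def flip_coins(size, p, n):
    """n draws of 'heads out of `size` flips', heads probability p."""
    return [sum(random.random() < p for _ in range(size)) for _ in range(n)]

print(flip_coins(10, 0.8, 10))  # mostly 7-10 heads per run
print(flip_coins(10, 0.2, 10))  # mostly 0-3 heads per run
```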

Binomial distribution

A probability distribution is a mathematical description of the possible outcomes of a random variable and how likely each outcome is.

\[X_{1 \times n} \sim \text{Binomial}(\text{size}, p)\]

  • \(X_{1 \times n}\): vector of \(n\) random variables
  • \(\text{size}\): number of trials
  • \(p\): probability of success

Simulating a Binomial Distribution

\[X \sim \text{Binomial}(10, 0.5)\]

\[\text{Pr}(X = 5)\]

Let’s simulate this:

  • n = 10000
  • size = 10
  • prob = 0.5

Finding probability mass function w/ simulation

\[\hat{\text{Pr}}(X = x) = \frac{1}{n} \sum_{i=1}^{n} I(X_i = x)\]

  • \(X_i\): \(i\)-th random variable
  • \(I\): indicator function
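
Putting the two slides together: draw \(n = 10000\) values of \(X \sim \text{Binomial}(10, 0.5)\) and apply the indicator-mean estimator. A standard-library Python sketch (variable names are mine):

```python
import random

random.seed(0)
n, size, prob = 10_000, 10, 0.5

# n draws of X ~ Binomial(size, prob)
draws = [sum(random.random() < prob for _ in range(size)) for _ in range(n)]

# indicator-mean estimate: fraction of draws equal to 5
p_hat = sum(x == 5 for x in draws) / n
print(p_hat)  # close to the exact 0.2460938
```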

Calculating exact probability mass

\[\text{Pr}(X = x) = \frac{n!}{x! \times (n - x)!} \times p^x \times (1 - p)^{n - x}\]

\[\text{Pr}(X = 5) = \frac{10!}{5! \times 5!} \times 0.5^5 \times 0.5^5 = \]

0.2460938
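
The exact formula is easy to check in Python with `math.comb` (the `binom_pmf` name is mine):

```python
from math import comb

def binom_pmf(x, size, p):
    """Pr(X = x) for X ~ Binomial(size, p)."""
    return comb(size, x) * p**x * (1 - p)**(size - x)

print(binom_pmf(5, 10, 0.5))  # 0.24609375
```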

Factorial

\(n! = n \times (n - 1) \times \ldots \times 1\)

Cumulative distribution function

\[X \sim \text{Binomial}(10, 0.5)\]

\[\text{Pr}(X \leq 4)\]

Calculating cumulative probability with simulation

\(\hat{\text{Pr}}(X \leq 4) =\) 0.3791

\(\text{Pr}(X \leq x) = \sum_{i=0}^{x} \text{Pr}(X = i)\)

\(\text{Pr}(X \leq 4) = \sum_{i=0}^{4} \text{Pr}(X = i) =\) 0.3769531
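
The exact cumulative probability is just the sum of the pmf up to \(x\); a short Python sketch (helper names are mine):

```python
from math import comb

def binom_pmf(x, size, p):
    return comb(size, x) * p**x * (1 - p)**(size - x)

def binom_cdf(x, size, p):
    """Pr(X <= x): sum the pmf from 0 to x."""
    return sum(binom_pmf(i, size, p) for i in range(x + 1))

print(binom_cdf(4, 10, 0.5))  # 0.376953125
```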

Expected Value and Variance

Properties of a distribution

Expected Value

\[X \sim \text{Binomial}(\text{size}, p)\]

\[\text{E}(X) = \text{size} \times p\]

The mean of our simulated sample is 4.9956.

If we instead take the mean of a sample with size = 100 and prob = 0.2, we get 20.0495.
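
Both sample means can be reproduced (up to simulation noise) with a standard-library sketch (the `sample_mean` helper is mine):

```python
import random

random.seed(7)

def sample_mean(size, p, n=10_000):
    """Mean of n draws from Binomial(size, p); should approach size * p."""
    draws = [sum(random.random() < p for _ in range(size)) for _ in range(n)]
    return sum(draws) / n

print(sample_mean(10, 0.5))   # close to 10 * 0.5 = 5
print(sample_mean(100, 0.2))  # close to 100 * 0.2 = 20
```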

Variance

\(X \sim \text{Binomial}(10, 0.5)\)


\(\text{Var}(X) = \text{size} \times p \times (1 - p)\)


\(\text{Var}(X) = 10 \times 0.5 \times (1 - 0.5) = 2.5\)

\(Y \sim \text{Binomial}(100, 0.2)\)




\(\text{Var}(Y) = 100 \times 0.2 \times (1 - 0.2) = 16\)
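
The variance formula can be checked against a simulated sample variance (helper names are mine):

```python
import random

random.seed(3)

def binom_var(size, p):
    """Theoretical variance: size * p * (1 - p)."""
    return size * p * (1 - p)

def sample_var(size, p, n=10_000):
    """Unbiased sample variance of n binomial draws."""
    draws = [sum(random.random() < p for _ in range(size)) for _ in range(n)]
    m = sum(draws) / n
    return sum((x - m) ** 2 for x in draws) / (n - 1)

print(binom_var(10, 0.5), sample_var(10, 0.5))    # 2.5 vs roughly 2.5
print(binom_var(100, 0.2), sample_var(100, 0.2))  # 16.0 vs roughly 16
```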

Rules of Expected Value and Variance

\[X \sim \text{Binomial}(\text{size}, p)\]

\[\text{E}(X) = \text{size} \times p\]

\[\text{Var}(X) = \text{size} \times p \times (1 - p)\]

Laws of probability

Event A: Coin is heads

\(A = 1\)

\(A = 0\)

Events A and B: Two Different Coins

\(A = 1\)

\(A = 0\)

\(B = 1\)

\(B = 0\)

Probability of A and B (independent)

flowchart LR
    A["Start"] -->|"Pr(A)"| B["A = 1"]
    A -->|"1 - Pr(A)"| C["A = 0"]
    B -->|"Pr(B)"| D["B = 1"]
    B -->|"1 - Pr(B)"| E["B = 0"]
    C -->|"Pr(B)"| F["B = 1"]
    C -->|"1 - Pr(B)"| G["B = 0"]
    D --> H["A = 1, B = 1"]
    E --> I["A = 1, B = 0"]
    F --> J["A = 0, B = 1"]
    G --> K["A = 0, B = 0"]

\[\text{Pr}(A \text{ and } B) = \text{Pr}(A) \times \text{Pr}(B)\]

\[ \text{Pr}(A \text{ and } B) = 0.5 \times 0.5 = 0.25\]

Note

For dependent events: \(\text{Pr}(A \text{ and } B) = \text{Pr}(A) \times \text{Pr}(B | A)\)

Probability of A or B

\[\text{Pr}(A \text{ or } B) = \text{Pr}(A) + \text{Pr}(B) - \text{Pr}(A \text{ and } B)\]

\[\text{Pr}(A \text{ or } B) = \text{Pr}(A) + \text{Pr}(B) - \text{Pr}(A) \times \text{Pr}(B)\]

\[\text{Pr}(A \text{ or } B) = 0.5 + 0.5 - 0.25 = 0.75\]
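
Both rules can be verified by simulating two independent fair coins (a sketch; variable names are mine):

```python
import random

random.seed(11)
n = 100_000

both = either = 0
for _ in range(n):
    a = random.random() < 0.5  # coin A is heads?
    b = random.random() < 0.5  # coin B is heads? (independent of A)
    both += a and b
    either += a or b

print(both / n)    # close to 0.5 * 0.5 = 0.25
print(either / n)  # close to 0.5 + 0.5 - 0.25 = 0.75
```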

Three Coins

\[ \begin{aligned} \text{Pr}(A \text{ or } B \text{ or } C) &= \text{Pr}(A) + \text{Pr}(B) + \text{Pr}(C) \\ &- \text{Pr}(A \text{ and } B) - \text{Pr}(A \text{ and } C) - \text{Pr}(B \text{ and } C) \\ &+ \text{Pr}(A \text{ and } B \text{ and } C) \end{aligned} \]
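
For three independent fair coins the inclusion–exclusion terms collapse to powers of \(p\), and the result can be cross-checked against the complement of "no heads at all":

```python
p = 0.5  # each of three independent fair coins

# inclusion-exclusion: 3 singles - 3 pairs + 1 triple
pr_or = 3 * p - 3 * p**2 + p**3
print(pr_or)  # 0.875

# sanity check: Pr(at least one head) = 1 - Pr(all tails)
print(1 - (1 - p) ** 3)  # 0.875
```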

Multiplying random variables

\[X \sim \text{Binomial}(10, 0.5)\]

\[Y \sim 3 \times X\]

\(E[k \times X] = k \times E[X]\)

\(Var[k \times X] = k^2 \times Var[X]\)
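
The scaling rules say the mean scales by \(k\) but the variance by \(k^2\); a simulation sketch with \(k = 3\) (names are mine):

```python
import random

random.seed(5)
k, n = 3, 10_000

x = [sum(random.random() < 0.5 for _ in range(10)) for _ in range(n)]  # X ~ Binomial(10, 0.5)
y = [k * xi for xi in x]                                               # Y = 3 * X

mean_y = sum(y) / n
var_y = sum((v - mean_y) ** 2 for v in y) / (n - 1)

print(mean_y)  # close to 3 * 5 = 15
print(var_y)   # close to 9 * 2.5 = 22.5
```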

Adding two random variables (independent)

\[X \sim \text{Binomial}(10, 0.5)\]

\[Y \sim \text{Binomial}(100, 0.2)\]

\[Z \sim X + Y\]

\[E[X + Y] = E[X] + E[Y]\]

\[Var[X + Y] = Var[X] + Var[Y]\]
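
For independent draws, both the means and the variances add; a simulation sketch of \(Z = X + Y\) (names are mine):

```python
import random

random.seed(9)
n = 10_000

x = [sum(random.random() < 0.5 for _ in range(10)) for _ in range(n)]   # X ~ Binomial(10, 0.5)
y = [sum(random.random() < 0.2 for _ in range(100)) for _ in range(n)]  # Y ~ Binomial(100, 0.2)
z = [xi + yi for xi, yi in zip(x, y)]

mean_z = sum(z) / n
var_z = sum((v - mean_z) ** 2 for v in z) / (n - 1)

print(mean_z)  # close to 5 + 20 = 25
print(var_z)   # close to 2.5 + 16 = 18.5
```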

Bayesian statistics

20 flips of a coin

  • 14 heads
  • 6 tails

Two piles of 50k coins

  • 50k fair coins

  • 50k unfair coins

  • 20 flips each

  • 14 heads

  • 6 tails

  • Which pile did the coin come from?

\(\text{Pr}(\text{Biased | 14 heads}) = \frac{\text{biased w/ 14 heads}}{\text{total w/ 14 heads}} = \frac{8356}{1903 + 8356} =\) 0.8145043

Differently sized piles

\(\text{Pr}(\text{Biased | 14 heads}) = \frac{\text{biased w/ 14 heads}}{\text{total w/ 14 heads}} = \frac{1698}{1698 + 3440} =\) 0.3304788

Bayes’ theorem

Conditional probability

\[ \text{Pr}(\text{Biased | 14 heads}) = \frac{\text{Pr}(\text{14 heads and Biased})}{\text{Pr}(\text{14 heads and Biased}) + \text{Pr}(\text{14 heads and Fair})} \\ = \frac{\text{Pr}(\text{14 heads | Biased}) \times \text{Pr}(\text{Biased})}{\text{Pr}(\text{14 heads | Biased}) \times \text{Pr}(\text{Biased}) + \text{Pr}(\text{14 heads | Fair}) \times \text{Pr}(\text{Fair})} \]


Bayes’ theorem

\[ \text{Pr}(\text{A | B}) = \frac{\text{Pr}(\text{B | A}) \times \text{Pr}(\text{A})}{\text{Pr}(\text{B})} \]

\[ \text{A} = \text{Biased}, \text{B} = \text{14 heads} \]
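
Bayes’ theorem can be applied exactly with the binomial pmf. The sketch below assumes equal priors (equal-sized piles) and a hypothetical bias of 0.75 for the unfair coins — the slides do not pin down the exact bias used here, so that value is an assumption:

```python
from math import comb

def binom_pmf(k, size, p):
    return comb(size, k) * p**k * (1 - p)**(size - k)

prior_biased, prior_fair = 0.5, 0.5  # equal-sized piles
p_biased = 0.75                      # assumed bias (hypothetical value)

like_biased = binom_pmf(14, 20, p_biased)  # Pr(14 heads | Biased)
like_fair = binom_pmf(14, 20, 0.5)         # Pr(14 heads | Fair)

posterior = (like_biased * prior_biased) / (
    like_biased * prior_biased + like_fair * prior_fair
)
print(posterior)  # roughly 0.82 with this assumed bias
```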

Normal distribution

Flipping a Coin 10 times

Flipping a Coin 1000 times

Normal distribution: mean and standard deviation

\[X \sim \text{Normal}(\mu, \sigma)\]

\[\sigma = \sqrt{\text{Var}(X)}\]

Normal approximation of binomial

\[\mu = \text{size} \times p\]

\[\sigma = \sqrt{\text{size} \times p \times (1 - p)}\]
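
The approximation can be checked numerically for \(\text{Pr}(X \leq 4)\) with \(X \sim \text{Binomial}(10, 0.5)\). The continuity correction (evaluating the normal CDF at 4.5 rather than 4) is a standard refinement not shown on the slide:

```python
from math import comb, erf, sqrt

size, p = 10, 0.5
mu = size * p                     # 5.0
sigma = sqrt(size * p * (1 - p))  # sqrt(2.5)

def normal_cdf(x, m, s):
    return 0.5 * (1 + erf((x - m) / (s * sqrt(2))))

# exact Pr(X <= 4)
exact = sum(comb(size, i) * p**i * (1 - p)**(size - i) for i in range(5))

# normal approximation with continuity correction
approx = normal_cdf(4.5, mu, sigma)

print(exact)   # 0.376953125
print(approx)  # roughly 0.376
```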

The Poisson distribution

Flipping many coins, each with a low probability of heads

\[X \sim \text{Binomial}(1000, 1 / 1000)\]

This particular case of the binomial, where \(n\) is large and \(p\) is small, can be approximated by the Poisson distribution.
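
Comparing the two pmfs for \(n = 1000\), \(p = 1/1000\) (so \(\lambda = np = 1\)) shows how close the approximation is (helper names are mine):

```python
from math import comb, exp, factorial

n, p = 1000, 1 / 1000
lam = n * p  # lambda = 1.0

def binom_pmf(k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k):
    return lam**k * exp(-lam) / factorial(k)

for k in range(4):
    print(k, binom_pmf(k), poisson_pmf(k))  # the two columns nearly agree
```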

Properties of the Poisson distribution

\[X \sim \text{Poisson}(\lambda)\]

  • \(E[X] = \lambda\)
  • \(Var[X] = \lambda\)

Poisson distribution

  • modeling how many people walk in in each hour in a store
  • number of emails received in a day
  • number of phone calls received in a day

Geometric distribution

  • Number of tails before the first head
  • Waiting time until a machine breaks


  • \(X \sim \text{Geometric}(p)\)
  • \(E[X] = \frac{1}{p} - 1\)
  • \(Var[X] = \frac{1 - p}{p^2}\)
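
Note that \(E[X] = \frac{1}{p} - 1\) corresponds to counting failures before the first success. A simulation sketch under that convention (the helper name is mine):

```python
import random

random.seed(13)
p, n = 0.5, 10_000

def tails_before_first_head(p):
    """Count failures (tails) before the first success (head)."""
    count = 0
    while random.random() >= p:
        count += 1
    return count

draws = [tails_before_first_head(p) for _ in range(n)]
print(sum(draws) / n)  # close to 1/p - 1 = 1.0
```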

Are there any other distributions?