Probability for machine learning and data science – Basic Probability 2 of 6

Probability Axioms

In this post we look at the axioms of probability. There are three basic axioms, stated below.

Axiom 1 : For any event A

P(A) \geq 0

Axiom 2 :

P (\Omega) = 1

Axiom 3 : If A_1, A_2, ... is any sequence of pairwise disjoint events, then

P\left(\bigcup\limits_{i=1}^{\infty}A_i\right) = \sum\limits_{i=1}^{\infty}P(A_i)


If A = \bigcup\limits_{i=1}^{\infty} A_i and A_1, A_2, ... are pairwise disjoint, then A_1, A_2, ... is said to be a partition of A.

Axiom 3 also holds for a finite collection of pairwise disjoint events A_1, ..., A_n. To see this, set A_i = \emptyset for all i > n and use P(\emptyset) = 0, which itself follows from Axiom 3.
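The three axioms can be checked numerically on a small finite sample space. The sketch below is only an illustration: the loaded die, its outcome probabilities, and the helper name `prob` are assumptions made for this example and are not part of the post.

```python
from fractions import Fraction

# Assumed example: a loaded six-sided die (hypothetical probabilities).
pmf = {
    1: Fraction(1, 10),
    2: Fraction(1, 10),
    3: Fraction(1, 10),
    4: Fraction(1, 10),
    5: Fraction(1, 10),
    6: Fraction(1, 2),
}
omega = set(pmf)  # sample space {1, ..., 6}


def prob(event):
    """P(event): sum of the outcome probabilities of the points in the event."""
    return sum((pmf[w] for w in event), Fraction(0))


evens = {2, 4, 6}
low_odds = {1, 3}  # disjoint from evens

axiom1 = all(prob({w}) >= 0 for w in omega)  # non-negativity of each outcome
axiom2 = prob(omega) == 1                    # P(Omega) = 1
# Additivity for two disjoint events (finite case of Axiom 3)
axiom3 = prob(evens | low_odds) == prob(evens) + prob(low_odds)

print(axiom1, axiom2, axiom3)
```

Using `Fraction` keeps the arithmetic exact, so the equality checks are not affected by floating-point rounding.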

Using the above axioms we can derive further results. Some of them are listed below.


P(A^c) = 1-P(A)


If A is contained in B (A \subset B), then

P(B \cap A^c) = P(B) - P(A)

Inclusion – Exclusion

P(A \cup B) = P(A) + P(B) - P(A \cap B)
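The three results above can be sanity-checked on a concrete example. This is a minimal sketch, assuming a fair six-sided die; the events A, B, C and the helper `prob` are chosen only for illustration.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die, every outcome has probability 1/6.
omega = frozenset(range(1, 7))


def prob(event):
    """P(event) for the fair die: (points in event) / (points in omega)."""
    return Fraction(len(frozenset(event) & omega), len(omega))


A = frozenset({2, 4})
B = frozenset({2, 4, 6})  # A is contained in B
C = frozenset({1, 2, 3})

# Complement rule: P(A^c) = 1 - P(A)
complement_ok = prob(omega - A) == 1 - prob(A)

# Difference rule for A subset of B: P(B and A^c) = P(B) - P(A)
difference_ok = prob(B - A) == prob(B) - prob(A)

# Inclusion-exclusion: P(B or C) = P(B) + P(C) - P(B and C)
incl_excl_ok = prob(B | C) == prob(B) + prob(C) - prob(B & C)

print(complement_ok, difference_ok, incl_excl_ok)
```

A check on one example is not a proof, but it is a quick way to catch a misremembered formula.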

Equally likely outcomes

In some cases we can safely assume that all outcomes are equally likely, for example when we roll a fair die or toss a fair coin two or more times.


If all outcomes are equally likely in a finite sample space \Omega, then the probability that event A occurs is

P(A) = \frac{n_A}{N}

where

  • n_A is the number of sample points in A
  • N is the number of sample points in \Omega
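For equally likely outcomes, computing a probability reduces to counting sample points. The sketch below enumerates two small sample spaces and counts the points in an event; the function name `classical_prob` and the two example events are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product


def classical_prob(event, sample_space):
    """Return n_A / N, assuming every point in sample_space is equally likely.

    `event` is a predicate that returns True for sample points belonging to A.
    """
    n_A = sum(1 for w in sample_space if event(w))  # sample points in A
    N = len(sample_space)                           # sample points in Omega
    return Fraction(n_A, N)


# Toss a fair coin twice: Omega = {HH, HT, TH, TT}
coin_tosses = list(product("HT", repeat=2))
p_head = classical_prob(lambda w: "H" in w, coin_tosses)  # at least one head

# Roll a fair die twice: 36 equally likely ordered pairs
dice_rolls = list(product(range(1, 7), repeat=2))
p_seven = classical_prob(lambda w: sum(w) == 7, dice_rolls)  # sum equals 7

print(p_head)   # 3/4
print(p_seven)  # 1/6
```

Enumerating the sample space only works when it is small; for larger spaces the counting techniques from the next post are needed.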

In this post we have seen the axioms of probability. In the next post we will start with counting.

Happy Learning

This post was made possible by LaTeX