Statistics and Probability
Definition
Presentation of Statistical Data
Measure of Central Tendency
Probability
Complementary Events
Definition
Population and Sample
A population refers to the
entire set of data that we want to study. When we want to obtain reliable
information about the population, it is often impossible or impractical
to study the entire population, as it may be too large. Therefore, we
select a sample, which is a representative subset
of the population.
To avoid any possible bias during sampling, we select a random sample, such that any observations made are
independent.
Discrete Data
Discrete variables take on a countable number of possible
values and are often restricted to integer values. Examples include shoe sizes
and the number of students in a class. When discrete data is presented in a
frequency table, the values may be listed individually or grouped into classes.
Continuous Data
Continuous variables can take on any value in a certain
range and are usually not restricted to integer values. Examples include height
and weight. When continuous data is presented in a frequency distribution
table, the data is grouped into ranges.
Presentation of Statistical Data
Frequency Table
A frequency table shows the frequency of occurrence of each
element in a dataset.
Bar Chart
A bar chart is a diagram in which discrete data is represented
by horizontal or vertical bars. The length of each bar represents the frequency
of the corresponding category.
Histogram
A histogram is a vertical bar chart representing numerical
information, often used for continuous data. The area of each rectangle is
proportional to its frequency. By using class boundaries instead of class
intervals, there will not be any gaps between the bars.
Grouped Data
Class limits may be used to represent a certain class
interval. However, to avoid any ambiguity, a class interval such as 50 to 59 will
be taken as 49.5 to 59.5. In this example, the class width is 10 units.
Measure of Central Tendency
Mean
The mean is the sum of the observations divided by the total
number of observations in a set. To calculate the mean of grouped data, we
represent all values in a class interval by the middle value of the interval.
Population mean, $\mu = \dfrac{\sum x}{N}$, where $N$ is the number of observations. For a frequency distribution, $\mu = \dfrac{\sum fx}{\sum f}$.
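As a sketch of the grouped-data calculation (the marks table below is hypothetical, not from the text), each class is represented by its midpoint and weighted by its frequency:

```python
# Estimate the mean of grouped data: each class interval is represented
# by its midpoint, weighted by the class frequency.
# Hypothetical grouped marks: classes 50-59, 60-69, 70-79.
classes = [(50, 59, 4), (60, 69, 7), (70, 79, 9)]  # (lower limit, upper limit, frequency)

total_f = sum(f for _, _, f in classes)
total_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)  # f * midpoint
mean = total_fx / total_f
print(mean)
```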
Median
The median is the middle observation (for the case of an odd
number of observations) or the mean of two middle observations (for the case of
an even number of observations) when the set is arranged in ascending order.
Lower Quartile
The median divides the set into two halves. The median of
the lower half of the values is known as the lower quartile.
Upper Quartile
The median of the upper half of the values is known as the upper quartile.
Interquartile Range
The interquartile range refers to the difference in value
between the upper quartile and lower quartile.

Percentile
Similarly, a set of values can be divided into 100 equal
parts. Each dividing value is called a percentile. The median corresponds to the
50th percentile. The lower quartile corresponds to the 25th percentile, while
the upper quartile corresponds to the 75th percentile.
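These definitions can be sketched in Python; the data set is hypothetical, and the quartiles are computed as the medians of the lower and upper halves, as described above:

```python
import statistics

# Median, quartiles and interquartile range, using the "median of each
# half" definition. Hypothetical data set with an odd number of values.
data = sorted([3, 7, 8, 5, 12, 14, 21, 13, 18])

median = statistics.median(data)       # middle observation
half = len(data) // 2                  # the middle value is excluded when n is odd
lower_quartile = statistics.median(data[:half])
upper_quartile = statistics.median(data[-half:])
iqr = upper_quartile - lower_quartile  # interquartile range
print(median, lower_quartile, upper_quartile, iqr)
```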
Mode
The mode refers to the observation which has the highest
number of occurrences in a set. In a grouped frequency distribution, the class
with the highest number of occurrences is known as the modal
class.
Variance
The population variance is an indicator of how spread out
the observations are from the mean.
Population variance, $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$.
Standard Deviation
The sample standard deviation is the square root of the
sample variance.
When data is given in the form of a frequency distribution,
the standard deviation is given by
$\sigma = \sqrt{\dfrac{\sum f(x - \mu)^2}{\sum f}} = \sqrt{\dfrac{\sum fx^2}{\sum f} - \left(\dfrac{\sum fx}{\sum f}\right)^2}$.
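A minimal sketch of this formula, assuming a hypothetical value-frequency table:

```python
import math

# Population variance and standard deviation from a frequency distribution:
# sigma^2 = (sum f x^2)/(sum f) - ((sum f x)/(sum f))^2.
freq = {1: 2, 2: 5, 3: 3}  # hypothetical table: value -> frequency

n = sum(freq.values())
mean = sum(x * f for x, f in freq.items()) / n
variance = sum(f * x * x for x, f in freq.items()) / n - mean ** 2
std_dev = math.sqrt(variance)
print(mean, variance, std_dev)
```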
Probability
In a given possibility space (the set of all possible
outcomes) U, each possible outcome is known as a
sample point. If the possibility space has a finite number of sample points,
the number of points is denoted by n(U).
For an event A which has m sample points out of the n sample points in the possibility space, the
probability $P(A) = \dfrac{n(A)}{n(U)} = \dfrac{m}{n}$.
In other words, for an event A which can happen in m out of n equally likely
outcomes (i.e. there is no bias), the probability of it happening is denoted by
P(A).
Since A is a subset of U, we have $0 \le n(A) \le n(U)$,
which can be simplified to $0 \le \dfrac{n(A)}{n(U)} \le 1$.
In other words, $0 \le P(A) \le 1$.
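As a small illustration (a hypothetical fair-die event, not from the text), the probability is found by counting sample points:

```python
from fractions import Fraction

# P(A) = n(A) / n(U) for equally likely outcomes.
# Hypothetical event A: rolling an even number on a fair six-sided die.
possibility_space = {1, 2, 3, 4, 5, 6}  # U
event_a = {x for x in possibility_space if x % 2 == 0}

p_a = Fraction(len(event_a), len(possibility_space))
print(p_a)  # 1/2
```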
Complementary Events
The complement of an event A
refers to the event that A does not occur, and is
usually denoted as $A'$.
Venn Diagram
We may use a Venn diagram to illustrate the relationship
between the probabilities of two events, A and B.

From the Venn diagram, we can observe this relationship:
$n(A \cup B) = n(A) + n(B) - n(A \cap B)$.
Dividing throughout by $n(U)$, where $P(A) = \dfrac{n(A)}{n(U)}$ and $P(B) = \dfrac{n(B)}{n(U)}$,
this is also written as $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
Conditional Probability
Given two events A and B such that $P(A) > 0$ and $P(B) > 0$,
the probability of A, given that B has already occurred, is
$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$.
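A quick numerical check of this definition, using hypothetical events on one roll of a fair die:

```python
from fractions import Fraction

# Conditional probability P(A|B) = P(A n B) / P(B), counted on one roll
# of a fair die. Hypothetical events: A = even number, B = greater than 2.
u = {1, 2, 3, 4, 5, 6}
a = {2, 4, 6}
b = {3, 4, 5, 6}

p_b = Fraction(len(b), len(u))
p_a_and_b = Fraction(len(a & b), len(u))
p_a_given_b = p_a_and_b / p_b
print(p_a_given_b)  # 1/2
```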
Mutually Exclusive Events
Two events A and B are mutually exclusive if the probability of both
events occurring at the same time is zero, i.e. $P(A \cap B) = 0$. In other words, events A and B cannot occur
together.

Exhaustive Events
Two events A and B are exhaustive if the probability of either event
occurring is 1, i.e. $P(A \cup B) = 1$.
The complementary events A and $A'$ are mutually exclusive and exhaustive, so $P(A) + P(A') = 1$.
Independent Events
Two events A and B are independent if the occurrence of A does not affect the occurrence of B, i.e. $P(A \cap B) = P(A) \times P(B)$.
When given two independent events, we notice that the probability
of each event remains constant, regardless of whether the other event has
occurred.
Consider $P(A) = a$ and $P(B) = b$,
which gives us $P(A') = 1 - a$ and $P(B') = 1 - b$.
Since the events are independent, $P(A \cap B) = ab$.
We can find the other values based on the given information.

When we consider that A has
occurred, the probability of B occurring and the
probability of B not occurring remain the same: $P(B \mid A) = P(B)$ and $P(B' \mid A) = P(B')$.

When we consider that B has
occurred, the probability of A occurring and the
probability of A not occurring remain the same: $P(A \mid B) = P(A)$ and $P(A' \mid B) = P(A')$.

When we consider that A has not
occurred, the probability of B occurring and the
probability of B not occurring remain the same: $P(B \mid A') = P(B)$ and $P(B' \mid A') = P(B')$.

When we consider that B has not
occurred, the probability of A occurring and the
probability of A not occurring remain the same: $P(A \mid B') = P(A)$ and $P(A' \mid B') = P(A')$.

From the above cases, we observe that the probability of an
event occurring or not occurring is not affected by that of the other event.
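This can be verified by counting sample points; the two-coin-toss experiment below is a hypothetical example:

```python
from fractions import Fraction

# For independent events, P(A n B) = P(A) * P(B), and conditioning on A
# leaves P(B) unchanged. Hypothetical experiment: two fair coin tosses.
u = [(c1, c2) for c1 in "HT" for c2 in "HT"]  # possibility space
a = [o for o in u if o[0] == "H"]             # first toss is heads
b = [o for o in u if o[1] == "H"]             # second toss is heads
both = [o for o in a if o in b]

def p(event):
    return Fraction(len(event), len(u))

assert p(both) == p(a) * p(b)                 # multiplication rule
assert Fraction(len(both), len(a)) == p(b)    # P(B | A) = P(B)
print(p(a), p(b), p(both))
```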
Bayes' Theorem
Bayes' theorem allows a conditional probability to be reversed: $P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}$.
Three Events
The above results can be extended to three events, for example $P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$.
Tree Diagram
A tree diagram can be used to determine the probability of
obtaining specific results: multiply the probabilities along the branches of a
path, then add the probabilities of all paths that give the required outcome.
For example, the probability of picking one red ball and one blue ball without
replacement is the sum of the probabilities of the red-then-blue and
blue-then-red paths.
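The two-branch calculation can be sketched as follows, assuming a hypothetical bag of 3 red and 2 blue balls:

```python
from fractions import Fraction

# Two draws without replacement from a hypothetical bag of 3 red and
# 2 blue balls: the branches red-then-blue and blue-then-red both give
# one ball of each colour, so their probabilities are added.
red, blue = 3, 2
total = red + blue

p_red_blue = Fraction(red, total) * Fraction(blue, total - 1)
p_blue_red = Fraction(blue, total) * Fraction(red, total - 1)
p_one_each = p_red_blue + p_blue_red
print(p_one_each)  # 3/5
```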

Random Variables
A variable describes a quantity being measured. When this
variable comes from a random experiment, it is known as a random variable.
Random variables are often denoted by capital letters such as X, while the possible values for the random variables
are denoted by small letters such as x.
Discrete Random Variables
Discrete random variables take on a countable number of
possible values and are often restricted to integer values.
For a discrete random variable which can assume a countable number
of values $x_1, x_2, \ldots, x_n$,
the associated probabilities must be such that $0 < P(X = x_i) \le 1$ and $\sum_{i=1}^{n} P(X = x_i) = 1$.
0 is excluded because $x_i$ will not be included in the list of possible values
when it cannot occur.
Probability Density Function
The probability distribution of a discrete random variable
shows the possible values of X and their
associated probabilities. The probability distribution can be presented either
in a table form or in a graphical form.
The probability density function (PDF) gives the
relationship between the possible values of X and
their associated probabilities. It is often expressed as a formula depending on
the type of distribution.
Cumulative Distribution Function
The cumulative distribution function (CDF) gives the sum of
the associated probabilities for the possible values of X
up to and including a certain value x, i.e. $F(x) = P(X \le x)$.
Suppose that X can assume
integer values between 0 and 6 inclusive. Then, for example, $F(2) = P(X = 0) + P(X = 1) + P(X = 2)$.
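As a sketch, assuming (hypothetically) that each of these values is equally likely, the CDF is the running total of the probabilities:

```python
from fractions import Fraction

# Probability distribution and CDF for a hypothetical discrete random
# variable X, equally likely to take each integer value from 0 to 6.
pdf = {x: Fraction(1, 7) for x in range(7)}

cdf = {}
running = Fraction(0)
for x in sorted(pdf):
    running += pdf[x]   # F(x) = P(X <= x): running total of probabilities
    cdf[x] = running

assert cdf[6] == 1      # the probabilities sum to 1
print(cdf[2])  # 3/7
```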
Expectation
The expectation of a discrete random variable X is defined by
$E(X) = \sum x\,P(X = x)$.
Numerically, the expectation is equal to the population
mean, μ.
If E(X) is the expectation of X,
E(k) = k, where k is a constant,
E(kX) = kE(X), where k is a constant,
$E(aX \pm b) = aE(X) \pm b$,
where a and b are
constants.
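A small numerical check of these properties, using a hypothetical distribution and constants:

```python
from fractions import Fraction

# E(X) = sum of x * P(X = x); check that E(aX + b) = a E(X) + b.
# Hypothetical distribution of X.
pdf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

e_x = sum(x * p for x, p in pdf.items())
a, b = 3, 5
e_ax_plus_b = sum((a * x + b) * p for x, p in pdf.items())

assert e_x == 1
assert e_ax_plus_b == a * e_x + b
print(e_x, e_ax_plus_b)
```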
Variance
The population variance is defined by $\sigma^2 = E[(X - \mu)^2]$.
For the case of a discrete random variable X,
$\text{Var}(X) = E(X^2) - [E(X)]^2$.
The population standard deviation of the discrete random
variable X is $\sigma = \sqrt{\text{Var}(X)}$.
If Var(X) is the variance of X,
Var(k) = 0, where k is a constant,
$\text{Var}(kX) = k^2\,\text{Var}(X)$,
where k is a constant,
$\text{Var}(aX \pm bY) = a^2\,\text{Var}(X) + b^2\,\text{Var}(Y)$ for independent X and Y,
where a and b are
constants.
We must add the two variances (even for $aX - bY$) because variance is always
positive.
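A small numerical check of these properties, with a hypothetical distribution:

```python
from fractions import Fraction

# Var(X) = E(X^2) - [E(X)]^2; check that Var(aX + b) = a^2 Var(X).
# Hypothetical distribution of X.
pdf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def variance(transform):
    # Variance of transform(X) under the distribution above.
    mean = sum(transform(x) * p for x, p in pdf.items())
    return sum(transform(x) ** 2 * p for x, p in pdf.items()) - mean ** 2

var_x = variance(lambda x: x)
a, b = 3, 5
assert variance(lambda x: a * x + b) == a ** 2 * var_x  # b has no effect
print(var_x)  # 1/2
```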
Binomial Distribution
In an experiment with n
repeated independent trials and two mutually exclusive outcomes, where either
success or failure can occur, we will obtain a binomial distribution. It is
necessary for the probability of success p to
remain the same throughout the experiment.
When X follows a binomial distribution,
we say that $X \sim B(n, p)$,
where there is a total of n trials and the
probability of success is p.
$P(X = x) = \dbinom{n}{x} p^x (1 - p)^{n - x}$,
where $x = 0, 1, 2, \ldots, n$.
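This formula can be sketched directly with math.comb; the values of n and p below are hypothetical:

```python
from math import comb

# Binomial PMF: P(X = x) = C(n, x) p^x (1 - p)^(n - x) for X ~ B(n, p).
# Hypothetical example: n = 10 trials with success probability p = 0.5.
def binomial_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

probs = [binomial_pmf(x, 10, 0.5) for x in range(11)]
assert abs(sum(probs) - 1) < 1e-12   # the probabilities sum to 1
print(binomial_pmf(5, 10, 0.5))  # 0.24609375
```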
Poisson Distribution
The Poisson distribution is often used to model the
probability of a number of events occurring in a fixed period of time, with a
known mean value.
When X follows a Poisson
distribution, we say that $X \sim Po(m)$,
where m is the parameter such that m > 0. Observe that there is no upper limit for x.
$P(X = x) = \dfrac{e^{-m} m^x}{x!}$,
where $x = 0, 1, 2, \ldots$
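A minimal sketch of this formula, assuming a hypothetical mean of m = 2:

```python
from math import exp, factorial

# Poisson PMF: P(X = x) = e^(-m) m^x / x! for X ~ Po(m).
# Hypothetical example: mean m = 2 events per interval.
def poisson_pmf(x, m):
    return exp(-m) * m ** x / factorial(x)

# There is no upper limit for x, but the probabilities still sum to 1.
assert abs(sum(poisson_pmf(x, 2) for x in range(50)) - 1) < 1e-9
print(poisson_pmf(2, 2))
```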
Poisson Approximation to the Binomial Distribution
A binomial distribution can be approximated using the
Poisson distribution with mean m = np when
n is large (n > 50), and
p is sufficiently small such that np
< 5.
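A quick check of how close the approximation is, under the hypothetical choice n = 100 and p = 0.02 (so np = 2 < 5):

```python
from math import comb, exp, factorial

# Compare B(n, p) with its Poisson approximation Po(np) when n is large
# and np < 5. Hypothetical example: n = 100, p = 0.02, so m = np = 2.
n, p = 100, 0.02
m = n * p

x = 3
binomial = comb(n, x) * p ** x * (1 - p) ** (n - x)
poisson = exp(-m) * m ** x / factorial(x)
print(binomial, poisson)  # the two values agree to about two decimal places
```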
Continuous Random Variables
Continuous random variables can take on any value on the
number line. A continuous probability density function (PDF) is a function where
$f(x) \ge 0$ and $\int f(x)\,\mathrm{d}x = 1$
on a given interval.
The expectation for a continuous random variable is
$E(X) = \int x\,f(x)\,\mathrm{d}x$,
while the variance for a continuous random variable is
$\text{Var}(X) = \int x^2 f(x)\,\mathrm{d}x - [E(X)]^2$.
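For a concrete check, take the continuous uniform distribution on [0, 1] as a hypothetical example, where f(x) = 1, so the exact answers are E(X) = 1/2 and Var(X) = 1/3 - (1/2)^2 = 1/12; the integrals can be approximated numerically:

```python
# Midpoint-rule approximation of E(X) and Var(X) for the continuous
# uniform distribution on [0, 1], where f(x) = 1 (hypothetical example).
n = 100_000
midpoints = [(i + 0.5) / n for i in range(n)]

e_x = sum(x / n for x in midpoints)       # integral of x f(x) dx
e_x2 = sum(x * x / n for x in midpoints)  # integral of x^2 f(x) dx
var_x = e_x2 - e_x ** 2
print(e_x, var_x)
```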
Normal Distribution
When a continuous random variable X
follows a normal distribution, we say that $X \sim N(\mu, \sigma^2)$.
The probability density function is $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$,
where $-\infty < x < \infty$.
The area under the probability density function gives us the probabilities. For
all values of x, $f(x) > 0$, and $\int_{-\infty}^{\infty} f(x)\,\mathrm{d}x = 1$.
To find $P(a < X < b)$,
we may use the GDC, as the above function is difficult to integrate.
The normal distribution is bell-shaped and symmetrical about
the line x = μ. The
maximum value of f(x)
occurs at x = μ,
where $f'(x) = 0$. There are two points of
inflexion, at $x = \mu - \sigma$ and $x = \mu + \sigma$.
Standard Normal Distribution
When X is a normal variable, we may standardise it using
$Z = \dfrac{X - \mu}{\sigma}$, such that $E(Z) = 0$ and $\text{Var}(Z) = 1$.
This distribution is known as the standard normal distribution, where $Z \sim N(0, 1)$.
We can also make use of the symmetrical properties of the
curve when finding positive and negative values of z.
To find the value of x given
the value of p such that P(X
< x) = p, we may
use the inverse normal function.
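Both the CDF and inverse-normal calculations can be sketched with Python's statistics.NormalDist (a stand-in for the GDC); the parameters below are hypothetical:

```python
from statistics import NormalDist

# Normal probabilities, standardisation and the inverse normal.
# Hypothetical example: X ~ N(50, 5^2).
x_dist = NormalDist(mu=50, sigma=5)
z_dist = NormalDist()                  # standard normal, Z ~ N(0, 1)

p = x_dist.cdf(60)                     # P(X < 60)
z = (60 - 50) / 5                      # standardised value
assert abs(p - z_dist.cdf(z)) < 1e-12  # same probability after standardising

# Inverse normal: recover x such that P(X < x) = p.
assert abs(x_dist.inv_cdf(p) - 60) < 1e-9
print(round(p, 4))  # 0.9772
```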