Summary Statistics


Summary statistics give a first description of a data set. They fall into three main categories:

  1. Location
  2. Spread
  3. Shape

Location

The location statistics each give a single value indicating the central location of the distribution. They are the mode, the median (which is also the second quartile), and the mean, or average.

Mode

The mode of a set of values is the value which occurs most frequently. For example, if we have the data set:

2, 7, 7, 1, 7, 2, 4, 5, 6, 7, 2, 4, 4, 7

the mode is 7, as it occurs with the highest frequency (five times).

In general the mode is not widely used, as it says little about the rest of the data. In the data set above the mode happens to be the largest value, although this is not always the case. For a grouped frequency distribution, for example one shown in a histogram, the modal class is the class with the greatest frequency.
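The mode of the example data set can be checked with a short sketch using only the Python standard library. The variable names are illustrative; `multimode` is used instead of `mode` so that data sets with several equally frequent values are also handled.

```python
from collections import Counter
from statistics import multimode

# The example data set from the text.
data = [2, 7, 7, 1, 7, 2, 4, 5, 6, 7, 2, 4, 4, 7]

# Count how often each value occurs.
counts = Counter(data)

# multimode returns every value tied for the highest frequency,
# so it also copes with data sets that have more than one mode.
modes = multimode(data)

print(counts.most_common(3))
print(modes)
```

If two values were tied for the highest frequency, `modes` would contain both of them.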

Quartiles

The quartiles are the values at the quarter points of the ordered data set: 0% (the minimum, Q0), 25% (the first quartile, Q1), 50% (the median, Q2), 75% (the third quartile, Q3) and 100% (the maximum, Q4). If more precision is required then percentiles can be used, with the same idea of taking the value below which n% of the ordered data falls. If the required point falls between two data values, as it does for the median of an even number of data points, the mean of the two values around that point is taken. If it falls exactly on a data value, as for the median of an odd number of points, that value is used.

The quartiles are normally summarised using a boxplot, which allows the shape of the data to be visualised. Together, Q0 to Q4 make up the “five number summary” of the data.
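A five number summary for the example data set from the Mode section can be sketched as below. Note that several conventions exist for interpolating Q1 and Q3, so different tools can give slightly different values. This sketch assumes the "inclusive" method of `statistics.quantiles` from the Python standard library; the variable names are illustrative.

```python
from statistics import quantiles

# The example data set from the Mode section.
data = [2, 7, 7, 1, 7, 2, 4, 5, 6, 7, 2, 4, 4, 7]
ordered = sorted(data)

# n=4 asks for the three cut points that split the data into quarters.
# method="inclusive" interpolates between the minimum and maximum of the
# data; the default "exclusive" method can give different Q1 and Q3.
q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")

# Q0 and Q4 are simply the smallest and largest values.
five_number_summary = (ordered[0], q1, q2, q3, ordered[-1])
print(five_number_summary)
```

These five values are exactly what a boxplot draws: the whisker ends, the box edges and the line inside the box.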

Median

The median is the value in the centre of the ordered data for odd-sized data sets, or the mean of the two central values for even-sized data sets. It is normally represented on a boxplot as the line within the box, at the 50th percentile.
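The odd and even cases can be written out directly. The function below is a minimal sketch with an illustrative name; in practice `statistics.median` from the Python standard library does the same job.

```python
def median(values):
    """Return the median of a non-empty collection of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd number of points: take the single central value.
        return ordered[mid]
    # Even number of points: take the mean of the two central values.
    return (ordered[mid - 1] + ordered[mid]) / 2


# The example data set from the Mode section has 14 values.
data = [2, 7, 7, 1, 7, 2, 4, 5, 6, 7, 2, 4, 4, 7]
print(median(data))
print(median([3, 1, 2]))
```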

Mean

Sample mean
$$\bar{x}=\frac{\sum\limits_{i=1}^{n}{x_i}}{n}$$

Weighted mean
$$\bar{x}_w=\frac{\sum\limits_{i=1}^{n}{w_i x_i}}{\sum\limits_{i=1}^{n}{w_i}}$$
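Alternatively, for a frequency distribution in which each of the k distinct values x_i occurs with frequency f_i
$$\bar{x}=\frac{\sum\limits_{i=1}^{k}{f_i x_i}}{\sum\limits_{i=1}^{k}{f_i}}$$
If the f_i are relative frequencies, which sum to 1, the denominator can be dropped.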
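The three forms of the mean can be compared in a short sketch. The frequency form divides by the total frequency, which is the same as using relative frequencies, so it reproduces the sample mean. The `weighted_mean` helper and the marks and weights passed to it are hypothetical, chosen only for illustration.

```python
from collections import Counter

# The example data set from the Mode section.
data = [2, 7, 7, 1, 7, 2, 4, 5, 6, 7, 2, 4, 4, 7]

# Sample mean: sum of the values divided by how many there are.
sample_mean = sum(data) / len(data)

# Frequency form: each distinct value x occurs f times.
frequencies = Counter(data)
frequency_mean = (
    sum(f * x for x, f in frequencies.items()) / sum(frequencies.values())
)


def weighted_mean(values, weights):
    """Weighted mean: sum of w_i * x_i divided by the sum of the w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


print(sample_mean, frequency_mean)
# Hypothetical marks where the last one counts double.
print(weighted_mean([60, 75, 90], [1, 1, 2]))
```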


Spread

Population variance

$$\sigma^2=\frac{\sum\limits_{i=1}^{N}{\left(x_i-\mu\right)^2}}{N}$$

Here \mu is the population mean and N is the number of values in the population.

Sample variance

$$s^2=\frac{\sum\limits_{i=1}^{n}{\left(x_i-\bar{x}\right)^2}}{n-1}$$
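The only difference between the two variance formulas is the divisor, which a direct implementation makes visible. The function names below are illustrative; `pvariance` and `variance` from the Python standard library are printed alongside as a cross-check.

```python
from statistics import pvariance, variance


def sum_of_squared_deviations(values):
    """Sum of squared differences between each value and the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


def population_variance(values):
    # Divide by n: the data are treated as the whole population.
    return sum_of_squared_deviations(values) / len(values)


def sample_variance(values):
    # Divide by n - 1: the data are treated as a sample.
    if len(values) < 2:
        raise ValueError("sample variance needs at least two values")
    return sum_of_squared_deviations(values) / (len(values) - 1)


# The example data set from the Mode section.
data = [2, 7, 7, 1, 7, 2, 4, 5, 6, 7, 2, 4, 4, 7]
print(population_variance(data), pvariance(data))
print(sample_variance(data), variance(data))
```

Because it divides by a smaller number, the sample variance is always at least as large as the population variance computed from the same values.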


Population standard deviation

This is the square root of the population variance and is denoted by σ (sigma).

Sample standard deviation

This is the square root of the sample variance and is denoted by s.
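Both standard deviations can be obtained by taking the square root of the matching variance. The sketch below uses the Python standard library; the names `sigma` and `s` simply mirror the symbols in the text.

```python
import math
from statistics import pstdev, pvariance, stdev, variance

# The example data set from the Mode section.
data = [2, 7, 7, 1, 7, 2, 4, 5, 6, 7, 2, 4, 4, 7]

# Population standard deviation: square root of the population variance.
sigma = math.sqrt(pvariance(data))

# Sample standard deviation: square root of the sample variance.
s = math.sqrt(variance(data))

# pstdev and stdev compute the same quantities directly.
print(sigma, pstdev(data))
print(s, stdev(data))
```

Unlike the variance, the standard deviation is in the same units as the data, which makes it easier to compare with the mean.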


Shape

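The shape of a distribution is described by its skewness, which measures asymmetry, and its kurtosis, which measures how heavy the tails are. Both are usually standardised by the standard deviation, the square root of the variance:

$$s=\sqrt{s^2}$$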
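The text gives no formulas for the shape statistics, so the sketch below is an assumption: it uses one common convention, the moment-based definitions, where skewness is the third central moment divided by the second central moment to the power 3/2, and excess kurtosis is the fourth central moment divided by the square of the second, minus 3. Other conventions apply small-sample corrections and give slightly different values. All function names are illustrative.

```python
def central_moment(values, order):
    """Mean of (x - mean) ** order, dividing by n."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** order for x in values) / len(values)


def skewness(values):
    """Moment-based skewness (assumed convention): m3 / m2 ** 1.5."""
    m2 = central_moment(values, 2)
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return central_moment(values, 3) / m2 ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis (assumed convention): m4 / m2 ** 2 - 3."""
    m2 = central_moment(values, 2)
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return central_moment(values, 4) / m2 ** 2 - 3


# The example data set from the Mode section.
data = [2, 7, 7, 1, 7, 2, 4, 5, 6, 7, 2, 4, 4, 7]
print(skewness(data))
print(excess_kurtosis(data))
```

Under this convention a symmetric data set has skewness 0, a positive value indicates a longer right tail, and a normal distribution has excess kurtosis 0.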

 
