Summary Statistics
Summary statistics are used to give an initial description of the data. They fall into three main categories:
Location
Each location statistic is a single value that indicates the central location of the distribution. They are the mode, the median (also the second quartile), and the mean or average.
Mode
The mode of a set of values is the value that occurs most frequently. For example, given the data set:
2, 7, 7, 1, 7, 2, 4, 5, 6, 7, 2, 4, 4, 7
the mode is 7, as it occurs with the highest frequency (five times).
In general the mode is not widely used, as it says very little about the data. In the data set above the mode happens to be the largest value, although this is not always the case. For a grouped frequency distribution, such as one shown in a histogram, the modal class is the class with the greatest frequency.
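As a minimal sketch, the mode of the example data set can be found by counting occurrences. The helper name `modes` is an assumption made for illustration; it returns a list because a data set can have more than one value tied for the highest frequency.

```python
from collections import Counter

data = [2, 7, 7, 1, 7, 2, 4, 5, 6, 7, 2, 4, 4, 7]

def modes(values):
    """Return every value that shares the highest frequency, in ascending order."""
    counts = Counter(values)          # value -> number of occurrences
    top = max(counts.values())        # the highest frequency observed
    return sorted(v for v, c in counts.items() if c == top)

mode_values = modes(data)
print(mode_values)  # [7]
```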
Quartiles
The quartiles are the values at the quarter points of the ordered data set. That is, 0% (the minimum, Q0), 25% (the first quartile, Q1), 50% (the median, Q2), 75% (the third quartile, Q3) and 100% (the maximum, Q4). If more precision is required then percentiles can be used in the same way, taking the value reached after n% of the ordered data has been counted. If the set being split contains an even number of data points, the quarter point falls between two values and the mean of those two values is taken. If it contains an odd number of points, the value is simply the one found at that point.
The quartiles are normally summarised using a boxplot, which allows the shape of the data to be visualised. They are also part of the "five number summary" of the data.
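The five number summary can be sketched as below. This assumes one common convention, in which Q1 and Q3 are the medians of the lower and upper halves of the ordered data; other conventions interpolate differently and can give slightly different quartiles. The function name `five_number_summary` is chosen for illustration only.

```python
from statistics import median

data = [2, 7, 7, 1, 7, 2, 4, 5, 6, 7, 2, 4, 4, 7]

def five_number_summary(values):
    """Return (Q0, Q1, Q2, Q3, Q4) using the median-of-halves convention.

    Needs at least two values so that both halves are non-empty.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]             # values before the median position
    upper = ordered[half + n % 2:]     # values after it (middle value skipped when n is odd)
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

summary = five_number_summary(data)
print(summary)  # (1, 2, 4.5, 7, 7)
```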
Median
The median is the value in the centre of the ordered data for odd-sized data sets, or the mean of the two central values for even-sized data sets. The median is normally represented on boxplots as the line within the box, drawn at the 50th percentile.
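The odd and even cases can be written out directly. This is an illustrative sketch rather than a library routine; the name `median_of` is an assumption.

```python
def median_of(values):
    """Middle value of the ordered data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd size: single centre value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even size: mean of centre pair

odd_median = median_of([3, 1, 2])
even_median = median_of([2, 7, 7, 1, 7, 2, 4, 5, 6, 7, 2, 4, 4, 7])
print(odd_median, even_median)  # 2 4.5
```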
Mean
Sample mean
$$\bar{x}=\frac{\sum\limits_{i=1}^{n}{x_i}}{n}$$
Weighted mean
$$\bar{x}_w=\frac{\sum\limits_{i=1}^{n}{w_i x_i}}{\sum\limits_{i=1}^{n}{w_i}}$$
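Alternatively, for frequency distributions, where $f_i$ is the frequency of the value $x_i$
$$\bar{x}=\frac{\sum\limits_{i=1}^{n}{f_i x_i}}{\sum\limits_{i=1}^{n}{f_i}}$$
If the $f_i$ are relative frequencies, so that they sum to 1, this reduces to
$$\bar{x}=\sum\limits_{i=1}^{n}{f_i x_i}$$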
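The three forms of the mean can be compared on the example data set from the mode section. In this sketch the weights are taken to be the frequency counts, and the final line treats $f_i$ as relative frequencies (counts divided by the total), which is an assumption about how the frequency formula is meant to be read. The function names are illustrative.

```python
from collections import Counter

data = [2, 7, 7, 1, 7, 2, 4, 5, 6, 7, 2, 4, 4, 7]

def sample_mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)

def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Frequency table: each distinct value and how often it occurs.
counts = Counter(data)
distinct = list(counts)
freqs = [counts[x] for x in distinct]

mean_raw = sample_mean(data)
mean_from_counts = weighted_mean(distinct, freqs)

# Relative frequencies sum to 1, so no further division is needed.
rel_freqs = [f / len(data) for f in freqs]
mean_from_rel = sum(f * x for x, f in zip(distinct, rel_freqs))

print(mean_raw, mean_from_counts, mean_from_rel)
```

All three values agree up to floating-point rounding, since the frequency form is a weighted mean with the counts as weights.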
Spread
Population variance
$$\sigma^2=\frac{\sum\limits_{i=1}^{n}{\left(x_i-\mu\right)^2}}{n}$$
Here μ is the population mean and n is the number of values in the population.
Sample variance
$$s^2=\frac{\sum\limits_{i=1}^{n}{\left(x_i-\bar{x}\right)^2}}{n-1}$$
Population standard deviation
This is the square root of the population variance, denoted by σ.
Sample standard deviation
This is the square root of the sample variance, denoted by s.
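The only difference between the population and sample forms is the divisor, n versus n - 1. The sketch below makes that explicit with a `sample` flag; the function names and the flag are assumptions for illustration, and the data set is the one from the mode example.

```python
import math

data = [2, 7, 7, 1, 7, 2, 4, 5, 6, 7, 2, 4, 4, 7]

def variance(values, sample=True):
    """Mean squared deviation from the mean; divides by n - 1 for a sample, n for a population."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)

def standard_deviation(values, sample=True):
    """Square root of the corresponding variance."""
    return math.sqrt(variance(values, sample=sample))

print(f"population variance: {variance(data, sample=False):.4f}")
print(f"sample variance:     {variance(data):.4f}")
print(f"population std dev:  {standard_deviation(data, sample=False):.4f}")
print(f"sample std dev:      {standard_deviation(data):.4f}")
```

The sample variance is always slightly larger than the population variance computed from the same values, because it divides the same sum of squares by a smaller number.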