## Proportions and count data

Some data are collected as counts within a category. An example would be the number of animals that were vaccinated or unvaccinated and their disease status (diseased and not diseased). In another example we may be collecting numbers of weeds for a given area under two tillage regimes. Counts may be taken over a time period or for a spatial area.

## Aims and Objectives

In this module you will

- Understand the difference between counts (frequencies) and proportions
- Calculate proportions and percentages
- Calculate conditional proportions
- Display data in contingency tables
- Explore data using tests of proportions
- Use Chi square test of Association (also known as test for Independence)
- Use Chi square test for Goodness of Fit
- Be aware of the limitations of some methods

## Contents

- What is a count and a proportion
- Two way tables (contingency tables) of two categorical variables
- Proportions and conditional proportions
- Plotting of data showing proportions and counts
- Tabulating data
- Tests of proportions
- Chi square test of association
- Chi square Goodness of Fit

## Counts and proportions

A count is a discrete variable. In some experiments the response is the number of organisms in each of two or more groups being compared. Animals can be diseased or healthy; a weed may be dead or alive. In a survey, the numbers of respondents in the various categories are counts. A development policy may pilot an intervention in particular regions or within particular schools. Farmers may be adopters or non-adopters of a new practice in agriculture, or of growing improved crop varieties.

In the table below, the first column describes the category and the second holds the data, a count for each category. There is also a row showing the Total. In the Proportion column the count for each category is expressed as a part of the total, giving a value between 0 and 1. In the final column the proportion is expressed as a percent. Take a minute to make sure you could construct your own data table with this type of data.

Some data consist of just two categories. We will also use data comprising three or more categories.

Group | Count or frequency | Proportion | Percent |
---|---|---|---|
adopters | 600 | \(\frac{600}{1000} = 0.6\) | \(0.6 \times 100 = 60\%\) |
non-adopters | 400 | \(\frac{400}{1000} = 0.4\) | \(0.4 \times 100 = 40\%\) |
Total | 1000 | 1.0 | 100% |
Range of values | 0 and over | 0 to 1 | 0 to 100% |
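
The arithmetic in the table can be reproduced in a few lines of code. The following is a minimal sketch in Python, using the adopter counts from the table; the variable names are illustrative only and are not part of the course material.

```python
# Counts for each group, taken from the adopters table above.
counts = {"adopters": 600, "non-adopters": 400}

# The total is the sum of the counts over all groups.
total = sum(counts.values())

# A proportion is a count divided by the total (a value between 0 and 1).
proportions = {group: count / total for group, count in counts.items()}

# A percent is the proportion multiplied by 100.
percents = {group: 100 * p for group, p in proportions.items()}

for group in counts:
    print(
        f"{group}: count = {counts[group]}, "
        f"proportion = {proportions[group]:.2f}, "
        f"percent = {percents[group]:.0f}%"
    )
```

The proportions across all groups should always add up to 1, which is a quick check that no category has been left out.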

## Two way tables (contingency tables) of two categorical variables

A two way table summarises two categorical variables at once. The table is made up of rows and columns, one for each category of the two variables. The **marginal totals** are the row totals and the column totals. The overall total is the total number of items in the study. An individual item should occur only once within the table. In general a table has r rows and c columns (r by c); a 2 by 2 table has two rows and two columns, and its four cells contain the data. From the cells we can calculate the marginal row and column totals and the overall total. Proportions and percentages can also be calculated, either as percentages of the column totals or as percentages of the row totals.
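
As an illustration of marginal totals and row percentages, here is a small sketch in Python. The counts are made up for this example (they are not data from the course); they follow the vaccination and disease status setting mentioned in the introduction, and all names in the code are hypothetical.

```python
# HYPOTHETICAL counts, invented for illustration only:
# rows are vaccination status, columns are disease status.
table = {
    "vaccinated": {"diseased": 10, "not diseased": 90},
    "unvaccinated": {"diseased": 30, "not diseased": 70},
}
columns = ["diseased", "not diseased"]

# Marginal totals: one total per row and one per column.
row_totals = {row: sum(cells.values()) for row, cells in table.items()}
col_totals = {col: sum(table[row][col] for row in table) for col in columns}

# Overall total: every animal appears in exactly one cell.
overall_total = sum(row_totals.values())

# Row proportions: each cell divided by its row total.
row_proportions = {
    row: {col: table[row][col] / row_totals[row] for col in columns}
    for row in table
}

print("Row totals:", row_totals)
print("Column totals:", col_totals)
print("Overall total:", overall_total)
for row in table:
    for col in columns:
        print(f"{row}, {col}: {100 * row_proportions[row][col]:.0f}% of row")
```

Dividing by the column totals instead of the row totals would give percentages by column. Which one is more useful depends on the question being asked of the data.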

## Proportions and conditional proportions

## Plotting of data showing proportions and counts

## Tabulating data

Imagine a training course where we are interested in the question: Which workshop did you participate in?

For this example the data are entered directly in the Source as part of the code, rather than read from a file. You should be able to bring in your own larger file and work with it in the same way.

Download this pdf and try out the example for yourself.

Analysing a small data set on Workplace training

## Tests of proportions

## Chi square test of association

## Chi square Goodness of Fit
