Module 0 Getting Started with R

In this module we cover how to install the software used throughout this course, R and RStudio. The installation guides come first, followed by a brief orientation to the software and its features. Think of this as a familiarisation before more complicated methods are presented in later modules. We use the term ‘registration’, but all we really require is that you answer the questions placed at the beginning and end of each Module. Your interest and feedback help make this a more useful site. If you need a specific type of help, please contact us.

Contents

  1. Install R and RStudio on your own computer
    1. Windows
    2. Mac OS X
    3. Linux
  2. The types of content used in this online course
  3. The RStudio interface
  4. R as an Object Orientated Program
  5. Revise and/or learn basic functions in Excel
  6. An Introduction to Excel for selecting items randomly
  7. Projects in RStudio
  8. Importing data from a .csv file into RStudio
  9. Revise and/or learn basic use of RStudio
  10. Types of data
  11. Basic Statistics (mean, variance, standard deviation)
  12. Thinking about plotting in R
  13. Some uses for Excel

Install R and RStudio on your own computer

After about an hour you should have installed R and RStudio on your computer and tried out a few commands. In this course we expect users to be familiar with Excel as well. However, this introduction includes some screencasts on useful ways of using Excel, for those who are less familiar with it or would like a reminder of its flexibility and use in data capture.

Windows



Mac OS X

Like CRAN, we do not have Mac OS X systems at BeST, so our knowledge of how hard it is to install R and RStudio on a Mac is limited. The binary for R version 3.2.2 can be found here. Important note: OS X no longer includes XQuartz, which is required to use X11. You can reinstall XQuartz here (you have to do this every time you upgrade too). Further information about R on Mac OS can be found on the CRAN website.

RStudio is a little easier to install. Download the latest version of RStudio (which can be found here). Once it has downloaded, simply drag it to your Applications folder and it should install.

Linux

Most package managers on Linux include an entry for R. These versions of R are usually frozen at the release of that version of Linux, so updates arrive more slowly than on the CRAN mirrors themselves. However, one can add a CRAN repository to the Linux system and then receive updated packages directly.

Ubuntu/Linux Mint

At the moment here at BeST for Africa we run a Linux Mint system, so we thought we would show how you can use the package manager to get up-to-date R packages directly through your system's updates. This takes a little bit of messing around, but it can also be done through graphical user interfaces. First we need to add a repository to the system and download a security key. We can do this with the following terminal commands.

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9
sudo add-apt-repository 'deb /bin/linux/ubuntu trusty/'
sudo apt-get update
sudo apt-get install r-base


RStudio on Linux

The easiest way to obtain RStudio for Linux is to download the server edition for the relevant version of Linux at the following page
http://www.rstudio.com/products/rstudio/download

The types of content used in this online course

This course has several types of content, all focused around “HOW TO…”:
HOW TO START: the steps to go through (above) and the objectives for working through the module
HOW TO DO: watch a screencast, read a demonstration document or check an FAQ link
HOW TO THINK: run through a downloadable interactive PowerPoint
HOW TO TRY: take a demonstration document (including code), run it yourself and work through it
HOW TO SAVE: assistance with writing R code and analysis (save as a Project in RStudio)
HOW TO RELATE: consider the additional questions you need to answer for your own experiment, and send us feedback


RStudio interface


R as an object orientated program

An Introduction to R as an object orientated program
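
As a minimal sketch of what this means in everyday use: values in R are stored in named objects, every object has a class, and a function such as summary() behaves differently depending on that class. The object names (plot_heights, crop) and their values are invented for illustration.

```r
# Store values in named objects using the assignment arrow
plot_heights <- c(12.5, 14.1, 13.3)          # a numeric vector
crop <- factor(c("maize", "bean", "maize"))  # a factor (categorical) object

# Every object has a class
class(plot_heights)  # "numeric"
class(crop)          # "factor"

# The same function acts differently depending on the class of its input
summary(plot_heights)  # numeric summary: min, quartiles, mean, max
summary(crop)          # counts per category
```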


An introduction to Excel using a real sample of survey data

Introduction to basic Excel

Excel is very useful for gaining a simple understanding of basic statistical terms, for collecting small data sets and for setting up data sheets for research. Watch this screencast to see how we might look at an incoming data set using a few functions of Excel, with a survey data file:


An Introduction to Excel for selecting items randomly

Here is a short screencast on how to randomly pick some varieties from a list.
How to randomise using Excel:
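
For comparison, the same task can be sketched in R with sample(). The variety names are invented, and set.seed() is used only so that the random draw can be repeated.

```r
# A made-up list of varieties to choose from
varieties <- c("Variety A", "Variety B", "Variety C",
               "Variety D", "Variety E", "Variety F")

set.seed(42)  # fix the random number generator so the draw is repeatable

# Pick 3 varieties at random, without replacement
chosen <- sample(varieties, size = 3)
chosen
```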


Using Projects in RStudio

If you open a Project at the start of a Module, all of your work will be stored in the same directory, and it will be easier to work with programs, data and outputs because they are kept together within the project. For more detail, follow this external link: A description explaining Projects in RStudio

Watch this screencast, which explains Setting up a Project in R:


Importing data from a .csv file into RStudio


Types of data

Before we design a study we should consider what type of variable will provide the information needed. There are many types of variable, but they can be summarised in the following way.

Quantitative Variables

Quantitative variables, as the name suggests, are those that measure a quantity of the experimental unit.

  • Continuous Variables – These are variables where the value recorded can take any possible value in a range, so there is no gap between possible values (for the mathematically minded, real numbers). Continuous variables are usually given as decimals to a specific number of decimal places or significant figures. For example, Percentage, Milk Yield of a Cow, Blood Pressure and Humidity are generally considered to be continuous variables, as they can be measured to high accuracy.
  • Discrete Variables – These are variables where the value recorded does not have continuous gradations, so there is a definite gap between two values. Discrete Variables are usually measured in whole numbers (integers for the mathematically minded), although they can be fractions too. For example, the number of plants in a plot is a discrete variable, as it makes no sense to have half a plant.
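
As a small sketch of how these two kinds of quantitative variable are commonly stored in R: continuous measurements as double-precision numbers and counts as integers. The variable names and values are invented for illustration.

```r
# Continuous: milk yield in litres, stored as double-precision numbers
milk_yield <- c(18.4, 22.75, 19.9)
is.numeric(milk_yield)  # TRUE
is.integer(milk_yield)  # FALSE
typeof(milk_yield)      # "double"

# Discrete: number of plants per plot, stored as integers (note the L suffix)
plants_per_plot <- c(12L, 15L, 9L)
is.integer(plants_per_plot)  # TRUE
```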

Categorical Variables

Categorical Variables or Qualitative Variables are variables that record a category or a quality of the experimental unit.

  • Nominal Variables – As the name suggests, these are categorical variables that have no inherent order. These variables can be names or ranges. Examples of Nominal Variables include: Type of crop, Gender, Country of Farm.
  • Ordinal Variables – Again the name gives it away, as these are categorical variables which have an order implied or defined. This means the categories can be ranked. Examples of Ordinal Variables include: Ratings, Plant Density (if measured as High, Medium or Low), Scales.
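
In R, categorical variables are usually stored as factors. The sketch below shows one way to create a nominal factor and an ordered (ordinal) factor; the variable names and category labels are invented for illustration.

```r
# Nominal: categories with no inherent order
crop_type <- factor(c("maize", "bean", "sorghum", "bean"))
levels(crop_type)      # levels are sorted alphabetically by default
is.ordered(crop_type)  # FALSE

# Ordinal: categories with a defined order, given through 'levels'
plant_density <- factor(c("Low", "High", "Medium", "Low"),
                        levels = c("Low", "Medium", "High"),
                        ordered = TRUE)
is.ordered(plant_density)  # TRUE
plant_density < "High"     # comparisons respect the order: TRUE FALSE TRUE TRUE
```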

Explanatory, Confounding and Response Variables

The variable(s) that are adjusted or used to explain a system are called the Explanatory Variable(s). This is the variable which is controlled or thought to influence the variable of interest in the study. Explanatory variables are sometimes referred to as Independent Variables.

The variable(s) of interest in the experiment or study are called the Response Variable(s). Response variables are sometimes referred to as Dependent Variables, as their value is thought to depend on the Explanatory Variables.

A Confounding Variable is a variable which affects both the Explanatory variables and the Response variables in a model. Omitting such a variable from the model can lead to bias.

Review types of data

Now review these concepts in this presentation: Types of Data

Basic Statistics

When we first summarise data to understand its distribution, we need to calculate a number of summary statistics. We use three approaches to describing data with summary statistics:

  1. Quartiles, Percentiles, Ranges and the Median
  2. Averages – Mean, Variance, Standard deviation
  3. Proportions and Probability

In practice Percentiles and Averages are used on Quantitative data while Proportions are used on Categorical data.
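
As a brief sketch, the averages and proportions listed above can be obtained with base R functions. The data values and object names below are invented for illustration.

```r
# Quantitative data: made-up yields
yield <- c(2, 4, 4, 4, 5, 5, 7, 9)
mean(yield)  # 5
var(yield)   # sample variance (divides by n - 1)
sd(yield)    # sample standard deviation, the square root of var(yield)

# Categorical data: made-up crop types
crop <- c("maize", "bean", "maize", "maize")
table(crop)              # counts per category
prop.table(table(crop))  # proportions per category
```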

In this module the calculations for these statistics will not be shown in detail, but can be found for Summary Statistics here and for Probability here.

Quartiles, Percentiles, Ranges and the Median

To calculate any of these quantities we first need to order the data from smallest to largest. The Range of the data is simply the difference between the largest and smallest values.
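
A short sketch of these quantities in base R, using made-up values. Note that R's range() returns the smallest and largest values rather than their difference, so diff() is applied to get the range as defined here. quantile() is shown with its default settings; other definitions of quartiles can give slightly different answers.

```r
x <- c(7, 3, 9, 1, 5)  # made-up data

sort(x)         # ordered data: 1 3 5 7 9
range(x)        # smallest and largest values: 1 9
diff(range(x))  # the range as a single number: 8
median(x)       # middle value of the ordered data: 5
quantile(x)     # 0%, 25%, 50%, 75%, 100% points: 1 3 5 7 9
```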


Bringing data into R and some simple checks

Revise bringing in some data from a .csv file. Then look at the data.
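
A minimal sketch of this step is given here. So that it runs on its own, it first writes a small made-up file to a temporary location; in practice you would point read.csv() at your own file inside your Project folder. The column names and values are assumptions for illustration only.

```r
# Create a small example .csv file so the sketch is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("plot,treatment,yield",
             "1,A,2.3",
             "2,B,3.1",
             "3,A,",      # empty field: yield is missing for this plot
             "4,B,2.8"),
           csv_path)

# Bring the data into R as a data frame
trial <- read.csv(csv_path)

# Look at the data
head(trial)  # first rows
str(trial)   # structure: one line per variable, with its type
```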

Producing some summary statistics

Here we will check that the data was brought into R and check the structure of the data set.
Questions to ask:
Have we got missing data?
Are the treatments identified as Factors (categories) in the data set?
How do we make a variable into a Factor in R?
Can we list the variables in the data set?
Can we do a 5 number summary?
The next module will look at a two sample t test with two groups.
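
The questions above can be explored with a few base R functions. This is only a sketch on a small made-up data frame (the names trial, plot, treatment and yield are invented), not the course data set.

```r
# A small made-up data set with one missing yield
trial <- data.frame(
  plot = 1:4,
  treatment = c("A", "B", "A", "B"),
  yield = c(2.3, 3.1, NA, 2.8),
  stringsAsFactors = FALSE
)

# Have we got missing data?
anyNA(trial)           # TRUE if any value is missing
colSums(is.na(trial))  # number of missing values in each column

# Are the treatments identified as Factors?
is.factor(trial$treatment)  # FALSE: stored as text here

# Make a variable into a Factor
trial$treatment <- factor(trial$treatment)
is.factor(trial$treatment)  # TRUE

# List the variables in the data set
names(trial)

# 5 number summary (missing values are dropped by default)
fivenum(trial$yield)
summary(trial$yield)  # similar, and also reports the mean and the NA count
```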


Thinking about plotting in R

There are several basic plots in R. Here are some examples of the types of plots you can do in R.
Download this file to look at many plots available in R. There will be a Module on plotting which will provide a lot more examples and teach you some useful code.
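
As a rough sketch of three basic plot types in base R (a histogram, a boxplot and a scatter plot), the code below uses simulated data. The variable names and the simulated values are assumptions for illustration only.

```r
set.seed(1)  # make the simulated data repeatable

# Simulated data: 40 plots, two treatments
yield <- rnorm(40, mean = 5, sd = 1)
treatment <- factor(rep(c("A", "B"), each = 20))
height <- 50 + 4 * yield + rnorm(40)

# Histogram: distribution of one quantitative variable
hist(yield, main = "Histogram of yield", xlab = "Yield")

# Boxplot: a quantitative variable split by a categorical one
boxplot(yield ~ treatment, main = "Yield by treatment",
        xlab = "Treatment", ylab = "Yield")

# Scatter plot: relationship between two quantitative variables
plot(height, yield, main = "Yield against height",
     xlab = "Height", ylab = "Yield")
```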

Look through this PowerPoint to examine some of the many ways of using R to plot graphs.

Plotting Demonstration

In a later Module we will show how to do this using specific examples. There are several packages in R that allow us to produce very nice quality plots.

See also our section on plotting in R

Some ways of using Excel

Download this file to look at aspects of using Excel. In science it is useful to make a database capturing the design elements (the plots with the treatments), and data we collect such as yield, height, grain colour. We can capture metadata, and field plans in separate worksheets.

Using Excel

We can use Excel for understanding simple statistics, like deviations from a mean, which can assist with understanding what a variance and a standard deviation are.
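
The same calculation can be sketched step by step in R, which mirrors what would be laid out in spreadsheet columns: deviations from the mean, squared deviations, then the variance and standard deviation. The data values are made up, and the sample (n - 1) form of the variance is assumed.

```r
x <- c(4, 8, 6, 5, 7)  # made-up data; the mean is 6

# Deviations from the mean: -2  2  0 -1  1
deviations <- x - mean(x)
sum(deviations)  # deviations from the mean always sum to 0

# Variance: sum of squared deviations divided by (n - 1), here 10 / 4 = 2.5
variance <- sum(deviations^2) / (length(x) - 1)

# Standard deviation: square root of the variance
std_dev <- sqrt(variance)

# Compare with the built-in functions
all.equal(variance, var(x))  # TRUE
all.equal(std_dev, sd(x))    # TRUE
```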

Well done, you are ready to start with Module 1.
