Introduction

R

What is R? R is an interactive language and environment for statistical computing and graphics.

In an oversimplified sense, think of a programmable calculator, then think of a programming language as an advanced programmable calculator. The difference is that you need to know how to “talk” in the language (R, Python, and so on) in order to tell it what to do. The “talk” occurs through writing.

Each language has its own specific syntax. This is simply a set of rules that lets the writer (you) and the reader (the computer) make sense of the written sentences. Even on a calculator, you cannot write 3+3+. This will throw an error, because the statement/sentence is syntactically incorrect.
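As a small illustration in R (a sketch, not part of the original notes), the base function parse can be asked to read a piece of code from a string without running it. The helper name is_valid_syntax is hypothetical and introduced only for this example.

```r
# Hypothetical helper: TRUE if `code` parses as valid R, FALSE otherwise.
is_valid_syntax <- function(code) {
  tryCatch({
    parse(text = code)   # only parses the text, does not evaluate it
    TRUE
  }, error = function(e) FALSE)
}

is_valid_syntax("3 + 3")    # TRUE
is_valid_syntax("3 + 3 +")  # FALSE: the expression is incomplete
```

Typing 3 + 3 + directly in the console behaves a little differently: R shows a + prompt and waits for the rest of the expression.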

What is RStudio?

RStudio is an integrated development environment (IDE). Think of it as a software application that makes it easy to write and run code in a given programming language. Although RStudio was originally built to run R, it has expanded to support other languages such as Python, C, C++, SQL, Perl and JavaScript, enabling one to build various tools within one platform. It also supports tools such as pandoc, CSS and Markdown. These notes (website), for example, were created using Quarto in RStudio.

RStudio has 4 panes:

  • Editor pane - This is where you write your code.

  • Console pane - This is where the results of your code are displayed. For an interactive language such as R, this is also where you can run code directly.

  • Environment pane - Gives you an overview of the variables currently stored in memory.

  • Plot pane - Shows the graphs plotted.

To keep your code easy to find and rerun, write it in the editor pane.

In this course we will learn R and its syntax.

Simple Math Expressions

You can do any normal calculation the same way you would on a calculator.

3 + 3
[1] 6
4 - 5
[1] -1
4 * 9 + 6/2
[1] 39
The same expressions in Julia:
3 + 3
6
4 - 5
-1
4 * 9 + 6/2
39.0

Note that the basic order of operations is followed: Parentheses, Exponents, Division and Multiplication, and lastly Addition and Subtraction. Division and multiplication share the same rank and are evaluated from left to right, as are addition and subtraction.
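A few R expressions make the ordering concrete. This is an illustrative sketch added to these notes. The last two lines show a detail not stated above: in R, exponentiation is evaluated from right to left and binds tighter than a leading minus sign.

```r
2 + 3 * 4      # 14: multiplication before addition
(2 + 3) * 4    # 20: parentheses first
10 - 4 - 3     # 3: subtraction runs left to right, (10 - 4) - 3
2^3^2          # 512: exponentiation runs right to left, 2^(3^2)
-2^2           # -4: the exponent is applied before the minus sign
```

When in doubt, add parentheses. They cost nothing and make the intended order explicit.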

Math Operators

These are functions used to do basic math operations. They are subdivided into three categories:

  1. Arithmetic Operators: used to carry out mathematical operations

Operator    Expression        Description
+           x + y             Addition
-           x - y             Subtraction
*           x * y             Multiplication
/           x / y             Division
^ or **     x ^ y or x ** y   Exponent
%%          x %% y            Modulus (remainder from division)
%/%         x %/% y           Integer division
5 + 3 #addition
[1] 8
5 - 3 #Subtraction
[1] 2
-3 #Negation
[1] -3
5 * 3 #Multiplication
[1] 15
5 / 3 # Division
[1] 1.666667
5^3 #Exponentiation; 5 raised to 3
[1] 125
5**3 # 5 raised to 3
[1] 125
5 %% 3 #the remainder of 5 divide by 3 is 2
[1] 2
5 %/% 3 #Integer Division 3 goes into 5 1 time
[1] 1
1 + 2 * (5 + 4) # Parentheses first, then multiply by 2, then add 1
[1] 19
In Julia, the corresponding operators are:
Operator    Expression   Description
+           x + y        Addition
-           x - y        Subtraction
*           x * y        Multiplication
/           x / y        Division
^           x ^ y        Exponent
%           x % y        Modulus (remainder from division)
÷           x ÷ y        Integer division
\           x \ y        Inverse division; same as y / x
5 + 3 #addition
8
5 - 3 #Subtraction
2
-3 #Negation
-3
5 * 3 #Multiplication
15
5 / 3 # Division
1.6666666666666667
5 ^ 3 # 5 raised to 3
125
5 % 3 #the remainder of 5 divide by 3 is 2
2
5 ÷ 3 #Integer Division. 3 goes into 5 1 time
1
1 + 2(5 + 4) # Parentheses first, then multiply by 2, then add 1
19
3 \ 5 # Inverse division. ie 5/3
1.6666666666666667
8÷2(3-1) 
2
  2. Relational Operators: used to compare two values.

Operator    Expression   Description
<           x < y        Less than
>           x > y        Greater than
<=          x <= y       Less than or equal to
>=          x >= y       Greater than or equal to
==          x == y       Equal to
!=          x != y       Not equal to
3 < 10
[1] TRUE
3 < 2
[1] FALSE
3 > 2
[1] TRUE
3 <= 3
[1] TRUE
3 == 3
[1] TRUE
3 != 10
[1] TRUE
In Julia, the corresponding operators are:
Operator    Expression   Description
<           x < y        Less than
>           x > y        Greater than
<=, ≤       x <= y       Less than or equal to
>=, ≥       x >= y       Greater than or equal to
==          x == y       Equal to
!=, ≠       x != y       Not equal to
3 < 10
true
3 < 2
false
3 > 2
true
3 <= 3
true
3 == 3
true
3 != 10
true
  3. Logical Operators: used to combine or negate logical values.

Operator    Expression   Description
&           x & y        AND
|           x | y        OR
!           !x           NOT, i.e. negation
&&          x && y       Short-circuited AND
||          x || y       Short-circuited OR
(3 < 10) & (4 > 5) # similar to 3 < 10 & 4 > 5
[1] FALSE
(3 < 10) | (4 > 5)
[1] TRUE
!(3 > 2)
[1] FALSE
In Julia, the corresponding operators are:
Operator    Expression   Description
&&          x && y       Short-circuited AND
||          x || y       Short-circuited OR
!           !x           NOT, i.e. negation
&           x & y        Bitwise AND
|           x | y        Bitwise OR
~           ~x           Bitwise NOT
⊻           x ⊻ y        Bitwise XOR (exclusive or)
⊼           x ⊼ y        Bitwise NAND (not and)
⊽           x ⊽ y        Bitwise NOR (not or)
3 < 10 & 4 > 5 
false
3 < 10 | 4 > 5
true
!(3 > 2)
false
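
Back in R, the difference between & and && (and between | and ||) is easiest to see with a short sketch. The example is an addition to these notes. The variable names a, b, s1 and s2 are arbitrary, and stop() is used only to show that the right-hand side is never evaluated.

```r
# Element-wise operators compare vectors position by position.
a <- c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)   # TRUE FALSE FALSE
b <- c(TRUE, FALSE, TRUE) | c(FALSE, FALSE, TRUE)  # TRUE FALSE TRUE

# Short-circuit operators work on single values and stop as soon as the
# answer is known, so the call to stop() on the right is never evaluated.
s1 <- FALSE && stop("not reached")  # FALSE
s2 <- TRUE || stop("not reached")   # TRUE
```

As a rule of thumb, use & and | when working with whole vectors, and && and || when testing a single condition.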

Variables

Variables are used to store data whose value can be changed as needed. The unique name given to a variable is called an identifier, since it identifies the data stored in memory.

Variables can usually act as both lvalues and rvalues, i.e. they can appear on the left side of the assignment operator as well as on the right side.

Naming Variables

You usually decide on the names of your own variables. The rules followed in coming up with a variable name are:

  1. Identifiers can be a combination of letters, digits, period (.) and underscore (_) ONLY.

  2. It must start with a letter or a period. If it starts with a period, it cannot be followed by a digit.

  3. Reserved words and Constants in R cannot be used as identifiers.
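
One way to check these rules, sketched here as an addition to the notes, relies on the base function make.names, which rewrites a string into a syntactically valid name. If the string comes back unchanged, it was already valid. The helper name is_valid_name is hypothetical.

```r
# Hypothetical helper: a name is valid if make.names() leaves it unchanged.
is_valid_name <- function(name) {
  make.names(name) == name
}

is_valid_name("day_one")   # TRUE
is_valid_name(".day1")     # TRUE: a period followed by a letter
is_valid_name("2day")      # FALSE: starts with a digit
is_valid_name(".2day")     # FALSE: a period followed by a digit
is_valid_name("day-one")   # FALSE: the hyphen is not allowed
is_valid_name("if")        # FALSE: reserved word
```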

Variable and function names should be lowercase. Use an underscore (_) to separate words within a name. Generally, variable names should be nouns and function names should be verbs. Strive for names that are concise and meaningful.

    # Good
    day_one
    day_1

    # Bad
    first_day_of_the_month
    DayOne
    dayone
    djm1

What is good? What is bad? This is quite subjective. Some ground rules have been laid down to keep variable naming consistent. Several naming cases have been proposed:

  • camelCase – The name starts with a lowercase letter and each subsequent word is capitalized, e.g. dayOne

  • PascalCase – Upper camelCase, i.e. the name starts with a capital letter, e.g. DayOne

  • snake_case – An underscore is used to separate the words, and all words are in lowercase, e.g. day_one

Of course you could use any of the above cases, as long as the variable name is valid. Do not, for example, use kebab-case (day-one), as it is not valid in R.

The Assignment Operator

In order to make use of variables, we need to be able to assign values to them. This is done with the help of the assignment operator. Often a language will restrict the assignment operator to only one symbol, =. That is not the case with R, which has several assignment operators.

  1. The left assignment operator. <- or =

    x <- 3
    y = 2
    a <- b <- 4 # assigning 4 to both a and b
    d = e = 5 # assigning 5 to both d and e
  2. The right assignment operator ->

    10 -> x # assigning 10 to x
    10 -> a -> b # assigning 10 to both a and b

Example of using a variable

x <- 10 # create a variable x with the value 10
x # implicitly print the value of x. We could also use print(x)
[1] 10
x * 2 # Multiply x by 2 ie 10*2
[1] 20
x <- x + 2 # increment x by 2
x #x is now 12
[1] 12

Note: Refrain from using the names of built-in functions as variable names, e.g. c <- 3. c is a function in R and hence should not be used as a variable name.

Note: There is also an assign function which can be used to assign values to variables. The variable name needs to be given as a string, i.e. in quotes.

assign("var_1", 3)
var_1
[1] 3

So far we have avoided using literal strings/characters as names. They too can be used on the left side of an assignment, although this is bad practice.

"var_2" <- 39 # DO NOT USE THIS THOUGH IT WORKS
var_2
[1] 39

Note that = is not a comparison operator; for comparison use ==. Assignment occurs from right to left, i.e. a statement like x = 1 means we assign the value 1 to a variable named x.

Reserved Words.

While variable names can be almost anything, some words are reserved in R: they cannot be changed, nor can they be used as variable names.

if            else       repeat        while           function
for           in         next          break           TRUE
FALSE         NULL       Inf           NaN             NA
NA_integer_   NA_real_   NA_complex_   NA_character_   ..1, ..2
TRUE <- 1
Error in TRUE <- 1: invalid (do_set) left-hand side to assignment
if <- 2
Error: <text>:1:4: unexpected assignment
1: if <-
       ^

Constants

These are rvalues: they cannot be on the left-hand side of the assignment operator. Though named constants are common in lower-level languages, R does not have many of them. Examples of constant values include numbers, e.g. 5; literal strings/characters, e.g. 'hello'; complex numbers, i.e. a number suffixed with the letter i, e.g. 5i, 3+9i; integers, e.g. 5L; hexadecimals, i.e. numbers preceded by 0X or 0x, e.g. 0xff; and logical values, e.g. TRUE.

5
[1] 5
3+9i
[1] 3+9i
0xff 
[1] 255
TRUE
[1] TRUE

The value \(\pi\), which is a constant in nature, is just a normal variable in R. It can be changed, so be careful when dealing with these types of values.

pi
[1] 3.141593
pi <- 4
pi # pi changed
[1] 4
rm(pi) # remove the stored variable pi and revert to the original pi
pi
[1] 3.141593

NOTE: The variables F and T store the logical values FALSE and TRUE respectively. Though logical, they can be changed. Hence refrain from using them as shorthand, and avoid naming your own variables F or T.

T
[1] TRUE
T <- FALSE # Change the T
T # changed T
[1] FALSE
rm(T) #remove the variable T and revert back to the original T
T
[1] TRUE

Comments

Comments are an important part of a program, as they describe what each part of the program does. It is often necessary to include them so that your code can be understood by someone else, or even by yourself when reviewing it later. In R, as in Python, comments are preceded by the hash symbol # (also called the sharp or pound sign). Everything on a line from the hash onwards is treated as a comment and is ignored by the interpreter.

1+1 # This is a comment
[1] 2
#This whole line is a comment on finding sin of degrees
sin(30*pi/180)
[1] 0.5

Data Types

A data type is a collection of data values that are specified by a set of possible values, a set of allowed operations on these values, and/or a representation of these values as machine types. Data types are important because they tell a computer system how to interpret the value of data.

The data types in R could be divided into two categories:

  1. basic data types

    1. Numeric - Integer, double
    2. Character
    3. logical
    4. Complex
    5. Raw
  2. Containers

    1. vectors
    2. Matrix/array
    3. factor
    4. list
    5. dataframe

We can use the function typeof to determine the type of an object in R. Let's look at each type:

Basic Data Types

NULL

This type has a single value. There is a single object with this value. This object is accessed through the built-in name NULL. It is used to signify the absence of a value in many situations, e.g., it is returned from functions that don’t explicitly or implicitly return anything.

NULL
NULL
typeof(NULL)
[1] "NULL"

logical

These are the logical values TRUE and FALSE. Internally they are stored as integers, with TRUE given the value 1 and FALSE given the value 0. The letters T and F are shorthand for TRUE and FALSE respectively. Note that T and F are not reserved words and thus their values can change. Use them cautiously.

typeof(TRUE)
[1] "logical"
typeof(FALSE)
[1] "logical"

R contains one more logical value, NA. It represents missing values, or values that are not applicable. Note that there is a distinction between NULL and NA.
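
The distinction can be sketched with a few lines of R. This example is an addition to the notes, and the variable names x and y are arbitrary. NA is a placeholder that still occupies a position, while NULL is the absence of any value.

```r
typeof(NA)            # "logical"
length(NA)            # 1: NA is a value that happens to be missing
length(NULL)          # 0: NULL is the absence of any value

x <- c(1, NA, 3)      # NA keeps its place in the vector
y <- c(1, NULL, 3)    # NULL simply disappears
length(x)             # 3
length(y)             # 2

is.na(x)              # FALSE TRUE FALSE
is.null(NULL)         # TRUE
```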

integer and double

Integers are zero, positive or negative whole numbers without a fractional part, e.g. 0, 100, -10. In R they are stored in 32 bits, so their range is limited (roughly ±2.1 billion). They can also be written in hexadecimal. Doubles, on the other hand, are numbers that may have a fractional/decimal part. In R, any number typed without a suffix is stored internally as a double. To explicitly create an integer, you must append the letter L to the number, e.g. 123L, -12L.

typeof(1)
[1] "double"
typeof(12.34)
[1] "double"
typeof(-10L)
[1] "integer"
typeof(0xfL) # Positive hexadecimal integer, i.e. base 16 (0-9, a-f)
[1] "integer"
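
The following sketch, added to these notes, shows how integers and doubles behave in practice. It uses .Machine$integer.max, a built-in value holding the largest integer R can store. The variable name too_big is arbitrary.

```r
is.integer(5)            # FALSE: a plain 5 is a double
is.integer(5L)           # TRUE
typeof(5L / 2L)          # "double": division always gives a double
typeof(5L %/% 2L)        # "integer": integer division keeps the type

.Machine$integer.max     # 2147483647, the largest integer R can store
# Going past the limit gives NA (with a warning) rather than a larger integer.
too_big <- suppressWarnings(.Machine$integer.max + 1L)
is.na(too_big)           # TRUE
```

Doubles do not have this limit, which is one reason R treats unsuffixed numbers as doubles.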

Complex

These are numbers that contain a real and an imaginary part, e.g. 3 + 2i. To create a complex number, you must append the letter i after the imaginary part. Thus a number that is immediately followed by a lowercase letter i will be interpreted as a complex number.

3.21 + 2i
[1] 3.21+2i
typeof(3+2i)
[1] "complex"

character

This represents anything that is not a number, e.g. names, strings, sentences. It must be in quotes, either single or double.

"hello"
[1] "hello"
typeof('a')
[1] "character"

While other languages such as C, C++ and Python allow one to access the individual characters of a string by indexing, R does not. A character value/string is treated as a single unit. To index the individual characters, one first has to split the string into its individual characters.
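
A minimal sketch of how this can be done with base R functions (an addition to these notes; the variable names word and chars are arbitrary):

```r
word <- "hello"

# Split into individual characters; strsplit() returns a list, so take element 1.
chars <- strsplit(word, "")[[1]]
chars        # "h" "e" "l" "l" "o"
chars[2]     # "e"

# substr() extracts a range of characters without splitting first.
substr(word, 2, 2)   # "e"
nchar(word)          # 5: the number of characters in the string
```

Note that length(word) is 1, since word is a vector holding a single string. Use nchar to count characters.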

vector

A vector is a collection of values of the same basic data type. E.g. suppose I have 3 integers; I can put them in a vector. Vectors are created by the c function, i.e. the concatenate function.

c(1L, 2L, 3L) #a vector of 3 integers.
[1] 1 2 3
c(2.4, 3.4) # a vector of 2 doubles.
[1] 2.4 3.4
c('hello world', 'hello class') # a vector of 2 strings
[1] "hello world" "hello class"

Note that a vector can only hold data of a single type. Mixing types in a vector will result in implicit type conversion.

To access the elements of a vector, we use the extraction operator []

vec_one <- c(1,2,3,4,6)
vec_one[1] # gets the first element
[1] 1
vec_one[3] # gets the 3rd element
[1] 3

We can also change the elements of a vector

vec_one[1] <- 10 # changes the first element to 10
vec_one
[1] 10  2  3  4  6

Note that every basic value in R is a vector. Even the single element 1 is a vector of length 1. Use the length function to determine the number of elements in a given vector.

x <- c(1,4,5)
length(x)
[1] 3
y <- c('hello world', 'hello class')
length(y)
[1] 2
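
The claim that a single value is itself a vector can be checked directly. The lines below are a small illustrative sketch added to the notes.

```r
length(5)              # 1
is.vector(5)           # TRUE
identical(5, c(5))     # TRUE: c() around a single value changes nothing
5[1]                   # 5: a single value can be indexed like any vector
```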

More on vectors will be discussed later on, as will the other data types.

Coercion/Type Conversion.

One can convert the basic data types from one type to another using the as.* family of functions, e.g. as.logical to convert to logical, as.integer to convert explicitly to integer, as.double to double, as.character to character and as.complex to complex.

Note that only legal values can be coerced to the specified type, i.e. the string literal "123" can be converted to the integer 123, while the word 'hello' cannot be converted to an integer. Thus '123' is legal while 'hello' is illegal. If you try to convert an illegal value, the result will be NA.
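
As a quick sketch of the legal and illegal cases (an addition to these notes; the variable names good and bad are arbitrary, and suppressWarnings is used only to keep the output quiet):

```r
good <- as.integer("123")
good                                    # 123

bad <- suppressWarnings(as.integer("hello"))
bad                                     # NA
is.na(bad)                              # TRUE

# Without suppressWarnings(), R also prints a warning that NAs were
# introduced by coercion.
```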

from number to logical

as.logical(c(10, -1,3,0, 3+2i, 0+0i)) # from numeric to logical
[1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE

All numerals including complex numbers apart from 0 are converted to TRUE. Only the number 0 whether double 0, integer 0L or complex 0 + 0i is converted to FALSE.

from string to logical

as.logical(c('TRUE', 'true', 'True', 'T', 'FALSE', 'false', 'False', 'F'))
[1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE

Only the strings listed above can be converted directly to their logical equivalents. Any other string will produce NA. Try it out.

From double to integer

as.integer(c(1,3.3)) # drops the fractional part
[1] 1 3

from character to double

as.numeric(c('123', '0.4', '-45'))
[1] 123.0   0.4 -45.0
as.double(c('123', '0.4', '-45'))
[1] 123.0   0.4 -45.0
as.integer(c('123', '0.4', '-45'))
[1] 123   0 -45

to character

as.character(c(1,3,4.5))
[1] "1"   "3"   "4.5"
as.character(c(TRUE, FALSE, T, F))
[1] "TRUE"  "FALSE" "TRUE"  "FALSE"
as.character(34 + 6i)
[1] "34+6i"

Note that most conversions occur implicitly. Suppose we have a double and an integer and place them in a vector. Since a vector can only contain one basic data type, the integer will be converted to a double. Why is this the case? There is a hierarchy that is followed: logical, integer, double, complex and lastly character. Each type in this list can represent every value of the types before it, so converting towards the end of the list loses no information. Converting a double to an integer, on the other hand, would lose the fractional part. That is why R converts to the more general type.

eg:

c(TRUE, 1L) # Converts the logical value TRUE to integer
[1] 1 1
c(1, TRUE, '2') # converts everything to character.
[1] "1"    "TRUE" "2"   
c(c(1, TRUE), '2') # Explain why the results differ from the one above.
[1] "1" "1" "2"

Note that in the example c(c(1, TRUE), '2') we get the results '1', '1', '2'. This is because the logical value TRUE was first converted to a double when creating the inner vector, and the result was then converted to character.

How can we get the value TRUE from a character '1'?

as.logical(as.numeric('1'))
[1] TRUE

Note that we could use more than one type conversion to get to the desired results.

Writing Basic Functions

A function in R is an object containing multiple interrelated statements that are run together in a predefined order every time the function is called.

A simple function is defined using the keyword function and then assigned to a variable name, e.g.

square_10 <- function() {
  return(10^2)
}

The simple function above will return 100 when called:

square_10()
[1] 100

It does not make much sense to write a function that always returns a constant; we would rather define the constant itself. To make real use of functions, we need to define them with parameters. This enables the function to evaluate the parameters in a predefined manner. A parameter is just a variable, i.e. a placeholder, that is passed into the function when the function is called.

Example: A function to square any number, not necessarily 10

square <- function(x){ # x is your parameter.
  return(x^2)
}
square(10)
[1] 100
square(5)
[1] 25
rect_area <- function(len, width){ # takes 2 parameters
  area <- len * width
  return(area)
}
rect_area(10, 5)
[1] 50

Take note that when calling any function in R, whether user-defined or built-in, we use parentheses, e.g. mean(a).

There are many more details to functions, but those will be discussed later.

Named Math functions

But R is an advanced calculator. How can I compute trigonometric values? The only downside with R is that you need to know the names of the functions you want. For math functions this is simple, as most are named the same way they are written in math. Look at the list of math functions below:

abs      atanh     cummin    floor    log2     tan
acos     ceiling   cumprod   gamma    sign     tanh
acosh    cos       cumsum    lgamma   sin      tanpi
asin     cosh      digamma   log      sinh     trigamma
asinh    cospi     exp       log10    sinpi    trunc
atan     cummax    expm1     log1p    sqrt

From the list above, it is easy to tell what sqrt function does. ie It is the \(\sqrt{~~}\) function. We can tell what exp, sin, cos, tan are. But what about atan, asin, tanh etc?

This means that if you need to compute \(\sin^{-1}(x)\), you have to know that the inverse sine function is called asin in R, where the a stands for arc, i.e. the arc sine function. In other languages, the same function may have a different name, e.g. in Python's NumPy library it is called arcsin.
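
A short sketch of the inverse and hyperbolic functions in use (an addition to these notes). R's trigonometric functions work in radians, so the hypothetical helper to_degrees is introduced here only to make the results easier to read.

```r
# Hypothetical helper: convert an angle from radians to degrees.
to_degrees <- function(radians) radians * 180 / pi

to_degrees(asin(1))   # approximately 90: arc sine
to_degrees(acos(0))   # approximately 90: arc cosine
to_degrees(atan(1))   # approximately 45: arc tangent
tanh(0)               # 0: hyperbolic tangent
```

Results are described as approximate because floating point arithmetic can leave a tiny rounding error in the last digits.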

From now on, you are expected to know the name of a function before using it. If you are not sure what a function is called or what it does, you can look it up online, for example on Google.

Exercise 1

  1. Given a number x, write a program that obtains the digit immediately after the decimal point.

    Input: 12.34
    Output: 3
    
    Input: 0.6123
    Output: 6
    
    Input: 213
    Output: 0 

    Hint: Use math operations. eg modulus operator, integer division, multiplication etc

  2. Given a number x and a position, write a program that obtains the digit at the specified position. In this exercise we will assume that positions to the left of the decimal point are positive while those to the right of the decimal point are negative.

    Example:

    Input: 612.34, pos = 2
    Output: 6
    
    Input: 612.34, pos = 1
    Output: 1
    
    Input: 612.34, pos = 0
    Output: 2
    
    Input: 612.34, pos = -1
    Output: 3
    
    Input: 612.34, pos = -2
    Output: 4
    
    Input: 612.34, pos = 4
    Output: 0

    Hint: Use math operations. eg modulus operator, integer division, multiplication etc.

    Extra: What would change if the positions to the left were given as negative while those to the right as positive?

  3. A simple function to determine maximum of two numbers:

    my_max <- function(x, y){
      index <- (y > x) + 1
      return (c(x,y)[index])
    }

    Now we could run:

    my_max(3, 10)
    [1] 10
    my_max(19, 3)
    [1] 19
    my_max(12,-4)
    [1] 12

    Take a good look at the code. Why did we add 1 to the logical y > x? Now write a function my_min that takes in two arguments and outputs the minimum of the two. Note that there are built-in functions max and min.

    Other ways of writing the max function could be:

    1. \(x^iy^{1-i}\) where \(i = x > y\)
    my_max1 <- function(x,y){
      i <-  x > y
      return (x**i * y**(1-i))
    }
    my_max1(3, 10)
    [1] 10
    my_max1(19, 3)
    [1] 19
    my_max1(12,-4)
    [1] 12
    2. \(xi + y(1-i)\) where \(i = x > y\)

      Implement this method.

    Could you explain as to how the two methods above are able to compute the maximum?

  4. Write a function my_sign that outputs the sign of the input, i.e. a negative number has a sign of -1 and a positive number has a sign of 1.

    Input: -9
    Output: -1
    
    Input: 23
    Output: 1

    Hint1: Use math operations. Recall \(x^0 = 1\). Select a good \(x\) and then think of what your exponent should be.

    Hint2: Use math operations. Use a logical \(x\) and the expression \(2x - 1\)

  5. Write a program absolute that returns the absolute value of a number. Hint: Use the sign function you wrote in question 4.

    Input: -9
    Output: 9
    
    Input: 23
    Output: 23
  6. Write a program cbrt to compute the cube root. Hint: \(\sqrt[3]{x} = sign(x)\sqrt[3]{|x|}\), where \(|\cdot|\) is the absolute function and \(sign(\cdot)\) is the sign function.

    Input: -8
    Output: -2
    
    Input: 1
    Output: 1
    
    Input: 27
    Output: 3
  7. Given a vector, write an R program called swap_first_last to swap the first and last elements of the vector.

    Examples:

    Input : c(12, 35, 9, 56, 24)
    Output : 24, 35, 9, 56, 12
    
    Input : c(1, 2, 3)
    Output : 3, 2, 1

    Hint. You need a temporary variable.

  8. Given a vector and provided the positions of the elements, write a program swap to swap the two elements in the vector. Hint: The program has 3 parameters.

    Examples:

    Input : vec = c(23, 65, 19, 90), pos1 = 1, pos2 = 3
    Output : 19, 65, 23, 90
    
    Input : vec = c(1, 2, 3, 4, 5), pos1 = 2, pos2 = 5
    Output : 1, 5, 3, 4, 2
  9. Compute the following using R:

    1. \(\log_{10}100\)
    2. \(\log_{e}e^2\)
    3. \(\log_2 8\)
    4. \(\sin(30^\circ)\) hint: R's trigonometric functions work in radians, so convert the degrees first.
    5. \(\sin^{-1}(0.5)\) in degrees. hint: look at part 4 above.
    6. \(\sqrt{4}\) and \(\sqrt{-4}\)
    7. \(\sqrt[3]{8}\) and \(\sqrt[3]{-8}\) hint: \(\sqrt[3]{-8} = -2\)
    8. \(0^0\). Is this correct?
  10. Round off the following numbers using R:

    1. \(980, 950, 930\) to the nearest 100 Hint: round(123, -2) rounds to nearest 100

    2. \(98, 95, 93\) to the nearest 10

    3. \(9.8, 9.5, 9.3\) to the nearest 1

    4. \(0.98, 0.95, 0.93\) to the nearest 0.1

    5. \(0.098, 0.095, 0.093\) to the nearest 0.01

  11. Determine the final output of the following operations and check your answer against those produced by R

        (3 > 4) | TRUE
        3 > 4 | TRUE
        (3 > 4) & TRUE
        (3 > 4) | FALSE
        3 >= 3
        3 != 4
  12. A cylinder has a radius of r cm and a height of h cm. Write a function to obtain the surface area when completely covered, \(SA = 2\pi rh + 2\pi r^2\). Compute the SA when radius = 10 cm and height = 18 cm.

  13. Let x = 3. What is the value of !x? Elaborate on why that is the case. What numerical value can x take such that !x results in TRUE?

  14. There are other assignment operators in R, ie <<- and ->>. What is the difference between these and the ones discussed above? Run ?`=` in R and read the help page to the end. See whether you could answer the question.

  15. What are the differences between “=” and “<-” assignment operators?

Tough Task

Rounding numbers is a task that occurs in every aspect of math. We would like to implement a function my_round such that we could round to the nearest ten, one, five, tenths, half, fifths etc. How can we go about this? The formula is simple. For any given number \(x\), we can round it to the nearest number \(y\) by:

  1. Divide \(x\) by \(y\).

  2. Obtain the integral part of the quotient.

  3. Determine whether the fractional part of the quotient is greater than \(0.5\). This should be a logical value.

  4. Add the logical value obtained in step 3 to the integral part from step 2.

  5. Multiply the sum by \(y\). This is the required result.

Implement the above procedure in a program called my_round.

Test the function with the following inputs:

Input: x = 123.4656, y = 1
Output: 123


Input: x = 123.4656, y = 5
Output: 125


Input: x = 123.4656, y = 10
Output: 120

Input: x = 126.4656, y = 10
Output: 130

Input: x = 123.4656, y = 100
Output: 100

Input: x = 123.4656, y = 1000
Output: 0

Input: x = 123.4656, y = 0.1
Output: 123.5

Input: x = 123.4656, y = 0.01
Output: 123.47

Input: x = 123.4956, y = 0.01
Output: 123.5

Input: x = 123.4656, y = 0.001
Output: 123.466

Input: x = 123.4656, y = 3
Output: 123