What is R?
R is an interactive language and environment for statistical computing and graphics.
In an oversimplified sense, think of a programmable calculator: a programming language is an advanced programmable calculator. The difference is that you need to know how to “talk” the language (R, Python, etc.) in order to tell it to do what you want, and this “talking” happens through writing.
Each language has its own specific syntax. This is simply a set of rules that lets the writer (you) and the reader (the computer) make sense of the written sentences. Even on a calculator, you cannot write `3+3+`. This will throw an error because the statement is syntactically incorrect.
What is RStudio?
RStudio is an integrated development environment (IDE). Think of it as a software application that makes it easy to write and run code in a given programming language. Although RStudio was originally built to run R, it has expanded to support other languages such as Python, C, C++, SQL, Perl and JavaScript, enabling one to build various tools within one platform. It also supports tools such as Pandoc, CSS and Markdown. These notes (this website), for example, were created using Quarto in RStudio.
RStudio has 4 panes:
Editor pane - This is where you write your code.
Console pane - This is where the results of your code are displayed. Since R is an interactive language, you can also type and run code directly here.
Environment pane - Gives you an overview of the variables currently stored in memory.
Plot pane - Shows the graphs you plot.
To keep your code easy to find and rerun, write it in the editor pane.
In this course we will learn R and its syntax.
You can do any normal calculation the same way you would on a calculator.
3 + 3
[1] 6
4 - 5
[1] -1
4 * 9 + 6/2
[1] 39
The same calculations in Julia:
3 + 3
6
4 - 5
-1
4 * 9 + 6/2
39.0
Note that the standard order of operations is followed: Parentheses, Exponents, Multiplication and Division, and lastly Addition and Subtraction. Operators of equal precedence (multiplication and division, or addition and subtraction) are evaluated from left to right.
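A quick way to check these rules is to evaluate a few expressions and compare them with their fully parenthesised forms. The sketch below uses only base R. It also shows the one exception worth remembering: chained exponents are evaluated from right to left, and `^` binds more tightly than a leading minus sign.

```r
# Multiplication and division come before addition and subtraction
1 + 2 * 3      # same as 1 + (2 * 3): 7
(1 + 2) * 3    # parentheses first: 9

# Operators of equal precedence are evaluated left to right
10 - 4 - 3     # (10 - 4) - 3: 3
8 / 2 * 4      # (8 / 2) * 4: 16

# Exponents are the exception: chained ^ is evaluated right to left
2^3^2          # 2^(3^2) = 2^9: 512

# ^ also binds more tightly than a leading minus sign
-2^2           # -(2^2): -4
(-2)^2         # 4
```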
Arithmetic operators are used to do basic math operations. They are subdivided into two categories:
Operator | Expression | Description |
---|---|---|
+ | x + y | Addition |
- | x - y | Subtraction |
* | x * y | Multiplication |
/ | x / y | Division |
^ or ** | x ^ y or x ** y | Exponent |
%% | x %% y | Modulus (Remainder from division) |
%/% | x %/% y | Integer Division |
5 + 3 #addition
[1] 8
5 - 3 #Subtraction
[1] 2
-3 #Negation
[1] -3
5 * 3 #Multiplication
[1] 15
5 / 3 # Division
[1] 1.666667
5^3 #Exponentiation; 5 raised to 3
[1] 125
5**3 # 5 raised to 3
[1] 125
5 %% 3 # the remainder of 5 divided by 3 is 2
[1] 2
5 %/% 3 # integer division: 3 goes into 5 once
[1] 1
1 + 2 * (5 + 4) # parentheses first, then multiply by 2, then add 1
[1] 19
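Integer division and the modulus are two halves of the same operation: for whole numbers `x` and `y` (with `y` not zero), `(x %/% y) * y + x %% y` gives back `x`. A minimal sketch (the variable names are arbitrary) that checks this and shows that R's `%/%` rounds down, towards negative infinity, so with a positive `y` the remainder is never negative:

```r
x <- 17
y <- 5
q <- x %/% y    # whole number of times 5 goes into 17: 3
r <- x %% y     # what is left over: 2
q * y + r       # rebuilds x: 3 * 5 + 2 = 17

# With a negative number, %/% rounds down (towards -Inf)
-17 %/% 5       # -4, not -3
-17 %% 5        # 3, because -4 * 5 + 3 = -17

# %% also works with decimals
7.5 %% 2        # 1.5
```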
Arithmetic operators in Julia:
Operator | Expression | Description |
---|---|---|
+ | x + y | Addition |
- | x - y | Subtraction |
* | x * y | Multiplication |
/ | x / y | Division |
^ | x ^ y | Exponent |
% | x % y | Modulus (remainder from division) |
÷ | x ÷ y | Integer division |
\ | x \ y | Inverse division; same as y / x |
5 + 3 #addition
8
5 - 3 #Subtraction
2
-3 #Negation
-3
5 * 3 #Multiplication
15
5 / 3 # Division
1.6666666666666667
5 ^ 3 # 5 raised to 3
125
5 % 3 # the remainder of 5 divided by 3 is 2
2
5 ÷ 3 # integer division: 3 goes into 5 once
1
1 + 2(5 + 4) # parentheses first, then multiply by 2, then add 1
19
3 \ 5 # inverse division, i.e. 5/3
1.6666666666666667
8÷2(3-1) # 2(3-1) is implicit multiplication, which binds tighter than ÷, so this is 8 ÷ 4
2
Operator | Expression | Description |
---|---|---|
< | x < y | Less than |
> | x > y | Greater than |
<= | x <= y | Less than or equal to |
>= | x >= y | Greater than or equal to |
== | x == y | Equal to |
!= | x != y | Not equal to |
3 < 10
[1] TRUE
3 < 2
[1] FALSE
3 > 2
[1] TRUE
3 <= 3
[1] TRUE
3 == 3
[1] TRUE
3 != 10
[1] TRUE
Relational operators in Julia:
Operator | Expression | Description |
---|---|---|
< | x < y | Less than |
> | x > y | Greater than |
<=, ≤ | x <= y | Less than or equal to |
>=, ≥ | x >= y | Greater than or equal to |
== | x == y | Equal to |
!=, ≠ | x != y | Not equal to |
3 < 10
true
3 < 2
false
3 > 2
true
3 <= 3
true
3 == 3
true
3 != 10
true
Operator | Expression | Description |
---|---|---|
& | x & y | AND |
\| | x \| y | OR |
! | !x | NOT, i.e. negation |
&& | x && y | Short-circuited AND |
\|\| | x \|\| y | Short-circuited OR |
(3 < 10) & (4 > 5) # similar to 3 < 10 & 4 > 5
[1] FALSE
(3 < 10) | (4 > 5)
[1] TRUE
!(3 > 2)
[1] FALSE
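The difference between `&`/`|` and `&&`/`||` is easiest to see in code. A minimal sketch: `&` and `|` work element by element on whole vectors, while `&&` and `||` take single values and stop as soon as the answer is known (short-circuiting). The `stop()` calls are there only to prove that the right-hand side is skipped; if they ran, they would raise an error.

```r
# & and | compare vectors element by element
c(TRUE, TRUE, FALSE) & c(TRUE, FALSE, FALSE)   # TRUE FALSE FALSE
c(TRUE, TRUE, FALSE) | c(TRUE, FALSE, FALSE)   # TRUE TRUE FALSE

# && and || short-circuit: once the left side decides the answer,
# the right side is never evaluated
FALSE && stop("this is never evaluated")        # FALSE
TRUE || stop("this is never evaluated")         # TRUE
```

In recent versions of R (4.3.0 and later), giving `&&` or `||` a vector longer than one is an error, so use `&` and `|` for vectors.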
Logical operators in Julia:
Operator | Expression | Description |
---|---|---|
&& | x && y | Short-circuited AND |
\|\| | x \|\| y | Short-circuited OR |
! | !x | NOT, i.e. negation |
& | x & y | Bitwise AND |
\| | x \| y | Bitwise OR |
~ | ~x | Bitwise NOT |
⊻ | x ⊻ y | Bitwise XOR (exclusive or) |
⊼ | x ⊼ y | Bitwise NAND (not and) |
⊽ | x ⊽ y | Bitwise NOR (not or) |
(3 < 10) & (4 > 5) # parentheses needed: in Julia & binds tighter than <
false
(3 < 10) | (4 > 5) # likewise, | binds tighter than >
true
!(3 > 2)
false
Variables are used to store data whose value can be changed as needed. The unique name given to a variable is called an identifier, as it identifies the data stored in memory.
Variables can be both lvalues and rvalues, i.e. they can appear on the left side of the assignment operator as well as on the right side.
You usually decide on the names to use for your variables. The rules to follow when coming up with a variable name are:
Identifiers can be a combination of letters, digits, period (.) and underscore (_) ONLY.
It must start with a letter or a period. If it starts with a period, it cannot be followed by a digit.
Reserved words and constants in R cannot be used as identifiers.
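One way to check these rules is base R's `make.names()`, which turns any string into a syntactically valid name (it prepends `X` when needed, replaces invalid characters with `.`, and appends `.` to reserved words). The helper `is_valid_name()` below is a hypothetical name used only for illustration; it assumes a name is valid exactly when `make.names()` leaves it unchanged.

```r
# Hypothetical helper: treat a name as valid if make.names() leaves it as is
is_valid_name <- function(name) {
  name == make.names(name)
}

is_valid_name("day_one")   # TRUE:  letters, digits and underscore only
is_valid_name(".day1")     # TRUE:  starts with a period not followed by a digit
is_valid_name("1day")      # FALSE: starts with a digit
is_valid_name(".1day")     # FALSE: period followed by a digit
is_valid_name("_day")      # FALSE: starts with an underscore
is_valid_name("day-one")   # FALSE: contains a hyphen
is_valid_name("if")        # FALSE: reserved word
```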
Variable and function names should be lowercase. Use an underscore (`_`) to separate words within a name. Generally, variable names should be nouns and function names should be verbs. Strive for names that are concise and meaningful.
# Good
day_one
day_1
# Bad
first_day_of_the_month
DayOne
dayone
djm1
What is good? Bad? This is quite subjective. Some ground rules have been laid down to try to keep variable naming consistent. Several naming conventions (cases) have been proposed:
camelCase – The name starts with a lower case and then the next words are capitalized. eg dayOne
PascalCase – upper camelCase. ie starts with a capital letter. eg DayOne
snake_case – Underscore is used to separate the words. All words are in lowercase eg day_one
Of course you could use any of the above cases, as long as the variable name is valid. Do not, for example, use kebab-case (eg day-one), as it is not valid in R: `day-one` is read as `day` minus `one`.
In order to make use of variables, we need to be able to assign values to them. This is done with the help of the assignment operator. Many languages restrict assignment to a single symbol, `=`. That is not the case with R: in R we have several assignment operators.
The left assignment operators are `<-` and `=`:
x <- 3
y = 2
a <- b <- 4 # assigning 4 to both a and b
d = e = 5 # assigning 5 to both d and e
The right assignment operator ->
10 -> x # assigning 10 to x
10 -> a -> b # assigning 10 to both a and b
Example of using a variable
x <- 10 # create a variable x with the value 10
x # implicitly print the value of x. We could also use print(x)
[1] 10
x * 2 # Multiply x by 2 ie 10*2
[1] 20
x <- x + 2 # increment x by 2
x # x is now 12
[1] 12
Note: Refrain from using built-in function names as variable names, e.g. `c <- 3`. `c` is a function in R and hence should not be used as a variable name.
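The sketch below shows why reusing a built-in name is confusing rather than outright broken. When a name is used in a function call, R looks for a *function* with that name, so `c(1, 2)` still works after `c <- 3`; but typing `c` on its own now shows your number, which makes code hard to read. `rm()` removes your variable so that `c` refers only to the built-in function again.

```r
c <- 3     # a variable named c, which hides the built-in c when printed
c          # prints 3, not the function
c(1, 2)    # still works: in a call, R looks for a function named c
rm(c)      # remove the variable so that c refers only to the built-in again
```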
Note: There is also an `assign` function which can be used to assign values to variables. The variable name needs to be written in literal form, i.e. in quotes:
assign("var_1", 3)
var_1
[1] 3
So far we have avoided using literal strings/characters as variable names. They too can be used on the left side of an assignment, although this is bad practice.
"var_2" <- 39 # DO NOT USE THIS THOUGH IT WORKS
var_2
[1] 39
Note that `=` is not a comparison operator; for comparison use `==`. Assignment occurs from right to left, i.e. a statement like `x = 1` means we assign the value 1 to a variable named `x`.
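Because `=` assigns and `==` compares, mixing them up changes what a line does. A minimal illustration:

```r
x = 1      # assignment: x now holds 1 (nothing is printed)
x == 1     # comparison: TRUE
x == 2     # comparison: FALSE
x          # still 1: comparing does not change x
```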
While variable names can be almost anything, some words are reserved in R: they cannot be changed, nor can they be used as variable names.
if | else | repeat | while | function |
---|---|---|---|---|
for | in | next | break | TRUE |
FALSE | NULL | Inf | NaN | NA |
NA_integer_ | NA_real_ | NA_complex_ | NA_character_ | ..1, ..2 etc. |
TRUE <- 1
Error in TRUE <- 1: invalid (do_set) left-hand side to assignment
if <- 2
Error in parse(text = input): <text>:1:4: unexpected assignment
1: if <-
^
Constants are rvalues: they cannot be on the left-hand side of the assignment operator. Though common in lower-level languages, constants are not used much in R. Examples include numbers, e.g. `5`; literal strings/characters, e.g. `'hello'`; complex numbers (a number followed by the letter `i`), e.g. `5i`, `3+9i`; integers, e.g. `5L`; hexadecimal numbers (preceded by `0X` or `0x`), e.g. `0xff`; and logical values, e.g. `TRUE`.
5
[1] 5
3+9i
[1] 3+9i
0xff
[1] 255
TRUE
[1] TRUE
The value \(\pi\), which is a constant in nature, is just a normal variable in R and can be changed. Hence be careful when dealing with these types of values:
pi
[1] 3.141593
pi <- 4
pi # pi changed
[1] 4
rm(pi) # remove the stored variable pi and revert to the original pi
pi
[1] 3.141593
NOTE: The variables `F` and `T` store the logical values FALSE and TRUE respectively. Though they hold logical values, they can be changed. Hence refrain from relying on them, or simply refrain from having variables named `F` or `T`.
T
[1] TRUE
T <- FALSE # Change the T
T # changed T
[1] FALSE
rm(T) #remove the variable T and revert back to the original T
T
[1] TRUE
A data type is a collection of data values that are specified by a set of possible values, a set of allowed operations on these values, and/or a representation of these values as machine types. Data types are important because they tell a computer system how to interpret the value of data.
The data types in R could be divided into two categories:
basic data types
Containers
We can use the function `typeof` to determine the type of an object in R. Let's look at each type:
The NULL type has a single value, and there is a single object with this value. This object is accessed through the built-in name `NULL`. It is used to signify the absence of a value in many situations, e.g. it is returned from functions that don’t explicitly or implicitly return anything.
NULL
NULL
typeof(NULL)
[1] "NULL"
These are the logical values `TRUE` and `FALSE`. They are stored internally as integers, with TRUE given the value 1 and FALSE the value 0. The letters `T` and `F` are shorthands for TRUE and FALSE respectively. Note that T and F are not reserved words and thus their values can change. Use them cautiously.
typeof(TRUE)
[1] "logical"
typeof(FALSE)
[1] "logical"
R contains one more logical value: `NA`. This value represents missing values, or values that are not applicable. Note that there is a distinction between `NULL` and `NA`.
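The distinction between `NULL` and `NA` shows up clearly with `length()` and `c()`. In this short sketch, `NULL` means nothing is there, so it has length 0 and disappears when combined into a vector, while `NA` is a placeholder for a value that is missing, so it keeps its position.

```r
length(NULL)        # 0: NULL is the absence of a value
length(NA)          # 1: NA is one (missing) value
c(1, NULL, 3)       # 1 3: NULL vanishes
c(1, NA, 3)         # 1 NA 3: NA keeps its place
typeof(NA)          # "logical"
is.na(c(1, NA, 3))  # FALSE TRUE FALSE
```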
Integers are zero, positive or negative whole numbers without a fractional part, e.g. `0`, `100`, `-10`. In R, integers are stored in 32 bits, so their range is limited: the largest integer is 2147483647. Integers can also be written in hexadecimal notation, e.g. `0xfL`. On the other hand, doubles are numbers with a fractional/decimal part. In R, every number is stored internally as a double by default. To explicitly create an integer, you must append the letter `L` after the number, e.g. `123L`, `-12L`.
typeof(1)
[1] "double"
typeof(12.34)
[1] "double"
typeof(-10L)
[1] "integer"
typeof(0xfL) # positive hexadecimal integer, i.e. base 16 (0-9, a-f)
[1] "integer"
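Because R integers are 32-bit, there is a largest integer, available as `.Machine$integer.max`. A minimal sketch of what happens at that limit (the variable name `big` is arbitrary): going past it with integer arithmetic gives `NA` plus a warning, while mixing in a double switches to double arithmetic. It also shows that `/` always returns a double, even for two integers.

```r
big <- .Machine$integer.max   # the largest integer R can store
big                           # 2147483647
typeof(big)                   # "integer"
big + 1L                      # NA, with a warning about integer overflow
big + 1                       # 2147483648: adding a double gives a double
typeof(5L / 2L)               # "double": / always returns a double
5L %/% 2L                     # 2, and the result is still an integer
```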
Complex numbers contain a real and an imaginary part, e.g. `3 + 2i`. To create a complex number, you must append the letter `i` after the number. Thus a number that is immediately followed by a lowercase letter `i` will be interpreted as a complex number.
3.21 + 2i
[1] 3.21+2i
typeof(3+2i)
[1] "complex"
The character type represents text, i.e. anything that is not a number, such as names, words or sentences. Characters must be enclosed in quotes, either single or double.
"hello"
[1] "hello"
typeof('a')
[1] "character"
While other languages such as C, C++ and Python allow one to access the individual characters of a string by indexing, R does not: a string is stored as a single element of a character vector. To access the individual characters, one has to first split the string into its individual characters.
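A short sketch of the point above (the variable names are arbitrary): indexing a string with `[ ]` returns the whole string, because the string is element 1 of a character vector. Two base R ways to reach individual characters are splitting with `strsplit()` and extracting a range with `substr()`.

```r
word <- "hello"
word[1]                          # "hello": the whole string is element 1
nchar(word)                      # 5 characters
chars <- strsplit(word, "")[[1]] # strsplit returns a list; [[1]] takes its only element
chars                            # "h" "e" "l" "l" "o"
chars[2]                         # "e"
substr(word, 2, 2)               # "e": characters 2 to 2
```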
A vector is a collection of values of the same basic data type. E.g. suppose I have 3 integers; I can put them in a vector. Vectors are created with the `c` function, i.e. the concatenate function.
c(1L, 2L, 3L) #a vector of 3 integers.
[1] 1 2 3
c(2.4, 3.4) # a vector of 2 doubles.
[1] 2.4 3.4
c('hello world', 'hello class') # a vector of 2 strings
[1] "hello world" "hello class"
Note that a vector can only hold data of one type. Mixing types in a vector results in implicit type conversion.
To access the elements of a vector, we use the extraction operator []
vec_one <- c(1,2,3,4,6)
vec_one[1] # gets the first element
[1] 1
vec_one[3] # gets the 3rd element
[1] 3
We can also change the elements of a vector:
vec_one[1] <- 10 # changes the first element to 10
vec_one
[1] 10 2 3 4 6
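The extraction operator `[]` also accepts several positions at once, or negative positions to leave elements out. A short sketch, recreating `vec_one` with the values it holds after the change above so that the snippet runs on its own:

```r
vec_one <- c(10, 2, 3, 4, 6)
vec_one[c(1, 3)]             # elements 1 and 3: 10 3
vec_one[length(vec_one)]     # the last element: 6
vec_one[-1]                  # everything except the first element: 2 3 4 6
vec_one[c(1, 3)] <- c(0, 0)  # change several elements at once
vec_one                      # 0 2 0 4 6
```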
Note that in R even a single value is a vector: the number `1` is a vector of length 1. Use the `length` function to determine the number of elements in a given vector.
x <- c(1,4,5)
length(x)
[1] 3
y <- c('hello world', 'hello class')
length(y)
[1] 2
Vectors will be discussed in more detail later on, as will the other data types.
One can convert the basic data types from one type to another using the `as.type` family of functions: `as.logical` converts to logical, `as.integer` explicitly converts to integer, `as.double` to double, `as.character` to character and `as.complex` to complex.
Note that only legal values can be coerced to the specified type, i.e. the string literal `"123"` can be converted to the integer `123`, while the word `'hello'` cannot be converted to an integer. Thus `'123'` is legal while `'hello'` is illegal. If you try to convert an illegal value, the result will be `NA`.
as.logical(c(10, -1,3,0, 3+2i, 0+0i)) # from numeric to logical
[1] TRUE TRUE TRUE FALSE TRUE FALSE
All numbers, including complex numbers, are converted to TRUE, apart from 0. Only the number 0, whether double `0`, integer `0L` or complex `0 + 0i`, is converted to FALSE.
as.logical(c('TRUE', 'true', 'True', 'T', 'FALSE', 'false', 'False', 'F'))
[1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE
Only the strings listed above can be converted directly to their logical equivalents. Any other string will produce `NA`. Try it out.
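Trying it out: any string outside that list, including `'1'` and `'yes'`, becomes `NA`. The same happens with other illegal conversions, such as turning `'hello'` into a number; in that case R also prints a warning, which `suppressWarnings()` hides here.

```r
as.logical(c('TRUE', 'yes', '1', 'T'))   # TRUE NA NA TRUE
suppressWarnings(as.integer('hello'))    # NA: 'hello' is not a legal integer
as.integer('123')                        # 123: a legal conversion
```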
as.integer(c(1,3.3)) # drops the fractional part
[1] 1 3
as.numeric(c('123', '0.4', '-45'))
[1] 123.0 0.4 -45.0
as.double(c('123', '0.4', '-45'))
[1] 123.0 0.4 -45.0
as.integer(c('123', '0.4', '-45'))
[1] 123 0 -45
as.character(c(1,3,4.5))
[1] "1" "3" "4.5"
as.character(c(TRUE, FALSE, T, F))
[1] "TRUE" "FALSE" "TRUE" "FALSE"
as.character(34 + 6i)
[1] "34+6i"
Note that most conversions occur implicitly. Suppose we have a double and an integer and place them in a vector. Since a vector can only contain one basic data type, the integer will be converted to a double. Why is this the case? There is a hierarchy: logicals, integers, doubles, complex and lastly characters. Each type in this list can represent the values of the types before it, so converting upwards loses no information, while converting downwards can. For example, converting a double to an integer loses the fractional part, while converting an integer to a double keeps the value unchanged. That is why R converts to the type that is higher up in the hierarchy.
eg:
c(TRUE, 1L) # Converts the logical value TRUE to integer
[1] 1 1
c(1, TRUE, '2') # converts everything to character.
[1] "1" "TRUE" "2"
c(c(1, TRUE), '2') # Explain why the results differ from the one above.
[1] "1" "1" "2"
Note that in the example `c(c(1, TRUE), '2')` we get the result `"1" "1" "2"`. This is because TRUE is first converted to the double `1` when creating the inner vector, and that double is then converted to the character `"1"`.
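The hierarchy logical, integer, double, complex, character can be checked with `typeof()`: in this short sketch, each combined vector takes the type of its highest-ranked element.

```r
typeof(c(TRUE, 1L))   # "integer"
typeof(c(1L, 2.5))    # "double"
typeof(c(2.5, 1i))    # "complex"
typeof(c(1i, "a"))    # "character"
```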
How can we get the value `TRUE` from the character `'1'`?
as.logical(as.numeric('1'))
[1] TRUE
Note that we could use more than one type conversion to get to the desired results.
A function in R is an object containing multiple interrelated statements that are run together in a predefined order every time the function is called.
A simple function is defined with the keyword `function` and then stored in a variable name, e.g.:
square_10 <- function() {
  return(10^2)
}
When called, the function above returns 100:
square_10()
[1] 100
It does not make sense to write a function that always returns a constant; we would rather define the constant itself. To make use of functions, we need to define them with parameters. This enables the function to evaluate the parameters in a predefined manner. A parameter is just a variable, i.e. a placeholder, whose value is passed into the function when the function is called.
Example: A function to square any number, not necessarily 10
square <- function(x){ # x is your parameter.
  return(x^2)
}
square(10)
[1] 100
square(5)
[1] 25
rect_area <- function(len, width){ # takes 2 parameters
  area <- len * width
  return(area)
}
rect_area(10, 5)
[1] 50
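Two further points about calling functions, shown with a hypothetical `rect_perimeter()` defined only for this sketch: arguments can be passed by name, in which case their order does not matter, and if a function has no `return()`, it returns the value of the last expression it evaluates.

```r
# Hypothetical example function: perimeter of a rectangle
rect_perimeter <- function(len, width) {
  2 * (len + width)   # no return(): the value of the last expression is returned
}

rect_perimeter(10, 5)                # 30, arguments matched by position
rect_perimeter(width = 5, len = 10)  # 30, arguments matched by name
```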
Take note that when calling any function in R, whether user-defined or built-in, we use parentheses, e.g. `mean(a)`.
There are many more details about functions, which will be discussed at a later date.
R is an advanced calculator, so how can we compute trigonometric values? The only downside is that you need to know the names of the functions you want. For math functions this is simple, as most are named the same way they are written in math. Look at the list of math functions below:
abs atanh cummin floor log2 tan
acos ceiling cumprod gamma sign tanh
acosh cos cumsum lgamma sin tanpi
asin cosh digamma log sinh trigamma
asinh cospi exp log10 sinpi trunc
atan cummax expm1 log1p sqrt
From the list above, it is easy to tell what the `sqrt` function does, i.e. it computes \(\sqrt{x}\). We can also tell what `exp`, `sin`, `cos` and `tan` are. But what about `atan`, `asin`, `tanh` etc.?
That is, if you need to compute \(\sin^{-1}(x)\), you have to know that the inverse sine function is called `asin` in R, where the `a` stands for arc, i.e. the arc sine function. In other languages, the same function may have a different name, e.g. Python's NumPy library calls it `arcsin`.
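The snippet below tries a few of the listed functions. One subtlety worth knowing: `sin(pi)` is not exactly 0, because `pi` is only stored to about 16 significant digits. The `sinpi()`, `cospi()` and `tanpi()` variants compute `sin(pi * x)` and so on more accurately, so `sinpi(1)` is exactly 0.

```r
sqrt(16)          # 4
asin(1)           # pi/2, about 1.570796 (arc sine, i.e. sin^-1)
sin(asin(0.5))    # 0.5, up to a tiny rounding error: asin undoes sin
sin(pi)           # about 1.224647e-16, not exactly 0
sinpi(1)          # exactly 0: sinpi(x) computes sin(pi * x)
exp(1)            # e, about 2.718282
log10(1000)       # 3
```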
From now on, you are expected to know the names of the functions you use. If you are not sure what a function is called, you can search online or check R's built-in help, e.g. `?asin`.
Given a number `x`, write a program that obtains the digit immediately after the decimal point.
Input: 12.34
Output: 3
Input: 0.6123
Output: 6
Input: 213
Output: 0
Hint: Use math operations. eg modulus operator, integer division, multiplication etc
Given a number `x` and a position, write a program that obtains the digit at the specified position. In this exercise we assume that positions to the left of the decimal point are non-negative (starting at 0 for the ones digit), while those to the right of the decimal point are negative.
Example:
Input: 612.34, pos = 2
Output: 6
Input: 612.34, pos = 1
Output: 1
Input: 612.34, pos = 0
Output: 2
Input: 612.34, pos = -1
Output: 3
Input: 612.34, pos = -2
Output: 4
Input: 612.34, pos = 4
Output: 0
Hint: Use math operations. eg modulus operator, integer division, multiplication etc.
Extra: What would change if the positions to the left were given as negative while those to the right as positive?
A simple function to determine maximum of two numbers:
my_max <- function(x, y){
  index <- (y > x) + 1
  return(c(x, y)[index])
}
Now we could run:
my_max(3, 10)
[1] 10
my_max(19, 3)
[1] 19
my_max(12,-4)
[1] 12
Take a good look at the code. Why did we add 1 to the logical `y > x`? Now write a function `my_min` that takes two arguments and outputs the minimum of the two. Note that there are built-in functions `max` and `min`.
Other ways of writing the max function could be:
my_max1 <- function(x,y){
  i <- x > y
  return(x**i * y**(1-i))
}
my_max1(3, 10)
[1] 10
my_max1(19, 3)
[1] 19
my_max1(12,-4)
[1] 12
Another way is to compute \(x i + y(1-i)\), where \(i = (x > y)\).
Implement this method.
Could you explain as to how the two methods above are able to compute the maximum?
Write a function `my_sign` that outputs the sign of the input, i.e. a negative number has a sign of `-1` and a positive number has a sign of `1`.
Input: -9
Output: -1
Input: 23
Output: 1
Hint1: Use math operations. Recall \(x^0 = 1\). Select a good \(x\) and then think of what your exponent should be.
Hint2: Use math operations. Use a logical \(x\) and the expression \(2x - 1\)
Write a program `absolute` that returns the absolute value of a number. Hint: Use the `my_sign` function you wrote above.
Input: -9
Output: 9
Input: 23
Output: 23
Write a program `cbrt` to compute the cube root. Hint: \(\sqrt[3]{x} = \operatorname{sign}(x)\sqrt[3]{|x|}\), where \(|\cdot|\) is the absolute value function and \(\operatorname{sign}(\cdot)\) is the sign function.
Input: -8
Output: -2
Input: 1
Output: 1
Input: 27
Output: 3
Given a vector, write an R program called `swap_first_last` to swap the first and last elements of the vector.
Examples:
Input : c(12, 35, 9, 56, 24)
Output : 24, 35, 9, 56, 12
Input : c(1, 2, 3)
Output : 3, 2, 1
Hint. You need a temporary variable.
Given a vector and the positions of two elements, write a program `swap` to swap the two elements in the vector. Hint: The program has 3 parameters.
Examples:
Input : vec = c(23, 65, 19, 90), pos1 = 1, pos2 = 3
Output : 19, 65, 23, 90
Input : vec = c(1, 2, 3, 4, 5), pos1 = 2, pos2 = 5
Output : 1, 5, 3, 4, 2
Compute the following using R:
Round off the following numbers using R:
\(980, 950, 930\) to the nearest 100. Hint: `round(123, -2)` rounds to the nearest 100.
\(98, 95, 93\) to the nearest 10
\(9.8, 9.5, 9.3\) to the nearest 1
\(0.98, 0.95, 0.93\) to the nearest 0.1
\(0.098, 0.095, 0.093\) to the nearest 0.01
Determine the final output of the following operations and check your answer against those produced by R
(3 > 4) | TRUE
3 > 4 | TRUE
(3 > 4) & TRUE
(3 > 4) | FALSE
3 >= 3
3 != 4
A cylinder has a radius of r cm and a height of h cm. Write a function to obtain its total surface area (curved surface plus both ends): \(SA = 2\pi rh + 2\pi r^2\). Compute the SA when the radius is 10 cm and the height is 18 cm.
Let `x = 3`. What is the value of `!x`? Explain why that is the case. What numerical value can `x` take such that `!x` results in `TRUE`?
There are other assignment operators in R, i.e. `<<-` and `->>`. What is the difference between these and the ones discussed above? Run `` ?`=` `` in R and read the help page to the end. See whether you can answer the question.
What are the differences between “=” and “<-” assignment operators?
Rounding numbers is a task that occurs in every aspect of math. We would like to implement a function `my_round` such that we can round to the nearest ten, one, five, tenth, half, fifth etc. How can we go about this? The formula is simple. For any given number \(x\), we can round it to the nearest multiple of \(y\) as follows:
1. Divide \(x\) by \(y\).
2. Obtain the integral part of the quotient.
3. Determine whether the fractional part of the quotient is greater than \(0.5\). This should be a logical value.
4. Add the logical value obtained in step 3 to the integral part from step 2.
5. Multiply the sum by \(y\). This should be the needed result.
Implement the above procedure in a program called my_round
Test the function with the following inputs:
Input: x = 123.4656, y = 1
Output: 123
Input: x = 123.4656, y = 5
Output: 125
Input: x = 123.4656, y = 10
Output: 120
Input: x = 126.4656, y = 10
Output: 130
Input: x = 123.4656, y = 100
Output: 100
Input: x = 123.4656, y = 1000
Output: 0
Input: x = 123.4656, y = 0.1
Output: 123.5
Input: x = 123.4656, y = 0.01
Output: 123.47
Input: x = 123.4956, y = 0.01
Output: 123.5
Input: x = 123.4656, y = 0.001
Output: 123.466
Input: x = 123.4656, y = 3
Output: 123
Comments
Comments are an important part of a program, as they describe what each part of the program does. It is often necessary to include them so that your code can be understood by someone else, or even by yourself when reviewing it later. In R, as in Python, comments are preceded by the sharp/hash/pound symbol, i.e. `#`. Anything on a line from the hash onwards is treated as a comment and is not parsed by the interpreter.