This chapter discusses how to create loops and program with if-else statements. These chapters only contain elements that we expect you to know at the end of the course. More detailled books and references exist. For more detailes explanations on specific points, please check: R for Data Science.
Loops are a useful tool for programming. It allows us to automatically repeat operations. For example if we want to compute indicators for each year in a dataset, we can automatically subset per year, compute indicators, store the results and move on to the next year.
Warning
The biggest difference between R and Python resides in the fact that Python starts at 0 while R starts at 1. You can see that in the results below. range(5) gives 0,1,2,3,4 in python.
Making loops
For Loops
A loop in R starts with the for operator. This is followed by a condition that determines how the loop runs (“the loop runs for…”). We start by defining a variable that will take the different values in the loop. Suppose we want to print the value 1 to 5. This requires a loop that takes a variables that starts at 1, and increases by one with each iteration. The in operator is used to define the values the variables will take.
In the following code we will show different ways to loop:
# range of numbersfor (i in1:5) {print(i)}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
# items in a vectorfruits <-c("apple", "banana", "cherry", "date")for (fruit in fruits) {print(fruit)}
[1] "apple"
[1] "banana"
[1] "cherry"
[1] "date"
# Loop with an indexlanguages <-c("Python", "Java", "C++", "Ruby")for (i in1:length(languages)) {cat("Language", i, "is", languages[i], "\n")}
Language 1 is Python
Language 2 is Java
Language 3 is C++
Language 4 is Ruby
# Nested loopsfor (i in1:3) {for (j in1:2) {cat("i =", i, ", j =", j, "\n") }}
i = 1 , j = 1
i = 1 , j = 2
i = 2 , j = 1
i = 2 , j = 2
i = 3 , j = 1
i = 3 , j = 2
# break statement# it is possible to stop the loop given a certain conditionnumbers <-c(3, 7, 1, 9, 4, 2)for (num in numbers) {if (num ==9) {cat("Found 9. Exiting the loop.\n")break }cat("Processing", num, "\n")}
Processing 3
Processing 7
Processing 1
Found 9. Exiting the loop.
A loop in Python starts with the for operator. This is followed by a condition that determines how the loop runs (“the loop runs for…”). We start by defining a variable that will take the different values in the loop. Suppose we want to print the value 0 to 4. This requires a loop that takes a variables that starts at 0, and increases by one with each iteration. The in operator is used to define the values the variables will take.
In the following code we will show different ways to loop:
# a range of numbersfor i inrange(5):print("Iteration", i)# items in a listfruits = ["apple", "banana", "cherry", "date"]for fruit in fruits:print("I like", fruit)# with an indexlanguages = ["Python", "Java", "C++", "Ruby"]for i, language inenumerate(languages):print("Language", i +1, "is", language)# with a custom stepfor number inrange(1, 11, 2):print("Odd number:", number)# Nested loopsfor i inrange(3):for j inrange(2):print("i =", i, ", j =", j)# a break statementnumbers = [3, 7, 1, 9, 4, 2]for num in numbers:if num ==9:print("Found 9. Exiting the loop.")breakprint("Processing", num)
Pay Attention
While R, in general, does not care about the indentation of your code, Python does. If the code is not properly indented, the script will not run.
While loops
While loops continue to loop as long as a specific condition is satisfied. The therefore differ from the for loops which have a specified stopping point. The danger with these loops is that they can theoretically run forever if the conditions is always verified. The basic logic of these loops is while followed by a condition and then the code to execute while this condition is verified:
# Example 1: Simple while loopcount <-1while (count <=5) {cat("Iteration", count, "\n") count <- count +1}
# Example 3: Loop with a condition and next statement (equivalent to continue in Python)i <-0while (i <10) { i <- i +1if (i %%2==0) {next } # Skip even numberscat(i, "\n")}
1
3
5
7
9
# Example 4: Nested while loopsrow <-1while (row <=3) { col <-1while (col <=3) {cat("Row", row, "Column", col, "\n") col <- col +1 } row <- row +1}
# Example 1: Simple while loopcount =1while count <=5:print("Iteration", count) count +=1# Example 3: Loop with a condition and continue statementi =0while i <10: i +=1if i %2==0:continue# Skip even numbersprint(i)# Example 4: Nested while loopsrow =1while row <=3: col =1while col <=3:print("Row", row, "Column", col) col +=1 row +=1
Conditional statements (if, else, elif)
# Using if statementx <-10if (x >5) {cat("x is greater than 5\n")}
x is greater than 5
# Using if-else statementy <-3if (y >5) {cat("y is greater than 5\n")} else {cat("y is not greater than 5\n")}
y is not greater than 5
# Using if-else if-else statementz <-7if (z >10) {cat("z is greater than 10\n")} elseif (z >5) {cat("z is greater than 5 but not greater than 10\n")} else {cat("z is not greater than 5\n")}
z is greater than 5 but not greater than 10
# Nested if statementsa <-15b <-6if (a >10) {if (b >5) {cat("Both a and b are greater than 10 and 5, respectively\n") } else {cat("a is greater than 10, but b is not greater than 5\n")}}
Both a and b are greater than 10 and 5, respectively
# Using a conditional expression (ternary operator)age <-18status <-if (age >=18) "adult"else"minor"cat("You are an", status, "\n")
You are an adult
# Using if statementx =10if x >5:print("x is greater than 5")# Using if-else statementy =3if y >5:print("y is greater than 5")else:print("y is not greater than 5")# Using if-elif-else statementz =7if z >10:print("z is greater than 10")elif z >5:print("z is greater than 5 but not greater than 10")else:print("z is not greater than 5")# Nested if statementsa =15b =6if a >10:if b >5:print("Both a and b are greater than 10 and 5, respectively")else:print("a is greater than 10, but b is not greater than 5")# Using a conditional expression (ternary operator)age =18status ="adult"if age >=18else"minor"print("You are an", status)
Mapping functions
When working with data, we often want to apply a function to all rows of a column, or even on a whole dataframe. For example if we want to remove the capital letters from text or round up numbers to a set number of digits. This could be achieved by looping over all the cells in the dataframe, but there are often more efficient ways of doing this. There are functions that allow us to apply another function to each cell of the column, or dataframe, that we select. This function takes care of the looping behind the scenes.
In R we have multiple options for this, some in base R, some from different packages. We will discuss the most notable functions here.
apply sapply mapply lapply
map
# All these functions are # present in base R# Example 1: apply functionmatrix_data <-matrix(1:9, nrow =3)# Apply a function to rows (axis 1)row_sums <-apply(matrix_data, 1, sum)print("Row Sums:")
[1] "Row Sums:"
print(row_sums)
[1] 12 15 18
# Apply a function to columns (axis 2)col_sums <-apply(matrix_data, 2, sum)print("Column Sums:")
[1] "Column Sums:"
print(col_sums)
[1] 6 15 24
# Example 2: lapply functiondata_list <-list(a =1:3, b =4:6, c =7:9)# Apply a function to each element of the listsquared_list <-lapply(data_list, function(x) x^2)print("Squared List:")
[1] "Squared List:"
print(squared_list)
$a
[1] 1 4 9
$b
[1] 16 25 36
$c
[1] 49 64 81
# Example 3: sapply function# Similar to lapply, but attempts to simplify the resultsquared_vector <-sapply(data_list, function(x) x^2)print("Squared Vector:")
[1] "Squared Vector:"
print(squared_vector)
a b c
[1,] 1 16 49
[2,] 4 25 64
[3,] 9 36 81
# Example 4: tapply function# Used for applying a function by factorssales_data <-data.frame(product =c("A", "B", "A", "B", "A"),sales =c(100, 150, 120, 180, 90))# Apply a function to calculate total sales by producttotal_sales <-tapply(sales_data$sales, sales_data$product, sum)print("Total Sales by Product:")
[1] "Total Sales by Product:"
print(total_sales)
A B
310 330
import numpy as npimport pandas as pd# Example 1: NumPy equivalent of applymatrix_data = np.array([[1, 4, 7], [2, 5, 8], [3, 6, 9]])# Apply a function to rows (axis 1)row_sums = np.apply_along_axis(np.sum, axis=1, arr=matrix_data)print("Row Sums:")print(row_sums)# Apply a function to columns (axis 0)col_sums = np.apply_along_axis(np.sum, axis=0, arr=matrix_data)print("Column Sums:")print(col_sums)# List data_list = {'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]}# Apply a function to each element of the dictionary valuessquared_list = {key: [x**2for x in value] for key, value in data_list.items()}print("Squared Dictionary:")print(squared_list)# Example 3: pandasdata_dict = {'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]}df = pd.DataFrame(data_dict)# Apply a function to each column (series) and return a DataFramesquared_df = df.apply(lambda x: x**2)print(squared_df)# pandas sales_data = pd.DataFrame({'product': ['A', 'B', 'A', 'B', 'A'],'sales': [100, 150, 120, 180, 90]})# Apply a function to calculate total sales by producttotal_sales = sales_data.groupby('product')['sales'].sum()print("Total Sales by Product:")print(total_sales)
The map() function in R
In R we also have the map() function, this function is part of the purrr package. map() is able to apply a function of your chosing to each element of a list, vector or other data strucutr. The basic syntax for the function is:
map(.x, .f, ...)
The .x argument is the data structure you wish to apply the function to. The .f argument will be the function you will be applying to the dataframe (for example, gsub(), as.integer(), tolower(), etc.)
The main advantage of map() is that it is a one format solution while the base R functions (apply, lappply, sapply and tapply) all have different syntaxes and behaviour. map() works on all and behaves identically on all datastructures.
Here is an example of how to use this function:
library(purrr)test_data =list(3,9,12,15,18)# we will apply the sqrt() function to these numberstest_data_sqrt =map(test_data, ~sqrt(.))print(test_data_sqrt)
It is possible to use map() on a specific colomn in a dataframe. You will need to pay close attention to select this column correctly. Applying the function to a matrix will result in the matrix being reverted back to a list. For this reason we add the unlist() function so that the results can be returned in matrix format.