Week 3: Loops and exercises

0. Overview

Loop type     R syntax
for loop      for(){}
while loop    while(){}

1. Looping functions

Loops are a fundamental part of programming in data science because they allow you to efficiently perform repetitive tasks on large datasets. Instead of writing repetitive code manually, loops enable automation by iterating over data structures like lists, dataframes, or arrays. This is particularly important when processing datasets with thousands or millions of entries, as it saves time and reduces the risk of errors. Loops also allow for dynamic analysis, enabling you to apply functions, calculations, or transformations to each element in a dataset, making them a powerful tool for scalability and flexibility in data analysis.

Imagine you are analyzing the carbon footprint of 1,000 households to assess their environmental impact. Each household has data on electricity consumption, gas usage, and travel habits. Your task is to calculate the total carbon emissions for each household using a simple formula:

\[ \text{CarbonEmissions} = (\text{Electricity (kWh)} \times 0.5) + (\text{Gas (m}^3\text{)} \times 2.1) + (\text{Travel (km)} \times 0.2) \]

Without loops, you would need to manually calculate this formula for each household, which is tedious and impractical. Instead, with a loop, you can automate the process.
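
As a minimal sketch of this idea in Python, the loop below applies the formula to a handful of households. The three households and their values are made up for illustration, and the names households and emissions are assumptions, not part of any course dataset.

```python
# Hypothetical sample data: (electricity in kWh, gas in m3, travel in km) per household
households = [
    (3500, 1200, 8000),
    (2800, 900, 12000),
    (4100, 1500, 5000),
]

emissions = []
for electricity, gas, travel in households:
    # Apply the carbon emissions formula from the text to one household
    total = electricity * 0.5 + gas * 2.1 + travel * 0.2
    emissions.append(total)

# Report the result for each household, numbering them from 1
for number, total in enumerate(emissions, start=1):
    print("Household", number, "emits", total)
```

The same loop works unchanged for 3 or 1,000 households; only the input list grows.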

Imagine that we have measurements of forestation over time:

Year Forest Area (ha)
2000 1,000,000
2001 990,000
2002 975,000
2003 950,000
2004 930,000
2005 920,000
2006 900,000

If we want to compute the change in hectares over time, we would have to compute each year-on-year change individually. With a loop we automate this process by moving sequentially over the dataframe and computing the change from one measurement to the next. Let’s see how this can be done in Python and R.
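
A possible sketch in Python, assuming the seven measurements from the table are stored in a plain list in the order shown (the names forest_area, changes and rates are illustrative):

```python
# Forest area (ha) for each consecutive measurement in the table
forest_area = [1_000_000, 990_000, 975_000, 950_000, 930_000, 920_000, 900_000]

changes = []  # absolute change in ha
rates = []    # change as a percentage of the previous measurement

# Start at index 1 so that every measurement has a previous one to compare with
for i in range(1, len(forest_area)):
    previous = forest_area[i - 1]
    change = forest_area[i] - previous
    changes.append(change)
    rates.append(change / previous * 100)
    print("Measurement", i + 1, "change:", change, "ha")
```

Starting the loop at the second measurement is the important design choice here: the first value has nothing before it, so there is no change to compute.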

There are two main techniques for looping: a while loop and a for loop. The while loop needs a condition: as long as this condition is true, the loop continues to run. The for loop is used when you want to repeat a task a specific number of times or iterate over a sequence (like a list, range, or string). It works by stepping through each element in the sequence until it reaches the end. Let’s have a look in detail:

1.1 For Loops in R

A for loop in R starts with the for keyword. It is followed, in parentheses, by an expression that determines how the loop runs (“the loop runs for…”). We start by defining a variable that will take the different values in the loop. Suppose we want to print the values 1 to 5. This requires a loop with a variable that starts at 1 and increases by one with each iteration. The in operator defines the values the variable will take.

In the following code we will show different ways to loop:

# range of numbers
for (i in 1:5) {
  print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

In this first example the loop starts with i = 1 and runs the code between the curly brackets, so it prints “1”. It then automatically moves to 2, prints, moves to 3, and so on until i = 5. You can put any number of operations between the curly brackets:

# several operations per iteration
j <- 0
for (i in 1:5) {
  j <- j + 1
  print(i)
  print(j)
}
[1] 1
[1] 1
[1] 2
[1] 2
[1] 3
[1] 3
[1] 4
[1] 4
[1] 5
[1] 5

By default, a range such as 1:5 uses a step of 1 (1, 2, 3, 4, 5). We can use a different step by passing, after in, a vector that contains only the values we want i to take. seq(1, 10, by = 2) creates a vector that starts at 1, stops at or before 10, and uses a step of 2, so it contains 1, 3, 5, 7, 9. i then iterates over this vector, taking only those values:

# custom step
for (i in seq(1, 10, by = 2)) {
  print(paste("Odd number:", i))
}
[1] "Odd number: 1"
[1] "Odd number: 3"
[1] "Odd number: 5"
[1] "Odd number: 7"
[1] "Odd number: 9"

The elements we loop over do not have to be numbers; they can also be text. Instead of a vector of numbers, we can use a vector of words, for example:

# items in a vector
fruits <- c("apple", "banana", "cherry", "date")
for (fruit in fruits) {
  print(fruit)
}
[1] "apple"
[1] "banana"
[1] "cherry"
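[1] "date"

If you also need the position of each element, loop over the index instead:

# loop with an index (seq_along(languages) gives 1, 2, 3, 4)
languages <- c("Python", "Java", "C++", "Ruby")
for (i in seq_along(languages)) {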
  cat("Language", i, "is", languages[i], "\n")
}
Language 1 is Python 
Language 2 is Java 
Language 3 is C++ 
Language 4 is Ruby 

Loops can also be nested: the inner loop runs in full for every iteration of the outer loop.

# Nested loops
for (i in 1:3) {
  for (j in 1:2) {
    cat("i =", i, ", j =", j, "\n")
  }
}
i = 1 , j = 1 
i = 1 , j = 2 
i = 2 , j = 1 
i = 2 , j = 2 
i = 3 , j = 1 
i = 3 , j = 2 

The break statement stops a loop as soon as a given condition is met:

# break statement
numbers <- c(3, 7, 1, 9, 4, 2)
for (num in numbers) {
  if (num == 9) {
    cat("Found 9. Exiting the loop.\n")
    break
  }
  cat("Processing", num, "\n")
}
Processing 3 
Processing 7 
Processing 1 
Found 9. Exiting the loop.
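
For comparison, the R for loops above could be written in Python roughly as follows. This is a sketch rather than a line-by-line translation; the lists odd_numbers and processed are additions for illustration and do not appear in the R code.

```python
# range(1, 6) yields 1, 2, 3, 4, 5: in Python the end value is excluded
for i in range(1, 6):
    print(i)

# custom step: range(1, 10, 2) yields 1, 3, 5, 7, 9
odd_numbers = []
for i in range(1, 10, 2):
    odd_numbers.append(i)
    print("Odd number:", i)

# items in a list, with enumerate() providing an index that starts at 1
languages = ["Python", "Java", "C++", "Ruby"]
for position, language in enumerate(languages, start=1):
    print("Language", position, "is", language)

# break: stop the loop as soon as 9 is found
numbers = [3, 7, 1, 9, 4, 2]
processed = []
for num in numbers:
    if num == 9:
        print("Found 9. Exiting the loop.")
        break
    processed.append(num)
    print("Processing", num)
```

Note the two main differences from R: Python uses indentation instead of curly brackets, and range() stops one step before its end value.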

2. While loops

While loops continue to loop as long as a specific condition is satisfied. They therefore differ from for loops, which have a specified stopping point. The danger with these loops is that they can theoretically run forever if the condition is always true. The basic logic of these loops is: while, followed by a condition, and then the code to execute as long as this condition holds:

2.1 While loops in R

# Example 1: Simple while loop
count <- 1
while (count <= 5) {
  cat("Iteration", count, "\n")
  count <- count + 1
}
Iteration 1 
Iteration 2 
Iteration 3 
Iteration 4 
Iteration 5 

The key point with while loops is that the code inside the loop must update something that eventually makes the condition false. In the example above, we increase the count variable on every pass until it exceeds 5, at which point count <= 5 is no longer true and the loop stops. If nothing that the condition depends on is updated, the loop will run forever.
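
One common way to protect against an endless loop is to add a second condition that caps the number of iterations. The Python sketch below illustrates this; the starting value and the limit max_iterations are arbitrary choices made for the example.

```python
value = 100.0
iterations = 0
max_iterations = 1000  # hypothetical safety limit, chosen for illustration

# Halve the value until it drops below 1.
# The counter makes sure the loop stops even if the first condition never becomes false.
while value >= 1 and iterations < max_iterations:
    value = value / 2
    iterations += 1

print("Stopped after", iterations, "iterations with value", value)
```

Here the update value = value / 2 is what moves the condition towards being false; the counter is only a safety net.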

# Example 2: Loop with a condition and next statement (equivalent to continue in Python)
i <- 0
while (i < 10) {
  i <- i + 1
  if (i %% 2 == 0) {
    next  # Skip even numbers
  }
  cat(i, "\n")
}
1 
3 
5 
7 
9 

# Example 3: Nested while loops
row <- 1
while (row <= 3) {
  col <- 1
  while (col <= 3) {
    cat("Row", row, "Column", col, "\n")
    col <- col + 1
  }
  row <- row + 1
}
Row 1 Column 1 
Row 1 Column 2 
Row 1 Column 3 
Row 2 Column 1 
Row 2 Column 2 
Row 2 Column 3 
Row 3 Column 1 
Row 3 Column 2 
Row 3 Column 3 

2.2 While loops in Python

# Example 1: Simple while loop
count = 1
while count <= 5:
    print("Iteration", count)
    count += 1

# Example 2: Loop with a condition and continue statement
i = 0
while i < 10:
    i += 1
    if i % 2 == 0:
        continue  # Skip even numbers
    print(i)

# Example 3: Nested while loops
row = 1
while row <= 3:
    col = 1
    while col <= 3:
        print("Row", row, "Column", col)
        col += 1
    row += 1

3. Exercises

3.1 Multiples of 3 and 5

Use a for loop to iterate over a range of numbers and a while loop to calculate cumulative sums.

  • Write a program that:
    • Iterates through the numbers from 1 to 50 using a for loop.
    • For each number, checks if it is divisible by 3 or 5 using an if statement.
    • Prints the number if the condition is true.
  • Extend the program:
    • Use a while loop to calculate the cumulative sum of all the numbers divisible by 3 or 5.
    • Stop the loop once the sum exceeds 200 and print the final cumulative sum.

3.2 Filtering Rows from a Dataframe

# Use the following dataframe:
data <- data.frame(
    "Region" = c("A", "B", "A", "C", "B", "C", "A"),
    "EnergyUsage" = c(10, 20, 30, 40, 50, 60, 70),
    "Sustainable" = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE)
)

  • Write a program that:
    • Iterates through each row of the dataframe using a for loop.
    • Checks if the Sustainable column is TRUE and EnergyUsage is greater than 20.
    • Prints the rows meeting these criteria.
  • Extend the program:
    • Use a while loop to iterate through the dataframe rows, adding the EnergyUsage of rows that are sustainable to a cumulative total.
    • Stop the loop once the cumulative EnergyUsage exceeds 50, and print the total and the rows contributing to it.

3.3 Guess the Secret Number

  • Write a program that:
    • Randomly selects a secret whole number between 1 and 20. runif() returns decimal values, so round the result down, e.g. floor(runif(1, min = 1, max = 21)).
    • Uses a while loop to allow the user to guess the number.
    • If the guess is too high, print “Too high!”.
    • If the guess is too low, print “Too low!”.
    • Break the loop and print “Correct!” when the user guesses the number.
  • Extend the program:
    • Limit the user to 5 attempts using an additional counter.
    • If the user fails within 5 attempts, print “Game Over!” and reveal the secret number.
    • Use the readline() function to let the user enter a value:
guess <- as.integer(readline(prompt = "Guess a number between 1 and 20: "))
Guess a number between 1 and 20: 

3.4 Working with Real Data

On Blackboard you will find a dataset with energy production per country. Your task is to analyze this data to gain insights into the energy trends of these regions.

  • Load and Inspect the Data
    • Import the dataset into your programming environment.
    • Display the first 10 rows of the dataset to understand its structure.
    • Identify the columns and their data types.
    • In R you can use the summary() function to have a quick look at the data; in Python you can use print(mydata.describe()).
  • Subset the Data
    • Filter the dataset to include only rows corresponding to European countries.
    • Save this subset into a new variable (e.g., EU_data).
  • Compute Renewable Energy Percentage
    • Create a new column called renewable_percentage that calculates the percentage of renewable energy in the total energy production.
  • Classify Countries
    • Create a new column called renewable_category:
    • If renewable_percentage is greater than 50%, classify as “High Renewable”. If it is between 20% and 50%, classify as “Medium Renewable”. Otherwise, classify as “Low Renewable”.
  • Using If/else
    • Write a script to check if there are any countries with missing values in the renewable_energy or total_energy columns. If missing values exist, print a message indicating how many rows are incomplete.
  • Group and Summarize Data
    • Group the data by renewable_category and compute the average renewable_percentage for each category.
    • Display the results in a tabular format.
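
The classification step in the list above boils down to an if / else if / else chain. The Python sketch below shows that logic on a few made-up percentages. It is not a solution for the Blackboard dataset: the function name classify and the sample values are hypothetical, and treating exactly 20% and exactly 50% as “Medium Renewable” is an assumption, since the task does not say how the boundaries are handled.

```python
def classify(renewable_percentage):
    # Assumption: exactly 20% and exactly 50% both count as "Medium Renewable"
    if renewable_percentage > 50:
        return "High Renewable"
    elif renewable_percentage >= 20:
        return "Medium Renewable"
    else:
        return "Low Renewable"

# Hypothetical percentages, not taken from the course dataset
sample_percentages = [72.5, 50.0, 35.0, 20.0, 4.2]

categories = []
for percentage in sample_percentages:
    category = classify(percentage)
    categories.append(category)
    print(percentage, "->", category)
```

The order of the checks matters: because the first test already catches everything above 50, the second test only needs to check the lower bound.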

4. Exam-type questions

In the exam there will be different types of questions.

  • The first part will consist of several multiple choice questions. You will be provided with a code snippet in either Python or R (some questions will use R, some Python), and you will have to answer the question based on this code.
  • The second type of question consists of reading and explaining a bigger script. With this type of question you are requested to explain each line of code and deduce the output of the script.
  • The third type consists in finding an error. We will provide you with a script, you will be requested to identify the error and explain how to adjust the code.
  • The final question type consists in writing a short script. You will be provided with a task, you are requested to write a script that performs the task. You may pick the language.

4.1 Exercise 1: reading code in R

You are provided with an R script. Explain each line of code of the script.

factorials <- list()
for (n in 1:10) {
  fact <- 1
  counter <- n
  while (counter > 1) {
    fact <- fact * counter
    counter <- counter - 1
  }
  factorials[[as.character(n)]] <- fact
}
print(factorials)

4.2 Exercise 2: reading code in R

discounted_prices <- c()
for (i in 1:nrow(products)) {
  price <- products$Price[i]
  in_stock <- products$InStock[i]
  if (price > 50 && in_stock) {
    new_price <- price * 0.9
  } else if (price <= 50 && in_stock) {
    new_price <- price * 0.95 
  } else {
    new_price <- price
  }
  discounted_prices <- c(discounted_prices, new_price)
}
products$DiscountedPrice <- discounted_prices
print(products)

4.5 Exercise 5: write code

You are in charge of a log-in system on a website in which a user has to provide the correct password. When the correct password is provided, print “success!”. If the user has failed 3 times, stop the system. Write a function that can perform this task.