Week 1: The programming basics

0. Overview

For this first session, the goal for you is to learn the following operations:

Operation                          Example in R
Install a package                  install.packages("dplyr")
Load a package                     library(dplyr)
Create a variable                  value <- 42
Write a comment                    # This is a comment
Understand the difference between  plop <- "42"
text, values, and variables        plop <- 42
                                   plop <- fourtytwo
Paste                              paste("the value is", plop)
Print                              print(paste("the value is", plop))
Basic arithmetic                   42 + 48
Use a function                     max(c(1,2,3,4,68))
Use a function from a script
Use logic operators                a != b
Combine functions

1. Loading and installing libraries

R comes with basic, built-in functions. This means that when you launch R you can use these functions directly. This is the case for the print() function, for example, and also for basic mathematical functions such as max(), min(), and mean().

print("This is how we use the print function in R")
[1] "This is how we use the print function in R"

Think of R as a basic toolbox. This toolbox comes with some essential tools, like a hammer (the print() function), a screwdriver (the length() function), and a measuring tape (the max() function). These tools are available right away because they’re built into the toolbox (R).
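As a quick illustration of these built-in tools, the sketch below applies a few of them to a small vector. The vector name scores and its values are made up for this example; nothing needs to be installed or loaded first.

```r
# Hypothetical example values, not taken from any dataset
scores <- c(1, 2, 3, 4, 68)

highest <- max(scores)      # largest value
lowest <- min(scores)       # smallest value
average <- mean(scores)     # arithmetic mean
how_many <- length(scores)  # number of elements in the vector

print(highest)
print(how_many)
```

All four functions ship with R itself, which is why no library() call appears in the sketch.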

However, sometimes you need more specialized tools that aren’t in the basic toolbox. For example, if you want to build a bookshelf, you might need a power drill or a saw. These tools aren’t included by default in the basic toolbox. In the world of R, these extra tools are available in something called packages.

A package is like an extra toolbox full of new tools (functions) that someone else has created for a specific purpose. For example:

If you want to work with maps, you need functions that create maps; these are available in the maps package. If you want to make graphs, you might need the ggplot2 package. To use these packages, you first need to download and install them.

When you install a package, it’s like going to the store, buying the specialized toolbox, and adding it to your existing set of tools. Except that packages in R are free. Once installed, you can use the new tools (functions) it provides.

Let’s see how we install packages in R:

install.packages("dplyr")

We only have to install packages once. Once they are installed, we only need to tell R that we want to use the functions from these packages. We do this by loading the package, which lets R know that you want to use its functions:

library(dplyr)

You only have to load a package once when you start working. Reloading is only necessary if you have quit R. When you get an error message like the following, it usually means that you did not load the package and therefore R cannot find the function you are trying to use:

x = add.image(plop)
Error in add.image(plop) : could not find function "add.image"

The error message clearly states that the function we are trying to use cannot be found. This means that it’s time to find out which package the function belongs to and load that package using library().
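The install-once, load-every-session routine can be sketched as follows. This is only one possible pattern, not a required one. The variable names package_name and is_installed are chosen for this illustration, and the package "stats" is used as a stand-in because it ships with R, so the sketch runs without downloading anything.

```r
# "stats" ships with R; swap in another package name (for example "dplyr")
# when you use this pattern for real.
package_name <- "stats"

# requireNamespace() returns TRUE when the package is already installed
is_installed <- requireNamespace(package_name, quietly = TRUE)

if (!is_installed) {
  install.packages(package_name)  # only runs when the package is missing
}

# character.only = TRUE tells library() that the name is stored in a variable
library(package_name, character.only = TRUE)

# A function can also be called with an explicit package prefix
middle_value <- stats::median(c(1, 5, 9))
print(middle_value)
```

The package::function() notation in the last lines makes it explicit which package a function comes from, which can help when tracking down a "could not find function" error.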

Exercises part 1:
  1. Install required packages in R:
    Install the following packages: tidyverse, ggplot2, igraph

  2. Manual Install of a package:
    Install the NetworkIsLife Package.

2. Create a variable

A variable is like a container or a labeled storage box that holds data or information. In programming, we use variables to store values such as numbers, text, or more complex data, so that we can easily refer to and manipulate them throughout our code.

Let’s take a real world analogy:

Imagine you’re working at a lab bench or a kitchen counter:

  • You have a jar with a label on it: “sugar”.
  • Inside, you put 250 grams of sugar.
  • Now, whenever you need sugar, you don’t have to remember “250”, you just use the jar labeled “sugar”.

In R, it’s the same idea:

sugar <- 250

You’ve created a variable called sugar, and you’ve stored the number 250 in it.

If later you need to calculate how much sugar you’ll need for 3 recipes, you can do:

sugar * 3

R looks inside the “sugar” jar, finds 250, and does the math for you; there is no need to retype or remember the number. The example may seem trivial, but we use the same idea to store entire data frames with millions of rows. Following the same logic, we don’t have to recreate the data frame every time we want to modify it or add something to it.

Why do we create variables?

  • Convenience: Instead of typing the same value repeatedly, we store it in a variable and use the variable’s name. For example, if you store a value like 42 in a variable called age, you can use age throughout your program instead of the number 42.
  • Flexibility: Variables allow us to use information that may change over time. If you store a value in a variable, you can update the variable’s content later without having to rewrite your entire program. For example, a variable temperature could hold the temperature of a room, which you might update as the temperature changes.
  • Reusability: Once you’ve stored a value in a variable, you can reuse it as many times as you want in your program. This avoids repetition and makes the code cleaner.

# without variables
print(20 + 30)
[1] 50
# With variables

a <- 20
b <- 30
result <- a + b
print(result)
[1] 50

Creating the variables a, b and result, allows us to use them later in the code. Beyond simple numerical values, other variables are created in an identical manner. In R, using “<-” assigns the value/object on the right to the variable on the left.
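To make the flexibility point concrete, here is a minimal sketch that reuses the sugar example. The names recipes, total_sugar and total_sugar_updated are introduced only for this illustration.

```r
sugar <- 250   # grams of sugar per recipe
recipes <- 3   # hypothetical number of recipes

total_sugar <- sugar * recipes
print(total_sugar)

# Updating the variable changes what later calculations use ...
sugar <- 300
total_sugar_updated <- sugar * recipes
print(total_sugar_updated)

# ... but results computed earlier keep their old value until recomputed
print(total_sugar)
```

Only the line that assigns sugar had to change; the calculation itself was written once and reused.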

3. Using functions

What can be done with programming has no bounds. Many people before us have written code that does amazing things. The beauty of open-source environments is that this work is shared with us in the form of packages and functions. A function is basically a script that executes a very specific task. For example, the print() function prints out whatever you put between the parentheses. Since this function exists, you don’t have to worry about writing a script that does this. The same is true for mathematical operations. You can simply use the max() function without having to write a script that searches through data to find the maximum yourself. Let’s see how this works.

3.1 Using functions from packages

Using a function always works in the same way in R. The construction is always:

  • Name of the function: paste(), print(), matrix(), data.frame(), as.numeric()
  • Between parentheses we add the arguments of the function: matrix(nrow = 10, ncol = 3, 0)

So in matrix(nrow = 10, ncol = 3), matrix is the function, and nrow and ncol are the arguments. As the names suggest, this function creates a matrix and the arguments define its dimensions: nrow = 10 means that the matrix will have 10 rows, and ncol = 3 means that it will have 3 columns.

When we run the code below with nrow = 3 and ncol = 3, a 3x3 matrix is created. Because we did not supply any values, every cell contains NA (a missing value):

matrix(nrow = 3, ncol = 3)
     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]   NA   NA   NA
[3,]   NA   NA   NA
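Arguments can be supplied by name or by position. The sketch below assumes the documented argument order of matrix(), which starts with data, nrow, ncol; the variable names by_name, by_position and mixed are made up for the illustration.

```r
# Arguments given by name: the order does not matter
by_name <- matrix(data = 0, nrow = 10, ncol = 3)

# Arguments given by position: R matches them to data, nrow, ncol in turn
by_position <- matrix(0, 10, 3)

# Mixed: named arguments are matched first, the unnamed 0 is then used as data
mixed <- matrix(nrow = 10, ncol = 3, 0)

dim(by_name)                     # number of rows, number of columns
identical(by_name, by_position)  # TRUE when both matrices are the same
```

Naming the arguments takes a little more typing, but it makes a script easier to read when you come back to it later.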

Let’s make this more interesting and create a matrix full of 3’s. For this we need to add an additional argument:

m <- matrix(nrow = 3, ncol = 3, 3) # create and store the matrix in a variable
m # display the matrix
     [,1] [,2] [,3]
[1,]    3    3    3
[2,]    3    3    3
[3,]    3    3    3

Now let’s multiply this matrix by 3

m <- m * 3
m
     [,1] [,2] [,3]
[1,]    9    9    9
[2,]    9    9    9
[3,]    9    9    9

Because we stored the matrix in a variable “m”, we don’t have to create it again; we can just call it and apply a multiplication. Note that because we stored the result of the multiplication in the same variable, the matrix of 3’s is overwritten and replaced with the matrix of 9’s.

Let’s illustrate this with another matrix. We first create a matrix:

Small_matrix <- matrix(0, nrow = 3, ncol = 3)
# this creates a variable called Small_matrix that is a 3x3 matrix filled with zeros.
print(Small_matrix)
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0
[3,]    0    0    0

In the example above, we create a new variable Small_matrix which contains a matrix. Note that R is case-sensitive, so Small_matrix and small_matrix would be two different variables. We can then use this variable later by referring to its name. Below we add 3 to each cell of the matrix:

Small_matrix <- Small_matrix + 3

By writing the same name on the left and right, the object Small_matrix is replaced by Small_matrix + 3, which means that we cannot go back to the initial Small_matrix filled with 0’s.

If we want to keep both versions, we can create a new variable:

Small_matrix_3 <- Small_matrix + 3
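To make the difference between overwriting and keeping a copy concrete, here is a small sketch that starts from a fresh matrix. The names original and plus_three are chosen for this illustration only.

```r
original <- matrix(0, nrow = 3, ncol = 3)

# Store the result under a new name: original is left untouched
plus_three <- original + 3

original[1, 1]    # still 0
plus_three[1, 1]  # 3

# Overwrite instead: the zeros are gone after this line
original <- original + 3
original[1, 1]    # now 3
```

Whether to overwrite or to create a new variable depends on whether you still need the earlier version further down in your script.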

When we work with data, we often need to perform multiple tasks in a row before we have data that is actually usable. For example, imagine that you have a database with emission levels for the world. You’re doing an analysis on Europe, so you would have to extract only European countries. Then you might want to transform the format of the data, round some numbers, compute percentages, growth rates, etc. We can combine multiple functions in one line of code.

The following creates a matrix and then transforms the matrix into a dataframe.

Thesis_data_subset <- as.data.frame(matrix(nrow = 20, ncol = 3, 3))

An alternative exists in base R (version 4.1.0 and later) to make this easier to read. We can use what is called a pipe, “|>”:

Thesis_data_subset <-  matrix(nrow = 20, ncol = 3, 3) |> as.data.frame()

This line creates the matrix and then applies the as.data.frame() function. This approach can make scripts more readable when many transformations are applied one after another.

An equivalent pipe, %>%, is found in the tidyverse:

library(tidyverse)
Thesis_data_subset <-  matrix(nrow = 20, ncol = 3, 3) %>% as.data.frame()

The tidyverse is widely used in manuals on data science with R.
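The benefit of a pipe becomes clearer with more than two steps. The following sketch compares a nested call with the same steps written using the base R pipe (which requires R 4.1.0 or later); the vector values and the names nested and piped are invented for the example.

```r
values <- c(4, 16, 25, 100)  # hypothetical example values

# Nested version: has to be read from the inside out
nested <- round(mean(sqrt(values)))

# Piped version: reads from top to bottom, one step per line
piped <- values |>
  sqrt() |>
  mean() |>
  round()

print(piped)
identical(nested, piped)  # both versions give the same result
```

Each pipe passes the result on its left as the first argument of the function on its right, so values |> sqrt() is the same as sqrt(values).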

Exercises part 3.1:
  1. Create a Matrix:
    Write code to create a 4x4 matrix filled with zeros. What is the variable’s name, and what type of object does it store?

  2. Variable Assignment:
    Assign the number 10 to a variable named my_number. What happens if you assign a new value, for example 20, to the same variable afterward?

  3. Matrix Operations:

    • Perform an operation that adds 5 to each element in the matrix you created in question 1.
    • Multiply the matrix by 9.
    • Multiply the matrix by itself (matrix multiplication). This requires a specific operator. Use the internet to find which operator to use and perform the multiplication.
  4. Practice chaining functions:

    • Create a vector with values from 1 to 20, then use the as.matrix() function to transform it into a matrix, then use max() to get the highest value, and finally compute the log of this value. Use the |> operator to chain the functions.

3.2 Working with your own functions

While packages offer a whole universe of functions, we often write our own for specific tasks. Imagine that you’re working on a big dataset and you have to apply the same type of operation multiple times. This can result in a very long script in which you have to repeat the same lines of code again and again, which makes everything difficult to read. In these cases it can be efficient to write your own function and use one line of code instead.

We do not expect you to make your own functions. We only want you to be able to identify custom functions in a script.

In R, a function is defined by the function…function…let me clarify:

my_own_function <- function(argument){
  return(something)
}

We create here a new function called my_own_function, which has one argument. Everything between the curly braces { } is the code that is executed when the function is called.

For example, we could make a function that returns a greeting when supplied with a name. In other words, when you enter a name, the function will return “Hi, name”:

greet <- function(name){ # Creates a function and stores it in the object "greet", this is the name of the function.
  print(paste("Hi,", name, "!")) # This is what happens when we use the greet() function
}

We create a function called “greet” which takes one argument which is the name. It then returns the text “Hi, name”.

If you see a function definition in a script, all you have to do is run that code once. This loads the function into your session, much like library() loads a package. You are then ready to use the function.

You can then use this function just like you would a function from a package:

greet("Danny Rojas")
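Because paste() puts a space between its arguments, the call above prints "Hi, Danny Rojas !" with a space before the exclamation mark. A possible variant is sketched below: it uses paste0(), which adds no separator, and hands the text back with return() so that it can be stored in a variable. The names greet_tidy and message_for_danny are hypothetical and only serve this illustration.

```r
# Hypothetical variant of greet() that returns the text instead of printing it
greet_tidy <- function(name) {
  greeting <- paste0("Hi, ", name, "!")  # paste0() adds no spaces by itself
  return(greeting)                       # hand the result back to the caller
}

# The returned value can be stored and reused like any other variable
message_for_danny <- greet_tidy("Danny Rojas")
print(message_for_danny)
```

When reading a script, a return() line is a good place to look first: it tells you what the function gives back.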

4. Text, values and variables

A variable is like a labeled container that holds information (data). You give this container a name, and you can store anything inside it, like numbers, words, or even more complex data. An important distinction to make is the difference between numerical values and textual values. When we create a variable and assign a number to it, we can later use it for mathematical operations. This needs to be differentiated from assigning the textual value “42”.

# assigning a number
variable_numeric <- 42 
# assigning text
variable_text <- "42"

Any value between quotation marks is understood as text by Python and R. This means that if we try to multiply variable_text by a number, we get an error in R and an unexpected output in Python.

variable_text <- "42"
variable_text * 2
# Error in variable_text * 2 : non-numeric argument to binary operator

Binary operators are, for example, +, -, * and /. When we try to use them with non-numeric values (here, “42”), we get this error message. In Python, the output is not an error; the string is simply repeated, resulting in a new string that is twice the previous string: “4242” (which is still a string, not a number).

It’s good practice, especially when loading data from an external source for the first time, to check the format of the data, i.e., to ensure that what you want to be numbers are numbers, and what you want to be text is indeed…text. Below we show you how to check for a specific format. You can use this when you want to check that the data is in the expected format.

variable_text <- "42"
# check if the variable is numeric:
is.numeric(variable_text)
# check if the variable is text:
is.character(variable_text)
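When a check shows that a number was stored as text, it can be converted. A minimal sketch using base R’s as.numeric() follows; the names variable_number and not_a_number are introduced for this example.

```r
variable_text <- "42"

is.numeric(variable_text)    # FALSE: the value is stored as text
is.character(variable_text)  # TRUE

# Convert the text to a number so that arithmetic works
variable_number <- as.numeric(variable_text)
doubled <- variable_number * 2
print(doubled)

# Text that does not look like a number becomes NA (R also gives a warning,
# which is silenced here with suppressWarnings())
not_a_number <- suppressWarnings(as.numeric("forty-two"))
print(not_a_number)
```

An unexpected NA after a conversion is usually a sign that some values in the data were not written as plain numbers.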

When you don’t know what the format of the data is, or you are getting frustrated because a function is not working for some reason, you can also ask what the format is directly. The class() function returns the format of the data, allowing you to check directly what you are working with:

class(variable_text)

Data types

1. Integer (int)

  • Represents whole numbers, both positive and negative, without decimals.
  • Example: 5, -42, 100

2. Float (float)

  • Represents numbers that have a decimal point. It can store both positive and negative decimal numbers.
  • Example: 3.14, -0.001, 42.0

3. String (str)

  • Represents a sequence of characters (text), typically enclosed in single or double quotes.
  • Example: "Hello, world!", 'Python'

4. Boolean (bool)

  • Represents a value that is either True or False. Booleans are often used in conditions and logical operations.
  • Example: True, False

5. Logical (This is equivalent to Boolean in Python)

  • In Python, logical values are handled by the boolean (bool) type, where logical conditions return True or False.
  • Example: a == b might evaluate to True or False.
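
The type names above follow Python’s vocabulary. As a rough guide to what R calls the corresponding types, the sketch below asks base R’s class() function directly; the variable names are made up for the illustration.

```r
whole_number <- 5L        # the L suffix asks R for an integer
plain_number <- 5         # without the suffix, R stores a "numeric"
decimal_number <- 3.14
some_text <- "Hello, world!"
truth_value <- TRUE

class(whole_number)    # "integer"
class(plain_number)    # "numeric"
class(decimal_number)  # "numeric"
class(some_text)       # "character"
class(truth_value)     # "logical"
```

Note that R reports both 5 and 3.14 as "numeric"; a value is only an "integer" when it is created as one, for example with the L suffix.
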
Exercises part 4:
  1. Check numeric values:
    Assign numeric values to two variables. You can name them however you want and assign any numerical value. Check the format of each variable.

  2. Check textual values:
    Assign a textual value to 2 variables. You can name them however you want and assign any textual value. Check the format of the variable.

  3. Transform the values: We now want to transform the values to another format. It often happens that when we download data and load it into python/R that the format is not understood correctly. Use the str() function in python to transform your number into text, then use the isinstance() function to check that it worked. Do the same in R, with the as.character() function and the is.character() function.

5. Logic operators

Logical operators are commonly used in data analysis, especially when sub-setting datasets. For example when we want to extract documents that are from the year 2000 which have the term “sustainability” and the term “climate change” but not the term “fossil fuel”. Combining these operators is important, and so is understanding how they work.

Pay Attention

Some differences between R and Python become apparent here. In R, TRUE and FALSE must be written in all caps to be recognised as logical values. In Python, True and False must start with a capital letter, and the operators or, and, not must be written exactly in this manner. If they are written differently, they will be treated as object names.

x <- 4
y <- 8
# Equal (==)
x == y
[1] FALSE
# And (&)
x == 4 & y == 8
[1] TRUE
# Or (|)
x == 4 | y == 8
[1] TRUE
# Not (!)
!y
[1] FALSE
# Combine
z <- "plop"
x == 4 & (y == 8 | z == "plop")
[1] TRUE

In data analysis, we usually use operators to subset data. This means that we compare a variable to a value to check if it fits our criteria. For example, if we have a column that contains a year, and we only want observations with the year 2003, we will search for year == 2003. In this setting the R operators we just described will be the same. It is possible that these operators vary when different packages are used in python. For instance, in the context of the pandas package, and becomes &, or becomes |, not becomes ~. We will address these variations in the database manipulation chapter.
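As a small sketch of what such a subset looks like in base R, the example below builds a tiny data frame and filters it with the operators from this section. The data frame documents, its columns and its values are invented for the illustration.

```r
# Hypothetical data: one row per document
documents <- data.frame(
  year = c(2000, 2003, 2003, 2010),
  topic = c("climate", "energy", "climate", "climate")
)

# A comparison on a column returns one TRUE/FALSE per row
is_2003 <- documents$year == 2003
print(is_2003)

# Rows where the condition is TRUE are kept
only_2003 <- documents[is_2003, ]

# Conditions can be combined: topic is "climate" AND year is not 2003
climate_not_2003 <- documents[documents$topic == "climate" & documents$year != 2003, ]
print(climate_not_2003)
```

The comma inside the square brackets matters: what comes before it selects rows, and leaving the part after it empty keeps all columns.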

Exercises part 5:
  1. Inequality Check:
    • Create two variables, x and y, with different values. Write an expression that checks if x is not equal to y, and print the result. Experiment by making x and y equal and see how the output changes.
  2. Equality Check:
    • Create two variables, a and b, and assign them any numbers you like. Write an expression that checks if a is equal to b, and print the result. Try changing the values of a and b to see when the result is TRUE and when it’s FALSE.
  3. Multiple Conditions:
    • Create two variables, temperature and humidity. Set temperature to a value greater than 20 and humidity to a value less than 50. Write an expression that checks whether temperature is greater than 20 and humidity is less than 50, and print the result.

6. The print and paste functions

The print function is crucial in programming because it allows developers to display information to the user or themselves during the development process. It provides immediate feedback, helping programmers debug code by checking the values of variables, verifying the flow of execution, or ensuring that certain conditions are met. Without print, it would be challenging to observe how the program behaves internally, making it a vital tool for both learning and real-world software development.

A very basic example of this is printing out the value of a variable in the context of what it represents. Imagine that we are working on a project in which we start with some raw data that we clean step by step before we can use it for a regression. In this process we remove missing values and outdated values, we might remove some regions, etc. To make sure we don’t remove too much data, or even just to be sure we don’t make a mistake, we can decide to print out some information at different steps. The print function allows us to print out this type of information:

plop <- 42
# print a numeric value
print(plop)
# print text, remember to use "" for text
print("Step 3 is done")

We can make things more interesting by combining values and text. For this we use the paste function in R and the “+” operator in python:

# lets create a value at random and then print out the sentence: "the number generated was x"
# we use the runif function to generate a value
# this function takes three arguments, n, min and max. 
# n is the number of numbers we want to generate, min and max are the boundaries for the value
x <- runif(1, min = 0, max = 5)
# here we generate one number with a value between 0  and 5.
# Now we want to print out the sentence "the number generated was x"
# for this we are going to paste the text "the number generated was" and the value of x:
paste("The number generated was ", x)
[1] "The number generated was  4.79251309065148"
# the paste function can take as many arguments as you want, it will paste all of them together
# Now if we want to print the result:
print(paste("The number generated was ", x))
[1] "The number generated was  4.79251309065148"
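
The output above contains two spaces before the number because paste() inserts a space between its arguments by default, in addition to the one typed at the end of the text. The sketch below shows two ways to control this: the sep argument of paste() and the paste0() function. A fixed value of x is assumed here so that the result is reproducible.

```r
x <- 4.79  # fixed value for the illustration (the example above uses runif())

# Default: paste() puts one space between its arguments
default_sep <- paste("The number generated was", x)
print(default_sep)

# paste0() adds no separator, so any spaces must be typed explicitly
no_sep <- paste0("The number generated was ", x, ".")
print(no_sep)

# The sep argument sets a custom separator
custom_sep <- paste("2003", "01", "15", sep = "-")
print(custom_sep)
```

For long random numbers, wrapping the value in round(), for example round(x, 2), keeps the printed sentence short.
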
Exercises: Using paste() and print()
  1. Concatenating Text and Numbers:
    • Generate a random number between 4 and 87 using the runif() function in R or random.uniform() in Python. Print a sentence that says, “The generated number is x”, where x is the random number. Use a different function to generate an integer between 1 and 70 and print the result (you have to find the function yourself).
  2. Arithmetic Results in Sentences:
    • Create two variables a and b, with values of your choice. Calculate the sum, difference, and product of a and b, then print the results in complete sentences, e.g., “The sum of a and b is x”.
  3. Boolean Comparisons in Text:
    • Create two numbers, x and y. Write an expression that checks if x is greater than y. Print a sentence that says, “It is TRUE/FALSE that x is greater than y”, depending on the result of the comparison.

7. Exercises: Reading scripts

The following questions are exam-type questions, in which we ask you to explain what happens in a script. You will be required to explain each line of code.

7.1: Exercise 1

You are provided with a script, your aim is to explain the script. For each line of code, comment what the line of code does and then explain what the function does and returns:

calculate_energy_savings <- function(incandescent_wattage, led_wattage, hours_per_day, number_of_bulbs) {
  daily_energy_incandescent <- incandescent_wattage * hours_per_day * number_of_bulbs
  daily_energy_led <- led_wattage * hours_per_day * number_of_bulbs
  daily_energy_savings <- daily_energy_incandescent - daily_energy_led
  annual_energy_savings <- (daily_energy_savings * 365) / 1000
  return(annual_energy_savings)
}

incandescent_wattage <- 60  
led_wattage <- 10           
hours_per_day <- 5          
number_of_bulbs <- 10

annual_savings <- calculate_energy_savings(incandescent_wattage, led_wattage, hours_per_day, number_of_bulbs)

print(paste("The household saves", annual_savings, "kilowatt-hours annually by switching to LED bulbs."))

7.2: Exercise 2

You are provided with a script, your aim is to explain the script. For each line of code, comment what the line of code does and then explain what the function does and returns:

library(dplyr)

distance_km <- 30
emission_gasoline <- 150
emission_ev <- 10  
calculate_co2_reduction <- function(distance, emission_gasoline, emission_ev) {
  co2_gasoline <- distance * emission_gasoline
  co2_ev <- distance * emission_ev
  daily_reduction <- co2_gasoline - co2_ev
  annual_reduction <- daily_reduction * 260
  return(annual_reduction)
}
annual_co2_reduction <- calculate_co2_reduction(distance_km, emission_gasoline, emission_ev)
print(paste("By switching to an electric vehicle, the household reduces their annual CO₂ emissions by", annual_co2_reduction, "grams."))

7.3: Exercise 3

You are provided with a script, your aim is to explain the script. For each line of code, comment what the line of code does and then explain what the function does and returns:

calculate_recycling_savings <- function(plastic_kg, paper_kg, glass_kg, cost_per_kg) {
  total_plastic_reduced <- plastic_kg * 52
  total_paper_reduced <- paper_kg * 52
  total_glass_reduced <- glass_kg * 52
  savings_plastic <- total_plastic_reduced * cost_per_kg$plastic
  savings_paper <- total_paper_reduced * cost_per_kg$paper
  savings_glass <- total_glass_reduced * cost_per_kg$glass
  total_waste_reduced <- total_plastic_reduced + total_paper_reduced + total_glass_reduced
  total_savings <- savings_plastic + savings_paper + savings_glass
  return(list(total_waste_reduced = total_waste_reduced, total_savings = total_savings))
}
plastic_kg <- 1.5  
paper_kg <- 2.0   
glass_kg <- 0.8
cost_per_kg <- list(plastic = 0.2, paper = 0.15, glass = 0.1)
recycling_results <- calculate_recycling_savings(plastic_kg, paper_kg, glass_kg, cost_per_kg)
print(paste("The household reduces", recycling_results$total_waste_reduced, 
            "kg of waste annually and saves", recycling_results$total_savings, "in recycling costs."))

8. Supplementary training exercises