print("This is how we use the print function in R")
[1] "This is how we use the print function in R"
For this first session, the goal for you is to learn the following operations:
| Operation | Example in R |
|---|---|
| Install a package | install.packages("dplyr") |
| Load a package | library(dplyr) |
| Create a variable | value <- 42 |
| Write a comment | # This is a comment |
| Understand the difference between text, values, and variables | plop <- "42"; plop <- 42; plop <- fourtytwo |
| Paste | paste("the value is", plop) |
| Print | print(paste("the value is", plop)) |
| Basic arithmetic | 42 + 48 |
| Use a function | max(c(1,2,3,4,68)) |
| Use a function from a script | |
| Use logic operators | a != b |
| Combine functions | |
R comes with basic, built-in functions. This means that when you launch R you can use these functions directly. This is the case for the print() function, for example, and also for basic mathematical functions such as max(), min() and mean().
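As a quick illustration, the sketch below calls three built-in functions right after starting R, without loading any package. The values in the vector are made up for the example.

```r
# A small vector of made-up values
values <- c(1, 2, 3, 4, 68)

largest  <- max(values)   # largest value in the vector
smallest <- min(values)   # smallest value in the vector
average  <- mean(values)  # arithmetic mean of the values

print(largest)
print(smallest)
print(average)
```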
Think of R as a basic toolbox. This toolbox comes with some essential tools, like a hammer (the print() function), a screwdriver (the length() function), and a measuring tape (the max() function). These tools are available right away because they’re built into the toolbox (R).
However, sometimes you need more specialized tools that aren’t in the basic toolbox. For example, if you want to build a bookshelf, you might need a power drill or a saw. These tools aren’t included by default in the basic toolbox. In the world of R, these extra tools are available in something called packages.
A package is like an extra toolbox full of new tools (functions) that someone else has created for a specific purpose. For example:
If you want to work with maps, you need functions to create maps; these are available in the maps package. If you want to make graphs, you might need the ggplot2 package. To use these packages you first need to download and install them.
When you install a package, it’s like going to the store, buying the specialized toolbox, and adding it to your existing set of tools. Except that packages in R are free. Once installed, you can use the new tools (functions) it provides.
Let’s see how we install packages in R:
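A minimal sketch of installing a package, using dplyr as the example. The check with requireNamespace() is an addition made here, not a required step: it only skips the download when the package is already on your computer. The `repos` address is one CRAN mirror among several and is an assumption; installation needs an internet connection.

```r
# Name of the package to install (dplyr is used as the example)
pkg <- "dplyr"

# Install the package only if it is not already available on this computer.
# install.packages() downloads the package from CRAN.
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg, repos = "https://cloud.r-project.org")
}
```

In everyday use, typing install.packages("dplyr") in the console does the same job.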
We only have to install packages once. Once they are installed, we only need to tell R that we want to use the functions from these packages. We do this by loading the package with library(), which lets R know that you want to use its functions:
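A minimal sketch of loading a package. It uses the tools package, which ships with R, so that the example runs without installing anything; for an installed package such as dplyr the call would be library(dplyr). The file name is made up.

```r
# Load the package: its functions can now be called by name
library(tools)

# file_ext() comes from the tools package and returns a file's extension
extension <- file_ext("emissions.csv")
print(extension)
```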
You only have to load a package once per session, when you start working. Reloading is only necessary after you have quit and restarted R. If you get the following error message, it usually means that you did not load the package, so R cannot find the function you are trying to use:
The error message clearly states that the function we are trying to use cannot be found. This means that it’s time to find out which package the function belongs to and load that package using library().
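For illustration, the sketch below triggers this kind of error on purpose with file_path_sans_ext(), a function from the tools package (shipped with R) that is not loaded by default. tryCatch() is used here only to capture the error text so that the script can continue; the file name is made up.

```r
# Calling a function whose package has not been loaded fails.
# tryCatch() captures the error message instead of stopping the script.
error_text <- tryCatch(
  file_path_sans_ext("emissions.csv"),
  error = function(e) conditionMessage(e)
)
print(error_text)  # the message names the function R could not find

# After loading the package, the same call works
library(tools)
name_only <- file_path_sans_ext("emissions.csv")
print(name_only)
```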
Install required packages in R:
Install the following packages: tidyverse, ggplot2, igraph
Manual Install of a package:
Install the NetworkIsLife Package.
A variable is like a container or a labeled storage box that holds data or information. In programming, we use variables to store values such as numbers, text, or more complex data, so that we can easily refer to and manipulate them throughout our code.
Let’s take a real world analogy:
Imagine you’re working at a lab bench or a kitchen counter:
In R, it’s the same idea:
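A minimal sketch of the labeled jar in R:

```r
# Store the number 250 in a variable called sugar
sugar <- 250

# Printing the variable shows what is stored inside
print(sugar)
```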
You’ve created a variable called sugar, and you’ve stored the number 250 in it.
If later you need to calculate how much sugar you’ll need for 3 recipes, you can do:
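As a sketch, the calculation could look like this. The name total_sugar is chosen here for illustration.

```r
sugar <- 250              # the amount stored in the jar
total_sugar <- sugar * 3  # amount needed for 3 recipes

print(total_sugar)
```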
R looks inside the “sugar” jar, finds 250, and does the math for you, no need to retype or remember the number. The example may seem trivial, but we use this to store entire data frames with millions of lines. Using the same logic we don’t have to recreate the data frame every time we want to modify or add something to it.
Why do we create variables?
Creating the variables a, b and result, allows us to use them later in the code. Beyond simple numerical values, other variables are created in an identical manner. In R, using “<-” assigns the value/object on the right to the variable on the left.
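A minimal sketch of what creating a, b and result could look like. The values 5 and 3 and the follow-up calculation are assumptions made for the example.

```r
a <- 5           # assumed value
b <- 3           # assumed value
result <- a + b  # the right-hand side is computed first, then stored in result

print(result)

# Later in the code, the stored value can be reused without retyping it
doubled_result <- result * 2
print(doubled_result)
```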
What can be done with programming has no bounds. Many people before us have been programming and have written code that does amazing things. The beauty of open-source environments is that this work is shared with us in the form of packages and functions. A function is basically a script that executes a very specific task. For example, the print() function will print out whatever you put between parentheses. Since this function exists, you don’t have to worry about writing a script that does this. The same is true for mathematical operations. You can simply use the max() function without having to write a script that searches through data to find the max yourself. Let’s see how this works.
Using a function always works in the same way in R. The construction is always the name of the function followed by its arguments between parentheses: function_name(argument = value).
So in matrix(nrow = 10, ncol = 3), matrix is the function, and nrow and ncol are the arguments. From the names you might be able to guess what this does: the function creates a matrix, and the arguments define its dimensions. nrow = 10 means that the matrix will have 10 rows; ncol = 3 means that it will have 3 columns.
When we run this code you will see that a 10x3 matrix is created:
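A sketch of running the call matrix(nrow = 10, ncol = 3). No values are supplied, so R fills every cell with NA (missing); dim() is used here to check the dimensions. The variable name is chosen for illustration.

```r
# Create a matrix with 10 rows and 3 columns; no values are given, so cells are NA
example_matrix <- matrix(nrow = 10, ncol = 3)

print(example_matrix)

# dim() returns the number of rows and the number of columns
matrix_size <- dim(example_matrix)
print(matrix_size)
```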
Let’s make this more interesting and create a matrix full of 3’s. For this we need to add an additional argument:
m <- matrix(nrow = 3, ncol = 3, 3) # create and store the matrix in a variable
m # display the matrix
     [,1] [,2] [,3]
[1,]    3    3    3
[2,]    3    3    3
[3,]    3    3    3
Now let’s multiply this matrix by 3
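A minimal sketch of the multiplication, storing the result back in the same variable:

```r
m <- matrix(nrow = 3, ncol = 3, 3)  # the 3x3 matrix full of 3's
m <- m * 3                          # every cell is multiplied by 3

print(m)
```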
Because we stored the matrix in a variable “m”, we don’t have to create it again: we can just call it and apply a multiplication. Note that because we stored the result of the multiplication in the same variable, the matrix with 3’s is destroyed and replaced with the matrix with 9’s.
Let’s illustrate this with the example of a matrix. We first create a matrix:
Small_matrix <- matrix(0, nrow = 3, ncol = 3)
# this creates a variable called Small_matrix that holds a 3x3 matrix filled with zeros.
print(Small_matrix)
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0
[3,]    0    0    0
In the example above, we create a new variable, Small_matrix, which contains a matrix. We can then use it later by referring to the name of the object. Below we add 3 to each cell of the matrix:
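A minimal sketch of this step:

```r
Small_matrix <- matrix(0, nrow = 3, ncol = 3)  # the matrix of zeros

# Add 3 to every cell and store the result under the same name
Small_matrix <- Small_matrix + 3

print(Small_matrix)
```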
By writing the same name on the left and right, the object Small_matrix is replaced by Small_matrix + 3, which means that we cannot go back to the initial Small_matrix filled with 0’s.
If we want to keep both variables, we can create a new variable:
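A sketch of keeping both versions. The name Small_matrix_plus_3 is hypothetical; any name that is not already in use works.

```r
Small_matrix <- matrix(0, nrow = 3, ncol = 3)  # the original matrix of zeros

# Store the result under a new name so the original is left untouched
Small_matrix_plus_3 <- Small_matrix + 3

print(Small_matrix)         # still filled with 0's
print(Small_matrix_plus_3)  # filled with 3's
```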
When we work with data, we often need to perform multiple tasks in a row before we have data that is actually usable. For example, imagine that you have a database with emission levels for the world. You’re doing an analysis on Europe, so you would have to extract only European countries. Then you might want to transform the format of the data, round some numbers, compute percentages, growth rates, etc. We can combine multiple functions in one line of code.
The following creates a matrix and then transforms the matrix into a dataframe.
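A minimal sketch of nesting one function inside another. R evaluates the inner call first, so the matrix is created and then handed to as.data.frame(). The variable name is chosen for illustration.

```r
# Inner call: matrix() creates a 3x3 matrix of zeros
# Outer call: as.data.frame() converts that matrix into a data frame
matrix_as_df <- as.data.frame(matrix(0, nrow = 3, ncol = 3))

print(matrix_as_df)
print(class(matrix_as_df))
```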
An alternative exists in base R to make this easier to read. We can use what is called a pipe: “|>”:
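A sketch of the same two steps written with the base R pipe, which requires R 4.1 or later. The pipe takes the result on its left and passes it as the first argument to the function on its right. The variable name is chosen for illustration.

```r
# Read from left to right: create the matrix, then convert it to a data frame
piped_df <- matrix(0, nrow = 3, ncol = 3) |> as.data.frame()

print(piped_df)
```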
This approach can make scripts more readable when many transformations are applied one after another.
This line creates the matrix and then applies the as.data.frame() function. An equivalent is the pipe that we find in the tidyverse, written %>%.
The tidyverse is widely used in R data science manuals.
Create a Matrix:
Write code to create a 4x4 matrix filled with zeros. What is the variable’s name, and what type of object does it store?
Variable Assignment:
Assign the number 10 to a variable named my_number. What happens if you assign a new value, for example 20, to the same variable afterward?
Matrix Operations:
Add 5 to each element in the matrix you created in question 1.
Practice chaining functions
While packages offer a whole universe of functions, it often happens that we write our own for specific tasks. Imagine a case where you’re working on a big dataset and you have to apply the same type of operation multiple times. This can result in a very long script in which you have to repeat the same lines of code multiple times. This makes everything difficult to read. In these cases it can be efficient to write your own function and use one line of code instead.
We do not expect you to make your own functions. We only want you to be able to identify custom functions in a script.
In R, a function is defined by the function… function. Let me clarify:
We create here a new function called my_own_function, which has one argument. Everything between the curly braces is the code that is executed when the function is called.
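A minimal sketch of a function definition. The body (doubling the argument) is a placeholder task invented for the example; only the overall shape matters.

```r
# name of the new function <- function(arguments) { code to run }
my_own_function <- function(argument) {
  # Placeholder task: double whatever value is supplied
  doubled <- argument * 2
  return(doubled)
}

# Calling the function runs the code between the curly braces
example_output <- my_own_function(21)
print(example_output)
```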
For example, we could make a function that returns a greeting when supplied with a name. In other words, when you enter a name, the function will return “Hi, name”:
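One way such a function could be written, as a sketch:

```r
# greet() takes one argument, name, and returns the text "Hi, " followed by that name
greet <- function(name) {
  greeting <- paste("Hi,", name)  # paste() joins the pieces with a space
  return(greeting)
}
```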
We create a function called “greet” which takes one argument which is the name. It then returns the text “Hi, name”.
If you see a function definition in a script, all you have to do is run that code once. This loads the function into your session, much like library() loads a package. You are then ready to use the function.
You can then use this function just like you would a function from a package:
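A short usage sketch. The definition is repeated so that the example runs on its own, and the name "Ada" is made up.

```r
# Run the definition once so that R knows the function
greet <- function(name) {
  paste("Hi,", name)
}

# Then call it like any other function
greeting_for_ada <- greet("Ada")
print(greeting_for_ada)
```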
A variable is like a labeled container that holds information (data). You give this container a name, and you can store anything inside it, like numbers, words, or even more complex data. An important distinction to make is the difference between numerical values and textual values. When we create a variable and assign it a number, we can later use it for mathematical operations. This needs to be differentiated from assigning the textual value “42”.
Any value between quotation marks is understood as text by Python and R. This means that if we try to multiply such a variable by a number, we get an error in R and an unexpected output in Python.
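A sketch of the difference in R. tryCatch() is used here only to capture the error text so that the script can continue; the variable names are chosen for illustration.

```r
number_value <- 42    # a number
text_value   <- "42"  # text, because of the quotation marks

# Arithmetic works on the number
doubled_number <- number_value * 2
print(doubled_number)

# The same operation on the text fails; the error message is captured and printed
error_text <- tryCatch(
  text_value * 2,
  error = function(e) conditionMessage(e)
)
print(error_text)
```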
A binary operator is, for example, +, -, or /. When we try to use one on a non-numeric value (here, “42”), we get this error message. In Python, the output is not an error: the string is simply repeated, resulting in a new string that is twice the previous one, “4242” (which is still a string, not a number).
It’s good practice, especially when loading data from an external source for the first time, to check the format of the data, i.e., to ensure that what you want to be numbers are numbers, and what you want to be text is indeed text. Below we show you how to check for a specific format. You can use this when you want to check that the data is in the expected format.
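A minimal sketch of checking for a specific format in R. Each is.*() function answers TRUE or FALSE; the variable names are chosen for illustration.

```r
number_value <- 42
text_value   <- "42"

number_is_numeric <- is.numeric(number_value)  # is it a number?
text_is_numeric   <- is.numeric(text_value)    # is the text a number?
text_is_character <- is.character(text_value)  # is it text?

print(number_is_numeric)
print(text_is_numeric)
print(text_is_character)
```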
When you don’t know what the format of the data is, or you are getting frustrated because a function is not working for some reason, you can also ask what the format is directly. These functions will return the format of the data, allowing you to check directly what you are working with:
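In R, class() is one function that reports the format directly. A minimal sketch, with variable names chosen for illustration:

```r
number_value  <- 42
text_value    <- "42"
logical_value <- TRUE

# class() returns the format of the object as text
number_class  <- class(number_value)
text_class    <- class(text_value)
logical_class <- class(logical_value)

print(number_class)
print(text_class)
print(logical_class)
```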
- int: 5, -42, 100
- float: 3.14, -0.001, 42.0
- str: "Hello, world!", 'Python'
- bool: True or False. Booleans are often used in conditions and logical operations.

Logical conditions return a value of the bool type, True or False. For example, a == b might evaluate to True or False.

Check numeric values:
Assign numeric values to two variables. You can name them however you want and assign any numerical value. Check the format of each variable.
Check textual values:
Assign a textual value to 2 variables. You can name them however you want and assign any textual value. Check the format of the variable.
Transform the values: We now want to transform the values to another format. It often happens that when we download data and load it into Python or R, the format is not understood correctly. Use the str() function in Python to transform your number into text, then use the isinstance() function to check that it worked. Do the same in R, with the as.character() function and the is.character() function.
Logical operators are commonly used in data analysis, especially when sub-setting datasets. For example when we want to extract documents that are from the year 2000 which have the term “sustainability” and the term “climate change” but not the term “fossil fuel”. Combining these operators is important, and so is understanding how they work.
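A minimal sketch of combining logical operators in R, where & means "and", | means "or" and ! means "not". The variable names and values are invented to mirror the document search described above.

```r
# Information about one made-up document
year               <- 2000
has_sustainability <- TRUE
has_climate_change <- TRUE
has_fossil_fuel    <- FALSE

# Keep the document if it is from 2000, contains both terms,
# and does NOT contain "fossil fuel"
keep_document <- year == 2000 & has_sustainability & has_climate_change & !has_fossil_fuel
print(keep_document)

# "or": TRUE as soon as one of the two sides is TRUE
either_term <- has_sustainability | has_fossil_fuel
print(either_term)
```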
Some differences between R and Python become apparent here. In R, TRUE and FALSE must be written in all caps to be recognised as logical values. In Python, True and False must start with a capital letter, and or, and, not must be written exactly in this manner. If these are written differently, they will be interpreted as object names.
In data analysis, we usually use operators to subset data. This means that we compare a variable to a value to check if it fits our criteria. For example, if we have a column that contains a year, and we only want observations with the year 2003, we will search for year == 2003. In this setting the R operators we just described will be the same. It is possible that these operators vary when different packages are used in python. For instance, in the context of the pandas package, and becomes &, or becomes |, not becomes ~. We will address these variations in the database manipulation chapter.
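A sketch of subsetting with year == 2003 in base R. The small data frame and its values are invented for the example.

```r
# A made-up data frame with one row per observation
observations <- data.frame(
  country = c("France", "Germany", "Spain", "Italy"),
  year    = c(2002, 2003, 2003, 2004)
)

# The comparison returns TRUE or FALSE for every row
is_2003 <- observations$year == 2003
print(is_2003)

# Keep only the rows where the comparison is TRUE
only_2003 <- observations[is_2003, ]
print(only_2003)
```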
Create two variables, x and y, with different values. Write an expression that checks if x is not equal to y, and print the result. Experiment by making x and y equal and see how the output changes.
Create two variables, a and b, and assign them any numbers you like. Write an expression that checks if a is equal to b, and print the result. Try changing the values of a and b to see when the result is True and when it’s False.
The print function is crucial in programming because it allows developers to display information to the user or themselves during the development process. It provides immediate feedback, helping programmers debug code by checking the values of variables, verifying the flow of execution, or ensuring that certain conditions are met. Without print, it would be challenging to observe how the program behaves internally, making it a vital tool for both learning and real-world software development.
A very basic example of this is printing out the value of a variable in the context of what it represents. Imagine that we are working on a project in which we start with some raw data that we clean, step by step, before we can start using it for a regression. In this process we remove missing values and outdated values, we might remove some regions, etc. To make sure we don’t remove too much data, or even just to be sure we don’t make a mistake, we can decide to print out some information at different steps. The print function allows us to print out this type of information:
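A minimal sketch of printing a check at each cleaning step. The data frame, its columns and its values are invented for the example.

```r
# Made-up raw data with two missing values
raw_data <- data.frame(
  region = c("North", "South", "East", "West"),
  value  = c(10, NA, 7, NA)
)

# How many rows do we start with?
rows_before <- nrow(raw_data)
print(rows_before)

# Cleaning step: keep only the rows where value is not missing
clean_data <- raw_data[!is.na(raw_data$value), ]

# How many rows are left after cleaning?
rows_after <- nrow(clean_data)
print(rows_after)
```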
We can make things more interesting by combining values and text. For this we use the paste function in R and the “+” operator in python:

# let's create a value at random and then print out the sentence: "the number generated was x"
# we use the runif function to generate a value
# this function takes three arguments: n, min and max.
# n is the number of values we want to generate, min and max are the boundaries for the value
x <- runif(1, min = 0, max = 5)
# here we generate one number with a value between 0 and 5.
# Now we want to print out the sentence "the number generated was x"
# for this we are going to paste the text "the number generated was" and the value of x:
paste("The number generated was", x)
[1] "The number generated was 4.79251309065148"
# the paste function can take as many arguments as you want, it will paste all of them together, separated by a space
# Now if we want to print the result:
print(paste("The number generated was", x))
[1] "The number generated was 4.79251309065148"
paste() and print()
Generate a random number using the runif() function in R or random.uniform() in Python. Print a sentence that says, “The generated number is x”, where x is the random number. Use a different function to generate an integer between 1 and 70 and print the result (you have to find the function yourself).
The following questions are exam-type questions, where we ask you to explain what happens in a script. You will be required to explain each line of code.
You are provided with a script, your aim is to explain the script. For each line of code, comment what the line of code does and then explain what the function does and returns:
calculate_energy_savings <- function(incandescent_wattage, led_wattage, hours_per_day, number_of_bulbs) {
  daily_energy_incandescent <- incandescent_wattage * hours_per_day * number_of_bulbs
  daily_energy_led <- led_wattage * hours_per_day * number_of_bulbs
  daily_energy_savings <- daily_energy_incandescent - daily_energy_led
  annual_energy_savings <- (daily_energy_savings * 365) / 1000
  return(annual_energy_savings)
}
incandescent_wattage <- 60
led_wattage <- 10
hours_per_day <- 5
number_of_bulbs <- 10
annual_savings <- calculate_energy_savings(incandescent_wattage, led_wattage, hours_per_day, number_of_bulbs)
print(paste("The household saves", annual_savings, "kilowatt-hours annually by switching to LED bulbs."))
You are provided with a script, your aim is to explain the script. For each line of code, comment what the line of code does and then explain what the function does and returns:
library(dplyr)
distance_km <- 30
emission_gasoline <- 150
emission_ev <- 10
calculate_co2_reduction <- function(distance, emission_gasoline, emission_ev) {
  co2_gasoline <- distance * emission_gasoline
  co2_ev <- distance * emission_ev
  daily_reduction <- co2_gasoline - co2_ev
  annual_reduction <- daily_reduction * 260
  return(annual_reduction)
}
annual_co2_reduction <- calculate_co2_reduction(distance_km, emission_gasoline, emission_ev)
print(paste("By switching to an electric vehicle, the household reduces their annual CO₂ emissions by", annual_co2_reduction, "grams."))
You are provided with a script, your aim is to explain the script. For each line of code, comment what the line of code does and then explain what the function does and returns:
calculate_recycling_savings <- function(plastic_kg, paper_kg, glass_kg, cost_per_kg) {
  total_plastic_reduced <- plastic_kg * 52
  total_paper_reduced <- paper_kg * 52
  total_glass_reduced <- glass_kg * 52
  savings_plastic <- total_plastic_reduced * cost_per_kg$plastic
  savings_paper <- total_paper_reduced * cost_per_kg$paper
  savings_glass <- total_glass_reduced * cost_per_kg$glass
  total_waste_reduced <- total_plastic_reduced + total_paper_reduced + total_glass_reduced
  total_savings <- savings_plastic + savings_paper + savings_glass
  return(list(total_waste_reduced = total_waste_reduced, total_savings = total_savings))
}
plastic_kg <- 1.5
paper_kg <- 2.0
glass_kg <- 0.8
cost_per_kg <- list(plastic = 0.2, paper = 0.15, glass = 0.1)
recycling_results <- calculate_recycling_savings(plastic_kg, paper_kg, glass_kg, cost_per_kg)
print(paste("The household reduces", recycling_results$total_waste_reduced,
"kg of waste annually and saves", recycling_results$total_savings, "in recycling costs."))