Week 1: The programming basics

0. Overview

In this first session, you will learn the following operations:

Operation	Python example
Install a package	pip install pandas
Load a package	import pandas as pd
Create a variable	value = 42
Difference between text, values, and variables	plop = "42"; plop = 42; plop = fourtytwo
Paste text and values together	f"The value is {plop}"
Print	print(f"The value is {plop}")
Basic arithmetic	42 + 48
Use a function	max([1, 2, 3, 4, 68])
Use a function from a script
Use logic operators	a != b

1. Loading and installing libraries

R and Python come with basic, built-in functions. This means that when you launch R or Python you can use these functions directly. The print function is one example.

print("This is how we use the print function in Python")

Think of Python as a basic toolbox. This toolbox comes with some essential tools, like a hammer (the print() function), a screwdriver (the len() function), and a measuring tape (the max() function). These tools are available right away because they’re built into the toolbox (Python).

However, sometimes you need more specialized tools that aren’t in the basic toolbox. For example, if you want to build a bookshelf, you might need a power drill or a saw. These tools aren’t included by default in the basic toolbox. In the world of Python, these extra tools are called packages.

A package is like an extra toolbox full of new tools (functions) that someone else has created for a specific purpose. For example:

If you want to work with data in tables (like Excel), you’d install the pandas package. If you want to make graphs, you might need the matplotlib package. When you install a package, it’s like going to the store, buying the specialized toolbox, and adding it to your existing set of tools. Once installed, you can use the new tools (functions) it provides.

Let’s see how we install packages in Python. Packages are installed with pip, which is usually run from a terminal (command line):

pip install pandas

We only have to install packages once. Once they are installed we only need to tell Python that we want to use the functions from these packages. We do this in the following way:

import pandas

It is common practice to give short names to specific packages. When using functions from packages we need to refer to the package they come from: pandas.function(). With long package names this becomes a bit annoying, so we give the package a short reference name. For pandas we usually use “pd” so we can write pd.function(). We set the name when importing the package in the following way:

import pandas as pd

You only need to import a package once per session. Importing it again is only necessary after you have quit Python or restarted the Jupyter kernel. If you get the following error message, it usually means that you did not load the package, so Python cannot find the name you are using:

df = pd.DataFrame()
NameError: name 'pd' is not defined
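To see the error and its fix side by side without installing anything, here is a small sketch that uses the standard-library statistics package in place of pandas (an assumption made only so the example runs on any Python installation):

```python
# Using a package before importing it fails with a NameError
try:
    st.mean([1, 2, 3, 4])
except NameError as error:
    print(f"NameError: {error}")

# After importing the statistics package under the short name "st", the call works
import statistics as st

average = st.mean([1, 2, 3, 4])
print(average)  # 2.5
```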

Installing packages in Jupyter

In Jupyter, installing a package can be done as described above. However, the cell must then contain only the pip command. If you want to add comments, or install several packages in the same cell, put an exclamation mark in front of the command: !pip install pandas.

The ! tells Jupyter that you are running a system (shell) command, not Python code. You can then write:

# Installing packages
!pip install pandas
!pip install numpy

Exercises part 1:

Install required packages: install the following packages in Python: pandas, numpy, matplotlib.
Load the packages: load the packages you just installed in Jupyter.
Reload the kernel: restart the kernel after installing.

2. Create a variable

A variable is like a container or a labeled storage box that holds data or information. In programming, we use variables to store values—such as numbers, text, or more complex data—so that we can easily refer to and manipulate them throughout our code.

Why do we create variables?

Convenience: Instead of typing the same value repeatedly, we store it in a variable and use the variable’s name. For example, if you store a value like 42 in a variable called age, you can use age throughout your program instead of the number 42.
Flexibility: Variables allow us to use information that may change over time. If you store a value in a variable, you can update the variable’s content later without having to rewrite your entire program. For example, a variable temperature could hold the temperature of a room, which you might update as the temperature changes.
Reusability: Once you’ve stored a value in a variable, you can reuse it as many times as you want in your program. This avoids repetition and makes the code cleaner.

# without variables
print(20 + 30)

# With variables

a = 20
b = 30
result = a + b
print(result)

Creating the variables a, b and result allows us to use them later in the code. Variables holding other kinds of values (text, lists, matrices) are created in exactly the same way. In R, “<-” assigns the value/object on the right to the variable on the left. In Python the logic is the same, but we use the “=” operator.
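A minimal sketch of the same “=” rule applied to other kinds of values, and of what happens when a name is reassigned. The variable names and values are illustrative only.

```python
# "=" stores the value on the right under the name on the left
age = 42                      # a whole number
city = "Amsterdam"            # text
temperatures = [18.5, 21.0]   # a list of numbers

# Reassigning a name replaces its old value; the previous value is gone
age = 43
print(age)  # 43

# To keep the current value, store the new one under a different name
age_next_year = age + 1
print(age, age_next_year)  # 43 44
```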

# In python we first need to load a package to use a function to create a matrix
# We load numpy
import numpy as np

# Then we create a matrix called "small_matrix"
# This matrix is created by the "zeros" function from the np (numpy) package
small_matrix = np.zeros((3, 3))
print(small_matrix)

In the example above, we create a new variable, small_matrix, which contains a matrix. We can then use it later by referring to its name. Below we add 3 to each cell of the matrix:

small_matrix = small_matrix + 3

By writing the same name on the left and right, the object small_matrix is replaced by small_matrix + 3, which means that we cannot go back to the initial small_matrix filled with 0’s. Note that Python is case-sensitive: Small_matrix and small_matrix would be two different names.

If we want to keep both, we can create a new variable instead:

small_matrix_3 = small_matrix + 3

This creates a new object with the name small_matrix_3 and leaves small_matrix unchanged.

Important:

What is the difference between a variable and an object? Why are small_matrix and small_matrix_3 considered both a variable and an object?

Explanation:

A variable is a name used to store data. It acts like a label or container for a value. For example, when we write small_matrix = np.zeros((3, 3)), small_matrix is a variable that holds the matrix.
An object is an instance of a class in Python. In this case, the value stored in small_matrix is an object of the numpy.ndarray class. This object has properties (like its shape) and methods (like matrix operations) that you can use.

Therefore, small_matrix is a variable (the name we use) that refers to an object (the data itself, here a matrix object).
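The same distinction applies to every value in Python, not only to NumPy matrices. As a small illustration (using the standard-library date class instead of a matrix, so that no package needs to be installed), the variable below is a name, and the value it refers to is an object with attributes and methods:

```python
from datetime import date

# "publication_date" is the variable (the name);
# the value it refers to is an object of the date class
publication_date = date(2000, 1, 15)
print(type(publication_date))  # <class 'datetime.date'>

# Attribute: a property of the object, accessed without parentheses
print(publication_date.year)  # 2000

# Method: a function that belongs to the object, called with parentheses
print(publication_date.isoformat())  # 2000-01-15
```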

Exercises part 2:

1. Create a matrix:
Write code to create a 4x4 matrix filled with zeros. What is the variable’s name, and what type of object does it store?
2. Variable assignment:
Assign the number 10 to a variable named my_number. What happens if you then assign a new value, for example 20, to the same variable? Explain the behavior of variables in Python.
3. Matrix operations:
3.1 Perform an operation that adds 5 to each element of the matrix you created in question 1.
3.2 Multiply the matrix by itself.

3. Text, values and variables

A variable is like a labeled container that holds information (data). You give this container a name, and you can store anything inside it—like numbers, words, or even more complex data. An important distinction to make is the difference between numerical values and textual values. When we create a variable and assign a number to it, we can later use it for mathematical operations. This needs to be differentiated from assigning the textual value “42”.

# assigning a number
variable_numeric = 42 
# assigning text
variable_text = "42"

Any value between quotes ("..." or '...') is understood as text (a string) by Python. This means that if we try to multiply variable_text by a number, we get an unexpected output:

variable_text = "42"
variable_text * 2
'4242'

In some languages, such as R, applying an arithmetic (binary) operator like +, -, * or / to text produces an error. In Python, multiplying a string by a whole number is not an error: the string is simply repeated, giving a new string that contains the original twice, “4242” (still text, not a number). Other operators, such as "42" + 2, do raise a TypeError.
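A short sketch contrasting the text "42" with the number 42; the last lines show that converting with int() gives the expected arithmetic, and that mixing text and numbers with + raises a TypeError:

```python
variable_text = "42"
variable_numeric = 42

print(variable_text * 2)     # 4242 (text repeated, still a string)
print(variable_numeric * 2)  # 84

# Convert the text to a number first to do arithmetic with it
print(int(variable_text) * 2)  # 84

# Adding text and a number is an error in Python
try:
    variable_text + 2
except TypeError as error:
    print(f"TypeError: {error}")
```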

It’s good practice, especially when loading data from an external source for the first time, to check the format of the data, i.e. to make sure that what should be numbers are numbers, and what should be text is indeed…text. Below we show how to check for a specific format. You can use this whenever you want to confirm that the data is in the expected format.

variable_text = "42"
# check if the variable is numeric (a whole number or a decimal number):
isinstance(variable_text, (int, float))  # False
# check if the variable is text:
isinstance(variable_text, str)  # True

When you don’t know the format of the data, or a function is not working and you cannot see why, you can also ask for the format directly. The type() function returns the type of the data, so you can check what you are working with:

type(variable_text)  # returns the str (text) type

Data types

1. Integer (`int`)

Represents whole numbers, both positive and negative, without decimals.
Example: 5, -42, 100

2. Float (`float`)

Represents numbers that have a decimal point. It can store both positive and negative decimal numbers.
Example: 3.14, -0.001, 42.0

3. String (`str`)

Represents a sequence of characters (text), typically enclosed in single or double quotes.
Example: "Hello, world!", 'Python'

4. Boolean (`bool`)

Represents a value that is either True or False. Booleans are often used in conditions and logical operations.
Example: True, False

5. Logical (This is equivalent to Boolean in Python)

In Python, logical values are handled by the boolean (bool) type, where logical conditions return True or False.
Example: a == b might evaluate to True or False.
If a = 5 and b = 5, then a == b evaluates to True.
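A minimal sketch that asks type() for the type of one example value of each kind listed above, and shows an isinstance() check that accepts both int and float as “numeric”:

```python
values = [5, 3.14, "Python", True]

# type() tells us which data type each value has
for value in values:
    print(value, type(value))

# A numeric check that accepts whole numbers (int) and decimal numbers (float)
print(isinstance(5, (int, float)))    # True
print(isinstance("5", (int, float)))  # False

# Comparisons produce booleans
a = 5
b = 5
print(a == b)  # True
```

One caveat: in Python, bool is a subtype of int, so isinstance(True, (int, float)) also returns True. If booleans must be excluded, check for them separately.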

Exercises part 3:

Check numeric values:
1.1 Assign numeric values to two variables. You can name them however you want and assign any numerical value.
1.2 Check the type of these variables.
Check textual values:
2.1 Assign a textual value to two variables. You can name them however you want and assign any textual value.
2.2 Check the type of these variables.
2.3 Multiply one of the variables by the number 2. What do you get? Is it an error?
Transform the values: We now want to transform the values into another format. When we download data and load it into Python, the format is often not understood correctly; numbers, for example, are often read as text. Use the str() function in Python to transform your number into text, then use the isinstance() function to check that it worked.

4. Logic operators

Logical operators are commonly used in data analysis, especially when subsetting datasets. For example, suppose we want to extract documents from the year 2000 that contain both the term “sustainability” and the term “climate change”. We would need to check that the year is 2000 (year == 2000) and that both terms appear in the text. Combining these operators is important, and so is understanding how they work.

Pay Attention

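In Python, True and False must start with a capital letter, and the operators and, or and not must be written in lowercase, exactly as shown here. If they are written differently (for example true or AND), Python does not recognize them: true is treated as an (undefined) variable name, and AND causes a syntax error.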

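x = 4
y = 8
# Equal (==)
x == y  # False

# And: True only if both conditions are True
x == 4 and y == 8  # True

# Or: True if at least one condition is True
x == 4 or y == 8  # True

# Not: reverses a truth value; any non-zero number counts as True, so not 4 is False
not x  # False

# Combine
z = "plop"
x == 4 and (y == 8 or z == "plop")  # True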
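The overview table also lists !=. A few more comparison operators, sketched with the same x and y (the results follow from x = 4 and y = 8):

```python
x = 4
y = 8

print(x != y)        # True (not equal)
print(x < y)         # True (smaller than)
print(x >= 4)        # True (greater than or equal to)
print(not (x == y))  # True (same result as x != y)
```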

In data analysis, we usually use operators to subset data: we compare a variable to a value to check whether it meets our criteria. For example, if we have a column that contains a year and we only want observations from 2003, we search for year == 2003. The operators we just described work the same way in this setting. However, some Python packages use different operators: in pandas, for instance, and becomes &, or becomes |, and not becomes ~. We will address these variations in the database manipulation chapter.
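A minimal sketch of the subsetting idea in plain Python, before pandas comes in. The documents are made-up records (a list of dictionaries, a structure not covered yet) used only to show how and combines conditions; the list comprehension keeps only the items for which the whole condition is True.

```python
# Made-up example records, for illustration only
documents = [
    {"year": 2000, "text": "sustainability and climate change policy"},
    {"year": 2000, "text": "climate change adaptation"},
    {"year": 2003, "text": "sustainability reporting and climate change"},
]

# Keep documents from 2000 that mention both terms
selected = [
    doc for doc in documents
    if doc["year"] == 2000
    and "sustainability" in doc["text"]
    and "climate change" in doc["text"]
]

print(len(selected))        # 1
print(selected[0]["text"])  # sustainability and climate change policy
```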

Exercises part 4:

Inequality Check:
- Create two variables, x and y, with different values. Write an expression that checks if x is not equal to y, and print the result. Experiment by making x and y equal and see how the output changes.
Equality Check:
- Create two variables, a and b, and assign them any numbers you like. Write an expression that checks if a is equal to b, and print the result. Try changing the values of a and b to see when the result is True and when it’s False.
Multiple Conditions:
- Create two variables, temperature and humidity. Set temperature to a value greater than 20 and humidity to a value less than 50. Write an expression that checks if temperature is both greater than 20 and humidity is less than 50, and print the result.

5. The print and paste functions

The print function is crucial in programming because it allows developers to display information to the user or themselves during the development process. It provides immediate feedback, helping programmers debug code by checking the values of variables, verifying the flow of execution, or ensuring that certain conditions are met. Without print, it would be challenging to observe how the program behaves internally, making it a vital tool for both learning and real-world software development.

A very basic example of this is printing out the value of a variable together with what it represents. Imagine that we are working on a project in which we start with some raw data that we clean, step by step, before we can use it for a regression. In this process we remove missing values and outdated values, and we might remove some regions. To make sure we don’t remove too much data, or simply to check that we don’t make a mistake, we can print out some information at different steps. The print function allows us to do this:

plop = 42
# print a numeric value
print(plop)
# print text, remember to use "" for text
print("Step 3 is done")

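We can make things more interesting by combining values and text. In Python, the most convenient way to do this is an f-string (formatted string). The “+” operator also joins text, but only text with text, so numbers must first be converted with str():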

import random

# Generate a random value between 0 and 5 using random.uniform()
x = random.uniform(0, 5)

# Print out the sentence "The number generated was x" using an f-string (formatted string)
# to print a combination of text and variables, put an "f" before the string
# put the name of the variable between {}:
message = f"The number generated was {x}"
print(message)
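A small sketch of three ways to combine text and a value; the format specifier :.2f (round to two decimals inside an f-string) is an addition not used elsewhere in this section:

```python
import random

x = random.uniform(0, 5)

# f-string with rounding: ":.2f" shows the number with 2 decimals
print(f"The number generated was {x:.2f}")

# The "+" operator only joins strings, so the number is converted with str()
message = "The number generated was " + str(round(x, 2))
print(message)

# print() can also take several arguments; they are separated by spaces
print("The number generated was", x)
```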

Exercises: Using f-strings (paste) and print()

Concatenating Text and Numbers:
- Generate a random number between 4 and 87 using the random.uniform() function in Python. Print a sentence that says, “The generated number is x”, where x is the random number. Use a different function to generate an integer between 1 and 70 and print the result (you have to find the function yourself).
Arithmetic Results in Sentences:
- Create two variables a and b, with values of your choice. Calculate the sum, difference, and product of a and b, then print the results in complete sentences, e.g., “The sum of a and b is x”.
Boolean Comparisons in Text:
- Create two numbers, x and y. Write an expression that checks if x is greater than y. Print a sentence that says, “It is True/False that x is greater than y”, depending on the result of the comparison.

6. Using functions

R and Python can be quite different when it comes to using functions. Here we focus on how functions are used in Python.

6.1 Using existing functions

When you see something between parentheses in Python, it usually indicates:

You’re calling a function, and the value between parentheses is an argument being passed to the function. For example, print("Hello") calls the print() function with the argument "Hello".

In Python, dot notation (.) is used to access the functions inside a package, and the attributes or methods (functions) of objects. For example, numpy is a package that provides arrays and many mathematical functions, and you access its functionality using dot notation:

import numpy as np
# Create a NumPy array
data = np.array([1, 2, 3, 4])
# Access the shape of the array using dot notation
print(data.shape)  # Outputs the shape (dimensions) of the array

In this case, data is a NumPy array object, and data.shape is an attribute (a property) that tells us the dimensions of the array. The dot (.) is used to access this attribute from the data object.

Similarly, if you call a method of an object (a function that belongs to the object), you also use dot notation with parentheses:

# Using a method to reshape the array
data_reshaped = data.reshape(2, 2)  # Calls the reshape method; returns a 2x2 array, data keeps its original shape

np.array(): This is a function from the numpy module that creates an array.
data.shape: This accesses an attribute of the array data (no parentheses because it’s an attribute, not a function).
data.reshape(): This calls a method that returns a reshaped version of the array (parentheses are used because it’s a function). The original data array keeps its shape.

Important:

Parentheses are used to call functions, and the values inside them are arguments.
Dot notation is used to access functions from a package, like np.array() to create a NumPy array, and methods or attributes of an object, like data.shape to get the dimensions of an array.
Methods (functions that belong to objects) also use parentheses, but attributes (like data.shape) don’t.
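The same rules hold without NumPy. A small sketch using only built-in Python (the math package and text/number objects stand in for numpy and arrays here):

```python
import math

# Dot notation on a package: the sqrt function from the math package
print(math.sqrt(16))  # 4.0

# Dot notation on an object: text (str) objects have methods
title = "climate change"
print(title.upper())                      # CLIMATE CHANGE
print(title.replace("change", "policy"))  # climate policy

# Attribute (no parentheses) versus method (parentheses)
number = 3.0
print(number.real)          # 3.0 (attribute)
print(number.is_integer())  # True (method)
```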

6.2 Working with your own functions

We do not expect you to write your own functions. We only want you to be able to recognize custom functions in a script.

In Python, functions are defined using the def keyword, followed by the function name, parentheses (), and a colon :. Inside the parentheses, you can specify any input parameters (arguments) that the function will take. The code that the function executes is indented, and you can use the return statement to send back a result:

def function_name(parameters):
    # Code block (indented)
    return result

For example, we could make a function that returns a greeting when supplied with a name. In other words, when you enter a name, the function will return “Hi, name”:

def greet(name):
    return f"Hi, {name}!"

We create a function called greet, which takes one argument: the name. It returns the text “Hi, <name>!”, with the supplied name filled in.

If you see a function definition in a script, all you have to do is run that code once. This defines the function (much like import makes a package’s functions available) without executing its body yet. You are then ready to call the function.
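Recognizing a function definition and its calls is what the reading exercises below ask for. A minimal sketch: running the definitions once makes the functions available, and each call passes values for the parameters. total_emission is a hypothetical function written for this illustration.

```python
# Running this definition once makes greet() available
def greet(name):
    return f"Hi, {name}!"


# Hypothetical function with two parameters, for illustration only
def total_emission(distance_km, emission_per_km):
    # Multiply the distance by the emission per km and send the result back
    return distance_km * emission_per_km


# Calling the functions with arguments
print(greet("Alice"))            # Hi, Alice!
print(total_emission(50, 120))   # 6000
```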

7. Exercises: Reading scripts

These are questions that you might encounter in the Remindo exam.

7.1 Exercise 1

You are provided with a script; your aim is to explain it. For each line of code, write a comment describing what the line does, then explain what the script as a whole computes and prints.

distance_by_car = 50  # Distance traveled by car in km
distance_by_bus = 30  # Distance traveled by bus in km
distance_by_bike = 10  # Distance traveled by bike in km
emission_car = 120  # CO₂ per km for a car
emission_bus = 68   # CO₂ per km for a bus
emission_bike = 0   # CO₂ per km for a bike
total_emission_car = distance_by_car * emission_car
total_emission_bus = distance_by_bus * emission_bus
total_emission_bike = distance_by_bike * emission_bike
total_emission = total_emission_car + total_emission_bus + total_emission_bike
print(f"The total CO₂ emissions for the trip are {total_emission} grams.")

7.2 Exercise 2

You are provided with a script; your aim is to explain it. For each line of code, write a comment describing what the line does, then explain what the function does and returns:

def calculate_energy_savings(incandescent_wattage, led_wattage, hours_per_day, number_of_bulbs):
    daily_energy_incandescent = incandescent_wattage * hours_per_day * number_of_bulbs
    daily_energy_led = led_wattage * hours_per_day * number_of_bulbs
    daily_energy_savings = daily_energy_incandescent - daily_energy_led
    annual_energy_savings = (daily_energy_savings * 365) / 1000
    return annual_energy_savings
  
incandescent_wattage = 60  
led_wattage = 10           
hours_per_day = 5          
number_of_bulbs = 10      
annual_savings = calculate_energy_savings(incandescent_wattage, led_wattage, hours_per_day, number_of_bulbs)
print(f"The household saves {annual_savings} kilowatt-hours annually by switching to LED bulbs.")

7.3 Exercise 3