
COMP35112 Chip Multiprocessors



Parallel Programming Using Shared Memory

Pierre Olivier

1 / 42

Threads

2 / 42

Processes

  • A program executing on the CPU runs as a process
  • With virtual memory, the process gets the illusion it has access to 100% of the memory
    • Address space
3 / 42
  • When a program executes on the CPU, on top of the operating system, it runs as an entity named a process
  • Using virtual memory, the OS gives the process the illusion that it can access all the memory, so each process in the system is free to use any address in the lower half of the 64-bit address space, the upper half being reserved for the OS
  • That area of memory that the process can address is named the address space

Processes

  • A program executing on the CPU runs as a process
  • With virtual memory, the process gets the illusion it has access to 100% of the memory
    • Address space
  • Two programs: 2 different processes, i.e. 2 different address spaces
4 / 42
  • When a program executes on the CPU, on top of the operating system, it runs as an entity named a process
  • Using virtual memory, the OS gives the process the illusion that it can access all the memory, so each process in the system is free to use any address in the lower half of the 64-bit address space, the upper half being reserved for the OS
  • That area of memory that the process can address is named the address space
  • Two processes have two separate address spaces
  • For example, here processes A and B both have some data at address 0x42, however that data is different

Processes

  • A program executing on the CPU runs as a process
  • With virtual memory, the process gets the illusion it has access to 100% of the memory
    • Address space
  • Two programs: 2 different processes, i.e. 2 different address spaces
5 / 42
  • When a program executes on the CPU, on top of the operating system, it runs as an entity named a process
  • Using virtual memory, the OS gives the process the illusion that it can access all the memory, so each process in the system is free to use any address in the lower half of the 64-bit address space, the upper half being reserved for the OS
  • That area of memory that the process can address is named the address space
  • This is because the virtual address 0x42 is mapped, for each process, to a different physical location by that process's page table
  • Indeed each process has its own page table.
  • To put things simply, 1 process == 1 virtual address space == 1 page table.
  • The OS also makes sure that processes cannot address anything outside of their address space
  • To leverage parallelism, two processes, i.e. two programs, can run on two different cores of a multicore processor
  • However, how can we use parallelism within the same program?

Processes

  • A program executing on the CPU runs as a process
  • With virtual memory, the process gets the illusion it has access to 100% of the memory
    • Address space
  • Two programs: 2 different processes, i.e. 2 different address spaces

Can run 2 independent processes in parallel on a multicore, but how to parallelise a single program?

5 / 42
  • When a program executes on the CPU, on top of the operating system, it runs as an entity named a process
  • Using virtual memory, the OS gives the process the illusion that it can access all the memory, so each process in the system is free to use any address in the lower half of the 64-bit address space, the upper half being reserved for the OS
  • That area of memory that the process can address is named the address space
  • This is because the virtual address 0x42 is mapped, for each process, to a different physical location by that process's page table
  • Indeed each process has its own page table.
  • To put things simply, 1 process == 1 virtual address space == 1 page table.
  • The OS also makes sure that processes cannot address anything outside of their address space (a short code sketch after this list illustrates this isolation)
  • To leverage parallelism, two processes, i.e. two programs, can run on two different cores of a multicore processor
  • However, how can we use parallelism within the same program?
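  • A minimal sketch of this isolation, assuming a POSIX system (illustrative only, not one of the course examples): after fork(), the child gets its own copy of the address space, so its write to x is invisible to the parent

/* Sketch: separate address spaces. The child's update to x is not seen
 * by the parent, because each process has its own address space. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int x = 42;

int main(void) {
    pid_t pid = fork();                /* create a second process */
    if (pid == 0) {                    /* child: its own address space */
        x = 100;
        printf("child:  x = %d\n", x); /* prints 100 */
        exit(0);
    }
    waitpid(pid, NULL, 0);             /* parent waits for the child */
    printf("parent: x = %d\n", x);     /* still prints 42 */
    return 0;
}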

Threads

  • A thread is a sequence of instructions executed on a CPU core
  • A process consists of one or more threads
  • Each process in the system has an address space
  • All threads in the same process share the same address space
6 / 42
  • A thread is a flow of control executing a program
  • It is a sequence of instructions executed on a CPU core
  • The nice thing with threads is that a process can consist not just of one thread but of several
  • In other words, a single program can have several execution flows running on different cores and sharing a single address space

Threads

  • A thread is a sequence of instructions executed on a CPU core
  • A process consists of one or more threads
  • Each process in the system has an address space
  • All threads in the same process share the same address space
7 / 42
  • A thread is a flow of control executing a program
  • It is a sequence of instructions executed on a CPU core
  • The nice thing with threads is that a process can consist not just of one thread but of several
  • In other words, a single program can have several execution flows running on different cores and sharing a single address space

Threads

  • A thread is a sequence of instructions executed on a CPU core
  • A process consists of one or more threads
  • Each process in the system has an address space
  • All threads in the same process share the same address space
8 / 42
  • A thread is a flow of control executing a program
  • It is a sequence of instructions executed on a CPU core
  • The nice thing with threads is that a process can consist not just of one thread but of several
  • In other words, a single program can have several execution flows running on different cores and sharing a single address space
  • In our example process A has 3 threads, and they all see the green and orange data.
  • Process B has 4 threads, and they see the same red and yellow data

Threads

  • A thread is a sequence of instructions executed on a CPU core
  • A process consists of one or more threads
  • Each process in the system has an address space
  • All threads in the same process share the same address space

Threads communicate using shared memory

9 / 42
  • A thread is a flow of control executing a program
  • It is a sequence of instructions executed on a CPU core
  • The nice thing with threads is that a process can consist of one but also multiple threads
  • In other words, we can have for a single program several execution flows running on different cores and sharing a single address space
  • Seeing the same address space is very convenient for communication, which can be achieved through global variables or pointers to anywhere in the address space (see the sketch below)
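  • A minimal sketch of such communication, assuming pthreads (illustrative only): a worker thread writes a global variable that main reads after joining

/* Sketch: two threads communicating through a shared global variable. */
#include <pthread.h>
#include <stdio.h>

int shared_result = 0;            /* visible to every thread in the process */

void *worker(void *arg) {
    shared_result = 21 * 2;       /* write through the shared address space */
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);
    pthread_join(t, NULL);        /* the join guarantees the write is visible */
    printf("shared_result = %d\n", shared_result);
    return 0;
}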

Threads

  • Can program with threads in a plethora of languages:
    • C/C++/Fortran using the POSIX threads library (Pthreads)
    • Java
    • Python
    • Go
    • Haskell
    • Rust
    • etc.
10 / 42

Threads

  • Can program with threads in a plethora of languages:
    • C/C++/Fortran using the POSIX threads library (Pthreads)
    • Java
    • Python
    • Go
    • Haskell
    • Rust
    • etc.
  • Thread support in higher-level languages generally builds upon pthreads
    • Most of these languages' runtimes are implemented in C/C++ or at least contain some C/C++ elements
10 / 42
  • Threads can be created in many languages, including C/Fortran using the POSIX threads (Pthread) library
  • But also Java, as we will see in detail in this course
  • And many other languages

Threads in C/C++ with Pthread

  • POSIX thread library
  • Exposes a standardised C/C++ API to create and manage multiple threads within a process
  • For a standard program, the OS automatically creates a single thread
11 / 42

Threads in C/C++ with Pthread

  • Use pthread_create() to create and launch a thread
    • Indicate as parameters:
      • Which function the thread should run
      • Optionally what should be passed as parameter to this function
12 / 42

Threads in C/C++ with Pthread

  • Use pthread_create() to create and launch a thread
    • Indicate as parameters:
      • Which function the thread should run
      • Optionally what should be passed as parameter to this function
  • pthread_exit() to have the calling thread exit
13 / 42

Threads in C/C++ with Pthread

  • Pthread -- POSIX thread library
  • Use pthread_create() to create and launch a thread
    • Indicate as parameters:
      • Which function the thread should run
      • Optionally what should be passed as a parameter to this function (a sketch of passing several parameters via a struct follows below)
  • pthread_exit() to have the calling thread exit
  • pthread_join() to wait for another thread to finish
14 / 42
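  • The upcoming example passes a single integer cast to void *; when several parameters are needed, a common pattern (sketch below, with a hypothetical worker_arg struct) is to pass a pointer to a struct

/* Sketch: passing several parameters to a thread through a struct.
 * The struct type and field names are hypothetical. */
#include <pthread.h>
#include <stdio.h>

struct worker_arg {
    int id;
    const char *label;
};

void *worker(void *arg) {
    struct worker_arg *wa = arg;             /* cast back from void * */
    printf("Thread %d (%s) running\n", wa->id, wa->label);
    return NULL;
}

int main(void) {
    pthread_t t;
    struct worker_arg wa = { 7, "example" }; /* must outlive the thread */
    pthread_create(&t, NULL, worker, &wa);
    pthread_join(t, NULL);
    return 0;
}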

Threads in C/C++ with Pthread

  • A good chunk of this course, including labs 1 and 2, will focus on shared memory programming in C/C++ with pthreads
  • man pthread_*¹ and Google “pthreads” for lots of documentation

¹ First install the relevant development man pages:

sudo apt install manpages-dev manpages-posix manpages-posix-dev
15 / 42

Threads in C/C++ with Pthread

// Compile with:
// gcc pthread.c -o pthread -lpthread
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h> // pthread header

#define NOWORKERS 5

// Function executed by all threads
void *thread_fn(void *arg) {
    int id = (int)(long)arg;
    printf("Thread %d running\n", id);
    pthread_exit(NULL); // exit
    // never reached
}

int main(void) {
    // Each thread is controlled through a
    // pthread_t data structure
    pthread_t workers[NOWORKERS];

    // Create and launch the threads
    for(int i=0; i<NOWORKERS; i++)
        if(pthread_create(&workers[i], NULL,
                          thread_fn, (void *)(long)i)) {
            perror("pthread_create");
            return -1;
        }

    // Wait for threads to finish
    for (int i = 0; i < NOWORKERS; i++)
        if(pthread_join(workers[i], NULL)) {
            perror("pthread_join");
            return -1;
        }

    printf("All done\n");
}

03-shared-memory-programming/pthread.c

16 / 42
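  • The example above discards each thread's return value (the second argument of pthread_join() is NULL); a minimal sketch (illustrative only) of how a thread can hand a result back through pthread_exit() and pthread_join():

/* Sketch: returning a small result from a thread by encoding it in the
 * void * exit value (fine for integers that fit in a pointer). */
#include <pthread.h>
#include <stdio.h>

void *square(void *arg) {
    long x = (long)arg;
    pthread_exit((void *)(x * x));    /* result encoded in the pointer value */
}

int main(void) {
    pthread_t t;
    void *ret;
    pthread_create(&t, NULL, square, (void *)6L);
    pthread_join(t, &ret);            /* retrieve the exit value */
    printf("result = %ld\n", (long)ret);
    return 0;
}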

Threads in C/C++ with Pthread

17 / 42

Threads in C/C++ with Pthread

18 / 42

Threads in Java

class MyThread extends Thread { // create a class that inherits from Thread
    int id;
    MyThread(int id) { this.id = id; }
    // override the method run() to define the code executed by the thread:
    public void run() { System.out.println("Thread " + id + " running"); }
}

class Main {
    public static void main(String[] args) {
        int NOWORKERS = 5;
        MyThread[] threads = new MyThread[NOWORKERS];
        for (int i = 0; i < NOWORKERS; i++)
            threads[i] = new MyThread(i);
        for (int i = 0; i < NOWORKERS; i++)
            threads[i].start(); // start executing the threads
        for (int i = 0; i < NOWORKERS; i++)
            try { threads[i].join(); } catch (InterruptedException e) { /* do nothing */ }
        System.out.println("All done");
    }
}
// compile and launch with:
// javac java-thread.java && java Main

03-shared-memory-programming/java-thread.java

19 / 42
  • Let's see an example
  • We have the MyThread class inheriting from Thread
  • In the constructor we just initialise an integer identifier member variable
  • And we implement the run method, which just prints that the thread is running
  • In the main function we create an array of objects from this class
  • Each is given a different id
  • Then we launch them all in a loop with start, and we wait for each to finish with join

Threads in Python

#!/usr/bin/python3
import threading

NOWORKERS = 5

def print_hello(id):
    print("Thread " + str(id) + " running")

threads = []
for i in range(NOWORKERS):
    thread = threading.Thread(target=print_hello, args=(i,))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

print("All done.")

03-shared-memory-programming/python.py

20 / 42

Threads in Rust

use std::thread;

const NOWORKERS: u32 = 5;

fn print_hello(id: u32) {
    println!("Thread {} running", id);
}

fn main() {
    let mut handles = vec![];
    for i in 0..NOWORKERS {
        // `move ||` transfers ownership of i inside the closure
        let handle = thread::spawn(move || { print_hello(i); });
        handles.push(handle);
    }
    for handle in handles {
        handle.join().unwrap(); // unwrap panics if the thread's execution returns an error
    }
    println!("All done.");
}
// With Rust toolchain installed, compile with:
// rustc <source file>

03-shared-memory-programming/rust.rs

21 / 42

Examples Output

For all examples:

Thread 1 running
Thread 0 running
Thread 2 running
Thread 4 running
Thread 3 running
All done
  • No control over the order of execution!
    • The OS scheduler decides, it's nondeterministic
22 / 42

Examples Output

For all examples:

Thread 1 running
Thread 0 running
Thread 2 running
Thread 4 running
Thread 3 running
All done
  • No control over the order of execution!
    • The OS scheduler decides, it's nondeterministic

A possible scheduling scenario:

22 / 42

Examples Output

For all examples:

Thread 1 running
Thread 0 running
Thread 2 running
Thread 4 running
Thread 3 running
All done
  • No control over the order of execution!
    • The OS scheduler decides, it's nondeterministic

A possible scheduling scenario:

Another one on 1 core:

22 / 42
  • This is an example of the output we can get, for both the Java and the C examples
  • If we run the program multiple times we get different orders of execution
  • This is normal, we have no control over this as it is the OS scheduler that decides
  • Of course this can be problematic in some situations where we really need to set up a proper order of execution, for example when a thread needs to accomplish a certain task before another thread can start doing its job
  • Later in the course we will see how to use synchronisation mechanisms to manage this (a small sketch after this list shows the only ordering tool seen so far, pthread_join())
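  • A minimal sketch (illustrative only) of enforcing ordering with what we have so far: using pthread_join() to force one thread to finish before another starts

/* Sketch: enforcing an order of execution with pthread_join() alone --
 * the second thread is only created once the first has finished. */
#include <pthread.h>
#include <stdio.h>

void *produce(void *arg) { printf("producing\n"); return NULL; }
void *consume(void *arg) { printf("consuming\n"); return NULL; }

int main(void) {
    pthread_t p, c;
    pthread_create(&p, NULL, produce, NULL);
    pthread_join(p, NULL);            /* the producer must finish first */
    pthread_create(&c, NULL, consume, NULL);
    pthread_join(c, NULL);
    return 0;
}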

Data Parallelism

Dividing Work between Threads

23 / 42

Data Parallelism

  • Simple form of parallelism commonly found in many applications
    • Common in computational science applications
  • Divide computation into (nearly) equal sized chunks
  • Works best when there are no data dependencies between chunks
24 / 42
  • As we saw previously, data parallelism is an easy form of parallelism found in many applications such as computational science
  • There is data parallelism when we have some data that is structured in such a way that the operations to be performed on it can easily be parallelised
  • This is very suitable for parallelism: the goal is to divide the computation into chunks that are computed in parallel

Data Parallelism

  • Simple form of parallelism commonly found in many applications
    • Common in computational science applications
  • Divide computation into (nearly) equal sized chunks
  • Works best when there are no data dependencies between chunks
  • Exploited in multithreading but also at the instruction level in vector/array (SIMD) processors: CPUs (SSE, AVX), and in GPGPUs
24 / 42
  • As we saw previously, data parallelism is an easy form of parallelism found in many applications such as computational science
  • There is data parallelism when we have some data that is structured in such a way that the operations to be performed on it can easily be parallelised
  • This is very suitable for parallelism: the goal is to divide the computation into chunks that are computed in parallel
  • You can see an example on the slide in the form of a sum between two arrays
  • We sum each pair of elements with the same index in A and B and put the result in C
  • For example we can have a different thread realise each addition, or fewer threads each taking care of a subset of the additions (see the sketch after this list)
  • This kind of parallelism is exploited in vector and array processors
  • General purpose CPU architectures have vector instruction extensions allowing things like applying the same operation to all elements of an array
  • For example Intel x86-64 used to have SSE and now has AVX
  • From a very high level perspective, GPUs also work in that way
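  • A hedged sketch of the array sum (assumed array size and thread count, illustrative only): each pthread adds one contiguous chunk of A and B into C

/* Sketch: data parallelism on C = A + B, each thread handling a
 * contiguous chunk of the arrays. */
#include <pthread.h>
#include <stdio.h>

#define N 1000
#define NOWORKERS 4

int A[N], B[N], C[N];

void *add_chunk(void *arg) {
    long t = (long)arg;
    long chunk = N / NOWORKERS;
    long start = t * chunk;
    long end = (t == NOWORKERS - 1) ? N : start + chunk; /* last thread takes the remainder */
    for (long i = start; i < end; i++)
        C[i] = A[i] + B[i];
    return NULL;
}

int main(void) {
    pthread_t w[NOWORKERS];
    for (int i = 0; i < N; i++) { A[i] = i; B[i] = 2 * i; }
    for (long t = 0; t < NOWORKERS; t++)
        pthread_create(&w[t], NULL, add_chunk, (void *)t);
    for (int t = 0; t < NOWORKERS; t++)
        pthread_join(w[t], NULL);
    printf("C[N-1] = %d\n", C[N - 1]);  /* expect 3*(N-1) */
    return 0;
}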

Data Parallelism Example

  • Matrix multiplication is a good example
    • No dependencies between the computation of each member of the result matrix
25 / 42
  • Let's look at a second example in more detail: matrix-matrix multiplication
  • We'll use square matrices for the sake of simplicity

Data Parallelism Example

  • Matrix multiplication is a good example
    • No dependencies between the computation of each member of the result matrix
26 / 42
  • Let's look at a second example in more detail: matrix-matrix multiplication
  • We'll use square matrices for the sake of simplicity
  • Remember that with matrix multiplication, each element of the result matrix is computed as the sum of the products of the elements in one row of the first matrix with the elements in one column of the second matrix
  • So how can we parallelise this operation?

Data Parallelism Example

  • Matrix multiplication is a good example
    • No dependencies between the computation of each member of the result matrix
    • n² parallel threads (1 per result element)
27 / 42
  • Let's look at a second example in more detail: matrix-matrix multiplication
  • We'll use square matrices for the sake of simplicity
  • First we can have 1 thread per element of the result matrix, each thread computing the value of the element in question
  • With an n x n matrix that gives us n² threads

Data Parallelism Example

  • Matrix multiplication is a good example
    • No dependencies between the computation of each member of the result matrix
    • n² parallel threads (1 per result element)
    • n parallel threads (1 per row/column of result)
28 / 42
  • Let's look at a second example in more detail: matrix-matrix multiplication
  • We'll use square matrices for the sake of simplicity
  • If we don't have a lot of cores we may want to create fewer threads; it's generally not very efficient to have more threads than cores
  • So another strategy is to have a thread per row or per column of the result matrix

Data Parallelism Example

  • Matrix multiplication is a good example
    • No dependencies between the computation of each member of the result matrix
    • n² parallel threads (1 per result element)
    • n parallel threads (1 per row/column of result)
    • p parallel threads, each computing q rows/columns of the result, where pq = n
29 / 42
  • Let's look at a second example in more detail: matrix-matrix multiplication
  • We'll use square matrices for the sake of simplicity
  • Finally, we can also create an arbitrary number of threads by dividing the rows/columns of the result matrix among the number of threads we want, each thread taking care of a subset

Implicit vs. Explicit Parallelism

30 / 42

Implicit vs. Explicit Parallelism

  • When parallelising a program an important consideration is that of the programmer's effort:
    • What is the best strategy according to the situation?
      • Does the programmer need to be an expert to make that choice?
    • How to indicate the chosen strategy in the code?
      • If we already have a sequential version of the program, how much code refactoring & new code implementation is needed?
31 / 42

NxN Parallel Matrix Multiplication

#define N 1000
int A[N][N]; int B[N][N]; int C[N][N];

int main(int argc, char **argv) {
    /* init matrices here */
    for(int i=0; i<N; i++)
        for(int j=0; j<N; j++) {
            C[i][j] = 0;
            for(int k=0; k<N; k++)
                C[i][j] += A[i][k] * B[k][j];
        }
    return 0;
}

03-shared-memory-programming/matmult.c

  • Basic sequential code for matrix multiplication
32 / 42

NxN Parallel Matrix Multiplication

#include <omp.h>

#define N 1000
int A[N][N]; int B[N][N]; int C[N][N];

int main(int argc, char **argv) {
    /* init matrices here */

    /* Outer loop parallelised: iterations divided among threads */
    #pragma omp parallel for
    for(int i=0; i<N; i++)
        for(int j=0; j<N; j++) {
            C[i][j] = 0;
            for(int k=0; k<N; k++)
                C[i][j] += A[i][k] * B[k][j];
        }
    return 0;
}

03-shared-memory-programming/openmp.c

  • Automatic parallelisation using OpenMP
    • Very low programmer effort
    • More on OpenMP later
  • Compile and run: gcc openmp.c -o prog -fopenmp && OMP_NUM_THREADS=8 ./prog

Parallelising with pthread would require much more code rewrite

33 / 42
  • So, still on our matrix-matrix multiplication example, here is a cool thing we can do in C with a library called OpenMP
  • We can parallelise the matrix multiplication operation in a very simple way by adding a simple annotation in the code
  • All the iterations of the outer loop will be done in parallel by different threads
  • The framework takes care of managing the threads, and of spawning a suitable number of threads for the machine executing the program (for comparison, a hedged pthread sketch of the same computation follows below)
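  • For comparison, a hedged sketch (not the course's reference code) of an explicit pthread version of the same multiplication, with each thread computing a block of rows of C:

/* Sketch: explicit pthread parallelisation of the NxN multiplication,
 * each thread computing a contiguous block of rows of C. */
#include <pthread.h>

#define N 1000
#define NOWORKERS 8

int A[N][N]; int B[N][N]; int C[N][N];

void *multiply_rows(void *arg) {
    long t = (long)arg;
    long rows = N / NOWORKERS;
    long start = t * rows;
    long end = (t == NOWORKERS - 1) ? N : start + rows; /* last thread takes leftovers */
    for (long i = start; i < end; i++)
        for (int j = 0; j < N; j++) {
            C[i][j] = 0;
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
        }
    return NULL;
}

int main(int argc, char **argv) {
    /* init matrices here */
    pthread_t workers[NOWORKERS];
    for (long t = 0; t < NOWORKERS; t++)
        pthread_create(&workers[t], NULL, multiply_rows, (void *)t);
    for (int t = 0; t < NOWORKERS; t++)
        pthread_join(workers[t], NULL);
    return 0;
}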

Explicit vs. Implicit Parallelism

  • Explicit parallelism
    • The programmer explicitly spells out what should be done in parallel/sequence
    • Code modifications needed if sequential program already available
    • Examples: reworking a program to use threads or processes for parallel execution
34 / 42
  • This notion of programmer's effort leads us to the concepts of explicit vs. implicit parallelism
  • With explicit parallelism, the programmer has to write more or less code to indicate what should be done in parallel and what should be done in sequence
    • This ranges from writing all the thread code manually to just putting some annotations in a sequential program (OpenMP)

Explicit vs. Implicit Parallelism

  • Explicit parallelism
    • The programmer explicitly spells out what should be done in parallel/sequence
    • Code modifications needed if sequential program already available
    • Examples: reworking a program to use threads or processes for parallel execution
  • Implicit parallelism
    • No effort from the programmer, system works out parallelism by itself
    • No code modification over an already existing sequential program
    • Done for example by some languages able to make strong assumptions about data sharing (e.g. pure functions), or with ILP
34 / 42
  • This notion of programmer's effort leads us to the concepts of explicit vs. implicit parallelism
  • With explicit parallelism, the programmer has to write more or less code to indicate what should be done in parallel and what should be done in sequence
    • This ranges from writing all the thread code manually to just putting some annotations in a sequential program (OpenMP)
  • And with implicit parallelism, the system works out the parallelism fully automatically
    • Some languages (e.g. functional languages) have different views of computation which allow a default assumption of parallelism (no updatable shared state between functions)

Example Code for Implicit Parallelism

Fortran 77 sequential array addition:

DO I = 1, N
  DO J = 1, N
    C(J, I) = A(J, I) + B(J, I)
  END DO
END DO

In Fortran 90 with implicit parallelism:

C = A + B
35 / 42

Automatic Parallelisation

  • In an ideal world, all parallelism would be implicit: the compiler and/or the hardware would take an ordinary sequential program and derive the parallelism automatically
36 / 42
  • So of course implicit parallelism is the best type of parallelism because the programmer does not have to do anything
  • If we have a sequential program it would be great if the compiler can automatically extract all the parallelism!

Automatic Parallelisation

  • In an ideal world, all parallelism would be implicit: the compiler and/or the hardware would take an ordinary sequential program and derive the parallelism automatically
    • Manufacturers of pre-multicore parallel machines invested considerably in such technology
    • Can do quite well if the programs are simple enough but dependency analysis can be very hard
    • Must be conservative
      • If you cannot be certain that parallel version computes correct result, can't parallelise
36 / 42
  • So of course implicit parallelism is the best type of parallelism because the programmer does not have to do anything
  • If we have a sequential program it would be great if the compiler can automatically extract all the parallelism!
  • There was a lot of effort invested in such technologies at the time of the first parallel machines, before the multicore era
  • It works well on small programs but in the general case, analysing dependencies in order to define what can be done in parallel and what needs to be done sequentially becomes very hard
  • And of course the compiler needs to be conservative not to break the program, i.e. if it is not 100% sure that two steps can be run in parallel they need to run sequentially
  • So overall performance gains through implicit parallelism are quite limited, and to get major gains one needs to go the explicit way

Example Problems for Parallelisation

  • Can the compiler automatically parallelise the execution of these loops' iterations?
    • I.e. run all or some iterations in parallel
for (int i = 0 ; i < n-3 ; i++) {
    a[i] = a[i+3] + b[i] ;   // at iteration i, read dependency with index i+3
}

for (int i = 5 ; i < n ; i++) {
    a[i] += a[i-5] * 2 ;     // at iteration i, read dependency with index i-5
}

for (int i = 0 ; i < n ; i++) {
    a[i] = a[i + j] + 1 ;    // at iteration i, read dependency with index ???
}
37 / 42
  • Now here are a few examples of dependencies that a compiler may face when trying to extract parallelism automatically
  • They all regard parallelising all or some iterations of a loop
  • In the first loop, at each iteration there is a read dependency on the slot 3 positions after the index being written
  • In the second loop we have another read dependency, this time on the slot 5 positions before the index being written
  • Then in the last case there is still a read dependency, but we don't know whether the slot read is before or after the one being written
  • This is quite important, as we are going to see

Automatic Parallelisation

for (int i=0; i<n-3; i++)
    a[i] = a[i+3] + b[i];
  • If we parallelise all iterations and iteration 3 runs before iteration 0 we break the program
  • Positive offset: we read at each iteration what was in the array before the loop started
  • We never read a value computed by the loop itself
38 / 42

Automatic Parallelisation

for (int i=0; i<n-3; i++)
    a[i] = a[i+3] + b[i];
  • If we parallelise all iterations and iteration 3 runs before iteration 0 we break the program
  • Positive offset: we read at each iteration what was in the array before the loop started
  • We never read a value computed by the loop itself
Can parallelise all iterations by making a new version of array a
parallel_for(int i=0; i<n-3; i++)
    new_a[i] = a[i+3] + b[i];
a = new_a;
38 / 42
  • Let's have a look at the first example, it's the simplest
  • Because the index read is greater than the index written, what we read at each iteration is what was in the array before we even started the loop
  • At no point do we read a value that has been computed by this loop
  • However, if we parallelise all iterations this could happen, for instance if iteration 3 runs before iteration 0
  • So the solution is to internally make a new version of the array and read from an unmodified copy (a concrete sketch follows below)
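  • As a concrete illustration (assumed heap-allocated arrays, OpenMP standing in for parallel_for), the transformation could look like this in C:

/* Sketch of the "new array" trick: the parallel loop only ever reads the
 * original a; the untouched tail is copied so the final contents match
 * the sequential loop. Assumes a and b are malloc'd int arrays, n >= 3. */
#include <stdlib.h>

int *shifted_add(int *a, int *b, int n) {
    int *new_a = malloc(n * sizeof(int));

    #pragma omp parallel for
    for (int i = 0; i < n - 3; i++)
        new_a[i] = a[i + 3] + b[i];   /* always reads the original a */

    for (int i = n - 3; i < n; i++)
        new_a[i] = a[i];              /* last 3 slots are untouched by the loop */

    free(a);                          /* replace a with the new version */
    return new_a;                     /* caller does: a = shifted_add(a, b, n); */
}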

Automatic Parallelisation

for (int i = 5 ; i < n ; i++) {
    a[i] += a[i-5] * 2 ;
}
  • Previous trick does not work here: this time we read values computed by the loop itself
  • At each iteration i we read what was computed by the loop at iteration i-5
39 / 42

Automatic Parallelisation

for (int i = 5 ; i < n ; i++) {
    a[i] += a[i-5] * 2 ;
}
  • Previous trick does not work here: this time we read values computed by the loop itself
  • At each iteration i we read what was computed by the loop at iteration i-5
  • Solution: limit parallelism to 5
40 / 42
  • Now with a dependency such as this one, the trick from the previous slide won't work!
  • Because what we read at iteration i is supposed to have been written 5 iterations earlier, we can't rely on a read-only copy
  • Also, parallelising all iterations would break the program
  • We observe that within any group of 5 consecutive iterations, such as the first five, there are no dependencies between the iterations
  • The solution here is to limit the parallelism to 5 (see the sketch below)
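  • A hedged pthread sketch of this (assumed array and size, illustrative only): 5 threads, each walking one dependence chain, i.e. the iterations whose indices are equal modulo 5, in increasing order

/* Sketch: limiting parallelism to 5 by giving each thread one dependence
 * chain (indices equal modulo 5), processed in order within the chain. */
#include <pthread.h>

#define N 1000
int a[N];                               /* assumed initialised elsewhere */

void *chain(void *arg) {
    long r = (long)arg;                 /* residue class 0..4 */
    for (long i = 5 + r; i < N; i += 5)
        a[i] += a[i - 5] * 2;           /* a[i-5] belongs to the same chain */
    return NULL;
}

int main(void) {
    pthread_t t[5];
    for (long r = 0; r < 5; r++)
        pthread_create(&t[r], NULL, chain, (void *)r);
    for (int r = 0; r < 5; r++)
        pthread_join(t[r], NULL);
    return 0;
}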

Shared Memory

  • For this lecture we assumed that threads share memory
    • I.e. they all have access to a common address space
    • Multiple threads can read and write in the same memory location (variable, buffer, etc.) through
      • Global variables
      • Pointers and references
41 / 42

Shared Memory

  • For this lecture we assumed that threads share memory
    • I.e. they all have access to a common address space
    • Multiple threads can read and write in the same memory location (variable, buffer, etc.) through
      • Global variables
      • Pointers and references
  • No shared memory?
    • Need to communicate via message passing
    • Close to a distributed system
    • We'll talk briefly about MPI (Message Passing Interface) later in the course unit
41 / 42
  • One last thing
  • Everything in this lecture has been on the basis that threads share memory
  • In other words they can all access the same memory, and they will all see the same data at a given address
  • Language scoping rules still apply, so threads can declare local variables that are private to them, but shared data is accessible from every thread
  • If you don’t have shared memory, threads have to communicate via messages – more like a distributed system. We will talk about MPI in another lecture

Summary and Next Lecture

  • In shared memory systems, a parallel program can use threads
    • Threads communicate implicitly by reading and writing to a common address space
  • A simple form of parallelism: data parallelism
    • Apply similar operations on chunks of a data set
    • Efficient when there is no data dependency
  • Automatic parallelisation can be limited by such dependencies
  • Shared memory is a practical form of implicit communication
    • It's great that multicores today share memory, right?
    • Next lecture will discuss why this isn’t as simple as it sounds!
42 / 42
