class: center, middle
background-image: url(include/title-background.svg)

# COMP35112 Chip Multiprocessors
# .white[Message Passing Interface]

.white[Pierre Olivier]

???

- Hello everyone
- In this video I will talk briefly about MPI
- It's a standard defining a set of operations for writing distributed applications that can be massively parallel

---

# Threads vs. OpenMP and MPI

- Multithreading as we saw it is not a panacea
  1. Shared-memory limits us to the machine boundary
  2. Lots of effort to divide parallel work
- To address these problems, two standard parallel programming frameworks:
  - **MPI: Message Passing Interface** (addresses 1., this lecture)
  - **OpenMP: Open MultiProcessing** (addresses 2., seen in a future lecture)

???

- Before jumping into MPI, let's speak a bit about multithreading
- Until now, most of the programming work we have been doing involves threads
- A first issue with the shared memory model adopted by multithreading is that the number of cores in your machine is a hard limit: you cannot use the computing resources of several machines in parallel
- A second problem when developing with POSIX threads is that dividing the work to be done in parallel between threads can be a lot of effort, especially if you have a lot of parallel regions
- In many cases you have an already existing sequential application and you want to parallelise it; it can take a lot of work to do so with threads
- To address these problems, we'll see two standard parallel programming frameworks that are in widespread use:
- MPI, which stands for Message Passing Interface
- It is a technology addressing the first problem, letting you run a parallel application over multiple machines
- And OpenMP, Open MultiProcessing
- It addresses the second issue, by reducing the amount of engineering effort needed to parallelise a shared memory application
- We'll cover MPI here, and in a future lecture we'll talk about OpenMP

---

# MPI

- **Message Passing Interface**:
  - A **standard**: set of basic functions used to create portable parallel applications
- Used to program a wide range of parallel architectures:
  - Shared memory machines, **supercomputers**, **clusters**
- Relies on **message passing** for communication
- **It is the most widely used API for parallel programming**
- Downside: an existing application needs to be redesigned with MPI in mind to be ported

???

- So what is MPI?
- It is a standard that describes a set of basic functions that can be used to create parallel applications
- These functions are available in libraries for C and Fortran
- Such applications are portable between the various parallel systems that support the standard
- So MPI is used to program many types of systems, not only shared memory machines as we have seen until now
- But also custom massively parallel machines, also called manycores or supercomputers
- MPI can be used to run a single application over multiple machines, a cluster of computers
- An important thing to note is that, for communication between parallel execution units, MPI relies on message passing and not shared memory
- This allows an MPI application to exploit parallelism beyond a single machine boundary
- For these reasons MPI is the most widely used API for parallel programming
- Still, programming in MPI has a cost: your application must be designed from scratch with message passing in mind, so porting existing applications is costly

---

# Messages, Processes, Communicators

.leftcol[
]
.rightcol[
- Messages: rank of sending/receiving processes + tag for type
- Process groups + communicators (e.g. `MPI_COMM_WORLD`) to restrict communications
]

???

- In MPI each parallel unit of execution is called a process
- Each process is identified by its **rank**, which is an integer
- Processes exchange **messages**, characterised by:
  - the rank of the sending and receiving processes
  - as well as an integer tag that is used to define message "types"
- Processes are grouped; they are initially within a single group, which can be split later
- Messages are sent over what is called a **communicator**, linked to a process group, in order to restrict the communication to a subset of the processes
- This enables code modularity: for example, when you are using a library that has its own set of communicating processes, you don't want your application's messages to be sent to or received from these particular processes
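- To make the idea of communicators concrete, here is a small sketch (it is not one of the course's `13-mpi` examples) that splits `MPI_COMM_WORLD` into two sub-communicators with `MPI_Comm_split`, one grouping the even ranks and one grouping the odd ranks

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    MPI_Init(NULL, NULL);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Processes passing the same "colour" end up in the same sub-communicator;
       here even and odd ranks are separated into two groups */
    MPI_Comm sub_comm;
    MPI_Comm_split(MPI_COMM_WORLD, world_rank % 2, world_rank, &sub_comm);

    int sub_rank, sub_size;
    MPI_Comm_rank(sub_comm, &sub_rank);
    MPI_Comm_size(sub_comm, &sub_size);

    /* Messages sent over sub_comm only involve processes of the same group */
    printf("world rank %d is rank %d out of %d in its sub-communicator\n",
           world_rank, sub_rank, sub_size);

    MPI_Comm_free(&sub_comm);
    MPI_Finalize();
}
```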
---

# MPI Hello World

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    MPI_Init(NULL, NULL); /* Initialise MPI */

    /* Retrieve the world size i.e. the total number of processes */
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Retrieve the current process rank (id) */
    int my_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    /* Retrieve the machine name */
    char machine_name[MPI_MAX_PROCESSOR_NAME];
    int nlen;
    MPI_Get_processor_name(machine_name, &nlen);

    printf("Hello world from process of rank %d, out of %d processes on machine"
           " %s\n", my_rank, world_size, machine_name);

    /* Exit */
    MPI_Finalize();
}
```

.codelink[
`13-mpi/mpi-hello.c`
]

???

- Here is a minimal MPI application in C
- This code will be executed by each process of our application
- We start by initialising the MPI library
- Then we get the total number of processes running for this application with MPI_Comm_size
- And we get the identifier of the current process, its rank
- Then we get the name of the machine into a character array, and finally we print all this information before exiting

---

# MPI Hello World

- To install the MPICH implementation on Ubuntu/Debian:

```bash
sudo apt install mpich
```

- To compile the aforementioned example, use the provided gcc wrapper:

```bash
mpicc listing1.c -o listing1
```

- To run:

```bash
mpirun listing1      # Default: will run a single process
mpirun -n 2 listing1 # Manually specify the number of processes to run
```

???

- To compile this application you need the MPI library installed on your system
- You have some instructions on the slide to do this for Debian or Ubuntu systems
- You can compile using mpicc
- And to execute, use mpirun
- By default, depending on the version of the library, it will run only a single process, or a number of processes equal to your number of cores, but you can also specify that manually

---

# Blocking `send` and `recv`

```c
MPI_Send(void* data, int cnt, MPI_Datatype t, int dst, int tag, MPI_Comm com);

MPI_Recv(void* data, int cnt, MPI_Datatype t, int src, int tag, MPI_Comm com,
         MPI_Status* st);
```

- `data`: pointer to the data to send/receive
- `cnt`: number of items to send/receive
- `t`: type of data
  - e.g. `MPI_INT`, `MPI_DOUBLE`, etc.
- `dst`: rank of the receiving process
- `src`: rank of the sending process
- `tag`: integer, in `recv` can be `MPI_ANY_TAG`
- `com`: the communicator, e.g. `MPI_COMM_WORLD`
- `st`: status information

???

- To send and receive messages we have the MPI_Send and MPI_Recv functions
- They take the following parameters
- data is a pointer to a buffer containing the data to send or receive
- As you can see it's a void star, so it can point to a data structure of any type
- cnt is the number of items to send or receive, in case you want to transmit an array
- t is the type of the data exchanged; there are macros for integers, floating point numbers, and so on
- dst is the rank of the receiving process for send
- And src is the rank of the sending process for receive
- Note that there are macros to specify receiving from any process or with any tag
- com is the communicator describing the subset of processes that can participate in this communication
- And st is some status information that can be checked after the receive completes
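- As an illustration of these two calls (a sketch, separate from the course's example files), in the program below rank 0 sends an integer to rank 1, which inspects the status object to find the sender's rank and tag before replying; it assumes it is run with at least 2 processes

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    MPI_Init(NULL, NULL);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if(rank == 0) {
        int value = 42;
        /* Send one MPI_INT to rank 1, with tag 7 */
        MPI_Send(&value, 1, MPI_INT, 1, 7, MPI_COMM_WORLD);
        /* Wait for the reply, ignoring the status information */
        MPI_Recv(&value, 1, MPI_INT, 1, MPI_ANY_TAG, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("[0] got back %d\n", value);
    } else if(rank == 1) {
        int value;
        MPI_Status st;
        /* Accept a message from any source with any tag, then inspect the status */
        MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &st);
        printf("[1] received %d from rank %d with tag %d\n",
               value, st.MPI_SOURCE, st.MPI_TAG);
        value++;
        MPI_Send(&value, 1, MPI_INT, st.MPI_SOURCE, 7, MPI_COMM_WORLD);
    }

    MPI_Finalize();
}
```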
---

# MPI `send`/`recv` Example

- A counter "bounces" randomly between processes, gets incremented at each hop

???

- So let's study a simple example of an application that creates N processes
- At the beginning, the first process, of rank 0, creates a counter and initialises it to 0
- Then it sends this value to another random process, let's say 2
- 2 receives the value, increments it by one, and sends the result to another random process
- And rinse and repeat

---

# MPI `send`/`recv` Example

```c
#define BOUNCE_LIMIT 20

int main(int argc, char** argv) {
    MPI_Init(NULL, NULL);

    int my_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    if (world_size < 2) {
        fprintf(stderr, "World needs to be >= 2 processes\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    if(my_rank == 0) {
        int payload = 0;
        int target_rank = 0;
        while(target_rank == my_rank)
            target_rank = rand()%world_size;

        printf("[%d] sent int payload %d to %d\n", my_rank, payload, target_rank);
        MPI_Send(&payload, 1, MPI_INT, target_rank, 0, MPI_COMM_WORLD);
    }
```

.codelink[
`13-mpi/send-recv.c`
]

The rest of the code follows on the next slide

???

- This is the first part of the code
- Remember, this code will be executed by all processes of our application
- We start by getting the total number of processes and our rank
- Then the following block of code is only executed by one process, the one with rank 0
- We initialise the counter and pick a target process at random
- The while loop makes sure that 0 does not send the message to itself
- Then we send the message: we pass a pointer to the payload, specify that we send one element, that it is an integer, give the rank of the target process, and specify the MPI_COMM_WORLD communicator which corresponds to all the processes in the system

---

```c
    while(1) {
        int payload;
        MPI_Recv(&payload, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

        if(payload == -1) {
            printf("[%d] received stop signal, exiting\n", my_rank);
            break;
        }

        if(payload == BOUNCE_LIMIT) {
            int stop_payload = -1;
            printf("[%d] payload reached %d, broadcasting stop signal\n",
                   my_rank, payload);
            for(int i=0; i<world_size; i++)
                if(i != my_rank)
                    MPI_Send(&stop_payload, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
            break;
        }

        payload++;
        int target_rank = my_rank;
        while(target_rank == my_rank)
            target_rank = rand()%world_size;

        printf("[%d] sent int payload %d to %d\n", my_rank, payload, target_rank);
        MPI_Send(&payload, 1, MPI_INT, target_rank, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
}
```

.codelink[
`13-mpi/send-recv.c`
]

???

- The rest of the code is executed by all processes
- We enter an infinite loop
- We start by waiting for a message reception with `MPI_Recv`
- We specify the address of a buffer that will receive the payload value, and that it will be a single int element, coming from any process
- When the payload reaches a certain limit we want to exit the application, so we broadcast a special payload of -1 to all processes
- Upon reception of -1 as the payload, each process exits the infinite loop
- In normal operation, a process receiving the payload increments it and chooses a random process to send it to

---

# Collective Operations

- `MPI_Barrier`: barrier synchronisation
- `MPI_Bcast`: broadcast
- `MPI_Scatter`: broadcast parts of a single piece of data to different processes
- `MPI_Gather`: inverse of ‘scatter’
- `MPI_Reduce`: a reduction operation
- etc.

???

- In addition to send and receive you have various collective operations
- They allow you to easily create barriers
- Or to broadcast a message to multiple processes
- While broadcast sends the same piece of data to multiple processes, scatter breaks a piece of data into parts and sends each part to a different process
- Gather is the inverse of scatter and aggregates elements sent from multiple processes into a single piece of data
- We also have Reduce, which combines values sent by different processes with a reduction operation, for example an addition
- And so on
- Using such operations, many programs make little use of the more primitive Send and Recv operations
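- As a rough sketch of how these look in practice (again, not one of the course's example files), the program below has rank 0 broadcast a value to every process, then sums the ranks of all processes into a single result on rank 0 with a reduction

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    MPI_Init(NULL, NULL);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Rank 0 chooses a value, MPI_Bcast copies it to every other process */
    int value = (rank == 0) ? 100 : 0;
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* Every process contributes its own rank, rank 0 receives the sum */
    int contribution = rank, sum = 0;
    MPI_Reduce(&contribution, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if(rank == 0)
        printf("broadcast value: %d, sum of ranks 0..%d: %d\n",
               value, size - 1, sum);

    MPI_Finalize();
}
```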
---

# Other MPI Features

- Asynchronous (non-blocking) communication: `MPI_Isend()` and `MPI_Irecv()`
  - Combined with `MPI_Wait()` and/or `MPI_Test()`
- Persistent communication
- Modes of communication
  - Standard
  - Synchronous
  - Buffered
- “One-sided” messaging
  - `MPI_Put()`, `MPI_Get()`

???

- MPI also has other features, including non-blocking communications with Isend and Irecv
- These functions do not wait for the request to complete and return directly, so in general they are used in combination with MPI_Test, to check the completion of a request, and MPI_Wait, to wait for its completion
- You also have a feature called persistent communication, which is useful to define messages that are sent repeatedly with the same arguments
- Furthermore, there are 3 communication modes:
- With the standard one, a call to send returns when the message has been sent but not necessarily yet received
- With the synchronous one, a call to send returns when the message has been received
- And with the buffered one, the send returns even before the message is sent, something that will be done later
- Finally, MPI also allows a process to directly read and write the memory of another process using get and put
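- To give an idea of the non-blocking interface, in the sketch below (not taken from the course's example files) each process posts a receive and a send that return immediately, could do other work while the messages are in flight, and then waits for both operations to complete

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    MPI_Init(NULL, NULL);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each process sends its rank to the next one and receives from the
       previous one, forming a ring */
    int next = (rank + 1) % size, prev = (rank + size - 1) % size;
    int out = rank, in = -1;
    MPI_Request reqs[2];

    MPI_Irecv(&in, 1, MPI_INT, prev, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(&out, 1, MPI_INT, next, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... useful computation could overlap with the communication here ... */

    /* Block until both operations have completed */
    MPI_Wait(&reqs[0], MPI_STATUS_IGNORE);
    MPI_Wait(&reqs[1], MPI_STATUS_IGNORE);

    printf("[%d] received %d from rank %d\n", rank, in, prev);

    MPI_Finalize();
}
```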
---

# Summary

- MPI: standard to develop parallel applications
  - Exploits parallelism **beyond the boundaries of a single physical machine**
  - Widely used
  - Cost: significant effort to rewrite legacy serial applications with MPI
- Guide to run applications on a cluster: https://mpitutorial.com/tutorials/running-an-mpi-cluster-within-a-lan/
- Coming up: **OpenMP**
  - A framework to parallelise shared memory applications with as little effort as possible

???

- To conclude, MPI is a widely used standard to develop parallel applications
- Contrary to shared memory programming, it allows such applications to exploit the parallelism offered by clusters, in other words to go beyond the core limit of a single physical machine
- However, this comes at a cost: when porting legacy applications, there is a significant engineering effort needed to redesign them with MPI in mind
- Also note that the examples we saw here ran multiple processes on a single machine
- If you want a basic guide on how to run an MPI application on a cluster of machines, check out the link on the slide
- Soon, we'll cover OpenMP
- It's a framework for shared memory programming, and one of its goals is to make parallelising existing applications as simple as possible