class: center, middle
background-image: url(include/title-background.svg)

# COMP35112 Chip Multiprocessors
# .white[Shared Memory Multiprocessors]

.white[Pierre Olivier]

???
- Hello everyone
- In the previous lecture we introduced how to program with threads that share memory for communication
- In this video, we are going to talk about how the hardware is set up to ensure that threads running on different cores can share memory by seeing a common address space
- In particular, we will introduce the issue of cache coherency on multicore processors

---
# Multiprocessor Structure

.leftcol[
]
.rightcol[
.medium[
- Most general-purpose multiprocessors are shared memory
  - **easier to program**, however the hardware is **more complex**
- Let's study the scalability issues of cache coherence systems
  - Focusing on bus-based ones
]]

???
- The majority of general-purpose multiprocessors are shared memory
- In this model all the cores have a unified view of memory: for example, here they can all read and write the data at address x in a coherent way
- This is in contrast to distributed memory systems, where each core or processor has its own local memory and does not necessarily have direct and coherent access to other processors' memory
- Shared memory multiprocessors are thus dominant because they are easier to program
- However, shared memory hardware is usually more complex, which leads to a particular problem: bus-based cache coherency systems do not really scale beyond a certain number of cores
- Intel, AMD, and others have developed solutions such as Intel QuickPath and UltraPath Interconnect, as well as AMD's coherent HyperTransport
- In this video we will introduce the problem of cache coherency in shared memory multiprocessor systems, focusing first on bus-based systems
- And in the next lecture we'll see how it is managed
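- As a concrete software-side view of this shared address space, here is a minimal sketch (the thread name and values are made up) where a store by one POSIX thread is observed by another through the single shared variable x:

```c
#include <pthread.h>
#include <stdio.h>

int x = 0;  /* one variable, one address, visible from every core */

static void *writer(void *arg) {
    (void)arg;
    x = 42;  /* store performed on whichever core runs this thread */
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, writer, NULL);
    pthread_join(&t, NULL);  /* join orders the store before the load below */
    printf("x = %d\n", x);   /* prints 42, whichever cores were involved */
    return 0;
}
```

- Build with `gcc -pthread`; it is the hardware's job, discussed in this lecture, to make this single coherent view of x hold across cores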
---
# Caches

- A high performance uniprocessor:
  - Cache: **fast** and **small** local memory holding recently used data and instructions

???
- Recall that a high performance uniprocessor has the following structure
- Main memory is far too slow to keep up with modern processor speeds: it can take up to hundreds of cycles to access, versus the CPU registers, which are accessed near-instantaneously
- So another type of on-chip memory is introduced: the cache
- It is much faster than main memory, being accessed in a few cycles
- It is also expensive, so its size is relatively small; the cache is thus used to hold a subset of the program's data and instructions
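- A back-of-the-envelope average memory access time (AMAT) calculation shows why this helps; the numbers below are illustrative assumptions, not measurements:
  - AMAT = hit time + miss rate × miss penalty
  - With a 2-cycle cache hit, a 2% miss rate and a 200-cycle miss penalty: AMAT = 2 + 0.02 × 200 = 6 cycles, against roughly 200 cycles if every access went to main memory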
---
# Caches

- A high performance uniprocessor:
  - There may be multiple levels (L1, L2, L3)

???
- The cache can have multiple levels: generally, in multiprocessors, we have level 1 and sometimes level 2 caches that are local to each core, and a shared last-level cache
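- On Linux with glibc you can query these per-level sizes (a sketch relying on glibc-specific `sysconf` extensions, so it will not build everywhere):

```c
#include <stdio.h>
#include <unistd.h>  /* _SC_LEVEL*_CACHE_SIZE are glibc extensions */

int main(void) {
    /* Sizes in bytes; sysconf() returns 0 or -1 if a level is absent. */
    printf("L1d: %ld\n", sysconf(_SC_LEVEL1_DCACHE_SIZE));
    printf("L2:  %ld\n", sysconf(_SC_LEVEL2_CACHE_SIZE));
    printf("L3:  %ld\n", sysconf(_SC_LEVEL3_CACHE_SIZE));
    return 0;
}
```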
---
# Caches

- A high performance uniprocessor:

???
- If an entire program's data set can fit in the cache, the CPU can run at full speed
- However, this is rarely the case for modern applications, and new data/instructions needed by the program have to be fetched from memory (on each cache miss)
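- A rough way to see the cost of misses from software is to stream over a cache-resident buffer and a much larger one while doing the same total work (a sketch; the buffer sizes are assumptions about a typical CPU, and the exact gap varies a lot with the machine and its prefetchers):

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Sum over a buffer of n longs, reps times, and return elapsed seconds. */
static double sweep(const long *buf, size_t n, size_t reps) {
    volatile long sum = 0;  /* volatile keeps the loop from being optimised away */
    clock_t t0 = clock();
    for (size_t r = 0; r < reps; r++)
        for (size_t i = 0; i < n; i++)
            sum += buf[i];
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

int main(void) {
    const size_t total = 1UL << 28;                   /* same work for both runs */
    size_t small = (16 * 1024) / sizeof(long);        /* ~16 KiB: cache-resident */
    size_t big = (64UL * 1024 * 1024) / sizeof(long); /* ~64 MiB: mostly misses  */
    long *a = calloc(big, sizeof(long));
    if (!a) return 1;
    printf("small buffer: %.2fs\n", sweep(a, small, total / small));
    printf("large buffer: %.2fs\n", sweep(a, big, total / big));
    free(a);
    return 0;
}
```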
---
# Caches

- A high performance uniprocessor:

???
- Also, newly written data in the cache must eventually be written back to main memory
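- A toy model of that bookkeeping (purely illustrative, not how hardware is built): a cached line carries a dirty bit, and eviction triggers a write-back only if the line was modified:

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy single-line "cache" illustrating the dirty bit / write-back idea. */
struct line { int tag; int data; bool valid; bool dirty; };

static int memory[1024];  /* stand-in for main memory */

static void cache_evict(struct line *l) {
    if (l->valid && l->dirty) {  /* write back only if the line was modified */
        memory[l->tag] = l->data;
        printf("write-back: mem[%d] = %d\n", l->tag, l->data);
    }
    l->valid = false;
}

static void cache_write(struct line *l, int addr, int value) {
    if (!l->valid || l->tag != addr) {  /* miss: evict the old line, refill */
        cache_evict(l);
        l->tag = addr; l->data = memory[addr]; l->valid = true;
    }
    l->data = value;
    l->dirty = true;  /* main memory is now stale until the write-back */
}

int main(void) {
    struct line l = {0, 0, false, false};
    cache_write(&l, 3, 42);  /* write hits the cache; memory[3] is still 0 */
    cache_write(&l, 7, 99);  /* conflict forces the write-back of line 3 */
    cache_evict(&l);         /* flush: memory[7] becomes 99 */
    return 0;
}
```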
---
# The Cache Coherency Problem

- What happens with multiprocessors?

???
- With just one CPU there is no problem: data just written to the cache can be read back correctly, whether or not it has been written to memory
- But things get more complicated when we have multiple processors
- Indeed, several CPUs may share data, i.e. one can write a value that another needs to read
- How does that work with the cache?

---
# The Cache Coherency Problem

- What happens with multiprocessors?
???
- So consider the following situation
- We have a data item x in RAM
- CPU A first reads it, then updates it in its own cache to x'
- Later, CPU B wishes to read the same data
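- To make the failure concrete, here is a toy simulation of this exact scenario (made-up values; per-core cached copies are modelled as plain variables): with write-back caches, CPU A's update stays in its cache, so CPU B reads a stale value from RAM:

```c
#include <stdio.h>

/* Toy model: one shared memory word plus one private cached copy per CPU. */
int mem_x = 5;        /* x in RAM */
int cache_a, cache_b; /* each CPU's cached copy of x */

int main(void) {
    cache_a = mem_x;  /* CPU A reads x: its cache now holds 5 */
    cache_a = 7;      /* CPU A writes x' = 7; with write-back, RAM is untouched */
    cache_b = mem_x;  /* CPU B reads x and gets it from RAM... */
    printf("A sees %d, B sees %d\n", cache_a, cache_b); /* A sees 7, B sees 5 */
    return 0;
}
```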
---
# The Cache Coherency Problem

.leftcol[
- Apparently obvious solution: 'write through' policy?
  - Every write is updated in memory
  - Involves a lot of memory accesses, **negating cache benefits**
]
.rightcol[
]

???
- An apparently obvious solution would be to ensure that every write is updated in memory
- That's a write-through cache policy
- However, this would mean that every time we write, we need to write to memory
- And every time we read, we also need to fetch from memory, in case the data was updated
- This is very slow and negates the cache benefits, so it's not a good idea

---
# The Cache Coherency Problem

- **Cache-to-cache communication**?
- How to avoid separate cache copies, i.e. **how to maintain cache coherency?**
- It gets complex, we need to develop a model
  - Topic of the next lecture

???
- So how can we overcome these issues?
- Can we communicate cache-to-cache rather than always go through memory?
- In other words, when a new value is written in one cache, all copies located in other caches would need to be either updated or invalidated
- Another issue is: what if two processors try to write to the same location?
- In other words, how do we avoid having two separate cache copies?
- This is what we refer to as cache coherency
- So things are getting complex and we need to develop a model
- How to efficiently achieve cache coherency in a shared memory multiprocessor is the topic of the next lecture
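- As a small preview of where that model goes, here is a toy extension of the earlier sketch (illustrative only; real protocols track states per cache line) in which a write invalidates the other cache's copy, so the stale read becomes a miss that fetches the up-to-date value:

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy invalidation scheme: a write marks the other cached copy invalid,
   and a read that misses is serviced with the most recent value. */
int mem_x = 5;
struct copy { int val; bool valid; } a = {0, false}, b = {0, false};

static void write_a(int v) {
    a.val = v; a.valid = true;
    b.valid = false;  /* invalidate the other cache's copy */
}

static int read_b(void) {
    if (!b.valid) {   /* miss: refetch, cache-to-cache if A holds the line */
        b.val = a.valid ? a.val : mem_x;
        b.valid = true;
    }
    return b.val;
}

int main(void) {
    b.val = mem_x; b.valid = true;  /* CPU B cached x = 5 earlier */
    write_a(7);                     /* CPU A writes x' = 7, invalidating B */
    printf("B now reads %d\n", read_b()); /* 7, not the stale 5 */
    return 0;
}
```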