class: center, middle
background-image: url(include/title-background.svg)

# COMP35112 Chip Multiprocessors
# .white[MESI/MOESI Cache Coherence]

.white[Pierre Olivier]

???
- Hello everyone
- In the previous lecture we covered the MSI cache coherence protocol
- In this short video we are going to see a few optimisations implemented on top of MSI
- These optimisations give us two new protocols, MESI and MOESI

---

# Unnecessary Communication

- Bus: critical shared resource, unnecessary use impacts performance

???
- So in a bus-based cache coherence system, the bus itself is a critical resource because it is shared by all cores
- Only one component can use the bus at a time, so unnecessary use of the bus is a waste of time and impacts performance
- In some scenarios, MSI can send a lot of unnecessary requests on the bus

--
???
- Take for example the following case
- One core, core 2, has the data in the shared state
- And all other cores have the data in the invalid state
- If there is a write on core 2 we transition from invalid-invalid-invalid shared to invalid-invalid-invalid modified
- MSI would still broadcast an invalidate request on the bus to all cores even though it is unnecessary

--

-----
???
- However in another scenario, for example when all cores have the data in the shared state, and there is a write on any of these cores
- The broadcast is actually needed
- The central problem is that with MSI the core that writes does not know the status of the data on the other cores, so it blindly broadcasts an invalidate message
- How can we differentiate these cases?

---

# Optimising for Non-Shared Values

- Distinguish between the two shared cases
???
- We need to distinguish between the two shared cases:
  - In the first case a cache holds the only copy of a value which is in sync with memory (in other words it is not modified)
  - In the second case a cache holds a copy of the value which is in sync with memory and there are also other copies in other caches
- In the first case we do not need to send an invalidate on write whereas in the second an invalidate is needed

---

# MESI Protocol

- The unshared case is very common
- **Split the S state into:**
  - **E: exclusive**
      - Switch to E after a read causing a fetch from memory
  - **S: (truly) shared**
      - Switch to S after a read that gets value from another cache
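
As a rough illustration of why splitting S into E and S saves bus traffic, the following Python sketch models the state of a single cache line on one core. The names `State`, `read_miss_state`, `write_hit` and `invalidates_for_private_variable` are made up for this example, and the model deliberately ignores everything else in the protocol (remote requests, evictions, write misses).

```python
from enum import Enum
from typing import Tuple


class State(Enum):
    M = "modified"
    E = "exclusive"
    S = "shared"
    I = "invalid"


def read_miss_state(other_caches_have_copy: bool, protocol: str) -> State:
    """State a line enters after a read miss (simplified)."""
    if protocol == "MESI" and not other_caches_have_copy:
        return State.E  # value fetched from memory, no other cached copy
    # MSI has no E state; MESI uses S when another cache supplied the value
    return State.S


def write_hit(state: State) -> Tuple[State, bool]:
    """Return (new_state, invalidate_broadcast_needed) for a write hit."""
    if state is State.S:
        return State.M, True    # other copies may exist: must broadcast
    if state in (State.E, State.M):
        return State.M, False   # known to be the only copy: silent upgrade
    raise ValueError("a write to an invalid line is a miss, not a hit")


def invalidates_for_private_variable(protocol: str) -> int:
    """Invalidate broadcasts caused by reading then writing an unshared value."""
    state = read_miss_state(other_caches_have_copy=False, protocol=protocol)
    _, broadcast = write_hit(state)
    return int(broadcast)
```

Under these assumptions, a thread-local variable that is read and then written costs one invalidate broadcast with `"MSI"` and none with `"MESI"`, which is the saving the E state is meant to capture.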
???
- So the unshared case is actually very common
- In real applications, the majority of variables are unshared, for example all the threads' local variables
- With MESI we therefore split the S state into two states that correspond to the two cases we have seen on the previous slide:
  - **E, which stands for exclusive**, in which a cache has the only copy in sync with memory
  - and **S, which stands for truly shared**
- The two states are easy to determine
- We switch to E after a read causes a fetch from memory, as shown on the picture on the left
- And we switch to S after a read that gets the value from another cache, as shown on the right hand side of the picture
- MESI is a simple extension but it yields a **significant reduction in bus usage**
- Therefore in practice MESI is more widely used than MSI
- We won't cover details here
- A notable point is that a cache line eviction on a remote core can leave a line in state S on the local core as the only remaining copy
- We should theoretically switch to E but in practice it's hard to detect, so we stay in S

---

# MOESI Protocol
- Split the M state into two: **Modified** and **Owned**

???
- MOESI is a further optimisation in which we split the M state into two
- First M, for modified
  - It is the same as before: the cache contains a copy which differs from that in memory but there are no other copies
- Then we have O for owned
  - It means that the cache contains a copy which differs from that in memory and there may be copies in other caches which are in state S
  - and these have the same value as the owner

---

# MOESI Protocol
- Owner has exclusive rights to make changes
- Broadcast the changes to the shared copies
- **No memory writeback needed**
- Writeback only when data in O or M is evicted

???
- This allows the latest value to be shared without having to write it back to memory immediately
- The owner is the only one that can make changes without sending an invalidate message
- When it writes, the owner broadcasts the changes to the other copies, without the need for a costly writeback to memory
- Only when a cache line in state O or M gets evicted will any write back to memory be done

---

# Summary

- Bus-based cache systems
  - can improve performance by reducing bus usage
  - 2 optimisations: MESI, MOESI
- Still, bus-based systems can't scale to large multiprocessor counts
- Next video: directory-based coherence systems

???
- To summarise, in this video we saw that we can improve the performance of bus-based cache coherency systems by reducing the bus usage
- This is done by introducing more states in the protocol
- We saw two examples, MESI and MOESI
- Even with such optimisations, bus-based coherency cannot scale to a large number of compute units
- In the next lecture we will see a radically different approach, which is directory-based cache coherency
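
To make the MOESI writeback behaviour from the earlier slides concrete, here is a minimal Python sketch of one cache line tracked across several cores. It follows the description given in these slides (the owner broadcasts its changes to the shared copies, and memory is only updated when a line in O or M is evicted); it is a toy model, not a description of any particular hardware. The class `ToyMoesiLine` and its methods are hypothetical names, and writes by a core that holds the line in S or I while other copies exist are left out of scope.

```python
class ToyMoesiLine:
    """One cache line across cores, loosely following the slides' MOESI rules."""

    def __init__(self, memory_value):
        self.memory = memory_value
        self.copies = {}      # core id -> (state, value); absent means Invalid
        self.writebacks = 0   # how many times memory was updated

    def _owner(self):
        # The core holding the line in M or O, if any
        for core, (state, _) in self.copies.items():
            if state in ("M", "O"):
                return core
        return None

    def read(self, core):
        if core in self.copies:               # read hit
            return self.copies[core][1]
        owner = self._owner()
        if owner is not None:
            value = self.copies[owner][1]
            self.copies[owner] = ("O", value)  # M -> O, no memory writeback
            self.copies[core] = ("S", value)
        elif self.copies:
            value = self.memory                # clean copies match memory
            for other in self.copies:
                self.copies[other] = ("S", value)
            self.copies[core] = ("S", value)
        else:
            value = self.memory
            self.copies[core] = ("E", value)   # only copy, in sync with memory
        return value

    def write(self, core, value):
        state = self.copies.get(core, ("I", None))[0]
        others = [c for c in self.copies if c != core]
        if state == "O":
            for other in others:               # owner broadcasts the new value
                self.copies[other] = ("S", value)
            self.copies[core] = ("O", value)
        elif state in ("M", "E") or not others:
            self.copies[core] = ("M", value)   # sole copy: no bus traffic modelled
        else:
            raise NotImplementedError("case not covered by this sketch")

    def evict(self, core):
        state, value = self.copies.pop(core)
        if state in ("M", "O"):                # only dirty lines are written back
            self.memory = value
            self.writebacks += 1
```

For example, after `line = ToyMoesiLine(0)`, the sequence `line.write(0, 1)`, `line.read(1)`, `line.write(0, 2)` leaves core 1 reading the value 2 while `line.memory` is still 0; memory is only updated once `line.evict(0)` is called.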