COMP35112 - MESI/MOESI Cache Coherence

class: center, middle
background-image: url(include/title-background.svg)
# COMP35112 Chip Multiprocessors
<br/>
<br/>
# .white[MESI/MOESI Cache Coherence]

.white[Pierre Olivier]

???

- Hello everyone
- In the previous lecture we covered the MSI cache coherence protocol
- In this short video we are going to see a few optimisations implemented
  on top of MSI
- These optimisations give us two new protocols, MESI and MOESI

---
# Unnecessary Communication

- Bus: critical shared resource, unnecessary use impacts performance

???

- So in a bus-based cache coherence system, the bus itself is a critical
  resource because it is shared by all cores
- Only one component can use the bus at a time, so unnecessary use of the bus
  is a waste of time and impacts performance
- In some scenarios, MSI can send a lot of unnecessary requests on the bus

???
- Take for example the following case
- One core, core 2, has the data in the shared state
- And all other cores have the data in the invalid state
- If core 2 writes to the data, its copy goes from shared to modified while
  every other copy stays invalid
- MSI would still broadcast an invalidate request on the bus to all cores,
  even though none of them holds a copy

--
-----
<div style="text-align:center"><img src="include/unnecessary-comm-2.svg" width=700 /></div>

???

- However in another scenario, for example when all cores have the data in the
  shared state, and there is a write on any of these cores
- The broadcast is actually needed
- The central problem is that with MSI the core that writes does not know
  the status of the data on the other cores, so it blindly broadcasts an
  invalidate message
- How can we differentiate these cases?
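
The two scenarios above can be sketched in a few lines of Python. This is a minimal illustration, not a full MSI implementation: the `MSI` enum, the `msi_write` helper and the four-core layout are all hypothetical names chosen for this example, and each broadcast is counted as a single bus message.

```python
from enum import Enum

class MSI(Enum):
    M = "modified"
    S = "shared"
    I = "invalid"

def msi_write(states, writer):
    """Apply a write by core `writer`; return (new_states, bus_messages)."""
    new_states = list(states)
    bus_messages = 0
    if states[writer] is not MSI.M:
        # MSI cannot tell whether other copies exist, so it always broadcasts.
        bus_messages += 1
        for core in range(len(new_states)):
            if core != writer:
                new_states[core] = MSI.I
    new_states[writer] = MSI.M
    return new_states, bus_messages

# Scenario 1: only core 2 holds the line, so the broadcast is wasted.
lonely = [MSI.I, MSI.I, MSI.S, MSI.I]
# Scenario 2: every core holds the line, so the broadcast is needed.
shared = [MSI.S, MSI.S, MSI.S, MSI.S]

lonely_after, lonely_msgs = msi_write(lonely, writer=2)
shared_after, shared_msgs = msi_write(shared, writer=2)
print(lonely_msgs, shared_msgs)  # 1 1
```

Both scenarios cost the same bus traffic, which is exactly the information MSI is missing.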

---
# Optimising for Non-Shared Values

- Distinguish between the two shared cases

???

- We need to distinguish between the two shared cases:
  - In the first case a cache holds the only copy of a value which is in
    sync with memory (in other words it is not modified)
  - In the second case a cache holds a copy of the value which is in sync
    with memory and there are also other copies in other caches
- In the first case we do not need to send an invalidate on write, whereas in
  the second an invalidate is needed

---
# MESI Protocol

- The unshared case is very common
- **Split the S state into:**
  - **E: exclusive**
      - Switch to E after a read causing a fetch from memory
  - **S: (truly) shared**
      - Switch to S after a read that gets value from another cache

???

- So the unshared case is actually very common
- In real applications, the majority of variables are unshared, for example all
  of a thread's local variables
- With MESI we therefore split the S state into two states that correspond to
  the two cases we have seen on the previous slide:
  - **E, which stands for exclusive**, in which a cache has the only copy in
    sync with memory
  - and **S, which stands for truly shared**
- The two states are easy to determine
- We switch to E after a read causes a fetch from memory, as shown on the
  left-hand side of the picture
- And we switch to S after a read that gets the value from another cache, as
  shown on the right-hand side of the picture
- MESI is a simple extension but it yields a **significant reduction in bus
  usage**
- Therefore in practice MESI is more widely used than MSI
- We won't cover details here
- A notable point is that a cache line eviction on a remote core can leave a
  line in state S on the local core as the only remaining copy
- We should theoretically switch to E, but in practice this is hard to detect,
  so we stay in S
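
As a rough sketch of how the E/S split removes the wasted broadcast, the Python below models one cache line across several cores. The names `MESI`, `mesi_read` and `mesi_write` are hypothetical, each broadcast counts as one bus message, and a remote modified copy is simply assumed to be written back and downgraded to S when another core reads it.

```python
from enum import Enum

class MESI(Enum):
    M = "modified"
    E = "exclusive"
    S = "shared"
    I = "invalid"

def mesi_read(states, reader):
    """Read by core `reader`; returns the new list of states."""
    new_states = list(states)
    if states[reader] is not MESI.I:
        return new_states  # read hit: no state change
    others_have_copy = any(
        s is not MESI.I for core, s in enumerate(states) if core != reader
    )
    if others_have_copy:
        # Value supplied by another cache: every valid copy becomes S.
        # (Assumption: a previously modified copy is written back here.)
        new_states = [MESI.I if s is MESI.I else MESI.S for s in states]
        new_states[reader] = MESI.S
    else:
        # Value fetched from memory: this is the only copy.
        new_states[reader] = MESI.E
    return new_states

def mesi_write(states, writer):
    """Write by core `writer`; returns (new_states, bus_messages)."""
    new_states = list(states)
    # M and E guarantee no other copy exists, so the write stays off the bus.
    bus_messages = 0 if states[writer] in (MESI.M, MESI.E) else 1
    if bus_messages:
        new_states = [MESI.I] * len(states)
    new_states[writer] = MESI.M
    return new_states, bus_messages

caches = [MESI.I] * 4
caches = mesi_read(caches, reader=2)       # fetch from memory -> E
after_write, msgs = mesi_write(caches, 2)  # silent E -> M upgrade
print(caches[2].name, msgs)  # E 0
```

The write from E costs no bus message, whereas the same write from S would still broadcast an invalidate.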

---
# MOESI Protocol

- Split the M state into two: **Modified** and **Owned**

???
- MOESI is a further optimisation in which we split the M state into two
- First M, for modified
- It is the same as before, the cache contains a copy which differs from that
  in memory but there are no other copies
- Then we have O for owned
- It means that the cache contains a copy which differs from that in memory and
  there may be copies in other caches which are in state S
- These shared copies have the same value as the owner's copy

---
# MOESI Protocol

- Owner has exclusive rights to make changes
  - Broadcast the changes to the shared copies
      - **No memory writeback needed**
  - Writeback only when data in O or M is evicted

???
- This allows the latest value to be shared without having to write it back to
  memory immediately
- The owner is the only one that can make changes without sending an invalidate
  message
- When it writes, the owner broadcasts the changes to the other copies, without
  the need for a costly writeback to memory
- Only when a cache line in state O or M gets evicted will any write back to
  memory be done
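
The writeback behaviour described above can be sketched as two small transition functions. This is an illustrative model under stated assumptions, not a complete MOESI state machine: `MOESI`, `remote_read` and `evict` are hypothetical names, only the local view of one line is modelled, and owner writes are left out.

```python
from enum import Enum

class MOESI(Enum):
    M = "modified"
    O = "owned"
    E = "exclusive"
    S = "shared"
    I = "invalid"

def remote_read(state):
    """Local line after another core reads it; returns (state, writeback)."""
    if state in (MOESI.M, MOESI.O):
        # Supply the dirty value cache-to-cache and keep ownership:
        # memory is not updated at this point.
        return MOESI.O, False
    if state is MOESI.E:
        return MOESI.S, False
    return state, False

def evict(state):
    """Evict a line; returns (new_state, writeback_needed)."""
    # Only M and O hold the up-to-date value that memory lacks.
    return MOESI.I, state in (MOESI.M, MOESI.O)

state, wb_on_share = remote_read(MOESI.M)
_, wb_on_evict = evict(state)
print(state.name, wb_on_share, wb_on_evict)  # O False True
```

Sharing a modified line costs no memory writeback; the writeback is deferred until the owner evicts the line.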

---
# Summary

- Bus-based cache systems
  - can improve performance by reducing bus usage
  - 2 optimisations: MESI, MOESI
- Still, bus-based systems can't scale to large multiprocessor counts
- Next video: directory-based coherence systems

???
- To summarise, in this video we saw that we can improve the performance of
  bus-based cache coherency systems by reducing the bus usage
- This is done by introducing more states in the protocol
- We saw two examples, MESI and MOESI
- Even with such optimisations, bus-based coherency cannot scale to a large
  number of compute units
- In the next lecture we will see a radically different approach, which is
  directory-based cache coherency