Intel® VTune™ Amplifier 2018 Help
This metric represents a fraction of cycles during which an application could be stalled due to approaching bandwidth limits of the main memory (DRAM). This metric does not aggregate requests from other threads/cores/sockets (see Uncore counters for that). Consider improving data locality in NUMA multi-socket systems.
A significant fraction of cycles was stalled due to approaching bandwidth limits of the main memory (DRAM).
Improve data accesses to reduce cacheline transfers from/to memory using these possible techniques:
Consume all bytes of each cacheline before it is evicted (for example, reorder structure elements and split non-hot ones).
Merge compute-limited and bandwidth-limited loops.
Use NUMA optimizations on a multi-socket system.
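As an illustration of the first technique above (consuming all bytes of each cacheline), here is a minimal C sketch of splitting rarely used fields out of a hot structure. The structure names, field names, and the 64-byte cacheline size are assumptions made for this example only; they do not come from the product documentation.

```c
#include <stddef.h>

#define CACHELINE 64  /* assumed cacheline size in bytes */

/* Mixed layout: the only field the hot loop reads shares each
 * cacheline with fields it never touches. */
struct particle_mixed {
    double position;       /* hot: read on every iteration */
    char   debug_name[48]; /* cold: rarely used */
    double created_at;     /* cold: rarely used */
};

/* Split layout: hot fields are packed densely, and the cold fields
 * move to a parallel array indexed the same way. */
struct particle_hot  { double position; };
struct particle_cold { char debug_name[48]; double created_at; };

/* Rough estimate of the cachelines streamed from memory when
 * scanning n contiguous elements of elem_size bytes each. */
size_t cachelines_touched(size_t n, size_t elem_size) {
    return (n * elem_size + CACHELINE - 1) / CACHELINE;
}

/* Hot loop over the split layout: every byte of each fetched
 * cacheline is consumed. */
double sum_positions(const struct particle_hot *hot, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += hot[i].position;
    return sum;
}
```

With these assumed layouts, a scan over the hot array touches far fewer cachelines than the same scan over the mixed array, because the cold bytes are no longer pulled through the memory bus. The actual benefit depends on how often the cold fields are really accessed.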
Note: software prefetches do not help a bandwidth-limited application.
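The loop-merging suggestion above can be sketched in C as follows. This is a hypothetical example, not code from the product documentation: the function names, the `poly` helper, and the `scale` parameter are invented for illustration. The first version streams a temporary array through memory between a bandwidth-limited loop and a compute-limited loop; the merged version keeps each intermediate value in a register, so the temporary array is never written or re-read.

```c
#include <stddef.h>

/* Stand-in for an arithmetic-heavy computation (hypothetical). */
static double poly(double x) {
    return ((x * 0.5 + 1.0) * x + 2.0) * x + 3.0;
}

/* Separate loops: tmp is written to memory by the first loop and
 * read back by the second. */
void separate_loops(const double *a, double *tmp, double *out,
                    size_t n, double scale) {
    for (size_t i = 0; i < n; i++)   /* bandwidth-limited */
        tmp[i] = a[i] * scale;
    for (size_t i = 0; i < n; i++)   /* compute-limited */
        out[i] = poly(tmp[i]);
}

/* Merged loop: the intermediate value stays in a register, and the
 * arithmetic overlaps with the streaming of a and out. */
void merged_loop(const double *a, double *out, size_t n, double scale) {
    for (size_t i = 0; i < n; i++) {
        double t = a[i] * scale;
        out[i] = poly(t);
    }
}
```

Both functions compute the same values in `out`. Whether merging actually pays off depends on the loop bodies, array sizes, and what the compiler already does, so it is worth re-profiling after the change.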