CPU caches with examples for ARM Cortex-M

CPU data cache modes

  • Write-back. On a write, data is stored only in the cache. The actual write to memory is deferred until the cache line has to be evicted to make room for new data.
  • Write-through. Writes to the cache and to memory occur “simultaneously”. This mode has its own drawbacks:
      • Frequent sequential accesses to the same memory address can degrade performance.
      • You still need to invalidate the cache after a DMA operation completes.
      • Some versions of Cortex-M7 have the “Data corruption in a sequence of Write-Through stores and loads” bug.
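
The difference between the two modes can be illustrated with a toy model. The sketch below is not how the Cortex-M7 cache is implemented: it assumes a cache holding a single 32-byte line and counts only the write transactions that reach memory. All names (toy_cache, toy_store, toy_repeated_stores) are hypothetical and chosen for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LINE_SIZE 32u /* assumed line size for the toy model */
static_assert((LINE_SIZE & (LINE_SIZE - 1u)) == 0,
              "LINE_SIZE must be a power of two");

enum write_policy { WRITE_BACK, WRITE_THROUGH };

/* Toy model: a cache holding a single line; only memory writes are counted. */
struct toy_cache {
    enum write_policy policy;
    bool valid;
    bool dirty;
    uint32_t line_addr;  /* line-aligned address of the cached line */
    unsigned mem_writes; /* write transactions that reached memory */
};

/* Evict the line; a dirty line costs one write of the line to memory. */
void toy_evict(struct toy_cache *c)
{
    if (c->valid && c->dirty)
        c->mem_writes++;
    c->valid = false;
    c->dirty = false;
}

/* Store to addr; allocation policy is ignored in this simplified model. */
void toy_store(struct toy_cache *c, uint32_t addr)
{
    uint32_t line_addr = addr & ~(LINE_SIZE - 1u);

    if (!c->valid || c->line_addr != line_addr) {
        toy_evict(c);
        c->valid = true;
        c->line_addr = line_addr;
    }
    if (c->policy == WRITE_THROUGH)
        c->mem_writes++; /* every store also goes to memory */
    else
        c->dirty = true; /* memory is updated only on eviction */
}

/* Store n times to the same address, then evict; return memory writes. */
unsigned toy_repeated_stores(enum write_policy policy, unsigned n)
{
    struct toy_cache c = { .policy = policy };

    for (unsigned i = 0; i < n; i++)
        toy_store(&c, 0x60100000u);
    toy_evict(&c);
    return c.mem_writes;
}
```

In this model, toy_repeated_stores(WRITE_BACK, 1000) reports one memory write, while toy_repeated_stores(WRITE_THROUGH, 1000) reports a thousand, which is why repeated stores to one address are the unfavourable case for write-through.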

How to set a cache mode in ARM Cortex-M?
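
On ARMv7-M cores such as Cortex-M7, the cache policy of a memory region is normally selected through the TEX, C and B bits of the MPU region attribute register (MPU_RASR). The sketch below only computes the register value for each mode, so it can be checked on a host machine. The helper names (rasr_value, enum cache_mode) are hypothetical, and full access permissions are an assumption. Whether the tests in this article were configured this way is not stated in the text.

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7-M MPU_RASR fields used in this sketch */
#define RASR_ENABLE   (1u << 0)
#define RASR_SIZE(n)  ((uint32_t)(n) << 1)  /* region is 2^(n+1) bytes */
#define RASR_B        (1u << 16)
#define RASR_C        (1u << 17)
#define RASR_TEX(n)   ((uint32_t)(n) << 19)
#define RASR_AP_FULL  (3u << 24)            /* read/write for all (assumed) */

enum cache_mode {
    MODE_NON_CACHEABLE,    /* TEX=001 C=0 B=0 */
    MODE_WRITE_THROUGH,    /* TEX=000 C=1 B=0, no write allocate */
    MODE_WRITE_BACK_NO_WA, /* TEX=000 C=1 B=1, no write allocate */
    MODE_WRITE_BACK_WA     /* TEX=001 C=1 B=1, read and write allocate */
};

/* TEX/C/B bits for normal memory with the requested cache policy. */
uint32_t rasr_cache_bits(enum cache_mode mode)
{
    switch (mode) {
    case MODE_NON_CACHEABLE:    return RASR_TEX(1);
    case MODE_WRITE_THROUGH:    return RASR_C;
    case MODE_WRITE_BACK_NO_WA: return RASR_C | RASR_B;
    case MODE_WRITE_BACK_WA:    return RASR_TEX(1) | RASR_C | RASR_B;
    }
    return 0;
}

/* RASR value for an enabled region of 2^size_log2 bytes. */
uint32_t rasr_value(enum cache_mode mode, unsigned size_log2)
{
    assert(size_log2 >= 5 && size_log2 <= 32); /* 32 bytes .. 4 GB */
    return RASR_AP_FULL | rasr_cache_bits(mode) |
           RASR_SIZE(size_log2 - 1) | RASR_ENABLE;
}
```

On a target, the resulting value would be written to MPU->RASR after selecting the region with MPU->RNR and MPU->RBAR (CMSIS register names), and the data cache itself would be switched on with SCB_EnableDCache(). For a 1 MB region, rasr_value(MODE_WRITE_BACK_WA, 20) gives 0x030B0027.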

Cache modes comparison. Tests.

Non-cacheable memory vs. write-back

#define ITERS     100
#define BLOCK_LEN 4096
#define BLOCKS    16
/* Just an arbitrary buffer in external SDRAM memory. */
#define DATA_ADDR 0x60100000

/* ... */

for (i = 0; i < ITERS * BLOCKS * 8; i++) {
    dst = (uint8_t *) DATA_ADDR;
    for (j = 0; j < BLOCK_LEN; j++) {
        val = VALUE;
        *dst = val;
        val = *dst;
        dst++;
    }
}

for (i = 0; i < ITERS * BLOCKS; i++) {
    for (j = 0; j < BLOCK_LEN; j++) {
        /* 16 lines */
        arr[i]++;
        arr[i]++;
        /* ... */
        arr[i]++;
    }
}

Non-cacheable: 4s 743ms
Write-back:    4s 187ms

for (i = 0; i < ITERS * BLOCKS; i++) {
    for (j = 0; j < BLOCK_LEN; j++) {
        arr[i + 0]++;
        /* ... */
        arr[i + 3]++;
        arr[i + 4]++;
        arr[i + 100]++;
        arr[i + 6]++;
        arr[i + 7]++;
        /* ... */
        arr[i + 15]++;
    }
}

Non-cacheable: 11s 371ms
Write-back:    4s 551ms

for (i = 0; i < ITERS * BLOCKS; i++) {
    for (j = 0; j < BLOCK_LEN; j++) {
        arr[i + 0]++;
        /* ... */
        arr[i + 4]++;
        arr[i + 100]++;
        arr[i + 6]++;
        /* ... */
        arr[i + 9]++;
        arr[i + 200]++;
        arr[i + 11]++;
        arr[i + 12]++;
        /* ... */
        arr[i + 15]++;
    }
}

Non-cacheable: 12s 62ms
Write-back:    4s 551ms

When is ‘write-allocate’ the better choice?

for (i = 0; i < ITERS * BLOCKS; i++) {
    for (j = 0; j < BLOCK_LEN; j++) {
        arr[j + 0] = VALUE;
        /* ... */
        arr[j + 7] = VALUE;
        arr[j + 8] = arr[i % 1024 + (j % 256) * 128];
        arr[j + 9] = VALUE;
        /* ... */
        arr[j + 15] = VALUE;
    }
}

Write-back:                   4s 720ms
Write-back no write allocate: 4s 888ms

When is ‘no-write-allocate’ the better choice?

for (i = 0; i < ITERS * BLOCKS; i++) {
    for (j = 0; j < BLOCK_LEN; j++) {
        arr_wr[i * BLOCK_LEN]              = arr_rd[j + 0];
        arr_wr[i * BLOCK_LEN + j*32 + 1]   = arr_rd[j + 1];
        arr_wr[i * BLOCK_LEN + j*64 + 2]   = arr_rd[j + 2];
        arr_wr[i * BLOCK_LEN + j*128 + 3]  = arr_rd[j + 3];
        arr_wr[i * BLOCK_LEN + j*32 + 4]   = arr_rd[j + 4];
        /* ... */
        arr_wr[i * BLOCK_LEN + j*32 + 15]  = arr_rd[j + 15];
    }
}

Write-back:                   7s 601ms
Write-back no write allocate: 7s 599ms
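
What the two allocation policies trade off can be sketched with another toy model. It assumes a direct-mapped cache of 128 lines of 32 bytes (the real Cortex-M7 cache is set-associative) and counts line fills and load hits only, so it says nothing about actual timings such as the ones measured above. The names alloc_cache, cache_store, cache_load and run_pattern are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LINE_SIZE 32u
#define NUM_LINES 128u /* toy direct-mapped cache: 128 lines x 32 bytes */

struct alloc_cache {
    bool write_allocate;
    bool valid[NUM_LINES];
    uint32_t line[NUM_LINES]; /* line number (addr / LINE_SIZE) per slot */
    unsigned line_fills;      /* lines fetched from memory on a miss */
    unsigned load_hits;
};

/* Returns true on a hit; on a miss, optionally fetches the line. */
bool cache_access(struct alloc_cache *c, uint32_t addr, bool allocate_on_miss)
{
    uint32_t line = addr / LINE_SIZE;
    uint32_t slot = line % NUM_LINES;

    if (c->valid[slot] && c->line[slot] == line)
        return true;
    if (allocate_on_miss) {
        c->valid[slot] = true;
        c->line[slot] = line;
        c->line_fills++;
    }
    return false;
}

/* A store miss fills a line only under the write-allocate policy. */
void cache_store(struct alloc_cache *c, uint32_t addr)
{
    (void)cache_access(c, addr, c->write_allocate);
}

/* A load miss always fills a line (read allocate). */
void cache_load(struct alloc_cache *c, uint32_t addr)
{
    if (cache_access(c, addr, true))
        c->load_hits++;
}

/* Store into n consecutive lines, then optionally load them back. */
struct alloc_cache run_pattern(bool write_allocate, unsigned n, bool read_back)
{
    struct alloc_cache c = { .write_allocate = write_allocate };

    assert(n <= NUM_LINES); /* keep the pattern free of conflict evictions */
    for (unsigned i = 0; i < n; i++)
        cache_store(&c, i * LINE_SIZE);
    if (read_back)
        for (unsigned i = 0; i < n; i++)
            cache_load(&c, i * LINE_SIZE);
    return c;
}
```

In this model, data that is written and then read back hits in the cache only with write-allocate, whereas data that is written and never read costs one line fill per line with write-allocate and none with no-write-allocate.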

Real-world applications with and without CPU caches

PING

Non-cacheable: ~0.246 sec
Write-back:    ~0.140 sec

OpenCV

gettimeofday(&tv_start, NULL);

cedge.create(image.size(), image.type());
cvtColor(image, gray, COLOR_BGR2GRAY);
blur(gray, edge, Size(3,3));
Canny(edge, edge, edgeThresh, edgeThresh*3, 3);
cedge = Scalar::all(0);
image.copyTo(cedge, edge);

gettimeofday(&tv_cur, NULL);
timersub(&tv_cur, &tv_start, &tv_cur);

Without caches:

> edges fruits.png 20
Processing time 0s 926ms
Framebuffer: 800x480 32bpp
Image: 512x269; Threshold=20

With caches:

> edges fruits.png 20
Processing time 0s 134ms
Framebuffer: 800x480 32bpp
Image: 512x269; Threshold=20
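
The measurement around the OpenCV calls boils down to subtracting two struct timeval values and printing seconds and milliseconds. The sketch below shows one way to do that in plain C. It performs the subtraction by hand instead of calling timersub so that it does not depend on feature-test macros. The helper names elapsed_between and format_elapsed are hypothetical, and the output string is only assumed to match the "Processing time" line shown above.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/time.h>

/* end - start, normalized so that 0 <= tv_usec < 1000000. */
struct timeval elapsed_between(struct timeval start, struct timeval end)
{
    struct timeval diff;

    diff.tv_sec = end.tv_sec - start.tv_sec;
    diff.tv_usec = end.tv_usec - start.tv_usec;
    if (diff.tv_usec < 0) { /* borrow one second */
        diff.tv_sec--;
        diff.tv_usec += 1000000;
    }
    return diff;
}

/* Writes e.g. "Processing time 0s 926ms"; returns the snprintf result. */
int format_elapsed(char *buf, size_t len, struct timeval diff)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "Processing time %lds %ldms",
                    (long)diff.tv_sec, (long)(diff.tv_usec / 1000));
}
```

A caller would take one gettimeofday() reading before the processing and one after it, pass both to elapsed_between(), and print the buffer filled by format_elapsed().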

Conclusion
