Cache Optimization
First of all, the kinds of cache misses:
1. Compulsory miss: The very first access to a block cannot be in the cache.
2. Capacity miss: The cache cannot contain all the blocks necessary during execution.
3. Conflict miss: Caused by the block placement strategy; too many blocks can map to the same set and evict each other.
Three directions for optimizing a cache: make more accesses hit, keep the hit time low when we do hit, and reduce the miss penalty even when a miss occurs.
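These three directions are exactly the three terms of the average memory access time (AMAT) formula from the standard textbook treatment (the formula and the example numbers below are mine, not from the post):

$$\text{AMAT} = \text{Hit time} + \text{Miss rate} \times \text{Miss penalty}$$

For example, with a hypothetical 1 ns hit time, 5% miss rate, and 20 ns miss penalty, AMAT = 1 + 0.05 × 20 = 2 ns.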
Basic Optimization
1. Reducing hit time (first-level cache)
1) Giving read misses priority over writes: with a write buffer (for write-back or write-through caches), a read miss can be serviced before pending writes complete, after checking the buffer for the needed data
2) Avoiding address translation during cache indexing: virtually indexed, physically tagged (VIPT) cache
2. Reducing miss penalty (below the first-level cache)
1) Multilevel cache: an L1 miss can be served from the L2 cache without accessing memory (the first-level cache should be small to keep the clock cycle time fast)
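With a second-level cache, the L1 miss penalty becomes the time to access L2, giving the usual two-level expansion (standard textbook material, stated here with local miss rates):

$$\text{AMAT} = \text{Hit time}_{L1} + \text{Miss rate}_{L1} \times \left(\text{Hit time}_{L2} + \text{Miss rate}_{L2} \times \text{Miss penalty}_{L2}\right)$$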
3. Reducing Miss Rate
1) Larger block size: exploits spatial locality; compulsory misses ↓, miss penalty ↑, capacity and conflict misses ↑ (for a fixed cache size)
2) Bigger cache: capacity misses ↓, hit time ↑, clock speed ↓
3) Higher associativity: conflict misses ↓, hit time ↑
Advanced cache optimization
1. Reducing hit time (first-level cache)
1) small and simple
small
- a smaller cache takes less time to index: fewer entries and shorter propagation delay => hit time ↓
simple
- lower associativity means fewer tag comparisons => hit time ↓
- a direct-mapped cache can overlap the tag check with data transmission, since there is no way selection to make
2) way prediction: each cache set keeps block predictor bits that guess which way the next access will hit
=> combines the fast hit time of a direct-mapped cache with the lower conflict misses of a 2-way set-associative cache (a direct-mapped cache can overlap tag check and data transmission, a more associative cache reduces the miss rate)
=> the multiplexer is set early to select the desired block, so only 1 tag comparison is done on the fast path; a misprediction costs extra cycles to check the other ways (see the sketch after this list)
3) Trace cache: a trace cache stores instructions either after they have been decoded or as they are retired. A trace contains only the instructions that are actually executed, which lets the instruction fetch unit fetch several basic blocks without worrying about branches in the execution flow.
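A toy sketch of the way-prediction lookup from 2) above, in C. The structure sizes, cycle counts, and names (NUM_SETS, lookup, ...) are illustrative assumptions, not anything from the post:

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SETS    64          /* illustrative cache geometry */
#define WAYS        2
#define BLOCK_BYTES 64

typedef struct {
    uint32_t tag[WAYS];
    bool     valid[WAYS];
    uint8_t  predicted_way;     /* the block predictor bits (1 bit for 2 ways) */
} CacheSet;

static CacheSet cache[NUM_SETS];

/* Returns the hit latency in cycles, or -1 on a miss (handled elsewhere). */
int lookup(uint32_t addr) {
    uint32_t set = (addr / BLOCK_BYTES) % NUM_SETS;
    uint32_t tag = (addr / BLOCK_BYTES) / NUM_SETS;
    CacheSet *s  = &cache[set];

    /* Fast path: compare only the predicted way's tag, as if direct-mapped. */
    int w = s->predicted_way;
    if (s->valid[w] && s->tag[w] == tag)
        return 1;                       /* predicted-way hit: fast, 1 cycle */

    /* Slow path: check the remaining way(s) and retrain the predictor. */
    for (int i = 0; i < WAYS; i++) {
        if (i == w) continue;
        if (s->valid[i] && s->tag[i] == tag) {
            s->predicted_way = (uint8_t)i;
            return 2;                   /* mispredicted hit: extra cycle */
        }
    }
    return -1;                          /* miss */
}
```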
2. Increasing cache bandwidth(cache structure)
1) pipelined cache: pipelining the cache access gives a fast clock cycle time and high bandwidth at the cost of slower hits (an L1 hit now takes several cycles, increasing the branch misprediction penalty and load-use delay)
2) multibanked cache: can support simultaneous accesses; a mapping that works well is sequential interleaving (see the sketch after this list)
3) nonblocking cache: hit under miss, miss under miss, and hit under multiple misses; accesses keep flowing until a WAW or WAR hazard occurs
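A minimal sketch of the sequential interleaving mentioned in 2): block address b goes to bank b mod (number of banks), so consecutive blocks land in different banks and can be accessed at the same time. The bank count here is an arbitrary example:

```c
#include <stdio.h>

#define NUM_BANKS 4                 /* illustrative bank count */

/* Sequential interleaving: consecutive block addresses rotate across banks. */
unsigned bank_of(unsigned block_addr) {
    return block_addr % NUM_BANKS;
}

int main(void) {
    /* blocks 0..7 map to banks 0,1,2,3,0,1,2,3 */
    for (unsigned b = 0; b < 8; b++)
        printf("block %u -> bank %u\n", b, bank_of(b));
    return 0;
}
```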
3. Reducing miss penalty
1) Critical word first, early restart: request the critical word first and, as soon as the requested word of the block arrives, send it to the CPU (beneficial only when the block size is large)
2) Merging write buffer: merge writes to the same block, because multiword writes are more efficient
usually multiword writes are faster than writes performed one word at a time
(Figure: four stores to sequential addresses would fill the buffer at one word per entry, even though these four words, when merged, fit exactly within a single entry of the write buffer.)
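A toy model of the merging in that figure; the entry count, 4-word blocks, and function name are all assumptions for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

#define ENTRIES         4
#define WORDS_PER_ENTRY 4           /* illustrative: one entry covers 4 words */

typedef struct {
    bool     valid;
    uint32_t block_addr;            /* address of the entry's first word */
    uint32_t data[WORDS_PER_ENTRY];
    bool     word_valid[WORDS_PER_ENTRY];
} WBEntry;

static WBEntry wb[ENTRIES];

/* Buffer a store; returns false if the buffer is full and the CPU must stall.
 * Assumes word-aligned addresses (4-byte words). */
bool write_buffer_store(uint32_t addr, uint32_t value) {
    uint32_t block = addr & ~(uint32_t)(WORDS_PER_ENTRY * 4 - 1);
    uint32_t word  = (addr - block) / 4;

    /* Merge with an existing entry for the same block if possible, so four
     * sequential stores occupy one entry instead of four. */
    for (int i = 0; i < ENTRIES; i++) {
        if (wb[i].valid && wb[i].block_addr == block) {
            wb[i].data[word] = value;
            wb[i].word_valid[word] = true;
            return true;
        }
    }
    /* Otherwise allocate a free entry. */
    for (int i = 0; i < ENTRIES; i++) {
        if (!wb[i].valid) {
            wb[i].valid = true;
            wb[i].block_addr = block;
            wb[i].data[word] = value;
            wb[i].word_valid[word] = true;
            return true;
        }
    }
    return false;                   /* full: stall */
}
```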
4. Reducing miss rate
1) Compiler optimizations: reorder procedures and data in memory so as to reduce conflict misses; the classic techniques are below (loop interchange and blocking are sketched after the list)
- Merging arrays
- Loop interchange
- Loop fusion
- Blocking
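C sketches of two of these techniques; the array size N, blocking factor B, and function names are hypothetical, and this is an illustration rather than the post's own code:

```c
#include <stddef.h>

#define N 1024   /* array dimension, illustrative */
#define B 32     /* blocking factor, illustrative; assumes N % B == 0 */

/* Loop interchange: C stores arrays in row-major order, so making the
 * column index the innermost loop walks memory with unit stride and
 * exploits spatial locality (one miss per cache block instead of one
 * miss per element). */
double sum_before(double x[N][N]) {     /* stride-N accesses: poor locality */
    double s = 0.0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += x[i][j];
    return s;
}

double sum_after(double x[N][N]) {      /* unit-stride accesses */
    double s = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += x[i][j];
    return s;
}

/* Blocking: compute on B x B submatrices so the inner loops' working set
 * fits in the cache, turning capacity misses into hits.
 * Assumes x is zero-initialized. */
void matmul_blocked(double x[N][N], double y[N][N], double z[N][N]) {
    for (size_t jj = 0; jj < N; jj += B)
        for (size_t kk = 0; kk < N; kk += B)
            for (size_t i = 0; i < N; i++)
                for (size_t j = jj; j < jj + B; j++) {
                    double r = 0.0;
                    for (size_t k = kk; k < kk + B; k++)
                        r += y[i][k] * z[k][j];
                    x[i][j] += r;
                }
}
```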
5. Parallelism
1) Hardware prefetch: a hardware stream buffer prefetches blocks based on past cache access patterns
2) Compiler prefetch: a special instruction (e.g. pref 0, 20($4)) is inserted into the program by the compiler to bring the desired block into the L1 D-cache ahead of use (see the sketch below)
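Hand-writing MIPS pref is rare, so here is a rough C analogue using GCC/Clang's __builtin_prefetch builtin; the prefetch distance and function name are tuning assumptions, not anything from the post:

```c
#include <stddef.h>

#define PREFETCH_DISTANCE 8   /* elements ahead; machine-dependent guess */

/* Sum an array while prefetching a few iterations ahead, the way a
 * compiler would insert prefetch instructions into the loop body. */
double sum_with_prefetch(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], /*rw=*/0, /*locality=*/1);
        s += a[i];
    }
    return s;
}
```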