Important: Complete the Pre-Lab portion of this assignment before you start the In Lab portion.
Auto-prefetch can decrease the miss rate if the prefetched blocks are used, but may also increase the amount of traffic between the main memory and the cache if the blocks go unused. In fact, if the prefetched blocks go unused, and bump other blocks out of the cache that would have been used, the miss rate can actually go up rather than down. A prefetch strategy is highly effective if it causes a substantial decrease in the miss rate with only a minor increase in memory traffic. For example, the miss rate might go down by 50% while the memory traffic goes up by 1%. In the other scenarios, where the miss rate doesn't go down much and/or the traffic goes up a lot, we can term the prefetch strategy ineffective (if the miss rate doesn't go down by much), costly (if the miss rate is substantially reduced, but at a high price in memory traffic), or counterproductive (if miss rate is marginally improved at best, and traffic goes way up).
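The four categories above can be made concrete with a small shell sketch. The function name classify_prefetch and both thresholds (a miss reduction of at least 20% counts as "substantial", a traffic increase of at most 5% counts as "minor") are illustrative assumptions, not values given in this assignment; pick your own cutoffs and justify them in your report.

```bash
# classify_prefetch MISS_REDUCTION_PCT TRAFFIC_INCREASE_PCT
# Both arguments are whole-number percentages. A negative miss
# reduction means the miss rate went up with prefetching.
# ASSUMPTION: the two thresholds below are made up for illustration.
classify_prefetch() {
    local miss_reduction=$1
    local traffic_increase=$2
    local substantial=20   # assumed cutoff for a "substantial" miss reduction
    local minor=5          # assumed cutoff for a "minor" traffic increase

    if [ "$miss_reduction" -ge "$substantial" ]; then
        if [ "$traffic_increase" -le "$minor" ]; then
            echo "highly effective"
        else
            echo "costly"
        fi
    else
        if [ "$traffic_increase" -le "$minor" ]; then
            echo "ineffective"
        else
            echo "counterproductive"
        fi
    fi
}

classify_prefetch 50 1    # the example from the text: prints "highly effective"
```

A single traffic threshold is a simplification: the text distinguishes a "minor" increase from traffic that goes "way up", and you may want a middle band between the two.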
The question you are to answer is this: what is the relationship between cache block size and total cache size and the cost and effectiveness of auto-prefetch? That is, if you consider a two-dimensional space of possible caches, where one dimension is block size and the other is total cache size, and you were to map out for each point in that two-dimensional space the percent increase in memory traffic that auto-prefetching would cause, what would it look like? Now suppose you mapped out for each point in the two-dimensional space the percent reduction (or increase!) in misses that auto-prefetching would cause, what does that look like? Putting these together, in what regions of the space is auto-prefetching cheap and highly effective? In which regions is it ineffective to the extent of being pointless? In which is it out and out harmful?
Before starting the lab, you should decide and write down the following two things:
We have available on-line traces from three benchmarks (tex, gcc, and
spice). Each of the traces contains about a million memory
references. The traces reside in the directory ~max/MC48, and
have filenames tex.din.gz, cc1.din.gz, and
spice.din.gz respectively. The .gz suffix indicates that they
are compressed using the gzip program (to save disk space); to
decompress them at the same time you run dineroIII, you can use a
command that decompresses the file and "pipes" the decompressed
version directly into dineroIII. On one of our Linux PCs, the command
would be
zcat ~max/MC48/name.din.gz | ~max/MC48/dineroIII options
where name is the name of the trace you want to use and options is a list of dineroIII parameters as specified in the accompanying manual page for dineroIII. Note that at a bare minimum you need to specify the
-b option and either
the -u option or the -i and -d
options. The -f option will also be particularly
relevant to your investigation. Remember that the block size for the
-b option is measured in bytes, so needs to be a multiple
of four.
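As a sketch of how the pieces of that command fit together, the helper below only prints the pipeline for one run instead of executing it, so you can check the options before committing to a long simulation. The function name dinero_cmd is hypothetical; the option values in the usage line (8-byte blocks, 16k instruction and data caches, -fa) simply mirror the sample loop later in this handout, and the manual page remains the authority on what each option accepts.

```bash
# dinero_cmd NAME BLOCK SIZE PREFETCH
# Prints (does not run) the zcat | dineroIII pipeline for one trace,
# using split instruction and data caches of the same size.
# Rejects block sizes that are not a multiple of four bytes.
dinero_cmd() {
    local name=$1
    local block=$2
    local size=$3
    local prefetch=$4

    if [ $((block % 4)) -ne 0 ]; then
        echo "block size $block is not a multiple of four bytes" >&2
        return 1
    fi
    # The ~ is inside double quotes, so it is printed literally here.
    echo "zcat ~max/MC48/$name.din.gz | ~max/MC48/dineroIII -b$block -i$size -d$size -f$prefetch"
}

dinero_cmd tex 8 16k a
```

Once the printed command looks right, you can paste it into a shell window on one of the Linux PCs to run it.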
You should observe the improvement prefetching makes in the number of misses, specifically the so-called "demand" misses, i.e., the misses on those blocks actually requested by the CPU, rather than by the prefetching. You should also observe the increase in memory traffic caused by unnecessary prefetches. Further, it would be interesting to see whether instruction and data references are equally suited to auto-prefetching. (DineroIII prints out separate statistics in each category as well as totals.)
for size in 16k 32k 64k; do
    for block in 4 8; do
        for bench in cc1 spice tex; do
            for prefetch in d a; do
                echo -n $bench $size $block $prefetch
                zcat ~max/MC48/"$bench".din.gz | ~max/MC48/dineroIII -b$block -d$size \
                    -i$size -f$prefetch > "$bench"."$size"."$block"."$prefetch"
                echo ""
            done
        done
    done
done
Note that although you could literally type something like the above
into a shell window it is probably more sensible to store it in a file
and then run it from the file. (Be sure to end the file with a
newline after the last "done", i.e., press the enter key after the last
"done".) If you have your script in a file called
script, you could run it using the command
bash script
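Before launching the full set of simulations, it can help to see which output files a nested loop like the one above will create. The following is a dry-run sketch under the same loop bounds as the sample script; the function name list_outputs is hypothetical, and nothing here invokes dineroIII.

```bash
# list_outputs: prints one output file name per simulation the sample
# loops would run, in the order they would run, without running anything.
list_outputs() {
    for size in 16k 32k 64k; do
        for block in 4 8; do
            for bench in cc1 spice tex; do
                for prefetch in d a; do
                    # Same naming scheme as the redirect in the sample script.
                    echo "$bench.$size.$block.$prefetch"
                done
            done
        done
    done
}

list_outputs | head -2      # cc1.16k.4.d then cc1.16k.4.a
list_outputs | grep -c .    # 36 files: 3 sizes x 2 blocks x 3 traces x 2 prefetch settings
```

Each configuration yields a pair of files that differ only in the final extension (.d and .a), so stripping the extension gives a common base name for comparing the two runs.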
declare -i a # Optional: declare a and d to be integers
declare -i d # (This just affects whether whitespace gets removed.)
for file in *.a; do
    base=${file%.*}
    a=`head -26 $base.a | tail -1 | cut -f 3`
    d=`head -22 $base.d | tail -1 | cut -f 3`
    result=`echo "scale = 2 ; ($d - $a) / $d" | bc`
    echo "$base: ($d - $a) / $d = $result"
done
declare -i a # Optional: declare a and d to be integers
declare -i d
for file in *.a; do
    base=${file%.*} # remove the extension (that is, the .a) from the file name
    # head is a program which gets the first n lines of a file (or standard input)
    # tail gets the last n lines of a file (or standard input)
    # cut extracts a range of characters (-c) or tab-separated fields (-f)
    a=`head -26 $base.a | tail -1 | cut -f 3`
    d=`head -22 $base.d | tail -1 | cut -f 3`
    # bc is a simple calculator. The "scale" is the number of digits after the decimal point to print.
    result=`echo "scale = 2 ; ($d - $a) / $d" | bc`
    echo "$base: ($d - $a) / $d = $result"
done
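The fraction computed above can also be expressed as a whole-number percentage using only the shell's built-in integer arithmetic, and the same idea applies to the traffic figures. This is a minimal sketch: the function names pct_reduction and pct_increase are hypothetical, the numbers in the usage lines are made up rather than taken from any trace, and results are truncated toward zero rather than rounded.

```bash
# pct_reduction WITHOUT_PREFETCH WITH_PREFETCH
# Percent by which a count (e.g. demand misses) went down with prefetching.
# A negative result means the count went up.
pct_reduction() {
    if [ "$1" -eq 0 ]; then
        echo "baseline count is zero" >&2
        return 1
    fi
    echo $(( ($1 - $2) * 100 / $1 ))
}

# pct_increase WITHOUT_PREFETCH WITH_PREFETCH
# Percent by which a count (e.g. memory traffic) went up with prefetching.
pct_increase() {
    if [ "$1" -eq 0 ]; then
        echo "baseline count is zero" >&2
        return 1
    fi
    echo $(( ($2 - $1) * 100 / $1 ))
}

pct_reduction 2000 1000   # made-up counts: prints 50
pct_increase 1000 1010    # made-up counts: prints 1
```

Integer arithmetic loses the fractional part, so keep using bc as in the script above when you need decimal places.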
Another calculator program is dc.
For example, in
ratio=`dc -e "2 k $d $a / p"`
the -e means execute the argument, 2 k sets
the precision to 2 decimal places, $d $a / divides
$d by $a, and p prints the result.