Designing High Performant Systems for AI

1 [45+10 points]

Convolution of an image with a filter is a key operation in Convolutional Neural Networks(CNN). Consider the following algorithm to perform convolution between an H × W imageI using a k × k kernel K to produce output image O.

Listing 1: Naive Convolution Algorithm

1 for (Ix = 0; Ix < W; Ix++)

2 for (Iy = 0 ; Iy < H; Iy++)

3 for (Kx = 0; Kx < k; Kx++)

4 for (Ky = 0; Ky <k; Ky++) {

5 If (Ix+kx < W && Iy+Ky < H )

6 O[Ix][Iy] = O[Ix][Iy] + I[Ix+Kx][Iy+Ky]*K[Kx][Ky];

7 }

a) What are the dimensions of the output image O? [4 points]

b) What are the average data reuse rates of the elements of I, K and O in the algorithmabove? Data reuse rate is defined as the average of the ratio of the number of computeoperations performed on a data item and the number of times a data item is fetched.Assume there is no cache. Explicitly specify any other assumptions regarding localmemory that you make. [4.5 points]

c) What is the average data reuse rate of the entire algorithm? [1.5 points]Now consider the processor-memory architecture considered in the class. 2 pipelinesrunning at 2 Ghz. Memory latency of 10 ns (20 cycles) and a memory bandwidth of 64 bitsat 1 Ghz. Assume each data item is 32 bit.

c) Calculate the Sustained Performance (Best Case) of the algorithm. Use assumption 3that we discussed in the class for calculations. Do not worry too much about the edgecases (when If case is false). [5 points]

d) Calculate the Sustained Performance (Worst Case) of the algorithm. Again use assumption 3 that we discussed in the class for calculations. [5 points]

Assume you have a cache of size k24. Now, answer the following questions.

e) If we redesign the algorithm such that chunks of size k24of input image I are fetchedinto the cache and all the convolution operations in which this chunk is involved areperformed, what will be the actual data reuse rates of O and K. Note that the datareuse rate of I will be k2 as we are ignoring the edge cases for simplicity. Also, notethat there can be multiple such potential redesigns, so explain the idea of your redesignclearly. [10 points]

f) If we redesign the algorithm such that chunks of size k24of filter K are fetched into thecache and all the convolution operations in which this chunk is involved are performed,what will be the actual data reuse rates of O and I. Note that the data reuse rate ofK will be H × W. Also, note that there can be multiple such potential redesigns, soexplain the idea of your redesign clearly. [10 points]

g) Which of the above two redesigns of the algorithm will lead to maximizing the overalldata reuse, i.e., data reuse for all the data items combined ? [5 points]

h) (Extra Credits towards overall grade) Can you redesign the algorithm so that the datareuse rate of O turns out to be k2? Write the pseudo code for the same to receive theextra credits. [10 points]

Leave a Reply

Your email address will not be published. Required fields are marked *