Month: October 2023

Parallel GoL

Programming Project #3: A parallel distributed implementation of Conway’s Game of Life (GoL) // or at least my version of it!

Assignment type: You can work in teams of at most two, including yourself. If you want to work individually, that’s fine too. Either way, you should fill out the cover sheet. If you are working in a team, please ensure you are compliant with the submission requirements (see last section).

You are expected to use the Pleiades cluster for this project.

Preliminaries: This project assumes that you have already understood how to write and test a simple helloworld.c (https://wsu.instructure.com/courses/1650006/files/108598122?wrap=1) (https://wsu.instructure.com/courses/1650006/files/108598122/download?download_frd=1) MPI program on the cluster. Please read the documentation (https://wsu.instructure.com/courses/1650006/pages/clusterinstructions) on how to use the Pleiades cluster, and how to compile and run an MPI job on the cluster, before you work on this assignment.

For this project you may need to use one or more of the following MPI communication primitives:

System Integration

Background Information: Study Abroad – Assignment 2 by Dr. Peter

In recent years, the opportunity to incorporate international study into their degrees has become increasingly accessible for Australian students. Pursuing a study abroad experience is viewed as a pathway to enhanced employability after graduation by students and as a valuable point of distinction among job applicants by employers. Typically lasting one semester, these study abroad programs can also extend up to a year for those seeking a more extended international academic experience.

The Application Process:

The high-level process architecture of a study abroad application covers the core, support, and management processes. The core processes include the activities of a student's study abroad application and of the student administration team revolving around an application (figure 1).

Project Brief (4 marks)

Provide a description of: the group meeting schedule or plan, each member’s role in the project, the identified risks and ways to manage them, the ways members will communicate, and a log recording each meeting, discussion and dialogue between members in relation to the project. Note: all members need to contribute to the project. The description and log need to be submitted together with the four tasks.

Task 1 Collaborative Business Process and EAI (4 marks)

When a student applies to study a course abroad, we need to check her/his degree program structure and the relevant information about the course available at the partner university, to judge whether the selected course meets the required credit points and content, and to check whether her/his degree program allows such a selection. A degree program structure normally contains the total credit points required and the core units required at different levels, while the unit information includes the prerequisite units and offering semester. Different universities may have different degree rules and course requirements.

MIPS Assembly

Part 1

Write a MIPS assembly language subroutine called PrintName that displays your username to the console in SPIM. Use the syscalls discussed in class and in Appendix A of the textbook to accomplish this. Your program should be named project_1_part_1.s. Submit ONLY the subroutine, using the template found on the Projects page of the course website. Your subroutine will be graded by making multiple subroutine calls (using jal PrintName) to PrintName to ensure that it works repeatedly.

Part 2

Write a MIPS assembly language subroutine called GetCode that asks the user to enter a 7-bit code consisting of ones and zeros. When the user is finished entering the data, they should hit the Enter key. The data should be stored in memory as a NULL-terminated ASCII string at the address passed into the routine in register a1. The user should be prompted for the data by displaying a prompt to the console asking them to enter the data. These prompts can be stored at the beginning of the data segment, and should not reside outside of the range 0x10000000 through 0x1000FFFF in memory. Use the syscalls discussed in class and in Appendix A of the textbook to implement this subroutine. Your program should be named project_1_part_2.s. Submit ONLY the subroutine, using the template found on the Projects page of the course website. Your subroutine will be graded by making multiple subroutine calls (using jal GetCode) to GetCode to ensure that it works repeatedly.

Part 3

Write a MIPS assembly language subroutine called GetOnes that accepts a pointer to a NULL-terminated ASCII string in register a1. The routine should count the number of ones (ASCII value 0x31) in the string, and return that number in register v0. The string should not be modified by your routine. Your program should be named project_1_part_3.s. Submit ONLY the subroutine, using the template found on the Projects page of the course website. Your subroutine will be graded by making multiple subroutine calls (using jal GetOnes) to GetOnes to ensure that it works repeatedly.
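As a reference model for checking the behavior expected of GetOnes, the counting rule can be sketched in Python. The function name count_ones, and the use of a bytes buffer plus a start offset to stand in for memory and the pointer in register a1, are assumptions made for illustration only; this is not the MIPS subroutine itself.

```python
def count_ones(memory: bytes, address: int) -> int:
    """Count ASCII '1' bytes (0x31) from address up to the NULL terminator."""
    count = 0
    while memory[address] != 0x00:     # stop at the NULL terminator
        if memory[address] == 0x31:    # ASCII '1'
            count += 1
        address += 1                   # the buffer is only read, never written
    return count


buffer = b"1011001\x00"                # hypothetical 7-bit code as stored in memory
print(count_ones(buffer, 0))           # prints 4
```

Calling the function several times on the same buffer returns the same count, which mirrors the requirement that the routine works repeatedly and leaves the string unmodified.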

Part 4

Write a MIPS assembly language program that tests Parts 2 and 3. The program should allow the user to repeatedly enter a code (calling GetCode) and obtain the number of ones in the code (calling GetOnes). When the number of ones is obtained, that result should be displayed on the screen. The program should end when the user decides to quit. Your program should be named project_1_part_4.s.

Designing High Performant Systems for AI

1 [45+10 points]

Convolution of an image with a filter is a key operation in Convolutional Neural Networks (CNN). Consider the following algorithm to perform convolution between an H × W image I using a k × k kernel K to produce output image O.

Listing 1: Naive Convolution Algorithm

for (Ix = 0; Ix < W; Ix++)
    for (Iy = 0; Iy < H; Iy++)
        for (Kx = 0; Kx < k; Kx++)
            for (Ky = 0; Ky < k; Ky++) {
                if (Ix+Kx < W && Iy+Ky < H)
                    O[Ix][Iy] = O[Ix][Iy] + I[Ix+Kx][Iy+Ky]*K[Kx][Ky];
            }
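For experimenting with the listing, a direct Python translation may help. This is a minimal sketch, assuming the image and kernel are nested lists indexed the same way as in the listing (I[Ix][Iy] and K[Kx][Ky]) and that O starts at zero; the function name naive_convolution and the small test inputs are made up for illustration.

```python
def naive_convolution(I, K, W, H, k):
    """Direct translation of Listing 1 using nested Python lists."""
    # O is indexed as O[Ix][Iy], matching the listing, and starts at zero.
    O = [[0] * H for _ in range(W)]
    for Ix in range(W):
        for Iy in range(H):
            for Kx in range(k):
                for Ky in range(k):
                    # Skip kernel taps that fall outside the image.
                    if Ix + Kx < W and Iy + Ky < H:
                        O[Ix][Iy] += I[Ix + Kx][Iy + Ky] * K[Kx][Ky]
    return O


# Hypothetical inputs: a 3 x 3 image of ones and a 2 x 2 kernel of ones.
image = [[1] * 3 for _ in range(3)]
kernel = [[1] * 2 for _ in range(2)]
result = naive_convolution(image, kernel, W=3, H=3, k=2)
print(result)  # [[4, 4, 2], [4, 4, 2], [2, 2, 1]]
```

The smaller values along the last row and column come from the bounds check, which drops kernel taps that would read past the edge of the image.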

a) What are the dimensions of the output image O? [4 points]

b) What are the average data reuse rates of the elements of I, K and O in the algorithm above? Data reuse rate is defined as the average of the ratio of the number of compute operations performed on a data item and the number of times a data item is fetched. Assume there is no cache. Explicitly specify any other assumptions regarding local memory that you make. [4.5 points]

c) What is the average data reuse rate of the entire algorithm? [1.5 points]

Now consider the processor-memory architecture considered in class: 2 pipelines running at 2 GHz, a memory latency of 10 ns (20 cycles), and a memory bandwidth of 64 bits at 1 GHz. Assume each data item is 32 bits.
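The unit conversions behind these architecture figures can be checked with a few lines of Python. This is only a sketch of the arithmetic: the variable names are made up, and the peak rate assumes one operation per pipeline per cycle, which the text above does not state. It does not apply assumption 3 from class.

```python
CLOCK_GHZ = 2      # processor cycles per nanosecond
PIPELINES = 2
LATENCY_NS = 10    # memory latency
BUS_BITS = 64      # bits delivered per bus transfer
BUS_GHZ = 1        # bus transfers per nanosecond
ITEM_BITS = 32     # size of one data item

# 10 ns of latency at 2 cycles per ns.
latency_cycles = LATENCY_NS * CLOCK_GHZ              # 20

# 64 bits per ns is two 32-bit items per ns.
items_per_ns = BUS_BITS * BUS_GHZ // ITEM_BITS       # 2

# Two items per ns at two processor cycles per ns.
items_per_cycle = items_per_ns / CLOCK_GHZ           # 1.0

# Assumption: each pipeline completes one operation per cycle.
peak_gops = PIPELINES * CLOCK_GHZ                    # 4

print(latency_cycles, items_per_ns, items_per_cycle, peak_gops)
```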

c) Calculate the Sustained Performance (Best Case) of the algorithm. Use assumption 3 that we discussed in class for calculations. Do not worry too much about the edge cases (when the if condition is false). [5 points]

d) Calculate the Sustained Performance (Worst Case) of the algorithm. Again use assumption 3 that we discussed in the class for calculations. [5 points]

Assume you have a cache of size k^2/4. Now, answer the following questions.

e) If we redesign the algorithm such that chunks of size k^2/4 of input image I are fetched into the cache and all the convolution operations in which this chunk is involved are performed, what will be the actual data reuse rates of O and K? Note that the data reuse rate of I will be k^2, as we are ignoring the edge cases for simplicity. Also, note that there can be multiple such potential redesigns, so explain the idea of your redesign clearly. [10 points]

f) If we redesign the algorithm such that chunks of size k^2/4 of filter K are fetched into the cache and all the convolution operations in which this chunk is involved are performed, what will be the actual data reuse rates of O and I? Note that the data reuse rate of K will be H × W. Also, note that there can be multiple such potential redesigns, so explain the idea of your redesign clearly. [10 points]

g) Which of the above two redesigns of the algorithm will lead to maximizing the overall data reuse, i.e., data reuse for all the data items combined? [5 points]

h) (Extra Credits towards overall grade) Can you redesign the algorithm so that the data reuse rate of O turns out to be k^2? Write the pseudo code for the same to receive the extra credits. [10 points]

Queue the Stacking of the Deque

In lecture, we have seen that a deque can serve both as a stack and as a queue depending on usage. We have also seen that a deque can be implemented using a linked list or an array to store its data. We will employ both of those features in this project.
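The stack-versus-queue behavior can be seen with Python's standard library collections.deque. This is only an illustration of the idea, not one of the classes you will write: the project requires your own implementations, and the standard library's method names (append, pop, popleft) differ from the names used in this project.

```python
from collections import deque

d = deque()

# Push three items onto the back.
for item in (1, 2, 3):
    d.append(item)

# Stack usage: remove from the same end we pushed to (last in, first out).
stack_order = [d.pop() for _ in range(3)]

# Push the same items onto the back again.
for item in (1, 2, 3):
    d.append(item)

# Queue usage: remove from the opposite end (first in, first out).
queue_order = [d.popleft() for _ in range(3)]

print(stack_order)  # [3, 2, 1]
print(queue_order)  # [1, 2, 3]
```

The same container gives both orderings; only the choice of which end to remove from changes.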

Because we will use a deque as the storage medium for our queues and stacks, we will begin there. You will implement two deque classes, one using an array to store the contents and one using a linked list to store the contents. Notice that regardless of how you store the data, the six operations that define a deque must be present: two pushes, two pops, and two peeks. If something calls itself a deque, it must provide at least those six methods. The programming pattern that enforces this is called an interface or more generally an abstract base class. We provide a complete abstract base class called Deque in the file Deque.py. Notice that the file does not contain implementations; it only defines the functions that must be present. If you attempt to inherit from Deque, the child class must contain the methods listed in the abstract base class, or Python will refuse to construct the object and terminate. This is done via the special method __subclasshook__:

def __subclasshook__(child):
    required_methods = {'push_front', 'push_back', \
                        'peek_front', 'peek_back', \
                        'pop_front', 'pop_back'}
    if required_methods <= child.__dict__.keys():
        return True
    return False

Python will call this method to see if a child class is allowed to inherit from the Deque class: are the six required methods a subset of the methods in the child’s namespace? (In Python a set A is a subset of set B if and only if A <= B is True.) If all six methods are there, __subclasshook__ will return True, indicating that child looks like a deque and instantiation can proceed. If any method is missing, __subclasshook__ will return False, indicating that the child does not qualify as a deque and instantiation cannot proceed. This file is complete; do not modify Deque.py.
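The subset test at the heart of this check can be tried on its own. The sketch below is not Deque.py; the helper looks_like_deque and the classes Complete and Incomplete are hypothetical names used only to show how <= behaves on sets and on a class namespace. It wraps the namespace keys in set() to keep the comparison between two plain sets.

```python
REQUIRED_METHODS = {"push_front", "push_back", "peek_front",
                    "peek_back", "pop_front", "pop_back"}


def looks_like_deque(cls) -> bool:
    """True if every required name is defined directly on cls."""
    # vars(cls) is the class namespace (cls.__dict__); set() keeps its keys.
    return REQUIRED_METHODS <= set(vars(cls))


class Complete:
    def push_front(self, value): ...
    def push_back(self, value): ...
    def peek_front(self): ...
    def peek_back(self): ...
    def pop_front(self): ...
    def pop_back(self): ...


class Incomplete:
    def push_front(self, value): ...
    def pop_front(self): ...


print({1, 2} <= {1, 2, 3})            # True: {1, 2} is a subset of {1, 2, 3}
print({1, 2, 3} <= {1, 2})            # False
print(looks_like_deque(Complete))     # True
print(looks_like_deque(Incomplete))   # False
```

Note that a check on the class namespace only sees names defined directly in that class, not names inherited from a parent.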

Linked_List_Deque and Array_Deque should both inherit from Deque. Linked_List_Deque’s constructor will initialize an empty linked list for the data. Array_Deque’s constructor will initialize an empty array for the data. The six required methods will have different implementations in each class because we interact with arrays and linked lists differently. Except for performance differences, the user of your deque should not be able to tell whether the implementation used a linked list or an array. Their functionality (and string representations) must be identical.