Month: October 2017

Memory Management

Assignment Deliverables
The deliverables for this assignment are the following files:
proj07.makefile
– the makefile which produces “proj07”
proj07.student.c
– the source code file for your solution
Be sure to use the specified file names and to submit your files for grading via the CSE Handin system before the
project deadline.
Assignment Specifications
The program will simulate some of the steps that an operating system takes to manage a virtual memory system
which uses demand paging and a fixed allocation of page frames. The simulator will gather statistics about the
behavior of a single user process under a given page replacement algorithm.
The system to be simulated contains 1,048,576 bytes of RAM which is divided into 256 page frames. Logical
addresses are 16 bits in length.
1. Your program will process a file which contains the simulation parameters, as well as zero or more memory
references. The first line of the file will contain the page replacement algorithm to be studied (the character string
“FIFO”, “LRU” or “Clock”), and the second line will contain the number of page frames allocated to the process.
Each additional line of the file (if any) will contain the following information about a memory reference:
a) logical address being referenced (4 hexadecimal digits, with leading zeroes)
b) operation being performed (one character; R for read and W for write)
Items within a line will be separated by exactly one space, and each line will terminate with a newline.
2. For each memory reference in the file, your program will display one line with the following information:
a) logical address being referenced (4 hexadecimal digits, with leading zeroes)
b) page number (1 hexadecimal digit)
c) page offset (3 hexadecimal digits, with leading zeroes)
d) operation being performed (one character; R for read and W for write)
e) page fault flag (one character; F for page fault, blank otherwise)
f) write back flag (two characters; WB for write back, blanks otherwise)
g) physical address (5 hexadecimal digits, with leading zeroes)
Items within a line will be separated by exactly one space, and each line will terminate with a newline.
3. After the simulation is completed, your program will display the following:
a) simulation parameters (page replacement algorithm and number of frames allocated)
b) total number of memory references
c) total number of read operations
d) total number of write operations
The summary information will be appropriately labeled and formatted.
4. Your program will display the contents of the page table at the start of the simulation, after every N memory
references, and at the end of the simulation. The page table will not be displayed twice in succession (there must
be some intervening output related to memory references).
The display will contain one line for each page table entry:
a) index of the page table entry (one hexadecimal digit)
b) V bit (one character; 0 for not valid, 1 for valid)
c) R bit (one character; 0 for not referenced, 1 for referenced)
d) M bit (one character; 0 for not modified, 1 for modified)
e) frame number stored in that page table entry (2 hexadecimal digits, with leading zeroes)
Items within a line will be separated by exactly one space, and each line will terminate with a newline. The page
table display will begin and end with a blank line, and will include appropriate column headers.
The value of N will be a positive integer value; any other value (such as 0) will prevent the program from displaying
the contents of the page table.
5. Your program will accept two command-line arguments: an integer representing the value of N and the name
of the input file.
6. The three page replacement algorithms which your program will support are FIFO, LRU and Clock.
For FIFO page replacement, the page which has been in primary storage for the longest time will be selected as the
victim.
For LRU page replacement, the page which has been referenced least recently will be selected as the victim.
For Clock page replacement, your program will use the strategy described in the Stallings textbook.
For all of the page replacement algorithms, your program will use “less than” to break any ties. For example, if
frame 42 and frame 43 are both empty when a page fault occurs, your program will select frame 42.
7. The program will include appropriate error-handling.
Assignment Notes
1. You must use “g++” to translate your source code file in the CSE Linux environment.
2. The page frames allocated to the process being simulated will be consecutive and will begin with frame 20
hexadecimal.

Basic data sorter

Abstract
This is the second in a series of projects that will involve sorting a large amount of data. In this second phase, you will write a multi-process C program to sort a list of records of movies from imdb alphabetically by the data in a given column. You will make use of the concepts learned in lecture including file/directory access, forking processes, etc.

Introduction
File formats for this part of the project are the same as in the first. The CSV file with movie metadata will remain the same. The sorting algorithm will also remain the same. If you properly modularized your code in Project 0, you should be able to reuse almost all of your code.

In this project, you will read in a directory name and walk through the directory structure to find all .csv files. There may be multiple levels of directories that you will need to recurse through. You will then fork child processes to sort each of the files and output the results to a different file. You should NOT use exec for this project. You should write one program that, when copied from the parent to the child process, will continue running. You can use the return value of fork() in a conditional to choose different blocks of code to run within your code. You will want to make sure to prevent zombie and orphan child processes. You will also want to make sure you to not create fork bombs that will bring down machines. In all cases of bad input, you should fail gracefully (e.g. no segfaults).

You will output metadata about your processes to STDOUT. This metadata will show the total number of processes created and the pids of all created processes.

Methodology
a. Parameters
Your code will read in a set of parameters via the command line. Records will be stored in .csv files in the provided directory. As mentioned above, directory structures may have multiple levels and you must find all .csv files. Your code should ignore non .csv files and .csv files that do not have the correct format of the movie_metadata csv (e.g. csv files that have other random data in them).

Remember, the first record (line) is the column headings and should not be sorted as data. Your code must take in a command-line parameter to determine which value type (column) to sort on. If that parameter is not present (?-> throw an error, or default behavior). The first argument to your program will be ‘-c’ to indicate sorting by column and the second will be the column name:

./sorter -c food

Be sure to check the arguments are there and that they correspond to a listed value type (column heading) in the CSV.

For this phase you’ll extend your flags from one to three. The second parameter to your program will be ‘-d’ indicating the directory the program should search for .csv files. This parameter is optional. The default behavior will search the current directory.

./sorter -c food -d thisdir/thatdir

The third parameter to your program will be ‘-o’ indicating the output directory for the sorted versions of the input file. This parameter is optional. The default behavior will be to output in the same directory as the source file.

./sorter -c  movie_title -d thisdir -o thatdir

b. Operation
Your code will be reading in and traversing the entire directory. In order to run your code to test it, you will need to open each CSV and read it for processing:

./sorter -c  movie_title -d thisdir -o thatdir

Your code’s output will be a series of new CSV files outputted to the file whose name is the name of the CSV file sorted, with “-sorted-<fieldname>” postpended to the name.

e.g: ‘movie_metadata.csv’ sorted on the field ‘movie_title’ will result in a file named “movie_metadata-sorted-movie_title.csv”.

On each new file in a directory you encounter, you should fork() a child process to do the actual sorting.

On each new directory you encounter, you should fork() a child process to process the directory.

To STDOUT, output the following metadata in the shown order:

Initial PID: XXXXX
PIDS of all child processes: AAA,BBB,CCC,DDD,EEE,FFF, etc
Total number of processes: ZZZZZ

You may assume the total number of files and directories will not exceed 255.

c. Structure

Your code should use Mergesort to do the actual sorting of records. It is a powerful algorithm with an excellent average case.

Results:
Submit your “sorter.c”, “sorter.h” and “mergesort.c” as well as any other source files your header file references.

Document your design, assumptions, difficulties you had and testing procedure. Include any test CSV files you used in your documentation. Be sure to also include a short description of how to use your code. Look at the man pages for suggestions on format and content. Do not neglect your header file. Be sure to describe the contents of it and why you needed them.

Here are some extra credit options for you. You can choose to do either, both, or neither.

Extra Credit (10 points):
Interpret the -c option as a comma separated list of fields. Cascade your sort over the fields. For example, if the command is:

./sorter -c food,drink,price

The sorter should sort first on the field food, then on drink, then on price. E.g. if there are multiple records where food is “Vegetable”, the order of these records would then be sorted by the drink field. If there are multiple records where food is “Vegetable”, and drink is “Water”, then those records would further be sorted by the drink field.

Document your algorithm and show examples of some runs of your program on test data.

Extra Credit (10 points):
Create a metadata output file in addition to STDOUT. In this file, output the hierarchy of forked process ids and their associated directory/file names. If one views this file, it should be clear which processes forked which processes, and be able to follow the execution of code. Document the format of your file, and the design of the code that produces the output. Consider using the linux utilities like gnuplot to visualize your data. This might require the use of shared memory/pipe constructs to transfer data back and forth between processes.

Homework 3

About homework 3
Homework 3 and Homework 4 are the two parts of a homework on the topic of Data Flow Analyses. This
is the rst part which is a practical programming exercise developing an example data ow analysis with a
tool. The second part (i.e. homework 4) will be a handwritten assignment on the theory behind data ow
analyses.
Note: homework 3 is basically a freebie. The design is almost trivial, and a framework is given to you
to turn your design into an implementation with very little eort. We expect an average-good student to
do this in about 2 hours. So, a group of 4 students should have no problem to do this in virtually no time.
What will you do with all the extra times on your hands? Start doing your course projects!
Problem 1
Upwards Exposed Uses.
Upwards exposed uses serves a similar purpose to reaching denitions analysis.
Instead of nding which denitions reach each point, we nd which uses of a variable have not yet been
matched with one or more denitions. Given a control ow graph V, E the use of the variable x at vertex
v is upwards exposed at a vertex u (i.e., x, v ∈ U EU (u)) if
x is read at v, and
there exists a path from u to v along which no vertex assigns to x.
(a) (10 points) Formally dene the property space for this problem (i.e. a set and its corresponding meet
operator).
(b) (20 points) Dene the (statement) transformer functions for this analysis (stick to the simple language
that is used on the slides for all the other example analyses that you have seen).
(c) (10 points) Fully dene the analysis by characterizing whether it is forward or backward, what the
initial nodes are, and what value is given to these initial nodes and all the other nodes before the
default iterative work-list algorithm starts.
(d) (60 points) Having done all of the above properly, implement your design in SOOT and get an analyzer
for the upward exposed uses analysis.
Assignment Format and Guidelines on Submission
For this assignment, you will be implementing data ow analyses using SOOT, which was introduced to you
during the last tutorial.
For the problem you are given the base Java code. You will modify the le UpwardExposedUses.java.
You will not edit any other le. You are expected to submit only the le UpwardExposedUses.java.
(Any helper or additional classes you may need to create should be created within this le itself).
You must create your UpwardExposedUses class by extending any of the Data Flow Analysis classes
provided by Soot, and overriding the appropriate functions accordingly. You must not use any library or

package (other than Soot) or code from any such library or package- all code you submit must be your own.

Input-Output Format
The input will be in the form of a Java .class le. For the output you should print all the upward exposed
uses to a le called exposed-uses.txt in the same location from where the code is run.
Note that you should not print the upward exposed uses to STDOUT. Any output to STDOUT or
STDERR will be ignored. If the program creates any .jimple/.jimp les in the directory sootOutput, let
them stay there; you should not delete them.
Printing upward exposed uses
You should print, to the le exposed-uses.txt, all upward exposed uses with each upward exposed use
taking exactly 3 lines. If the the use of the variable x at node v is upwards exposed at the exit of node u
(i.e. if x, v ∈ U EU (u)) then you would print on the rst line the node u, on the second line the node v,
and on the third line the variable x. Make sure you have the order correct. You should make sure that each
node ts within a single line. If any node takes up more than one line, remove all newline characters from
the output of the node.
z = 2
y = 10
x = y*z
For example in the case of the above code, there are just three nodes in the graph- z = 2 y = 10, and x =
y*z, in which case the output to the le exposed-uses.txt should be:
z = 2
x = y*z
z
y = 10
x = y*z
z
y = 10
x = y*z
y
While actually running your code for the above case the names of the nodes, branch labels, and variables
in the control ow graph passed to your analysis class may vary slightly depending on Soot’s pre-processing,
and so the names of the nodes and variables printed to the le may be dierent; that is okay.
You may print the exposed uses in any order. But you should cover all upward exposed uses for the
exit points of all nodes and you should not repeat the same exposed use twice in the le. You are given a
particular test case and its expected output in the base code.
Evaluation
We will test your answer on CDF/CSLab machines by placing your submitted UpwardExposedUses.java
le in the base code source directory, compiling it, and running it on examples. The code should compile
and run as in the test.sh script. You should make sure your UpwardExposedUses.java le compiles and
runs with the provided base code on CDF/CSLab machines.
The code will be tested on dierent examples. For each correct output, you will get an equal fraction
of the problem points. In other words, if your code returns the correct output on all tests, you will get a
full mark, and for every wrong answer you will lose (proportionally) some points. Beyond this, there are no

partial marks for any partial eorts.

If the code fails to compile, returns a runtime error due to your fault, or fails to print the
results in the correct format to the exposed-uses.txt le you may not get any marks.
Summary of submission les
A PDF le for parts (a-c). Call it hw3.pdf.
The le UpwardExposedUses.java as described above for part (d).

Computer Science Homework

Please download
hw5_files.zip. and unzip it into the directory for your HW5. You will nd multiple data les
and a utility module, hw5util.py that will help with Part 1. We also provide you with a simple
test program hw5part1_test.py that demonstrates how to use the utility module to read in the
Part 1 data les.
The goal of this assignment is to work further with loops, lists and les. You are already familiar
with loops and lists. Files will be covered on Monday.
In Part 1 of the homework, you will use the utility module to read data from a le as shown in the
test program and you are required to use a double (i.e. nested) loop. In Part 2 of the homework,
you will need to read and process the data yourself, but cannot use a double loop. You may, of
course, use as many single loops as you like. Failure to follow these guidelines will lose you points.
Test your programs with smaller data les rst – use predictions_short.txt for Part 1 and
temp_data_short.txt for Part 2. This will allow you to more easily track if your programs are
working correctly or not before you try the full data sets les. In particular, the full temperature
le is quite large as you will nd out.
As always, make sure you follow the program structure guidelines. You will be graded on program
correctness as well as good program structure.
Remember as well that we will be continuing to test homeworks for similarity. So, follow our
guidelines for the acceptable levels of collaboration. You can download the guidelines from Piazza
in the resources section if you need a refresher.
Can you predict the future … of soccer?
Note that this part is intended as an exercise for double loops. You must use double loops for full
credit on this part!
A good friend of mine runs a soccer league before all major tournaments. Instructions for the
league are given at the end of this section for your entertainment. All participants have to guess all
games before the tournament starts and then when it ends, players are scored based on how well
they guessed. The scoring is simple:
Guessing the outcome of the game correctly (win/lose/draw) is 2 points.
Guessing the correct number of goals for either team is worth 1 point each.
The score for each player is the total score she gets for her guesses for all the games.
The player who has the highest score wins the league.

Your job in this homework is to ask the user for the name of a prediction le containing the guesses
by the league participants for a particular soccer tournament. Use the utility module we give
you to read the le and retrieve the data for the dierent players. As Dragnet would say, “The 

predictions you are about to read are real. The names have been changed to protect the innocent.”
(Are any of you familiar with Dragnet?) Your program will then:
Compute and print the scores for each player
Print the winner (or winners if there is a tie)
Compute and print the total score for each game across all the players
Print the hardest game or games to predict. This is the game or games with the lowest total
score.
Note that the rst two and the last two requests are almost identical, but iterate dierently.
Here is some more detail. The utility module we give you will read the prediction data into three
lists. For example, consider the following test program:

Python Basic

Overview and Requirements

For this assignment, you will implement a series of functions that test and implement sorting in linked lists containing numeric data (not any other data type).

Your implementation is permitted to include, but does not have to include, your own “helper functions” which you may implement and must comment clearly.

Your implementation is not permitted to include:

  • input from the user;
  • output to the screen (e.g., debugging statements) beyond that in the testing script;
  • any usage of Python’s built-in listtype in any way beyond that in the testing script.

Any assignment submission which uses Python’s built-in list type will receive a grade of zero.

Terminology

Non-decreasing order

A collection ordered such that each element is greater than or equal to the previous element (i.e., sorted with duplicates). For example, [1, 3, 5, 5, 7, 9, 9] is in non-decreasing order but [1, 3, 5, 3] is not.

Starting Code

Download a copy of Assn2Skeleton.py from OnQ and rename it Assignment2.py for your assignment submission. It contains some functions that will be useful for testing and the “stubs” of the functions you have to write for this assignment.

Your functions MAY NOT call these functions directly or use the Python built-in list type in any way. You may write additional helper functions of your own and use them in your required functions if you wish.

Functions

isSorted (4 points)

Write a function called isSorted which takes a linked list of numbers as a parameter and returns True if the list is sorted in non-decreasing order and False otherwise.

For example, if you create a linked list that is equivalent to [2,3,5,9], calling isSorted with that list should return True. But for a linked list equivalent to [2,3,9,5], it should return False.

Your function must not change the list or print anything, just return a Boolean result.

noDuplicates (4 points)

Write a function called noDuplicates which takes a linked list of numbers as a parameter. You may assume that this list is sorted in non-decreasing order. Your function must remove any duplicates from the list and return a pointer to the changed version of this list.

For example, if linkedListOfNumbers is a linked list equivalent to [2,2,3,5,5,5,5,7,7] and you call linkedListOfNumbers = noDuplicates(linkedListOfNumbers), linkedListOfNumbers should now contain to a linked list equivalent to [2,3,5,7].

insertInOrder (4 points)

Write a function called insertInOrder which takes two parameters:

1.a linked list of numbers, which you may assume is sorted in non-decreasing order; and,

2.a new value to add to the list (you may assume it’s a number)

Your function must add the value to the list and return a pointer to the changed version of the list. If the list already contains the value, add it as a duplicate.

For example, if linkedListOfNumbers is a linked list equivalent to [1,3,5,7] and you call linkedListOfNumbers = insertInOrder(linkedListOfNumbers, 5), linkedListOfNumbers should now point to a linked list equivalent to [1,3,5,5,7].

sortLinkedList (4 points)

This function must take a linked list of numbers as a parameter and return a pointer to a new list which contains the same values as the original list but is sorted in non-decreasing order. The linked list passed as a parameter must not be changed.

For example, if linkedListOfNumbers is a linked list equivalent to [4,2,6,3,4] and you call orderedLinkedListOfNumbers = sort(linkedListOfNumbers), orderedLinkedListOfNumbers should point to a linked list equivalent to [2,3,4,4,6] but linkedListOfNumbersshould still be [4,2,6,3,4] after the call.

The simplest way to do this sort is to use an algorithm called an “insertion sort”. Start with an empty list and add each element of the parameter list into the new list, putting it in the correct position to keep the new list in order. Once you’ve got insertInOrder working, this sort should be fairly easy.

Testing Script

Please download a copy of Assn2Testing.py from OnQ. This script will help you determine if your solution meets the administrative requirements for this assignment.

If you run it with your solution and it loads and runs without errors, it means that you have got all the right names, numbers of parameters, etc, and that your functions work correctly on one small test case. It is by no means a guarantee that your functions will work correctly with all examples, so you should also try your own test cases as well.

If you’re not able to write all of the functions for this assignment, this testing script can still be useful. You can comment out the parts relating to the functions you were not able to complete.

Database

Consider two relations A and B. A is of size 10,000 disk pages, and B is of size 1,000
pages. Consider the following SQL statement:
SELECT *
FROM A, B
WHERE A.a = B.a;
We wish to evaluate an equijoin between A and B, with an equality condition A.a = B.a.
There are 502 buffer pages available for this operation. Both relations are stored as
simple heap files. Neither relation has any indexes built on it.
Consider alternative join strategies described below and calculate the cost of each
alternative. Evaluate the algorithms using the number of disk I/O’s as the cost. For each
strategy, provide the formulae you use to calculate your cost estimates.
a) Page-oriented Nested Loops Join. Consider A as the outer relation. (1 mark)
b) Block-oriented Nested Loops Join. Consider A as the outer relation. (1 mark)
c) Sort-Merge Join (1 mark)
d) Hash Join (1 mark)
e) What would the lowest possible I/O cost be for joining A and B using any join algorithm
and how much buffer space would be needed to achieve this cost? Explain briefly. (1
mark)
Question 2 (5 marks)
Consider a relation with the following schema:
Executives (id: integer, name:string, title:string, level: integer)
The Executives relation consists of 100,000 tuples stored in disk pages. The relation is stored
as simple heap file and each page stores 100 tuples. There are 10 distinct titles in the
Executives hierarchy and 20 distinct levels ranging from 0-20.
Suppose that the following SQL query is executed frequently using the given relation:
SELECT E.ename
FROM Executives
WHERE E.title = “CEO” and E.level > 15;
Your job is to analyze the query plans given below and estimate the cost of the best plan
utilizing the information given about different indexes in each part.
a) Compute the estimated result size and the reduction factor (selectivity) of this query (1
mark)
b) Compute the estimated cost of the best plan assuming that a clustered B+ tree index
on (title, level) is (the only index) available. Suppose there are 200 index pages, and
the index uses Alternative 2. Discuss and calculate alternative plans. (1 mark)
c) Compute the estimated cost of the best plan assuming that an unclustered B+ tree
index on (level) is (the only index) available. Suppose there are 200 index pages, and
the index uses Alternative 2. Discuss and calculate alternative plans. (1 mark)
d) Compute the estimated cost of the best plan assuming that an unclustered Hash index
on (title) is (the only index) available. The index uses Alternative 2. Discuss and
calculate alternative plans. (1 mark)
e) Compute the estimated cost of the best plan assuming that an unclustered Hash index
on (level) is (the only index) available. The index uses Alternative 2. Discuss and
calculate alternative plans. (1 mark)
Question 3 (10 marks)
Consider the following relational schema and SQL query. The schema captures information
about employees, departments, and company finances (organized on a per department
basis).
Emp(eid: integer, did: integer, sal: integer, hobby: char(20))
Dept(did: integer, dname: char(20), floor: integer, phone: char(10))
Finance(did: integer, budget: real, sales: real, expenses: real)
Consider the following query:
SELECT D.dname, F.budget
FROM Emp E, Dept D, Finance F
WHERE E.did=D.did AND D.did=F.did
AND
E.sal ≥ 59000 AND E.hobby = ‘yodeling’;
The system’s statistics indicate that employee salaries range from 10,000 to 60,000, and
employees enjoy 200 different hobbies. There are a total of 50,000 employees and 5,000
departments (each with corresponding financial record in the Finance relation) in the
database. Each relation fits 100 tuples in a page. Suppose there exists a clustered B+ tree
index on (Emp.did) of size 50 pages.
a) Compute the estimated result size and the reduction factors (selectivity) of this query
(2 marks)
b) Compute the cost of the plans shown below. Assume that sorting of any relation (if
required) can be done in 2 passes: 1st pass to produce sorted runs and 2nd pass to
merge runs. Similarly hash join can be done in 2 passes: 1st pass to produce partitions,
2nd pass to join corresponding partitions. NLJ is a Page-oriented Nested Loops Join.
Assume that did is the candidate key, and that 100 tuples of a resulting join between
Emp and Dept fit in a page. Similarly, 100 tuples of a resulting join between Finance
and Dept fit in a page. (8 marks, 2 marks per plan)
Formatting Requirements
For each question, present an answer in the following format:
Show the question number and question in black text
Show your answer in blue text
For each of the calculations provide the formulae you used to calculate your cost estimates.

Java Program

This programming assignment has two programs:
1) a Java application (HourGlass) that displays an hourglass shape on the console outputting individual
hourglass and filler surrounding characters one character at a time using nested loops, and
2) extending the Mickey program from PA2 to be more Object-Oriented and add some additional functionality.
You are required to provide a text file named README, NOT Readme.txt, README.pdf, or
README.docx, etc. with your assignment in your pa3 directory. There should be no file extension after the file
name “README”. Your README should include the following sections:
Program Description ( 3 points ) :
Explain how the user can run and interact with each program. What sort of inputs or events does it take
and what are the expected outputs or results? How did you test your program? How well do you think
your program was tested?
Write your README as if it was intended for a 5 year old or your grandmother. Do not assume your
reader is a computer science major. The more detailed the explanation, the more points you will
receive.
Short Response ( 7 points ) : Answer the following questions:
Vim related Questions:
1. How do you jump to a certain line number in your code with a single command in vim?
For example, how do you jump directly to line 20? (Not arrow keys)
2. What command sets auto-indentation in vim? What command turns off (unsets)
auto-indentation in vim?
3. What is the command to undo changes in vim?
4. In vim, in command mode, how do you move forward one word in a line of code?
Move back one word? (Not arrow keys)
Unix/Linux related Questions:
5. How can you remove all .class files in a Unix directory with a single command?
6. How do you remove a Unix directory? Define the commands you would run on the terminal
when the directory is empty and when it is not empty. How do these command(s) differ?
7. What is the command to clear a Unix terminal screen?