Author: cs daixie

Prelab Assignment

You must meet all base requirements in the syllabus to receive any credit.
Work in your Prelab06 directory, and do not copy any data les into that folder.
Remember to commit & push the required le(s) to GitHub periodically. We will only grade
what’s been pushed to GitHub!
Do not push to GitHub any le that is not required. You will lose points if your repository
contains more les than required!
Make sure you le compiles. You will not receive any credit if your le does not compile.
Name and spell the le, and the functions, exactly as instructed. Your scripts will be graded by an
automated process. You will lose some points, per our discretion, for any function that does
not match the expected name.
Make sure your output from all functions match the given examples. You will not receive points
for a question whose output mismatches the expected result.
Unless otherwise specied, you cannot use any external library, but you can use any module in the
Python Standard Library to solve this lab, i.e. anything under:
Make sure you are using Python 3.7 for your Prelab.

Desperately Seeking Sutton

One aspect of research in reinforcement learning (or any scientific field) is the replication of
previously published results. There are a few benefits you might reap from replicating papers.
One benefit of replication is that it augments your understanding of the material. Another benefit
is that it puts you in a good position both to extend existing literature and consider new
contributions to your field. Replication is also often challenging. You may find that values of key
parameters are missing, that described methods are ambiguous, or even that there are subtle
errors. Sometimes obtaining the same pattern of results is not possible.
For this project, you will read Richard Sutton’s 1988 paper
. Then you will create an implementation and replication of the results
found in figures 3, 4, and 5. (It might also be informative to compare these results with those in
Chapter 7 of Sutton’s textbook: “
You will present your work in a written report of a maximum of 5 pages. The report should
include a description of the experiment replicated, how the experiment was implemented (the
environment, algorithms, etc), and the outcome of the experiment. You should describe how well
your results match the results given in the paper as well as any significant differences. Describe
any pitfalls you ran into while trying to replicate the experiment from the paper (e.g. unclear
parameters, contradictory descriptions of the procedure to follow, results that differ wildly from
the published results). What steps did you take to overcome those pitfalls? What assumptions did
you make? And, why were these assumptions justified? Add anything else that you think is
relevant to discuss.

Generic Sequences

The purpose of this project is to develop a generic class representing the sequence
data structure. For storage of data the class shall use an ArrayList object. You will also
review how to construct and use Java classes as well as obtain experience with
software design and testing.
ABET Program Learning Outcomes:
The ability to recognize the need for data structures and choose the appropriate
data structure (1, 2, 6)
The Problem
In this project you implement a sequence ADT as defined in the book. However, your
class will be ArrayList based rather than array based. The fields will be chosen
accordingly and in the application class you will test and demonstrate the behavior of
the sequence ADT.
Design, Implementation
The reading book provides good help and patterns for design and implementation.
However, as opposed to the concrete types used in the book, you shall implement a
Sequence class for the data structure. In another class named Applications
you shall test and exercise the Sequence methods. For this purpose the generic code
will be run with the concrete type Integer from the Java library.
1. Sequence class specification
This class will represent a set sequence, that is duplicate values are not
supposed to be in the sequence. You do not have to ensure this property at
adding elements, we may consider this a precondition for all methods. You may
opt for implementing a bag sequence, in that case you should change the field,
see comment below. The specification of a non-generic DoubleArraySeq class is
fairly well detailed in your reading, pp 150
– 158. Follow that design when
adequate and care for the differences as consequences of generic
implementation and ArrayList storage.

Vertebral Column Data Set

This Biomedical data set was built by Dr. Henrique da Mota during a medical residence
period in Lyon, France. Each patient in the data set is represented in the data set
by six biomechanical attributes derived from the shape and orientation of the pelvis
and lumbar spine (in this order): pelvic incidence, pelvic tilt, lumbar lordosis angle,
sacral slope, pelvic radius and grade of spondylolisthesis. The following convention is
used for the class labels: DH (Disk Hernia), Spondylolisthesis (SL), Normal (NO) and
Abnormal (AB). In this exercise, we only focus on a binary classication task NO=0
and AB=1.
(a) Download the Vertebral Column Data Set from: https://archive.ics.uci.
(b) Pre-Processing and Exploratory data analysis:
i. Make scatterplots of the independent variables in the dataset. Use color to
show Classes 0 and 1.
ii. Make boxplots for each of the independent variables.
Use color to show
Classes 0 and 1 (see ISLR p. 129).
iii. Select the rst 70 rows of Class 0 and the rst 140 rows of Class 1 as the
training set and the rest of the data as the test set.
(c) Classication using KNN on Vertebral Column Data Set
i. Write code for k-nearest neighbors with Euclidean metric (or use a software
ii. Test all the data in the test database with k nearest neighbors. Take de-
cisions by majority polling.
Plot train and test errors in terms of k for
k ∈ {208, 205, . . . , 7, 4, 1, } (in reverse order). You are welcome to use smaller
increments of k. Which k is the most suitable k among those values? Cal-
culate the confusion matrix, true positive rate, true negative rate, precision,
and F -score when k = k.1
iii. Since the computation time depends on the size of the training set, one may
only use a subset of the training set. Plot the best test error rate,which
is obtained by some value of k, against the size of training set, when the
size of training set is N ∈ {10, 20, 30, . . . , 210}.Note: for each N , select
your training set by choosing the rst N/3 rows of Class 0 and the rst
N N/3 rows of Class 1 in the training set you creatd in 1(b)iii. Also, for
each N , select the optimal k from a set starting from k = 1, increasing by 5.
For example, if N = 200, the optimal k is selected from {1, 6, 11, . . . , 196}.
This plot is called a Learning Curve.
Let us further explore some variants of KNN.

Practice Generic Nodes

In this assignment you are to implement a generic Node class with a type parameter for the
elements stored in a linked list.
Exercises – Requirements
1. (2 pts) Add a generic Node class to your Java project.
2. (16 pts total) In the class
(2 pts) Declare the usual fields data and link; follow the book specification but use
the generic type parameter as necessary
(8 pts) Define the accessor methods and mutator methods for both fields)
(6 pts) Define an initializer constructor, see the book
Remember the generic syntax: at the definition the name of the constructor is
Node, however every time you use Node for a type you must postfix the generic;
when the constructor is used to instantiate a Node object a concrete type for the
generic parameter like Node<String> must be used
3. (56 pts total, 8 pts each) Define the Node class methods as required below.
Define the addNodeAfter( ) method as explained in the PP presentation, use the
generic type as needed.
Define removeNodeAfter( ) method as explained in the PP presentation, use the
generic type as needed.
Define a non-static listLength( ) method; no parameter; the method iterates thru
the list from the calling object (this), counts the nodes and returns the count; use
while loop for iteration
Define a static listLength( ) method, see the book; must use its own generic type;
takes a node parameter; the method iterates thru the list from parameter node),
counts the nodes and returns the count; use for loop for iteration
Define a static listSearch ( ) method, see the book; must use its own generic type;
takes a node parameter, and a data parameter, both generic; the method iterates
thru the list from parameter node, checks if data is target (use the equals( )


You may work with a single partner, but you will have to write your own
solution. If you do work with a partner, please list both of your names on
your assignment. You are not allowed to discuss the assignment with anyone
other than your partner, the TAs, and the instructor.
Question 1) (5 points) Write out Church’s Thesis in your own words.
Question 2) (30 points)
Let L = {x ∈ {a, b, c}|x = anbncn, n ∈ N}. You will need to write out a
TM Mthat decides L.
a) (15 points) Describe in words how Mworks. That is, use descrip-
tions like “Write a ’#’ and move left” or “Move right until you see a
blank character”.
b) (15 points) Formally describe your TM, using the terminology
from example 3.9 in Sipser M2.
Question 3) (25 points)
a) (20 points) Let Mbe the automaton from Figure 3.8 in Sipser.
Suppose that Mis given an input of 0000. Give the rst 10 congu-
rations that Mwill take.
b) (5 points) Suppose that Mis given an arbitrary input x and
allowed to run, generating the congurations C0, C1, . . .. What is the
largest number of characters by which any Cand Ci+1 can dier?
Question 4) (20 points) Use dovetailing to show that the language
L= { M |M halts on at least one input}
is recognizable.
Question 5) (20 points) Show that any innite recognizable language has
an innite decidable subset.

Image Processing

Topic: Image enhancement in the spatial domain. (Chapter 3 of the text book) using
Matlab Image Processing Toolbox
1. Intensity transformation: Select a gray level transformation method (Gamma
correction, histogram equalization, etc.) and apply it to images. Discuss the
2. Spatial Filtering:
a. Select a linear smoothing filter and apply it to a noisy image. Vary the
filter size (3 x 3, 5 x 5, etc.) and compare the results.
b. Select an edge extraction filter and apply it to both noisy and noise-free
image and discuss your results.
c. Select an image sharpening filter and apply it to different kind of
images and discuss your findings.
d. Apply a non-linear order-statistic filter (e.g. median filter) on the noisy
image in part (a) and compare the results.
Report must contain the following information:
a. A brief discussion of the concept.
b. Spatial filters and parameter (if any) used.
c. Results (figures, tables, etc.) and concluding remarks.
d. Matlab code with necessary comments.
Note: You need to know how to add noise of different types to a clean image.

Image classification, dimensionality reduction

(a) Submit a zip file of the completed iPython notebooks to Tabula. Make sure to put comments in your
code explaining what you have done and what you have observed from the results.
(b) This assignment will contribute to 25% of your overall mark.
This assignment consists of 7 exercises which are split into 4 parts. The first 3 parts can be found in the lab
of Classification while the last part can be found in the lab of Dimensionality Reduction. Please download
the corresponding ipynb files from the module web page
Part 1: Binary Classification of MNIST Dataset
In the first set of tasks, you will evaluate a number of popular classifiers for the task of recognising
handwritten digits from MNIST dataset. Specifically, we will focus on distinguishing between 7 and
9 which are known to be a hard pair. You will use the scikit-learn libraries.
Binary Classification in scikit-learn
All classifiers in scikit-learn follow a common pattern that makes life much easier. Follow these steps for all
the tasks below.
1. Instantiate the classifier with appropriate parameters
2. Train/fit the classifier with training data and correct labels
3. Test the classifier with unseen data
4. Evaluate the performance of classifier

Compute and Elastic Scaling

In this assignment you will be using a web application that is provided to you (MiniTwit – a
simple Twitter clone) and deploying it in EC2 using many of the features you would want to use
for a service of this type. First you will set up your AWS account to be ready to create EC2
instances (virtual servers), and then you will create an instance and SSH to it. Then you will run
the MiniTwit application and see that you can reach and use the app from your web browser.
Next you will set up an EC2 instance to automatically start the MiniTwit application when the
instance launches, so that you don’t need to SSH in and run it manually. Then you will
configure Elastic Load Balancing (ELB) to distribute user (browser) connections between
multiple MiniTwit servers, and configure Auto Scaling to scale the number of MiniTwit servers
out and in, depending on the load (i.e., how many browser requests are hitting them each
Also, here are a few top-level documentation pages that you may use now and in future
projects too.
Part 1: Setting up EC2
In this part, you will create a single EC2 instance and run a simple web application on it using
the Flask framework for Python. Before you start, you will need your AWS account fully set up,
per the instructions that were emailed to you.
First, log in to AWS and go to the EC2 service. (You can use the AWS search bar, or expand “All
Services” and find EC2 in the Compute section.) If you’re in the right place, you should see a
toolbar on the left with options like Instances, AMIs, Volumes, Security Groups, etc. The first
thing you need to do is make sure you’re connecting to the correct Region (i.e., the correct data
center – Amazon has several datacenters worldwide for running AWS services). On the upper
right, between your account name and “Support” it should say the name of a location (e.g.,
Oregon, Ohio, N. Virginia, etc.). If it does not say Oregon, then click on it and choose the option
“US West (Oregon)” from the drop down. It should now say Oregon, and you will be using the
Oregon Region (data center) from now on. (You shouldn’t need to do this ever again, unless
you change it, but it’s always a good idea to double check each time you log in.)
Create a Key Pair
The next thing you need to do is go to the “Key Paris” section (under the NETWORK & SECURITY
heading). You should see a button at the top of the main pane called “Create Key Pair” – click on
this. You will be prompted to give it a name – name it whatever you think makes sense. (It won’t
matter much, since you’ll probably only have one key pair for now.) Your browser will then
prompt you to download a file with a .pem extension – this file contains your private key, which
you will use to SSH to your EC2 instances. Save your .pem file somewhere safe – if you lose it,
you cannot recover it! Should you lose your .pem file, your key pair will become useless; you
will need to create a new key pair then, save the new .pem file, and associate all your EC2
instances with the new key pair.
You will need to be sure the permissions are set correctly on your .pem file – if not, SSH may
refuse to connect! On Linux or MacOS, open a command prompt and change to the directory
containing your .pem file. Then set the permissions to read-only for the file owner with this
chmod 400 <pem key file>
Configure a Security Group
Finally, configure a default Security Group. This is basically a firewall, which restricts what kind
of network traffic from which hosts (machines on the network) is allowed to reach your EC2
instances. Click on “Security Groups” (under the NETWORK & SECURITY heading). There should
be one security group already there with a name along the lines of “default”. Select that  

security group. In the details section below the list of security groups, you will see
“Description”, “Inbound”, “Outbound”, etc. Click on the “Outbound” tab. It should say
something like “All traffic – All – All –”, which basically means instances using this
security group can send any network traffic to anywhere. This is fine; what they’re allowed to
receive is the real security concern, so now click on “Inbound”. You should see a rule that says
it will accept all traffic of all types incoming from a particular Security Group ID, and that ID
happens to be the ID of this group – in other words, instances in this security group can send
anything to other instances in the same security group. We’ll need to add a couple things to let
us reach these instances from outside the cloud, though. Click on the Edit button, and then use
the Add Rule button to add the following incoming rules:
SSH – TCP – 22 – Custom –
a. this allows SSH to your instances from campus computers, such as
b. the “Description” field is optional, but you may want to enter something to remind
yourself of what this rule is for when you look at it again weeks or months later
SSH – TCP – 22 – My IP
c. choosing “My IP” instead of “Custom” will automatically fill in your current IP
d. this allows SSH to your instances from the computer you are currently using – this
may be redundant if you are on campus, but setting this up with your home IP
address may be convenient for working from home
e. Note: if you change to a different computer, this does not follow you! so make sure
you’re on the computer you want to use for your work when you set up this rule.
Custom TCP Rule – TCP – 5000 –
f. this allows connecting to your instances from anywhere on port 5000, which we will
use instead of the default HTTP port 80 in order to keep our service a little more
private (although not really because anyone can easily find port 5000 by trial-and-
error) – in effect, the service on port 5000 will be publicly accessible!
Now that all that’s done, you probably won’t need to mess with it any more. (Maybe
occasionally, but not frequently… e.g., if you want to update the “My IP” rule for a new
Part 2: Launching an EC2 Instance
Now that everything’s set up, let’s launch an EC2 instance and run a web service. Click on
“Instances” (under the INSTANCES section), then click on the “Launch Instance” button. Now
you will have to choose which AMI (disk image) to use for your EC2 instance’s virtual hard drive.
On the left are several categories – choose the “My AMIs” category, then check the “Shared
with me” option.
You should see an image called “Flask Base” (possibly with a version number, like “Flask Base
v1.3”). If you do not see this AMI listed, please contact the instructor.

Homework 3

You are expected to solve the problems in the easier section on your own. For the harder section you are expected to
think about a problem alone for at least 24 hours before discussing it with others. Discussions are meant to help you
understand the problem better. You should not ask someone for solutions. If we nd evidence of cheating then you
will be reported to the university and could be subject to disciplinary action. Be prepared to come and explain your
solution to us in person if we feel the need. You can discuss the harder section in groups of 1, 2 or 3. In the end you
must write up your own solution and write names and RUIDs of students you collaborated with. For each problem,
justify your answer. Write formal proofs when asked for. Submit a single .pdf le on sakai containing your homework.
The name of the le should be your netid. Late homeworks will not be accepted.
In this homework whenever the problem mentions a string s of length n, you can assume that each of the n
characters is one of the 26 lowercase alphabets. If the problem mentions a bit string of length n, then each character is
either 0 or 1.
Easy Questions. 5 points each.
For each of the problems in the easy section it is enough to describe in English the idea of your algorithm, dene the
memo table, write the base case and the recursive step to ll the table, and analyze the run time. No need for pseudo
Given a string s of length n, design an algorithm that outputs the smallest number k such that s = w1w. . . wk
where each wis a palindrome. In other words, nd the smallest k such that s can be written as a concatenation
of k palindromes. For the denition of a palindrome see practice problems. For example if s = “add” then the
algorithm should output k = 2 since we can take w=“a” and w=“dd”. On the other hand, if s = “ada”, then
the algorithm should output k = 1.
Given two strings of length n and m, design an algorithm that outputs the length of the longest common substring
of the two strings. If A[1, . . . n] is the array representing a string, then any contiguous chunk A[i, . . . , j] is a
substring of A.
Given an array A[1, . . . , n] of integers, design a dynamic programming algorithm running in time O(n) to nd
indices i, j such that the subarray A[i, . . . , j] has the maximum sum.
You are given a set of n non-negative integers, and a target integer K. You need to nd out of there exists a
subset of the n integers that adds up to K. Design a dynamic programming algorithm for this problem that runs
in time O(nK).
Given a string of length n, a subsequence is any non empty subset of characters read from left to right. For
example, if A = atc then a, t, c, at, tc, ac, atc are all subsequnces of A. Given two strings of length n, m,
design an algorithm that outputs the length of the longest common subsequence (LCS) of the two strings.
Modify the algorithm for the longest common subsequence (LCS) problem that you designed above such that it
runs in time O(nm) but only uses O(min(n, m)) space.

Harder Questions
Weekend Planning (20 pts)
It’s Friday night and you have n parties to go to. Party i has a start time si, end time t> sand a value v≥ 0. Think
of the value as an indicator of how excited you are about that party. You want to pick a subset of the parties to attend so
that you get maximum total value. The only constraint is that in order to get a value vfrom party i you need to attend
the party from start to nish. Hence you cannot drop in midway and leave before the party ends. This means that if
two parties overlap, you can only attend one of them. For example, if party 1 has a start time of 7pm and ends at 9pm
and party 2 starts at 7:30pm and ends at 10pm, you can only attend one of them. On the other hand, if party 2 starts at
9pm or later then you can attend both. Given as input start times, end times and values of each of the n parties, design
an O(n log n) time algorithm to plan your night optimally. Describe in English the idea of your algorithm. If you use
dynamic programming dene the memo table, base case and recursive steps. You don’t need to write the pseudo code
or provide a proof of correctness.
Knapsack (20 pts)
In the knapsack problem we have n items, where each item i has an integer value vand an integer price pi, and we
also have a budget of B. The goal is to nd the maximum total value of items that can be picked without exceeding the
budget. In particular, out of all sets of items whose total price is at most B, we want the set of highest total value. In
class, we gave a dynamic programming algorithm to solve this problem whose running time was O(nB). One issue,
though, is that if the prices are large, then O(nB) may not be so good. In this problem, we want you to come up with
an alternative algorithm whose running time is O(nV ), where V is the value of the optimal solution. So, this would be
a better algorithm if the prices are much larger than the values. Describe in English the idea of your algorithm. Dene
the memo table that you will use and write the base case and recursive step needed to ll the table. Finally write the
pseudo code. You don’t need to provide a proof of correctness.
[Note: Your algorithm should work even if V is not known in advance, but you may want to rst assume you are given
V up front and then afterwards gure out how to remove that requirement.]
How to make money via DP (30 pts)
An option (specically, an “American call option”) gives the holder the right to purchase some underlying asset (e.g.,
one share of IBM) at some specied exercise price (e.g., $200) within some specied time period (e.g., 1 year). For
instance, if the price of IBM went up to $212 in this period, you could exercise the option and then sell the stock,
making $12 (or you could keep the option, hoping the price will increase further before the option expires).
If you have a probabilistic model for how a stock will behave, then this gives you a well dened notion of the value
of a given option: the expected prot for having the option if you were to follow the optimal strategy for deciding
when to exercise it under that model.
For example, to take an easy case, suppose we have an option to buy one share of IBM at $200 that expires right
now. If IBM is currently going for $210, then the value of this option is $10. If IBM is currently going for $190, then
the value of this option is 0 (we wouldn’t want to exercise it). However, suppose the option expires tomorrow, and
suppose our model says that each day, with probability 1/4 the share price goes up by $20, and with probability 3/4
the share price goes down by $10 . In that case, if IBM is currently worth $190, then the value of this option is $2.50
because that is our expected gain if we use the optimal strategy “wait until tomorrow, and then exercise the option if
IBM went up” (our expected gain under this strategy would be
× 10 +
× 0). If IBM is currently worth $210, then
the value of the option is $10 (because if you work it out, our optimal strategy in this case is to exercise the option
right away).
Formally, the value of an option is the expected prot it will produce under the optimal strategy for using the
option, given our probabilistic model for the stock. Note that the optimal strategy need not commit in advance to what
day the option will be exercised: the date it gets exercised (if ever) may depend on how the stock has performed so