Author: cs daixie

Queue the Stacking of the Deque

In lecture, we have seen that a deque can serve both as a stack and as a queuedepending on usage. We have also seen that a deque can be implemented using alinked list or an array to store its data. We will employ both of those features in thisproject.

Because we will use a deque as the storage medium for our queues and stacks, wewill begin there. You will implement two deque classes, one using an array to storethe contents and one using a linked list to store the contents. Notice that regardlessof how you store the data, the six operations that define a deque must be present:two pushes, two pops, and two peeks. If something calls itself a deque, it mustprovide at least those six methods. The programming pattern that enforces this iscalled an interface or more generally an abstract base class. We provide a completeabstract base class called Deque in the file Deque.py. Notice that the file does notcontain implementations; it only defines the functions that must be present. If youattempt to inherit from Deque, the child class must contain the methods listedin the abstract base class, or Python will refuse to construct the object andterminate. This is done via the special method __subclasshook__:

def __subclasshook__(child): required_methods = {‘push_front’, ‘push_back’, \ ‘peek_front’, ‘peek_back’, \’pop_front’, ‘pop_back’} if required_methods <= child.__dict__.keys(): return True return False

Python will call this method to see if a child class is allowed to inherit from the Dequeclass—that is, are the six required methods a subset of the methods in the child’snamespace? (In Python a set B is a subset of set A if and only if A <= B is True.) If allsix methods are there, __subclasshook__ will return True, indicating that child lookslike a deque and instantiation can proceed. If any method is missing,__subclasshook__ will return False, indicating that the child does not qualify as adeque and instantiation cannot proceed. This file is complete; do not modifyDeque.py.

Linked_List_Deque and Array_Deque should both inherit from Deque.Linked_List_Deque’s constructor will initialize an empty linked list for the data.Array_Deque’s constructor will initialize an empty array for the data. The six requiredmethods will have different implementations in each class because we interact witharrays and linked lists differently. Except for performance differences, the user ofyour deque should not be able to tell whether the implantation used a linked list oran array. Their functionality (and string representations) must be identical.

Algorithmic Game Theory and Applications

Each question counts for 50 points, for a total of 100 points for AN-
SWERING *** TWO *** QUESTIONS ONLY. (DO NOT SUBMIT AN-
SWERS FOR MORE THAN TWO QUESTIONS. ONLY TWO WILL BE
MARKED.)
1. (a) (25 points) Consider 2-player strategic form game, G, speci ed
by the following \bimatrix”:
2664
(7; 4) (9; 5) (2; 3) (2; 6)
(4; 1) (9; 2) (2; 1) (3; 6)
(8; 7) (6; 7) (5; 8) (2; 7)
(3; 3) (7; 4) (4; 6) (2; 4)
3775
As usual, player 1 is the row player, and player 2 is the column
player. If the content of the bimatrix at row i and column j is
the pair (a; b), then u1(i; j) = a and u2(i; j) = b.
Compute all of the Nash equilibria (NEs) of this game G, together
with the expected payo to each player in each NE. Explain why
any pro le x that you claim is an NE of G, is indeed an NE of G,
and furthermore, explain why there are no other (pure or mixed)
NEs of G, other than the pro le(s) you claim are NE(s) of G.
(b) (25 points) Consider a nite game, G, with pure strategy sets
S1; : : : ; Sn for the n players, and with a payo function ui(s)
for each player i 2 f1; : : : ; ng that assigns a payo to each pure
strategy pro le s = (s1; : : : ; sn) 2 S = S1  S2  : : :  Sn.
Now consider a di erent n-player game, G0, which has exactly the
same strategy sets S1; : : : ; Sn, as G, but where the payo function

u0i(s) for each player i di ers from ui(s) as follows:
u0i(s) = ui(s) + gi(s􀀀i)
where, for each player i, gi : S􀀀i ! R is a function (any function)
that depends only on the other players’ pure strategies and not
on player i’s own pure strategy.
Prove that G and G0 have exactly the same set of Nash equi-
libria. In other words, prove that a mixed strategy pro le x =
(x1; : : : ; xn) is a NE for G if and only if x is a NE for G0.
(Note, however, that the same NE may yield entirely di erent
payo s to the di erent players in G and in G0.)

2. (a) (20 points) Consider the 2-player zero-sum game given by the
following payo matrix for player 1 (the row player):
266664
5 6 4 7 3
6 4 5 3 8
7 2 5 4 6
4 7 3 5 7
3 5 8 6 3
377775
Compute both the minimax value for this game, as well as a min-
imax pro le (NE), i.e. a pair of minmaximizer and maxminimizer
strategies for players 1 and 2, respectively.
(You can, for example, use the linear programming solver package
linprog in MATLAB, available on DICE machines, to do this.
To run MATLAB, type \matlab” at the shell command prompt.
For guidance on using the linprog package, see:
http://uk.mathworks.com/help/optim/ug/linprog.html.)
(b) (30 points) Recall from Lecture 7 on LP duality, the symmetric 2-
player zero-sum game, G, for which the (skew-symmetric) payo
matrix (in block form) for player 1 is:
B =24
0 A 􀀀b
􀀀AT 0 c
bT 􀀀cT 0
35
Suppose that there exist vectors x0 2 Rn and y0 2 Rm, such
that Ax0 < b, x0  0, AT y0 > c and y0  0. (Note the two
strict inequalities.) Prove that for the game G, every minmaxi-
mizer strategy w = (y; x; z) for player 1 (and hence also every
maxminimizer strategy for player 2, since the game is symmetric)
has the property that z > 0, i.e., the last pure strategy is played
with positive probability. (Recall that this was one of the missing
steps in our sketch proof in the lecture that the minimax theorem
implies the LP duality theorem.)
(Hint: Let w = (y; x; z) be a maxminimizer strategy for player
2 in the game G. Note that the value of any symmetric 2-player
zero-sum game must be equal to zero. This implies, by the min-
imax theorem, that Bw  0. Suppose, for contradiction, that
z = 0. What does this imply about Ax , AT y, and bT y􀀀cT x?
Then if y 6= 0, show that this implies (y)T (Ax0 􀀀 b) < 0. In
turn, show that it also implies (x)T (AT y0 􀀀 c) > 0. Use these
and related facts to conclude a contradiction.)

3. Consider the following simple 2-player zero-sum games, where the pay-
o table for Player 1 (the row player) is given by:
A =  1 􀀀1
􀀀1 1 
We can view this as a game where each player chooses \heads” (H) or
\tails” (T), where the rst strategy for each player is denoted H and
the second strategy is denoted T.
(a) (10 points) First, what is the unique Nash equilibrium, or equiva-
lently the unique minimax pro le of mixed strategies for the two
players, in this game? And what is the minimax value of that
game?
(b) (40 points) Now, suppose that the two players play the same game
you have chosen in part (a), against each other, over and over
again, for ever, and suppose that both of them use the following
method in order to update their own strategy after each round of
the game.
i. At the beginning, in the rst round, each player chooses ei-
ther of the pure strategies, H or T, arbitrarily, and plays
that.
ii. After each round, each player i accumulates statistics on how
its opponent has played until now, meaning how many Heads
and how many Tails have been played by the opponent, over
all rounds of the game played thusfar. Suppose these num-
bers are N Heads and M Tails.
Then player i uses these statistics to \guess” its opponents
\statistical mixed strategy” as follows. It assumes that its op-
ponent will next play a mixed strategy , which plays Heads
with probability N=(N+M) and plays Tails with probability
M=(N +M).
Under the assumption that its opponent is playing the \sta-
tistical mixed strategy” , in the next round player i plays a
pure strategy (H or T) that is a pure best response to .
If both H and T are a best response at any round, then
player i can use any tie breaking rule it wish in order to
determine the pure strategy it plays in the next round.
iii. They repeat playing like this forever.

Prove that, regardless how the two players start playing the game
in the rst round, the \statistical mixed strategies” of both play-
ers in this method of repeatedly playing the game will, in the long
run, as the number of rounds goes to in nity, converge to their
mixed strategies in the unique Nash equilibrium of the game.
You are allowed to show that this holds using any speci c tie
breaking rule that you want. Please specify the precise tie break-
ing rule you have used. (It turns out that it holds true for any
tie breaking rule. But some tie breaking rules make the proof a
lot easier than others.)

4. (a) (40 points) One variant of the Farkas Lemma says the following:
Farkas Lemma A linear system of inequalities Ax  b has a
solution x if and only if there is no vector y satisfying y  0 and
yTA = 0 (i.e., 0 in every coordinate) and such that yT b < 0.
Prove this Farkas Lemma with the aid of Fourier-Motzkin elim-
ination. (Hint: One direction of the \if and only if” is easy.
For the other direction, use induction on the number of columns
of A, using the fact that Fourier-Motzkin elimination \works”.
Note basically that each round of Fourier-Motzkin elimination
can \eliminate one variable” by pre-multiplying a given system
of linear inequalities by a non-negative matrix.)
(b) (10 points) Recall that in the Strong Duality Theorem one possi-
ble case (case 4, in the theorem as stated on our lecture slides)
is that both the primal LP and its dual LP are infeasible. Give
an example of a primal LP and its dual LP, for which both are
infeasible (and argue why they are both infeasible).

Machine Learning

You may use any programming l anguage you l ike (Python, C++, C, Java… ) . All programming must be
done individually f rom rst principles. You are only permitted to use existing t ools f or simple l inear algebra
such as matrix multiplication/inversion. Do NOT use any toolkit that performs machine learning
functions and do NOT collaborate with your classmates. Cite any resources that were used.
In t his project you will practice the basics of Machine Learning Classi cation by creating a K-NN clas-si er
for two datasets. You will also practice good practices f or how to describe, evaluate, and write up a report on
the classi er performance.
It i s expected that your project report may require 2 pages per dataset i f you are good about making
interesting gures and making them not too l arge, or 3 pages i f your gures are big.
Datasets: The project will explore two datasets, the famous MNIST dataset of very small pictures of
handwritten numbers, and a dataset that explores the prevelance of diabetes in a native american tribe
named the Pima. You can access the datasets here:
1. https://www.kaggle.com/uciml/pima-indians-diabetes-database
2. https://www.kaggle.com/c/digit-recognizer/data
Programming Task: For each dataset, you must create a K-NN classi er that uses the training data to
build a classi er, and evaluate and report on the classi er performance.
(30 points) Dataset details: Describe the data and some simple visualizations (for images, a few exam-
ples from each category; for other data, perhaps some scatter plots or histograms that show a big picture of
the data). Describe your training/test split for K-NN and justify your choices.
(15 points) Algorithm Description: K-NN is a very clear algorithm, so here describe any data pre-
processing, feature scaling, distance metrics, or otherwise that you did.
(45 points) Algorithm Results: Show the accuracy of your algorithm|in the case of the Pima Dataset,
show accuracy with tables showing false positive, false negative, true positive and true negatives. For the
Pima Dataset, use three di erent distance metrics and compare the results.
In the case of the MNIST digits show the complete confusion matrix. Choose a single digit to measure
accuracy and show how that number varies as a function of K.
(10 points) Runtime: Describe the run-time of your algorithm and also share the actual “wall-clock”
time that it took to compute your results.

C++ Program Design

Raven Elevators Inc. (REI), a manufacturer of elevators, has hired you to build them a simulator for their line of elevator control systems. You happily accept the task, eager to impress your new employer with your development skills. REI asked you to implement it in Qt C++, but before doing the implementation REI requested that you first deliver use cases, design documentation, traceability matrix, and C++ class interfaces.
Learning objectives:
 Designing and expressing your design in UML
 Verifying consistency between use cases and design
 Building a requirements traceability matrix
 Designing for variability in elevator allocation strategy
Deliverables:
 Use cases (can borrow from A1 & grading feedback)
 Design documentation – structure and behavior:
o UML Class diagram
o Sequence diagrams for these scenarios: 1 Basic use cases and 5 safety features
o Activity or state diagram (where relevant)
o Textual explanation of your design decisions including use of design patterns, if any.  C++ header files (interfaces and significant variables)  Traceability matrix
Your design should include passenger and sensor actors driving the elevator controller responses that are displayed through a simple GUI.
Your design should accommodate for variability in elevator allocation strategy (handle 2 or more allocation strategies).

Elevator system specification (same as Assignment 1)
<Paragraph 1> A building is serviced by M elevators (also called cars). On each of the N floors is a pair of buttons marked “up” and “down. When a button is pressed it illuminates, and remains illuminated, until an elevator arrives to transport the customers who, at this floor, have requested an elevator going in a certain direction. When the elevator arrives, it rings a bell, opens its doors (the elevator and floor doors) for a fixed time (10 seconds) allowing people to exit or board, rings the bell again, closes its doors and proceeds to another floor. Once on-board passengers select one or more destination floors using a panel of buttons; there is one button for every floor. The elevator has a display which shows passengers the current floor of the elevator. There is also a pair of buttons on the elevator control panel marked “open door” and “close door”. These buttons can be used by a passenger to override the default timing of the doors. The door will remain open beyond its default period if the “open door” button is held depressed; the doors can be closed prematurely by pressing the “door close” button. Inside the elevator there is also a help button linked to building safety service.
<Paragraph 2> Each elevator has a sensor that notifies it when it arrives at a floor. The elevator control system should ensure that the group of elevators services all (floor and on-board) requests expeditiously.
<Paragraph 3> Each elevator has a display and an audio system. The display shows the current floor number and warning messages that are synced with audio warnings.
Safety features:
<Paragraph 4> Help: The control system receives a “Help” alarm signal from an elevator indicating that the “Help” button has been pressed. In that case, the passenger is connected to building safety service through a voice connection. If there is no response from building safety within 5 seconds or if there is no response from a passenger a 911 emergency call is placed.
<Paragraph 5> Door obstacles: If the light sensor is interrupted when the door is closing, the control system stops the door from closing and opens it. If this occurs repeatedly over a short period of time, a warning is sounded over the audio system and a text message is displayed.
<Paragraph 6> Fire: The control system receives a “Fire” alarm signal from the building and commands all elevators to move to a safe floor. Similarly, a “Fire” alarm signal from the elevator itself will cause that elevator to go to a safe floor. In both cases an audio and text message are presented to passengers informing them of an emergency and asking them to disembark once the safe floor is reached.
<Paragraph 7> Overload: The control system receives an “Overload” alarm signal from an elevator if the sensors indicate that the passenger or cargo load exceeds the carrying capacity. In that case, the elevator does not move and an audio and a text messages are presented to passengers asking for the load to be reduced before attempting to move again.
<Paragraph 8 > Power out: The control system receives a “Power Out” alarm signal. In that case, an audio and a text messages are presented to passengers informing them of the power outage. Each elevator is then moved to a safe floor and passengers are asked to disembark via audio and text messages. The battery backup power is sufficient to do all of this.

Finding water level changes via satellite imagery

Please read this assignment carefully.
This coursework is concerned with the creation of a library to analyse data from different instruments onboard
a satellite. The instruments provide daily updates from a piece of land that can be used to track
water levels at different locations.
This assignment asks you to work collaboratively within your team to create a package. For that you will
need to write some code for querying, loading and analysing a dataset about different portions of land in
a map. We will describe how the code must behave, but it is up to you to fill in the implementation. The
package needs to follow all the good practices learnt in the course, that is, the package should: be version
controlled; include tests; provide documentation and doctests; set up command line interfaces; and
be installable. Besides this, you will also need to modify an existing implementation of a provided script
to make it more readable, more efficient, and measure its performance.
The collaboration aspect should be organised and managed using GitHub.
The exercise will be semi-automatically marked, so it is very important that your solution adheres to the
correct interface, file and folder name convention and structure, as defined in the rubric below. An otherwise
valid solution that doesn’t work with our marking tool will not be given credit.
For this assignment, you can use the Python standard library and other libraries you may wish to use (but
make sure they are clearly set as dependencies when installing your package). Your code should work with
Python 3.9.
First, we set out the problem we are solving. Next, we specify the target for your solution in detail. Finally,
to assist you in creating a good solution, we state the marking scheme we will use.
1 Background information
The Irish Space Agency has launched Aigean, an Earth observation satellite to monitor an area around
Lough Ree. Recently, rainfall has decreased in the area, and during the latest years droughts have become
more frequent and more severe. With the instruments on board Aigean the scientific community will be
able to obtain better data about the water levels and the erosion of the land, and therefore will be able to
generate more accurate predictions.
However, the Irish Space Agency sadly hasn’t provided any software tools to do this analysis!
Thankfully, Aoife O’Callaghan, a geology PhD student at the Athlone City Institute, has set the objective
to solve this problem by creating an open-source package to analyse Aigean data. Aoife has some ideas of
what she would like the package to do, but she doesn’t have a research software development background
beyond how to install and use Python libraries. That’s why Aoife has contacted you!
You and your group members agree this is a great tool to offer to the community and have decided to put all
your brains together to come up with an easy-to-use Python library to analyse and visualise Aigean satellite
data.

What do we know? What do we have? What do we want?
1. Aigean has multiple instruments, as an starting point we only need to focus on the imagers and the
radar.
2. There are three imagers on board of the spacecraft. Their only differences are in their resolution (how
much area they cover per pixel) and their field-of-view (how much they can see in a single image).
3. The three imagers are called: Lir, Manannan and Fand. •Lir has the largest field-of-view, but the smaller resolution with a pixel size of 20 m per pixel; •Manannan provides a smaller field-of-view with a better resolution of 10 m per pixel; and •Fand has the smallest field-of-view but a very high resolution of 1 m per pixel.
4. The radar is called Ecne and it provides three measurements for the deepest areas in the region.
5. Each instrument provides data in a different format, but the imagers share a common set of metadata.
6. A number of images are taken every day, however not all the land is fully covered in a single day, it
depends on the satellite orbits. Ecne, however, takes always measurements of the same points.
7. All the data is available at the Irish Space Agency webservice archive.
8. The Python library – aigeanpy – should be able to query, download, open, process and visualise the
satellite images.
9. We want to create three command line tools to provide access to some functionality from outside
Python.
10. We have a script from a post-doc of Aoife’s group that implements the so-called k-means algorithm for
clustering data points. We want to include it in our library too! It will help people to analyse different
land areas based in their parameters.
11. We are also interested on how to make our code, specifically the k-means algorithm, more efficient.
This will be used to analyse Ecne’s data.
12. We want this tool to be used by any researcher, so it needs to be easy to install and use. This includes
having good documentation about how to use it and how to acknowledge it in the publications that
benefit from it.
13. And we also want to make it easier to others to contribute so we need to provide information about
how we would like others to contribute.
Let’s look at what we’ve got access to already:
1.1 The data archive webservice
The Irish Space Agency data archive is located at: https://dokku-app.dokku.arc.ucl.ac.uk/isa-archive/ and
their main page provides some information about how to query this service.
The website offers two services. One is used to query the catalogue, and the other to download a file from
the archive.
The results from the query service are provided as JSON files with the properties of the observations found
in the specified time range (and instruments). These files include information about the date and time of
the observations, the instrument used, the field of view observed and the filename where that observation is
stored. We can download that files using the filename as an argument to the download service. The format
from the observation files vary depending on the instrument (specified in the following section).
Read the information on the archive website to understand how to query the service, what parameters are
accepted and what are the defaults.
We need to create a set of tools within the Python package to query and download the files. They need to
be available from aigeanpy.net.query_isa and aigeanpy.net.download_isa. They must accept
all the parameters listed on the website. Additionally, the download_isa need to allow the user to specify
where to download the file (save_dir).

1.2 Different instruments, different file types
Data from each instrument is provided in a different type of file.
Lir uses the Advanced Scientific Data Format (ASDF). The asdf Python library can read them and extract
the data and metadata from these files.
Manannan uses Hierarchical Data Format 5 (HDF5). As with the asdf, this type of file contains the data
and the metadata together. The h5py Python library can load them.
The Fand instrument stores the data in npy format and the metadata in JSON files. npy files can be read
from NumPy’s load and the Python Standard Library provides support to load JSON files. The archive
provides that pair of files in a single zip file (for which Python Standard Library also provides a module to
load: zipfile).
Finally, the Ecne instrument doesn’t take images, but infers some measurements of the 300 deepest areas in
the region. The measurements are turbulence, salinity and algal density for these points. They are stored
in CSV.
􀄎 Ideally a user shouldn’t need to unzip the file before loading it with the library. The io.BytesIO
class can help you to load the file in memory. Take a look at how it’s used on the exemplar at the
beginning of our course notes.
1.2.1 Getting the coordinates right
Arrays are stored in Python as (rows, columns). However, we normally refer to places in a map as (x, y)
coordinates (with x running from left to right, and y running from bottom to top). Also when displaying an
image in matplotlib with imshow, by default, you’d get the axis as its origin is in the top-left corner and
positive y-values going downwards. For this library, we will need to manage two type of coordinate systems:
pixels and earth.

Figure 1: Difference between the two coordinate systems. The plot in the left shows the default when
visualising an array, the (0, 0) is on the top left corner. On the right, the same array is shown as a map,
which a set of (x, y) coordinates are represented within a pixel. In this case each pixel corresponds to 10
meters and the origin is within the second row, and the first column.
To ease the conversion between the coordinate systems you’ll need to create two helper functions, which are
called earth_to_pixel and pixel_to_earth.

Each image will come with an array of a particular size (and shape) and the metadata will provide the
resolution (in meters per pixel), the earth x- and y-coordinates (in meters) as the (lower, upper)
boundaries for each axis. The field of view (i.e., the difference between the boundaries) divided by the
resolution should give you the shape of the array (in the (columns, rows) order).

Core Scanner

Overview
The goal of this project is to build a scanner for a version of the Core language, a pretend
language we will be discussing in class.
For this project you are given the following:
ˆ \3341 Project 1.pdf” – This handout. Make sure you read it completely and handle all
requirements in your implementation. You are encouraged to post any questions on
Piazza.
ˆ \main.c”, \core.h”, \scanner.h”, \scanner.c” – I have outlines the project in this les,
and give you some of the code you will need. Make no changes to to \core.h” le.
You can make changes to \main.c” as long as those changes do not change break my
tester.sh script.
ˆ You may create additional les to contain any additional classes or methods you want
to create.
ˆ \tester.sh” – This is a script I wrote to help you test your project. It is very similar
to the script that will be used to grade your project, so if your project works correctly
with this script you are probably doing well. The only guarantee I give is that this
script will work on stdlinux.
ˆ Folder \Correct” – This contains some correct inputs and their expected outputs. The
\tester.sh” script will test your code against these examples.
ˆ Folder \Error” – This contains some inputs that should generate error messages. The
\tester.sh” script will test your code against these examples.
The following are some constraints on your implementation:
ˆ Do not use scanner generators (e.g. lex, ex, jlex, j ex, ect) or parser generators
(e.g. yacc, CUP, ect)
ˆ Use only the standard libraries of C. This is the reference I like to use:
https://en.cppreference.com/w/c/header
Your submission should compile and run in the standard linux environment the CSE
department provides (stdlinux). I will leave it up to you to decide what IDE you will use or
if you will develope your code locally or remotely, but as a nal step before submitting your
code please make sure it works on stdlinux. Use the subscribe command – make sure
you are subscribed to GCC-10.1.0. The graders will not spend any time xing your
code – if it does not compile on stdlinux, your project will not be graded and you
will get a 0.

Your Scanner
You are responsible for writing a scanner, which will take as input a text le and output a
stream of \tokens” from the core.h enumeration. You scanner must implement the following
functions:
ˆ scanner open and scanner close: These functions open the le, nd the rst token, and
release memory when we are done scanning.
ˆ currentToken: This function should return the token the scanner is currently on, with-
out consuming that token.
ˆ nextToken: This function should advance the scanner to the next token in the stream
(the next token becomes the current token).
ˆ getId: If the current token is ID, then this function should return the string value of
the identi er. If the current token is not ID, behavior is unde ned.
ˆ getConst: If the current token is CONST, then this function should return the value
of the constant. If the current token is not CONST, behavior is unde ned.
All of these functions will be necessary for the parser you will write in the second project.
You are free to create additional functions.
To make things easier for you, you may assume no token is made of more than 20
characters. Also, I suggest using calloc to allocate memory, instead of malloc.
Input
The input to the scanner will come from a single ASCII text le. The name of this le will
be given as a command line argument to the main function.
The scanner should process the sequence of ASCII characters in this le and should
produce the appropriate sequence of tokens. There are two options for how your scanner can
operate:
(1) the scanner can read the entire character stream from the le, tokenize it, stores all the
tokens in some list or array and calls to currentToken and nextToken simply walk through
the list
or
(2) the scanner reads from the le only enough characters to construct the rst token, and
then later reads from the le on demand as the currentToken or nextToken functions are
called.
Real world scanners typically work as described in (2). In your implementation, you can
implement (1) or (2), whichever you prefer.
Once your scanner has scanned the entire le, it should return the EOS token (End Of
Stream).

Invalid Input
Your scanner should recognize and reject invalid input with a meaningful error message. The
scanner should make sure that the input stream of characters represents a valid sequence of
tokens. For example, characters such as ` ‘ and ‘%’ are not allowed in the input stream. If
your scanner encounters a problem, it should print a meaningful error message to standard
out (please use the format “ERROR: Something meaningful here”) and return the ERROR
token so the main program halts.
The Language
The Core language consists of 4 kinds of strings, which you will need to tokenize:
ˆ Keywords:
and begin do else end if in integer
is new not or out procedure record then while
ˆ Identi ers:
Begins with a letter (uppercase or lowercase) followed by zero or more letters/digits.
Refer to this regular expression once we cover regular expressions:
(aj : : : jzjAj : : : jZ)(aj : : : jzjAj : : : jZj0j1j : : : j9)*
ˆ Constants:
Integers from 0 to 1009 (inclusive)
ˆ Symbols:
+ – * / := = < : ; . , ( ) [ ]
Your scanner walk through the input character stream, recognize strings from the lan-
guage, and return the appropriate token from the enumeration in \Core.java” or \Core.py”.
If there is any situation in which it is unclear to you which token should be returned, please
ask for clari cation on Piazza.
Write your scanner with these rules in mind:
1. The language is case sensitive, and the keywords take precedence over the identi ers.
For example, \begin” should produce the token BEGIN (not ID), but \bEgIn” should
produce the token ID.
2. Strings in the language may or may not be separated by whitespaces. For example the
character stream may contain the string \x=10″ or the string \x = 10″, and both of
these should generate the token sequence ID EQUAL CONST.
3. Always take the greedy approach. For example, the string \whilewhile” should produce
an ID token instead of two WHILE tokens, string “123” should produce a single CONST
token, and string \:=” should produce ASSIGN.

4. Keyword/identi er strings end with either whitespace or a non-digit/letter character.
For example:
(a) the string \while (” and the string \while(” should both result in the WHILE and
LPAREN tokens.
(b) the string \while 12″ should result in the WHILE and CONST tokens, but the
string \while12″ should result in the ID token.
5. Constant strings end with any non-digit character. For example:
(a) the string \120while” or \120 while” should result in the CONST and WHILE
tokens.
6. Symbols may or may not be separated from other strings by whitespace. For example:
(a) String \++while<= =12=” should result in the token sequence ADD ADD
WHILE LESS EQUAL EQUAL CONST EQUAL.
Let me know if you think of any situations not covered here.
Testing Your Project
I have provided some test cases. For each correct test case there are two les (for example
4.code and 4.expected). On stdlinux you can redirect the output of the main program to a
le, then use the di command to see is there is any di erence between your output and the
expected output. For an example of how to do this you can take a look at the script le
“tester.sh”.
The test cases are weak. You should do additional testing with your own test cases. Feel
free to create and post additional test cases on piazza.
Project Submission
On or before 11:59 pm January 27th, you should submit to the Carmen dropbox for Project
1 a single zip le containing the following:
ˆ All your .java or .py les.
ˆ An ASCII text le named README.txt that contains:
{ Your name on top
{ The names of all les you are submitting and a brief description stating what each
le contains
{ Any special features or comments on your project
{ Any known bugs in your scanner  If the time stamp on your submission is 12:00 am on January 28th or later, you will
receive a 10% reduction per day, for up to three days. If your submission is more than 3
days late, it will not be accepted and you will receive zero points for this project. If you
resubmit your project, only the latest submission will be considered.
Grading
The project is worth 100 points. Correct functioning of the scanner is worth 65 points. The
handling of errors is worth 20 points. The implementation style and documentation are
worth 15 points.

Data Engineering

Answer these questions in best of your abilities within 24 hours. If you are stuck on a technical aspect, write down in words how you would go about answering this questions and what other information would you need.

  1. Attached is a sample dataset with 100,000 rows of random placements, and media exposure by impressions. What queries would you use to analyze the data? (Hint: you need to think about cleaning the data first. Common data problems include duplicates, missing, errors in the data)

 

  1. Now that you identified issues with the data, do you notice anything particular about this dataset? What queries would you use to investigate? (Hint: think about what you know about digital marketing and do you think these would be good exposures?)

 

  1. Segment the media exposures into 5 groups. What queries would you use to help you with this? Create a histogram in your answer.

 

  1. How many green T-shirts do you think are sold in a year? (This is an open ended question, let us know how you would go about this figure)

Namespaces

Exercise 1: CAD and Container Namespaces
To avoid name conflicts, programmers can place their classes in a namespace. For example the standard library is placed in a namespace called std. You should put your classes in your own namespace. Thus place the CAD classes (Shape,Point, Line, etc) in the namespace: YourName::CAD Place the container classes (Array) in the namespace: YourName::Containers Now access the classes in your own namespace using:
• Full class name including namespace for the Point used in the Array class. Note that you can use only the CAD part of the namespace without the YourName part because the Point is also in the YourName namespace.
• In the main program, the full namespace for Point class.
• In the main program, using declaration for using a single class (Line).
• In the main program, using declaration for a complete namespace (Containers).
• In the main program, using the Circle class by creating a shorter alias for the YourName::CAD namespace.

The Free Store

Exercise 1: Dynamically Creating Objects
Until now, we created objects on the stack. The stack is limited in size thus when creating many objects, the stack will overflow. Also arrays created on the stack can only have a fixed size determined at compile time. To overcome these problems we have to create objects on the heap using new. In the main program, create Point objects on the heap with new using the default constructor, constructor with coordinates and the copy constructor and assign it to pointer (Point*) variables. Note that you can’t directly pass a pointer variable as argument to the copy constructor. The pointer must first be dereferenced when passing it to the copy constructor. (Point* p2=new Point(*p1)). Next call the Distance() function on the pointers and try to send the Point pointers to cout. You need to dereference the pointer when passing it as argument. Don’t forget to delete the object created with new. Test the main program. Next, we will create an array of points. First ask the user for the size of the array and read it using cin. Then try to create an array on the stack using the entered size. You will see a compiler error. Arrays on the stack must have the size set at compile time. Now create the array on the heap using new. Can you use other constructors than the default constructor for the objects created in the array? Don’t forget to delete the array after use. Don’t forget the square brackets when deleting arrays! Some C++ compilers (e.g. GCC) support variable length arrays (VLA) where the size can be determined at run-time. However this is a C99 feature that is not in the C++ standard. Because VLA is not in the C++ standard you should avoid its usage since it will lead to less portable code.

Exercise 2: Creating Array of Pointers
In this exercise we make it a little more complex. With an array of Points, all points are created with the default constructor. When creating an array of pointers, each element in the array must be created separately but can be created with other constructors than the default constructor. Thus creating an array of pointers is a two step process:
• Create an array of Point pointers with 3 elements on the heap.
• Create for each element in the array a point on the heap.
• Iterate the array and print each point in the array.
• Iterate the array again and delete each point in the array.
• Delete the array itself.
Also make a drawing of the memory layout.

Exercise 3: Creating Array Class
It is easy to forget to delete memory created with new. A good practice is to let a class manage memory. Then the client of that class does not have to manage memory and can’t forget to delete memory. So instead of creating a C array with new, you can use an array class that handle memory for you. In this exercise we are going to create an array class for Point objects (see Figure 5):
• Add a source- and header file for the Array class to your current project.
• Add a data member for a dynamic C array of Point objects (Point* m_data).
• Add a data member for the size of the array.
• Add a default constructor that allocates for example 10 elements.
• Add a constructor with a size argument. The implementation should allocate the number of elements specified by thesize input argument.
• Add a copy constructor. Keep in mind that just copying the C array pointer will create an Array object that shares the data with the original Array object. Thus you need to allocate a new C array with the same size and copy each element separately.
• Add a destructor. It should delete the internal C array. Don’t forget the square brackets.
• Add an assignment operator. Keep in mind you can’t copy only the C array pointers just as in the case of the copy constructor.
• Also don’t forget to delete the old C array and allocate new memory before copying the elements. This is because C arrays can’t grow.
Further check if the source object is not the same as the this object. If you don’t check it, then a statement like arr1=arr1 will go wrong. The internal C array will then be deleted before it is copied.
• Add a Size() function that returns the size of the array.
• Add a SetElement() function that sets an element. When the index is out of bounds, ignore the “set”. We will add better error handling later.
• Add a GetElement() function. You can return the element by reference since the returned element has a longer lifetime than the GetElement() function. When the index is out of bounds, return the first element. We will add better error handling later.
• You can also add a square bracket operator. Return a reference so the [] operator can be used for both reading and writing elements. When the index is out of bounds, return the first element. We will add better error handling later.
Point& operator [] (int index);
• In the main program, test the Array class.

Basic Operator Overloading

Exercise 1: Adding Operators to the Point class
By supporting operators, you can make your classes easier and more natural to use. However you must not “overuse” operators. Only use operators if the functionality of the operator is clear without reading documentation. Thus adding mathematical operators to a complex number class is good but using a + operator with a double as an argument on a point to increase the x-coordinate is questionable. So use operators with care. In this exercise we add a few operators to the Point class. Most operators do not change the original objects but return the result as a new object. Normally only the = operator and += and variants change the original object. Add the following operators:

Exercise 2: Ostream << Operator
It would be nice if you could send a point or a line directly to the cout object without calling the ToString() method, just as with the primitive types. This is possible by adding a << operator function that has on the left an std::ostream and on the right the point or line object. Since you can’t add a member function to the std::ostream class, you have to create it as a global function (outside the class definition, but inside the class header file):
ostream& operator << (ostream& os, const Point& p); // Send to ostream.
The implementation uses the << operator to send data to the os input argument. Since it is a global function, you can’t access the private members of Point. To simplify things, you can use the ToString() method of Point to get the string to send to the os argument. Also create a similar << operator for printing lines (and circles if you made a circle class). Adapt the test program to test the << operator for points and lines.

Exercise 3: Constructors as conversion operator
In this exercise we are going to do a little experiment. First add to the Point class a constructor that accepts one double as an argument. The implementation should set both the x- and y-coordinate to this value. Next try the following code in the test program:
Point p(1.0, 1.0);
if (p==1.0) cout<<“Equal!”<<endl;
else cout<<“Not equal”<<endl;
Will this code compile and can you explain why? It turns out that the Point constructor with the single double argument is implicitly used to convert the number in the if statement to a Point object. Thus constructors are used as implicit conversion operators. To prevent the usage of constructors are implicit conversion operators, declare the constructor as explicit:
explicit Point(double value);
Now try to compile the program again and you will see that now the if statement gives a compiler error. You can still use the constructor as conversion operator but then explicitly:
if (p==(Point)1.0) cout<<“Equal!”<<endl;

Exercise 4: Friends
Normally, only member functions of a class can access the private members of that class. Global functions and other classes can’t access the private members of that class. You can make an exception on that rule by declaring certain functions or classes as friend. For example the << operator for sending the point or line to the std::ostream class had to be a global function and thus can’t access the private members. Move the << operator of Point and Line inside the class definition and declare it as friend. The function remains a global function but it can now access the data members directly without the need for calling the getters or ToString() function. Normally, friends are to be avoided because it violates the data hiding principle. But in case of global operator functions it makes sense because you would actually want to make those global operator functions as member function but this was not possible.