Author: cs daixie

OpenGL Project

Introduction

This project requires you to draw something fun in 3D. It must be your own creation. You can use pre-canned 3D objects, such as the ones that GLUT provides or OBJ files, but only in addition to your own geometry, not instead of it. You must create your own geometry. You can explicitly list your own 3D coordinates, or you can use equations and define the 3D shapes procedurally. You must have at least 100 x-y-z coordinates. These can be divided up among several of your objects. The scene must have 3D thickness; it must not be completely planar. You must use at least 5 different colors. The 3D rotation and scaling from the sample program must still be working.
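To illustrate what "define the 3D shapes procedurally" can look like, here is a minimal sketch in Python that generates torus vertices from equations. The function names, the default radii, and the five-entry palette are assumptions made for this illustration; they are not part of the sample program, and the points would still have to be handed to your own OpenGL drawing code.

```python
import math

# Hypothetical palette: five RGB colours, to satisfy the "at least 5 colors" rule.
PALETTE = [
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
    (1.0, 1.0, 0.0),
    (1.0, 0.0, 1.0),
]


def torus_vertices(major_radius=2.0, minor_radius=0.5, rings=16, sides=8):
    """Return rings * sides (x, y, z) points on a torus centred at the origin."""
    points = []
    for i in range(rings):
        theta = 2.0 * math.pi * i / rings          # angle around the main ring
        for j in range(sides):
            phi = 2.0 * math.pi * j / sides        # angle around the tube
            ring = major_radius + minor_radius * math.cos(phi)
            points.append((ring * math.cos(theta),
                           ring * math.sin(theta),
                           minor_radius * math.sin(phi)))
    return points


def colour_for(index):
    """Cycle through the palette so neighbouring vertices differ in colour."""
    return PALETTE[index % len(PALETTE)]
```

With the default arguments this yields 128 coordinates, and because z varies with the tube angle the shape is not planar.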

Prefix

You are given n strings s_1, s_2, ..., s_n and q queries. In the i-th query you are given a string t_i; find out how many of the strings s_1, s_2, ..., s_n begin with t_i.
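One common way to answer such prefix queries is a trie in which every node stores how many strings pass through it. The sketch below assumes that approach; the class and function names are hypothetical, and the problem statement itself does not prescribe a data structure.

```python
class TrieNode:
    """One trie node; count = number of inserted strings passing through it."""
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}
        self.count = 0


def build_trie(strings):
    """Insert every string, incrementing the counter of each prefix node."""
    root = TrieNode()
    for s in strings:
        node = root
        node.count += 1                 # the empty prefix matches every string
        for ch in s:
            node = node.children.setdefault(ch, TrieNode())
            node.count += 1
    return root


def count_with_prefix(root, prefix):
    """Return how many inserted strings begin with the given prefix."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return 0
    return node.count
```

Building the trie takes time proportional to the total length of the strings, and each query takes time proportional to the length of t_i.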

Football Match

While the FIFA World Cup is being held in Qatar, BLGG is organizing a football tournament in LGU, too.

There are n teams in this tournament, numbered from 1 to n. Each team has a popularity; the popularity of team i is a_i. A match between teams i and j gains (a_i × a_j) mod M attractions.

When a football team loses a match, it is eliminated from the tournament. At the end, the team left standing is the champion of the tournament.

BLGG wonders what the maximum sum of the attractions of the (n − 1) matches is.
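One way to view the problem, stated here as an assumption rather than as the intended solution: the n − 1 matches of any elimination order form a spanning tree over the teams, and any spanning tree can be played out by repeatedly letting a leaf team lose to its neighbour. Under that view the answer is the weight of a maximum spanning tree of the complete graph with edge weights (a_i × a_j) mod M. A minimal O(n²) Prim-style sketch, with a hypothetical function name:

```python
def max_total_attraction(popularity, modulus):
    """Weight of a maximum spanning tree with weights (a_i * a_j) % modulus."""
    n = len(popularity)
    if n <= 1:
        return 0
    in_tree = [False] * n
    best = [-1] * n          # best edge weight linking each team to the tree
    best[0] = 0              # start the tree from team 0 at no cost
    total = 0
    for _ in range(n):
        # Pick the team outside the tree with the heaviest connecting edge.
        u = max((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                w = (popularity[u] * popularity[v]) % modulus
                if w > best[v]:
                    best[v] = w
    return total
```

The quadratic loop is only suitable for small n; larger limits would need a different way to find heavy edges.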

Median Search Tree

If the sorted array of all the values in the set is {a_i}, i = 1, ..., n, let t = n/2; then the median 2k values are {a_{t−k+1}, ..., a_{t+k}}.

Barbara initially has a set of 2k values. Barbara wants to perform m operations on it. Each operation is one of the following 3 types:

  • 1 w: insert a value w.
  • 2: output all the median 2k values, i.e. a_{t−k+p} for 1 ≤ p ≤ 2k.
  • 3 p: delete the p-th value among the median 2k values, i.e. a_{t−k+p}.

We guarantee that all the values will be distinct and the size of the set is always at least 2k.
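The three operations can be sketched with a plain sorted list, as below. This is a simple illustration, not an efficient solution: insertion and deletion are O(n) each. It assumes that t = n/2 is rounded down when n is odd, which the statement does not spell out, and the class name is hypothetical.

```python
import bisect


class MedianWindow:
    """Keeps values sorted and exposes the median 2k values a_{t-k+1..t+k}."""

    def __init__(self, values, k):
        self.k = k
        self.data = sorted(values)

    def _start(self):
        t = len(self.data) // 2      # assumption: t = floor(n / 2)
        return t - self.k            # 0-based index of a_{t-k+1}

    def insert(self, w):
        """Operation 1: insert value w, keeping the list sorted."""
        bisect.insort(self.data, w)

    def medians(self):
        """Operation 2: return the median 2k values in increasing order."""
        s = self._start()
        return self.data[s:s + 2 * self.k]

    def delete(self, p):
        """Operation 3: remove and return the p-th median value (1-based)."""
        return self.data.pop(self._start() + p - 1)
```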

Special Shortest Path

City C consists of n nodes, representing different places. There are m edges between these nodes. For the edge e_i = (u_i, v_i, w_i), there is a bidirectional (undirected) trail connecting u_i and v_i with length w_i.

For a path P = {p_i} consisting of edges p_1, p_2, p_3, ..., p_k, the length of each edge is l_i = w_{p_i}. Normally, passing the edge p_i with length l_i costs l_i units of energy. As a special case, if l_i = K · l_{i−1}, then passing this edge costs only (K − 1) · l_{i−1} units of energy.

Alice starts from node 1. Alice wants to know the minimum number of units of energy needed to visit node x, for every x. If x is unreachable from the start point (node 1), you should output −1 as the result.
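Because the cost of an edge depends on the length of the edge taken just before it, one possible approach is to run Dijkstra's algorithm over states (node, length of the last edge used). The sketch below assumes positive integer edge lengths, a given integer K ≥ 1, and 1-based node numbers; the function name is hypothetical, and the number of states may be too large for tight limits.

```python
import heapq


def special_shortest_paths(n, edges, K):
    """Return the minimum energy from node 1 to every node 1..n (-1 if unreachable)."""
    adj = [[] for _ in range(n + 1)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    INF = float("inf")
    dist = {(1, 0): 0}            # state = (node, last edge length); 0 = no edge yet
    best = [-1] * (n + 1)
    heap = [(0, 1, 0)]
    while heap:
        d, u, last = heapq.heappop(heap)
        if d > dist.get((u, last), INF):
            continue              # stale heap entry
        if best[u] == -1:
            best[u] = d           # first pop of any state at u is the minimum
        for v, w in adj[u]:
            # Discounted cost applies when this edge is K times the previous one.
            cost = (K - 1) * last if last > 0 and w == K * last else w
            nd = d + cost
            if nd < dist.get((v, w), INF):
                dist[(v, w)] = nd
                heapq.heappush(heap, (nd, v, w))
    return best[1:]
```

For example, with edges (1, 2, 2) and (2, 3, 4) and K = 2, the second edge costs (2 − 1) · 2 = 2 instead of 4.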

Chess Game

The program should respect the rules of chess, for example,

  • the movement of pieces (including castling and en passant),
  • piece promotion,
  • check,
  • checkmate,
  • stalemate.

Please review the rules of chess to verify your understanding of the game!

  • You can implement your system on any platform and in any language you want, as long as it is available in our labs (i.e. it can be run in our labs). You may have to show me or the TA how it works in some cases.
  • The program must use a game tree search scheme with alpha-beta pruning. Furthermore, the program should permit user-supplied control parameters, for example, the depth of search.
  • Put effort into designing an effective board evaluation function. You should research the literature on computer chess to find strategies used by other systems. You can borrow ideas from the literature (properly acknowledged in your report). I also encourage you to try your own ideas!
  • The program should interact with a human player. Both human vs. human and human vs. AI options should be available. Moves should be given via board coordinates. At a minimum, the program should dump out the current board as an ASCII table (e.g., upper case = black, lower case = white, space = “-”). Although a graphical user interface is not required, an effective GUI will be positively considered during evaluation.
  • Your program should permit any board setup to be used initially (this is good for testing purposes).
  • As an option, your program can dump out the game as a standard chess output text file.

Naïve Bayes Classifier

1 Objective

Construct a naïve Bayes classifier to classify email messages as spam or not spam (“ham”). A Bayesian decision rule compares P(Spam | x) with P(∼Spam | x) for email x and chooses the hypothesis with the larger probability.

Use any computer language you like. I recommend using Python as it includes ready access to a number of powerful packages and libraries for natural language processing (NLP). I include a few Python 3.6 excerpts to get you started. I also describe several tools from the NLTK (Natural Language Toolkit) to pre-process data to improve classification.

2 Naïve (conditionally independent) classification

Suppose that you have a dataset {x_N}. Each x_k ∈ {x_N} is a separate email for this assignment. Each of the N data points x_k = (f_1, f_2, ..., f_n) belongs to the pattern space X, where f_1, f_2, ... are called features. You extract features from each data point. Features in an email may include the list of “words” (tokens) in the message body. The goal in Bayesian classification is to compute two probabilities, P(Spam | x) and P(∼Spam | x), for each email. It classifies each email as “spam” or “not spam” by choosing the hypothesis with the higher probability.

Naïve Bayes assumes that the features of x are independent given its class. P(Spam | x) is difficult to compute in general. Expand with the definition of conditional probability:

P(Spam | x) = P(Spam ∩ x) / P(x). (1)

Look at the denominator P(x). P(x) equals the probability of a particular email given the universe of all possible emails. This is very difficult to calculate. But it is just a number between 0 and 1 since it is a probability. It just “normalizes” P(Spam ∩ x).

Now look at the numerator P(Spam ∩ x). First expand x into its features {f_n}. Each feature is an event that can occur or not (i.e. the word is in an email or not). So

P(Spam ∩ x) / P(x) ∝ P(Spam ∩ x) = P(Spam ∩ f_1 ∩ ... ∩ f_n) (2)
= P(Spam) · P(f_1 ∩ ... ∩ f_n | Spam). (3)

Apply the multiplication theorem (HW2, 1.c) to the second term to give

P(f_1 ∩ ... ∩ f_n | Spam) = P(f_1 | Spam) · P(f_2 | Spam ∩ f_1) ··· P(f_n | Spam ∩ f_{n−1} ∩ ... ∩ f_2 ∩ f_1). (4)

But now you are still stuck computing a big product of complicated conditional probabilities. Naïve Bayes classification makes the assumption that features are conditionally independent. This means that

P(f_j | f_k ∩ Spam) = P(f_j | Spam) (5)

if j ≠ k. This means that the probability that you observe one feature (i.e. a word) is independent of observing another word, given that the email is spam. This is a naïve assumption and weakens your model. But you can now simplify the above to
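Under assumption (5), every conditional factor in the chain-rule product collapses to P(f_j | Spam). A sketch of the resulting standard naïve Bayes form, written in LaTeX with the same symbols as equations (1) to (5) (the product index j is introduced here for illustration):

```latex
P(\mathrm{Spam} \mid x) \;\propto\; P(\mathrm{Spam}) \prod_{j=1}^{n} P(f_j \mid \mathrm{Spam}),
\qquad
P(\sim\mathrm{Spam} \mid x) \;\propto\; P(\sim\mathrm{Spam}) \prod_{j=1}^{n} P(f_j \mid \sim\mathrm{Spam}).
```

The classifier picks whichever right-hand side is larger; the denominator P(x) is common to both hypotheses and can be dropped from the comparison.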

Text Data for Sentiment Analysis

1 Overview

In this assignment you will implement the Naive Bayes algorithm with maximum likelihood and MAP solutions and evaluate it using cross validation on the task of sentiment analysis (as in identifying positive/negative product reviews).

2 Text Data for Sentiment Analysis

We will be using the “Sentiment Labelled Sentences Data Set” that includes sentences labelled with sentiment (1 for positive and 0 for negative) extracted from three domains: imdb.com, amazon.com, and yelp.com. These form 3 datasets for the assignment.

Each dataset is given in a single file, where each example is on one line of that file. Each such example is given as a list of space-separated words, followed by a tab character (\t), followed by the label, and then by a newline (\n). Here is an example from the yelp dataset:

3 Implementation

3.1 Naive Bayes for Text Categorization

In this assignment you will implement “Naive Bayes for text categorization” as discussed in class. In our application every “document” is one sentence as explained above. The description in this section assumes that a dataset has been split into separate train and test sets.

Given a training set for Naive Bayes you need to parse each example and record the counts for class and for word given class for all the necessary combinations. These counts constitute the learning process since they determine the prediction of Naive Bayes (for both maximum likelihood and MAP solutions).
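The counting step can be sketched as follows, assuming the line format described earlier (space-separated words, a tab, then the label). The function names are hypothetical, and any lines you feed it in a quick check should be treated as made-up examples, not as lines from the datasets.

```python
from collections import Counter


def parse_line(line):
    """Split 'w1 w2 ... wn<TAB>label<NEWLINE>' into (list of words, int label)."""
    text, label = line.rstrip("\n").rsplit("\t", 1)
    return text.split(), int(label)


def count_training_set(lines):
    """Return (class_counts, word_counts) gathered from the training lines.

    class_counts[c]    = number of training sentences with label c
    word_counts[c][w]  = number of occurrences of word w in sentences of class c
    """
    class_counts = Counter()
    word_counts = {0: Counter(), 1: Counter()}
    for line in lines:
        words, label = parse_line(line)
        class_counts[label] += 1
        word_counts[label].update(words)
    return class_counts, word_counts
```

These two tables are everything the prediction step needs, for both the maximum likelihood and the MAP estimates.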

Now, given the test set, you parse each example, calculate the scores for each class, and test the prediction. Note that products of small numbers (probabilities) will quickly lead to underflow problems. For that reason, you should work with sums of log probabilities instead of products of probabilities. Recall that a · b · c > d · e · f iff log a + log b + log c > log d + log e + log f, so working with the logarithms is sufficient. However, note that unless your programming environment handles infinity natively, you will need to handle ln(0) = −∞ as a special case in your code.

Important point for prediction: If a word in a test example did not appear in the training set at all (i.e. in any of the classes), then simply skip that word when calculating the score for this example. However, if the word did appear with some class but not the other, then use the counts you have (zero for one class but nonzero for the other).
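The scoring rules above can be sketched as follows. The parameter m is an assumption made for this illustration: m = 0 gives the maximum likelihood estimate, while m > 0 adds m pseudo-counts per vocabulary word as one possible MAP-style smoothing (the prior discussed in class may differ). Python's math.log raises an error on 0, so ln(0) = −∞ is handled explicitly. The count tables are assumed to be collections.Counter objects keyed by class 0 and 1.

```python
import math
from collections import Counter


def log_or_neg_inf(p):
    """Return ln(p), treating ln(0) as negative infinity."""
    return math.log(p) if p > 0 else -math.inf


def class_score(words, label, class_counts, word_counts, vocabulary, m=0.0):
    """Sum of log probabilities for one class; unseen words are skipped."""
    total_docs = sum(class_counts.values())
    total_words = sum(word_counts[label].values())
    denom = total_words + m * len(vocabulary)
    score = log_or_neg_inf(class_counts[label] / total_docs)
    for w in words:
        if w not in vocabulary:
            continue                   # word never seen in training: skip it
        p = (word_counts[label][w] + m) / denom if denom > 0 else 0.0
        score += log_or_neg_inf(p)
    return score


def predict(words, class_counts, word_counts, m=0.0):
    """Return the label (0 or 1) with the higher log score."""
    vocabulary = set(word_counts[0]) | set(word_counts[1])
    scores = {c: class_score(words, c, class_counts, word_counts, vocabulary, m)
              for c in (0, 1)}
    return max(scores, key=scores.get)
```

With m = 0, a word seen only in the other class drives a class score to −∞, which is exactly the "zero for one class" case described above.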

Network

Question 1 (Delay, 18%).

As shown in the figure below, a file of size F = 1000 + S bytes is transmitted on an end-to-end connection over four links, where S is the last three digits of your student number. For example, if your student number is 490123456, then S = 456 and F = 1456 bytes.

Each link is 100 km long. The signal propagation speed is 2 × 10^8 m/s. Assume that a header of 40 bytes is added to each packet. The bandwidth of all links is R = 1 Mbps at the beginning. The nodes use the store-and-forward scheme. (Ignore processing delays at each node.)

(0) What is your student number? Warning: If you use another student’s number as S value to answer the question, the following sub-questions will not be marked and you will get 0 in Question 1.

(1) How long does it take to transmit the file if the whole file is transmitted as a single packet?

Now assume that the bandwidths of links B − C and D − E become 0.5 Mbps. Answer (2)–(4).

(2) Repeat (1).

(3) We would like to break the file into smaller packets to decrease the overall delay in the store-and-forward scheme. Assume that each time you break the file to make a new packet, you have to add 40 bytes as the header of the new packet. Repeat (2) when we break the file into N = 4 packets.

(4) What should be the optimal size of the packets to have the minimum overall delay to deliver the whole file? Find the overall delay. Hint: Since the link B − C has a smaller bandwidth compared with A − B, packets could be queued for some time!
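The store-and-forward timing in these sub-questions can be checked with a small simulation. The sketch below assumes equal-size packets, that all packets are ready at the source at time 0, that each link sends one packet at a time in order, and that the four links are A−B, B−C, C−D, and D−E; the function name and parameter names are hypothetical.

```python
def transfer_delay(file_bytes, num_packets, link_rates_bps, header_bytes=40,
                   link_length_m=100e3, propagation_speed=2e8):
    """Time in seconds until the last packet fully arrives at the destination."""
    payload = file_bytes / num_packets            # assumes equal-size packets
    packet_bits = (payload + header_bytes) * 8
    prop = link_length_m / propagation_speed      # per-link propagation delay
    link_free = [0.0] * len(link_rates_bps)       # when each link is next idle
    last_arrival = 0.0
    for _ in range(num_packets):
        t = 0.0                                   # packet ready at the source
        for i, rate in enumerate(link_rates_bps):
            start = max(t, link_free[i])          # wait in queue if link is busy
            finish = start + packet_bits / rate   # transmission delay
            link_free[i] = finish
            t = finish + prop                     # fully received at next node
        last_arrival = t
    return last_arrival
```

For the example S = 456 (F = 1456 bytes), `transfer_delay(1456, 1, [1e6] * 4)` models the single-packet case of sub-question (1), and passing `[1e6, 0.5e6, 1e6, 0.5e6]` models the reduced bandwidths; varying `num_packets` lets you explore the trade-off asked about in (3) and (4).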

Sheep & Wolves

In this mini-project, you’ll implement an agent that can solve the Sheep and Wolves problem for an arbitrary number of initial wolves and sheep. You will submit the code for solving the problem to the Mini-Project 1 assignment in Gradescope. You will also submit a report describing your agent to Canvas. Your grade will be based on a combination of your report (50%) and your agent’s performance (50%).

About the Project

The Sheep and Wolves problem is identical to the Guards & Prisoners problem from the lecture, except that it makes more semantic sense why the wolves can be alone (they have no sheep to eat). Ignore for a moment the absurdity of wolves needing to outnumber sheep in order to overpower them. Maybe it’s baby wolves vs. adult rams.

As a reminder, the problem goes like this: you are a shepherd tasked with getting sheep and wolves across a river for some reason. If the wolves ever outnumber the sheep on either side of the river, the wolves will overpower and eat the sheep. You have a boat, which can only take one or two animals in it at a time, and must have at least one animal in it because you’ll get lonely (and because the problem is trivial otherwise). How do you move all the animals from one side of the river to the other?

In the original Sheep & Wolves (or Guards & Prisoners) problem, we specified there were 3 sheep and 3 wolves; here, though, your agent should be able to solve the problem for an arbitrary number of initial sheep and wolves. You may assume that the initial state of the problem will follow those rules (e.g. we won’t give you more wolves than sheep to start). However, not every initial state will be solvable; there may be combinations of sheep and wolves that cannot be solved.

You will return a list of moves that will solve the problem, or an empty list if the problem is unsolvable based on the initial set of Sheep and Wolves. You will also submit a brief report describing your approach.
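One possible approach is a breadth-first search over states (sheep on the starting bank, wolves on the starting bank, boat side). The sketch below assumes each move is reported as a (sheep, wolves) tuple for the animals in the boat, with moves alternating direction starting from the initial bank; the actual signature and move format expected by the starter code and the autograder may differ.

```python
from collections import deque

# Possible boat loads as (sheep, wolves): one or two animals per crossing.
MOVES = [(1, 0), (0, 1), (1, 1), (2, 0), (0, 2)]


def is_safe(sheep, wolves):
    """A bank is safe if it has no sheep or the wolves do not outnumber them."""
    return sheep == 0 or wolves <= sheep


def solve(initial_sheep, initial_wolves):
    """Return a shortest list of (sheep, wolves) moves, or [] if unsolvable."""
    start = (initial_sheep, initial_wolves, 0)    # boat side 0 = starting bank
    goal = (0, 0, 1)
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            moves = []
            while parent[state] is not None:      # walk back to the start
                state, move = parent[state]
                moves.append(move)
            return moves[::-1]
        sheep, wolves, side = state
        direction = -1 if side == 0 else 1        # leaving or returning
        for ds, dw in MOVES:
            ns, nw = sheep + direction * ds, wolves + direction * dw
            if not (0 <= ns <= initial_sheep and 0 <= nw <= initial_wolves):
                continue
            if not (is_safe(ns, nw)
                    and is_safe(initial_sheep - ns, initial_wolves - nw)):
                continue
            nxt = (ns, nw, 1 - side)
            if nxt not in parent:
                parent[nxt] = (state, (ds, dw))
                queue.append(nxt)
    return []
```

Because the search is breadth-first and never revisits a state, it terminates on unsolvable inputs and returns a shortest solution when one exists.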

Your Agent

To write your agent, download the starter code below. Complete the solve() method, then upload it to Gradescope to test it against the autograder. Before the deadline, make sure to select your best performance in Gradescope as your submission to be graded.