Consider two relations A and B. A occupies 10,000 disk pages, and B occupies 1,000 pages. Consider the following SQL statement:

SELECT *
FROM A, B
WHERE A.a = B.a;

We wish to evaluate an equijoin between A and B, with the equality condition A.a = B.a. There are 502 buffer pages available for this operation. Both relations are stored as simple heap files. Neither relation has any indexes built on it.

Consider the alternative join strategies described below and calculate the cost of each alternative. Evaluate the algorithms using the number of disk I/Os as the cost. For each strategy, provide the formulae you use to calculate your cost estimates.

a) Page-oriented Nested Loops Join. Consider A as the outer relation. (1 mark)

b) Block-oriented Nested Loops Join. Consider A as the outer relation. (1 mark)

c) Sort-Merge Join (1 mark)

d) Hash Join (1 mark)

e) What would the lowest possible I/O cost be for joining A and B using any join algorithm, and how much buffer space would be needed to achieve this cost? Explain briefly. (1 mark)
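The commonly taught I/O cost formulas for strategies (a) to (d) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not a model answer: the function names are hypothetical, the cost of writing the join result is ignored, block nested loops is assumed to reserve one buffer page for the inner relation and one for output, sort-merge is assumed to sort each relation fully before a separate merging scan, and the hash join is assumed to have enough buffers to finish in two passes. Other textbook conventions give different totals.

```python
from math import ceil


def page_nlj(outer_pages, inner_pages):
    """Page-oriented NLJ: scan the inner once per outer page."""
    return outer_pages + outer_pages * inner_pages


def block_nlj(outer_pages, inner_pages, buffers):
    """Block NLJ, assuming buffers - 2 pages hold each outer block."""
    outer_blocks = ceil(outer_pages / (buffers - 2))
    return outer_pages + outer_blocks * inner_pages


def sort_passes(pages, buffers):
    """Passes of an external merge sort: pass 0 builds runs of `buffers`
    pages, each later pass merges up to buffers - 1 runs."""
    runs = ceil(pages / buffers)
    passes = 1
    while runs > 1:
        runs = ceil(runs / (buffers - 1))
        passes += 1
    return passes


def external_sort(pages, buffers):
    """Each pass reads and writes every page once."""
    return 2 * pages * sort_passes(pages, buffers)


def sort_merge_join(m, n, buffers):
    """Assumption: sort both inputs fully, then one merging scan of each."""
    return external_sort(m, buffers) + external_sort(n, buffers) + m + n


def hash_join(m, n):
    """Two-pass hash join: partition (read + write), then read to join.
    Assumption: buffers exceed roughly sqrt(smaller relation)."""
    return 3 * (m + n)


if __name__ == "__main__":
    A, B, BUFFERS = 10_000, 1_000, 502  # figures stated in the question
    print("page NLJ  :", page_nlj(A, B))
    print("block NLJ :", block_nlj(A, B, BUFFERS))
    print("sort-merge:", sort_merge_join(A, B, BUFFERS))
    print("hash join :", hash_join(A, B))
```

With the figures given in the question, `block_nlj` splits A into 20 blocks of 500 pages, so B is scanned 20 times. A refinement that folds the final merge pass of the sort into the join would lower the sort-merge figure; that variant is not modelled here.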

Question 2 (5 marks)

Consider a relation with the following schema:

Executives (id: integer, name: string, title: string, level: integer)

The Executives relation consists of 100,000 tuples stored in disk pages. The relation is stored as a simple heap file and each page stores 100 tuples. There are 10 distinct titles in the Executives hierarchy and 20 distinct levels ranging from 0-20.

Suppose that the following SQL query is executed frequently using the given relation:

SELECT E.name
FROM Executives E
WHERE E.title = 'CEO' AND E.level > 15;

Your job is to analyze the query plans given below and estimate the cost of the best plan, using the index information given in each part.

a) Compute the estimated result size and the reduction factor (selectivity) of this query. (1 mark)

b) Compute the estimated cost of the best plan assuming that a clustered B+ tree index on (title, level) is the only index available. Suppose there are 200 index pages, and the index uses Alternative 2. Discuss and calculate alternative plans. (1 mark)

c) Compute the estimated cost of the best plan assuming that an unclustered B+ tree index on (level) is the only index available. Suppose there are 200 index pages, and the index uses Alternative 2. Discuss and calculate alternative plans. (1 mark)

d) Compute the estimated cost of the best plan assuming that an unclustered hash index on (title) is the only index available. The index uses Alternative 2. Discuss and calculate alternative plans. (1 mark)

e) Compute the estimated cost of the best plan assuming that an unclustered hash index on (level) is the only index available. The index uses Alternative 2. Discuss and calculate alternative plans. (1 mark)
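The selectivity and B+ tree cost estimates asked for in this question can be sketched as follows. This is a rough illustration under loudly stated assumptions, not a marking scheme: values are assumed to be uniformly distributed and the two predicates independent, the range predicate uses the approximation (high - value) / (high - low) with levels taken to span 0 to 20, and the index cost formulas are the usual textbook approximations. All function names are hypothetical. `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction


def rf_equality(distinct_values):
    """Reduction factor of `col = value`, assuming uniform values."""
    return Fraction(1, distinct_values)


def rf_greater_than(value, low, high):
    """Reduction factor of `col > value`, assuming a uniform range."""
    return Fraction(high - value, high - low)


def result_size(n_tuples, *reduction_factors):
    """Estimated output tuples, assuming independent predicates."""
    size = Fraction(n_tuples)
    for rf in reduction_factors:
        size *= rf
    return size


def heap_scan_cost(data_pages):
    """A full scan reads every data page once."""
    return data_pages


def clustered_btree_cost(index_pages, data_pages, rf):
    """Approximation: matching fraction of index pages plus data pages."""
    return (index_pages + data_pages) * rf


def unclustered_btree_cost(index_pages, n_tuples, rf):
    """Approximation: up to one data-page I/O per matching tuple."""
    return (index_pages + n_tuples) * rf


if __name__ == "__main__":
    N_TUPLES, DATA_PAGES, INDEX_PAGES = 100_000, 1_000, 200
    rf_title = rf_equality(10)             # 10 distinct titles
    rf_level = rf_greater_than(15, 0, 20)  # assumed range 0..20
    print("result size      :", result_size(N_TUPLES, rf_title, rf_level))
    print("heap scan        :", heap_scan_cost(DATA_PAGES))
    print("clustered B+     :",
          clustered_btree_cost(INDEX_PAGES, DATA_PAGES, rf_title * rf_level))
    print("unclustered B+   :",
          unclustered_btree_cost(INDEX_PAGES, N_TUPLES, rf_level))
```

Comparing each index plan against `heap_scan_cost` is what "discuss alternative plans" amounts to in practice: an index is only worth using when its estimate falls below the cost of a full scan. Hash indexes are left out of the sketch because their per-lookup cost factor varies between textbooks.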

Question 3 (10 marks)

Consider the following relational schema and SQL query. The schema captures information about employees, departments, and company finances (organized on a per-department basis).

Emp(eid: integer, did: integer, sal: integer, hobby: char(20))

Dept(did: integer, dname: char(20), floor: integer, phone: char(10))

Finance(did: integer, budget: real, sales: real, expenses: real)

Consider the following query:

SELECT D.dname, F.budget
FROM Emp E, Dept D, Finance F
WHERE E.did = D.did AND D.did = F.did
AND E.sal >= 59000 AND E.hobby = 'yodeling';

The system's statistics indicate that employee salaries range from 10,000 to 60,000, and employees enjoy 200 different hobbies. There are a total of 50,000 employees and 5,000 departments (each with a corresponding financial record in the Finance relation) in the database. Each relation fits 100 tuples in a page. Suppose there exists a clustered B+ tree index on (Emp.did) of size 50 pages.

a) Compute the estimated result size and the reduction factors (selectivity) of this query. (2 marks)

b) Compute the cost of the plans shown below. Assume that sorting any relation (if required) can be done in 2 passes: the 1st pass produces sorted runs and the 2nd pass merges them. Similarly, a hash join can be done in 2 passes: the 1st pass produces partitions and the 2nd pass joins corresponding partitions. NLJ is a Page-oriented Nested Loops Join. Assume that did is the candidate key, and that 100 tuples of a resulting join between Emp and Dept fit in a page. Similarly, 100 tuples of a resulting join between Finance and Dept fit in a page. (8 marks, 2 marks per plan)
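For part (a), the reduction factors of a multi-join query are usually combined multiplicatively over the cross product of the input relations. The sketch below illustrates that estimate under explicit assumptions: uniform and independent values, the range approximation (high - value) / (high - low) for the salary predicate, an equijoin reduction factor of 1 / max(distinct keys on either side), and Emp.did assumed to take no more than 5,000 distinct values. The function names are hypothetical.

```python
from fractions import Fraction


def rf_equality(distinct_values):
    """Reduction factor of `col = value`, assuming uniform values."""
    return Fraction(1, distinct_values)


def rf_at_least(value, low, high):
    """Reduction factor of `col >= value`, assuming a uniform range."""
    return Fraction(high - value, high - low)


def rf_equijoin(keys_left, keys_right):
    """Reduction factor of `L.k = R.k`: 1 / max(distinct keys)."""
    return Fraction(1, max(keys_left, keys_right))


def pages(n_tuples, tuples_per_page):
    """Pages needed to store n_tuples (integer ceiling division)."""
    return -(-n_tuples // tuples_per_page)


def estimated_result(cardinalities, reduction_factors):
    """Product of input cardinalities times all reduction factors."""
    size = Fraction(1)
    for n in cardinalities:
        size *= n
    for rf in reduction_factors:
        size *= rf
    return size


if __name__ == "__main__":
    EMP, DEPT, FIN = 50_000, 5_000, 5_000   # tuple counts from the question
    factors = [
        rf_at_least(59_000, 10_000, 60_000),  # E.sal >= 59000
        rf_equality(200),                     # E.hobby = 'yodeling'
        rf_equijoin(5_000, 5_000),            # E.did = D.did (assumed keys)
        rf_equijoin(5_000, 5_000),            # D.did = F.did
    ]
    print("Emp pages :", pages(EMP, 100))
    print("Dept pages:", pages(DEPT, 100))
    print("result    :", estimated_result([EMP, DEPT, FIN], factors))
```

The page counts produced by `pages` are the inputs that the per-plan cost calculations in part (b) start from.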

Formatting Requirements

For each question, present an answer in the following format:

Show the question number and question in black text

Show your answer in blue text

For each calculation, provide the formulae you used to derive your cost estimates.