Please read this assignment carefully.
This coursework is concerned with the creation of a library to analyse data from different instruments on board
a satellite. The instruments provide daily updates of an area of land, which can be used to track
water levels at different locations.
This assignment asks you to work collaboratively within your team to create a package. For that you will
need to write some code for querying, loading and analysing a dataset about different portions of land on
a map. We will describe how the code must behave, but it is up to you to fill in the implementation. The
package needs to follow all the good practices learnt in the course, that is, the package should: be version
controlled; include tests; provide documentation and doctests; set up command line interfaces; and
be installable. Besides this, you will also need to modify a provided script to make it more readable and
more efficient, and to measure its performance.
The collaboration aspect should be organised and managed using GitHub.
The exercise will be semi-automatically marked, so it is very important that your solution adheres to the
correct interface, file and folder name convention and structure, as defined in the rubric below. An otherwise
valid solution that doesn’t work with our marking tool will not be given credit.
For this assignment, you can use the Python standard library and any other libraries you wish (but
make sure they are clearly set as dependencies when installing your package). Your code should work with
First, we set out the problem we are solving. Next, we specify the target for your solution in detail. Finally,
to assist you in creating a good solution, we state the marking scheme we will use.
1 Background information
The Irish Space Agency has launched Aigean, an Earth observation satellite to monitor an area around
Lough Ree. Recently, rainfall in the area has decreased, and over the last few years droughts have become
more frequent and more severe. With the instruments on board Aigean, the scientific community will be
able to obtain better data about the water levels and the erosion of the land, and will therefore be able to
generate more accurate predictions.
However, the Irish Space Agency sadly hasn’t provided any software tools to do this analysis!
Thankfully, Aoife O’Callaghan, a geology PhD student at the Athlone City Institute, has set out
to solve this problem by creating an open-source package to analyse Aigean data. Aoife has some ideas of
what she would like the package to do, but she doesn’t have a research software development background
beyond how to install and use Python libraries. That’s why Aoife has contacted you!
You and your group members agree this is a great tool to offer to the community and have decided to put all
your brains together to come up with an easy-to-use Python library to analyse and visualise Aigean satellite data.
What do we know? What do we have? What do we want?
1. Aigean has multiple instruments; as a starting point, we only need to focus on the imagers and the radar.
2. There are three imagers on board the spacecraft. They differ only in their resolution (how
much area each pixel covers) and their field-of-view (how much they can see in a single image).
3. The three imagers are called Lir, Manannan and Fand:
   • Lir has the largest field-of-view but the coarsest resolution, with a pixel size of 20 m per pixel;
   • Manannan provides a smaller field-of-view with a finer resolution of 10 m per pixel; and
   • Fand has the smallest field-of-view but a very high resolution of 1 m per pixel.
4. The radar is called Ecne and it provides three measurements for the deepest areas in the region.
5. Each instrument provides data in a different format, but the imagers share a common set of metadata.
6. A number of images are taken every day; however, not all the land is covered in a single day, as coverage
depends on the satellite's orbits. Ecne, on the other hand, always takes measurements of the same points.
7. All the data is available at the Irish Space Agency webservice archive.
8. The Python library – aigeanpy – should be able to query, download, open, process and visualise the data.
9. We want to create three command line tools to provide access to some functionality from outside
10. We have a script from a post-doc in Aoife’s group that implements the so-called k-means algorithm for
clustering data points. We want to include it in our library too! It will help people to analyse different
land areas based on their parameters.
11. We are also interested in how to make our code, specifically the k-means algorithm, more efficient.
This will be used to analyse Ecne’s data.
12. We want this tool to be used by any researcher, so it needs to be easy to install and use. This includes
having good documentation about how to use it and how to acknowledge it in the publications that
benefit from it.
13. We also want to make it easier for others to contribute, so we need to provide information about
how we would like others to contribute.
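Items 10 and 11 refer to the k-means algorithm. The post-doc's script is not reproduced in this section, so what follows is only a minimal sketch of standard k-means (Lloyd's algorithm) in NumPy. It assumes each data point is a row of a 2-D array. The name kmeans and its parameters are hypothetical and are not part of the required interface.

```python
import numpy as np


def _assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    # Broadcast (n, 1, d) against (1, k, d) to get all distances at once: shape (n, k).
    distances = np.linalg.norm(points[:, np.newaxis, :] - centroids[np.newaxis, :, :], axis=2)
    return distances.argmin(axis=1)


def kmeans(points, k, n_iterations=100, seed=None):
    """Cluster ``points`` (shape (n, d)) into ``k`` groups; return (centroids, labels).

    Hypothetical sketch of Lloyd's algorithm, not the post-doc's script.
    """
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iterations):
        labels = _assign(points, centroids)
        # Move each centroid to the mean of its points; keep it in place if its cluster is empty.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # Recompute the labels so they always match the returned centroids.
    return centroids, _assign(points, centroids)
```

The sketch computes all point-to-centroid distances with a single broadcast operation instead of looping over points in Python. This is one common way to speed up k-means and may be a useful starting point for item 11.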
Let’s look at what we’ve got access to already:
1.1 The data archive webservice
The Irish Space Agency data archive is located at: https://dokku-app.dokku.arc.ucl.ac.uk/isa-archive/ and
its main page provides some information about how to query this service.
The website offers two services: one is used to query the catalogue, and the other to download a file from the archive.
The results from the query service are provided as JSON files with the properties of the observations found
in the specified time range (and instruments). These files include information about the date and time of
the observations, the instrument used, the field of view observed and the filename where that observation is
stored. We can download those files by passing the filename as an argument to the download service. The format
of the observation files varies depending on the instrument (as specified in the following section).
Read the information on the archive website to understand how to query the service, what parameters are
accepted and what the defaults are.
We need to create a set of tools within the Python package to query and download the files. They need to
be available as aigeanpy.net.query_isa and aigeanpy.net.download_isa, and they must accept
all the parameters listed on the website. Additionally, download_isa needs to allow the user to specify
where to download the file (save_dir).
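Below is a minimal sketch of the two functions that uses only the standard library. The endpoint names (query, download) and the parameter names (start_date, stop_date, instrument, filename) are assumptions made for illustration. The archive's main page is the authority on the real ones. The helper build_url is also hypothetical.

```python
import json
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL from the assignment. The endpoint and parameter names used below are
# ASSUMPTIONS; check them against the archive's main page.
BASE_URL = "https://dokku-app.dokku.arc.ucl.ac.uk/isa-archive/"


def build_url(service, params):
    """Hypothetical helper: build a service URL, dropping parameters set to None."""
    query = urlencode({key: value for key, value in params.items() if value is not None})
    url = BASE_URL + service
    return f"{url}?{query}" if query else url


def query_isa(start_date=None, stop_date=None, instrument=None):
    """Return the observations found by the query service, parsed from JSON."""
    url = build_url("query", {"start_date": start_date,
                              "stop_date": stop_date,
                              "instrument": instrument})
    with urlopen(url) as response:
        return json.load(response)


def download_isa(filename, save_dir="."):
    """Download ``filename`` from the archive into ``save_dir``; return the saved path."""
    save_path = Path(save_dir) / filename
    save_path.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(build_url("download", {"filename": filename})) as response:
        save_path.write_bytes(response.read())
    return save_path
```

Unset parameters are left out of the URL rather than sent as empty values, so the webservice can apply its own defaults.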
1.2 Different instruments, different file types
Data from each instrument is provided in a different type of file.
Lir uses the Advanced Scientific Data Format (ASDF). The asdf Python library can read them and extract
the data and metadata from these files.
Manannan uses Hierarchical Data Format 5 (HDF5). As with ASDF, this type of file contains the data
and the metadata together. The h5py Python library can load them.
The Fand instrument stores the data in npy format and the metadata in JSON files. npy files can be read
with NumPy’s load function, and the Python Standard Library provides the json module to load JSON files. The archive
provides each pair of files in a single zip file (the Python Standard Library also provides a module, zipfile, to read these).
Finally, the Ecne instrument doesn’t take images, but infers some measurements of the 300 deepest areas in
the region. The measurements are turbulence, salinity and algal density for these points. They are stored
Ideally, a user shouldn’t need to unzip the file before loading it with the library. The io.BytesIO
class can help you load the file in memory. Take a look at how it’s used in the exemplar at the
beginning of our course notes.
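For Fand, the zip archive can be opened straight from memory by wrapping its bytes in io.BytesIO, so nothing is extracted to disk. This sketch assumes, hypothetically, that each archive holds exactly one .npy file and one .json file. The function name load_fand_zip is also hypothetical.

```python
import io
import json
import zipfile

import numpy as np


def load_fand_zip(zip_bytes):
    """Return (data, metadata) from a Fand zip archive held in memory.

    Assumes (hypothetically) one ``.npy`` image file and one ``.json`` metadata file.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = archive.namelist()
        # next() raises StopIteration if a file type is missing; a real package
        # would raise a clearer error here.
        npy_name = next(name for name in names if name.endswith(".npy"))
        json_name = next(name for name in names if name.endswith(".json"))
        # np.load seeks back after checking the file's header, so give it a seekable BytesIO.
        data = np.load(io.BytesIO(archive.read(npy_name)))
        metadata = json.loads(archive.read(json_name))
    return data, metadata
```

The same function works with bytes read from a local file (open(path, "rb").read()) and with bytes taken directly from the download response.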
1.2.1 Getting the coordinates right
Arrays are stored in Python as (rows, columns). However, we normally refer to places on a map as (x, y)
coordinates (with x running from left to right, and y running from bottom to top). Also, when displaying an
image in matplotlib with imshow, the origin is by default placed in the top-left corner, with positive
y-values going downwards. For this library, we will need to manage two types of coordinate systems:
pixels and earth.
Figure 1: Difference between the two coordinate systems. The plot on the left shows the default when
visualising an array, with (0, 0) in the top-left corner. On the right, the same array is shown as a map,
in which each pixel covers a range of (x, y) coordinates. In this case each pixel corresponds to 10
meters and the origin lies within the second row and the first column.
To ease the conversion between the coordinate systems, you’ll need to create two helper functions, called
earth_to_pixel and pixel_to_earth.
Each image will come with an array of a particular size (and shape), and the metadata will provide the
resolution (in meters per pixel) and the earth x- and y-coordinates (in meters) as the (lower, upper)
boundaries for each axis. The field of view (i.e., the difference between the boundaries) divided by the
resolution should give you the shape of the array (in the (columns, rows) order).
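Below is a minimal sketch of the two helpers, plus a shape check. It makes some hypothetical assumptions:

- the metadata is a dict with keys resolution, xcoords and ycoords;
- pixels are addressed as (row, column), with row 0 at the upper y boundary, as in the left panel of Figure 1;
- pixel_to_earth returns the centre of the pixel.

Note that expected_shape returns (rows, columns), so its result can be compared directly with NumPy's array.shape. Dividing the field of view by the resolution gives (columns, rows) instead.

```python
import math


def expected_shape(meta):
    """Array shape (rows, columns) implied by the metadata (hypothetical keys)."""
    x_lower, x_upper = meta["xcoords"]
    y_lower, y_upper = meta["ycoords"]
    n_columns = round((x_upper - x_lower) / meta["resolution"])
    n_rows = round((y_upper - y_lower) / meta["resolution"])
    return n_rows, n_columns


def earth_to_pixel(x, y, meta):
    """Return the (row, column) of the pixel containing earth point (x, y).

    No bounds checking: points outside the image give out-of-range indices.
    """
    x_lower, _ = meta["xcoords"]
    _, y_upper = meta["ycoords"]
    column = math.floor((x - x_lower) / meta["resolution"])
    # Rows are counted downwards from the upper y boundary of the image.
    row = math.floor((y_upper - y) / meta["resolution"])
    return row, column


def pixel_to_earth(row, column, meta):
    """Return the earth (x, y) coordinates of the centre of pixel (row, column)."""
    x_lower, _ = meta["xcoords"]
    _, y_upper = meta["ycoords"]
    x = x_lower + (column + 0.5) * meta["resolution"]
    y = y_upper - (row + 0.5) * meta["resolution"]
    return x, y
```

As a worked example, take the hypothetical boundaries meta = {"resolution": 10, "xcoords": (-5.0, 35.0), "ycoords": (-25.0, 15.0)}. These put the origin in the second row and first column, as in Figure 1. The expected shape is (4, 4), earth_to_pixel(0, 0, meta) gives (1, 0), and pixel_to_earth(1, 0, meta) maps back to (0.0, 0.0). Bounds are not checked. For example, a point on the upper x boundary maps to a column one past the last.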