Please write code to read the data file TrainingData.csv.
The first row is the header (variable names). Data are stored in
Determine the number of variables and the number of records in this data set.
Store the variable names in a list.
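The reading, counting, and header steps above can be sketched as follows. This is a minimal sketch using only the standard csv module, assuming the file is comma-separated with one header row; the function names load_csv and describe are illustrative and not part of the assignment.

```python
import csv


def load_csv(lines):
    """Parse CSV text from any iterable of lines (e.g. an open file).

    Returns (header, records): header is the list of variable names taken
    from the first row, records is a list of dicts keyed by those names.
    Assumption: the file is comma-separated with exactly one header row.
    """
    reader = csv.reader(lines)
    header = next(reader)  # variable names, stored in a list
    # Skip completely blank lines; keep every value as the raw string.
    records = [dict(zip(header, row)) for row in reader if row]
    return header, records


def describe(header, records):
    """Return (number of variables, number of records)."""
    return len(header), len(records)
```

For the actual file, one would open it with open("TrainingData.csv", newline="") and pass the file object to load_csv.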
Determine whether there are any missing values in the data set. If so, please
report the total number of missing values.
Find the number of distinct LCID values in the data set.
Find the variable with the most missing values.
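One possible way to approach the missing-value and distinct-LCID questions is sketched below. It assumes records is a list of dicts of raw strings (one dict per row) and header is the list of variable names. The set of tokens treated as "missing" is an assumption, since the assignment does not say how missing values are encoded in the file.

```python
# Assumption: these tokens mark a missing value; adjust to the real file.
MISSING_TOKENS = {"", "NA", "NaN", "null"}


def is_missing(value):
    """True if the value is absent or one of the assumed missing tokens."""
    return value is None or str(value).strip() in MISSING_TOKENS


def missing_per_variable(records, header):
    """Count missing values for each variable name in header."""
    counts = {name: 0 for name in header}
    for rec in records:
        for name in header:
            if is_missing(rec.get(name)):
                counts[name] += 1
    return counts


def summarize_missing(records, header):
    """Return (total missing values, variable with the most missing values)."""
    counts = missing_per_variable(records, header)
    total = sum(counts.values())
    # On a tie, max() keeps the variable that appears first in header.
    worst = max(header, key=lambda name: counts[name])
    return total, worst


def distinct_lcids(records):
    """Return the set of distinct, non-missing LCID values."""
    return {rec["LCID"] for rec in records if not is_missing(rec.get("LCID"))}
```

The data set has missing values exactly when the total returned by summarize_missing is greater than zero, and len(distinct_lcids(records)) gives the number of distinct LCID values.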
Convert the variable hour_id to datetime format.
What is the time duration of the entire data set?
Determine the number of records per day.
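The three hour_id tasks above might be handled along these lines. The timestamp format is an assumption (the assignment does not state how hour_id is written in the file), so HOUR_FORMAT below is a placeholder to be adjusted. records is assumed to be a list of dicts, one per row.

```python
from collections import Counter
from datetime import datetime

# Assumption: hour_id looks like "2020-01-31 13:00:00"; change if it does not.
HOUR_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_hour_ids(records, fmt=HOUR_FORMAT):
    """Convert each record's hour_id string to a datetime, in place."""
    for rec in records:
        rec["hour_id"] = datetime.strptime(rec["hour_id"], fmt)
    return records


def time_span(records):
    """Duration between the earliest and latest hour_id, as a timedelta."""
    stamps = [rec["hour_id"] for rec in records]
    return max(stamps) - min(stamps)


def records_per_day(records):
    """Map each calendar date to the number of records on that date."""
    return Counter(rec["hour_id"].date() for rec in records)
```

Both time_span and records_per_day expect parse_hour_ids to have been called first, because they rely on hour_id being a datetime object rather than a string.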
Use the median function from the statistics package (from statistics
import median), or else do the following:
Divide the entire data set by distinct values of LCID.
For each distinct LCID value, determine the median of each
variable in the divided data set.
Package the result in (b) in a dictionary.
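The split-by-LCID, median, and dictionary steps above could be sketched as below, using statistics.median as the assignment suggests. This is an illustration under assumptions: records is a list of dicts, values that cannot be read as numbers are skipped, and the caller chooses which variables to summarize. The helper names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median


def to_number(value):
    """Return value as a float, or None if it is missing or non-numeric."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(number) else number


def median_by_lcid(records, variables):
    """Return {LCID: {variable: median}} over the numeric values per LCID.

    A variable with no usable values for a given LCID maps to None.
    """
    # Step 1: divide the data set by distinct LCID value.
    groups = defaultdict(list)
    for rec in records:
        groups[rec["LCID"]].append(rec)

    # Steps 2 and 3: per-group medians, packaged in a nested dictionary.
    result = {}
    for lcid, rows in groups.items():
        result[lcid] = {}
        for name in variables:
            values = [to_number(row.get(name)) for row in rows]
            values = [v for v in values if v is not None]
            result[lcid][name] = median(values) if values else None
    return result
```

Skipping unusable values instead of raising an error is a design choice made here for convenience; the assignment does not specify how missing values should enter the median.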
Determine the number of Complaint cases and Non-complaint cases
in the entire data set.
Determine the top 10 LCIDs with the most complaint cases.
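A hedged sketch of the complaint counting follows. The assignment does not name the column that marks a complaint or the value it takes, so label_column and complaint_value are hypothetical parameters that must be filled in from the actual header. records is assumed to be a list of dicts.

```python
from collections import Counter


def complaint_counts(records, label_column, complaint_value):
    """Return (number of complaint records, number of non-complaint records).

    label_column and complaint_value are placeholders for the real column
    name and label. Every record whose label differs from complaint_value
    (including a missing label) is counted as non-complaint.
    """
    complaints = sum(
        1 for rec in records if rec.get(label_column) == complaint_value
    )
    return complaints, len(records) - complaints


def top_complaint_lcids(records, label_column, complaint_value, n=10):
    """Return up to n (LCID, complaint count) pairs, most complaints first."""
    counts = Counter(
        rec["LCID"]
        for rec in records
        if rec.get(label_column) == complaint_value
    )
    return counts.most_common(n)
```

Counter.most_common returns fewer than n pairs when fewer than n LCIDs have any complaint at all.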
Calculate the median value of each variable per day in the entire data set.
Use the first 5 digits of the LCID values to define a new variable Region.
Determine the region with the most complaint cases found in the data set.
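The per-day median and the Region tasks might be sketched as follows. The sketch assumes that records is a list of dicts, that hour_id has already been converted to a datetime, and that LCID is stored as a string of digits so that its first five characters are its first five digits. As before, label_column and complaint_value are hypothetical placeholders for the unnamed complaint column and its label.

```python
import math
from collections import Counter, defaultdict
from statistics import median


def daily_medians(records, variables):
    """Return {date: {variable: median}} using numeric values only.

    Assumption: rec["hour_id"] is already a datetime object.
    """
    by_day = defaultdict(lambda: defaultdict(list))
    for rec in records:
        day = rec["hour_id"].date()
        for name in variables:
            try:
                value = float(rec[name])
            except (KeyError, TypeError, ValueError):
                continue  # missing or non-numeric: skip
            if not math.isnan(value):
                by_day[day][name].append(value)
    return {
        day: {name: median(values) for name, values in columns.items()}
        for day, columns in by_day.items()
    }


def add_region(records):
    """Add a Region variable: the first 5 characters of the LCID string."""
    for rec in records:
        rec["Region"] = str(rec["LCID"])[:5]
    return records


def region_with_most_complaints(records, label_column, complaint_value):
    """Return (Region, complaint count) for the top region, or None."""
    counts = Counter(
        rec["Region"]
        for rec in records
        if rec.get(label_column) == complaint_value
    )
    return counts.most_common(1)[0] if counts else None
```

If LCID were read as a number rather than text, leading zeros would already be lost before the slice, which is one reason to keep it as a string.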