Data Mining and Analysis
Learning Outcome 1: Describe and explain the concepts of data mining and business analytics.
Learning Outcome 2: Critically review and appreciate the role of data mining in business analytics.
Learning Outcome 3: Critically explain how and why data mining and business analytics can be used to create competitive advantage for businesses and enterprises.
Learning Outcome 4: Critically analyse when, why, and how data mining should be considered a possible problem-solving strategy from a business perspective.
Learning Outcome 5: Gain sufficient working knowledge of SAS Enterprise Miner and SAS Enterprise Guide for performing data exploration, modelling, model comparison, and reporting with real-world case studies.
Coursework Aim: This referred/deferred individual coursework assignment involves analysing a real-world dataset and creating meaningful insights to address certain business concerns and problems.
Overview: The objective of this individual assignment is to evaluate your understanding of the basic theory, concepts, methods, and algorithms in data mining, and to assess your skill in applying appropriate Python packages, such as NumPy, Pandas, Matplotlib, and Scikit-learn, to carry out a data mining project.
The dataset contains all the road accidents in London boroughs that were reported to the police over a certain period of time. Your role in this project is two-fold: you act both as a business client and as a data analyst. As a business client, you are expected to raise meaningful business concerns/problems in relation to the data given. As a data analyst, you are required to follow a proper data mining methodology and apply the techniques covered in lectures to analyse your data and address the business concerns and problems you have raised.
You must contact the module leader to find out which borough's data in the dataset you need to analyse.
Tasks: You are required to undertake the following tasks:
1. Problem Identification
Read the data description file (metadata) to learn the basic characteristics of the dataset, including the business context associated with the data, the total number of attributes (dimensions, variables), the data type of each attribute, the value range/mode, skewness, and kurtosis of each attribute, and the total number of instances. Carry out simple data exploration with essential plotting.
Identify a set of meaningful business problems of interest with regard to the data for analysis.
Identify what data mining tasks need to be performed in order to address the business problems raised.
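As an illustration of the attribute summary described in Task 1, the following is a minimal sketch using Pandas. The data is synthetic and the column names (num_vehicles, num_casualties, severity) are hypothetical placeholders, not the attributes of the real dataset; the helper describe_attributes is likewise only an assumed name for this example.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the accident data; all column names are hypothetical.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "num_vehicles": rng.integers(1, 5, size=200),
    "num_casualties": rng.poisson(1.5, size=200) + 1,
    "severity": rng.choice(["Slight", "Serious", "Fatal"], size=200, p=[0.8, 0.18, 0.02]),
})


def describe_attributes(frame: pd.DataFrame) -> pd.DataFrame:
    """One row per attribute: type, missing count, mode, and numeric statistics."""
    rows = []
    for name in frame.columns:
        col = frame[name]
        numeric = pd.api.types.is_numeric_dtype(col)
        rows.append({
            "attribute": name,
            "dtype": str(col.dtype),
            "missing": int(col.isna().sum()),
            "mode": col.mode().iloc[0],
            # Range, skewness, and kurtosis are only defined for numeric columns.
            "min": col.min() if numeric else None,
            "max": col.max() if numeric else None,
            "skewness": col.skew() if numeric else None,
            "kurtosis": col.kurt() if numeric else None,
        })
    return pd.DataFrame(rows).set_index("attribute")


summary = describe_attributes(df)
print(f"{len(df)} instances, {df.shape[1]} attributes")
print(summary)
```

With the real data, the same function could be applied after loading the file with pandas.read_csv; histograms and bar charts of individual columns can then be drawn with Matplotlib.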
2. Data Preparation
Determine which variables are to be used in which analysis. Also refer to 1.2 and 1.3 in Task 1.
Get your data for analysis. Choose appropriate methods for data pre-processing, including detecting and dealing with incorrect data types, irrelevant variables, missing values, outliers, imbalanced classes, and duplicates, changing data types, and conducting proper dimensionality reduction, feature extraction, data transformation, data partitioning, and normalisation, where appropriate. Also refer to 1.1 in Task 1.
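Some of the pre-processing steps listed in Task 2 can be sketched as follows. This is one possible approach on synthetic data, not a prescribed method: the column names (accident_id, speed_limit, num_vehicles, severity) are hypothetical, and the choices of the 1.5 x IQR rule for outliers, median imputation, and a 70/30 stratified split are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data with hypothetical column names.
rng = np.random.default_rng(1)
raw = pd.DataFrame({
    "accident_id": np.arange(300),
    "speed_limit": rng.choice([20.0, 30.0, 40.0], size=300),
    "num_vehicles": rng.integers(1, 4, size=300).astype(float),
    "severity": rng.choice(["Slight", "Serious"], size=300, p=[0.85, 0.15]),
})
raw.loc[::25, "speed_limit"] = np.nan                      # inject missing values
raw.loc[5, "num_vehicles"] = 40.0                          # inject one outlier
raw = pd.concat([raw, raw.iloc[:10]], ignore_index=True)   # inject duplicate records

# 1. Remove duplicate records, identified here by the (assumed) record id.
clean = raw.drop_duplicates(subset="accident_id")

# 2. Drop rows whose vehicle count lies outside 1.5 * IQR of the quartiles.
q1, q3 = clean["num_vehicles"].quantile([0.25, 0.75])
iqr = q3 - q1
clean = clean[clean["num_vehicles"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# 3. Partition the data, stratifying on the imbalanced target.
features = ["speed_limit", "num_vehicles"]
X_train, X_test, y_train, y_test = train_test_split(
    clean[features], clean["severity"],
    test_size=0.3, stratify=clean["severity"], random_state=42,
)

# 4. Impute missing values with medians learned from the training set only.
medians = X_train.median()
X_train = X_train.fillna(medians)
X_test = X_test.fillna(medians)

# 5. Normalise: fit the scaler on the training set, then apply it to both sets.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(len(raw), "raw rows ->", len(clean), "clean rows")
```

Fitting the imputation values and the scaler on the training partition only keeps information from the test partition out of the model-building process.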
3. Model Construction
With the pre-processed dataset, undertake the data mining tasks you have identified in 1.2. You are required to apply two different algorithms for both predictive and descriptive modelling. For descriptive modelling, you may choose to use k-means clustering and various EDA (Exploratory Data Analysis) methods, e.g., histograms, bar charts, and Pearson's correlation coefficient. For predictive modelling, you may use, for example, decision trees and artificial neural networks, or decision trees and k-nearest-neighbour.
In order to build the most appropriate and accurate models and identify meaningful hidden patterns, different settings for the relevant model parameters should be considered for each of the selected algorithms and methods.
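The idea of trying different parameter settings for each algorithm in Task 3 can be sketched with Scikit-learn as below. The data comes from make_classification rather than the accident dataset, and the parameter grids, the 5-fold cross-validation, and the use of the silhouette score to compare values of k are illustrative assumptions rather than required choices.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.metrics import silhouette_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a pre-processed dataset.
X, y = make_classification(n_samples=400, n_features=6, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Predictive modelling: two algorithms, each with its own parameter grid.
candidates = {
    "decision_tree": (
        DecisionTreeClassifier(random_state=0),
        {"max_depth": [2, 4, 8, None], "min_samples_leaf": [1, 5, 20]},
    ),
    "knn": (
        make_pipeline(StandardScaler(), KNeighborsClassifier()),
        {"kneighborsclassifier__n_neighbors": [3, 7, 15]},
    ),
}
best_models = {}
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    best_models[name] = search.best_estimator_
    print(name, search.best_params_, round(search.best_score_, 3))

# Descriptive modelling: k-means with several values of k, compared by silhouette score.
X_scaled = StandardScaler().fit_transform(X)
silhouettes = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    silhouettes[k] = silhouette_score(X_scaled, labels)
best_k = max(silhouettes, key=silhouettes.get)
print("best k by silhouette:", best_k)
```

The test partition is deliberately left untouched during the search so that it can be used later for an unbiased comparison of the selected models.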
4. Model Interpretation and Evaluation
Interpret the descriptive models created, such as the clusters produced by the k-means algorithm, the correlations among variables, and the relevant plots.
Compare the performances of different predictive models in terms of accuracy, error rate, generalisation capability (over-fitting), simplicity and cost, etc., where appropriate.
Discuss the meaningfulness and usefulness of the models built and the patterns revealed, and how the models and the patterns can be used to address the original business concerns. This includes both descriptive and predictive models.
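A comparison of predictive models along the lines of Task 4 might look like the following sketch. It uses synthetic data, and the three models and their settings are assumptions chosen only to show the idea: the gap between training and test accuracy is used as a rough indicator of over-fitting, and the node count of a tree as a rough indicator of simplicity.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a pre-processed dataset.
X, y = make_classification(n_samples=400, n_features=6, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "tree_depth_3": DecisionTreeClassifier(max_depth=3, random_state=0),
    "tree_unpruned": DecisionTreeClassifier(random_state=0),
    "knn_7": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=7)),
}

rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, model.predict(X_train))
    test_acc = accuracy_score(y_test, model.predict(X_test))
    # Only decision trees expose a node count; other models get None.
    tree = getattr(model, "tree_", None)
    rows.append({
        "model": name,
        "train_accuracy": train_acc,
        "test_accuracy": test_acc,
        "test_error_rate": 1.0 - test_acc,
        "overfit_gap": train_acc - test_acc,
        "tree_nodes": tree.node_count if tree is not None else None,
    })

comparison = pd.DataFrame(rows).set_index("model")
print(comparison.round(3))
```

A large positive overfit_gap suggests that a model has memorised the training data; on imbalanced targets, accuracy should be read alongside per-class measures such as precision and recall.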
5. A summary of the main findings of the project.