Introduction to Statistical Modelling of Qualitative Attributes - Assignment

Assignment Task

Background Brief

The dataset originates from an experiment conducted on mice within a biological laboratory setting. The primary objective of the experiment is to evaluate the impact of medication on restoring the learning ability of mice with chromosome abnormalities.

For this assignment, we narrow the inquiry to predicting the expression level of a specific protein that generates detectable signals in the brain cortex of mice. The predictors comprise the expression levels of 10 other proteins and three qualitative attributes.

Variable Description

1. Response Variable

Y: Expression level of the protein named pCASP9.

2. Quantitative Predictors

Expression levels of 10 other proteins:

Variable Name   Protein Name
X1              ERBB4
X2              IL1B
X3              nNOS
X4              pNR2A
X5              P70S6
X6              PSD95
X7              SYP
X8              BRAF
X9              DYRK1A
X10             pELK

3. Qualitative Predictors

Variable Name   Levels and Descriptions
Genotype        “Ts65Dn” if the mouse has trisomy; “Control” otherwise.
Treatment       “Memantine” if the mouse is injected with memantine; “Saline” if injected with saline only.
Behavior        “C/S” (context-shock) if the mouse is stimulated to learn; “S/C” (shock-context) if not stimulated to learn.

Instructions

Explore Data

Conduct exploratory analysis on the variables using the entire dataset.

Describe the data and provide commentary on your observations/findings.
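
As a starting point for the exploration step, the following is a minimal sketch using NumPy only. The data are simulated stand-ins (not the assignment dataset), and the helper name five_number_summary is hypothetical. It shows three typical summaries: a five-number summary of the response, the correlation of each quantitative predictor with Y, and the counts and mean response per level of a qualitative predictor.

```python
import numpy as np

# Simulated stand-in data (NOT the assignment dataset); shapes mirror the brief.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(loc=1.0, scale=0.3, size=(n, 10))      # columns play the role of X1..X10
genotype = rng.choice(["Control", "Ts65Dn"], size=n)  # one qualitative predictor
y = 0.5 * X[:, 0] + 0.2 * (genotype == "Ts65Dn") + rng.normal(scale=0.1, size=n)


def five_number_summary(values):
    """Return min, lower quartile, median, upper quartile, and max."""
    return np.percentile(values, [0, 25, 50, 75, 100])


summary_y = five_number_summary(y)

# Pearson correlation of each quantitative predictor with the response.
corr_with_y = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

# Level counts and mean response per level of a qualitative predictor.
levels, counts = np.unique(genotype, return_counts=True)
mean_y_by_level = {str(lvl): float(y[genotype == lvl].mean()) for lvl in levels}

print("Y five-number summary:", np.round(summary_y, 3))
print("corr(Xj, Y):", np.round(corr_with_y, 2))
print("Genotype counts:", dict(zip(levels.tolist(), counts.tolist())))
print("Mean Y by Genotype:", mean_y_by_level)
```

On the real data, the same summaries would normally be paired with plots (histograms, boxplots by level, scatterplots against Y) before commenting on the findings.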

Fit Model

Divide the dataset into a training set and testing set in an approximate ratio of 75:25.

Set the random state/seed using the last 4 digits of your SP admission number.

Fit the full additive Multiple Linear Regression (MLR) model on the training set.
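
The split-and-fit step can be sketched as follows, assuming NumPy only and simulated data in place of the real dataset. SEED = 1234 is a placeholder for the last 4 digits of the admission number, and the reference levels chosen for the dummy variables ("Control", "Saline", "S/C") are an assumption. Library routines such as scikit-learn's train_test_split with its random_state argument serve the same purpose as the manual permutation here.

```python
import numpy as np

SEED = 1234  # placeholder: replace with the last 4 digits of your admission number

# Simulated stand-in data (NOT the assignment dataset).
data_rng = np.random.default_rng(0)
n = 400
X_quant = data_rng.normal(size=(n, 10))                 # stands in for X1..X10
genotype = data_rng.choice(["Control", "Ts65Dn"], size=n)
treatment = data_rng.choice(["Saline", "Memantine"], size=n)
behavior = data_rng.choice(["S/C", "C/S"], size=n)

# Dummy-code each two-level qualitative predictor (1 = non-reference level).
dummies = np.column_stack([
    genotype == "Ts65Dn",
    treatment == "Memantine",
    behavior == "C/S",
]).astype(float)

# Full additive design matrix: intercept + 10 quantitative + 3 dummy columns.
design = np.column_stack([np.ones(n), X_quant, dummies])

# Arbitrary coefficients used only to simulate a response for this sketch.
true_beta = data_rng.normal(size=design.shape[1])
y = design @ true_beta + data_rng.normal(scale=0.5, size=n)

# Reproducible 75:25 split driven by the seed.
split_rng = np.random.default_rng(SEED)
perm = split_rng.permutation(n)
n_train = int(round(0.75 * n))
train_idx, test_idx = perm[:n_train], perm[n_train:]

# Ordinary least squares fit on the training rows only.
beta_hat, *_ = np.linalg.lstsq(design[train_idx], y[train_idx], rcond=None)
print("Estimated coefficients:", np.round(beta_hat, 3))
```

Keeping the test rows out of the fit is what makes the later prediction-accuracy check meaningful.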

Evaluate Model

Perform relevant diagnostics on the full MLR model fitted.

Assess the model from the perspectives of model fit, prediction accuracy, model/predictor significance, and checking of assumptions.
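
The four perspectives above map onto a handful of standard quantities. The sketch below, again on simulated data with NumPy only, computes R-squared and adjusted R-squared (model fit), test RMSE (prediction accuracy), the overall F statistic and per-coefficient t statistics (significance), and leaves the training residuals available for assumption checks. The data, seed, and coefficient values are illustrative assumptions.

```python
import numpy as np

# Simulated stand-in data (NOT the assignment dataset).
rng = np.random.default_rng(1234)
n, p = 400, 5
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept + 5 predictors
beta = np.array([1.0, 2.0, -1.0, 0.5, 0.0, 0.0])            # last two have no effect
y = X @ beta + rng.normal(scale=1.0, size=n)

idx = rng.permutation(n)
train, test = idx[:300], idx[300:]
X_tr, y_tr, X_te, y_te = X[train], y[train], X[test], y[test]

beta_hat, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
resid = y_tr - X_tr @ beta_hat
n_tr, k = X_tr.shape  # k counts the intercept

# Model fit.
sse = resid @ resid
sst = np.sum((y_tr - y_tr.mean()) ** 2)
r2 = 1 - sse / sst
adj_r2 = 1 - (1 - r2) * (n_tr - 1) / (n_tr - k)

# Significance: overall F statistic and per-coefficient t statistics.
sigma2 = sse / (n_tr - k)
std_err = np.sqrt(sigma2 * np.diag(np.linalg.inv(X_tr.T @ X_tr)))
t_stats = beta_hat / std_err
f_stat = ((sst - sse) / (k - 1)) / sigma2

# Prediction accuracy on the held-out rows.
rmse_test = np.sqrt(np.mean((y_te - X_te @ beta_hat) ** 2))

print(f"R^2={r2:.3f}  adj R^2={adj_r2:.3f}  F={f_stat:.1f}  test RMSE={rmse_test:.3f}")
print("t statistics:", np.round(t_stats, 2))
# Assumption checks would plot `resid` against fitted values (linearity,
# constant variance) and against normal quantiles (normality).
```

The t and F statistics are compared against their reference distributions to obtain p-values; a statistics library would normally report those directly.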

Improve Model

Enhance the model using at least 4 of the following techniques where appropriate:

  • Removing outlier(s) (if any)
  • Centering and/or standardizing variables
  • Principal component analysis (PCA)
  • Transformation of variables
  • Interaction of variables
  • Variable selection
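
Two of the listed techniques, standardizing and PCA, are commonly combined. The following is a sketch under stated assumptions: the predictors are simulated and deliberately correlated, and the 95% cumulative-variance cut-off is an illustrative choice, not a requirement of the assignment. PCA is computed through the singular value decomposition of the standardized predictor matrix.

```python
import numpy as np

# Simulated, deliberately correlated stand-ins for X1..X10 (NOT the real data).
rng = np.random.default_rng(0)
n = 300
latent = rng.normal(size=(n, 3))
loadings = rng.normal(size=(3, 10))
X = latent @ loadings + 0.05 * rng.normal(size=(n, 10))

# Standardize: centre each column and scale it to unit sample variance.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# PCA via SVD of the standardized matrix; rows of Vt are the component directions.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
explained = s**2 / np.sum(s**2)
cumulative = np.cumsum(explained)

# Smallest number of components whose cumulative share reaches 95% (assumed cut-off).
k = int(np.searchsorted(cumulative, 0.95) + 1)
scores = Z @ Vt[:k].T  # uncorrelated columns that can replace the original predictors

print("Explained variance ratio:", np.round(explained, 3))
print("Components kept:", k)
```

Because the component scores are mutually uncorrelated, regressing Y on them removes multicollinearity among the protein predictors, at the cost of less direct interpretation of the coefficients.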

Explain how the model improves after applying each technique.
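
One way to support such an explanation is to report the same metrics before and after each change. The sketch below compares an additive model with one that adds an interaction term, using adjusted R-squared on the training set and RMSE on the test set. The data are simulated with a built-in interaction so that the improvement is visible; the helper fit_and_score and all coefficient values are hypothetical.

```python
import numpy as np


def fit_and_score(X_train, y_train, X_test, y_test):
    """Fit OLS on the training set; return (adjusted R^2, test RMSE)."""
    beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    n, k = X_train.shape
    sse = np.sum((y_train - X_train @ beta) ** 2)
    sst = np.sum((y_train - y_train.mean()) ** 2)
    adj_r2 = 1 - (sse / (n - k)) / (sst / (n - 1))
    rmse = np.sqrt(np.mean((y_test - X_test @ beta) ** 2))
    return adj_r2, rmse


# Simulated data (NOT the assignment dataset) with a true interaction effect.
rng = np.random.default_rng(0)
n = 400
x1 = rng.normal(size=n)                       # a quantitative predictor
g = rng.integers(0, 2, size=n).astype(float)  # a dummy-coded qualitative predictor
y = 1 + 2 * x1 + 1.5 * g + 3 * x1 * g + rng.normal(scale=0.5, size=n)

additive = np.column_stack([np.ones(n), x1, g])
with_interaction = np.column_stack([additive, x1 * g])

idx = rng.permutation(n)
train, test = idx[:300], idx[300:]

results = {}
for name, design in [("additive", additive), ("with interaction", with_interaction)]:
    results[name] = fit_and_score(design[train], y[train], design[test], y[test])
    adj_r2, rmse = results[name]
    print(f"{name:>16}: adj R^2 = {adj_r2:.3f}, test RMSE = {rmse:.3f}")
```

Using the same train/test split for every candidate keeps the comparison fair, and a table of these metrics per technique fits naturally into the report.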

Present Results

Present and explain the work above in a report not exceeding 12 pages, incorporating relevant graphs, figures, and/or tables that support your analysis.

The report should be detailed yet succinct, with a clear and easily readable format. Elaborate layout design is unnecessary, but the report must be comprehensible and accessible.
