Introduction to Statistical Modelling of Qualitative Attributes - Assignment
Assignment Task
Background Brief
The dataset originates from an experiment conducted on mice within a biological laboratory setting. The primary objective of the experiment is to evaluate the impact of medication on restoring the learning ability of mice with chromosome abnormalities.
For this assignment, the inquiry is narrowed to predicting the expression level of a specific protein that generates detectable signals in the brain cortex of mice. The predictors are the expression levels of 10 other proteins and three qualitative attributes.
Variable Description
1. Response Variable
Y: Expression level of the protein named pCASP9.
2. Quantitative Predictors
Expression levels of 10 other proteins:
| Variable Name | Protein Name |
|---|---|
| X1 | ERBB4 |
| X2 | IL1B |
| X3 | nNOS |
| X4 | pNR2A |
| X5 | P70S6 |
| X6 | PSD95 |
| X7 | SYP |
| X8 | BRAF |
| X9 | DYRK1A |
| X10 | pELK |
3. Qualitative Predictors
| Variable Name | Levels and Descriptions |
|---|---|
| Genotype | “Ts65Dn” if the mouse has trisomy, “Control” otherwise. |
| Treatment | “Memantine” if the mouse is injected with memantine, “Saline” if injected with only saline. |
| Behavior | “C/S” (context-shock) if the mouse is stimulated to learn, “S/C” (shock-context) if not stimulated to learn. |
Instruction
Explore Data
Conduct exploratory analysis on the variables using the entire dataset.
Describe the data and provide commentary on your observations/findings.
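As a starting point, the exploratory step could be sketched as below. This is a minimal illustration, not a required approach: the data frame `demo` is synthetic stand-in data invented for the example (the real dataset's file name and values are not given in this brief), and only the column names follow the Variable Description.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataset (all values are made up for illustration).
rng = np.random.default_rng(0)
n = 200
protein_cols = [f"X{i}" for i in range(1, 11)]
demo = pd.DataFrame(rng.normal(size=(n, 10)), columns=protein_cols)
demo["Genotype"] = rng.choice(["Ts65Dn", "Control"], size=n)
demo["Treatment"] = rng.choice(["Memantine", "Saline"], size=n)
demo["Behavior"] = rng.choice(["C/S", "S/C"], size=n)
demo["Y"] = 0.5 * demo["X1"] - 0.3 * demo["X2"] + rng.normal(scale=0.5, size=n)

quantitative = ["Y"] + protein_cols
qualitative = ["Genotype", "Treatment", "Behavior"]

# Centre, spread, and range of the response and the quantitative predictors.
summary = demo[quantitative].describe()

# Number of missing values in each column.
missing = demo.isna().sum()

# Frequency of each level of the qualitative predictors.
level_counts = {col: demo[col].value_counts() for col in qualitative}

# Linear association between each quantitative predictor and the response.
corr_with_y = demo[quantitative].corr()["Y"].drop("Y")

# Mean response within each combination of qualitative levels.
group_means = demo.groupby(qualitative)["Y"].mean()
```

Histograms, box plots by group, and scatter plots against Y would normally accompany these tables in the report.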
Fit Model
Divide the dataset into a training set and testing set in an approximate ratio of 75:25.
Set the random state/seed using the last 4 digits of your SP admission number.
Fit the full additive Multiple Linear Regression (MLR) model on the training set.
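The split and the full additive fit could look roughly like the sketch below. It makes several assumptions: the data are synthetic stand-ins, `SEED = 1234` is a placeholder for your own four digits, the helper `design_matrix` and the dummy column names are hypothetical, and ordinary least squares is solved directly with NumPy rather than with a dedicated modelling library.

```python
import numpy as np
import pandas as pd

SEED = 1234  # placeholder only; use the last 4 digits of your own admission number

# Synthetic stand-in for the real dataset (all values are made up for illustration).
rng = np.random.default_rng(0)
n = 200
protein_cols = [f"X{i}" for i in range(1, 11)]
demo = pd.DataFrame(rng.normal(size=(n, 10)), columns=protein_cols)
demo["Genotype"] = rng.choice(["Ts65Dn", "Control"], size=n)
demo["Treatment"] = rng.choice(["Memantine", "Saline"], size=n)
demo["Behavior"] = rng.choice(["C/S", "S/C"], size=n)
demo["Y"] = 0.5 * demo["X1"] - 0.3 * demo["X2"] + rng.normal(scale=0.5, size=n)

# 75:25 split: draw 75% of the rows at random for training, keep the rest for testing.
train = demo.sample(frac=0.75, random_state=SEED)
test = demo.drop(train.index)


def design_matrix(frame):
    """Intercept, 10 quantitative predictors, and one 0/1 dummy per two-level factor."""
    X = frame[protein_cols].copy()
    X["Genotype_Ts65Dn"] = (frame["Genotype"] == "Ts65Dn").astype(float)
    X["Treatment_Memantine"] = (frame["Treatment"] == "Memantine").astype(float)
    X["Behavior_CS"] = (frame["Behavior"] == "C/S").astype(float)
    X.insert(0, "Intercept", 1.0)
    return X


# Full additive model: every predictor enters as a main effect, with no interactions.
X_train = design_matrix(train)
beta, *_ = np.linalg.lstsq(X_train.to_numpy(), train["Y"].to_numpy(), rcond=None)
coefficients = pd.Series(beta, index=X_train.columns)
```

Coding each two-level factor as a single 0/1 column keeps the training and testing design matrices aligned, with "Control", "Saline", and "S/C" acting as the reference levels in this sketch.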
Evaluate Model
Perform relevant diagnostics on the full MLR model fitted.
Assess the model from the perspectives of model fit, prediction accuracy, model/predictor significance, and checking of assumptions.
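The four perspectives above can be illustrated numerically as follows. This is a sketch under stated assumptions: the data are synthetic with three made-up predictors rather than the assignment's thirteen, and only NumPy is used, so p-values and formal assumption tests (which need a statistics library) are left out.

```python
import numpy as np

# Synthetic training and testing data (made up): intercept plus 3 predictors.
rng = np.random.default_rng(0)
n_train, n_test, p = 150, 50, 3
true_beta = np.array([1.0, 0.5, -0.3, 0.0])


def make(n):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
    y = X @ true_beta + rng.normal(scale=0.5, size=n)
    return X, y


X_train, y_train = make(n_train)
X_test, y_test = make(n_test)

beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
fitted = X_train @ beta
resid = y_train - fitted
k = X_train.shape[1]  # number of estimated coefficients, intercept included

# Model fit: R-squared and adjusted R-squared on the training set.
sse = resid @ resid
sst = np.sum((y_train - y_train.mean()) ** 2)
r2 = 1 - sse / sst
adj_r2 = 1 - (1 - r2) * (n_train - 1) / (n_train - k)

# Prediction accuracy: root mean squared error on the held-out testing set.
rmse_test = np.sqrt(np.mean((y_test - X_test @ beta) ** 2))

# Significance: overall F statistic and per-coefficient t statistics.
sigma2 = sse / (n_train - k)
f_stat = ((sst - sse) / (k - 1)) / sigma2
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X_train.T @ X_train)))
t_stats = beta / se

# Multicollinearity: variance inflation factors are the diagonal of the
# inverse correlation matrix of the predictors (intercept column excluded).
vif = np.diag(np.linalg.inv(np.corrcoef(X_train[:, 1:], rowvar=False)))
```

Checks of linearity, constant variance, and normality are usually made by plotting `resid` against `fitted` and drawing a normal Q-Q plot of `resid`.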
Improve Model
Enhance the model using at least 4 of the following techniques where appropriate:
Explain how the model improves after applying each technique.
Present Results
Present and explain your work for the tasks above in a report not exceeding 12 pages, incorporating relevant graphs, figures, and/or tables that support your analysis.
The report should be detailed yet succinct, with a clear and easily readable format. Elaborate layout design is unnecessary, but the report must be comprehensible and accessible.