This CA assesses students on core concepts in descriptive analytics, discrete and continuous probability models, and hypothesis tests. Submission of a Shiny app, including the UI and server code, together with its URL on shinyapps.io, is mandatory. Submissions received after the deadline will not be considered or scored.
B9DA101: Statistics For Data Analytics, Assignment
Item 1 Tab a: Describe the dataset using appropriate plots/curves/charts. (7)
Tab b: Select one of the continuous attributes and compute its measures of central tendency and variation.
Tab c: For a particular variable of the dataset, use Chebyshev’s rule and propose a one-sigma interval. Based on your proposed interval, identify the outliers, if any.
Tab d: Explain how the box-plot technique can be used to detect outliers. Apply this technique for one attribute of the dataset.
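As a rough illustration of the calculations behind Item 1 (Tabs b to d), the following is a minimal sketch in plain Python, independent of the Shiny app itself. The sample `values` and all function names are hypothetical and are not part of the assignment. Note that Chebyshev's bound 1 - 1/k^2 equals 0 at k = 1, so a one-sigma interval carries no guaranteed coverage; the bound only becomes informative for k > 1. The box-plot rule is assumed here to be the common 1.5 x IQR fence convention.

```python
import statistics

# Hypothetical sample standing in for one continuous attribute of the dataset.
values = [12.0, 15.0, 14.0, 10.0, 18.0, 13.0, 16.0, 11.0, 14.0, 47.0]


def central_and_variation(xs):
    """Return common measures of central tendency and variation."""
    return {
        "mean": statistics.mean(xs),
        "median": statistics.median(xs),
        "variance": statistics.variance(xs),  # sample variance (n - 1)
        "std_dev": statistics.stdev(xs),      # sample standard deviation
        "range": max(xs) - min(xs),
    }


def k_sigma_interval(xs, k=1.0):
    """Interval mean +/- k sample standard deviations."""
    m = statistics.mean(xs)
    s = statistics.stdev(xs)
    return (m - k * s, m + k * s)


def chebyshev_lower_bound(k):
    """Chebyshev: at least 1 - 1/k^2 of the values lie within k std devs."""
    return max(0.0, 1.0 - 1.0 / k ** 2)


def iqr_fences(xs):
    """Box-plot fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR (assumed convention)."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    return (q1 - 1.5 * iqr, q3 + 1.5 * iqr)


def outside(xs, interval):
    """Values falling strictly outside the given (low, high) interval."""
    low, high = interval
    return [x for x in xs if x < low or x > high]


summary = central_and_variation(values)
one_sigma = k_sigma_interval(values, k=1.0)
sigma_outliers = outside(values, one_sigma)
boxplot_outliers = outside(values, iqr_fences(values))

print(summary)
print("one-sigma interval:", one_sigma, "outliers:", sigma_outliers)
print("box-plot outliers:", boxplot_outliers)
```

Quartile definitions differ between tools, so the fences computed here may not match the whiskers drawn by a particular plotting library exactly.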
Item 2 Tab a: Select four variables of the dataset, and propose an appropriate probability model to quantify the uncertainty of each variable.
Tab b: For each model in part (a), estimate the parameters of the model.
Tab c: Explain how each model can be used for predictive analytics, then find the prediction for each attribute.
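To make the estimation and prediction steps of Item 2 concrete, here is a minimal sketch, assuming four hypothetical variables that happen to suit a normal, an exponential, a Poisson, and a Bernoulli model. The data, the variable names, and the choice of models are illustrative assumptions only; the appropriate model for each variable depends on the actual dataset. Parameters are estimated by maximum likelihood, and each point prediction is taken as the expected value (or the most likely class) under the fitted model.

```python
import math
import statistics

# Hypothetical data for four variables (not taken from any real dataset).
heights = [170.0, 165.0, 180.0, 175.0, 160.0]   # continuous, roughly symmetric
waiting = [2.0, 0.5, 1.5, 4.0, 2.0]             # positive waiting times
counts = [3, 1, 4, 2, 0]                        # event counts per interval
flags = [1, 0, 1, 1, 0, 1, 1, 1]                # binary outcome


def fit_normal(xs):
    """MLE for a normal model: mean and population standard deviation."""
    return statistics.fmean(xs), statistics.pstdev(xs)


def fit_exponential(xs):
    """MLE for an exponential model: rate = 1 / sample mean."""
    return 1.0 / statistics.fmean(xs)


def fit_poisson(xs):
    """MLE for a Poisson model: lambda = sample mean."""
    return statistics.fmean(xs)


def fit_bernoulli(xs):
    """MLE for a Bernoulli model: p = proportion of ones."""
    return statistics.fmean(xs)


def poisson_pmf(k, lam):
    """P(X = k) under a Poisson(lam) model."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


mu, sigma = fit_normal(heights)
rate = fit_exponential(waiting)
lam = fit_poisson(counts)
p = fit_bernoulli(flags)

# Point predictions: expected value of each fitted model
# (most likely class for the binary variable).
predictions = {
    "heights": mu,
    "waiting": 1.0 / rate,
    "counts": lam,
    "flags": 1 if p >= 0.5 else 0,
}

print(predictions)
print("P(no events in an interval):", poisson_pmf(0, lam))
```

Beyond point predictions, a fitted model also yields probabilities for events of interest, as the final line shows for the Poisson case.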
Item 3 Tab a: Consider two categorical variables of the dataset and develop a binary decision-making strategy to check whether the two variables are independent at the significance level alpha = 0.01. To do so:
State the hypotheses. Find the statistic and critical values. Explain your decision and interpret the results.
Tab b: Consider one categorical variable and apply a goodness-of-fit test to evaluate whether a candidate set of probabilities can appropriately quantify the uncertainty of the class frequencies at the significance level alpha = 0.05.
Tab c: Consider one continuous variable in the dataset and test whether its mean equals a proposed candidate value at the significance level alpha = 0.05.
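The three tests in Item 3 can be sketched as follows, using only the Python standard library. The contingency table, class counts, candidate probabilities, sample, and candidate mean are all hypothetical. The chi-square critical values use two exact identities that hold only for these degrees of freedom: with 1 degree of freedom the chi-square variable is a squared standard normal, and with 2 degrees of freedom it is exponential with mean 2. The t critical value is assumed to be read from a standard t table (two-sided, alpha = 0.05, 9 degrees of freedom).

```python
import math
import statistics


def chi2_independence_stat(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # expected count under independence
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi2_gof_stat(observed, probs):
    """Chi-square goodness-of-fit statistic against candidate class probabilities."""
    n = sum(observed)
    stat = sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))
    return stat, len(observed) - 1


def t_stat(xs, mu0):
    """One-sample t statistic for H0: mean = mu0, with its degrees of freedom."""
    n = len(xs)
    se = statistics.stdev(xs) / math.sqrt(n)
    return (statistics.mean(xs) - mu0) / se, n - 1


# Tab a: independence of two categorical variables (hypothetical 2x2 table), alpha = 0.01.
ind_stat, ind_df = chi2_independence_stat([[30, 10], [20, 40]])
ind_crit = statistics.NormalDist().inv_cdf(1 - 0.01 / 2) ** 2  # valid for df = 1 only
reject_independence = ind_stat > ind_crit

# Tab b: goodness of fit (hypothetical counts and candidate probabilities), alpha = 0.05.
gof_stat, gof_df = chi2_gof_stat([12, 30, 18], [0.25, 0.5, 0.25])
gof_crit = -2 * math.log(0.05)  # valid for df = 2 only
reject_fit = gof_stat > gof_crit

# Tab c: test of mean (hypothetical sample, candidate mean 15), alpha = 0.05.
sample = [12.0, 15.0, 14.0, 10.0, 18.0, 13.0, 16.0, 11.0, 14.0, 17.0]
t_value, t_df = t_stat(sample, mu0=15.0)
t_crit = 2.262  # two-sided table value for df = 9 (assumed lookup)
reject_mean = abs(t_value) > t_crit

print(ind_stat, ind_df, ind_crit, reject_independence)
print(gof_stat, gof_df, gof_crit, reject_fit)
print(t_value, t_df, t_crit, reject_mean)
```

In each case the decision rule is the same: reject the null hypothesis when the statistic exceeds the critical value (in absolute value for the two-sided t test), and otherwise fail to reject it. For tables or class counts with other degrees of freedom, the critical value would need to come from a chi-square table or a statistics library.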