B9BA103: Text Mining Assignment

Dataset

The dataset Yelp Restaurant Reviews.csv contains customer reviews of different restaurants. In addition to customer reviews, customer ratings (scale 1–5) are also provided. It is assumed that a customer would recommend a given restaurant to others if the rating given by that customer is 4 or 5. Otherwise, it is assumed that the customer would not recommend the restaurant to others.
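
The recommendation rule above can be written as a small labelling function. This is only a minimal sketch: the function name `rating_to_label` and the 1/0 encoding are illustrative assumptions, and the sample ratings are invented rather than read from the dataset.

```python
def rating_to_label(rating: int) -> int:
    """Map a 1-5 rating to a binary label: 1 = would recommend, 0 = would not."""
    if not 1 <= rating <= 5:
        raise ValueError(f"rating must be between 1 and 5, got {rating}")
    # Ratings of 4 or 5 are treated as a recommendation, as stated in the brief.
    return 1 if rating >= 4 else 0


# Hypothetical ratings, for illustration only (not taken from the dataset).
sample_ratings = [5, 4, 3, 2, 1]
labels = [rating_to_label(r) for r in sample_ratings]
print(labels)  # [1, 1, 0, 0, 0]
```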

This dataset is taken from https://www.kaggle.com/datasets/farukalam/yelp-restaurant-reviews

Task

A dessert parlour wants to use the above dataset to train a model that could automatically label its customer reviews to indicate whether a given customer would recommend it to others or not. Your task is to help the dessert parlour create such a model by trying any two classification algorithms (that you deem appropriate) in Python.
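
As a rough illustration of what trying two classification algorithms might look like, the sketch below fits two scikit-learn pipelines on a handful of invented reviews. The choice of Multinomial Naive Bayes and logistic regression, the TF-IDF features, and the toy texts are all assumptions made for this example; the brief leaves the choice of algorithms open, and a real solution would load the CSV and hold out a test set.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy reviews invented for illustration (1 = would recommend, 0 = would not).
texts = [
    "the ice cream was delicious and the staff were friendly",
    "lovely cakes and great coffee",
    "amazing desserts, will come again",
    "terrible service and stale pastries",
    "the waffles were cold and the wait was awful",
    "rude staff and overpriced, bland desserts",
]
labels = [1, 1, 1, 0, 0, 0]

# Each pipeline converts raw text to TF-IDF features, then fits a classifier.
models = {
    "naive_bayes": make_pipeline(TfidfVectorizer(), MultinomialNB()),
    "logistic_regression": make_pipeline(
        TfidfVectorizer(), LogisticRegression(max_iter=1000)
    ),
}

new_reviews = ["the cakes were delicious", "awful service"]
predictions = {}
for name, model in models.items():
    model.fit(texts, labels)
    predictions[name] = model.predict(new_reviews).tolist()
    print(name, predictions[name])
```

Wrapping the vectoriser and the classifier in one pipeline keeps the text-to-numbers step fitted on the training data only, which helps avoid leaking information from the test set when the data is later split.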

Deliverables & Assessment Rubric

Put the following in a zipped folder and upload to Moodle:

1. Python Code (.py) (40% Weighting)

Key Assessment Areas: Does the Python code demonstrate the functionality required to implement text classification effectively on a given dataset (i.e., data preparation, model training, and model evaluation)?

Does the code run without errors? Does the code come with copious explanatory comments for the various steps involved?

2. Report (.pdf) (60% Weighting)

Key Assessment Areas: Evidence of critical analysis. A report of at most 2,000 words should provide a critical analysis of the following points in the context of the given task:

a) Text cleaning – Has the learner discussed the approach used to clean text before its conversion to numerical format? Has the learner discussed the limitations of that approach?

b) Creation of structured data – Has the learner discussed the approach used to convert clean text to a structured numerical format? Has the learner discussed the limitations of that approach?

c) Model performance evaluation – Has the learner discussed the choice of model evaluation metric? Has the learner interpreted the performance of each model and discussed why they performed the way they did? Has the learner recommended any model for use by the dessert parlour?
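
The three report points above (text cleaning, creation of structured data, and evaluation) can be illustrated with a small standard-library sketch. Everything here is an assumption made for the example: the helper names `clean_text`, `bag_of_words` and `precision_recall_f1`, the tiny stop-word list, the vocabulary, and the sample labels are all hypothetical, and a real solution would more likely rely on an established library.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real project would likely use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "was", "is", "it", "to", "of"}


def clean_text(text: str) -> list[str]:
    """Lower-case, replace non-letters with spaces, split, and drop stop words."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def bag_of_words(tokens: list[str], vocabulary: list[str]) -> list[int]:
    """Count how often each vocabulary word occurs in the token list."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def precision_recall_f1(y_true: list[int], y_pred: list[int]) -> tuple[float, float, float]:
    """Compute precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# a) Text cleaning
tokens = clean_text("The cake was AMAZING, and the coffee is great!!")
print(tokens)  # ['cake', 'amazing', 'coffee', 'great']

# b) Structured data: one count per vocabulary word
vocabulary = ["cake", "coffee", "great", "slow"]
vector = bag_of_words(tokens, vocabulary)
print(vector)  # [1, 1, 1, 0]

# c) Evaluation on hypothetical true and predicted labels
precision, recall, f1 = precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(precision, recall, f1)
```

The sketch also hints at limitations worth discussing in the report: stripping punctuation and stop words can discard negation and emphasis, word counts ignore word order, and accuracy alone can be misleading if one class is much more common than the other, which is one reason to consider precision, recall and F1.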