Objective
Clean and standardize entity names (sponsoring government agencies and awarded companies) from SAM.gov (USA) and the E-Procurement Government of India databases.
Part 1: Data Cleaning and Standardization
Manually clean and standardize a subset (100 records) of entity names from the provided datasets.
Part 2: Automation Proposal and Script Development
Develop a basic automation script or method using Python and language models (OpenAI API, Llama2, etc.) to standardize entity names in the datasets.
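As one possible starting point for Part 2, the sketch below shows a deterministic, rule-based normalization pass in Python that could run before any language model is consulted. It is an illustration under stated assumptions, not a prescribed solution: the `SUFFIXES` map and the `normalize_name` function are hypothetical names introduced here, and the suffix list would need to be extended from the patterns found in the manually cleaned subset. No language model call is shown; names that remain ambiguous after this pass are the ones a model-based step would handle.

```python
import re
import unicodedata

# Hypothetical suffix map (an assumption, not part of the brief).
# Extend it with patterns observed in the manually cleaned records.
SUFFIXES = {
    "INCORPORATED": "INC",
    "CORPORATION": "CORP",
    "LIMITED": "LTD",
    "PRIVATE": "PVT",
    "COMPANY": "CO",
}


def normalize_name(raw: str) -> str:
    """Return a canonical comparison key for an entity name."""
    # Fold accented and compatibility characters down to plain ASCII.
    text = unicodedata.normalize("NFKD", raw)
    text = text.encode("ascii", "ignore").decode("ascii")
    text = text.upper().replace("&", " AND ")
    # Remove periods and apostrophes so "L.L.C." collapses to "LLC".
    text = re.sub(r"[.']", "", text)
    # Turn every other non-alphanumeric character into a space.
    text = re.sub(r"[^A-Z0-9 ]+", " ", text)
    # Map long legal suffixes to their short forms and squeeze whitespace.
    tokens = [SUFFIXES.get(token, token) for token in text.split()]
    return " ".join(tokens)


if __name__ == "__main__":
    for sample in ["Acme Corporation, Inc.", "Tata Consultancy Services Pvt. Limited"]:
        print(sample, "->", normalize_name(sample))
```

Running the cheap rules first keeps the number of names sent to a language model small, which matters for both cost and reproducibility.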
Part 3: Scalability and Production Readiness
Document how the proposed method can be scaled and implemented in a production environment.
Details:
Include considerations for continuous data updating and processing large volumes of data. Explain how the method adheres to data quality and standards.
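The considerations above (continuous updates and large volumes) might be approached with batching plus a cache of names already standardized, so that each refresh only pays for names not seen before. The following is a minimal sketch under assumptions: `batched`, `standardize_stream`, and the `standardize_batch` callback are hypothetical names, and the callback stands in for whatever standardization step (rules, a language model, or both) is actually chosen. In production the in-memory dictionary would likely be replaced by a persistent store.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` items without loading everything."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def standardize_stream(
    names: Iterable[str],
    standardize_batch: Callable[[List[str]], List[str]],  # hypothetical callback
    cache: Dict[str, str],
    batch_size: int = 1000,
) -> Iterator[Tuple[str, str]]:
    """Yield (raw, standardized) pairs, calling the standardizer only for unseen names."""
    for chunk in batched(names, batch_size):
        # De-duplicate while keeping order, and skip names already in the cache.
        unseen = list(dict.fromkeys(n for n in chunk if n not in cache))
        if unseen:
            cache.update(zip(unseen, standardize_batch(unseen)))
        for name in chunk:
            yield name, cache[name]


if __name__ == "__main__":
    cache: Dict[str, str] = {}
    pairs = standardize_stream(
        ["acme inc", "tata ltd", "acme inc"],
        lambda batch: [n.upper() for n in batch],  # placeholder standardizer
        cache,
        batch_size=2,
    )
    print(list(pairs))
```

Because the function is a generator, input can be streamed from a file or database cursor, and memory use stays bounded by the batch size plus the cache.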
Evaluation Criteria
Standards & Quality: Accuracy and consistency in the final cleaned and standardized data.
Scalability: The potential of the method to handle large datasets efficiently in a production environment.
Documentation: Clarity and comprehensiveness of the documentation, including reasoning for scaling the solution.
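One way to make the accuracy and consistency criteria measurable is to score the automated output against the 100 manually cleaned records from Part 1. The sketch below is an assumption about how that could be done, not a stated grading method: `accuracy` and `inconsistent_groups` are hypothetical helper names, and both take plain dictionaries mapping each raw name to its standardized form.

```python
from collections import defaultdict
from typing import Dict, Set


def accuracy(predicted: Dict[str, str], gold: Dict[str, str]) -> float:
    """Fraction of manually cleaned records that the automated output matches exactly."""
    if not gold:
        raise ValueError("gold set is empty")
    correct = sum(1 for raw, label in gold.items() if predicted.get(raw) == label)
    return correct / len(gold)


def inconsistent_groups(predicted: Dict[str, str], gold: Dict[str, str]) -> Dict[str, Set[str]]:
    """Gold labels whose raw variants received more than one predicted label."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for raw, label in gold.items():
        if raw in predicted:
            seen[label].add(predicted[raw])
    return {label: outputs for label, outputs in seen.items() if len(outputs) > 1}


if __name__ == "__main__":
    gold = {"Acme Inc.": "ACME INC", "ACME, INC": "ACME INC", "Tata Ltd": "TATA LTD"}
    predicted = {"Acme Inc.": "ACME INC", "ACME, INC": "ACME INCORPORATED", "Tata Ltd": "TATA LTD"}
    print(accuracy(predicted, gold))
    print(inconsistent_groups(predicted, gold))
```

Accuracy captures agreement with the manual work, while the second check flags cases where variants of the same entity were split across different standardized names.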
Deliverables
Candidates should submit a Google Drive folder containing:
Documentation