This is a machine learning model in Python that uses scikit-learn to classify handwritten Arabic letters. There are two files: the training data and the test data. The code is available, and we need to optimize it so that the cross-validation of the model in box 6 reports an accuracy in the high 80s to low 90s (percent). We should tune the hyperparameters and improve the pipeline as needed. Anything from scikit-learn may be used, but nothing beyond it.
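The cross-validation step described for box 6 usually follows a pattern like the one sketched here. This is a minimal illustration, not the original notebook: the Arabic letter files are not included in the question, so scikit-learn's built-in load_digits images stand in for them, and the scaler, PCA, and SVC settings are assumptions chosen only to show a Pipeline evaluated with cross_val_score.

```python
# Minimal sketch of a "box 6"-style step: cross-validating a scikit-learn pipeline.
# Assumption: load_digits (8x8 handwritten digits) stands in for the Arabic
# letter images, which are not part of the question.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)  # X: (n_samples, n_pixels), y: class labels

# Standardize pixels, reduce dimensionality, then classify with an RBF-kernel SVM.
# The specific values (40 components, C=10) are illustrative assumptions.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=40, random_state=0)),
    ("svc", SVC(kernel="rbf", C=10, gamma="scale")),
])

# Stratified folds keep the class proportions the same in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
print(f"accuracy per fold: {scores.round(3)}")
print(f"mean accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Because the scaler and PCA sit inside the pipeline, they are refit on each training fold, so the held-out fold never leaks into the preprocessing.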

With the code as it is, the model accuracy is 79%.

The goal is to modify the code so that the model reaches an accuracy in the high 80s or low 90s.

Box 3 of the code contains the hyperparameters that need to be tuned and the pipeline that might need to be modified. A voting ensemble can be used to reach a higher accuracy.
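In scikit-learn a voting ensemble is built with VotingClassifier. The following is a hedged sketch, again using load_digits as a stand-in dataset; the four member models and their settings are illustrative assumptions, not values known to reach the target on the Arabic letters. With voting="soft" the ensemble averages the members' predicted class probabilities, which is why the SVC needs probability=True.

```python
# Sketch of a soft-voting ensemble; each member is its own pipeline so it can
# carry its own preprocessing. Assumptions: load_digits stands in for the real
# data, and the member models and their settings are illustrative, not tuned.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

voter = VotingClassifier(
    estimators=[
        # probability=True lets the SVC contribute class probabilities.
        ("svc", make_pipeline(StandardScaler(), SVC(C=10, probability=True, random_state=0))),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=2000))),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    voting="soft",  # average predicted probabilities instead of counting votes
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(voter, X, y, cv=cv, scoring="accuracy")
print(f"voting ensemble mean accuracy: {scores.mean():.3f}")
```

Soft voting tends to help most when the members make different kinds of errors; combining several near-identical models rarely gains much.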

We need to improve on the accuracy of the existing code.

Info about the dataset: The dataset is composed of 16,800 characters written by 60 participants aged 19 to 40, and 90% of the participants are right-handed. Each participant wrote each character (from 'alef' to 'yeh') ten times on two forms. The forms were scanned at a resolution of 300 dpi. The dataset is partitioned into two sets: a training set (13,440 characters, 480 images per class) and a test set (3,360 characters, 120 images per class). The writers of the training set and the test set are mutually exclusive. Writers were assigned to the test set in random order to make sure that the test-set writers did not all come from a single institution (to ensure variability in the test set).
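The counts in the description are mutually consistent. A quick arithmetic check, assuming as stated that every participant wrote each character exactly ten times, gives 28 letter classes and implies that 48 writers contributed to the training set and 12 to the test set.

```python
# Consistency check of the split sizes quoted in the dataset description.
# Assumption: every participant wrote each letter exactly ten times.
participants, repetitions = 60, 10
train_total, train_per_class = 13_440, 480
test_total, test_per_class = 3_360, 120

n_classes = train_total // train_per_class          # letter classes, 'alef' to 'yeh'
assert test_total // test_per_class == n_classes    # test split has the same classes
total = participants * n_classes * repetitions       # all characters in the dataset
train_writers = train_total // (n_classes * repetitions)
test_writers = test_total // (n_classes * repetitions)
print(n_classes, total, train_writers, test_writers)  # 28 16800 48 12
```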


Goal: build an image classifier that classifies handwritten Arabic characters using scikit-learn. The model accuracy has to be in the high 80s (e.g., 89%) or the low 90s (e.g., 92%).

This is all about tuning the hyperparameters and the model pipeline.
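Tuning the hyperparameters and the pipeline together can be done with GridSearchCV, which cross-validates every combination in a parameter grid. This is a sketch under stated assumptions: load_digits replaces the real data, and the grid values for the PCA step and the SVC's C and gamma are illustrative starting points rather than known-good settings for the Arabic letters. Parameters of a pipeline step are addressed as the step name, two underscores, then the parameter name.

```python
# Sketch of tuning pipeline hyperparameters with GridSearchCV.
# Assumptions: load_digits stands in for the Arabic letters; the grid values
# are illustrative starting points, not known-good values for that dataset.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(random_state=0)),
    ("svc", SVC(kernel="rbf")),
])

# "<step name>__<parameter>" selects a parameter inside the pipeline.
param_grid = {
    "pca__n_components": [20, 40],
    "svc__C": [1, 10],
    "svc__gamma": ["scale", 0.01],
}

search = GridSearchCV(
    pipe,
    param_grid,
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
    scoring="accuracy",
)
search.fit(X, y)  # 8 combinations x 3 folds, then a refit on all data
print("best parameters:", search.best_params_)
print(f"best cross-validated accuracy: {search.best_score_:.3f}")
```

On the full 13,440-image training set each extra grid value multiplies the number of fits, so RandomizedSearchCV can be a cheaper alternative for larger grids.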

