1. Consider a simplified fitting problem in the frequency domain where we are looking to find the best fit of data with a
set of periodic (trigonometric) basis functions of the form 1, sin²(k·x), sin²(2k·x), ..., where k is effectively the
frequency increment. The resulting function for a given "frequency increment" k, "function depth" d, and
parameter vector θ is then:

y = θ₀ · 1 + Σ_{i=1}^{d} θᵢ · sin²(i·k·x)
Try "frequency increment" k from 1-10
For example, if k = 1 and d = 1, your basis (feature) functions are: 1, sin²(x)
if k = 1 and d = 2, your basis (feature) functions are: 1, sin²(x), sin²(2x)
if k = 3 and d = 4, your basis (feature) functions are: 1, sin²(3·1·x), sin²(3·2·x), sin²(3·3·x), sin²(3·4·x)
This means that this problem can be solved using linear regression, as the function is linear in terms of the parameters θ.
Try "frequency increment" k from 1-10 and thus your basis functions as part of the data generation process described
above.
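Since the model is linear in θ, the fit reduces to ordinary least squares on a design matrix whose columns are the basis functions above. The following is a minimal sketch of one possible implementation (Python with NumPy, which the assignment does not prescribe; all names are illustrative):

import numpy as np

def design_matrix(x, k, d):
    """Columns are the basis functions 1, sin^2(1*k*x), ..., sin^2(d*k*x)."""
    cols = [np.ones_like(x)] + [np.sin(i * k * x) ** 2 for i in range(1, d + 1)]
    return np.column_stack(cols)

def fit(x, y, k, d):
    """Least-squares estimate of the parameter vector theta = (theta_0, ..., theta_d)."""
    X = design_matrix(x, k, d)
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def predict(x, theta, k, d):
    return design_matrix(x, k, d) @ theta

For parts b) and c), the same predict function can be evaluated on a dense grid of x values for plotting and on the test points for the error computation.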
a) Implement a linear regression learner to solve this best fit problem for 1 dimensional data. Make sure your
implementation can handle fits for different "function depths" (at least to "depth" 6).
b) Apply your regression learner to the data set that was generated for Question 1b) and plot the resulting function
for "function depths" 0, 1, 2, 3, 4, 5, and 6. Plot the resulting function together with the data points.
c) Evaluate your regression functions by computing the error on the test data points that were generated for Question 1c). Compare the error results and try to determine for which "function depths" overfitting might be a problem. Which "function depth" would you consider the best prediction function, and why? For which values of k and d do you get minimum error?
d) Repeat the experiment and evaluation of parts b) and c) using only the first 20 elements of the training data set
from part b) and the test set from part c). What differences do you see, and why might they occur?
Locally Weighted Linear Regression
2. Another way to address nonlinear functions with a lower likelihood of overfitting is the use of locally weighted
linear regression, where the neighborhood function addresses non-linearity and the feature vector stays simple. In this
case we assume that we will use only the raw feature, x, as well as the bias (i.e. a constant feature 1). Thus the locally
applied regression function is y = θ₀ + θ₁ · x.
As discussed in class, locally weighted linear regression solves a linear regression problem for each query point, deriving
a local approximation for the shape of the function at that point (as well as for its value). To achieve this, it uses a
modified error function that applies a weight to each data point's error that is related to its distance from the query
point. Here we will assume that the weight function for the i-th data point and query point x is:

w⁽ⁱ⁾(x) = e^( -(x⁽ⁱ⁾ - x)² / γ )

where γ is a measure of the "locality" of the weight function, indicating how fast the influence of a data
point changes with its distance from the query point. Use γ = 0.204.
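As a concrete illustration, here is a minimal sketch of a locally weighted fit at a single query point, assuming 1-D NumPy arrays x_train and y_train and the weight function given above (names and structure are illustrative, not prescribed by the assignment):

import numpy as np

def lwlr_predict(x_query, x_train, y_train, gamma=0.204):
    """Locally weighted fit of y = theta_0 + theta_1 * x at one query point."""
    X = np.column_stack([np.ones_like(x_train), x_train])   # bias + raw feature
    w = np.exp(-(x_train - x_query) ** 2 / gamma)            # weights w^(i)(x_query)
    W = np.diag(w)
    # Weighted normal equations: (X^T W X) theta = X^T W y
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y_train)
    return theta[0] + theta[1] * x_query

A curve for plotting can then be obtained by calling this function once for each point of a grid of query values.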
a. Implement a locally weighted linear regression learner to solve the best fit problem for 1 dimensional data.
b. Apply your locally weighted linear regression learner to the data set that was generated for Question 1b) and
plot the resulting function together with the data points.
c. Evaluate the locally weighted linear regression on the Test data from Question 1 c). How does the performance
compare to the results from Question 1 c)?
d. Repeat the experiment and evaluation of parts b) and c) using only the first 20 elements of the training data set.
How does the performance compare to the results from Question 1 d)? Why might this be the case?
e. Given the results from parts c) and d), do you believe the data set you used was actually derived from a function
that is consistent with the function format in Question 1? Justify your answer.
Logistic Regression
3. Consider again the problem from Questions 1 and 2 in the first assignment where we want to predict the gender of a
person from a set of input parameters, namely height, weight, and age. Assume the same datasets you generated for
the first assignment. Use a learning rate of 0.01 and try different values for the number of iterations.
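One way to set this up is batch gradient descent on the logistic (cross-entropy) loss. Below is a minimal sketch, assuming labels encoded as 0/1 and the three features stacked in a NumPy matrix (all names are illustrative); standardizing height, weight, and age typically helps convergence at this learning rate:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, lr=0.01, n_iters=10000):
    """Batch gradient descent; X is (n_samples, n_features), y holds 0/1 labels."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iters):
        p = sigmoid(X @ w + b)            # predicted probabilities
        w -= lr * (X.T @ (p - y) / n)     # gradient of the average log-loss w.r.t. w
        b -= lr * np.mean(p - y)          # gradient w.r.t. the bias
    return w, b

def classify(X, w, b):
    return (sigmoid(X @ w + b) >= 0.5).astype(int)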
a. Implement logistic regression to classify this data (use the individual data elements, i.e. height, weight, and age,
as features). Your implementation should take different data sets as input for learning.
b. Plot the resulting separating surface together with the data. To do this plotting you need to project the data and
function into one or more 2D spaces. The best visual results will be obtained if the projection is done along the separating
hyperplane (i.e. into a space described by the normal of the hyperplane and one of the dimensions within the
hyperplane).
c. Evaluate the performance of your logistic regression classifier in the same way as for Project 1, using leave-one-
out validation, and compare the results with the ones for KNN and Naïve Bayes. Discuss what differences exist
and why one method might outperform the others for this problem.
d. Repeat the evaluation and comparison from part c) with the age feature removed. Again, discuss
what differences exist and why one method might outperform the others in this case.
3. For the data shown in the attached figure (dark circles are one class, white circles another), solve the classification problem with a neuron by hand. That is, find the appropriate weights of the required linear discriminant.
Q1. Consider the problem where we want to predict the gender of a person from a set of input parameters, namely height, weight, and age.
a) Using Cartesian distance, Manhattan distance, and Minkowski distance of order 3 as the similarity measurements, show the results of the gender prediction for the Evaluation data listed below, using the generated training data, for values of K of 1, 3, and 7. Include the intermediate steps (i.e., distance calculation, neighbor selection, and prediction).
b) Implement the KNN algorithm for this problem. Your implementation should work with different training data sets as well as different values of K, and should allow a data point to be input for prediction (see the sketch following this question).
c) To evaluate the performance of the KNN algorithm (using the Euclidean distance metric), implement a leave-one-out evaluation routine for your algorithm. In leave-one-out validation, we repeatedly evaluate the algorithm by removing one data point from the training set, training the algorithm on the remaining data set, and then testing it on the point we removed to see if the label matches or not. Repeating this for each of the data points gives us an estimate of the percentage of erroneous predictions the algorithm makes, and thus a measure of the accuracy of the algorithm for the given data. Apply your leave-one-out validation with your KNN algorithm to the dataset for Question 1 c) for values of K of 1, 3, 5, 7, 9, and 11 and report the results. For which value of K do you get the best performance?
d) Repeat the prediction and validation you performed in Question 1 c) using KNN when the age data is removed (i.e. when only the height and weight features are used as part of the distance calculation in the KNN algorithm). Report the results and compare the performance without the age attribute with the ones from Question 1 c). Discuss the results. What do the results tell you about the data?
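A minimal sketch of how parts b) and c) could fit together (Python with NumPy; all names are illustrative). The Minkowski order p covers Manhattan (p = 1), Euclidean (p = 2), and order 3 distances:

import numpy as np
from collections import Counter

def knn_predict(x_query, X_train, y_train, k=3, p=2):
    """KNN with a Minkowski distance of order p (p=2 gives the Euclidean metric)."""
    dists = np.sum(np.abs(X_train - x_query) ** p, axis=1) ** (1.0 / p)
    nearest = np.argsort(dists)[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(X, y, k=3):
    """Hold out each point in turn, train on the rest, and test on the held-out point."""
    correct = 0
    for i in range(len(X)):
        X_rest, y_rest = np.delete(X, i, axis=0), np.delete(y, i, axis=0)
        correct += knn_predict(X[i], X_rest, y_rest, k=k) == y[i]
    return correct / len(X)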
2. Perform K-means clustering with K = 2 using the Euclidean norm. Toss a coin 7 times to initialise the algorithm.
3. Cluster the data using hierarchical clustering with complete linkage and the Euclidean norm. Draw the resulting dendrogram.
Q2. Using the data from Problem 2, build a Gaussian Naïve Bayes classifier for this problem. For this you have to learn Gaussian distribution parameters for each input data feature, i.e. for p(height|W), p(height|M), p(weight|W), p(weight|M), p(age|W), p(age|M).
a) Learn/derive the parameters for the Gaussian Naïve Bayes classifier for the data from Question 2 a) and apply them to the same target as in problem 1 a).
b) Implement the Gaussian Naïve Bayes classifier for this problem (see the sketch following this question).
c) Repeat the experiment in parts 1 c) and 1 d) with the Gaussian Naïve Bayes classifier. Discuss the results, in particular with respect to the performance difference between using all features and using only height and weight.
d) Same as 1 d) but with Naïve Bayes.
e) Compare the results of the two classifiers (i.e., the results from 1 c) and 1 d) with the ones from 2 c) and 2 d)) and discuss reasons why one might perform better than the other.
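A minimal sketch of the Gaussian Naïve Bayes steps named above: estimate a prior plus per-feature mean and variance for each class, then classify by the largest log posterior (Python with NumPy; names are illustrative, and the unbiased sample variance is used here, while the MLE variant would divide by n):

import numpy as np

def fit_gaussian_nb(X, y):
    """Per-class prior and per-feature Gaussian parameters (mean, variance)."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = {"prior": len(Xc) / len(X),
                     "mean": Xc.mean(axis=0),
                     "var": Xc.var(axis=0, ddof=1)}
    return params

def predict_gaussian_nb(x, params):
    """Class with the largest log prior + sum of per-feature Gaussian log-likelihoods."""
    def score(p):
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * p["var"]) + (x - p["mean"]) ** 2 / p["var"])
        return np.log(p["prior"]) + log_lik
    return max(params, key=lambda c: score(params[c]))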
1. Introduction
In this assignment you will build on your knowledge of classification; in particular, you will solve an image classification problem using a convolutional neural network. This assignment aims to guide you through the process by following the four fundamental principles:
• Data: Data import, preprocessing, and augmentation.
• Model: Designing a convolutional neural network model for classifying the images of the parts.
• Fitting: Training the model using stochastic gradient descent.
• Validation: Checking the model's accuracy on the reserved test data set and investigating where the most improvement could be found. Additionally, looking into the uncertainty in the predictions.
This is not necessarily a linear process; after you have fit and/or validated your model, you may need to go back to earlier steps and adjust your processing of the data or your model structure. This may need to be done several times to achieve a satisfactory result. This assignment is worth 35% of your course grade and is graded from 0 to 35 marks. An additional two bonus marks are available to the student whose model performs best on a previously unseen data set.
(a) What is meant by feature engineering in machine learning?
(b) You are given a classification problem with one feature and the following training set. As usual, y is the label. This is a multi-class classification problem with possible labels A, B, and C. The test samples are 0, 1, and -5. Find the 1-Nearest Neighbour prediction for each of the test samples. Use the standard Euclidean metric. If you encounter any ties, discuss briefly your tie-breaking strategy. [5 marks]
(c) Engineer an additional feature for this dataset, namely x². Therefore, your new training set still has 6 labelled samples in its training set and 3 unlabelled samples in its test set, but there are two features, x and x². Find the 1-Nearest Neighbour prediction for each of the test samples in the new dataset. [16 marks]
(d) What is meant by a kernel in machine learning?
(e) How can the distance between the images of two samples in the feature space be expressed via the corresponding kernel? [2 marks]
(f) You are given the same training set as before, and only one test sample, 1. The learning problem is still multi-class classification with possible labels A, B, or C. Using the kernelized Nearest Neighbours algorithm with kernel K(x, x′) = (1 + x·x′)², compute the 3-Nearest Neighbours prediction for the test sample. If applicable, describe your tie-breaking strategy. [10 marks]
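For part (e), the feature-space distance can be written purely in terms of kernel evaluations, ||φ(x) - φ(z)||² = K(x, x) - 2K(x, z) + K(z, z), which is all a kernelized nearest-neighbour rule needs. A minimal sketch in Python (the polynomial kernel below is only an illustrative stand-in; substitute whichever kernel the question specifies):

import numpy as np

def poly_kernel(x, z):
    # Illustrative kernel only; replace with the kernel given in the question.
    return (1.0 + x * z) ** 2

def kernel_distance(x, z, K=poly_kernel):
    """||phi(x) - phi(z)|| computed via K(x,x) - 2 K(x,z) + K(z,z)."""
    return np.sqrt(K(x, x) - 2.0 * K(x, z) + K(z, z))

def kernel_knn_predict(x_test, X_train, y_train, k=3, K=poly_kernel):
    """k-NN using the kernel-induced distance instead of the raw Euclidean distance."""
    dists = [kernel_distance(x_test, x_i, K) for x_i in X_train]
    nearest = np.argsort(dists)[:k]
    labels = [y_train[i] for i in nearest]
    return max(set(labels), key=labels.count)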
For this programming assignment you will implement the Naive Bayes algorithm from scratch and the functions to evaluate it with a k-fold cross validation (also from scratch). You can use the code in the following tutorial to get started and get ideas for your implementation of the Naive Bayes algorithm, but please enhance it as much as you can (there are many things you can do to enhance it, such as those mentioned at the end of the tutorial):
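For the evaluation part, a from-scratch k-fold routine only needs a random partition of the rows and one train/test pass per fold. A minimal sketch, assuming each row is a list with the class label in its last column (a common convention in such tutorials; adapt to your data layout):

import random

def k_fold_split(dataset, k=5, seed=0):
    """Randomly partition the dataset rows into k folds of (nearly) equal size."""
    rows = list(dataset)
    random.Random(seed).shuffle(rows)
    return [rows[i::k] for i in range(k)]

def cross_validate(dataset, k, train_fn, predict_fn):
    """Return one accuracy score per fold for the given train/predict functions."""
    folds = k_fold_split(dataset, k)
    scores = []
    for i in range(k):
        test_set = folds[i]
        train_set = [row for j, fold in enumerate(folds) if j != i for row in fold]
        model = train_fn(train_set)
        correct = sum(predict_fn(model, row[:-1]) == row[-1] for row in test_set)
        scores.append(correct / len(test_set))
    return scores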
Question 1. Download the SGEMM GPU kernel performance dataset from the link below. https://archive.ics.uci.edu/ml/datasets/SGEMM+GPU+kernel+performance Understand the dataset by performing exploratory analysis. Prepare the target parameter by taking the average of the THREE (3) runs with long performance times. Design a linear regression model to estimate the target using only THREE (3) attributes from the dataset. Discuss your results, relevant performance metrics, and the impact of normalizing the dataset.
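A minimal sketch of one possible workflow (the CSV file name, the run-time columns beginning with "Run", and the three example attributes are assumptions and should be checked against the downloaded dataset):

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("sgemm_product.csv")                      # assumed file name
run_cols = [c for c in df.columns if c.startswith("Run")]  # assumed run-time columns
df["target"] = df[run_cols].mean(axis=1)                   # averaged run time as target

features = ["MWG", "NWG", "KWG"]                           # example choice of three attributes
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["target"], test_size=0.2, random_state=0)

for normalize in (False, True):
    Xtr, Xte = X_train, X_test
    if normalize:
        scaler = StandardScaler().fit(X_train)
        Xtr, Xte = scaler.transform(X_train), scaler.transform(X_test)
    model = LinearRegression().fit(Xtr, y_train)
    pred = model.predict(Xte)
    print(f"normalized={normalize}  MSE={mean_squared_error(y_test, pred):.3f}  "
          f"R2={r2_score(y_test, pred):.3f}")

One point worth raising in the discussion: exact ordinary least squares is unaffected in predictive accuracy by feature scaling, so normalization mainly changes coefficient interpretability and matters more for regularized or gradient-based fits.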
This is a machine learning model in Python using scikit-learn to classify handwritten Arabic letters. There are two files: the train data and the test data. The code is available, and we need to optimize it so that, under box number 6, when we do the cross validation of the model, the accuracy is in the high 80s or low 90s. We should be tuning the hyperparameters and improving the pipeline as needed; in box 3 of the code are the hyperparameters that need to be tuned and the pipeline that might need to be modified. Anything from scikit-learn is allowed to be used, but nothing more. A voting model can be used to get high accuracy. As the code stands, the model accuracy is 79%; the goal is to modify the code so the model reaches an accuracy in the high 80s (like 89%) or low 90s (like 92%). This is all about tuning the hyperparameters and the model pipeline.
Info about the dataset: the dataset is composed of 16,800 characters written by 60 participants; the age range is between 19 and 40 years, and 90% of participants are right-handed. Each participant wrote each character (from 'alef' to 'yeh') ten times on two forms. The forms were scanned at a resolution of 300 dpi. The dataset is partitioned into two sets: a training set (13,440 characters, 480 images per class) and a test set (3,360 characters, 120 images per class). Writers of the training set and test set are exclusive, and the order in which writers were assigned to the test set was randomized to make sure that writers of the test set were not from a single institution (to ensure variability of the test set).
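The notebook cells themselves ("box 3" and "box 6") are not shown here, so the following is only a hedged sketch of the kind of scikit-learn tuning the task describes: a preprocessing/classification pipeline searched with GridSearchCV, optionally combined into a voting ensemble. X_train and y_train stand for the flattened training images and labels loaded from the provided files; all estimator choices and grid values are illustrative:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

# X_train, y_train: flattened training images and labels (placeholders; load from the provided files).

# Pipeline: scale pixel features, reduce dimensionality, then classify.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=100)),
    ("clf", SVC()),
])

param_grid = {
    "pca__n_components": [50, 100, 150],
    "clf__C": [1, 10, 100],
    "clf__gamma": ["scale", 0.01],
}
search = GridSearchCV(pipe, param_grid, cv=5, n_jobs=-1)
search.fit(X_train, y_train)
print("best CV accuracy:", search.best_score_, search.best_params_)

# The assignment allows a voting model; hard voting avoids needing predict_proba.
voting = VotingClassifier(
    estimators=[
        ("svm", search.best_estimator_),
        ("rf", RandomForestClassifier(n_estimators=300, random_state=0)),
    ],
    voting="hard",
)
print("voting CV accuracy:", cross_val_score(voting, X_train, y_train, cv=5).mean())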