Let’s say we have chosen and implemented what seems to be the best Machine Learning algorithm for our data, only to find that it makes unacceptably large errors in its predictions. What do we do next?
Let’s discuss some choices that can be made in such a scenario and build an intuition for what to expect from each. They apply generally when debugging or analyzing the performance of a Machine Learning algorithm in a specific implementation.
- Add more training examples
- Try a smaller set of features
- Try getting additional features
- Add polynomial features
- Increasing / decreasing the regularization parameter
The effect of each choice only makes sense once we understand some underlying concepts such as under-fit, over-fit, bias and variance. Here is what they mean:
Under-fit and Over-fit:
As illustrated in the graph, if the regression model fits only a few of the data samples, it is said to be under-fitting. An insufficient number of features can cause under-fitting. This is also called high bias.
In the opposite case, the model fits almost all the data samples provided to the algorithm and is said to be over-fitting. It may perform well on the training set but will not generalize well when new samples are encountered in the test set. This is also called high variance.
Bias vs Variance:
When the training set error and the test set error are plotted against, say, the size of the data sample, we get a graph similar to that given below.
With only a few samples, the algorithm can fit the training set almost perfectly, so the training error is very low, whereas on the test set it performs poorly, leading to a high error. As more samples are fed, the training error typically rises somewhat while the test error falls, and the gap that remains between the two curves tells us what the model is suffering from.
[Error is the difference between the prediction made by the model and the actual value. The entire data sample is usually divided into a training set, a test set and sometimes a cross validation set. The predictive model is trained using the training set. The cross validation and test sets are used to validate and test the model, respectively.]
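As a rough illustration of how such a plot is produced, the sketch below fits a straight line to growing subsets of a training set and records the training and test error at each size. The synthetic data, the helper names (`fit_line`, `mean_squared_error`, `learning_curve`) and the chosen sizes are all assumptions made for this example, not part of any particular library.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x

def mean_squared_error(model, xs, ys):
    """Average squared difference between prediction and actual value."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def learning_curve(train_x, train_y, test_x, test_y, sizes):
    """Return (size, training error, test error) for each training-set size."""
    curve = []
    for m in sizes:
        model = fit_line(train_x[:m], train_y[:m])
        curve.append((
            m,
            mean_squared_error(model, train_x[:m], train_y[:m]),
            mean_squared_error(model, test_x, test_y),
        ))
    return curve

# Synthetic data (an assumption for this sketch): a noisy quadratic,
# which a straight line cannot fit well.
rng = random.Random(0)
xs = [rng.uniform(-3, 3) for _ in range(60)]
ys = [x * x + rng.gauss(0, 0.5) for x in xs]
train_x, train_y = xs[:40], ys[:40]
test_x, test_y = xs[40:], ys[40:]

curve = learning_curve(train_x, train_y, test_x, test_y, sizes=[2, 5, 10, 20, 40])
for m, train_err, test_err in curve:
    print(f"m={m:2d}  train error={train_err:.3f}  test error={test_err:.3f}")
```

With two samples the line passes through both points, so the training error starts at essentially zero and grows as more samples are added. Plotting the two error columns against `m` gives the kind of curve described here.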
High Bias: If both the training set error and the test set error are high, we can infer that the algorithm is suffering from high bias, or is under-fitting.
High Variance: If the training set error is very low and the test set error is much higher by comparison, the algorithm is suffering from high variance, or is over-fitting.
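These two rules can be written down as a small helper. This is only a sketch: the function name `diagnose` and the single `acceptable_error` threshold are hypothetical, and what counts as a "high" error depends entirely on the problem at hand.

```python
def diagnose(train_error, test_error, acceptable_error):
    """Label a model from its training and test errors.

    acceptable_error is a hypothetical, problem-specific threshold:
    any error above it is treated as "high".
    """
    train_high = train_error > acceptable_error
    test_high = test_error > acceptable_error
    if train_high and test_high:
        # Poor even on the data the model was trained on
        return "high bias (under-fitting)"
    if not train_high and test_high:
        # Good on the training set, poor on unseen data
        return "high variance (over-fitting)"
    if not train_high and not test_high:
        return "acceptable fit"
    # High training error with low test error is unusual
    return "inconclusive"

print(diagnose(train_error=0.90, test_error=1.00, acceptable_error=0.20))
print(diagnose(train_error=0.01, test_error=0.80, acceptable_error=0.20))
```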
Back to the diagnosis:
Now that we have the required background, let’s go back to the diagnosis and analyze each of our choices.
- Add more training examples:
- If the test set error is much higher than the training set error and we suspect high variance, we can try to fix it by collecting more data samples and training the model again.
- This might not be such a great idea if both the training and test set errors are high, since more data does little for a model that is under-fitting.
- Try getting additional features:
- If the algorithm is under-fitting the samples, the number of features might not be sufficient for the model to make accurate predictions. Collecting some more features may prove helpful.
- For example, if the model is predicting the price of a hotel, details such as the number of rooms, room size, floor elevation, balconies and furniture can serve as additional features.
- Try a smaller set of features:
- If the algorithm is over-fitting the data samples, in other words it performs extremely well on the training set but is highly erroneous on the test set, reducing the number of features might help improve the performance.
- Add polynomial features:
- In some scenarios, we may be confined to a limited set of features, and adding more features might not be possible. In such a case, an additional term can be added to the model using an existing feature.
- A linear model, for example, can be transformed into a quadratic or a cubic model by adding a term such as the square (or the cube) of the size of the room.
- Adding a polynomial feature can fix high bias.
- Increasing / decreasing the regularization parameter:
- The cost function measures how well a hypothesis/model fits the training set, typically as the squared error, and training looks for the model that minimizes it. To keep the weights in check, a penalty term scaled by a regularization parameter called lambda is added to the cost function.
- Increasing the regularization parameter penalizes the weight vector for growing arbitrarily large, forcing the optimization to choose smaller values of the weights (theta). This results in a better fit to the data when the existing model is suffering from high variance/over-fitting.
- Similarly, decreasing the regularization parameter allows higher values of theta, fitting the model better when it is under-fitting. In other words, it can fix high bias.
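The last two choices, polynomial features and the regularization parameter, can be sketched together. The example below expands one feature into polynomial terms and fits the weights with a squared-error cost plus `lam * sum(theta_j ** 2)`. The function names, the data and the values of `lam` are assumptions made for illustration, and for simplicity the intercept is penalized too, which many implementations avoid.

```python
def polynomial_features(x, degree):
    """Expand a single feature x into [1, x, x^2, ..., x^degree]."""
    return [x ** d for d in range(degree + 1)]

def solve(a, b):
    """Solve the linear system a * theta = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column to the diagonal
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    theta = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        known = sum(m[i][j] * theta[j] for j in range(i + 1, n))
        theta[i] = (m[i][n] - known) / m[i][i]
    return theta

def fit_regularized(xs, ys, degree, lam):
    """Minimize squared error + lam * sum(theta_j^2) over polynomial weights."""
    rows = [polynomial_features(x, degree) for x in xs]
    k = degree + 1
    # Normal equations: (X^T X + lam * I) theta = X^T y
    a = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    b = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(a, b)

def norm(theta):
    """Euclidean length of the weight vector."""
    return sum(t * t for t in theta) ** 0.5

# Made-up data: a quadratic with a small alternating wobble
xs = [-1 + 0.25 * i for i in range(9)]
ys = [x * x + 0.1 * (-1) ** i for i, x in enumerate(xs)]

for lam in (0.0, 0.1, 10.0):
    theta = fit_regularized(xs, ys, degree=4, lam=lam)
    print(f"lambda={lam:<5} |theta|={norm(theta):.3f}")
```

Larger values of `lam` pull the weights towards zero, which is the shrinking effect described for high variance; setting `lam` to zero removes the penalty and lets the polynomial follow the training data as closely as it can.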
These choices are not exhaustive, but identifying the right step to take can save a lot of time and help arrive at the best fit faster!