Machine Learning Simplified


Introduction to ML

So what is machine learning? If you look it up on the internet, you’ll find it described as algorithms that can learn from observational data and make predictions based on it.

But in practice, the core idea is often very simple. In the most basic case, we take a set of observational data, fit a line to it, and then use that line to make predictions.
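To make the “fit a line” idea concrete, here is a minimal sketch in plain Python. The hours-and-scores numbers are made up for illustration, and fit_line is just a name I picked for an ordinary least-squares fit, not a function from any library.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up observations: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, scores)

# Use the fitted line to predict the score for 6 hours of study
predicted = slope * 6 + intercept
print(f"score = {slope:.1f} * hours + {intercept:.1f}")
print(f"predicted score for 6 hours: {predicted:.1f}")
```

Real models have more features and more machinery, but the loop is the same: learn parameters from known data, then plug new inputs in.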

Types of Machine Learning

Let’s talk about the three types of machine learning covered here: unsupervised, supervised, and reinforcement learning.

1. Unsupervised Learning

The basic definition of unsupervised learning is that you’re not giving your model any answers to learn from. You’re just presenting it with a group of data, and the unsupervised model tries to make sense of the given information.

Example

Say I have different objects: balls, cubes, and sets of dice. I also have an algorithm that clusters these objects into groups that are similar to each other, based on some similarity metric.

The catch is that you don’t necessarily know what the algorithm will come up with. It depends primarily on the metric I give it for the similarity between items. But sometimes surprising clusters emerge that you didn’t expect to see.
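Here is a rough sketch of that clustering idea. I am assuming k-means as the algorithm and Euclidean distance as the similarity metric; the post does not prescribe either. The objects and their two features (size, number of flat faces) are hypothetical.

```python
import math

def k_means(points, k, iterations=10):
    """Group points into k clusters, using Euclidean distance as the similarity metric."""
    # Assumption: seed the centroids with the first k points so the run is deterministic.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid
        labels = []
        for p in points:
            distances = [math.dist(p, c) for c in centroids]
            labels.append(distances.index(min(distances)))
        # Move each centroid to the mean of the points assigned to it
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels

# Hypothetical objects described as (size in cm, number of flat faces):
# three balls and three dice, with no labels attached
objects = [(7.0, 0), (1.5, 6), (7.5, 0), (1.6, 6), (6.8, 0), (1.4, 6)]

labels = k_means(objects, k=2)
print(labels)
```

Notice that the algorithm never hears the words “ball” or “die”. It only sees the numbers, so a different choice of features or metric could produce very different groups.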

So that’s really the point of unsupervised learning.

2. Supervised Learning

Now, in contrast, supervised learning is a case where we have a set of answers that the model can learn from.

In this case, we give the model a set of training data to learn from. It infers relationships between the features and the values or categories we want to predict, then applies those relationships to new, unseen data to make predictions about it.

Example

Say I have a set of known cars and the actual prices they sold for. I train the model on that set of complete answers, and then I can use it to predict the prices of new cars that it hasn’t seen before.
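As a sketch of the car example, here is one possible supervised model: a k-nearest-neighbors regressor that averages the prices of the most similar known cars. This is my choice for illustration, not the only way to do it, and the cars, features, and prices are all made up.

```python
import math

def predict_price(training_cars, new_car, k=2):
    """Average the sale prices of the k known cars most similar to new_car."""
    by_distance = sorted(training_cars, key=lambda car: math.dist(car[0], new_car))
    nearest = by_distance[:k]
    return sum(price for _, price in nearest) / k

# Made-up training data: ((age in years, mileage in 10,000 km), sale price)
known_cars = [
    ((1, 1), 30000),
    ((2, 3), 26000),
    ((5, 8), 15000),
    ((8, 12), 9000),
]

# A car the model has never seen: 6 years old, 90,000 km
estimate = predict_price(known_cars, (6, 9))
print(estimate)  # 12000.0
```

The known prices are the “answers” that make this supervised: the two closest training cars sold for 15000 and 9000, so the estimate is their average.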

3. Reinforcement Learning

In reinforcement learning, a model faces a game-like situation. It is employed by various software and machines to find the best possible behavior or path it should take in a specific situation. The model gets either rewards or penalties for the actions it performs. Its goal is to maximize the total reward.

It’s up to the model to figure out how to perform the task to maximize the reward, starting from totally random trials and finishing with sophisticated tactics and superhuman skills.

Example

For an autonomous vehicle to drive and obey the law, the programmer cannot anticipate everything that could happen on the road. Instead of writing lengthy “if-then” instructions, the programmer prepares a reinforcement learning agent that is capable of learning from a system of rewards and penalties.
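A self-driving car is far too big for a blog snippet, so here is a toy stand-in: an agent in a five-cell corridor that is rewarded for reaching the last cell. I am assuming tabular Q-learning with purely random exploration; the corridor, reward values, and parameter settings are all invented for illustration.

```python
import random

N_STATES = 5          # corridor cells 0..4; reaching cell 4 ends the episode
ACTIONS = (-1, +1)    # step left or step right
GOAL = N_STATES - 1

def step(state, action):
    """Move within the corridor; reward 1 for reaching the goal, otherwise 0."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def train(episodes=300, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values from random trials using the Q-learning update."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            action = rng.choice(ACTIONS)  # explore purely at random
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Nudge the estimate toward reward + discounted best future value
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

def greedy_policy(q):
    """For every non-goal cell, pick the action with the highest learned value."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]

q_table = train()
policy = greedy_policy(q_table)
print(policy)
```

Nobody tells the agent “go right”. It stumbles around at random, and the reward signal alone ends up favoring the rightward step in every cell.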

Evaluating Supervised Learning

Train-Test

We can use a trick called train/test.

I split the observational data that I want my model to learn from into two groups: a training set and a testing set.

I build my model based on the data that I’m calling my training set.

I reserve the other part of my data for testing purposes. Then I can evaluate the model and see if it can successfully predict the correct answers for my testing data.
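A minimal sketch of such a split might look like this. The 80/20 ratio, the fixed seed, and the helper name train_test_split are my assumptions for the example (the name happens to match a common library helper, but this version is hand-rolled).

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows, then reserve the last test_fraction of them for testing."""
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # random order guards against sequential patterns
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[:-n_test], shuffled[-n_test:]

# Made-up observations: (x, y) pairs following y = 2x + 1
observations = [(x, 2 * x + 1) for x in range(10)]

train_set, test_set = train_test_split(observations)
print(len(train_set), len(test_set))  # 8 2
```

The model only ever sees train_set while learning; test_set stays hidden until evaluation time.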

Reason for Testing Data?

It gives me a very concrete way to test how good my model is on unseen data, because I actually have a bit of data set aside that I can test it with. You can then measure quantitatively how well it did, using R-squared or some other metric such as root-mean-squared error.
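Both metrics are short enough to write by hand. The sketch below uses the standard definitions; the actual and predicted values are made up.

```python
import math

def rmse(actual, predicted):
    """Root-mean-squared error: typical size of a prediction error, in the units of y."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))

def r_squared(actual, predicted):
    """Share of the variance in the actual values that the predictions explain."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total
    return 1 - ss_res / ss_tot

# Made-up testing answers and the model's predictions for them
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

print(f"RMSE: {rmse(actual, predicted):.3f}")
print(f"R-squared: {r_squared(actual, predicted):.3f}")
```

An R-squared close to 1 means the predictions track the testing data well, while RMSE tells you roughly how far off a typical prediction is.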

Points for Successful Train-Test Implementation

  • Training and testing sets should be large enough to represent the variations and outliers in the data.
  • Make sure both sets are selected randomly, because there could be some sequential pattern in your data. (Done this way, train/test is a great way to guard against overfitting.)

Problems with Train-Test

Overfitting

Your model is overfitting if it goes out of its way to accommodate the outliers in your training data. The model learns the detail and noise in the training dataset to the extent that it negatively impacts its performance on a new dataset.

Sign of Overfitting: The error on the testing or validation dataset is much greater than the error on the training dataset.
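To see that sign in action, here is a deliberately extreme sketch: a “model” that simply memorizes its training data by copying the answer of the closest training point. The model and the noisy data are both invented for the example.

```python
def memorizing_model(training_rows):
    """Return a predictor that copies the answer of the closest training x."""
    def predict(x):
        nearest = min(training_rows, key=lambda row: abs(row[0] - x))
        return nearest[1]
    return predict

def mean_squared_error(rows, predict):
    return sum((y - predict(x)) ** 2 for x, y in rows) / len(rows)

# Made-up data: roughly y = 2x, with noise in the training answers
training_rows = [(1, 2.5), (2, 3.4), (3, 6.6), (4, 7.5)]
testing_rows = [(1.4, 2.8), (2.6, 5.2), (3.4, 6.8)]

model = memorizing_model(training_rows)
train_error = mean_squared_error(training_rows, model)
test_error = mean_squared_error(testing_rows, model)

print(f"training error: {train_error:.3f}")
print(f"testing error:  {test_error:.3f}")
```

The training error is exactly zero because the model has memorized every training answer, noise included, yet it does noticeably worse on the testing rows. That gap is the warning sign.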

Solutions to Overfitting

  1. Cross-validation

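Use your initial training data to generate multiple mini train-test splits. In standard k-fold cross-validation, we partition the data into k subsets, called folds. We then train the model on k-1 folds, evaluate it on the remaining fold, and repeat until every fold has served as the test fold once. Cross-validation allows you to tune hyperparameters with only your original training dataset.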
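The fold bookkeeping can be sketched in a few lines. The helper name k_fold_splits is hypothetical, and I am assuming the rows have already been shuffled before they are passed in.

```python
def k_fold_splits(rows, k):
    """Yield (train, validation) pairs; each fold is the validation set exactly once."""
    # Deal the rows into k folds like cards: row i goes to fold i % k
    folds = [rows[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield train, validation

# Stand-in data: ten rows, assumed to be shuffled already
data = list(range(10))

splits = list(k_fold_splits(data, k=5))
for train, validation in splits:
    print(len(train), "training rows, validated on", validation)
```

In practice you would train and score a model inside that loop, then average the k scores to compare hyperparameter settings.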

  2. Reduce Complexity

Decrease the complexity of the model until it is simple enough that it does not overfit. Actions you can take include pruning a decision tree, reducing the number of parameters in a neural network, and using dropout in a neural network.

Underfitting

Underfitting refers to a model that can neither model the training dataset nor generalize to a new dataset. Such a model is not suitable, and the problem is usually obvious because it performs poorly even on the training dataset.

Solutions to Underfitting

  1. Increasing the model complexity

Your model may be underfitting simply because it is not complex enough to capture patterns in the data. Using a more complex model, for instance by switching from a linear to a non-linear model, very often helps solve underfitting.

  2. Reducing regularization

Many algorithms include regularization parameters by default, meant to prevent overfitting. Sometimes these are strong enough to prevent the algorithm from learning. Reducing their values generally helps.
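Here is a small sketch of that effect, assuming ridge (L2) regularization on a one-feature line fit. The data and the penalty values are made up, and fit_ridge_line is a hypothetical helper rather than a library call.

```python
def fit_ridge_line(xs, ys, penalty):
    """Fit y = slope * x + intercept while penalizing large slopes (ridge / L2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / (sxx + penalty)  # penalty = 0 gives the ordinary least-squares slope
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up observations with a clear upward trend
xs = [1, 2, 3, 4, 5]
ys = [52, 55, 61, 64, 68]

strong_slope, _ = fit_ridge_line(xs, ys, penalty=100.0)
weak_slope, _ = fit_ridge_line(xs, ys, penalty=0.0)

print(f"slope with strong regularization: {strong_slope:.2f}")
print(f"slope with no regularization:     {weak_slope:.2f}")
```

With the heavy penalty the fitted line is almost flat and misses the trend in the data, which is underfitting. Lowering the penalty lets the slope recover.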

General Flow Chart for Guidance

Thank you for reading this post. I hope you enjoyed it and learned something new today. Feel free to contact me through my blog if you have questions; I will be more than happy to help.

Stay safe and Happy learning!
