Machine learning is a hot topic now and will remain one in the future. Data preparation is key when it comes to machine learning. With well-prepared data, even a simple ML model can give very good results. With poorly prepared data, even a sophisticated algorithm with high predictive accuracy is likely to fail: the model takes garbage input and gives garbage output. Keep in mind that data preparation often makes more difference than the algorithm itself.
The ultimate goal of data preparation is to ensure that the machine learning model performs as well as it can.
Steps of data preparation:
- Explore the data to understand its problems: many methods can be used for this. In pandas, head(), tail(), shape and describe() are just a few examples (shape is an attribute, so it takes no parentheses). Plots are also useful for this purpose.
- Remove duplicates
- Treat missing values
- Treat errors and outliers
- Treat null values
- Scale the features
- Split the dataset
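The exploration step can be sketched with pandas as shown here. This is a minimal example that assumes the data is already in a DataFrame. The small `df` is made-up sample data for illustration only. `isna().sum()` is included because it shows at a glance which columns the missing-value steps will need to handle.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    "age": [25, 32, 47, 32, None, 51],
    "income": [40000, 52000, 61000, 52000, 58000, 1000000],
    "city": ["A", "B", "B", "B", "C", "A"],
})

print(df.head())        # first 5 rows (the default)
print(df.tail(3))       # last 3 rows
print(df.shape)         # (rows, columns); an attribute, not a method
print(df.describe())    # count, mean, std, min, quartiles, max of numeric columns
print(df.isna().sum())  # number of missing values in each column
```

For a visual check, pandas plotting methods such as `df.hist()` work too, provided matplotlib is installed.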
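Removing duplicates is a one-liner in pandas. In this sketch the data is hypothetical, and only fully identical rows count as duplicates. If duplicates should be judged on a subset of columns, `drop_duplicates` also accepts a `subset` argument.

```python
import pandas as pd

# Hypothetical data: row 3 is an exact copy of row 1
df = pd.DataFrame({
    "age": [25, 32, 47, 32],
    "income": [40000, 52000, 61000, 52000],
})

n_dupes = df.duplicated().sum()                   # rows identical to an earlier row
df = df.drop_duplicates().reset_index(drop=True)  # keep the first occurrence
print(n_dupes, df.shape)
```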
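This sketch covers treating missing and null values together. In pandas, both are represented as NaN/None and detected with `isna()`. Filling with the median (numeric) and the most frequent value (categorical) is just one common choice, assumed here for illustration. Dropping rows or using a model-based imputer are alternatives. The data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing number and one missing category
df = pd.DataFrame({
    "age": [25, np.nan, 47, 38],
    "city": ["A", "B", None, "B"],
})

# Numeric column: fill gaps with the median, which is less sensitive to outliers than the mean
df["age"] = df["age"].fillna(df["age"].median())

# Categorical column: fill gaps with the most frequent value (mode ignores missing entries)
df["city"] = df["city"].fillna(df["city"].mode()[0])

# Alternative: simply drop rows that contain any missing value
# df = df.dropna()
print(df)
```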
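Errors and outliers can be handled in two passes. First, remove values that are impossible by definition. Then apply a statistical rule. The sketch below assumes that a negative income is an error, and it uses the interquartile-range (IQR) rule, one common convention among several, to flag outliers. Dropping outliers is not always right: capping them, or keeping them if they are genuine, may suit some problems better.

```python
import pandas as pd

# Hypothetical income data with one impossible value and one extreme value
df = pd.DataFrame({"income": [40000, 52000, 61000, 58000, 47000, 1000000, -500]})

# Errors: remove values that cannot be valid (assumption: income is never negative)
df = df[df["income"] >= 0]

# Outliers: keep values within 1.5 * IQR of the first and third quartiles
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["income"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
print(df)
```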
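Feature scaling is usually done in one of two ways: min-max scaling to [0, 1] or standardization to zero mean and unit variance. Both are sketched here with plain pandas on hypothetical data. scikit-learn's `MinMaxScaler` and `StandardScaler` (in `sklearn.preprocessing`) implement the same formulas.

```python
import pandas as pd

# Hypothetical numeric features on very different scales
df = pd.DataFrame({"age": [20, 30, 40, 50], "income": [30000, 45000, 60000, 75000]})

# Min-max scaling: map each column to [0, 1]
# (a constant column has max - min = 0 and would need separate handling)
minmax = (df - df.min()) / (df.max() - df.min())

# Standardization: subtract each column's mean and divide by its population std
standard = (df - df.mean()) / df.std(ddof=0)
print(minmax)
print(standard)
```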
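Splitting the dataset is commonly done with scikit-learn's `train_test_split`. The sketch below does it with plain pandas, using an assumed 80/20 ratio and a fixed random seed on hypothetical data. It also computes the scaling statistics from the training rows only. Using the full dataset for those statistics would leak information from the test set into preprocessing.

```python
import pandas as pd

# Hypothetical dataset: one feature "x" and one label "y"
df = pd.DataFrame({"x": range(10), "y": [0, 1] * 5})

# Shuffle the rows reproducibly, then take 80% for training and 20% for testing
shuffled = df.sample(frac=1.0, random_state=42)
n_train = int(len(shuffled) * 0.8)
train, test = shuffled.iloc[:n_train], shuffled.iloc[n_train:]

# Scaling statistics come from the training set only, then are applied to both sets
mean, std = train["x"].mean(), train["x"].std(ddof=0)
train_x = (train["x"] - mean) / std
test_x = (test["x"] - mean) / std
print(len(train), len(test))
```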