Learn data preparation steps for machine learning in just 5 minutes!

Machine learning is a hot topic these days and will stay one in the future. Data preparation is key when it comes to machine learning. With good data preparation, a simple ML model can give very good and satisfying results. But if the data is not well prepared, even a sophisticated, high-quality ML algorithm (one with good prediction precision) is going to fail: your model will just take garbage input and give garbage output. Keep in mind that data preparation often makes more difference than the algorithm itself.
The ultimate goal of data preparation is to ensure that the machine learning model works as well as it can.

Steps of data preparation ---
  • Explore the data to understand its problems --- there are many ways to explore data; head(), tail(), shape and describe() are just a few examples (in pandas, shape is an attribute, not a method), and plots are useful for this purpose too.
  • Remove duplicates
  • Treat missing values
  • Treat errors and outliers
  • Treat null values
  • Scale the features
  • Split the dataset
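The first few steps in the list can be sketched roughly as follows. This is a minimal illustration, not a recipe: it assumes pandas (the method names head(), tail() and describe() match the pandas DataFrame API), the toy data and column names are made up, and filling with the median and filtering with the 1.5 x IQR rule are just one common choice among many.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset with typical problems: one duplicate row,
# one missing value, and one implausible outlier (age 350).
df = pd.DataFrame({
    "age":    [25, 32, 32, np.nan, 41, 29, 350],
    "income": [40_000, 52_000, 52_000, 61_000, 58_000, 45_000, 50_000],
})

# Explore: first rows, last rows, dimensions, summary statistics.
print(df.head())
print(df.tail())
print(df.shape)          # an attribute, not a method: (rows, columns)
print(df.describe())
print(df.isna().sum())   # number of missing values per column

# Remove duplicates: drop rows that repeat an earlier row exactly.
clean = df.drop_duplicates().copy()

# Treat missing values: here, fill missing ages with the median age.
clean["age"] = clean["age"].fillna(clean["age"].median())

# Treat outliers: keep ages within 1.5 * IQR of the quartiles.
q1, q3 = clean["age"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = clean["age"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clean = clean[in_range].reset_index(drop=True)

print(clean)
```

Whether to fill, drop, or flag a problem value depends on the data, so treat each choice above as a placeholder for a decision you make after exploring.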
Keep in mind that we don't just run through these steps once, in this order. Often we need to perform a step more than once, and sometimes not at all; it all depends on the data.
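As a rough sketch of the last two steps, here is scaling and splitting written with only the Python standard library, so that nothing is hidden behind a library call. The data and the helper names (split_rows, fit_standardizer, standardize) are hypothetical. Note that this sketch splits first and then scales, using statistics learned from the training rows only; that is a common way to keep information about the test rows out of training, and one example of the step order depending on the situation.

```python
import random
import statistics

# Hypothetical, already-cleaned feature values and labels.
ages = [25.0, 32.0, 32.0, 41.0, 29.0, 37.0, 45.0, 23.0, 51.0, 35.0]
labels = [0, 0, 1, 1, 0, 1, 1, 0, 1, 0]


def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and return (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_standardizer(values):
    """Learn the mean and population standard deviation of values."""
    return statistics.mean(values), statistics.pstdev(values)


def standardize(values, mean, std):
    """Rescale values to zero mean and unit variance under (mean, std)."""
    return [(v - mean) / std for v in values]


rows = list(zip(ages, labels))
train, test = split_rows(rows)

# Fit the scaler on the training rows only, then apply it to both splits.
mean, std = fit_standardizer([age for age, _ in train])
train_scaled = standardize([age for age, _ in train], mean, std)
test_scaled = standardize([age for age, _ in test], mean, std)

print(len(train), len(test))
print(train_scaled)
print(test_scaled)
```

The scaled training values have mean 0 and standard deviation 1 by construction; the test values generally do not, because they are rescaled with the training statistics.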
