Missing values are probably the most common headache you are going to have as a machine learning engineer. They are the ones that throw off the whole algorithm and make the model give weird results (predictions).
Treating missing values---
1. Use exploration to detect the missing values -- detecting missing values is crucial because a lot of machine learning models fail when they are given missing values.
2. Find out how the missing values are coded -- missing values can be coded in the data in one or more of the following formats.
- NULL
- a string or number, e.g. -9999, 0, "NA", "?", etc.
3. Treatment strategy --
- Remove column -- if a column has a lot of missing values, then it's better to get rid of that column.
- Remove row -- if only very few rows have missing values, then remove those rows.
- Forward or backward fill -- sometimes it's just better to use a fill, which works by copying the value of the nearest neighbour (the previous or the next row) into the null cell. It is useful when the data is in some order, say in order of date or time.
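The three steps above can be sketched roughly like this with pandas. This is only a minimal illustration, not the one right way to do it: the toy table, the column names, the choice of -9999 and "?" as the missing-value codes, and the 50% cut-off for dropping a column are all assumptions made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: -9999 and "?" are stand-ins for missing values.
raw = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=5, freq="D"),
    "temp": [21.0, -9999, 23.5, None, 22.0],
    "city": ["A", "?", "A", "A", "A"],
    "notes": [None, None, None, "ok", None],
})

# Step 2: turn the coded values into real NaN so pandas can see them.
df = raw.copy()
df["temp"] = df["temp"].replace(-9999, np.nan)
df["city"] = df["city"].replace("?", np.nan)

# Step 1: explore, counting the missing values in each column.
missing_counts = df.isna().sum()

# Step 3, remove column: keep only columns with at most 50% missing.
# The 50% threshold is an assumption; pick one that suits your data.
mostly_present = df.loc[:, df.isna().mean() <= 0.5]

# Step 3, remove row: drop every row that still has a missing value.
complete_rows = mostly_present.dropna()

# Step 3, forward fill: order by date, then copy the previous value down.
# Use .bfill() instead to copy the next value up (backward fill).
filled = mostly_present.sort_values("date").ffill()

print(missing_counts)
print(filled)
```

Be careful with a code like 0: it is often a perfectly valid value, so only replace it in columns where you are sure it means "missing". Also note that forward fill cannot fill the very first row if it is missing, because there is no earlier value to copy.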