Microsoft has recently released an interesting paper about using R to prepare data.
In this paper, the authors demonstrate some of the things that can go wrong with data and explore ways to address those issues using the R statistical language (https://cran.r-project.org/) before going on to analysis.
For faster numerical libraries, the paper is based on the Microsoft R Open distribution (https://mran.microsoft.com/open/).
The idealized goal is to use machine learning to build a predictive model.
In the paper, which can be found at this link, you can find information about:
- Loading Data
- Shaping Data
- Variable types
- Checking for bad or missing values
- Dealing with missing values (NA)
- Categorical variables with too many levels or with rare levels
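
To give a feel for several of the topics listed above, here is a minimal sketch in base R. It is not taken from the paper: the toy data, the column names, the choice of median imputation, and the threshold of 2 for "rare" levels are all assumptions made up for illustration.

```r
# Hypothetical toy data; none of these columns come from the paper.
csv_text <- "id,age,city,income
1,34,Rome,52000
2,-1,Milan,48000
3,41,Rome,NA
4,29,Turin,39000
5,NA,Rome,61000
6,52,Milan,75000
7,38,Aosta,44000"

# Loading data: read.csv accepts inline text through the `text` argument.
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Variable types: treat city as a categorical variable (factor).
df$city <- factor(df$city)

# Checking for bad values: a negative age is impossible (assumed rule),
# so mark it as missing.
df$age[!is.na(df$age) & df$age < 0] <- NA

# Count missing values per column.
na_counts <- colSums(is.na(df))
print(na_counts)

# Dealing with missing values: impute numeric columns with the median.
# This is one simple option among many; dropping rows is another.
df$age[is.na(df$age)] <- median(df$age, na.rm = TRUE)
df$income[is.na(df$income)] <- median(df$income, na.rm = TRUE)

# Rare levels: lump cities seen fewer than 2 times (assumed threshold)
# into a single "Other" level.
city_counts <- table(df$city)
rare <- names(city_counts)[city_counts < 2]
city_chr <- as.character(df$city)
city_chr[city_chr %in% rare] <- "Other"
df$city <- factor(city_chr)

str(df)
```

Whether to impute, drop, or flag missing values depends on the data and on the model you plan to build, so treat the choices above as placeholders.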
The paper is well worth reading!