Preparing data for analysis using R

Microsoft has recently released an interesting paper about using R to prepare data.

In this paper, the authors demonstrate some of the things that can go wrong with data and explore ways to address those issues using the R statistical language before moving on to analysis.

To take advantage of faster numerical libraries, the whole paper is based on the Microsoft R Open distribution.

The idealized goal is to use machine learning to build a predictive model.

In the paper, which can be found at this link, you can find information about:

  • Loading Data
  • Shaping Data
  • Variable types
  • Checking for bad or missing values
  • Dealing with missing values (NA)
  • Categorical variables with too many levels or with rare levels
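The paper itself works in R. Purely as a language-neutral illustration of three of the listed steps (this is not code from the paper), here is a minimal Python sketch using only the standard library. The toy records and the helper names `count_missing`, `impute_median` and `lump_rare_levels` are hypothetical; `None` plays the role of R's `NA`, and median imputation and lumping levels below a count threshold are just two common choices among several.

```python
import statistics
from collections import Counter

# Hypothetical toy records; None marks a missing value (the analogue of NA in R).
rows = [
    {"age": 34, "city": "Rome"},
    {"age": None, "city": "Milan"},
    {"age": 29, "city": "Rome"},
    {"age": 41, "city": None},
    {"age": 52, "city": "Turin"},
    {"age": None, "city": "Rome"},
    {"age": 38, "city": "Milan"},
]


def count_missing(records, columns):
    """Return a dict mapping each column name to its number of missing (None) values."""
    return {col: sum(1 for r in records if r.get(col) is None) for col in columns}


def impute_median(values):
    """Replace missing numeric values with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]


def lump_rare_levels(values, min_count, other="Other"):
    """Collapse categorical levels seen fewer than min_count times into one level.

    Missing values (None) are left untouched so they can be handled separately.
    """
    counts = Counter(v for v in values if v is not None)
    return [v if v is None or counts[v] >= min_count else other for v in values]


if __name__ == "__main__":
    print(count_missing(rows, ["age", "city"]))  # {'age': 2, 'city': 1}
    print(impute_median([r["age"] for r in rows]))
    print(lump_rare_levels([r["city"] for r in rows], min_count=2))
```

The threshold `min_count=2` is arbitrary here; a sensible value depends on the size of the data set and on how the model will use the categorical variable.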

Give it a read: it is very interesting!