Monday, October 13, 2014

R Tip of the Day


If I'd known this a month ago, I'd have saved myself a lot of time. I was pulling GC data from a JVM's logs but my pattern matching wasn't perfect so I was getting a lot of dirty data (NA in R).

To clean these from a vector, you can do something similar to this:

data      <- c( 1, 2, 3, NA, 5, 6 ) # data with some dirty elements
badData   <- is.na(data)            # elements are TRUE or FALSE
data(!badData)
cleanData <- data[!badData]

Similarly, if you have 2-dimensional (or more) data, you can clean all data when there is as much as one element of a tuple that is dirty. For example:

<- c( 1,  2,  3,  NA, 5,  6 )
<- c( 10, 20, NA, 40, 50, 60 )
cleaned <- complete.cases(x, y)

will clean all data where either x or y (or both) is dirty:

> x[cleaned]
[1] 1 2 5 6
> y[cleaned]
[1] 10 20 50 60


No comments:

Post a Comment