R programming – convert factors to numeric data in R


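When reading a file into R, a column of numbers may be imported as a factor. This typically happens when the column contains at least one entry R cannot parse as a number (for example "n/a") and stringsAsFactors is TRUE. Here’s how to convert factor data back to numeric data in R without losing the decimal values.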



Syntax to convert factor data to numeric data, maintaining decimal accuracy

# convert the factor data in vector sampleFactor to numeric
 
sampleNumeric <- as.numeric(levels(sampleFactor))[as.integer(sampleFactor)]
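
To see why this works, here is a minimal sketch on a small hand-made factor (the vector f is illustrative and not part of the original example). A factor stores integer codes that point into its character levels. levels() gives those labels, as.numeric() converts each label once, and indexing by as.integer() looks every element back up. The R FAQ's entry on converting factors also mentions as.numeric(as.character(f)), which gives the same result but converts every element rather than each distinct level.

```r
# A small factor built from decimal values (illustrative data)
f <- factor(c(12.25, 14.5, 12.25, 10.75))

levels(f)        # "10.75" "12.25" "14.5": labels are stored as character
as.integer(f)    # 2 3 2 1: integer codes pointing into levels(f)

# Wrong: as.numeric() on a factor returns the codes, not the values
as.numeric(f)    # 2 3 2 1

# Right: convert the levels once, then index them by the codes
as.numeric(levels(f))[as.integer(f)]   # 12.25 14.50 12.25 10.75

# Equivalent, but converts every element instead of every level
as.numeric(as.character(f))            # 12.25 14.50 12.25 10.75
```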

Steps shown in the code below:
  • Generate some decimal source data with seq()
  • Take a random sample of the data with sample()
  • Convert the sample from numeric to factor with as.factor()
  • Show the result of using as.numeric() on the factor data (spoiler alert: it returns the factor's integer level codes, not the original values)
  • Show the result of using the correct syntax to convert factor to numeric

Convert factor to numeric data — code without results

# create a vector of numeric (positive decimal) data
 source <- seq(from=10, to=25, by=.25)
 
# look at the data
 class(source)
 head(source)

# take a sample of values from the vector
 # syntax: sample(x, size, replace = T/F, ...)
 # x | the data to sample from
 # size | number of elements to take
 # replace | whether to replace the values back into the pot if selected
 
 sample <- sample(source, size=50, replace=TRUE)
 
# check the data
 class(sample)
 sample

# change the datatype from numeric to factor
 sampleFactor <- as.factor(sample)

 class(sampleFactor)
 head(sampleFactor)

# change the datatype back from factor to numeric
 
# as.numeric() returns the factor's integer level codes, not the original values
 sampleInt <- as.numeric(sampleFactor)
 
 head(sampleInt)
 
 # CORRECT METHOD to retain decimal accuracy: convert the levels, then index them by code
 sampleNumeric <- as.numeric(levels(sampleFactor))[as.integer(sampleFactor)]
 
 class(sampleNumeric)
 sampleNumeric
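
The sample above changes every time the code runs. If you want a reproducible result, call set.seed() before sample(). The sketch below does that and wraps the conversion in a small helper; the name factorToNumeric() and the seed value are our own illustrative choices, not part of base R. It also uses vector names that don't clash with the base functions source() and sample().

```r
# factorToNumeric() is a hypothetical helper name, not a base R function
factorToNumeric <- function(f) {
  stopifnot(is.factor(f))
  # Convert each level label once, then look values up by integer code
  as.numeric(levels(f))[as.integer(f)]
}

set.seed(42)  # any seed works; it makes sample() reproducible
sourceData <- seq(from = 10, to = 25, by = 0.25)
sampledData <- sample(sourceData, size = 50, replace = TRUE)
sampledFactor <- as.factor(sampledData)

recovered <- factorToNumeric(sampledFactor)

# The round trip gives back exactly the sampled values
identical(recovered, sampledData)
```

Checking with identical() is stricter than comparing printed output, which R rounds for display.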

Convert factor to numeric data — code with results

> # create a vector of numeric (positive decimal) data
>  source <- seq(from=10, to=25, by=.25)
>  
> # look at the data
>  class(source)
[1] "numeric"
>  head(source)
[1] 10.00 10.25 10.50 10.75 11.00 11.25
> 
> # take a sample of values from the vector
>  # syntax: sample(x, size, replace = T/F, ...)
>  # x | the data to sample from
>  # size | number of elements to take
>  # replace | whether to replace the values back into the pot if selected
>  
>  sample <- sample(source, size=50, replace=TRUE)
>  
> # check the data
>  class(sample)
[1] "numeric"
>  sample
 [1] 12.25 14.50 17.75 16.00 20.75 20.25 11.00 11.75 15.00 17.25 23.00 21.00 15.50 18.50 12.00 15.25 12.50 20.50 11.75 20.75 11.75 13.25 19.75 12.25 19.00 15.75 23.50
[28] 10.75 14.50 23.00 24.75 20.50 21.75 12.00 20.00 11.00 13.50 21.00 13.25 15.50 13.25 12.25 14.50 13.25 22.25 17.25 11.25 18.25 23.50 12.75
> 
> # change the datatype from numeric to factor
>  sampleFactor <- as.factor(sample)
> 
>  class(sampleFactor)
[1] "factor"
>  head(sampleFactor)
[1] 12.25 14.5  17.75 16    20.75 20.25
32 Levels: 10.75 11 11.25 11.75 12 12.25 12.5 12.75 13.25 13.5 14.5 15 15.25 15.5 15.75 16 17.25 17.75 18.25 18.5 19 19.75 20 20.25 20.5 20.75 21 21.75 22.25 23 ... 24.75
> 
> # change the datatype back from factor to numeric
>  
> # as.numeric() returns the factor's integer level codes, not the original values
>  sampleInt <- as.numeric(sampleFactor)
>  
>  head(sampleInt)
[1]  6 11 18 16 26 24
>  
>  # CORRECT METHOD to retain decimal accuracy: convert the levels, then index them by code
>  sampleNumeric <- as.numeric(levels(sampleFactor))[as.integer(sampleFactor)]
>  
>  class(sampleNumeric)
[1] "numeric"
>  sampleNumeric
 [1] 12.25 14.50 17.75 16.00 20.75 20.25 11.00 11.75 15.00 17.25 23.00 21.00 15.50 18.50 12.00 15.25 12.50 20.50 11.75 20.75 11.75 13.25 19.75 12.25 19.00 15.75 23.50
[28] 10.75 14.50 23.00 24.75 20.50 21.75 12.00 20.00 11.00 13.50 21.00 13.25 15.50 13.25 12.25 14.50 13.25 22.25 17.25 11.25 18.25 23.50 12.75

R’s default display output

By default R prints numbers to 7 significant digits (set by options(digits = 7)), so how many decimal places you see depends on how many digits come before the decimal point. When you call a variable, the value displayed won’t necessarily match the full value R has stored.

To see more of the stored value, use print() with the digits argument to set the number of significant digits to display. With enough digits you can also see floating-point representation error, as in the last example below.

> # 11 digits to 9dp
> x <- 10.125125125
> # default display
> x 
[1] 10.12513
> print(x, digits=20)
[1] 10.125125125
> 
> # 11 digits to 8dp
> y <- 101.25125125
> y	
[1] 101.2513
> print(y, digits=20)
[1] 101.25125125
> 
> # 11 digits to 7dp
> z <- 1012.5125125
> z	
[1] 1012.513
> print(z, digits=20)
[1] 1012.5125125
> 
> # 12 digits to 6dp
> n <- 100000.125125
> n
[1] 100000.1
> print(n, digits=20)
[1] 100000.12512500001
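
print(digits = ) is not the only option. The sketch below, which reuses x from the first example above, shows a few other base R ways to control display precision and confirms that the stored value is unaffected by how it is shown.

```r
x <- 10.125125125

# Default number of significant digits used when printing (7 unless changed)
getOption("digits")

# format() and sprintf() return strings with the precision you ask for
format(x, digits = 11)    # "10.125125125"
sprintf("%.9f", x)        # "10.125125125"
sprintf("%.20f", x)       # extra decimals reveal the binary approximation

# Displaying a value never changes what is stored
x == 10.125125125         # TRUE

# Change the default for the session, then restore the previous setting
oldOptions <- options(digits = 12)
print(x)
options(oldOptions)
```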

stringsAsFactors when importing data in R

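To stop R from converting numeric data to factors, include stringsAsFactors=FALSE in the read.csv() call. This has been the default since R 4.0.0, so the problem mainly affects earlier versions of R and code that sets stringsAsFactors=TRUE. A column that contains a non-numeric entry is then read as character rather than factor, and as.numeric() converts it directly, turning the non-numeric entries into NA.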
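
A minimal sketch of the difference, using read.csv() on an inline string through its text argument (the CSV contents and column names are hypothetical). A single non-numeric entry is what turns the column of numbers into text in the first place. When that entry is just a missing-value marker, the na.strings argument lets read.csv() read the column as numeric straight away.

```r
# Hypothetical CSV text: one entry in the price column is not a number
csvText <- "id,price\n1,10.25\n2,11.50\n3,n/a"

# With stringsAsFactors = TRUE (the default before R 4.0.0), price is a factor
asFactor <- read.csv(text = csvText, stringsAsFactors = TRUE)
class(asFactor$price)    # "factor"
# The levels-based conversion still works; "n/a" becomes NA with a warning
suppressWarnings(as.numeric(levels(asFactor$price))[asFactor$price])

# With stringsAsFactors = FALSE, price stays character; convert it directly
asCharacter <- read.csv(text = csvText, stringsAsFactors = FALSE)
class(asCharacter$price) # "character"
suppressWarnings(as.numeric(asCharacter$price))

# Or declare "n/a" as a missing-value marker so price is read as numeric
asNumeric <- read.csv(text = csvText, na.strings = c("NA", "n/a"))
class(asNumeric$price)   # "numeric"
```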
