When reading a file into R, numeric data may be imported as factors. Here’s how to convert factor to numeric data in R.
Links to topics on this page
- Syntax to convert factor data to numeric
- Code without results
- Code with results
- R’s default display of numbers — what you see isn’t (necessarily) what you have
- Use stringsAsFactors=FALSE when importing files
- Links to more information
Syntax to convert factor data to numeric data, maintaining decimal accuracy
# convert the factor data in vector sampleFactor to numeric
sampleNumeric <- as.numeric(levels(sampleFactor))[as.integer(sampleFactor)]
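If you do this conversion often, the one-liner can be wrapped in a small function. The following is a minimal sketch, and the helper name factor_to_numeric() is hypothetical, not part of base R. It also shows as.numeric(as.character(f)), the other form mentioned in the R FAQ. Both give the same values, but the levels() form converts each distinct level once rather than every element, and the FAQ describes it as slightly more efficient.

```r
# factor_to_numeric() is a hypothetical helper name, not a base R function.
factor_to_numeric <- function(f) {
  stopifnot(is.factor(f))
  # levels(f) holds each distinct value as text; as.integer(f) holds the
  # position of each element's level, so indexing recovers the values.
  as.numeric(levels(f))[as.integer(f)]
}

# Levels of a character factor sort as text: "12.25", "30", "9.5"
temps <- factor(c("12.25", "9.5", "12.25", "30"))

as.numeric(temps)                # level codes c(1, 3, 1, 2), not the values
factor_to_numeric(temps)         # c(12.25, 9.5, 12.25, 30)
as.numeric(as.character(temps))  # same values, converting every element
```

The R FAQ writes the same idea as as.numeric(levels(f))[f]. Indexing with a factor uses its integer codes, so [f] and [as.integer(f)] select the same elements.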
Steps shown in the code below:
- Generate some source decimal data: seq()
- Sample the data: sample()
- Convert the data from numeric to factor: as.factor()
- The result of using as.numeric() on the factor data (spoiler alert: the factor's integer level codes are returned, not the original values)
- The result of using the correct syntax to convert factor to numeric
Convert factor to numeric data — code without results
# create a vector of numeric (positive decimal) data
source <- seq(from=10, to=25, by=.25)
# look at the data
class(source)
head(source)
# take a sample of values from the vector
# syntax: sample(x, size, replace = T/F, ...)
# x | the data to sample from
# size | number of elements to take
# replace | if TRUE, a value can be drawn more than once (sampling with replacement)
sample <- sample(source, size=50, replace=TRUE)
# check the data
class(sample)
sample
# change the datatype from numeric to factor
sampleFactor <- as.factor(sample)
class(sampleFactor)
head(sampleFactor)
# change the datatype back from factor to numeric
# as.numeric() returns the integer level codes, not the original values
sampleInt <- as.numeric(sampleFactor)
head(sampleInt)
#CORRECT METHOD to retain decimal accuracy
sampleNumeric <- as.numeric(levels(sampleFactor))[as.integer(sampleFactor)]
class(sampleNumeric)
sampleNumeric
Convert factor to numeric data — code with results
> # create a vector of numeric (positive decimal) data
> source <- seq(from=10, to=25, by=.25)
>
> # look at the data
> class(source)
[1] "numeric"
> head(source)
[1] 10.00 10.25 10.50 10.75 11.00 11.25
>
> # take a sample of values from the vector
> # syntax: sample(x, size, replace = T/F, ...)
> # x | the data to sample from
> # size | number of elements to take
> # replace | if TRUE, a value can be drawn more than once (sampling with replacement)
>
> sample <- sample(source, size=50, replace=TRUE)
>
> # check the data
> class(sample)
[1] "numeric"
> sample
[1] 12.25 14.50 17.75 16.00 20.75 20.25 11.00 11.75 15.00 17.25 23.00 21.00 15.50 18.50 12.00 15.25 12.50 20.50 11.75 20.75 11.75 13.25 19.75 12.25 19.00 15.75 23.50
[28] 10.75 14.50 23.00 24.75 20.50 21.75 12.00 20.00 11.00 13.50 21.00 13.25 15.50 13.25 12.25 14.50 13.25 22.25 17.25 11.25 18.25 23.50 12.75
>
> # change the datatype from numeric to factor
> sampleFactor <- as.factor(sample)
>
> class(sampleFactor)
[1] "factor"
> head(sampleFactor)
[1] 12.25 14.5 17.75 16 20.75 20.25
32 Levels: 10.75 11 11.25 11.75 12 12.25 12.5 12.75 13.25 13.5 14.5 15 15.25 15.5 15.75 16 17.25 17.75 18.25 18.5 19 19.75 20 20.25 20.5 20.75 21 21.75 22.25 23 ... 24.75
>
> # change the datatype back from factor to numeric
>
> # as.numeric() returns the integer level codes, not the original values
> sampleInt <- as.numeric(sampleFactor)
>
> head(sampleInt)
[1] 6 11 18 16 26 24
>
> #CORRECT METHOD
> sampleNumeric <- as.numeric(levels(sampleFactor))[as.integer(sampleFactor)]
>
> class(sampleNumeric)
[1] "numeric"
> sampleNumeric
[1] 12.25 14.50 17.75 16.00 20.75 20.25 11.00 11.75 15.00 17.25 23.00 21.00 15.50 18.50 12.00 15.25 12.50 20.50 11.75 20.75 11.75 13.25 19.75 12.25 19.00 15.75 23.50
[28] 10.75 14.50 23.00 24.75 20.50 21.75 12.00 20.00 11.00 13.50 21.00 13.25 15.50 13.25 12.25 14.50 13.25 22.25 17.25 11.25 18.25 23.50 12.75
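The printed results above depend on a random sample, so they will differ from run to run. The following minimal sketch repeats the same steps with a fixed seed (the value 42 is an arbitrary choice) and checks the result in code instead of by eye. The codes from as.numeric() are only level positions between 1 and nlevels(), while the restored vector is identical to the original sample. The round trip through the text levels loses nothing here because every value in seq(from=10, to=25, by=.25) is exactly representable as a double.

```r
# Same steps as above, with a fixed seed so the run is repeatable.
# The seed value (42) is arbitrary.
set.seed(42)

# source_values / sampled avoid reusing the names of base R's source() and sample()
source_values <- seq(from = 10, to = 25, by = 0.25)
sampled <- sample(source_values, size = 50, replace = TRUE)
sampled_factor <- as.factor(sampled)

codes <- as.numeric(sampled_factor)
restored <- as.numeric(levels(sampled_factor))[as.integer(sampled_factor)]

# The codes are level positions: whole numbers between 1 and nlevels()
range(codes)
nlevels(sampled_factor)

# The restored vector matches the original sample exactly
identical(restored, sampled)
```

The sketch renames source and sample to source_values and sampled. The original names also work, because R skips non-function objects when it looks up a function call such as sample(). However, reusing the names of base functions makes the code harder to read.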
R’s default display output
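Below we can see that R’s default display of a number depends on how many digits come before the decimal point. By default R shows 7 significant digits, so the more digits there are before the decimal point, the fewer decimal places are displayed. When we call a variable, the value displayed won’t necessarily match the value R has stored.
To see more of the stored value, use print() with the argument ‘digits=’ to set the number of significant digits to display. The last example also shows that the stored value can differ slightly from the value typed in, because most decimal fractions cannot be represented exactly as double-precision numbers.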
> # 11 digits to 9dp
> x <- 10.125125125
> # default display
> x
[1] 10.12513
> print(x, digits=20)
[1] 10.125125125
>
> # 11 digits to 8dp
> y <- 101.25125125
> y
[1] 101.2513
> print(y, digits=20)
[1] 101.25125125
>
> # 11 digits to 7dp
> z <- 1012.5125125
> z
[1] 1012.513
> print(z, digits=20)
[1] 1012.5125125
>
> # 12 digits to 6dp
> n <- 100000.125125
> n
[1] 100000.1
> print(n, digits=20)
[1] 100000.12512500001
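The same points can be checked in code. The following is a minimal sketch using only base R. getOption("digits") reports the number of significant digits used for display (7 unless it has been changed). format() and sprintf() show more digits for a single value without changing the global option. A comparison with == confirms that the rounded display is not the stored value. The last lines follow on from the 100000.125125 example: because most decimal fractions are not stored exactly, all.equal(), which compares with a small tolerance, is safer than == for doubles.

```r
x <- 10.125125125

getOption("digits")    # 7 unless changed with options(digits = ...)

# The rounded display (10.12513) is not the stored value
x == 10.12513          # FALSE

# Other ways to show more digits without changing the global option
format(x, digits = 15)
sprintf("%.9f", x)

# Decimal fractions are usually not exact in binary, so compare with a tolerance
y <- 0.1 + 0.2
y == 0.3                      # FALSE
isTRUE(all.equal(y, 0.3))     # TRUE
sprintf("%.17g", y)           # "0.30000000000000004"
```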
stringsAsFactors when importing data in R
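To prevent R from converting text columns to factors, include stringsAsFactors=FALSE in the call to read.csv(). A column of numbers is read as text, and then turned into a factor, when it contains a non-numeric entry such as "N/A". With stringsAsFactors=FALSE that column arrives as character, which as.numeric() converts directly. Since R 4.0.0, stringsAsFactors=FALSE is the default.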
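The following minimal sketch reproduces this situation using a small CSV held in a string. The data and the "N/A" marker are made up for illustration. The same file gives a factor column, a character column, or a numeric column depending on the read.csv() arguments. The na.strings argument tells read.csv() which entries mean "missing", so the column can stay numeric.

```r
# A small CSV held in a string; "N/A" marks a missing price (made-up data)
csv <- "id,price\n1,10.25\n2,N/A\n3,12.5\n"

# One non-numeric entry makes the whole column text; with
# stringsAsFactors = TRUE that text becomes a factor
as_factor <- read.csv(text = csv, stringsAsFactors = TRUE)
class(as_factor$price)   # "factor"

# With stringsAsFactors = FALSE the column is character instead
as_text <- read.csv(text = csv, stringsAsFactors = FALSE)
class(as_text$price)     # "character"

# as.numeric() now works directly; "N/A" becomes NA (with a coercion warning)
suppressWarnings(as.numeric(as_text$price))

# Declaring the missing-value marker lets read.csv() keep the column numeric
clean <- read.csv(text = csv, na.strings = "N/A")
class(clean$price)       # "numeric"
```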
Links to more information
- Read more about the precision of values in R: Double-precision vectors
- R FAQ — 7.10 How do I convert factors to numeric?
- Factors in R
- More on printing values in R