I have a data frame where some columns are character, others numeric, and so on. Some of the columns contain 0s, and I want to change all the 0s to NA.
I apply the following:
datos2<-data.frame(sapply(datos, function(x) {ifelse(x==0,NA,x)}))
str(datos2)
And it turns out that all the columns are now of type factor.
How do I make datos and datos2 have the same structure?
Without having the object datos it is a bit difficult to know exactly what is happening, but it seems to me that the problem is sapply() and the coercion rules it applies to data structures.
Consider an example of what could be happening. Suppose datos has a numeric column, a logical one (when NAs have no other valid data type, they are assigned the logical type) and a character column, c. R "guesses" the data type of each column; as we already know, it is weakly typed. Since I'm using R >= 4.0, stringsAsFactors = FALSE is the default; otherwise column c would be assigned the class factor. I think that doesn't matter for the purposes of the problem.
After applying the sapply() call from the question, all the columns are indeed of class character (in R < 4.0 they would be factors, but that does not matter for this problem). Why? Because of sapply(), which is dangerous in the way it applies coercions to the data. It is difficult to know precisely which class sapply() is going to return. In this case it applies ifelse() to each column of datos and returns the result as a matrix, which is then coerced to a data.frame. The point is that a matrix cannot hold columns of different types, so the data is forced into the most general type present. Since there is a character column, all the others are converted to character: numbers can be expressed as characters, but characters cannot be expressed as numbers.
Solution
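As a minimal sketch of the lapply() approach, assuming a small made-up datos (the real object isn't available) whose columns are plain numeric, logical and character vectors, something like this should keep each column's type:

```r
# Hypothetical stand-in for the asker's data: numeric, all-NA and character columns
datos <- data.frame(
  a = c(1, 0, 3),
  b = c(NA, NA, NA),
  c = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

# lapply() returns a list with one element per column,
# so no common type is forced across columns
datos2 <- data.frame(
  lapply(datos, function(x) ifelse(x == 0, NA, x)),
  stringsAsFactors = FALSE  # explicit, so the result does not depend on the R version
)

str(datos2)
```

Note that ifelse() drops attributes, so factor or Date columns would need separate handling; the sketch assumes there are none.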
If datos is supposed to contain only numeric columns, that is where the problem lies. If it's fine for it to be a combination of numbers and letters, then the best alternative is lapply() instead of sapply(). lapply() always returns a list, and a list can hold heterogeneous elements, so each column keeps its own type.
Alternatively:
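The original snippet isn't available here, so this is only a sketch of what the datos[] form might look like, using the same kind of made-up datos; it works on a copy, datos2, so the original stays untouched:

```r
# Hypothetical example data (not the asker's real object)
datos <- data.frame(
  a = c(1, 0, 3),
  b = c(NA, NA, NA),
  c = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

datos2 <- datos
# Assigning into datos2[] replaces the columns but keeps the data.frame
# container (class, column names, row names)
datos2[] <- lapply(datos2, function(x) ifelse(x == 0, NA, x))

str(datos2)
```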
Assigning the lapply() result to datos[] also works. With datos[] on the left-hand side, R tries to preserve as many attributes as it can in the assignment, so the result stays a data.frame.
Conclusion
R is weakly typed; that makes a lot of things easier, but it generates these kinds of inconsistencies in the results. Guessing the output of sapply() is a lottery, and you have to use it very carefully. If you're working with lists, including data.frames, it's safer to use lapply() and, if necessary, simplify the result in a later step.
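To illustrate the "lottery", here is a small sketch with made-up data; the names reemplazar, m_mixto, m_num and l_desigual are hypothetical, chosen only for this example. The same sapply() call shape can come back as a character matrix, a numeric matrix or a list, while lapply() is always a list that can be simplified explicitly afterwards:

```r
# Hypothetical example data: one numeric and one character column
datos <- data.frame(
  a = c(1, 0, 3),
  c = c("x", "y", "z"),
  stringsAsFactors = FALSE
)
reemplazar <- function(x) ifelse(x == 0, NA, x)  # illustrative helper

m_mixto    <- sapply(datos, reemplazar)               # mixed columns: character matrix
m_num      <- sapply(datos["a"], reemplazar)          # numeric only: numeric matrix
l_desigual <- sapply(datos, function(x) x[x != 0])    # unequal lengths: plain list

typeof(m_mixto)
typeof(m_num)
class(l_desigual)

# lapply() always gives a list; simplify in a separate, explicit step
l <- lapply(datos, reemplazar)
datos2 <- as.data.frame(l, stringsAsFactors = FALSE)
str(datos2)
```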