I have two data frames from two surveys (National Household and Adult Health Survey) in which many variables offer "Don't know/No answer" as a response option, coded as a number or a pair of numbers.
This response (NS/NC) is coded in several ways:
- In most variables/questions, 8 and 9 indicate "does not know" and "does not answer", respectively.
- In one variable it is only 8, since the only option offered is "does not know". Similarly, another variable uses only 9.
- In another variable the codes are 98 and 99.
- In others, 998 and 999.
- Other variables use these same values (8, 9, 998...) for options that have nothing to do with "don't know" or "no answer".
So what I want is to convert all these NS/NC codes to NA, so that I can later handle them together with the rest of the missing values. I am doing it as follows:
# Adult Survey variables with an NS/NC response
vars_na_89_A <- c("enf_cron","SM_estres","tipo_act_fis","frec_act_fis","fuma")
vars_na_8_A <- c("sit_lab")
vars_na_9_A <- c("clase_pr")
vars_na_9899_A <- c("niv_est")
vars_na_998999_A <- c("altura","peso")
# Household Survey variables with an NS/NC response
vars_na_89_H <- c("ruido", "malos", "agua", "limpieza", "cont_indus", "cont_otras",
                  "escasez_verde", "molest_animal", "delincuencia")
vars_na_9899_H <- c("n_dormitorios", "ingreso")
vars_na_998999_H <- c("m2")
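# Replace the NS/NC codes with NA column by column, assigning the result back
datos_adultos[vars_na_89_A] <- lapply(datos_adultos[vars_na_89_A], function(x) ifelse(x %in% c(8, 9), NA, x))
datos_adultos[vars_na_8_A] <- lapply(datos_adultos[vars_na_8_A], function(x) ifelse(x %in% 8, NA, x))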
.
.
.
In this way I have to create an ifelse for each type of NS/NC of each of the two surveys. My question is: How can I get the same result but without so many lines of code?
It is difficult to give you a general answer: I do not know all the columns your df contains, so I cannot give you a reproducible example with your data. I recommend the following code, which only works if you know exactly which columns contain NS/NC codes following the patterns you mention. It assumes that in the columns vars_na_89_A, vars_na_8_A, vars_na_9_A, vars_na_9899_A, vars_na_998999_A, vars_na_89_H, vars_na_9899_H
the keys 89, 8, 9, 9899, 998999, 89, 9899 do not have any other meaning. Keep in mind, for example, that in vars_na_8_A the value 9 has a meaning other than "NA", so only 8 should be replaced there.
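A minimal sketch of that idea in base R, not necessarily the code this answer originally had in mind. The helper name `codes_to_na` and the toy values are invented for illustration; with the real data you would pass the column vectors from the question (for example `vars_na_89_A`) as `cols`.

```r
# Hypothetical helper: set the given codes to NA in the given columns only.
codes_to_na <- function(df, cols, codes) {
  df[cols] <- lapply(df[cols], function(x) replace(x, x %in% codes, NA))
  df
}

# Toy stand-in for datos_adultos (invented values, not real survey data).
datos_adultos <- data.frame(
  fuma    = c(1, 8, 9, 2),
  sit_lab = c(8, 9, 3, 1),        # here 9 is a valid category; only 8 is NS/NC
  niv_est = c(98, 5, 99, 7),
  altura  = c(170, 998, 999, 165)
)

# One call per coding pattern; columns not listed are left untouched.
datos_adultos <- codes_to_na(datos_adultos, "fuma",    c(8, 9))
datos_adultos <- codes_to_na(datos_adultos, "sit_lab", 8)
datos_adultos <- codes_to_na(datos_adultos, "niv_est", c(98, 99))
datos_adultos <- codes_to_na(datos_adultos, "altura",  c(998, 999))
```

Because each call names its columns explicitly, a 9 in `sit_lab` survives even though 9 means NS/NC elsewhere.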
This is a proof of concept of one possible way to cut down some of the code, although I don't think you gain much. The idea is to define a list of replacements in which you indicate the columns and the values that you want to replace with NA. Assuming a data.frame like this:
And a replacement list, where we define different criteria:
For example, the first criterion is that in V1 and V2 the values 8 and 9 are replaced by NA. The following iterates over the list to process each criterion:
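The three pieces described above (example data.frame, replacement list, loop) could look roughly like the sketch below. The columns V3 and V4, the list name `replacements`, its `cols`/`values` fields, and all the data values are assumptions made for illustration; only V1, V2 and the 8/9 criterion come from the text.

```r
# Toy data.frame (invented values).
df <- data.frame(
  V1 = c(1, 8, 9, 4),
  V2 = c(9, 2, 8, 3),
  V3 = c(98, 10, 99, 8),    # the 8 here is a real value, not NS/NC
  V4 = c(998, 150, 999, 9)  # likewise the 9 here
)

# Replacement list: each criterion pairs some columns with the codes
# that should become NA in those columns.
replacements <- list(
  list(cols = c("V1", "V2"), values = c(8, 9)),
  list(cols = "V3",          values = c(98, 99)),
  list(cols = "V4",          values = c(998, 999))
)

# Process each criterion, touching only the columns it names.
for (crit in replacements) {
  for (col in crit$cols) {
    df[[col]][df[[col]] %in% crit$values] <- NA
  }
}
```

Adding a new coding pattern then means adding one entry to `replacements` rather than another line of replacement code. The same list could hold entries for both surveys if each entry also recorded which data.frame it applies to.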