I am working with the national health survey in Spain (ENSE). In it, respondents are asked about their nationality through four binary-response questions (1 for yes, 2 for no):
- Spanish nationality?: E2_1a
- foreign nationality?: E2_1b
- Don't know: E2_1c
- Does not answer: E2_1d
What I need is to create a single variable from these four: one value for each affirmative answer to the first two questions, another for respondents who said they do not know or did not answer, and one more for those with dual nationality (a 1 in both of the first two questions).
In addition, I want to delete the observations where the respondent answered yes to one or both of the first two questions and also to either of the last two (this would mean that the respondent claims to have a nationality but at the same time does not know it or does not disclose it, which is illogical), as well as those where the answer was negative in all four cases.
E2_1a E2_1b E2_1c E2_1d E2
1 2 2 2 A
1 1 2 2 C
1 2 2 2 A
1 2 2 2 A
1 2 2 2 A
1 2 2 2 A
1 2 2 2 A
1 2 2 2 A
2 1 2 2 B
1 2 2 2 A
1 1 2 2 D
In the table above, E2 is the variable I need; the others are the ones I already have in the dataframe. None of the values are numeric, and I would prefer E2 to be non-numeric as well, so I use the following values:
- A: Spanish nationality
- B: foreign nationality
- C: dual nationality
- D: Does not know/does not answer
Lastly, the code I have tried so far simply assigns a value (from 1 to 4) to each "yes" in the dataframe variables. My sample has 57068 observations, so I first create the variable E2 and fill it with a placeholder value (5); the rest of the variables are in the dataframe called DATA1.
E2<-rep(5,times=57068)
for(i in 1:nrow(DATA1)){
if (DATA1$E2_1a=="1"){E2[i,]=1}
else (DATA1$E2_1b=="1"){E2[i,]=2}
else (DATA1$E2_1c=="1"){E2[i,]=3}
else (DATA1$E2_1d=="1"){E2[i,]=4}
}
Fixing this snippet of my code would be enough for me to continue to answer the main question on my own.
You do not need to iterate over the data. Base R has ifelse(), a vectorized function that applies a conditional to an entire vector.
Another very compact option, if you already use dplyr, is case_when(), in this case combined with with() to avoid repeatedly writing the name of the data.frame.
On the other hand, I recommend not using the values c(1,2) to indicate TRUE / FALSE; it is preferable to use c(1,0), since they naturally translate to TRUE / FALSE.
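As a minimal sketch of the ifelse() approach, assuming the four columns are stored as text with "1" for yes and "2" for no: the toy DATA1 below and the helper names spanish, foreign and dk_na are made up for illustration and are not part of the survey file.

```r
# Toy data standing in for the real DATA1 (assumption: answers stored as text).
DATA1 <- data.frame(
  E2_1a = c("1", "2", "1", "2", "2", "1", "2"),
  E2_1b = c("2", "1", "1", "2", "2", "2", "2"),
  E2_1c = c("2", "2", "2", "1", "2", "1", "2"),
  E2_1d = c("2", "2", "2", "2", "1", "2", "2"),
  stringsAsFactors = FALSE
)

# Logical helpers (hypothetical names): TRUE where the answer was "yes".
spanish <- DATA1$E2_1a == "1"
foreign <- DATA1$E2_1b == "1"
dk_na   <- DATA1$E2_1c == "1" | DATA1$E2_1d == "1"

# Vectorized recoding over all rows at once; NA marks rows to be dropped:
# a nationality combined with "don't know / no answer", or "no" to everything.
DATA1$E2 <- ifelse((spanish | foreign) & dk_na, NA_character_,
            ifelse(spanish & foreign, "C",
            ifelse(spanish, "A",
            ifelse(foreign, "B",
            ifelse(dk_na, "D", NA_character_)))))

# Delete the illogical and all-negative observations.
DATA1 <- DATA1[!is.na(DATA1$E2), ]
```

For comparison, the loop in the question fails for three reasons: each if tests the whole column instead of row i, else cannot take a condition of its own (it would have to be else if), and E2[i,] indexes a plain vector as if it were a matrix. Note that in this sketch any row with a missing answer in the columns that decide its category also ends up as NA and is dropped, which may or may not be what you want for the real data.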