I created a function that joins data frames using rbind().
The function is as follows (you can run it, since it generates random data):
juntador <- function(x) {
  for (i in 1:x) {
    assign(paste0("data_", i), data.frame(var1 = sample(1:10), var2 = sample(1:10)))
  }
  lista <- lapply(ls(pattern = "^data"), get)
  junte <- do.call(rbind, lista)
  return(junte)
}
The function takes a number as its argument, creates that many data frames, and then joins them.
For example, if I pass 3, it should join the 3 data frames that it just created.
juntador(3)
# NULL
But it returns NULL. I presume this is because, when lapply() runs, the data frames were not saved in the environment, so I specified the environment on each of the lines:
juntador <- function(x) {
  for (i in 1:x) {
    assign(paste0("data_", i), data.frame(var1 = sample(1:10), var2 = sample(1:10)),
           envir = globalenv())
  }
  lista <- lapply(ls(pattern = "^data"), function(x) { get(x, envir = globalenv()) })
  junte <- do.call(rbind, lista, envir = globalenv())
  return(junte)
}
But I still can't get what I want: it keeps returning NULL.
Why does this happen? What do I have to correct?
Indeed, it is a matter of environments. Two of them are important here, because they determine the order in which get() searches for symbols (for example, variables): the "calling environment", from which the function is called, and the "enclosing environment", where the function was defined.
With its default arguments, get() first searches for the variable in the environment from which it is called and, if it is not found there, continues through the enclosing environments.
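The lookup order described above can be illustrated with a minimal sketch. The function names `direct` and `through_lapply` and the variable `only_local_value` are hypothetical and exist only for this illustration; the second function catches the lookup error so that the script keeps running.

```r
# Hypothetical helpers for illustration; get(), lapply() and tryCatch() are base R.
direct <- function() {
  only_local_value <- 42
  # get() is called from this function's frame, so the local object is found.
  get("only_local_value")
}

through_lapply <- function() {
  only_local_value <- 42
  # get() is now called from inside lapply(), so its default lookup no longer
  # starts in this function's frame; the resulting error is caught here.
  tryCatch(
    lapply("only_local_value", get)[[1]],
    error = function(e) "not found"
  )
}

direct()          # 42
through_lapply()  # "not found", assuming no global object has that name
```

The only difference between the two functions is who calls get(), which is enough to change where the search starts.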
When you call get() directly, the "calling environment" is that of your function, where the variables are defined, so get() obviously has no problem finding them. When you call it through lapply(), however, the "calling environment" is that of lapply() (it is inside lapply() that get() is called), and the variable you are searching for is not defined there. When the variable is not found, it is searched for in the "enclosing environment", which in the case of get() leads to the global environment, where it is not defined either.

In fact, as a demonstration, if you define data_1 outside your function you probably won't get the error, although that is clearly not what you're looking for. With an object data_1 created in the global environment, when the search fails in the calling environment the symbol is then searched for, and found, in the global space.

In the case of get(), what you can do, because the function supports it, is indicate the environment in which you want it to search:

lista <- lapply(ls(pattern = "^data"), get, envir = environment())

The third argument is passed directly to get(), and basically tells it to search for the variables in a specific environment, in this case that of your function.

For more information I recommend Advanced R by Hadley Wickham, in particular the chapter "Environments".
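Putting the suggestion into the question's function gives something like the sketch below. It assumes that capturing the function's environment once in a variable (called `here`, a name chosen only for this example) and passing it to both ls() and get() is acceptable. The second function, `juntador_lista`, is a hypothetical alternative that is not part of the original answer: it builds the list directly and avoids assign() and get() altogether.

```r
# Sketch of the corrected function, based on the code from the question.
juntador <- function(x) {
  for (i in seq_len(x)) {
    assign(paste0("data_", i),
           data.frame(var1 = sample(1:10), var2 = sample(1:10)))
  }
  # Capture this function's environment once and use it for ls() and get().
  here <- environment()
  lista <- lapply(ls(pattern = "^data_", envir = here), get, envir = here)
  do.call(rbind, lista)
}

# Assumed alternative: create the data frames directly inside a list.
juntador_lista <- function(x) {
  lista <- lapply(seq_len(x), function(i) {
    data.frame(var1 = sample(1:10), var2 = sample(1:10))
  })
  do.call(rbind, lista)
}

dim(juntador(3))        # 30 rows and 2 columns
dim(juntador_lista(3))  # same dimensions
```

The list-based version may be easier to maintain, because it never depends on which environment holds the intermediate objects.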