I need to share a `data.frame` with a colleague, but I would like to "anonymize" the data somehow first. The requirements are:
- Nothing too advanced (I don't need to comply with any standard or regulation)
- Not reversible
- Only for strings (character columns)
- Simple and fast
Suppose the data look like this:
```r
df <- data.frame(nombre = c('Juan', 'Pedro'),
                 Edad   = c(34, 45),
                 dni    = c('12345678', '87654321'))
```
The idea is to apply the algorithm only to the name (`nombre`) and ID number (`dni`) columns.
I can think of a few alternatives:
1. Convert the strings to a factor and then use the integer index of each level
2. Replace each string with random characters of the same length
   Note: strictly speaking, this approach would be reversible if the random seed used to start the process were known.
3. Use a hashing routine
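A minimal sketch of the three alternatives in R, applied only to the `nombre` and `dni` columns. The function names (`anon_factor`, `anon_random`, `anon_hash`) and the salt value are hypothetical. Option 3 assumes the `digest` package is installed and is skipped otherwise; the salted-prefix scheme is one simple choice, not the only way to hash.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps the columns
# as character vectors on R versions older than 4.0 as well.
df <- data.frame(nombre = c('Juan', 'Pedro'),
                 Edad   = c(34, 45),
                 dni    = c('12345678', '87654321'),
                 stringsAsFactors = FALSE)

cols <- c("nombre", "dni")  # only these columns are anonymized

# Option 1 (hypothetical helper): replace each string with the integer code
# of its factor level. Equal strings get equal codes; codes follow the
# sorted order of the original values.
anon_factor <- function(x) as.integer(factor(x))

# Option 2 (hypothetical helper): replace each string with random letters
# and digits of the same length. Equal strings get different replacements.
anon_random <- function(x) {
  pool <- c(letters, LETTERS, 0:9)  # coerced to a character pool
  vapply(nchar(x), function(n) {
    paste(sample(pool, n, replace = TRUE), collapse = "")
  }, character(1))
}

# Option 3 (hypothetical helper): salted SHA-256 hash via the 'digest'
# package (assumed installed). The salt should stay secret.
anon_hash <- function(x, salt) {
  vapply(x, function(s) {
    digest::digest(paste0(salt, s), algo = "sha256", serialize = FALSE)
  }, character(1), USE.NAMES = FALSE)
}

out1 <- df
out1[cols] <- lapply(out1[cols], anon_factor)

out2 <- df
out2[cols] <- lapply(out2[cols], anon_random)

print(out1)
print(out2)

if (requireNamespace("digest", quietly = TRUE)) {
  out3 <- df
  # "change-me" is a placeholder salt, not a recommended value
  out3[cols] <- lapply(out3[cols], anon_hash, salt = "change-me")
  print(out3)
}
```

Options 1 and 3 map equal inputs to equal outputs, so rows can still be linked (for example, the same person appearing twice); option 2 does not preserve that. Factor codes also leak the sort order of the original values. For hashing, keeping the salt secret matters: an 8-digit `dni` has only 10^8 possible values, so an unsalted hash could be reversed by hashing every candidate.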
Source: Data anonymization in R