
Question

Patricio Moracho

Asked: 2022-09-04 14:42:22 +0800 CST

How to anonymize strings from a data.frame?


I need to share a data.frame with a colleague, but would like to somehow "anonymize" the data. The idea would be:

Nothing too advanced (I don't need to adhere to any standards or norms)
Not reversible
Only for strings
Simple and fast

Suppose some data like this:

df <- data.frame(nombre = c('Juan', 'Pedro'),
                 Edad = c(34, 45),
                 dni = c('12345678', '87654321'),
                 stringsAsFactors = FALSE)  # keep strings as character in R < 4.0 as well

The idea would be to apply the algorithm only to the name (nombre) and ID (dni) columns.

1 Answer


Patricio Moracho · Answer 1 · 2022-09-04T14:42:22+08:00

I can think of a few alternatives:

1. Transform the strings into a factor and then use the index of each `level`

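anonymize_str_vector <- function(x) {
  if( !is.character(x) ) stop('x is not a vector of characters')
  # factor() creates one level per distinct string; its integer code replaces the value
  as.character(as.integer(factor(x)))
}

anonymize_str_df <- function(df) {
  if( !is.data.frame(df) ) stop('df is not a data.frame')
  # logical vector marking the character columns
  convert <- vapply(df, is.character, logical(1))
  # df[convert] with lapply keeps working when only one column is character
  df[convert] <- lapply(df[convert], anonymize_str_vector)
  df
}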

anonymize_str_df(df)

  nombre Edad dni
1      1   34   1
2      2   45   2
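
One property of this approach worth keeping in mind: `factor()` sorts its levels, so the codes follow the sort order of the original values. If that ordering should not carry over, a minimal sketch of a variant is shown below. The function name `anonymize_shuffled` is hypothetical and not part of the answer; it assigns codes from a randomly shuffled set of the distinct values.

```r
# Hypothetical variant: codes come from a random permutation of the distinct values,
# so they no longer follow the sort order of the original strings.
anonymize_shuffled <- function(x) {
  if (!is.character(x)) stop('x is not a vector of characters')
  shuffled <- sample(unique(x))      # distinct values in random order
  as.character(match(x, shuffled))   # position in that order becomes the code
}

nombres <- c('Juan', 'Pedro', 'Juan', 'Ana')
codes <- anonymize_shuffled(nombres)
codes  # equal names share a code; which code each name gets varies from run to run
```

As with the factor version, equal strings still map to the same code, so the column remains usable for grouping or counting.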

2. Generate random characters of the size of each string

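anonymize_str_vector <- function(x) {
  if( !is.character(x) ) stop('x is not a vector of characters')

  # replace one string with random digits (if it parses as a number) or random letters
  anonymize_scalar <- function(s) {
    is_number <- suppressWarnings(!is.na(as.numeric(s)))
    sample_data <- if (is_number) 0:9 else letters
    paste0(sample(sample_data, nchar(s), replace = TRUE), collapse="")
  }
  # USE.NAMES = FALSE so the original strings are not kept as names of the result
  vapply(x, anonymize_scalar, character(1), USE.NAMES = FALSE)
}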

anonymize_str_df(df)

  nombre Edad      dni
1   wgon   34 36360598
2  cbval   45 18800229

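Clarification: technically speaking, this output is reproducible if the "seed" with which the process began is known. Note also that it preserves the length of each string and whether it is numeric, and that repeated values generally receive different replacements.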
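
To make the role of the seed concrete, here is a small sketch. The helper `random_like` is hypothetical: it is a simplified stand-in for the inner function above and uses a regular expression instead of `as.numeric()` to detect digit-only strings. It shows that fixing the seed with `set.seed()` reproduces the same replacement, and that under the same seed two different inputs of the same length and type get the same output.

```r
# Simplified, hypothetical stand-in for anonymize_scalar, repeated here so the
# example is self-contained.
random_like <- function(s) {
  pool <- if (grepl('^[0-9]+$', s)) 0:9 else letters
  paste0(sample(pool, nchar(s), replace = TRUE), collapse = '')
}

set.seed(42)
first <- random_like('Juan')

set.seed(42)
second <- random_like('Juan')

set.seed(42)
other <- random_like('Luis')

identical(first, second)  # TRUE: same seed, same replacement
identical(first, other)   # TRUE: the output depends on seed, length and type, not on the letters
```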

3. Use some "hashing" routine

Source: Data anonymization in R

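library(digest)

anonymize_str_vector <- function(x, algo="crc32") {
  if( !is.character(x) ) stop('x is not a vector of characters')
  # hash each distinct string once; the result is named by the original string
  unq_hashes <- vapply(unique(x), function(object) digest(object, algo=algo), FUN.VALUE="", USE.NAMES=TRUE)
  # look up every element by name, then drop the names so the originals are not kept
  unname(unq_hashes[x])
}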

anonymize_str_df(df)

    nombre Edad      dni
1 5c97bc37   34 2670dbda
2 e6473558   45 bc76be3a
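
The question asks for the algorithm to be applied only to the name and the ID, whereas `anonymize_str_df` as written converts every character column. A minimal sketch of a variant that takes the columns explicitly could look like the following. The name `anonymize_columns` and its `cols` and `fun` parameters are assumptions made for illustration, and `to_codes` is a stand-in for any of the three `anonymize_str_vector` versions.

```r
# Hypothetical helper: apply an anonymizing function only to the named columns.
anonymize_columns <- function(df, cols, fun) {
  if (!is.data.frame(df)) stop('df is not a data.frame')
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0) {
    stop('unknown columns: ', paste(missing_cols, collapse = ', '))
  }
  df[cols] <- lapply(df[cols], fun)  # all other columns are left untouched
  df
}

# Stand-in anonymizer (the factor approach from option 1).
to_codes <- function(x) as.character(as.integer(factor(x)))

df <- data.frame(nombre = c('Juan', 'Pedro'),
                 Edad = c(34, 45),
                 dni = c('12345678', '87654321'),
                 stringsAsFactors = FALSE)

result <- anonymize_columns(df, c('nombre', 'dni'), to_codes)
result
```

Passing the function as an argument keeps the column selection separate from the choice of anonymization method, so the same helper can be reused with the random or hashing versions.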
