I have a database of this type (here is a minimal sample containing data from two simulations for each Round, Level and Condition):
Edited data:
df <- data.frame(
  Sim = c(1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2),
  Ronda = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2),
  Condicion = c('A1','A1','A2','A2','A1','A1','A2','A2','B1','B1','B2','B2','B1','B1','B2','B2',
                'A1','A1','A2','A2','A1','A1','A2','A2','B1','B1','B2','B2','B1','B1','B2','B2'),
  Nivel = c(1,1,1,1,2,2,2,2,1,1,1,1,2,2,2,2,1,1,1,1,2,2,2,2,1,1,1,1,2,2,2,2),
  Salida = c(3,2.5,2.1,1.9,2.8,2.3,2.0,1.6,2.6,2.7,1.3,1.2,2.4,2.3,1,1.1,
             2,1.3,1.3,0.9,2,2.1,2.1,1.2,2,1.7,1.2,1,2,1.3,0.5,0.4)
)
I would now like to manipulate this database to obtain the relative reduction that the "Salida" (Output) corresponding to A2 produces with respect to A1, i.e. 1 - (A2/A1), and that B2 produces with respect to B1, i.e. 1 - (B2/B1), for each Simulation, Round and Level. In other words, I am looking for the reduction percentage that n2 produces in n1.
So, what I want to generate is something like this:
Ronda Condición Nivel Resultado
1 1-(A2/A1) 1 0.3
1 1-(A2/A1) 1 0.24
...
1 1-(B2/B1) 2 0.5
1 1-(B2/B1) 2 0.56
...
2 1-(A2/A1) 1
2 1-(A2/A1) 1
...
2 1-(B2/B1) 1
2 1-(B2/B1) 1
...
My attempts so far have been aimed at subsetting the data with `subset` and trying to do the calculations with `tapply`. I would appreciate any help.
Second edit:
Here is an example of the real data. The goal is the same. In this case, Condition is a factor with 6 levels, so the aim is to create a new data frame containing Simulation, Round, Cofactor and two new columns: "Result" and "New Condition" (a condition with 3 levels). For example:
A = 1 - (Heterogeneity OTA / Homogeneity OTA)
B = 1 - (Heterogeneity C / Homogeneity C)
C = 1 - (Heterogeneity PR / Homogeneity PR)
This would allow you to easily graph the data later.
One solution uses `dplyr` and `tidyr`. The idea is to move to wide format, so that there is one column for each Condition (A1, A2, B1, B2); then the division and subtraction are easy to do. The problem I found is that there was no identifier for the simulations: in your example data, each Round, Level and Condition group was left with two rows.
I assumed each row corresponded to a different simulation and gave them a unique identifier. If in your real data you have a simulation/patient/whatever identifier, you should use it; the idea is that each group is unique.
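A minimal sketch of that idea (my reconstruction, not this answer's original code), using the edited example data from the question and taking the `Sim` column as the simulation identifier:

```r
library(dplyr)
library(tidyr)

# example data from the question (compact but equivalent construction)
df <- data.frame(
  Sim = rep(1:2, 16),
  Ronda = rep(1:2, each = 16),
  Condicion = rep(rep(c("A1","A2","A1","A2","B1","B2","B1","B2"), each = 2), 2),
  Nivel = rep(rep(1:2, each = 4), 4),
  Salida = c(3,2.5,2.1,1.9,2.8,2.3,2.0,1.6,2.6,2.7,1.3,1.2,2.4,2.3,1,1.1,
             2,1.3,1.3,0.9,2,2.1,2.1,1.2,2,1.7,1.2,1,2,1.3,0.5,0.4)
)

resultado <- df %>%
  mutate(grupo = substr(Condicion, 1, 1),               # A or B
         n = paste0("n", substr(Condicion, 2, 2))) %>%  # n1 or n2
  select(-Condicion) %>%
  spread(n, Salida) %>%          # wide: one column per sub-condition
  mutate(Resultado = 1 - n2 / n1)
```

`spread()` keeps `Sim`, `Ronda`, `Nivel` and the new `grupo` column as the row identifier, so each of the 16 groups ends up on one row with an `n1` and an `n2` column, and the reduction is a single `mutate()`.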
If your data is very clean and you can trust the alphabetical order of `Condición`, the following alternative would work for an arbitrary number of groups of `Condición`, although you should be careful with the separator on the 5th line, which is completely ad hoc.
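A sketch of that alternative with the same example data (my reconstruction, not the original code; here the ad hoc separator is the `sep = 1` in `separate()`, an assumption that splits `Condición` after its first character):

```r
library(dplyr)
library(tidyr)

# example data from the question (compact but equivalent construction)
df <- data.frame(
  Sim = rep(1:2, 16),
  Ronda = rep(1:2, each = 16),
  Condicion = rep(rep(c("A1","A2","A1","A2","B1","B2","B1","B2"), each = 2), 2),
  Nivel = rep(rep(1:2, each = 4), 4),
  Salida = c(3,2.5,2.1,1.9,2.8,2.3,2.0,1.6,2.6,2.7,1.3,1.2,2.4,2.3,1,1.1,
             2,1.3,1.3,0.9,2,2.1,2.1,1.2,2,1.7,1.2,1,2,1.3,0.5,0.4)
)

reducciones <- df %>%
  separate(Condicion, into = c("grupo", "sub"), sep = 1) %>%  # "A1" -> "A", "1"
  group_by(Sim, Ronda, Nivel, grupo) %>%
  arrange(sub, .by_group = TRUE) %>%                # relies on alphabetical order
  mutate(Resultado = 1 - Salida / lag(Salida)) %>%  # n2 relative to n1
  ungroup() %>%
  filter(!is.na(Resultado))                         # keep only the n2 rows
```

Because `lag()` is applied within each group after sorting, this works for any number of sub-conditions per group, but only as long as the sort order really puts the baseline first.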
Answer with real data
Diagnosis
Reviewing the data, I see that the rows are octuplicated. Perhaps the data you uploaded is missing a column that separates those groups of 8 but, as it stands, that information is redundant. In the answer I use `distinct()` to remove the repeated rows; otherwise `spread()` will fail (and rightly so).

Solution
With real data it is difficult to use the `x/lag(x)` approach, because it depends on alphabetical or numerical order, and in this case there is no "natural" order, as there would be when working with dates. Considering that you already know the calculations you want to do, I think the best solution is to:

1. go to wide data: one column for each level of the Condition factor;
2. use a `mutate()` that performs the calculations on those columns;
3. since the result will be "wide", go back to long format with `gather()`.
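Since the real data is not included here, the following is only a sketch with a hypothetical stand-in (the column names `sim`, `ronda`, `cofactor`, `condicion`, `salida` and all values are assumptions); the pipeline follows the steps just described, with the `mutate()` on line 5:

```r
library(dplyr)
library(tidyr)
library(janitor)

# hypothetical stand-in for the real data (names and values assumed)
datos <- data.frame(
  sim = rep(1:2, each = 6),
  ronda = 1,
  cofactor = 1,
  condicion = rep(c("Heterogeneity OTA", "Homogeneity OTA",
                    "Heterogeneity C",  "Homogeneity C",
                    "Heterogeneity PR", "Homogeneity PR"), 2),
  salida = c(2, 4, 1, 2, 3, 4, 1, 4, 2, 8, 1, 5)
)

resultado <- datos %>%
  distinct() %>%                 # drop repeated rows, or spread() fails
  spread(condicion, salida) %>%  # one column per Condition level
  clean_names() %>%              # "Heterogeneity OTA" -> heterogeneity_ota
  mutate(a = 1 - heterogeneity_ota / homogeneity_ota,  # line 5: the calculations
         b = 1 - heterogeneity_c / homogeneity_c,
         c = 1 - heterogeneity_pr / homogeneity_pr) %>%
  select(sim, ronda, cofactor, a, b, c) %>%
  gather("nueva_condicion", "resultado", a, b, c)  # back to long
```

Note that `clean_names()` is applied after `spread()`, because it is the spread step that turns the Condition levels (which contain spaces) into column names.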
I use the `janitor::clean_names()` function to avoid having to wrap names with spaces in backticks. It is optional, but if you don't use it you should correct the column names in the `mutate()` on line 5.

I think the result is what you are looking for; however, reviewing the results with `tail()`, some `-Inf` (negative infinities) appear, probably because there are divisions by 0 and R treats them as infinite numbers. With `filter(nueva_condicion == -Inf)` you can see that there are 2303 potentially problematic rows, which matters especially if you plan to graph later.

First of all, I understand the same as @mpaladino: there is duplicated information, with 8 identical rows per observation. First we remove these duplicates (technically it is not necessary), but it is important to regenerate an `id` for each group of observations, as follows.

Now, one way to solve it is to think about the problem from the perspective of relational databases. The idea is to start from a master table that records the existing relationships. Since there are three relationships, it is not difficult to define them manually:
With this "master table" we only need to set up `join`s so that each observation is brought together in a single row, and then simply do the calculations. This is quite handy because, if we eventually display all the variables, we can check the result manually.
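Putting the pieces together, here is a sketch of this approach. The real data is not available, so the data frame, its column names (`Sim`, `Ronda`, `Cofactor`, `Condicion`, `Salida`) and its values are all hypothetical; only the master-table-plus-joins mechanics are the point:

```r
library(dplyr)
library(tibble)

# hypothetical stand-in for the real data, with each row octuplicated
base <- expand.grid(
  Sim = 1:2, Ronda = 1:2, Cofactor = 1:2,
  Condicion = c("Heterogeneity OTA", "Homogeneity OTA",
                "Heterogeneity C",  "Homogeneity C",
                "Heterogeneity PR", "Homogeneity PR"),
  stringsAsFactors = FALSE
)
set.seed(1)
base$Salida <- round(runif(nrow(base), 1, 3), 2)
datos <- base[rep(seq_len(nrow(base)), each = 8), ]

# remove the octuplicates; with a real simulation id you would instead
# regenerate it here, e.g. group_by(...) %>% mutate(id = row_number())
datos <- distinct(datos)

# master table: each new condition with its numerator and denominator
maestra <- tribble(
  ~nueva_condicion, ~numerador,          ~denominador,
  "A",              "Heterogeneity OTA", "Homogeneity OTA",
  "B",              "Heterogeneity C",   "Homogeneity C",
  "C",              "Heterogeneity PR",  "Homogeneity PR"
)

# two joins bring the numerator and denominator of each observation
# together on one row; the calculation is then a plain mutate()
resultado <- maestra %>%
  inner_join(datos, by = c("numerador" = "Condicion")) %>%
  inner_join(datos,
             by = c("Sim", "Ronda", "Cofactor", "denominador" = "Condicion"),
             suffix = c("_num", "_den")) %>%
  mutate(resultado = 1 - Salida_num / Salida_den)
```

Because all the intermediate columns (`numerador`, `denominador`, `Salida_num`, `Salida_den`) are still present in `resultado`, each computed value can be checked by hand before dropping them.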