
Question

Asked: 2020-04-25 16:44:32 +0800 CST

How do I convert a large confusion matrix into 2x2 matrices in Python, with both as DataFrames?


I have a DataFrame holding a 5x5 confusion matrix with the following data:

I would like to convert this 5x5 matrix into five 2x2 confusion matrices, one for each of the letters (a, e, i, o, u). For example, for the letter "a", position [1,1] would hold the number of times both the prediction and the actual result were "a" (correct). Position [2,1] would hold the times the result is not "a" but the program predicted "a" (error). Position [1,2] would hold the times the result is "a" but the program did not recognize it as "a" (error). Position [2,2] would hold the times neither the prediction nor the result was "a", that is, all the remaining cases.

Something like what you see in the attached image.

To build the confusion matrix in the first image, I wrote this code:

import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sn
from sklearn.metrics import confusion_matrix

working_path = os.getcwd()  # current working directory; files in this folder can be opened by name alone

df = pd.read_csv("salida.txt", delimiter="\t")  # build a DataFrame from the tab-separated text file

df.rename(columns={'Number of Syllables': 'NSyllables'}, inplace=True)  # shorten the name of the column holding the number of syllables

confusion_matrixV = pd.crosstab(df['TargetV'], df['RespV'], rownames=['Target'], colnames=['Response'], margins=True)  # confusion matrix for VOWELS


I don't know how to create the 2x2 matrices from this DataFrame. I assume it could be done with a for loop that starts like this, but I don't know how to continue:

for index, row in confusion_matrixV.iterrows():

1 Answer


FJSevilla · Answer 1 · 2020-04-26T03:21:11+08:00

First, I'll build a DataFrame like the one you show:

import pandas as pd

confusion_matrixV = pd.DataFrame(
    {"Targets": ("a", "e", "i", "o", "u"),
     "a": (158, 0, 0, 0, 0),
     "e": (0, 104, 0, 1, 0),
     "i": (0, 3, 87, 0, 0),
     "o": (0, 0, 0, 123, 2),
     "u": (0, 0, 0, 2, 44)}
)

confusion_matrixV.set_index("Targets", inplace=True)
confusion_matrixV.columns.name = "Response"

>>> confusion_matrixV

Response  a   e   i   o   u
Targets                   
      a   158 0   0   0   0
      e   0   104 3   0   0
      i   0   0   87  0   0
      o   0   1   0   123 2
      u   0   0   0   2   44
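One caveat before continuing: the crosstab in the question is built with margins=True, which appends an "All" row and an "All" column holding the totals. The steps that follow assume a square matrix containing only the five vowels, so those totals would have to be dropped first (or margins=True left out). A minimal sketch, using a small made-up sample in place of salida.txt; the data is hypothetical and only the column names TargetV and RespV come from the question:

```python
import pandas as pd

# Hypothetical stand-in for the data read from salida.txt
df = pd.DataFrame({
    "TargetV": ["a", "a", "e", "e", "i", "o", "u"],
    "RespV":   ["a", "a", "e", "i", "i", "o", "u"],
})

with_margins = pd.crosstab(
    df["TargetV"], df["RespV"],
    rownames=["Target"], colnames=["Response"], margins=True,
)

# Drop the "All" totals so that only the per-letter counts remain
confusion = with_margins.drop(index="All", columns="All")
print(confusion)
```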

Assuming your crosstab has exactly that structure, for each letter we can define:

True positives: the value of the cell on the main diagonal for that letter.
False positives: the sum of all the values in that letter's column, excluding the cell on the main diagonal.
True negatives: the sum of all the values in the matrix, excluding those in that letter's row and column.
False negatives: the sum of all the values in that letter's row, excluding the cell on the main diagonal.
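As a quick check of these definitions, the four values can be worked out for a single letter before vectorizing anything. The sketch below rebuilds the example matrix under the name cm and uses the letter "e"; the variable names (cm, letter, tp, fp, fn, tn) are only illustrative:

```python
import pandas as pd

# Same counts as the example matrix: rows are targets, columns are responses
cm = pd.DataFrame(
    [[158, 0, 0, 0, 0],
     [0, 104, 3, 0, 0],
     [0, 0, 87, 0, 0],
     [0, 1, 0, 123, 2],
     [0, 0, 0, 2, 44]],
    index=list("aeiou"), columns=list("aeiou"),
)

letter = "e"
tp = cm.loc[letter, letter]               # diagonal cell
fp = cm[letter].sum() - tp                # rest of the column
fn = cm.loc[letter].sum() - tp            # rest of the row
tn = cm.to_numpy().sum() - tp - fp - fn   # everything outside that row and column

print(tp, fp, fn, tn)  # 104 1 3 416
```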

Therefore, by vectorizing the operations, we can obtain each of the four quantities as an array with one value per letter. Then it is just a matter of building a DataFrame for each letter:

import numpy as np

# True positives: the main diagonal (one value per letter)
verdaderos_positivos = np.diag(confusion_matrixV)
# False positives: column totals minus the diagonal
falsos_positivos = confusion_matrixV.sum(axis=0) - verdaderos_positivos
# False negatives: row totals minus the diagonal
falsos_negativos = confusion_matrixV.sum(axis=1) - verdaderos_positivos
# True negatives: grand total minus everything in the letter's row and column
verdaderos_negativos = (
    confusion_matrixV.to_numpy().sum() -
    (verdaderos_positivos + falsos_positivos + falsos_negativos)
)

for dato, vp, fp, vn, fn in zip(
    confusion_matrixV.columns,
    verdaderos_positivos, falsos_positivos,
    verdaderos_negativos, falsos_negativos):
    # Rows are predicted values, columns are actual values
    frame = pd.DataFrame.from_dict(
        {"Positive(1)": (vp, fp), "Negative(0)": (fn, vn)},
        orient="index",
        columns=("Positive(1)", "Negative(0)")
    )
    frame.index.name = f'Predicted Values for "{dato}"'
    frame.columns.name = f'Actual values for "{dato}"'
    print(frame, "\n")

Actual values for "a"     Positive(1)  Negative(0)
Predicted Values for "a"                          
Positive(1)                       158            0
Negative(0)                         0          366 

Actual values for "e"     Positive(1)  Negative(0)
Predicted Values for "e"                          
Positive(1)                       104            1
Negative(0)                         3          416 

Actual values for "i"     Positive(1)  Negative(0)
Predicted Values for "i"                          
Positive(1)                        87            3
Negative(0)                         0          434 

Actual values for "o"     Positive(1)  Negative(0)
Predicted Values for "o"                          
Positive(1)                       123            2
Negative(0)                         3          396 

Actual values for "u"     Positive(1)  Negative(0)
Predicted Values for "u"                          
Positive(1)                        44            2
Negative(0)                         2          476

Instead of printing, you can put each DataFrame in a list or wherever you want.
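For instance, a dictionary keyed by letter keeps each 2x2 matrix easy to look up afterwards. This is only a sketch: the helper name matrices_2x2, the variable by_letter and the "Predicted"/"Actual" axis names are illustrative choices, and it assumes a square matrix with the same labels in rows and columns (no "All" margins).

```python
import numpy as np
import pandas as pd


def matrices_2x2(cm):
    """Return {label: 2x2 DataFrame} for a square confusion matrix
    whose rows are targets and whose columns are responses."""
    values = cm.to_numpy()
    tp = np.diag(values)
    fp = values.sum(axis=0) - tp        # column totals minus the diagonal
    fn = values.sum(axis=1) - tp        # row totals minus the diagonal
    tn = values.sum() - (tp + fp + fn)  # everything else

    labels = ["Positive(1)", "Negative(0)"]
    result = {}
    for i, label in enumerate(cm.columns):
        # Rows are predicted values, columns are actual values
        result[label] = pd.DataFrame(
            [[tp[i], fp[i]], [fn[i], tn[i]]],
            index=pd.Index(labels, name="Predicted"),
            columns=pd.Index(labels, name="Actual"),
        )
    return result


# Same counts as the example matrix
cm = pd.DataFrame(
    [[158, 0, 0, 0, 0],
     [0, 104, 3, 0, 0],
     [0, 0, 87, 0, 0],
     [0, 1, 0, 123, 2],
     [0, 0, 0, 2, 44]],
    index=list("aeiou"), columns=list("aeiou"),
)

by_letter = matrices_2x2(cm)
print(by_letter["o"])

# Sanity check: every 2x2 matrix adds up to the total number of observations
total = cm.to_numpy().sum()
print(all(m.to_numpy().sum() == total for m in by_letter.values()))
```

The sanity check at the end is a cheap way to catch a matrix that still contains margins or has mismatched row and column labels.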
