I have a dataframe with a confusion matrix (5x5) with the following data:
I would like to convert this 5x5 matrix into five 2x2 confusion matrices, one for each of the letters (a, e, i, o, u). For example, for the letter "a", position [1,1] would hold the number of times that both the prediction and the result were "a" (correct). Position [2,1] would hold the times that the result is not "a" but the program predicted "a" (error). Position [1,2] would hold the times that the result is "a" but the program did not recognize it as "a" (error). Position [2,2] would hold the times that neither the prediction nor the result was "a", that is, all the remaining cases.
Something like what you see in the attached image.
To get the confusion matrix in the first image, I wrote this code:
import os
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sn
from sklearn.metrics import confusion_matrix
working_path = os.getcwd()  # Gets the folder (path) we are working in; files in that folder can then be referenced by name alone
df = pd.read_csv("salida.txt", delimiter="\t")  # Build a DataFrame from the tab-separated text file
df.rename(columns={'Number of Syllables': 'NSyllables'}, inplace=True)  # Shorten the name of the column that holds the number of syllables
confusion_matrixV = pd.crosstab(df['TargetV'], df['RespV'], rownames=['Target'], colnames=['Response'], margins=True)  # Confusion matrix for VOWELS
I don't know how to create the 2x2 matrices from this dataframe. I assume it could be done with a for loop that starts like this, but I don't know how to continue:
for index, row in confusion_matrixV.iterrows():
First I'm going to create an array like the one you show:
Assuming your crosstab has exactly that structure, for each letter we can define:
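As a minimal sketch of those per-letter definitions, assuming the crosstab was built with `margins=True` (so it has an "All" row and an "All" column) and that rows are targets and columns are responses: the hits are the diagonal cell, the misses are the rest of the letter's row, the false alarms are the rest of its column, and everything else is a correct rejection. The sample data and the helper name `one_vs_rest_counts` are hypothetical, invented only so the example runs.

```python
import pandas as pd

# Hypothetical sample data (NOT the data from the question).
df = pd.DataFrame({
    "TargetV": list("aaeeiioouu"),
    "RespV":   list("aeeeioouua"),
})
cm = pd.crosstab(df["TargetV"], df["RespV"],
                 rownames=["Target"], colnames=["Response"], margins=True)

def one_vs_rest_counts(cm, letter):
    """Return (TP, FN, FP, TN) for one letter from a crosstab with margins."""
    tp = cm.loc[letter, letter]              # target == letter, response == letter
    fn = cm.loc[letter, "All"] - tp          # target == letter, response != letter
    fp = cm.loc["All", letter] - tp          # target != letter, response == letter
    tn = cm.loc["All", "All"] - tp - fn - fp # neither target nor response is letter
    return int(tp), int(fn), int(fp), int(tn)

counts_a = one_vs_rest_counts(cm, "a")
print(counts_a)  # (1, 1, 1, 7) for the sample data above
```

In the layout described in the question, these values go in positions [1,1], [1,2], [2,1] and [2,2] respectively.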
Therefore, by vectorizing the operations, we can obtain an array for each of the previous values, with one entry per letter. Then just generate a DataFrame for each letter:
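A sketch of the vectorized version, under the same assumptions: the crosstab has an "All" row and column from `margins=True`, rows are targets, columns are responses, and every letter appears at least once as both a target and a response. The sample data and the "not x" labels are hypothetical choices made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (NOT the data from the question).
df = pd.DataFrame({
    "TargetV": list("aaeeiioouu"),
    "RespV":   list("aeeeioouua"),
})
cm = pd.crosstab(df["TargetV"], df["RespV"],
                 rownames=["Target"], colnames=["Response"], margins=True)

letters = [c for c in cm.index if c != "All"]

# One array per quantity, with one entry per letter.
tp = np.diag(cm.loc[letters, letters].to_numpy())  # diagonal: correct answers
fn = cm.loc[letters, "All"].to_numpy() - tp        # row total minus hits
fp = cm.loc["All", letters].to_numpy() - tp        # column total minus hits
tn = cm.loc["All", "All"] - tp - fn - fp           # everything else

# Build and print one 2x2 DataFrame per letter.
for i, letter in enumerate(letters):
    labels = [letter, f"not {letter}"]
    m = pd.DataFrame(
        [[tp[i], fn[i]],
         [fp[i], tn[i]]],
        index=pd.Index(labels, name="Target"),
        columns=pd.Index(labels, name="Response"),
    )
    print(m, end="\n\n")
```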
Instead of printing, you can put each DataFrame in a list or wherever you want.