The Python package Patsy lets you define a formula with a dependent variable and one or more independent variables. The variable to the left of '~'
is the dependent variable, and the variables to the right of it are the independent variables. Variables enclosed in C() are treated as categorical variables. From the DataFrame loaded from "titanic_3.csv",
I need to create two sets: a training set and a testing set. The file is available in the Files folder of this link.
When I run this script:
import os
import pandas as pd
from patsy import dmatrices
# Change directory
os.chdir(r"G:\Py_machine_learning\Ficheros")  # raw string so the backslashes are taken literally
df = pd.read_csv("titanic_3.csv")
# Use axis=1 to drop the columns with the following labels
df = df.drop(["ticket","cabin","name", "Unnamed: 14", "Unnamed: 15", "Unnamed: 16",
"Unnamed: 17", "Unnamed: 18", "Unnamed: 19"], axis=1)
# Drop every row that contains a NaN value
df = df.dropna()
formula = "survived ~ C(pclass) + C(sex) + age + sibsp + C(embarked) + parch"
# Split the rows by position: the first 600 for training, the rest for testing
df_train = df.iloc[ 0: 600, : ]
df_test = df.iloc[ 600: , : ]
# Split the data into dependent and independent variables.
# Build the training and testing design matrices
y_train,x_train = dmatrices(formula, data=df_train, return_type = 'dataframe')
y_test,x_test = dmatrices(formula, data=df_test, return_type = 'dataframe')
I get the error:
ValueError: negative dimensions are not allowed
What is the cause of this error?
After calling df.dropna(), the resulting DataFrame has only 142 rows in this example, so when you select from row 600 onwards for df_test you end up with an empty DataFrame, hence the error.
It is true that the message is not intuitive, since it mentions a negative dimension that appears nowhere in your code. This happens because patsy internally tries to build an identity matrix (ones on the diagonal and zeros elsewhere) of dimension len(levels)-1 by calling np.eye(). Since len(levels) is zero in this case, the dimension comes out negative, and that raises the exception in np.eye().
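As a minimal sketch of both points, the snippet below first reproduces the NumPy error on its own and then splits by proportion instead of by a fixed row number. The helper split_train_test, the 0.8 fraction and the stand-in DataFrame are assumptions made for illustration. They are not part of your script or of patsy.

```python
import numpy as np
import pandas as pd

# 1) The underlying failure: np.eye() rejects a negative size,
#    which is what len(levels) - 1 becomes when there are no levels.
try:
    np.eye(-1)
    message = None
except ValueError as exc:
    message = str(exc)
print(message)

# 2) Hypothetical stand-in for the cleaned data: 142 rows, as in the example.
df = pd.DataFrame({"survived": [0, 1] * 71, "age": range(142)})

# Slicing past the end does not fail, it silently returns an empty frame.
print(df.iloc[600:].empty)


def split_train_test(frame, train_fraction=0.8):
    """Split by proportion and fail early if either part is empty.

    Hypothetical helper, shown only to illustrate the check.
    """
    cut = int(len(frame) * train_fraction)
    train, test = frame.iloc[:cut], frame.iloc[cut:]
    if train.empty or test.empty:
        raise ValueError(
            f"Cannot split {len(frame)} rows with train_fraction={train_fraction}"
        )
    return train, test


df_train, df_test = split_train_test(df)
print(len(df), len(df_train), len(df_test))
```

Printing len(df) right after df.dropna() in your own script would have shown the problem immediately. Passing the two non-empty parts to dmatrices should then avoid the empty-levels case, assuming every categorical column still has values in each part.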