Hi, I'm new to Python and I can't fix this error. Let me explain.
I have this code:
import csv
import pandas as pd
import numpy as np
encabezado=['RFC','EMP','COMPROBANTE','TIPO','CPT','IMPORTE','ANIO','QNA','PTDA','C1','C2','PRDNAME'
,'COL','C3']
file=pd.read_csv('TRA.csv',low_memory=False,sep=",",names=encabezado)
df = pd.DataFrame(file)
conceptos= df['TIPO'].map(str)+df['CPT'].map(str)
df.loc[:,'COLUMNAS'] = conceptos;
print(df)
df.to_csv('TRA1321_2.csv',sep=',')
My encabezado (header) variable contains the column names of my DataFrame. I later concatenate two of those columns and write the result to a new file. The problem is that the leading zeros in some values are not preserved, and the integer values are even converted to float. Here is what I mean:
My original csv file:
SARS751009J27,2000003369,701457548,1,37, 6299.99,2021,13,TP,,,PRDE130,000001,
SARS751009J27,2000003369,701457548,2,01, 1430.7,2021,13,TP,,,PRDE130,000001,
OEGC8105169P5,2000503934,701457549,1,30, 558.4,2021,13,BR,,,PRDE130,000002,
OEGC8105169P5,2000503934,701457549,2,01, 119.26,2021,13,00,,,PRDE130,000002,
The file that this script generates:
,RFC,EMP,COMPROBANTE,TIPO,CPT,IMPORTE,ANIO,QNA,PTDA,C1,C2,PRDNAME,COL,C3,COLUMNAS
0,SARS751009J27,2000003369.0,701457548.0,1.0,37,6299.99,2021.0,13.0,TP,,,PRDE130,1.0,,1.037
1,SARS751009J27,2000003369.0,701457548.0,2.0,01,1430.7,2021.0,13.0,TP,,,PRDE130,1.0,,2.001
2,OEGC8105169P5,2000503934.0,701457549.0,1.0,30,558.4,2021.0,13.0,BR,,,PRDE130,2.0,,1.030
3,OEGC8105169P5,2000503934.0,701457549.0,2.0,01,119.26,2021.0,13.0,00,,,PRDE130,2.0,,2.001
It also numbers my rows, adding an index column to the whole file.
Can you help me solve this? I can't figure out how to do it. Thanks.
I solved my problem. The issue was that I was not handling the empty (NaN) values, and that is why the values were being changed.
I realized this because when I removed the empty columns manually, the file was read correctly.
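To illustrate why empty fields change the values: pandas represents a missing value as NaN, which is a float, so a numeric column with even one empty field is read as float. The following is only a minimal sketch on a made-up two-row sample (the data and the column subset are assumptions, not my real file):

```python
import io
import pandas as pd

# Hypothetical sample: the second row has an empty TIPO field.
sample = "AAA,1\nBBB,\n"
cols = ["RFC", "TIPO"]

df_default = pd.read_csv(io.StringIO(sample), names=cols)

# The empty field is parsed as NaN, which forces TIPO to a float dtype.
print(df_default["TIPO"].dtype)

# Converting to str afterwards keeps the ".0", which is how a
# concatenated value such as "1.037" can appear.
print(df_default["TIPO"].map(str).tolist())
```

With no empty field in TIPO, the same column would be read as integers and `map(str)` would give "1" instead of "1.0".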
So I read the pandas documentation and found the following which I roughly translate here:
keep_default_na: bool, default True. Whether or not to include the default NaN values when parsing the data. Depending on whether na_values is passed, the behavior is as follows:
If keep_default_na is True and na_values are specified, na_values is appended to the default NaN values used for parsing.
If keep_default_na is True and na_values are not specified, only the default NaN values are used for parsing.
If keep_default_na is False and na_values are specified, only the NaN values specified in na_values are used for parsing.
If keep_default_na is False and na_values are not specified, no strings will be parsed as NaN.
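The four cases above can be checked with a small sketch. The sample data, the "missing" marker and the na_mask helper are hypothetical, chosen only for illustration; keep_default_na and na_values are the real read_csv parameters.

```python
import io
import pandas as pd

# Hypothetical sample: column v holds "NA", an empty field,
# the custom marker "missing", and an ordinary value.
sample = "k,v\n1,NA\n2,\n3,missing\n4,x\n"


def na_mask(**kwargs):
    """Read the sample and report which rows of v were parsed as NaN."""
    df = pd.read_csv(io.StringIO(sample), **kwargs)
    return df["v"].isna().tolist()


# True + na_values: default markers plus "missing" become NaN.
case_1 = na_mask(keep_default_na=True, na_values=["missing"])
# True, no na_values: only the default markers ("NA", "") become NaN.
case_2 = na_mask(keep_default_na=True)
# False + na_values: only "missing" becomes NaN.
case_3 = na_mask(keep_default_na=False, na_values=["missing"])
# False, no na_values: nothing becomes NaN.
case_4 = na_mask(keep_default_na=False)

for name, mask in [("case_1", case_1), ("case_2", case_2),
                   ("case_3", case_3), ("case_4", case_4)]:
    print(name, mask)
```

Note that in the last two cases the empty field is kept as an empty string rather than NaN, so a column that contains one is no longer read as float.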
Here I leave the link so you can give it a read if you like:
pandas.read_csv
So all I did was modify and/or add this line to take NaNs into account when parsing the data: