I am writing a program in Python, but when I load the dates from a CSV file with pandas, some of them are parsed incorrectly: the day and the month get swapped. In the cases below, every date should be in month 06. Sometimes 06 is kept as the month, but other times it is taken as the day, which changes the month, for example in rows 19 to 23, 27 and 28:
......
19 2017-03-06 01:10:00
20 2017-03-06 01:10:00
21 2017-03-06 17:44:00
22 2017-03-06 17:44:00
23 2017-04-06 04:12:00
24 2017-06-06 04:21:00
25 2017-06-06 04:21:00
26 2017-06-06 15:37:00
27 2017-09-06 18:43:00
28 2017-09-06 18:43:00
29 2017-09-06 21:59:00
....
I load the file like this and create a list of dates:
import pandas as pd
df = pd.read_csv('PORFA.csv', header=0, sep=';')
Date2 = pd.to_datetime(df["FDE Date"])  # Create list of dates
But when I print Date2, I get the output shown above.
Here is the database file I am reading:
https://drive.google.com/file/d/0B11sJdX_AaJBRllVVjdkenE3RzA/view?usp=sharing
The problem is that, by default, pandas parses an ambiguous date assuming that the month comes before the day. When that produces an impossible date (for example, a "month" of 13), it silently retries with the first field as the day. That is why you get a mix of correct and swapped dates instead of an exception.
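A minimal sketch of that fallback, using two hypothetical strings that are both meant as day/month/year (June 3 and June 13, 2017). With the default `dayfirst=False`, the first one is read month-first because 03 is a valid month; the second one only parses because 13 cannot be a month, so the fields are swapped. Recent pandas versions emit a `UserWarning` in that second case, which is silenced here.

```python
import warnings

import pandas as pd

# Hypothetical values, both written day/month/year (June 3 and June 13, 2017).
ambiguous = "03/06/2017 01:10"
unambiguous = "13/06/2017 17:44"

with warnings.catch_warnings():
    # Recent pandas versions warn when they fall back to day-first parsing.
    warnings.simplefilter("ignore")
    a = pd.to_datetime(ambiguous)    # read month-first: March 6
    b = pd.to_datetime(unambiguous)  # 13 is not a valid month, so day-first: June 13

print(a.month, b.month)  # 3 6
```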
Your dates are in dd-mm-yyyy format, so you must tell pandas to treat the first field as the day rather than the month. To do this, use the dayfirst argument.
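A short sketch of that call, assuming a hypothetical sample of the "FDE Date" column written as dd/mm/yyyy HH:MM (the real file's exact layout may differ). With `dayfirst=True`, every value is read with the day first, so all of them land in June.

```python
import pandas as pd

# Hypothetical sample of the "FDE Date" column, written dd/mm/yyyy HH:MM (all in June).
df = pd.DataFrame({"FDE Date": ["03/06/2017 01:10", "04/06/2017 04:12", "09/06/2017 18:43"]})

# dayfirst=True makes the parser read the first field as the day.
date2 = pd.to_datetime(df["FDE Date"], dayfirst=True)

print(date2.dt.month.tolist())  # [6, 6, 6]
print(date2.dt.day.tolist())    # [3, 4, 9]
```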
This should not be a problem as long as you are sure your dates are valid and all in dd-mm-yyyy format, because, as the documentation warns, dayfirst=True is not strict: it only makes the parser prefer reading the day first, without guaranteeing it.
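If you cannot be sure every value follows the same layout, a stricter alternative (not part of the original answer) is to pass an explicit `format`. The sketch below assumes hypothetical dd/mm/yyyy HH:MM strings; use `%y` instead of `%Y` if the file has two-digit years. A value that does not fit raises `ValueError` instead of being reinterpreted, and `errors="coerce"` turns such values into `NaT` so they can be inspected.

```python
import pandas as pd

# Hypothetical column: the last value is NOT day/month/year (13 cannot be a month).
dates = pd.Series(["03/06/2017 01:10", "13/06/2017 17:44", "06/13/2017 21:59"])

# An explicit format is strict: a non-matching value raises instead of being swapped.
try:
    pd.to_datetime(dates, format="%d/%m/%Y %H:%M")
    raised = False
except ValueError:
    raised = True

# errors="coerce" replaces non-matching values with NaT instead of raising.
coerced = pd.to_datetime(dates, format="%d/%m/%Y %H:%M", errors="coerce")

print(raised)                   # True
print(coerced.isna().tolist())  # [False, False, True]
```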
You can also parse the columns correctly while loading the CSV. In that case, pass the names of the columns to be parsed as dates in parse_dates, together with dayfirst=True, which read_csv also accepts. If the dates were in a non-standard format, you would use the date_parser argument, passing a function that converts each string into a valid date. You can then sort the DataFrame by date using one of those columns.
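A minimal sketch of loading and sorting in one go. The CSV content is a hypothetical stand-in for PORFA.csv (same `;` separator and "FDE Date" column; the "Value" column is invented for illustration). Note that `date_parser` was deprecated in pandas 2.0 in favor of `date_format`, which takes a format string such as `"%d/%m/%Y %H:%M"`.

```python
import io

import pandas as pd

# Hypothetical stand-in for PORFA.csv: ';'-separated, dates written dd/mm/yyyy HH:MM.
csv_text = """FDE Date;Value
09/06/2017 18:43;3
03/06/2017 01:10;1
04/06/2017 04:12;2
"""

# parse_dates names the column(s) to convert; dayfirst=True reads the first field as the day.
df = pd.read_csv(io.StringIO(csv_text), sep=";", header=0,
                 parse_dates=["FDE Date"], dayfirst=True)

# Sort chronologically, modifying df in place.
df.sort_values("FDE Date", inplace=True)

print(df["FDE Date"].dt.day.tolist())  # [3, 4, 9]
```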
The problem was that my dates were in dd/mm/yy format, while the program I use for the data visualization assumes by default that the first value is the month, the second the day, and the third the year. Using inplace=True when loading the CSV file, I sort it as explained in this post.