If I have several DataFrames in Pandas, how can I join them?
For example, I create three DataFrames:
import pandas as pd
import numpy as np

df_1 = pd.DataFrame({"fruta": ["manzana", "pera", "platano", "naranja", "aguacate"],
                     "precio": [0.20, 0.45, 0.15, 0.12, 0.62]})
df_2 = pd.DataFrame({"stock": [10, 20, 25, 12, 40]})
df_3 = pd.DataFrame({"ventas_totales": [3, 5, 2, 3, 6],
                     "ingresos_ventas": [120, 110, 64, 44, 147]})
Is there a way to concatenate (join) them in Pandas, or is it impossible and I should use a for loop?
And if they have an identifier (ID), is it possible to join them on a column, as in SQL?
Pandas has several ways to join DataFrames; which one is best depends on what you want to do. I will explain the two main ones and their results, using the example from the question.
Concatenate
If the DataFrames all share the same row order (that is, row 1 of DataFrame 1 corresponds to row 1 of DataFrame 2 and of DataFrame 3), you can join them like this:
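A minimal sketch of that call, assuming `pd.concat` with `axis=1` applied to the three DataFrames from the question (the variable name `combined` is only illustrative):

```python
import pandas as pd

df_1 = pd.DataFrame({"fruta": ["manzana", "pera", "platano", "naranja", "aguacate"],
                     "precio": [0.20, 0.45, 0.15, 0.12, 0.62]})
df_2 = pd.DataFrame({"stock": [10, 20, 25, 12, 40]})
df_3 = pd.DataFrame({"ventas_totales": [3, 5, 2, 3, 6],
                     "ingresos_ventas": [120, 110, 64, 44, 147]})

# axis=1 places the DataFrames side by side; rows are matched by index,
# which here is the default 0..4 range in all three.
combined = pd.concat([df_1, df_2, df_3], axis=1)
print(combined)
```

The result should have five rows and the five columns of the three inputs, in the order in which the DataFrames were passed.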
Output:
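With axis=1 we indicate that the DataFrames are joined side by side, so their columns are combined and their rows are matched by index; with axis=0 (the default) they would be stacked one below the other, adding rows.
Advantages: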
.append()
Disadvantages:
An equivalent way to write this would be:
The result is the same, although as you can see it is more tedious to write.
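One index-based alternative that gives the same columns is chaining `DataFrame.join`, which aligns on the index by default. This is only a sketch under that assumption; it may not be the exact alternative meant above, and the name `joined` is illustrative:

```python
import pandas as pd

df_1 = pd.DataFrame({"fruta": ["manzana", "pera", "platano", "naranja", "aguacate"],
                     "precio": [0.20, 0.45, 0.15, 0.12, 0.62]})
df_2 = pd.DataFrame({"stock": [10, 20, 25, 12, 40]})
df_3 = pd.DataFrame({"ventas_totales": [3, 5, 2, 3, 6],
                     "ingresos_ventas": [120, 110, 64, 44, 147]})

# DataFrame.join matches rows by index, so each call adds the columns
# of the next DataFrame to the right of the previous result.
joined = df_1.join(df_2).join(df_3)
print(joined)
```

Each extra DataFrame needs its own `.join()` call, which is why this form gets longer as the number of DataFrames grows.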
Use merge by ID
In this case, suppose that each DataFrame has a column of IDs that identify each row, and that the rows are not in the same order; that is, the ID in row 1 of DataFrame 1 may appear in row "X" of DataFrame 2. Here is the example code:
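A minimal sketch of such data, assuming a hypothetical `id` column added to the DataFrames from the question (the column name, the ID values and the row order are all illustrative):

```python
import pandas as pd

# Hypothetical data: the fruits from the question, each with an "id".
df_1 = pd.DataFrame({"id": [1, 2, 3, 4, 5],
                     "fruta": ["manzana", "pera", "platano", "naranja", "aguacate"],
                     "precio": [0.20, 0.45, 0.15, 0.12, 0.62]})
# The rows of df_2 and df_3 are deliberately in a different order.
df_2 = pd.DataFrame({"id": [3, 1, 5, 2, 4],
                     "stock": [25, 10, 40, 20, 12]})
df_3 = pd.DataFrame({"id": [5, 4, 3, 2, 1],
                     "ventas_totales": [6, 3, 2, 5, 3],
                     "ingresos_ventas": [147, 44, 64, 110, 120]})

# The same position holds a different ID in each DataFrame, so pairing
# rows by position would attach values to the wrong fruit.
print(df_1["id"].tolist())
print(df_2["id"].tolist())
print(df_3["id"].tolist())
```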
In this case we cannot use pandas.concat(), since it matches rows by index, not by ID. To join the DataFrames we can use the .merge() method, which is equivalent to the pandas.merge() function and lets us choose the column to join on and the type of join:
Output:
With the on= parameter we indicate the name of the column to join on, and with the how= parameter the type of join, which works the same as in SQL. Pandas supports the inner, left, right and outer join types.
If we want to merge() more than two DataFrames, we only have to chain the method calls:
Output:
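A minimal sketch of such a chain, assuming three hypothetical DataFrames that share an `id` column and whose rows are in different orders (the column name and the ID values are illustrative):

```python
import pandas as pd

# Hypothetical data with a shared "id" column, rows in different orders.
df_1 = pd.DataFrame({"id": [1, 2, 3, 4, 5],
                     "fruta": ["manzana", "pera", "platano", "naranja", "aguacate"],
                     "precio": [0.20, 0.45, 0.15, 0.12, 0.62]})
df_2 = pd.DataFrame({"id": [3, 1, 5, 2, 4],
                     "stock": [25, 10, 40, 20, 12]})
df_3 = pd.DataFrame({"id": [5, 4, 3, 2, 1],
                     "ventas_totales": [6, 3, 2, 5, 3],
                     "ingresos_ventas": [147, 44, 64, 110, 120]})

# Each .merge() returns a new DataFrame, so the next call can be
# chained directly onto the previous result.
merged = (df_1
          .merge(df_2, on="id", how="inner")
          .merge(df_3, on="id", how="inner"))
print(merged)
```

Because the rows are matched on `id` rather than on position, each fruit ends up with its own stock and sales figures regardless of the row order in `df_2` and `df_3`.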
Note that with merge() we can perform different types of joins, in the style of SQL.
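To make the difference between the join types concrete, here is a small sketch with two hypothetical DataFrames whose IDs only partly overlap (the names `left`, `right` and `ids_by_how`, and the data itself, are illustrative):

```python
import pandas as pd

# Hypothetical data: id 1 exists only in `left`, id 4 only in `right`.
left = pd.DataFrame({"id": [1, 2, 3],
                     "fruta": ["manzana", "pera", "platano"]})
right = pd.DataFrame({"id": [2, 3, 4],
                      "stock": [20, 25, 12]})

# Run the same merge with each join type and record which IDs survive.
ids_by_how = {}
for how in ("inner", "left", "right", "outer"):
    result = left.merge(right, on="id", how=how)
    ids_by_how[how] = result["id"].sort_values().tolist()
    print(how, ids_by_how[how])
```

With this data, inner keeps only the IDs present in both DataFrames (2 and 3), left keeps every ID from `left`, right keeps every ID from `right`, and outer keeps all four. Rows without a match get NaN in the columns that come from the other DataFrame.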
These methods are tremendously useful in Pandas if you know how to use them well. Here is the official pandas documentation, in case you want to go further and see the other parameters: