I have two DataFrames. The first, df_subset, has three variables, where RD_GDP is each country's R&D investment as a percentage of GDP.
df_subset
country year RD_GDP
0 AUS 1981 0.902542
1 AUS 1984 1.023905
2 AUS 1986 1.179149
3 AUS 1987 1.137756
4 AUS 1988 1.163591
... ... ...
1305 LTU 2014 1.031138
1306 LTU 2015 1.044081
1307 LTU 2016 0.842334
1308 LTU 2017 0.896408
1309 LTU 2018 0.876656
The second DataFrame, df_subset_gdp, also has three variables, where GDP_PC is the GDP per capita by country and year.
country year GDP_PC
0 AUS 2015 47304.816745
1 AUS 2016 50284.172793
2 AUS 2017 51297.139196
3 AUS 2018 53700.680893
4 AUS 2019 54752.242834
.. ... ... ...
265 MAR 2017 7582.794999
266 SGP 2015 86972.563073
267 SGP 2016 89396.724443
268 SGP 2017 94945.250892
269 SGP 2018 101280.413499
I need to join the two DataFrames so that I can make a scatter plot of these two variables and see how they relate. To join them, I tried the following command:
dataset_merge = df_subset.merge(df_subset_gdp, left_on = "year", right_on = "year")
The problem is that the result pairs each row with every country for that year, so GDP_PC does not correspond to country_x:
country_x year RD_GDP country_y GDP_PC
0 AUS 2015 1.875612 AUS 47304.816745
1 AUS 2015 1.875612 AUT 49955.456118
2 AUS 2015 1.875612 BEL 46214.082208
3 AUS 2015 1.875612 CAN 44671.409504
4 AUS 2015 1.875612 CZE 33701.384000
To solve this, pass a list of columns to both left_on and right_on. That way the DataFrames are joined on country and year together, not on year alone.
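As a minimal sketch of that approach, the snippet below rebuilds two tiny frames from a few of the rows shown in the question (the real frames are much larger, so this subset is only an assumption for illustration) and merges them on both keys. Because the key columns have the same names on both sides, passing on=["country", "year"] should be an equivalent shorthand.

```python
import pandas as pd

# Tiny stand-ins for the question's frames, using only values that
# appear in the question; the real frames have many more rows.
df_subset = pd.DataFrame({
    "country": ["AUS", "LTU", "LTU"],
    "year": [2015, 2015, 2016],
    "RD_GDP": [1.875612, 1.044081, 0.842334],
})
df_subset_gdp = pd.DataFrame({
    "country": ["AUS", "AUS", "SGP"],
    "year": [2015, 2016, 2015],
    "GDP_PC": [47304.816745, 50284.172793, 86972.563073],
})

# Join on country AND year, so each row is matched only with the
# GDP figure of the same country in the same year.
dataset_merge = df_subset.merge(
    df_subset_gdp,
    left_on=["country", "year"],
    right_on=["country", "year"],
)

# Shorthand when the key columns share the same names in both frames.
dataset_merge_on = df_subset.merge(df_subset_gdp, on=["country", "year"])

print(dataset_merge)
```

The default is an inner join, so only country/year pairs present in both frames are kept. With the merged frame, something like dataset_merge.plot.scatter(x="GDP_PC", y="RD_GDP") should give the scatter plot, assuming matplotlib is installed.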
The result is as follows