I have a concern related to some statistics and programming, it happens that I have a variable called df_trading_pair
which contains a series of prices recorded in the time frame of 30m (30 minutes):
Start Date Open Price High Price Low Price Close Price End Date
0 2022-07-09 22:30:00.000 0.04805 0.04806 0.0476 0.04775 2022-07-09 22:59:59.999
1 2022-07-09 23:00:00.000 0.04773 0.04776 0.0476 0.04764 2022-07-09 23:29:59.999
2 2022-07-09 23:30:00.000 0.04762 0.04776 0.04752 0.04769 2022-07-09 23:59:59.999
3 2022-07-10 00:00:00.000 0.04771 0.04789 0.04687 0.047 2022-07-10 00:29:59.999
4 2022-07-10 00:30:00.000 0.047 0.04708 0.04663 0.04671 2022-07-10 00:59:59.999
5 2022-07-10 01:00:00.000 0.0467 0.04681 0.04648 0.04671 2022-07-10 01:29:59.999
6 2022-07-10 01:30:00.000 0.04671 0.04698 0.04656 0.0467 2022-07-10 01:59:59.999
7 2022-07-10 02:00:00.000 0.04657 0.04671 0.04623 0.04625 2022-07-10 02:29:59.999
8 2022-07-10 02:30:00.000 0.04625 0.04629 0.04603 0.04624 2022-07-10 02:59:59.999
9 2022-07-10 03:00:00.000 0.04622 0.04623 0.04584 0.04591 2022-07-10 03:29:59.999
10 2022-07-10 03:30:00.000 0.04591 0.04677 0.04576 0.04586 2022-07-10 03:59:59.999
11 2022-07-10 04:00:00.000 0.04581 0.0461 0.04549 0.04592 2022-07-10 04:29:59.999
12 2022-07-10 04:30:00.000 0.04595 0.04614 0.04575 0.04596 2022-07-10 04:59:59.999
13 2022-07-10 05:00:00.000 0.04593 0.0461 0.04574 0.04596 2022-07-10 05:29:59.999
14 2022-07-10 05:30:00.000 0.04601 0.04614 0.04582 0.0459 2022-07-10 05:59:59.999
15 2022-07-10 06:00:00.000 0.04589 0.04591 0.04553 0.04571 2022-07-10 06:29:59.999
16 2022-07-10 06:30:00.000 0.04563 0.04586 0.04532 0.04552 2022-07-10 06:59:59.999
17 2022-07-10 07:00:00.000 0.04546 0.04571 0.04543 0.04563 2022-07-10 07:29:59.999
18 2022-07-10 07:30:00.000 0.04559 0.04592 0.04559 0.04579 2022-07-10 07:59:59.999
19 2022-07-10 08:00:00.000 0.04579 0.04611 0.04579 0.04598 2022-07-10 08:29:59.999
20 2022-07-10 08:30:00.000 0.04601 0.0461 0.04572 0.04576 2022-07-10 08:59:59.999
21 2022-07-10 09:00:00.000 0.04576 0.04587 0.04562 0.04572 2022-07-10 09:29:59.999
22 2022-07-10 09:30:00.000 0.04573 0.04589 0.043 0.0456 2022-07-10 09:59:59.999
23 2022-07-10 10:00:00.000 0.04559 0.04584 0.04555 0.04562 2022-07-10 10:29:59.999
24 2022-07-10 10:30:00.000 0.04567 0.04581 0.04552 0.04555 2022-07-10 10:59:59.999
25 2022-07-10 11:00:00.000 0.04561 0.04588 0.04548 0.04549 2022-07-10 11:29:59.999
26 2022-07-10 11:30:00.000 0.0455 0.04564 0.04532 0.04553 2022-07-10 11:59:59.999
27 2022-07-10 12:00:00.000 0.04558 0.04568 0.04549 0.04563 2022-07-10 12:29:59.999
28 2022-07-10 12:30:00.000 0.04563 0.04565 0.04555 0.04557 2022-07-10 12:59:59.999
29 2022-07-10 13:00:00.000 0.04568 0.04578 0.04534 0.04542 2022-07-10 13:29:59.999
30 2022-07-10 13:30:00.000 0.04543 0.04669 0.0447 0.04483 2022-07-10 13:59:59.999
31 2022-07-10 14:00:00.000 0.04481 0.04481 0.04434 0.04448 2022-07-10 14:29:59.999
32 2022-07-10 14:30:00.000 0.04449 0.04453 0.04421 0.04443 2022-07-10 14:59:59.999
33 2022-07-10 15:00:00.000 0.04442 0.0448 0.04436 0.04447 2022-07-10 15:29:59.999
34 2022-07-10 15:30:00.000 0.04441 0.04465 0.04438 0.0446 2022-07-10 15:59:59.999
35 2022-07-10 16:00:00.000 0.04463 0.04464 0.04419 0.04432 2022-07-10 16:29:59.999
36 2022-07-10 16:30:00.000 0.04434 0.04474 0.0443 0.04452 2022-07-10 16:59:59.999
37 2022-07-10 17:00:00.000 0.04452 0.04482 0.04451 0.0448 2022-07-10 17:29:59.999
38 2022-07-10 17:30:00.000 0.0448 0.04499 0.04475 0.04497 2022-07-10 17:59:59.999
39 2022-07-10 18:00:00.000 0.04488 0.045 0.04472 0.04499 2022-07-10 18:29:59.999
40 2022-07-10 18:30:00.000 0.04502 0.04539 0.04483 0.04535 2022-07-10 18:59:59.999
41 2022-07-10 19:00:00.000 0.04532 0.04559 0.04497 0.04509 2022-07-10 19:29:59.999
42 2022-07-10 19:30:00.000 0.0451 0.04531 0.04501 0.04514 2022-07-10 19:59:59.999
43 2022-07-10 20:00:00.000 0.04514 0.04528 0.04508 0.04525 2022-07-10 20:29:59.999
44 2022-07-10 20:30:00.000 0.04515 0.0452 0.04509 0.04517 2022-07-10 20:59:59.999
45 2022-07-10 21:00:00.000 0.04522 0.04546 0.04513 0.04514 2022-07-10 21:29:59.999
46 2022-07-10 21:30:00.000 0.04519 0.04534 0.04513 0.04534 2022-07-10 21:59:59.999
47 2022-07-10 22:00:00.000 0.04534 0.04534 0.04474 0.0449 2022-07-10 22:29:59.999
48 2022-07-10 22:30:00.000 0.0448 0.04507 0.04475 0.04482 2022-07-10 22:59:59.999
49 2022-07-10 23:00:00.000 0.04481 0.04481 0.04443 0.04455 2022-07-10 23:29:59.999
50 2022-07-10 23:30:00.000 0.04451 0.04451 0.04428 0.04442 2022-07-10 23:59:59.999
51 2022-07-11 00:00:00.000 0.04442 0.04442 0.04374 0.04382 2022-07-11 00:29:59.999
52 2022-07-11 00:30:00.000 0.0438 0.04413 0.04368 0.04399 2022-07-11 00:59:59.999
53 2022-07-11 01:00:00.000 0.04395 0.04402 0.04319 0.04342 2022-07-11 01:29:59.999
54 2022-07-11 01:30:00.000 0.04343 0.04348 0.04323 0.04329 2022-07-11 01:59:59.999
55 2022-07-11 02:00:00.000 0.04333 0.04338 0.04314 0.04331 2022-07-11 02:29:59.999
56 2022-07-11 02:30:00.000 0.04332 0.04332 0.04312 0.04318 2022-07-11 02:59:59.999
57 2022-07-11 03:00:00.000 0.04317 0.04337 0.04309 0.04322 2022-07-11 03:29:59.999
58 2022-07-11 03:30:00.000 0.04322 0.04327 0.04299 0.04307 2022-07-11 03:59:59.999
59 2022-07-11 04:00:00.000 0.04305 0.04313 0.04273 0.0428 2022-07-11 04:29:59.999
Once said series of prices is graphed using the following lines of code:
import finplot as fplt
fplt.candlestick_ochl(df_trading_pair[['Start Date', 'Open Price', 'Close Price', 'High Price', 'Low Price']]).colors.update(dict(bull_frame='#26a69a', bull_body='#26a69a', bull_shadow='#26a69a'))
fplt.show()
Its corresponding Japanese candlestick chart is obtained:
So, looking closely at this graph, it becomes apparent that there is an outlier, which it is, 0.043
and it can be called when executing df_trading_pair['Low Price'].iat[22]
in this case.
My question is, how could I identify that with a few lines of code?
I was thinking of considering the following formula:
In 2]:
(df_trading_pair["Close Price"]-df_trading_pair["Low Price"])/df_trading_pair["Low Price"]*100
Which when executed returns the following output:
0 0.315126
1 0.084034
2 0.357744
3 0.277363
4 0.171563
5 0.494836
6 0.300687
7 0.043262
8 0.456224
9 0.152705
10 0.218531
11 0.945263
12 0.459016
13 0.480979
14 0.174596
15 0.395344
16 0.441306
17 0.440238
18 0.438693
19 0.414938
20 0.087489
21 0.219202
22 6.046512
23 0.153677
24 0.065905
25 0.021988
26 0.463372
27 0.307760
28 0.043908
29 0.176445
30 0.290828
31 0.315742
32 0.497625
33 0.247971
34 0.495719
35 0.294184
36 0.496614
37 0.651539
38 0.491620
39 0.603757
40 1.159938
41 0.266845
42 0.288825
43 0.377107
44 0.177423
45 0.022158
46 0.465322
47 0.357622
48 0.156425
49 0.270088
50 0.316170
51 0.182899
52 0.709707
53 0.532531
54 0.138793
55 0.394066
56 0.139147
57 0.301694
58 0.186090
59 0.163819
dtype: float64
Based on that information, I infer that the above values should move between 0.021987
a 1.159938
and which 6.046512
is the outlier, but I'm not a statistician and I don't know how I should lay out the logic of this in python code, can I get some help here?
Thanks to @HeytalePazguato I was able to understand and create a solution adjusted to my case, first the function is defined
outlier
(Taken from here )Then a sub dataframe called is created
df_open_low_price_range
that will save the data series that you want to analyze and check if it has outliers or not ( I realized that for my particular case this is the real formula to use to obtain the outlier ):Then the function is executed
outlier
and its tuple is stored in the variablethe_outlier
:Finally, two conditionals are added to conclude the task: