I have written the following program (with your help) that writes to a file the paths of all the files and folders contained in a given path:
    from os import walk, getcwd

    def ls(ruta = getcwd()):
        for root, subdirs, archivos in walk(ruta):
            f = open("ficheroderutas.txt","a",encoding="utf8")
            for archivo in archivos:
                f.write(root+'\\'+archivo+"\n")
            for subdir in subdirs:
                f.write(root+'\\'+subdir+"\n")
            f.close()
        return
The problem I have is that when I call ls(ruta='C:\\'), since I want to save the paths of all the files and folders on my hard drive to a file, the task takes several minutes (and generates a txt file of almost 30MB). I was wondering if there is a faster way to run this code, or any other code that does the same job.

I am not looking to change the code, unless it is to take advantage of a library that allows parallelization, something similar to fork in C. I suppose such a thing exists. This last point matters to me because my computer has several cores. The answer could be very long or fairly short. Basically, the options that come to mind are:
- Generate an .exe of my program, since I suppose that if it were already compiled to machine code for the processor, it would run faster than executing it through the interpreter.
- Translate it to C, preferably with a Python-to-C translator (that's what it's called, right?), although I suppose the problem is that automating this kind of rewriting doesn't produce code as efficient as if you rewrote it yourself.
- Use some Python library that allows me to parallelize my code.
Does anyone know if there is any other option? And if a parallelization library exists, what is it called?
OS: Windows XP SP3
Python 3.4.4
The biggest problem with your code is that you open and close the file on every iteration of the walk. Opening it only once, outside the loop, already reduces the time a lot (in my case it drops from 164 seconds to 7 seconds).
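A minimal sketch of that change, keeping your names. Two things here are my own assumptions and not part of your program: the salida parameter (hypothetical, added only so the output file can be chosen) and the use of os.path.join instead of concatenating '\\' by hand.

```python
from os import walk, getcwd
from os.path import join


def ls(ruta=getcwd(), salida="ficheroderutas.txt"):
    # "salida" is a hypothetical parameter added for illustration;
    # its default is the filename used in the question.
    # The file is opened a single time, before the walk starts,
    # and "with" closes it when the block exits.
    with open(salida, "a", encoding="utf8") as f:
        for root, subdirs, archivos in walk(ruta):
            for archivo in archivos:
                # join() inserts the right separator (assumption: this
                # replaces the manual root + '\\' + name concatenation).
                f.write(join(root, archivo) + "\n")
            for subdir in subdirs:
                f.write(join(root, subdir) + "\n")
```

Calling ls(ruta='C:\\') works as before. The file is still opened in append mode, so delete ficheroderutas.txt between runs if you don't want the old listing kept.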
This way you only open the file once. Note that, by using the with statement, the file is closed automatically when the with block exits. The final return is also unnecessary.

If you want to optimize something, you first have to know what is slowing you down. Thinking about parallelizing or porting routines to C++ is like killing flies with cannon fire. I don't recommend wasting your time on that unless you know beforehand that it is going to solve something for you.
At the very least, keep an eye on the CPU load while the program is running. Maybe what takes the longest is the hard disk reading all of its directory entries, and in that case you will not be able to improve things in code, since the bottleneck is not a processor that has reached its limit. By the way, is your hard drive an SSD?
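If you would rather measure than watch the task manager, a rough check is to compare wall-clock time with the CPU time used by the process. This is only a sketch: medir and contar_entradas are hypothetical helper names I made up, and the walk is a stand-in for your function.

```python
import os
import time


def medir(funcion, *args):
    """Return (wall_seconds, cpu_seconds) spent running funcion(*args)."""
    inicio_wall = time.perf_counter()   # elapsed real time
    inicio_cpu = time.process_time()    # CPU time used by this process
    funcion(*args)
    wall = time.perf_counter() - inicio_wall
    cpu = time.process_time() - inicio_cpu
    return wall, cpu


def contar_entradas(ruta):
    """Walk ruta and count every file and folder found."""
    total = 0
    for root, subdirs, archivos in os.walk(ruta):
        total += len(subdirs) + len(archivos)
    return total
```

For example, wall, cpu = medir(contar_entradas, 'C:\\'). If cpu comes out much smaller than wall, the program is mostly waiting on the disk, and more cores will not help. If the two values are close, the processor really is the limit.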
You also have to apply a little "lateral thinking". For example: what is the purpose of the program? Is it to monitor file system changes? In that case, use an event monitor instead of rescanning the whole disk.
Since you seem to be interested in the subject, why don't you learn about profiling? You can start with cProfile, which is very simple.
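As a starting point, here is a small sketch of profiling a directory walk with cProfile and pstats from the standard library. The functions recorrer and perfilar are hypothetical names of mine, used only for illustration.

```python
import cProfile
import io
import os
import pstats


def recorrer(ruta):
    """Collect the paths of all files and folders under ruta."""
    rutas = []
    for root, subdirs, archivos in os.walk(ruta):
        for nombre in archivos + subdirs:
            rutas.append(os.path.join(root, nombre))
    return rutas


def perfilar(funcion, *args, filas=10):
    """Profile funcion(*args) and return the report as a string."""
    perfil = cProfile.Profile()
    perfil.runcall(funcion, *args)
    salida = io.StringIO()
    estadisticas = pstats.Stats(perfil, stream=salida)
    # Sort by cumulative time so the most expensive calls come first,
    # and print only the first "filas" entries.
    estadisticas.sort_stats("cumulative").print_stats(filas)
    return salida.getvalue()
```

Then print(perfilar(recorrer, 'C:\\')) shows where the time goes. You can also profile a whole script without touching it: python -m cProfile -s cumulative yourscript.py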