I've seen programs that either upload files remotely or receive them remotely use base64.
The strange thing is that I copied the code, modified it to remove the base64 calls, and to my amazement it kept working normally.
That is where my doubt lies: what is this base64 module actually for? From what I saw in the documentation, it encodes bytes as plain text, and this is recommended for media where everything is plain text, such as the web in general or email. (This is how I understood it.)
I hope you can help me (theory and practice) to make use of it in my projects if necessary and useful.
The code I was talking about is the following:
elif answer[:8] == "download":
    with open(answer[9:], mode="rb") as file_download:
        self.cliente.send(base64.b64encode(file_download.read()))
elif answer[:6] == "upload":
    with open(answer[7:], mode="wb") as file_upload:
        datos = self.cliente.recv(1000000)
        datos = datos.decode("utf-8")
        file_upload.write(base64.b64decode(datos))
elif answer == "screenshot":
    try:
        self.screenshot()
        with open("monitor-1.png", mode="rb") as screen:
            self.cliente.send(base64.b64encode(screen.read()))
            #self.cliente.send(screen.read())
        os.remove("monitor-1.png")
What I did was remove the base64 calls from each send and write, and it continued to work normally. Data could be sent and received (Word documents, EXEs, etc.). In addition, when I took a remote screenshot and opened it to see how it had turned out, it looked good, complete and without any kind of error in the image.
Thank you very much.
Your understanding of what base64 is for is correct. I'll expand on it a little.
Bytes, ASCII and restrictions
As we well know, information is stored and transmitted in bytes. A byte is a group of 8 bits and therefore admits 2^8 = 256 different values.
Thanks to the ASCII standard, several of these 256 values can be interpreted as a printable character (for example, the binary code 01000001, which in base 10 is the value 65, represents the letter 'A'). However, since ASCII uses only 7 bits, half of the possible values of a byte are not valid ASCII codes, and of the half that are, not all are printable characters (the first 32 codes, from 00000000 to 00011111, are control characters, which do not represent any written symbol).
Many internet protocols are text-based (e.g. HTTP, SMTP), so they expect the transmitted content to be ASCII. In many cases they also limit it to printable ASCII only, and even among the printable characters they may further restrict which ones are allowed, excluding many punctuation marks that have a special meaning within the protocol.
In order to transmit arbitrary binary content, it is necessary to re-encode it to another format that ensures that each transmitted byte can only take certain allowed values, and decode it again when received. Base64 is one possible way to do it (not the only one).
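To make the distinction above concrete, here is a small sketch in Python (the language of the question) that sorts byte values into printable ASCII, control codes and non-ASCII. The helper name `classify_byte` and the sample bytes are my own illustrative choices, not part of any library. Note that it also counts DEL (127) as a control code, in addition to the first 32.

```python
from collections import Counter


def classify_byte(value: int) -> str:
    """Classify a byte value (0-255) by how ASCII treats it (hypothetical helper)."""
    if not 0 <= value <= 255:
        raise ValueError("a byte must be in the range 0-255")
    if value > 127:
        return "non-ascii"      # needs the 8th bit, so ASCII cannot represent it
    if value < 32 or value == 127:
        return "control"        # codes 0-31, plus DEL (127)
    return "printable"          # codes 32-126, including the space


# 0b01000001 == 65, which ASCII maps to the letter 'A'
print(chr(0b01000001))

# How the 256 possible byte values split up
totals = Counter(classify_byte(v) for v in range(256))
print(totals)

# Arbitrary binary data (sample bytes chosen for illustration) mixes all three kinds
sample = bytes([72, 105, 0, 17, 240])
print([classify_byte(b) for b in sample])
```

Only 95 of the 256 values come out as printable, which is why arbitrary binary data cannot go through a text-only channel untouched.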
How Base64 works
As its name suggests, Base64 restricts the characters to be transmitted to a subset of ASCII consisting of only 64 characters. Specifically, these are all the uppercase English letters (26 characters), the lowercase letters (another 26), the digits 0 to 9 (10 more, which makes 62), and the signs / and +. In addition, it optionally uses the sign = as padding at the end of the sequence.
Since 64 is 2^6, it follows that the "alphabet" used by Base64 can be encoded with 6 bits. Therefore the algorithm used is:
1. Take the bytes to be encoded in groups of 3 (that is, 24 bits).
2. Split those 24 bits into 4 groups of 6 bits.
3. Interpret each group of 6 bits as a number between 0 and 63, and replace it with the corresponding letter, digit, / or +. For example, if the number is 0, the character will be A, etc. There is an equivalence table.
Thus, each group of 3 bytes in the original gives rise to 4 ASCII characters in the encoding. But these characters will be printable, chosen from the aforementioned subset of 64.
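The grouping described above (3 bytes in, 4 characters out) can be sketched by hand in a few lines of Python. This is only a didactic version: the names `ALPHABET` and `encode_base64` are hypothetical, and in real code you would simply call `base64.b64encode` from the standard library, which the sketch uses as a reference to check itself.

```python
import base64

# The standard Base64 alphabet: index 0 is 'A', index 63 is '/'
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_base64(data: bytes) -> str:
    """Didactic Base64 encoder (hypothetical helper; prefer base64.b64encode)."""
    chars = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)              # 0, 1 or 2 bytes short of a full group
        # Pack the (zero-padded) group of 3 bytes into one 24-bit integer
        bits = int.from_bytes(chunk + b"\x00" * missing, "big")
        # Cut the 24 bits into four 6-bit numbers, most significant first
        indexes = [(bits >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
        group = [ALPHABET[n] for n in indexes]
        # Characters that only encode padding bytes are replaced by '='
        if missing:
            group[-missing:] = "=" * missing
        chars.extend(group)
    return "".join(chars)


# Compare against the standard library on a few inputs
for sample in (b"", b"A", b"AB", b"ABC", bytes(range(256))):
    assert encode_base64(sample) == base64.b64encode(sample).decode("ascii")

print(encode_base64(b"ABC"))   # QUJD
```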
You'll notice that base64 solves one problem (it allows you to send arbitrary bytes in a way that is compatible with text protocols), but it introduces another: the size of the data to send has grown. If they were originally 600 bytes, after passing them to base64 they will be 800. Each group of 3 becomes 4.
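The growth is easy to check with the standard `base64` module. In this sketch, `encoded_length` is a hypothetical helper that applies the "each group of 3 becomes 4" rule, rounding up because an incomplete final group is padded to 4 characters.

```python
import base64
import math


def encoded_length(n_bytes: int) -> int:
    """Length of the padded Base64 output for n_bytes of input (hypothetical helper)."""
    return 4 * math.ceil(n_bytes / 3)


for n in (1, 2, 3, 600, 1000):
    actual = len(base64.b64encode(bytes(n)))   # bytes(n) is n zero bytes
    assert actual == encoded_length(n)
    print(n, "->", actual)
```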
Example:
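A minimal sketch of two encodings with Python's standard `base64` module. The first input is the plain ASCII message, the second is a short sequence of arbitrary bytes. The exact order of the bytes in the second input is my assumption, chosen to contain the values discussed below.

```python
import base64

# A message that is already plain ASCII
text = "Hello how are you".encode("ascii")
print(base64.b64encode(text))        # b'SGVsbG8gaG93IGFyZSB5b3U='

# Arbitrary bytes (order is an assumption): control codes and non-ASCII values
raw = bytes([1, 2, 0, 17, 8, 240, 232])
encoded = base64.b64encode(raw)
print(encoded)                       # b'AQIAEQjw6A=='

# Decoding gives back exactly the original bytes
assert base64.b64decode(encoded) == raw
```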
Note that in the first example it would be quite absurd to use base64, since the information (the message "Hello how are you") is already made up of pure ASCII. However, it can be done in order to slightly "obfuscate" the message (it does not provide any cryptographic security, since decoding is trivial, but it prevents someone from "unintentionally" seeing the message).
The second case shows a more realistic use, since here the byte sequence contains non-printable values (1, 2, 0, 17, 8) and some non-ASCII ones (240, 232), but as you can see the resulting encoding is composed exclusively of alphanumeric characters (the == at the end is there because the original byte sequence is 2 bytes short of a multiple of 3).
In short, Base64 should be used in contexts where the protocol requires you to transmit data as printable ASCII (such as the body of an email, part of a URL, an HTTP header such as a cookie, or an image embedded as a data: URI in an <img> tag in HTML). If the protocol supports binary data instead (such as the body of an HTTP response or of a POST request), you don't need Base64.
Your doubt
You raise a curious question: after you removed the base64 calls, the program kept working normally.
This is quite strange. Without knowing what you were doing, with which library, and over which protocol, it's hard to explain. If the destination was expecting something Base64-encoded and you sent it unencoded, then when the destination tries to decode it, it will most likely run into errors, and if not (because your message just happened to contain only letters and digits), decoding would return a different message.
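Both outcomes can be reproduced with Python's `base64` module. This is only a sketch with sample data of my own choosing. One detail to be aware of: I pass `validate=True` to get strict checking, because by default `b64decode` silently discards characters outside the Base64 alphabet before decoding, so a failure may instead show up as a padding error or as garbage output.

```python
import base64
import binascii

# Raw binary (the first bytes of a PNG file) is not valid Base64 text
raw = b"\x89PNG\r\n\x1a\n"
try:
    base64.b64decode(raw, validate=True)
    outcome = "decoded"
except binascii.Error as exc:
    outcome = "error"
    print("decoding failed:", exc)

# Data that happens to use only Base64 characters decodes without error,
# but the result is not the original message
accidental = b"Hola"
decoded = base64.b64decode(accidental)
print(decoded)        # b'\x1e\x89Z'
```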
The only explanation I can think of is that you are using a library to do the transmission, that this library accepts either a string (str) or a sequence of bytes as a parameter, and that the library itself takes care of encoding the data in Base64 if you don't pass it already encoded. However, if you show the exact code in question, we can try to solve the puzzle.