I'm looking for a certain string in a fairly large file:
$ ls -lh archivo.csv
-rw-rw-rw- 1 yo yo 723M Dec 10 10:46 archivo.csv
If I use grep, the matching line is not printed; I only get a message saying that the file contains a match:
$ grep "12345" archivo.csv
Binary file archivo.csv matches
So I checked what type of file it is:
$ file archivo.csv
archivo.csv: ISO-8859 text, with very long lines, with CRLF line terminators
I converted it to Unix format (line endings) with dos2unix:
$ dos2unix archivo.csv
dos2unix: converting file archivo.csv to Unix format...
But the problem still appears:
$ grep "12345" archivo.csv
Binary file archivo.csv matches
I later noticed that grep has an option for searching binary files, -a:
$ grep -a "12345" archivo.csv
12345 esto es un test
Indeed, man grep says:
-a, --text
Process a binary file as if it were text;
this is equivalent to the --binary-files=text option.
Still, I'm wondering: how can I convert this binary file to ASCII?
Strictly speaking, all files are binary; it is only when we interpret those bytes according to some encoding X that we say the file has encoding X (or is encoded in X).
In your case the file is not binary in that sense: it is text in an ISO-8859 encoding, so you need tools that know how to work with (understand) that encoding.
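To see which bytes are making grep classify the file as binary, here is a small sketch. It assumes GNU coreutils `tr` and `wc`, and the function name `count_suspect_bytes` is hypothetical. It counts NUL bytes and bytes outside 7-bit ASCII (0x80 to 0xFF), which is where ISO-8859 accented letters and symbols live:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how many NUL bytes and how many non-ASCII
# bytes (0x80-0xFF) a file contains. NUL bytes always make GNU grep treat
# input as binary; high bytes do so when they are not valid in the current
# locale (e.g. ISO-8859 text read under a UTF-8 locale).
count_suspect_bytes() {
    local file=$1 nul high
    # -c complements the set and -d deletes it, so only NUL bytes remain.
    nul=$(LC_ALL=C tr -d -c '\000' < "$file" | wc -c)
    # Same trick, keeping only bytes in the octal range 200-377 (0x80-0xFF).
    high=$(LC_ALL=C tr -d -c '\200-\377' < "$file" | wc -c)
    # Arithmetic expansion strips any padding that wc may add.
    echo "NUL bytes: $((nul))"
    echo "non-ASCII bytes: $((high))"
}
```

After defining the function, run it on the file from the question:

$ count_suspect_bytes archivo.csv

A non-zero "non-ASCII bytes" count with zero NUL bytes is consistent with an ISO-8859 text file rather than a truly binary one.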
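The -a parameter forces grep to process the file as text even though it contains bytes that grep does not accept as text (e.g. the NUL byte \x0). In GNU grep, input counts as binary if it contains NUL bytes or bytes that are not validly encoded for the current locale, so ISO-8859 accented characters read under a UTF-8 locale are enough to produce the "Binary file ... matches" message.
So in your case you should convert the file to an encoding better suited to your tools. There are, naturally, many tools for this, but the one I like the most is iconv, which in your case would be something like this (from the same reference):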
(NOTE: instead of UTF-8 you could convert it to ASCII as you ask, but then you may lose information from the original file, such as the § symbol.) For example, taking this file we have
This, as we can see, lets you view and filter the file correctly without losing information.
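A minimal sketch of the iconv conversion, assuming the file is ISO-8859-1 (file only reports "ISO-8859", so it might be ISO-8859-15 instead) and GNU/glibc iconv. A one-line temporary sample stands in for archivo.csv. It converts the sample both to UTF-8 and to ASCII, which shows the point about losing the § symbol:

```shell
#!/usr/bin/env bash
# Sketch: build a tiny ISO-8859-1 sample, then convert it two ways.
# Assumes glibc iconv; the temporary files are stand-ins for archivo.csv.
sample=$(mktemp)
# "12345 § test" in ISO-8859-1: § is the single byte 0xA7 (octal 247).
printf '12345 \247 test\n' > "$sample"

# 1) ISO-8859-1 -> UTF-8: nothing is lost; § becomes the bytes C2 A7.
iconv -f ISO-8859-1 -t UTF-8 "$sample" > "$sample.utf8"

# 2) ISO-8859-1 -> ASCII: § has no ASCII equivalent. Without -c, iconv
#    stops with an error; with -c it silently drops the character
#    (glibc iconv may still exit with status 1, hence "|| true").
iconv -c -f ISO-8859-1 -t ASCII "$sample" > "$sample.ascii" || true

# Both results are plain text and can be searched without -a.
grep "12345" "$sample.utf8"
grep "12345" "$sample.ascii"
```

For the real file, the UTF-8 conversion would look roughly like this (the output file name is just an example):

$ iconv -f ISO-8859-1 -t UTF-8 archivo.csv > archivo-utf8.csv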
Lastly, dos2unix doesn't help in this case: by default it only converts the line terminators (CRLF to LF) and leaves the character encoding untouched, so the bytes that make grep treat the file as binary are still there (see the dos2unix manual).
What you are asking is a bit broad, since for me it would depend on what the file contains and what I want to extract from it. If the way grep converts the binary file to text works for you, and you don't need any more information from the file than it gives you, you can just use: