I am finishing an exercise in which I need to find out, using bash, the data type of each column in a CSV file (integer, decimal, date...).
This is the part that displays the data type, but it doesn't work properly. It also only distinguishes between char and number. Is there a simpler way to do it?
cat dataset_cars.csv | sed -rn '1p;2s/([^,][[:alpha:]]+[^,])+/(char)/g;2s/([^,][[:digit:]]+[^,])+/(num)/gp'
When I run it, I get the following output, which is clearly wrong (it leaves stray zeros and puts parentheses where they shouldn't be):
,price,brand,model,year,title_status,mileage,color,vin,lot,state,country,condition
0,(num),(char),(char),(num),(char),(num)0,(char), (char(num),(num),(char),(char),(num)char)
The output I expect is something like this:
Test data:
Columna1, Columna2, Columna3
prueba, 123, 123.89
Output:
Columna1, Columna2, Columna3
text, numeric, decimal
Despite what I said in the comments, you could use this regex with sed:
Let's spread it out a bit so it doesn't look so cramped and cryptic:
Here I assume the file archivo.txt has content like this:
And the output is something like:
In short, the command first prints the first line (the header). Then it looks for patterns in the second line, which, as I understand it (I haven't checked), holds the values that show whether each column is a string, a decimal, or an integer.
Finally, it removes the whitespace from the result; this is not needed for detecting the types, only for presenting the output.
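A minimal sketch of those three steps, assuming a plain comma-separated file without quoted fields whose second line holds representative values. It is not the exact regex from this answer: to keep each pattern anchored to a single value, it puts every field of line 2 on its own line with `tr`, classifies it with `sed -E`, and joins the labels back with `paste`. The function name `sed_column_types` is hypothetical.

```shell
# Hypothetical helper: sed_column_types FILE
# Assumes: comma-separated, no quoted fields, line 2 is a representative row.
sed_column_types() {
  sed -n '1p' "$1"          # step 1: print the header line unchanged
  sed -n '2p' "$1" |        # step 2: take only the second line
    tr ',' '\n' |           # one field per line, so patterns can use ^ and $
    sed -E '
      s/^[[:space:]]+//
      s/[[:space:]]+$//
      /^[+-]?[0-9]+$/ {
        s/.*/integer/
        b
      }
      /^[+-]?[0-9]*\.[0-9]+$/ {
        s/.*/decimal/
        b
      }
      s/.*/string/
    ' |                     # trims whitespace, then labels each value
    paste -sd, -            # step 3: join the labels back with commas
}
```

With the test data from the question, `sed_column_types archivo.txt` prints the header followed by `string,integer,decimal`. Trimming the whitespace before matching is what keeps a value like ` 123` from being labelled a string.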
This is clearly a fragile approach, since it ignores commas inside fields and the characters that delimit text, such as single or double quotes. But if it serves your purpose, feel free to use it.
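To see why the quotes matter: CSV (RFC 4180) allows a field wrapped in double quotes to contain commas, but splitting on every comma ignores that. A small illustration, where the helper name `naive_field_count` and the sample record are made up:

```shell
# Hypothetical helper: counts fields by splitting on every comma,
# ignoring quotes (the same simplification the regex approach makes).
naive_field_count() {
  printf '%s\n' "$1" | awk -F',' '{ print NF }'
}

naive_field_count '"Smith, John",42'   # prints 3, although the record has 2 fields
```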
Perhaps my answer using awk to another question could be more helpful to you: "scripting - show the data type of each column".
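Independently of that linked answer (whose code is not reproduced here), here is a hedged awk sketch that looks at every data row instead of only the second one. It assumes an unquoted, comma-separated file with a header; treats dates only in ISO `YYYY-MM-DD` form (an assumption, other formats end up as `string`); and widens a column that mixes integers and decimals to `decimal`, while any other mix falls back to `string`. The function name `awk_column_types` is hypothetical.

```shell
# Hypothetical helper: awk_column_types FILE
# Assumes: comma-separated, no quoted fields, first line is the header,
# dates only in ISO YYYY-MM-DD form.
awk_column_types() {
  awk -F',' '
    # Classify one value after trimming surrounding blanks.
    function kind(v) {
      sub(/^[ \t\r]+/, "", v)
      sub(/[ \t\r]+$/, "", v)
      if (v ~ /^[+-]?[0-9]+$/)         return "integer"
      if (v ~ /^[+-]?[0-9]*\.[0-9]+$/) return "decimal"
      if (v ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/) return "date"
      return "string"
    }
    NR == 1 { header = $0; ncol = NF; next }
    {
      for (i = 1; i <= ncol; i++) {
        k = kind($i)
        if (!(i in col_type)) {
          col_type[i] = k                       # first data row sets the type
        } else if (col_type[i] != k) {
          # integer mixed with decimal widens to decimal, anything else to string
          if ((col_type[i] == "integer" && k == "decimal") ||
              (col_type[i] == "decimal" && k == "integer"))
            col_type[i] = "decimal"
          else
            col_type[i] = "string"
        }
      }
    }
    END {
      print header
      for (i = 1; i <= ncol; i++)
        printf "%s%s", col_type[i], (i < ncol ? "," : "\n")
    }
  ' "$1"
}
```

Scanning all rows costs one pass over the file, but it avoids labelling a column from a single value that may not be representative (for example, a decimal column whose second line happens to hold `2`).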