I need to parse the following output, skipping the title line and the --- line, and correctly capturing the fields whether they are empty, NULL, or text:
Slot Type PEC Primary Secondary
---- ---- ---------- ------- ---------
1 DMC OOS Mtce Unequipped
2 DMC NTBN19CA IS
3 DMC NULL
4 DMC NTBN19EA IS
5 DMC NTBN19CA IS
6 DMC NTBN19CA IS
7 DMC NTBN19CA IS
8 DMC NTBN19CA IS
9 DMC NTBN19CA IS
10 DMC NTBN19CA IS
11 DMC NTBN19CA IS
12 DMC NTBN19EA IS
13 DMC NTBN19CA IS
14 DMC NTBN19CA IS
15 MSC NTBN20BA IS
-- SM NTBN21AB IS
-- SM NULL NULL
-- SM NULL NULL
-- SM NULL NULL
My Perl script:
for my $line (split "\r", $pmatch)
{
    $line =~ s/\s*\n$//;
    if ($line =~ /More/) {
        $t->print(''); # enter
        next PAGE;
    }
    elsif ($line =~ /^\s+[\d-]/) {
        my (undef, $slot, $type, $pec, $primary, $secondary) = split /\s+/, $line, 6;
        push @$vplevels, {
              slot      => $slot
            , type      => $type
            , pec       => $pec
            , primary   => $primary
            , secondary => $secondary
            , opc       => $opc };
    }
}
return $vplevels;
I split() each line, but the data sometimes arrives with all 5 columns and sometimes not, so PEC ends up holding the value that belongs to Primary. The same thing happens with the other columns. I can't work out how to detect that a field is empty, keep it empty, and still put the other values in their corresponding fields.
Yes, the problem with split() appears when fields can be empty: an empty field cannot be told apart from the spaces that separate the fields.
The solution for this type of problem is to use the --- line below the title as a guide to where each field begins and ends. You save those column positions in an array and then, for each line, extract the fields with a simple substr().
That solution will also keep working if the output format changes, both in the number of fields and in the width of each column.
To extract the positions of the hyphens you can use a regular expression in a while() loop together with the pos() function or, better, with the contents of the @- and @+ variables, which give the start and end offset of each match and therefore of each column.
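A minimal sketch of that idea, under stated assumptions: the dash line uses two-space gaps between columns, and the sample row (built with sprintf so its alignment is explicit) has an empty PEC column. The variable names are illustrative and not taken from the original script. Each run of hyphens is matched in a while() loop; $-[0] and $+[0] give the start and end offset of the match, and substr() then cuts the row at those offsets.

```perl
use strict;
use warnings;

# Assumed dash line: column widths 4, 4, 10, 7, 9 separated by two spaces.
my $dashes = join '  ', map { '-' x $_ } 4, 4, 10, 7, 9;

# Record [start, length] for each run of hyphens.
my @cols;
while ($dashes =~ /-+/g) {
    push @cols, [ $-[0], $+[0] - $-[0] ];
}

# Assumed sample row: PEC is empty and the line stops after Primary.
my $row = sprintf '%-6s%-6s%-12s%s', 3, 'DMC', '', 'IS';

my @fields;
for my $col (@cols) {
    my ($start, $len) = @$col;
    # A column that starts past the end of a short line is an empty field.
    my $cell = $start < length($row) ? substr($row, $start, $len) : '';
    $cell =~ s/^\s+|\s+$//g;    # trim spaces on both sides
    push @fields, $cell;
}

print join('|', @fields), "\n";    # prints 3|DMC||IS|
```

Because the offsets come from the dash line rather than from the data, an empty cell simply yields an empty string instead of shifting the remaining values to the left.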
Another, more powerful option is to use the unpack() function. Here is an example: http://perlenespanol.com/foro/al-descargar-replacer-espacios-vacios-por-nan-t4470.html
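As a hedged sketch of the unpack() route (this is not the code from the linked thread): a template can be derived from the same dash line, with one A<width> item per column, where the width runs up to the start of the next column and the last column takes A*. The A format strips trailing spaces by itself. The sample strings and variable names are assumptions for illustration; the row is padded to the table width first so that every template item has input to consume.

```perl
use strict;
use warnings;

# Assumed dash line: column widths 4, 4, 10, 7, 9 separated by two spaces.
my $dashes = join '  ', map { '-' x $_ } 4, 4, 10, 7, 9;

# Start offset of every column.
my @starts;
while ($dashes =~ /-+/g) {
    push @starts, $-[0];
}

# One "A<width>" per column; the last column takes the rest of the line.
my @items;
for my $i (0 .. $#starts - 1) {
    push @items, 'A' . ($starts[$i + 1] - $starts[$i]);
}
my $template = join ' ', @items, 'A*';

# Assumed sample row: PEC and Secondary are empty.
my $row = sprintf '%-6s%-6s%-12s%s', '--', 'SM', '', 'IS';

# Pad short rows to the full table width before unpacking.
my $padded = sprintf '%-*s', length($dashes), $row;
my @fields = unpack $template, $padded;

print "$template\n";                 # prints A6 A6 A12 A9 A*
print join('|', @fields), "\n";      # prints --|SM||IS|
```

Since A fields start at offset 0 here, any leading spaces inside a cell would be kept; if the real output indents its cells, a trim on each field is still needed.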
Suppose the output of the program comes in a file.
We write the functions that we need:
1. The function archivo_hacia_texto reads the file into a text string. In this case the file is named tabla_texto.txt and holds the table to be parsed.
2. The function capturar_posiciones converts the line of dashes into an array of positions.
3. The function capturar_campos is in charge of adding each cell of the table to the array, removing the spaces on both sides. Example: NTBN19CA.
3.2. Note: Due to the complexity of programming this in Perl, it will be a one-dimensional array and not a two-dimensional array.
4. The function unificar_pendientes combines all the lines that come before the dash line.
4.2. In this case, the cells are combined vertically:
cabe_segu
cera_nda
The third column will be primera_cabecera.
5. The function obtener_matriz_campos adds each cell to the array. To do so, the text is divided into lines and, for each line, capturar_campos and capturar_posiciones are used to obtain each cell. At the end, unificar_pendientes is used to add the header cells to the end of the array.
5.2. Note: The first element of the array holds a value that represents the number of columns.
6. The function leer_matriz reads the array returned by obtener_matriz_campos and prints the result of traversing the array to the console.

The following is the dashed line:
----  ----  ----------  -------  ---------
We look for the positions of the hyphens that come right after spaces (or at the start of the line). On reaching the end we add -1. A -1 means the cut runs to the end of the line.
The array of positions ends up as [0,6,12,24,33,-1].
The header of the table is added at the end of the array, while everything else is added at the beginning. This is because the number of columns is not known at first; it is only known after parsing the dashed line.
The following is the table header:
Slot  Type  PEC         Primary  Secondary
Both the header and the other lines will be cut into columns using the position array, and the spaces on the sides will be removed.
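The cutting step could look roughly like this. The function name cortar_linea and the sample rows are hypothetical, not the author's code; the position array is the one given above, where the trailing -1 stands for "up to the end of the line".

```perl
use strict;
use warnings;

# Hypothetical helper: cut one line into trimmed cells.
# @$pos holds the column start offsets followed by -1 (end of line).
sub cortar_linea {
    my ($line, $pos) = @_;
    my @cells;
    for my $i (0 .. $#{$pos} - 1) {
        my ($from, $to) = @{$pos}[$i, $i + 1];
        my $cell = '';
        if ($from < length($line)) {
            $cell = $to == -1
                ? substr($line, $from)                 # last column: rest of line
                : substr($line, $from, $to - $from);   # up to the next column
        }
        $cell =~ s/^\s+|\s+$//g;                       # trim both sides
        push @cells, $cell;
    }
    return @cells;
}

my @pos = (0, 6, 12, 24, 33, -1);

# Assumed rows, laid out with sprintf so the offsets match @pos.
my $full  = sprintf '%-6s%-6s%-12s%s', 2, 'DMC', 'NTBN19CA', 'IS';
my $gappy = sprintf '%-6s%-6s%-12s%-9s%s', 1, 'DMC', '', 'OOS', 'Unequipped';

print join('|', cortar_linea($full,  \@pos)), "\n";   # prints 2|DMC|NTBN19CA|IS|
print join('|', cortar_linea($gappy, \@pos)), "\n";   # prints 1|DMC||OOS|Unequipped
```

The same function serves the header and the data lines, since both are cut at the same offsets.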
Keywords are extracted from each header cell and converted to lower case. If a cell contains spaces, its words are joined with underscores, following the variable naming style known as snake case (notación viborita), such as: la_notacion_viborita.
Note: This is only done for the header, not for the other lines.
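A possible shape for that header step, as a sketch only (the helper name nombre_clave is hypothetical and does not come from the original code): pull the words out of the cell, join them with underscores, and lower-case the result.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a header cell into a snake_case key.
sub nombre_clave {
    my ($cell) = @_;
    my @words = $cell =~ /(\w+)/g;       # keep only runs of word characters
    return lc(join '_', @words);
}

print nombre_clave('Primary'), "\n";                # prints primary
print nombre_clave('  La Notacion Viborita '), "\n"; # prints la_notacion_viborita
```

Keys produced this way can be used directly as hash keys, for example slot, type, pec, primary and secondary for the table in the question.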
How to use these functions (to parse other tables, just try them with another .txt file that contains the data):
Full code in Perl :
Console output: