I have to do a report on the schemas and tables that Amazon Redshift
considers to have bad statistics. There is a process that runs every weekend and takes care of applying the corresponding operations, but I need to export the names of those schemas and tables to a .csv file.
The thing is, this process generates a report, and the lines of it that interest me look like this:
-- 2019-04-28 07:05:06.538818 [73589] [73589] Running 200 out of 214 commands: analyze schema_owner."nombre_tabla_mala"
I am collecting the lines that meet this pattern in the following way:
while read linea
do
SCHEMA="schema_owner"
FILTRO="commands: analyze $SCHEMA"
if [[ $linea =~ $FILTRO ]]
then
...missing code...
fi
done < /ruta_del_fichero_log
The problem is that, obviously, I capture the entire line, and I only need to keep the part schema_owner."nombre_tabla_mala"
How can I discard the rest of the string?
Here are the first twenty lines of the log file in question:
-- 2019-04-28 05:54:53.830738 [73589] [73589] Running 1 out of 1 commands: set wlm_query_slot_count = 4
-- 2019-04-28 05:54:53.833469 [73589] Success.
-- 2019-04-28 05:54:53.833531 [73589] [73589] Running 1 out of 1 commands: set statement_timeout = '36000000'
-- 2019-04-28 05:54:53.836162 [73589] Success.
-- 2019-04-28 05:54:53.836190 [73589] [73589] Running 1 out of 1 commands: set application_name to 'AnalyzeVacuumUtility-v.9.1.6'
-- 2019-04-28 05:54:53.838700 [73589] Success.
-- 2019-04-28 05:54:53.838788 [73589] Extracting Candidate Tables for Vacuum...
-- 2019-04-28 05:55:57.850685 [73589] Found 0 Tables requiring Vacuum and flagged by alert
-- 2019-04-28 05:55:57.850795 [73589] Extracting Candidate Tables for Vacuum ...
-- 2019-04-28 05:56:34.908067 [73589] Found 107 Tables requiring Vacuum due to stale statistics
-- 2019-04-28 05:56:34.908263 [73589] [73589] Running 1 out of 214 commands: vacuum FULL schema_owner."t_ed_p" ; /* Size : 120 MB, Unsorted_pct : N/A */ ;
-- 2019-04-28 05:56:47.588342 [73589] Success.
-- 2019-04-28 05:56:47.588401 [73589] [73589] Running 2 out of 214 commands: analyze schema_owner."t_ed_p"
-- 2019-04-28 05:56:50.363655 [73589] Success.
-- 2019-04-28 05:56:50.363711 [73589] [73589] Running 3 out of 214 commands: vacuum FULL schema_owner."t_ed_p_estados" ; /* Size : 120 MB, Unsorted_pct : N/A */ ;
-- 2019-04-28 05:57:03.430064 [73589] Success.
-- 2019-04-28 05:57:03.430124 [73589] [73589] Running 4 out of 214 commands: analyze schema_owner."t_ed_p_estados"
-- 2019-04-28 05:57:06.024933 [73589] Success.
-- 2019-04-28 05:57:06.025023 [73589] [73589] Running 5 out of 214 commands: vacuum FULL schema_owner."t_ed_p_tps_actividad" ; /* Size : 120 MB, Unsorted_pct : N/A */ ;
-- 2019-04-28 05:57:06.024933 [73589] Success.
In the end, what I need to obtain is the schema and the table. That is, from the ones that appear in this example, I would need to send the following to the .csv file:
schema_owner."t_ed_p"
schema_owner."t_ed_p_estados"
schema_owner."t_ed_p_tps_actividad"
It looks like the task is to take the string schema_owner. followed by "something" enclosed in double quotes. So let's leave the job to grep together with -o, so that it shows only the match, using the regular expression schema_owner\."[^"]*": the text schema_owner, followed by a period (escaped because otherwise it would match any character), followed by a string enclosed in double quotes.
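Something like this, using the log path from your script (the exact invocation is just a sketch):
grep -o 'schema_owner\."[^"]*"' /ruta_del_fichero_log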
I notice that there are duplicate entries (each table shows up in both a vacuum and an analyze command). If you want to remove them, you can pipe the result to sort -u so that only one entry of each is shown:
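For example (the .csv file name here is only illustrative):
grep -o 'schema_owner\."[^"]*"' /ruta_del_fichero_log | sort -u > tablas_malas.csv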
With the -o option and this regular expression you can output only the part of your interest: -o returns just the portion that matches the regular expression. Here I assume that the table name is enclosed in double quotes and that the schema contains only alphabetic characters and underscores (_). For example, taking the line you provide:
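The exact expression is not reproduced above, but something along these lines fits that description:
echo '-- 2019-04-28 07:05:06.538818 [73589] [73589] Running 200 out of 214 commands: analyze schema_owner."nombre_tabla_mala"' | grep -oE '[a-z_]+\."[^"]+"'
which prints:
schema_owner."nombre_tabla_mala"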
And if you have many repeated records you can use sort | uniq. You could also count them and order them from the most frequent to the least frequent with uniq -c | sort -rn:
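For instance, applied to the whole log (again a sketch, reusing the regular expression assumed above):
grep -oE '[a-z_]+\."[^"]+"' /ruta_del_fichero_log | sort | uniq -c | sort -rn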
A little late, but with different options.
With awk:
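The code block itself did not survive here; a sketch consistent with the explanation below (it relies on GNU awk, whose match() accepts an array as third argument) would be:
# store each match as a key of un; keys are unique, so duplicates collapse
gawk 'match($0, /schema_owner\."[^"]*"/, gr) { un[gr[0]]++ } END { for (t in un) print t }' /ruta_del_fichero_log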
What I do here is use match to test each line ($0) against the regular expression already mentioned, assigning the text found to the array gr. Then, for each line that comes in and each match found, I fill the array un using those matches as keys and increment their values by 1. This step is just to take advantage of the unique nature of array keys; the values don't matter to me, so by filling the array with any value its keys will always be distinct. Finally, at the end of the script, I iterate over this array and print its keys.
Variation of the previous answers
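The idea is to reuse grep for the extraction and let awk handle the deduplication; a possible form:
# grep extracts the matches; awk prints each one only the first time it appears
grep -o 'schema_owner\."[^"]*"' /ruta_del_fichero_log | awk '!un[$0]++'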
Here I use the usual regular expression mentioned in the other answers, with the same use of grep; the difference is that, to show only the unique lines, I add in awk the condition of printing only when the value stored for that key is NOT greater than 0, that is, the first time the line is seen.
With perl:
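Not the original code, but a one-liner along the lines described below:
# every match ($&) becomes a key of the hash %un; at the end the unique keys are printed
perl -ne '$un{$&}++ if /schema_owner\."[^"]*"/; END { print "$_\n" for keys %un }' /ruta_del_fichero_log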
This option is similar: I look for the desired pattern and then assign everything that is matched (available in $&) as a key of the hash %un, which by its nature has unique keys, so there will be no duplicates. At the end of the script, I print the keys of %un.
In all cases the result is something of the form:
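schema_owner."t_ed_p"
schema_owner."t_ed_p_estados"
schema_owner."t_ed_p_tps_actividad"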
The advantage of using only one program (in the case of awk or perl) is that it is much faster and uses less processing. If there were a lot of lines (hundreds of thousands or millions of log entries), they would all go through grep, then the matches would go through sort, then those sorted lines would go through uniq, and so on. And each of these programs creates processes, opens file descriptors, closes file descriptors, and so on.