
Question

Asked: 2020-05-03 04:06:25 +0800 CST

Get only part of a string in Bash


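I have to produce a report of the schemas and tables whose statistics Amazon Redshift considers stale. A process that runs every weekend takes care of applying the appropriate operations to them, but I need to export the schema and table names to a .csv file.

The problem is that this process generates a report in which the lines I am interested in look like this: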

-- 2019-04-28 07:05:06.538818 [73589] [73589] Running 200 out of 214 commands: analyze schema_owner."nombre_tabla_mala"

I am collecting the lines that match this pattern as follows:

while read linea
do
    SCHEMA="schema_owner"
    FILTRO="commands: analyze $SCHEMA"
    if [[ $linea =~ $FILTRO ]]
    then
      ...Codigo que falta...
    fi
done < /ruta_del_fichero_log

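The problem is that I obviously capture the entire line, and I only need to store the part schema_owner."nombre_tabla_mala"

How can I discard the rest of the string?

Here are the first twenty lines of the log file in question: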

-- 2019-04-28 05:54:53.830738 [73589] [73589] Running 1 out of 1 commands: set wlm_query_slot_count = 4
-- 2019-04-28 05:54:53.833469 [73589] Success.
-- 2019-04-28 05:54:53.833531 [73589] [73589] Running 1 out of 1 commands: set statement_timeout = '36000000'
-- 2019-04-28 05:54:53.836162 [73589] Success.
-- 2019-04-28 05:54:53.836190 [73589] [73589] Running 1 out of 1 commands: set application_name to 'AnalyzeVacuumUtility-v.9.1.6'
-- 2019-04-28 05:54:53.838700 [73589] Success.
-- 2019-04-28 05:54:53.838788 [73589] Extracting Candidate Tables for Vacuum...
-- 2019-04-28 05:55:57.850685 [73589] Found 0 Tables requiring Vacuum and flagged by alert
-- 2019-04-28 05:55:57.850795 [73589] Extracting Candidate Tables for Vacuum ...
-- 2019-04-28 05:56:34.908067 [73589] Found 107 Tables requiring Vacuum due to stale statistics
-- 2019-04-28 05:56:34.908263 [73589] [73589] Running 1 out of 214 commands: vacuum FULL schema_owner."t_ed_p" ; /* Size : 120 MB,  Unsorted_pct : N/A */ ;
-- 2019-04-28 05:56:47.588342 [73589] Success.
-- 2019-04-28 05:56:47.588401 [73589] [73589] Running 2 out of 214 commands: analyze schema_owner."t_ed_p"
-- 2019-04-28 05:56:50.363655 [73589] Success.
-- 2019-04-28 05:56:50.363711 [73589] [73589] Running 3 out of 214 commands: vacuum FULL schema_owner."t_ed_p_estados" ; /* Size : 120 MB,  Unsorted_pct : N/A */ ;
-- 2019-04-28 05:57:03.430064 [73589] Success.
-- 2019-04-28 05:57:03.430124 [73589] [73589] Running 4 out of 214 commands: analyze schema_owner."t_ed_p_estados"
-- 2019-04-28 05:57:06.024933 [73589] Success.
-- 2019-04-28 05:57:06.025023 [73589] [73589] Running 5 out of 214 commands: vacuum FULL schema_owner."t_ed_p_tps_actividad" ; /* Size : 120 MB,  Unsorted_pct : N/A */ ;
-- 2019-04-28 05:57:06.024933 [73589] Success.

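In the end, what I need to get is the schema and the table. That is, for the ones that appear in this sample, the following would have to be sent to the .csv file: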

schema_owner."t_ed_p"
schema_owner."t_ed_p_estados"
schema_owner."t_ed_p_tps_actividad"

3 Answers


fedorqui · Answer 1 · 2020-05-03T05:09:25+08:00

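It looks like this is about getting the string schema_owner. followed by "something" in double quotes. So let's give the job to grep together with -o, so that it shows only the match: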

$ grep -o 'schema_owner\."[^"]*"' fichero.log
schema_owner."t_ed_p"
schema_owner."t_ed_p"
schema_owner."t_ed_p_estados"
schema_owner."t_ed_p_estados"
schema_owner."t_ed_p_tps_actividad"

schema_owner\."[^"]*" says: "the text schema_owner followed by a period (escaped, because an unescaped . matches any character), and then a string enclosed in double quotes".
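To make the point about the escaped period concrete, here is a minimal sketch. The input line is hypothetical (it does not come from the log in the question) and was chosen so that schema_owner is followed by a character other than a period.

```shell
# Hypothetical input: "schema_owner" is followed by X, not by a literal period.
line='analyze schema_ownerX"t_ed_p"'

# Unescaped dot: "." matches any character, so this line still matches.
loose=$(printf '%s\n' "$line" | grep -o 'schema_owner."[^"]*"')

# Escaped dot: only a literal period is accepted, so nothing matches.
# "|| true" keeps the script going when grep finds no match.
strict=$(printf '%s\n' "$line" | grep -o 'schema_owner\."[^"]*"' || true)

printf 'loose:  [%s]\nstrict: [%s]\n' "$loose" "$strict"
```

The loose pattern returns the whole schema_ownerX"t_ed_p" fragment, while the strict one returns nothing.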

I noticed that there are duplicate entries. If you want to remove them, you can pipe the result to sort -u so that it shows each entry only once:

$ grep -o 'schema_owner\."[^"]*"' fichero.log | sort -u
schema_owner."t_ed_p"
schema_owner."t_ed_p_estados"
schema_owner."t_ed_p_tps_actividad"
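If you would rather stay inside the while loop from the question, a capture group with BASH_REMATCH is one possible route. This is only a sketch under some assumptions: it needs Bash (not plain sh), the sample lines are copied from the question into a temporary file, and the variable names regex and tables are made up for the illustration.

```shell
#!/usr/bin/env bash
# Sample log lines copied from the question, written to a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
-- 2019-04-28 05:56:47.588401 [73589] [73589] Running 2 out of 214 commands: analyze schema_owner."t_ed_p"
-- 2019-04-28 05:56:50.363655 [73589] Success.
-- 2019-04-28 05:57:03.430124 [73589] [73589] Running 4 out of 214 commands: analyze schema_owner."t_ed_p_estados"
EOF

SCHEMA="schema_owner"
# Keep the regex in a variable so that [[ =~ ]] treats it as a pattern.
# The parentheses capture only the schema."table" part.
regex="commands: analyze (${SCHEMA}\.\"[^\"]*\")"

tables=()
while IFS= read -r linea; do
    if [[ $linea =~ $regex ]]; then
        # BASH_REMATCH[1] holds the text matched by the first group.
        tables+=("${BASH_REMATCH[1]}")
    fi
done < "$log"

printf '%s\n' "${tables[@]}"
rm -f "$log"
```

Redirecting the final printf to a file would give the .csv the question asks for; for large logs the grep pipeline above is likely to be simpler and faster than a shell loop.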

gustavovelascoh · Answer 2 · 2020-05-03T05:03:12+08:00

With the -o option and this regular expression, you can output only the part you are interested in:

grep -oP '[a-zA-Z_]+\."[a-zA-Z_]+"'

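The -o option returns only the part that matches the regular expression. I am assuming that the table name is quoted and that the names contain only letters and underscores (_).

For example, taking the line you provided: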
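The assumption about the allowed characters matters in practice. As a sketch, the hypothetical table name below contains digits (the sample in the question does not), which the letters-and-underscores class rejects. The example uses grep -E instead of -P, which should be enough here because the pattern uses no Perl-specific syntax.

```shell
# Hypothetical log line whose table name contains digits.
line='Running 7 out of 214 commands: analyze schema_owner."t_ed_2019"'

# Letters and underscores only: the digits break the match, so this is empty.
narrow=$(printf '%s\n' "$line" | grep -oE '[a-zA-Z_]+\."[a-zA-Z_]+"' || true)

# Broadened character class that also accepts digits.
wide=$(printf '%s\n' "$line" | grep -oE '[[:alnum:]_]+\."[[:alnum:]_]+"')

printf 'narrow: [%s]\nwide:   [%s]\n' "$narrow" "$wide"
```

If your real table names can contain other characters, the [^"]* form used in the first answer is the more permissive choice.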

$ echo '-- 2019-04-28 07:05:06.538818 [73589] [73589] Running 200 out of 214 commands: analyze schema_owner."nombre_tabla_mala"' | grep -oP '[a-zA-Z_]+\."[a-zA-Z_]+"'
schema_owner."nombre_tabla_mala"

If you have many duplicate records, you can use sort | uniq. You can also count them and sort them from most to least frequent with uniq -c | sort -rn:

grep -oP '[a-zA-Z_]+\."[a-zA-Z_]+"' log.csv| sort | uniq -c | sort -rn
  2 schema_owner."t_ed_p_estados"
  2 schema_owner."t_ed_p"
  1 schema_owner."t_ed_p_tps_actividad"

Cuauhtli · Answer 3 · 2020-05-06T02:31:04+08:00

A bit late, but here are some different options.

With awk

$ awk 'match($0,/schema_owner\.".*"/, gr){
    un[gr[0]]++
}END{for (i in un) print i}' fichero.log

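What I do here is use match on each line ($0) to capture the regular expression mentioned above, and then assign what was found to the array gr. Then, for each incoming line and each match found, I fill the array un using the matched text as the key, and add 1 to its value. This step only exists to take advantage of the uniqueness of array keys; the values do not matter to me. That is, whatever values the array is filled with, its keys are always distinct. Then, at the end of the script, I iterate over the keys of this array and print them.

A variation on the previous answers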
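One caveat: the three-argument form of match is a GNU awk extension, so the script above needs gawk. As a sketch of a variant that should also run on other awk implementations, the two-argument match sets RSTART and RLENGTH, and substr extracts the matched text. The sample lines are copied from the question, the array name seen is arbitrary, and [^"]* replaces .* to stop at the closing quote.

```shell
# Sample log lines copied from the question, written to a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
-- 2019-04-28 05:56:34.908263 [73589] [73589] Running 1 out of 214 commands: vacuum FULL schema_owner."t_ed_p" ; /* Size : 120 MB,  Unsorted_pct : N/A */ ;
-- 2019-04-28 05:56:47.588401 [73589] [73589] Running 2 out of 214 commands: analyze schema_owner."t_ed_p"
-- 2019-04-28 05:57:03.430124 [73589] [73589] Running 4 out of 214 commands: analyze schema_owner."t_ed_p_estados"
EOF

result=$(awk 'match($0, /schema_owner\."[^"]*"/) {
    # match() sets RSTART and RLENGTH; substr() cuts out the matched text.
    m = substr($0, RSTART, RLENGTH)
    # Print each distinct match only the first time it is seen.
    if (!(m in seen)) { seen[m] = 1; print m }
}' "$log")

printf '%s\n' "$result"
rm -f "$log"
```

Printing on first sight, instead of looping in an END block, also keeps the entries in the order in which they appear in the log.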

$ grep -o 'schema_owner\.".*"' fichero.log | awk '!a[$0]++'

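This is the usual regular expression mentioned in the other answers, with the same use of grep. The difference is that, to show only the unique entries, I use awk with a condition that prints only when the value stored for that key is not greater than 0, that is, the first time the line is seen.

With perl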
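A small sketch of how the '!a[$0]++' idiom behaves, using made-up input values rather than the log: the first time a line appears, a[$0] is still 0, so the negation is true and the line is printed; the post-increment then makes every later occurrence false.

```shell
# Made-up input with repeated values, one per line.
input=$(printf '%s\n' b a b c a)

# awk idiom: print a line only the first time it appears.
deduped=$(printf '%s\n' "$input" | awk '!a[$0]++')

# For comparison, sort -u also removes duplicates but reorders the lines.
sorted=$(printf '%s\n' "$input" | sort -u)

printf 'awk:\n%s\nsort -u:\n%s\n' "$deduped" "$sorted"
```

The awk version keeps first-seen order (b, a, c), while sort -u returns the values sorted (a, b, c).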

$ perl -ne '/schema_owner\.".*"/ && $un{$&}++; END{
    print "$_\n" for keys %un
}'  fichero.log

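This option is similar: I look for the desired pattern and then store everything that matched (available in $&) as a key of the hash %un. Hash keys are unique by nature, so there will be no duplicates. At the end of the script, I print the keys of the hash un.

In every case, the result looks something like this: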

schema_owner."t_ed_p_estados"
schema_owner."t_ed_p"
schema_owner."t_ed_p_tps_actividad"

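The advantage of using a single program (awk or perl in this case) is that it is faster and uses fewer resources. If there are many lines, hundreds of thousands or millions of log entries, they all go through grep, then every match goes through sort, then the sorted lines go through uniq, and so on. Each of these programs creates a process, opens file descriptors, closes file descriptors, and so on.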
