Question

antparleo

Asked: 2020-11-24 00:20:36 +0800 CST

How can I read a text string, reading its characters three at a time?

How can you read a string of text, reading its characters three at a time?

Specifically, I am looking to read the trinucleotides of a DNA sequence, and be able to count how many there are.

Given a string such as:

AAGACAGAGTAGACAAGTACAGTAGACAGATGACGGGTAGCAT

I would like to split it into

AAG
ACA
GAG
...

The problem is that I don't have a delimiter to use with the cut command.

How could I fix it?

The programming language I use is Bash.

I have tried the following:

#!/bin/bash
a=$(cat $1|wc -m)
b=$(cat $1)
for ((i=0;i<$a;i=i+3));do
        echo ${b:i:(i+3)}
done

But instead of printing three characters at a time, it prints from each offset toward the end of the string. The argument $1 is a file containing the DNA sequence.

3 Answers

Voted

OscarGarcia · Answer 1 · 2020-11-24T02:09:48+08:00

The third parameter of the expansion is the length of the substring, not its end position. You can check this in the manual with man bash:

${parameter:offset:length}

Substring Expansion. Expands to up to length characters of parameter starting at the character specified by offset. If length is omitted, expands to the substring of parameter starting at the character specified by offset. length and offset are arithmetic expressions (see ARITHMETIC EVALUATION below).
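
To make the difference between a length and an end position concrete, here is a small sketch using the first nine bases of the sequence from the question. The variable name dna is only illustrative.

```shell
#!/bin/bash
# Illustrative variable: the first nine bases of the question's sequence.
dna="AAGACAGAG"

# Offset 3, length 3: exactly one trinucleotide.
one="${dna:3:3}"

# Offset 3, length 6: what the loop in the question requests when i=3 (length i+3).
six="${dna:3:6}"

# Length omitted: everything from offset 3 to the end of the string.
rest="${dna:3}"

echo "$one"    # ACA
echo "$six"    # ACAGAG
echo "$rest"   # ACAGAG
```

With a longer string, the second form keeps growing on every iteration, which is why the original loop appeared to print up to the end.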

So your code would be:

#!/bin/bash
a=$(cat "$1" | wc -m)
b=$(cat "$1")
for ((i=0;i<a;i=i+3));do
   # The third field is a length, not an end position
   echo "${b:i:3}"
done

If what you want is to count the trinucleotides, it may be enough to compute ($a + 2) / 3, which rounds up so that a trailing incomplete group is also counted:

#!/bin/bash
b=$(<"$1")
a=${#b}
echo "Number of trinucleotides: $(((a + 2) / 3))"
for ((i=0; i<$a; i=i+3))
do
    echo ${b:i:3}
done
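
As a worked example of the arithmetic, the sketch below applies it to the 43-base sequence from the question. The names complete, with_partial and leftover are hypothetical and exist only for this illustration. It also shows plain integer division, in case only complete groups of three should be counted.

```shell
#!/bin/bash
# The sequence from the question, 43 bases long.
b="AAGACAGAGTAGACAAGTACAGTAGACAGATGACGGGTAGCAT"
a=${#b}

complete=$(( a / 3 ))             # groups of exactly three bases
with_partial=$(( (a + 2) / 3 ))   # also counts a trailing group of one or two bases
leftover=$(( a % 3 ))             # bases that do not fill a complete group

echo "length=$a complete=$complete with_partial=$with_partial leftover=$leftover"
# length=43 complete=14 with_partial=15 leftover=1
```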

I want to highlight the use of $(<"$1") to load the contents of a file into a variable (quoted to support file names containing whitespace) and ${#b} to get the length of a variable's value.

If you are only interested in the count and never need to display the content, it is better to do:

#!/bin/bash
NUM=$(wc -c < "$1")
echo "Number of trinucleotides: $(((NUM + 2) / 3))"

Or, more compactly, $((( $(wc -c < "$1") + 2) / 3)).

Note the use of wc -c < [file] to prevent the file name from appearing along with the count.

Note that wc -m is much slower than wc -c on very large files, because wc -c does not need to read the content to count multibyte characters. The letter ñ is one character, but it takes up two bytes in a UTF-8 encoded file.

Also, keep in mind that both options (-m and -c) count any line feeds (\n) present in the file.
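
One possible way around the line-feed issue is to delete the line feeds before counting. This is a minimal sketch, not part of the original answer: count_bases is a hypothetical helper name, and the temporary file exists only for the demonstration.

```shell
#!/bin/bash
# count_bases is a hypothetical helper: it counts bytes after removing line feeds.
count_bases() {
    tr -d '\n' < "$1" | wc -c
}

# Demonstration on a temporary file holding seven bases spread over two lines.
tmp=$(mktemp)
printf 'AAGA\nCAG\n' > "$tmp"

raw=$(wc -c < "$tmp")          # includes the two line feeds
bases=$(count_bases "$tmp")    # line feeds removed first

# $(( ... )) strips any padding that some wc implementations print.
echo "raw=$((raw)) bases=$((bases))"
rm -f "$tmp"
```

Here raw is 9 and bases is 7, so the trinucleotide count would be computed from 7.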

fedorqui · Answer 2 · 2020-11-24T03:54:28+08:00

Use grep to split the string into blocks of three:

$ echo "123456789" | grep -o '...'
123
456
789

Since . matches any character, the regular expression ... matches three characters. The -o option makes grep print each match on its own line, so you can then do whatever you want with the results: count lines, add...

You can also use process substitution to read the matches line by line, as if they came from a file:

while IFS= read -r tri; do
    echo "-- $tri"
done < <(grep -o '...' <<< "123456789")

This lets you work with each trinucleotide in its own iteration:

$ while IFS= read -r tri; do echo "-- $tri"; done < <(grep -o '...' <<< "123456789")
-- 123
-- 456
-- 789
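
As one example of working with each trinucleotide inside the loop, the sketch below tallies how often each one occurs. It assumes Bash 4 or later for associative arrays. The array name count and the short input string are illustrative only.

```shell
#!/bin/bash
# Tally how often each trinucleotide occurs (associative arrays need Bash 4+).
declare -A count   # hypothetical name for the tally table

while IFS= read -r tri; do
    # Start from 0 the first time a trinucleotide is seen.
    count[$tri]=$(( ${count[$tri]:-0} + 1 ))
done < <(grep -o '...' <<< "AAGACAGAGTAGACA")

# Print "trinucleotide occurrences", sorted alphabetically.
for tri in "${!count[@]}"; do
    echo "$tri ${count[$tri]}"
done | sort
```

For this input, ACA is counted twice and AAG, GAG and TAG once each.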

In your case:

$ echo "AAGACAGAGTAGACAAGTACAGTAGACAGATGACGGGTAGCAT" | grep -o '...'
AAG
ACA
GAG
TAG
ACA
AGT
ACA
GTA
GAC
AGA
TGA
CGG
GTA
GCA
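
Note that the listing above ends at GCA: the final T of the sequence does not fill a group of three, so grep does not print it. A minimal sketch for counting the complete groups and checking for leftover bases could look like this (the variable names are illustrative):

```shell
#!/bin/bash
dna="AAGACAGAGTAGACAAGTACAGTAGACAGATGACGGGTAGCAT"

# grep -c counts matching lines, not matches, so count the -o output with wc -l.
complete=$(grep -o '...' <<< "$dna" | wc -l)

# Bases that do not fill a group of three and are therefore not printed by grep.
leftover=$(( ${#dna} % 3 ))

echo "complete=$((complete)) leftover=$leftover"
# complete=14 leftover=1
```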

Cuauhtli · Answer 3 · 2020-11-25T10:23:04+08:00

You can use sed or awk.

With sed.

$ sed -r 's/(.{3})/\1\n/g' <<< "abcdefghi"

Here the regular expression (.{3}) matches each group of three characters, and the replacement \1\n appends a newline to that group.
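
For reference, this sketch captures the result of the sed command so it can be inspected. It assumes GNU sed, where -r enables extended regular expressions and \n in the replacement produces a newline; other sed implementations may behave differently.

```shell
#!/bin/bash
# Assumes GNU sed: -r for extended regexes, \n in the replacement is a newline.
out=$(sed -r 's/(.{3})/\1\n/g' <<< "abcdefghi")

echo "$out"
# abc
# def
# ghi
```

When the input length is a multiple of three, the raw sed output ends with an extra empty line, because a newline is also appended after the last group. The command substitution used here strips it.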

With awk.

$ awk -v FS="" '{
    for (i=1; i<=length($0); i++) {
        # Use an explicit format so the data is never parsed as a format string
        printf "%s", $i
        if (i % 3 == 0)
            printf "\n"
    }
}' <<< "abcdefghi"

Here the field separator is redefined as the empty string so that each character becomes a field. During the loop, a line break is printed whenever the character position is a multiple of 3.

In your case, using the sequence of nitrogenous bases that you showed:

$ awk -v FS="" '{
    for (i=1; i<=length($0); i++) {
        # Use an explicit format so the data is never parsed as a format string
        printf "%s", $i
        if (i % 3 == 0)
            printf "\n"
    }
}' <<< "AAGACAGAGTAGACAAGTACAGTAGACAGATGACGGGTAGCAT"
AAG
ACA
GAG
TAG
ACA
AGT
ACA
GTA
GAC
AGA
TGA
CGG
GTA
GCA
T

It seems there was a thymine "left over" at the end of the sequence you entered.
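
Splitting a record into single characters with an empty FS is an extension found in implementations such as gawk and mawk, not behavior that POSIX specifies. As an alternative sketch that relies only on substr(), the loop can step through the line three characters at a time. The helper name split3 is hypothetical.

```shell
#!/bin/bash
# split3 is a hypothetical helper; substr() avoids relying on an empty FS.
split3() {
    awk '{
        # Step through the line in strides of three characters
        for (i = 1; i <= length($0); i += 3)
            print substr($0, i, 3)
    }' <<< "$1"
}

split3 "AAGACAGAGT"
# AAG
# ACA
# GAG
# T
```

As with the loop above, a trailing group shorter than three characters is still printed.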
