
awk printing same line over and over?

2024-03-15 20:00:06

rookie looking for help

I have a big file (2000+ rows, 1455 columns)

  • I would like to iterate through cells and only print columns/samples that have values <200
  • Do I have to transpose my table/array or is there another solution?
  • Once it finds value <200 I want it to print and move onto next column (so it doesn't print the same
  • I would also like to include my headers

awk '{for (i=3; i<=NF; i++) if ($i<200) print} {next}' table.txt

This is what I'm working with now. It iterates through the rows(?), starting at column 3 (sample 1), and prints the same row every time it finds a value less than 200 in that row, instead of going straight to the next row after printing.

table.txt (small selection)

Gene targ_bp s_1 s_2 s_3
GNB1 217 53 102 1121
GNB1 202 1112 96 1226
GNB1 163 1141 1162 1181

Output with current code:

GNB1 217 53 102 1121
GNB1 217 53 102 1121
GNB1 202 1112 96 1226

Desired:

Gene targ_bp s_1 s_2
GNB1 217 53 102
GNB1 202 1112 96
  • Header kept; rows where all values are >200 and columns where all values are >200 are removed.

Hope I am clear enough. 0=)

Solution:

Use a 2-pass approach to identify the rows and columns that need to be printed in the first pass and then print them in the second pass, e.g. using any awk:

$ cat tst.awk
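BEGIN {
    OFS = "\t"
    begInColNr = 3      # first sample column; earlier columns are always kept
}
# Pass 1 (first read of the file): record which rows and columns to output.
NR == FNR {
    if ( FNR == 1 ) {
        # Always keep the header row and the leading non-sample columns.
        outRowNrs[FNR]
        for ( inColNr=1; inColNr<begInColNr; inColNr++ ) {
            inColNrs[inColNr]
        }
    }
    else {
        for ( inColNr=begInColNr; inColNr<=NF; inColNr++ ) {
            if ( $inColNr < 200 ) {
                # A value below 200 marks both its row and its column.
                outRowNrs[FNR]
                inColNrs[inColNr]
            }
        }
    }
    next
}
# Pass 2 (second read of the file): print only the marked rows and columns.
FNR in outRowNrs {
    if ( FNR == 1 ) {
        # Map output column numbers to input column numbers, in input order.
        for ( inColNr=1; inColNr<=NF; inColNr++ ) {
            if ( inColNr in inColNrs ) {
                out2inColNrs[++numOutCols] = inColNr
            }
        }
    }
    for ( outColNr=1; outColNr<=numOutCols; outColNr++ ) {
        inColNr = out2inColNrs[outColNr]
        printf "%s%s", $inColNr, (outColNr<numOutCols ? OFS : ORS)
    }
}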

$ awk -f tst.awk table.txt table.txt
Gene    targ_bp s_1     s_2
GNB1    217     53      102
GNB1    202     1112    96
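
As for why the original command printed the same row over and over: in '{for (...) if ($i<200) print} {next}' the print runs once for every field below 200, and the separate {next} block only runs after the loop has already finished, so it never cuts the loop short. A minimal sketch of a row-only fix follows, assuming the sample data from the question saved under the hypothetical name sample_table.txt. It keeps the header and stops scanning a row at the first match, but it does not drop columns; for that the 2-pass script above is still needed, since whether a column qualifies is only known once the whole file has been read.

```shell
# Recreate the small sample from the question (file name is hypothetical).
cat > sample_table.txt <<'EOF'
Gene targ_bp s_1 s_2 s_3
GNB1 217 53 102 1121
GNB1 202 1112 96 1226
GNB1 163 1141 1162 1181
EOF

# NR == 1: always print the header line.
# Other rows: print at the first value below 200, then "next" abandons the
# rest of the fields, so the row cannot be printed a second time.
out=$(awk '
NR == 1 { print; next }
{
    for (i = 3; i <= NF; i++) {
        if ($i < 200) {
            print
            next
        }
    }
}' sample_table.txt)

printf '%s\n' "$out"
```

Using break instead of next inside the loop would behave the same way here, because nothing else follows the loop in that block.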