Friday, July 2, 2021

Munging Standard Input

I was writing a shell script at work to extract a few lines from some test output. It wasn't working like I expected, so I examined the file: it had Windows-style linebreaks.

Converting "DOS" text files into Unix style is a hall-of-fame problem that's up there with "how do I rm a file with a name that starts with a minus sign?" and there are several fine solutions. My problem is that when I run the script, I don't want to have to remember to do the conversion myself. The script should do it for me!

Here's my first pass:

tr -d '\r' | (
  while read line; do
    echo "Got a line: ${line}"
  done
)

Trust me that my real script did actual work, not just a silly echo. It is also several lines longer.

The tr strips carriage returns, converting the input stream to Unix style if it isn't already, and then pipes it to the real script action inside the parentheses.
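A quick way to convince yourself that the filter is harmless on input that is already Unix-style is to feed it both kinds and compare. This is only a sketch: strip_cr is a made-up helper name wrapping the same tr call, and the sample lines are invented.

```shell
#!/bin/sh
# strip_cr is a hypothetical wrapper around the same tr filter used above.
strip_cr() {
  tr -d '\r'
}

# Invented sample input: once with CR LF endings, once with plain LF.
dos_out=$(printf 'one\r\ntwo\r\n' | strip_cr)
unix_out=$(printf 'one\ntwo\n' | strip_cr)

# If the filter works, both inputs produce the same text.
if [ "$dos_out" = "$unix_out" ]; then
  echo "same output either way"
fi
```

Nothing here needs bash, so it should behave the same under plain old /bin/sh.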

CHEERS: The caller doesn't have to convert input. Also works with plain old /bin/sh
JEERS: Virtually the entire script has to live inside parens

Next I tried process substitution, which is possible with bash and zsh (though not macOS /bin/sh):

exec 3< <(tr -d '\r')

while read <&3 line; do
  echo "Got a line: ${line}"
done

CHEERS: The script doesn't need to be wrapped in parens!
JEERS: It has to read from alternate file descriptor 3 instead of STDIN. And it doesn't work with /bin/sh anymore.
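In bash, the read builtin also accepts a -u option naming the descriptor to read from, which some people find easier to scan than the <&3 redirection. A minimal sketch, assuming bash is installed: the throwaway script file and the sample input are invented for the demo, and IFS= read -r is a defensive addition that keeps leading whitespace and backslashes intact.

```shell
# Write a bash variant of the loop to a throwaway file (hypothetical demo setup).
script=$(mktemp)
cat > "$script" <<'EOF'
exec 3< <(tr -d '\r')

# -u 3 tells read to take its input from file descriptor 3 instead of STDIN.
while IFS= read -r -u 3 line; do
  echo "Got a line: ${line}"
done
EOF

# Feed it invented CR LF input and capture what it prints.
fd3_out=$(printf 'alpha\r\nbeta\r\n' | bash "$script")
rm -f "$script"
printf '%s\n' "$fd3_out"
```

Running the snippet through an explicit bash keeps the demo working even when the outer shell is a plain /bin/sh.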

How about I replace FD 0 (STDIN) with this new FD 3?

exec 3< <(tr -d '\r')
exec 0<&3

while read line; do
  echo "Got a line: ${line}"
done

CHEERS: The while loop and anything else in the script can just read from STDIN like usual
JEERS: We're still opening up this weird file descriptor 3 that we never use directly after line 2
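If the leftover descriptor is the only complaint, it can be closed in the same exec that points FD 0 at it. A sketch, assuming bash and assuming FD 0 is redirected with exec 0<&3; the 3<&- close and the throwaway script file are additions for illustration, not part of the script above.

```shell
# Hypothetical demo setup: put the script in a temporary file and run it with bash.
script=$(mktemp)
cat > "$script" <<'EOF'
exec 3< <(tr -d '\r')
# Redirections run left to right: copy FD 3 onto FD 0, then close FD 3.
exec 0<&3 3<&-

while read line; do
  echo "Got a line: ${line}"
done
EOF

# Invented CR LF input; the loop reads it from plain STDIN.
closed_out=$(printf 'alpha\r\nbeta\r\n' | bash "$script")
rm -f "$script"
printf '%s\n' "$closed_out"
```

The loop body is untouched; only the second exec changes, so the rest of the script can keep reading STDIN as usual.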

How about substituting FD 0 from the start? Will that break anything?

exec 0< <(tr -d '\r')

while read line; do
  echo "Got a line: ${line}"
done

CHEERS: No more stray file descriptors
JEERS: The first line is cryptic? Just kidding, this is 100% excellent. I see no flaws.
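To check the final form end to end, one option is to run it against the same text in both line-ending styles and compare the results. This is a sketch, assuming bash is available; the names munge.sh, dos.txt and unix.txt and the sample test output are all invented for the demo.

```shell
# Hypothetical demo setup: a scratch directory holding the script and two inputs.
workdir=$(mktemp -d)
cat > "$workdir/munge.sh" <<'EOF'
exec 0< <(tr -d '\r')

while read line; do
  echo "Got a line: ${line}"
done
EOF

# The same invented content, once with CR LF endings and once with LF.
printf 'PASS test_a\r\nFAIL test_b\r\n' > "$workdir/dos.txt"
printf 'PASS test_a\nFAIL test_b\n' > "$workdir/unix.txt"

from_dos=$(bash "$workdir/munge.sh" < "$workdir/dos.txt")
from_unix=$(bash "$workdir/munge.sh" < "$workdir/unix.txt")
rm -rf "$workdir"

# The caller never converts anything, yet both runs should match.
[ "$from_dos" = "$from_unix" ] && echo "identical output"
```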