Corrections for "Definitive Guide to sed"
This web page lists the corrections found for "Definitive Guide to sed", First Edition, published 2013, by Daniel Goldman.
Number: 1
Date: June 27, 2014
Title: Use \^, \$, and \* for Literal MetaChars, don't rely on sed to figure it out.
Source: Discussion on the Yahoo sed-users group, totally unrelated to this book.
Problem: Without the -r command line option, sed uses "basic regular expressions" (BRE). With the -r command line option, sed uses "extended regular expressions" (ERE). This is all explained in the book. In either mode, "^ab" means "ab" at the beginning of the PatSpace. That's fine. The problem is that in BRE, "a^b" means the literal sequence "a^b", because BRE "figures out" that only a literal makes sense in this position. But in ERE, "a^b" means "a", then the beginning of the PatSpace, then "b", because ERE does not try to "help you" or "figure anything out". In other words, with the -r option (ERE), ^ always has its special anchor meaning, even when it is not the first character in the RegEx. Since nothing can precede the beginning of the PatSpace, "a^b" NEVER matches when the -r option is used. Similar problems occur with $ and *.
Correction: The syntax in the book works correctly. But the syntax is unsafe in a sense, because it will not work correctly with the -r option. So when using ^, $, or * as a literal character, always write \^, \$, or \*, as this will always work correctly, whether using BRE (no -r) or ERE (-r). Don't rely on sed to figure out if a character is literal. Instead, explicitly tell sed (e.g., \^, \$, \*) when you want a literal.
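The difference described in this correction can be checked with a short script. This is a minimal sketch that assumes GNU sed (for the -r option); the variable names are only illustrative. It covers ^ and $ but not *.

```shell
# Assumes GNU sed: no option selects BRE, -r selects ERE.

# Unescaped ^ in mid-pattern: literal in BRE, anchor in ERE.
bre_bare=$(printf '%s\n' 'a^b' | sed 's/a^b/X/')
ere_bare=$(printf '%s\n' 'a^b' | sed -r 's/a^b/X/')

# Escaped \^ is a literal caret in both modes.
bre_esc=$(printf '%s\n' 'a^b' | sed 's/a\^b/X/')
ere_esc=$(printf '%s\n' 'a^b' | sed -r 's/a\^b/X/')

# Same idea for $: the escaped form is a literal dollar sign in both modes.
bre_dollar=$(printf '%s\n' 'a$b' | sed 's/a\$b/X/')
ere_dollar=$(printf '%s\n' 'a$b' | sed -r 's/a\$b/X/')

printf 'BRE a^b  : %s\n' "$bre_bare"    # X   (literal match)
printf 'ERE a^b  : %s\n' "$ere_bare"    # a^b (no match, line unchanged)
printf 'BRE a\\^b : %s\n' "$bre_esc"    # X
printf 'ERE a\\^b : %s\n' "$ere_esc"    # X
printf 'BRE a\\$b : %s\n' "$bre_dollar" # X
printf 'ERE a\\$b : %s\n' "$ere_dollar" # X
```

Only the unescaped ERE case leaves the input untouched, which is why the escaped form is the safe one to use in both modes.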
Number: 2
Date: April 26, 2015
Title: q prints scheduled text, Q does not (at least for now).
Source: Anders Granlund from Sweden sent the author an email pointing out the error.
Problem: "Definitive Guide" states that the only difference between q and Q is: "q prints PatSpace if not -n option", while "Q never prints PatSpace". Although consistent with the GNU sed manual, this is incomplete and incorrect, because it fails to describe the difference in how scheduled text is treated. q prints scheduled text. Q does not. Here is an example showing the difference:
$ echo y | sed -e ix -e az -e q
x
y
z
$ echo y | sed -e ix -e az -e Q
x
In the q example, PatSpace ("y") is printed. Q does not print PatSpace.
In the q example, scheduled text ("z") is printed. Q does not print scheduled text.
Correction: In addition to the difference between q and Q in the book, the following difference should be added: "q prints scheduled output from arR commands, while Q does not print any scheduled output". Also, cdD definitions were modified to mention that they print scheduled text.
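The same comparison can be made for text scheduled by r, and for d. The following is a rough sketch, assuming GNU sed and the mktemp utility; the temporary file and the variable names are made up for this illustration.

```shell
# Assumes GNU sed. The temporary file stands in for any file read by r.
tmp=$(mktemp)
printf 'z\n' > "$tmp"

# r schedules the file contents; q prints PatSpace, then the scheduled text.
with_q=$(printf 'y\n' | sed -e "r $tmp" -e q)

# Q prints neither (behavior at the time of this correction).
with_Q=$(printf 'y\n' | sed -e "r $tmp" -e Q)

# d discards PatSpace, but text scheduled by a is still printed.
with_d=$(printf 'y\n' | sed -e az -e d)

rm -f "$tmp"

printf 'q after r: [%s]\n' "$with_q"
printf 'Q after r: [%s]\n' "$with_Q"
printf 'd after a: [%s]\n' "$with_d"   # [z]
```

With q, the output is "y" followed by "z". With d, only "z" appears.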
Future: Based on discussion on the sed-users group, and discussion with current and past GNU sed maintainers, it was decided that Q behavior should be changed in the future to print any scheduled text. There are two reasons this makes sense: 1) Q was apparently modeled after d (which prints scheduled text), and 2) the decision to schedule was already made before calling Q and should be honored. The future change in behavior will make Q consistent with other commands that end the sed Cycle prematurely (c, d, D, q), so that scheduled text will always be printed.
Number: 3
Date: April 26, 2015
Title: t/T flags actually reset when t/T run, not when t/T branch.
Source: Anders Granlund from Sweden sent the author an email pointing out the error.
Problem: "Definitive Guide" states: "t ('test') branches to the end of the sed script if s replaced since the current line was read, or since the last t/T branch was taken." Despite being consistent with the wording in the GNU sed manual, this is not correct.
------ Using current T definition, "Branch to label only if there have been no successful substitutions since the last input line was read or conditional branch was taken", the output would be (NOT observed):
$ echo old | sed "s/old/new/; Tx; l; :x T; d"
new$
- s works
- T "runs", but no effect on flag from just running
- T does not branch (s worked), so flag not reset
- l runs
- T does not branch, just like it didn't the last time
- d runs, so PatSpace not printed
------ Using corrected T definition, "Branch to label only if there have been no successful substitutions since the last input line was read or conditional branch was run", the output would be (this IS observed):
$ echo old | sed "s/old/new/; Tx; l; :x T; d"
new$
new
- s works
- T "runs", so flag reset after T done
- T does not branch, because s worked
- l runs
- T branches this time, because flag was reset
- d skipped, PatSpace prints
The original description of sed (from the 1970s) says: "The t function tests whether any successful substitutions have been made on the current input line; if so, it branches to 'label'; if not, it does nothing. The flag which indicates that a successful substitution has been executed is reset by: 1) reading a new input line, or 2) executing a t function." The phrase "executing a t function" is consistent with "command was run", not "branch was taken". So it seems GNU sed is correct, but the GNU sed manual was in error.
Correction: The text should read: "t ('test') branches to the end of the sed script if s replaced since the current line was read, or since the last t/T command was run" and "T ('Test') branches to the end of the sed script if s has not replaced since the current line was read, or since the last t/T command was run". My understanding is this will also be corrected in the GNU sed manual.
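The corrected wording also predicts how t behaves right after a T that ran without branching. Here is a small sketch, assuming GNU sed (T is a GNU extension); the label x and the " +" marker are arbitrary choices for this illustration.

```shell
# Assumes GNU sed. Each command is passed with its own -e.
#   s/old/new/  succeeds, so the flag is set
#   T           runs, does not branch (s worked), and resets the flag
#   tx          sees the flag reset, so it does not branch
#   s/$/ +/     therefore runs and appends the marker
#   :x          label that tx would have jumped to
out=$(printf 'old\n' | sed -e 's/old/new/' -e T -e tx -e 's/$/ +/' -e :x)

printf '%s\n' "$out"   # new +
```

Under the "branch was taken" wording, tx would still see the flag set, would branch past the second s, and the output would be just "new".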
Number: 4
Date: April 26, 2015
Title: "Delete Following Matched Line" script example is incorrect.
Source: Anders Granlund from Sweden sent the author an email pointing out the error.
Problem: In the Chapter "Examples: Delete Some Lines", the example "Delete Following Matched Line" is incorrect, because it fails to take into account the case where the next line is also a matched line, as shown by the following counter-example, where the intent is to delete a line containing "2" if the line directly follows a line containing "1":
$ echo -e "1\n1\n2\n3" | sed "/1/ {n; /2/ d}"
1
1
2
3
Correction: As in the correct "Print Line After Match" example in the previous chapter, the corrected syntax uses a bk loop to ensure we are positioned at the last matching line before testing the next line:
$ echo -e "1\n1\n2\n3" | sed "/1/ {:k n; // bk; /2/ d}"
1
1
3
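The corrected script can be tried against a few more inputs. In this sketch the script is spelled with one -e per command, which should be equivalent to the one-line form; the function name delete_2_after_1 is hypothetical.

```shell
# Equivalent -e spelling of the corrected script.
delete_2_after_1() {
  sed -e '/1/ {' -e ':k' -e 'n' -e '// bk' -e '/2/ d' -e '}'
}

# Two matched lines in a row: the "2" after the last "1" is deleted.
out_a=$(printf '1\n1\n2\n3\n' | delete_2_after_1)

# Only the "2" directly following "1" is deleted; the second "2" stays.
out_b=$(printf '1\n2\n2\n3\n' | delete_2_after_1)

# A "2" that does not follow a "1" is left alone.
out_c=$(printf '0\n2\n1\n3\n' | delete_2_after_1)

printf '%s\n--\n%s\n--\n%s\n' "$out_a" "$out_b" "$out_c"
```

The three outputs are the lines 1 1 3, then 1 2 3, then 0 2 1 3.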
Number: 5
Date: April 26, 2015
Title: "sed {} (grouping)" example is incorrect.
Source: Anders Granlund from Sweden sent the author an email pointing out the error.
Problem: In the Chapter "Other Actions", the second example under "sed { } (grouping)" does not work correctly if the input contains blank lines, as shown by the following counter-example that gives two blank lines (desired behavior was one line), because ^$ RegEx matches so p runs:
$ echo | sed '/2/x; /^$/p; /^$/x'
Correction: The corrected alternate example not using {} syntax is:
$ echo | sed '/2/! b; x; p; x'
Note that the above is logically equivalent to sed '/2/ {x; p; x}'.
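To see that the two forms agree, both can be run on input containing a blank line. This is a minimal sketch with an arbitrarily chosen input and illustrative variable names.

```shell
# Input: "1", "2", a blank line, "3".
input=$(printf '1\n2\n\n3')

# Branch form: skip to the end of the script unless the line contains "2".
branch_form=$(printf '%s\n' "$input" | sed -e '/2/!b' -e x -e p -e x)

# Grouping form: run x; p; x only on lines containing "2".
group_form=$(printf '%s\n' "$input" | sed '/2/ {x; p; x;}')

printf '%s\n' "$branch_form"
```

Both forms print an empty line before the line containing "2" and pass the existing blank line through once, giving 1, blank, 2, blank, 3.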