I know that's what awk does, but I don't think I would have expected
it if I didn't know about it.
$0 is the current input line.
If you don't change anything, or if you modify $0 itself, whitespace between fields is preserved.
If you modify any of the fields, $0 is recomputed and whitespace
between tokens is collapsed.
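Here is a minimal sketch of the three cases just described. The input, with runs of spaces, is made up for illustration. Only the field assignment causes $0 to be rebuilt from the fields joined with OFS.

```shell
# Made-up input with runs of spaces between the words.
input='one   two   three'

# Nothing modified: $0 is printed exactly as it was read.
printf '%s\n' "$input" | awk '{ print }'
# prints: one   two   three

# $0 itself modified (sub() with no third argument works on $0):
# the original spacing survives.
printf '%s\n' "$input" | awk '{ sub(/two/, "TWO"); print }'
# prints: one   TWO   three

# A single field modified: $0 is rebuilt as $1 OFS $2 OFS $3,
# with OFS defaulting to one space, so the runs of spaces are gone.
printf '%s\n' "$input" | awk '{ $2 = "TWO"; print }'
# prints: one TWO three
```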
awk *could* have been defined to preserve inter-field whitespace even
when you modify individual fields,
and I think I would have found that more intuitive.
(And ideally there would be a way to refer to that inter-field
whitespace.)
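Portable awk has no built-in handle on the inter-field whitespace. The behaviour wished for above can still be approximated by walking $0 with match() and never assigning to a field. This is only a sketch: the helper name `set_field_keep_ws` is hypothetical, and "field" here means a run of non-blank, non-tab characters, which approximates the default FS.

```shell
# Hypothetical helper, not a standard tool: set_field_keep_ws N VALUE
# Replaces the Nth blank/tab-separated token on each input line with VALUE
# and leaves every run of whitespace (leading, between, trailing) untouched.
set_field_keep_ws() {
  awk -v n="$1" -v val="$2" '
  {
    rest = $0; out = ""; i = 0
    # Walk the line token by token; match() sets RSTART and RLENGTH.
    while (match(rest, /[^ \t]+/)) {
      i++
      tok = substr(rest, RSTART, RLENGTH)
      piece = (i == n) ? val : tok
      # Copy the whitespace in front of the token, then the (new) token.
      out = out substr(rest, 1, RSTART - 1) piece
      rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest   # rest now holds only trailing whitespace, if any
  }'
}

printf 'one   two   three\n' | set_field_keep_ws 2 fifteen
# prints: one   fifteen   three
```

Lines with fewer than N tokens pass through unchanged. Values given with -v have their backslash escapes processed. GNU awk's split() accepts an optional fourth array argument that receives the separators, which makes this easier, but that is a gawk extension.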
The fact that modifying a field has the side effect of messing up $0
seems counterintuitive.
Perhaps the behavior matches your intuition better than it matches
mine.
(And perhaps this should be moved to comp.lang.awk if it doesn't die
out soon.
Though sed and awk are both languages in their own right
and tools that can be used from the shell, so I'd argue there's a
topicality overlap.)
On 3/7/24 18:09, Keith Thompson wrote:
I know that's what awk does, but I don't think I would have expected
it if I didn't know about it.
Okay. I think that's a fair observation.
$0 is the current input line.
Or $0 is the current /record/ in awk parlance.
If you don't change anything, or if you modify $0 itself, whitespace
between fields is preserved.
If you modify any of the fields, $0 is recomputed and whitespace
between tokens is collapsed.
I don't agree with that.
% echo 'one   two   three' | awk '{print $0; print $1,$2,$3}'
one   two   three
one two three
I didn't /modify/ anything and awk does print the fields with
different white space.
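No field is modified in that example. The difference comes from how print joins its comma-separated arguments: it puts the output field separator OFS between them and ignores the original spacing. A small sketch with made-up input:

```shell
input='one   two   three'   # made-up input with runs of spaces

# Commas in print insert OFS (default: one space) between the values.
printf '%s\n' "$input" | awk '{ print $1, $2, $3 }'
# prints: one two three

# Change OFS and the output separator changes with it.
printf '%s\n' "$input" | awk -v OFS=':' '{ print $1, $2, $3 }'
# prints: one:two:three

# Without commas the values are simply concatenated.
printf '%s\n' "$input" | awk '{ print $1 $2 $3 }'
# prints: onetwothree
```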
awk *could* have been defined to preserve inter-field whitespace
even when you modify individual fields,
I question the veracity of that. Specifically when lengthening or
shortening the value of a field. E.g. replacing "two" with
"fifteen". This is particularly germane when you look at $0 as a fixed
width formatted output.
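Once a field changes length, neither keeping nor collapsing the old whitespace keeps the columns aligned. One common approach is to print the record with explicit printf widths. This sketch assumes a made-up layout of 8-character columns. A value longer than its width still overflows.

```shell
# Made-up fixed-width input: each column is 8 characters wide.
printf 'one     two     three\n' |
  awk '{ $2 = "fifteen"; printf "%-8s%-8s%s\n", $1, $2, $3 }'
# prints: one     fifteen three
```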
and I think I would have found that more intuitive.
I don't agree.
(And ideally there would be a way to refer to that inter-field
whitespace.)
Remember, awk is meant for working on fields of data in a record. By
default, the fields are delimited by white space characters. I'll say
it this way, awk is meant for working on the non-white space
characters. Or yet another way, awk is not meant for working on
white space characters.
The fact that modifying a field has the side effect of messing up $0
seems counterintuitive.
Maybe.
But I think it's one that is acceptable for what awk is intended to do.
Perhaps the behavior matches your intuition better than it matches
mine.
I sort of feel like you are wanting to / trying to use awk in places
where sed might be better. sed just sees a string of text and is
ignorant of any structure without a carefully crafted RE to provide it.
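To illustrate that contrast on made-up input: sed edits the raw text, so the spacing is untouched. It knows nothing about fields, though, unless the regex supplies that structure.

```shell
# sed edits the line as plain text, so the surrounding spacing is kept...
printf 'one   two   three\n' | sed 's/two/fifteen/'
# prints: one   fifteen   three

# ...but it has no notion of fields: "two" inside "network" matches first.
printf 'network   two\n' | sed 's/two/2/'
# prints: ne2rk   two
```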
Not really. I'm just remarking on one particular awk feature that I
find a bit counterintuitive.
Awk is optimized for working on records consisting of fields, and not
caring much about how much whitespace there is between fields. But it's
flexible enough to do *lots* of other things.
But awk doesn't work with fixed-width data. The length of each field,
and the length of $0, is variable.
If awk *purely* dealt with input lines only as lists of tokens, then
this:
echo 'one   two   three' | awk '{print $0}'
would print "one two three" rather than "one   two   three" (and awk would
lose the ability to deal with arbitrarily formatted input). The fact
that the inter-field whitespace is reset only when individual fields are touched feels arbitrary to me.
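Because the reset happens whenever a field is touched, it can be triggered on purpose. That is where the common `$1=$1` idiom for normalizing whitespace comes from. Assigning to $0, on the other hand, only re-splits the fields. A sketch with made-up input:

```shell
# Assigning a field to itself is enough to trigger the rebuild; with the
# default FS, leading and trailing blanks/tabs are not part of any field.
printf '  one \t two   three  \n' | awk '{ $1 = $1; print }'
# prints: one two three

# Assigning to $0 re-splits the fields but leaves the text as given.
printf 'one   two\n' | awk '{ $0 = $0; print NF ": " $0 }'
# prints: 2: one   two
```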
The original Awk doesn't support regular expressions, right?
Because regex was not yet talked about back then??
On 08.03.2024 10:03, Mr. Man-wai Chang wrote:
The original Awk doesn't support regular expressions, right?
Where did you get that from? - Awk without regexps makes little sense;
mind that the basic syntax of Awk programs is described as
/pattern/ { action }
What would remain if there's no regexp patterns; string comparisons?
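For illustration, here are both kinds of pattern side by side on a made-up three-line input. The first two use regular expressions. The others are plain expressions (numeric and string comparisons, NR) with no regex involved.

```shell
# Regular-expression patterns
printf '%s\n' 'one 1' 'two 2' 'three 3' | awk '/^t/'         # two 2, three 3
printf '%s\n' 'one 1' 'two 2' 'three 3' | awk '$1 ~ /e$/'    # one 1, three 3

# Expression patterns, no regex involved
printf '%s\n' 'one 1' 'two 2' 'three 3' | awk '$2 > 1'       # two 2, three 3
printf '%s\n' 'one 1' 'two 2' 'three 3' | awk 'NR == 1'      # one 1
printf '%s\n' 'one 1' 'two 2' 'three 3' | awk '$1 == "two"'  # two 2
```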
Because regex was not yet talked about back then??
Stable Awk (1985) was released in 1987. The (initial) old Awk (1977) was
released in 1979. Before that tool we had Sed (1974), and before that we
had Ed and Grep (1973). My perception is that regexps were there as a
basic concept of UNIX in all these tools, so why should Awk be exempt?
According to the authors Awk was designed to see how Sed and Grep could
be generalized.
I usually think of regular expressions when I'm doing a sub(/re/, ...)
type thing or a (... ~ /re/) type conditional. More specifically things between the // in both of those statements are the REs.
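Regexes also appear in actions, not only in patterns. This sketch uses the standard ~, match(), and gsub(). It also shows that a plain string used where a regex is expected is treated as a dynamic regex, which is one reason the boundary of "what is a regex" can feel fuzzy. The input line is made up.

```shell
printf 'one two three\n' | awk '{
  # ~ with a regex literal: does any part of $2 match?
  if ($2 ~ /^tw/) print "field 2 starts with tw"

  # match() sets RSTART (1-based start) and RLENGTH of the leftmost match
  if (match($0, /t[a-z]+/)) print RSTART, RLENGTH

  # A string used where a regex is expected acts as a dynamic regex
  re = "e+$"
  if ($3 ~ re) print "field 3 ends in e"

  # gsub() edits $0 in place and returns the number of substitutions
  n = gsub(/o/, "0")
  print n, $0
}'
# prints:
# field 2 starts with tw
# 5 3
# field 3 ends in e
# 2 0ne tw0 three
```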
Maybe I have an imprecise understanding / definition.
On 3/8/24 08:46, Janis Papanagnou wrote:
Awk without regexps makes little sense;
I think this comes down to what is a regular expression and what is not
a regular expression.
mind that the basic syntax of Awk programs is described as
pattern { action }
I'm guessing that 40-60% of the awk that I use doesn't use what I would consider to be regular expressions.
[...]
Maybe I have an imprecise understanding / definition.
On 8/3/2024 10:46 pm, Janis Papanagnou wrote:
Stable Awk (1985) was released in 1987. The (initial) old Awk (1977) was
released in 1979. Before that tool we had Sed (1974), and before that we
had Ed and Grep (1973). My perception is that regexps were there as a
basic concept of UNIX in all these tools, so why should Awk be exempt?
According to the authors Awk was designed to see how Sed and Grep could
be generalized.
That part of history is beyond me. Sorry... my fault for not doing a check.
$ awk '{print $1, "1-1"}' newsrc-news.eternal-september.org-test > newsrc-news.eternal-september.org
In this specific case of regular data you can simplify that to
awk '$2="1-1"' sourcefile > targetfile
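As a quick check, here are both forms on a made-up sample in the regular "group: ranges" shape assumed above. They give the same output.

```shell
# Made-up newsrc-style sample with exactly two fields per line.
sample='comp.lang.awk: 1-5000
comp.unix.shell: 1-12,14-9000'

printf '%s\n' "$sample" | awk '{print $1, "1-1"}'
printf '%s\n' "$sample" | awk '$2="1-1"'
# both print:
# comp.lang.awk: 1-1
# comp.unix.shell: 1-1
```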
On 2024-03-06, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
$ awk '{print $1, "1-1"}' newsrc-news.eternal-september.org-test > newsrc-news.eternal-september.org
In this specific case of regular data you can simplify that to
awk '$2="1-1"' sourcefile > targetfile
That had me scratching my head. You can't have an action without
enclosing braces. But it's still legal syntax because... it's an
expression serving as a pattern. The assignment itself is a side
effect.
Without braces, the default action takes place, which is ``{print}''.
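The same rule explains the `1` idiom that comes up later in the thread. A pattern that is always true, with no action, prints every line. A sketch with made-up input:

```shell
# A pattern with no action runs the default action, { print }.
printf '%s\n' a '' b | awk 'NF'           # NF is 0 on the empty line: a, b
printf '%s\n' a '' b | awk '1'            # 1 is always true: every line
printf '%s\n' a '' b | awk '1 { print }'  # the same thing, spelled out
```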
Care needs to be taken when using this shortcut so the expression
doesn't evaluate as false:
$ printf 'one 1\ntwo 2\nthree 3\n' | awk '$2=4'
one 4
two 4
three 4
$ printf 'one 1\ntwo 2\nthree 3\n' | awk '$2=0'
$
$ printf 'one 1\ntwo 2\nthree 3\n' | awk '$2="4"'
one 4
two 4
three 4
$ printf 'one 1\ntwo 2\nthree 3\n' | awk '$2=""'
$
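The same truth test applies when the pattern is just a field, with one twist. A field that looks like a number (a "numeric string") is tested as a number, so an input "0" or "0.0" is false too, as is an empty field. A sketch with made-up input:

```shell
# Only lines whose second field is non-empty and not numerically zero print.
printf '%s\n' 'a 0' 'b 0.0' 'c' 'd 1' 'e x' | awk '$2'
# prints:
# d 1
# e x
```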
About 20 or so years ago we had a discussion in this NG (which I'm not
going to search for now) and, shockingly, a consensus was reached that
we should encourage people to always write:
'{$2="1-1"} 1'
instead of:
$2="1-1"
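Applied to the failing `$2=0` case from earlier in the thread, the two braced forms print every line regardless of the assigned value:

```shell
printf 'one 1\ntwo 2\nthree 3\n' | awk '$2=0'            # prints nothing
printf 'one 1\ntwo 2\nthree 3\n' | awk '{$2=0} 1'        # one 0, two 0, three 0
printf 'one 1\ntwo 2\nthree 3\n' | awk '{$2=0; print}'   # same, explicit print
```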
On 09.03.2024 17:52, Ed Morton wrote:
About 20 or so years ago we had a discussion in this NG (which I'm not
going to search for now) and, shockingly, a consensus was reached that
we should encourage people to always write:
'{$2="1-1"} 1'
I don't recall such a "consensus". If you want to avoid cryptic code
you'd rather write
'{$2="1-1"; print}'
Don't you think?
About 20 or so years ago we had a discussion in this NG (which I'm not
going to search for now) and, shockingly, a consensus was reached that
we should encourage people to always write:
'{$2="1-1"} 1'
instead of:
$2="1-1"
unless they NEED the result of the action to be evaluated as a
condition, for that very reason.
On 10/3/2024 12:52 am, Ed Morton wrote:[...]
About 20 or so years ago we had a discussion in this NG (which I'm not
going to search for now) and, shockingly, a consensus was reached that
we should encourage people to always write:
'{$2="1-1"} 1'
instead of:
$2="1-1"
unless they NEED the result of the action to be evaluated as a
condition, for that very reason.
You might Google about it, but Google has unplugged its Usenet
support. I dunno whether you could search old Usenet messages. There
is still Wayback Machine archive.
Do Linux and Unix have ONE AND ONLY ONE STANDARD regex library?
It seems that tools and programming languages have their own
implementations, not to mention different versions of each.
POSIX requires that awk uses extended regular expressions (i.e. the
same as regcomp() with REG_EXTENDED).
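In practice this means the ERE operators +, ?, | and ( ) work in awk without backslashes, unlike basic regular expressions such as grep's default. A sketch with made-up input:

```shell
# ? makes the "u" optional; gsub() returns the number of replacements.
printf 'color colour colr\n' | awk '{ n = gsub(/colou?r/, "X"); print n, $0 }'
# prints: 2 X X colr

# Alternation and grouping: "cat" or "dog", optionally followed by "s".
printf '%s\n' cat cats dog dogs cow | awk '/^(cat|dog)s?$/'
# prints cat, cats, dog, dogs (one per line)
```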
On 09.03.2024 17:52, Ed Morton wrote:
About 20 or so years ago we had a discussion in this NG (which I'm not
going to search for now) and, shockingly, a consensus was reached that
we should encourage people to always write:
'{$2="1-1"} 1'
I don't recall such a "consensus".
If you want to avoid cryptic code you'd rather write
'{$2="1-1"; print}'
Don't you think?
And of course add more measures in case the data is not as regular as
the sample data suggests. (See my other postings what may be defined
as data, line missing or spurious blanks in the data, comment lines
or empty lines that have to be preserved, etc.)
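As one possible shape for those extra measures, here is a sketch. The helper name `reset_ranges` is hypothetical. Treating lines that start with "#" as comments is an assumption made for illustration, not part of the newsrc format.

```shell
# Hypothetical helper reset_ranges; the comment-line convention is an assumption.
reset_ranges() {
  awk '
    # Pass empty/blank lines and "#" comment lines through unchanged.
    NF == 0 || /^[ \t]*#/ { print; next }
    # Otherwise keep only the group name and reset its range; any extra
    # or spurious fields are dropped rather than carried along.
    { print $1, "1-1" }
  ' "$@"
}

printf '%s\n' 'comp.lang.awk: 1-5000' '' '# saved groups' 'comp.unix.shell:   1-12  x' |
  reset_ranges
```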
instead of:
$2="1-1"
Janis
In article <tv26ck-3qt.ln1@ID-313840.user.individual.net>,
Geoff Clare <netnews@gclare.org.uk> wrote:
POSIX requires that awk uses extended regular expressions (i.e. the
same as regcomp() with REG_EXTENDED).
There is the additional requirement that \ inside [....] can
be used to escape characters,
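The sentence above is cut off in this copy. One well-known consequence of the rule is that awk processes escape sequences inside bracket expressions. In a plain regcomp() ERE, a backslash inside [...] is an ordinary character, so "[\t]" would mean "backslash or t". A sketch with made-up input:

```shell
# [ \t] matches a space or a TAB in awk; \/ keeps / from ending the regex literal.
printf 'a\tb c\n' | awk '{ gsub(/[ \t]/, "_"); print }'
# prints: a_b_c

printf 'usr/local/bin\n' | awk '{ gsub(/[\/]/, "."); print }'
# prints: usr.local.bin
```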
On 3/9/2024 2:07 PM, Janis Papanagnou wrote:
On 09.03.2024 17:52, Ed Morton wrote:
About 20 or so years ago we had a discussion in this NG (which I'm not
going to search for now) and, shockingly, a consensus was reached that
we should encourage people to always write:
'{$2="1-1"} 1'
I don't recall such a "consensus".
I do, I have no reason to lie about it, but I can't be bothered
searching through 20-year-old usenet archives for it (I did take a very
quick shot at it but I don't even know how to write a good search for it
- you can't just google "awk '1'" and I'm not even sure if it was in comp.lang.awk or comp.unix.shell).
If you want to avoid cryptic code you'd rather write
'{$2="1-1"; print}'
Don't you think?
If I'm writing a multi-line script I use an explicit `print` but it just doesn't matter for a tiny one-line script like that.
Everyone using awk
needs to know the `1` idiom as it's so common and once you've seen it
once it's not hard to figure out what `{$2="1-1"} 1` does.
By changing `condition` to `{condition}1` we just add 3 chars to remove
the guesswork from anyone reading it in future and protect against unconsidered values so we don't just make it less cryptic but also less fragile.
For example, let's say someone wants to copy the $1 value into $3 and
print every line:
$ printf '1 2 3\n4 5 7\n' | awk '{$3=$1}1'
1 2 1
4 5 4
$ printf '1 2 3\n0 5 7\n' | awk '{$3=$1}1'
1 2 1
0 5 0
$ printf '1 2 3\n4 5 7\n' | awk '$3=$1'
1 2 1
4 5 4
$ printf '1 2 3\n0 5 7\n' | awk '$3=$1'
1 2 1
Note the 2nd line is undesirably (because I wrote the requirements)
missing from that last output.
It happens ALL the time that people don't consider all possible input
values so it's safer to just write the code that reflects your intent
and if you intend for every line to be printed then write code that will print every line.
Ed.
And of course add more measures in case the data is not as regular as
the sample data suggests. (See my other postings what may be defined
as data, line missing or spurious blanks in the data, comment lines
or empty lines that have to be preserved, etc.)
instead of:
$2="1-1"
Janis
On 12.03.2024 23:49, Ed Morton wrote:
On 3/9/2024 2:07 PM, Janis Papanagnou wrote:
On 09.03.2024 17:52, Ed Morton wrote:
About 20 or so years ago we had a discussion in this NG (which I'm not
going to search for now) and, shockingly, a consensus was reached that
we should encourage people to always write:
'{$2="1-1"} 1'
I don't recall such a "consensus".
I do, I have no reason to lie about it, but I can't be bothered
searching through 20-year-old usenet archives for it (I did take a very
quick shot at it but I don't even know how to write a good search for it
- you can't just google "awk '1'" and I'm not even sure if it was in
comp.lang.awk or comp.unix.shell).
I didn't say anything about "lying"; why do you insinuate so?
But your memory may mislead you. (Or mine, or Kaz', of course.)
(And no, I don't do the search for you; since you have been the
one contending something here.)
Without a reference such a statement is just void (and not more
than a rhetorical move).
You should at least elaborate on the details and facts of that
"consensus" - but for the _specific OP context_ (not for made
up cases).
If you want to avoid cryptic code you'd rather write
'{$2="1-1"; print}'
Don't you think?
If I'm writing a multi-line script I use an explicit `print` but it just
doesn't matter for a tiny one-line script like that.
Actually, for the given case, the yet better solution is what the
OP himself said (in CUS, where his question was initially posted):
Grant Taylor on alt.comp.software.thunderbird suggested [...]:
$ awk '{print $1, "1-1"}'
Since this suggestion doesn't overwrite fields and is conceptually
clear. It inherently also handles (possible?) cases where there's
more than two fields in the data (e.g. by spurious blanks).
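To make that difference concrete, here is a made-up line with a spurious third field. The assignment form carries the extra field along. The print form keeps only what it names.

```shell
printf 'comp.lang.awk: 1-5000 junk\n' | awk '$2="1-1"'
# prints: comp.lang.awk: 1-1 junk

printf 'comp.lang.awk: 1-5000 junk\n' | awk '{print $1, "1-1"}'
# prints: comp.lang.awk: 1-1
```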
Everyone using awk
needs to know the `1` idiom as it's so common and once you've seen it
once it's not hard to figure out what `{$2="1-1"} 1` does.
The point is that $2="1-1" as condition is also an Awk idiom.
By changing `condition` to `{condition}1` we just add 3 chars to remove
the guesswork from anyone reading it in future and protect against
unconsidered values so we don't just make it less cryptic but also less
fragile.
Your examples below are meaningless since you make up cases that have
nothing to do with the situation here, and especially in context of
my posting saying clearly: "In this specific case of regular data".
The more problematic issue is that $2="1-1" and also {$2="1-1"}
both overwrite fields and thus a reorganization of the fields is
done which has - probably unexpected by a newbie coder - side effects.
But YMMV, of course.
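One concrete example of such side effects: assigning to a field beyond NF grows the record with empty fields, and the rebuilt $0 uses OFS throughout. A sketch with made-up input:

```shell
printf 'a   b\n' | awk '{ $4 = "x"; print NF; print }'
# prints:
# 4
# a b  x
```

The two spaces before "x" are OFS on either side of the new, empty $3.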
Janis
For example, let's say someone wants to copy the $1 value into $3 and
print every line:
$ printf '1 2 3\n4 5 7\n' | awk '{$3=$1}1'
1 2 1
4 5 4
$ printf '1 2 3\n0 5 7\n' | awk '{$3=$1}1'
1 2 1
0 5 0
$ printf '1 2 3\n4 5 7\n' | awk '$3=$1'
1 2 1
4 5 4
$ printf '1 2 3\n0 5 7\n' | awk '$3=$1'
1 2 1
Note the 2nd line is undesirably (because I wrote the requirements)
missing from that last output.
It happens ALL the time that people don't consider all possible input
values so it's safer to just write the code that reflects your intent
and if you intend for every line to be printed then write code that will
print every line.
Ed.
And of course add more measures in case the data is not as regular as
the sample data suggests. (See my other postings what may be defined
as data, line missing or spurious blanks in the data, comment lines
or empty lines that have to be preserved, etc.)
instead of:
$2="1-1"
Janis