In article <QnROO.226037$EEm7.111715@fx16.iad>,
Scott Lurndal <slp53@pacbell.net> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
In article <vefvo0$k1mm$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:
Really? So java bytecode will run direct on x86 or ARM will it? Please give
some links to this astounding discovery you've made.
Um, ok. https://en.wikipedia.org/wiki/Jazelle
There was also a company a couple of decades ago that
built an entire processor designed to execute bytecode
directly - with a coprocessor to handle I/O.
IIRC, it was Azul. There were a number of others, including
Sun.
None of them panned out - JIT's ended up winning that battle.
Even ARM no longer includes Jazelle extensions in any of their
mainstream processors.
Sure. But the fact that any of these were going concerns is an
existence proof that one _can_ take bytecodes targeted toward a
"virtual" machine and execute them on silicon,
making the
distinction a lot more fluid than might be naively assumed, in
turn exposing the silliness of this argument that centers around
this weirdly overly-rigid definition of what a "compiler" is.
On Sun, 13 Oct 2024 15:30:03 -0000 (UTC)
cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
[...]
No. It translates one computer _language_ to another computer
_language_. In the usual case, that's from a textual source
Machine code isn't a language. Fallen at the first hurdle with that definition.
Irrelevant. Lot of interpreters do partial compilation and the JVM does it
on the fly. A proper compiler writes a standalone binary file to disk.
On 2024-10-11, Muttley@DastartdlyHQ.org <Muttley@DastartdlyHQ.org> wrote:
Irrelevant. Lot of interpreters do partial compilation and the JVM does it
on the fly. A proper compiler writes a standalone binary file to disk.
You might want to check those goalposts again. You can easily make a
"proper compiler" which just writes a canned interpreter executable to
disk, appending to it the program source code.
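To make that point concrete, here is a minimal, hypothetical sketch of such a "proper compiler" in Python (all file and function names are invented): it writes a standalone executable file to disk, yet the file is nothing but a canned interpreter stub with the program source appended.

```python
# Toy "proper compiler": writes a standalone executable file to disk,
# but the file is just a canned interpreter stub plus the appended source.
# Hypothetical sketch; names are invented for illustration.
import os
import stat
import tempfile

def compile_to_standalone(source: str, outpath: str) -> None:
    """Write an 'executable' that embeds `source` and interprets it on run."""
    with open(outpath, "w") as f:
        f.write("#!/usr/bin/env python3\n")
        f.write("SOURCE = %r\n" % source)                      # the appended program text
        f.write("exec(compile(SOURCE, '<embedded>', 'exec'))\n")  # the canned interpreter
    # mark it executable, like any self-respecting compiler output
    os.chmod(outpath, os.stat(outpath).st_mode | stat.S_IXUSR)

out = os.path.join(tempfile.mkdtemp(), "hello_bin")
compile_to_standalone("print(6 * 7)", out)
```

By the "writes a standalone binary file to disk" criterion, this trivially qualifies, which is exactly why the criterion is too weak.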
cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
In article <vegmul$ne3v$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:
So what is standard terminology then?
I've already explained this to you.
No you haven't. Your explanation seems to be "anything that converts from one
language to another".
What happens inside the CPU is irrelevant. It's a black box as far as the
rest of the machine is concerned. As I said in another post, it could be
pixies with abacuses, doesn't matter.
So why do you think it's so important that the definition of a
Who said it's important? It's just what most people think of as compilers.
CPU"? If, as you admit, what the CPU does is highly variable,
then why do you cling so hard to this meaningless distinction?
You're the one making a big fuss about it with pages of waffle to back up
your claim.
[lots of waffle snipped]
In other words, you discard anything that doesn't fit with your
preconceptions. Got it.
No, I just have better things to do on a Sunday than read all that. Keep
it to the point.
So it's incomplete and has to revert to software for some opcodes. Great.
FWIW Sun also had a java processor but you still can't run bytecode on
normal hardware without a JVM.
Cool. So if a program targeting a newer version of an
ISA is run on an older machine, and that machine lacks a newer
instruction present in the program, and the CPU generates an
illegal instruction trap at runtime that the OS catches and
emulates on the program's behalf, the program was not compiled?
And again, what about an emulator for a CPU running on a
different CPU? I can boot 7th Edition Unix on a PDP-11
emulator on my workstation; does that mean that the 7th
Edition C compiler wasn't a compiler?
It's all shades of grey. You seem to be getting very worked up about it.
As I said, most people consider a compiler as something that translates source
code to machine code and writes it to a file.
Why, what's the difference? Your definition seems to be any program that can
translate from one language to another.
If you can't see that yourself, then you're either ignorant or
obstinate. Take your pick.
So you can't argue the failure of your logic then. Noted.
Yes, they're entirely analogous.
https://docs.oracle.com/cd/E11882_01/appdev.112/e10825/pc_02prc.htm
Nah, not really.
Oh nice counter argument, you really sold your POV there.
Who cares about the current state? Has nothing to do with this discussion.
In other words, "I don't have an argument, so I'll just lamely
try to define things until I'm right."
I'm just defining things the way most people see it, not some ivory tower
academics. Anyway, life's too short for the rest.
[tl;dr]
that a compiler is pretty much any program which translates from one thing to
another.
No. It translates one computer _language_ to another computer
_language_. In the usual case, that's from a textual source
Machine code isn't a language. Fallen at the first hurdle with that
definition.
On 2024-10-12, Rainer Weikusat <rweikusat@talktalk.net> wrote:
Indeed. As far as I know the term, an interpreter is something which
reads text from a file, parses it and checks it for syntax errors
and then executes the code as soon as enough of it has been gathered to
allow for execution of something, ie, a complete statement. This read,
check and parse, execute cycle is repeated until the program
terminates.
I don't really want to participate in this discussion, but what
you're saying there is that all those 1980s home computer BASIC
interpreters, which read and tokenized a program before execution,
were actually compilers.
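That tokenize-once, execute-later step is easy to model; a minimal, hypothetical sketch (the token byte values are invented, only loosely inspired by real home-computer BASICs):

```python
# Minimal sketch of a 1980s-style BASIC line tokenizer: keywords are
# replaced by one-byte tokens once, at program entry, and only the token
# stream is kept for later execution.  Token values here are invented.
KEYWORDS = {"PRINT": 0xF5, "GOTO": 0xEC, "LET": 0xF1}

def tokenize(line: str):
    """Translate one BASIC source line into a token list (a crude compile step)."""
    return [KEYWORDS.get(word.upper(), word) for word in line.split()]

print(tokenize('10 PRINT "HELLO"'))  # line number and string literal stay as text
```

Whether that one-pass translation counts as "compiling" is precisely the definitional blur under discussion.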
On 13/10/2024 16:52, Dan Cross wrote:
[snip]
Sure. But the fact that any of these were going concerns is an
existence proof that one _can_ take bytecodes targeted toward a
"virtual" machine and execute them on silicon,
making the
distinction a lot more fluid than might be naively assumed, in
turn exposing the silliness of this argument that centers around
this weirdly overly-rigid definition of what a "compiler" is.
I've implemented numerous compilers and interpreters over the last few
decades (and have dabbled in emulators).
To me the distinctions are clear enough because I have to work at the
sharp end!
I'm not sure why people want to try and be clever by blurring the roles
of compiler and interpreter; that's not helpful at all.
Sure, people can write emulators for machine code, which are a kind of
interpreter, or they can implement bytecode in hardware; so what?
That doesn't really affect what I do. Writing compiler backends for
actual CPUs is hard work. Generating bytecode is a lot simpler.
(Especially in my case as I've devised myself, another distinction.
Compilers usually target someone else's instruction set.)
If you want one more distinction, it is this: with my compiler, the
resultant binary is executed by a separate agency: the CPU. Or maybe the
OS loader will run it through an emulator.
With my interpreter, then *I* have to write the dispatch routines and
write code to implement all the instructions.
(My compilers generate an intermediate language, a kind of VM, which is
then processed further into native code.
But I have also tried interpreting that VM; it just runs 20 times slower
than native code. That's what interpreting usually means: slow programs.)
On Sat, 12 Oct 2024 21:25:17 -0000 (UTC)
Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
On Sat, 12 Oct 2024 08:42:17 -0000 (UTC), Muttley boring babbled:
Code generated by a compiler does not require an interpreter.
Something has to implement the rules of the “machine language”. This is
why we use the term “abstract machine”, to avoid having to distinguish
between “hardware” and “software”.
Think: modern CPUs typically have “microcode” and “firmware” associated
with them. Are those “hardware” or “software”?
Who cares what happens inside the CPU hardware?
On Sat, 12 Oct 2024 16:39:20 +0000
Eric Pozharski <apple.universe@posteo.net> boring babbled:
with <87wmighu4i.fsf@doppelsaurus.mobileactivedefense.com> Rainer
Weikusat wrote:
Muttley@DastartdlyHQ.org writes:
On Wed, 09 Oct 2024 22:25:05 +0100 Rainer Weikusat
<rweikusat@talktalk.net> boring babbled:
Bozo User <anthk@disroot.org> writes:
On 2024-04-07, Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
On Sun, 07 Apr 2024 00:01:43 +0000, Javier wrote:
*CUT* [ 19 lines 6 levels deep]
Its syntax is also a horrific mess.
Which means precisely what?
You're arguing with Unix Haters Handbook. You've already lost.
ITYF the people who dislike Perl are the ones who actually like the unix
way of having simple daisychained tools instead of some lump of a
language that does everything messily.
What happens inside the CPU is irrelevant.
Your explanation seems to be "anything that converts from one
language to another".
You know there's formal definitions for what constitutes languages.
On Sun, 13 Oct 2024 18:28:32 +0200, Janis Papanagnou wrote:
You know there's formal definitions for what constitutes languages.
Not really. For example, some have preferred the term “notation” instead of “language”.
Regardless of what you call it, machine code still qualifies.
In article <vegs0o$nh5t$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
[snip]
I've implemented numerous compilers and interpreters over the last few
decades (and have dabbled in emulators).
To me the distinctions are clear enough because I have to work at the
sharp end!
I'm not sure why people want to try and be clever by blurring the roles
of compiler and interpreter; that's not helpful at all.
I'm not saying the two are the same; what I'm saying is that
this arbitrary criteria that a compiler must emit a fully
executable binary image is not just inadequate, but also wrong,
as it renders separate compilation impossible. I am further
saying that there are many different _types_ of compilers,
including specialized tools that don't emit machine language.
Sure, people can write emulators for machine code, which are a kind of
interpreter, or they can implement bytecode in hardware; so what?
That's exactly my point.
That doesn't really affect what I do. Writing compiler backends for
actual CPUs is hard work. Generating bytecode is a lot simpler.
That really depends on the bytecode, doesn't it? The JVM is a
complex beast; MIPS or the unprivileged integer subset of RISC-V
are pretty simple in comparison.
(Especially in my case as I've devised myself, another distinction.
Compilers usually target someone else's instruction set.)
If you want one more distinction, it is this: with my compiler, the
resultant binary is executed by a separate agency: the CPU. Or maybe the
OS loader will run it through an emulator.
Python has a mode by which it will emit bytecode _files_, which
can be separately loaded and interpreted; it even has an
optimizing mode. Is that substantially different?
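For reference, that mode is a standard-library feature: a small sketch showing CPython writing a bytecode file to disk, entirely separate from executing it.

```python
# CPython really does emit bytecode *files*: py_compile translates a
# source file into a .pyc on disk, which the interpreter can later load
# without re-parsing the source.  (py_compile.compile also accepts an
# `optimize` level, corresponding to running python with -O/-OO.)
import pathlib
import py_compile
import tempfile

src = pathlib.Path(tempfile.mkdtemp()) / "mod.py"
src.write_text("ANSWER = 6 * 7\n")

pyc = py_compile.compile(str(src), doraise=True)  # returns the path of the .pyc written
print(pyc)
```

So by the "writes a file to disk" criterion CPython's front end is a compiler, even though a software VM still executes the result.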
With my interpreter, then *I* have to write the dispatch routines and
write code to implement all the instructions.
Again, I don't think that anyone disputes that interpreters
exist. But insisting that they must take a particular shape is
just wrong.
(My compilers generate an intermediate language, a kind of VM, which is
then processed further into native code.
Then by the definition of this pseudonymous guy I've been
responding to, your compiler is not a "proper compiler", no?
But I have also tried interpreting that VM; it just runs 20 times slower
than native code. That's what interpreting usually means: slow programs.)
Not necessarily. The JVM does pretty good, quite honestly.
On 13/10/2024 21:29, Dan Cross wrote:
In article <vegs0o$nh5t$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
On 13/10/2024 16:52, Dan Cross wrote:
[snip]
I've implemented numerous compilers and interpreters over the last few
decades (and have dabbled in emulators).
To me the distinctions are clear enough because I have to work at the
sharp end!
I'm not sure why people want to try and be clever by blurring the roles
of compiler and interpreter; that's not helpful at all.
I'm not saying the two are the same; what I'm saying is that
this arbitrary criteria that a compiler must emit a fully
executable binary image is not just inadequate, but also wrong,
as it renders separate compilation impossible. I am further
saying that there are many different _types_ of compilers,
including specialized tools that don't emit machine language.
Sure, people can write emulators for machine code, which are a kind of
interpreter, or they can implement bytecode in hardware; so what?
That's exactly my point.
So, then what, we do away with the concepts of 'compiler' and
'interpreter'? Or allow them to be used interchangeably?
Somehow I don't think it is useful to think of gcc as an interpreter for
C, or CPython as a native code compiler for Python.
That doesn't really affect what I do. Writing compiler backends for
actual CPUs is hard work. Generating bytecode is a lot simpler.
That really depends on the bytecode, doesn't it? The JVM is a
complex beast;
Is it? It's not to my taste, but it didn't look too scary to me. Whereas
modern CPU instruction sets are horrendous. (I normally target x64,
which is described in 6 large volumes. RISC ones don't look much better,
eg. RISC V with its dozens of extensions and special types)
Example of JVM:
aload index Push a reference from local variable #index
MIPS or the unprivileged integer subset of RISC-V are pretty simple in
comparison.
(Especially in my case as I've devised myself, another distinction.
Compilers usually target someone else's instruction set.)
If you want one more distinction, it is this: with my compiler, the
resultant binary is executed by a separate agency: the CPU. Or maybe the
OS loader will run it through an emulator.
Python has a mode by which it will emit bytecode _files_, which
can be separately loaded and interpreted; it even has an
optimizing mode. Is that substantially different?
Whether there is a discrete bytecode file is besides the point. (I
generated such files for many years.)
You still need software to execute it. Especially for dynamically typed
bytecode which doesn't lend itself easily to either hardware
implementations, or load-time native code translation.
With my interpreter, then *I* have to write the dispatch routines and
write code to implement all the instructions.
Again, I don't think that anyone disputes that interpreters
exist. But insisting that they must take a particular shape is
just wrong.
What shape would that be? Generally they will need some /software/ to
execute the instructions of the program being interpreted, as I said.
Some JIT products may choose to do on-demand translation to native code.
Is there anything else? I'd be interested in anything new!
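For concreteness, that shape (a hand-written dispatch loop plus one routine per instruction) can be sketched in a few lines; the three-opcode stack machine below is invented purely for illustration.

```python
# Minimal software interpreter of the kind described above: one dispatch
# loop, one branch per opcode.  The instruction set is invented.
def run(code):
    stack, pc = [], 0
    while pc < len(code):
        op, arg = code[pc]
        if op == "push":          # push a constant
            stack.append(arg)
        elif op == "add":         # pop two operands, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "halt":
            break
        else:
            raise ValueError("unknown opcode: %s" % op)
        pc += 1
    return stack[-1]              # result is the top of the stack

print(run([("push", 40), ("push", 2), ("add", None), ("halt", None)]))
```

The per-instruction branch and loop overhead in software is exactly where the usual interpreter slowdown comes from, which JITs attack by translating hot code to native instructions.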
(My compilers generate an intermediate language, a kind of VM, which is
then processed further into native code.
Then by the definition of this pseudonymous guy I've been
responding to, your compiler is not a "proper compiler", no?
Actually mine is more of a compiler than many, since it directly
generates native machine code. Others generally stop at ASM code (eg.
gcc) or OBJ code, and will invoke separate programs to finish the job.
The intermediate language here is just a step in the process.
But I have also tried interpreting that VM; it just runs 20 times slower
than native code. That's what interpreting usually means: slow programs.)
Not necessarily. The JVM does pretty good, quite honestly.
But is it actually interpreting? Because if I generated such code for a
statically typed language, then I would first translate to native code,
of any quality, since it's going to be faster than interpreting.
On 13.10.2024 23:10, Lawrence D'Oliveiro wrote:
On Sun, 13 Oct 2024 18:28:32 +0200, Janis Papanagnou wrote:
You know there's formal definitions for what constitutes languages.
Not really. For example, some have preferred the term “notation”
instead of “language”.
A "notation" is not the same as a [formal (or informal)] "language".
(Frankly, I don't know where you're coming from ...
[ X-post list reduced ]
On 13.10.2024 18:02, Muttley@DastartdlyHQ.org wrote:
On Sun, 13 Oct 2024 15:30:03 -0000 (UTC)
cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
[...]
No. It translates one computer _language_ to another computer
_language_. In the usual case, that's from a textual source
Machine code isn't a language. Fallen at the first hurdle with that
definition.
Careful (myself included); watch out for the glazed frost!
You know there's formal definitions for what constitutes languages.
At first glance I don't see why machine code wouldn't qualify as a
language (either as some specific "mnemonic" representation, or as
a sequence of integral numbers or other "code" representations).
What's the problem, in your opinion, with considering machine code
as a language?
In article <vegqu5$o3ve$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:
The people who create the field are the ones who get to make
the defintiions, not you.
Machine code isn't a language. Fallen at the first hurdle with that
definition.
Oh really? Is that why they call it "machine language"? It's
even in the dictionary with "machine code" as a synonym:
https://www.merriam-webster.com/dictionary/machine%20language
ITYF the people who dislike Perl are the ones who actually like the unix
way of having simple daisychained tools instead of some lump of a language
that does everything messily.
Perl is a general-purpose programming language, just like C or Java (or
Python or Javascript or Rust or $whatnot). This means it can be used to
implement anything (with some practical limitation for anything) and not
that it "does everything".
On Sun, 13 Oct 2024 21:33:56 +0100
Rainer Weikusat <rweikusat@talktalk.net> boring babbled:
Muttley@DastartdlyHQ.org writes:
ITYF the people who dislike Perl are the ones who actually like the unix
way of having simple daisychained tools instead of some lump of a language
that does everything messily.
Perl is a general-purpose programming language, just like C or Java (or
Python or Javascript or Rust or $whatnot). This means it can be used to
implement anything (with some practical limitation for anything) and not
that it "does everything".
It can be, but generally isn't. Its niche tends to be text processing of
some sort.
The simple but flexible OO system, reliable automatic memory management
On Mon, 14 Oct 2024 01:16:11 +0200, Janis Papanagnou wrote:
On 13.10.2024 23:10, Lawrence D'Oliveiro wrote:
On Sun, 13 Oct 2024 18:28:32 +0200, Janis Papanagnou wrote:
You know there's formal definitions for what constitutes languages.
Not really. For example, some have preferred the term “notation”
instead of “language”.
A "notation" is not the same as a [formal (or informal)] "language".
(Frankly, I don't know where you're coming from ...
<https://en.wikipedia.org/wiki/Programming_language>:
A programming language is a system of notation for writing computer
programs.
On Sun, 13 Oct 2024 18:28:32 +0200
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
[snip]
What's the problem, in your opinion, with considering machine code
as a language?
A programming language is an abstraction of machine instructions that is readable by people.
cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
Oh really? Is that why they call it "machine language"? It's
even in the dictionary with "machine code" as a synonym:
https://www.merriam-webster.com/dictionary/machine%20language
It's not a programming language.
In article <veiki1$14g6h$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:
On Sun, 13 Oct 2024 20:15:45 -0000 (UTC)
cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
Oh really? Is that why they call it "machine language"? It's
even in the dictionary with "machine code" as a synonym:
https://www.merriam-webster.com/dictionary/machine%20language
It's not a programming language.
That's news to those people who have, and sometimes still do,
write programs in it.
But that's not important. If we go back and look at what I wrote:
|No. It translates one computer _language_ to another computer
|_language_. In the usual case, that's from a textual source
Note that I said, "computer language", not "programming
language". Being a human-readable language is not a requirement
for a computer language.
Your claim that "machine language" is not a "language" is simply
not true. Your claim that a "proper" compiler must take the
shape you are pushing is also not true.
On Mon, 14 Oct 2024 13:38:04 -0000 (UTC)
cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
In article <veiki1$14g6h$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:
On Sun, 13 Oct 2024 20:15:45 -0000 (UTC)
cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
Oh really? Is that why they call it "machine language"? It's
even in the dictionary with "machine code" as a synonym:
https://www.merriam-webster.com/dictionary/machine%20language
It's not a programming language.
That's news to those people who have, and sometimes still do,
write programs in it.
Really? So if it's a language you'll be able to understand this then:
0011101011010101010001110101010010110110001110010100101001010100
0101001010010010100101010111001010100110100111010101010101010101
0001110100011101010001001010110011100010101001110010100101100010
Muttley@DastartdlyHQ.org writes:
On Sun, 13 Oct 2024 18:28:32 +0200
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
[ X-post list reduced ]
A programming language is an abstraction of machine instructions that is
readable by people.
By that definition, PAL-D is a programming language.
Any assembler is a programming language, by that definition.
On Mon, 14 Oct 2024 11:38:29 +0100
Rainer Weikusat <rweikusat@talktalk.net> boring babbled:
The simple but flexible OO system, reliable automatic memory management
Then there's the whole 2 stage object creation with the "bless"
nonsense. Hacky.
Your claim that "machine language" is not a "language" is simply
not true.
Muttley@DastartdlyHQ.org writes:
On Mon, 14 Oct 2024 13:38:04 -0000 (UTC)
cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
In article <veiki1$14g6h$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:
On Sun, 13 Oct 2024 20:15:45 -0000 (UTC)
cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
Oh really? Is that why they call it "machine language"? It's
even in the dictionary with "machine code" as a synonym:
https://www.merriam-webster.com/dictionary/machine%20language
It's not a programming language.
That's news to those people who have, and sometimes still do,
write programs in it.
Really? So if its a language you'll be able to understand this then:
0011101011010101010001110101010010110110001110010100101001010100
0101001010010010100101010111001010100110100111010101010101010101
0001110100011101010001001010110011100010101001110010100101100010
I certainly understand this, even four decades later
94A605440C00010200010400000110
On Mon, 14 Oct 2024 11:38:29 +0100
Rainer Weikusat <rweikusat@talktalk.net> boring babbled:
The simple but flexible OO system, reliable automatic memory management
[...]
Then there's the whole 2 stage object creation with the "bless"
nonsense. Hacky.
I was planning to write a longer reply but killed it. You're obviously
arguing about something you reject for political reasons despite not
really being familiar with it, and you even 'argue' like a politician. That
is, you stick pejorative labels on stuff you don't like to emphasize how
really disagreeable you believe it to be. IMHO, such a method of
(pseudo-)discussing anything is completely pointless.
On 14/10/2024 16:53, Scott Lurndal wrote:
Muttley@DastartdlyHQ.org writes:
Really? So if its a language you'll be able to understand this then:
0011101011010101010001110101010010110110001110010100101001010100
0101001010010010100101010111001010100110100111010101010101010101
0001110100011101010001001010110011100010101001110010100101100010
I certainly understand this, even four decades later
94A605440C00010200010400000110
In my early days of assembly programming on my ZX Spectrum, I would
hand-assemble to machine code, and I knew at least a few of the codes by
heart. (01 is "ld bc, #xxxx", 18 is "jr", c9 is "ret", etc.) So while
I rarely wrote machine code directly, it is certainly still a
programming language - it's a language you can write programs in.
cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
[snip]
|No. It translates one computer _language_ to another computer
|_language_. In the usual case, that's from a textual source
Note that I said, "computer language", not "programming
language". Being a human-readable language is not a requirement
for a computer language.
Oh watch those goalpost moves with pedant set to 11. Presumably you
think the values of the address lines are a language too.
Your claim that "machine language" is not a "language" is simply
not true. Your claim that a "proper" compiler must take the
shape you are pushing is also not true.
If you say so.
A programming language is an abstraction of machine instructions that is readable by people.
On 14/10/2024 15:58, Scott Lurndal wrote:
Muttley@DastartdlyHQ.org writes:
On Sun, 13 Oct 2024 18:28:32 +0200
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
[ X-post list reduced ]
A programming language is an abstraction of machine instructions that is
readable by people.
By that definition, PAL-D is a programming language.
(I've no idea what PAL-D is in this context.)
Any assembler is a programming language, by that definition.
You mean 'assembly'? An assembler (in the software world) is usually a program that translates textual assembly code.
'Compiler' isn't a programming language (although no doubt someone here
will dredge up some obscure language with exactly that name just to
prove me wrong).
On 14/10/2024 15:58, Scott Lurndal wrote:
Muttley@DastartdlyHQ.org writes:
On Sun, 13 Oct 2024 18:28:32 +0200
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
[ X-post list reduced ]
A programming language is an abstraction of machine instructions that is
readable by people.
By that definition, PAL-D is a programming language.
(I've no idea what PAL-D is in this context.)
On Wed, 09 Oct 2024 22:25:05 +0100
Rainer Weikusat <rweikusat@talktalk.net> boring babbled:
Bozo User <anthk@disroot.org> writes:
On 2024-04-07, Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
On Sun, 07 Apr 2024 00:01:43 +0000, Javier wrote:
The downside is the loss of performance because of disk access for
trivial things like 'nfiles=$(ls | wc -l)'.
Well, you could save one process creation by writing
'nfiles=$(echo * | wc -l)' instead. But that would still not be strictly
correct.
I suspect disk access times were
one of the reasons for the development of perl in the early 90s.
Shells were somewhat less powerful in those days. I would describe the
genesis of Perl as "awk on steroids". Its big party trick was regular
expressions. And I guess combining that with more sophisticated data-
structuring capabilities.
Perl is more awk+sed+sh in a single language. Basically the killer
of the Unix philosophy in the late 90s/early 00s, and for the good.
Perl is a high-level programming language with a rich syntax¹, with
support for deterministic automatic memory management, functions as
first-class objects and message-based OO. It's also a virtual machine
for executing threaded code and a(n optimizing) compiler for translating
Perl code into the corresponding threaded code.
Its syntax is also a horrific mess. Larry took the worst parts of C and shell syntax and mashed them together. It's no surprise Perl has been ditched in favour of Python just about everywhere for new scripting projects. And while I hate Python's meaningful-whitespace nonsense, I'd use it in preference
to Perl any day.
In comp.unix.programmer Muttley@dastartdlyhq.org wrote:
syntax and mashed them together. It's no surprise Perl has been ditched in
favour of Python just about everywhere for new scripting projects. And while
I hate Python's meaningful-whitespace nonsense, I'd use it in preference
to Perl any day.
I think you've identified the one language that Python is better than.
On Mon, 11 Nov 2024 07:31:13 -0000 (UTC)
Sebastian <sebastian@here.com.invalid> boring babbled:
In comp.unix.programmer Muttley@dastartdlyhq.org wrote:
syntax and mashed them together. It's no surprise Perl has been ditched in
favour of Python just about everywhere for new scripting projects. And while
I hate Python's meaningful-whitespace nonsense, I'd use it in preference
to Perl any day.
I think you've identified the one language that Python is better than.
Yes, Python does have a lot of cons as a language. But its syntax lets newbies get up to speed quickly and there are a lot of libraries. However it's dog slow and inefficient, and I'm amazed it's used as a key language for AI development - not traditionally a newbie coder area - when in that application
speed really is essential. Yes, it generally calls libraries written in C/C++, but then why not just write the higher-level code in C++ too?
Muttley@DastartdlyHQ.org writes:
On Mon, 11 Nov 2024 07:31:13 -0000 (UTC)
Sebastian <sebastian@here.com.invalid> boring babbled:
In comp.unix.programmer Muttley@dastartdlyhq.org wrote:
syntax and mashed them together. It's no surprise Perl has been ditched in
favour of Python just about everywhere for new scripting projects. And while
I hate Python's meaningful-whitespace nonsense, I'd use it in preference
to Perl any day.
I think you've identified the one language that Python is better than.
Yes, Python does have a lot of cons as a language. But its syntax lets
newbies get up to speed quickly and there are a lot of libraries. However it's
dog slow and inefficient and I'm amazed it's used as a key language for AI
development - not traditionally a newbie coder area - when in that application
speed really is essential. Yes it generally calls libraries written in C/C++
but then why not just write the higher level code in C++ too?
You'd have to give up the REPL, for instance.
Yes it generally calls libraries written in C/C++
but then why not just write the higher level code in C++ too?
In comp.unix.programmer Muttley@dastartdlyhq.org wrote:
[Perl’s] syntax is also a horrific mess. Larry took the worst parts of
C and shell syntax and mashed them together.
I think you've identified the one language that Python is better than.
Yes, Python does have a lot of cons as a language. But its syntax lets newbies get up to speed quickly
and there are a lot of libraries. However its
dog slow and inefficient and I'm amazed it's used as a key language for AI
development - not traditionally a newbie coder area - when in that application
speed really is essential. Yes it generally calls libraries written in C/C++ but then why not just write the higher level code in C++ too?
On 11.11.2024 11:06, Muttley@DastartdlyHQ.org wrote:
and there are a lot of libraries. However it's
dog slow and inefficient and I'm amazed it's used as a key language for AI
(and not only there; it's ubiquitous, it seems)
development - not traditionally a newbie coder area - when in that application
speed really is essential. Yes it generally calls libraries written in C/C++
but then why not just write the higher level code in C++ too?
Because of its simpler syntax and less syntactical ballast compared
to C++?
On Mon, 11 Nov 2024 07:31:13 -0000 (UTC), Sebastian wrote:
In comp.unix.programmer Muttley@dastartdlyhq.org wrote:
[Perl’s] syntax is also a horrific mess. Larry took the worst parts of >>> C and shell syntax and mashed them together.
I think you've identified the one language that Python is better than.
In terms of the modern era of high-level programming, Perl was the breakthrough language. Before Perl, BASIC was considered to be an example
of a language with “good” string handling. After Perl, BASIC looked old and clunky indeed.
Perl was the language that made regular expressions sexy. Because it made them easy to use.
On Tue, 12 Nov 2024 10:14:20 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
On 11.11.2024 11:06, Muttley@DastartdlyHQ.org wrote:
[ Q: why some prefer Python over C++ ]
Because of its simpler syntax and less syntactical ballast compared
to C++?
When you're dealing with something as complicated and frankly ineffable as
an AI model I doubt syntactic quirks of the programming language matter that much in comparison.
Surely you'd want the fastest implementation possible and
in this case it would be C++.
On 12.11.2024 10:21, Muttley@DastartdlyHQ.org wrote:
On Tue, 12 Nov 2024 10:14:20 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
On 11.11.2024 11:06, Muttley@DastartdlyHQ.org wrote:
[ Q: why some prefer Python over C++ ]
Because of its simpler syntax and less syntactical ballast compared
to C++?
When you're dealing with something as complicated and frankly ineffable as
an AI model I doubt syntactic quirks of the programming language matter that
much in comparison.
Oh, I would look at it differently; in whatever application domain I
program I want a syntactically clear and well-defined language.
Surely you'd want the fastest implementation possible and
in this case it would be C++.
Speed is one factor (to me), and expressiveness or "modeling power"
(OO) is another one. I also appreciate consistently defined languages
and quality of error catching and usefulness of diagnostic messages.
(There's some more factors, but...)
In which case I'd go with a statically typed language like C++ every time ahead of a dynamic one like Python.
C++ is undeniably powerful, but I think the majority would agree now that
its syntax has become an unwieldy mess.
On 12.11.2024 10:53, Muttley@DastartdlyHQ.org wrote:
In which case I'd go with a statically typed language like C++ every time
ahead of a dynamic one like python.
Definitely!
I'm using untyped languages (like Awk) for scripting, though, but
not for code of considerable scale.
Incidentally, one of my children recently spoke about their setup;
they use Fortran with old libraries (hydrodynamic earth processes),
have the higher-level tasks implemented in C++, and they do the
"job control" of the simulation tasks with Python. - A multi-tier
architecture. - That sounds not unreasonable to me. (But they had
built their system based on existing software, so it might have
been a different decision if they had built it from scratch.)
On 12.11.2024 10:53, Muttley@DastartdlyHQ.org wrote:
C++ is undeniably powerful, but I think the majority would agree now that
its syntax has become an unwieldy mess.
Yes. And recent standards made it yet worse - When I saw it the
first time I couldn't believe that this would be possible. ;-)
On Tue, 12 Nov 2024 10:14:20 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
Because of its simpler syntax and less syntactical ballast compared
to C++?
When you're dealing with something as complicated and frankly ineffable as
an AI model I doubt syntactic quirks of the programming language matter that much in comparison. Surely you'd want the fastest implementation possible and in this case it would be C++.
Perl was the language that made regular expressions sexy. Because it made >> them easy to use.
For those of us who used regexps in Unix from the beginning it's not
as shiny as you make it out to be; Unix was supporting Chomsky-3
regular expressions with a syntax that is still used in contemporary
languages. Perl supports some nice syntactic shortcuts, but also
patterns that exceed Chomsky-3's; too bad if one doesn't know these
differences and any complexity degradation that may be bought with it.
More interesting to me is the fascinating fact that on some non-Unix platforms it took decades before regexps got (slooooowly) introduced
(even in its simplest form).
On 11.11.2024 22:24, Lawrence D'Oliveiro wrote:
Perl was the language that made regular expressions sexy. Because it
made them easy to use.
... Unix was supporting Chomsky-3
Regular Expressions with a syntax that is still used in contemporary languages.
But the app also had an embedded scripting language, which had access to
the app's environment and users' data.
On Tue, 12 Nov 2024 14:50:26 +0000, Bart wrote:
But the app also had an embedded scripting language, which had access to
the app's environment and users' data.
Did you invent your own scripting language? Nowadays you would use
something ready-made, like Lua, Guile or even Python.
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
[...]
By Chomsky-3 you mean a grammar of type 3 in the Chomsky hierarchy? And
that would be ``regular'' language, recognizable by a finite-state
automaton? If not, could you elaborate on the terminology?
"Lawrence" == Lawrence D'Oliveiro <ldo@nz.invalid> writes:
Lawrence> Perl was the language that made regular expressions
Lawrence> sexy. Because it made them easy to use.
I'm often reminded of this as I've been coding very little in Perl these days, and a lot more in languages like Dart, where the regex feels like
a clumsy bolt-on rather than a proper first-class citizen.
"Lawrence" == Lawrence D'Oliveiro <ldo@nz.invalid> writes:
Lawrence> Perl was the language that made regular expressions
Lawrence> sexy. Because it made them easy to use.
I'm often reminded of this as I've been coding very little in Perl these
days, and a lot more in languages like Dart, where the regex feels like
a clumsy bolt-on rather than a proper first-class citizen.
On Tue, 19 Nov 2024 18:43:48 -0800
merlyn@stonehenge.com (Randal L. Schwartz) boring babbled:
I'm often reminded of this as I've been coding very little in Perl these
days, and a lot more in languages like Dart, where the regex feels like
a clumsy bolt-on rather than a proper first-class citizen.
Regex itself is clumsy beyond simple search-and-replace patterns. A lot of stuff I've seen done in regex would have been better done procedurally at the expense of slightly more code but a LOT more readability. Also, given it's effectively a compact language with its own grammar and syntax, IMO it should not be a core part of any language, as it can lead to a syntactic mess, which
is what often happens with Perl.
On 20.11.2024 09:21, Muttley@DastartdlyHQ.org wrote:
On Tue, 19 Nov 2024 18:43:48 -0800
merlyn@stonehenge.com (Randal L. Schwartz) boring babbled:
I'm often reminded of this as I've been coding very little in Perl these >>> days, and a lot more in languages like Dart, where the regex feels like
a clumsy bolt-on rather than a proper first-class citizen.
Regex itself is clumsy beyond simple search and replace patterns. A lot of
stuff I've seen done in regex would have better done procedurally at the
expense of slightly more code but a LOT more readability. Also given its
effectively a compact language with its own grammar and syntax IMO it should
not be the core part of any language as it can lead to a syntatic mess, which
is what often happens with Perl.
I wouldn't look at it that way. I've seen Regexps as part of languages
usually in well defined syntactical contexts. For example, like strings
are enclosed in "...", Regexps could be seen within /.../ delimiters.
GNU Awk (in recent versions) went towards first class "strongly typed"
Regexps which are then denoted by the @/.../ syntax.
I'm curious what you mean by Regexps presented in a "procedural" form.
Can you give some examples?
In practice, given that a Regexp conforms to a FSA, any Regexp can be
precompiled and used multiple times. That's the thing I had used in
Java - you precompile the pattern into an object and then operate on
that same object. (Since there's still typical Regexp syntax involved
I suppose that is not what you meant by "procedural"?)
On Tue, 19 Nov 2024 18:43:48 -0800
merlyn@stonehenge.com (Randal L. Schwartz) boring babbled:
"Lawrence" == Lawrence D'Oliveiro <ldo@nz.invalid> writes:
Lawrence> Perl was the language that made regular expressions
Lawrence> sexy. Because it made them easy to use.
I'm often reminded of this as I've been coding very little in Perl these
days, and a lot more in languages like Dart, where the regex feels like
a clumsy bolt-on rather than a proper first-class citizen.
Regex itself is clumsy beyond simple search and replace patterns. A lot of stuff I've seen done in regex would have been better done procedurally at the expense of slightly more code but a LOT more readability.
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.
merlyn@stonehenge.com (Randal L. Schwartz) boring babbled:
"Lawrence" == Lawrence D'Oliveiro <ldo@nz.invalid> writes:
Lawrence> Perl was the language that made regular expressions
Lawrence> sexy. Because it made them easy to use.
I'm often reminded of this as I've been coding very little in Perl these
days, and a lot more in languages like Dart, where the regex feels like
a clumsy bolt-on rather than a proper first-class citizen.
Regex itself is clumsy beyond simple search and replace patterns. A lot of stuff I've seen done in regex would have better done procedurally at the expense of slightly more code but a LOT more readability. Also given its effectively a compact language with its own grammar and syntax IMO it should not be the core part of any language as it can lead to a syntatic mess, which
is what often happens with Perl.
On 11/20/2024 2:21 AM, Muttley@DastartdlyHQ.org wrote:
On Tue, 19 Nov 2024 18:43:48 -0800
merlyn@stonehenge.com (Randal L. Schwartz) boring babbled:
"Lawrence" == Lawrence D'Oliveiro <ldo@nz.invalid> writes:
Lawrence> Perl was the language that made regular expressions
Lawrence> sexy. Because it made them easy to use.
I'm often reminded of this as I've been coding very little in Perl these >>> days, and a lot more in languages like Dart, where the regex feels like
a clumsy bolt-on rather than a proper first-class citizen.
Regex itself is clumsy beyond simple search and replace patterns. A lot of
stuff I've seen done in regex would have better done procedurally at the
expense of slightly more code but a LOT more readability.
Definitely. The most relevant statement about regexps is this:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
Obviously regexps are very useful and commonplace but if you find you
have to use some online site or other tools to help you write/understand
one or just generally need more than a couple of minutes to
write/understand it then it's time to back off and figure out a better
way to write your code for the sake of whoever has to read it 6 months
later (and usually for robustness too as it's hard to be sure all rainy
day cases are handled correctly in a lengthy and/or complicated regexp).
On Wed, 20 Nov 2024 11:51:11 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
On 20.11.2024 09:21, Muttley@DastartdlyHQ.org wrote:
On Tue, 19 Nov 2024 18:43:48 -0800
merlyn@stonehenge.com (Randal L. Schwartz) boring babbled:
I'm often reminded of this as I've been coding very little in Perl these
days, and a lot more in languages like Dart, where the regex feels like
a clumsy bolt-on rather than a proper first-class citizen.
Regex itself is clumsy beyond simple search and replace patterns. A lot of
stuff I've seen done in regex would have better done procedurally at the
expense of slightly more code but a LOT more readability. Also given its
effectively a compact language with its own grammar and syntax IMO it should
not be the core part of any language as it can lead to a syntatic mess, which
is what often happens with Perl.
I wouldn't look at it that way. I've seen Regexps as part of languages
usually in well defined syntactical contexts. For example, like strings
are enclosed in "...", Regexps could be seen within /.../ delimiters.
GNU Awk (in recent versions) went towards first class "strongly typed"
Regexps which are then denoted by the @/.../ syntax.
I'm curious what you mean by Regexps presented in a "procedural" form.
Can you give some examples?
Anything that can be done in regex can obviously also be done procedurally. At the point regex expressions become unwieldy - usually when substitution variables raise their heads - I prefer procedural code, as it's also often easier to debug.
In practice, given that a Regexp conforms to a FSA, any Regexp can be
precompiled and used multiple times. The thing I had used in Java - it
Precompiled regex is no more efficient than precompiled anything, it's all just assembler at the bottom.
then operate on that same object. (Since there's still typical Regexp
syntax involved I suppose that is not what you meant by "procedural"?)
If you don't know the difference between declarative syntax like regex and procedural syntax then there's not much point continuing this discussion.
Definitely. The most relevant statement about regexps is this:
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.
Obviously regexps are very useful and commonplace but if you find you
have to use some online site or other tools to help you write/understand
one or just generally need more than a couple of minutes to
write/understand it then it's time to back off and figure out a better
way to write your code for the sake of whoever has to read it 6 months
later (and usually for robustness too as it's hard to be sure all rainy
day cases are handled correctly in a lengthy and/or complicated regexp).
On 20.11.2024 12:30, Muttley@DastartdlyHQ.org wrote:
Anything that can be done in regex can obviously also be done procedurally.
At the point regex expressions become unwieldy - usually when substitution
variables raise their heads - I prefer procedural code as it's also often
easier to debug.
You haven't even tried to honestly answer my (serious) question.
With your statement above and your hostility below, it rather seems
Personally I think that writing bulky procedural stuff for something
like [0-9]+ can only be much worse, and that further abbreviations
like \d+ are the better direction to go if targeting a good interface.
YMMV.
With your statement above and your hostility below, it rather seems
If you think my reply was hostile then I suggest you go find a safe space
and cuddle your teddy bear snowflake.
There's surely no reason why anyone could ever think you were inclined
to substitute verbal aggression for arguments.
Edge cases are regex's Achilles heel, e.g. an expression that only accounted
for 1 -> N chars, not 0 -> N, or matches in the middle but not at the
ends.
[...]
With your statement above and your hostility below, it rather seems
If you think my reply was hostile then I suggest you go find a safe space
and cuddle your teddy bear snowflake.
There's surely no reason why anyone could ever think you were inclined
to substitute verbal aggression for arguments.
On Wed, 20 Nov 2024 12:27:54 -0000 (UTC), Muttley wrote:
Edge cases are regex's Achilles heel, e.g. an expression that only
accounted for 1 -> N chars, not 0 -> N, or matches in the middle but
not at the ends.
That’s what “^” and “$” are for.
On Wed, 20 Nov 2024 17:54:22 +0000
Rainer Weikusat <rweikusat@talktalk.net> wrote:
There's surely no reason why anyone could ever think you were inclined
to substitute verbal aggression for arguments.
I mean, it's his whole thing - why would he stop now?
"Rainer" == Rainer Weikusat <rweikusat@talktalk.net> writes:
On Wed, 20 Nov 2024 17:54:22 +0000
Rainer Weikusat <rweikusat@talktalk.net> wrote:
There's surely no reason why anyone could ever think you were inclined
to substitute verbal aggression for arguments.
I mean, it's his whole thing - why would he stop now?
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
[...]
Personally I think that writing bulky procedural stuff for something
like [0-9]+ can only be much worse, and that further abbreviations
like \d+ are the better direction to go if targeting a good interface.
YMMV.
Assuming that p is a pointer to the current position in a string, e is a
pointer to the end of it (i.e., points just past the last byte) and -
that's important - both are pointers to unsigned quantities, the 'bulky'
C equivalent of [0-9]+ is
while (p < e && *p - '0' < 10) ++p;
That's not too bad. And it's really a hell lot faster than a
general-purpose automaton programmed to recognize the same pattern
(which might not matter most of the time, but sometimes, it does).
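One possible way to flesh out that one-liner into something testable - a sketch, not the original poster's code - is to force the subtraction back to unsigned (plain *p - '0' promotes to signed int, so e.g. ' ' - '0' is -16, which is less than 10) and to report how many digits were consumed, so the caller can distinguish the "one or more" semantics of [0-9]+ from an empty match (the helper name scan_digits is made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Advance over a run of ASCII digits starting at p, stopping at e.
 * Returns the number of digits consumed; 0 means no match, mirroring
 * the failure case of the regex [0-9]+. */
static size_t scan_digits(const unsigned char *p, const unsigned char *e)
{
    const unsigned char *start = p;

    /* The cast keeps the comparison in unsigned arithmetic: for any
     * non-digit byte, *p - '0' wraps to a large unsigned value. */
    while (p < e && (unsigned)(*p - '0') < 10)
        ++p;
    return (size_t)(p - start);
}
```

This keeps the speed advantage of the hand-rolled loop while being honest about match failure.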
Rainer Weikusat <rweikusat@talktalk.net> wrote:
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
[...]
Personally I think that writing bulky procedural stuff for something
like [0-9]+ can only be much worse, and that further abbreviations
like \d+ are the better direction to go if targeting a good interface.
YMMV.
Assuming that p is a pointer to the current position in a string, e is a
pointer to the end of it (ie, point just past the last byte) and -
that's important - both are pointers to unsigned quantities, the 'bulky'
C equivalent of [0-9]+ is
while (p < e && *p - '0' < 10) ++p;
That's not too bad. And it's really a hell lot faster than a
general-purpose automaton programmed to recognize the same pattern
(which might not matter most of the time, but sometimes, it does).
It's also not exactly right. `[0-9]+` would match one or more
characters; this possibly matches 0 (ie, if `p` pointed to
something that wasn't a digit).
What's it like being so wet? Do you get cold easily?
I have zero time
In article <20241120100347.00005f10@gmail.com>,
John Ames <commodorejohn@gmail.com> wrote:
On Wed, 20 Nov 2024 17:54:22 +0000
Rainer Weikusat <rweikusat@talktalk.net> wrote:
There's surely no reason why anyone could ever think you were inclined
to substitute verbal aggression for arguments.
I mean, it's his whole thing - why would he stop now?
This is the guy who didn't know what a compiler is, right?
"Rainer" == Rainer Weikusat <rweikusat@talktalk.net> writes:
Rainer> ¹ I used to use a JSON parser written in OO-Perl which made
Rainer> extensive use of regexes for that. I've recently replaced that
Rainer> with a C/XS version which - while slightly larger (617 vs 410
Rainer> lines of text) - is over a hundred times faster and conceptually
Rainer> simpler at the same time.
I wonder if that was my famous "JSON parser in a single regex" from https://www.perlmonks.org/?node_id=995856, or from one of the two CPAN modules that incorporated it.
On 20.11.2024 09:21, Muttley@DastartdlyHQ.org wrote:
Regex itself is clumsy beyond simple search and replace patterns. A lot of
stuff I've seen done in regex would have better done procedurally at the
expense of slightly more code but a LOT more readability. Also given its
effectively a compact language with its own grammar and syntax IMO it should
not be the core part of any language as it can lead to a syntatic mess, which
is what often happens with Perl.
I wouldn't look at it that way. I've seen Regexps as part of languages usually in well defined syntactical contexts. For example, like strings
are enclosed in "...", Regexps could be seen within /.../ delimiters.
GNU Awk (in recent versions) went towards first class "strongly typed" Regexps which are then denoted by the @/.../ syntax.
I'm curious what you mean by Regexps presented in a "procedural" form.
Can you give some examples?
On Wed, 20 Nov 2024 21:43:41 -0000 (UTC)
Lawrence D'Oliveiro <ldo@nz.invalid> boring babbled:
On Wed, 20 Nov 2024 12:27:54 -0000 (UTC), Muttley wrote:
Edge cases are regex achilles heal, eg an expression that only
accounted for 1 -> N chars, not 0 -> N, or matches in the middle but
not at the ends.
That’s what “^” and “$” are for.
Yes, but people forget about those (literal) edge cases.
On 2024-11-20, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
I'm curious what you mean by Regexps presented in a "procedural" form.
Can you give some examples?
Here is an example: using a regex match to capture a C comment /* ... */
in Lex compared to just recognizing the start sequence /* and handling
the discarding of the comment in the action.
Without non-greedy repetition matching, the regex for a C comment is
quite obtuse. The procedural handling is straightforward: read
characters until you see a * immediately followed by a /.
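That procedural handling can be sketched in C as follows (skip_comment_body is a hypothetical helper operating on a string for illustration; a real Lex action would pull characters from the input stream instead):

```c
#include <assert.h>
#include <stddef.h>

/* Discard a C comment procedurally: assuming the opening slash-star has
 * already been consumed, scan until a '*' immediately followed by '/'.
 * Returns a pointer just past the closing sequence, or NULL if the
 * comment is unterminated. */
static const char *skip_comment_body(const char *p)
{
    while (*p) {
        if (p[0] == '*' && p[1] == '/')
            return p + 2;           /* found the terminator */
        ++p;
    }
    return NULL;                    /* end of input: unterminated comment */
}
```

Compare this with the classic non-greedy-free regex for a C comment body, which needs an alternation like `([^*]|\*+[^*/])*\*+/` to say the same thing.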
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
Personally I think that writing bulky procedural stuff for something
like [0-9]+ can only be much worse, and that further abbreviations
like \d+ are the better direction to go if targeting a good interface.
YMMV.
Assuming that p is a pointer to the current position in a string, e is a
pointer to the end of it (i.e., points just past the last byte) and -
that's important - both are pointers to unsigned quantities, the 'bulky'
C equivalent of [0-9]+ is
while (p < e && *p - '0' < 10) ++p;
That's not too bad. And it's really a hell lot faster than a
general-purpose automaton programmed to recognize the same pattern
(which might not matter most of the time, but sometimes, it does).
[...]
In the wild, you see regexes being used for all sorts of stupid stuff,
like checking whether numeric input is in a certain range, rather than converting it to a number and doing an arithmetic check.
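The arithmetic check described here might look like this in C (the range 1..65535 and the helper name in_range are arbitrary illustrations; a regex for the same range would have to enumerate digit patterns):

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Convert the input to a number once, then do a plain arithmetic
 * comparison - instead of a regex that tries to describe the range
 * textually.  Rejects trailing junk and out-of-range values. */
static int in_range(const char *s)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0')
        return 0;                  /* not a complete, valid number */
    return v >= 1 && v <= 65535;
}
```

The regex equivalent of "between 1 and 65535" is famously unreadable; the conversion-plus-comparison version states the intent directly.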
On Thu, 21 Nov 2024 08:15:41 -0000 (UTC), Muttley wrote:
On Wed, 20 Nov 2024 21:43:41 -0000 (UTC)
Lawrence D'Oliveiro <ldo@nz.invalid> boring babbled:
[...]
That’s what “^” and “$” are for.
Yes, but people forget about those (literal) edge cases.
Those of us who are accustomed to using regexes do not.
Another handy one is “\b” for word boundaries.
On 20.11.2024 18:50, Rainer Weikusat wrote:
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
Personally I think that writing bulky procedural stuff for something
like [0-9]+ can only be much worse, and that further abbreviations
like \d+ are the better direction to go if targeting a good interface.
YMMV.
Assuming that p is a pointer to the current position in a string, e is a
pointer to the end of it (ie, point just past the last byte) and -
that's important - both are pointers to unsigned quantities, the 'bulky'
C equivalent of [0-9]+ is
while (p < e && *p - '0' < 10) ++p;
That's not too bad. And it's really a hell lot faster than a
general-purpose automaton programmed to recognize the same pattern
(which might not matter most of the time, but sometimes, it does).
Okay, I see where you're coming from (and especially in that simple
case).
Personally (and YMMV), even here in this simple case I think that
using pointers is not better but worse - and anyway isn't [in this
form] available in most languages;
in other cases (and languages)
such constructs get yet more clumsy, and for my not very complex
example - /[0-9]+(ABC)?x*foo/ - even a "catastrophe" concerning
readability, error-proneness, and maintainability.
If that is what the other poster meant I'm fine with your answer;
there's no need to even consider abandoning regular expressions
in favor of explicitly codified parsing.
cross@spitfire.i.gajendra.net (Dan Cross) writes:
Rainer Weikusat <rweikusat@talktalk.net> wrote:
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
[...]
Personally I think that writing bulky procedural stuff for something
like [0-9]+ can only be much worse, and that further abbreviations
like \d+ are the better direction to go if targeting a good interface.
YMMV.
Assuming that p is a pointer to the current position in a string, e is a
pointer to the end of it (ie, point just past the last byte) and -
that's important - both are pointers to unsigned quantities, the 'bulky'
C equivalent of [0-9]+ is
while (p < e && *p - '0' < 10) ++p;
That's not too bad. And it's really a hell lot faster than a
general-purpose automaton programmed to recognize the same pattern
(which might not matter most of the time, but sometimes, it does).
It's also not exactly right. `[0-9]+` would match one or more
characters; this possibly matches 0 (ie, if `p` pointed to
something that wasn't a digit).
The regex won't match any digits if there aren't any. In this case, the
match will fail. I didn't include the code for handling that because it
seemed pretty pointless for the example.
Rainer Weikusat <rweikusat@talktalk.net> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
Rainer Weikusat <rweikusat@talktalk.net> wrote:
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
[...]
Personally I think that writing bulky procedural stuff for something
like [0-9]+ can only be much worse, and that further abbreviations
like \d+ are the better direction to go if targeting a good interface.
YMMV.
Assuming that p is a pointer to the current position in a string, e is a
pointer to the end of it (ie, point just past the last byte) and -
that's important - both are pointers to unsigned quantities, the 'bulky'
C equivalent of [0-9]+ is
while (p < e && *p - '0' < 10) ++p;
That's not too bad. And it's really a hell lot faster than a
general-purpose automaton programmed to recognize the same pattern
(which might not matter most of the time, but sometimes, it does).
It's also not exactly right. `[0-9]+` would match one or more
characters; this possibly matches 0 (ie, if `p` pointed to
something that wasn't a digit).
The regex won't match any digits if there aren't any. In this case, the
match will fail. I didn't include the code for handling that because it
seemed pretty pointless for the example.
That's rather the point though, isn't it? The program snippet
(modulo the promotion to signed int via the "usual arithmetic
conversions" before the subtraction and comparison giving you
unexpected values; nothing to do with whether `char` is signed
or not) is a snippet that advances a pointer while it points to
a digit, starting at the current pointer position; that is, it
just increments a pointer over a run of digits.
But that's not the same as a regex matcher, which has a semantic
notion of success or failure. I could run your snippet against
a string such as, say, "ZZZZZZ" and it would "succeed" just as
it would against an empty string or a string of one or more
digits.
By the way, something that _would_ match `^[0-9]+$` might be:
Something which would match [0-9]+ in its first argument (if any) would
be:
#include "string.h"
#include "stdlib.h"
int main(int argc, char **argv)
{
char *p;
unsigned c;
p = argv[1];
if (!p) exit(1);
while (c = *p, c && c - '0' > 10) ++p;
cross@spitfire.i.gajendra.net (Dan Cross) writes:
Rainer Weikusat <rweikusat@talktalk.net> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
[snip]
It's also not exactly right. `[0-9]+` would match one or more
characters; this possibly matches 0 (ie, if `p` pointed to
something that wasn't a digit).
The regex won't match any digits if there aren't any. In this case, the
match will fail. I didn't include the code for handling that because it
seemed pretty pointless for the example.
That's rather the point though, isn't it? The program snippet
(modulo the promotion to signed int via the "usual arithmetic
conversions" before the subtraction and comparison giving you
unexpected values; nothing to do with whether `char` is signed
or not) is a snippet that advances a pointer while it points to
a digit, starting at the current pointer position; that is, it
just increments a pointer over a run of digits.
That's the core part of matching something equivalent to the regex [0-9]+
and the only part of it which is at least remotely interesting.
But that's not the same as a regex matcher, which has a semantic
notion of success or failure. I could run your snippet against
a string such as, say, "ZZZZZZ" and it would "succeed" just as
it would against an empty string or a string of one or more
digits.
Why do you believe that p being equivalent to the starting position
would be considered a "successful match", considering that this
obviously doesn't make any sense?
[...]
By the way, something that _would_ match `^[0-9]+$` might be:
[too much code]
Something which would match [0-9]+ in its first argument (if any) would
be:
#include "string.h"
#include "stdlib.h"
int main(int argc, char **argv)
{
char *p;
unsigned c;
p = argv[1];
if (!p) exit(1);
while (c = *p, c && c - '0' > 10) ++p;
if (!c) exit(1);
return 0;
}
but that's 14 lines of text, 13 of which have absolutely no relation to
the problem of recognizing a digit.
Rainer Weikusat <rweikusat@talktalk.net> writes:
[...]
Something which would match [0-9]+ in its first argument (if any) would
be:
#include "string.h"
#include "stdlib.h"
int main(int argc, char **argv)
{
char *p;
unsigned c;
p = argv[1];
if (!p) exit(1);
while (c = *p, c && c - '0' > 10) ++p;
This needs to be
while (c = *p, c && c - '0' > 9) ++p
In article <87zflrs1ti.fsf@doppelsaurus.mobileactivedefense.com>,
Rainer Weikusat <rweikusat@talktalk.net> wrote:
Rainer Weikusat <rweikusat@talktalk.net> writes:
[...]
Something which would match [0-9]+ in its first argument (if any) would
be:
#include "string.h"
#include "stdlib.h"
int main(int argc, char **argv)
{
char *p;
unsigned c;
p = argv[1];
if (!p) exit(1);
while (c = *p, c && c - '0' > 10) ++p;
This needs to be
while (c = *p, c && c - '0' > 9) ++p
No, that's still wrong. Try actually running it.
cross@spitfire.i.gajendra.net (Dan Cross) writes:
In article <87zflrs1ti.fsf@doppelsaurus.mobileactivedefense.com>,
Rainer Weikusat <rweikusat@talktalk.net> wrote:
Rainer Weikusat <rweikusat@talktalk.net> writes:
[...]
Something which would match [0-9]+ in its first argument (if any) would be:
#include "string.h"
#include "stdlib.h"
int main(int argc, char **argv)
{
char *p;
unsigned c;
p = argv[1];
if (!p) exit(1);
while (c = *p, c && c - '0' > 10) ++p;
This needs to be
while (c = *p, c && c - '0' > 9) ++p
No, that's still wrong. Try actually running it.
If you know something that's wrong with that, why not write it instead
of utilizing the claim for pointless (and wrong) snide remarks?
In article <87v7wfrx26.fsf@doppelsaurus.mobileactivedefense.com>,
Rainer Weikusat <rweikusat@talktalk.net> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
In article <87zflrs1ti.fsf@doppelsaurus.mobileactivedefense.com>,
Rainer Weikusat <rweikusat@talktalk.net> wrote:
Rainer Weikusat <rweikusat@talktalk.net> writes:
[...]
Something which would match [0-9]+ in its first argument (if any) would be:
#include "string.h"
#include "stdlib.h"
int main(int argc, char **argv)
{
char *p;
unsigned c;
p = argv[1];
if (!p) exit(1);
while (c = *p, c && c - '0' > 10) ++p;
This needs to be
while (c = *p, c && c - '0' > 9) ++p
No, that's still wrong. Try actually running it.
If you know something that's wrong with that, why not write it instead
of utilizing the claim for pointless (and wrong) snide remarks?
I did, at length, in my other post.
Rainer Weikusat <rweikusat@talktalk.net> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
Rainer Weikusat <rweikusat@talktalk.net> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
[snip]
It's also not exactly right. `[0-9]+` would match one or more
characters; this possibly matches 0 (ie, if `p` pointed to
something that wasn't a digit).
The regex won't match any digits if there aren't any. In this case, the
match will fail. I didn't include the code for handling that because it
seemed pretty pointless for the example.
That's rather the point though, isn't it? The program snippet
(modulo the promotion to signed int via the "usual arithmetic
conversions" before the subtraction and comparison giving you
unexpected values; nothing to do with whether `char` is signed
or not) is a snippet that advances a pointer while it points to
a digit, starting at the current pointer position; that is, it
just increments a pointer over a run of digits.
That's the core part of matching something equivalent to the regex [0-9]+
and the only part of it which is at least remotely interesting.
Not really, no. The interesting thing in this case appears to
be knowing whether or not the match succeeded, but you omitted
that part.
But that's not the same as a regex matcher, which has a semantic
notion of success or failure. I could run your snippet against
a string such as, say, "ZZZZZZ" and it would "succeed" just as
it would against an empty string or a string of one or more
digits.
Why do you believe that p being equivalent to the starting position
would be considered a "successful match", considering that this
obviously doesn't make any sense?
Because absent any surrounding context, there's no indication
that the source is even saved.
Something which would match [0-9]+ in its first argument (if any) would
be:
#include "string.h"
#include "stdlib.h"
int main(int argc, char **argv)
{
char *p;
unsigned c;
p = argv[1];
if (!p) exit(1);
while (c = *p, c && c - '0' > 10) ++p;
if (!c) exit(1);
return 0;
}
but that's 14 lines of text, 13 of which have absolutely no relation to
the problem of recognizing a digit.
This is wrong in many ways. Did you actually test that program?
First of all, why `"string.h"` and not `<string.h>`? Ok, that's
not technically an error, but it's certainly unconventional, and
raises questions that are ultimately a distraction.
Second, suppose that `argc==0` (yes, this can happen under
POSIX).
Third, the loop: why `> 10`? Don't you mean `< 10`? You are
trying to match digits, not non-digits.
Fourth, you exit with failure (`exit(1)`) if `!p` *and* if `!c`
at the end, but `!c` there means you've reached the end of the
string; which should be success.
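Folding those four fixes into the quoted program gives something like the following sketch (`all_digits` is a made-up name; the digit test is inverted to `< 10`, NULL and the empty string are rejected, and reaching the terminating NUL is reported as success):

```c
#include <assert.h>

/* Return 1 iff s is non-NULL, non-empty, and consists only of ASCII
   digits, ie it matches ^[0-9]+$. The digit test is < 10 (not > 10),
   and the unsigned arithmetic makes every character below '0' wrap to
   a large value, so one comparison suffices. */
static int all_digits(const char *s)
{
    unsigned c;

    if (!s || !*s)
        return 0;                      /* missing or empty argument */
    while (c = (unsigned char)*s, c && c - '0' < 10)
        ++s;
    return c == 0;  /* reached the NUL <=> every character was a digit */
}
```

Wired into the original main(), `return !all_digits(argv[1]);` after an `argc < 2` check (which also covers the POSIX argc==0 case) reproduces the intended behaviour.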
Something which would match [0-9]+ in its first argument (if any) would be:
#include "string.h"
#include "stdlib.h"
int main(int argc, char **argv)
{
char *p;
unsigned c;
p = argv[1];
if (!p) exit(1);
while (c = *p, c && c - '0' > 10) ++p;
if (!c) exit(1);
return 0;
}
but that's 14 lines of text, 13 of which have absolutely no relation to
the problem of recognizing a digit.
This is wrong in many ways. Did you actually test that program?
First of all, why `"string.h"` and not `<string.h>`? Ok, that's
not technically an error, but it's certainly unconventional, and
raises questions that are ultimately a distraction.
Such as your paragraph above.
Second, suppose that `argc==0` (yes, this can happen under
POSIX).
It can happen in case of some piece of functionally hostile software
intentionally creating such a situation. Tangential, irrelevant
point. If you break it, you get to keep the parts.
Third, the loop: why `> 10`? Don't you mean `< 10`? You are
trying to match digits, not non-digits.
Mistake I made. The opposite of < 10 is > 9.
Fourth, you exit with failure (`exit(1)`) if `!p` *and* if `!c`
at the end, but `!c` there means you've reached the end of the
string; which should be success.
Mistake you made: [0-9]+ matches if there's at least one digit in the
string. That's why the loop terminates once one was found. In this case,
c cannot be 0.
cross@spitfire.i.gajendra.net (Dan Cross) writes:
Rainer Weikusat <rweikusat@talktalk.net> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
Rainer Weikusat <rweikusat@talktalk.net> wrote:
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
[...]
Personally I think that writing bulky procedural stuff for something
like [0-9]+ can only be much worse, and that further abbreviations
like \d+ are the better direction to go if targeting a good interface.
YMMV.
Assuming that p is a pointer to the current position in a string, e is a
pointer to the end of it (ie, point just past the last byte) and -
that's important - both are pointers to unsigned quantities, the 'bulky'
C equivalent of [0-9]+ is
while (p < e && *p - '0' < 10) ++p;
That's not too bad. And it's really a hell lot faster than a
general-purpose automaton programmed to recognize the same pattern
(which might not matter most of the time, but sometimes, it does).
It's also not exactly right. `[0-9]+` would match one or more
characters; this possibly matches 0 (ie, if `p` pointed to
something that wasn't a digit).
The regex won't match any digits if there aren't any. In this case, the
match will fail. I didn't include the code for handling that because it
seemed pretty pointless for the example.
That's rather the point though, isn't it? The program snippet
(modulo the promotion to signed int via the "usual arithmetic
conversions" before the subtraction and comparison giving you
unexpected values; nothing to do with whether `char` is signed
or not) is a snippet that advances a pointer while it points to
a digit, starting at the current pointer position; that is, it
just increments a pointer over a run of digits.
That's the core part of matching something equivalent to the regex [0-9]+
and the only part of it which is at least remotely interesting.
But that's not the same as a regex matcher, which has a semantic
notion of success or failure. I could run your snippet against
a string such as, say, "ZZZZZZ" and it would "succeed" just as
it would against an empty string or a string of one or more
digits.
Why do you believe that p being equivalent to the starting position
would be considered a "successful match", considering that this
obviously doesn't make any sense?
[...]
By the way, something that _would_ match `^[0-9]+$` might be:
[too much code]
Something which would match [0-9]+ in its first argument (if any) would
be:
#include "string.h"
#include "stdlib.h"
int main(int argc, char **argv)
{
char *p;
unsigned c;
p = argv[1];
if (!p) exit(1);
while (c = *p, c && c - '0' > 10) ++p;
if (!c) exit(1);
return 0;
}
but that's 14 lines of text, 13 of which have absolutely no relation to
the problem of recognizing a digit.
On Thu, 21 Nov 2024 19:12:03 -0000 (UTC)
Kaz Kylheku <643-408-1753@kylheku.com> boring babbled:
On 2024-11-20, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
I'm curious what you mean by Regexps presented in a "procedural" form.
Can you give some examples?
Here is an example: using a regex match to capture a C comment /* ... */
in Lex compared to just recognizing the start sequence /* and handling
the discarding of the comment in the action.
Without non-greedy repetition matching, the regex for a C comment is
quite obtuse. The procedural handling is straightforward: read
characters until you see a * immediately followed by a /.
It's not that simple I'm afraid since comments can be commented out.
eg:
// int i; /*
int j;
/*
int k;
*/
++j;
A C99 and C++ compiler would see "int j" and compile it, a regex would
simply remove everything from the first /* to */.
Also the same probably applies to #ifdef's.
On 21.11.2024 20:12, Kaz Kylheku wrote:
[...]
In the wild, you see regexes being used for all sorts of stupid stuff,
No one can prevent folks using features for stupid things. Yes.
Rainer Weikusat <rweikusat@talktalk.net> writes:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
Rainer Weikusat <rweikusat@talktalk.net> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
Rainer Weikusat <rweikusat@talktalk.net> wrote:
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
[...]
Personally I think that writing bulky procedural stuff for something
like [0-9]+ can only be much worse, and that further abbreviations
like \d+ are the better direction to go if targeting a good interface.
YMMV.
Assuming that p is a pointer to the current position in a string, e is a
pointer to the end of it (ie, point just past the last byte) and -
that's important - both are pointers to unsigned quantities, the 'bulky'
C equivalent of [0-9]+ is
while (p < e && *p - '0' < 10) ++p;
That's not too bad. And it's really a hell lot faster than a
general-purpose automaton programmed to recognize the same pattern
(which might not matter most of the time, but sometimes, it does).
It's also not exactly right. `[0-9]+` would match one or more
characters; this possibly matches 0 (ie, if `p` pointed to
something that wasn't a digit).
The regex won't match any digits if there aren't any. In this case, the
match will fail. I didn't include the code for handling that because it
seemed pretty pointless for the example.
That's rather the point though, isn't it? The program snippet
(modulo the promotion to signed int via the "usual arithmetic
conversions" before the subtraction and comparison giving you
unexpected values; nothing to do with whether `char` is signed
or not) is a snippet that advances a pointer while it points to
a digit, starting at the current pointer position; that is, it
just increments a pointer over a run of digits.
That's the core part of matching something equivalent to the regex [0-9]+
and the only part of it which is at least remotely interesting.
But that's not the same as a regex matcher, which has a semantic
notion of success or failure. I could run your snippet against
a string such as, say, "ZZZZZZ" and it would "succeed" just as
it would against an empty string or a string of one or more
digits.
Why do you believe that p being equivalent to the starting position
would be considered a "successful match", considering that this
obviously doesn't make any sense?
[...]
By the way, something that _would_ match `^[0-9]+$` might be:
[too much code]
Something which would match [0-9]+ in its first argument (if any) would
be:
#include "string.h"
#include "stdlib.h"
int main(int argc, char **argv)
{
char *p;
unsigned c;
p = argv[1];
if (!p) exit(1);
while (c = *p, c && c - '0' > 10) ++p;
if (!c) exit(1);
return 0;
}
but that's 14 lines of text, 13 of which have absolutely no relation to
the problem of recognizing a digit.
Personally, I'd use:
$ cat /tmp/a.c
#include <stdint.h>
#include <stdlib.h>
int
main(int argc, const char **argv)
{
char *cp;
uint64_t value;
if (argc < 2) return 1;
value = strtoull(argv[1], &cp, 10);
if ((cp == argv[1])
|| (*cp != '\0')) {
return 1;
}
return 0;
}
$ cc -o /tmp/a /tmp/a.c
$ /tmp/a 13254
$ echo $?
0
$ /tmp/a 23v23
$ echo $?
1
scott@slp53.sl.home (Scott Lurndal) writes:
Rainer Weikusat <rweikusat@talktalk.net> writes:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
Rainer Weikusat <rweikusat@talktalk.net> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
Rainer Weikusat <rweikusat@talktalk.net> wrote:
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
[...]
Personally I think that writing bulky procedural stuff for something
like [0-9]+ can only be much worse, and that further abbreviations
like \d+ are the better direction to go if targeting a good interface.
YMMV.
Assuming that p is a pointer to the current position in a string, e is a
pointer to the end of it (ie, point just past the last byte) and -
that's important - both are pointers to unsigned quantities, the 'bulky'
C equivalent of [0-9]+ is
while (p < e && *p - '0' < 10) ++p;
That's not too bad. And it's really a hell lot faster than a
general-purpose automaton programmed to recognize the same pattern
(which might not matter most of the time, but sometimes, it does).
It's also not exactly right. `[0-9]+` would match one or more
characters; this possibly matches 0 (ie, if `p` pointed to
something that wasn't a digit).
The regex won't match any digits if there aren't any. In this case, the
match will fail. I didn't include the code for handling that because it
seemed pretty pointless for the example.
That's rather the point though, isn't it? The program snippet
(modulo the promotion to signed int via the "usual arithmetic
conversions" before the subtraction and comparison giving you
unexpected values; nothing to do with whether `char` is signed
or not) is a snippet that advances a pointer while it points to
a digit, starting at the current pointer position; that is, it
just increments a pointer over a run of digits.
That's the core part of matching something equivalent to the regex [0-9]+
and the only part of it which is at least remotely interesting.
But that's not the same as a regex matcher, which has a semantic
notion of success or failure. I could run your snippet against
a string such as, say, "ZZZZZZ" and it would "succeed" just as
it would against an empty string or a string of one or more
digits.
Why do you believe that p being equivalent to the starting position
would be considered a "successful match", considering that this
obviously doesn't make any sense?
[...]
By the way, something that _would_ match `^[0-9]+$` might be:
[too much code]
Something which would match [0-9]+ in its first argument (if any) would be:
#include "string.h"
#include "stdlib.h"
int main(int argc, char **argv)
{
char *p;
unsigned c;
p = argv[1];
if (!p) exit(1);
while (c = *p, c && c - '0' > 10) ++p;
if (!c) exit(1);
return 0;
}
but that's 14 lines of text, 13 of which have absolutely no relation to
the problem of recognizing a digit.
Personally, I'd use:
Albeit this is limited to strings of digits that sum to less than
ULONG_MAX...
$ cat /tmp/a.c
#include <stdint.h>
#include <stdlib.h>
int
main(int argc, const char **argv)
{
char *cp;
uint64_t value;
if (argc < 2) return 1;
value = strtoull(argv[1], &cp, 10);
if ((cp == argv[1])
|| (*cp != '\0')) {
return 1;
}
return 0;
}
$ cc -o /tmp/a /tmp/a.c
$ /tmp/a 13254
$ echo $?
0
$ /tmp/a 23v23
$ echo $?
1
In any event, this seems simpler than what you posted:
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
int
main(int argc, char *argv[])
{
if (argc != 2) {
fprintf(stderr, "Usage: matchd <str>\n");
return EXIT_FAILURE;
}
for (const char *p = argv[1]; *p != '\0'; p++)
if ('0' <= *p && *p <= '9')
return EXIT_SUCCESS;
return EXIT_FAILURE;
}
Rainer Weikusat <rweikusat@talktalk.net> writes:
Something which would match [0-9]+ in its first argument (if any) would
be:
#include "string.h"
#include "stdlib.h"
int main(int argc, char **argv)
{
char *p;
unsigned c;
p = argv[1];
if (!p) exit(1);
while (c = *p, c && c - '0' > 10) ++p;
if (!c) exit(1);
return 0;
}
but that's 14 lines of text, 13 of which have absolutely no relation to
the problem of recognizing a digit.
Personally, I'd use:
$ cat /tmp/a.c
#include <stdint.h>
#include <stdlib.h>
int
main(int argc, const char **argv)
{
char *cp;
uint64_t value;
if (argc < 2) return 1;
value = strtoull(argv[1], &cp, 10);
if ((cp == argv[1])
|| (*cp != '\0')) {
return 1;
}
return 0;
}
cross@spitfire.i.gajendra.net (Dan Cross) writes:
[...]
In any event, this seems simpler than what you posted:
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
int
main(int argc, char *argv[])
{
if (argc != 2) {
fprintf(stderr, "Usage: matchd <str>\n");
return EXIT_FAILURE;
}
for (const char *p = argv[1]; *p != '\0'; p++)
if ('0' <= *p && *p <= '9')
return EXIT_SUCCESS;
return EXIT_FAILURE;
}
It's not only 4 lines longer but in just about every individual aspect
syntactically more complicated and more messy and functionally more
clumsy.
This is particularly noticeable in the loop
for (const char *p = argv[1]; *p != '\0'; p++)
if ('0' <= *p && *p <= '9')
return EXIT_SUCCESS;
the loop header containing a spuriously qualified variable declaration,
the loop body and half of the termination condition.
The other half then
follows as a special case in the otherwise useless loop body.
It looks like a copy of my code with each individual bit redesigned
under the guiding principle of "Can we make this more complicated?", eg,
char **argv
declares an array of pointers
(as each pointer in C points to an array)
and
char *argv[]
accomplishes exactly the same but uses both more characters and more
different kinds of characters.
scott@slp53.sl.home (Scott Lurndal) writes:
Rainer Weikusat <rweikusat@talktalk.net> writes:
[...]
Something which would match [0-9]+ in its first argument (if any) would be:
#include "string.h"
#include "stdlib.h"
int main(int argc, char **argv)
{
char *p;
unsigned c;
p = argv[1];
if (!p) exit(1);
while (c = *p, c && c - '0' > 10) ++p;
if (!c) exit(1);
return 0;
}
but that's 14 lines of text, 13 of which have absolutely no relation to
the problem of recognizing a digit.
Personally, I'd use:
$ cat /tmp/a.c
#include <stdint.h>
#include <stdlib.h>
int
main(int argc, const char **argv)
{
char *cp;
uint64_t value;
if (argc < 2) return 1;
value = strtoull(argv[1], &cp, 10);
if ((cp == argv[1])
|| (*cp != '\0')) {
return 1;
}
return 0;
}
This will accept a string of digits whose numerical value is <=
ULLONG_MAX, ie, it's basically ^[0-9]+$ with unobvious length and
content limits.
return !strstr(argv[1], "0123456789");
would be a better approximation,
just a much more complicated algorithm
than necessary. Even in strictly conforming ISO-C "digitness" of a
character can be determined by a simple calculation instead of some kind
of search loop.
On 2024-11-22, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
On 21.11.2024 20:12, Kaz Kylheku wrote:
[...]
In the wild, you see regexes being used for all sorts of stupid stuff,
No one can prevent folks using features for stupid things. Yes.
But the thing is that "modern" regular expressions (Perl regex and its progeny) have features that are designed to exclusively cater to these
folks.
Rainer Weikusat <rweikusat@talktalk.net> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
[...]
In any event, this seems simpler than what you posted:
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
int
main(int argc, char *argv[])
{
if (argc != 2) {
fprintf(stderr, "Usage: matchd <str>\n");
return EXIT_FAILURE;
}
for (const char *p = argv[1]; *p != '\0'; p++)
if ('0' <= *p && *p <= '9')
return EXIT_SUCCESS;
return EXIT_FAILURE;
}
It's not only 4 lines longer but in just about every individual aspect
syntactically more complicated and more messy and functionally more
clumsy.
That's a lot of opinion, and not particularly well-founded
opinion at that, given that your code was incorrect to begin
with.
In article <87cyinrt5s.fsf@doppelsaurus.mobileactivedefense.com>,
Rainer Weikusat <rweikusat@talktalk.net> wrote:
scott@slp53.sl.home (Scott Lurndal) writes:
Rainer Weikusat <rweikusat@talktalk.net> writes:
[...]
Something which would match [0-9]+ in its first argument (if any) would be:
#include "string.h"
#include "stdlib.h"
int main(int argc, char **argv)
{
char *p;
unsigned c;
p = argv[1];
if (!p) exit(1);
while (c = *p, c && c - '0' > 10) ++p;
if (!c) exit(1);
return 0;
}
but that's 14 lines of text, 13 of which have absolutely no relation to
the problem of recognizing a digit.
Personally, I'd use:
$ cat /tmp/a.c
#include <stdint.h>
#include <stdlib.h>
int
main(int argc, const char **argv)
{
char *cp;
uint64_t value;
if (argc < 2) return 1;
value = strtoull(argv[1], &cp, 10);
if ((cp == argv[1])
|| (*cp != '\0')) {
return 1;
}
return 0;
}
This will accept a string of digits whose numerical value is <=
ULLONG_MAX, ie, it's basically ^[0-9]+$ with unobvious length and
content limits.
He acknowledged this already.
return !strstr(argv[1], "0123456789");
would be a better approximation,
No it wouldn't. That's not even close. `strstr` looks for an
instance of its second argument in its first, not an instance of
any character in it's second argument in its first. Perhaps you
meant something with `strspn` or similar. E.g.,
const char *p = argv[1] + strspn(argv[1], "0123456789");
return *p != '\0';
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
On 20.11.2024 18:50, Rainer Weikusat wrote:
[...]
while (p < e && *p - '0' < 10) ++p;
That's not too bad. And it's really a hell lot faster than a
general-purpose automaton programmed to recognize the same pattern
(which might not matter most of the time, but sometimes, it does).
Okay, I see where you're coming from (and especially in that simple
case).
Personally (and YMMV), even here in this simple case I think that
using pointers is not better but worse - and anyway isn't [in this
form] available in most languages;
That's a question of using the proper tool for the job. In C, that's
pointer and pointer arithmetic because it's the simplest way to express something like this.
in other cases (and languages)
such constructs get yet more clumsy, and for my not very complex
example - /[0-9]+(ABC)?x*foo/ - even a "catastrophe" concerning
readability, error-proneness, and maintainability.
Procedural code for matching strings constructed in this way is
certainly much simpler¹ than the equally procedural code for a
programmable automaton capable of interpreting regexes.
Your statement
is basically "If we assume that the code interpreting regexes doesn't
exist, regexes need much less code than something equivalent which does exist." Without this assumption, the picture becomes a different one altogether.
[...]
cross@spitfire.i.gajendra.net (Dan Cross) writes:
Rainer Weikusat <rweikusat@talktalk.net> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
[...]
In any event, this seems simpler than what you posted:
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
int
main(int argc, char *argv[])
{
if (argc != 2) {
fprintf(stderr, "Usage: matchd <str>\n");
return EXIT_FAILURE;
}
for (const char *p = argv[1]; *p != '\0'; p++)
if ('0' <= *p && *p <= '9')
return EXIT_SUCCESS;
return EXIT_FAILURE;
}
It's not only 4 lines longer but in just about every individual aspect
syntactically more complicated and more messy and functionally more
clumsy.
That's a lot of opinion, and not particularly well-founded
opinion at that, given that your code was incorrect to begin
with.
That's not at all an opinion but an observation. My opinion on this is
that this is either a poor man's attempt at winning an obfuscation
contest or - simpler - exemplary bad code.
cross@spitfire.i.gajendra.net (Dan Cross) writes:
In article <87cyinrt5s.fsf@doppelsaurus.mobileactivedefense.com>,
Rainer Weikusat <rweikusat@talktalk.net> wrote:
scott@slp53.sl.home (Scott Lurndal) writes:
Rainer Weikusat <rweikusat@talktalk.net> writes:
[...]
Something which would match [0-9]+ in its first argument (if any) would be:
#include "string.h"
#include "stdlib.h"
int main(int argc, char **argv)
{
char *p;
unsigned c;
p = argv[1];
if (!p) exit(1);
while (c = *p, c && c - '0' > 10) ++p;
if (!c) exit(1);
return 0;
}
but that's 14 lines of text, 13 of which have absolutely no relation to
the problem of recognizing a digit.
Personally, I'd use:
$ cat /tmp/a.c
#include <stdint.h>
#include <string.h>
int
main(int argc, const char **argv)
{
char *cp;
uint64_t value;
if (argc < 2) return 1;
value = strtoull(argv[1], &cp, 10);
if ((cp == argv[1])
|| (*cp != '\0')) {
return 1;
}
return 0;
}
This will accept a string of digits whose numerical value is <=
ULLONG_MAX, ie, it's basically ^[0-9]+$ with unobvious length and
content limits.
He acknowledged this already.
return !strstr(argv[1], "0123456789");
would be a better approximation,
No it wouldn't. That's not even close. `strstr` looks for an
instance of its second argument in its first, not an instance of
any character in it's second argument in its first. Perhaps you
meant something with `strspn` or similar. E.g.,
const char *p = argv[1] + strspn(argv[1], "0123456789");
return *p != '\0';
My bad.
On 21.11.2024 23:05, Lawrence D'Oliveiro wrote:
Another handy one is “\b” for word boundaries.
I prefer \< and \> (that are quite commonly used) for such structural
things ...
On 2024-11-22, Muttley@DastartdlyHQ.org <Muttley@DastartdlyHQ.org> wrote:
It's not that simple I'm afraid since comments can be commented out.
Umm, no.
eg:
// int i; /*
This /* sequence is inside a // comment, and so the machinery that
recognizes /* as the start of a comment would never see it.
A C99 and C++ compiler would see "int j" and compile it, a regex would
simply remove everything from the first /* to */.
No, it won't, because that's not how regexes are used in a lexical
analyzer.
Also the same probably applies to #ifdef's.
Lexically analyzing C requires implementing the translation phases
as described in the standard. There are preprocessor phases which
delimit the input into preprocessor tokens (pp-tokens). Comments
are stripped in preprocessing. But logical lines (backslash
continuations) are recognized below comments; i.e. this is one
comment:
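The example itself appears to have been dropped in transit; presumably it was something along these lines (a reconstruction; the name after_comment is made up for illustration):

```c
// A backslash at the end of a // comment splices the next physical \
   line onto it, so this text is still part of the same comment.
static int after_comment = 42;  /* first real declaration after it */
```

Line splicing happens in an earlier translation phase than comment stripping, which is why the second physical line above never reaches the compiler as code.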
On 20.11.2024 12:46, Ed Morton wrote:
Definitely. The most relevant statement about regexps is this:
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.
(Worth a scribbling on a WC wall.)
Obviously regexps are very useful and commonplace, but if you find you
have to use some online site or other tools to help you write or
understand one, or just generally need more than a couple of minutes to
write or understand it, then it's time to back off and figure out a
better way to write your code, for the sake of whoever has to read it 6
months later (and usually for robustness too, since it's hard to be sure
all rainy-day cases are handled correctly in a lengthy and/or
complicated regexp).
Regexps are not for newbies.
The inherent fine thing with Regexps is that you can incrementally
compose them[*].[**]
It seems you haven't found a sensible way to work with them?
(And I'm really astonished about that since I know you worked with
Regexps for years if not decades.)
In those cases where Regexps *are* the tool for a specific task -
I don't expect you to use them where they are inappropriate?! -
what would be the better solution[***] then?
Janis
[*] Like the corresponding FSMs.
[**] And you can also decompose them if they are merged into a huge
expression too large for you to grasp. (BTW, I do such
decompositions with other expressions in program code that
are too bulky, too.)
[***] Can you answer the question that another poster failed to do?
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
[...]
Personally I think that writing bulky procedural stuff for
something like [0-9]+ can only be much worse, and that further
abbreviations like \d+ are the better direction to go if targeting
a good interface. YMMV.
Assuming that p is a pointer to the current position in a string and e
is a pointer to the end of it (ie, pointing just past the last byte),
the 'bulky' C equivalent of [0-9]+ is
while (p < e && (unsigned)(*p - '0') < 10) ++p;
(The cast matters: *p promotes to a signed int, so without it a byte
below '0' would yield a negative difference, which compares less than 10.)
Here is an example: using a regex match to capture a C comment /* ... */
in Lex compared to just recognizing the start sequence /* and handling
the discarding of the comment in the action.
Without non-greedy repetition matching, the regex for a C comment is
quite obtuse. The procedural handling is straightforward: read
characters until you see a * immediately followed by a /.
Lawrence D'Oliveiro <ldo@nz.invalid> writes:
On Tue, 27 Aug 2024 03:15:16 -0000 (UTC), Sebastian wrote:
In comp.unix.programmer Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
(And I have no idea about this “Black” thing. I just do my thing.)
Black is a [bla bla bla]
*Yawn*
The guy was kindly and politely sharing information with you.
On 2024-08-06, Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
Equivalent Lisp, for comparison:
(setf a (cond (b (if c d e))
(f (if g h i))
(t j)))
You can’t avoid the parentheses, but this, too, can be improved:
(setf a
(cond
(b
(if c d e)
)
(f
(if g h i)
)
(t
j
)
) ; cond
)
Nobody is ever going to follow your idio(syncra)tic coding preferences
for Lisp; that wouldn't pass code review in any Lisp shop, and would
result in patches being rejected in a FOSS setting.
a = b ? (c ? d : e) :
    f ? (g ? h : i) :
    j;
On 2024-08-06, Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
(setf a
(cond
(b
(if c d e)
)
(f
(if g h i)
)
(t
j
)
) ; cond
)
If "; cond" went inside the cond form then I'd accept it in general