• Re: Command Languages Versus Programming Languages

    From Bart@bc@freeuk.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sun Oct 13 17:20:40 2024
    From Newsgroup: comp.lang.misc

    On 13/10/2024 16:52, Dan Cross wrote:
    In article <QnROO.226037$EEm7.111715@fx16.iad>,
    Scott Lurndal <slp53@pacbell.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <vefvo0$k1mm$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:

    Really? So java bytecode will run direct on x86 or ARM will it? Please give
    some links to this astounding discovery you've made.

    Um, ok. https://en.wikipedia.org/wiki/Jazelle

    There was also a company a couple of decades ago that
    built an entire processor designed to execute bytecode
    directly - with a coprocessor to handle I/O.

    IIRC, it was Azul. There were a number of others, including
    Sun.

    None of them panned out - JIT's ended up winning that battle.

    Even ARM no longer includes Jazelle extensions in any of their
    mainstream processors.

    Sure. But the fact that any of these were going concerns is an
    existence proof that one _can_ take bytecodes targetted toward a
    "virtual" machine and execute it on silicon,
    making the
    distinction a lot more fluid than might be naively assumed, in
    turn exposing the silliness of this argument that centers around
    this weirdly overly-rigid definition of what a "compiler" is.

    I've implemented numerous compilers and interpreters over the last few
    decades (and have dabbled in emulators).

    To me the distinctions are clear enough because I have to work at the
    sharp end!

    I'm not sure why people want to try and be clever by blurring the roles
    of compiler and interpreter; that's not helpful at all.

    Sure, people can write emulators for machine code, which are a kind of interpreter, or they can implement bytecode in hardware; so what?

    That doesn't really affect what I do. Writing compiler backends for
    actual CPUs is hard work. Generating bytecode is a lot simpler.
(Especially in my case, since the bytecode is one I've devised myself;
that's another distinction: compilers usually target someone else's
instruction set.)

    If you want one more distinction, it is this: with my compiler, the
    resultant binary is executed by a separate agency: the CPU. Or maybe the
    OS loader will run it through an emulator.

    With my interpreter, then *I* have to write the dispatch routines and
    write code to implement all the instructions.

    (My compilers generate an intermediate language, a kind of VM, which is
    then processed further into native code.

    But I have also tried interpreting that VM; it just runs 20 times slower
    than native code. That's what interpreting usually means: slow programs.)
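[Editor's note: a concrete illustration of what Bart means by writing the
dispatch routines and the code for every instruction. A minimal sketch of a
stack-machine interpreter in Python; the opcode set is invented for
illustration:]

```python
# Toy bytecode interpreter: the author must hand-write the dispatch
# loop and an implementation for every instruction (opcodes invented).
def run(program):
    stack, pc = [], 0
    while pc < len(program):
        op, arg = program[pc]
        if op == "PUSH":          # push an immediate operand
            stack.append(arg)
        elif op == "ADD":         # pop two values, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":         # pop two values, push their product
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {op}")
        pc += 1
    return stack.pop()

# (2 + 3) * 4
result = run([("PUSH", 2), ("PUSH", 3), ("ADD", None),
              ("PUSH", 4), ("MUL", None)])
```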

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.programmer,comp.lang.misc on Sun Oct 13 18:28:32 2024
    From Newsgroup: comp.lang.misc

    [ X-post list reduced ]

    On 13.10.2024 18:02, Muttley@DastartdlyHQ.org wrote:
    On Sun, 13 Oct 2024 15:30:03 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
    [...]

    No. It translates one computer _language_ to another computer
    _language_. In the usual case, that's from a textual source

    Machine code isn't a language. Fallen at the first hurdle with that definition.

    Careful (myself included); watch out for the glazed frost!

    You know there's formal definitions for what constitutes languages.

At first glance I don't see why machine code wouldn't qualify as a
    language (either as some specific "mnemonic" representation, or as
    a sequence of integral numbers or other "code" representations).

    What's the problem, in your opinion, with considering machine code
    as a language?

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Kaz Kylheku@643-408-1753@kylheku.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sun Oct 13 16:31:58 2024
    From Newsgroup: comp.lang.misc

    On 2024-10-11, Muttley@DastartdlyHQ.org <Muttley@DastartdlyHQ.org> wrote:
    Irrelevant. Lot of interpreters do partial compilation and the JVM does it
    on the fly. A proper compiler writes a standalone binary file to disk.

    You might want to check those goalposts again. You can easily make a
    "proper compiler" which just writes a canned interpreter executable to
    disk, appending to it the program source code.
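[Editor's note: Kaz's "canned interpreter plus appended source" scheme can be
sketched in a few lines of Python; the file names and function names below
are invented for illustration:]

```python
# A "proper compiler" whose output executable is a canned interpreter
# with the program source appended as embedded data (Kaz's point).
INTERPRETER_STUB = "exec(SOURCE)\n"   # the canned interpreter part

def compile_to_standalone(source: str, out_path: str) -> None:
    """Emit a self-contained script: embedded source + interpreter stub."""
    with open(out_path, "w") as f:
        f.write(f"SOURCE = {source!r}\n")   # the "object code" is just data
        f.write(INTERPRETER_STUB)

compile_to_standalone('print("hello from the payload")', "standalone.py")
# `python standalone.py` now runs the embedded program.
```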
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bart@bc@freeuk.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sun Oct 13 20:06:12 2024
    From Newsgroup: comp.lang.misc

    On 13/10/2024 17:31, Kaz Kylheku wrote:
    On 2024-10-11, Muttley@DastartdlyHQ.org <Muttley@DastartdlyHQ.org> wrote:
Irrelevant. Lot of interpreters do partial compilation and the JVM does it
on the fly. A proper compiler writes a standalone binary file to disk.

    You might want to check those goalposts again. You can easily make a
    "proper compiler" which just writes a canned interpreter executable to
    disk, appending to it the program source code.


So, an interpreter. The rest is just details of its deployment. In your
example, the program being run is just some embedded data.

Maybe the real question is what is 'hardware', and what is 'software'.
But the answer won't make everyone happy, because hardware can be
emulated in software.

    (Implementing software in hardware, specifically the bit of software
    that interprets a VM, is less common, and generally harder.)

    I prefer that there is a clear distinction between compiler and
    interpreter, because you immediately know what's what. (Here I'm
    excluding complex JIT products that mix up both.)


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sun Oct 13 20:15:45 2024
    From Newsgroup: comp.lang.misc

In article <vegqu5$o3ve$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:
On Sun, 13 Oct 2024 15:30:03 -0000 (UTC)
cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
In article <vegmul$ne3v$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:
So what is standard terminology then?

    I've already explained this to you.

No you haven't. Your explanation seems to be "anything that converts from
one language to another".

What happens inside the CPU is irrelevant. Its a black box as far as the
rest of the machine is concerned. As I said in another post, it could be
pixies with abacuses, doesn't matter.

    So why do you think it's so important that the definition of a

    Who said its important? Its just what most people think of as compilers.

    CPU"? If, as you admit, what the CPU does is highly variable,
    then why do you cling so hard to this meaningless distinction?

You're the one making a big fuss about it with pages of waffle to back up
your claim.

    [lots of waffle snipped]

In other words, you discard anything that doesn't fit with your
preconceptions. Got it.

    No, I just have better things to do on a sunday than read all that. Keep
    it to the point.

So its incomplete and has to revert to software for some opcodes. Great.
FWIW Sun also had a java processor but you still can't run bytecode on
normal hardware without a JVM.

    Cool. So if I run a program targetting a newer version of an
    ISA is run on an older machine, and that machine lacks a newer
    instruction present in the program, and the CPU generates an
    illegal instruction trap at runtime that the OS catches and
    emulates on the program's behalf, the program was not compiled?

    And again, what about an emulator for a CPU running on a
    different CPU? I can boot 7th Edition Unix on a PDP-11
emulator on my workstation; does that mean that the 7th
Edition C compiler wasn't a compiler?

    Its all shades of grey. You seem to be getting very worked up about it.
    As I said, most people consider a compiler as something that translates source
    code to machine code and writes it to a file.

Why, whats the difference? Your definition seems to be any program that
can translate from one language to another.

    If you can't see that yourself, then you're either ignorant or
    obstinant. Take your pick.

    So you can't argue the failure of your logic then. Noted.

Yes, they're entirely analogous.

    https://docs.oracle.com/cd/E11882_01/appdev.112/e10825/pc_02prc.htm

    Nah, not really.

Oh nice counter argument, you really sold your POV there.

Who cares about the current state? Has nothing to do with this discussion.
    In other words, "I don't have an argument, so I'll just lamely
    try to define things until I'm right."

Im just defining things the way most people see it, not some ivory tower
academics. Anyway, lifes too short for the rest.

    [tl;dr]

    that a compiler is pretty much any program which translates from one thing to
    another.

    No. It translates one computer _language_ to another computer
    _language_. In the usual case, that's from a textual source

Machine code isn't a language. Fallen at the first hurdle with that
definition.




In article <vegqu5$o3ve$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:
On Sun, 13 Oct 2024 15:30:03 -0000 (UTC)
cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
In article <vegmul$ne3v$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:
So what is standard terminology then?

    I've already explained this to you.

No you haven't. Your explanation seems to be "anything that converts from
one language to another".

    The context of this specific quote, which you snipped, was your
    insistence on the meaning of the term, "standalone binary."
    There are a number of common terms for what you are describing,
    which is the general term for the executable output artifact
    from a software build, none of which is "standalone binary".

    Common terms are "executable" or "executable file" (that's what
    the ELF standard calls it, for instance), but also "binary",
    "image", etc.

What happens inside the CPU is irrelevant. Its a black box as far as the
rest of the machine is concerned. As I said in another post, it could be
pixies with abacuses, doesn't matter.

    So why do you think it's so important that the definition of a

    Who said its important? Its just what most people think of as compilers.

    Well, you seem to think it's rather important.

    CPU"? If, as you admit, what the CPU does is highly variable,
    then why do you cling so hard to this meaningless distinction?

You're the one making a big fuss about it with pages of waffle to back up
your claim.

    I just don't like misinformation floating around unchallenged.

    You have cited nothing to back up your claims.

So its incomplete and has to revert to software for some opcodes. Great.
FWIW Sun also had a java processor but you still can't run bytecode on
normal hardware without a JVM.

    Cool. So if I run a program targetting a newer version of an
    ISA is run on an older machine, and that machine lacks a newer
    instruction present in the program, and the CPU generates an
    illegal instruction trap at runtime that the OS catches and
    emulates on the program's behalf, the program was not compiled?

    And again, what about an emulator for a CPU running on a
    different CPU? I can boot 7th Edition Unix on a PDP-11
emulator on my workstation; does that mean that the 7th
Edition C compiler wasn't a compiler?

    Its all shades of grey. You seem to be getting very worked up about it.

    Nah, I don't really care, aside from not wanting misinformation
    to stand unchallenged.

    As I said, most people consider a compiler as something that translates source
    code to machine code and writes it to a file.

    Sure, if you're talking informally and you mention "a compiler"
    most people will know more or less what you're talking about.
    But back in <vebffc$3n6jv$1@dont-email.me> you wrote,

    |Does it produce a standalone binary as output? No, so its an
    |intepreter not a compiler.

    I said that was a bad distinction, to which you replied in <vebi0j$3nhvq$1@dont-email.me>:

    |A proper compiler writes a standalone binary file to disk.

    Except that, well, it doesn't. Even the "proper compilers" that
    you claim familiarity with basically don't do that; as I pointed
    out to you, they generate object files and a driver invokes a
    linker.

    For that matter, the compiler itself may not even generate
    object code, but rather, may generate textual assembly and let a
    separate assembler pass turn _that_ into object code.

    So yeah. What you've defined to be a "proper compiler" isn't
    really what you seem to think that it is.

    [snip]
Who cares about the current state? Has nothing to do with this discussion.
    In other words, "I don't have an argument, so I'll just lamely
    try to define things until I'm right."

Im just defining things the way most people see it, not some ivory tower
academics. Anyway, lifes too short for the rest.

The people who create the field are the ones who get to make
the definitions, not you.

Machine code isn't a language. Fallen at the first hurdle with that
definition.

Oh really? Is that why they call it "machine language"? It's even in
the dictionary with "machine code" as a synonym:
https://www.merriam-webster.com/dictionary/machine%20language

    - Dan C.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sun Oct 13 21:25:51 2024
    From Newsgroup: comp.lang.misc

    Christian Weisgerber <naddy@mips.inka.de> writes:
    On 2024-10-12, Rainer Weikusat <rweikusat@talktalk.net> wrote:
    Indeed. As far as I know the term, an interpreter is something which
reads text from a file, parses it and checks it for syntax errors
    and then executes the code as soon as enough of it has been gathered to
    allow for execution of something, ie, a complete statement. This read,
    check and parse, execute cycle is repeated until the program
    terminates.

    I don't really want to participate in this discussion, but what
    you're saying there is that all those 1980s home computer BASIC
    interpreters, which read and tokenized a program before execution,
    were actually compilers.

    If they contained something which compiled all of the source code prior
to execution in order to transform it into some actually executable
    intermediate representation whose execution didn't require future access
    to the source code and thus, also didn't include checking the source
    code for syntactical correctness, this something can be called a
    compiler and the execution engine some sort of virtual machine which
    could principally execute programs compiled from source code in any
    programming language.
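[Editor's note: Rainer's whole-program-translation criterion can be seen
directly with Python's built-in compile(): the entire source is translated
before anything runs, so a syntax error on a later line is reported up front
and no earlier line ever executes. The sample source string is invented for
illustration.]

```python
# Whole-program translation: compile() parses and checks all of the
# source before execution, unlike a statement-at-a-time interpreter.
src = "print('line one')\nthis is not valid python\n"

compiled_ok = False
try:
    code = compile(src, "<prog>", "exec")   # translate the whole program
    compiled_ok = True
except SyntaxError:
    pass   # error caught at compile time; 'line one' never ran
```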

But judging from Wikipedia, Murkysoft Basic stored programs as a linked
list of preprocessed lines and interpreted these, ie, doing string
lookups of keywords from the source code at run time in order to
determine what code to execute. Insofar as I vaguely remember this from
Apple //c BASIC (has been a while), syntax errors would also be found at
runtime, ie, once execution reached the line with the error. This would
make it an interpreter.

In contrast to this, this somewhat amusing small Perl program:

while (<>) {
    while (length) {
        s/^(\w+)// and print(scalar reverse($1));
        s/^(\W+)// and print($1);
    }
}

    [reads lines from stdin and prints them with each word reversed]

gets translated into an op tree whose textual representation
(perl -MO=Concise,-basic) looks like this:

    y <@> leave[1 ref] vKP/REFC ->(end)
    1 <0> enter v ->2
    2 <;> nextstate(main 1 a.pl:1) v:{ ->3
    x <2> leaveloop vKP/2 ->y
    3 <{> enterloop(next->r last->x redo->4) v ->s
    - <1> null vK/1 ->x
    w <|> and(other->4) vK/1 ->x
    v <1> defined sK/1 ->w
    - <1> null sK/2 ->v
    - <1> ex-rv2sv sKRM*/1 ->t
    s <#> gvsv[*_] s ->t
    u <1> readline[t2] sKS/1 ->v
    t <#> gv[*ARGV] s ->u
    - <@> lineseq vKP ->-
    4 <;> nextstate(main 3 a.pl:2) v:{ ->5
    q <2> leaveloop vKP/2 ->r
    5 <{> enterloop(next->m last->q redo->6) v ->n
    - <1> null vK/1 ->q
    p <|> and(other->6) vK/1 ->q
    o <1> length[t4] sK/BOOL,1 ->p
    - <1> ex-rv2sv sK/1 ->o
    n <#> gvsv[*_] s ->o
    - <@> lineseq vKP ->-
    6 <;> nextstate(main 5 a.pl:3) v:{ ->7
    - <1> null vK/1 ->f
    9 <|> and(other->a) vK/1 ->f
    8 </> subst(/"^(\\w+)"/) sK/BOOL ->9
    7 <$> const[PV ""] s ->8
    e <@> print vK ->f
    a <0> pushmark s ->b
    - <1> scalar sK/1 ->e
    d <@> reverse[t6] sK/1 ->e
    b <0> pushmark s ->c
    - <1> ex-rv2sv sK/1 ->d
    c <#> gvsv[*1] s ->d
    f <;> nextstate(main 5 a.pl:4) v:{ ->g
    - <1> null vK/1 ->m
    i <|> and(other->j) vK/1 ->m
    h </> subst(/"^(\\W+)"/) sK/BOOL ->i
    g <$> const[PV ""] s ->h
    l <@> print vK ->m
    j <0> pushmark s ->k
    - <1> ex-rv2sv sK/1 ->l
    k <#> gvsv[*1] s ->l
    m <0> unstack v ->n
    r <0> unstack v ->s

Each line represents a node on this tree, and the names refer to builtin
'ops'. In the actual tree, they're pointers to C functions, and execution
happens as a post-order traversal of this tree, invoking the op functions
from the leaves to the root to produce the arguments necessary for
invoking op functions residing at a higher level in the tree.

    Modules for writing this internal representation to a file and loading
    it back from there and even for translating it into C exist. They're
    just not part of the core distribution anymore.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sun Oct 13 20:29:46 2024
    From Newsgroup: comp.lang.misc

    In article <vegs0o$nh5t$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
    On 13/10/2024 16:52, Dan Cross wrote:
    In article <QnROO.226037$EEm7.111715@fx16.iad>,
    Scott Lurndal <slp53@pacbell.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <vefvo0$k1mm$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:

    Really? So java bytecode will run direct on x86 or ARM will it? Please give
    some links to this astounding discovery you've made.

    Um, ok. https://en.wikipedia.org/wiki/Jazelle

    There was also a company a couple of decades ago that
    built an entire processor designed to execute bytecode
    directly - with a coprocessor to handle I/O.

    IIRC, it was Azul. There were a number of others, including
    Sun.

    None of them panned out - JIT's ended up winning that battle.

    Even ARM no longer includes Jazelle extensions in any of their
    mainstream processors.

    Sure. But the fact that any of these were going concerns is an
    existence proof that one _can_ take bytecodes targetted toward a
    "virtual" machine and execute it on silicon,
    making the
    distinction a lot more fluid than might be naively assumed, in
    turn exposing the silliness of this argument that centers around
    this weirdly overly-rigid definition of what a "compiler" is.

I've implemented numerous compilers and interpreters over the last few
decades (and have dabbled in emulators).

    To me the distinctions are clear enough because I have to work at the
    sharp end!

    I'm not sure why people want to try and be clever by blurring the roles
    of compiler and interpreter; that's not helpful at all.

    I'm not saying the two are the same; what I'm saying is that
    this arbitrary criteria that a compiler must emit a fully
executable binary image is not just inadequate, but also wrong,
    as it renders separate compilation impossible. I am further
    saying that there are many different _types_ of compilers,
    including specialized tools that don't emit machine language.

Sure, people can write emulators for machine code, which are a kind of
interpreter, or they can implement bytecode in hardware; so what?

    That's exactly my point.

    That doesn't really affect what I do. Writing compiler backends for
    actual CPUs is hard work. Generating bytecode is a lot simpler.

    That really depends on the bytecode, doesn't it? The JVM is a
    complex beast; MIPS or the unprivileged integer subset of RISC-V
    are pretty simple in comparison.

(Especially in my case as I've devised myself, another distinction.
Compilers usually target someone else's instruction set.)

If you want one more distinction, it is this: with my compiler, the
resultant binary is executed by a separate agency: the CPU. Or maybe the
    OS loader will run it through an emulator.

    Python has a mode by which it will emit bytecode _files_, which
    can be separately loaded and interpreted; it even has an
    optimizing mode. Is that substantially different?
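[Editor's note: the Python bytecode-file mode Dan refers to can be exercised
from the standard library; the module name and contents below are invented
for illustration:]

```python
# CPython can emit a bytecode file (.pyc) that is later loaded and
# interpreted without consulting the source again.
import importlib.util
import pathlib
import py_compile
import tempfile

src = pathlib.Path(tempfile.mkdtemp()) / "mod.py"
src.write_text("def double(x):\n    return x * 2\n")

# Compile to an optimized bytecode file; nothing is executed here.
pyc = py_compile.compile(str(src), optimize=2)

# Load and run the bytecode file directly.
spec = importlib.util.spec_from_file_location("mod", pyc)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
```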

    With my interpreter, then *I* have to write the dispatch routines and
    write code to implement all the instructions.

    Again, I don't think that anyone disputes that interpreters
    exist. But insisting that they must take a particular shape is
    just wrong.

    (My compilers generate an intermediate language, a kind of VM, which is
    then processed further into native code.

Then by the definition of this pseudonymous guy I've been
    responding to, your compiler is not a "proper compiler", no?

But I have also tried interpreting that VM; it just runs 20 times slower
than native code. That's what interpreting usually means: slow programs.)

    Not necessarily. The JVM does pretty good, quite honestly.

    - Dan C.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sun Oct 13 20:30:08 2024
    From Newsgroup: comp.lang.misc

    In article <20241013093004.251@kylheku.com>,
    Kaz Kylheku <643-408-1753@kylheku.com> wrote:
    On 2024-10-11, Muttley@DastartdlyHQ.org <Muttley@DastartdlyHQ.org> wrote:
Irrelevant. Lot of interpreters do partial compilation and the JVM does it
on the fly. A proper compiler writes a standalone binary file to disk.

    You might want to check those goalposts again. You can easily make a
    "proper compiler" which just writes a canned interpreter executable to
    disk, appending to it the program source code.

    Indeed; this is what the Moscow ML compiler does.

    - Dan C.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sun Oct 13 20:33:10 2024
    From Newsgroup: comp.lang.misc

    On Sun, 13 Oct 2024 08:22:53 -0000 (UTC), Muttley boring babbled:

    On Sat, 12 Oct 2024 21:25:17 -0000 (UTC)
    Lawrence D'Oliveiro <ldo@nz.invalid> wrote:

    On Sat, 12 Oct 2024 08:42:17 -0000 (UTC), Muttley boring babbled:

    Code generated by a compiler does not require an interpreter.

Something has to implement the rules of the “machine language”. This is
why we use the term “abstract machine”, to avoid having to distinguish
between “hardware” and “software”.

    Think: modern CPUs typically have “microcode” and “firmware” associated
    with them. Are those “hardware” or “software”?

    Who cares what happens inside the CPU hardware?

    Because that’s where your “software” runs.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sun Oct 13 21:33:56 2024
    From Newsgroup: comp.lang.misc

    Muttley@DastartdlyHQ.org writes:
    On Sat, 12 Oct 2024 16:39:20 +0000
    Eric Pozharski <apple.universe@posteo.net> boring babbled:
    with <87wmighu4i.fsf@doppelsaurus.mobileactivedefense.com> Rainer
    Weikusat wrote:
    Muttley@DastartdlyHQ.org writes:
    On Wed, 09 Oct 2024 22:25:05 +0100 Rainer Weikusat
    <rweikusat@talktalk.net> boring babbled:
    Bozo User <anthk@disroot.org> writes:
    On 2024-04-07, Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
    On Sun, 07 Apr 2024 00:01:43 +0000, Javier wrote:

    *CUT* [ 19 lines 6 levels deep]

    Its syntax is also a horrific mess.
    Which means precisely what?

    You're arguing with Unix Haters Handbook. You've already lost.

    ITYF the people who dislike Perl are the ones who actually like the unix
way of having simple daisychained tools instead of some lump of a
language that does everything messily.

    Perl is a general-purpose programming language, just like C or Java (or
Python or Javascript or Rust or $whatnot). This means it can be used to
implement anything (with some practical limitation for anything) and not
    that it "does everything".


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sun Oct 13 20:34:47 2024
    From Newsgroup: comp.lang.misc

    On Sun, 13 Oct 2024 08:19:16 -0000 (UTC), Muttley wrote:

    ITYF the people who dislike Perl are the ones who actually like the unix
    way of having simple daisychained tools instead of some lump of a
    language that does everything messily.

Not sure how those small tools can work without the support of much
bigger lumps like the shell, the compiler/interpreter for those tools
and the kernel itself.
    kernel itself.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sun Oct 13 21:08:09 2024
    From Newsgroup: comp.lang.misc

    On Sun, 13 Oct 2024 14:54:13 -0000 (UTC), Muttley wrote:

    What happens inside the CPU is irrelevant.

    But that’s where your “software” runs.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sun Oct 13 21:09:13 2024
    From Newsgroup: comp.lang.misc

    On Sun, 13 Oct 2024 16:02:13 -0000 (UTC), Muttley wrote:

    You explanation seems to be "anything that converts from one
    language to another".

You would call that a “translator”. That term was used more in the early
days, but that’s essentially synonymous with “compiler”.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.unix.programmer,comp.lang.misc on Sun Oct 13 21:10:06 2024
    From Newsgroup: comp.lang.misc

    On Sun, 13 Oct 2024 18:28:32 +0200, Janis Papanagnou wrote:

    You know there's formal definitions for what constitutes languages.

Not really. For example, some have preferred the term “notation” instead
of “language”.

    Regardless of what you call it, machine code still qualifies.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.programmer,comp.lang.misc on Mon Oct 14 01:16:11 2024
    From Newsgroup: comp.lang.misc

    On 13.10.2024 23:10, Lawrence D'Oliveiro wrote:
    On Sun, 13 Oct 2024 18:28:32 +0200, Janis Papanagnou wrote:

    You know there's formal definitions for what constitutes languages.

Not really. For example, some have preferred the term “notation” instead
of “language”.

    A "notation" is not the same as a [formal (or informal)] "language".

    (Frankly, I don't know where you're coming from; mind to explain your
    point if you think it's relevant. - But since you wrote "_some_ have
    preferred" it might anyway have been only an opinion or a historic
    inaccuracy so it's probably not worth expanding on that?)

    I think we should be clear about terminology.

    I was speaking about [formal] languages as introduced by Chomsky and
    used (and extended) by scientists (specifically computer scientists)
    since then. And these formal characteristics of languages and grammars
    are also the base of the books that have been mentioned and recently
    quoted in this sub-thread.

    Regardless of what you call it, machine code still qualifies.

    Glad you agree.

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bart@bc@freeuk.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Oct 14 01:20:45 2024
    From Newsgroup: comp.lang.misc

    On 13/10/2024 21:29, Dan Cross wrote:
    In article <vegs0o$nh5t$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
    On 13/10/2024 16:52, Dan Cross wrote:
    In article <QnROO.226037$EEm7.111715@fx16.iad>,
    Scott Lurndal <slp53@pacbell.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <vefvo0$k1mm$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:

    Really? So java bytecode will run direct on x86 or ARM will it? Please give
    some links to this astounding discovery you've made.

    Um, ok. https://en.wikipedia.org/wiki/Jazelle

    There was also a company a couple of decades ago that
    built an entire processor designed to execute bytecode
    directly - with a coprocessor to handle I/O.

    IIRC, it was Azul. There were a number of others, including
    Sun.

    None of them panned out - JIT's ended up winning that battle.

    Even ARM no longer includes Jazelle extensions in any of their
    mainstream processors.

    Sure. But the fact that any of these were going concerns is an
    existence proof that one _can_ take bytecodes targetted toward a
    "virtual" machine and execute it on silicon,
    making the
    distinction a lot more fluid than might be naively assumed, in
    turn exposing the silliness of this argument that centers around
    this weirdly overly-rigid definition of what a "compiler" is.

    I've implemented numerous compilers and interpreters over the last few
    decades (and have dabbled in emulators).

    To me the distinctions are clear enough because I have to work at the
    sharp end!

    I'm not sure why people want to try and be clever by blurring the roles
    of compiler and interpreter; that's not helpful at all.

    I'm not saying the two are the same; what I'm saying is that
    this arbitrary criteria that a compiler must emit a fully
executable binary image is not just inadequate, but also wrong,
    as it renders separate compilation impossible. I am further
    saying that there are many different _types_ of compilers,
    including specialized tools that don't emit machine language.

    Sure, people can write emulators for machine code, which are a kind of
    interpreter, or they can implement bytecode in hardware; so what?

    That's exactly my point.

    So, then what, we do away with the concepts of 'compiler' and
    'interpreter'? Or allow them to be used interchangeably?

Somehow I don't think it is useful to think of gcc as an interpreter for
C, or CPython as a native code compiler for Python.

    That doesn't really affect what I do. Writing compiler backends for
    actual CPUs is hard work. Generating bytecode is a lot simpler.

    That really depends on the bytecode, doesn't it? The JVM is a
    complex beast;

    Is it? It's not to my taste, but it didn't look too scary to me. Whereas modern CPU instruction sets are horrendous. (I normally target x64,
    which is described in 6 large volumes. RISC ones don't look much better,
    eg. RISC V with its dozens of extensions and special types)

    Example of JVM:

    aload index Push a reference from local variable #index

    MIPS or the unprivileged integer subset of RISC-V
    are pretty simple in comparison.

(Especially in my case, since the instruction set is one I've devised
myself; that's another distinction. Compilers usually target someone
else's instruction set.)

    If you want one more distinction, it is this: with my compiler, the
    resultant binary is executed by a separate agency: the CPU. Or maybe the
    OS loader will run it through an emulator.

    Python has a mode by which it will emit bytecode _files_, which
    can be separately loaded and interpreted; it even has an
    optimizing mode. Is that substantially different?
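(A minimal sketch of that mode using only CPython's standard library; the module name `m` and the `square` function are purely illustrative. `py_compile` writes the discrete bytecode file, and `importlib` loads it back with no source present at all.)

```python
import importlib.util
import pathlib
import py_compile
import tempfile
from importlib.machinery import SourcelessFileLoader

with tempfile.TemporaryDirectory() as d:
    # A tiny source module...
    src = pathlib.Path(d) / "m.py"
    src.write_text("def square(x):\n    return x * x\n")

    # ...byte-compiled to a discrete .pyc file, as `python -m py_compile` does.
    pyc = py_compile.compile(str(src), doraise=True)

    # The .pyc can now be loaded and executed without the source.
    loader = SourcelessFileLoader("m", pyc)
    spec = importlib.util.spec_from_loader("m", loader)
    mod = importlib.util.module_from_spec(spec)
    loader.exec_module(mod)
    result = mod.square(12)
```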

Whether there is a discrete bytecode file is beside the point. (I
    generated such files for many years.)

You still need software to execute it, especially dynamically typed
bytecode, which doesn't lend itself easily to either hardware
implementation or load-time native-code translation.


    With my interpreter, then *I* have to write the dispatch routines and
    write code to implement all the instructions.
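(That dispatch-routine shape is easy to show in miniature. A sketch, not Bart's actual code: the classic fetch-decode-execute loop for a made-up two-instruction stack machine.)

```python
# Made-up opcodes for a toy stack machine.
PUSH, ADD = 0, 1

def run(program):
    """Dispatch loop: fetch each opcode and branch to the code implementing it."""
    stack = []
    pc = 0
    while pc < len(program):
        op = program[pc]
        pc += 1
        if op == PUSH:      # PUSH <n>: operand follows inline
            stack.append(program[pc])
            pc += 1
        elif op == ADD:     # ADD: pop two values, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        else:
            raise ValueError("unknown opcode %d" % op)
    return stack.pop()
```

For example, `run([PUSH, 2, PUSH, 3, ADD])` evaluates to 5; every instruction costs a fetch and a branch, which is where the interpretive overhead comes from.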

    Again, I don't think that anyone disputes that interpreters
    exist. But insisting that they must take a particular shape is
    just wrong.

What shape would that be? Generally they will need some /software/ to
execute the instructions of the program being interpreted, as I said.
    Some JIT products may choose to do on-demand translation to native code.

    Is there anything else? I'd be interested in anything new!

    (My compilers generate an intermediate language, a kind of VM, which is
    then processed further into native code.

Then by the definition of this pseudonymous guy I've been
    responding to, your compiler is not a "proper compiler", no?

    Actually mine is more of a compiler than many, since it directly
    generates native machine code. Others generally stop at ASM code (eg.
    gcc) or OBJ code, and will invoke separate programs to finish the job.

    The intermediate language here is just a step in the process.

    But I have also tried interpreting that VM; it just runs 20 times slower
    than native code. That's what interpreting usually means: slow programs.)

    Not necessarily. The JVM does pretty good, quite honestly.

    But is it actually interpreting? Because if I generated such code for a statically typed language, then I would first translate to native code,
    of any quality, since it's going to be faster than interpreting.


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Oct 14 00:58:11 2024
    From Newsgroup: comp.lang.misc

    In article <veho4s$sghb$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
    On 13/10/2024 21:29, Dan Cross wrote:
    In article <vegs0o$nh5t$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
    On 13/10/2024 16:52, Dan Cross wrote:
    [snip]
Sure. But the fact that any of these were going concerns is an
existence proof that one _can_ take bytecodes targeted toward a
"virtual" machine and execute them on silicon,
    making the
    distinction a lot more fluid than might be naively assumed, in
    turn exposing the silliness of this argument that centers around
    this weirdly overly-rigid definition of what a "compiler" is.

    I've implemented numerous compilers and interpreters over the last few
    decades (and have dabbled in emulators).

    To me the distinctions are clear enough because I have to work at the
    sharp end!

    I'm not sure why people want to try and be clever by blurring the roles
    of compiler and interpreter; that's not helpful at all.

    I'm not saying the two are the same; what I'm saying is that
this arbitrary criterion that a compiler must emit a fully
executable binary image is not just inadequate, but also wrong,
    as it renders separate compilation impossible. I am further
    saying that there are many different _types_ of compilers,
    including specialized tools that don't emit machine language.

    Sure, people can write emulators for machine code, which are a kind of
    interpreter, or they can implement bytecode in hardware; so what?

    That's exactly my point.

    So, then what, we do away with the concepts of 'compiler' and
    'interpreter'? Or allow them to be used interchangeably?

    I don't see how you can credibly draw that conclusion from what
    I've been saying.

    But it's really pretty straight-forward; a compiler effects a
    translation from one computer language to another (the
    definition from Aho et al). An interpreter takes a program
    written in some computer language and executes it. Of course
    there's some gray area here; is a load-and-go compiler a
    compiler in this sense (yes; it is still translating between
    its source language and a machine language) or an interpreter?
    (Possibly; after all, it's taking a source language and causing
    a program written in it to be executed.)
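(A toy example may make the Aho et al. definition concrete; the "bytecode" here is invented for illustration. The point is that a compiler translates: it produces a program in a target language, but never runs it.)

```python
def compile_sum(src):
    """Translate the tiny source language "n(+n)*", e.g. "2+3+4",
    into instructions for an invented stack machine.
    Nothing is executed: the output is a program in another language."""
    out = []
    for i, tok in enumerate(src.split("+")):
        out.append(("PUSH", int(tok)))
        if i > 0:
            out.append(("ADD",))
    return out
```

A load-and-go compiler would simply feed that output to an execution step immediately, which is exactly the gray area described above.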

    Java is an interesting case in point here; the Java compiler is
    obviously a compiler; the JVM is an interpreter. I don't think
anyone would dispute this. But by suggesting some hard and fast
division that can be rigidly upheld in all cases, we ignore so
much nuance as to be reductive; by pointing these things out, we
see how inane it is to assert that a "proper compiler" is only
one that takes a textual source input and emits machine code for
a silicon target.

Somehow I don't think it is useful to think of gcc as an interpreter for
C, or CPython as a native code compiler for Python.

    I don't think anyone suggested that. But we _do_ have examples
    of true compilers emitting "code" for interpreters; cf LLVM and
    eBPF, which I mentioned previously in this thread, or compilers
    that emit code for hypothetical machines like MMIX, or compilers
    that emit instructions that aren't implemented everywhere, or
    more precisely are implemented by trap and emulation.

    That doesn't really affect what I do. Writing compiler backends for
    actual CPUs is hard work. Generating bytecode is a lot simpler.

    That really depends on the bytecode, doesn't it? The JVM is a
    complex beast;

Is it? It's not to my taste, but it didn't look too scary to me. Whereas
modern CPU instruction sets are horrendous. (I normally target x64,
    which is described in 6 large volumes. RISC ones don't look much better,
    eg. RISC V with its dozens of extensions and special types)

    I dunno. Wirth wrote an Oberon compiler targeting MIPS in ~5000
    lines of code. It was pretty straight-forward.

    And most of those ten volumes in the SDM have to do with the
    privileged instruction set and details of the memory model like
    segmentation and paging, most of which don't impact the compiler
    author much at all: beyond, perhaps providing an intrinsic for
    the `rdmsr` and `wrmsr` instructions, I don't think you care
    much about MSRs, let alone VMX or the esoterica of under what
    locked cycles the hardware sets the "A" bit on page table
    entries on a TLB miss.

    Example of JVM:

    aload index Push a reference from local variable #index

    Ok. `leaq index(%rip), %rax; pushq %rax` isn't that hard either.

    MIPS or the unprivileged integer subset of RISC-V
    are pretty simple in comparison.

(Especially in my case, since the instruction set is one I've devised
myself; that's another distinction. Compilers usually target someone
else's instruction set.)

    If you want one more distinction, it is this: with my compiler, the
resultant binary is executed by a separate agency: the CPU. Or maybe the
OS loader will run it through an emulator.

    Python has a mode by which it will emit bytecode _files_, which
    can be separately loaded and interpreted; it even has an
    optimizing mode. Is that substantially different?

Whether there is a discrete bytecode file is beside the point. (I
    generated such files for many years.)

You still need software to execute it, especially dynamically typed
bytecode, which doesn't lend itself easily to either hardware
implementation or load-time native-code translation.

    Sure. But if execution requires a "separate agency", and you
    acknowledge that could be a CPU or a separate program, how is
    that all that different than what Python _does_? That doesn't
    imply that the Python interpreter is the same as a CPU, or that
    an interpreter is the same as a compiler. But it does imply
    that the definitions being thrown about here aren't particularly
    good.

    With my interpreter, then *I* have to write the dispatch routines and
    write code to implement all the instructions.

    Again, I don't think that anyone disputes that interpreters
    exist. But insisting that they must take a particular shape is
    just wrong.

What shape would that be? Generally they will need some /software/ to
execute the instructions of the program being interpreted, as I said.
    Some JIT products may choose to do on-demand translation to native code.

    Is there anything else? I'd be interested in anything new!

    I actually meant to write that "insisting that _compilers_ take
    a specific shape is just wrong." But I think the point holds
    reasonably well for interpreters, as well: they need not
    directly interpret the text of a program; they may well create
    some sort of internal bytecode after several optimization and
    type checking steps, looking more like a load-and-go compiler
    than, say, the 6th Edition Unix shell.
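(CPython itself is a handy illustration of that shape: its `compile()` builtin turns source text into an internal bytecode object, and only that object is ever interpreted; the text is never executed directly.)

```python
# compile() produces a code object (internal bytecode), not a result.
code = compile("x = 21 * 2", "<demo>", "exec")
assert isinstance(code.co_code, bytes)   # the raw bytecode

# Execution is a separate step, performed on the code object.
ns = {}
exec(code, ns)
```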

    Comparison to Roslyn-style compilers blurs the distinction
    further still.

    (My compilers generate an intermediate language, a kind of VM, which is
    then processed further into native code.

Then by the definition of this pseudonymous guy I've been
    responding to, your compiler is not a "proper compiler", no?

    Actually mine is more of a compiler than many, since it directly
    generates native machine code. Others generally stop at ASM code (eg.
    gcc) or OBJ code, and will invoke separate programs to finish the job.

    The intermediate language here is just a step in the process.

But I have also tried interpreting that VM; it just runs 20 times slower
than native code. That's what interpreting usually means: slow programs.)
    Not necessarily. The JVM does pretty good, quite honestly.

But is it actually interpreting? Because if I generated such code for a
statically typed language, then I would first translate to native code,
    of any quality, since it's going to be faster than interpreting.

Doesn't that reinforce my thesis that these things are much
blurrier than all this uninformed talk of a mythical "proper
compiler" would lead one to believe?

    - Dan C.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.unix.programmer,comp.lang.misc on Mon Oct 14 01:45:49 2024
    From Newsgroup: comp.lang.misc

    On Mon, 14 Oct 2024 01:16:11 +0200, Janis Papanagnou wrote:

    On 13.10.2024 23:10, Lawrence D'Oliveiro wrote:

    On Sun, 13 Oct 2024 18:28:32 +0200, Janis Papanagnou wrote:

    You know there's formal definitions for what constitutes languages.

    Not really. For example, some have preferred the term “notation”
    instead of “language”.

    A "notation" is not the same as a [formal (or informal)] "language".

    (Frankly, I don't know where you're coming from ...

    <https://en.wikipedia.org/wiki/Programming_language>:

    A programming language is a system of notation for writing computer
    programs.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.programmer,comp.lang.misc on Mon Oct 14 08:23:20 2024
    From Newsgroup: comp.lang.misc

    On Sun, 13 Oct 2024 18:28:32 +0200
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
    [ X-post list reduced ]

    On 13.10.2024 18:02, Muttley@DastartdlyHQ.org wrote:
    On Sun, 13 Oct 2024 15:30:03 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
    [...]

    No. It translates one computer _language_ to another computer
    _language_. In the usual case, that's from a textual source

    Machine code isn't a language. Fallen at the first hurdle with that
    definition.

    Careful (myself included); watch out for the glazed frost!

    You know there's formal definitions for what constitutes languages.

At first glance I don't see why machine code wouldn't qualify as a
    language (either as some specific "mnemonic" representation, or as
    a sequence of integral numbers or other "code" representations).
    What's the problem, in your opinion, with considering machine code
    as a language?

    A programming language is an abstraction of machine instructions that is readable by people.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Oct 14 08:25:37 2024
    From Newsgroup: comp.lang.misc

    On Sun, 13 Oct 2024 20:15:45 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
    In article <vegqu5$o3ve$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:

    [tl;dr]

    The people who create the field are the ones who get to make
the definitions, not you.

    ITYF people in the field as a whole make the definitions.

Machine code isn't a language. Fallen at the first hurdle with that
definition.

    Oh really? Is that why they call it "machine language"? It's
even in the dictionary with "machine code" as a synonym:
https://www.merriam-webster.com/dictionary/machine%20language

It's not a programming language.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Oct 14 08:28:57 2024
    From Newsgroup: comp.lang.misc

    On Sun, 13 Oct 2024 21:33:56 +0100
Rainer Weikusat <rweikusat@talktalk.net> boring babbled:
Muttley@DastartdlyHQ.org writes:
    ITYF the people who dislike Perl are the ones who actually like the unix
way of having simple daisychained tools instead of some lump of a language
that does everything messily.

Perl is a general-purpose programming language, just like C or Java (or
Python or Javascript or Rust or $whatnot). This means it can be used to
implement anything (with some practical limitation for anything) and not
that it "does everything".

It can be, but generally isn't. Its niche tends to be text processing of
some sort, and for that there are better tools IMO. It used to be big in
web backends, but those days are long gone.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Oct 14 11:38:29 2024
    From Newsgroup: comp.lang.misc

    Muttley@DastartdlyHQ.org writes:
    On Sun, 13 Oct 2024 21:33:56 +0100
    Rainer Weikusat <rweikusat@talktalk.net> boring babbled:
    Muttley@DastartdlyHQ.org writes:
ITYF the people who dislike Perl are the ones who actually like the unix
way of having simple daisychained tools instead of some lump of a language
that does everything messily.

Perl is a general-purpose programming language, just like C or Java (or
Python or Javascript or Rust or $whatnot). This means it can be used to
implement anything (with some practical limitation for anything) and not
that it "does everything".

It can be, but generally isn't. Its niche tends to be text processing of
some sort

    It is. That sysadmin-types using it don't use it to create actual
    programs is of no concern for this, because they never do that and this
    use only needs a very small subset of the features of the language. I've
    been using it as system programming language for programs with up to
    21,000 LOC in the main program (and some more thousands in auxiliary
    modules) and it's very well-suited to that.

    The simple but flexible OO system, reliable automatic memory management
    and support for functions/ subroutine as first-class objects make it
    very nice for implementing event-driven, asynchronous "crossbar"
programs connecting various external entities both running locally and
    on other computers on the internet to create complex applications from
    them.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Oct 14 11:05:06 2024
    From Newsgroup: comp.lang.misc

    On Mon, 14 Oct 2024 11:38:29 +0100
    Rainer Weikusat <rweikusat@talktalk.net> boring babbled:
    The simple but flexible OO system, reliable automatic memory management

For a certain definition of OO. The requirement to use $self->
everywhere to denote object method/variable access makes it little
better than doing OO in C. Then there's the whole two-stage object
creation with the "bless" nonsense. Hacky.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.programmer,comp.lang.misc on Mon Oct 14 14:13:11 2024
    From Newsgroup: comp.lang.misc

    On 14.10.2024 03:45, Lawrence D'Oliveiro wrote:
    On Mon, 14 Oct 2024 01:16:11 +0200, Janis Papanagnou wrote:
    On 13.10.2024 23:10, Lawrence D'Oliveiro wrote:
    On Sun, 13 Oct 2024 18:28:32 +0200, Janis Papanagnou wrote:

    You know there's formal definitions for what constitutes languages.

    Not really. For example, some have preferred the term “notation”
    instead of “language”.

    A "notation" is not the same as a [formal (or informal)] "language".

    (Frankly, I don't know where you're coming from ...

    <https://en.wikipedia.org/wiki/Programming_language>:

    A programming language is a system of notation for writing computer
    programs.

    Okay, a "system of notation" (not a "notation") is used here to
    _describe_ it. I'm fine with that formulation. Thanks.

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.programmer,comp.lang.misc on Mon Oct 14 14:36:38 2024
    From Newsgroup: comp.lang.misc

    On 14.10.2024 10:23, Muttley@DastartdlyHQ.org wrote:
    On Sun, 13 Oct 2024 18:28:32 +0200
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
    [ X-post list reduced ]

    On 13.10.2024 18:02, Muttley@DastartdlyHQ.org wrote:
    On Sun, 13 Oct 2024 15:30:03 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
    [...]

    No. It translates one computer _language_ to another computer
    _language_. In the usual case, that's from a textual source

    Machine code isn't a language. Fallen at the first hurdle with that
    definition.

    Careful (myself included); watch out for the glazed frost!

    You know there's formal definitions for what constitutes languages.

At first glance I don't see why machine code wouldn't qualify as a
    language (either as some specific "mnemonic" representation, or as
    a sequence of integral numbers or other "code" representations).
    What's the problem, in your opinion, with considering machine code
    as a language?

    A programming language is an abstraction of machine instructions that is readable by people.

    Yes, you can explain "programming language" that way.

    The topic that was cited (Aho, et al.) upthread (and what I spoke
    about) was more generally about [formal] "language", the base also
    of programming languages.

    (In early days of computers they programmed in binary, but that is
    just a side note and unnecessary to support the definition of the
    upthread cited text.)

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Oct 14 13:38:04 2024
    From Newsgroup: comp.lang.misc

In article <veiki1$14g6h$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:
On Sun, 13 Oct 2024 20:15:45 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
    Oh really? Is that why they call it "machine language"? It's
even in the dictionary with "machine code" as a synonym:
https://www.merriam-webster.com/dictionary/machine%20language

It's not a programming language.

    That's news to those people who have, and sometimes still do,
    write programs in it.

    But that's not important. If we go back and look at what I
    wrote that you were responding to, it was this statement, about
    what a compiler does, and your claim that I was asserting it
    was translating anything to anything, which I was not:

    |No. It translates one computer _language_ to another computer
    |_language_. In the usual case, that's from a textual source

    Note that I said, "computer language", not "programming
    language". Being a human-readable language is not a requirement
    for a computer language.

    Your claim that "machine language" is not a "language" is simply
    not true. Your claim that a "proper" compiler must take the
    shape you are pushing is also not true.

    - Dan C.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Oct 14 14:47:58 2024
    From Newsgroup: comp.lang.misc

    On Mon, 14 Oct 2024 13:38:04 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
In article <veiki1$14g6h$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:
On Sun, 13 Oct 2024 20:15:45 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
    Oh really? Is that why they call it "machine language"? It's
even in the dictionary with "machine code" as a synonym:
https://www.merriam-webster.com/dictionary/machine%20language

It's not a programming language.

    That's news to those people who have, and sometimes still do,
    write programs in it.

Really? So if it's a language you'll be able to understand this then:

0011101011010101010001110101010010110110001110010100101001010100
0101001010010010100101010111001010100110100111010101010101010101
0001110100011101010001001010110011100010101001110010100101100010

    But that's not important. If we go back and look at what I

    Oh right.


    |No. It translates one computer _language_ to another computer
    |_language_. In the usual case, that's from a textual source

    Note that I said, "computer language", not "programming
    language". Being a human-readable language is not a requirement
    for a computer language.

    Oh watch those goalpost moves with pedant set to 11. Presumably you
    think the values of the address lines is a language too.

    Your claim that "machine language" is not a "language" is simply
    not true. Your claim that a "proper" compiler must take the
    shape you are pushing is also not true.

    If you say so.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Oct 14 14:53:49 2024
    From Newsgroup: comp.lang.misc

    Muttley@DastartdlyHQ.org writes:
    On Mon, 14 Oct 2024 13:38:04 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
In article <veiki1$14g6h$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:
On Sun, 13 Oct 2024 20:15:45 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
    Oh really? Is that why they call it "machine language"? It's
even in the dictionary with "machine code" as a synonym:
https://www.merriam-webster.com/dictionary/machine%20language

It's not a programming language.

    That's news to those people who have, and sometimes still do,
    write programs in it.

Really? So if it's a language you'll be able to understand this then:

0011101011010101010001110101010010110110001110010100101001010100
0101001010010010100101010111001010100110100111010101010101010101
0001110100011101010001001010110011100010101001110010100101100010

    I certainly understand this, even four decades later

    94A605440C00010200010400000110

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.unix.programmer,comp.lang.misc on Mon Oct 14 14:58:13 2024
    From Newsgroup: comp.lang.misc

    Muttley@DastartdlyHQ.org writes:
    On Sun, 13 Oct 2024 18:28:32 +0200
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
    [ X-post list reduced ]


A programming language is an abstraction of machine instructions that is
readable by people.

    By that definition, PAL-D is a programming language.

    Any assembler is a programming language, by that definition.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.programmer,comp.lang.misc on Mon Oct 14 14:59:22 2024
    From Newsgroup: comp.lang.misc

    On Mon, 14 Oct 2024 14:58:13 GMT
    scott@slp53.sl.home (Scott Lurndal) boring babbled:
    Muttley@DastartdlyHQ.org writes:
    On Sun, 13 Oct 2024 18:28:32 +0200
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
    [ X-post list reduced ]


A programming language is an abstraction of machine instructions that is
readable by people.

    By that definition, PAL-D is a programming language.

    Any assembler is a programming language, by that definition.

    Where did I say it wasn't? Of course assembler is a programming language.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Oct 14 16:04:18 2024
    From Newsgroup: comp.lang.misc

    Muttley@DastartdlyHQ.org writes:
    On Mon, 14 Oct 2024 11:38:29 +0100
    Rainer Weikusat <rweikusat@talktalk.net> boring babbled:
    The simple but flexible OO system, reliable automatic memory management

    [...]

Then there's the whole two-stage object creation with the "bless"
nonsense. Hacky.

I was planning to write a longer reply but killed it. You're obviously
arguing about something you reject for political reasons despite not
really being familiar with it, and you even 'argue' like a politician.
That is, you stick pejorative labels on stuff you don't like to emphasize
how really disagreeable you believe it to be. IMHO, such a method of
(pseudo-)discussing anything is completely pointless.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From ram@ram@zedat.fu-berlin.de (Stefan Ram) to comp.lang.misc on Mon Oct 14 15:19:03 2024
    From Newsgroup: comp.lang.misc

    cross@spitfire.i.gajendra.net (Dan Cross) wrote or quoted:
    Your claim that "machine language" is not a "language" is simply
    not true.

    Machine language is a language.

    (It might not be a /formal/ language when the specification
    is not definite. For example, when one says, "6502", are the
    "undocumented" opcodes a part of this language or not? So, for
    a formal language, you have to make sure that it's definite.)

    Not related to unix. So, not,

    Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc

    , but,

    Newsgroups: comp.lang.misc

    .


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Oct 14 17:27:18 2024
    From Newsgroup: comp.lang.misc

    On 14/10/2024 16:53, Scott Lurndal wrote:
    Muttley@DastartdlyHQ.org writes:
    On Mon, 14 Oct 2024 13:38:04 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
    In article <veiki1$14g6h$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:
    On Sun, 13 Oct 2024 20:15:45 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
    Oh really? Is that why they call it "machine language"? It's
    even in the dictionary with "machine code" as a synonymn:
    https://www.merriam-webster.com/dictionary/machine%20language

It's not a programming language.

    That's news to those people who have, and sometimes still do,
    write programs in it.

Really? So if it's a language you'll be able to understand this then:

    0011101011010101010001110101010010110110001110010100101001010100
    0101001010010010100101010111001010100110100111010101010101010101
    0001110100011101010001001010110011100010101001110010100101100010

    I certainly understand this, even four decades later

    94A605440C00010200010400000110


In my early days of assembly programming on my ZX Spectrum, I would
hand-assemble to machine code, and I knew at least a few of the codes by
heart. (01 is "ld bc, #xxxx", 18 is "jr", c9 is "ret", etc.) So while
I rarely wrote machine code directly, it is certainly still a
programming language - it's a language you can write programs in.
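(The lookup-table nature of that hand-assembly is easy to sketch in the opposite direction. Illustrative only: the three opcodes are the ones mentioned above, anything else is treated as raw data, and a real Z80 disassembler would also have to consume operand bytes.)

```python
# A few Z80 opcodes from the post, mapped back to their mnemonics.
Z80_OPS = {0x01: "ld bc, #xxxx", 0x18: "jr", 0xC9: "ret"}

def mnemonic(opcode):
    """Return the mnemonic for a single opcode byte, or mark it as data."""
    return Z80_OPS.get(opcode, "db 0x%02x" % opcode)
```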

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Oct 14 15:39:19 2024
    From Newsgroup: comp.lang.misc

    On Mon, 14 Oct 2024 16:04:18 +0100
Rainer Weikusat <rweikusat@talktalk.net> boring babbled:
Muttley@DastartdlyHQ.org writes:
    On Mon, 14 Oct 2024 11:38:29 +0100
    Rainer Weikusat <rweikusat@talktalk.net> boring babbled:
    The simple but flexible OO system, reliable automatic memory management

    [...]

    Then there's the whole 2 stage object creation with the "bless"
    nonsense. Hacky.

I was planning to write a longer reply but killed it. You're obviously
arguing about something you reject for political reasons despite not
really being familiar with it, and you even 'argue' like a politician.
That is, you stick pejorative labels on stuff you don't like to emphasize
how really disagreeable you believe it to be. IMHO, such a method of
(pseudo-)discussing anything is completely pointless.

    Umm, whatever. I was just saying why I didn't like Perl but if you want to
read some grand motive into it, knock yourself out.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.programmer,comp.lang.misc on Mon Oct 14 17:43:59 2024
    From Newsgroup: comp.lang.misc

    [ X-post list reduced ]

    On 14.10.2024 16:47, Muttley@DastartdlyHQ.org wrote:
    On Mon, 14 Oct 2024 13:38:04 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
    In article <veiki1$14g6h$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:
    On Sun, 13 Oct 2024 20:15:45 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
    Oh really? Is that why they call it "machine language"? It's
even in the dictionary with "machine code" as a synonym:
    https://www.merriam-webster.com/dictionary/machine%20language

It's not a programming language.

    That's news to those people who have, and sometimes still do,
    write programs in it.

Really? So if it's a language you'll be able to understand this then:

0011101011010101010001110101010010110110001110010100101001010100
0101001010010010100101010111001010100110100111010101010101010101
0001110100011101010001001010110011100010101001110010100101100010

    To me it's substantially no different from, e.g., Chinese text.

    You need context information to understand it. But understanding a
    language is not a condition for defining and handling a language.
    If there's context information then people can associate semantic
    meaning with it (and understand it).

    To illustrate (just playing)...

    if then else then if or if and else end if

    Are you able to understand that? On what abstraction level do you
    understand it? Does it make [semantical] sense to you?
    (Note: Using the proper translator and interpreter this is quite
    dangerous code. For the puzzler: it's a coded shell fork-bomb.)

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.programmer,comp.lang.misc on Mon Oct 14 17:55:14 2024
    From Newsgroup: comp.lang.misc

    [ X-post list reduced ]

    On 14.10.2024 17:27, David Brown wrote:
    On 14/10/2024 16:53, Scott Lurndal wrote:
    Muttley@DastartdlyHQ.org writes:

    Really? So if its a language you'll be able to understand this then:

    0011101011010101010001110101010010110110001110010100101001010100
    0101001010010010100101010111001010100110100111010101010101010101
    0001110100011101010001001010110011100010101001110010100101100010

    I certainly understand this, even four decades later

    94A605440C00010200010400000110

    In my early days of assembly programming on my ZX Spectrum, I would
    hand-assemble to machine code, and I knew at least a few of the codes by
    heart. (01 is "ld bc, #xxxx", 18 is "jr", c9 is "ret", etc.) So while
    I rarely wrote machine code directly, it is certainly still a
    programming language - it's a language you can write programs in.

    Your post triggered some memories of my own...

    I have an old pocket calculator (Sharp PC-1401) programmable in
    BASIC. When I found out that it supports undocumented features to
    read machine code numbers from memory and write code numbers into
    memory (and call them as subprograms), I coded programs in decimal
    byte sequences. (A pain, for sure, but in earlier computer eras
    quite a normal procedure.)
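    The hand-assembly described in this exchange can be sketched as a toy
    lookup table. The three Z80 opcodes are the ones quoted above; the helper
    name is hypothetical, and a real session meant doing this on paper:

```python
# Toy "hand-assembly" table for the few Z80 opcodes mentioned above.
# (Sketch only: real hand-assembly also meant encoding operands by hand.)
Z80_OPCODES = {
    "ld bc, #nnnn": 0x01,  # load 16-bit immediate into BC
    "jr e":         0x18,  # relative jump
    "ret":          0xC9,  # return from subroutine
}

def hand_assemble(mnemonic):
    """Return the opcode byte for a known mnemonic (hypothetical helper)."""
    return Z80_OPCODES[mnemonic]

print(f"{hand_assemble('ret'):02x}")  # c9
```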

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bart@bc@freeuk.com to comp.unix.programmer,comp.lang.misc on Mon Oct 14 17:23:11 2024
    From Newsgroup: comp.lang.misc

    On 14/10/2024 15:58, Scott Lurndal wrote:
    Muttley@DastartdlyHQ.org writes:
    On Sun, 13 Oct 2024 18:28:32 +0200
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
    [ X-post list reduced ]


    A programming language is an abstraction of machine instructions that is
    readable by people.

    By that definition, PAL-D is a programming language.

    (I've no idea what PAL-D is in this context.)

    Any assembler is a programming language, by that definition.


    You mean 'assembly'? An assembler (in the software world) is usually a
    program that translates textual assembly code.

    'Compiler' isn't a programming language (although no doubt someone here
    will dredge up some obscure language with exactly that name just to
    prove me wrong).


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.misc on Mon Oct 14 16:51:06 2024
    From Newsgroup: comp.lang.misc

    In article <vejauu$186ln$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:
    On Mon, 14 Oct 2024 13:38:04 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
    [snip]
    No. It translates one computer _language_ to another computer
    _language_. In the usual case, that's from a textual source

    Note that I said, "computer language", not "programming
    language". Being a human-readable language is not a requirement
    for a computer language.

    Oh, watch those goalpost moves with pedant set to 11. Presumably you
    think the values of the address lines are a language too.

    Dunno what to tell you: pretty sure you're the one who
    asserted I meant something I didn't write.

    Your claim that "machine language" is not a "language" is simply
    not true. Your claim that a "proper" compiler must take the
    shape you are pushing is also not true.

    If you say so.

    Not just me.

    - Dan C.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.unix.programmer,comp.lang.misc on Mon Oct 14 21:04:44 2024
    From Newsgroup: comp.lang.misc

    On Mon, 14 Oct 2024 08:23:20 -0000 (UTC), Muttley wrote:

    A programming language is an abstraction of machine instructions that is readable by people.

    Like converting circuit voltages to human-readable “1” and “0” symbols,
    perhaps?
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.unix.programmer,comp.lang.misc on Tue Oct 15 13:27:09 2024
    From Newsgroup: comp.lang.misc

    On 14/10/2024 18:23, Bart wrote:
    On 14/10/2024 15:58, Scott Lurndal wrote:
    Muttley@DastartdlyHQ.org writes:
    On Sun, 13 Oct 2024 18:28:32 +0200
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
    [ X-post list reduced ]


    A programming language is an abstraction of machine instructions that is
    readable by people.

    By that definition, PAL-D is a programming language.

    (I've no idea what PAL-D is in this context.)

    Any assembler is a programming language, by that definition.


    You mean 'assembly'? An assembler (in the software world) is usually a
    program that translates textual assembly code.


    I took "an assembler" to mean "an assembler language", which is a common alternative way to write "an assembly language". And IMHO, any assembly language /is/ a programming language.

    'Compiler' isn't a programming language (although no doubt someone here
    will dredge up some obscure language with exactly that name just to
    prove me wrong).


    I tried, just to please you, but I couldn't find such a language :-)


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.programmer,comp.lang.misc on Tue Oct 15 15:18:21 2024
    From Newsgroup: comp.lang.misc

    [Followup-To: set to comp.lang.misc, -comp.unix.programmer]

    In article <vejghe$192vs$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
    On 14/10/2024 15:58, Scott Lurndal wrote:
    Muttley@DastartdlyHQ.org writes:
    On Sun, 13 Oct 2024 18:28:32 +0200
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
    [ X-post list reduced ]


    A programming language is an abstraction of machine instructions that is
    readable by people.

    By that definition, PAL-D is a programming language.

    (I've no idea what PAL-D is in this context.)

    PAL-D is an assembler for the PDP-8 computer. I don't know why
    one wouldn't consider its input a programming language.
    https://bitsavers.org/pdf/dec/pdp8/handbooks/programmingLanguages_May70.pdf

    - Dan C.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Sebastian@sebastian@here.com.invalid to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Nov 11 07:31:13 2024
    From Newsgroup: comp.lang.misc

    In comp.unix.programmer Muttley@dastartdlyhq.org wrote:
    On Wed, 09 Oct 2024 22:25:05 +0100
    Rainer Weikusat <rweikusat@talktalk.net> boring babbled:
    Bozo User <anthk@disroot.org> writes:
    On 2024-04-07, Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
    On Sun, 07 Apr 2024 00:01:43 +0000, Javier wrote:

    The downside is the loss of performance because of disk access for
    trivial things like 'nfiles=$(ls | wc -l)'.

    Well, you could save one process creation by writing
    “nfiles=$(echo * | wc -l)” instead. But that would still not be strictly
    correct.
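    The process-creation overhead being debated here disappears entirely in a
    language with direct directory access. A minimal Python sketch of the same
    count (the function name is mine; note that `echo * | wc -l` actually
    counts words, which is one reason it is "not strictly correct"):

```python
import os

def count_files(path="."):
    # One readdir() pass, no child processes; skip dotfiles to
    # roughly match the default output of `ls`.
    return sum(1 for name in os.listdir(path) if not name.startswith("."))

nfiles = count_files(".")
```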

    I suspect disk access times were
    one of the reasons for the development of perl in the early 90s.

    Shells were somewhat less powerful in those days. I would describe the
    genesis of Perl as “awk on steroids”. Its big party trick was regular
    expressions. And I guess combining that with more sophisticated data-
    structuring capabilities.

    Perl is more awk+sed+sh in a single language. Basically the killer
    of the Unix philosophy in the late 90's/early 00's, and for the good.

    Perl is a high-level programming language with a rich syntax, with
    support for deterministic automatic memory management, functions as
    first-class objects and message-based OO. It's also a virtual machine
    for executing threaded code and a(n optimizing) compiler for translating
    Perl code into the corresponding threaded code.

    Its syntax is also a horrific mess. Larry took the worst parts of C and
    shell syntax and mashed them together. It's no surprise Perl has been
    ditched in favour of Python just about everywhere for new scripting
    projects. And while I hate Python's meaningful whitespace nonsense, I'd
    use it in preference to Perl any day.

    I think you've identified the one language that Python is better than.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Nov 11 10:06:40 2024
    From Newsgroup: comp.lang.misc

    On Mon, 11 Nov 2024 07:31:13 -0000 (UTC)
    Sebastian <sebastian@here.com.invalid> boring babbled:
    In comp.unix.programmer Muttley@dastartdlyhq.org wrote:
    syntax and mashed them together. It's no surprise Perl has been ditched in
    favour of Python just about everywhere for new scripting projects. And while
    I hate Python's meaningful whitespace nonsense, I'd use it in preference
    to Perl any day.

    I think you've identified the one language that Python is better than.

    Yes, Python does have a lot of cons as a language. But its syntax lets
    newbies get up to speed quickly and there are a lot of libraries. However
    it's dog slow and inefficient, and I'm amazed it's used as a key language
    for AI development - not traditionally a newbie coder area - when in that
    application speed really is essential. Yes, it generally calls libraries
    written in C/C++, but then why not just write the higher level code in
    C++ too?

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Wolfgang Agnes@wagnes@jemoni.to to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Nov 11 08:28:51 2024
    From Newsgroup: comp.lang.misc

    Muttley@DastartdlyHQ.org writes:

    On Mon, 11 Nov 2024 07:31:13 -0000 (UTC)
    Sebastian <sebastian@here.com.invalid> boring babbled:
    In comp.unix.programmer Muttley@dastartdlyhq.org wrote:
    syntax and mashed them together. It's no surprise Perl has been ditched in
    favour of Python just about everywhere for new scripting projects. And while
    I hate Python's meaningful whitespace nonsense, I'd use it in preference
    to Perl any day.

    I think you've identified the one language that Python is better than.

    Yes, Python does have a lot of cons as a language. But its syntax lets
    newbies get up to speed quickly and there are a lot of libraries. However
    it's dog slow and inefficient, and I'm amazed it's used as a key language
    for AI development - not traditionally a newbie coder area - when in that
    application speed really is essential. Yes, it generally calls libraries
    written in C/C++, but then why not just write the higher level code in
    C++ too?

    You'd have to give up the REPL, for instance.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Muttley@Muttley@dastardlyhq.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Nov 11 16:21:26 2024
    From Newsgroup: comp.lang.misc

    On Mon, 11 Nov 2024 08:28:51 -0300
    Wolfgang Agnes <wagnes@jemoni.to> gabbled:
    Muttley@DastartdlyHQ.org writes:

    On Mon, 11 Nov 2024 07:31:13 -0000 (UTC)
    Sebastian <sebastian@here.com.invalid> boring babbled:
    In comp.unix.programmer Muttley@dastartdlyhq.org wrote:
    syntax and mashed them together. It's no surprise Perl has been ditched in
    favour of Python just about everywhere for new scripting projects. And while
    I hate Python's meaningful whitespace nonsense, I'd use it in preference
    to Perl any day.

    I think you've identified the one language that Python is better than.

    Yes, Python does have a lot of cons as a language. But its syntax lets
    newbies get up to speed quickly and there are a lot of libraries. However
    it's dog slow and inefficient, and I'm amazed it's used as a key language
    for AI development - not traditionally a newbie coder area - when in that
    application speed really is essential. Yes, it generally calls libraries
    written in C/C++, but then why not just write the higher level code in
    C++ too?

    You'd have to give up the REPL, for instance.

    Not that big a deal especially if the model takes hours or days to train anyway.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Nov 11 20:55:15 2024
    From Newsgroup: comp.lang.misc

    On Mon, 11 Nov 2024 10:06:40 -0000 (UTC), Muttley wrote:

    Yes it generally calls libraries written in C/C++
    but then why not just write the higher level code in C++ too?

    Because it’s easier to do higher-level stuff in Python.

    Example: <https://github.com/HamPUG/meetings/tree/master/2018/2018-08-13/ldo-creating-api-bindings-using-ctypes>
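    The ctypes approach from the linked talk can be illustrated with a minimal
    sketch binding a single function from the C math library. This is my own
    toy example, not from the talk, and the library lookup assumes a Unix-like
    system:

```python
import ctypes
import ctypes.util

# Load the C math library (name resolution assumes a Unix-like system).
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature: double sqrt(double).
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

print(libm.sqrt(2.0))  # 1.4142135623730951
```

    The point being made above is that the glue - loading, declaring
    signatures, calling - stays at this level of brevity, while the heavy
    lifting remains in compiled code.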
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Nov 11 21:24:14 2024
    From Newsgroup: comp.lang.misc

    On Mon, 11 Nov 2024 07:31:13 -0000 (UTC), Sebastian wrote:

    In comp.unix.programmer Muttley@dastartdlyhq.org wrote:

    [Perl’s] syntax is also a horrific mess. Larry took the worst parts of
    C and shell syntax and mashed them together.

    I think you've identified the one language that Python is better than.

    In terms of the modern era of high-level programming, Perl was the breakthrough language. Before Perl, BASIC was considered to be an example
    of a language with “good” string handling. After Perl, BASIC looked old and clunky indeed.

    Perl was the language that made regular expressions sexy. Because it made
    them easy to use.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Tue Nov 12 10:14:20 2024
    From Newsgroup: comp.lang.misc

    On 11.11.2024 11:06, Muttley@DastartdlyHQ.org wrote:

    Yes, Python does have a lot of cons as a language. But its syntax lets newbies get up to speed quickly

    and then abruptly get stopped again due to obscure, misleading, or
    (at best) non-informative error messages,

    and there are a lot of libraries. However it's
    dog slow and inefficient and I'm amazed it's used as a key language for AI

    (and not only there; it's ubiquitous, it seems)

    development - not traditionally a newbie coder area - when in that
    application speed really is essential. Yes, it generally calls libraries
    written in C/C++, but then why not just write the higher level code in
    C++ too?

    Because of its simpler syntax and less syntactical ballast compared
    to C++?

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Tue Nov 12 09:21:51 2024
    From Newsgroup: comp.lang.misc

    On Tue, 12 Nov 2024 10:14:20 +0100
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
    On 11.11.2024 11:06, Muttley@DastartdlyHQ.org wrote:
    and there are a lot of libraries. However it's
    dog slow and inefficient and I'm amazed it's used as a key language for AI

    (and not only there; it's ubiquitous, it seems)

    Yes, certainly seems to be the case now.

    development - not traditionally a newbie coder area - when in that
    application speed really is essential. Yes, it generally calls libraries
    written in C/C++, but then why not just write the higher level code in
    C++ too?

    Because of its simpler syntax and less syntactical ballast compared
    to C++?

    When you're dealing with something as complicated and frankly ineffable as
    an AI model, I doubt syntactic quirks of the programming language matter
    that much in comparison. Surely you'd want the fastest implementation
    possible, and in this case it would be C++.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Tue Nov 12 10:23:38 2024
    From Newsgroup: comp.lang.misc

    On 11.11.2024 22:24, Lawrence D'Oliveiro wrote:
    On Mon, 11 Nov 2024 07:31:13 -0000 (UTC), Sebastian wrote:

    In comp.unix.programmer Muttley@dastartdlyhq.org wrote:

    [Perl’s] syntax is also a horrific mess. Larry took the worst parts of
    C and shell syntax and mashed them together.

    I think you've identified the one language that Python is better than.

    In terms of the modern era of high-level programming, Perl was the breakthrough language. Before Perl, BASIC was considered to be an example
    of a language with “good” string handling. After Perl, BASIC looked old and clunky indeed.

    I'm not, erm.., a fan of Perl or anything, but comparing it to BASIC
    is way off; Perl is not *that* bad. - N.B.: Of course no one can say
    what "BASIC" actually is given the many variants and dialects. - I'm
    sure you must have some modern variant in mind that might have little
    to do with the various former BASIC dialects (that I happened to use
    in the 1970's; e.g., Wang, Olivetti, Commodore, and a mainframe that
    I don't recall).

    It's more interesting what Perl added compared to BRE/ERE, what Unix
    provided since its beginning (and long before Perl).


    Perl was the language that made regular expressions sexy. Because it made them easy to use.

    For those of us who used regexps in Unix from the beginning it's not
    as shiny as you make it out to be; Unix has supported Chomsky-3
    Regular Expressions with a syntax that is still used in contemporary
    languages. Perl supports some nice syntactic shortcuts, but also
    patterns that exceed Chomsky-3's; too bad if one doesn't know these
    differences and the complexity degradation that may come with them.
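    One concrete Perl-ism that exceeds a Chomsky type-3 (regular) language is
    the backreference. A small illustration using Python's Perl-derived `re`
    module (the pattern and strings are my own example):

```python
import re

# A backreference matches "the same text again". No finite-state
# automaton can do this in general: the language { w w } is not regular,
# so engines supporting \1 give up the O(N) guarantees of a pure FSA.
doubled = re.compile(r"^(\w+) \1$")

print(bool(doubled.match("hey hey")))    # True
print(bool(doubled.match("hey there")))  # False
```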

    More interesting to me is the fascinating fact that on some non-Unix
    platforms it took decades before regexps got (slooooowly) introduced
    (even in its simplest form).

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Tue Nov 12 10:31:58 2024
    From Newsgroup: comp.lang.misc

    On 12.11.2024 10:21, Muttley@DastartdlyHQ.org wrote:
    On Tue, 12 Nov 2024 10:14:20 +0100
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
    On 11.11.2024 11:06, Muttley@DastartdlyHQ.org wrote:
    [ Q: why some prefer Python over C++ ]

    Because of its simpler syntax and less syntactical ballast compared
    to C++?

    When you're dealing with something as complicated and frankly ineffable as
    an AI model I doubt syntactic quirks of the programming language matter that much in comparison.

    Oh, I would look at it differently; in whatever application domain I
    program I want a syntactically clear and well-defined language.

    Surely you'd want the fastest implementation possible and
    in this case it would be C++.

    Speed is one factor (to me), and expressiveness or "modeling power"
    (OO) is another one. I also appreciate consistently defined languages
    and the quality of error catching and usefulness of diagnostic messages.
    (There are some more factors, but...)

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Tue Nov 12 09:53:59 2024
    From Newsgroup: comp.lang.misc

    On Tue, 12 Nov 2024 10:31:58 +0100
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
    On 12.11.2024 10:21, Muttley@DastartdlyHQ.org wrote:
    On Tue, 12 Nov 2024 10:14:20 +0100
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
    On 11.11.2024 11:06, Muttley@DastartdlyHQ.org wrote:
    [ Q: why some prefer Python over C++ ]

    Because of its simpler syntax and less syntactical ballast compared
    to C++?

    When you're dealing with something as complicated and frankly ineffable as
    an AI model I doubt syntactic quirks of the programming language matter that
    much in comparison.

    Oh, I would look at it differently; in whatever application domain I
    program I want a syntactically clear and well-defined language.

    In which case I'd go with a statically typed language like C++ every time
    ahead of a dynamic one like python.

    Surely you'd want the fastest implementation possible and
    in this case it would be C++.

    Speed is one factor (to me), and expressiveness or "modeling power"
    (OO) is another one. I also appreciate consistently defined languages
    and quality of error catching and usefulness of diagnostic messages.
    (There's some more factors, but...)

    C++ is undeniably powerful, but I think the majority would agree now that
    its syntax has become an unwieldy mess.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Tue Nov 12 15:05:00 2024
    From Newsgroup: comp.lang.misc

    On 12.11.2024 10:53, Muttley@DastartdlyHQ.org wrote:

    In which case I'd go with a statically typed language like C++ every time ahead of a dynamic one like python.

    Definitely!

    I do use untyped languages (like Awk) for scripting, though, but
    not for code of considerable scale.

    Incidentally, one of my children recently spoke about their setup;
    they use Fortran with old libraries (hydrodynamic earth processes),
    have the higher level tasks implemented in C++, and they do the
    "job control" of the simulation tasks with Python. - A multi-tier
    architecture. - That sounds not unreasonable to me. (But they had
    built their system based on existing software, so it might have
    been a different decision if they had built it from scratch.)


    C++ is undeniably powerful, but I think the majority would agree now that
    its syntax has become an unwieldy mess.

    Yes. And recent standards made it yet worse. When I saw it for the
    first time I couldn't believe that this would be possible. ;-)

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bart@bc@freeuk.com to comp.lang.misc on Tue Nov 12 14:50:26 2024
    From Newsgroup: comp.lang.misc

    On 12/11/2024 14:05, Janis Papanagnou wrote:
    On 12.11.2024 10:53, Muttley@DastartdlyHQ.org wrote:

    In which case I'd go with a statically typed language like C++ every time
    ahead of a dynamic one like python.

    Definitely!

    I do use untyped languages (like Awk) for scripting, though, but
    not for code of considerable scale.

    Incidentally, one of my children recently spoke about their setup;
    they use Fortran with old libraries (hydrodynamic earth processes),
    have the higher level tasks implemented in C++, and they do the
    "job control" of the simulation tasks with Python. - A multi-tier
    architecture. - That sounds not unreasonable to me. (But they had
    built their system based on existing software, so it might have
    been a different decision if they had built it from scratch.)


    My last major app (now over 20 years ago), had such a 2-language solution.

    It was a GUI-based low-end 2D/3D CAD app, written in my lower level
    systems language.

    But the app also had an embedded scripting language, which had access to
    the app's environment and users' data.

    That was partly so that users (both OEMs and end-users) could write
    their own scripts. To this end it was moderately successful, as OEMs
    could write their own add-on applications (for example, to help design
    lighting rigs).

    But I also used it exclusively for the GUI side of the application:
    menus, dialogs, cursor control, layouts, the simpler file conversions
    (eg. export my data models to 3DS format), while the native code parts
    dealt with the critical parts: the 3D maths, managing the 3D models,
    the display drivers, etc.

    The whole thing was perhaps 150-200Kloc (not including OEM or user
    programs), which was about half static/compiled code and half dynamic/interpreted.

    (One of the original motivations, when it had to run on constrained
    systems, was to allow a lot of the code to exist as standalone scripts,
    which resided on floppy disks, and which were only loaded as needed.)


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Tue Nov 12 15:09:23 2024
    From Newsgroup: comp.lang.misc

    On Tue, 12 Nov 2024 15:05:00 +0100
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
    On 12.11.2024 10:53, Muttley@DastartdlyHQ.org wrote:
    C++ is undeniably powerful, but I think the majority would agree now that
    its syntax has become an unwieldy mess.

    Yes. And recent standards made it yet worse - When I saw it the
    first time I couldn't believe that this would be possible. ;-)

    Unfortunately these days the C++ steering committee (or whatever it's
    called) simply seems to be using the language to justify their positions,
    and keeps chucking in "features" that no one asked for or cares about,
    with the end result of the language becoming a huge mess that no single
    person could ever learn (or at least remember if they tried).

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Wolfgang Agnes@wagnes@jemoni.to to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Tue Nov 12 13:47:15 2024
    From Newsgroup: comp.lang.misc

    Muttley@DastartdlyHQ.org writes:

    On Tue, 12 Nov 2024 10:14:20 +0100
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:

    [...]

    Because of its simpler syntax and less syntactical ballast compared
    to C++?

    When you're dealing with something as complicated and frankly ineffable as
    an AI model I doubt syntactic quirks of the programming language matter that much in comparison. Surely you'd want the fastest implementation possible and in this case it would be C++.

    I really wouldn't be so sure. :)
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Wolfgang Agnes@wagnes@jemoni.to to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Tue Nov 12 13:50:58 2024
    From Newsgroup: comp.lang.misc

    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    [...]

    Perl was the language that made regular expressions sexy. Because it made >> them easy to use.

    For those of us who used regexps in Unix from the beginning it's not
    as shiny as you make it out to be; Unix has supported Chomsky-3
    Regular Expressions with a syntax that is still used in contemporary
    languages. Perl supports some nice syntactic shortcuts, but also
    patterns that exceed Chomsky-3's; too bad if one doesn't know these
    differences and the complexity degradation that may come with them.

    By Chomsky-3 you mean a grammar of type 3 in the Chomsky hierarchy? And
    that would be ``regular'' language, recognizable by a finite-state
    automaton? If not, could you elaborate on the terminology?

    More interesting to me is the fascinating fact that on some non-Unix platforms it took decades before regexps got (slooooowly) introduced
    (even in its simplest form).

    Such as which platform?
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Tue Nov 12 20:29:02 2024
    From Newsgroup: comp.lang.misc

    On Tue, 12 Nov 2024 10:23:38 +0100, Janis Papanagnou wrote:

    On 11.11.2024 22:24, Lawrence D'Oliveiro wrote:

    Perl was the language that made regular expressions sexy. Because it
    made them easy to use.

    ... Unix was supporting Chomsky-3
    Regular Expressions with a syntax that is still used in contemporary languages.

    Not in anything resembling a general-purpose high-level language. That’s what Perl pioneered.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.misc on Tue Nov 12 20:35:30 2024
    From Newsgroup: comp.lang.misc

    On Tue, 12 Nov 2024 14:50:26 +0000, Bart wrote:

    But the app also had an embedded scripting language, which had access to
    the app's environment and users' data.

    Did you invent your own scripting language? Nowadays you would use
    something ready-made, like Lua, Guile or even Python.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bart@bc@freeuk.com to comp.lang.misc on Tue Nov 12 21:48:39 2024
    From Newsgroup: comp.lang.misc

    On 12/11/2024 20:35, Lawrence D'Oliveiro wrote:
    On Tue, 12 Nov 2024 14:50:26 +0000, Bart wrote:

    But the app also had an embedded scripting language, which had access to
    the app's environment and users' data.

    Did you invent your own scripting language? Nowadays you would use
    something ready-made, like Lua, Guile or even Python.

    At that time (late 80s) I had to invent pretty much everything.

    I still do, language-wise.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.misc on Tue Nov 19 06:14:27 2024
    From Newsgroup: comp.lang.misc

    On 12.11.2024 17:50, Wolfgang Agnes wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
    [...]

    By Chomsky-3 you mean a grammar of type 3 in the Chomsky hierarchy? And
    that would be ``regular'' language, recognizable by a finite-state
    automaton? If not, could you elaborate on the terminology?

    Yes. I hoped the term was clear enough. If I used too sloppy a
    wording in my ad hoc writing, I apologize for the inconvenience.

    My point was about runtime guarantees and complexities (O(N)) of
    Regexp processing, which are also reflected by the FSA model.

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From merlyn@merlyn@stonehenge.com (Randal L. Schwartz) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Tue Nov 19 18:43:48 2024
    From Newsgroup: comp.lang.misc

    "Lawrence" == Lawrence D'Oliveiro <ldo@nz.invalid> writes:

    Lawrence> Perl was the language that made regular expressions
    Lawrence> sexy. Because it made them easy to use.

    I'm often reminded of this as I've been coding very little in Perl these
    days, and a lot more in languages like Dart, where the regex feels like
    a clumsy bolt-on rather than a proper first-class citizen.

    There are times I miss Perl. But not too often any more. :)
    --
    Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
    <merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
    Perl/Dart/Flutter consulting, Technical writing, Comedy, etc. etc.
    Still trying to think of something clever for the fourth line of this .sig
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Wed Nov 20 04:34:31 2024
    From Newsgroup: comp.lang.misc

    On Tue, 19 Nov 2024 18:43:48 -0800, Randal L. Schwartz wrote:

    "Lawrence" == Lawrence D'Oliveiro <ldo@nz.invalid> writes:

    Lawrence> Perl was the language that made regular expressions
    Lawrence> sexy. Because it made them easy to use.

    I'm often reminded of this as I've been coding very little in Perl these days, and a lot more in languages like Dart, where the regex feels like
    a clumsy bolt-on rather than a proper first-class citizen.

    Python has regexes as a bolt-on -- a library module, not a core part of
    the language. But I think the way it leverages the core language -- e.g.
    being able to iterate over pattern matches, and collecting information
    about matches in a “Match” object -- keeps it quite useful in a nicely functional way.
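    A minimal sketch of that in Python's standard `re` module (the log-like input is invented for illustration):

    ```python
    import re

    # Iterate over every match in a string; each hit is a Match object
    # exposing the matched text, capture groups, and span positions.
    text = "error=404 warn=2 error=500"
    for m in re.finditer(r"error=(\d+)", text):
        print(m.group(1), m.span())
    ```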
  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Wed Nov 20 08:21:17 2024
    From Newsgroup: comp.lang.misc

    On Tue, 19 Nov 2024 18:43:48 -0800
    merlyn@stonehenge.com (Randal L. Schwartz) boring babbled:
    "Lawrence" == Lawrence D'Oliveiro <ldo@nz.invalid> writes:

    Lawrence> Perl was the language that made regular expressions
    Lawrence> sexy. Because it made them easy to use.

    I'm often reminded of this as I've been coding very little in Perl these
    days, and a lot more in languages like Dart, where the regex feels like
    a clumsy bolt-on rather than a proper first-class citizen.

    Regex itself is clumsy beyond simple search and replace patterns. A lot
    of stuff I've seen done in regex would have been better done procedurally,
    at the expense of slightly more code but a LOT more readability. Also,
    given it's effectively a compact language with its own grammar and syntax,
    IMO it should not be a core part of any language, as it can lead to a
    syntactic mess, which is what often happens with Perl.


  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Wed Nov 20 11:51:11 2024
    From Newsgroup: comp.lang.misc

    On 20.11.2024 09:21, Muttley@DastartdlyHQ.org wrote:
    On Tue, 19 Nov 2024 18:43:48 -0800
    merlyn@stonehenge.com (Randal L. Schwartz) boring babbled:

    I'm often reminded of this as I've been coding very little in Perl these
    days, and a lot more in languages like Dart, where the regex feels like
    a clumsy bolt-on rather than a proper first-class citizen.

    Regex itself is clumsy beyond simple search and replace patterns. A lot
    of stuff I've seen done in regex would have been better done procedurally,
    at the expense of slightly more code but a LOT more readability. Also,
    given it's effectively a compact language with its own grammar and syntax,
    IMO it should not be a core part of any language, as it can lead to a
    syntactic mess, which is what often happens with Perl.

    I wouldn't look at it that way. I've seen Regexps as part of languages
    usually in well defined syntactical contexts. For example, like strings
    are enclosed in "...", Regexps could be seen within /.../ delimiters.
    GNU Awk (in recent versions) went towards first class "strongly typed"
    Regexps which are then denoted by the @/.../ syntax.

    I'm curious what you mean by Regexps presented in a "procedural" form.
    Can you give some examples?

    Personally I'm fine with the typical lexical meta-symbols in Regexps
    which resemble the FSA and allow a simple transformation forth and back.

    In practice, given that a Regexp conforms to an FSA, any Regexp can be
    precompiled and used multiple times. The thing I had used in Java - it
    was a library from Apache, IIRC, not the bulky thing that got included
    later - was easily usable; create a Regexp object from an RE expression,
    then operate on that same object. (Since there's still typical Regexp
    syntax involved I suppose that is not what you meant by "procedural"?)
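    For comparison, the same compile-once idiom exists in Python's standard `re` module (a minimal sketch):

    ```python
    import re

    # Build the pattern object (the FSA, conceptually) once...
    digits = re.compile(r"[0-9]+")

    # ...then apply it to any number of strings.
    for s in ("abc123", "no digits here", "42"):
        m = digits.search(s)
        print(m.group(0) if m else None)
    ```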

    Janis

  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Wed Nov 20 11:30:44 2024
    From Newsgroup: comp.lang.misc

    On Wed, 20 Nov 2024 11:51:11 +0100
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
    On 20.11.2024 09:21, Muttley@DastartdlyHQ.org wrote:
    On Tue, 19 Nov 2024 18:43:48 -0800
    merlyn@stonehenge.com (Randal L. Schwartz) boring babbled:

    I'm often reminded of this as I've been coding very little in Perl these
    days, and a lot more in languages like Dart, where the regex feels like
    a clumsy bolt-on rather than a proper first-class citizen.

    Regex itself is clumsy beyond simple search and replace patterns. A lot
    of stuff I've seen done in regex would have been better done procedurally,
    at the expense of slightly more code but a LOT more readability. Also,
    given it's effectively a compact language with its own grammar and syntax,
    IMO it should not be a core part of any language, as it can lead to a
    syntactic mess, which is what often happens with Perl.

    I wouldn't look at it that way. I've seen Regexps as part of languages
    usually in well defined syntactical contexts. For example, like strings
    are enclosed in "...", Regexps could be seen within /.../ delimiters.
    GNU Awk (in recent versions) went towards first class "strongly typed"
    Regexps which are then denoted by the @/.../ syntax.

    I'm curious what you mean by Regexps presented in a "procedural" form.
    Can you give some examples?

    Anything that can be done in regex can obviously also be done procedurally.
    At the point regex expressions become unwieldy - usually when substitution
    variables raise their heads - I prefer procedural code as it's also often
    easier to debug.

    In practice, given that a Regexp conforms to an FSA, any Regexp can be
    precompiled and used multiple times. The thing I had used in Java - it

    Precompiled regex is no more efficient than precompiled anything, it's all
    just assembler at the bottom.

    then operate on that same object. (Since there's still typical Regexp
    syntax involved I suppose that is not what you meant by "procedural"?)

    If you don't know the difference between declarative syntax like regex and
    procedural syntax then there's not much point continuing this discussion.

  • From Ed Morton@mortonspam@gmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Wed Nov 20 05:46:49 2024
    From Newsgroup: comp.lang.misc

    On 11/20/2024 2:21 AM, Muttley@DastartdlyHQ.org wrote:
    On Tue, 19 Nov 2024 18:43:48 -0800
    merlyn@stonehenge.com (Randal L. Schwartz) boring babbled:
    "Lawrence" == Lawrence D'Oliveiro <ldo@nz.invalid> writes:

    Lawrence> Perl was the language that made regular expressions
    Lawrence> sexy. Because it made them easy to use.

    I'm often reminded of this as I've been coding very little in Perl these
    days, and a lot more in languages like Dart, where the regex feels like
    a clumsy bolt-on rather than a proper first-class citizen.

    Regex itself is clumsy beyond simple search and replace patterns. A lot
    of stuff I've seen done in regex would have been better done procedurally,
    at the expense of slightly more code but a LOT more readability.

    Definitely. The most relevant statement about regexps is this:

    Some people, when confronted with a problem, think "I know, I'll use
    regular expressions." Now they have two problems.

    attributed to Jamie Zawinski, see https://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/.

    Obviously regexps are very useful and commonplace but if you find you
    have to use some online site or other tools to help you write/understand
    one or just generally need more than a couple of minutes to
    write/understand it then it's time to back off and figure out a better
    way to write your code for the sake of whoever has to read it 6 months
    later (and usually for robustness too as it's hard to be sure all rainy
    day cases are handled correctly in a lengthy and/or complicated regexp).

    Ed.
  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Wed Nov 20 12:21:04 2024
    From Newsgroup: comp.lang.misc

    Muttley@DastartdlyHQ.org writes:
    merlyn@stonehenge.com (Randal L. Schwartz) boring babbled:
    "Lawrence" == Lawrence D'Oliveiro <ldo@nz.invalid> writes:

    Lawrence> Perl was the language that made regular expressions
    Lawrence> sexy. Because it made them easy to use.

    I'm often reminded of this as I've been coding very little in Perl these
    days, and a lot more in languages like Dart, where the regex feels like
    a clumsy bolt-on rather than a proper first-class citizen.

    Regex itself is clumsy beyond simple search and replace patterns. A lot
    of stuff I've seen done in regex would have been better done procedurally,
    at the expense of slightly more code but a LOT more readability. Also,
    given it's effectively a compact language with its own grammar and syntax,
    IMO it should not be a core part of any language, as it can lead to a
    syntactic mess, which is what often happens with Perl.

    A mess is something which often happens when people who can't organize
    their thoughts just trudge on nevertheless. They're perfectly capable of accomplishing that in any programming language.

    A real problem with regexes in Perl is that they're pretty slow for simple
    use cases (like lexical analysis) and thus, not suitable for volume data processing outside of throwaway code¹.

    ¹ I used to use a JSON parser written in OO-Perl which made extensive
    use of regexes for that. I've recently replaced that with a C/XS version
    which - while slightly larger (617 vs 410 lines of text) - is over a
    hundred times faster and conceptually simpler at the same time.
  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Wed Nov 20 12:27:54 2024
    From Newsgroup: comp.lang.misc

    On Wed, 20 Nov 2024 05:46:49 -0600
    Ed Morton <mortonspam@gmail.com> boring babbled:
    On 11/20/2024 2:21 AM, Muttley@DastartdlyHQ.org wrote:
    On Tue, 19 Nov 2024 18:43:48 -0800
    merlyn@stonehenge.com (Randal L. Schwartz) boring babbled:
    "Lawrence" == Lawrence D'Oliveiro <ldo@nz.invalid> writes:

    Lawrence> Perl was the language that made regular expressions
    Lawrence> sexy. Because it made them easy to use.

    I'm often reminded of this as I've been coding very little in Perl these
    days, and a lot more in languages like Dart, where the regex feels like
    a clumsy bolt-on rather than a proper first-class citizen.

    Regex itself is clumsy beyond simple search and replace patterns. A lot
    of stuff I've seen done in regex would have been better done procedurally,
    at the expense of slightly more code but a LOT more readability.

    Definitely. The most relevant statement about regexps is this:

    Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

    Very true!

    Obviously regexps are very useful and commonplace but if you find you
    have to use some online site or other tools to help you write/understand
    one or just generally need more than a couple of minutes to
    write/understand it then it's time to back off and figure out a better
    way to write your code for the sake of whoever has to read it 6 months
    later (and usually for robustness too as it's hard to be sure all rainy
    day cases are handled correctly in a lengthy and/or complicated regexp).

    Edge cases are regex's Achilles heel, e.g. an expression that only accounted
    for 1 -> N chars, not 0 -> N, or matches in the middle but not at the ends.

  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Wed Nov 20 16:38:24 2024
    From Newsgroup: comp.lang.misc

    On 20.11.2024 12:30, Muttley@DastartdlyHQ.org wrote:
    On Wed, 20 Nov 2024 11:51:11 +0100
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
    On 20.11.2024 09:21, Muttley@DastartdlyHQ.org wrote:
    On Tue, 19 Nov 2024 18:43:48 -0800
    merlyn@stonehenge.com (Randal L. Schwartz) boring babbled:

    I'm often reminded of this as I've been coding very little in Perl these
    days, and a lot more in languages like Dart, where the regex feels like
    a clumsy bolt-on rather than a proper first-class citizen.

    Regex itself is clumsy beyond simple search and replace patterns. A lot
    of stuff I've seen done in regex would have been better done procedurally,
    at the expense of slightly more code but a LOT more readability. Also,
    given it's effectively a compact language with its own grammar and syntax,
    IMO it should not be a core part of any language, as it can lead to a
    syntactic mess, which is what often happens with Perl.

    I wouldn't look at it that way. I've seen Regexps as part of languages
    usually in well defined syntactical contexts. For example, like strings
    are enclosed in "...", Regexps could be seen within /.../ delimiters.
    GNU Awk (in recent versions) went towards first class "strongly typed"
    Regexps which are then denoted by the @/.../ syntax.

    I'm curious what you mean by Regexps presented in a "procedural" form.
    Can you give some examples?

    Anything that can be done in regex can obviously also be done procedurally.
    At the point regex expressions become unwieldy - usually when substitution
    variables raise their heads - I prefer procedural code as it's also often
    easier to debug.

    You haven't even tried to honestly answer my (serious) question.
    With your statement above and your hostility below, it rather seems
    you have no clue of what I am talking about.


    In practice, given that a Regexp conforms to an FSA, any Regexp can be
    precompiled and used multiple times. The thing I had used in Java - it

    Precompiled regex is no more efficient than precompiled anything, it's all
    just assembler at the bottom.

    The Regexps are a way to specify the words of a regular language;
    for pattern matching the expression gets interpreted or compiled; you
    specify it, e.g., using strings of characters and meta-characters.
    If you have a programming language where that string gets repeatedly interpreted then it's slower than a precompiled Regexp expression.

    I give you examples...

    (1) DES encryption function

    (1a) ciphertext = des_encode (key, plaintext)

    (1b) cipher = des (key)
    ciphertext = cipher.encode (plaintext)

    In case (1) you can either call the des encryption (decryption) for
    any (key, plaintext)-pair in a procedural function as in (1a), or
    you can create the key-specific encryption once and encode various
    texts with the same cipher object as in (1b).

    (2) regexp matching

    (2a) location = regexp (pattern, string)

    (2b) fsm = regexp (pattern)
    location = fsm.match (string)

    In case (2) you can either do the match in a string with a pattern
    in a procedural form as in (2a) or you can create the FSM for the
    given Regexp just once and apply it on various strings as in (2b).

    That's what I was talking about.

    Only if key (in (1)) or pattern (in (2)) is static or "constant"
    could that compilation (but only theoretically) be done in advance,
    and an optimizing system may (or may not) precompile both to
    [similar] assembler code. How should that work with regexps or DES?
    The optimizing system would need knowledge how to use the library
    code (DES, Regexps, ...) to create binary structures based on the
    algorithms (key-initialization in DES, FSM-generation in Regexps).
    This is [statically] not done.

    Otherwise - i.e. the normal, expected case - there's an efficiency
    difference to observe between the respective cases of (a) and (b).
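    A Python sketch of the (2a)/(2b) distinction, reusing the /[0-9]+(ABC)?x*foo/ example from below (the function name `match_2a` is illustrative):

    ```python
    import re

    # (2a) procedural form: the pattern string is handed over on every call,
    # so the matcher has to be (re)built or looked up each time.
    def match_2a(pattern, string):
        return re.search(pattern, string)

    # (2b) build the matcher (the FSM, conceptually) once, then reuse it.
    fsm = re.compile(r"[0-9]+(ABC)?x*foo")
    for s in ("123ABCxxfoo", "9foo", "foo"):
        print(bool(fsm.search(s)))
    ```

    Note that Python's `re` keeps an internal cache of compiled patterns, so (2a) costs only a cache lookup there; in a language without such a cache the efficiency difference between (a) and (b) is exactly the one described above.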


    then operate on that same object. (Since there's still typical Regexp
    syntax involved I suppose that is not what you meant by "procedural"?)

    If you don't know the difference between declarative syntax like regex and
    procedural syntax then there's not much point continuing this discussion.

    Why do you think so, and why are you saying that? - That wasn't and
    still isn't the point. - You said upthread

    "A lot of stuff I've seen done in regex would have better done
    procedurally at the expense of slightly more code but a LOT more
    readability."

    and I asked

    "I'm curious what you mean by Regexps presented in a "procedural"
    form.
    Can you give some examples?"

    What you wanted to say wasn't clear to me, since you were complaining
    about the _Regexp syntax_. So it couldn't be meant to just write
    regexp (pattern, string) instead of pattern ~ string
    but to somehow(!) transform "pattern", say, like /[0-9]+(ABC)?x*foo/,
    to something syntactically "better".
    I was interested in that "somehow" (that I emphasized), and in an
    example how that would look like in your opinion.
    If you're unable to answer that simple question then just take that
    simple regexp /[0-9]+(ABC)?x*foo/ example and show us your preferred
    procedural variant.

    But my expectation is that you cannot provide any reasonable example
    anyway.

    Personally I think that writing bulky procedural stuff for something
    like [0-9]+ can only be much worse, and that further abbreviations
    like \d+ are the better direction to go if targeting a good interface.
    YMMV.

    Janis

  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Wed Nov 20 16:53:38 2024
    From Newsgroup: comp.lang.misc

    On 20.11.2024 12:46, Ed Morton wrote:

    Definitely. The most relevant statement about regexps is this:

    Some people, when confronted with a problem, think "I know, I'll use
    regular expressions." Now they have two problems.

    (Worth scribbling on a WC wall.)


    Obviously regexps are very useful and commonplace but if you find you
    have to use some online site or other tools to help you write/understand
    one or just generally need more than a couple of minutes to
    write/understand it then it's time to back off and figure out a better
    way to write your code for the sake of whoever has to read it 6 months
    later (and usually for robustness too as it's hard to be sure all rainy
    day cases are handled correctly in a lengthy and/or complicated regexp).

    Regexps are nothing for newbies.

    The inherent fine thing with Regexps is that you can incrementally
    compose them[*].[**]
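    A small Python sketch of that incremental composition (the pattern pieces are invented for illustration):

    ```python
    import re

    # Each piece is small enough to read and test on its own...
    NUM  = r"[0-9]+"
    UNIT = r"(?:ms|s|min)"
    DURATION = NUM + UNIT              # e.g. "250ms"

    # ...and the composed whole stays decomposable.
    LINE = rf"{DURATION}(?:,{DURATION})*"

    print(bool(re.fullmatch(LINE, "250ms,3s,10min")))
    print(bool(re.fullmatch(LINE, "250ms,,3s")))
    ```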

    It seems you haven't found a sensible way to work with them?
    (And I'm really astonished about that since I know you worked with
    Regexps for years if not decades.)

    In those cases where Regexps *are* the tool for a specific task -
    I don't expect you to use them where they are inappropriate?! -
    what would be the better solution[***] then?

    Janis

    [*] Like the corresponding FSMs.

    [**] And you can also decompose them if they are merged in a huge
    expression, too large for you to grasp it. (BTW, I'm doing such
    decompositions also with other expressions in program code that
    are too bulky.)

    [***] Can you answer the question that another poster failed to do?

  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Wed Nov 20 16:38:15 2024
    From Newsgroup: comp.lang.misc

    On Wed, 20 Nov 2024 16:38:24 +0100
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
    On 20.11.2024 12:30, Muttley@DastartdlyHQ.org wrote:
    Anything that can be done in regex can obviously also be done procedurally.
    At the point regex expressions become unwieldy - usually when substitution
    variables raise their heads - I prefer procedural code as it's also often
    easier to debug.

    You haven't even tried to honestly answer my (serious) question.

    You mean you can't figure out how to do something like string search and replace
    procedurally? I'm not going to show you, ask a kid who knows Python or Basic.

    With your statement above and your hostility below, it rather seems

    If you think my reply was hostile then I suggest you go find a safe space
    and cuddle your teddy bear snowflake.

  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Wed Nov 20 17:50:13 2024
    From Newsgroup: comp.lang.misc

    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    [...]

    Personally I think that writing bulky procedural stuff for something
    like [0-9]+ can only be much worse, and that further abbreviations
    like \d+ are the better direction to go if targeting a good interface.
    YMMV.

    Assuming that p is a pointer to the current position in a string, e is a
    pointer to the end of it (i.e., points just past the last byte) and -
    that's important - both are pointers to unsigned quantities, the 'bulky'
    C equivalent of [0-9]+ is

    while (p < e && *p - '0' < 10) ++p;

    That's not too bad. And it's really a hell of a lot faster than a
    general-purpose automaton programmed to recognize the same pattern
    (which might not matter most of the time, but sometimes, it does).
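    For comparison, a minimal procedural sketch of the same scan in Python (not a drop-in for the C version; ASCII digits only):

    ```python
    def scan_digits(s, i=0):
        """Advance i past a run of ASCII digits; return the new index."""
        n = len(s)
        while i < n and '0' <= s[i] <= '9':
            i += 1
        return i

    print(scan_digits("123abc"))  # 3
    print(scan_digits("abc"))     # 0 (no digits consumed)
    ```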
  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Wed Nov 20 17:54:22 2024
    From Newsgroup: comp.lang.misc

    Muttley@DastartdlyHQ.org writes:

    [...]

    With your statement above and your hostility below, it rather seems

    If you think my reply was hostile then I suggest you go find a safe space
    and cuddle your teddy bear snowflake.

    There's surely no reason why anyone could ever think you were inclined
    to substitute verbal aggression for arguments.
  • From John Ames@commodorejohn@gmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Wed Nov 20 10:03:47 2024
    From Newsgroup: comp.lang.misc

    On Wed, 20 Nov 2024 17:54:22 +0000
    Rainer Weikusat <rweikusat@talktalk.net> wrote:

    There's surely no reason why anyone could ever think you were inclined
    to substitute verbal aggression for arguments.

    I mean, it's his whole thing - why would he stop now?

  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Wed Nov 20 21:43:41 2024
    From Newsgroup: comp.lang.misc

    On Wed, 20 Nov 2024 12:27:54 -0000 (UTC), Muttley wrote:

    Edge cases are regex's Achilles heel, e.g. an expression that only accounted
    for 1 -> N chars, not 0 -> N, or matches in the middle but not at the
    ends.

    That’s what “^” and “$” are for.
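    In Python terms, the difference the anchors make (a minimal sketch):

    ```python
    import re

    s = "x123y"
    print(bool(re.search(r"[0-9]{3}", s)))       # unanchored: matches in the middle
    print(bool(re.search(r"^[0-9]{3}$", s)))     # anchored: trailing junk rejected
    print(bool(re.search(r"^[0-9]{3}$", "123"))) # anchored: whole string matches
    ```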
  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Thu Nov 21 08:13:39 2024
    From Newsgroup: comp.lang.misc

    On Wed, 20 Nov 2024 17:54:22 +0000
    Rainer Weikusat <rweikusat@talktalk.net> boring babbled:
    Muttley@DastartdlyHQ.org writes:

    [...]

    With your statement above and your hostility below, it rather seems

    If you think my reply was hostile then I suggest you go find a safe space
    and cuddle your teddy bear snowflake.

    There's surely no reason why anyone could ever think you were inclined
    to substitute verbal aggression for arguments.

    I have zero time for anyone who claims hurt feelings or being slighted as
    soon as they're losing an argument.

  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Thu Nov 21 08:15:41 2024
    From Newsgroup: comp.lang.misc

    On Wed, 20 Nov 2024 21:43:41 -0000 (UTC)
    Lawrence D'Oliveiro <ldo@nz.invalid> boring babbled:
    On Wed, 20 Nov 2024 12:27:54 -0000 (UTC), Muttley wrote:

    Edge cases are regex's Achilles heel, e.g. an expression that only accounted
    for 1 -> N chars, not 0 -> N, or matches in the middle but not at the
    ends.

    That’s what “^” and “$” are for.

    Yes, but people forget about those (literal) edge cases.

  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Thu Nov 21 08:18:06 2024
    From Newsgroup: comp.lang.misc

    On Wed, 20 Nov 2024 10:03:47 -0800
    John Ames <commodorejohn@gmail.com> boring babbled:
    On Wed, 20 Nov 2024 17:54:22 +0000
    Rainer Weikusat <rweikusat@talktalk.net> wrote:

    There's surely no reason why anyone could ever think you were inclined
    to substitute verbal aggression for arguments.

    I mean, it's his whole thing - why would he stop now?

    Whats it like being so wet? Do you get cold easily?

  • From merlyn@merlyn@stonehenge.com (Randal L. Schwartz) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Thu Nov 21 05:38:45 2024
    From Newsgroup: comp.lang.misc

    "Rainer" == Rainer Weikusat <rweikusat@talktalk.net> writes:

    Rainer> ¹ I used to use a JSON parser written in OO-Perl which made
    Rainer> extensive use of regexes for that. I've recently replaced that
    Rainer> with a C/XS version which - while slightly larger (617 vs 410
    Rainer> lines of text) - is over a hundred times faster and conceptually
    Rainer> simpler at the same time.

    I wonder if that was my famous "JSON parser in a single regex" from https://www.perlmonks.org/?node_id=995856, or from one of the two CPAN
    modules that incorporated it.
    --
    Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
    <merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
    Perl/Dart/Flutter consulting, Technical writing, Comedy, etc. etc.
    Still trying to think of something clever for the fourth line of this .sig
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Thu Nov 21 14:13:37 2024
    From Newsgroup: comp.lang.misc

    In article <20241120100347.00005f10@gmail.com>,
    John Ames <commodorejohn@gmail.com> wrote:
    On Wed, 20 Nov 2024 17:54:22 +0000
    Rainer Weikusat <rweikusat@talktalk.net> wrote:

    There's surely no reason why anyone could ever think you were inclined
    to substitute verbal aggression for arguments.

    I mean, it's his whole thing - why would he stop now?

    This is the guy who didn't know what a compiler is, right?

    - Dan C.

  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Thu Nov 21 14:40:18 2024
    From Newsgroup: comp.lang.misc

    In article <875xohbxre.fsf@doppelsaurus.mobileactivedefense.com>,
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    [...]

    Personally I think that writing bulky procedural stuff for something
    like [0-9]+ can only be much worse, and that further abbreviations
    like \d+ are the better direction to go if targeting a good interface.
    YMMV.

    Assuming that p is a pointer to the current position in a string, e is a
    pointer to the end of it (i.e., points just past the last byte) and -
    that's important - both are pointers to unsigned quantities, the 'bulky'
    C equivalent of [0-9]+ is

    while (p < e && *p - '0' < 10) ++p;

    That's not too bad. And it's really a hell lot faster than a
    general-purpose automaton programmed to recognize the same pattern
    (which might not matter most of the time, but sometimes, it does).

    It's also not exactly right. `[0-9]+` would match one or more
    characters; this possibly matches 0 (ie, if `p` pointed to
    something that wasn't a digit).

    - Dan C.

  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Thu Nov 21 15:07:42 2024
    From Newsgroup: comp.lang.misc

    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    [...]

    Personally I think that writing bulky procedural stuff for something
    like [0-9]+ can only be much worse, and that further abbreviations
    like \d+ are the better direction to go if targeting a good interface.
    YMMV.

    Assuming that p is a pointer to the current position in a string, e is a
    pointer to the end of it (i.e., points just past the last byte) and -
    that's important - both are pointers to unsigned quantities, the 'bulky'
    C equivalent of [0-9]+ is

    while (p < e && *p - '0' < 10) ++p;

    That's not too bad. And it's really a hell of a lot faster than a
    general-purpose automaton programmed to recognize the same pattern
    (which might not matter most of the time, but sometimes, it does).

    It's also not exactly right. `[0-9]+` would match one or more
    characters; this possibly matches 0 (ie, if `p` pointed to
    something that wasn't a digit).

    The regex won't match any digits if there aren't any. In this case, the
    match will fail. I didn't include the code for handling that because it
    seemed pretty pointless for the example.
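    A sketch of that missing guard in Python, making the procedural scan equivalent to [0-9]+ (at least one digit, or the match fails):

    ```python
    def match_digits_plus(s, i=0):
        """Procedural [0-9]+ at position i: at least one digit required."""
        j = i
        while j < len(s) and '0' <= s[j] <= '9':
            j += 1
        return j if j > i else None   # None signals a failed match

    print(match_digits_plus("123abc"))  # 3
    print(match_digits_plus("abc"))     # None
    ```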
  • From John Ames@commodorejohn@gmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Thu Nov 21 07:56:48 2024
    From Newsgroup: comp.lang.misc

    On Thu, 21 Nov 2024 08:18:06 -0000 (UTC)
    Muttley@DastartdlyHQ.org wrote:

    What's it like being so wet? Do you get cold easily?

    No, I have a soft gray hoodie with a nice fleecy lining that I quite
    like. It's very warm for not being too heavy.

    Also: *huh?*

  • From John Ames@commodorejohn@gmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Thu Nov 21 07:58:06 2024
    From Newsgroup: comp.lang.misc

    On Thu, 21 Nov 2024 08:13:39 -0000 (UTC)
    Muttley@DastartdlyHQ.org wrote:

    I have zero time

    I approve, it's a wonderful album!

  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Thu Nov 21 16:06:01 2024
    From Newsgroup: comp.lang.misc

    On Thu, 21 Nov 2024 14:13:37 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
    In article <20241120100347.00005f10@gmail.com>,
    John Ames <commodorejohn@gmail.com> wrote:
    On Wed, 20 Nov 2024 17:54:22 +0000
    Rainer Weikusat <rweikusat@talktalk.net> wrote:

    There's surely no reason why anyone could ever think you were inclined
    to substitute verbal aggression for arguments.

    I mean, it's his whole thing - why would he stop now?

    This is the guy who didn't know what a compiler is, right?

    Wrong. Want another go?

  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Thu Nov 21 17:01:48 2024
    From Newsgroup: comp.lang.misc

    merlyn@stonehenge.com (Randal L. Schwartz) writes:
    "Rainer" == Rainer Weikusat <rweikusat@talktalk.net> writes:
    Rainer> ¹ I used to use a JSON parser written in OO-Perl which made
    Rainer> extensive use of regexes for that. I've recently replaced that
    Rainer> with a C/XS version which - while slightly larger (617 vs 410
    Rainer> lines of text) - is over a hundred times faster and conceptually
    Rainer> simpler at the same time.

    I wonder if that was my famous "JSON parser in a single regex" from https://www.perlmonks.org/?node_id=995856, or from one of the two CPAN modules that incorporated it.

    No. One of my use-cases is an interactive shell running in a web browser
    using ActionCable messages to relay data between the browser and the
    shell process on the computer supposed to be accessed in this way. For
    this, I absolutely do need \u escapes. I also need this to be fast. Eg,
    one of the nice properties of JSON is that the type of a value can be
    determined by looking at its first character. This cries for an
    implementation based on an array of pointers to 'value parsing routines'
    of size 256, determining the parser routine by using the first
    character as index into this table (which will either yield a pointer to
    the correct parser routine or NULL for a syntax error).
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Kaz Kylheku@643-408-1753@kylheku.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Thu Nov 21 19:12:03 2024
    From Newsgroup: comp.lang.misc

    On 2024-11-20, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    On 20.11.2024 09:21, Muttley@DastartdlyHQ.org wrote:
    Regex itself is clumsy beyond simple search and replace patterns. A lot
    of stuff I've seen done in regex would have been better done procedurally
    at the expense of slightly more code but a LOT more readability. Also,
    given it's effectively a compact language with its own grammar and
    syntax, IMO it should not be the core part of any language as it can
    lead to a syntactic mess, which is what often happens with Perl.

    I wouldn't look at it that way. I've seen Regexps as part of languages
    usually in well defined syntactical contexts. For example, like strings
    are enclosed in "...", Regexps could be seen within /.../ delimiters.
    GNU Awk (in recent versions) went towards first class "strongly typed"
    Regexps which are then denoted by the @/.../ syntax.

    These features solve the problem of regexes being stored as character
    strings not being recognized by the language compiler and then having
    to be compiled at run-time.

    They don't solve all the ergonomics of regexes that Muttley is talking
    about.

    I'm curious what you mean by Regexps presented in a "procedural" form.
    Can you give some examples?

    Here is an example: using a regex match to capture a C comment /* ... */
    in Lex compared to just recognizing the start sequence /* and handling
    the discarding of the comment in the action.

    Without non-greedy repetition matching, the regex for a C comment is
    quite obtuse. The procedural handling is straightforward: read
    characters until you see a * immediately followed by a /.

    In the wild, you see regexes being used for all sorts of stupid stuff,
    like checking whether numeric input is in a certain range, rather than
    converting it to a number and doing an arithmetic check.
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Thu Nov 21 22:05:13 2024
    From Newsgroup: comp.lang.misc

    On Thu, 21 Nov 2024 08:15:41 -0000 (UTC), Muttley wrote:

    On Wed, 20 Nov 2024 21:43:41 -0000 (UTC)
    Lawrence D'Oliveiro <ldo@nz.invalid> boring babbled:

    On Wed, 20 Nov 2024 12:27:54 -0000 (UTC), Muttley wrote:

    Edge cases are regexes' Achilles heel, eg an expression that only
    accounted for 1 -> N chars, not 0 -> N, or matches in the middle but
    not at the ends.

    That’s what “^” and “$” are for.

    Yes, but people forget about those (literal) edge cases.

    Those of us who are accustomed to using regexes do not.

    Another handy one is “\b” for word boundaries.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 10:09:48 2024
    From Newsgroup: comp.lang.misc

    On Thu, 21 Nov 2024 19:12:03 -0000 (UTC)
    Kaz Kylheku <643-408-1753@kylheku.com> boring babbled:
    On 2024-11-20, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    I'm curious what you mean by Regexps presented in a "procedural" form.
    Can you give some examples?

    Here is an example: using a regex match to capture a C comment /* ... */
    in Lex compared to just recognizing the start sequence /* and handling
    the discarding of the comment in the action.

    Without non-greedy repetition matching, the regex for a C comment is
    quite obtuse. The procedural handling is straightforward: read
    characters until you see a * immediately followed by a /.

    It's not that simple I'm afraid, since comments can themselves be commented out.

    eg:

    // int i; /*
    int j;
    /*
    int k;
    */
    ++j;

    A C99 and C++ compiler would see "int j" and compile it, a regex would
    simply remove everything from the first /* to */.

    Also the same probably applies to #ifdef's.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 12:14:32 2024
    From Newsgroup: comp.lang.misc

    On 20.11.2024 18:50, Rainer Weikusat wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    Personally I think that writing bulky procedural stuff for something
    like [0-9]+ can only be much worse, and that further abbreviations
    like \d+ are the better direction to go if targeting a good interface.
    YMMV.

    Assuming that p is a pointer to the current position in a string, e is
    a pointer to the end of it (ie, points just past the last byte) and -
    that's important - both are pointers to unsigned quantities, the 'bulky'
    C equivalent of [0-9]+ is

    while (p < e && *p - '0' < 10) ++p;

    That's not too bad. And it's really a hell lot faster than a
    general-purpose automaton programmed to recognize the same pattern
    (which might not matter most of the time, but sometimes, it does).

    Okay, I see where you're coming from (and especially in that simple
    case).

    Personally (and YMMV), even here in this simple case I think that
    using pointers is not better but worse - and anyway isn't [in this
    form] available in most languages; in other cases (and languages)
    such constructs get yet more clumsy, and for my not very complex
    example - /[0-9]+(ABC)?x*foo/ - even a "catastrophe" concerning
    readability, error-proneness, and maintainability.

    If that is what the other poster meant I'm fine with your answer;
    there's no need to even consider abandoning regular expressions
    in favor of explicitly codified parsing.

    Janis

    PS: And thanks for answering on behalf of the other poster whom I
    see in his followups just continuing his very personal style.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 12:17:56 2024
    From Newsgroup: comp.lang.misc

    On 21.11.2024 20:12, Kaz Kylheku wrote:
    [...]

    In the wild, you see regexes being used for all sorts of stupid stuff,

    No one can prevent folks using features for stupid things. Yes.

    like checking whether numeric input is in a certain range, rather than
    converting it to a number and doing an arithmetic check.

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 12:47:16 2024
    From Newsgroup: comp.lang.misc

    On 21.11.2024 23:05, Lawrence D'Oliveiro wrote:
    On Thu, 21 Nov 2024 08:15:41 -0000 (UTC), Muttley wrote:
    On Wed, 20 Nov 2024 21:43:41 -0000 (UTC)
    Lawrence D'Oliveiro <ldo@nz.invalid> boring babbled:
    [...]

    That’s what “^” and “$” are for.

    Yes, but people forget about those (literal) edge cases.

    But those are *only* _literally_ "edge cases". Rather, they're simple
    basics of regexp parsers since their beginning.

    Those of us who are accustomed to using regexes do not.

    It's one of the first things that regexp newbies learn,
    I'd say.


    Another handy one is “\b” for word boundaries.

    I prefer \< and \> (that are quite commonly used) for such
    structural things, also \( and \) for allowing references
    to matched parts. And I prefer the \alpha regexp pattern
    extension forms for things like \d \D \w \W \s \S . (But
    that's not only a matter of taste but also a question of
    what any regexp parser actually supports.)

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 11:56:26 2024
    From Newsgroup: comp.lang.misc

    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
    On 20.11.2024 18:50, Rainer Weikusat wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    Personally I think that writing bulky procedural stuff for something
    like [0-9]+ can only be much worse, and that further abbreviations
    like \d+ are the better direction to go if targeting a good interface.
    YMMV.

    Assuming that p is a pointer to the current position in a string, e is a
    pointer to the end of it (ie, point just past the last byte) and -
    that's important - both are pointers to unsigned quantities, the 'bulky'
    C equivalent of [0-9]+ is

    while (p < e && *p - '0' < 10) ++p;

    That's not too bad. And it's really a hell lot faster than a
    general-purpose automaton programmed to recognize the same pattern
    (which might not matter most of the time, but sometimes, it does).

    Okay, I see where you're coming from (and especially in that simple
    case).

    Personally (and YMMV), even here in this simple case I think that
    using pointers is not better but worse - and anyway isn't [in this
    form] available in most languages;

    That's a question of using the proper tool for the job. In C, that's
    pointers and pointer arithmetic, because it's the simplest way to
    express something like this.

    in other cases (and languages)
    such constructs get yet more clumsy, and for my not very complex
    example - /[0-9]+(ABC)?x*foo/ - even a "catastrophe" concerning
    readability, error-proneness, and maintainability.

    Procedural code for matching strings constructed in this way is
    certainly much simpler¹ than the equally procedural code for a
    programmable automaton capable of interpreting regexes. Your statement
    is basically "If we assume that the code interpreting regexes doesn't
    exist, regexes need much less code than something equivalent which does
    exist." Without this assumption, the picture becomes a different one
    altogether.

    ¹ This doesn't even need a real state machine, just four subroutines
    executed in succession (and two of these can share an implementation,
    as "matching ABC" and "matching foo" are both cases of matching a
    constant string).

    If that is what the other poster meant I'm fine with your answer;
    there's no need to even consider abandoning regular expressions
    in favor of explicitly codified parsing.

    This depends on the specific problem and the constraints applicable to
    a solution. For the common case, regexes, if easily available, are an
    obvious good solution. But not all cases are common.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 13:30:34 2024
    From Newsgroup: comp.lang.misc

    In article <874j40sk01.fsf@doppelsaurus.mobileactivedefense.com>,
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    [...]

    Personally I think that writing bulky procedural stuff for something
    like [0-9]+ can only be much worse, and that further abbreviations
    like \d+ are the better direction to go if targeting a good interface.
    YMMV.

    Assuming that p is a pointer to the current position in a string, e is
    a pointer to the end of it (ie, points just past the last byte) and -
    that's important - both are pointers to unsigned quantities, the 'bulky'
    C equivalent of [0-9]+ is

    while (p < e && *p - '0' < 10) ++p;

    That's not too bad. And it's really a hell of a lot faster than a
    general-purpose automaton programmed to recognize the same pattern
    (which might not matter most of the time, but sometimes, it does).

    It's also not exactly right. `[0-9]+` would match one or more
    characters; this possibly matches 0 (ie, if `p` pointed to
    something that wasn't a digit).

    The regex won't match any digits if there aren't any. In this case, the
    match will fail. I didn't include the code for handling that because it
    seemed pretty pointless for the example.

    That's rather the point though, isn't it? The program snippet
    (modulo the promotion to signed int via the "usual arithmetic
    conversions" before the subtraction and comparison giving you
    unexpected values; nothing to do with whether `char` is signed
    or not) is a snippet that advances a pointer while it points to
    a digit, starting at the current pointer position; that is, it
    just increments a pointer over a run of digits.

    But that's not the same as a regex matcher, which has a semantic
    notion of success or failure. I could run your snippet against
    a string such as, say, "ZZZZZZ" and it would "succeed" just as
    it would against an empty string or a string of one or more
    digits. And then there are other matters of context; does the
    user intend for the regexp to match the _whole_ string? Or any
    portion of the string (a la `grep`)? So, for example, does the
    string "aaa1234aaa" match `[0-9]+`? As written, the above
    snippet is actually closer to advancing `p` over `^[0-9]*`. One
    might differentiate between `*` and `+` after the fact, by
    examining `p` against some (presumably saved) source value, but
    that's more code.

    These are just not equivalent. That's not to say that your
    snippet is not _useful_ in context, but to pretend that it's the
    same as the regular expression is pointlessly reductive.

    By the way, something that _would_ match `^[0-9]+$` might be:

    term% cat mdp.c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static bool
    mdigit(unsigned int c)
    {
        return c - '0' < 10;
    }

    bool
    mdp(const char *str, const char *estr)
    {
        if (str == NULL || estr == NULL || str == estr)
            return false;
        if (!mdigit(*str))
            return false;
        while (str < estr && mdigit(*str))
            str++;
        return str == estr;
    }

    bool
    probe(const char *s, bool expected)
    {
        if (mdp(s, s + strlen(s)) != expected) {
            fprintf(stderr, "test failure: `%s` (expected %s)\n",
                s, expected ? "true" : "false");
            return false;
        }
        return true;
    }

    int
    main(void)
    {
        bool success = true;

        success = probe("1234", true) && success;
        success = probe("", false) && success;
        success = probe("ab", false) && success;
        success = probe("0", true) && success;
        success = probe("0123456789", true) && success;
        success = probe("a0123456", false) && success;
        success = probe("0123456b", false) && success;
        success = probe("0123c456", false) && success;
        success = probe("0123#456", false) && success;

        return success ? EXIT_SUCCESS : EXIT_FAILURE;
    }
    term% cc -Wall -Wextra -Werror -pedantic -std=c11 mdp.c -o mdp
    term% ./mdp
    term% echo $?
    0
    term%

    Granted the test scaffolding and `#include` boilerplate makes
    this appear rather longer than it would be in context, but it's
    still not nearly as succinct.

    - Dan C.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 15:41:09 2024
    From Newsgroup: comp.lang.misc

    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    [...]

    Personally I think that writing bulky procedural stuff for something
    like [0-9]+ can only be much worse, and that further abbreviations
    like \d+ are the better direction to go if targeting a good interface.
    YMMV.

    Assuming that p is a pointer to the current position in a string, e is
    a pointer to the end of it (ie, points just past the last byte) and -
    that's important - both are pointers to unsigned quantities, the 'bulky'
    C equivalent of [0-9]+ is

    while (p < e && *p - '0' < 10) ++p;

    That's not too bad. And it's really a hell of a lot faster than a
    general-purpose automaton programmed to recognize the same pattern
    (which might not matter most of the time, but sometimes, it does).

    It's also not exactly right. `[0-9]+` would match one or more
    characters; this possibly matches 0 (ie, if `p` pointed to
    something that wasn't a digit).

    The regex won't match any digits if there aren't any. In this case, the
    match will fail. I didn't include the code for handling that because it
    seemed pretty pointless for the example.

    That's rather the point though, isn't it? The program snippet
    (modulo the promotion to signed int via the "usual arithmetic
    conversions" before the subtraction and comparison giving you
    unexpected values; nothing to do with whether `char` is signed
    or not) is a snippet that advances a pointer while it points to
    a digit, starting at the current pointer position; that is, it
    just increments a pointer over a run of digits.

    That's the core part of matching something equivalent to the regex [0-9]+
    and the only part of it which is at least remotely interesting.

    But that's not the same as a regex matcher, which has a semantic
    notion of success or failure. I could run your snippet against
    a string such as, say, "ZZZZZZ" and it would "succeed" just as
    it would against an empty string or a string of one or more
    digits.

    Why do you believe that p being equivalent to the starting position
    would be considered a "successful match", considering that this
    obviously doesn't make any sense?

    [...]

    By the way, something that _would_ match `^[0-9]+$` might be:

    [too much code]

    Something which would match [0-9]+ in its first argument (if any) would
    be:

    #include "string.h"
    #include "stdlib.h"

    int main(int argc, char **argv)
    {
        char *p;
        unsigned c;

        p = argv[1];
        if (!p) exit(1);
        while (c = *p, c && c - '0' > 10) ++p;
        if (!c) exit(1);
        return 0;
    }

    but that's 14 lines of text, 13 of which have absolutely no relation to
    the problem of recognizing a digit.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 15:52:41 2024
    From Newsgroup: comp.lang.misc

    Rainer Weikusat <rweikusat@talktalk.net> writes:

    [...]


    Something which would match [0-9]+ in its first argument (if any) would
    be:

    #include "string.h"
    #include "stdlib.h"

    int main(int argc, char **argv)
    {
    char *p;
    unsigned c;

    p = argv[1];
    if (!p) exit(1);
    while (c = *p, c && c - '0' > 10) ++p;

    This needs to be

    while (c = *p, c && c - '0' > 9) ++p
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 17:17:46 2024
    From Newsgroup: comp.lang.misc

    In article <877c8vtgx6.fsf@doppelsaurus.mobileactivedefense.com>,
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    [snip]
    It's also not exactly right. `[0-9]+` would match one or more
    characters; this possibly matches 0 (ie, if `p` pointed to
    something that wasn't a digit).

    The regex won't match any digits if there aren't any. In this case, the
    match will fail. I didn't include the code for handling that because it
    seemed pretty pointless for the example.

    That's rather the point though, isn't it? The program snippet
    (modulo the promotion to signed int via the "usual arithmetic
    conversions" before the subtraction and comparison giving you
    unexpected values; nothing to do with whether `char` is signed
    or not) is a snippet that advances a pointer while it points to
    a digit, starting at the current pointer position; that is, it
    just increments a pointer over a run of digits.

    That's the core part of matching something equivalent to the regex [0-9]+
    and the only part of it which is at least remotely interesting.

    Not really, no. The interesting thing in this case appears to
    be knowing whether or not the match succeeded, but you omitted
    that part.

    But that's not the same as a regex matcher, which has a semantic
    notion of success or failure. I could run your snippet against
    a string such as, say, "ZZZZZZ" and it would "succeed" just as
    it would against an empty string or a string of one or more
    digits.

    Why do you believe that p being equivalent to the starting position
    would be considered a "successful match", considering that this
    obviously doesn't make any sense?

    Because absent any surrounding context, there's no indication
    that the source is even saved. You'll note that I did mention
    that as a means to differentiate later on, but that's not the
    snippet you posted.

    [...]

    By the way, something that _would_ match `^[0-9]+$` might be:

    [too much code]

    Something which would match [0-9]+ in its first argument (if any) would
    be:

    #include "string.h"
    #include "stdlib.h"

    int main(int argc, char **argv)
    {
        char *p;
        unsigned c;

        p = argv[1];
        if (!p) exit(1);
        while (c = *p, c && c - '0' > 10) ++p;
        if (!c) exit(1);
        return 0;
    }

    but that's 14 lines of text, 13 of which have absolutely no relation to
    the problem of recognizing a digit.

    This is wrong in many ways. Did you actually test that program?

    First of all, why `"string.h"` and not `<string.h>`? Ok, that's
    not technically an error, but it's certainly unconventional, and
    raises questions that are ultimately a distraction.

    Second, suppose that `argc==0` (yes, this can happen under
    POSIX).

    Third, the loop: why `> 10`? Don't you mean `< 10`? You are
    trying to match digits, not non-digits.

    Fourth, you exit with failure (`exit(1)`) if `!p` *and* if `!c`
    at the end, but `!c` there means you've reached the end of the
    string; which should be success.

    Fifth and finally, you `return 0;` which is EXIT_SUCCESS, in the
    failure case.

    Compare:

    #include <regex.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(int argc, char *argv[])
    {
        regex_t reprog;
        int ret;

        if (argc != 2) {
            fprintf(stderr, "Usage: regexp pattern\n");
            return(EXIT_FAILURE);
        }
        (void)regcomp(&reprog, "^[0-9]+$", REG_EXTENDED | REG_NOSUB);
        ret = regexec(&reprog, argv[1], 0, NULL, 0);
        regfree(&reprog);

        return ret == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
    }

    This is only marginally longer, but is correct.

    - Dan C.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 17:18:26 2024
    From Newsgroup: comp.lang.misc

    In article <87zflrs1ti.fsf@doppelsaurus.mobileactivedefense.com>,
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    Rainer Weikusat <rweikusat@talktalk.net> writes:

    [...]


    Something which would match [0-9]+ in its first argument (if any) would
    be:

    #include "string.h"
    #include "stdlib.h"

    int main(int argc, char **argv)
    {
    char *p;
    unsigned c;

    p = argv[1];
    if (!p) exit(1);
    while (c = *p, c && c - '0' > 10) ++p;

    This needs to be

    while (c = *p, c && c - '0' > 9) ++p

    No, that's still wrong. Try actually running it.

    - Dan C.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 17:35:29 2024
    From Newsgroup: comp.lang.misc

    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <87zflrs1ti.fsf@doppelsaurus.mobileactivedefense.com>,
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    Rainer Weikusat <rweikusat@talktalk.net> writes:

    [...]


    Something which would match [0-9]+ in its first argument (if any) would
    be:

    #include "string.h"
    #include "stdlib.h"

    int main(int argc, char **argv)
    {
    char *p;
    unsigned c;

    p = argv[1];
    if (!p) exit(1);
    while (c = *p, c && c - '0' > 10) ++p;

    This needs to be

    while (c = *p, c && c - '0' > 9) ++p

    No, that's still wrong. Try actually running it.

    If you know something that's wrong with that, why not write it instead
    of utilizing the claim for pointless (and wrong) snide remarks?
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 17:43:24 2024
    From Newsgroup: comp.lang.misc

    In article <87v7wfrx26.fsf@doppelsaurus.mobileactivedefense.com>,
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <87zflrs1ti.fsf@doppelsaurus.mobileactivedefense.com>,
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    Rainer Weikusat <rweikusat@talktalk.net> writes:

    [...]


    Something which would match [0-9]+ in its first argument (if any) would >>>> be:

    #include "string.h"
    #include "stdlib.h"

    int main(int argc, char **argv)
    {
    char *p;
    unsigned c;

    p = argv[1];
    if (!p) exit(1);
    while (c = *p, c && c - '0' > 10) ++p;

    This needs to be

    while (c = *p, c && c - '0' > 9) ++p

    No, that's still wrong. Try actually running it.

    If you know something that's wrong with that, why not write it instead
    of utilizing the claim for pointless (and wrong) snide remarks?

    I did, at length, in my other post.

    - Dan C.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 17:43:59 2024
    From Newsgroup: comp.lang.misc

    In article <vhqfrs$bit$1@reader2.panix.com>,
    Dan Cross <cross@spitfire.i.gajendra.net> wrote:
    In article <87v7wfrx26.fsf@doppelsaurus.mobileactivedefense.com>,
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <87zflrs1ti.fsf@doppelsaurus.mobileactivedefense.com>,
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    Rainer Weikusat <rweikusat@talktalk.net> writes:

    [...]


    Something which would match [0-9]+ in its first argument (if any) would >>>>> be:

    #include "string.h"
    #include "stdlib.h"

    int main(int argc, char **argv)
    {
    char *p;
    unsigned c;

    p = argv[1];
    if (!p) exit(1);
    while (c = *p, c && c - '0' > 10) ++p;

    This needs to be

    while (c = *p, c && c - '0' > 9) ++p

    No, that's still wrong. Try actually running it.

    If you know something that's wrong with that, why not write it instead
    of utilizing the claim for pointless (and wrong) snide remarks?

    I did, at length, in my other post.

    Cf. <vhqebq$c71$1@reader2.panix.com>

    - Dan C.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 17:48:37 2024
    From Newsgroup: comp.lang.misc

    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    [snip]
    It's also not exactly right. `[0-9]+` would match one or more
    characters; this possibly matches 0 (ie, if `p` pointed to
    something that wasn't a digit).

    The regex won't match any digits if there aren't any. In this case, the
    match will fail. I didn't include the code for handling that because it
    seemed pretty pointless for the example.

    That's rather the point though, isn't it? The program snippet
    (modulo the promotion to signed int via the "usual arithmetic
    conversions" before the subtraction and comparison giving you
    unexpected values; nothing to do with whether `char` is signed
    or not) is a snippet that advances a pointer while it points to
    a digit, starting at the current pointer position; that is, it
    just increments a pointer over a run of digits.

    That's the core part of matching something equivalent to the regex [0-9]+
    and the only part of it which is at least remotely interesting.

    Not really, no. The interesting thing in this case appears to
    be knowing whether or not the match succeeded, but you omitted
    that part.

    This is of interest to you as it enables you to base an 'argumentation'
    (sarcasm) on arbitrary assumptions you've chosen to make. It's not
    something I consider interesting, and it's beside the point of the
    example I posted.

    But that's not the same as a regex matcher, which has a semantic
    notion of success or failure. I could run your snippet against
    a string such as, say, "ZZZZZZ" and it would "succeed" just as
    it would against an empty string or a string of one or more
    digits.

    Why do you believe that p being equivalent to the starting position
    would be considered a "successful match", considering that this
    obviously doesn't make any sense?

    Because absent any surrounding context, there's no indication
    that the source is even saved.

    A text usually doesn't contain information about things which aren't
    part of its content. I congratulate you on this rather obvious
    observation.

    [...]

    Something which would match [0-9]+ in its first argument (if any) would
    be:

    #include "string.h"
    #include "stdlib.h"

    int main(int argc, char **argv)
    {
    char *p;
    unsigned c;

    p = argv[1];
    if (!p) exit(1);
    while (c = *p, c && c - '0' > 10) ++p;
    if (!c) exit(1);
    return 0;
    }

    but that's 14 lines of text, 13 of which have absolutely no relation to
    the problem of recognizing a digit.

    This is wrong in many ways. Did you actually test that program?

    First of all, why `"string.h"` and not `<string.h>`? Ok, that's
    not technically an error, but it's certainly unconventional, and
    raises questions that are ultimately a distraction.

    Such as your paragraph above.

    Second, suppose that `argc==0` (yes, this can happen under
    POSIX).

    It can happen in case of some piece of functionally hostile software intentionally creating such a situation. Tangential, irrelevant
    point. If you break it, you get to keep the parts.

    Third, the loop: why `> 10`? Don't you mean `< 10`? You are
    trying to match digits, not non-digits.

    Mistake I made. The opposite of < 10 is > 9.

    Fourth, you exit with failure (`exit(1)`) if `!p` *and* if `!c`
    at the end, but `!c` there means you've reached the end of the
    string; which should be success.

    Mistake you made: [0-9]+ matches if there's at least one digit in the
    string. That's why the loop terminates once one was found. In this case,
    c cannot be 0.
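    The disputed success criterion can be pinned down with a small sketch
    (illustrative code, not from this thread; the function name is made up):
    the pointer-advancing scan becomes a matcher for [0-9]+ by checking
    whether the pointer moved past its starting position.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch, not code from the thread: the scan over a run of
 * digits becomes a matcher for [0-9]+ by comparing the final pointer to
 * the start.  p and e delimit an unsigned character buffer, as in the
 * original snippet; match_digits is a hypothetical name. */
static int match_digits(const unsigned char *p, const unsigned char *e)
{
    const unsigned char *start = p;

    while (p < e && (unsigned)(*p - '0') < 10)
        ++p;
    return p != start;  /* success iff at least one digit was consumed */
}
```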
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 18:12:34 2024
    From Newsgroup: comp.lang.misc

    In article <87o727rwga.fsf@doppelsaurus.mobileactivedefense.com>,
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    Something which would match [0-9]+ in its first argument (if any) would
    be:

    #include "string.h"
    #include "stdlib.h"

    int main(int argc, char **argv)
    {
        char *p;
        unsigned c;

        p = argv[1];
        if (!p) exit(1);
        while (c = *p, c && c - '0' > 10) ++p;
        if (!c) exit(1);
        return 0;
    }

    but that's 14 lines of text, 13 of which have absolutely no relation to
    the problem of recognizing a digit.

    This is wrong in many ways. Did you actually test that program?

    First of all, why `"string.h"` and not `<string.h>`? Ok, that's
    not technically an error, but it's certainly unconventional, and
    raises questions that are ultimately a distraction.

    Such as your paragraph above.

    Second, suppose that `argc==0` (yes, this can happen under
    POSIX).

    It can happen in case of some piece of functionally hostile software
    intentionally creating such a situation. Tangential, irrelevant
    point. If you break it, you get to keep the parts.

    Third, the loop: why `> 10`? Don't you mean `< 10`? You are
    trying to match digits, not non-digits.

    Mistake I made. The opposite of < 10 is > 9.

    I see. So you want to skip non-digits and exit the first time
    you see a digit. Ok, fair enough, though that program has
    already been written, and is called `grep`.

    Fourth, you exit with failure (`exit(1)`) if `!p` *and* if `!c`
    at the end, but `!c` there means you've reached the end of the
    string; which should be success.

    Mistake you made: [0-9]+ matches if there's at least one digit in the
    string. That's why the loop terminates once one was found. In this case,
    c cannot be 0.

    Ah, you are trying to match `[0-9]` (though you're calling it
    `[0-9]+`). Yeah, your program was not at all equivalent to one
    I wrote, though this is what you posted in response to mine, so
    I assumed you were trying to emulate that behavior (matching
    `^[0-9]+$`).

    But I see above that you mentioned `[0-9]+`. But as I mentioned
    above, really you're just matching any digit, so you may as well
    be matching `[0-9]`; again, this is not the same as the actual
    regexp, because you are ignoring the semantics of what regular
    expressions actually describe.

    In any event, this seems simpler than what you posted:

    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(int argc, char *argv[])
    {
        if (argc != 2) {
            fprintf(stderr, "Usage: matchd <str>\n");
            return EXIT_FAILURE;
        }

        for (const char *p = argv[1]; *p != '\0'; p++)
            if ('0' <= *p && *p <= '9')
                return EXIT_SUCCESS;

        return EXIT_FAILURE;
    }

    - Dan C.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 18:14:48 2024
    From Newsgroup: comp.lang.misc

    Rainer Weikusat <rweikusat@talktalk.net> writes:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    [...]

    Personally I think that writing bulky procedural stuff for something
    like [0-9]+ can only be much worse, and that further abbreviations
    like \d+ are the better direction to go if targeting a good interface.
    YMMV.

    Assuming that p is a pointer to the current position in a string, e is a
    pointer to the end of it (ie, point just past the last byte) and -
    that's important - both are pointers to unsigned quantities, the 'bulky'
    C equivalent of [0-9]+ is

    while (p < e && *p - '0' < 10) ++p;

    That's not too bad. And it's really a hell lot faster than a
    general-purpose automaton programmed to recognize the same pattern
    (which might not matter most of the time, but sometimes, it does).

    It's also not exactly right. `[0-9]+` would match one or more
    characters; this possibly matches 0 (ie, if `p` pointed to
    something that wasn't a digit).

    The regex won't match any digits if there aren't any. In this case, the
    match will fail. I didn't include the code for handling that because it
    seemed pretty pointless for the example.

    That's rather the point though, isn't it? The program snippet
    (modulo the promotion to signed int via the "usual arithmetic
    conversions" before the subtraction and comparison giving you
    unexpected values; nothing to do with whether `char` is signed
    or not) is a snippet that advances a pointer while it points to
    a digit, starting at the current pointer position; that is, it
    just increments a pointer over a run of digits.

    That's the core part of matching something equivalent to the regex [0-9]+
    and the only part of it which is at least remotely interesting.

    But that's not the same as a regex matcher, which has a semantic
    notion of success or failure. I could run your snippet against
    a string such as, say, "ZZZZZZ" and it would "succeed" just as
    it would against an empty string or a string of one or more
    digits.

    Why do you believe that p being equivalent to the starting position
    would be considered a "successful match", considering that this
    obviously doesn't make any sense?

    [...]

    By the way, something that _would_ match `^[0-9]+$` might be:

    [too much code]

    Something which would match [0-9]+ in its first argument (if any) would
    be:

    #include "string.h"
    #include "stdlib.h"

    int main(int argc, char **argv)
    {
        char *p;
        unsigned c;

        p = argv[1];
        if (!p) exit(1);
        while (c = *p, c && c - '0' > 10) ++p;
        if (!c) exit(1);
        return 0;
    }

    but that's 14 lines of text, 13 of which have absolutely no relation to
    the problem of recognizing a digit.

    Personally, I'd use:

    $ cat /tmp/a.c
    #include <stdint.h>
    #include <string.h>

    int
    main(int argc, const char **argv)
    {
        char *cp;
        uint64_t value;

        if (argc < 2) return 1;

        value = strtoull(argv[1], &cp, 10);
        if ((cp == argv[1])
            || (*cp != '\0')) {
            return 1;
        }
        return 0;
    }
    $ cc -o /tmp/a /tmp/a.c
    $ /tmp/a 13254
    $ echo $?
    0
    $ /tmp/a 23v23
    $ echo $?
    1
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Kaz Kylheku@643-408-1753@kylheku.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 18:18:04 2024
    From Newsgroup: comp.lang.misc

    On 2024-11-22, Muttley@DastartdlyHQ.org <Muttley@DastartdlyHQ.org> wrote:
    On Thu, 21 Nov 2024 19:12:03 -0000 (UTC)
    Kaz Kylheku <643-408-1753@kylheku.com> boring babbled:
    On 2024-11-20, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    I'm curious what you mean by Regexps presented in a "procedural" form.
    Can you give some examples?

    Here is an example: using a regex match to capture a C comment /* ... */
    in Lex compared to just recognizing the start sequence /* and handling
    the discarding of the comment in the action.

    Without non-greedy repetition matching, the regex for a C comment is
    quite obtuse. The procedural handling is straightforward: read
    characters until you see a * immediately followed by a /.
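    That procedural handling can be sketched in a few lines (illustrative
    code, not from this thread; the function name is made up). For
    comparison, one classical form of the non-greedy-free regex alluded to
    above is often given as /\*([^*]|\*+[^*/])*\*+/ .

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative sketch of the procedural handling described above: after
 * the opening slash-star has been consumed, discard characters until a
 * '*' is immediately followed by a '/'.  Returns 1 if the comment was
 * closed, 0 on EOF inside the comment. */
static int skip_block_comment(FILE *in)
{
    int c, prev = 0;

    while ((c = getc(in)) != EOF) {
        if (prev == '*' && c == '/')
            return 1;
        prev = c;
    }
    return 0;
}
```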

    Its not that simple I'm afraid since comments can be commented out.

    Umm, no.

    eg:

    // int i; /*

    This /* sequence is inside a // comment, and so the machinery that
    recognizes /* as the start of a comment would never see it.

    Just like "int i;" is in a string literal and so not recognized
    as a keyword, whitespace, identifier and semicolon.

    int j;
    /*
    int k;
    */
    ++j;

    A C99 and C++ compiler would see "int j" and compile it, a regex would
    simply remove everything from the first /* to */.

    No, it won't, because that's not how regexes are used in a lexical
    analyzer. At the start of the input, the lexical analyzer faces
    the characters "// int i; /*\n". This will trigger the pattern match
    for // comments. Essentially that entire sequence through the newline
    is treated as a kind of token, equivalent to a space.

    Once a token is recognized and removed from the input, it is gone;
    no other regular expression can match into it.
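    That token-at-a-time behaviour can be sketched as follows (illustrative
    code, hypothetical helper name): once "//" has been matched, the rest of
    the physical line is consumed as one whitespace-like token, so a
    comment-opener inside it can never be seen.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative sketch: called after "//" has been matched, this
 * consumes the remainder of the line as a single token; any
 * comment-opening sequence inside it is just part of the
 * discarded text. */
static void skip_line_comment(FILE *in)
{
    int c;

    while ((c = getc(in)) != EOF && c != '\n')
        ;  /* discard through end of line */
}
```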

    Also the same probably applies to #ifdef's.

    Lexically analyzing C requires implementing the translation phases
    as described in the standard. There are preprocessor phases which
    delimit the input into preprocessor tokens (pp-tokens). Comments
    are stripped in preprocessing. But logical lines (backslash
    continuations) are recognized below comments; i.e. this is one
    comment:

    // comment \
    split \
    into \
    physical \
    lines

    A lexical scanner can have an input routine which transparently handles
    this low-level detail, so that it doesn't have to deal with the
    line continuations in every token pattern.
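    Such an input routine can be sketched as a getc() wrapper that splices
    backslash-newline pairs (illustrative code; lexgetc is a hypothetical
    name):

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative sketch: a getc() wrapper that deletes backslash-newline
 * pairs, so the token patterns above it never see physical line
 * continuations. */
static int lexgetc(FILE *in)
{
    int c, next;

    while ((c = getc(in)) == '\\') {
        next = getc(in);
        if (next == '\n')
            continue;          /* splice: drop the pair, keep reading */
        if (next != EOF)
            ungetc(next, in);  /* lone backslash: pass it through */
        return '\\';
    }
    return c;
}
```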
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Kaz Kylheku@643-408-1753@kylheku.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 18:19:30 2024
    From Newsgroup: comp.lang.misc

    On 2024-11-22, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    On 21.11.2024 20:12, Kaz Kylheku wrote:
    [...]

    In the wild, you see regexes being used for all sorts of stupid stuff,

    No one can prevent folks using features for stupid things. Yes.

    But the thing is that "modern" regular expressions (Perl regex and its
    progeny) have features that are designed to exclusively cater to these
    folks.
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 18:22:45 2024
    From Newsgroup: comp.lang.misc

    scott@slp53.sl.home (Scott Lurndal) writes:
    Rainer Weikusat <rweikusat@talktalk.net> writes:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    [...]

    Personally I think that writing bulky procedural stuff for something
    like [0-9]+ can only be much worse, and that further abbreviations
    like \d+ are the better direction to go if targeting a good interface.
    YMMV.

    Assuming that p is a pointer to the current position in a string, e is a
    pointer to the end of it (ie, point just past the last byte) and -
    that's important - both are pointers to unsigned quantities, the 'bulky'
    C equivalent of [0-9]+ is

    while (p < e && *p - '0' < 10) ++p;

    That's not too bad. And it's really a hell lot faster than a
    general-purpose automaton programmed to recognize the same pattern
    (which might not matter most of the time, but sometimes, it does).

    It's also not exactly right. `[0-9]+` would match one or more
    characters; this possibly matches 0 (ie, if `p` pointed to
    something that wasn't a digit).

    The regex won't match any digits if there aren't any. In this case, the
    match will fail. I didn't include the code for handling that because it
    seemed pretty pointless for the example.

    That's rather the point though, isn't it? The program snippet
    (modulo the promotion to signed int via the "usual arithmetic
    conversions" before the subtraction and comparison giving you
    unexpected values; nothing to do with whether `char` is signed
    or not) is a snippet that advances a pointer while it points to
    a digit, starting at the current pointer position; that is, it
    just increments a pointer over a run of digits.

    That's the core part of matching something equivalent to the regex [0-9]+
    and the only part of it which is at least remotely interesting.

    But that's not the same as a regex matcher, which has a semantic
    notion of success or failure. I could run your snippet against
    a string such as, say, "ZZZZZZ" and it would "succeed" just as
    it would against an empty string or a string of one or more
    digits.

    Why do you believe that p being equivalent to the starting position
    would be considered a "successful match", considering that this
    obviously doesn't make any sense?

    [...]

    By the way, something that _would_ match `^[0-9]+$` might be:

    [too much code]

    Something which would match [0-9]+ in its first argument (if any) would
    be:

    #include "string.h"
    #include "stdlib.h"

    int main(int argc, char **argv)
    {
        char *p;
        unsigned c;

        p = argv[1];
        if (!p) exit(1);
        while (c = *p, c && c - '0' > 10) ++p;
        if (!c) exit(1);
        return 0;
    }

    but that's 14 lines of text, 13 of which have absolutely no relation to
    the problem of recognizing a digit.

    Personally, I'd use:

    Albeit this is limited to strings of digits whose value is less than
    ULLONG_MAX...



    $ cat /tmp/a.c
    #include <stdint.h>
    #include <string.h>

    int
    main(int argc, const char **argv)
    {
        char *cp;
        uint64_t value;

        if (argc < 2) return 1;

        value = strtoull(argv[1], &cp, 10);
        if ((cp == argv[1])
            || (*cp != '\0')) {
            return 1;
        }
        return 0;
    }
    $ cc -o /tmp/a /tmp/a.c
    $ /tmp/a 13254
    $ echo $?
    0
    $ /tmp/a 23v23
    $ echo $?
    1
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 18:30:31 2024
    From Newsgroup: comp.lang.misc

    In article <VZ30P.4664$YSkc.1894@fx40.iad>,
    Scott Lurndal <slp53@pacbell.net> wrote:
    scott@slp53.sl.home (Scott Lurndal) writes:
    Rainer Weikusat <rweikusat@talktalk.net> writes:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    [...]

    Personally I think that writing bulky procedural stuff for something
    like [0-9]+ can only be much worse, and that further abbreviations
    like \d+ are the better direction to go if targeting a good interface.
    YMMV.

    Assuming that p is a pointer to the current position in a string, e is a
    pointer to the end of it (ie, point just past the last byte) and -
    that's important - both are pointers to unsigned quantities, the 'bulky'
    C equivalent of [0-9]+ is

    while (p < e && *p - '0' < 10) ++p;

    That's not too bad. And it's really a hell lot faster than a
    general-purpose automaton programmed to recognize the same pattern
    (which might not matter most of the time, but sometimes, it does).
    It's also not exactly right. `[0-9]+` would match one or more
    characters; this possibly matches 0 (ie, if `p` pointed to
    something that wasn't a digit).

    The regex won't match any digits if there aren't any. In this case, the
    match will fail. I didn't include the code for handling that because it
    seemed pretty pointless for the example.

    That's rather the point though, isn't it? The program snippet
    (modulo the promotion to signed int via the "usual arithmetic
    conversions" before the subtraction and comparison giving you
    unexpected values; nothing to do with whether `char` is signed
    or not) is a snippet that advances a pointer while it points to
    a digit, starting at the current pointer position; that is, it
    just increments a pointer over a run of digits.

    That's the core part of matching something equivalent to the regex [0-9]+
    and the only part of it which is at least remotely interesting.

    But that's not the same as a regex matcher, which has a semantic
    notion of success or failure. I could run your snippet against
    a string such as, say, "ZZZZZZ" and it would "succeed" just as
    it would against an empty string or a string of one or more
    digits.

    Why do you believe that p being equivalent to the starting position
    would be considered a "successful match", considering that this
    obviously doesn't make any sense?

    [...]

    By the way, something that _would_ match `^[0-9]+$` might be:

    [too much code]

    Something which would match [0-9]+ in its first argument (if any) would
    be:

    #include "string.h"
    #include "stdlib.h"

    int main(int argc, char **argv)
    {
        char *p;
        unsigned c;

        p = argv[1];
        if (!p) exit(1);
        while (c = *p, c && c - '0' > 10) ++p;
        if (!c) exit(1);
        return 0;
    }

    but that's 14 lines of text, 13 of which have absolutely no relation to
    the problem of recognizing a digit.

    Personally, I'd use:

    Albeit this is limited to strings of digits whose value is less than
    ULLONG_MAX...

    It's not quite equivalent to his program, which just exits with
    success if it sees any input string with a digit in it; yours
    is closer to what I wrote, which matches `^[0-9]+$`. His is not
    an interesting program and certainly not a recognizable
    equivalent of a regular expression matcher in any reasonable sense,
    but I think the cognitive dissonance is too strong to get that
    across.

    - Dan C.

    $ cat /tmp/a.c
    #include <stdint.h>
    #include <string.h>

    int
    main(int argc, const char **argv)
    {
        char *cp;
        uint64_t value;

        if (argc < 2) return 1;

        value = strtoull(argv[1], &cp, 10);
        if ((cp == argv[1])
            || (*cp != '\0')) {
            return 1;
        }
        return 0;
    }
    $ cc -o /tmp/a /tmp/a.c
    $ /tmp/a 13254
    $ echo $?
    0
    $ /tmp/a 23v23
    $ echo $?
    1


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 18:48:55 2024
    From Newsgroup: comp.lang.misc

    cross@spitfire.i.gajendra.net (Dan Cross) writes:

    [...]

    In any event, this seems simpler than what you posted:

    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(int argc, char *argv[])
    {
        if (argc != 2) {
            fprintf(stderr, "Usage: matchd <str>\n");
            return EXIT_FAILURE;
        }

        for (const char *p = argv[1]; *p != '\0'; p++)
            if ('0' <= *p && *p <= '9')
                return EXIT_SUCCESS;

        return EXIT_FAILURE;
    }

    It's not only 4 lines longer but in just about every individual aspect
    syntactically more complicated and more messy and functionally more
    clumsy. This is particularly noticeable in the loop

    for (const char *p = argv[1]; *p != '\0'; p++)
        if ('0' <= *p && *p <= '9')
            return EXIT_SUCCESS;

    the loop header containing a spuriously qualified variable declaration,
    the loop body and half of the termination condition. The other half then
    follows as a special case in the otherwise useless loop body.

    It looks like a copy of my code with each individual bit redesigned
    under the guiding principle of "Can we make this more complicated?", eg,

    char **argv

    declares an array of pointers (as each pointer in C points to an array)
    and

    char *argv[]

    accomplishes exactly the same but uses both more characters and more
    different kinds of characters.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 18:59:43 2024
    From Newsgroup: comp.lang.misc

    scott@slp53.sl.home (Scott Lurndal) writes:
    Rainer Weikusat <rweikusat@talktalk.net> writes:

    [...]

    Something which would match [0-9]+ in its first argument (if any) would
    be:

    #include "string.h"
    #include "stdlib.h"

    int main(int argc, char **argv)
    {
        char *p;
        unsigned c;

        p = argv[1];
        if (!p) exit(1);
        while (c = *p, c && c - '0' > 10) ++p;
        if (!c) exit(1);
        return 0;
    }

    but that's 14 lines of text, 13 of which have absolutely no relation to
    the problem of recognizing a digit.

    Personally, I'd use:

    $ cat /tmp/a.c
    #include <stdint.h>
    #include <string.h>

    int
    main(int argc, const char **argv)
    {
        char *cp;
        uint64_t value;

        if (argc < 2) return 1;

        value = strtoull(argv[1], &cp, 10);
        if ((cp == argv[1])
            || (*cp != '\0')) {
            return 1;
        }
        return 0;
    }

    This will accept a string of digits whose numerical value is <=
    ULLONG_MAX, ie, it's basically ^[0-9]+$ with unobvious length and
    content limits.

    return !strstr(argv[1], "0123456789");

    would be a better approximation, just a much more complicated algorithm
    than necessary. Even in strictly conforming ISO-C "digitness" of a
    character can be determined by a simple calculation instead of some kind
    of search loop.
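    That "simple calculation" can be spelled out as follows (illustrative
    sketch; the function name is made up). It relies on '0'..'9' being
    contiguous in every conforming C implementation.

```c
#include <assert.h>

/* Illustrative sketch: '0'..'9' are guaranteed contiguous in any
 * conforming C implementation, so digitness is a single unsigned
 * subtraction and comparison; characters below '0' wrap around to
 * large values and compare false. */
static int is_digit_char(unsigned c)
{
    return c - '0' < 10;
}
```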
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 19:05:42 2024
    From Newsgroup: comp.lang.misc

    In article <87h67zrtns.fsf@doppelsaurus.mobileactivedefense.com>,
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:

    [...]

    In any event, this seems simpler than what you posted:

    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(int argc, char *argv[])
    {
        if (argc != 2) {
            fprintf(stderr, "Usage: matchd <str>\n");
            return EXIT_FAILURE;
        }

        for (const char *p = argv[1]; *p != '\0'; p++)
            if ('0' <= *p && *p <= '9')
                return EXIT_SUCCESS;

        return EXIT_FAILURE;
    }

    It's not only 4 lines longer but in just about every individual aspect
    syntactically more complicated and more messy and functionally more
    clumsy.

    That's a lot of opinion, and not particularly well-founded
    opinion at that, given that your code was incorrect to begin
    with.

    This is particularly noticeable in the loop

    for (const char *p = argv[1]; *p != '\0'; p++)
        if ('0' <= *p && *p <= '9')
            return EXIT_SUCCESS;

    the loop header containing a spuriously qualified variable declaration,

    Ibid. Const qualifying a pointer that I'm not going to assign
    through is just good hygiene, IMHO.

    the loop body and half of the termination condition.

    I think you're trying to project a value judgement onto that
    loop in order to make it fit a particular world view, but I
    think this is an odd way to look at it.

    Another way to look at it is that the loop is only concerned
    with the iteration over the string, while the body is concerned
    with applying some predicate to the element, and doing something
    if that predicate evaluates it to true.

    The other half then
    follows as a special case in the otherwise useless loop body.

    That's a way to look at it, but I submit that's an outlier point
    of view.

    It looks like a copy of my code which each individual bit redesigned
    under the guiding principle of "Can we make this more complicated?", eg,

    Uh, no.

    char **argv

    declares an array of pointers

    No, it declares a pointer to a pointer to char.

    (as each pointer in C points to an array)

    That's absolutely not true. A pointer in C may refer to
    an array, or a scalar. Consider,

    char c;
    char *p = &c;
    char **pp = &p;

    For a concrete example of how this works in a real function,
    consider the second argument to `strtol` et al in the standard
    library.

    and

    char *argv[]

    accomplishes exactly the same but uses both more characters and more
    different kinds of characters.

    "more characters" is a poor metric.

    - Dan C.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 19:15:07 2024
    From Newsgroup: comp.lang.misc

    In article <87cyinrt5s.fsf@doppelsaurus.mobileactivedefense.com>,
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    scott@slp53.sl.home (Scott Lurndal) writes:
    Rainer Weikusat <rweikusat@talktalk.net> writes:

    [...]

    Something which would match [0-9]+ in its first argument (if any) would
    be:

    #include "string.h"
    #include "stdlib.h"

    int main(int argc, char **argv)
    {
        char *p;
        unsigned c;

        p = argv[1];
        if (!p) exit(1);
        while (c = *p, c && c - '0' > 10) ++p;
        if (!c) exit(1);
        return 0;
    }

    but that's 14 lines of text, 13 of which have absolutely no relation to
    the problem of recognizing a digit.

    Personally, I'd use:

    $ cat /tmp/a.c
    #include <stdint.h>
    #include <string.h>

    int
    main(int argc, const char **argv)
    {
        char *cp;
        uint64_t value;

        if (argc < 2) return 1;

        value = strtoull(argv[1], &cp, 10);
        if ((cp == argv[1])
            || (*cp != '\0')) {
            return 1;
        }
        return 0;
    }

    This will accept a string of digits whose numerical value is <=
    ULLONG_MAX, ie, it's basically ^[0-9]+$ with unobvious length and
    content limits.

    He acknowledged this already.

    return !strstr(argv[1], "0123456789");

    would be a better approximation,

    No it wouldn't. That's not even close. `strstr` looks for an
    instance of its second argument in its first, not an instance of
    any character in its second argument in its first. Perhaps you
    meant something with `strspn` or similar. E.g.,

    const char *p = argv[1] + strspn(argv[1], "0123456789");
    return *p != '\0';

    just a much more complicated algorithm
    than necessary. Even in strictly conforming ISO-C "digitness" of a
    character can be determined by a simple calculation instead of some kind
    of search loop.

    Yes, one can do that, but why bother?

    - Dan C.
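    As an aside, the two-line strspn version accepts the empty string as
    well (ie, it behaves like ^[0-9]*$); requiring at least one digit takes
    one extra check (illustrative sketch, hypothetical function name):

```c
#include <assert.h>
#include <string.h>

/* Illustrative sketch: strspn-based check equivalent to ^[0-9]+$.
 * Unlike the bare strspn-then-test-*p version, it also rejects the
 * empty string, since + requires at least one digit. */
static int all_digits(const char *s)
{
    size_t n = strspn(s, "0123456789");

    return n > 0 && s[n] == '\0';
}
```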

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 20:20:06 2024
    From Newsgroup: comp.lang.misc

    On 22.11.2024 19:19, Kaz Kylheku wrote:
    On 2024-11-22, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    On 21.11.2024 20:12, Kaz Kylheku wrote:
    [...]

    In the wild, you see regexes being used for all sorts of stupid stuff,

    No one can prevent folks using features for stupid things. Yes.

    But the thing is that "modern" regular expressions (Perl regex and its progeny) have features that are designed to exclusively cater to these
    folks.

    Which ones are you specifically thinking of?

    Since I'm not using Perl I don't know all the Perl RE details. Besides
    the basic REs I'm aware of the abbreviations (like '\d') (that I like),
    then extensions of Chomsky-3 (like back-references) (that I also like
    to have in cases I need them; but one must know what we buy with them),
    then the minimum-match (as opposed to matching the longest substring)
    (which I think is useful to simplify some types of expressions), and
    there was another one that evades my memory, something like context
    dependent patterns (also useful), and wasn't there also some syntax to
    match subexpression hierarchies (useful as well), similar to GNU
    Awk's gensub() (probably in a more primitive variant there), and also
    existing in Kornshell patterns, which also support some more of the
    above Perl features, like the abbreviations.

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 19:24:23 2024
    From Newsgroup: comp.lang.misc

    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:

    [...]

    In any event, this seems simpler than what you posted:

    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(int argc, char *argv[])
    {
        if (argc != 2) {
            fprintf(stderr, "Usage: matchd <str>\n");
            return EXIT_FAILURE;
        }

        for (const char *p = argv[1]; *p != '\0'; p++)
            if ('0' <= *p && *p <= '9')
                return EXIT_SUCCESS;

        return EXIT_FAILURE;
    }

    It's not only 4 lines longer but in just about every individual aspect
    syntactically more complicated and more messy and functionally more
    clumsy.

    That's a lot of opinion, and not particularly well-founded
    opinion at that, given that your code was incorrect to begin
    with.

    That's not at all an opinion but an observation. My opinion on this is
    that this is either a poor man's attempt at winning an obfuscation
    contest or - simpler - exemplary bad code.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 19:26:07 2024
    From Newsgroup: comp.lang.misc

    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <87cyinrt5s.fsf@doppelsaurus.mobileactivedefense.com>,
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    scott@slp53.sl.home (Scott Lurndal) writes:
    Rainer Weikusat <rweikusat@talktalk.net> writes:

    [...]

    Something which would match [0-9]+ in its first argument (if any) would
    be:

    #include "string.h"
    #include "stdlib.h"

    int main(int argc, char **argv)
    {
        char *p;
        unsigned c;

        p = argv[1];
        if (!p) exit(1);
        while (c = *p, c && c - '0' > 10) ++p;
        if (!c) exit(1);
        return 0;
    }

    but that's 14 lines of text, 13 of which have absolutely no relation to
    the problem of recognizing a digit.

    Personally, I'd use:

    $ cat /tmp/a.c
    #include <stdint.h>
    #include <string.h>

    int
    main(int argc, const char **argv)
    {
        char *cp;
        uint64_t value;

        if (argc < 2) return 1;

        value = strtoull(argv[1], &cp, 10);
        if ((cp == argv[1])
            || (*cp != '\0')) {
            return 1;
        }
        return 0;
    }

    This will accept a string of digits whose numerical value is <=
    ULLONG_MAX, ie, it's basically ^[0-9]+$ with unobvious length and
    content limits.

    He acknowledged this already.

    return !strstr(argv[1], "0123456789");

    would be a better approximation,

    No it wouldn't. That's not even close. `strstr` looks for an
    instance of its second argument in its first, not an instance of
    any character in its second argument in its first. Perhaps you
    meant something with `strspn` or similar. E.g.,

    const char *p = argv[1] + strspn(argv[1], "0123456789");
    return *p != '\0';

    My bad.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 20:33:24 2024
    From Newsgroup: comp.lang.misc

    On 22.11.2024 12:56, Rainer Weikusat wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
    On 20.11.2024 18:50, Rainer Weikusat wrote:
    [...]
    while (p < e && *p - '0' < 10) ++p;

    That's not too bad. And it's really a hell lot faster than a
    general-purpose automaton programmed to recognize the same pattern
    (which might not matter most of the time, but sometimes, it does).

    Okay, I see where you're coming from (and especially in that simple
    case).

    Personally (and YMMV), even here in this simple case I think that
    using pointers is not better but worse - and anyway isn't [in this
    form] available in most languages;

    That's a question of using the proper tool for the job. In C, that's
    pointers and pointer arithmetic, because they're the simplest way to
    express something like this.

    Yes, in "C" you'd use that primitive (error-prone) pointer feature.
    That's what I said. And that in other languages it's less terse than
    in "C" but equally error-prone if you have to create all the parsing
    code yourself (without an existing engine and in a non-standard way).
    And if you extend the expression to parse it's IME much simpler done
    in Regex than adjusting the algorithm of the ad hoc procedural code.


    in other cases (and languages)
    such constructs get yet more clumsy, and for my not very complex
    example - /[0-9]+(ABC)?x*foo/ - even a "catastrophe" concerning
    readability, error-proneness, and maintainability.

    Procedural code for matching strings constructed in this way is
    certainly much simpler¹ than the equally procedural code for a
    programmable automaton capable of interpreting regexes.

    The point is that Regexps and the equivalence to FSA (with guaranteed
    runtime complexity) is an [efficient] abstraction with a formalized
    syntax; that are huge advantages compared to ad hoc parsing code in C
    (or in any other language).

    Your statement
    is basically "If we assume that the code interpreting regexes doesn't
    exist, regexes need much less code than something equivalent which does exist." Without this assumption, the picture becomes a different one altogether.

    I don't speak of assumptions. I speak about the fact that there's a well-understood model with existing [parsing-]implementations already
    available to handle a huge class of algorithms in a standardized way
    with a guaranteed runtime-efficiency and in an error-resilient way.

    Janis

    [...]

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 19:46:31 2024
    From Newsgroup: comp.lang.misc

    In article <878qtbrs0o.fsf@doppelsaurus.mobileactivedefense.com>,
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:

    [...]

    In any event, this seems simpler than what you posted:

    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(int argc, char *argv[])
    {
    if (argc != 2) {
    fprintf(stderr, "Usage: matchd <str>\n");
    return EXIT_FAILURE;
    }

    for (const char *p = argv[1]; *p != '\0'; p++)
    if ('0' <= *p && *p <= '9')
    return EXIT_SUCCESS;

    return EXIT_FAILURE;
    }

    It's not only 4 lines longer but in just about every individual aspect
    syntactically more complicated and more messy and functionally more
    clumsy.

    That's a lot of opinion, and not particularly well-founded
    opinion at that, given that your code was incorrect to begin
    with.

    That's not at all an opinion but an observation. My opinion on this is
    that this is either a poor man's attempt at winning an obfuscation
    contest or - simpler - exemplary bad code.

    Opinion (noun)
    a view or judgment formed about something, not necessarily based on
    fact or knowledge. "I'm writing to voice my opinion on an issue of
    little importance"

    You mentioned snark earlier. Physician, heal thyself.

    - Dan C.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 19:51:18 2024
    From Newsgroup: comp.lang.misc

    In article <874j3zrrxs.fsf@doppelsaurus.mobileactivedefense.com>,
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <87cyinrt5s.fsf@doppelsaurus.mobileactivedefense.com>,
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    scott@slp53.sl.home (Scott Lurndal) writes:
    Rainer Weikusat <rweikusat@talktalk.net> writes:

    [...]

    Something which would match [0-9]+ in its first argument (if any) would
    be:

    #include "string.h"
    #include "stdlib.h"

    int main(int argc, char **argv)
    {
    char *p;
    unsigned c;

    p = argv[1];
    if (!p) exit(1);
    while (c = *p, c && c - '0' > 10) ++p;
    if (!c) exit(1);
    return 0;
    }

    but that's 14 lines of text, 13 of which have absolutely no relation to
    the problem of recognizing a digit.

    Personally, I'd use:

    $ cat /tmp/a.c
    #include <stdint.h>
    #include <string.h>

    int
    main(int argc, const char **argv)
    {
    char *cp;
    uint64_t value;

    if (argc < 2) return 1;

    value = strtoull(argv[1], &cp, 10);
    if ((cp == argv[1])
    || (*cp != '\0')) {
    return 1;
    }
    return 0;
    }

    This will accept a string of digits whose numerical value is <=
    ULLONG_MAX, ie, it's basically ^[0-9]+$ with unobvious length and
    content limits.

    He acknowledged this already.

    return !strstr(argv[1], "0123456789");

    would be a better approximation,

    No it wouldn't. That's not even close. `strstr` looks for an
    instance of its second argument in its first, not an instance of
    any character in its second argument in its first. Perhaps you
    meant something with `strspn` or similar. E.g.,

    const char *p = argv[1] + strspn(argv[1], "0123456789");
    return *p != '\0';

    My bad.

    You've made a lot of "bad"s in this thread, and been rude about
    it to boot, crying foul when someone's pointed out ways that
    your code is deficient; claiming offense at what you perceive as
    "snark" while dishing the same out in kind, making basic errors
    that show you haven't done the barest minimum of testing, and
    making statements that show you have, at best, a limited grasp
    on the language you're choosing to use.

    I'm done being polite. My conclusion is that perhaps you are
    not as up on these things as you seem to think that you are.

    - Dan C.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 20:41:21 2024
    From Newsgroup: comp.lang.misc

    On Fri, 22 Nov 2024 12:47:16 +0100, Janis Papanagnou wrote:

    On 21.11.2024 23:05, Lawrence D'Oliveiro wrote:

    Another handy one is “\b” for word boundaries.

    I prefer \< and \> (that are quite commonly used) for such structural
    things ...

    “\<” only matches the beginning of a word, “\>” only matches the end,
    “\b” matches both
    <https://www.gnu.org/software/emacs/manual/html_node/emacs/Regexp-Backslash.html>.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Muttley@Muttley@dastardlyhq.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sat Nov 23 11:40:37 2024
    From Newsgroup: comp.lang.misc

    On Fri, 22 Nov 2024 18:18:04 -0000 (UTC)
    Kaz Kylheku <643-408-1753@kylheku.com> gabbled:
    On 2024-11-22, Muttley@DastartdlyHQ.org <Muttley@DastartdlyHQ.org> wrote:
    It's not that simple I'm afraid since comments can be commented out.

    Umm, no.

    Umm, yes, they can.

    eg:

    // int i; /*

    This /* sequence is inside a // comment, and so the machinery that
    recognizes /* as the start of a comment would never see it.

    Yes, that's kind of the point. You seem to be arguing against yourself.

    A C99 and C++ compiler would see "int j" and compile it, a regex would
    simply remove everything from the first /* to */.

    No, it won't, because that's not how regexes are used in a lexical
    analyzer.

    Yes, it will.

    Also the same probably applies to #ifdef's.

    Lexically analyzing C requires implementing the translation phases
    as described in the standard. There are preprocessor phases which
    delimit the input into preprocessor tokens (pp-tokens). Comments
    are stripped in preprocessing. But logical lines (backslash
    continuations) are recognized below comments; i.e. this is one
    comment:

    Not sure what your point is. A regex cannot be used to parse C comments
    because it doesn't know C/C++ grammar.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Ed Morton@mortonspam@gmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sat Nov 23 18:17:41 2024
    From Newsgroup: comp.lang.misc

    On 11/20/2024 9:53 AM, Janis Papanagnou wrote:
    On 20.11.2024 12:46, Ed Morton wrote:

    Definitely. The most relevant statement about regexps is this:

    Some people, when confronted with a problem, think "I know, I'll use
    regular expressions." Now they have two problems.

    (Worth a scribbling on a WC wall.)


    Obviously regexps are very useful and commonplace but if you find you
    have to use some online site or other tools to help you write/understand
    one or just generally need more than a couple of minutes to
    write/understand it then it's time to back off and figure out a better
    way to write your code for the sake of whoever has to read it 6 months
    later (and usually for robustness too as it's hard to be sure all rainy
    day cases are handled correctly in a lengthy and/or complicated regexp).

    Regexps are nothing for newbies.

    The inherent fine thing with Regexps is that you can incrementally
    compose them[*].[**]

    It seems you haven't found a sensible way to work with them?
    (And I'm really astonished about that since I know you worked with
    Regexps for years if not decades.)

    I have no problem working with regexps, I just don't write lengthy or complicated regexps, just brief, simple BREs or EREs, and I don't
    restrict myself to trying to solve problems with a single regexp.

    In those cases where Regexps *are* the tool for a specific task -
    I don't expect you to use them where they are inappropriate?! -

    Right, I don't, but I see many people using them for tasks that could be
    done more clearly and robustly if not done with a single regexp.

    what would be the better solution[***] then?

    It all depends on the problem. For example, if you need to match an
    input string that must contain each of a, b, and c in any order then you
    could do that in awk with this regexp or similar:

    awk '/(a.*(b.*c|c.*b))|(b.*(a.*c|c.*a))|(c.*(a.*b|b.*a))/'

    or you could do it with this condition comprised of regexp segments:

    awk '/a/ && /b/ && /c/'

    I would prefer the second solution as it's more concise and easier to
    enhance (try adding "and d" to both).

    As another example, someone on StackOverflow recently said they had
    written the following regexp to isolate the last string before a set of
    parens in a line that contains multiple such strings, some of them
    nested, and they said it works in python:

    ^(?:^[^(]+\([^)]+\) \(([^(]+)\([^)]+\)\))|[^(]+\(([^(]+)\([^)]+\),\s([^\(]+)\([^)]+\)\s\([^\)]+\)\)|(?:(?:.*?)\((.*?)\(.*?\)\))|(?:[^(]+\(([^)]+)\))$

    I personally wouldn't consider anything remotely as lengthy or
    complicated as that regexp despite their assurances that it works, I'd
    use this any-awk script or similar instead:

    {
    rec = $0
    while ( match(rec, /\([^()]*\)/) ) {
        tgt = substr($0,RSTART+1,RLENGTH-2)
        rec = substr(rec,1,RSTART-1) RS substr(rec,RSTART+1,RLENGTH-2) \
              RS substr(rec,RSTART+RLENGTH)
    }
    gsub(/ *\([^()]*\) */, "", tgt)
    print tgt
    }

    It's a bit more code but, unlike that regexp, anyone assigned to
    maintain this code in future can tell what it does with just a little
    thought (and maybe adding a debugging print in the loop if they aren't
    very familiar with awk), can then be sure it does what is required and
    nothing else, and could easily maintain/enhance it if necessary.

    Ed.


    Janis

    [*] Like the corresponding FSMs.

    [**] And you can also decompose them if they are merged in a huge
    expression, too large for you to grasp it. (BTW, I'm doing such decompositions also with other expressions in program code that
    are too bulky.)

    [***] Can you answer the question that another poster failed to do?


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.misc on Sun Nov 24 06:42:59 2024
    From Newsgroup: comp.lang.misc

    Rainer Weikusat <rweikusat@talktalk.net> writes:

    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    [...]

    Personally I think that writing bulky procedural stuff for
    something like [0-9]+ can only be much worse, and that further
    abbreviations like \d+ are the better direction to go if targeting
    a good interface. YMMV.

    Assuming that p is a pointer to the current position in a string, e
    is a pointer to the end of it (ie, point just past the last byte)
    and - that's important - both are pointers to unsigned quantities,
    the 'bulky' C equivalent of [0-9]+ is

    while (p < e && *p - '0' < 10) ++p;

    To force the comparison to be done as unsigned:

    while (p < e && *p - '0' < 10u) ++p;
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.misc on Sun Nov 24 20:08:24 2024
    From Newsgroup: comp.lang.misc

    Kaz Kylheku <643-408-1753@kylheku.com> writes:

    Here is an example: using a regex match to capture a C comment /* ... */
    in Lex compared to just recognizing the start sequence /* and handling
    the discarding of the comment in the action.

    Without non-greedy repetition matching, the regex for a C comment is
    quite obtuse. The procedural handling is straightforward: read
    characters until you see a * immediately followed by a /.

    Regular expressions are neither greedy nor non-greedy. One of the
    key points of regular expressions is that they are declarative
    rather than procedural. Any procedural change of behavior overlaid
    on a regular expression is a property of the tool, not the regular
    expression. It's easy to write a regular expression that exactly
    matches a /* ... */ comment and that isn't hard to understand.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Tristan Wibberley@tristan.wibberley+netnews2@alumni.manchester.ac.uk to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sat Oct 18 00:34:45 2025
    From Newsgroup: comp.lang.misc

    The message body is Copyright (C) 2025 Tristan Wibberley except
    citations and quotations noted. All Rights Reserved except as noted in
    the sig.

    On 27/08/2024 23:56, Johanne Fairchild wrote:
    Lawrence D'Oliveiro <ldo@nz.invalid> writes:

    On Tue, 27 Aug 2024 03:15:16 -0000 (UTC), Sebastian wrote:

    In comp.unix.programmer Lawrence D'Oliveiro <ldo@nz.invalid> wrote:

    (And I have no idea about this “Black” thing. I just do my thing.)

    Black is a [bla bla bla]

    *Yawn*

    The guy was kindly and politely sharing information with you.

    He was sharing the information with _us_, and we're much more important.

    Lawrence was trying to make him feel like we shouldn't receive it or not receive it in context!

    Although there might be some value in the latter because if we killfile
    trolls and their followup chains we'll only receive the useful
    information if it's not given as a followup.

    --
    Tristan Wibberley

    The message body is Copyright (C) 2025 Tristan Wibberley except
    citations and quotations noted. All Rights Reserved except that you may,
    of course, cite it academically giving credit to me, distribute it
    verbatim as part of a usenet system or its archives, and use it to
    promote my greatness and general superiority without misrepresentation
    of my opinions other than my opinion of my greatness and general
    superiority which you _may_ misrepresent. You definitely MAY NOT train
    any production AI system with it but you may train experimental AI that
    will only be used for evaluation of the AI methods it implements.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Oct 17 17:13:04 2025
    From Newsgroup: comp.lang.misc

    Johanne Fairchild <jfairchild@tudado.org> writes:
    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    On Tue, 27 Aug 2024 03:15:16 -0000 (UTC), Sebastian wrote:
    In comp.unix.programmer Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
    (And I have no idea about this “Black” thing. I just do my thing.)

    Black is a [bla bla bla]

    *Yawn*

    The guy was kindly and politely sharing information with you.

    For the sake of accuracy, here's what Sebastian wrote (more than a year
    ago):

    Black is a Python program that formats Python code
    almost exactly the way you formatted that snippet of Lisp
    code. It's just as ugly in Python as it is in Lisp. Black
    spreads by convincing organizations to mandate its use. It's
    utterly non-configurable on purpose, in order to guarantee
    that eventually, all Python code is made to be as ugly
    and unreadable as possible.

    This is more exaggerated opinion than information. Of course there's
    nothing wrong with sharing an opinion, but there's also nothing
    wrong with responding to an inflammatory opinion with a yawn.

    Here's what the "black" man page says:

    NAME
    black - uncompromising Python code formatter

    SUMMARY
    black is the uncompromising Python code formatter. By using it,
    you agree to cede control over minutiae of hand-formatting. In
    return, Black gives you speed, determinism, and freedom from
    pycodestyle nagging about formatting. You will save time and
    mental energy for more important matters.
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Tristan Wibberley@tristan.wibberley+netnews2@alumni.manchester.ac.uk to comp.lang.misc on Sat Oct 18 02:58:46 2025
    From Newsgroup: comp.lang.misc

    The message body is Copyright (C) 2025 Tristan Wibberley except
    citations and quotations noted. All Rights Reserved except as noted in
    the sig.

    On 07/08/2024 14:43, Kaz Kylheku wrote:
    On 2024-08-06, Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
    Equivalent Lisp, for comparison:

    (setf a (cond (b (if c d e))
                  (f (if g h i))
                  (t j)))

    You can’t avoid the parentheses, but this, too, can be improved:

    (setf a
    (cond
    (b
    (if c d e)
    )
    (f
    (if g h i)
    )
    (t
    j
    )
    ) ; cond
    )

    Nobody is ever going to follow your idio(syncra)tic coding preferences
    for Lisp; that wouldn't pass code review in any Lisp shop, and would
    result in patches being rejected in a FOSS setting.


    If "; cond" went inside the cond form then I'd accept it in general,
    ie. unless I have process or contractual reasons to do otherwise. The
    code has, to an extent greater than most efforts, been made of
    orthogonal syntactic pieces even for the least lisp-aware editors, but
    fails in that particular visual aid ("; cond") when subjected to a
    lisp-aware one (parenthesis matching).

    This is improved:

    (cond ;name-of-the-judgement-as-in-the-documentation
    (b
    (if c d e)
    )
    (f
    (if g h i)
    )
    (t
    j
    )
    ;cond ;name-of-the-judgement-as-in-the-documentation
    )

    I'd have some caveats about the patterns of the code it's going inside,
    ie, how varied does or will the file become but this result, in
    particular, yields to:

    - line-oriented processing and generation,
    - traceability,
    - lisp-aware editors,
    - lisp-unaware editors, and
    - printouts.

    Of course I probably wouldn't be doing medical, aerospace, submarine,
    or weapons development when I accept it in FOSS because of the typical restrictions on making any change at all after acceptance (which just
    means that "accept" has many different meanings and ought to be taken to be
    a strictly process oriented word from the activity's GLOSSARY).

    The thing I worry about with coding standards is that they
    surreptitiously form a derived language that's the same from the
    computer's perspective but different among readers, and you haven't
    really achieved much but improved future task estimation. It would be interesting to know how the costs shift around in practice and what are
    the implications for integrity in billing.

    --
    Tristan Wibberley

    The message body is Copyright (C) 2025 Tristan Wibberley except
    citations and quotations noted. All Rights Reserved except that you may,
    of course, cite it academically giving credit to me, distribute it
    verbatim as part of a usenet system or its archives, and use it to
    promote my greatness and general superiority without misrepresentation
    of my opinions other than my opinion of my greatness and general
    superiority which you _may_ misrepresent. You definitely MAY NOT train
    any production AI system with it but you may train experimental AI that
    will only be used for evaluation of the AI methods it implements.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Tristan Wibberley@tristan.wibberley+netnews2@alumni.manchester.ac.uk to comp.lang.misc on Sat Oct 18 03:09:24 2025
    From Newsgroup: comp.lang.misc

    On 06/08/2024 09:04, Sebastian wrote:

    a = b ? (c ? d : e) :
    f ? (g ? h : i) :
    j;

    I like rolling the operators and statement delimiters over to form the
    dropsies; you can read them as introducing the role a line plays wrt.
    the previous lines, and it works when each line contains suitable
    parentheses to make the AST visually obvious:

    a = b ? (c ? d : e)
    : f ? (g ? h : i)
    : j
    ;

    primarily, b chooses something
    or else fallback to f choosing something
    or else fallback to j
    no more options

    --
    Tristan Wibberley
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Lawrence =?iso-8859-13?q?D=FFOliveiro?=@ldo@nz.invalid to comp.lang.misc on Sat Oct 18 04:46:06 2025
    From Newsgroup: comp.lang.misc

    On Sat, 18 Oct 2025 02:58:46 +0100, Tristan Wibberley wrote:

    On 2024-08-06, Lawrence D'Oliveiro <ldo@nz.invalid> wrote:

    (setf a
    (cond
    (b
    (if c d e)
    )
    (f
    (if g h i)
    )
    (t
    j
    )
    ) ; cond
    )

    If "; cond" went inside the cond form then I'd accept it in general

    It indicates that the closing statement bracket is for the “cond”
    construct. Moving it anywhere other than that closing statement bracket
    would defeat the purpose.
    --- Synchronet 3.21a-Linux NewsLink 1.2