Re: On overly rigid definitions (was Re: Command Languages Versus Programming Languages)

    From cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Oct 14 00:58:11 2024
    From Newsgroup: comp.lang.misc

    In article <veho4s$sghb$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
    On 13/10/2024 21:29, Dan Cross wrote:
    In article <vegs0o$nh5t$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
    On 13/10/2024 16:52, Dan Cross wrote:
    [snip]
    Sure. But the fact that any of these were going concerns is an
    existence proof that one _can_ take bytecodes targeted toward a
    "virtual" machine and execute them on silicon, making the
    distinction a lot more fluid than might be naively assumed, in
    turn exposing the silliness of this argument that centers around
    this weirdly overly-rigid definition of what a "compiler" is.

    I've implemented numerous compilers and interpreters over the last few
    decades (and have dabbled in emulators).

    To me the distinctions are clear enough because I have to work at the
    sharp end!

    I'm not sure why people want to try and be clever by blurring the roles
    of compiler and interpreter; that's not helpful at all.

    I'm not saying the two are the same; what I'm saying is that
    this arbitrary criterion that a compiler must emit a fully
    executable binary image is not just inadequate, but also wrong,
    as it renders separate compilation impossible. I am further
    saying that there are many different _types_ of compilers,
    including specialized tools that don't emit machine language.

    Sure, people can write emulators for machine code, which are a kind of
    interpreter, or they can implement bytecode in hardware; so what?

    That's exactly my point.

    So, then what, we do away with the concepts of 'compiler' and
    'interpreter'? Or allow them to be used interchangeably?

    I don't see how you can credibly draw that conclusion from what
    I've been saying.

    But it's really pretty straight-forward; a compiler effects a
    translation from one computer language to another (the
    definition from Aho et al). An interpreter takes a program
    written in some computer language and executes it. Of course
    there's some gray area here; is a load-and-go compiler a
    compiler in this sense (yes; it is still translating between
    its source language and a machine language) or an interpreter?
    (Possibly; after all, it's taking a source language and causing
    a program written in it to be executed.)
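
    To make the Aho et al definition concrete, here is a toy sketch
    (Python, purely illustrative; the "target language" is one I just
    made up): a compiler in this sense need only translate one
    computer language into another.

        # Toy "compiler": translate infix arithmetic into an invented
        # stack code. Purely illustrative; no machine code in sight.
        import ast

        def compile_expr(src):
            ops = {ast.Add: 'ADD', ast.Sub: 'SUB',
                   ast.Mult: 'MUL', ast.Div: 'DIV'}
            out = []
            def walk(node):
                if isinstance(node, ast.BinOp):
                    walk(node.left)
                    walk(node.right)
                    out.append(ops[type(node.op)])
                elif isinstance(node, ast.Constant):
                    out.append(f'PUSH {node.value}')
            walk(ast.parse(src, mode='eval').body)
            return out

        print(compile_expr('1 + 2 * 3'))
        # ['PUSH 1', 'PUSH 2', 'PUSH 3', 'MUL', 'ADD']

    By the definition above this is a compiler, despite never touching
    silicon.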

    Java is an interesting case in point here; the Java compiler is
    obviously a compiler; the JVM is an interpreter. I don't think
    anyone would dispute this. But suggesting some hard and fast
    division that can be rigidly upheld in all cases ignores so
    much nuance as to be reductive; by pointing these things
    out, we see how inane it is to assert that a "proper compiler"
    is only one that takes a textual source input and emits machine
    code for a silicon target.

    Somehow I don't think it is useful to think of gcc as an interpreter for
    C, or CPython as a native code compiler for Python.

    I don't think anyone suggested that. But we _do_ have examples
    of true compilers emitting "code" for interpreters; cf LLVM and
    eBPF, which I mentioned previously in this thread, or compilers
    that emit code for hypothetical machines like MMIX, or compilers
    that emit instructions that aren't implemented everywhere, or
    more precisely are implemented by trap and emulation.

    That doesn't really affect what I do. Writing compiler backends for
    actual CPUs is hard work. Generating bytecode is a lot simpler.

    That really depends on the bytecode, doesn't it? The JVM is a
    complex beast;

    Is it? It's not to my taste, but it didn't look too scary to me.
    Whereas modern CPU instruction sets are horrendous. (I normally
    target x64, which is described in 6 large volumes. RISC ones don't
    look much better, e.g. RISC-V with its dozens of extensions and
    special types.)

    I dunno. Wirth wrote an Oberon compiler targeting MIPS in ~5000
    lines of code. It was pretty straight-forward.

    And most of those volumes in the SDM have to do with the
    privileged instruction set and details of the memory model like
    segmentation and paging, most of which don't impact the compiler
    author much at all: beyond, perhaps, providing an intrinsic for
    the `rdmsr` and `wrmsr` instructions, I don't think you care
    much about MSRs, let alone VMX or the esoterica of under what
    locked cycles the hardware sets the "A" bit on page table
    entries on a TLB miss.

    Example of JVM:

    aload index    ; push a reference from local variable #index

    Ok. `leaq index(%rip), %rax; pushq %rax` isn't that hard either.
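
    (CPython's bytecode has an equally simple analogue: its
    local-variable read is LOAD_FAST, which the standard `dis` module
    will show you; exact output varies a bit by Python version.

        import dis

        def f(x):
            return x        # the read of x compiles to LOAD_FAST

        dis.dis(f)

    Nothing scary in any of these encodings.)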

    MIPS or the unprivileged integer subset of RISC-V
    are pretty simple in comparison.

    (Especially in my case, as it's an instruction set I've devised
    myself; another distinction: compilers usually target someone
    else's instruction set.)

    If you want one more distinction, it is this: with my compiler, the
    resultant binary is executed by a separate agency: the CPU. Or maybe
    the OS loader will run it through an emulator.

    Python has a mode by which it will emit bytecode _files_, which
    can be separately loaded and interpreted; it even has an
    optimizing mode. Is that substantially different?
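
    (For the curious, a minimal sketch of that mode using the stock
    `py_compile` module; `mod.py` is a stand-in name, and the
    `optimize` level corresponds to the -O/-OO flags:

        # Emit a standalone bytecode file that can be loaded later.
        import py_compile
        pyc_path = py_compile.compile('mod.py', optimize=2)
        print(pyc_path)   # path of the generated .pyc file

    The same thing is available from the command line as
    `python -m py_compile mod.py`.)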

    Whether there is a discrete bytecode file is beside the point. (I
    generated such files for many years.)

    You still need software to execute it. Especially for dynamically
    typed bytecode, which doesn't lend itself easily to either hardware
    implementation or load-time native code translation.
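
    To illustrate the point with a sketch (Python, illustrative only):
    a single generic ADD opcode must dispatch on the runtime types of
    its operands, so there is no one native instruction to translate
    it to at load time.

        # One generic ADD has to cover all of these cases, decided
        # at run time, not at load time.
        def add(a, b):
            return a + b

        print(add(2, 3))        # 5   (integer addition)
        print(add('a', 'b'))    # ab  (string concatenation)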

    Sure. But if execution requires a "separate agency", and you
    acknowledge that could be a CPU or a separate program, how is
    that all that different from what Python _does_? That doesn't
    imply that the Python interpreter is the same as a CPU, or that
    an interpreter is the same as a compiler. But it does imply
    that the definitions being thrown about here aren't particularly
    good.

    With my interpreter, *I* have to write the dispatch routines and
    the code to implement all the instructions.
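
    A minimal sketch of what those dispatch routines amount to,
    reusing the invented stack code from the toy compiler above:

        # Software dispatch loop: every "instruction" of the bytecode
        # is implemented by ordinary code in the interpreter.
        def run(program):
            stack = []
            for insn in program:
                op, _, arg = insn.partition(' ')
                if op == 'PUSH':
                    stack.append(int(arg))
                elif op == 'ADD':
                    b, a = stack.pop(), stack.pop()
                    stack.append(a + b)
                elif op == 'MUL':
                    b, a = stack.pop(), stack.pop()
                    stack.append(a * b)
                else:
                    raise ValueError(f'unknown opcode: {op}')
            return stack.pop()

        print(run(['PUSH 1', 'PUSH 2', 'PUSH 3', 'MUL', 'ADD']))  # 7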

    Again, I don't think that anyone disputes that interpreters
    exist. But insisting that they must take a particular shape is
    just wrong.

    What shape would that be? Generally they will need some /software/ to
    execute the instructions of the program being interpreted, as I said.
    Some JIT products may choose to do on-demand translation to native code.

    Is there anything else? I'd be interested in anything new!

    I actually meant to write that "insisting that _compilers_ take
    a specific shape is just wrong." But I think the point holds
    reasonably well for interpreters, as well: they need not
    directly interpret the text of a program; they may well create
    some sort of internal bytecode after several optimization and
    type checking steps, looking more like a load-and-go compiler
    than, say, the 6th Edition Unix shell.
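
    CPython is a handy illustration of exactly that shape: the
    built-in compile() produces an internal code object (after parsing
    and some constant folding), which the bytecode VM then executes,
    much like a load-and-go compiler:

        # Source text -> internal code object -> execution.
        src = "print(2 ** 10)"
        code = compile(src, '<example>', 'exec')
        exec(code)   # prints 1024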

    Comparison to Roslyn-style compilers blurs the distinction
    further still.

    (My compilers generate an intermediate language, a kind of VM, which is
    then processed further into native code.

    Then by the definition of this pseudonymous guy I've been
    responding to, your compiler is not a "proper compiler", no?

    Actually mine is more of a compiler than many, since it directly
    generates native machine code. Others generally stop at ASM code (e.g.
    gcc) or OBJ code, and will invoke separate programs to finish the job.

    The intermediate language here is just a step in the process.

    But I have also tried interpreting that VM; it just runs 20 times
    slower than native code. That's what interpreting usually means:
    slow programs.)
    Not necessarily. The JVM does pretty well, quite honestly.

    But is it actually interpreting? Because if I generated such code
    for a statically typed language, then I would first translate to
    native code, of any quality, since it's going to be faster than
    interpreting.

    Doesn't that reinforce my thesis that these things are much
    blurrier than all this uninformed talk of a mythical "proper
    compiler" would lead one to believe?

    - Dan C.

    --- Synchronet 3.20a-Linux NewsLink 1.114