• Re: Command Languages Versus Programming Languages

    From Bart@bc@freeuk.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sun Oct 13 17:20:40 2024
    From Newsgroup: comp.lang.misc

    On 13/10/2024 16:52, Dan Cross wrote:
    In article <QnROO.226037$EEm7.111715@fx16.iad>,
    Scott Lurndal <slp53@pacbell.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <vefvo0$k1mm$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:

    Really? So java bytecode will run direct on x86 or ARM will it? Please give
    some links to this astounding discovery you've made.

    Um, ok. https://en.wikipedia.org/wiki/Jazelle

    There was also a company a couple of decades ago that
    built an entire processor designed to execute bytecode
    directly - with a coprocessor to handle I/O.

    IIRC, it was Azul. There were a number of others, including
    Sun.

    None of them panned out - JIT's ended up winning that battle.

    Even ARM no longer includes Jazelle extensions in any of their
    mainstream processors.

    Sure. But the fact that any of these were going concerns is an
    existence proof that one _can_ take bytecodes targetted toward a
    "virtual" machine and execute it on silicon,
    making the
    distinction a lot more fluid than might be naively assumed, in
    turn exposing the silliness of this argument that centers around
    this weirdly overly-rigid definition of what a "compiler" is.

    I've implemented numerous compilers and interpreters over the last few
    decades (and have dabbled in emulators).

    To me the distinctions are clear enough because I have to work at the
    sharp end!

    I'm not sure why people want to try and be clever by blurring the roles
    of compiler and interpreter; that's not helpful at all.

    Sure, people can write emulators for machine code, which are a kind of interpreter, or they can implement bytecode in hardware; so what?

    That doesn't really affect what I do. Writing compiler backends for
    actual CPUs is hard work. Generating bytecode is a lot simpler.
(Especially in my case, since the bytecode is one I've devised myself;
that's another distinction: compilers usually target someone else's
instruction set.)

    If you want one more distinction, it is this: with my compiler, the
    resultant binary is executed by a separate agency: the CPU. Or maybe the
    OS loader will run it through an emulator.

    With my interpreter, then *I* have to write the dispatch routines and
    write code to implement all the instructions.

    (My compilers generate an intermediate language, a kind of VM, which is
    then processed further into native code.

    But I have also tried interpreting that VM; it just runs 20 times slower
    than native code. That's what interpreting usually means: slow programs.)
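[Editor's note: a concrete illustration of what Bart means by writing the
dispatch routines and the code for every instruction. A minimal sketch of a
stack-machine interpreter in Python; the opcode set is invented for
illustration:]

```python
# Toy bytecode interpreter: the author must hand-write the dispatch
# loop and an implementation for every instruction (opcodes invented).
def run(program):
    stack, pc = [], 0
    while pc < len(program):
        op, arg = program[pc]
        if op == "PUSH":          # push an immediate operand
            stack.append(arg)
        elif op == "ADD":         # pop two values, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":         # pop two values, push their product
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {op}")
        pc += 1
    return stack.pop()

# (2 + 3) * 4
result = run([("PUSH", 2), ("PUSH", 3), ("ADD", None),
              ("PUSH", 4), ("MUL", None)])
```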

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.programmer,comp.lang.misc on Sun Oct 13 18:28:32 2024
    From Newsgroup: comp.lang.misc

    [ X-post list reduced ]

    On 13.10.2024 18:02, Muttley@DastartdlyHQ.org wrote:
    On Sun, 13 Oct 2024 15:30:03 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
    [...]

    No. It translates one computer _language_ to another computer
    _language_. In the usual case, that's from a textual source

    Machine code isn't a language. Fallen at the first hurdle with that definition.

    Careful (myself included); watch out for the glazed frost!

    You know there's formal definitions for what constitutes languages.

At first glance I don't see why machine code wouldn't qualify as a
    language (either as some specific "mnemonic" representation, or as
    a sequence of integral numbers or other "code" representations).

    What's the problem, in your opinion, with considering machine code
    as a language?

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Kaz Kylheku@643-408-1753@kylheku.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sun Oct 13 16:31:58 2024
    From Newsgroup: comp.lang.misc

    On 2024-10-11, Muttley@DastartdlyHQ.org <Muttley@DastartdlyHQ.org> wrote:
    Irrelevant. Lot of interpreters do partial compilation and the JVM does it
    on the fly. A proper compiler writes a standalone binary file to disk.

    You might want to check those goalposts again. You can easily make a
    "proper compiler" which just writes a canned interpreter executable to
    disk, appending to it the program source code.
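[Editor's note: Kaz's "canned interpreter plus appended source" scheme can be
sketched in a few lines of Python; the file names and function names below
are invented for illustration:]

```python
# A "proper compiler" whose output executable is a canned interpreter
# with the program source appended as embedded data (Kaz's point).
INTERPRETER_STUB = "exec(SOURCE)\n"   # the canned interpreter part

def compile_to_standalone(source: str, out_path: str) -> None:
    """Emit a self-contained script: embedded source + interpreter stub."""
    with open(out_path, "w") as f:
        f.write(f"SOURCE = {source!r}\n")   # the "object code" is just data
        f.write(INTERPRETER_STUB)

compile_to_standalone('print("hello from the payload")', "standalone.py")
# `python standalone.py` now runs the embedded program.
```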
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bart@bc@freeuk.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sun Oct 13 20:06:12 2024
    From Newsgroup: comp.lang.misc

    On 13/10/2024 17:31, Kaz Kylheku wrote:
    On 2024-10-11, Muttley@DastartdlyHQ.org <Muttley@DastartdlyHQ.org> wrote:
Irrelevant. Lot of interpreters do partial compilation and the JVM does it
on the fly. A proper compiler writes a standalone binary file to disk.

    You might want to check those goalposts again. You can easily make a
    "proper compiler" which just writes a canned interpreter executable to
    disk, appending to it the program source code.


So, an interpreter. The rest is just details of its deployment. In your
example, the program being run is just some embedded data.

Maybe the real question is what is 'hardware', and what is 'software'.
But the answer won't make everyone happy, because hardware can be
emulated in software.

    (Implementing software in hardware, specifically the bit of software
    that interprets a VM, is less common, and generally harder.)

    I prefer that there is a clear distinction between compiler and
    interpreter, because you immediately know what's what. (Here I'm
    excluding complex JIT products that mix up both.)


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sun Oct 13 20:15:45 2024
    From Newsgroup: comp.lang.misc

In article <vegqu5$o3ve$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:
On Sun, 13 Oct 2024 15:30:03 -0000 (UTC)
cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
In article <vegmul$ne3v$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:
So what is standard terminology then?

    I've already explained this to you.

No you haven't. Your explanation seems to be "anything that converts from
one language to another".

What happens inside the CPU is irrelevant. Its a black box as far as the
rest of the machine is concerned. As I said in another post, it could be
pixies with abacuses, doesn't matter.

    So why do you think it's so important that the definition of a

    Who said its important? Its just what most people think of as compilers.

    CPU"? If, as you admit, what the CPU does is highly variable,
    then why do you cling so hard to this meaningless distinction?

You're the one making a big fuss about it with pages of waffle to back up
your claim.

    [lots of waffle snipped]

In other words, you discard anything that doesn't fit with your
preconceptions. Got it.

    No, I just have better things to do on a sunday than read all that. Keep
    it to the point.

So its incomplete and has to revert to software for some opcodes. Great.
FWIW Sun also had a java processor but you still can't run bytecode on
normal hardware without a JVM.

    Cool. So if I run a program targetting a newer version of an
    ISA is run on an older machine, and that machine lacks a newer
    instruction present in the program, and the CPU generates an
    illegal instruction trap at runtime that the OS catches and
    emulates on the program's behalf, the program was not compiled?

    And again, what about an emulator for a CPU running on a
    different CPU? I can boot 7th Edition Unix on a PDP-11
emulator on my workstation; does that mean that the 7th
Edition C compiler wasn't a compiler?

    Its all shades of grey. You seem to be getting very worked up about it.
    As I said, most people consider a compiler as something that translates source
    code to machine code and writes it to a file.

Why, whats the difference? Your definition seems to be any program that
can translate from one language to another.

    If you can't see that yourself, then you're either ignorant or
    obstinant. Take your pick.

    So you can't argue the failure of your logic then. Noted.

Yes, they're entirely analogous.

    https://docs.oracle.com/cd/E11882_01/appdev.112/e10825/pc_02prc.htm

    Nah, not really.

Oh nice counter argument, you really sold your POV there.

Who cares about the current state? Has nothing to do with this discussion.
    In other words, "I don't have an argument, so I'll just lamely
    try to define things until I'm right."

Im just defining things the way most people see it, not some ivory tower
academics. Anyway, lifes too short for the rest.

    [tl;dr]

    that a compiler is pretty much any program which translates from one thing to
    another.

    No. It translates one computer _language_ to another computer
    _language_. In the usual case, that's from a textual source

Machine code isn't a language. Fallen at the first hurdle with that
definition.




In article <vegqu5$o3ve$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:
On Sun, 13 Oct 2024 15:30:03 -0000 (UTC)
cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
In article <vegmul$ne3v$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:
So what is standard terminology then?

    I've already explained this to you.

No you haven't. Your explanation seems to be "anything that converts from
one language to another".

    The context of this specific quote, which you snipped, was your
    insistence on the meaning of the term, "standalone binary."
    There are a number of common terms for what you are describing,
    which is the general term for the executable output artifact
    from a software build, none of which is "standalone binary".

    Common terms are "executable" or "executable file" (that's what
    the ELF standard calls it, for instance), but also "binary",
    "image", etc.

What happens inside the CPU is irrelevant. Its a black box as far as the
rest of the machine is concerned. As I said in another post, it could be
pixies with abacuses, doesn't matter.

    So why do you think it's so important that the definition of a

    Who said its important? Its just what most people think of as compilers.

    Well, you seem to think it's rather important.

    CPU"? If, as you admit, what the CPU does is highly variable,
    then why do you cling so hard to this meaningless distinction?

You're the one making a big fuss about it with pages of waffle to back up
your claim.

    I just don't like misinformation floating around unchallenged.

    You have cited nothing to back up your claims.

So its incomplete and has to revert to software for some opcodes. Great.
FWIW Sun also had a java processor but you still can't run bytecode on
normal hardware without a JVM.

    Cool. So if I run a program targetting a newer version of an
    ISA is run on an older machine, and that machine lacks a newer
    instruction present in the program, and the CPU generates an
    illegal instruction trap at runtime that the OS catches and
    emulates on the program's behalf, the program was not compiled?

    And again, what about an emulator for a CPU running on a
    different CPU? I can boot 7th Edition Unix on a PDP-11
emulator on my workstation; does that mean that the 7th
Edition C compiler wasn't a compiler?

    Its all shades of grey. You seem to be getting very worked up about it.

    Nah, I don't really care, aside from not wanting misinformation
    to stand unchallenged.

    As I said, most people consider a compiler as something that translates source
    code to machine code and writes it to a file.

    Sure, if you're talking informally and you mention "a compiler"
    most people will know more or less what you're talking about.
    But back in <vebffc$3n6jv$1@dont-email.me> you wrote,

    |Does it produce a standalone binary as output? No, so its an
    |intepreter not a compiler.

    I said that was a bad distinction, to which you replied in <vebi0j$3nhvq$1@dont-email.me>:

    |A proper compiler writes a standalone binary file to disk.

    Except that, well, it doesn't. Even the "proper compilers" that
    you claim familiarity with basically don't do that; as I pointed
    out to you, they generate object files and a driver invokes a
    linker.

    For that matter, the compiler itself may not even generate
    object code, but rather, may generate textual assembly and let a
    separate assembler pass turn _that_ into object code.

    So yeah. What you've defined to be a "proper compiler" isn't
    really what you seem to think that it is.

    [snip]
Who cares about the current state? Has nothing to do with this discussion.
    In other words, "I don't have an argument, so I'll just lamely
    try to define things until I'm right."

Im just defining things the way most people see it, not some ivory tower
academics. Anyway, lifes too short for the rest.

The people who create the field are the ones who get to make
the definitions, not you.

Machine code isn't a language. Fallen at the first hurdle with that
definition.

Oh really? Is that why they call it "machine language"? It's even in
the dictionary with "machine code" as a synonym:
https://www.merriam-webster.com/dictionary/machine%20language

    - Dan C.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sun Oct 13 21:25:51 2024
    From Newsgroup: comp.lang.misc

    Christian Weisgerber <naddy@mips.inka.de> writes:
    On 2024-10-12, Rainer Weikusat <rweikusat@talktalk.net> wrote:
    Indeed. As far as I know the term, an interpreter is something which
reads text from a file, parses it and checks it for syntax errors
    and then executes the code as soon as enough of it has been gathered to
    allow for execution of something, ie, a complete statement. This read,
    check and parse, execute cycle is repeated until the program
    terminates.

    I don't really want to participate in this discussion, but what
    you're saying there is that all those 1980s home computer BASIC
    interpreters, which read and tokenized a program before execution,
    were actually compilers.

    If they contained something which compiled all of the source code prior
to execution in order to transform it into some actually executable
    intermediate representation whose execution didn't require future access
    to the source code and thus, also didn't include checking the source
    code for syntactical correctness, this something can be called a
    compiler and the execution engine some sort of virtual machine which
    could principally execute programs compiled from source code in any
    programming language.
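[Editor's note: Rainer's whole-program-translation criterion can be seen
directly with Python's built-in compile(): the entire source is translated
before anything runs, so a syntax error on a later line is reported up front
and no earlier line ever executes. The sample source string is invented for
illustration.]

```python
# Whole-program translation: compile() parses and checks all of the
# source before execution, unlike a statement-at-a-time interpreter.
src = "print('line one')\nthis is not valid python\n"

compiled_ok = False
try:
    code = compile(src, "<prog>", "exec")   # translate the whole program
    compiled_ok = True
except SyntaxError:
    pass   # error caught at compile time; 'line one' never ran
```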

But judging from Wikipedia, Murkysoft Basic stored programs as a linked
list of preprocessed lines and interpreted these, ie, doing string
lookups of keywords from the source code at run time in order to
determine what code to execute. Insofar as I vaguely remember this from
Apple //c BASIC (has been a while), syntax errors would also be found at
runtime, ie, once execution reached the line with the error. This would
make it an interpreter.

In contrast to this, this somewhat amusing small Perl program:

while (<>) {
    while (length) {
        s/^(\w+)// and print(scalar reverse($1));
        s/^(\W+)// and print($1);
    }
}

    [reads lines from stdin and prints them with each word reversed]

gets translated into an op tree whose textual representation
(perl -MO=Concise,-basic) looks like this:

    y <@> leave[1 ref] vKP/REFC ->(end)
    1 <0> enter v ->2
    2 <;> nextstate(main 1 a.pl:1) v:{ ->3
    x <2> leaveloop vKP/2 ->y
    3 <{> enterloop(next->r last->x redo->4) v ->s
    - <1> null vK/1 ->x
    w <|> and(other->4) vK/1 ->x
    v <1> defined sK/1 ->w
    - <1> null sK/2 ->v
    - <1> ex-rv2sv sKRM*/1 ->t
    s <#> gvsv[*_] s ->t
    u <1> readline[t2] sKS/1 ->v
    t <#> gv[*ARGV] s ->u
    - <@> lineseq vKP ->-
    4 <;> nextstate(main 3 a.pl:2) v:{ ->5
    q <2> leaveloop vKP/2 ->r
    5 <{> enterloop(next->m last->q redo->6) v ->n
    - <1> null vK/1 ->q
    p <|> and(other->6) vK/1 ->q
    o <1> length[t4] sK/BOOL,1 ->p
    - <1> ex-rv2sv sK/1 ->o
    n <#> gvsv[*_] s ->o
    - <@> lineseq vKP ->-
    6 <;> nextstate(main 5 a.pl:3) v:{ ->7
    - <1> null vK/1 ->f
    9 <|> and(other->a) vK/1 ->f
    8 </> subst(/"^(\\w+)"/) sK/BOOL ->9
    7 <$> const[PV ""] s ->8
    e <@> print vK ->f
    a <0> pushmark s ->b
    - <1> scalar sK/1 ->e
    d <@> reverse[t6] sK/1 ->e
    b <0> pushmark s ->c
    - <1> ex-rv2sv sK/1 ->d
    c <#> gvsv[*1] s ->d
    f <;> nextstate(main 5 a.pl:4) v:{ ->g
    - <1> null vK/1 ->m
    i <|> and(other->j) vK/1 ->m
    h </> subst(/"^(\\W+)"/) sK/BOOL ->i
    g <$> const[PV ""] s ->h
    l <@> print vK ->m
    j <0> pushmark s ->k
    - <1> ex-rv2sv sK/1 ->l
    k <#> gvsv[*1] s ->l
    m <0> unstack v ->n
    r <0> unstack v ->s

Each line represents a node on this tree, and the names refer to builtin
'ops'. In the actual tree, they're pointers to C functions, and execution
happens as a post-order traversal of this tree, invoking the op functions
from the leaves to the root to produce the arguments necessary for
invoking op functions residing at a higher level in the tree.

    Modules for writing this internal representation to a file and loading
    it back from there and even for translating it into C exist. They're
    just not part of the core distribution anymore.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sun Oct 13 20:29:46 2024
    From Newsgroup: comp.lang.misc

    In article <vegs0o$nh5t$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
    On 13/10/2024 16:52, Dan Cross wrote:
    In article <QnROO.226037$EEm7.111715@fx16.iad>,
    Scott Lurndal <slp53@pacbell.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <vefvo0$k1mm$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:

    Really? So java bytecode will run direct on x86 or ARM will it? Please give
    some links to this astounding discovery you've made.

    Um, ok. https://en.wikipedia.org/wiki/Jazelle

    There was also a company a couple of decades ago that
    built an entire processor designed to execute bytecode
    directly - with a coprocessor to handle I/O.

    IIRC, it was Azul. There were a number of others, including
    Sun.

    None of them panned out - JIT's ended up winning that battle.

    Even ARM no longer includes Jazelle extensions in any of their
    mainstream processors.

    Sure. But the fact that any of these were going concerns is an
    existence proof that one _can_ take bytecodes targetted toward a
    "virtual" machine and execute it on silicon,
    making the
    distinction a lot more fluid than might be naively assumed, in
    turn exposing the silliness of this argument that centers around
    this weirdly overly-rigid definition of what a "compiler" is.

I've implemented numerous compilers and interpreters over the last few
decades (and have dabbled in emulators).

    To me the distinctions are clear enough because I have to work at the
    sharp end!

    I'm not sure why people want to try and be clever by blurring the roles
    of compiler and interpreter; that's not helpful at all.

    I'm not saying the two are the same; what I'm saying is that
    this arbitrary criteria that a compiler must emit a fully
executable binary image is not just inadequate, but also wrong,
    as it renders separate compilation impossible. I am further
    saying that there are many different _types_ of compilers,
    including specialized tools that don't emit machine language.

Sure, people can write emulators for machine code, which are a kind of
interpreter, or they can implement bytecode in hardware; so what?

    That's exactly my point.

    That doesn't really affect what I do. Writing compiler backends for
    actual CPUs is hard work. Generating bytecode is a lot simpler.

    That really depends on the bytecode, doesn't it? The JVM is a
    complex beast; MIPS or the unprivileged integer subset of RISC-V
    are pretty simple in comparison.

(Especially in my case as I've devised myself, another distinction.
Compilers usually target someone else's instruction set.)

If you want one more distinction, it is this: with my compiler, the
resultant binary is executed by a separate agency: the CPU. Or maybe the
    OS loader will run it through an emulator.

    Python has a mode by which it will emit bytecode _files_, which
    can be separately loaded and interpreted; it even has an
    optimizing mode. Is that substantially different?
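[Editor's note: the Python bytecode-file mode Dan refers to can be exercised
from the standard library; the module name and contents below are invented
for illustration:]

```python
# CPython can emit a bytecode file (.pyc) that is later loaded and
# interpreted without consulting the source again.
import importlib.util
import pathlib
import py_compile
import tempfile

src = pathlib.Path(tempfile.mkdtemp()) / "mod.py"
src.write_text("def double(x):\n    return x * 2\n")

# Compile to an optimized bytecode file; nothing is executed here.
pyc = py_compile.compile(str(src), optimize=2)

# Load and run the bytecode file directly.
spec = importlib.util.spec_from_file_location("mod", pyc)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
```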

    With my interpreter, then *I* have to write the dispatch routines and
    write code to implement all the instructions.

    Again, I don't think that anyone disputes that interpreters
    exist. But insisting that they must take a particular shape is
    just wrong.

    (My compilers generate an intermediate language, a kind of VM, which is
    then processed further into native code.

Then by the definition of this pseudonymous guy I've been
    responding to, your compiler is not a "proper compiler", no?

But I have also tried interpreting that VM; it just runs 20 times slower
than native code. That's what interpreting usually means: slow programs.)

    Not necessarily. The JVM does pretty good, quite honestly.

    - Dan C.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sun Oct 13 20:30:08 2024
    From Newsgroup: comp.lang.misc

    In article <20241013093004.251@kylheku.com>,
    Kaz Kylheku <643-408-1753@kylheku.com> wrote:
    On 2024-10-11, Muttley@DastartdlyHQ.org <Muttley@DastartdlyHQ.org> wrote:
Irrelevant. Lot of interpreters do partial compilation and the JVM does it
on the fly. A proper compiler writes a standalone binary file to disk.

    You might want to check those goalposts again. You can easily make a
    "proper compiler" which just writes a canned interpreter executable to
    disk, appending to it the program source code.

    Indeed; this is what the Moscow ML compiler does.

    - Dan C.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sun Oct 13 20:33:10 2024
    From Newsgroup: comp.lang.misc

    On Sun, 13 Oct 2024 08:22:53 -0000 (UTC), Muttley boring babbled:

    On Sat, 12 Oct 2024 21:25:17 -0000 (UTC)
    Lawrence D'Oliveiro <ldo@nz.invalid> wrote:

    On Sat, 12 Oct 2024 08:42:17 -0000 (UTC), Muttley boring babbled:

    Code generated by a compiler does not require an interpreter.

Something has to implement the rules of the “machine language”. This is
why we use the term “abstract machine”, to avoid having to distinguish
between “hardware” and “software”.

    Think: modern CPUs typically have “microcode” and “firmware” associated
    with them. Are those “hardware” or “software”?

    Who cares what happens inside the CPU hardware?

    Because that’s where your “software” runs.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sun Oct 13 21:33:56 2024
    From Newsgroup: comp.lang.misc

    Muttley@DastartdlyHQ.org writes:
    On Sat, 12 Oct 2024 16:39:20 +0000
    Eric Pozharski <apple.universe@posteo.net> boring babbled:
    with <87wmighu4i.fsf@doppelsaurus.mobileactivedefense.com> Rainer
    Weikusat wrote:
    Muttley@DastartdlyHQ.org writes:
    On Wed, 09 Oct 2024 22:25:05 +0100 Rainer Weikusat
    <rweikusat@talktalk.net> boring babbled:
    Bozo User <anthk@disroot.org> writes:
    On 2024-04-07, Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
    On Sun, 07 Apr 2024 00:01:43 +0000, Javier wrote:

    *CUT* [ 19 lines 6 levels deep]

    Its syntax is also a horrific mess.
    Which means precisely what?

    You're arguing with Unix Haters Handbook. You've already lost.

    ITYF the people who dislike Perl are the ones who actually like the unix
way of having simple daisychained tools instead of some lump of a
language that does everything messily.

    Perl is a general-purpose programming language, just like C or Java (or
Python or Javascript or Rust or $whatnot). This means it can be used to
implement anything (with some practical limitation for anything) and not
    that it "does everything".


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sun Oct 13 20:34:47 2024
    From Newsgroup: comp.lang.misc

    On Sun, 13 Oct 2024 08:19:16 -0000 (UTC), Muttley wrote:

    ITYF the people who dislike Perl are the ones who actually like the unix
    way of having simple daisychained tools instead of some lump of a
    language that does everything messily.

Not sure how those small tools can work without the support of much
bigger lumps like the shell, the compiler/interpreter for those tools
and the kernel itself.
    kernel itself.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sun Oct 13 21:08:09 2024
    From Newsgroup: comp.lang.misc

    On Sun, 13 Oct 2024 14:54:13 -0000 (UTC), Muttley wrote:

    What happens inside the CPU is irrelevant.

    But that’s where your “software” runs.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sun Oct 13 21:09:13 2024
    From Newsgroup: comp.lang.misc

    On Sun, 13 Oct 2024 16:02:13 -0000 (UTC), Muttley wrote:

    You explanation seems to be "anything that converts from one
    language to another".

You would call that a “translator”. That term was used more in the early
days, but that’s essentially synonymous with “compiler”.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.unix.programmer,comp.lang.misc on Sun Oct 13 21:10:06 2024
    From Newsgroup: comp.lang.misc

    On Sun, 13 Oct 2024 18:28:32 +0200, Janis Papanagnou wrote:

    You know there's formal definitions for what constitutes languages.

Not really. For example, some have preferred the term “notation” instead
of “language”.

    Regardless of what you call it, machine code still qualifies.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.programmer,comp.lang.misc on Mon Oct 14 01:16:11 2024
    From Newsgroup: comp.lang.misc

    On 13.10.2024 23:10, Lawrence D'Oliveiro wrote:
    On Sun, 13 Oct 2024 18:28:32 +0200, Janis Papanagnou wrote:

    You know there's formal definitions for what constitutes languages.

Not really. For example, some have preferred the term “notation” instead
of “language”.

    A "notation" is not the same as a [formal (or informal)] "language".

    (Frankly, I don't know where you're coming from; mind to explain your
    point if you think it's relevant. - But since you wrote "_some_ have
    preferred" it might anyway have been only an opinion or a historic
    inaccuracy so it's probably not worth expanding on that?)

    I think we should be clear about terminology.

    I was speaking about [formal] languages as introduced by Chomsky and
    used (and extended) by scientists (specifically computer scientists)
    since then. And these formal characteristics of languages and grammars
    are also the base of the books that have been mentioned and recently
    quoted in this sub-thread.

    Regardless of what you call it, machine code still qualifies.

    Glad you agree.

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bart@bc@freeuk.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Oct 14 01:20:45 2024
    From Newsgroup: comp.lang.misc

    On 13/10/2024 21:29, Dan Cross wrote:
    In article <vegs0o$nh5t$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
    On 13/10/2024 16:52, Dan Cross wrote:
    In article <QnROO.226037$EEm7.111715@fx16.iad>,
    Scott Lurndal <slp53@pacbell.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <vefvo0$k1mm$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:

    Really? So java bytecode will run direct on x86 or ARM will it? Please give
    some links to this astounding discovery you've made.

    Um, ok. https://en.wikipedia.org/wiki/Jazelle

    There was also a company a couple of decades ago that
    built an entire processor designed to execute bytecode
    directly - with a coprocessor to handle I/O.

    IIRC, it was Azul. There were a number of others, including
    Sun.

    None of them panned out - JIT's ended up winning that battle.

    Even ARM no longer includes Jazelle extensions in any of their
    mainstream processors.

    Sure. But the fact that any of these were going concerns is an
    existence proof that one _can_ take bytecodes targetted toward a
    "virtual" machine and execute it on silicon,
    making the
    distinction a lot more fluid than might be naively assumed, in
    turn exposing the silliness of this argument that centers around
    this weirdly overly-rigid definition of what a "compiler" is.

    I've implemented numerous compilers and interpreters over the last few
    decades (and have dabbled in emulators).

    To me the distinctions are clear enough because I have to work at the
    sharp end!

    I'm not sure why people want to try and be clever by blurring the roles
    of compiler and interpreter; that's not helpful at all.

    I'm not saying the two are the same; what I'm saying is that
    this arbitrary criteria that a compiler must emit a fully
executable binary image is not just inadequate, but also wrong,
    as it renders separate compilation impossible. I am further
    saying that there are many different _types_ of compilers,
    including specialized tools that don't emit machine language.

    Sure, people can write emulators for machine code, which are a kind of
    interpreter, or they can implement bytecode in hardware; so what?

    That's exactly my point.

    So, then what, we do away with the concepts of 'compiler' and
    'interpreter'? Or allow them to be used interchangeably?

Somehow I don't think it is useful to think of gcc as an interpreter for
C, or CPython as a native code compiler for Python.

    That doesn't really affect what I do. Writing compiler backends for
    actual CPUs is hard work. Generating bytecode is a lot simpler.

    That really depends on the bytecode, doesn't it? The JVM is a
    complex beast;

    Is it? It's not to my taste, but it didn't look too scary to me. Whereas modern CPU instruction sets are horrendous. (I normally target x64,
    which is described in 6 large volumes. RISC ones don't look much better,
    eg. RISC V with its dozens of extensions and special types)

    Example of JVM:

    aload index Push a reference from local variable #index

    MIPS or the unprivileged integer subset of RISC-V
    are pretty simple in comparison.

(Especially in my case, since the instruction set is one I've devised
myself; that's another distinction. Compilers usually target someone
else's instruction set.)

    If you want one more distinction, it is this: with my compiler, the
    resultant binary is executed by a separate agency: the CPU. Or maybe the
    OS loader will run it through an emulator.

    Python has a mode by which it will emit bytecode _files_, which
    can be separately loaded and interpreted; it even has an
    optimizing mode. Is that substantially different?
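(A minimal sketch of that mode using only CPython's standard library; the module name `m` and the `square` function are purely illustrative. `py_compile` writes the discrete bytecode file, and `importlib` loads it back with no source present at all.)

```python
import importlib.util
import pathlib
import py_compile
import tempfile
from importlib.machinery import SourcelessFileLoader

with tempfile.TemporaryDirectory() as d:
    # A tiny source module...
    src = pathlib.Path(d) / "m.py"
    src.write_text("def square(x):\n    return x * x\n")

    # ...byte-compiled to a discrete .pyc file, as `python -m py_compile` does.
    pyc = py_compile.compile(str(src), doraise=True)

    # The .pyc can now be loaded and executed without the source.
    loader = SourcelessFileLoader("m", pyc)
    spec = importlib.util.spec_from_loader("m", loader)
    mod = importlib.util.module_from_spec(spec)
    loader.exec_module(mod)
    result = mod.square(12)
```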

Whether there is a discrete bytecode file is beside the point. (I
    generated such files for many years.)

You still need software to execute it, especially dynamically typed
bytecode, which doesn't lend itself easily to either hardware
implementation or load-time native-code translation.


    With my interpreter, then *I* have to write the dispatch routines and
    write code to implement all the instructions.
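(That dispatch-routine shape is easy to show in miniature. A sketch, not Bart's actual code: the classic fetch-decode-execute loop for a made-up two-instruction stack machine.)

```python
# Made-up opcodes for a toy stack machine.
PUSH, ADD = 0, 1

def run(program):
    """Dispatch loop: fetch each opcode and branch to the code implementing it."""
    stack = []
    pc = 0
    while pc < len(program):
        op = program[pc]
        pc += 1
        if op == PUSH:      # PUSH <n>: operand follows inline
            stack.append(program[pc])
            pc += 1
        elif op == ADD:     # ADD: pop two values, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        else:
            raise ValueError("unknown opcode %d" % op)
    return stack.pop()
```

For example, `run([PUSH, 2, PUSH, 3, ADD])` evaluates to 5; every instruction costs a fetch and a branch, which is where the interpretive overhead comes from.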

    Again, I don't think that anyone disputes that interpreters
    exist. But insisting that they must take a particular shape is
    just wrong.

What shape would that be? Generally they will need some /software/ to
execute the instructions of the program being interpreted, as I said.
    Some JIT products may choose to do on-demand translation to native code.

    Is there anything else? I'd be interested in anything new!

    (My compilers generate an intermediate language, a kind of VM, which is
    then processed further into native code.

Then by the definition of this pseudonymous guy I've been
    responding to, your compiler is not a "proper compiler", no?

    Actually mine is more of a compiler than many, since it directly
    generates native machine code. Others generally stop at ASM code (eg.
    gcc) or OBJ code, and will invoke separate programs to finish the job.

    The intermediate language here is just a step in the process.

    But I have also tried interpreting that VM; it just runs 20 times slower
    than native code. That's what interpreting usually means: slow programs.)

    Not necessarily. The JVM does pretty good, quite honestly.

    But is it actually interpreting? Because if I generated such code for a statically typed language, then I would first translate to native code,
    of any quality, since it's going to be faster than interpreting.


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Oct 14 00:58:11 2024
    From Newsgroup: comp.lang.misc

    In article <veho4s$sghb$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
    On 13/10/2024 21:29, Dan Cross wrote:
    In article <vegs0o$nh5t$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
    On 13/10/2024 16:52, Dan Cross wrote:
    [snip]
Sure. But the fact that any of these were going concerns is an
existence proof that one _can_ take bytecodes targeted toward a
"virtual" machine and execute them on silicon,
    making the
    distinction a lot more fluid than might be naively assumed, in
    turn exposing the silliness of this argument that centers around
    this weirdly overly-rigid definition of what a "compiler" is.

    I've implemented numerous compilers and interpreters over the last few
    decades (and have dabbled in emulators).

    To me the distinctions are clear enough because I have to work at the
    sharp end!

    I'm not sure why people want to try and be clever by blurring the roles
    of compiler and interpreter; that's not helpful at all.

    I'm not saying the two are the same; what I'm saying is that
this arbitrary criterion that a compiler must emit a fully
executable binary image is not just inadequate, but also wrong,
    as it renders separate compilation impossible. I am further
    saying that there are many different _types_ of compilers,
    including specialized tools that don't emit machine language.

    Sure, people can write emulators for machine code, which are a kind of
    interpreter, or they can implement bytecode in hardware; so what?

    That's exactly my point.

    So, then what, we do away with the concepts of 'compiler' and
    'interpreter'? Or allow them to be used interchangeably?

    I don't see how you can credibly draw that conclusion from what
    I've been saying.

    But it's really pretty straight-forward; a compiler effects a
    translation from one computer language to another (the
    definition from Aho et al). An interpreter takes a program
    written in some computer language and executes it. Of course
    there's some gray area here; is a load-and-go compiler a
    compiler in this sense (yes; it is still translating between
    its source language and a machine language) or an interpreter?
    (Possibly; after all, it's taking a source language and causing
    a program written in it to be executed.)
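(A toy example may make the Aho et al. definition concrete; the "bytecode" here is invented for illustration. The point is that a compiler translates: it produces a program in a target language, but never runs it.)

```python
def compile_sum(src):
    """Translate the tiny source language "n(+n)*", e.g. "2+3+4",
    into instructions for an invented stack machine.
    Nothing is executed: the output is a program in another language."""
    out = []
    for i, tok in enumerate(src.split("+")):
        out.append(("PUSH", int(tok)))
        if i > 0:
            out.append(("ADD",))
    return out
```

A load-and-go compiler would simply feed that output to an execution step immediately, which is exactly the gray area described above.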

    Java is an interesting case in point here; the Java compiler is
    obviously a compiler; the JVM is an interpreter. I don't think
anyone would dispute this. But by suggesting some hard and fast
division that can be rigidly upheld in all cases, we ignore so
much nuance as to be reductive; by pointing these things out, we
see how inane it is to assert that a "proper compiler" is only
one that takes a textual source input and emits machine code for
a silicon target.

Somehow I don't think it is useful to think of gcc as an interpreter for
C, or CPython as a native code compiler for Python.

    I don't think anyone suggested that. But we _do_ have examples
    of true compilers emitting "code" for interpreters; cf LLVM and
    eBPF, which I mentioned previously in this thread, or compilers
    that emit code for hypothetical machines like MMIX, or compilers
    that emit instructions that aren't implemented everywhere, or
    more precisely are implemented by trap and emulation.

    That doesn't really affect what I do. Writing compiler backends for
    actual CPUs is hard work. Generating bytecode is a lot simpler.

    That really depends on the bytecode, doesn't it? The JVM is a
    complex beast;

Is it? It's not to my taste, but it didn't look too scary to me. Whereas
modern CPU instruction sets are horrendous. (I normally target x64,
    which is described in 6 large volumes. RISC ones don't look much better,
    eg. RISC V with its dozens of extensions and special types)

    I dunno. Wirth wrote an Oberon compiler targeting MIPS in ~5000
    lines of code. It was pretty straight-forward.

    And most of those ten volumes in the SDM have to do with the
    privileged instruction set and details of the memory model like
    segmentation and paging, most of which don't impact the compiler
    author much at all: beyond, perhaps providing an intrinsic for
    the `rdmsr` and `wrmsr` instructions, I don't think you care
    much about MSRs, let alone VMX or the esoterica of under what
    locked cycles the hardware sets the "A" bit on page table
    entries on a TLB miss.

    Example of JVM:

    aload index Push a reference from local variable #index

    Ok. `leaq index(%rip), %rax; pushq %rax` isn't that hard either.

    MIPS or the unprivileged integer subset of RISC-V
    are pretty simple in comparison.

(Especially in my case, since the instruction set is one I've devised
myself; that's another distinction. Compilers usually target someone
else's instruction set.)

    If you want one more distinction, it is this: with my compiler, the
resultant binary is executed by a separate agency: the CPU. Or maybe the
OS loader will run it through an emulator.

    Python has a mode by which it will emit bytecode _files_, which
    can be separately loaded and interpreted; it even has an
    optimizing mode. Is that substantially different?

Whether there is a discrete bytecode file is beside the point. (I
    generated such files for many years.)

You still need software to execute it, especially dynamically typed
bytecode, which doesn't lend itself easily to either hardware
implementation or load-time native-code translation.

    Sure. But if execution requires a "separate agency", and you
    acknowledge that could be a CPU or a separate program, how is
    that all that different than what Python _does_? That doesn't
    imply that the Python interpreter is the same as a CPU, or that
    an interpreter is the same as a compiler. But it does imply
    that the definitions being thrown about here aren't particularly
    good.

    With my interpreter, then *I* have to write the dispatch routines and
    write code to implement all the instructions.

    Again, I don't think that anyone disputes that interpreters
    exist. But insisting that they must take a particular shape is
    just wrong.

What shape would that be? Generally they will need some /software/ to
execute the instructions of the program being interpreted, as I said.
    Some JIT products may choose to do on-demand translation to native code.

    Is there anything else? I'd be interested in anything new!

    I actually meant to write that "insisting that _compilers_ take
    a specific shape is just wrong." But I think the point holds
    reasonably well for interpreters, as well: they need not
    directly interpret the text of a program; they may well create
    some sort of internal bytecode after several optimization and
    type checking steps, looking more like a load-and-go compiler
    than, say, the 6th Edition Unix shell.
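(CPython itself is a handy illustration of that shape: its `compile()` builtin turns source text into an internal bytecode object, and only that object is ever interpreted; the text is never executed directly.)

```python
# compile() produces a code object (internal bytecode), not a result.
code = compile("x = 21 * 2", "<demo>", "exec")
assert isinstance(code.co_code, bytes)   # the raw bytecode

# Execution is a separate step, performed on the code object.
ns = {}
exec(code, ns)
```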

    Comparison to Roslyn-style compilers blurs the distinction
    further still.

    (My compilers generate an intermediate language, a kind of VM, which is
    then processed further into native code.

Then by the definition of this pseudonymous guy I've been
    responding to, your compiler is not a "proper compiler", no?

    Actually mine is more of a compiler than many, since it directly
    generates native machine code. Others generally stop at ASM code (eg.
    gcc) or OBJ code, and will invoke separate programs to finish the job.

    The intermediate language here is just a step in the process.

But I have also tried interpreting that VM; it just runs 20 times slower
than native code. That's what interpreting usually means: slow programs.)
    Not necessarily. The JVM does pretty good, quite honestly.

But is it actually interpreting? Because if I generated such code for a
statically typed language, then I would first translate to native code,
    of any quality, since it's going to be faster than interpreting.

Doesn't that reinforce my thesis that these things are much
blurrier than all this uninformed talk of a mythical "proper
compiler" would lead one to believe?

    - Dan C.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.unix.programmer,comp.lang.misc on Mon Oct 14 01:45:49 2024
    From Newsgroup: comp.lang.misc

    On Mon, 14 Oct 2024 01:16:11 +0200, Janis Papanagnou wrote:

    On 13.10.2024 23:10, Lawrence D'Oliveiro wrote:

    On Sun, 13 Oct 2024 18:28:32 +0200, Janis Papanagnou wrote:

    You know there's formal definitions for what constitutes languages.

    Not really. For example, some have preferred the term “notation”
    instead of “language”.

    A "notation" is not the same as a [formal (or informal)] "language".

    (Frankly, I don't know where you're coming from ...

    <https://en.wikipedia.org/wiki/Programming_language>:

    A programming language is a system of notation for writing computer
    programs.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.programmer,comp.lang.misc on Mon Oct 14 08:23:20 2024
    From Newsgroup: comp.lang.misc

    On Sun, 13 Oct 2024 18:28:32 +0200
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
    [ X-post list reduced ]

    On 13.10.2024 18:02, Muttley@DastartdlyHQ.org wrote:
    On Sun, 13 Oct 2024 15:30:03 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
    [...]

    No. It translates one computer _language_ to another computer
    _language_. In the usual case, that's from a textual source

    Machine code isn't a language. Fallen at the first hurdle with that
    definition.

    Careful (myself included); watch out for the glazed frost!

    You know there's formal definitions for what constitutes languages.

At first glance I don't see why machine code wouldn't qualify as a
    language (either as some specific "mnemonic" representation, or as
    a sequence of integral numbers or other "code" representations).
    What's the problem, in your opinion, with considering machine code
    as a language?

    A programming language is an abstraction of machine instructions that is readable by people.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Oct 14 08:25:37 2024
    From Newsgroup: comp.lang.misc

    On Sun, 13 Oct 2024 20:15:45 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
    In article <vegqu5$o3ve$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:

    [tl;dr]

    The people who create the field are the ones who get to make
the definitions, not you.

    ITYF people in the field as a whole make the definitions.

Machine code isn't a language. Fallen at the first hurdle with that
definition.

    Oh really? Is that why they call it "machine language"? It's
even in the dictionary with "machine code" as a synonym:
https://www.merriam-webster.com/dictionary/machine%20language

It's not a programming language.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Oct 14 08:28:57 2024
    From Newsgroup: comp.lang.misc

    On Sun, 13 Oct 2024 21:33:56 +0100
Rainer Weikusat <rweikusat@talktalk.net> boring babbled:
Muttley@DastartdlyHQ.org writes:
    ITYF the people who dislike Perl are the ones who actually like the unix
way of having simple daisychained tools instead of some lump of a language
that does everything messily.

Perl is a general-purpose programming language, just like C or Java (or
Python or Javascript or Rust or $whatnot). This means it can be used to
implement anything (with some practical limitation for anything) and not
that it "does everything".

It can be, but generally isn't. Its niche tends to be text processing of
some sort, and for that there are better tools IMO. It used to be big in
web backends, but those days are long gone.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Oct 14 11:38:29 2024
    From Newsgroup: comp.lang.misc

    Muttley@DastartdlyHQ.org writes:
    On Sun, 13 Oct 2024 21:33:56 +0100
    Rainer Weikusat <rweikusat@talktalk.net> boring babbled:
    Muttley@DastartdlyHQ.org writes:
ITYF the people who dislike Perl are the ones who actually like the unix
way of having simple daisychained tools instead of some lump of a language
that does everything messily.

Perl is a general-purpose programming language, just like C or Java (or
Python or Javascript or Rust or $whatnot). This means it can be used to
implement anything (with some practical limitation for anything) and not
that it "does everything".

It can be, but generally isn't. Its niche tends to be text processing of
some sort

    It is. That sysadmin-types using it don't use it to create actual
    programs is of no concern for this, because they never do that and this
    use only needs a very small subset of the features of the language. I've
    been using it as system programming language for programs with up to
    21,000 LOC in the main program (and some more thousands in auxiliary
    modules) and it's very well-suited to that.

    The simple but flexible OO system, reliable automatic memory management
    and support for functions/ subroutine as first-class objects make it
    very nice for implementing event-driven, asynchronous "crossbar"
programs connecting various external entities both running locally and
    on other computers on the internet to create complex applications from
    them.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Oct 14 11:05:06 2024
    From Newsgroup: comp.lang.misc

    On Mon, 14 Oct 2024 11:38:29 +0100
    Rainer Weikusat <rweikusat@talktalk.net> boring babbled:
    The simple but flexible OO system, reliable automatic memory management

For a certain definition of OO. The requirement to use $self->
everywhere to denote object method/variable access makes it little
better than doing OO in C. Then there's the whole two-stage object
creation with the "bless" nonsense. Hacky.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.programmer,comp.lang.misc on Mon Oct 14 14:13:11 2024
    From Newsgroup: comp.lang.misc

    On 14.10.2024 03:45, Lawrence D'Oliveiro wrote:
    On Mon, 14 Oct 2024 01:16:11 +0200, Janis Papanagnou wrote:
    On 13.10.2024 23:10, Lawrence D'Oliveiro wrote:
    On Sun, 13 Oct 2024 18:28:32 +0200, Janis Papanagnou wrote:

    You know there's formal definitions for what constitutes languages.

    Not really. For example, some have preferred the term “notation”
    instead of “language”.

    A "notation" is not the same as a [formal (or informal)] "language".

    (Frankly, I don't know where you're coming from ...

    <https://en.wikipedia.org/wiki/Programming_language>:

    A programming language is a system of notation for writing computer
    programs.

    Okay, a "system of notation" (not a "notation") is used here to
    _describe_ it. I'm fine with that formulation. Thanks.

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.programmer,comp.lang.misc on Mon Oct 14 14:36:38 2024
    From Newsgroup: comp.lang.misc

    On 14.10.2024 10:23, Muttley@DastartdlyHQ.org wrote:
    On Sun, 13 Oct 2024 18:28:32 +0200
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
    [ X-post list reduced ]

    On 13.10.2024 18:02, Muttley@DastartdlyHQ.org wrote:
    On Sun, 13 Oct 2024 15:30:03 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
    [...]

    No. It translates one computer _language_ to another computer
    _language_. In the usual case, that's from a textual source

    Machine code isn't a language. Fallen at the first hurdle with that
    definition.

    Careful (myself included); watch out for the glazed frost!

    You know there's formal definitions for what constitutes languages.

At first glance I don't see why machine code wouldn't qualify as a
    language (either as some specific "mnemonic" representation, or as
    a sequence of integral numbers or other "code" representations).
    What's the problem, in your opinion, with considering machine code
    as a language?

    A programming language is an abstraction of machine instructions that is readable by people.

    Yes, you can explain "programming language" that way.

    The topic that was cited (Aho, et al.) upthread (and what I spoke
    about) was more generally about [formal] "language", the base also
    of programming languages.

    (In early days of computers they programmed in binary, but that is
    just a side note and unnecessary to support the definition of the
    upthread cited text.)

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Oct 14 13:38:04 2024
    From Newsgroup: comp.lang.misc

In article <veiki1$14g6h$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:
On Sun, 13 Oct 2024 20:15:45 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
    Oh really? Is that why they call it "machine language"? It's
even in the dictionary with "machine code" as a synonym:
https://www.merriam-webster.com/dictionary/machine%20language

It's not a programming language.

    That's news to those people who have, and sometimes still do,
    write programs in it.

    But that's not important. If we go back and look at what I
    wrote that you were responding to, it was this statement, about
    what a compiler does, and your claim that I was asserting it
    was translating anything to anything, which I was not:

    |No. It translates one computer _language_ to another computer
    |_language_. In the usual case, that's from a textual source

    Note that I said, "computer language", not "programming
    language". Being a human-readable language is not a requirement
    for a computer language.

    Your claim that "machine language" is not a "language" is simply
    not true. Your claim that a "proper" compiler must take the
    shape you are pushing is also not true.

    - Dan C.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Oct 14 14:47:58 2024
    From Newsgroup: comp.lang.misc

    On Mon, 14 Oct 2024 13:38:04 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
In article <veiki1$14g6h$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:
On Sun, 13 Oct 2024 20:15:45 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
    Oh really? Is that why they call it "machine language"? It's
even in the dictionary with "machine code" as a synonym:
https://www.merriam-webster.com/dictionary/machine%20language

It's not a programming language.

    That's news to those people who have, and sometimes still do,
    write programs in it.

Really? So if it's a language you'll be able to understand this then:

0011101011010101010001110101010010110110001110010100101001010100
0101001010010010100101010111001010100110100111010101010101010101
0001110100011101010001001010110011100010101001110010100101100010

    But that's not important. If we go back and look at what I

    Oh right.


    |No. It translates one computer _language_ to another computer
    |_language_. In the usual case, that's from a textual source

    Note that I said, "computer language", not "programming
    language". Being a human-readable language is not a requirement
    for a computer language.

    Oh watch those goalpost moves with pedant set to 11. Presumably you
    think the values of the address lines is a language too.

    Your claim that "machine language" is not a "language" is simply
    not true. Your claim that a "proper" compiler must take the
    shape you are pushing is also not true.

    If you say so.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Oct 14 14:53:49 2024
    From Newsgroup: comp.lang.misc

    Muttley@DastartdlyHQ.org writes:
    On Mon, 14 Oct 2024 13:38:04 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
In article <veiki1$14g6h$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:
On Sun, 13 Oct 2024 20:15:45 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
    Oh really? Is that why they call it "machine language"? It's
even in the dictionary with "machine code" as a synonym:
https://www.merriam-webster.com/dictionary/machine%20language

It's not a programming language.

    That's news to those people who have, and sometimes still do,
    write programs in it.

Really? So if it's a language you'll be able to understand this then:

0011101011010101010001110101010010110110001110010100101001010100
0101001010010010100101010111001010100110100111010101010101010101
0001110100011101010001001010110011100010101001110010100101100010

    I certainly understand this, even four decades later

    94A605440C00010200010400000110

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.unix.programmer,comp.lang.misc on Mon Oct 14 14:58:13 2024
    From Newsgroup: comp.lang.misc

    Muttley@DastartdlyHQ.org writes:
    On Sun, 13 Oct 2024 18:28:32 +0200
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
    [ X-post list reduced ]


A programming language is an abstraction of machine instructions that is
readable by people.

    By that definition, PAL-D is a programming language.

    Any assembler is a programming language, by that definition.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.programmer,comp.lang.misc on Mon Oct 14 14:59:22 2024
    From Newsgroup: comp.lang.misc

    On Mon, 14 Oct 2024 14:58:13 GMT
    scott@slp53.sl.home (Scott Lurndal) boring babbled:
    Muttley@DastartdlyHQ.org writes:
    On Sun, 13 Oct 2024 18:28:32 +0200
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
    [ X-post list reduced ]


A programming language is an abstraction of machine instructions that is
readable by people.

    By that definition, PAL-D is a programming language.

    Any assembler is a programming language, by that definition.

    Where did I say it wasn't? Of course assembler is a programming language.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Oct 14 16:04:18 2024
    From Newsgroup: comp.lang.misc

    Muttley@DastartdlyHQ.org writes:
    On Mon, 14 Oct 2024 11:38:29 +0100
    Rainer Weikusat <rweikusat@talktalk.net> boring babbled:
    The simple but flexible OO system, reliable automatic memory management

    [...]

Then there's the whole two-stage object creation with the "bless"
nonsense. Hacky.

I was planning to write a longer reply but killed it. You're obviously
arguing about something you reject for political reasons despite not
really being familiar with it, and you even 'argue' like a politician.
That is, you stick pejorative labels on stuff you don't like to emphasize
how really disagreeable you believe it to be. IMHO, such a method of
(pseudo-)discussing anything is completely pointless.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From ram@ram@zedat.fu-berlin.de (Stefan Ram) to comp.lang.misc on Mon Oct 14 15:19:03 2024
    From Newsgroup: comp.lang.misc

    cross@spitfire.i.gajendra.net (Dan Cross) wrote or quoted:
    Your claim that "machine language" is not a "language" is simply
    not true.

    Machine language is a language.

    (It might not be a /formal/ language when the specification
    is not definite. For example, when one says, "6502", are the
    "undocumented" opcodes a part of this language or not? So, for
    a formal language, you have to make sure that it's definite.)

    Not related to unix. So, not,

    Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc

    , but,

    Newsgroups: comp.lang.misc

    .


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Oct 14 17:27:18 2024
    From Newsgroup: comp.lang.misc

    On 14/10/2024 16:53, Scott Lurndal wrote:
    Muttley@DastartdlyHQ.org writes:
    On Mon, 14 Oct 2024 13:38:04 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
    In article <veiki1$14g6h$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:
    On Sun, 13 Oct 2024 20:15:45 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
    Oh really? Is that why they call it "machine language"? It's
    even in the dictionary with "machine code" as a synonymn:
    https://www.merriam-webster.com/dictionary/machine%20language

It's not a programming language.

    That's news to those people who have, and sometimes still do,
    write programs in it.

Really? So if it's a language you'll be able to understand this then:

    0011101011010101010001110101010010110110001110010100101001010100
    0101001010010010100101010111001010100110100111010101010101010101
    0001110100011101010001001010110011100010101001110010100101100010

    I certainly understand this, even four decades later

    94A605440C00010200010400000110


In my early days of assembly programming on my ZX Spectrum, I would
hand-assemble to machine code, and I knew at least a few of the codes by
heart. (01 is "ld bc, #xxxx", 18 is "jr", c9 is "ret", etc.) So while
I rarely wrote machine code directly, it is certainly still a
programming language - it's a language you can write programs in.
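(The lookup-table nature of that hand-assembly is easy to sketch in the opposite direction. Illustrative only: the three opcodes are the ones mentioned above, anything else is treated as raw data, and a real Z80 disassembler would also have to consume operand bytes.)

```python
# A few Z80 opcodes from the post, mapped back to their mnemonics.
Z80_OPS = {0x01: "ld bc, #xxxx", 0x18: "jr", 0xC9: "ret"}

def mnemonic(opcode):
    """Return the mnemonic for a single opcode byte, or mark it as data."""
    return Z80_OPS.get(opcode, "db 0x%02x" % opcode)
```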

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Oct 14 15:39:19 2024
    From Newsgroup: comp.lang.misc

    On Mon, 14 Oct 2024 16:04:18 +0100
Rainer Weikusat <rweikusat@talktalk.net> boring babbled:
Muttley@DastartdlyHQ.org writes:
    On Mon, 14 Oct 2024 11:38:29 +0100
    Rainer Weikusat <rweikusat@talktalk.net> boring babbled:
    The simple but flexible OO system, reliable automatic memory management

    [...]

    Then there's the whole 2 stage object creation with the "bless"
    nonsense. Hacky.

I was planning to write a longer reply but killed it. You're obviously
arguing about something you reject for political reasons despite not
really being familiar with it, and you even 'argue' like a politician.
That is, you stick pejorative labels on stuff you don't like to emphasize
how really disagreeable you believe it to be. IMHO, such a method of
(pseudo-)discussing anything is completely pointless.

    Umm, whatever. I was just saying why I didn't like Perl but if you want to
read some grand motive into it, knock yourself out.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.programmer,comp.lang.misc on Mon Oct 14 17:43:59 2024
    From Newsgroup: comp.lang.misc

    [ X-post list reduced ]

    On 14.10.2024 16:47, Muttley@DastartdlyHQ.org wrote:
    On Mon, 14 Oct 2024 13:38:04 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
    In article <veiki1$14g6h$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:
    On Sun, 13 Oct 2024 20:15:45 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
    Oh really? Is that why they call it "machine language"? It's
even in the dictionary with "machine code" as a synonym:
    https://www.merriam-webster.com/dictionary/machine%20language

It's not a programming language.

    That's news to those people who have, and sometimes still do,
    write programs in it.

Really? So if it's a language you'll be able to understand this then:

0011101011010101010001110101010010110110001110010100101001010100
0101001010010010100101010111001010100110100111010101010101010101
0001110100011101010001001010110011100010101001110010100101100010

    To me it's substantially no different from, e.g., Chinese text.

    You need context information to understand it. But understanding a
    language is not a condition for defining and handling a language.
    If there's context information then people can associate semantic
    meaning with it (and understand it).

    To illustrate (just playing)...

    if then else then if or if and else end if

    Are you able to understand that? On what abstraction level do you
    understand it? Does it make [semantical] sense to you?
    (Note: Using the proper translator and interpreter this is quite
    dangerous code. For the puzzler: it's a coded shell fork-bomb.)

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.programmer,comp.lang.misc on Mon Oct 14 17:55:14 2024
    From Newsgroup: comp.lang.misc

    [ X-post list reduced ]

    On 14.10.2024 17:27, David Brown wrote:
    On 14/10/2024 16:53, Scott Lurndal wrote:
    Muttley@DastartdlyHQ.org writes:

    Really? So if its a language you'll be able to understand this then:

    0011101011010101010001110101010010110110001110010100101001010100
    0101001010010010100101010111001010100110100111010101010101010101
    0001110100011101010001001010110011100010101001110010100101100010

    I certainly understand this, even four decades later

    94A605440C00010200010400000110

    In my early days of assembly programming on my ZX Spectrum, I would
    hand-assemble to machine code, and I knew at least a few of the codes by
    heart. (01 is "ld bc, #xxxx", 18 is "jr", c9 is "ret", etc.) So while
    I rarely wrote machine code directly, it is certainly still a
    programming language - it's a language you can write programs in.

    Your post triggered some memories of my own...

    I have an old pocket calculator (Sharp PC-1401) programmable in
    BASIC. When I found out that it supports undocumented features to
    read machine code numbers from memory and write code numbers into
    memory (and call them as subprograms), I coded programs in decimal
    byte sequences. (A pain, for sure, but in earlier computer eras
    quite a normal procedure.)
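    The hand-assembly described in this exchange can be sketched as a toy
    lookup table. The three Z80 opcodes are the ones quoted above; the helper
    name is hypothetical, and a real session meant doing this on paper:

```python
# Toy "hand-assembly" table for the few Z80 opcodes mentioned above.
# (Sketch only: real hand-assembly also meant encoding operands by hand.)
Z80_OPCODES = {
    "ld bc, #nnnn": 0x01,  # load 16-bit immediate into BC
    "jr e":         0x18,  # relative jump
    "ret":          0xC9,  # return from subroutine
}

def hand_assemble(mnemonic):
    """Return the opcode byte for a known mnemonic (hypothetical helper)."""
    return Z80_OPCODES[mnemonic]

print(f"{hand_assemble('ret'):02x}")  # c9
```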

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bart@bc@freeuk.com to comp.unix.programmer,comp.lang.misc on Mon Oct 14 17:23:11 2024
    From Newsgroup: comp.lang.misc

    On 14/10/2024 15:58, Scott Lurndal wrote:
    Muttley@DastartdlyHQ.org writes:
    On Sun, 13 Oct 2024 18:28:32 +0200
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
    [ X-post list reduced ]


    A programming language is an abstraction of machine instructions that is
    readable by people.

    By that definition, PAL-D is a programming language.

    (I've no idea what PAL-D is in this context.)

    Any assembler is a programming language, by that definition.


    You mean 'assembly'? An assembler (in the software world) is usually a
    program that translates textual assembly code.

    'Compiler' isn't a programming language (although no doubt someone here
    will dredge up some obscure language with exactly that name just to
    prove me wrong).


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.misc on Mon Oct 14 16:51:06 2024
    From Newsgroup: comp.lang.misc

    In article <vejauu$186ln$1@dont-email.me>, <Muttley@DastartdlyHQ.org> wrote:
    On Mon, 14 Oct 2024 13:38:04 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
    [snip]
    No. It translates one computer _language_ to another computer
    _language_. In the usual case, that's from a textual source

    Note that I said, "computer language", not "programming
    language". Being a human-readable language is not a requirement
    for a computer language.

    Oh, watch those goalpost moves with pedant set to 11. Presumably you
    think the values of the address lines are a language too.

    Dunno what to tell you: pretty sure you're the one who
    asserted I meant something I didn't write.

    Your claim that "machine language" is not a "language" is simply
    not true. Your claim that a "proper" compiler must take the
    shape you are pushing is also not true.

    If you say so.

    Not just me.

    - Dan C.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.unix.programmer,comp.lang.misc on Mon Oct 14 21:04:44 2024
    From Newsgroup: comp.lang.misc

    On Mon, 14 Oct 2024 08:23:20 -0000 (UTC), Muttley wrote:

    A programming language is an abstraction of machine instructions that is readable by people.

    Like converting circuit voltages to human-readable “1” and “0” symbols,
    perhaps?
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.unix.programmer,comp.lang.misc on Tue Oct 15 13:27:09 2024
    From Newsgroup: comp.lang.misc

    On 14/10/2024 18:23, Bart wrote:
    On 14/10/2024 15:58, Scott Lurndal wrote:
    Muttley@DastartdlyHQ.org writes:
    On Sun, 13 Oct 2024 18:28:32 +0200
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
    [ X-post list reduced ]


    A programming language is an abstraction of machine instructions that is
    readable by people.

    By that definition, PAL-D is a programming language.

    (I've no idea what PAL-D is in this context.)

    Any assembler is a programming language, by that definition.


    You mean 'assembly'? An assembler (in the software world) is usually a
    program that translates textual assembly code.


    I took "an assembler" to mean "an assembler language", which is a common alternative way to write "an assembly language". And IMHO, any assembly language /is/ a programming language.

    'Compiler' isn't a programming language (although no doubt someone here
    will dredge up some obscure language with exactly that name just to
    prove me wrong).


    I tried, just to please you, but I couldn't find such a language :-)


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.programmer,comp.lang.misc on Tue Oct 15 15:18:21 2024
    From Newsgroup: comp.lang.misc

    [Followup-To: set to comp.lang.misc, -comp.unix.programmer]

    In article <vejghe$192vs$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
    On 14/10/2024 15:58, Scott Lurndal wrote:
    Muttley@DastartdlyHQ.org writes:
    On Sun, 13 Oct 2024 18:28:32 +0200
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
    [ X-post list reduced ]


    A programming language is an abstraction of machine instructions that is
    readable by people.

    By that definition, PAL-D is a programming language.

    (I've no idea what PAL-D is in this context.)

    PAL-D is an assembler for the PDP-8 computer. I don't know why
    one wouldn't consider its input a programming language.
    https://bitsavers.org/pdf/dec/pdp8/handbooks/programmingLanguages_May70.pdf

    - Dan C.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Sebastian@sebastian@here.com.invalid to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Nov 11 07:31:13 2024
    From Newsgroup: comp.lang.misc

    In comp.unix.programmer Muttley@dastartdlyhq.org wrote:
    On Wed, 09 Oct 2024 22:25:05 +0100
    Rainer Weikusat <rweikusat@talktalk.net> boring babbled:
    Bozo User <anthk@disroot.org> writes:
    On 2024-04-07, Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
    On Sun, 07 Apr 2024 00:01:43 +0000, Javier wrote:

    The downside is the loss of performance because of disk access for
    trivial things like 'nfiles=$(ls | wc -l)'.

    Well, you could save one process creation by writing
    “nfiles=$(echo * | wc -l)” instead. But that would still not be strictly
    correct.
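    The process-creation overhead being debated here disappears entirely in a
    language with direct directory access. A minimal Python sketch of the same
    count (the function name is mine; note that `echo * | wc -l` actually
    counts words, which is one reason it is "not strictly correct"):

```python
import os

def count_files(path="."):
    # One readdir() pass, no child processes; skip dotfiles to
    # roughly match the default output of `ls`.
    return sum(1 for name in os.listdir(path) if not name.startswith("."))

nfiles = count_files(".")
```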

    I suspect disk access times were
    one of the reasons for the development of perl in the early 90s.

    Shells were somewhat less powerful in those days. I would describe the
    genesis of Perl as “awk on steroids”. Its big party trick was regular
    expressions. And I guess combining that with more sophisticated data-
    structuring capabilities.

    Perl is more awk+sed+sh in a single language. Basically the killer
    of the Unix philosophy in the late 90's/early 00's, and for the good.

    Perl is a high-level programming language with a rich syntax, with
    support for deterministic automatic memory management, functions as
    first-class objects and message-based OO. It's also a virtual machine
    for executing threaded code and a(n optimizing) compiler for translating
    Perl code into the corresponding threaded code.

    Its syntax is also a horrific mess. Larry took the worst parts of C and
    shell syntax and mashed them together. It's no surprise Perl has been
    ditched in favour of Python just about everywhere for new scripting
    projects. And while I hate Python's meaningful whitespace nonsense, I'd
    use it in preference to Perl any day.

    I think you've identified the one language that Python is better than.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Nov 11 10:06:40 2024
    From Newsgroup: comp.lang.misc

    On Mon, 11 Nov 2024 07:31:13 -0000 (UTC)
    Sebastian <sebastian@here.com.invalid> boring babbled:
    In comp.unix.programmer Muttley@dastartdlyhq.org wrote:
    syntax and mashed them together. It's no surprise Perl has been ditched in
    favour of Python just about everywhere for new scripting projects. And while
    I hate Python's meaningful whitespace nonsense, I'd use it in preference
    to Perl any day.

    I think you've identified the one language that Python is better than.

    Yes, Python does have a lot of cons as a language. But its syntax lets
    newbies get up to speed quickly and there are a lot of libraries. However
    it's dog slow and inefficient, and I'm amazed it's used as a key language
    for AI development - not traditionally a newbie coder area - when in that
    application speed really is essential. Yes, it generally calls libraries
    written in C/C++, but then why not just write the higher level code in
    C++ too?

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Wolfgang Agnes@wagnes@jemoni.to to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Nov 11 08:28:51 2024
    From Newsgroup: comp.lang.misc

    Muttley@DastartdlyHQ.org writes:

    On Mon, 11 Nov 2024 07:31:13 -0000 (UTC)
    Sebastian <sebastian@here.com.invalid> boring babbled:
    In comp.unix.programmer Muttley@dastartdlyhq.org wrote:
    syntax and mashed them together. It's no surprise Perl has been ditched in
    favour of Python just about everywhere for new scripting projects. And while
    I hate Python's meaningful whitespace nonsense, I'd use it in preference
    to Perl any day.

    I think you've identified the one language that Python is better than.

    Yes, Python does have a lot of cons as a language. But its syntax lets
    newbies get up to speed quickly and there are a lot of libraries. However
    it's dog slow and inefficient, and I'm amazed it's used as a key language
    for AI development - not traditionally a newbie coder area - when in that
    application speed really is essential. Yes, it generally calls libraries
    written in C/C++, but then why not just write the higher level code in
    C++ too?

    You'd have to give up the REPL, for instance.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Muttley@Muttley@dastardlyhq.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Nov 11 16:21:26 2024
    From Newsgroup: comp.lang.misc

    On Mon, 11 Nov 2024 08:28:51 -0300
    Wolfgang Agnes <wagnes@jemoni.to> gabbled:
    Muttley@DastartdlyHQ.org writes:

    On Mon, 11 Nov 2024 07:31:13 -0000 (UTC)
    Sebastian <sebastian@here.com.invalid> boring babbled:
    In comp.unix.programmer Muttley@dastartdlyhq.org wrote:
    syntax and mashed them together. It's no surprise Perl has been ditched in
    favour of Python just about everywhere for new scripting projects. And while
    I hate Python's meaningful whitespace nonsense, I'd use it in preference
    to Perl any day.

    I think you've identified the one language that Python is better than.

    Yes, Python does have a lot of cons as a language. But its syntax lets
    newbies get up to speed quickly and there are a lot of libraries. However
    it's dog slow and inefficient, and I'm amazed it's used as a key language
    for AI development - not traditionally a newbie coder area - when in that
    application speed really is essential. Yes, it generally calls libraries
    written in C/C++, but then why not just write the higher level code in
    C++ too?

    You'd have to give up the REPL, for instance.

    Not that big a deal especially if the model takes hours or days to train anyway.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Nov 11 20:55:15 2024
    From Newsgroup: comp.lang.misc

    On Mon, 11 Nov 2024 10:06:40 -0000 (UTC), Muttley wrote:

    Yes it generally calls libraries written in C/C++
    but then why not just write the higher level code in C++ too?

    Because it’s easier to do higher-level stuff in Python.

    Example: <https://github.com/HamPUG/meetings/tree/master/2018/2018-08-13/ldo-creating-api-bindings-using-ctypes>
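    The ctypes approach from the linked talk can be illustrated with a minimal
    sketch binding a single function from the C math library. This is my own
    toy example, not from the talk, and the library lookup assumes a Unix-like
    system:

```python
import ctypes
import ctypes.util

# Load the C math library (name resolution assumes a Unix-like system).
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature: double sqrt(double).
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

print(libm.sqrt(2.0))  # 1.4142135623730951
```

    The point being made above is that the glue - loading, declaring
    signatures, calling - stays at this level of brevity, while the heavy
    lifting remains in compiled code.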
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Mon Nov 11 21:24:14 2024
    From Newsgroup: comp.lang.misc

    On Mon, 11 Nov 2024 07:31:13 -0000 (UTC), Sebastian wrote:

    In comp.unix.programmer Muttley@dastartdlyhq.org wrote:

    [Perl’s] syntax is also a horrific mess. Larry took the worst parts of
    C and shell syntax and mashed them together.

    I think you've identified the one language that Python is better than.

    In terms of the modern era of high-level programming, Perl was the breakthrough language. Before Perl, BASIC was considered to be an example
    of a language with “good” string handling. After Perl, BASIC looked old and clunky indeed.

    Perl was the language that made regular expressions sexy. Because it made
    them easy to use.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Tue Nov 12 10:14:20 2024
    From Newsgroup: comp.lang.misc

    On 11.11.2024 11:06, Muttley@DastartdlyHQ.org wrote:

    Yes, Python does have a lot of cons as a language. But its syntax lets newbies get up to speed quickly

    and then abruptly get stopped again due to obscure, misleading, or
    (at best) non-informative error messages,

    and there are a lot of libraries. However it's
    dog slow and inefficient and I'm amazed it's used as a key language for AI

    (and not only there; it's ubiquitous, it seems)

    development - not traditionally a newbie coder area - when in that
    application speed really is essential. Yes, it generally calls libraries
    written in C/C++, but then why not just write the higher level code in
    C++ too?

    Because of its simpler syntax and less syntactical ballast compared
    to C++?

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Tue Nov 12 09:21:51 2024
    From Newsgroup: comp.lang.misc

    On Tue, 12 Nov 2024 10:14:20 +0100
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
    On 11.11.2024 11:06, Muttley@DastartdlyHQ.org wrote:
    and there are a lot of libraries. However it's
    dog slow and inefficient and I'm amazed it's used as a key language for AI

    (and not only there; it's ubiquitous, it seems)

    Yes, certainly seems to be the case now.

    development - not traditionally a newbie coder area - when in that
    application speed really is essential. Yes, it generally calls libraries
    written in C/C++, but then why not just write the higher level code in
    C++ too?

    Because of its simpler syntax and less syntactical ballast compared
    to C++?

    When you're dealing with something as complicated and frankly ineffable as
    an AI model, I doubt syntactic quirks of the programming language matter
    that much in comparison. Surely you'd want the fastest implementation
    possible, and in this case it would be C++.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Tue Nov 12 10:23:38 2024
    From Newsgroup: comp.lang.misc

    On 11.11.2024 22:24, Lawrence D'Oliveiro wrote:
    On Mon, 11 Nov 2024 07:31:13 -0000 (UTC), Sebastian wrote:

    In comp.unix.programmer Muttley@dastartdlyhq.org wrote:

    [Perl’s] syntax is also a horrific mess. Larry took the worst parts of
    C and shell syntax and mashed them together.

    I think you've identified the one language that Python is better than.

    In terms of the modern era of high-level programming, Perl was the breakthrough language. Before Perl, BASIC was considered to be an example
    of a language with “good” string handling. After Perl, BASIC looked old and clunky indeed.

    I'm not, erm.., a fan of Perl or anything, but comparing it to BASIC
    is way off; Perl is not *that* bad. - N.B.: Of course no one can say
    what "BASIC" actually is given the many variants and dialects. - I'm
    sure you must have some modern variant in mind that might have little
    to do with the various former BASIC dialects (that I happened to use
    in the 1970's; e.g., Wang, Olivetti, Commodore, and a mainframe that
    I don't recall).

    It's more interesting what Perl added compared to BRE/ERE, what Unix
    provided since its beginning (and long before Perl).


    Perl was the language that made regular expressions sexy. Because it made them easy to use.

    For those of us who used regexps in Unix from the beginning it's not
    as shiny as you make it out to be; Unix has supported Chomsky-3
    Regular Expressions with a syntax that is still used in contemporary
    languages. Perl supports some nice syntactic shortcuts, but also
    patterns that exceed Chomsky-3's; too bad if one doesn't know these
    differences and the complexity degradation that may come with them.
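    One concrete Perl-ism that exceeds a Chomsky type-3 (regular) language is
    the backreference. A small illustration using Python's Perl-derived `re`
    module (the pattern and strings are my own example):

```python
import re

# A backreference matches "the same text again". No finite-state
# automaton can do this in general: the language { w w } is not regular,
# so engines supporting \1 give up the O(N) guarantees of a pure FSA.
doubled = re.compile(r"^(\w+) \1$")

print(bool(doubled.match("hey hey")))    # True
print(bool(doubled.match("hey there")))  # False
```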

    More interesting to me is the fascinating fact that on some non-Unix
    platforms it took decades before regexps got (slooooowly) introduced
    (even in its simplest form).

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Tue Nov 12 10:31:58 2024
    From Newsgroup: comp.lang.misc

    On 12.11.2024 10:21, Muttley@DastartdlyHQ.org wrote:
    On Tue, 12 Nov 2024 10:14:20 +0100
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
    On 11.11.2024 11:06, Muttley@DastartdlyHQ.org wrote:
    [ Q: why some prefer Python over C++ ]

    Because of its simpler syntax and less syntactical ballast compared
    to C++?

    When you're dealing with something as complicated and frankly ineffable as
    an AI model I doubt syntactic quirks of the programming language matter that much in comparison.

    Oh, I would look at it differently; in whatever application domain I
    program I want a syntactically clear and well-defined language.

    Surely you'd want the fastest implementation possible and
    in this case it would be C++.

    Speed is one factor (to me), and expressiveness or "modeling power"
    (OO) is another one. I also appreciate consistently defined languages
    and the quality of error catching and usefulness of diagnostic messages.
    (There are some more factors, but...)

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Tue Nov 12 09:53:59 2024
    From Newsgroup: comp.lang.misc

    On Tue, 12 Nov 2024 10:31:58 +0100
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
    On 12.11.2024 10:21, Muttley@DastartdlyHQ.org wrote:
    On Tue, 12 Nov 2024 10:14:20 +0100
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
    On 11.11.2024 11:06, Muttley@DastartdlyHQ.org wrote:
    [ Q: why some prefer Python over C++ ]

    Because of its simpler syntax and less syntactical ballast compared
    to C++?

    When you're dealing with something as complicated and frankly ineffable as
    an AI model I doubt syntactic quirks of the programming language matter that
    much in comparison.

    Oh, I would look at it differently; in whatever application domain I
    program I want a syntactically clear and well-defined language.

    In which case I'd go with a statically typed language like C++ every time
    ahead of a dynamic one like python.

    Surely you'd want the fastest implementation possible and
    in this case it would be C++.

    Speed is one factor (to me), and expressiveness or "modeling power"
    (OO) is another one. I also appreciate consistently defined languages
    and quality of error catching and usefulness of diagnostic messages.
    (There's some more factors, but...)

    C++ is undeniably powerful, but I think the majority would agree now that
    its syntax has become an unwieldy mess.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Tue Nov 12 15:05:00 2024
    From Newsgroup: comp.lang.misc

    On 12.11.2024 10:53, Muttley@DastartdlyHQ.org wrote:

    In which case I'd go with a statically typed language like C++ every time ahead of a dynamic one like python.

    Definitely!

    I do use untyped languages (like Awk) for scripting, though, but
    not for code of considerable scale.

    Incidentally, one of my children recently spoke about their setup;
    they use Fortran with old libraries (hydrodynamic earth processes),
    have the higher level tasks implemented in C++, and they do the
    "job control" of the simulation tasks with Python. - A multi-tier
    architecture. - That sounds not unreasonable to me. (But they had
    built their system based on existing software, so it might have
    been a different decision if they had built it from scratch.)


    C++ is undeniably powerful, but I think the majority would agree now that
    its syntax has become an unwieldy mess.

    Yes. And recent standards made it yet worse. When I saw it for the
    first time I couldn't believe that this would be possible. ;-)

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bart@bc@freeuk.com to comp.lang.misc on Tue Nov 12 14:50:26 2024
    From Newsgroup: comp.lang.misc

    On 12/11/2024 14:05, Janis Papanagnou wrote:
    On 12.11.2024 10:53, Muttley@DastartdlyHQ.org wrote:

    In which case I'd go with a statically typed language like C++ every time
    ahead of a dynamic one like python.

    Definitely!

    I do use untyped languages (like Awk) for scripting, though, but
    not for code of considerable scale.

    Incidentally, one of my children recently spoke about their setup;
    they use Fortran with old libraries (hydrodynamic earth processes),
    have the higher level tasks implemented in C++, and they do the
    "job control" of the simulation tasks with Python. - A multi-tier
    architecture. - That sounds not unreasonable to me. (But they had
    built their system based on existing software, so it might have
    been a different decision if they had built it from scratch.)


    My last major app (now over 20 years ago), had such a 2-language solution.

    It was a GUI-based low-end 2D/3D CAD app, written in my lower level
    systems language.

    But the app also had an embedded scripting language, which had access to
    the app's environment and users' data.

    That was partly so that users (both OEMs and end-users) could write
    their own scripts. To this end it was moderately successful, as OEMs
    could write their own add-on applications (for example, to help design
    lighting rigs).

    But I also used it exclusively for the GUI side of the application:
    menus, dialogs, cursor control, layouts, the simpler file conversions
    (eg. export my data models to 3DS format), while the native code parts
    dealt with the critical parts: the 3D maths, managing the 3D models,
    the display drivers, etc.

    The whole thing was perhaps 150-200Kloc (not including OEM or user
    programs), which was about half static/compiled code and half dynamic/interpreted.

    (One of the original motivations, when it had to run on constrained
    systems, was to allow a lot of the code to exist as standalone scripts,
    which resided on floppy disks, and which were only loaded as needed.)


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Tue Nov 12 15:09:23 2024
    From Newsgroup: comp.lang.misc

    On Tue, 12 Nov 2024 15:05:00 +0100
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
    On 12.11.2024 10:53, Muttley@DastartdlyHQ.org wrote:
    C++ is undeniably powerful, but I think the majority would agree now that
    its syntax has become an unwieldy mess.

    Yes. And recent standards made it yet worse - When I saw it the
    first time I couldn't believe that this would be possible. ;-)

    Unfortunately these days the C++ steering committee (or whatever it's
    called) simply seems to be using the language to justify their positions,
    and keeps chucking in "features" that no one asked for or cares about,
    with the end result of the language becoming a huge mess that no single
    person could ever learn (or at least remember if they tried).

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Wolfgang Agnes@wagnes@jemoni.to to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Tue Nov 12 13:47:15 2024
    From Newsgroup: comp.lang.misc

    Muttley@DastartdlyHQ.org writes:

    On Tue, 12 Nov 2024 10:14:20 +0100
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:

    [...]

    Because of its simpler syntax and less syntactical ballast compared
    to C++?

    When you're dealing with something as complicated and frankly ineffable as
    an AI model I doubt syntactic quirks of the programming language matter that much in comparison. Surely you'd want the fastest implementation possible and in this case it would be C++.

    I really wouldn't be so sure. :)
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Wolfgang Agnes@wagnes@jemoni.to to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Tue Nov 12 13:50:58 2024
    From Newsgroup: comp.lang.misc

    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    [...]

    Perl was the language that made regular expressions sexy. Because it made >> them easy to use.

    For those of us who used regexps in Unix from the beginning it's not
    as shiny as you make it out to be; Unix has supported Chomsky-3
    Regular Expressions with a syntax that is still used in contemporary
    languages. Perl supports some nice syntactic shortcuts, but also
    patterns that exceed Chomsky-3's; too bad if one doesn't know these
    differences and the complexity degradation that may come with them.

    By Chomsky-3 you mean a grammar of type 3 in the Chomsky hierarchy? And
    that would be ``regular'' language, recognizable by a finite-state
    automaton? If not, could you elaborate on the terminology?

    More interesting to me is the fascinating fact that on some non-Unix platforms it took decades before regexps got (slooooowly) introduced
    (even in its simplest form).

    Such as which platform?
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Tue Nov 12 20:29:02 2024
    From Newsgroup: comp.lang.misc

    On Tue, 12 Nov 2024 10:23:38 +0100, Janis Papanagnou wrote:

    On 11.11.2024 22:24, Lawrence D'Oliveiro wrote:

    Perl was the language that made regular expressions sexy. Because it
    made them easy to use.

    ... Unix was supporting Chomsky-3
    Regular Expressions with a syntax that is still used in contemporary languages.

    Not in anything resembling a general-purpose high-level language. That’s what Perl pioneered.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.misc on Tue Nov 12 20:35:30 2024
    From Newsgroup: comp.lang.misc

    On Tue, 12 Nov 2024 14:50:26 +0000, Bart wrote:

    But the app also had an embedded scripting language, which had access to
    the app's environment and users' data.

    Did you invent your own scripting language? Nowadays you would use
    something ready-made, like Lua, Guile or even Python.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bart@bc@freeuk.com to comp.lang.misc on Tue Nov 12 21:48:39 2024
    From Newsgroup: comp.lang.misc

    On 12/11/2024 20:35, Lawrence D'Oliveiro wrote:
    On Tue, 12 Nov 2024 14:50:26 +0000, Bart wrote:

    But the app also had an embedded scripting language, which had access to
    the app's environment and users' data.

    Did you invent your own scripting language? Nowadays you would use
    something ready-made, like Lua, Guile or even Python.

    At that time (late 80s) I had to invent pretty much everything.

    I still do, language-wise.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.misc on Tue Nov 19 06:14:27 2024
    From Newsgroup: comp.lang.misc

    On 12.11.2024 17:50, Wolfgang Agnes wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
    [...]

    By Chomsky-3 you mean a grammar of type 3 in the Chomsky hierarchy? And
    that would be ``regular'' language, recognizable by a finite-state
    automaton? If not, could you elaborate on the terminology?

    Yes. I hoped the term was clear enough. If I used too sloppy a
    wording in my ad hoc writing, I apologize for the inconvenience.

    My point was about runtime guarantees and complexities (O(N)) of
    Regexp processing, which are also reflected by the FSA model.

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From merlyn@merlyn@stonehenge.com (Randal L. Schwartz) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Tue Nov 19 18:43:48 2024
    From Newsgroup: comp.lang.misc

    "Lawrence" == Lawrence D'Oliveiro <ldo@nz.invalid> writes:

    Lawrence> Perl was the language that made regular expressions
    Lawrence> sexy. Because it made them easy to use.

    I'm often reminded of this as I've been coding very little in Perl these
    days, and a lot more in languages like Dart, where the regex feels like
    a clumsy bolt-on rather than a proper first-class citizen.

    There are times I miss Perl. But not too often any more. :)
    --
    Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
    <merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
    Perl/Dart/Flutter consulting, Technical writing, Comedy, etc. etc.
    Still trying to think of something clever for the fourth line of this .sig
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Wed Nov 20 04:34:31 2024
    From Newsgroup: comp.lang.misc

    On Tue, 19 Nov 2024 18:43:48 -0800, Randal L. Schwartz wrote:

    "Lawrence" == Lawrence D'Oliveiro <ldo@nz.invalid> writes:

    Lawrence> Perl was the language that made regular expressions
    Lawrence> sexy. Because it made them easy to use.

    I'm often reminded of this as I've been coding very little in Perl these days, and a lot more in languages like Dart, where the regex feels like
    a clumsy bolt-on rather than a proper first-class citizen.

    Python has regexes as a bolt-on -- a library module, not a core part of
    the language. But I think the way it leverages the core language -- e.g.
    being able to iterate over pattern matches, and collecting information
    about matches in a “Match” object -- keeps it quite useful in a nicely functional way.
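    A minimal sketch of that in Python's standard `re` module (the log-like input is invented for illustration):

    ```python
    import re

    # Iterate over every match in a string; each hit is a Match object
    # exposing the matched text, capture groups, and span positions.
    text = "error=404 warn=2 error=500"
    for m in re.finditer(r"error=(\d+)", text):
        print(m.group(1), m.span())
    ```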
  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Wed Nov 20 08:21:17 2024
    From Newsgroup: comp.lang.misc

    On Tue, 19 Nov 2024 18:43:48 -0800
    merlyn@stonehenge.com (Randal L. Schwartz) boring babbled:
    "Lawrence" == Lawrence D'Oliveiro <ldo@nz.invalid> writes:

    Lawrence> Perl was the language that made regular expressions
    Lawrence> sexy. Because it made them easy to use.

    I'm often reminded of this as I've been coding very little in Perl these
    days, and a lot more in languages like Dart, where the regex feels like
    a clumsy bolt-on rather than a proper first-class citizen.

    Regex itself is clumsy beyond simple search and replace patterns. A lot
    of stuff I've seen done in regex would have been better done procedurally,
    at the expense of slightly more code but a LOT more readability. Also,
    given it's effectively a compact language with its own grammar and syntax,
    IMO it should not be a core part of any language, as it can lead to a
    syntactic mess, which is what often happens with Perl.


  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Wed Nov 20 11:51:11 2024
    From Newsgroup: comp.lang.misc

    On 20.11.2024 09:21, Muttley@DastartdlyHQ.org wrote:
    On Tue, 19 Nov 2024 18:43:48 -0800
    merlyn@stonehenge.com (Randal L. Schwartz) boring babbled:

    I'm often reminded of this as I've been coding very little in Perl these
    days, and a lot more in languages like Dart, where the regex feels like
    a clumsy bolt-on rather than a proper first-class citizen.

    Regex itself is clumsy beyond simple search and replace patterns. A lot
    of stuff I've seen done in regex would have been better done procedurally,
    at the expense of slightly more code but a LOT more readability. Also,
    given it's effectively a compact language with its own grammar and syntax,
    IMO it should not be a core part of any language, as it can lead to a
    syntactic mess, which is what often happens with Perl.

    I wouldn't look at it that way. I've seen Regexps as part of languages
    usually in well defined syntactical contexts. For example, like strings
    are enclosed in "...", Regexps could be seen within /.../ delimiters.
    GNU Awk (in recent versions) went towards first class "strongly typed"
    Regexps which are then denoted by the @/.../ syntax.

    I'm curious what you mean by Regexps presented in a "procedural" form.
    Can you give some examples?

    Personally I'm fine with the typical lexical meta-symbols in Regexps
    which resemble the FSA and allow a simple transformation forth and back.

    In practice, given that a Regexp conforms to an FSA, any Regexp can be
    precompiled and used multiple times. The thing I had used in Java - it
    was a library from Apache, IIRC, not the bulky thing that got included
    later - was easily usable; create a Regexp object from an RE expression,
    then operate on that same object. (Since there's still typical Regexp
    syntax involved I suppose that is not what you meant by "procedural"?)
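    For comparison, the same compile-once idiom exists in Python's standard `re` module (a minimal sketch):

    ```python
    import re

    # Build the pattern object (the FSA, conceptually) once...
    digits = re.compile(r"[0-9]+")

    # ...then apply it to any number of strings.
    for s in ("abc123", "no digits here", "42"):
        m = digits.search(s)
        print(m.group(0) if m else None)
    ```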

    Janis

  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Wed Nov 20 11:30:44 2024
    From Newsgroup: comp.lang.misc

    On Wed, 20 Nov 2024 11:51:11 +0100
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
    On 20.11.2024 09:21, Muttley@DastartdlyHQ.org wrote:
    On Tue, 19 Nov 2024 18:43:48 -0800
    merlyn@stonehenge.com (Randal L. Schwartz) boring babbled:

    I'm often reminded of this as I've been coding very little in Perl these
    days, and a lot more in languages like Dart, where the regex feels like
    a clumsy bolt-on rather than a proper first-class citizen.

    Regex itself is clumsy beyond simple search and replace patterns. A lot
    of stuff I've seen done in regex would have been better done procedurally,
    at the expense of slightly more code but a LOT more readability. Also,
    given it's effectively a compact language with its own grammar and syntax,
    IMO it should not be a core part of any language, as it can lead to a
    syntactic mess, which is what often happens with Perl.

    I wouldn't look at it that way. I've seen Regexps as part of languages
    usually in well defined syntactical contexts. For example, like strings
    are enclosed in "...", Regexps could be seen within /.../ delimiters.
    GNU Awk (in recent versions) went towards first class "strongly typed"
    Regexps which are then denoted by the @/.../ syntax.

    I'm curious what you mean by Regexps presented in a "procedural" form.
    Can you give some examples?

    Anything that can be done in regex can obviously also be done procedurally.
    At the point regex expressions become unwieldy - usually when substitution
    variables raise their heads - I prefer procedural code as it's also often
    easier to debug.

    In practice, given that a Regexp conforms to an FSA, any Regexp can be
    precompiled and used multiple times. The thing I had used in Java - it

    Precompiled regex is no more efficient than precompiled anything, it's all
    just assembler at the bottom.

    then operate on that same object. (Since there's still typical Regexp
    syntax involved I suppose that is not what you meant by "procedural"?)

    If you don't know the difference between declarative syntax like regex and
    procedural syntax then there's not much point continuing this discussion.

  • From Ed Morton@mortonspam@gmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Wed Nov 20 05:46:49 2024
    From Newsgroup: comp.lang.misc

    On 11/20/2024 2:21 AM, Muttley@DastartdlyHQ.org wrote:
    On Tue, 19 Nov 2024 18:43:48 -0800
    merlyn@stonehenge.com (Randal L. Schwartz) boring babbled:
    "Lawrence" == Lawrence D'Oliveiro <ldo@nz.invalid> writes:

    Lawrence> Perl was the language that made regular expressions
    Lawrence> sexy. Because it made them easy to use.

    I'm often reminded of this as I've been coding very little in Perl these
    days, and a lot more in languages like Dart, where the regex feels like
    a clumsy bolt-on rather than a proper first-class citizen.

    Regex itself is clumsy beyond simple search and replace patterns. A lot
    of stuff I've seen done in regex would have been better done procedurally,
    at the expense of slightly more code but a LOT more readability.

    Definitely. The most relevant statement about regexps is this:

    Some people, when confronted with a problem, think "I know, I'll use
    regular expressions." Now they have two problems.

    attributed to Jamie Zawinski, see https://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/.

    Obviously regexps are very useful and commonplace but if you find you
    have to use some online site or other tools to help you write/understand
    one or just generally need more than a couple of minutes to
    write/understand it then it's time to back off and figure out a better
    way to write your code for the sake of whoever has to read it 6 months
    later (and usually for robustness too as it's hard to be sure all rainy
    day cases are handled correctly in a lengthy and/or complicated regexp).

    Ed.
  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Wed Nov 20 12:21:04 2024
    From Newsgroup: comp.lang.misc

    Muttley@DastartdlyHQ.org writes:
    merlyn@stonehenge.com (Randal L. Schwartz) boring babbled:
    "Lawrence" == Lawrence D'Oliveiro <ldo@nz.invalid> writes:

    Lawrence> Perl was the language that made regular expressions
    Lawrence> sexy. Because it made them easy to use.

    I'm often reminded of this as I've been coding very little in Perl these
    days, and a lot more in languages like Dart, where the regex feels like
    a clumsy bolt-on rather than a proper first-class citizen.

    Regex itself is clumsy beyond simple search and replace patterns. A lot
    of stuff I've seen done in regex would have been better done procedurally,
    at the expense of slightly more code but a LOT more readability. Also,
    given it's effectively a compact language with its own grammar and syntax,
    IMO it should not be a core part of any language, as it can lead to a
    syntactic mess, which is what often happens with Perl.

    A mess is something which often happens when people who can't organize
    their thoughts just trudge on nevertheless. They're perfectly capable of accomplishing that in any programming language.

    A real problem with regexes in Perl is that they're pretty slow for simple
    use cases (like lexical analysis) and thus, not suitable for volume data processing outside of throwaway code¹.

    ¹ I used to use a JSON parser written in OO-Perl which made extensive
    use of regexes for that. I've recently replaced that with a C/XS version
    which - while slightly larger (617 vs 410 lines of text) - is over a
    hundred times faster and conceptually simpler at the same time.
  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Wed Nov 20 12:27:54 2024
    From Newsgroup: comp.lang.misc

    On Wed, 20 Nov 2024 05:46:49 -0600
    Ed Morton <mortonspam@gmail.com> boring babbled:
    On 11/20/2024 2:21 AM, Muttley@DastartdlyHQ.org wrote:
    On Tue, 19 Nov 2024 18:43:48 -0800
    merlyn@stonehenge.com (Randal L. Schwartz) boring babbled:
    "Lawrence" == Lawrence D'Oliveiro <ldo@nz.invalid> writes:

    Lawrence> Perl was the language that made regular expressions
    Lawrence> sexy. Because it made them easy to use.

    I'm often reminded of this as I've been coding very little in Perl these
    days, and a lot more in languages like Dart, where the regex feels like
    a clumsy bolt-on rather than a proper first-class citizen.

    Regex itself is clumsy beyond simple search and replace patterns. A lot
    of stuff I've seen done in regex would have been better done procedurally,
    at the expense of slightly more code but a LOT more readability.

    Definitely. The most relevant statement about regexps is this:

    Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

    Very true!

    Obviously regexps are very useful and commonplace but if you find you
    have to use some online site or other tools to help you write/understand
    one or just generally need more than a couple of minutes to
    write/understand it then it's time to back off and figure out a better
    way to write your code for the sake of whoever has to read it 6 months
    later (and usually for robustness too as it's hard to be sure all rainy
    day cases are handled correctly in a lengthy and/or complicated regexp).

    Edge cases are regex's Achilles heel, e.g. an expression that only accounted
    for 1 -> N chars, not 0 -> N, or matches in the middle but not at the ends.

  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Wed Nov 20 16:38:24 2024
    From Newsgroup: comp.lang.misc

    On 20.11.2024 12:30, Muttley@DastartdlyHQ.org wrote:
    On Wed, 20 Nov 2024 11:51:11 +0100
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
    On 20.11.2024 09:21, Muttley@DastartdlyHQ.org wrote:
    On Tue, 19 Nov 2024 18:43:48 -0800
    merlyn@stonehenge.com (Randal L. Schwartz) boring babbled:

    I'm often reminded of this as I've been coding very little in Perl these
    days, and a lot more in languages like Dart, where the regex feels like
    a clumsy bolt-on rather than a proper first-class citizen.

    Regex itself is clumsy beyond simple search and replace patterns. A lot
    of stuff I've seen done in regex would have been better done procedurally,
    at the expense of slightly more code but a LOT more readability. Also,
    given it's effectively a compact language with its own grammar and syntax,
    IMO it should not be a core part of any language, as it can lead to a
    syntactic mess, which is what often happens with Perl.

    I wouldn't look at it that way. I've seen Regexps as part of languages
    usually in well defined syntactical contexts. For example, like strings
    are enclosed in "...", Regexps could be seen within /.../ delimiters.
    GNU Awk (in recent versions) went towards first class "strongly typed"
    Regexps which are then denoted by the @/.../ syntax.

    I'm curious what you mean by Regexps presented in a "procedural" form.
    Can you give some examples?

    Anything that can be done in regex can obviously also be done procedurally.
    At the point regex expressions become unwieldy - usually when substitution
    variables raise their heads - I prefer procedural code as it's also often
    easier to debug.

    You haven't even tried to honestly answer my (serious) question.
    With your statement above and your hostility below, it rather seems
    you have no clue of what I am talking about.


    In practice, given that a Regexp conforms to an FSA, any Regexp can be
    precompiled and used multiple times. The thing I had used in Java - it

    Precompiled regex is no more efficient than precompiled anything, it's all
    just assembler at the bottom.

    The Regexps are a way to specify the words of a regular language;
    for pattern matching the expression gets interpreted or compiled; you
    specify it, e.g., using strings of characters and meta-characters.
    If you have a programming language where that string gets repeatedly interpreted then it's slower than a precompiled Regexp expression.

    I give you examples...

    (1) DES encryption function

    (1a) ciphertext = des_encode (key, plaintext)

    (1b) cipher = des (key)
    ciphertext = cipher.encode (plaintext)

    In case (1) you can either call the des encryption (decryption) for
    any (key, plaintext)-pair in a procedural function as in (1a), or
    you can create the key-specific encryption once and encode various
    texts with the same cipher object as in (1b).

    (2) regexp matching

    (2a) location = regexp (pattern, string)

    (2b) fsm = regexp (pattern)
    location = fsm.match (string)

    In case (2) you can either do the match in a string with a pattern
    in a procedural form as in (2a) or you can create the FSM for the
    given Regexp just once and apply it on various strings as in (2b).

    That's what I was talking about.

    Only if key (in (1)) or pattern (in (2)) is static or "constant"
    could that compilation (but only theoretically) be done in advance,
    and an optimizing system may (or may not) precompile both to
    [similar] assembler code. How should that work with regexps or DES?
    The optimizing system would need knowledge how to use the library
    code (DES, Regexps, ...) to create binary structures based on the
    algorithms (key-initialization in DES, FSM-generation in Regexps).
    This is [statically] not done.

    Otherwise - i.e. the normal, expected case - there's an efficiency
    difference to observe between the respective cases of (a) and (b).
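    A Python sketch of the (2a)/(2b) distinction, reusing the /[0-9]+(ABC)?x*foo/ example from below (the function name `match_2a` is illustrative):

    ```python
    import re

    # (2a) procedural form: the pattern string is handed over on every call,
    # so the matcher has to be (re)built or looked up each time.
    def match_2a(pattern, string):
        return re.search(pattern, string)

    # (2b) build the matcher (the FSM, conceptually) once, then reuse it.
    fsm = re.compile(r"[0-9]+(ABC)?x*foo")
    for s in ("123ABCxxfoo", "9foo", "foo"):
        print(bool(fsm.search(s)))
    ```

    Note that Python's `re` keeps an internal cache of compiled patterns, so (2a) costs only a cache lookup there; in a language without such a cache the efficiency difference between (a) and (b) is exactly the one described above.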


    then operate on that same object. (Since there's still typical Regexp
    syntax involved I suppose that is not what you meant by "procedural"?)

    If you don't know the difference between declarative syntax like regex and
    procedural syntax then there's not much point continuing this discussion.

    Why do you think so, and why are you saying that? - That wasn't and
    still isn't the point. - You said upthread

    "A lot of stuff I've seen done in regex would have better done
    procedurally at the expense of slightly more code but a LOT more
    readability."

    and I asked

    "I'm curious what you mean by Regexps presented in a "procedural"
    form.
    Can you give some examples?"

    What you wanted to say wasn't clear to me, since you were complaining
    about the _Regexp syntax_. So it couldn't be meant to just write
    regexp (pattern, string) instead of pattern ~ string
    but to somehow(!) transform "pattern", say, like /[0-9]+(ABC)?x*foo/,
    to something syntactically "better".
    I was interested in that "somehow" (that I emphasized), and in an
    example how that would look like in your opinion.
    If you're unable to answer that simple question then just take that
    simple regexp /[0-9]+(ABC)?x*foo/ example and show us your preferred
    procedural variant.

    But my expectation is that you cannot provide any reasonable example
    anyway.

    Personally I think that writing bulky procedural stuff for something
    like [0-9]+ can only be much worse, and that further abbreviations
    like \d+ are the better direction to go if targeting a good interface.
    YMMV.

    Janis

  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Wed Nov 20 16:53:38 2024
    From Newsgroup: comp.lang.misc

    On 20.11.2024 12:46, Ed Morton wrote:

    Definitely. The most relevant statement about regexps is this:

    Some people, when confronted with a problem, think "I know, I'll use
    regular expressions." Now they have two problems.

    (Worth scribbling on a WC wall.)


    Obviously regexps are very useful and commonplace but if you find you
    have to use some online site or other tools to help you write/understand
    one or just generally need more than a couple of minutes to
    write/understand it then it's time to back off and figure out a better
    way to write your code for the sake of whoever has to read it 6 months
    later (and usually for robustness too as it's hard to be sure all rainy
    day cases are handled correctly in a lengthy and/or complicated regexp).

    Regexps are nothing for newbies.

    The inherent fine thing with Regexps is that you can incrementally
    compose them[*].[**]
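    A small Python sketch of that incremental composition (the pattern pieces are invented for illustration):

    ```python
    import re

    # Each piece is small enough to read and test on its own...
    NUM  = r"[0-9]+"
    UNIT = r"(?:ms|s|min)"
    DURATION = NUM + UNIT              # e.g. "250ms"

    # ...and the composed whole stays decomposable.
    LINE = rf"{DURATION}(?:,{DURATION})*"

    print(bool(re.fullmatch(LINE, "250ms,3s,10min")))
    print(bool(re.fullmatch(LINE, "250ms,,3s")))
    ```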

    It seems you haven't found a sensible way to work with them?
    (And I'm really astonished about that since I know you worked with
    Regexps for years if not decades.)

    In those cases where Regexps *are* the tool for a specific task -
    I don't expect you to use them where they are inappropriate?! -
    what would be the better solution[***] then?

    Janis

    [*] Like the corresponding FSMs.

    [**] And you can also decompose them if they are merged in a huge
    expression, too large for you to grasp it. (BTW, I'm doing such
    decompositions also with other expressions in program code that
    are too bulky.)

    [***] Can you answer the question that another poster failed to do?

  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Wed Nov 20 16:38:15 2024
    From Newsgroup: comp.lang.misc

    On Wed, 20 Nov 2024 16:38:24 +0100
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> boring babbled:
    On 20.11.2024 12:30, Muttley@DastartdlyHQ.org wrote:
    Anything that can be done in regex can obviously also be done procedurally.
    At the point regex expressions become unwieldy - usually when substitution
    variables raise their heads - I prefer procedural code as it's also often
    easier to debug.

    You haven't even tried to honestly answer my (serious) question.

    You mean you can't figure out how to do something like string search and replace
    procedurally? I'm not going to show you, ask a kid who knows Python or Basic.

    With your statement above and your hostility below, it rather seems

    If you think my reply was hostile then I suggest you go find a safe space
    and cuddle your teddy bear snowflake.

  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Wed Nov 20 17:50:13 2024
    From Newsgroup: comp.lang.misc

    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    [...]

    Personally I think that writing bulky procedural stuff for something
    like [0-9]+ can only be much worse, and that further abbreviations
    like \d+ are the better direction to go if targeting a good interface.
    YMMV.

    Assuming that p is a pointer to the current position in a string, e is a
    pointer to the end of it (i.e., points just past the last byte) and -
    that's important - both are pointers to unsigned quantities, the 'bulky'
    C equivalent of [0-9]+ is

    while (p < e && *p - '0' < 10) ++p;

    That's not too bad. And it's really a hell of a lot faster than a
    general-purpose automaton programmed to recognize the same pattern
    (which might not matter most of the time, but sometimes, it does).
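    For comparison, a minimal procedural sketch of the same scan in Python (not a drop-in for the C version; ASCII digits only):

    ```python
    def scan_digits(s, i=0):
        """Advance i past a run of ASCII digits; return the new index."""
        n = len(s)
        while i < n and '0' <= s[i] <= '9':
            i += 1
        return i

    print(scan_digits("123abc"))  # 3
    print(scan_digits("abc"))     # 0 (no digits consumed)
    ```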
  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Wed Nov 20 17:54:22 2024
    From Newsgroup: comp.lang.misc

    Muttley@DastartdlyHQ.org writes:

    [...]

    With your statement above and your hostility below, it rather seems

    If you think my reply was hostile then I suggest you go find a safe space
    and cuddle your teddy bear snowflake.

    There's surely no reason why anyone could ever think you were inclined
    to substitute verbal aggression for arguments.
  • From John Ames@commodorejohn@gmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Wed Nov 20 10:03:47 2024
    From Newsgroup: comp.lang.misc

    On Wed, 20 Nov 2024 17:54:22 +0000
    Rainer Weikusat <rweikusat@talktalk.net> wrote:

    There's surely no reason why anyone could ever think you were inclined
    to substitute verbal aggression for arguments.

    I mean, it's his whole thing - why would he stop now?

  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Wed Nov 20 21:43:41 2024
    From Newsgroup: comp.lang.misc

    On Wed, 20 Nov 2024 12:27:54 -0000 (UTC), Muttley wrote:

    Edge cases are regex's Achilles heel, e.g. an expression that only accounted
    for 1 -> N chars, not 0 -> N, or matches in the middle but not at the
    ends.

    That’s what “^” and “$” are for.
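    In Python terms, the difference the anchors make (a minimal sketch):

    ```python
    import re

    s = "x123y"
    print(bool(re.search(r"[0-9]{3}", s)))       # unanchored: matches in the middle
    print(bool(re.search(r"^[0-9]{3}$", s)))     # anchored: trailing junk rejected
    print(bool(re.search(r"^[0-9]{3}$", "123"))) # anchored: whole string matches
    ```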
  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Thu Nov 21 08:13:39 2024
    From Newsgroup: comp.lang.misc

    On Wed, 20 Nov 2024 17:54:22 +0000
    Rainer Weikusat <rweikusat@talktalk.net> boring babbled:
    Muttley@DastartdlyHQ.org writes:

    [...]

    With your statement above and your hostility below, it rather seems

    If you think my reply was hostile then I suggest you go find a safe space
    and cuddle your teddy bear snowflake.

    There's surely no reason why anyone could ever think you were inclined
    to substitute verbal aggression for arguments.

    I have zero time for anyone who claims hurt feelings or being slighted as
    soon as they're losing an argument.

  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Thu Nov 21 08:15:41 2024
    From Newsgroup: comp.lang.misc

    On Wed, 20 Nov 2024 21:43:41 -0000 (UTC)
    Lawrence D'Oliveiro <ldo@nz.invalid> boring babbled:
    On Wed, 20 Nov 2024 12:27:54 -0000 (UTC), Muttley wrote:

    Edge cases are regex's Achilles heel, e.g. an expression that only accounted
    for 1 -> N chars, not 0 -> N, or matches in the middle but not at the
    ends.

    That’s what “^” and “$” are for.

    Yes, but people forget about those (literal) edge cases.

  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Thu Nov 21 08:18:06 2024
    From Newsgroup: comp.lang.misc

    On Wed, 20 Nov 2024 10:03:47 -0800
    John Ames <commodorejohn@gmail.com> boring babbled:
    On Wed, 20 Nov 2024 17:54:22 +0000
    Rainer Weikusat <rweikusat@talktalk.net> wrote:

    There's surely no reason why anyone could ever think you were inclined
    to substitute verbal aggression for arguments.

    I mean, it's his whole thing - why would he stop now?

    Whats it like being so wet? Do you get cold easily?

  • From merlyn@merlyn@stonehenge.com (Randal L. Schwartz) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Thu Nov 21 05:38:45 2024
    From Newsgroup: comp.lang.misc

    "Rainer" == Rainer Weikusat <rweikusat@talktalk.net> writes:

    Rainer> ¹ I used to use a JSON parser written in OO-Perl which made
    Rainer> extensive use of regexes for that. I've recently replaced that
    Rainer> with a C/XS version which - while slightly larger (617 vs 410
    Rainer> lines of text) - is over a hundred times faster and conceptually
    Rainer> simpler at the same time.

    I wonder if that was my famous "JSON parser in a single regex" from https://www.perlmonks.org/?node_id=995856, or from one of the two CPAN
    modules that incorporated it.
    --
    Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
    <merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
    Perl/Dart/Flutter consulting, Technical writing, Comedy, etc. etc.
    Still trying to think of something clever for the fourth line of this .sig
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Thu Nov 21 14:13:37 2024
    From Newsgroup: comp.lang.misc

    In article <20241120100347.00005f10@gmail.com>,
    John Ames <commodorejohn@gmail.com> wrote:
    On Wed, 20 Nov 2024 17:54:22 +0000
    Rainer Weikusat <rweikusat@talktalk.net> wrote:

    There's surely no reason why anyone could ever think you were inclined
    to substitute verbal aggression for arguments.

    I mean, it's his whole thing - why would he stop now?

    This is the guy who didn't know what a compiler is, right?

    - Dan C.

  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Thu Nov 21 14:40:18 2024
    From Newsgroup: comp.lang.misc

    In article <875xohbxre.fsf@doppelsaurus.mobileactivedefense.com>,
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    [...]

    Personally I think that writing bulky procedural stuff for something
    like [0-9]+ can only be much worse, and that further abbreviations
    like \d+ are the better direction to go if targeting a good interface.
    YMMV.

    Assuming that p is a pointer to the current position in a string, e is a
    pointer to the end of it (i.e., points just past the last byte) and -
    that's important - both are pointers to unsigned quantities, the 'bulky'
    C equivalent of [0-9]+ is

    while (p < e && *p - '0' < 10) ++p;

    That's not too bad. And it's really a hell lot faster than a
    general-purpose automaton programmed to recognize the same pattern
    (which might not matter most of the time, but sometimes, it does).

    It's also not exactly right. `[0-9]+` would match one or more
    characters; this possibly matches 0 (ie, if `p` pointed to
    something that wasn't a digit).

    - Dan C.

  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Thu Nov 21 15:07:42 2024
    From Newsgroup: comp.lang.misc

    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    [...]

    Personally I think that writing bulky procedural stuff for something
    like [0-9]+ can only be much worse, and that further abbreviations
    like \d+ are the better direction to go if targeting a good interface.
    YMMV.

    Assuming that p is a pointer to the current position in a string, e is a
    pointer to the end of it (i.e., points just past the last byte) and -
    that's important - both are pointers to unsigned quantities, the 'bulky'
    C equivalent of [0-9]+ is

    while (p < e && *p - '0' < 10) ++p;

    That's not too bad. And it's really a hell of a lot faster than a
    general-purpose automaton programmed to recognize the same pattern
    (which might not matter most of the time, but sometimes, it does).

    It's also not exactly right. `[0-9]+` would match one or more
    characters; this possibly matches 0 (ie, if `p` pointed to
    something that wasn't a digit).

    The regex won't match any digits if there aren't any. In this case, the
    match will fail. I didn't include the code for handling that because it
    seemed pretty pointless for the example.
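    A sketch of that missing guard in Python, making the procedural scan equivalent to [0-9]+ (at least one digit, or the match fails):

    ```python
    def match_digits_plus(s, i=0):
        """Procedural [0-9]+ at position i: at least one digit required."""
        j = i
        while j < len(s) and '0' <= s[j] <= '9':
            j += 1
        return j if j > i else None   # None signals a failed match

    print(match_digits_plus("123abc"))  # 3
    print(match_digits_plus("abc"))     # None
    ```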
  • From John Ames@commodorejohn@gmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Thu Nov 21 07:56:48 2024
    From Newsgroup: comp.lang.misc

    On Thu, 21 Nov 2024 08:18:06 -0000 (UTC)
    Muttley@DastartdlyHQ.org wrote:

    What's it like being so wet? Do you get cold easily?

    No, I have a soft gray hoodie with a nice fleecy lining that I quite
    like. It's very warm for not being too heavy.

    Also: *huh?*

  • From John Ames@commodorejohn@gmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Thu Nov 21 07:58:06 2024
    From Newsgroup: comp.lang.misc

    On Thu, 21 Nov 2024 08:13:39 -0000 (UTC)
    Muttley@DastartdlyHQ.org wrote:

    I have zero time

    I approve, it's a wonderful album!

  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Thu Nov 21 16:06:01 2024
    From Newsgroup: comp.lang.misc

    On Thu, 21 Nov 2024 14:13:37 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) boring babbled:
    In article <20241120100347.00005f10@gmail.com>,
    John Ames <commodorejohn@gmail.com> wrote:
    On Wed, 20 Nov 2024 17:54:22 +0000
    Rainer Weikusat <rweikusat@talktalk.net> wrote:

    There's surely no reason why anyone could ever think you were inclined
    to substitute verbal aggression for arguments.

    I mean, it's his whole thing - why would he stop now?

    This is the guy who didn't know what a compiler is, right?

    Wrong. Want another go?

  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Thu Nov 21 17:01:48 2024
    From Newsgroup: comp.lang.misc

    merlyn@stonehenge.com (Randal L. Schwartz) writes:
    "Rainer" == Rainer Weikusat <rweikusat@talktalk.net> writes:
    Rainer> ¹ I used to use a JSON parser written in OO-Perl which made
    Rainer> extensive use of regexes for that. I've recently replaced that
    Rainer> with a C/XS version which - while slightly larger (617 vs 410
    Rainer> lines of text) - is over a hundred times faster and conceptually
    Rainer> simpler at the same time.

    I wonder if that was my famous "JSON parser in a single regex" from https://www.perlmonks.org/?node_id=995856, or from one of the two CPAN modules that incorporated it.

    No. One of my use-cases is an interactive shell running in a web browser
    using ActionCable messages to relay data between the browser and the
    shell process on the computer supposed to be accessed in this way. For
    this, I absolutely do need \u escapes. I also need this to be fast. Eg,
    one of the nice properties of JSON is that the type of a value can be
    determined by looking at its first character. This cries for an
    implementation based on an array of pointers to 'value parsing routines'
    of size 256, determining the parser routine by using the first
    character as index into this table (which will either yield a pointer to
    the correct parser routine or NULL for a syntax error).
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Kaz Kylheku@643-408-1753@kylheku.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Thu Nov 21 19:12:03 2024
    From Newsgroup: comp.lang.misc

    On 2024-11-20, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    On 20.11.2024 09:21, Muttley@DastartdlyHQ.org wrote:
    Regex itself is clumsy beyond simple search and replace patterns. A lot
    of stuff I've seen done in regex would have been better done procedurally
    at the expense of slightly more code but a LOT more readability. Also,
    given it's effectively a compact language with its own grammar and
    syntax, IMO it should not be the core part of any language as it can
    lead to a syntactic mess, which is what often happens with Perl.

    I wouldn't look at it that way. I've seen Regexps as part of languages
    usually in well defined syntactical contexts. For example, like strings
    are enclosed in "...", Regexps could be seen within /.../ delimiters.
    GNU Awk (in recent versions) went towards first class "strongly typed"
    Regexps which are then denoted by the @/.../ syntax.

    These features solve the problem of regexes being stored as character
    strings not being recognized by the language compiler and then having
    to be compiled at run-time.

    They don't solve all the ergonomics of regexes that Muttley is talking
    about.

    I'm curious what you mean by Regexps presented in a "procedural" form.
    Can you give some examples?

    Here is an example: using a regex match to capture a C comment /* ... */
    in Lex compared to just recognizing the start sequence /* and handling
    the discarding of the comment in the action.

    Without non-greedy repetition matching, the regex for a C comment is
    quite obtuse. The procedural handling is straightforward: read
    characters until you see a * immediately followed by a /.

    In the wild, you see regexes being used for all sorts of stupid stuff,
    like checking whether numeric input is in a certain range, rather than
    converting it to a number and doing an arithmetic check.
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Thu Nov 21 22:05:13 2024
    From Newsgroup: comp.lang.misc

    On Thu, 21 Nov 2024 08:15:41 -0000 (UTC), Muttley wrote:

    On Wed, 20 Nov 2024 21:43:41 -0000 (UTC)
    Lawrence D'Oliveiro <ldo@nz.invalid> boring babbled:

    On Wed, 20 Nov 2024 12:27:54 -0000 (UTC), Muttley wrote:

    Edge cases are regexes' Achilles heel, eg an expression that only
    accounted for 1 -> N chars, not 0 -> N, or matches in the middle but
    not at the ends.

    That’s what “^” and “$” are for.

    Yes, but people forget about those (literal) edge cases.

    Those of us who are accustomed to using regexes do not.

    Another handy one is “\b” for word boundaries.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Muttley@Muttley@DastartdlyHQ.org to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 10:09:48 2024
    From Newsgroup: comp.lang.misc

    On Thu, 21 Nov 2024 19:12:03 -0000 (UTC)
    Kaz Kylheku <643-408-1753@kylheku.com> boring babbled:
    On 2024-11-20, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    I'm curious what you mean by Regexps presented in a "procedural" form.
    Can you give some examples?

    Here is an example: using a regex match to capture a C comment /* ... */
    in Lex compared to just recognizing the start sequence /* and handling
    the discarding of the comment in the action.

    Without non-greedy repetition matching, the regex for a C comment is
    quite obtuse. The procedural handling is straightforward: read
    characters until you see a * immediately followed by a /.

    It's not that simple I'm afraid, since comments can themselves be commented out.

    eg:

    // int i; /*
    int j;
    /*
    int k;
    */
    ++j;

    A C99 and C++ compiler would see "int j" and compile it, a regex would
    simply remove everything from the first /* to */.

    Also the same probably applies to #ifdef's.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 12:14:32 2024
    From Newsgroup: comp.lang.misc

    On 20.11.2024 18:50, Rainer Weikusat wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    Personally I think that writing bulky procedural stuff for something
    like [0-9]+ can only be much worse, and that further abbreviations
    like \d+ are the better direction to go if targeting a good interface.
    YMMV.

    Assuming that p is a pointer to the current position in a string, e is
    a pointer to the end of it (ie, points just past the last byte) and -
    that's important - both are pointers to unsigned quantities, the 'bulky'
    C equivalent of [0-9]+ is

    while (p < e && *p - '0' < 10) ++p;

    That's not too bad. And it's really a hell lot faster than a
    general-purpose automaton programmed to recognize the same pattern
    (which might not matter most of the time, but sometimes, it does).

    Okay, I see where you're coming from (and especially in that simple
    case).

    Personally (and YMMV), even here in this simple case I think that
    using pointers is not better but worse - and anyway isn't [in this
    form] available in most languages; in other cases (and languages)
    such constructs get yet more clumsy, and for my not very complex
    example - /[0-9]+(ABC)?x*foo/ - even a "catastrophe" concerning
    readability, error-proneness, and maintainability.

    If that is what the other poster meant I'm fine with your answer;
    there's no need to even consider abandoning regular expressions
    in favor of explicitly codified parsing.

    Janis

    PS: And thanks for answering on behalf of the other poster whom I
    see in his followups just continuing his very personal style.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 12:17:56 2024
    From Newsgroup: comp.lang.misc

    On 21.11.2024 20:12, Kaz Kylheku wrote:
    [...]

    In the wild, you see regexes being used for all sorts of stupid stuff,

    No one can prevent folks using features for stupid things. Yes.

    like checking whether numeric input is in a certain range, rather than
    converting it to a number and doing an arithmetic check.

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 12:47:16 2024
    From Newsgroup: comp.lang.misc

    On 21.11.2024 23:05, Lawrence D'Oliveiro wrote:
    On Thu, 21 Nov 2024 08:15:41 -0000 (UTC), Muttley wrote:
    On Wed, 20 Nov 2024 21:43:41 -0000 (UTC)
    Lawrence D'Oliveiro <ldo@nz.invalid> boring babbled:
    [...]

    That’s what “^” and “$” are for.

    Yes, but people forget about those (literal) edge cases.

    But those are *only* _literally_ "edge cases". Rather, they're simple
    basics of regexp parsers since their beginning.

    Those of us who are accustomed to using regexes do not.

    It's one of the first things that regexp newbies learn,
    I'd say.


    Another handy one is “\b” for word boundaries.

    I prefer \< and \> (that are quite commonly used) for such
    structural things, also \( and \) for allowing references
    to matched parts. And I prefer the \alpha regexp pattern
    extension forms for things like \d \D \w \W \s \S . (But
    that's not only a matter of taste but also a question of
    what any regexp parser actually supports.)

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 11:56:26 2024
    From Newsgroup: comp.lang.misc

    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
    On 20.11.2024 18:50, Rainer Weikusat wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    Personally I think that writing bulky procedural stuff for something
    like [0-9]+ can only be much worse, and that further abbreviations
    like \d+ are the better direction to go if targeting a good interface.
    YMMV.

    Assuming that p is a pointer to the current position in a string, e is a
    pointer to the end of it (ie, point just past the last byte) and -
    that's important - both are pointers to unsigned quantities, the 'bulky'
    C equivalent of [0-9]+ is

    while (p < e && *p - '0' < 10) ++p;

    That's not too bad. And it's really a hell lot faster than a
    general-purpose automaton programmed to recognize the same pattern
    (which might not matter most of the time, but sometimes, it does).

    Okay, I see where you're coming from (and especially in that simple
    case).

    Personally (and YMMV), even here in this simple case I think that
    using pointers is not better but worse - and anyway isn't [in this
    form] available in most languages;

    That's a question of using the proper tool for the job. In C, that's
    pointers and pointer arithmetic, because it's the simplest way to
    express something like this.

    in other cases (and languages)
    such constructs get yet more clumsy, and for my not very complex
    example - /[0-9]+(ABC)?x*foo/ - even a "catastrophe" concerning
    readability, error-proneness, and maintainability.

    Procedural code for matching strings constructed in this way is
    certainly much simpler¹ than the equally procedural code for a
    programmable automaton capable of interpreting regexes. Your statement
    is basically "If we assume that the code interpreting regexes doesn't
    exist, regexes need much less code than something equivalent which does
    exist." Without this assumption, the picture becomes a different one
    altogether.

    ¹ This doesn't even need a real state machine, just four subroutines
    executed in succession (and two of these can share an implementation,
    as "matching ABC" and "matching foo" are both cases of matching a
    constant string).

    If that is what the other poster meant I'm fine with your answer;
    there's no need to even consider abandoning regular expressions
    in favor of explicitly codified parsing.

    This depends on the specific problem and the constraints applicable to
    a solution. For the common case, regexes, if easily available, are an
    obvious good solution. But not all cases are common.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 13:30:34 2024
    From Newsgroup: comp.lang.misc

    In article <874j40sk01.fsf@doppelsaurus.mobileactivedefense.com>,
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    [...]

    Personally I think that writing bulky procedural stuff for something
    like [0-9]+ can only be much worse, and that further abbreviations
    like \d+ are the better direction to go if targeting a good interface.
    YMMV.

    Assuming that p is a pointer to the current position in a string, e is
    a pointer to the end of it (ie, points just past the last byte) and -
    that's important - both are pointers to unsigned quantities, the 'bulky'
    C equivalent of [0-9]+ is

    while (p < e && *p - '0' < 10) ++p;

    That's not too bad. And it's really a hell of a lot faster than a
    general-purpose automaton programmed to recognize the same pattern
    (which might not matter most of the time, but sometimes, it does).

    It's also not exactly right. `[0-9]+` would match one or more
    characters; this possibly matches 0 (ie, if `p` pointed to
    something that wasn't a digit).

    The regex won't match any digits if there aren't any. In this case, the
    match will fail. I didn't include the code for handling that because it
    seemed pretty pointless for the example.

    That's rather the point though, isn't it? The program snippet
    (modulo the promotion to signed int via the "usual arithmetic
    conversions" before the subtraction and comparison giving you
    unexpected values; nothing to do with whether `char` is signed
    or not) is a snippet that advances a pointer while it points to
    a digit, starting at the current pointer position; that is, it
    just increments a pointer over a run of digits.

    But that's not the same as a regex matcher, which has a semantic
    notion of success or failure. I could run your snippet against
    a string such as, say, "ZZZZZZ" and it would "succeed" just as
    it would against an empty string or a string of one or more
    digits. And then there are other matters of context; does the
    user intend for the regexp to match the _whole_ string? Or any
    portion of the string (a la `grep`)? So, for example, does the
    string "aaa1234aaa" match `[0-9]+`? As written, the above
    snippet is actually closer to advancing `p` over `^[0-9]*`. One
    might differentiate between `*` and `+` after the fact, by
    examining `p` against some (presumably saved) source value, but
    that's more code.

    These are just not equivalent. That's not to say that your
    snippet is not _useful_ in context, but to pretend that it's the
    same as the regular expression is pointlessly reductive.

    By the way, something that _would_ match `^[0-9]+$` might be:

    term% cat mdp.c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static bool
    mdigit(unsigned int c)
    {
        return c - '0' < 10;
    }

    bool
    mdp(const char *str, const char *estr)
    {
        if (str == NULL || estr == NULL || str == estr)
            return false;
        if (!mdigit(*str))
            return false;
        while (str < estr && mdigit(*str))
            str++;
        return str == estr;
    }

    bool
    probe(const char *s, bool expected)
    {
        if (mdp(s, s + strlen(s)) != expected) {
            fprintf(stderr, "test failure: `%s` (expected %s)\n",
                s, expected ? "true" : "false");
            return false;
        }
        return true;
    }

    int
    main(void)
    {
        bool success = true;

        success = probe("1234", true) && success;
        success = probe("", false) && success;
        success = probe("ab", false) && success;
        success = probe("0", true) && success;
        success = probe("0123456789", true) && success;
        success = probe("a0123456", false) && success;
        success = probe("0123456b", false) && success;
        success = probe("0123c456", false) && success;
        success = probe("0123#456", false) && success;

        return success ? EXIT_SUCCESS : EXIT_FAILURE;
    }
    term% cc -Wall -Wextra -Werror -pedantic -std=c11 mdp.c -o mdp
    term% ./mdp
    term% echo $?
    0
    term%

    Granted the test scaffolding and `#include` boilerplate makes
    this appear rather longer than it would be in context, but it's
    still not nearly as succinct.

    - Dan C.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 15:41:09 2024
    From Newsgroup: comp.lang.misc

    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    [...]

    Personally I think that writing bulky procedural stuff for something
    like [0-9]+ can only be much worse, and that further abbreviations
    like \d+ are the better direction to go if targeting a good interface.
    YMMV.

    Assuming that p is a pointer to the current position in a string, e is
    a pointer to the end of it (ie, points just past the last byte) and -
    that's important - both are pointers to unsigned quantities, the 'bulky'
    C equivalent of [0-9]+ is

    while (p < e && *p - '0' < 10) ++p;

    That's not too bad. And it's really a hell of a lot faster than a
    general-purpose automaton programmed to recognize the same pattern
    (which might not matter most of the time, but sometimes, it does).

    It's also not exactly right. `[0-9]+` would match one or more
    characters; this possibly matches 0 (ie, if `p` pointed to
    something that wasn't a digit).

    The regex won't match any digits if there aren't any. In this case, the
    match will fail. I didn't include the code for handling that because it
    seemed pretty pointless for the example.

    That's rather the point though, isn't it? The program snippet
    (modulo the promotion to signed int via the "usual arithmetic
    conversions" before the subtraction and comparison giving you
    unexpected values; nothing to do with whether `char` is signed
    or not) is a snippet that advances a pointer while it points to
    a digit, starting at the current pointer position; that is, it
    just increments a pointer over a run of digits.

    That's the core part of matching something equivalent to the regex [0-9]+
    and the only part of it which is at least remotely interesting.

    But that's not the same as a regex matcher, which has a semantic
    notion of success or failure. I could run your snippet against
    a string such as, say, "ZZZZZZ" and it would "succeed" just as
    it would against an empty string or a string of one or more
    digits.

    Why do you believe that p being equivalent to the starting position
    would be considered a "successful match", considering that this
    obviously doesn't make any sense?

    [...]

    By the way, something that _would_ match `^[0-9]+$` might be:

    [too much code]

    Something which would match [0-9]+ in its first argument (if any) would
    be:

    #include "string.h"
    #include "stdlib.h"

    int main(int argc, char **argv)
    {
        char *p;
        unsigned c;

        p = argv[1];
        if (!p) exit(1);
        while (c = *p, c && c - '0' > 10) ++p;
        if (!c) exit(1);
        return 0;
    }

    but that's 14 lines of text, 13 of which have absolutely no relation to
    the problem of recognizing a digit.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 15:52:41 2024
    From Newsgroup: comp.lang.misc

    Rainer Weikusat <rweikusat@talktalk.net> writes:

    [...]


    Something which would match [0-9]+ in its first argument (if any) would
    be:

    #include "string.h"
    #include "stdlib.h"

    int main(int argc, char **argv)
    {
    char *p;
    unsigned c;

    p = argv[1];
    if (!p) exit(1);
    while (c = *p, c && c - '0' > 10) ++p;

    This needs to be

    while (c = *p, c && c - '0' > 9) ++p
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 17:17:46 2024
    From Newsgroup: comp.lang.misc

    In article <877c8vtgx6.fsf@doppelsaurus.mobileactivedefense.com>,
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    [snip]
    It's also not exactly right. `[0-9]+` would match one or more
    characters; this possibly matches 0 (ie, if `p` pointed to
    something that wasn't a digit).

    The regex won't match any digits if there aren't any. In this case, the
    match will fail. I didn't include the code for handling that because it
    seemed pretty pointless for the example.

    That's rather the point though, isn't it? The program snippet
    (modulo the promotion to signed int via the "usual arithmetic
    conversions" before the subtraction and comparison giving you
    unexpected values; nothing to do with whether `char` is signed
    or not) is a snippet that advances a pointer while it points to
    a digit, starting at the current pointer position; that is, it
    just increments a pointer over a run of digits.

    That's the core part of matching something equivalent to the regex [0-9]+
    and the only part of it which is at least remotely interesting.

    Not really, no. The interesting thing in this case appears to
    be knowing whether or not the match succeeded, but you omitted
    that part.

    But that's not the same as a regex matcher, which has a semantic
    notion of success or failure. I could run your snippet against
    a string such as, say, "ZZZZZZ" and it would "succeed" just as
    it would against an empty string or a string of one or more
    digits.

    Why do you believe that p being equivalent to the starting position
    would be considered a "successful match", considering that this
    obviously doesn't make any sense?

    Because absent any surrounding context, there's no indication
    that the source is even saved. You'll note that I did mention
    that as a means to differentiate later on, but that's not the
    snippet you posted.

    [...]

    By the way, something that _would_ match `^[0-9]+$` might be:

    [too much code]

    Something which would match [0-9]+ in its first argument (if any) would
    be:

    #include "string.h"
    #include "stdlib.h"

    int main(int argc, char **argv)
    {
        char *p;
        unsigned c;

        p = argv[1];
        if (!p) exit(1);
        while (c = *p, c && c - '0' > 10) ++p;
        if (!c) exit(1);
        return 0;
    }

    but that's 14 lines of text, 13 of which have absolutely no relation to
    the problem of recognizing a digit.

    This is wrong in many ways. Did you actually test that program?

    First of all, why `"string.h"` and not `<string.h>`? Ok, that's
    not technically an error, but it's certainly unconventional, and
    raises questions that are ultimately a distraction.

    Second, suppose that `argc==0` (yes, this can happen under
    POSIX).

    Third, the loop: why `> 10`? Don't you mean `< 10`? You are
    trying to match digits, not non-digits.

    Fourth, you exit with failure (`exit(1)`) if `!p` *and* if `!c`
    at the end, but `!c` there means you've reached the end of the
    string; which should be success.

    Fifth and finally, you `return 0;` which is EXIT_SUCCESS, in the
    failure case.

    Compare:

    #include <regex.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(int argc, char *argv[])
    {
        regex_t reprog;
        int ret;

        if (argc != 2) {
            fprintf(stderr, "Usage: regexp pattern\n");
            return(EXIT_FAILURE);
        }
        (void)regcomp(&reprog, "^[0-9]+$", REG_EXTENDED | REG_NOSUB);
        ret = regexec(&reprog, argv[1], 0, NULL, 0);
        regfree(&reprog);

        return ret == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
    }

    This is only marginally longer, but is correct.

    - Dan C.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 17:18:26 2024
    From Newsgroup: comp.lang.misc

    In article <87zflrs1ti.fsf@doppelsaurus.mobileactivedefense.com>,
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    Rainer Weikusat <rweikusat@talktalk.net> writes:

    [...]


    Something which would match [0-9]+ in its first argument (if any) would
    be:

    #include "string.h"
    #include "stdlib.h"

    int main(int argc, char **argv)
    {
    char *p;
    unsigned c;

    p = argv[1];
    if (!p) exit(1);
    while (c = *p, c && c - '0' > 10) ++p;

    This needs to be

    while (c = *p, c && c - '0' > 9) ++p

    No, that's still wrong. Try actually running it.

    - Dan C.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 17:35:29 2024
    From Newsgroup: comp.lang.misc

    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <87zflrs1ti.fsf@doppelsaurus.mobileactivedefense.com>,
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    Rainer Weikusat <rweikusat@talktalk.net> writes:

    [...]


    Something which would match [0-9]+ in its first argument (if any) would
    be:

    #include "string.h"
    #include "stdlib.h"

    int main(int argc, char **argv)
    {
    char *p;
    unsigned c;

    p = argv[1];
    if (!p) exit(1);
    while (c = *p, c && c - '0' > 10) ++p;

    This needs to be

    while (c = *p, c && c - '0' > 9) ++p

    No, that's still wrong. Try actually running it.

    If you know something that's wrong with that, why not write it instead
    of utilizing the claim for pointless (and wrong) snide remarks?
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 17:43:24 2024
    From Newsgroup: comp.lang.misc

    In article <87v7wfrx26.fsf@doppelsaurus.mobileactivedefense.com>,
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <87zflrs1ti.fsf@doppelsaurus.mobileactivedefense.com>,
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    Rainer Weikusat <rweikusat@talktalk.net> writes:

    [...]


    Something which would match [0-9]+ in its first argument (if any) would >>>> be:

    #include "string.h"
    #include "stdlib.h"

    int main(int argc, char **argv)
    {
    char *p;
    unsigned c;

    p = argv[1];
    if (!p) exit(1);
    while (c = *p, c && c - '0' > 10) ++p;

    This needs to be

    while (c = *p, c && c - '0' > 9) ++p

    No, that's still wrong. Try actually running it.

    If you know something that's wrong with that, why not write it instead
    of utilizing the claim for pointless (and wrong) snide remarks?

    I did, at length, in my other post.

    - Dan C.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 17:43:59 2024
    From Newsgroup: comp.lang.misc

    In article <vhqfrs$bit$1@reader2.panix.com>,
    Dan Cross <cross@spitfire.i.gajendra.net> wrote:
    In article <87v7wfrx26.fsf@doppelsaurus.mobileactivedefense.com>,
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <87zflrs1ti.fsf@doppelsaurus.mobileactivedefense.com>,
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    Rainer Weikusat <rweikusat@talktalk.net> writes:

    [...]


    Something which would match [0-9]+ in its first argument (if any) would >>>>> be:

    #include "string.h"
    #include "stdlib.h"

    int main(int argc, char **argv)
    {
    char *p;
    unsigned c;

    p = argv[1];
    if (!p) exit(1);
    while (c = *p, c && c - '0' > 10) ++p;

    This needs to be

    while (c = *p, c && c - '0' > 9) ++p

    No, that's still wrong. Try actually running it.

    If you know something that's wrong with that, why not write it instead
    of utilizing the claim for pointless (and wrong) snide remarks?

    I did, at length, in my other post.

    Cf. <vhqebq$c71$1@reader2.panix.com>

    - Dan C.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 17:48:37 2024
    From Newsgroup: comp.lang.misc

    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    [snip]
    It's also not exactly right. `[0-9]+` would match one or more
    characters; this possibly matches 0 (ie, if `p` pointed to
    something that wasn't a digit).

    The regex won't match any digits if there aren't any. In this case, the
    match will fail. I didn't include the code for handling that because it
    seemed pretty pointless for the example.

    That's rather the point though, isn't it? The program snippet
    (modulo the promotion to signed int via the "usual arithmetic
    conversions" before the subtraction and comparison giving you
    unexpected values; nothing to do with whether `char` is signed
    or not) is a snippet that advances a pointer while it points to
    a digit, starting at the current pointer position; that is, it
    just increments a pointer over a run of digits.

    That's the core part of matching something equivalent to the regex [0-9]+
    and the only part of it which is at least remotely interesting.

    Not really, no. The interesting thing in this case appears to
    be knowing whether or not the match succeeded, but you omitted
    that part.

    This is of interest to you as it enables you to base an 'argumentation'
    (sarcasm) on arbitrary assumptions you've chosen to make. It's not
    something I consider interesting, and it's beside the point of the
    example I posted.

    But that's not the same as a regex matcher, which has a semantic
    notion of success or failure. I could run your snippet against
    a string such as, say, "ZZZZZZ" and it would "succeed" just as
    it would against an empty string or a string of one or more
    digits.

    Why do you believe that p being equivalent to the starting position
    would be considered a "successful match", considering that this
    obviously doesn't make any sense?

    Because absent any surrounding context, there's no indication
    that the source is even saved.

    A text usually doesn't contain information about things which aren't
    part of its content. I congratulate you on this rather obvious
    observation.

    [...]

    Something which would match [0-9]+ in its first argument (if any) would
    be:

    #include "string.h"
    #include "stdlib.h"

    int main(int argc, char **argv)
    {
    char *p;
    unsigned c;

    p = argv[1];
    if (!p) exit(1);
    while (c = *p, c && c - '0' > 10) ++p;
    if (!c) exit(1);
    return 0;
    }

    but that's 14 lines of text, 13 of which have absolutely no relation to
    the problem of recognizing a digit.

    This is wrong in many ways. Did you actually test that program?

    First of all, why `"string.h"` and not `<string.h>`? Ok, that's
    not technically an error, but it's certainly unconventional, and
    raises questions that are ultimately a distraction.

    Such as your paragraph above.

    Second, suppose that `argc==0` (yes, this can happen under
    POSIX).

    It can happen in case of some piece of functionally hostile software intentionally creating such a situation. Tangential, irrelevant
    point. If you break it, you get to keep the parts.

    Third, the loop: why `> 10`? Don't you mean `< 10`? You are
    trying to match digits, not non-digits.

    Mistake I made. The opposite of < 10 is > 9.

    Fourth, you exit with failure (`exit(1)`) if `!p` *and* if `!c`
    at the end, but `!c` there means you've reached the end of the
    string; which should be success.

    Mistake you made: [0-9]+ matches if there's at least one digit in the
    string. That's why the loop terminates once one was found. In this case,
    c cannot be 0.
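    The disputed success criterion can be pinned down with a small sketch
    (illustrative code, not from this thread; the function name is made up):
    the pointer-advancing scan becomes a matcher for [0-9]+ by checking
    whether the pointer moved past its starting position.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch, not code from the thread: the scan over a run of
 * digits becomes a matcher for [0-9]+ by comparing the final pointer to
 * the start.  p and e delimit an unsigned character buffer, as in the
 * original snippet; match_digits is a hypothetical name. */
static int match_digits(const unsigned char *p, const unsigned char *e)
{
    const unsigned char *start = p;

    while (p < e && (unsigned)(*p - '0') < 10)
        ++p;
    return p != start;  /* success iff at least one digit was consumed */
}
```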
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 18:12:34 2024
    From Newsgroup: comp.lang.misc

    In article <87o727rwga.fsf@doppelsaurus.mobileactivedefense.com>,
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    Something which would match [0-9]+ in its first argument (if any) would
    be:

    #include "string.h"
    #include "stdlib.h"

    int main(int argc, char **argv)
    {
        char *p;
        unsigned c;

        p = argv[1];
        if (!p) exit(1);
        while (c = *p, c && c - '0' > 10) ++p;
        if (!c) exit(1);
        return 0;
    }

    but that's 14 lines of text, 13 of which have absolutely no relation to
    the problem of recognizing a digit.

    This is wrong in many ways. Did you actually test that program?

    First of all, why `"string.h"` and not `<string.h>`? Ok, that's
    not technically an error, but it's certainly unconventional, and
    raises questions that are ultimately a distraction.

    Such as your paragraph above.

    Second, suppose that `argc==0` (yes, this can happen under
    POSIX).

    It can happen in case of some piece of functionally hostile software
    intentionally creating such a situation. Tangential, irrelevant
    point. If you break it, you get to keep the parts.

    Third, the loop: why `> 10`? Don't you mean `< 10`? You are
    trying to match digits, not non-digits.

    Mistake I made. The opposite of < 10 is > 9.

    I see. So you want to skip non-digits and exit the first time
    you see a digit. Ok, fair enough, though that program has
    already been written, and is called `grep`.

    Fourth, you exit with failure (`exit(1)`) if `!p` *and* if `!c`
    at the end, but `!c` there means you've reached the end of the
    string; which should be success.

    Mistake you made: [0-9]+ matches if there's at least one digit in the
    string. That's why the loop terminates once one was found. In this case,
    c cannot be 0.

    Ah, you are trying to match `[0-9]` (though you're calling it
    `[0-9]+`). Yeah, your program was not at all equivalent to one
    I wrote, though this is what you posted in response to mine, so
    I assumed you were trying to emulate that behavior (matching
    `^[0-9]+$`).

    But I see above that you mentioned `[0-9]+`. But as I mentioned
    above, really you're just matching any digit, so you may as well
    be matching `[0-9]`; again, this is not the same as the actual
    regexp, because you are ignoring the semantics of what regular
    expressions actually describe.

    In any event, this seems simpler than what you posted:

    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(int argc, char *argv[])
    {
        if (argc != 2) {
            fprintf(stderr, "Usage: matchd <str>\n");
            return EXIT_FAILURE;
        }

        for (const char *p = argv[1]; *p != '\0'; p++)
            if ('0' <= *p && *p <= '9')
                return EXIT_SUCCESS;

        return EXIT_FAILURE;
    }

    - Dan C.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 18:14:48 2024
    From Newsgroup: comp.lang.misc

    Rainer Weikusat <rweikusat@talktalk.net> writes:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    [...]

    Personally I think that writing bulky procedural stuff for something
    like [0-9]+ can only be much worse, and that further abbreviations
    like \d+ are the better direction to go if targeting a good interface.
    YMMV.

    Assuming that p is a pointer to the current position in a string, e is a
    pointer to the end of it (ie, point just past the last byte) and -
    that's important - both are pointers to unsigned quantities, the 'bulky'
    C equivalent of [0-9]+ is

    while (p < e && *p - '0' < 10) ++p;

    That's not too bad. And it's really a hell lot faster than a
    general-purpose automaton programmed to recognize the same pattern
    (which might not matter most of the time, but sometimes, it does).

    It's also not exactly right. `[0-9]+` would match one or more
    characters; this possibly matches 0 (ie, if `p` pointed to
    something that wasn't a digit).

    The regex won't match any digits if there aren't any. In this case, the
    match will fail. I didn't include the code for handling that because it
    seemed pretty pointless for the example.

    That's rather the point though, isn't it? The program snippet
    (modulo the promotion to signed int via the "usual arithmetic
    conversions" before the subtraction and comparison giving you
    unexpected values; nothing to do with whether `char` is signed
    or not) is a snippet that advances a pointer while it points to
    a digit, starting at the current pointer position; that is, it
    just increments a pointer over a run of digits.

    That's the core part of matching something equivalent to the regex [0-9]+
    and the only part of it which is at least remotely interesting.

    But that's not the same as a regex matcher, which has a semantic
    notion of success or failure. I could run your snippet against
    a string such as, say, "ZZZZZZ" and it would "succeed" just as
    it would against an empty string or a string of one or more
    digits.

    Why do you believe that p being equivalent to the starting position
    would be considered a "successful match", considering that this
    obviously doesn't make any sense?

    [...]

    By the way, something that _would_ match `^[0-9]+$` might be:

    [too much code]

    Something which would match [0-9]+ in its first argument (if any) would
    be:

    #include "string.h"
    #include "stdlib.h"

    int main(int argc, char **argv)
    {
        char *p;
        unsigned c;

        p = argv[1];
        if (!p) exit(1);
        while (c = *p, c && c - '0' > 10) ++p;
        if (!c) exit(1);
        return 0;
    }

    but that's 14 lines of text, 13 of which have absolutely no relation to
    the problem of recognizing a digit.

    Personally, I'd use:

    $ cat /tmp/a.c
    #include <stdint.h>
    #include <string.h>

    int
    main(int argc, const char **argv)
    {
        char *cp;
        uint64_t value;

        if (argc < 2) return 1;

        value = strtoull(argv[1], &cp, 10);
        if ((cp == argv[1])
            || (*cp != '\0')) {
            return 1;
        }
        return 0;
    }
    $ cc -o /tmp/a /tmp/a.c
    $ /tmp/a 13254
    $ echo $?
    0
    $ /tmp/a 23v23
    $ echo $?
    1
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Kaz Kylheku@643-408-1753@kylheku.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 18:18:04 2024
    From Newsgroup: comp.lang.misc

    On 2024-11-22, Muttley@DastartdlyHQ.org <Muttley@DastartdlyHQ.org> wrote:
    On Thu, 21 Nov 2024 19:12:03 -0000 (UTC)
    Kaz Kylheku <643-408-1753@kylheku.com> boring babbled:
    On 2024-11-20, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    I'm curious what you mean by Regexps presented in a "procedural" form.
    Can you give some examples?

    Here is an example: using a regex match to capture a C comment /* ... */
    in Lex compared to just recognizing the start sequence /* and handling
    the discarding of the comment in the action.

    Without non-greedy repetition matching, the regex for a C comment is
    quite obtuse. The procedural handling is straightforward: read
    characters until you see a * immediately followed by a /.
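    That procedural handling can be sketched in a few lines (illustrative
    code, not from this thread; the function name is made up). For
    comparison, one classical form of the non-greedy-free regex alluded to
    above is often given as /\*([^*]|\*+[^*/])*\*+/ .

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative sketch of the procedural handling described above: after
 * the opening slash-star has been consumed, discard characters until a
 * '*' is immediately followed by a '/'.  Returns 1 if the comment was
 * closed, 0 on EOF inside the comment. */
static int skip_block_comment(FILE *in)
{
    int c, prev = 0;

    while ((c = getc(in)) != EOF) {
        if (prev == '*' && c == '/')
            return 1;
        prev = c;
    }
    return 0;
}
```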

    Its not that simple I'm afraid since comments can be commented out.

    Umm, no.

    eg:

    // int i; /*

    This /* sequence is inside a // comment, and so the machinery that
    recognizes /* as the start of a comment would never see it.

    Just like "int i;" is in a string literal and so not recognized
    as a keyword, whitespace, identifier and semicolon.

    int j;
    /*
    int k;
    */
    ++j;

    A C99 and C++ compiler would see "int j" and compile it, a regex would
    simply remove everything from the first /* to */.

    No, it won't, because that's not how regexes are used in a lexical
    analyzer. At the start of the input, the lexical analyzer faces
    the characters "// int i; /*\n". This will trigger the pattern match
    for // comments. Essentially that entire sequence through the newline
    is treated as a kind of token, equivalent to a space.

    Once a token is recognized and removed from the input, it is gone;
    no other regular expression can match into it.
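    That token-at-a-time behaviour can be sketched as follows (illustrative
    code, hypothetical helper name): once "//" has been matched, the rest of
    the physical line is consumed as one whitespace-like token, so a
    comment-opener inside it can never be seen.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative sketch: called after "//" has been matched, this
 * consumes the remainder of the line as a single token; any
 * comment-opening sequence inside it is just part of the
 * discarded text. */
static void skip_line_comment(FILE *in)
{
    int c;

    while ((c = getc(in)) != EOF && c != '\n')
        ;  /* discard through end of line */
}
```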

    Also the same probably applies to #ifdef's.

    Lexically analyzing C requires implementing the translation phases
    as described in the standard. There are preprocessor phases which
    delimit the input into preprocessor tokens (pp-tokens). Comments
    are stripped in preprocessing. But logical lines (backslash
    continuations) are recognized below comments; i.e. this is one
    comment:

    // comment \
    split \
    into \
    physical \
    lines

    A lexical scanner can have an input routine which transparently handles
    this low-level detail, so that it doesn't have to deal with the
    line continuations in every token pattern.
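    Such an input routine can be sketched as a getc() wrapper that splices
    backslash-newline pairs (illustrative code; lexgetc is a hypothetical
    name):

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative sketch: a getc() wrapper that deletes backslash-newline
 * pairs, so the token patterns above it never see physical line
 * continuations. */
static int lexgetc(FILE *in)
{
    int c, next;

    while ((c = getc(in)) == '\\') {
        next = getc(in);
        if (next == '\n')
            continue;          /* splice: drop the pair, keep reading */
        if (next != EOF)
            ungetc(next, in);  /* lone backslash: pass it through */
        return '\\';
    }
    return c;
}
```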
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Kaz Kylheku@643-408-1753@kylheku.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 18:19:30 2024
    From Newsgroup: comp.lang.misc

    On 2024-11-22, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    On 21.11.2024 20:12, Kaz Kylheku wrote:
    [...]

    In the wild, you see regexes being used for all sorts of stupid stuff,

    No one can prevent folks using features for stupid things. Yes.

    But the thing is that "modern" regular expressions (Perl regex and its
    progeny) have features that are designed to exclusively cater to these
    folks.
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 18:22:45 2024
    From Newsgroup: comp.lang.misc

    scott@slp53.sl.home (Scott Lurndal) writes:
    Rainer Weikusat <rweikusat@talktalk.net> writes:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    [...]

    Personally I think that writing bulky procedural stuff for something
    like [0-9]+ can only be much worse, and that further abbreviations
    like \d+ are the better direction to go if targeting a good interface.
    YMMV.

    Assuming that p is a pointer to the current position in a string, e is a
    pointer to the end of it (ie, point just past the last byte) and -
    that's important - both are pointers to unsigned quantities, the 'bulky'
    C equivalent of [0-9]+ is

    while (p < e && *p - '0' < 10) ++p;

    That's not too bad. And it's really a hell lot faster than a
    general-purpose automaton programmed to recognize the same pattern
    (which might not matter most of the time, but sometimes, it does).

    It's also not exactly right. `[0-9]+` would match one or more
    characters; this possibly matches 0 (ie, if `p` pointed to
    something that wasn't a digit).

    The regex won't match any digits if there aren't any. In this case, the
    match will fail. I didn't include the code for handling that because it
    seemed pretty pointless for the example.

    That's rather the point though, isn't it? The program snippet
    (modulo the promotion to signed int via the "usual arithmetic
    conversions" before the subtraction and comparison giving you
    unexpected values; nothing to do with whether `char` is signed
    or not) is a snippet that advances a pointer while it points to
    a digit, starting at the current pointer position; that is, it
    just increments a pointer over a run of digits.

    That's the core part of matching something equivalent to the regex [0-9]+
    and the only part of it which is at least remotely interesting.

    But that's not the same as a regex matcher, which has a semantic
    notion of success or failure. I could run your snippet against
    a string such as, say, "ZZZZZZ" and it would "succeed" just as
    it would against an empty string or a string of one or more
    digits.

    Why do you believe that p being equivalent to the starting position
    would be considered a "successful match", considering that this
    obviously doesn't make any sense?

    [...]

    By the way, something that _would_ match `^[0-9]+$` might be:

    [too much code]

    Something which would match [0-9]+ in its first argument (if any) would
    be:

    #include "string.h"
    #include "stdlib.h"

    int main(int argc, char **argv)
    {
        char *p;
        unsigned c;

        p = argv[1];
        if (!p) exit(1);
        while (c = *p, c && c - '0' > 10) ++p;
        if (!c) exit(1);
        return 0;
    }

    but that's 14 lines of text, 13 of which have absolutely no relation to
    the problem of recognizing a digit.

    Personally, I'd use:

    Albeit this is limited to strings of digits whose value is less than
    ULLONG_MAX...



    $ cat /tmp/a.c
    #include <stdint.h>
    #include <string.h>

    int
    main(int argc, const char **argv)
    {
        char *cp;
        uint64_t value;

        if (argc < 2) return 1;

        value = strtoull(argv[1], &cp, 10);
        if ((cp == argv[1])
            || (*cp != '\0')) {
            return 1;
        }
        return 0;
    }
    $ cc -o /tmp/a /tmp/a.c
    $ /tmp/a 13254
    $ echo $?
    0
    $ /tmp/a 23v23
    $ echo $?
    1
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 18:30:31 2024
    From Newsgroup: comp.lang.misc

    In article <VZ30P.4664$YSkc.1894@fx40.iad>,
    Scott Lurndal <slp53@pacbell.net> wrote:
    scott@slp53.sl.home (Scott Lurndal) writes:
    Rainer Weikusat <rweikusat@talktalk.net> writes:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    [...]

    Personally I think that writing bulky procedural stuff for something
    like [0-9]+ can only be much worse, and that further abbreviations
    like \d+ are the better direction to go if targeting a good interface.
    YMMV.

    Assuming that p is a pointer to the current position in a string, e is a
    pointer to the end of it (ie, point just past the last byte) and -
    that's important - both are pointers to unsigned quantities, the 'bulky'
    C equivalent of [0-9]+ is

    while (p < e && *p - '0' < 10) ++p;

    That's not too bad. And it's really a hell lot faster than a
    general-purpose automaton programmed to recognize the same pattern
    (which might not matter most of the time, but sometimes, it does).
    It's also not exactly right. `[0-9]+` would match one or more
    characters; this possibly matches 0 (ie, if `p` pointed to
    something that wasn't a digit).

    The regex won't match any digits if there aren't any. In this case, the
    match will fail. I didn't include the code for handling that because it
    seemed pretty pointless for the example.

    That's rather the point though, isn't it? The program snippet
    (modulo the promotion to signed int via the "usual arithmetic
    conversions" before the subtraction and comparison giving you
    unexpected values; nothing to do with whether `char` is signed
    or not) is a snippet that advances a pointer while it points to
    a digit, starting at the current pointer position; that is, it
    just increments a pointer over a run of digits.

    That's the core part of matching something equivalent to the regex [0-9]+
    and the only part of it which is at least remotely interesting.

    But that's not the same as a regex matcher, which has a semantic
    notion of success or failure. I could run your snippet against
    a string such as, say, "ZZZZZZ" and it would "succeed" just as
    it would against an empty string or a string of one or more
    digits.

    Why do you believe that p being equivalent to the starting position
    would be considered a "successful match", considering that this
    obviously doesn't make any sense?

    [...]

    By the way, something that _would_ match `^[0-9]+$` might be:

    [too much code]

    Something which would match [0-9]+ in its first argument (if any) would
    be:

    #include "string.h"
    #include "stdlib.h"

    int main(int argc, char **argv)
    {
        char *p;
        unsigned c;

        p = argv[1];
        if (!p) exit(1);
        while (c = *p, c && c - '0' > 10) ++p;
        if (!c) exit(1);
        return 0;
    }

    but that's 14 lines of text, 13 of which have absolutely no relation to
    the problem of recognizing a digit.

    Personally, I'd use:

    Albeit this is limited to strings of digits whose value is less than
    ULLONG_MAX...

    It's not quite equivalent to his program, which just exits with
    success if it sees any input string with a digit in it; yours
    is closer to what I wrote, which matches `^[0-9]+$`. His is not
    an interesting program and certainly not a recognizable
    equivalent of a regular expression matcher in any reasonable sense,
    but I think the cognitive dissonance is too strong to get that
    across.

    - Dan C.

    $ cat /tmp/a.c
    #include <stdint.h>
    #include <string.h>

    int
    main(int argc, const char **argv)
    {
        char *cp;
        uint64_t value;

        if (argc < 2) return 1;

        value = strtoull(argv[1], &cp, 10);
        if ((cp == argv[1])
            || (*cp != '\0')) {
            return 1;
        }
        return 0;
    }
    $ cc -o /tmp/a /tmp/a.c
    $ /tmp/a 13254
    $ echo $?
    0
    $ /tmp/a 23v23
    $ echo $?
    1


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 18:48:55 2024
    From Newsgroup: comp.lang.misc

    cross@spitfire.i.gajendra.net (Dan Cross) writes:

    [...]

    In any event, this seems simpler than what you posted:

    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(int argc, char *argv[])
    {
        if (argc != 2) {
            fprintf(stderr, "Usage: matchd <str>\n");
            return EXIT_FAILURE;
        }

        for (const char *p = argv[1]; *p != '\0'; p++)
            if ('0' <= *p && *p <= '9')
                return EXIT_SUCCESS;

        return EXIT_FAILURE;
    }

    It's not only 4 lines longer but in just about every individual aspect
    syntactically more complicated and more messy and functionally more
    clumsy. This is particularly noticeable in the loop

    for (const char *p = argv[1]; *p != '\0'; p++)
        if ('0' <= *p && *p <= '9')
            return EXIT_SUCCESS;

    the loop header containing a spuriously qualified variable declaration,
    the loop body and half of the termination condition. The other half then
    follows as a special case in the otherwise useless loop body.

    It looks like a copy of my code with each individual bit redesigned
    under the guiding principle of "Can we make this more complicated?", eg,

    char **argv

    declares an array of pointers (as each pointer in C points to an array)
    and

    char *argv[]

    accomplishes exactly the same but uses both more characters and more
    different kinds of characters.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 18:59:43 2024
    From Newsgroup: comp.lang.misc

    scott@slp53.sl.home (Scott Lurndal) writes:
    Rainer Weikusat <rweikusat@talktalk.net> writes:

    [...]

    Something which would match [0-9]+ in its first argument (if any) would
    be:

    #include "string.h"
    #include "stdlib.h"

    int main(int argc, char **argv)
    {
        char *p;
        unsigned c;

        p = argv[1];
        if (!p) exit(1);
        while (c = *p, c && c - '0' > 10) ++p;
        if (!c) exit(1);
        return 0;
    }

    but that's 14 lines of text, 13 of which have absolutely no relation to
    the problem of recognizing a digit.

    Personally, I'd use:

    $ cat /tmp/a.c
    #include <stdint.h>
    #include <string.h>

    int
    main(int argc, const char **argv)
    {
        char *cp;
        uint64_t value;

        if (argc < 2) return 1;

        value = strtoull(argv[1], &cp, 10);
        if ((cp == argv[1])
            || (*cp != '\0')) {
            return 1;
        }
        return 0;
    }

    This will accept a string of digits whose numerical value is <=
    ULLONG_MAX, ie, it's basically ^[0-9]+$ with unobvious length and
    content limits.

    return !strstr(argv[1], "0123456789");

    would be a better approximation, just a much more complicated algorithm
    than necessary. Even in strictly conforming ISO-C "digitness" of a
    character can be determined by a simple calculation instead of some kind
    of search loop.
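    That "simple calculation" can be spelled out as follows (illustrative
    sketch; the function name is made up). It relies on '0'..'9' being
    contiguous in every conforming C implementation.

```c
#include <assert.h>

/* Illustrative sketch: '0'..'9' are guaranteed contiguous in any
 * conforming C implementation, so digitness is a single unsigned
 * subtraction and comparison; characters below '0' wrap around to
 * large values and compare false. */
static int is_digit_char(unsigned c)
{
    return c - '0' < 10;
}
```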
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 19:05:42 2024
    From Newsgroup: comp.lang.misc

    In article <87h67zrtns.fsf@doppelsaurus.mobileactivedefense.com>,
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:

    [...]

    In any event, this seems simpler than what you posted:

    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(int argc, char *argv[])
    {
        if (argc != 2) {
            fprintf(stderr, "Usage: matchd <str>\n");
            return EXIT_FAILURE;
        }

        for (const char *p = argv[1]; *p != '\0'; p++)
            if ('0' <= *p && *p <= '9')
                return EXIT_SUCCESS;

        return EXIT_FAILURE;
    }

    It's not only 4 lines longer but in just about every individual aspect
    syntactically more complicated and more messy and functionally more
    clumsy.

    That's a lot of opinion, and not particularly well-founded
    opinion at that, given that your code was incorrect to begin
    with.

    This is particularly noticeable in the loop

    for (const char *p = argv[1]; *p != '\0'; p++)
        if ('0' <= *p && *p <= '9')
            return EXIT_SUCCESS;

    the loop header containing a spuriously qualified variable declaration,

    Ibid. Const qualifying a pointer that I'm not going to assign
    through is just good hygiene, IMHO.

    the loop body and half of the termination condition.

    I think you're trying to project a value judgement onto that
    loop in order to make it fit a particular world view, but I
    think this is an odd way to look at it.

    Another way to look at it is that the loop is only concerned
    with the iteration over the string, while the body is concerned
    with applying some predicate to the element, and doing something
    if that predicate evaluates it to true.

    The other half then
    follows as a special case in the otherwise useless loop body.

    That's a way to look at it, but I submit that's an outlier point
    of view.

    It looks like a copy of my code which each individual bit redesigned
    under the guiding principle of "Can we make this more complicated?", eg,

    Uh, no.

    char **argv

    declares an array of pointers

    No, it declares a pointer to a pointer to char.

    (as each pointer in C points to an array)

    That's absolutely not true. A pointer in C may refer to
    an array, or a scalar. Consider,

    char c;
    char *p = &c;
    char **pp = &p;

    For a concrete example of how this works in a real function,
    consider the second argument to `strtol` et al in the standard
    library.

    and

    char *argv[]

    accomplishes exactly the same but uses both more characters and more
    different kinds of characters.

    "more characters" is a poor metric.

    - Dan C.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 19:15:07 2024
    From Newsgroup: comp.lang.misc

    In article <87cyinrt5s.fsf@doppelsaurus.mobileactivedefense.com>,
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    scott@slp53.sl.home (Scott Lurndal) writes:
    Rainer Weikusat <rweikusat@talktalk.net> writes:

    [...]

    Something which would match [0-9]+ in its first argument (if any) would
    be:

    #include "string.h"
    #include "stdlib.h"

    int main(int argc, char **argv)
    {
        char *p;
        unsigned c;

        p = argv[1];
        if (!p) exit(1);
        while (c = *p, c && c - '0' > 10) ++p;
        if (!c) exit(1);
        return 0;
    }

    but that's 14 lines of text, 13 of which have absolutely no relation to
    the problem of recognizing a digit.

    Personally, I'd use:

    $ cat /tmp/a.c
    #include <stdint.h>
    #include <string.h>

    int
    main(int argc, const char **argv)
    {
        char *cp;
        uint64_t value;

        if (argc < 2) return 1;

        value = strtoull(argv[1], &cp, 10);
        if ((cp == argv[1])
            || (*cp != '\0')) {
            return 1;
        }
        return 0;
    }

    This will accept a string of digits whose numerical value is <=
    ULLONG_MAX, ie, it's basically ^[0-9]+$ with unobvious length and
    content limits.

    He acknowledged this already.

    return !strstr(argv[1], "0123456789");

    would be a better approximation,

    No it wouldn't. That's not even close. `strstr` looks for an
    instance of its second argument in its first, not an instance of
    any character in its second argument in its first. Perhaps you
    meant something with `strspn` or similar. E.g.,

    const char *p = argv[1] + strspn(argv[1], "0123456789");
    return *p != '\0';

    just a much more complicated algorithm
    than necessary. Even in strictly conforming ISO-C "digitness" of a
    character can be determined by a simple calculation instead of some kind
    of search loop.

    Yes, one can do that, but why bother?

    - Dan C.
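    As an aside, the two-line strspn version accepts the empty string as
    well (ie, it behaves like ^[0-9]*$); requiring at least one digit takes
    one extra check (illustrative sketch, hypothetical function name):

```c
#include <assert.h>
#include <string.h>

/* Illustrative sketch: strspn-based check equivalent to ^[0-9]+$.
 * Unlike the bare strspn-then-test-*p version, it also rejects the
 * empty string, since + requires at least one digit. */
static int all_digits(const char *s)
{
    size_t n = strspn(s, "0123456789");

    return n > 0 && s[n] == '\0';
}
```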

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 20:20:06 2024
    From Newsgroup: comp.lang.misc

    On 22.11.2024 19:19, Kaz Kylheku wrote:
    On 2024-11-22, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    On 21.11.2024 20:12, Kaz Kylheku wrote:
    [...]

    In the wild, you see regexes being used for all sorts of stupid stuff,

    No one can prevent folks using features for stupid things. Yes.

    But the thing is that "modern" regular expressions (Perl regex and its progeny) have features that are designed to exclusively cater to these
    folks.

    Which ones are you specifically thinking of?

    Since I'm not using Perl I don't know all the Perl RE details. Besides
    the basic REs I'm aware of the abbreviations (like '\d') (that I like),
    then extensions of Chomsky-3 (like back-references) (that I also like
    to have in cases I need them; but one must know what we buy with them),
    then the minimum-match (as opposed to matching the longest substring)
    (which I think is useful to simplify some types of expressions), and
    there was another one that evades my memory, something like context
    dependent patterns (also useful), and wasn't there also some syntax to
    match subexpression hierarchies (useful as well), similar to GNU
    Awk's gensub() (probably in a more primitive variant there), and also
    existing in Kornshell patterns, which also support some more of the
    above Perl features, like the abbreviations.

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 19:24:23 2024
    From Newsgroup: comp.lang.misc

    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:

    [...]

    In any event, this seems simpler than what you posted:

    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(int argc, char *argv[])
    {
        if (argc != 2) {
            fprintf(stderr, "Usage: matchd <str>\n");
            return EXIT_FAILURE;
        }

        for (const char *p = argv[1]; *p != '\0'; p++)
            if ('0' <= *p && *p <= '9')
                return EXIT_SUCCESS;

        return EXIT_FAILURE;
    }

    It's not only 4 lines longer but in just about every individual aspect
    syntactically more complicated and more messy and functionally more
    clumsy.

    That's a lot of opinion, and not particularly well-founded
    opinion at that, given that your code was incorrect to begin
    with.

    That's not at all an opinion but an observation. My opinion on this is
    that this is either a poor man's attempt at winning an obfuscation
    contest or - simpler - exemplary bad code.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Rainer Weikusat@rweikusat@talktalk.net to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 19:26:07 2024
    From Newsgroup: comp.lang.misc

    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <87cyinrt5s.fsf@doppelsaurus.mobileactivedefense.com>,
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    scott@slp53.sl.home (Scott Lurndal) writes:
    Rainer Weikusat <rweikusat@talktalk.net> writes:

    [...]

    Something which would match [0-9]+ in its first argument (if any) would
    be:

    #include "string.h"
    #include "stdlib.h"

    int main(int argc, char **argv)
    {
        char *p;
        unsigned c;

        p = argv[1];
        if (!p) exit(1);
        while (c = *p, c && c - '0' > 10) ++p;
        if (!c) exit(1);
        return 0;
    }

    but that's 14 lines of text, 13 of which have absolutely no relation to
    the problem of recognizing a digit.

    Personally, I'd use:

    $ cat /tmp/a.c
    #include <stdint.h>
    #include <string.h>

    int
    main(int argc, const char **argv)
    {
        char *cp;
        uint64_t value;

        if (argc < 2) return 1;

        value = strtoull(argv[1], &cp, 10);
        if ((cp == argv[1])
            || (*cp != '\0')) {
            return 1;
        }
        return 0;
    }

    This will accept a string of digits whose numerical value is <=
    ULLONG_MAX, ie, it's basically ^[0-9]+$ with unobvious length and
    content limits.

    He acknowledged this already.

    return !strstr(argv[1], "0123456789");

    would be a better approximation,

    No it wouldn't. That's not even close. `strstr` looks for an
    instance of its second argument in its first, not an instance of
    any character in its second argument in its first. Perhaps you
    meant something with `strspn` or similar. E.g.,

    const char *p = argv[1] + strspn(argv[1], "0123456789");
    return *p != '\0';

    My bad.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 20:33:24 2024
    From Newsgroup: comp.lang.misc

    On 22.11.2024 12:56, Rainer Weikusat wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
    On 20.11.2024 18:50, Rainer Weikusat wrote:
    [...]
    while (p < e && *p - '0' < 10) ++p;

    That's not too bad. And it's really a hell lot faster than a
    general-purpose automaton programmed to recognize the same pattern
    (which might not matter most of the time, but sometimes, it does).

    Okay, I see where you're coming from (and especially in that simple
    case).

    Personally (and YMMV), even here in this simple case I think that
    using pointers is not better but worse - and anyway isn't [in this
    form] available in most languages;

    That's a question of using the proper tool for the job. In C, that's
    pointers and pointer arithmetic, because they're the simplest way to
    express something like this.

    Yes, in "C" you'd use that primitive (error-prone) pointer feature.
    That's what I said. And that in other languages it's less terse than
    in "C" but equally error-prone if you have to create all the parsing
    code yourself (without an existing engine and in a non-standard way).
    And if you extend the expression to parse it's IME much simpler done
    in Regex than adjusting the algorithm of the ad hoc procedural code.


    in other cases (and languages)
    such constructs get yet more clumsy, and for my not very complex
    example - /[0-9]+(ABC)?x*foo/ - even a "catastrophe" concerning
    readability, error-proneness, and maintainability.

    Procedural code for matching strings constructed in this way is
    certainly much simpler¹ than the equally procedural code for a
    programmable automaton capable of interpreting regexes.

    The point is that Regexps and the equivalence to FSA (with guaranteed
    runtime complexity) is an [efficient] abstraction with a formalized
    syntax; that are huge advantages compared to ad hoc parsing code in C
    (or in any other language).

    Your statement
    is basically "If we assume that the code interpreting regexes doesn't
    exist, regexes need much less code than something equivalent which does exist." Without this assumption, the picture becomes a different one altogether.

    I don't speak of assumptions. I speak about the fact that there's a well-understood model with existing [parsing-]implementations already
    available to handle a huge class of algorithms in a standardized way
    with a guaranteed runtime-efficiency and in an error-resilient way.

    Janis

    [...]

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 19:46:31 2024
    From Newsgroup: comp.lang.misc

    In article <878qtbrs0o.fsf@doppelsaurus.mobileactivedefense.com>,
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:

    [...]

    In any event, this seems simpler than what you posted:

    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(int argc, char *argv[])
    {
    if (argc != 2) {
    fprintf(stderr, "Usage: matchd <str>\n");
    return EXIT_FAILURE;
    }

    for (const char *p = argv[1]; *p != '\0'; p++)
    if ('0' <= *p && *p <= '9')
    return EXIT_SUCCESS;

    return EXIT_FAILURE;
    }

    It's not only 4 lines longer but in just about every individual aspect
    syntactically more complicated and more messy and functionally more
    clumsy.

    That's a lot of opinion, and not particularly well-founded
    opinion at that, given that your code was incorrect to begin
    with.

    That's not at all an opinion but an observation. My opinion on this is
    that this is either a poor man's attempt at winning an obfuscation
    contest or - simpler - exemplary bad code.

    Opinion (noun)
    a view or judgment formed about something, not necessarily based on
    fact or knowledge. "I'm writing to voice my opinion on an issue of
    little importance"

    You mentioned snark earlier. Physician, heal thyself.

    - Dan C.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 19:51:18 2024
    From Newsgroup: comp.lang.misc

    In article <874j3zrrxs.fsf@doppelsaurus.mobileactivedefense.com>,
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <87cyinrt5s.fsf@doppelsaurus.mobileactivedefense.com>,
    Rainer Weikusat <rweikusat@talktalk.net> wrote:
    scott@slp53.sl.home (Scott Lurndal) writes:
    Rainer Weikusat <rweikusat@talktalk.net> writes:

    [...]

    Something which would match [0-9]+ in its first argument (if any) would
    be:

    #include "string.h"
    #include "stdlib.h"

    int main(int argc, char **argv)
    {
    char *p;
    unsigned c;

    p = argv[1];
    if (!p) exit(1);
    while (c = *p, c && c - '0' > 10) ++p;
    if (!c) exit(1);
    return 0;
    }

    but that's 14 lines of text, 13 of which have absolutely no relation to
    the problem of recognizing a digit.

    Personally, I'd use:

    $ cat /tmp/a.c
    #include <stdint.h>
    #include <string.h>

    int
    main(int argc, const char **argv)
    {
    char *cp;
    uint64_t value;

    if (argc < 2) return 1;

    value = strtoull(argv[1], &cp, 10);
    if ((cp == argv[1])
    || (*cp != '\0')) {
    return 1;
    }
    return 0;
    }

    This will accept a string of digits whose numerical value is <=
    ULLONG_MAX, ie, it's basically ^[0-9]+$ with unobvious length and
    content limits.

    He acknowledged this already.

    return !strstr(argv[1], "0123456789");

    would be a better approximation,

    No it wouldn't. That's not even close. `strstr` looks for an
    instance of its second argument in its first, not an instance of
    any character in its second argument in its first. Perhaps you
    meant something with `strspn` or similar. E.g.,

    const char *p = argv[1] + strspn(argv[1], "0123456789");
    return *p != '\0';

    My bad.

    You've made a lot of "bad"s in this thread, and been rude about
    it to boot, crying foul when someone's pointed out ways that
    your code is deficient; claiming offense at what you perceive as
    "snark" while dishing the same out in kind, making basic errors
    that show you haven't done the barest minimum of testing, and
    making statements that show you have, at best, a limited grasp
    on the language you're choosing to use.

    I'm done being polite. My conclusion is that perhaps you are
    not as up on these things as you seem to think that you are.

    - Dan C.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Nov 22 20:41:21 2024
    From Newsgroup: comp.lang.misc

    On Fri, 22 Nov 2024 12:47:16 +0100, Janis Papanagnou wrote:

    On 21.11.2024 23:05, Lawrence D'Oliveiro wrote:

    Another handy one is “\b” for word boundaries.

    I prefer \< and \> (that are quite commonly used) for such structural
    things ...

    “\<” only matches the beginning of a word, “\>” only matches the end,
    “\b” matches both
    <https://www.gnu.org/software/emacs/manual/html_node/emacs/Regexp-Backslash.html>.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Muttley@Muttley@dastardlyhq.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sat Nov 23 11:40:37 2024
    From Newsgroup: comp.lang.misc

    On Fri, 22 Nov 2024 18:18:04 -0000 (UTC)
    Kaz Kylheku <643-408-1753@kylheku.com> gabbled:
    On 2024-11-22, Muttley@DastartdlyHQ.org <Muttley@DastartdlyHQ.org> wrote:
    It's not that simple I'm afraid since comments can be commented out.

    Umm, no.

    Umm, yes, they can.

    eg:

    // int i; /*

    This /* sequence is inside a // comment, and so the machinery that
    recognizes /* as the start of a comment would never see it.

    Yes, that's kind of the point. You seem to be arguing against yourself.

    A C99 and C++ compiler would see "int j" and compile it, a regex would
    simply remove everything from the first /* to */.

    No, it won't, because that's not how regexes are used in a lexical
    analyzer.

    Yes, it will.

    Also the same probably applies to #ifdef's.

    Lexically analyzing C requires implementing the translation phases
    as described in the standard. There are preprocessor phases which
    delimit the input into preprocessor tokens (pp-tokens). Comments
    are stripped in preprocessing. But logical lines (backslash
    continuations) are recognized below comments; i.e. this is one
    comment:

    Not sure what your point is. A regex cannot be used to parse C comments
    because it doesn't know C/C++ grammar.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Ed Morton@mortonspam@gmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sat Nov 23 18:17:41 2024
    From Newsgroup: comp.lang.misc

    On 11/20/2024 9:53 AM, Janis Papanagnou wrote:
    On 20.11.2024 12:46, Ed Morton wrote:

    Definitely. The most relevant statement about regexps is this:

    Some people, when confronted with a problem, think "I know, I'll use
    regular expressions." Now they have two problems.

    (Worth a scribbling on a WC wall.)


    Obviously regexps are very useful and commonplace but if you find you
    have to use some online site or other tools to help you write/understand
    one or just generally need more than a couple of minutes to
    write/understand it then it's time to back off and figure out a better
    way to write your code for the sake of whoever has to read it 6 months
    later (and usually for robustness too as it's hard to be sure all rainy
    day cases are handled correctly in a lengthy and/or complicated regexp).

    Regexps are nothing for newbies.

    The inherent fine thing with Regexps is that you can incrementally
    compose them[*].[**]

    It seems you haven't found a sensible way to work with them?
    (And I'm really astonished about that since I know you worked with
    Regexps for years if not decades.)

    I have no problem working with regexps, I just don't write lengthy or complicated regexps, just brief, simple BREs or EREs, and I don't
    restrict myself to trying to solve problems with a single regexp.

    In those cases where Regexps *are* the tool for a specific task -
    I don't expect you to use them where they are inappropriate?! -

    Right, I don't, but I see many people using them for tasks that could be
    done more clearly and robustly if not done with a single regexp.

    what would be the better solution[***] then?

    It all depends on the problem. For example, if you need to match an
    input string that must contain each of a, b, and c in any order then you
    could do that in awk with this regexp or similar:

    awk '/(a.*(b.*c|c.*b))|(b.*(a.*c|c.*a))|(c.*(a.*b|b.*a))/'

    or you could do it with this condition comprised of regexp segments:

    awk '/a/ && /b/ && /c/'

    I would prefer the second solution as it's more concise and easier to
    enhance (try adding "and d" to both).

    As another example, someone on StackOverflow recently said they had
    written the following regexp to isolate the last string before a set of
    parens in a line that contains multiple such strings, some of them
    nested, and they said it works in python:

    ^(?:^[^(]+\([^)]+\) \(([^(]+)\([^)]+\)\))|[^(]+\(([^(]+)\([^)]+\),\s([^\(]+)\([^)]+\)\s\([^\)]+\)\)|(?:(?:.*?)\((.*?)\(.*?\)\))|(?:[^(]+\(([^)]+)\))$

    I personally wouldn't consider anything remotely as lengthy or
    complicated as that regexp despite their assurances that it works, I'd
    use this any-awk script or similar instead:

    {
    rec = $0
    while ( match(rec, /\([^()]*\)/) ) {
        tgt = substr($0,RSTART+1,RLENGTH-2)
        rec = substr(rec,1,RSTART-1) RS substr(rec,RSTART+1,RLENGTH-2) \
              RS substr(rec,RSTART+RLENGTH)
    }
    gsub(/ *\([^()]*\) */, "", tgt)
    print tgt
    }

    It's a bit more code but, unlike that regexp, anyone assigned to
    maintain this code in future can tell what it does with just a little
    thought (and maybe adding a debugging print in the loop if they aren't
    very familiar with awk), can then be sure it does what is required and
    nothing else, and could easily maintain/enhance it if necessary.

    Ed.


    Janis

    [*] Like the corresponding FSMs.

    [**] And you can also decompose them if they are merged in a huge
    expression, too large for you to grasp it. (BTW, I'm doing such decompositions also with other expressions in program code that
    are too bulky.)

    [***] Can you answer the question that another poster failed to do?


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.misc on Sun Nov 24 06:42:59 2024
    From Newsgroup: comp.lang.misc

    Rainer Weikusat <rweikusat@talktalk.net> writes:

    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    [...]

    Personally I think that writing bulky procedural stuff for
    something like [0-9]+ can only be much worse, and that further
    abbreviations like \d+ are the better direction to go if targeting
    a good interface. YMMV.

    Assuming that p is a pointer to the current position in a string, e
    is a pointer to the end of it (ie, point just past the last byte)
    and - that's important - both are pointers to unsigned quantities,
    the 'bulky' C equivalent of [0-9]+ is

    while (p < e && *p - '0' < 10) ++p;

    To force the comparison to be done as unsigned:

    while (p < e && *p - '0' < 10u) ++p;
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.misc on Sun Nov 24 20:08:24 2024
    From Newsgroup: comp.lang.misc

    Kaz Kylheku <643-408-1753@kylheku.com> writes:

    Here is an example: using a regex match to capture a C comment /* ... */
    in Lex compared to just recognizing the start sequence /* and handling
    the discarding of the comment in the action.

    Without non-greedy repetition matching, the regex for a C comment is
    quite obtuse. The procedural handling is straightforward: read
    characters until you see a * immediately followed by a /.

    Regular expressions are neither greedy nor non-greedy. One of the
    key points of regular expressions is that they are declarative
    rather than procedural. Any procedural change of behavior overlaid
    on a regular expression is a property of the tool, not the regular
    expression. It's easy to write a regular expression that exactly
    matches a /* ... */ comment and that isn't hard to understand.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Tristan Wibberley@tristan.wibberley+netnews2@alumni.manchester.ac.uk to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sat Oct 18 00:34:45 2025
    From Newsgroup: comp.lang.misc

    The message body is Copyright (C) 2025 Tristan Wibberley except
    citations and quotations noted. All Rights Reserved except as noted in
    the sig.

    On 27/08/2024 23:56, Johanne Fairchild wrote:
    Lawrence D'Oliveiro <ldo@nz.invalid> writes:

    On Tue, 27 Aug 2024 03:15:16 -0000 (UTC), Sebastian wrote:

    In comp.unix.programmer Lawrence D'Oliveiro <ldo@nz.invalid> wrote:

    (And I have no idea about this “Black” thing. I just do my thing.)

    Black is a [bla bla bla]

    *Yawn*

    The guy was kindly and politely sharing information with you.

    He was sharing the information with _us_, and we're much more important.

    Lawrence was trying to make him feel like we shouldn't receive it or not receive it in context!

    Although there might be some value in the latter because if we killfile
    trolls and their followup chains we'll only receive the useful
    information if it's not given as a followup.

    --
    Tristan Wibberley

    The message body is Copyright (C) 2025 Tristan Wibberley except
    citations and quotations noted. All Rights Reserved except that you may,
    of course, cite it academically giving credit to me, distribute it
    verbatim as part of a usenet system or its archives, and use it to
    promote my greatness and general superiority without misrepresentation
    of my opinions other than my opinion of my greatness and general
    superiority which you _may_ misrepresent. You definitely MAY NOT train
    any production AI system with it but you may train experimental AI that
    will only be used for evaluation of the AI methods it implements.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Fri Oct 17 17:13:04 2025
    From Newsgroup: comp.lang.misc

    Johanne Fairchild <jfairchild@tudado.org> writes:
    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    On Tue, 27 Aug 2024 03:15:16 -0000 (UTC), Sebastian wrote:
    In comp.unix.programmer Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
    (And I have no idea about this “Black” thing. I just do my thing.)

    Black is a [bla bla bla]

    *Yawn*

    The guy was kindly and politely sharing information with you.

    For the sake of accuracy, here's what Sebastian wrote (more than a year
    ago):

    Black is a Python program that formats Python code
    almost exactly the way you formatted that snippet of Lisp
    code. It's just as ugly in Python as it is in Lisp. Black
    spreads by convincing organizations to mandate its use. It's
    utterly non-configurable on purpose, in order to guarantee
    that eventually, all Python code is made to be as ugly
    and unreadable as possible.

    This is more exaggerated opinion than information. Of course there's
    nothing wrong with sharing an opinion, but there's also nothing
    wrong with responding to an inflammatory opinion with a yawn.

    Here's what the "black" man page says:

    NAME
    black - uncompromising Python code formatter

    SUMMARY
    black is the uncompromising Python code formatter. By using it,
    you agree to cede control over minutiae of hand-formatting. In
    return, Black gives you speed, determinism, and freedom from
    pycodestyle nagging about formatting. You will save time and
    mental energy for more important matters.
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Tristan Wibberley@tristan.wibberley+netnews2@alumni.manchester.ac.uk to comp.lang.misc on Sat Oct 18 02:58:46 2025
    From Newsgroup: comp.lang.misc

    The message body is Copyright (C) 2025 Tristan Wibberley except
    citations and quotations noted. All Rights Reserved except as noted in
    the sig.

    On 07/08/2024 14:43, Kaz Kylheku wrote:
    On 2024-08-06, Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
    Equivalent Lisp, for comparison:

    (setf a (cond (b (if c d e))
                  (f (if g h i))
                  (t j)))

    You can’t avoid the parentheses, but this, too, can be improved:

    (setf a
    (cond
    (b
    (if c d e)
    )
    (f
    (if g h i)
    )
    (t
    j
    )
    ) ; cond
    )

    Nobody is ever going to follow your idio(syncra)tic coding preferences
    for Lisp; that wouldn't pass code review in any Lisp shop, and would
    result in patches being rejected in a FOSS setting.


    If "; cond" went inside the cond form then I'd accept it in general,
    ie. unless I have process or contractual reasons to do otherwise. The
    code has, to an extent greater than most efforts, been made of
    orthogonal syntactic pieces even for the least lisp-aware editors, but
    fails in that particular visual aid ("; cond") when subjected to a
    lisp-aware one (parenthesis matching).

    This is improved:

    (cond ;name-of-the-judgement-as-in-the-documentation
    (b
    (if c d e)
    )
    (f
    (if g h i)
    )
    (t
    j
    )
    ;cond ;name-of-the-judgement-as-in-the-documentation
    )

    I'd have some caveats about the patterns of the code it's going inside,
    ie, how varied does or will the file become but this result, in
    particular, yields to:

    - line-oriented processing and generation,
    - traceability,
    - lisp-aware editors,
    - lisp-unaware editors, and
    - printouts.

    Of course I probably wouldn't be doing medical, aerospace, submarine,
    or weapons development when I accept it in FOSS because of the typical restrictions on making any change at all after acceptance (which just
    means that "accept" has many different meanings and ought to be taken to be
    a strictly process oriented word from the activity's GLOSSARY).

    The thing I worry about with coding standards is that they
    surreptitiously form a derived language that's the same from the
    computer's perspective but different among readers, and you haven't
    really achieved much but improved future task estimation. It would be interesting to know how the costs shift around in practice and what are
    the implications for integrity in billing.

    --
    Tristan Wibberley

    The message body is Copyright (C) 2025 Tristan Wibberley except
    citations and quotations noted. All Rights Reserved except that you may,
    of course, cite it academically giving credit to me, distribute it
    verbatim as part of a usenet system or its archives, and use it to
    promote my greatness and general superiority without misrepresentation
    of my opinions other than my opinion of my greatness and general
    superiority which you _may_ misrepresent. You definitely MAY NOT train
    any production AI system with it but you may train experimental AI that
    will only be used for evaluation of the AI methods it implements.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Tristan Wibberley@tristan.wibberley+netnews2@alumni.manchester.ac.uk to comp.lang.misc on Sat Oct 18 03:09:24 2025
    From Newsgroup: comp.lang.misc

    On 06/08/2024 09:04, Sebastian wrote:

    a = b ? (c ? d : e) :
    f ? (g ? h : i) :
    j;

    I like rolling the operators and statement delimiters over to form the
    dropsies; you can read them as introducing the role a line plays wrt.
    the previous lines, and it works when each line contains suitable
    parentheses to make the AST visually obvious:

    a = b ? (c ? d : e)
    : f ? (g ? h : i)
    : j
    ;

    primarily, b chooses something
    or else fallback to f choosing something
    or else fallback to j
    no more options

    --
    Tristan Wibberley
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Lawrence =?iso-8859-13?q?D=FFOliveiro?=@ldo@nz.invalid to comp.lang.misc on Sat Oct 18 04:46:06 2025
    From Newsgroup: comp.lang.misc

    On Sat, 18 Oct 2025 02:58:46 +0100, Tristan Wibberley wrote:

    On 2024-08-06, Lawrence D'Oliveiro <ldo@nz.invalid> wrote:

    (setf a
    (cond
    (b
    (if c d e)
    )
    (f
    (if g h i)
    )
    (t
    j
    )
    ) ; cond
    )

    If "; cond" went inside the cond form then I'd accept it in general

    It indicates that the closing statement bracket is for the “cond”
    construct. Moving it anywhere other than that closing statement bracket
    would defeat the purpose.
    --- Synchronet 3.21a-Linux NewsLink 1.2