• Spring 2025 Challenge: TicTacToe Transformer

    From Mild Shock@janburse@fastmail.fm to comp.lang.prolog on Mon Feb 24 23:34:34 2025


    A conceptually very simple challenge: develop the idea of
    Centipawn towards TicTacToe and implement the game based on
    learning / training a transformer, and then executing it. All
    written in Prolog itself! Optional bonus exercise: make the
    execution NNUE style, i.e. incremental evaluation of the
    transformer.

    Centipawn - Chess Wiki
    https://chess.fandom.com/wiki/Centipawn

    NNUE - Chess Programming Wiki
    https://www.chessprogramming.org/NNUE
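
    As a first stab, here is a minimal sketch of a centipawn-style
    static evaluation for TicTacToe in plain Prolog (the board
    layout, the predicate names and the +100/-100 scale are just
    assumptions for illustration, not part of the challenge):

    :- use_module(library(lists)).

    % A board is a list of 9 cells, each x, o or -.
    line([1,2,3]). line([4,5,6]). line([7,8,9]).
    line([1,4,7]). line([2,5,8]). line([3,6,9]).
    line([1,5,9]). line([3,5,7]).

    wins(Board, P) :-
        line([A,B,C]),
        nth1(A, Board, P), nth1(B, Board, P), nth1(C, Board, P).

    % Centipawn-like score from the perspective of x.
    score(Board, 100)  :- wins(Board, x), !.
    score(Board, -100) :- wins(Board, o), !.
    score(_, 0).

    % ?- score([x,x,x, o,o,-, -,-,-], S).
    % S = 100.

    A transformer trained on played games would then presumably be
    asked to regress such scores instead of computing them by search.
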
    --- Synchronet 3.20c-Linux NewsLink 1.2
  • From Mild Shock@janburse@fastmail.fm to comp.lang.prolog on Tue Feb 25 09:09:04 2025

    Prologers are still on the path of Don Quixote:

    extremely restrictive setting and the only reason
    it’s worked so well over the years is that people
    have persisted at flogging it like the deadest
    of dead horses

    For some it's a dead horse; for others, by means of the two
    Nobel Prizes, one for Geoffrey Hinton in Physics and one for
    Demis Hassabis in Chemistry, both in 2024, it's rather a
    wake-up call.

    The current state of affairs in Prolog is that autoencoders
    and transformers are not available via ILP. ILP lacks the
    conceptual setting, because it is based on a model of belief
    congruence, trying to avoid cognitive dissonance. Basically ILP
    adopts abduction as already conceived by Charles Sanders Peirce,
    who is also the originator of existential graphs, the precursor
    of Conceptual Graphs. The problem is posed for some background
    knowledge B and some observation E, and the idea is to find a
    hypothesis H such that:

    Consistency:  B, H |/- f    /* no absurdity */
    Completeness: B, H |- E
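
    To make the B / H / E scheme concrete, here is a toy instance in
    Prolog (rained, sprinkler_on and clear_sky are made-up
    propositions, not taken from any particular ILP system):

    :- dynamic rained/0, sprinkler_on/0, clear_sky/0.

    % Background knowledge B
    grass_wet :- rained.
    grass_wet :- sprinkler_on.

    % Integrity constraint: rain under a clear sky counts as absurd.
    absurd :- rained, clear_sky.

    % Observation E: grass_wet.
    % Candidate hypothesis H = {rained}:
    %   Completeness: B, H |- grass_wet.
    %   Consistency:  B, H |/- absurd, as long as clear_sky is not given.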

    There is also a refinement with positive and negative
    observations E+ and E-. The challenge I am posing is to get
    some hands-on experience and see what the merits of autoencoders
    and transformers are, and maybe to see whether there is a
    possible marriage of autoencoders and transformers with ILP. The
    challenge here is that autoencoders and transformers have no
    concept of absurdity. The main features of extrapolation in
    autoencoders and transformers are the following (a toy Prolog
    sketch follows the list):

    - Inferencing:
    The autoencoder might also tolerate deviations in
    the input that are not in the training data, giving
    it some inferential capability.

    - Generation:
    And then choose an output again not in the training
    data, giving it some generative capabilities.
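
    As a crude, hand-rolled stand-in for these two capabilities,
    consider the sketch below (it does no training at all; two fixed
    prototypes play the role of a learnt latent space, purely for
    illustration):

    :- use_module(library(lists)).

    prototype(1, [x,o,x, o,x,o, x,o,x]).
    prototype(2, [o,x,o, x,o,x, o,x,o]).

    % Hamming distance between two boards of equal length.
    distance([], [], 0).
    distance([A|As], [B|Bs], D) :-
        distance(As, Bs, D0),
        ( A == B -> D = D0 ; D is D0 + 1 ).

    % "Inferencing": a possibly unseen, noisy board is still mapped
    % to the code of the nearest prototype.
    encode(Board, Code) :-
        findall(D-C, (prototype(C, P), distance(Board, P, D)), Pairs),
        min_member(_-Code, Pairs).

    % "Generation": decoding produces a board that need not equal
    % the noisy input.
    decode(Code, Board) :- prototype(Code, Board).

    % ?- encode([x,o,x, o,x,o, x,o,o], C), decode(C, B).
    % C = 1, B = [x,o,x, o,x,o, x,o,x].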

    There is no measurement against absurdity in the inferencing
    and no measurement against absurdity in the generation. This is
    also seen in practice: when you interact with ChatGPT, it can
    hallucinate unicorns, and it can even make mistakes within the
    hallucination, like believing there are white chestnut unicorns.

    So the following is possible:

    There are unicorns

    There are white chestnut unicorns

    I see this as a chance that absurdity is possible in
    autoencoders and transformers, for many reasons, especially from
    my interest in paraconsistent logics. You cannot assume in the
    first place that training data is consistent. That there is no
    ex falso explosion in this type of autoencoder and transformer
    machine learning is rather a benefit than a curse, and somehow
    gives a neat solution to many problems where ILP might fail by
    design because it is too strict.

    See also:

    https://de.wikipedia.org/wiki/Geoffrey_Hinton

    https://de.wikipedia.org/wiki/Demis_Hassabis

    https://en.wikipedia.org/wiki/Abductive_reasoning#Abduction

    --- Synchronet 3.20c-Linux NewsLink 1.2
  • From Mild Shock@janburse@fastmail.fm to comp.lang.prolog on Tue Feb 25 09:19:15 2025

    ILP might fail by design because it is too strict

    So I am currently looking at whether the R statistics
    package of SWI-Prolog delivers some autoencoders and
    transformers, or whether the Janus interface can help
    in experimenting with autoencoders and transformers.

    That ILP has the wrong design when dealing with “absurdity”
    is already seen in my intro to this thread. The paper contains
    not a single reference to autoencoders! Still they show this
    example:

    Inductive logic programming at 30
    Fig. 1: ILP systems struggle with structured examples
    that exhibit observational noise. All three examples
    clearly spell the word "ILP", with some alterations:
    3 noisy pixels, shifted and elongated letters. If we
    were to learn a program that simply draws "ILP"
    in the middle of the picture, without noisy pixels
    and elongated letters, that would be a correct program.

    https://arxiv.org/abs/2102.10556

    There is no idea of a model that shows variation in
    inferencing and generation. The idea is that there is a
    single correct program which produces one output, and the
    rest is error. That's not how machine learning is conceived
    in autoencoders. BTW, here is some progress with my learning
    of Tic-Tac-Toe, in particular the training of a full "neural
    network computer" based on eForests, meaning the neural
    networks are in fact realized as binary decision diagrams
    (BDDs) implemented in pure Prolog, run via SWI-Prolog 9.3.19:

    ?- test5.
    % 77,982,596 inferences, 5.391 CPU in 5.518 seconds
    (98% CPU, 14466337 Lips)
    0
    438
    % 771,928,499 inferences, 55.422 CPU in 56.299 seconds
    (98% CPU, 13928228 Lips)
    208
    589
    % 3,252,688,243 inferences, 250.688 CPU in 256.150 seconds
    (98% CPU, 12975072 Lips)
    126
    true.

    The above does out-of-bag training with successive transfer
    of a learnt model to progressively larger bags of size 100,
    500 and 1000. The final error score of 126 means it can
    already play the optimal Tic-Tac-Toe strategy in 12% of the
    training data cases, and this after only about 5 minutes of
    training, with the result in algorithmic form, i.e. a BDD,
    though one could also try @kuniaki.mukai's ZDDs.
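
    For the curious, here is a minimal sketch of how a BDD can be
    represented and evaluated in pure Prolog (the ite/3 node layout
    is just an assumption for illustration, not necessarily the
    encoding used in the experiment above):

    % A node is ite(Var, High, Low); the leaves are 0 and 1.
    % Env is a list of Var-Bit pairs, with Bit being 0 or 1.
    bdd_eval(0, _, 0).
    bdd_eval(1, _, 1).
    bdd_eval(ite(Var, High, Low), Env, Value) :-
        memberchk(Var-Bit, Env),
        (   Bit =:= 1
        ->  bdd_eval(High, Env, Value)
        ;   bdd_eval(Low, Env, Value)
        ).

    % Example, the BDD for (A and B):
    % ?- bdd_eval(ite(a, ite(b,1,0), 0), [a-1, b-1], V).
    % V = 1.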

    I am currently working on better parameters and better
    measurement of the inferencing and generation of the learnt
    model. But it gives an answer to the question of what the
    following means, in the setting of autoencoders and
    transformers, for additional training after pre-training:

    - Domain Adaptation: A well-structured latent space can
      help transfer knowledge from abundant domains to
      underrepresented ones.


    --- Synchronet 3.20c-Linux NewsLink 1.2
  • From Mild Shock@janburse@fastmail.fm to comp.lang.prolog on Sun Mar 2 03:49:32 2025


    Ok, my bad. You can of course also try a decoder-only
    transformer, just like in this Python code example:

    **Simple PyTorch Implementation of “Grokking”**
    "We trained a standard decoder-only transformer (Vaswani et al., 2017)"
    https://github.com/teddykoker/grokking

    The transformer need not necessarily have an encoder and
    a latent space. It can also be decoder-only.

    --- Synchronet 3.20c-Linux NewsLink 1.2
  • From Mild Shock@janburse@fastmail.fm to comp.lang.prolog on Sun Mar 2 22:39:27 2025

    Thank you for thinking that
    I would invent these things:

    Are you thinking that autoencoders
    could play a bigger role in tasks like
    language modeling

    Nope, it is all in the papers, like here:

    **Attention Is All You Need**
    Vaswani et al., 2017
    https://arxiv.org/abs/1706.03762

    The conclusion says it is the same architecture
    as autoencoders:

    In this work, we presented the Transformer,
    the first sequence transduction model based
    entirely on attention, replacing the recurrent
    layers most commonly used in encoder-decoder
    architectures with multi-headed self-attention.

    Same architecture, with a latent space between
    encoder and decoder. The training on my laptop
    would take, for the EN-DE ConvS2S Ensemble model
    reported in Table 2 of the paper, using my GPU:

    7.7e19 / 3e13 = 1 month

    If I were to train GPT-4.5 on my
    laptop, it would take:

    1e23 / 3e13 = 3'000 years
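
    Assuming the figures above are total training FLOPs divided by a
    sustained GPU rate in FLOP/s, so that the quotient is seconds,
    here is a tiny helper to redo the conversion (flops_to_days/3 is
    a made-up name):

    % Seconds = Flops / Rate, then converted to days.
    flops_to_days(Flops, Rate, Days) :-
        Seconds is Flops / Rate,
        Days is Seconds / 86400.

    % ?- flops_to_days(7.7e19, 3e13, Days).
    % Days = 29.70...   (roughly one month)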

    P.S.: The paper is from the same Vaswani et al., 2017
    that is referenced in the Python code of
    the Grokking paper mentioned earlier.

    --- Synchronet 3.20c-Linux NewsLink 1.2