• Spring 2025 Challenge: TicTacToe Transformer

    From Mild Shock@janburse@fastmail.fm to comp.lang.prolog on Mon Feb 24 23:34:34 2025


    A conceptually very simple challenge: develop the idea of
    Centipawn towards TicTacToe and implement the game based on
    learning / training a transformer, and then executing it. All
    written in Prolog itself! Optional bonus exercise: make the
    execution NNUE style, i.e. incremental evaluation of the
    transformer.

    Centipawn - Chess Wiki
    https://chess.fandom.com/wiki/Centipawn

    NNUE - Chess Programming Wiki
    https://www.chessprogramming.org/NNUE
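
    As a first stab, here is a minimal sketch of a centipawn-style
    static evaluation for TicTacToe in plain Prolog (the board
    layout, the predicate names and the +100/-100 scale are just
    assumptions for illustration, not part of the challenge):

    :- use_module(library(lists)).

    % A board is a list of 9 cells, each x, o or -.
    line([1,2,3]). line([4,5,6]). line([7,8,9]).
    line([1,4,7]). line([2,5,8]). line([3,6,9]).
    line([1,5,9]). line([3,5,7]).

    wins(Board, P) :-
        line([A,B,C]),
        nth1(A, Board, P), nth1(B, Board, P), nth1(C, Board, P).

    % Centipawn-like score from the perspective of x.
    score(Board, 100)  :- wins(Board, x), !.
    score(Board, -100) :- wins(Board, o), !.
    score(_, 0).

    % ?- score([x,x,x, o,o,-, -,-,-], S).
    % S = 100.

    A transformer trained on played games would then presumably be
    asked to regress such scores instead of computing them by search.
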
    --- Synchronet 3.20c-Linux NewsLink 1.2
  • From Mild Shock@janburse@fastmail.fm to comp.lang.prolog on Tue Feb 25 09:09:04 2025

    Prologers are still on the path of Don Quixote:

    extremely restrictive setting and the only reason
    it’s worked so well over the years is that people
    have persisted at flogging it like the deadest
    of dead horses

    For some it's a dead horse; for others, by means of the two
    Nobel Prizes, one for Geoffrey Hinton in Physics and one for
    Demis Hassabis in Chemistry, both in 2024, it's rather a
    wake-up call.

    The current state of affairs in Prolog is that autoencoders
    and transformers are not available via ILP. ILP lacks the
    conceptual setting, because it is based on a model of belief
    congruence, trying to avoid cognitive dissonance. Basically ILP
    adopts abduction as already conceived by Charles Sanders Peirce,
    who is also the originator of existential graphs, the precursor
    of Conceptual Graphs. The problem is posed for some background
    knowledge B and some observation E, and the idea is to find a
    hypothesis H such that:

    Consistency:  B, H |/- f    /* no absurdity */
    Completeness: B, H |- E
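
    To make the B / H / E scheme concrete, here is a toy instance in
    Prolog (rained, sprinkler_on and clear_sky are made-up
    propositions, not taken from any particular ILP system):

    :- dynamic rained/0, sprinkler_on/0, clear_sky/0.

    % Background knowledge B
    grass_wet :- rained.
    grass_wet :- sprinkler_on.

    % Integrity constraint: rain under a clear sky counts as absurd.
    absurd :- rained, clear_sky.

    % Observation E: grass_wet.
    % Candidate hypothesis H = {rained}:
    %   Completeness: B, H |- grass_wet.
    %   Consistency:  B, H |/- absurd, as long as clear_sky is not given.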

    There is also a refinement with positive and negative
    observations E+ and E-. The challenge I am posing is to get
    some hands-on experience and see what the merits of autoencoders
    and transformers are, and maybe to see whether there is a
    possible marriage of autoencoders and transformers with ILP. The
    challenge here is that autoencoders and transformers have no
    concept of absurdity. The main features of extrapolation in
    autoencoders and transformers are the following (a toy Prolog
    sketch follows the list):

    - Inferencing:
    The autoencoder might also tolerate deviations in
    the input that are not in the training data, giving
    it some inferential capability.

    - Generation:
    And then choose an output again not in the training
    data, giving it some generative capabilities.
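
    As a crude, hand-rolled stand-in for these two capabilities,
    consider the sketch below (it does no training at all; two fixed
    prototypes play the role of a learnt latent space, purely for
    illustration):

    :- use_module(library(lists)).

    prototype(1, [x,o,x, o,x,o, x,o,x]).
    prototype(2, [o,x,o, x,o,x, o,x,o]).

    % Hamming distance between two boards of equal length.
    distance([], [], 0).
    distance([A|As], [B|Bs], D) :-
        distance(As, Bs, D0),
        ( A == B -> D = D0 ; D is D0 + 1 ).

    % "Inferencing": a possibly unseen, noisy board is still mapped
    % to the code of the nearest prototype.
    encode(Board, Code) :-
        findall(D-C, (prototype(C, P), distance(Board, P, D)), Pairs),
        min_member(_-Code, Pairs).

    % "Generation": decoding produces a board that need not equal
    % the noisy input.
    decode(Code, Board) :- prototype(Code, Board).

    % ?- encode([x,o,x, o,x,o, x,o,o], C), decode(C, B).
    % C = 1, B = [x,o,x, o,x,o, x,o,x].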

    There is no measurement against absurdity in the inferencing
    and no measurement against absurdity in the generation. This is
    also seen in practice: when you interact with ChatGPT, it can
    hallucinate unicorns, and it can even make mistakes within the
    hallucination, like believing there are white chestnut unicorns.

    So the following is possible:

    There are unicorns

    There are white chestnut unicorns

    I see this as a chance that absurdity is possible in
    autoencoders and transformers, for many reasons, especially from
    my interest in paraconsistent logics. You cannot assume in the
    first place that training data is consistent. That there is no
    ex falso explosion in this type of autoencoder and transformer
    machine learning is rather a benefit than a curse, and somehow
    gives a neat solution to many problems where ILP might fail by
    design because it is too strict.

    See also:

    https://de.wikipedia.org/wiki/Geoffrey_Hinton

    https://de.wikipedia.org/wiki/Demis_Hassabis

    https://en.wikipedia.org/wiki/Abductive_reasoning#Abduction

    --- Synchronet 3.20c-Linux NewsLink 1.2
  • From Mild Shock@janburse@fastmail.fm to comp.lang.prolog on Tue Feb 25 09:19:15 2025

    ILP might fail by design because it is too strict

    So I am currently looking at whether the R statistics
    package of SWI-Prolog delivers some autoencoders and
    transformers, or whether the Janus interface can help
    in experimenting with autoencoders and transformers.

    That ILP has the wrong design when dealing with “absurdity”
    is already seen in my intro to this thread. The paper contains
    not a single reference to autoencoders! Still they show this
    example:

    Inductive logic programming at 30
    Fig. 1: ILP systems struggle with structured examples
    that exhibit observational noise. All three examples
    clearly spell the word "ILP", with some alterations:
    3 noisy pixels, shifted and elongated letters. If we
    were to learn a program that simply draws "ILP"
    in the middle of the picture, without noisy pixels
    and elongated letters, that would be a correct program.

    https://arxiv.org/abs/2102.10556

    There is no idea of a model that shows variation in
    inferencing and generation. The idea is that there is a
    single correct program which produces one output, and the
    rest is error. That's not how machine learning is conceived
    in autoencoders. BTW, here is some progress with my learning
    of Tic-Tac-Toe, in particular the training of a full "neural
    network computer" based on eForests, meaning the neural
    networks are in fact realized as binary decision diagrams
    (BDDs) implemented in pure Prolog, run via SWI-Prolog 9.3.19:

    ?- test5.
    % 77,982,596 inferences, 5.391 CPU in 5.518 seconds
    (98% CPU, 14466337 Lips)
    0
    438
    % 771,928,499 inferences, 55.422 CPU in 56.299 seconds
    (98% CPU, 13928228 Lips)
    208
    589
    % 3,252,688,243 inferences, 250.688 CPU in 256.150 seconds
    (98% CPU, 12975072 Lips)
    126
    true.

    The above does out-of-bag training with successive transfer
    of a learnt model to progressively larger bags of size 100,
    500 and 1000. The final error score of 126 means it can
    already play the optimal Tic-Tac-Toe strategy in 12% of the
    training data cases, and this after only about 5 minutes of
    training, with the result in algorithmic form, i.e. a BDD,
    though one could also try @kuniaki.mukai's ZDDs.
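
    For the curious, here is a minimal sketch of how a BDD can be
    represented and evaluated in pure Prolog (the ite/3 node layout
    is just an assumption for illustration, not necessarily the
    encoding used in the experiment above):

    % A node is ite(Var, High, Low); the leaves are 0 and 1.
    % Env is a list of Var-Bit pairs, with Bit being 0 or 1.
    bdd_eval(0, _, 0).
    bdd_eval(1, _, 1).
    bdd_eval(ite(Var, High, Low), Env, Value) :-
        memberchk(Var-Bit, Env),
        (   Bit =:= 1
        ->  bdd_eval(High, Env, Value)
        ;   bdd_eval(Low, Env, Value)
        ).

    % Example, the BDD for (A and B):
    % ?- bdd_eval(ite(a, ite(b,1,0), 0), [a-1, b-1], V).
    % V = 1.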

    I am currently working on better parameters and better
    measurement of the inferencing and generation of the learnt
    model. But it gives an answer to the question of what the
    following means, in the setting of autoencoders and
    transformers, for additional training after pre-training:

    - Domain Adaptation: A well-structured latent space can
      help transfer knowledge from abundant domains to
      underrepresented ones.


    --- Synchronet 3.20c-Linux NewsLink 1.2
  • From Mild Shock@janburse@fastmail.fm to comp.lang.prolog on Sun Mar 2 03:49:32 2025


    Ok, my bad. You can of course also try a decoder-only
    transformer, just like in this Python code example:

    **Simple PyTorch Implementation of “Grokking”**
    "We trained a standard decoder-only transformer (Vaswani et al., 2017)"
    https://github.com/teddykoker/grokking

    The transformer need not necessarily have an encoder and
    a latent space. It can also be decoder-only.

    --- Synchronet 3.20c-Linux NewsLink 1.2
  • From Mild Shock@janburse@fastmail.fm to comp.lang.prolog on Sun Mar 2 22:39:27 2025

    Thank you for thinking that
    I would invent these things:

    Are you thinking that autoencoders
    could play a bigger role in tasks like
    language modeling

    Nope, it is all in the papers, like here:

    **Attention Is All You Need**
    Vaswani et al., 2017
    https://arxiv.org/abs/1706.03762

    The conclusion says it is the same architecture
    as autoencoders:

    In this work, we presented the Transformer,
    the first sequence transduction model based
    entirely on attention, replacing the recurrent
    layers most commonly used in encoder-decoder
    architectures with multi-headed self-attention.

    Same architecture, with a latent space between
    encoder and decoder. The training on my laptop
    would take, for the EN-DE ConvS2S Ensemble model
    reported in Table 2 of the paper, using my GPU:

    7.7e19 / 3e13 = 1 month

    If I were to train GPT-4.5 on my
    laptop, it would take:

    1e23 / 3e13 = 3'000 years
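
    Assuming the figures above are total training FLOPs divided by a
    sustained GPU rate in FLOP/s, so that the quotient is seconds,
    here is a tiny helper to redo the conversion (flops_to_days/3 is
    a made-up name):

    % Seconds = Flops / Rate, then converted to days.
    flops_to_days(Flops, Rate, Days) :-
        Seconds is Flops / Rate,
        Days is Seconds / 86400.

    % ?- flops_to_days(7.7e19, 3e13, Days).
    % Days = 29.70...   (roughly one month)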

    P.S.: The paper is from the same Vaswani et al., 2017
    that is referenced in the Python code of
    the Grokking paper mentioned earlier.

    --- Synchronet 3.20c-Linux NewsLink 1.2