• Mini-Language for hyphenation

    From ram@ram@zedat.fu-berlin.de (Stefan Ram) to comp.lang.misc on Tue Jan 21 11:53:55 2025
    From Newsgroup: comp.lang.misc

    I foresee the need for a mini-language for hyphenation in
    my current plain-text paragraph wrapper project. Here are
    my plans, comments are welcome:

    example

    This is the unadorned word "example". The system might automatically
    insert possibilities for hyphenation from a hyphenation dictionary.

    ex[[-]]am[[-]]ple

    Here, possibilities for hyphenation have been inserted. It
    is assumed that nested brackets occur so rarely in natural
    text that this possibility is negligible. Means for
    escaping are nonetheless discussed below.

    ba[ck[k-|k]]en

    This is a hyphenation of a German word according to the
    rules from 1973. It's either "backen" or "bak-
    ken".

    Bett[[-|t]]uch

    "Bettuch" or "Bett-
    tuch", according to spelling rules from 1973.

    So, the general pattern in my mini-language is:

    [no-hyphenation text[pre-break text|post-break text]]

    Bett[t]uch

    When brackets occur in the text that do not satisfy the
    syntax of my mini-language, they will simply be left alone.
    I.e., this is just literally "Bett[t]uch" with a "t" to
    be "typeset" in literal brackets.
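
    A minimal sketch of how the basic pattern might be read, in Python.
    It is only an illustration under assumptions: the names Break, parse,
    unbroken and break_at are hypothetical, only a single [pre|post]
    alternative per group is handled, and groups starting with "#" are
    skipped. Brackets that do not fit the pattern stay literal.

```python
import re
from typing import NamedTuple

# One optional break point: [nobreak[pre|post]]; the "|post" part may be omitted.
# Assumption: a leading "#" is reserved for other forms, so it is not matched here.
BREAK = re.compile(r"\[([^\[\]|]*)\[(?!#)([^\[\]|]*)(?:\|([^\[\]|]*))?\]\]")

class Break(NamedTuple):
    nobreak: str  # text used when the line is not broken here
    pre: str      # text ending the line before the break
    post: str     # text starting the line after the break

def parse(text):
    """Split text into literal strings and Break records."""
    segments, pos = [], 0
    for m in BREAK.finditer(text):
        if m.start() > pos:
            segments.append(text[pos:m.start()])
        segments.append(Break(m.group(1), m.group(2), m.group(3) or ""))
        pos = m.end()
    if pos < len(text):
        segments.append(text[pos:])
    return segments

def unbroken(segments):
    """The text as it appears when no break point is used."""
    return "".join(s.nobreak if isinstance(s, Break) else s for s in segments)

def break_at(segments, index):
    """Break at the index-th break point (0-based); return both line parts."""
    first, second = [], []
    target, seen = first, 0
    for seg in segments:
        if isinstance(seg, Break):
            if seen == index:
                first.append(seg.pre)
                second.append(seg.post)
                target = second
            else:
                target.append(seg.nobreak)
            seen += 1
        else:
            target.append(seg)
    if target is first:
        raise IndexError("no such break point")
    return "".join(first), "".join(second)

print(unbroken(parse("ba[ck[k-|k]]en")))     # backen
print(break_at(parse("ba[ck[k-|k]]en"), 0))  # ('bak-', 'ken')
print(unbroken(parse("Bett[t]uch")))         # Bett[t]uch
```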

    ba[ck[k-|k][-|ck@-99]]

    Here, two possibilities for hyphenation are given, the second one
    has a value of -99 added to the quality of the break, which means
    that "[k-|k]" will be preferred.
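
    A possible way to read a group with several alternatives and to pick
    the preferred one, sketched in Python. The assumptions here are not
    from the description above: "@" is taken to be followed by a signed
    integer at the end of an alternative, a missing "@" counts as 0, and
    a higher value means a better break. The names alternatives and
    preferred are hypothetical.

```python
import re

# A whole group: [nobreak[alt][alt]...]
GROUP = re.compile(r"\[([^\[\]|]*)((?:\[[^\[\]]*\])+)\]")
# One alternative: [pre|post@quality]; "|post" and "@quality" are optional.
ALT = re.compile(r"\[([^\[\]|@]*)(?:\|([^\[\]|@]*))?(?:@([+-]?\d+))?\]")

def alternatives(group):
    """Return (nobreak, [(pre, post, quality), ...]) for one bracket group."""
    m = GROUP.fullmatch(group)
    if m is None:
        raise ValueError("not a break group: " + group)
    alts = [
        (a.group(1), a.group(2) or "", int(a.group(3) or 0))
        for a in ALT.finditer(m.group(2))
    ]
    return m.group(1), alts

def preferred(alts):
    """Pick the alternative with the highest quality offset."""
    return max(alts, key=lambda alt: alt[2])

nobreak, alts = alternatives("[ck[k-|k][-|ck@-99]]")
print(alts)             # [('k-', 'k', 0), ('-', 'ck', -99)]
print(preferred(alts))  # ('k-', 'k', 0)
```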

    backen[[#]]

    This inserts an invisible marker of width zero that may then be found
    in the wrapped paragraph to learn on which line the "n" has ended up.

    b[[#97]]cken

    Here, the "a" is given by its code point number.

    b[[#u61]]cken

    Here, the "a" is given by its code point number in hex notation.
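
    The two code point forms could be expanded as in the following Python
    sketch. The function name expand_code_points is hypothetical, and the
    bare "[[#]]" marker is deliberately left untouched. A real
    implementation would presumably expand code points only after the
    break groups have been parsed, so that an expanded bracket is not
    read as syntax again.

```python
import re

# [[#97]] is decimal, [[#u61]] is hexadecimal; "[[#]]" does not match.
CODE_POINT = re.compile(r"\[\[#(?:(\d+)|u([0-9A-Fa-f]+))\]\]")

def expand_code_points(text):
    """Replace [[#N]] and [[#uH]] by the character with that code point."""
    def replace(m):
        if m.group(1) is not None:
            return chr(int(m.group(1)))   # decimal form
        return chr(int(m.group(2), 16))   # hexadecimal form
    return CODE_POINT.sub(replace, text)

print(expand_code_points("b[[#97]]cken"))   # backen
print(expand_code_points("b[[#u61]]cken"))  # backen
print(expand_code_points("backen[[#]]"))    # backen[[#]]
```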

    Escape Mechanisms

    In programming languages, we may indeed have nested brackets as
    in "a[ b[ 20 ]]". Using the above notation, this can be written
    as "a[[#91]] b[[#91]] 20 [[#93]][[#93]]".

    My mini-language is intended to be a low-level mechanism
    for the specification of hyphenation rules. Higher-level
    formatting languages may be built on top of it, which may
    automatically convert "a[ b[ 20 ]]" into "a[[#91]] b[[#91]] 20
    [[#93]][[#93]]" when it appears in the context of source code.
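
    Such a conversion step in a higher-level tool might look like this
    Python sketch. The function name escape_brackets is hypothetical, and
    the sketch simply escapes every bracket in its input, so it is meant
    to be applied only to spans already known to be source code.

```python
# Map each bracket to its code point escape in the mini-language.
ESCAPES = {ord("["): "[[#91]]", ord("]"): "[[#93]]"}

def escape_brackets(source):
    """Escape all brackets so none of them is read as mini-language syntax."""
    return source.translate(ESCAPES)

print(escape_brackets("a[ b[ 20 ]]"))
# a[[#91]] b[[#91]] 20 [[#93]][[#93]]
```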

    However, as a last resort, one may use a special notation to
    redefine the characters of the mini-language:

    [[#40=#91]]
    [[#91=]]

    Above, the parenthesis "(" (40) is given the role of the bracket
    "[" (91), and then the bracket is defined to have no special role
    in the mini-language. (The value right of "=" always represents
    the role this symbol has in the /original/ mini-language.)
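
    One way to model these redefinitions is a table from each active
    symbol to the role it plays in the original mini-language, as in the
    Python sketch below. This is an assumption about a possible design:
    the names DEFAULT_ROLES and apply_directive are hypothetical, and the
    default table lists only three of the special symbols.

```python
import re

# [[#40=#91]] gives symbol 40 the original role of symbol 91;
# [[#91=]] removes any special role from symbol 91.
DIRECTIVE = re.compile(r"\[\[#(\d+)=(?:#(\d+))?\]\]")

# Assumed starting table: each special symbol plays its own role.
DEFAULT_ROLES = {"[": "[", "]": "]", "|": "|"}

def apply_directive(roles, directive):
    """Return a new role table with one redefinition applied."""
    m = DIRECTIVE.fullmatch(directive)
    if m is None:
        raise ValueError("not a redefinition: " + directive)
    symbol = chr(int(m.group(1)))
    updated = dict(roles)
    if m.group(2) is None:
        updated.pop(symbol, None)              # symbol becomes ordinary text
    else:
        updated[symbol] = chr(int(m.group(2)))  # role from the original language
    return updated

roles = apply_directive(DEFAULT_ROLES, "[[#40=#91]]")
roles = apply_directive(roles, "[[#91=]]")
print(roles)  # {']': ']', '|': '|', '(': '['}
```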


    --- Synchronet 3.20c-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.misc on Tue Jan 21 16:16:52 2025
    From Newsgroup: comp.lang.misc

    On 21/01/2025 12:53, Stefan Ram wrote:
    I foresee the need for a mini-language for hyphenation in
    my current plain-text paragraph wrapper project. Here are
    my plans, comments are welcome:

    I would recommend you look at TeX's hyphenation algorithm and
    dictionary lists, rather than inventing your own. Then you can
    re-use existing hyphenation patterns and exception lists for
    countless different languages. It includes support for hyphenation
    patterns that lead to changes in the spelling, ligatures,
    diacriticals or other effects. Your examples "backen" and "Bettuch"
    are even included in the TeXbook.


  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.misc on Mon Jan 27 16:48:15 2025
    From Newsgroup: comp.lang.misc

    ram@zedat.fu-berlin.de (Stefan Ram) writes:

    I foresee the need for a mini-language for hyphenation in
    my current plain-text paragraph wrapper project. Here are
    my plans, comments are welcome:

    To me this looks like you are proposing an answer before
    really understanding the question. What problem are you
    trying to solve? That isn't at all clear from your
    comments.