I foresee the need for a mini-language for hyphenation in
my current plain-text paragraph wrapper project. Here are
my plans; comments are welcome:
example
This is an unadorned word "example". The system might automatically
insert possibilities for hyphenation from a hyphenation dictionary.
ex[[-]]am[[-]]ple
Here, possibilities for hyphenation have been inserted.
Nested brackets are assumed to occur so rarely in natural
text that the risk of a clash is negligible. Still, means
for escaping will be discussed below.
ba[ck[k-|k]]en
This is a hyphenation of a German word according to the
rules from 1973. It's either "backen" or "bak-
ken".
Bett[[-|t]]uch
"Bettuch" or "Bett-
tuch", according to spelling rules from 1973.
So, the general pattern in my mini-language is:
[no-hyphenation text[pre-break text|post-break text]]
Bett[t]uch
When brackets occur in the text that do not satisfy the
syntax of my mini-language, they will simply be left alone.
I.e., this is just literally "Bett[t]uch" with a "t" to
be "typeset" in literal brackets.
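The general pattern and the leave-alone rule might be sketched in
Python as follows. This is only an illustration under assumptions:
the names POINT, unbroken and breaks are made up here, only one
[pre|post] alternative per point is handled, and forms starting
with "#" are deliberately not matched.

```python
import re

# One hyphenation point: [no-hyphenation text[pre-break text|post-break text]]
# The "|post-break text" part is optional, as in "ex[[-]]am[[-]]ple".
POINT = re.compile(r"\[([^\[\]|]*)\[([^\[\]|#]*)(?:\|([^\[\]|]*))?\]\]")


def unbroken(text):
    """Replace every hyphenation point by its no-hyphenation text."""
    return POINT.sub(lambda m: m.group(1), text)


def breaks(text):
    """Yield (end of first line, start of next line) for each point."""
    for m in POINT.finditer(text):
        before = unbroken(text[:m.start()]) + m.group(2)
        after = (m.group(3) or "") + unbroken(text[m.end():])
        yield before, after
```

With these definitions, list(breaks("Bett[[-|t]]uch")) gives
[("Bett-", "tuch")], while "Bett[t]uch" matches nothing and is
passed through unchanged by unbroken().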
ba[ck[k-|k][-|ck@-99]]en
Here, two possibilities for hyphenation are given. The second one
has a value of -99 added to the quality of its break, which means
that "[k-|k]" will be preferred.
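Choosing between several alternatives could look roughly like the
Python sketch below, applied to the word "backen". It assumes that
a missing "@" value means quality 0 and that a higher quality wins.
The names GROUP, ALT, alternatives and best_break are hypothetical,
and only the first hyphenation point in the text is considered.

```python
import re

# A point with one or more bracketed alternatives after the
# no-hyphenation text: [text[alt][alt]...]
GROUP = re.compile(r"\[([^\[\]|]*)((?:\[[^\[\]]*\])+)\]")
# One alternative: [pre|post@quality], where "|post" and "@quality"
# are both optional.
ALT = re.compile(r"\[([^\[\]|@]*)(?:\|([^\[\]|@]*))?(?:@(-?\d+))?\]")


def alternatives(group_match):
    """Return (quality, pre, post) tuples for one hyphenation point."""
    result = []
    for m in ALT.finditer(group_match.group(2)):
        quality = int(m.group(3)) if m.group(3) else 0  # assumed default
        result.append((quality, m.group(1), m.group(2) or ""))
    return result


def best_break(text):
    """Break at the first point, using its highest-quality alternative."""
    m = GROUP.search(text)
    if m is None:
        return None
    _, pre, post = max(alternatives(m), key=lambda alt: alt[0])
    return text[:m.start()] + pre, post + text[m.end():]
```

A real wrapper would presumably weigh this quality against other
costs of the line break instead of taking the maximum in isolation.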
backen[[#]]
This inserts an invisible marker of width zero that can later be found
in the wrapped paragraph to learn on which line the "n" has ended up.
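How such a marker might be looked up after wrapping is sketched
below in Python. The sketch assumes that the wrapper copies the
marker into its output lines verbatim; that behaviour, and the
names MARKER and locate_markers, are assumptions made here.

```python
MARKER = "[[#]]"


def locate_markers(lines):
    """Strip markers from wrapped lines and report where they were.

    Returns (clean_lines, positions); each position is a
    (line index, column) pair counted in the cleaned text.
    """
    clean, positions = [], []
    for line_no, line in enumerate(lines):
        while MARKER in line:
            col = line.index(MARKER)
            positions.append((line_no, col))
            # Remove the marker: it has width zero in the output.
            line = line[:col] + line[col + len(MARKER):]
        clean.append(line)
    return clean, positions
```

For the wrapped lines ["Wir wollen", "backen[[#]] und", "kochen"]
this reports the position (1, 6), i.e. the marker sat right after
the "n" on the second line.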
b[[#97]]cken
Here, the "a" is given by its decimal code point number.
b[[#u61]]cken
Here, the "a" is given by its code point number in hex notation.
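Expanding both code point notations could be done along these
lines in Python. This is a sketch, not a definitive reading of the
syntax: it assumes a lowercase "u" prefix for hex, as in the
example, and the names CODEPOINT and expand_code_points are made
up. The bare marker "[[#]]" is left untouched.

```python
import re

# [[#97]] is decimal, [[#u61]] is hexadecimal.
CODEPOINT = re.compile(r"\[\[#(u[0-9A-Fa-f]+|\d+)\]\]")


def expand_code_points(text):
    """Replace each code point reference by the character it names."""
    def replace(m):
        spec = m.group(1)
        if spec.startswith("u"):
            return chr(int(spec[1:], 16))
        return chr(int(spec))
    return CODEPOINT.sub(replace, text)
```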
Escape Mechanisms
In programming languages, we may indeed have nested brackets, as
in "a[ b[ 20 ]]". Using the above notation, this can be written
as "a[[#91]] b[[#91]] 20 [[#93]][[#93]]".
My mini-language is intended to be a low-level mechanism
for the specification of hyphenation rules. Higher-level
formatting languages may be built on top of it, which may
automatically convert "a[ b[ 20 ]]" into "a[[#91]] b[[#91]] 20
[[#93]][[#93]]" when it appears in the context of source code.
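The conversion a higher-level layer might perform can be sketched
in a few lines of Python. The function name escape_brackets is
hypothetical. For simplicity the sketch escapes every bracket, not
only those that would actually clash with the mini-language.

```python
# Map each bracket to its code point notation. str.translate works
# in a single pass, so the brackets inside the inserted "[[#91]]"
# are not escaped again (chained str.replace calls would do that).
_ESCAPES = {ord("["): "[[#91]]", ord("]"): "[[#93]]"}


def escape_brackets(source_code):
    """Write all literal brackets in code point notation."""
    return source_code.translate(_ESCAPES)
```

Applied to "a[ b[ 20 ]]", this yields the escaped form quoted above.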
However, as a last resort, one may use a special notation to
redefine the characters of the mini-language:
[[#40=#91]]
[[#91=]]
Above, the parenthesis "(" (40) is given the role of the bracket
"[" (91), and then the bracket is defined to have no special role
in the mini-language. (The value right of "=" always represents
the role this symbol has in the /original/ mini-language.)
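One way to model these redefinitions is a table from characters to
the role they play in the original mini-language. The Python
sketch below is only a guess at the bookkeeping: the set of special
characters in initial_roles, the function names, and the choice to
pass the directive without its enclosing brackets are all
assumptions.

```python
import re

# Directive body without the enclosing brackets: "#40=#91" or "#91=".
DIRECTIVE = re.compile(r"#(\d+)=(?:#(\d+))?")


def initial_roles():
    """Each special character starts out playing its own role."""
    return {c: c for c in "[]|@#="}  # assumed set of special characters


def apply_directive(roles, body):
    """Update the role table according to one redefinition."""
    m = DIRECTIVE.fullmatch(body)
    if m is None:
        raise ValueError("not a role redefinition: " + body)
    char = chr(int(m.group(1)))
    if m.group(2) is None:
        roles.pop(char, None)               # no special role any more
    else:
        roles[char] = chr(int(m.group(2)))  # role in the original language
    return roles
```

After applying "#40=#91" and then "#91=", the table maps "(" to the
role of "[" and no longer contains "[" itself. The right-hand side
is stored as an original role, so it does not depend on earlier
redefinitions.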