DXR is a code search and navigation tool aimed at making sense of large projects. It supports full-text and regex searches as well as structural queries.

Mercurial (1aeaa33a64f9)

VCS Links

Line Code
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87
New option of Libhyphen 2.7: NOHYPHEN

Hyphen, apostrophe and other characters may be word boundary characters,
but they don't need (extra) hyphenation. With NOHYPHEN option
it's possible to hyphenate the words parts correctly.

Example:

ISO8859-1
NOHYPHEN -,'
1-1
1'1
NEXTLEVEL

Description:

1-1 and 1'1 declare hyphen and apostrophe as word boundary characters
and NOHYPHEN with the comma separated character (or character sequence)
list forbid the (extra) hyphens at the hyphen and apostrophe characters.

Implicite NOHYPHEN declaration

Without explicite NEXTLEVEL declaration, Hyphen 2.8 uses the
previous settings, plus in UTF-8 encoding, endash (U+2013) and
typographical apostrophe (U+2019) are NOHYPHEN characters, too.

It's possible to enlarge the hyphenation distance from these
NOHYPHEN characters by using COMPOUNDLEFTHYPHENMIN and
COMPOUNDRIGHTHYPHENMIN attributes.

Compound word hyphenation

Hyphen library supports better compound word hyphenation and special
rules of compound word hyphenation of German languages and other
languages with arbitrary number of compound words. The new options,
COMPOUNDLEFTHYPHENMIN and COMPOUNDRIGHTHYPHENMIN help to set the right
style for the hyphenation of compound words.

Algorithm

The algorithm is an extension of the original pattern based hyphenation
algorithm. It uses two hyphenation pattern sets, defined in the same
pattern file and separated by the NEXTLEVEL keyword. First pattern
set is for hyphenation only at compound word boundaries, the second one
is for hyphenation within words or word parts.

Recursive compound level hyphenation

The algorithm is recursive: every word parts of a successful 
first (compound) level hyphenation will be rehyphenated
by the same (first) pattern set.

Finally, when first level hyphenation is not possible, Hyphen uses
the second level hyphenation for the word or the word parts.

Word endings and word parts

Patterns for word endings (patterns with ellipses) match the
word parts, too.

Options

COMPOUNDLEFTHYPHENMIN: min. hyph. dist. from the left compound word boundary
COMPOUNDRIGHTHYPHENMIN: min. hyph. dist. from the right comp. word boundary
NEXTLEVEL: sign second level hyphenation patterns

Default hyphenmin values

Default values of COMPOUNDLEFTHYPHENMIN and COMPOUNDRIGHTHYPHENMIN are 0,
and 0 under the hyphenation, too. ("0" values of
LEFTHYPHENMIN and RIGHTHYPHENMIN mean the default "2" under the hyphenation.)

Examples

See tests/compound* test files.

Preparation of hyphenation patterns

It hasn't been special pattern generator tool for compound hyphenation
patterns, yet. It is possible to use PATGEN to generate both of
pattern sets, concatenate it manually and set the requested HYPHENMIN values.
(But don't forget the preprocessing steps by substrings.pl before
concatenation.) One of the disadvantage of this method, that PATGEN
doesn't know recursive compound hyphenation of Hyphen.

László Németh
<nemeth (at) openoffice.org>