www.sixfingeredman.net

formatting plain text

A few notes on the task of converting plain text into non-plain-text formats. The main challenge is that we're going from fixed-width type to proportional type, and from hard-wrapped lines to flowing text.

Therefore we have to make educated guesses about when spacing is intended for alignment and when it's coincidental. This shouldn't be terribly hard, but it's still difficult to formalize. The first step is to break each line into words separated by spaces, then look for words that appear to be intentionally aligned.
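
As a rough sketch of that first step, here is one way it might look in Python. The rule used here is only an assumption for illustration: a word counts as "intentionally aligned" when it follows a run of two or more spaces and other lines have a word starting in the same column. The name aligned_columns and the min_lines threshold are made up for this example.

```python
import re
from collections import Counter

def aligned_columns(lines, min_lines=3):
    """Guess which start columns are intentional alignment.

    Assumption of this sketch: a column is intentional when a word starts
    there, right after a run of two or more spaces, on at least min_lines
    lines. A single space before a word is treated as ordinary spacing.
    """
    counts = Counter()
    for line in lines:
        # Each match is a run of 2+ spaces followed by a word;
        # m.end() is the column where that word starts.
        for m in re.finditer(r" {2,}(?=\S)", line):
            counts[m.end()] += 1
    return sorted(col for col, n in counts.items() if n >= min_lines)
```

Fed a small table whose cells are padded to 20 characters, this should report columns 20 and 40. Words that merely happen to follow a double space on one line fall below the threshold and are ignored.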

The same problem occurs with line breaks. Consecutive lines that are aligned on the left and end near the "natural break point" on the right (the column where the text would have wrapped) are probably supposed to flow together. Lines that end well short of the natural break point probably aren't.
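
One possible way to pin down "near the natural break point", sketched in Python: a line was probably wrapped if the first word of the next line would not have fit on it. The wrap width of 72 and the function name flows_into are assumptions for this example, not anything fixed; in practice the width would have to be guessed from the text itself.

```python
def flows_into(line, next_line, width=72):
    """Guess whether line was wrapped (True) or broken on purpose (False)."""
    if not line.strip() or not next_line.strip():
        return False  # a blank line always separates
    indent = len(line) - len(line.lstrip())
    next_indent = len(next_line) - len(next_line.lstrip())
    if indent != next_indent:
        return False  # not aligned on the left
    first_word = next_line.split()[0]
    # If the next word (plus a space) would still have fit within the
    # width, the break was probably deliberate.
    return len(line.rstrip()) + 1 + len(first_word) > width
```

With width=40, a 39-character line followed by a line starting with "dog" is judged to flow, while "Dear Sir," followed by "Thank you" is judged to be an intentional break.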

We can also apply some reasoning about what's going on nearby. If the previous line was judged to be intentionally broken, then this one probably is too.
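
A sketch of that kind of neighbor reasoning, again in Python and again under made-up assumptions: breaks that are clearly wrapped or clearly deliberate are decided on their own, and anything within a small margin of the wrap width just inherits the previous decision. The names classify_breaks, width and margin are hypothetical.

```python
def classify_breaks(lines, width=72, margin=8):
    """Label the break after each line (except the last) 'flow' or 'break'.

    Ambiguous cases, where the next word would have fit with at most
    margin columns to spare, copy the label given to the previous break.
    """
    labels = []
    previous = "break"  # assumed default before any evidence is seen
    for line, nxt in zip(lines, lines[1:]):
        if not line.strip() or not nxt.strip():
            label = "break"  # blank lines always separate
        else:
            # Columns left over if the next word had been kept on this line.
            slack = width - (len(line.rstrip()) + 1 + len(nxt.split()[0]))
            if slack < 0:
                label = "flow"      # the word could not have fit
            elif slack > margin:
                label = "break"     # plenty of room, so deliberate
            else:
                label = previous    # too close to call, follow the neighbor
        labels.append(label)
        previous = label
    return labels
```

The effect is that a borderline line inside a wrapped paragraph keeps flowing, while the same line inside a list of short items stays broken.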

This is toughest in columns, where the margin of error between the natural breakpoint and an intentional one can be too small. For example:

Column heading one  Column heading two  Column heading three
one                 two                 three
four                five                six
seven               eight               nine

It takes some semantic knowledge to be sure that the first row isn't just a wrapped continuation of the header, as the following shows:

Column heading  Column heading  Column heading
one             two             three
1.200           2.1111          3.111
12.2            5.100           16.6

Of course, that's pretty darn unreadable, but again it's hard to formalize.

