www.sixfingeredman.net
formatting plain text

A few things about the task of formatting plaintext into non-plaintext. The main challenge is that we're going from fixed-width to variable-width, and from wrapped lines to flowing text. So we have to make educated guesses about when spacing is intended for alignment and when it's coincidental. Really this shouldn't be terribly hard, but it's still difficult to formalize. The first thing to do is break things into words separated by spaces, then look for words which appear to be intentionally aligned.

The same problem occurs with line breaks. Consecutive lines which are aligned on the left and end near the "natural break point" on the right are probably supposed to flow. Those that end well short of the natural break point aren't. We can also apply some reasoning about what's going on nearby: if the previous line was judged to be intentionally broken, then this one probably is too.

This is toughest in columns, where the margin between the natural break point and an intentional one can be too small to tell them apart. For example:

    Column heading one   Column heading two   Column heading three
    one                  two                  three
    four                 five                 six
    seven                eight                nine

It takes some semantic knowledge to be sure that the first row isn't just a wrapped continuation of the header, as the following shows:

    Column heading   Column heading   Column heading
    one              two              three
    1.200            2.1111           3.111
    12.2             5.100            16.6

Of course, that's pretty darn unreadable, but again it's hard to formalize.
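Here is a rough Python sketch of the two heuristics described above, not a finished formatter. The function names, the "gap of two or more spaces" rule for spotting alignment, and the `slack` parameter are all assumptions made for illustration. "Near the natural break point" is read here as "the first word of the next line would not have fit on this one", which is only one possible formalization.

```python
import re
from collections import Counter


def aligned_columns(lines, min_lines=2):
    """Columns where a word starts after a gap of 2+ spaces on at least
    `min_lines` lines. The 2-space gap is an assumed sign of intent."""
    counts = Counter()
    for line in lines:
        for m in re.finditer(r" {2,}(\S)", line):
            counts[m.start(1)] += 1
    return sorted(col for col, n in counts.items() if n >= min_lines)


def indent(s):
    """Number of leading spaces."""
    return len(s) - len(s.lstrip(" "))


def classify_breaks(lines, width=None, slack=2):
    """One bool per line: True if the break after it looks intentional,
    False if the line probably flows into the next one."""
    if width is None:
        # Assume the longest line marks the wrap width.
        width = max((len(l.rstrip()) for l in lines), default=0)
    verdicts = []
    previous = False
    for i, line in enumerate(lines):
        text = line.rstrip()
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if not text or not nxt.strip() or indent(text) != indent(nxt):
            # Blank neighbour, last line, or different left edge.
            verdict = True
        else:
            room = width - len(text)          # free columns on the right
            need = 1 + len(nxt.split()[0])    # space plus next word
            if need > room:
                verdict = False               # next word would not have fit
            elif need > room - slack:
                verdict = previous            # borderline: follow the neighbour
            else:
                verdict = True                # plenty of room, so deliberate
        verdicts.append(verdict)
        previous = verdict
    return verdicts


rows = [
    ["Column heading one", "Column heading two", "Column heading three"],
    ["one", "two", "three"],
    ["four", "five", "six"],
]
table = ["".join(cell.ljust(21) for cell in row).rstrip() for row in rows]
print(aligned_columns(table))  # [21, 42]

paragraph = [
    "The quick brown fox jumps over the",
    "lazy dog and keeps running until it",
    "stops.",
    "Apples",
    "Pears",
]
print(classify_breaks(paragraph))  # [False, False, True, True, True]
```

Neither function knows anything about meaning, so the wrapped-header table above would still fool it: the second header line aligns with the data rows just as well as real data does.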
... those who are inspired by a model other than nature, a mistress above all others, are laboring in vain. -- Leonardo da Vinci