Work on (and from, sadly) TextMunger has slowed down in the past few months. I took a vague break this summer, and am knuckling down on some freelance work this fall. Which leaves even less time for TM work….
Still, things have happened, and there are some plans.
The XRML-targeted formatter works a bit better (doesn’t get stuck on zero-punctuation for long periods of time) even if the underlying algorithm still has issues.
I’m thinking of rewriting it, or writing another, that would work in multiple passes over a text-block to allow for vertical text placement a la columns, although hopefully not so dense and delineated that rivers of punctuation mark out actual columns. I would like just enough to encourage a vertical reading, rather than only the standard (Western) left-to-right, line-by-line one.
A simple single-pass implementation would be an offset, so that the first word of the next line lines up with the start of the previous line’s last word. I’d want to enhance this so that an arbitrary number of words (tokens, whatever) are printed before the next line alignment.
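The single-pass offset above can be sketched in a few lines. This is a minimal illustration, not TextMunger code: `offset_format` and its `words_per_line` knob (the “arbitrary number of words” mentioned above) are hypothetical names.

```python
def offset_format(text: str, words_per_line: int = 3) -> str:
    """Indent each line so its first word starts at the column
    where the previous line's last word began."""
    words = text.split()
    lines = []
    indent = 0
    for i in range(0, len(words), words_per_line):
        chunk = words[i:i + words_per_line]
        line = " " * indent + " ".join(chunk)
        # The next line will start where this line's last word began.
        indent = len(line) - len(chunk[-1])
        lines.append(line)
    return "\n".join(lines)

print(offset_format("the quick brown fox jumps over the lazy dog"))
# the quick brown
#           fox jumps over
#                     the lazy dog
```

The staircase effect nudges the eye downward along the aligned words, which is the “encourage a vertical reading” goal in miniature.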
Multi-pass is more difficult, as it would include the above, as well as requiring an intake of a previously formatted chunk of text for re-writing.
I’ve done some work on preparing to save lists of selected texts, but it hasn’t been exposed to the UI yet.
First, because there is no menu-strip in the UI. I think it’s time to address this (as well as scrapping the diagnostics tab).
But mainly I think the library-system needs an overhaul. Parsing a nested set of folders for .txt files is fine, but it’s a dumb system. Workable, practical, but literally dumb: it returns texts, but no awareness of those texts. And awareness would let us sort by author, date, and other characteristics. Above all, by source, which would be a good thing for citations and acknowledgements.
Weighted source texts (for Markov chaining)
I’ve been staring at Edde Addad’s jGnoetry. Not only do I want to integrate some sort of re-processable template (?) system, but weighted texts are a desirable element. I’m constantly irritated when trying to combine two texts, say, Manual Of Egyptian Archaeology And Guide To The Study Of Antiquities In Egypt and a script for The Wizard of Oz or the 1975 Apocalypse Now Workprint, only to find that almost everything comes out Egyptian due to the massive size of that source. If I could indicate that the Egyptian text, regardless of its length, should only be considered 30% of the time, I’d be much happier. Currently, I can dump in multiple copies of one text, but that’s a nuisance.
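The weighting idea amounts to choosing which source supplies the next sample by an explicit weight instead of by corpus size. A hedged sketch, assuming a `pick_source` helper and source names that are purely illustrative (this is not jGnoetry’s or TextMunger’s actual API):

```python
import random

def pick_source(weights: dict[str, float], rng=random) -> str:
    """Pick a source name with probability proportional to its weight,
    independent of how long that source's text actually is."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

# Hypothetical weights: the huge Egyptian manual is pinned at 30%
# even though it would dominate a length-proportional mix.
weights = {"egyptian_manual": 0.3, "wizard_of_oz": 0.35, "apocalypse_now": 0.35}

rng = random.Random(42)  # seeded for reproducibility
counts = {name: 0 for name in weights}
for _ in range(10_000):
    counts[pick_source(weights, rng)] += 1
# counts["egyptian_manual"] lands near 3,000 of 10,000 draws.
```

A Markov chainer built this way would consult `pick_source` each time it ingests (or emits from) a training sample, so the 30% cap holds no matter how large the Egyptian text is.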
Selectable output as re-processable template
As mentioned, jGnoetry has made me want to be able to keep some words|tokens, but regenerate everything else.
Not just for Markov chaining, however: being able to select any arbitrary rule and apply it to a block of text would be great. This would move the UI toward a real alt-text-editor tool. It would probably need a rule toolbar, or an easy way to select (and edit on the fly?) a given rule for application.
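The keep-some-tokens-regenerate-the-rest idea can be sketched as a function that preserves locked positions and hands everything else to whatever rule is active. Everything here is a stand-in: `reprocess`, the `keep` set, and the word-pool generator are assumptions, with a random-word lambda standing in for a real Markov chain or rule.

```python
import random

def reprocess(tokens: list[str], keep: set[int], regenerate) -> list[str]:
    """Keep tokens at the indices in `keep`; replace the rest with
    whatever the `regenerate` callable produces."""
    return [tok if i in keep else regenerate() for i, tok in enumerate(tokens)]

pool = ["river", "column", "glyph", "scroll"]  # stand-in vocabulary
rng = random.Random(7)

# Lock the first and last words, regenerate the middle two.
out = reprocess("the quick brown fox".split(),
                keep={0, 3},
                regenerate=lambda: rng.choice(pool))
```

Applying this repeatedly with different rules plugged into `regenerate` is essentially the re-processable template: the output of one pass becomes the input of the next, with the kept tokens acting as anchors.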
Rework of the rule-selection system
Too many clicks to get to where I want to edit something.
(Assignable) hot-keys are one option. A serious re-thinking of the rule-selection GUI is a must.