September 28th, 2012

Work on (and from, sadly) TextMunger has slowed down in the past few months. I took a vague break this summer, and am knuckling down on some freelance work this fall. Which leaves even less time for TM work….


Still, things have happened, and there are some plans.


XRML formatter

The XRML-targeted formatter works a bit better (doesn’t get stuck on zero-punctuation for long periods of time) even if the underlying algorithm still has issues.


I’m thinking of rewriting it, or writing another one, that would make multiple passes over a text-block to allow for vertical text placement a la columns — though hopefully not so dense and delineated as to show the rivers of punctuation that mark actual columns. I would like just enough to encourage a vertical reading, rather than only the standard (Western) left-to-right, line-by-line one.


A simple single-pass implementation would be an offset, so that each new line’s first word lines up with the start of the previous line’s last word. I would want to enhance this so that an arbitrary number of words (tokens, whatever) are printed before the next line alignment.
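That single-pass offset can be sketched in a few lines of Python (the function name and the words-per-line knob are my own, purely illustrative):

```python
def offset_format(text, words_per_line=3):
    """Single-pass vertical formatter (a sketch). Each new line is
    indented so its first word lines up with the column where the
    previous line's last word began, nudging the eye into a
    vertical reading."""
    tokens = text.split()
    lines = []
    indent = 0
    for i in range(0, len(tokens), words_per_line):
        chunk = tokens[i:i + words_per_line]
        line = " " * indent + " ".join(chunk)
        indent = len(line) - len(chunk[-1])  # column of this line's last word
        lines.append(line)
    return "\n".join(lines)
```

Each pass over the loop only needs the previous line’s geometry, which is what makes it single-pass.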


Multi-pass is more difficult, as it would include the above, as well as requiring an intake of a previously formatted chunk of text for re-writing.


Text library

I’ve done some work on preparing to save lists of selected texts, but it has not been exposed to the UI yet.


First, because there is no menu-strip in the UI. I think it’s time to address this (as well as scrapping the diagnostics tab).


But mainly I think the library-system needs an overhaul. Parsing a nested set of folders for .txt files is fine, but it’s a dumb system. Workable, practical, but literally dumb — it returns texts, but no awareness of those texts. And awareness would let us sort: by author, date, other characteristics. And above all, by source — which would be a good thing for citations and acknowledgements.


Weighted source texts (for Markov chaining)

I’ve been staring at Edde Addad’s jGnoetry. Not only do I want to integrate some sort of re-processable template (?) system, weighted texts are a desirable element. I’m constantly irritated when trying to combine two texts — say, Manual Of Egyptian Archaeology And Guide To The Study Of Antiquities In Egypt and a script for The Wizard of Oz or the 1975 Apocalypse Now Workprint — only to find that almost everything comes out Egyptian due to the massive size of that source. If I could indicate that the Egyptian text, regardless of length, should only be considered 30% of the time, I’d be much happier. Currently, I can dump in multiple copies of one text, but that’s a nuisance.
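One way the weighting could work (a sketch of mine, not TextMunger’s or jGnoetry’s actual code): build the training stream so each source contributes tokens in proportion to its assigned weight rather than its raw length, repeating short texts to fill their quota so n-gram adjacency is preserved.

```python
def weighted_corpus(sources, total_tokens=10000):
    """Combine source texts so each contributes in proportion to its
    weight, regardless of raw length. 'sources' is a list of
    (text, weight) pairs; weights need not sum to 1.
    Hypothetical sketch only."""
    total_weight = sum(w for _, w in sources)
    streams = []
    for text, weight in sources:
        tokens = text.split()
        quota = int(total_tokens * weight / total_weight)
        # repeat the whole token stream and truncate, so that n-gram
        # adjacency within each source survives for chain training
        streams.append((tokens * (quota // len(tokens) + 1))[:quota])
    return streams
```

A Markov model trained on these per-source streams would see the Egyptian text only as often as its weight allows, however massive the original.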


Selectable output as re-processable template

As mentioned, jGnoetry has made me want to be able to keep some words|tokens, but regenerate everything else.


Not just for markov chaining, however — being able to select any arbitrary rule and apply it to a block of text would be great. This would move the UI toward being a real alternative text-editor tool. It would probably need a rule-toolbar, or an easy way to select (and edit-on-the-fly?) a given rule for application.


rework of the rule selection system

Too many clicks to get to where I want to edit something.


(Assignable) hot-keys are one option. A serious re-thinking of the rule-selection GUI is a must.



August 27th, 2012


Looks like a good source of data for TextMunger; and/or some of its output….


June 11th, 2012

I love this for all the wrong reasons — I want to add this to TextMunger.


May 28th, 2012

With some birthday amazon gift-certificates, I picked up a few books I’m working on:

Actually, I bought the Mez Breeze book a few weeks earlier.

I encountered Mezangelle around 2000, I think, but it was mostly via the online _][ad][Dressed in a Skin C.ode and I simultaneously didn’t get much into the visual presentation (I’m still ambivalent towards a lot of online experiments, which is rather ironic. Hence, the ambivalence — which isn’t no attitude; it’s two simultaneous opposing attitudes — I like it and don’t like it) and saw a parallel[le] oogenesis with my work in XraysMonaLisa. As such, I somewhat avoided it, so I could continue my own independent evolution.

Sadly, locked in an imaginary closet of isolation — I wasn’t aware of anybody working in such an area other than Mez — my writing dwindled. It never stopped, but I suffered innumerable crises of faith, and my output is still a trickle compared to the 1990s. OTOH, I am now currently gainfully employed full-time and working on a career and a family. So other bourgeois impediments have arisen.

And, finally — I never quit. Half the reason for starting the wiki was for XraysMonaLisa — to give it an online medium, an outlet, an editor, and the ability to see the history river of continual change. Sadly, I’ve never invested the time in PmWiki to get the visualization that is possible with, say, MediaWiki, but hopefully someday…..

That’s all been changing over the last few months, as I finally started on my long-delayed text manipulation project, and in the process discovered a vast flotilla of fellow travelers out there. I’ve started writing more, and researching furious green ideas too late into the night…

Oh, and one other book that I picked up at Ocean State Job Lots:


May 25th, 2012

untitled (put the blame on VCR)

VCR. Take



yonder construction

Modified template of single-syllable lines run through jGnoetry (with some manual editing). The syllable-counting engine is imperfect.

I used the default sources, with Shakespeare @ 0%, leaving us with 50% Tristan Tzara and 50% Lawrence Lessig.

I’ve been re-wiring jGnoetry as a side-project (it is OpenSource, after all). I haven’t yet got to changing any functionality — it’s been more about refactoring, separation of presentation and calculation, etc. A good exercise while I think about the program, and its implications for TextMunger.

Hitting a syllabic rhythmic structure doesn’t hold a lot of draw, for me [gasp!]. But what I definitely like is the ability to save chunks of the output, and regenerate the gaps — and having said regeneration be from the same markov-model. Or from a different model, but, hopefully, keyed from the ends of the gaps.
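The keep-and-regenerate idea could be sketched like this (the names are entirely my own; the generator callback stands in for a draw from the same, or a different, Markov model, keyed off the preceding token):

```python
def fill_template(tokens, keep, generate):
    """Keep-and-regenerate sketch: positions listed in 'keep' are
    preserved; every other position is filled by calling
    generate(prev), where prev is the token just emitted, so a
    chain-based generator can stay keyed to its context."""
    out = []
    prev = None
    for i, tok in enumerate(tokens):
        out.append(tok if i in keep else generate(prev))
        prev = out[-1]
    return out
```

Keying the generator off the ends of the gaps, as mentioned above, would mean also passing the next kept token so the regenerated span can land on it.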

The Anderson Poems had some sections I would have liked to have modified — but charNG doesn’t offer a simple way to do that. jGnoetry theoretically allows the use of source-text-as-template, but with random-offset lines I found the experience less than perfect. Not to mention the need to regenerate the corpuses from one application to the other.

blog vs wiki

I’m still grappling with having both a blog and a wiki. I think that blog-posts are a little more ephemeral, or better suited to date-ordering, as they may be multiply classified, whereas the wiki is pre-sorted into sections. Yes, the blog has Categories, which may correspond to wiki Groups. But the blog’s default ordering is time-wise. So the front-end gives us a different viewpoint.

And putting a text in here, first, allows me to stare at it and append notes that might be distracting from the way I think it should be viewed in the wiki.


May 21st, 2012

We read in the introduction to Mr. Addad’s jGnoetry:

[…Q]uotation marks, parenthesis, and brackets, […] are tricky to handle in bigram generation systems because you can’t be guaranteed that an open-bracket will have a matching close-bracket.

No argument here. Both the opening and closing tokens of any pair are unlikely to fall within the short range of any interesting n-gram model.

But what if we modify the model to make it want to close off the pair? That is, translate “want” into some sort of algorithmic bias that increases the chance of closing off the pair.
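One way to cash that out (my own sketch, not Mr. Addad’s method): track pending openers, and multiply the n-gram weight of the matching closer whenever one is open, so closure becomes more likely without being forced.

```python
import random

# pairs we might want to close off; a hypothetical table, not jGnoetry's
PAIRS = {'"': '"', '(': ')', '[': ']'}

def biased_next(candidates, open_stack, boost=5.0):
    """Pick the next token from 'candidates' (token -> n-gram count),
    multiplying the weight of the closer matching the most recent
    pending opener on 'open_stack' by 'boost'. This increases, rather
    than guarantees, the chance of closing the pair."""
    wanted = PAIRS.get(open_stack[-1]) if open_stack else None
    tokens = list(candidates)
    weights = [candidates[t] * (boost if t == wanted else 1.0)
               for t in tokens]
    return random.choices(tokens, weights=weights, k=1)[0]
```

A small boost keeps the model’s statistics mostly intact; a large boost makes the pair snap shut almost immediately.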


May 10th, 2012

I continue to tweak the TextMunger.

Currently, it’s case-sensitive all the time, which means that ALL CAPS words and phrases do not mingle well with lower-case words and phrases. So, I thought, make it NOT case-sensitive, and ALL CAPS sentences will intermix!

Which, yes! They do!

The Law of Unintended (or Not Realized Ahead of Time) Consequences says that it will also lead to a loss of recognition of the start and end of sentences, which was helped by U&LC recognition. This being a dumb-as-statistical-sticks Markov-chaining.
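One compromise I can imagine (a sketch, not how TextMunger currently works): fold case only in the model’s lookup keys, while storing the original surface tokens, so matching is case-insensitive but the output, and its U&LC sentence-boundary cues, survive intact.

```python
from collections import defaultdict

def build_bigrams(tokens):
    """Case-folded-key bigram model (hypothetical sketch): ALL CAPS
    and lower-case text intermix, because states match
    case-insensitively, but the stored continuations keep their
    original capitalization."""
    model = defaultdict(list)
    for a, b in zip(tokens, tokens[1:]):
        model[a.lower()].append(b)  # fold the key, keep the surface form
    return model
```

Generation then looks up `token.lower()` but emits whatever surface form it drew, capitals and all.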



May 1st, 2012

Literal Minded sets us straight — it’s not pronounced the way you think:

The first thing one needs to know about metathesis is that it is pronounced meTAthesis, not MEtathesis. On at least one occasion I’ve seen the word in print while I wasn’t wearing my phonology hat, pronounced it the wrong way, and thought it referred to a thesis about theses. In fact, it refers to transpositions of sounds within a word. For example, I’ve heard several people talk about “agpar scores” for newborn babies, when they mean “Apgar scores.” Or for another example, check out Semantic Compositions’s posting on cavalry vs. Calvary.

It would be fun to build a metathesis-processor for TextMunger.
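A toy version might look like this (letter-level only; real metathesis transposes sounds, so treat this as a rough sketch with names of my own invention):

```python
import random

def metathesize(word, rng=random):
    """Toy metathesis-processor: transpose two adjacent letters past
    the first position, in the spirit of "Apgar" -> "agpar". Words
    shorter than three letters are returned unchanged."""
    if len(word) < 3:
        return word
    i = rng.randrange(1, len(word) - 1)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)
```

Running it over every nth word of a text-block would give a mild, plausible-sounding scrambling rather than full word salad.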
