From xradiograph

WordSalad: Automatic for the People

On this page...

  1. Philip M. Parker’s writing machines
  2. Applied
    2.1 Automated Insights
    2.2 Narrative Science
  3. Betascript publishing
  4. Twitter Bots
    4.1 Horse e-books
    4.2 Mine
    4.3 Others
  5. Human Machines
    5.1 Term Paper generation
    5.2 AOL Content generators
  6. Auto-correction in... things that you type in
  7. jokes
  8. See Also

1.   Philip M. Parker’s writing machines

Man “writes” 200,000 books [Boing Boing]
He Wrote 200,000 Books (but Computers Did Some of the Work) [NYT]


US Patent: Method and apparatus for automated authoring and marketing,
which links to a video of Parker’s software in action.
Parker’s video description is very informative as to his processes and goals.
Review for inclusion here.


“Philip M. Parker has written some sophisticated software for auto-assembling books about various technical subjects, and has “written” more than 200,000 of them.”


This is a little larger-scale than the Generators I’ve listed elsewhere.


Or maybe not. Maybe it’s just “applied theory” with brute force.





Digital Poetry


Parker has applied his techniques within his dictionary project to digital poetry; he reports posting over 1.3 million didactic poems, aspiring to reach one poem for each word found in the English language. He refers to these as “edge poems” since they are generated using graph theory, where “edge” refers to mathematical values that relate words to each other in a semantic web. He has posted in the thesaurus section of his online dictionary the values used in these algorithms. Genres produced include the following: acrostic, butterfly cinquain, cinquain, diamante, ekphrastic, fib or Fibonacci poetry, gnomic poetry, haiku, Kural, limerick, mirror cinquain, nonet, octosyllable, pi, quinzaine, Rondelet, sonnet, tanka, unitoum, waka, simple verse, and xenia epigram. Genres were created by Parker to allow one genre of poem for each letter of the English alphabet, including Yoda, for Y (poetry using the grammar structure of the famous Star Wars character), and Zedd for Z (poems shaped in the letter Z). His poems are didactic in nature, and either define the entry word in question, or highlight its antonyms. He has stated plans to expand these to many languages and is experimenting with other poetic forms.
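
To make the “edge” idea concrete for myself, here is a minimal sketch of how weighted word-to-word edges could drive a didactic verse. It is not Parker’s algorithm, which this page does not document. The graph, the relation labels, the weights, the verse template and the names `GRAPH`, `strongest` and `didactic_verse` are all assumptions made up for illustration.

```python
# Toy semantic graph: word -> list of (related word, relation, edge weight).
# Every word, relation label and weight here is invented for illustration.
GRAPH = {
    "brave": [
        ("bold", "synonym", 0.9),
        ("fearless", "synonym", 0.8),
        ("timid", "antonym", 0.7),
        ("heroic", "synonym", 0.6),
        ("cowardly", "antonym", 0.9),
    ],
}


def strongest(word, relation, k):
    """Return up to k neighbours of `word` with the heaviest edges of one relation."""
    edges = [(weight, other) for other, rel, weight in GRAPH.get(word, [])
             if rel == relation]
    return [other for _, other in sorted(edges, reverse=True)[:k]]


def didactic_verse(word):
    """Fill a fixed three-line template: the word, what it means, what it is not."""
    synonyms = strongest(word, "synonym", 2)
    antonyms = strongest(word, "antonym", 2)
    lines = [word.capitalize()]
    if synonyms:
        lines.append("means " + ", ".join(synonyms))
    if antonyms:
        lines.append("never " + ", ".join(antonyms))
    return "\n".join(lines)


print(didactic_verse("brave"))
```

The only point of the sketch is that the poem’s content falls out of the edge weights: change a weight and a different word wins the line. A real genre (haiku, cinquain, and so on) would add syllable or shape constraints on top.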


An Introduction to “Edge Poetry” by Philip M. Parker, INSEAD


UPDATE 2013.10.11: the above two links, and the root domain, are dead.
Instead, we have trawls run on 2012.02.06 (successful) and 2012.10.11 (“Service Unavailable”).



UPDATE 2015.01: Phil Parker is now operating EdgeMaven Media, the world’s foremost provider of applications to firms seeking to create computer-generated content or title material. He’s been doing this since 2012; I only just noticed.


2.   Applied

Now comes news (via Peter Kafka in All Things D and Jason Boog in Galleycat) that robot-written “stories” are turning up on the pages of Forbes and other publications. The robots are made by Narrative Science, which (says its About page) “started life as a joint research project at Northwestern University Schools of Engineering and Journalism.”


More on Narrative Science and competitor Automated Insights at Algorithmically constructed news (BoingBoing)

Steven Levy:  (yes, it’s that Steven Levy)


2.1   Automated Insights

Automated Insights’ Robbie Allen on How I automated my writing career:


[...] I’ve been programming for 16 years. My whole career I’ve focused on automating the un-automatable — essentially making computers do things people never thought they could do. By the time I started on my 10th book, I got another kind of itch — I wanted to automate my writing career. I was getting bored with the tedium of writing books, and the money wasn’t that good.


But that’s absurd, right? How can a computer possibly write something coherent and informative, much less entertaining? The “how can a computer possibly do X?” questions are the ones I’ve spent my career trying to answer. So, I set out on a quest to create software that could write. It took more effort than writing 10 books put together, but after building a team of 12 people, we were able to use our software to generate more than 100,000 sports-related stories in a nine-month period.




Sports is only one of many different categories we are working on. We’ve also done work in finance, real estate and a few other data-intensive industries. However, don’t limit your thinking on what’s possible. We get a steady stream of requests from non-obvious industries, such as pharmaceutical clinical trials and even domain name registrars. Any area that has large datasets where people are trying to derive meaning from the data are potential candidates for our technology. [emphasis added]


2.2   Narrative Science

Narrative Science articles on They convey information. Not necessarily fascinating, but they convey it. And that’s what these sorts of articles are all about: numbers.



3.   Betascript publishing

Amazon search for Betascript publish → refs



Before I talk about my own troubles, let me tell you about another book, “Computer Game Bot Turing Test”. It’s one of over 100,000 “books” “written” by a Markov chain running over random Wikipedia articles, bundled up and sold online for a ridiculous price. The publisher, Betascript, is notorious for this kind of thing.
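
The quote above calls these books Markov-chain output. For reference, a minimal sketch of what word-level Markov text generation looks like; this illustrates the general technique only, not Betascript’s actual tooling, and the function names `build_chain` and `generate` and the `order` parameter are my own assumptions.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` consecutive words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return chain


def generate(chain, length=30, seed=None):
    """Random-walk the chain from a random starting state, up to about `length` words."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))  # sorted so a given seed is reproducible
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:  # dead end: this state was only seen at the end of the text
            break
        out.append(rng.choice(followers))
        state = tuple(out[-len(state):])
    return " ".join(out)


if __name__ == "__main__":
    sample = "the bot wrote the book and the bot sold the book online"
    print(generate(build_chain(sample), length=12, seed=1))
```

Feed it a few Wikipedia articles instead of one sentence and the output starts to look locally plausible and globally meaningless.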

VDM Publishing is a German publishing group based in Saarbrücken, Germany, with offices in Argentina, Latvia, Mauritius and Moldova. Submissions are neither peer reviewed nor edited. Its book production is based on print on demand technology.


VDM publishing’s imprints—Alphascript, Betascript, Fastbook Publishing and Doyen Verlag—specialize in publishing and selling Wikipedia articles in printed form via print on demand, in e-commerce bookstores. Alphascript distributes publications through several on-line book retailers. At least some of the books are actually printed by As of November 2010, around 150,000 such titles have been produced. In June 2010, VDM started its own online-bookshop: More Books! Publishing.


VDM’s publishing methods have received criticism for the soliciting of manuscripts from thousands of individuals, for providing non-notable authors with the appearance of a peer-reviewed publishing history, for benefiting from the free contributions of online volunteers, and for insufficiently disclosing the free nature of their content. VDM responds that Wikipedia is a valuable, quality resource, that the company has no problem asking authors for content, that buyers are informed of where information comes from, that books are a convenient form to collect articles about interesting subjects, and that its customers are satisfied with VDM’s products.

The articles are often poorly printed with features like missing characters from foreign languages, and numerous images of arrows where Wikipedia had links. It appears much better to read the original articles for free at the Wikipedia website than paying a lot of money for what has been described as a scam or hoax. Advertising for the books at Amazon and elsewhere does not reveal the free source of all the content. It is only revealed inside the books, which may satisfy the license requirements for republishing of Wikipedia articles.


In an interview published on the VDM website, Wolfgang Philipp Müller, CEO of the VDM Group, defends the legality of Alphascript Publishing’s commercial practices:

“The new branded publishing houses Alphascript and Fastbook are publishing controversial books. Are you sure that this kind of publication is legal?

Müller: Did you ever ask this question to Google? For years, they scanned works that were protected by copyright law, but Google published these works without the permission of the authors or publishing houses. That’s more than breaking law, that’s plagiarism. At last, Google gets into trouble. In sharp contrast to Google, Alphascript and Fastbook are publishing works which are intended and allowed to be published. These so-called copyleft works are put in the internet at everyone’s disposal. The licenses for the free use expressly give the permission for commercial use. And this is exactly what we are doing. But it is unusual, that’s right.”

As of 26 February 2010, 17,658 books on a broad diversity of topics had been “printed” by Alphascript Publishing with “Frederic P. Miller, Agnes F. Vandome, John McBrewster” credited as editors.


As an example of the “care” given to the books, the book “History of Georgia (country)” is about the European country Georgia but has a cover image of Atlanta in the American state Georgia. The Wikipedia article History of Georgia (country) does not make such a comical blunder. Another example is a book about an American football team with a soccer player on the cover.



How to make your own books from Wikipedia -- turns out there’s a built-in feature. Doubt that’s how the “big houses” do it, though...
The open-source code is found at



4.    Twitter Bots

4.1   Horse e-books


4.2   Mine



4.3   Others

(as if we care!)



5.   Human Machines

5.1   Term Paper generation

The Term Paper Artist - writing words and pages for a term-paper mill. An interesting take on content generation.


Term paper work is also extremely easy, once you get the hang of it. It’s like an old dance routine buried in one’s muscle memory. You hear the tune — say, “Unlike the ancient Greek tragic playwrights, Shakespeare likes to insert humor in his tragedies” — and your body does the rest automatically. I’d just scan Google or databases like for a few quotes from primary and secondary sources, create an argument based on whatever popped up from my search, write the introduction and underline the thesis statement, then fill in the empty spaces between quotes with whatever came to mind.


Getting the hang of it is tricky, though. Over the years, several of my friends wanted in on the term paper racket, and most of them couldn’t handle it. They generally made the same fundamental error — they tried to write term papers. In the paper mill biz, the paper isn’t important. The deadline, page count, and number of sources are. DUMB CLIENTS make up much of the trade. They have no idea whether or not Ophelia committed suicide or was secretly offed by Gertrude, but they know how to count to seven if they ordered seven pages.


I had a girlfriend who had been an attorney and a journalist, and she wanted to try a paper. I gave her a five-page job on leash laws in dog parks, and she came home that evening with over 50 pages of print outs, all articles and citations. She sat down to write. Three hours later she was rolling on the floor and crying. She tried to write a paper, instead of filling five pages. Another friend of mine spent hours trying to put together an eight-page paper on magical realism in Latin American fiction. At midnight she declared that it was impossible to write that many pages on books she had never read. She was still weeping, chain-smoking cigarettes, and shouting at me at 2 a.m.  I took 20 minutes and finished the paper, mostly by extending sentences until all the paragraphs ended with an orphaned word on a line of its own. [emphasis added]


5.2   AOL Content generators

AOL Hell: An AOL Content Slave Speaks Out


[....] If AOL could find a good way for machines to write about Lady Gaga, they would almost certainly fire the writers who remain.


When it comes to an article, what AOL cares about is the title, and the “keywords” that will make the article more likely to show up among the top results on Google. You type phrases into “Google Trends,” and it suggests the most popular combination of words associated with that topic.  You then stick those words into your title and first paragraphs. Rinse, wash, and repeat. The article itself was just ballast.


“LADY GAGA PANTLESS IN PARIS” is the example given in “The AOL Way” internal documents.  That’s the best possible title. A buzz-worthy topic, a sexy result. It mattered little if Lady Gaga was actually pantless in Paris; it only had to relate somehow to the article as a whole. The entire title could be a come-on, a tease. It might well turn out that Lady Gaga was neither pantless, nor in Paris at the time. The important part was that the reader would click on those words to read the rest, thereby producing ad revenue for the websites. Words didn’t matter; stealing other people’s work also didn’t matter.



6.   Auto-correction in... things that you type in

Phones are the major culprit here, but let’s not forget that auto-accept in word-processing spell-check is usually a bad idea...


How your cell phone’s autocorrect software works, and why it’s getting better.



7.   jokes

Unsupervised joke generation from big data [PDF], a paper by University of Edinburgh researchers Sasa Petrovic and David Matthews, describes an ingenious and successful method for teaching a computer to make up jokes like “I like my relationships like I like my source, open;” “I like my coffee like I like my war, cold;” and “I like my boys like I like my sectors, bad.” The researchers wrote code that called on Google’s n-gram database to find noun-attribute pairs, zero in on nouns with ambiguous meaning, and automatically generate jokes.
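
A rough sketch of the template idea described above: pick two nouns that share an attribute and score the pairing by how strongly each noun co-occurs with it. This is not the researchers’ code. The counts below are invented stand-ins for n-gram data, the product score is my own simplification, and the ambiguity step is left out entirely. `COUNTS`, `attributes` and `jokes` are hypothetical names.

```python
from itertools import combinations

# Toy noun-attribute co-occurrence counts. These numbers are invented for
# illustration; they are NOT taken from Google's n-gram data or the paper.
COUNTS = {
    ("coffee", "cold"): 40, ("coffee", "black"): 90,
    ("war", "cold"): 70, ("war", "long"): 30,
    ("relationships", "open"): 25, ("relationships", "long"): 50,
    ("source", "open"): 80,
}


def attributes(noun):
    """Return {attribute: count} for every attribute seen with `noun`."""
    return {attr: n for (word, attr), n in COUNTS.items() if word == noun}


def jokes():
    """Fill 'I like my X like I like my Y, Z' for each attribute Z shared by X and Y."""
    nouns = sorted({noun for noun, _ in COUNTS})
    scored = []
    for x, y in combinations(nouns, 2):
        ax, ay = attributes(x), attributes(y)
        for attr in ax.keys() & ay.keys():
            # Simplified score: product of the two co-occurrence counts.
            scored.append((ax[attr] * ay[attr],
                           f"I like my {x} like I like my {y}, {attr}"))
    return [text for _, text in sorted(scored, reverse=True)]


for line in jokes():
    print(line)
```

With these toy counts the top line is the coffee/war/cold joke quoted above, which mostly shows how much of the humor is carried by the template and how little by the scoring.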




8.   See Also


Page last modified on January 08, 2015, at 04:33 PM