Markov chain sentence generator in Erlang

Long ago I implemented a Python IRC bot (the first post on this blog) that used Markov chains to speak semi-intelligible drivel, based on a corpus (its brain/soul) filled with the other drivel that gets talked about in chatrooms.

Here is the logic behind using Markov chains to generate text that sounds intelligible:

– Feed the program a big input text (from a text file or from a chatroom), and have it parse the text into a ‘bag’ (a dictionary that can hold multiple values per key, with the values arranged in a list).

– A key is a sequence of consecutive words, and the number of words in it is the chain length. e.g: {‘the’, ‘dreams’, ‘of’} is a key of length 3 and {‘what’, ‘can’} is a key of length 2. The values for a key are the words that follow it in the input text.

e.g: with a chain length of 3 words, “the dreams of men never die.” yields the bag { {‘the’, ‘dreams’, ‘of’} -> ‘men’, {‘dreams’, ‘of’, ‘men’} -> ‘never’ ……}.

For the generation steps, let’s assume we have a chain length of 4.
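The bag-building step can be sketched roughly like this. It is a minimal sketch in Python (the language of the old IRC bot), not the Erlang code linked further down. The name build_bag and the plain whitespace tokenising are assumptions made for illustration.

```python
from collections import defaultdict

def build_bag(text, chain_length=3):
    """Map each tuple of `chain_length` consecutive words to the list of words that follow it."""
    words = text.split()  # assumption: words are separated by whitespace, punctuation stays attached
    bag = defaultdict(list)
    for i in range(len(words) - chain_length):
        key = tuple(words[i:i + chain_length])
        # Repeated followers are kept, so a word that follows a key often is stored often.
        bag[key].append(words[i + chain_length])
    return dict(bag)

bag = build_bag("the dreams of men never die.")
for key, followers in bag.items():
    print(key, "->", followers)
```

For the example sentence this gives three entries, the first two being the ones listed in the bag above.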

1. Pick at random any 4 consecutive words from the input text. This is our initial state S0. If S0 = {W0,W1,W2,W3}, then our output string so far is W0 W1 W2 W3.

2. Look up the current state S0 in the bag and take one of its values (say V) to move to the next state S1. i.e: if S0 = {W0,W1,W2,W3} and Bag[S0] = V, then S1 = {W1,W2,W3,V}, and our output string is now W0 W1 W2 W3 V.

3. Repeat step 2 with S1 as the current state, and so on, until the output string has the desired length or property.
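The three steps above can be sketched as a short loop. Again this is a hedged Python sketch, not the Erlang implementation linked below. The function name generate, the max_words cut-off and the tiny hand-written bag (chain length 2, to keep it short) are all hypothetical. It also assumes the start state is picked from the keys of the bag, which stands in for "any consecutive words from the input text".

```python
import random

def generate(bag, max_words, rng=random):
    """Walk the chain: start from a random key, then keep appending a follower and shifting the state."""
    state = rng.choice(sorted(bag))   # step 1: initial state S0 (assumption: chosen among the bag's keys)
    output = list(state)
    while len(output) < max_words:
        followers = bag.get(state)
        if not followers:             # dead end: this state only occurred at the very end of the corpus
            break
        v = rng.choice(followers)     # step 2: pick a value V for the current state
        output.append(v)
        state = state[1:] + (v,)      # S1 = {W1, ..., V}; step 3 repeats with S1 as the state
    return " ".join(output)

# Hypothetical bag with chain length 2, written by hand for the demo.
bag = {
    ("the", "dreams"): ["of"],
    ("dreams", "of"): ["men"],
    ("of", "men"): ["never"],
    ("men", "never"): ["die."],
}
print(generate(bag, max_words=6, rng=random.Random(0)))
```

The output always starts with the full initial state, so it contains at least chain-length words even if max_words is smaller. Stopping at a dead end is one possible choice; restarting from a fresh random state would be another.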

Here is a small implementation in Erlang: http://pastebin.com/bGaW7CkK

– Save the above file as markov.erl

– Compile with erlc markov.erl

– Run with erl -noshell -s markov start "/path/to/corpus/file" -s init stop

Here is another version (a distributed one, dishing out chores to small processes): http://pastebin.com/d1RWmZ3Z . Be warned though: it spawns one process for each word, so it will easily stall your dual-core laptop if you feed it a monstrosity comprising 10K words.


Running either of the above versions displays a random line that sounds intelligible. The program uses Markov chains of length 3; you can change this through the constant ARITY defined at the top.

Running it with a text file (about hackers and painters) as its corpus, and a chain length of 3, yielded the following output: http://pastebin.com/aWZeZKmw


3 Responses to Markov chain sentence generator in erlang

  1. that of a very good understand.

  2. Kyong Gdula says:

    Constantly a great article when i stop by this web site along with websites you possess. Understand your own insights.

  3. M. Arkov says:

    The commenters here might need to use your markov chain generator.
