Markov chain sentence generator in Erlang

Long ago I implemented a Python IRC bot (the first post on this blog) that used Markov chains to spout semi-intelligible drivel, based on a corpus (its brain/soul) filled with the other drivel that gets talked about in chatrooms.

Here is the logic behind using Markov chains to generate text that sounds intelligible:

– Feed the program a large input text (from a text file or a chatroom) and have it parse the text into a ‘bag’ (a dictionary that maps each key to a list of values).

– Each key is a sequence of consecutive words, and the number of words in it is the chain length. E.g. {‘the’, ‘dreams’, ‘of’} is a key of length 3, and {‘what’, ‘can’} is a key of length 2. The values for a key are the words that follow that sequence in the input text.

E.g. for a chain length of 3 words, “the dreams of men never die.” yields the bag { {‘the’, ‘dreams’, ‘of’} -> ‘men’, {‘dreams’, ‘of’, ‘men’} -> ‘never’, … }.

Now let’s assume a chain length of 4.

1. Pick any 4 consecutive words at random from the input text. This is the initial state S0. If S0 = {W0, W1, W2, W3}, the output string so far is W0 W1 W2 W3.

2. Look up the current state S0 in the bag and pick one of its values at random, say V. This moves us to the next state S1: if S0 = {W0, W1, W2, W3} and V is drawn from Bag[S0], then S1 = {W1, W2, W3, V}. The output string becomes W0 W1 W2 W3 V, and the current state is now S1.

3. Repeat step 2 with S1 as the new current state until the output string has the desired length or property. If the current state has no entry in the bag (it only occurs at the very end of the text), stop early.
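The steps above can be sketched in a few lines of Python. This is not the Erlang code behind the pastebin link; the names `build_bag` and `generate` and the plain whitespace tokenisation are assumptions made for illustration. Duplicate successors are kept in each list, so a word that follows a state more often is also picked more often.

```python
import random
from collections import defaultdict


def build_bag(words, chain_length):
    """Map every run of `chain_length` consecutive words to the list of words
    that follow it. Duplicates are kept, so common successors are picked
    more often."""
    bag = defaultdict(list)
    for i in range(len(words) - chain_length):
        key = tuple(words[i:i + chain_length])
        bag[key].append(words[i + chain_length])
    return bag


def generate(words, chain_length, max_words, rng=random):
    """Steps 1-3: random starting state, then shift the window one word at a time."""
    if len(words) < chain_length:
        return " ".join(words)
    bag = build_bag(words, chain_length)
    # Step 1: S0 is a random run of chain_length consecutive words.
    start = rng.randrange(len(words) - chain_length + 1)
    state = tuple(words[start:start + chain_length])
    output = list(state)
    while len(output) < max_words:
        successors = bag.get(state)
        if not successors:  # dead end: this state only occurs at the end of the text
            break
        # Step 2: draw V from Bag[S]; the next state drops W0 and appends V.
        next_word = rng.choice(successors)
        output.append(next_word)
        state = state[1:] + (next_word,)
    return " ".join(output)


if __name__ == "__main__":
    corpus = "the dreams of men never die and the dreams of men are long".split()
    print(generate(corpus, 3, 20))
```

With a chain length of 3, as in the Erlang version, `build_bag` maps ('the', 'dreams', 'of') to ['men', 'men'] for the demo corpus, because that key occurs twice.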

Here is a small implementation in Erlang: http://pastebin.com/bGaW7CkK

– Save it as markov.erl

– Compile with erlc markov.erl

– Run with erl -noshell -s markov start "/path/to/corpus/file" -s init stop

Here is another, distributed version that dishes out chores to small processes: http://pastebin.com/d1RWmZ3Z. Be warned, though: it spawns one process for each word, so it will easily stall a dual-core laptop if you feed it a monstrosity of 10K words.


Running either version displays a random line that sounds intelligible. The program uses a chain length of 3; to change it, edit the ARITY constant defined at the top of the file.

Running it with a chain length of 3 on a text file about hackers and painters as the corpus yielded the following output: http://pastebin.com/aWZeZKmw

Posted in Uncategorized | 3 Comments

A genetic algorithm example in Erlang.

I’ve never cared much about genetic algorithms. They sounded far-fetched and impractical, so I had absolutely no interest in or knowledge of them. That was until today, and this awesome article explaining genetic algorithms. There’s nothing like a well-written article, and this was surely one of that kind. At the very least, it inspired me to write my own version of the idea, although without the fancy graphs. Since reading it, I’ve already thought of a few places where such an idea could easily be put to work :).

The idea behind this program is to begin with a base population of X individuals and try to breed/mutate them into Y (where Y is specified on the command line) through selection and random mutation. In other words, we select the members of the population fit to breed, and randomly mutate the offspring produced by mating. Varying the parameters (such as the policy for selecting who is fit to mate, or the probability with which mutation occurs) leads to interesting results!

So here it is in Erlang, in all its glory: http://pastebin.com/hk8yNEi3

1. Download and save as genetic.erl

2. Compile with erlc genetic.erl

3. Run from the command line with erl -s genetic test dinosaur -s init stop (if you want to evolve into a ‘dinosaur’).

The output displays a series of cross-breedings followed by random mutations, starting from random 8-letter words, until a ‘dinosaur’ is formed.
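For readers who don't speak Erlang, here is a minimal Python sketch of the same kind of loop; it is not a port of genetic.erl. Fitness counts the letters already in the right place, the fitter half of the population breeds via single-point crossover, each letter of a child mutates with a fixed probability, and the best individual is carried over unchanged (elitism). The function names, the population size of 50 and the 5% mutation rate are all assumptions, and they are exactly the kind of parameters worth varying.

```python
import random
import string

ALPHABET = string.ascii_lowercase


def fitness(candidate, target):
    """Number of positions where the candidate already has the right letter."""
    return sum(c == t for c, t in zip(candidate, target))


def crossover(a, b, rng):
    """Single-point crossover: a prefix of one parent plus the suffix of the other.
    Assumes words of at least 2 letters."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(word, rate, rng):
    """Replace each letter by a random one with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c for c in word)


def evolve(target, population_size=50, mutation_rate=0.05, seed=None, max_generations=10_000):
    """Evolve random words of len(target) letters until one equals target.
    Returns (best word, generations used). Assumes population_size >= 4."""
    rng = random.Random(seed)
    population = ["".join(rng.choice(ALPHABET) for _ in target)
                  for _ in range(population_size)]
    for generation in range(max_generations):
        population.sort(key=lambda w: fitness(w, target), reverse=True)
        if population[0] == target:
            return population[0], generation
        parents = population[:population_size // 2]  # selection: fitter half breeds
        children = [population[0]]  # elitism: keep the current best as-is
        while len(children) < population_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), mutation_rate, rng))
        population = children
    return max(population, key=lambda w: fitness(w, target)), max_generations


if __name__ == "__main__":
    word, generations = evolve("dinosaur", seed=42)
    print(f"evolved {word!r} after {generations} generations")
```

Elitism guarantees the best fitness never drops between generations; without it, a high mutation rate can keep destroying nearly finished words.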

To reiterate: please read the linked post; it’s awesome.

Thanks,

Abhinav

Posted in Uncategorized | 3 Comments

Erlang websocket server (websocket protocol draft 76)

*Note*: As of Oct 24, 2011, this version of the websocket server only works properly with Google Chrome <= 13.x. The new, last-call (hopefully stable) version of the websocket draft has been released, and soon someone will implement its handshake, which changes a bit (the headers change a little, and so do the framing/encoding bits).

In a post earlier today, I wrote a simple websocket server in Python implementing the 76th IETF draft of the websocket protocol. Erlang felt bad and whined about it, so I redid it in Erlang. :)

http://pastebin.com/fmacSxGA

It’s mostly based on Joe Armstrong’s original implementation of draft 75, but written from scratch to implement draft 76, which differs from draft 75 as mentioned here.

Thanks,

Abhinav

Posted in Uncategorized | 1 Comment

Python websocket server ( websocket protocol 76 )

*Note*: As of Oct 24, 2011, this version of the websocket server only works properly with Google Chrome <= 13.x. The new, last-call (hopefully stable) version of the websocket draft has been released, and soon someone will implement its handshake, which changes a bit (the headers change a little, and so do the framing/encoding bits).

http://pastebin.com/zBjN02jQ

A simple Python server that performs the handshake with an HTML5-enabled browser connecting to it over websockets. It also includes basic message framing. pywebsocket was overkill for what I needed, and no other Python implementations supported revision 76 (most implement 75, which is slightly different), so I decided to implement a quick handshake myself. The server’s end of the handshake has 13 steps. If you think that’s a lot, you should know that the client’s side has 43!
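The heart of the draft-76 server handshake fits in a few lines. The server takes the digits of each of the Sec-WebSocket-Key1 and Sec-WebSocket-Key2 headers, divides each resulting number by the count of spaces in that header (the division must be exact), packs both results as big-endian 32-bit integers, appends the 8 bytes the client sends after its headers, and replies with the MD5 digest of those 16 bytes. A text frame is a 0x00 byte, the UTF-8 payload, and a 0xFF byte. The sketch below uses hypothetical function names and is not the code in the pastebin link; it leaves out the response headers and, like the original, the closing handshake.

```python
import hashlib
import struct


def key_number(key):
    """Concatenate the digits of a Sec-WebSocket-Key1/Key2 value and divide
    by the number of spaces it contains; draft 76 requires an exact division."""
    digits = "".join(ch for ch in key if ch in "0123456789")
    spaces = key.count(" ")
    if not digits or spaces == 0 or int(digits) % spaces != 0:
        raise ValueError(f"invalid handshake key: {key!r}")
    number = int(digits) // spaces
    if number > 0xFFFFFFFF:
        raise ValueError("key number does not fit in 32 bits")
    return number


def handshake_response(key1, key2, key3):
    """16-byte body the server sends right after its response headers.
    key3 is the 8 raw bytes the client sends after its own headers."""
    if len(key3) != 8:
        raise ValueError("key3 must be exactly 8 bytes")
    challenge = struct.pack(">II", key_number(key1), key_number(key2)) + key3
    return hashlib.md5(challenge).digest()


def encode_frame(text):
    """Draft-76 text frame: 0x00, UTF-8 payload, 0xFF."""
    return b"\x00" + text.encode("utf-8") + b"\xff"


def decode_frames(buffer):
    """Split received bytes into complete text messages.
    Returns (messages, bytes of any incomplete trailing frame).
    Does not handle the 0xFF 0x00 closing frame."""
    messages = []
    while True:
        start = buffer.find(b"\x00")
        if start == -1:
            return messages, b""
        end = buffer.find(b"\xff", start + 1)
        if end == -1:
            return messages, buffer[start:]
        messages.append(buffer[start + 1:end].decode("utf-8"))
        buffer = buffer[end + 1:]
```

Splitting on 0xFF is safe because that byte never occurs in valid UTF-8; the leftover bytes returned by `decode_frames` should be prepended to the next chunk read from the socket.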

TODO: the closing handshake (its absence doesn’t break functionality, though).

Websockets mark the death of workarounds like Ajax, Comet servers such as Orbited, and polling, since a browser can now simply open a socket and connect to any application listening on a socket that is willing to complete the handshake specified in the websocket protocol draft.

Enjoy !

Posted in Uncategorized | 6 Comments

A very basic Erlang crawler

1. Save the code below as spyder.erl

2. Compile it with erlc spyder.erl

3. Run as: erl -s spyder start http://site_to_crawl.com -s init stop

http://pastebin.com/Qw9f11bi

This is a very crude crawler: it crawls all the links on a page, then the links on those pages, and so on. It doesn’t protect you from black holes (crawler traps), and it crawls away without any regard for robots.txt. It’s just something to brush up my rusty (probably visible in the code?) Erlang. But what the hell, it works! Feel free to improve on it :)
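If you do want to improve on it, the missing safeguards are easy to sketch. The Python version below is not a translation of spyder.erl: it crawls breadth-first, remembers every URL it has queued, stays on the start URL's host, and stops after `max_pages`, which bounds how far a black hole can drag it. `crawl`, `fetch` and `allowed` are hypothetical names; `fetch(url)` should return HTML or None, and `allowed(url)` is where a robots.txt check can plug in.

```python
import sys
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    parser = LinkParser()
    parser.feed(html)
    # Resolve relative links and drop #fragments so page#a and page#b count once.
    return [urldefrag(urljoin(base_url, href))[0] for href in parser.links]


def crawl(start_url, fetch, max_pages=100, allowed=lambda url: True):
    """Breadth-first crawl restricted to the start URL's host."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if not allowed(url):
            continue
        html = fetch(url)
        visited.append(url)
        if html is None:
            continue
        for link in extract_links(url, html):
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


def fetch_url(url):
    """Naive fetcher: decoded HTML, or None on any network or URL error."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    for page in crawl(sys.argv[1], fetch_url):
        print(page)
```

To respect robots.txt, the standard library's `urllib.robotparser.RobotFileParser` can be used: create it with the site's robots.txt URL, call `read()`, then pass `allowed=lambda url: rp.can_fetch("*", url)`.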

Posted in Uncategorized | 1 Comment

[OwlKun] Integrating OAuth with Twitter’s API in Python

I believe everyone’s heard that Twitter will be doing away with basic authentication come June ’10 and switching to OAuth. OAuth is a mechanism by which a consumer (C) can access the resources of a site (S) on behalf of a user (U), without U giving C a username or password. (C can be either an application or a website; U is a user registered on S.)

In our case, C is our application (OwlKun), U is you (your Twitter account), and S is Twitter.
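Under OAuth 1.0a (RFC 5849), the version Twitter adopted, C proves it is acting for U by signing every request. With HMAC-SHA1 signing, C builds a base string from the upper-cased HTTP method, the percent-encoded base URL and the sorted, percent-encoded parameters (request parameters plus the oauth_* ones, excluding oauth_signature itself), then signs it with the key `consumer_secret&token_secret`. The sketch below shows only this signing step; the function names are hypothetical, and it assumes parameter names are unique and the URL carries no query string.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def pct(value):
    """RFC 3986 percent-encoding as OAuth requires: only A-Z a-z 0-9 - . _ ~
    are left unescaped."""
    return quote(str(value), safe="~")


def signature_base_string(method, url, params):
    """METHOD & encoded URL & encoded, sorted "name=value&..." parameter string."""
    encoded = sorted((pct(k), pct(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in encoded)
    return "&".join([method.upper(), pct(url), pct(param_string)])


def hmac_sha1_signature(method, url, params, consumer_secret, token_secret=""):
    """Value for the oauth_signature parameter. The token secret is empty
    while requesting the initial request token."""
    key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    base = signature_base_string(method, url, params)
    digest = hmac.new(key.encode(), base.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()
```

For example, signing a POST of `status=Hi there!` to a hypothetical `https://api.example.com/1/statuses/update.json` encodes the space as `%20` inside the parameter string, which is then encoded again as `%2520` in the base string. Double encoding like this is the most common source of "invalid signature" errors.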

Every Python Twitter API library will eventually have to switch to OAuth, and many already have, but apart from existing source code there is no comprehensive documentation on how to integrate your app with Twitter in any given language.

Lines 75-104 of the source contain the OAuth-specific material; the rest is integration code so that you can actually run these commands from inside Vim. :)

Here’s the link to OwlKun’s source :

http://owlkun.googlecode.com/files/owlkun.py

Etymology: OwlKun = Owl (it stays awake at night, and it was night time when I wrote this) + Kun = Friend (since I have no friends... er, since it’s a nice and friendly application that’s well documented).

Thanks,

Abhinav

Posted in Python, Twitter, Uncategorized | 1 Comment

Setting your currently playing Rhythmbox song as your Pidgin status message.

This script sets your Pidgin status message to the song Rhythmbox is currently playing.

It’s a simple process: save http://pastebin.com/jeHPqKWT to a file called status_changer.py and run it, i.e.:

1. Save it to a file – say status_changer.py
2. Run it: python status_changer.py

3. Stare in awe :)
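The pastebin script isn't reproduced here, but the general idea is to talk to both applications over D-Bus. The sketch below is an alternative built on assumptions: it reads the current track through the MPRIS2 interface (exposed by newer Rhythmbox versions as `org.mpris.MediaPlayer2.rhythmbox`) rather than whatever interface the original script used, and sets the message through the `PurpleSavedstatus*` calls on Pidgin's `im.pidgin.purple.PurpleInterface`. It needs the dbus-python package and a running session bus; the helper names are hypothetical, and only `format_status` works without D-Bus.

```python
import time


def format_status(metadata):
    """Turn MPRIS2 track metadata into a status line, or None if nothing is playing."""
    title = str(metadata.get("xesam:title", "")).strip()
    artist = ", ".join(str(a) for a in metadata.get("xesam:artist", [])).strip()
    if not title:
        return None
    return f"Listening to: {artist} - {title}" if artist else f"Listening to: {title}"


def read_rhythmbox_metadata():
    import dbus  # dbus-python, imported lazily so format_status works without it
    bus = dbus.SessionBus()
    player = bus.get_object("org.mpris.MediaPlayer2.rhythmbox", "/org/mpris/MediaPlayer2")
    props = dbus.Interface(player, "org.freedesktop.DBus.Properties")
    return props.Get("org.mpris.MediaPlayer2.Player", "Metadata")


def set_pidgin_status(message):
    import dbus
    bus = dbus.SessionBus()
    obj = bus.get_object("im.pidgin.purple.PurpleService", "/im/pidgin/purple/PurpleObject")
    purple = dbus.Interface(obj, "im.pidgin.purple.PurpleInterface")
    status = purple.PurpleSavedstatusGetCurrent()
    purple.PurpleSavedstatusSetMessage(status, message)
    purple.PurpleSavedstatusActivate(status)


def main(poll_seconds=10):
    last = None
    while True:
        message = format_status(read_rhythmbox_metadata())
        if message and message != last:  # only touch Pidgin when the song changes
            set_pidgin_status(message)
            last = message
        time.sleep(poll_seconds)


if __name__ == "__main__":
    main()
```

Polling keeps the sketch short; a tidier version would subscribe to the player's PropertiesChanged signal instead. If either application isn't running, the D-Bus calls raise an exception.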

 

:)

Posted in Dbus, Linux, Metal, Music, Python, rhythmbox, Uncategorized | 1 Comment