Over the last few weeks I got in touch with the fascinating field of data visualisation which offers great ways to play around with the perception of information.
In a more formal approach data visualisation denotes “The representation and presentation of data that exploits our visual perception abilities in order to amplify cognition“
Nowadays there is a huge flood of information that hit’s us everyday. Enormous amounts of data collected from various sources are freely available on the internet. One of these data gargoyles is Twitter producing around 400 million (400 000 000!) tweets per day!
Tweets basically offer two “layers” of information. The obvious direct information within the text of the Tweet itself and also a second layer that is not directly perceived which is the Tweets’ metadata. In this case Twitter offers a large number of additional information like user data, retweet count, hashtags, etc. This metadata can be leveraged to experience data from Twitter in a lot of exciting new ways!
So as a little weekend project I have decided to build a small piece of software that generates real-time heat maps of certain keywords from Twitter data.
This is a first static peek on what it’s gonna look like (apparently the friendly floatees use Twitter, too):
See a screencast here: screencast.com
To get this working I have used lots of shiny things:
- Twitter Streaming API
- MongoDB’s capped collections
- Flask
- Redis
- Tweepy
- jQuery Eventsource
- Google Maps API
- heatmap-gmaps.js
The app is written in Python and consists of mainly three components:
tstream.py:
A small service based on tweepy that implements a StreamListener which inserts incoming data in a MongoDB capped collection. Here you can also set filter terms. This example uses mostly terms related to “Big Data”.
tweet_service.py:
A Flask based web app which get’s new data from MongoDB and makes use of the publish-subscribe pattern. Being “tailable” MongoDB’s capped collections come in handy. There is no need to remember which messages a client has already received, the cursor itself yields new documents on arrival. Also, capped collections are of a fixed size which is appropriate for this use case but your mileage may vary. Incoming Tweets are published to a redis channel for which there is also a listener that returns a “text/event-stream” “Content-Type” header for connecting clients.
map.html:
A few lines of HTML and JavaScript which bring up a Google Maps canvas and a listener for server-sent events. When new data, basically consisting of Lat/Lon tuples, arrives the new point is added to a heat map overlay based on heatmap-gmaps.js in real-time.
Rough overview:
Conclusion
With a relatively small amount of code it is possible to turn text data in astonishing visualizations. With only a little more effort different hash tags could be illustrated by different colors. Or one could count tweets on a topic in certain regions and then compare activity based on number of citizens. There are lots of interpretations possibel through the underlying data set.
You can find the code on github, feel free to fork and play around!


Awesome post, Johannes! Adding a semantic engine to the real time Twitter feed is going to result in the killer application to show the potential of big data and the semantic web working together… Instead of a geospacial heatmap you could create a conceptual network heatmap and use it to monitor trends and exploit relationship between semantic categories! Pure gold!
I’m really keen on seing the way it is going to develop!
Reblogged this on Agile Mobile Developer.
Wow that’s really inspiring for a beginner like me! Thanks for the great post
Pingback: Real-time Twitter heat map with MongoDB « Another Word For It
So I’m trying to run the HeatMap, and it gives me an error. I don’t doubt that it’s PEBCAK (problem exists between chair and keyboard), but I was curious as to whether or not you had any insight:
ConnectionError: Error 61 connecting localhost:6379. Connection refused.
127.0.0.1 – - [23/Jul/2012 13:20:18] “GET /tweets?callback=loomit HTTP/1.1″ 200 -
Debugging middleware caught exception in streamed response at a point where response headers were already sent.
Traceback (most recent call last):
File “/Library/Python/2.6/site-packages/werkzeug/wsgi.py”, line 513, in next
return self._next()
File “/Library/Python/2.6/site-packages/werkzeug/wrappers.py”, line 829, in iter_encoded
for item in self.response:
File “/Users/kalil/Desktop/Austin/Heatmap/tweet_service.py”, line 54, in event_stream
pubsub.subscribe(‘chat’)
File “/Library/Python/2.6/site-packages/redis/client.py”, line 1316, in subscribe
return self.execute_command(‘SUBSCRIBE’, *channels)
File “/Library/Python/2.6/site-packages/redis/client.py”, line 1268, in execute_command
connection.connect()
File “/Library/Python/2.6/site-packages/redis/connection.py”, line 204, in connect
raise ConnectionError(self._error_message(e))
Hi,
it seems as if you haven’t started redis which would listen on port 6379!
Cheers,
Johannes
Ah, I am unfamiliar with Redis – it is now listening as intended.
I truly wish I had another location to ask questions in, haha – could pester you for another?
I set up a collection called tweets_tail under a database named “test” in mongodb…
Yet when I run tweet_service.py, it gives me a “OperationFailure: database error: tailable cursor requested on non capped collection”.
Looking at tweet_service.py, it seems as though this shouldn’t cause a problem dependent on the name of the database, as it explicitly assigns the collection as a member of the database.
I ran a query on “db.tweets_tail.isCapped” and it returned:
function () {
var e = this.exists();
return e && e.options && e.options.capped ? true : false;
}
Which is the correct format.
Any ideas?
Hi,
did you configure the collection to be capped? Just run “db.tweets_tail.isCapped()” (mind the braces) and it should tell!
Cheers,
Johannes
Aye, it returns true.
I’m still working on it, and I can’t determine if it’s an issue with the database itself, or perhaps a bug?
I’m more leaning towards the idea that I’m doing something incorrectly.
Okay, you can disregard my previous comments – I figured it out. Thank you for your help!
Actually, I hate to spam the comment section with questions, but now that I have created a database named tstream with a capped collection named tweets_tail, redis and the program itself no longer throw me errors.
However, when I run it, I get:
avon:heatmap kalil$ python tweet_service.py
* Running on http://0.0.0.0:5000/
* Restarting with reloader
beginning to tail…
1
127.0.0.1 – - [24/Jul/2012 14:57:54] “GET /tweets?callback=loomit HTTP/1.1″ 200 -
And the map never gains any new tweets. I’m under the impression that no tweets are actually being saved to the database; upon querying db.tweets_tail.count(), it has 0 documents, and, logically, db.tweets_tail.find({}) returns no documents.
Redis appears to be fine:
“[227] 24 Jul 14:57:54 – Accepted 127.0.0.1:52186
[227] 24 Jul 14:57:58 – 1 clients connected (0 slaves), 931152 bytes in use”
When tstream.py runs, none of the functions in the StreamListener class appear to ever be called. I placed print statements before and after the class, and they print. The only thing that does not print is the statement after: “streamer.filter(track = setTerms)”
I know that Tweepy has a notoriously broken Streaming API – is that where the problem lies?
I’d really like to try this out, and I appreciate your help.
Ey Kalil i have been playing around with the code and i think that your issue is the size of the capped collection.
i had the same problem, although i had more that a million tweets on a couchdb database, i tried everything on my simple script to make it work to import to mongo, but simply did’t worked at all.
Until i increase the size of the capped collection with this.
db.runCommand({“convertToCapped”: “tweets_tail”, size: 10000000});
Its all seems that depending on the tweet, the size of the collection changes, so if you have a fixed capped collection mongodb will simply ignore it.
Hope that Helps!!
Hi, if you increase the size of the capped collection, you end up with a disconnection from the client. The exception thrown is:
—————————————-
Exception happened during processing of request from (’127.0.0.1′, 52411)
Traceback (most recent call last):
File “/usr/lib/python2.7/SocketServer.py”, line 284, in _handle_request_noblock
self.process_request(request, client_address)
File “/usr/lib/python2.7/SocketServer.py”, line 310, in process_request
self.finish_request(request, client_address)
File “/usr/lib/python2.7/SocketServer.py”, line 323, in finish_request
self.RequestHandlerClass(request, client_address, self)
File “/usr/lib/python2.7/SocketServer.py”, line 640, in __init__
self.finish()
File “/usr/lib/python2.7/SocketServer.py”, line 693, in finish
self.wfile.flush()
File “/usr/lib/python2.7/socket.py”, line 303, in flush
self._sock.sendall(view[write_offset:write_offset+buffer_size])
error: [Errno 32] Broken pipe
—————————————-
Ey Guys, besides of that Redis method listen is blocking the entire app. So in order to work “as is”
you need to
1./ Launch app,
2./ load the url, wait until the entire map is loaded (no heatmap showing)
3./ restart the app.
that way you could see the heatmap filling.
Between the json callback and the tail and listen the app only listen once before it stop, so any new request will not work because never call again the GET call from redis, and redis, is on his last row, so, if new clients are coming up will never answer the entire set unless the app is restarted.
jbrandstettercs: if you dont mind i would like to add and make mods to your code.
Yes, of course you can tinker with the code! It’s on github so feel free to fork!
I’ve followed the example exactly and quite literally copied and pasted the code, but the heat map isn’t generated. Is there a problem with the code? I have no error when I run tstream.py and tweet_service.py and then running the map.html on local host as per the instructions. I’ve also created a capped mongo collection.
Furthermore I get this error:
Is it because the collection is capped too high amount?
Exception happened during processing of request from (’127.0.0.1′, 60681)
Traceback (most recent call last):
File “/usr/lib/python2.7/SocketServer.py”, line 284, in _handle_request_noblock
self.process_request(request, client_address)
File “/usr/lib/python2.7/SocketServer.py”, line 310, in process_request
self.finish_request(request, client_address)
File “/usr/lib/python2.7/SocketServer.py”, line 323, in finish_request
self.RequestHandlerClass(request, client_address, self)
File “/usr/lib/python2.7/SocketServer.py”, line 640, in __init__
self.finish()
File “/usr/lib/python2.7/SocketServer.py”, line 693, in finish
self.wfile.flush()
File “/usr/lib/python2.7/socket.py”, line 303, in flush
self._sock.sendall(view[write_offset:write_offset+buffer_size])
error: [Errno 32] Broken pipe
—————————————-