Getting Started with D3.js

Are you thinking about including some nice charts and graphics in your current project? Maybe you have heard about D3.js, which some people call the universal JavaScript visualization framework. Maybe you have also heard about its steep learning curve. Let’s see if that is really true!

First of all, what is D3.js?
D3.js is an open-source JavaScript framework written by Mike Bostock that helps you manipulate documents based on data.

Okay, let’s first have a look at the syntax
Let’s start with the following hello world example. It appends an <h1> element saying ‘Hello World!’ to the content <div> element.

<!DOCTYPE html>
<html>
    <head>
        <script src="http://d3js.org/d3.v3.min.js"></script>
    </head>
    <body>
    <div id="content"></div>
        <script> 
            d3.select('#content')
                .append('h1')
                .text('Hello World!');
        </script>
    </body>
</html>

As you can see, the syntax is very similar to that of libraries like jQuery, and its fluent API saves you a lot of lines of code.

But let’s see how we can bind data to it:

d3.select('#content')
   .selectAll('h1')
   .data(['Sarah', 'Robert', 'Maria', 'Marc'])
   .enter()
   .append('h1')
   .text(function(name) {return 'Hello ' + name + '!'});

What happens? The data function receives our names array as a parameter, and for each name we append an <h1> element with a personalized greeting. For now, we ignore the selectAll('h1') and enter() method calls; we will explore them later. Looking into the browser, we see the following:

Hello Sarah!
Hello Robert!
Hello Maria!
Hello Marc!

Not bad for a start! Inspecting the element in the browser, we see the following generated markup:

[...]
    <div id="content">
        <h1>Hello Sarah!</h1>
        <h1>Hello Robert!</h1>
        <h1>Hello Maria!</h1>
        <h1>Hello Marc!</h1>
    </div>
[...]

This already shows one enormous advantage of D3.js: you actually see the generated code and can spot errors easily.

Now, let’s have a closer look at the data-document connection
As mentioned in the beginning, D3.js helps you manipulate documents based on data. Our only job is to hand the right data over to D3.js, so the framework can do the magic for us. To understand how D3.js handles data, we’ll first have a look at how data might change over time. Let’s take the document from our last example; every name is one data entry.

Data-Document Example 1

Easy. Now let’s assume new data comes in:

Data-Document Example 2

As new data comes in, the document needs to be updated: the entries of Robert and Maria need to be removed, Sarah and Marc can stay unchanged, and Mike, Sam and Nora each need a new entry. Fortunately, with D3.js we don’t have to figure out which nodes need to be added and removed; D3.js takes care of it. It will also reuse old nodes to improve performance. This is one key benefit of D3.js.

So how can we tell D3.js what to do when?
To let D3.js update our document, we first need a data join, so D3.js knows our data. We select all existing nodes and connect them with our data. We can also hand over a key function, so D3.js knows how to match data entries to nodes. As we initially don’t have any <h1> nodes, the selectAll function will return an empty selection.

var textElements = svg.selectAll('h1').data(data, function(d) { return d; });

After the first iteration, the selectAll will hand over the existing nodes, in our case Sarah, Robert, Marc and Maria. So we can now update these existing nodes. For example, we can change their CSS class to grey:

textElements.attr({'class': 'grey'});

Additionally, we can tell D3.js what to do with entering nodes, in our case Mike, Sam and Nora. For example, we can add an <h1> element for each of them and set its CSS class to green:

textElements.enter().append('h1').attr({'class': 'green'});

As D3.js has now updated the old nodes and added the new ones, we can define what happens to both groups. In our case this affects the nodes of Mike, Sarah, Sam, Marc and Nora. For example, we can rotate them:

textElements.attr({'transform': 'rotate(30 20,40)'});

Furthermore, we can specify what D3.js will do with nodes like those of Robert and Maria, which are no longer contained in the data set. Let’s change their CSS class to red:

textElements.exit().attr({'class': 'red'});
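Conceptually, the data join partitions our nodes into exactly these sets: an update set, an enter set and an exit set. The partitioning can be sketched in plain JavaScript like this (an illustrative sketch of the idea, not D3.js’s actual implementation):

```javascript
// Old data currently bound to the document vs. newly arriving data,
// as in the example above.
var oldData = ['Sarah', 'Robert', 'Maria', 'Marc'];
var newData = ['Mike', 'Sarah', 'Sam', 'Marc', 'Nora'];

// update: entries that already have a node in the document
var update = newData.filter(function(d) { return oldData.indexOf(d) !== -1; });
// enter: entries without an existing node
var enter = newData.filter(function(d) { return oldData.indexOf(d) === -1; });
// exit: nodes whose entry is no longer in the data set
var exit = oldData.filter(function(d) { return newData.indexOf(d) === -1; });
```

This is why Sarah and Marc end up grey, Mike, Sam and Nora green, and Robert and Maria red in the snippets above.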

You can find the full example code illustrating the data-document connection of D3.js as a JSFiddle here: https://jsfiddle.net/q5sgh4rs/1/

But how to visualize data with D3.js?
Now that we know the basics of D3.js, let’s move on to its most interesting part: drawing graphics. To do so, we use SVG, which stands for Scalable Vector Graphics. Maybe you already know it from other contexts. In a nutshell, it’s an XML-based vector image format supporting animation and interaction. Fortunately, we can simply add SVG tags to our HTML and all common browsers will display them directly. This also facilitates debugging, as we can inspect the generated elements in the browser. In the following, we see some basic SVG elements and their attributes:

SVG elements

To get a better understanding of what SVG looks like, here is a basic example of SVG code that draws a rectangle, a line and a circle.

<svg>
 <rect x="10" y="15" width="60" height="20" />
 <line x1="95" y1="35" x2="105" y2="15" />
 <circle cx="130" cy="25" r="6" />
</svg>

To generate the same code using D3.js, we need to add an SVG to our content <div> and then append the three elements with their attributes like this:

var svg = d3.select('#content').append('svg');
svg.append('rect').attr({x: 10, y: 15, width: 60, height: 20});
svg.append('line').attr({x1: 95, y1: 35, x2: 105, y2: 15});
svg.append('circle').attr({cx: 130, cy: 25, r: 6});

Of course, for static SVG code, we wouldn’t do this, but as we already saw, D3.js can fill attributes with our data. So we are now able to create charts! Let’s see how this works:

<div id="content"></div>
<script>
 d3.select('#content')
        .append('svg')
            .selectAll('rect')
            .data([100, 200, 150, 60, 50])
            .enter()
            .append('rect')
                .attr('x', 0)
                .attr('y', function(data, index) {return index * 25})
                .attr('height', 20)
                .attr('width', function(data, index) {return data});
</script>

This will draw our first bar chart for us! Have a look at it: https://jsfiddle.net/tLhomz11/2/

How to turn this basic bar chart into an amazing one?
Now that we have started drawing charts, we can make use of all the nice features D3.js offers. First of all, we will adjust the width of each bar to fill the available space by using a linear scale, so we don’t have to scale our values by hand. To do so, we specify the range we want to map values into and the domain our data lives in. In our case, the data is between 0 and 200 and we would like to scale it to a range of 0 to 400, like this:

var xScale = d3.scale.linear().range([0, 400]).domain([0,200]);

If we now specify x values, we just use this function and get an equivalent value in the right range. If we don’t know our maximum value for the domain, we can use the d3.max() function to calculate it based on the data set we want to display.
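To make the mapping less magical, here is what such a linear scale does conceptually, written as a plain JavaScript function (an illustrative sketch, not the D3.js implementation):

```javascript
// A linear scale maps a value from the domain proportionally into the range.
function makeLinearScale(domain, range) {
  return function(value) {
    var t = (value - domain[0]) / (domain[1] - domain[0]);
    return range[0] + t * (range[1] - range[0]);
  };
}

var conceptualScale = makeLinearScale([0, 200], [0, 400]);
// conceptualScale(100) === 200, conceptualScale(60) === 120
```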

To add an axis to our bar chart, we can use the following function and call it on our SVG. To get it in the right position, we need to transform it below the chart.

[svg from above].call(d3.svg.axis().scale(xScale).orient("bottom"));
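By default the axis would be rendered at the origin, so the translation below the bars could look like this (a sketch; the D3.js v3 calls are commented out because they need a browser, and `chartHeight` is an assumed value of 5 bars times 25 pixels):

```javascript
// Height of the bar chart: 5 bars * 25px row height (assumed values).
var chartHeight = 5 * 25;
var axisTransform = 'translate(0,' + chartHeight + ')';

// In the browser, with the svg and xScale from above (D3.js v3 syntax):
// svg.append('g')
//     .attr('class', 'x axis')
//     .attr('transform', axisTransform)
//     .call(d3.svg.axis().scale(xScale).orient('bottom'));
```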

Now, we can also add interaction and react to user input. For example, we can show an alert if someone clicks on our chart:

[svg from above].on("click", function () {
    alert("Houston, we get attention here!");
})

Adding a text node for each bar, we get the following chart rendered in the browser:

Coding Example Result
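The per-bar text nodes can be positioned with the same index-based pattern as the bars themselves (a sketch; the 25px row height matches the bar chart above, while the 15px baseline offset and the x position are assumed values):

```javascript
var data = [100, 200, 150, 60, 50];
// y position of each label: bar index * 25px row height + 15px baseline offset
var labelY = data.map(function(d, i) { return i * 25 + 15; });

// In the browser, with the svg from above (D3.js v3 syntax):
// svg.selectAll('text')
//     .data(data)
//     .enter()
//     .append('text')
//     .attr('x', 5)
//     .attr('y', function(d, i) { return labelY[i]; })
//     .text(function(d) { return d; });
```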

If you would like to play around with it, here is the code: https://jsfiddle.net/Loco5ddt/

If you would like to see even more D3.js code, using the same data to display a pie chart and adding an update button, look at the following one: https://jsfiddle.net/4eqzyquL/

Data import
Finally, we can import our data in CSV, TSV or JSON format. To import a JSON file, for example, use the following code. Of course, you can also fetch your JSON via a server call instead of importing a static file.

d3.json("data.json", function(data) {
    [access your data using the data variable]
});
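When importing CSV or TSV instead, keep in mind that D3.js parses every cell as a string, so numeric columns have to be coerced. A sketch (the column names `name` and `value` are hypothetical):

```javascript
// Row accessor turning string cells into the types we need.
function coerce(row) {
  return { name: row.name, value: +row.value };
}

// In the browser: d3.csv('data.csv', coerce, function(error, data) { ... });
// The coercion itself works on plain objects:
var parsed = [{ name: 'Sarah', value: '100' }].map(coerce);
// parsed[0].value is now the number 100
```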

What else does D3.js offer?
Just to name a few areas, D3.js helps you with layouts, geometry, scales, ranges, data transformation, array and math functions, colors, time formatting and scales, geography, as well as drag & drop.

There are a lot of examples online: https://github.com/mbostock/d3/wiki/Gallery

TL;DR
+ based on web standards
+ totally flexible
+ easy to debug
+ many, many examples online
+ libraries built on D3.js (NVD3.js, C3.js or IVML)
– a lot of code compared to other libraries
– too heavy for standard charts

Learning more
As this blog post is based on a presentation held at the MunichJS Meetup, you can find the original slides here: http://slides.com/elisabethengel/d3js#/ The recording is available on YouTube: https://www.youtube.com/watch?v=EYmJEsReewo

For further information, have a look at:

Cross Language Benchmarking Part 3 – Git submodules and the single-command cross language benchmark

In my recent blog posts (part 1, part 2) I have described in detail how to do micro benchmarking for Java and C/C++ with JMH and Hayai. I have presented a common execution approach based on Gradle.

Today I want to improve the overall project structure. Last time I already mentioned that the structure of the Gradle projects is not optimal. In the first part I will briefly recap the main goal and approach of the past articles, then introduce some new requirements, and finally present a more flexible module structure that splits production code and benchmarks, which will then be embedded in a cross-language super-project.
Continue reading

Developing a Modern Distributed System – Part II: Provisioning with Docker

As described in an earlier blog post “Bootstrapping the Project”, comSysto’s performance and continuous delivery guild members are currently evaluating several aspects of distributed architectures in their spare time left besides customer project work. In the initial lab, a few colleagues of mine built the starting point for a project called Hash-Collision:

Hash-collision's system structure

They focused on the basic application structure and architecture to get a showcase up and running as quickly as possible and left us the following tools for running it:

  • one simple shell script to set up the environment locally based on many assumptions
  • one complex shell script that builds and runs all services
  • hardcoded dependency on a local RabbitMQ installation

First Attempt: Docker containers as a runtime environment

In search of a more sophisticated runtime environment we went down the most obvious path and chose to get our hands on the hype technology of late 2014: Docker. I assume that most people have a basic understanding of Docker and what it is, so I will not spend too much time on its motivation here. Basically, it is a tool inspired by the idea of ‘write once, run anywhere’, but on a higher level of abstraction than that famous programming language. Docker not only makes an application portable; it also lets you ship all dependencies such as web servers, databases and even operating systems as one or multiple well-defined images and use the very same configuration from development all the way to production. Even though we did not even have any production or pre-production environments, we wanted to give it a try. Being totally enthusiastic about containers, we chose the most container-like place we could find and locked ourselves in there for 2 days.

impact-hub-munich

One of the nice things about Docker is that it encourages developers to re-use existing infrastructure components by design. Images are defined incrementally by selecting a base image, and building additional functionality on top of it. For instance, the natural way to create a Tomcat image would be to choose a base image that already brings a JDK and install Tomcat on top of it. Or even simpler, choose an existing Tomcat image from the Docker Hub. As our services are already built as fat JARs with embedded web servers, things were almost trivial.

Each service should run in a standardized container with the executed JAR file being the sole difference. Therefore, we chose to use only one service image and inject the correct JAR using Docker volumes for development. On top of that, we needed additional standard containers for nginx (dockerfile/nginx) and RabbitMQ (dockerfile/rabbitmq). Each service container has a dependency on RabbitMQ to enable communication, and the nginx container needs to know where the Routing service resides to fulfill its role as a reverse proxy. All other dependencies can be resolved at runtime via any service discovery mechanism.

As a first concrete example, this is the Dockerfile for our service image. Based on Oracle’s JDK 8, there is not much left to do except for running the JAR and passing in a few program arguments:

FROM dockerfile/java:oracle-java8
MAINTAINER Christian Kroemer (christian.kroemer@comsysto.com)
CMD /bin/sh -c 'java -Dport=${PORT} -Damq_host=${AMQ_PORT_5672_TCP_ADDR} -Damq_port=${AMQ_PORT_5672_TCP_PORT} -jar /var/app/app.jar'

After building this image, it is ready for usage in the local Docker repository and can be used like this to run a container:

# start a new container based on our docker-service-image
docker run docker-service-image
# link it with a running rabbitmq container to resolve the amq dependency
docker run --link rabbitmq:amq docker-service-image
# do not block and run it in background
docker run --link rabbitmq:amq -d docker-service-image
# map the container http port 7000 to the host port 8000
docker run --link rabbitmq:amq -d -p 8000:7000 docker-service-image
# give an environment parameter to let the embedded server know it has to start on port 7000
docker run --link rabbitmq:amq -d -e "PORT=7000" -p 8000:7000 docker-service-image
# inject the user service fat jar
docker run --link rabbitmq:amq -d -e "PORT=7000" -v HASH_COLLISION_PATH/user/build/libs/user-1.0-all.jar:/var/app/app.jar -p 8000:7000 docker-service-image

Very soon we ended up with a handful of such bash commands that we pasted into our shells over and over again. Obviously we were not exactly happy with that approach, so we started to look for more powerful tools in the Docker ecosystem and stumbled upon fig (which had not yet been deprecated in favor of docker-compose at that time).

Moving on: Docker Compose for some degree of service orchestration

Docker-compose is a tool that simplifies the orchestration of Docker containers that all run on the same host system, based on a single Docker installation. Any `docker run` command can be described in a structured `docker-compose.yml` file, and a simple `docker-compose up` / `docker-compose kill` is enough to start and stop the entire distributed application. Furthermore, commands such as `docker-compose logs` make it easy to aggregate information for all running containers.

fig-log-output

Here is an excerpt from our `docker-compose.yml` that illustrates how self-explanatory those files can be:

rabbitmq:
 image: dockerfile/rabbitmq
 ports:
 - ":5672"
 - "15672:15672"
user:
 build: ./service-image
 ports:
 - "8000:7000"
 volumes:
 - ../user/build/libs/user-1.0-all.jar:/var/app/app.jar
 environment:
 - PORT=7000
 links:
 - rabbitmq:amq

Semantically, the definition of the user service is equivalent to the last sample command given above except for the handling of the underlying image. The value given for the `build` key is the path to a directory that contains a `Dockerfile` which describes the image to be used. The AMQ service, on the other hand, uses a public image from the Docker Hub and hence uses the key `image`. In both cases, docker-compose will automatically make sure the required image is ready to use in the local repository before starting the container. A single `docker-compose.yml` file consisting of one such entry for each service is now sufficient for starting up the entire distributed application.

An Aside: Debugging the application within a Docker container

To debug an application running in a Docker container from the IDE, we need to use remote debugging, just as with any physical remote server. To do that, we defined a second service debug image with the following `Dockerfile`:

FROM dockerfile/java:oracle-java8
MAINTAINER Christian Kroemer (christian.kroemer@comsysto.com)
CMD /bin/sh -c 'java -Xdebug -Xrunjdwp:transport=dt_socket,address=10000,server=y,suspend=n -Dport=${PORT} -Damq_host=${AMQ_PORT_5672_TCP_ADDR} -Damq_port=${AMQ_PORT_5672_TCP_PORT} -jar /var/app/app.jar'

This will make the JVM listen for a remote debugger on port 10000 which can be mapped to any desired host port as shown above.

What we got so far

With a local installation of Docker (on a Mac using boot2docker http://boot2docker.io/) and docker-compose, starting up the whole application after checking out the sources and building all JARs is now as easy as:

  • boot2docker start (follow instructions)
  • docker-compose up -d (this will also fetch / build all required images)
  • open http://$(boot2docker ip):8080/overview.html

Note that several problems originate from boot2docker on Mac. For example, containers cannot be accessed on `localhost`; instead, they are reached via the IP of the VirtualBox VM that boot2docker runs.

In an upcoming blog post, I will outline one approach to migrate this setup to a more realistic environment with multiple hosts using Vagrant and Ansible on top. Until then, do not forget to check if Axel Fontaine’s Continuous Delivery Training at comSysto is just what you need to push your knowledge about modern deployment infrastructure to the next level.

Cross-language benchmarking – Gradle loves native binaries!

Last time I gave you an introduction to my ideas about benchmarking. I explained that comparing performance between different compilers and implementations is as old as programming itself and, above all, not as simple to set up as it sounds. If you don’t know what I am writing about, have a look at the former post on this topic. There, I explained how to set up a simple Gradle task that runs JMH benchmarks as part of a Gradle task chain. However, the first article is no prerequisite at all for this article!

Today I want to start from the other side. As a C++ developer who wants to participate in a performance challenge, I want

  • a framework that outputs numbers that can be compared with other benchmark results
  • to run all benchmarks at once
  • to compare my results with other implementations or compiler assemblies
  • to execute all in one task chain

Continue reading

Cross-language benchmarking made easy?

There is this eternal fight between the different programming languages. “The code in ‘XYZlang’ runs much faster than in ‘ABClang'”. Well, this statement carries at least three misunderstandings. First, in most cases it is not the source code you wrote that actually gets executed. Second, please define “faster” upfront. Third, as a general rule of experiments: do not draw conclusions from a benchmark alone; see it more as a hint that some things seem to make a difference.

In this article series, I will not discuss numbers or reasons why code X or language Y outperforms the other. There are many people out there who understand the background much better than I do – as a quick pointer, see the very good article about Java vs. Scala benchmarking by Aleksey Shipilëv. There will be no machine code and no performance tweaks that make your code run 100 times faster. I want to present ideas for setting up such micro benchmarks in a simple, automated and user-friendly way. In detail, we will come across these topics:

  • How to set up one build that fits all requirements?
  • Gradle in action building across different languages
  • Benchmarking with JMH (Java) and Hayai (C/C++) to prove the concept
  • How to store the results?

Continue reading