Developing a Modern Distributed System – Part I: Bootstrapping the Project

A few months ago, our performance and continuous delivery guild decided to gain more hands-on experience with distributed software architectures. Since companies like Twitter or Netflix have open-sourced many components of their software stacks, this seemed like a great way to get started. A good introduction is a blog post about the Twitter software stack. However, we did not want to stare at architecture diagrams but rather get our hands dirty and build something ourselves. Do you remember the good old days when the Java Pet Store was new and fancy? We needed something similar for modern distributed architectures and finally settled on building a clone of the popular programmer Q&A site Stack Overflow: Hash-Collision was born. With Hash-Collision we want to address different issues in distributed systems such as:

  • Decomposition of the application into individual services
  • Bootstrapping of the whole system
  • Routing
  • Distributed service communication
  • Security among services
  • UI integration

As our lab idea sparked quite some interest at comSysto, we now have multiple teams working on different aspects. In the first lab we concentrated on getting some minimalistic services running. We also decided on the JVM as a common runtime environment for all services, to leverage the same tool chain and to simplify the setup. However, at a later point we may replace an existing service with one written in a language that does not run on the JVM, just for the sake of it.

System Structure

We split Hash-Collision into fine-grained services that can be scaled independently. The system consists of these verticals:

  • qa: handles discussions composed of questions and answers.
  • user: handles user data management.
  • auth: handles credentials, authentication and authorization.

On top of these services sits a UI which integrates them via their (REST) API endpoints (figure: Hash-Collision's system structure).

Services

We wanted to provide completely self-contained services, so WAR deployment was not an option. After some research we stumbled across Spark and immediately liked its simplicity. Here is an echo server:


import static spark.Spark.*;

public class Echo {
    public static void main(String[] args) {
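        // respond with the value of the "msg" query parameter, e.g. GET /echo?msg=hello returns "hello"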
        get("/echo", (req, res) -> req.queryParams("msg"));
    }
}

Each service provides some part of the domain logic, has its own data storage, and exposes an API endpoint via HTTP. For simple deployment, each service is packaged as a single JAR that can easily be started from the command line.

Service Communication

For simplicity we settled on HTTP as the communication protocol for API endpoints for the time being. However, in later versions we might switch to a binary protocol like Thrift. For asynchronous service communication we use AMQP with MessagePack encoding.
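As a rough illustration, here is a minimal sketch of what such an asynchronous publisher could look like, assuming a RabbitMQ broker and the msgpack-jackson binding; the queue name and the QuestionCreated transport object are made up for this example:

import com.fasterxml.jackson.databind.ObjectMapper;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import org.msgpack.jackson.dataformat.MessagePackFactory;

public class QuestionCreatedPublisher {

    // Hypothetical transport object for this example
    public static class QuestionCreated {
        public long questionId;
        public String title;

        public QuestionCreated() {}

        public QuestionCreated(long questionId, String title) {
            this.questionId = questionId;
            this.title = title;
        }
    }

    public static void main(String[] args) throws Exception {
        // Serialize the event with MessagePack instead of JSON
        ObjectMapper mapper = new ObjectMapper(new MessagePackFactory());
        byte[] payload = mapper.writeValueAsBytes(
                new QuestionCreated(42L, "Why do hash collisions happen?"));

        // Publish the binary payload to an AMQP queue
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();
        channel.queueDeclare("question.created", true, false, false, null);
        channel.basicPublish("", "question.created", null, payload);
        channel.close();
        connection.close();
    }
}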

Routing

We decided to use Camel to implement a basic load-balancing and routing component for the service layer. Service instances are registered dynamically with the router: an instance announces itself by periodically sending heartbeat messages, so the router can keep track of the running instances that are available. However, we currently run into availability problems when a service instance has just died but has not yet been removed from the load balancer's node list.
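For illustration, a heavily simplified Camel route with a static round-robin load balancer might look like the sketch below; the real router builds its endpoint list dynamically from the heartbeat messages, and the host names and ports here are made up (this assumes the camel-jetty and camel-http components):

import org.apache.camel.builder.RouteBuilder;

public class QaServiceRoute extends RouteBuilder {
    @Override
    public void configure() {
        // Accept incoming requests and distribute them round-robin
        // over the currently known qa service instances.
        from("jetty:http://0.0.0.0:8080/qa")
            .loadBalance().roundRobin()
                .to("http://qa-1.internal:4567/qa",
                    "http://qa-2.internal:4567/qa");
    }
}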

UI


In Microservices: Decomposing Applications for Deployability and Scalability, Chris Richardson describes different options for UI integration, among them calling services directly from the UI or using an API gateway. While calling services directly is easier to begin with, it can quickly become an unwieldy and complex mess. Apart from that, the client issues many fine-grained HTTP requests, which can decrease overall performance. Although Hash-Collision calls services directly from the UI due to implementation time constraints, we definitely recognize the benefits of an API gateway and will revisit the implementation later.

Service APIs and Shared Code

In every project there is a need to share code. The development team of the German retailer Otto describes in an article about their shop architecture (PDF) that they avoid company-internal shared code altogether; instead, they always release such code as open-source projects and integrate those. Due to the small amount of shared code, Hash-Collision currently uses a common project which is included in the build of the individual service modules. In later incarnations we might separate the development cycles and release shared code into an internal Maven repository. Shared code is also needed for service APIs: service providers and consumers share a common API which also consists of transport objects. Some common approaches are (the third is sketched below):

  • Code duplication
  • Shared interface description language file and code generation
  • Shared code in the target language (client API)
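To illustrate the third option, a shared API module could contain little more than transport objects and a client interface; the names below are purely hypothetical:

import java.util.List;

// Shared transport object, e.g. in a common hashcollision-qa-api module
class QuestionDto {
    public long id;
    public String title;
    public String body;
}

// Client API used by consumers of the qa service
interface QaClient {
    QuestionDto findQuestion(long id);
    List<QuestionDto> findLatestQuestions(int limit);
}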

Conclusion and Next Steps

In this lab we built a very simple technical prototype. Especially operations-wise we made quite a few compromises, such as simple local builds and provisioning via shell scripts. Next, another team will tackle:

  • Continuous delivery of services
  • Provisioning and system setup with tools like Vagrant or Docker

Development-wise we’d like to address:

  • Reduce the number of client requests by introducing an API gateway
  • Enable distributed tracing and monitoring to understand how requests flow through the overall system
  • Improve resilience
  • Add real service persistence

Stay tuned for more blog posts as we improve Hash-Collision.

Finally, I would like to recommend a 2-day training on Architecting for Continuous Delivery and Zero Downtime with Axel Fontaine, which will take place on 26 and 27 January 2015 at comSysto. Read more about it by clicking the link or sign up immediately HERE!

 

A first timer’s experience at Strata Europe

I am totally overwhelmed by the impressions I got as a first-timer at Strata Europe, which took place this year from 19 to 21 November in Barcelona. The seven(!) different tracks included high-quality talks covering topics from a theoretical standpoint as well as from an architectural and tooling view. The most interesting talks for me personally, however, were those in which the speakers shared their experiences applying the presented concepts in the real world.


As a perfect starting point I joined the D3 tutorial by Sebastian Gutierrez (DashingD3js.com). There I learned the basics of how D3 works, which now enables me to understand and possibly customize all the libraries that are built on top of it.

Following up with the talk by Simon Worgan (Jagex Ltd) & Samuel Kerrien (RESEREC), I was intrigued by their presentation of a bi-directional recommender system. This is definitely a topic I will follow up on, given that I have played around with recommendations myself (see my earlier blog post about collaborative filtering).

Inspired by the slide deck of Dan McKinley about Data driven products at Etsy I was happy to join Melissa Santos’ talk about Etsy’s way of making data accessible to EVERYONE in the company. She shared stories of how product managers stopped the development team from shipping code that did not include the respective tracking functionality because the product managers themselves write Scalding (Hadoop) jobs to get data about the shipped features.

As a fan of Google Docs, I was surprised to find out that it is possible to use Google Docs as a front end to a data warehouse, thus enabling management to get daily updates of the most important metrics directly from the source – thank you, Aaron Frazer (Seeking Alpha), for sharing this!

I collected more knowledge about data visualization: from a tooling perspective via a talk about ggvis by Garrett Grolemund (RStudio), and from a communication viewpoint via a talk about storytelling and animation by Michael Freeman (Institute for Health Metrics and Evaluation, University of Washington). The latter showed the following impressive data visualization: U.S. Gun Deaths.

I also learned about Duke, an open source tool to deduplicate data with a “self configuration” option, and about graphlab, a commercial tool that tries to close the gap between prototyping and productionizing predictive apps (which seemed worth trying).

Of course the sponsors brought all kinds of nice goodies, but the most useful ones – in my opinion – were the O'Reilly books signed by the authors. Depicted here is Ellen Friedman from our partner company MapR signing the book "Time Series Databases" (available for download here).


All these impressions completely reinforced my reasons for attending conferences: every time I go to a conference (now it's tech-related, during my studies it was AIESEC-related) I come out with even more inspiration, motivation, knowledge and contacts. With my spirits lifted like this, I wish I could go to a conference every week; however, that would be too time-consuming. Good thing there are meetups in which I can get close to this conference feeling.

I’m looking forward to the meetups about Big Data in Munich (see http://bigdata.comsysto.com/). Feel free to join me!

Java 8 Collectors for Guava Collections

Java 8 comes with a streaming API that divides data processing into two phases: intermediate operations and a terminal operation. The terminal operation can take a few different forms, and I would like to concentrate on reduction to collections, especially to Guava immutable collections. A terminal operation requires a collector which collects the data and returns it in the required structure, but Guava does not provide such a collector. In order to create a Guava collection out of a stream, we first have to reduce the stream result into a temporary collection and then transfer it.
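For illustration, a minimal sketch of that two-step workaround (collect into a mutable list first, then copy it into a Guava ImmutableList):

import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import com.google.common.collect.ImmutableList;

public class GuavaCollectorExample {
    public static void main(String[] args) {
        // Step 1: reduce the stream into a temporary, mutable collection
        List<String> temporary = Stream.of("a", "b", "c")
                .map(String::toUpperCase)
                .collect(Collectors.toList());

        // Step 2: transfer it into the desired immutable Guava collection
        ImmutableList<String> result = ImmutableList.copyOf(temporary);
        System.out.println(result); // [A, B, C]
    }
}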

Continue reading

cS at Mind the product 2014 #mtpcon

When I started my journey to the Mind the Product conference, I was looking forward to the pubs. I knew that this was Europe's biggest product conference, perhaps even the world's largest one. I expected it to be a great event with many speakers and visitors from around the world. However, I was not really sure what to expect from the conference itself or the kind of educational experience it could offer me.

The mtp workshops

The day before the conference I joined the workshops. I was particularly looking forward to the half-day workshop "Analytics and Testing" with Craig Sullivan. It is well known that most companies build their products solely on assumptions, without validating them beforehand through investigation and user tests. The consequences are wasted development investments, frustrated users and lost time for gaining an early competitive advantage. UX, analytics, testing and innovation were some of the topics discussed in Craig's workshop. Furthermore, we discussed the importance of copywriting tests. A variety of informative practical examples were also presented. A follow-up blog post would be the right place to share some practical examples from our own projects.

My highlights from the #mtpcon

Continue reading

Spring Boot – my favorite timesaving convention-enabling autoconfig-creating bean-making classpath-shaking micro-container

One of the common misconceptions about Spring-based Java applications is that they require a huge amount of configuration before one can even start working on the actual domain problem the application is supposed to solve. This is mainly because of XML configuration, which has already been greatly reduced by annotations. But still, even if you want to set up a web application as quickly as possible without Spring (XML) configuration files, you need to download and configure a web server, set up a database connection, and then write all the required beans, a persistence.xml for Hibernate, a web.xml, and so on. Since what you actually wanted was to code the solution to your very own problem, you start asking yourself whether it really has to be so complicated!?

Continue reading

Eberhard Wolff shows how it's done – The Spring Master Class in Berlin


From 24 to 26 September 2014, a training course on the Spring Framework took place for the first time, one that tries to position itself apart from the established players VMWare and FastLane. The hands-on training is offered and organized by the company Gedoplan, which was able to win Spring evangelist and Java Champion Eberhard Wolff for this event in Berlin.

As the title already suggests, the content aims at conveying advanced concepts for using the Spring Framework. A good starting point is the knowledge level of the Spring Core training plus roughly two to three projects of practical experience. This helps to pick up and put into context the various questions that arise during the course due to the framework's extreme flexibility.

Since Spring imposes virtually no constraints on its environment or on the developer, projects based on it initially show a very high development pace and a strong prototyping character, because a lot is tried out. Beyond a certain point, however, this flexibility can become a burden once the project has evolved into a mix of many different approaches. To counteract this, it is advisable to establish a few conventions at an early stage, which Spring as a rule rewards in turn. These concepts include measures that are in themselves quite natural and self-explanatory; their real power is then brought out by AOP techniques and various scanning techniques.

For example, in several exercises the course worked out a system that follows different approaches to structure a 3-tier application not only into its technical layers but also vertically into different functional slices. Using simple but often underestimated means such as facades or the visibility of classes and interfaces, we defined clear interfaces without having to laboriously create several jar modules. Techniques for embedding the components in different environments showed how easily, with plain Spring means, an application could be set up to run in a JEE application server on the one hand, yet with a few adjustments could also run without problems in a Tomcat or a plain SE jar runtime, purely through clever structuring of the configuration. The same concept naturally applies to Spring-based JUnit/TestNG tests, which usually do not run against production databases, so part of the configuration is simply swapped out (see the sketch below). Conventions and structure are essential for such approaches.
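As a rough sketch (not taken from the course material) of how such configuration swapping can look with Spring profiles, the bean names and the JNDI path below are made up:

import javax.sql.DataSource;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Profile;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseBuilder;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseType;
import org.springframework.jdbc.datasource.lookup.JndiDataSourceLookup;

@Configuration
public class DataSourceConfig {

    @Bean
    @Profile("jee")
    public DataSource jndiDataSource() {
        // In a JEE application server the container provides the data source via JNDI
        return new JndiDataSourceLookup().getDataSource("java:comp/env/jdbc/appDS");
    }

    @Bean
    @Profile("test")
    public DataSource embeddedDataSource() {
        // JUnit/TestNG tests run against an in-memory database instead of the production one
        return new EmbeddedDatabaseBuilder().setType(EmbeddedDatabaseType.H2).build();
    }
}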

As usual with Spring, there was no right or wrong in any of the approaches discussed. Spring XML configuration with its powerful namespaces was given as much consideration as the option of defining the entire setup in pure Java. The most important point, however, was to decide on certain concepts and then stick to them as far as possible.

A new big player in the spring.io portfolio was not neglected in this training either. The Spring Boot project bundles a great many best practices and sensible conventions from over ten years of Spring under one hood and is thereby triggering a small revolution in the Spring world. The Spring Boot aha effect for developers becomes obvious when you take a close look at the showcase code example from the project website. With these few lines of code you can produce an ordinary jar file from which a complete Spring web application can grow:

package hello;

import org.springframework.boot.*;
import org.springframework.boot.autoconfigure.*;
import org.springframework.stereotype.*;
import org.springframework.web.bind.annotation.*;

@Controller
@EnableAutoConfiguration
public class SampleController {
    @RequestMapping("/")
    @ResponseBody
    String home() {
        return "Hello World!";
    }
    public static void main(String[] args) throws Exception {
        SpringApplication.run(SampleController.class, args);
    }
}

This Java class has almost as many lines of annotations as lines of code. That is where the Spring Boot magic comes into play. Behind the scenes, a multitude of scanners, JavaConfig classes and "good guesses" ensure a smooth start of the application. As usual, the developer declares via Maven or Gradle which Java libraries the application should use, but now has the option of drawing on ready-made Spring Boot dependencies that, like a plugin system, automagically add features, in some cases without a single line of the class having to be changed. Anyone who wants even more details about Spring Boot should have a look at this Spring Boot article.

In doing so, Spring Boot demonstrates much of what is brought up again and again in many places: conventions and consistency make many things easier in day-to-day work with Spring.

Here, once again, the topics covered in a short summary:

  • Infrastructure configuration
  • Patterns and best practices from large Spring applications
  • Spring AOP and AspectJ support
  • Further Spring projects such as Integration, Security, Web Services and Batch in action

In summary, the training can be described as thoroughly successful in terms of content and very well organized. Even if the last bit of Spring internals was perhaps not squeezed out here and there (which is not the goal of this training anyway), the three days offered very good hands-on training. For anyone who wants to know how large and complex Spring applications can be put on the right track while still in their infancy, or what should best be tackled first during a consolidation in order to deliver an easy-to-configure yet flexible piece of software at the end of the day, this training is exactly the right thing.

 

How to create your own ‘dynamic’ bean definitions in Spring

Recently, I joined a software project with a Spring/Hibernate-based software stack which is shipped in a SaaS-like manner, but the databases of the customers need to be separated from each other. Sounds easy? OK, let's see what we have in detail.

1. Baseline study

Requirement: There is a software product that can be sold to different customers, but the provider wants to keep each customer's sensitive data in a separate data source. Each customer/login has access to exactly one data source.
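One way to approach this (a hedged sketch only, not necessarily the solution the rest of the post develops) is to register one DataSource bean definition per customer at startup via a BeanDefinitionRegistryPostProcessor; the customer map and bean names below are made up:

import java.util.Map;

import org.springframework.beans.BeansException;
import org.springframework.beans.factory.config.ConfigurableListableBeanFactory;
import org.springframework.beans.factory.support.BeanDefinitionBuilder;
import org.springframework.beans.factory.support.BeanDefinitionRegistry;
import org.springframework.beans.factory.support.BeanDefinitionRegistryPostProcessor;
import org.springframework.jdbc.datasource.DriverManagerDataSource;

public class CustomerDataSourceRegistrar implements BeanDefinitionRegistryPostProcessor {

    // In a real system this map would be read from configuration, not hard-coded
    private final Map<String, String> jdbcUrlsByCustomer;

    public CustomerDataSourceRegistrar(Map<String, String> jdbcUrlsByCustomer) {
        this.jdbcUrlsByCustomer = jdbcUrlsByCustomer;
    }

    @Override
    public void postProcessBeanDefinitionRegistry(BeanDefinitionRegistry registry) throws BeansException {
        // Register one DataSource bean per customer, e.g. "acmeDataSource"
        for (Map.Entry<String, String> entry : jdbcUrlsByCustomer.entrySet()) {
            BeanDefinitionBuilder builder = BeanDefinitionBuilder
                    .genericBeanDefinition(DriverManagerDataSource.class)
                    .addPropertyValue("url", entry.getValue());
            registry.registerBeanDefinition(entry.getKey() + "DataSource", builder.getBeanDefinition());
        }
    }

    @Override
    public void postProcessBeanFactory(ConfigurableListableBeanFactory beanFactory) throws BeansException {
        // nothing to do here; all registration happens in postProcessBeanDefinitionRegistry
    }
}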

Continue reading