
Ice cream sales break microservices, Hystrix to the rescue

In November 2015, we had the opportunity to spend three days with a greenfield project in order to get to know Spring Cloud Netflix. At comSysto, we always try to evaluate technologies before their potential use in customer projects to make sure we know their pros and cons. Of course, we had read about several aspects, but we never really got our hands dirty using it. This had to change!

Besides coming up with a simple scenario that can be completed within a few days, our main focus was on understanding potential problems in distributed systems. First of all, any distributed system comes with the ubiquitous problem of failing services that should not break the entire application. This is most prominently addressed by Netflix’ “Simian Army” which intentionally breaks random parts of the production environment.

However, we rather wanted to provoke problems arising under heavy load due to capacity limitations. Therefore, we intentionally designed a distributed application with a bottleneck that turned into an actual problem with many simultaneous requests.

Our Use Case

Our business case is an ice cream company operating at locations worldwide. At each location there are ice-selling robots. At the company’s headquarters we want to show an aggregated report about the ice-selling activities for each country.

All our components are implemented as dedicated microservices using Spring Boot and Spring Cloud Netflix. Service discovery is implemented using Eureka server. The communication between the microservices is RESTful.


Architecture overview of our distributed system with the deployment setup during the experiments.

There is a basic location-service, which knows about all locations equipped with ice-selling-robots. The data from all these locations has to be part of the report.

For every location, there is one instance of the corresponding microservice representing an ice-selling-robot. Every ice-selling-robot locally stores the total amount of ice cream sold and the remaining stock. Each of them continuously pushes this data to the central current-data-service. These updates fail at a certain rate, which is configured by a central Config Server.

For the sake of simplicity, the current-data-service stores this information in-memory. Every time it receives an update from one of the ice-selling-robots, it takes the new value and forgets about the old one. Old values are also forgotten if their timestamp is too old.
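
To make this concrete, here is a minimal sketch of such an in-memory store (class and field names are our own invention, not the actual implementation): a map from location to the latest timestamped value, with stale entries treated as absent.

import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: a hypothetical in-memory store for the current-data-service.
public class CurrentDataStore {

    private static final Duration MAX_AGE = Duration.ofMinutes(5); // assumed staleness threshold

    private final Map<String, TimestampedValue> latestByLocation = new ConcurrentHashMap<>();

    /** Overwrites the previous value for a location; the old value is simply forgotten. */
    public void update(String locationId, long totalSold, long remainingStock) {
        latestByLocation.put(locationId, new TimestampedValue(totalSold, remainingStock, Instant.now()));
    }

    /** Returns the current value for a location unless its timestamp is too old. */
    public Optional<TimestampedValue> currentValue(String locationId) {
        TimestampedValue value = latestByLocation.get(locationId);
        if (value == null || value.timestamp.isBefore(Instant.now().minus(MAX_AGE))) {
            return Optional.empty();
        }
        return Optional.of(value);
    }

    public static class TimestampedValue {
        public final long totalSold;
        public final long remainingStock;
        public final Instant timestamp;

        TimestampedValue(long totalSold, long remainingStock, Instant timestamp) {
            this.totalSold = totalSold;
            this.remainingStock = remainingStock;
            this.timestamp = timestamp;
        }
    }
}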

The current-data-service offers an interface through which the current value of the total ice cream sold or the remaining stock can be retrieved for a single location. This interface is used by an aggregator-service, which is able to generate and deliver an aggregated report on demand. For all locations provided by the location-service, the current data is retrieved from the current-data-service and aggregated by summing up the individual values grouped by the locations’ country. The resulting report consists of the summed-up values per country and data type (total ice cream sold and remaining stock).

Because the connection between the aggregator-service and the current-data-service is quite slow, the calculation of the report takes a lot of time (we simply simulated this slow connection with a wifi connection, which is slow compared to an internal service call on the same machine). Therefore, an aggregated report cache has been implemented as a fallback. Switching to this fallback has been implemented using Hystrix. At fixed intervals the cache is provided with the most current report by a simple scheduled job.
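
The scheduled job itself can be as simple as a Spring @Scheduled method; the following is only a sketch, and the two client classes are hypothetical placeholders for the actual REST calls:

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Sketch of the cache-refreshing job (requires @EnableScheduling on a configuration class).
@Component
public class ReportCacheRefresher {

    private final AggregatorClient aggregatorClient;   // hypothetical client for the aggregator-service
    private final ReportCacheClient reportCacheClient; // hypothetical client for the aggregated-report-cache

    @Autowired
    public ReportCacheRefresher(AggregatorClient aggregatorClient, ReportCacheClient reportCacheClient) {
        this.aggregatorClient = aggregatorClient;
        this.reportCacheClient = reportCacheClient;
    }

    // historize-job-rate: every 30 seconds a freshly generated report is pushed into the cache
    @Scheduled(fixedRate = 30000)
    public void refreshCachedReport() {
        reportCacheClient.store(aggregatorClient.fetchCurrentReport());
    }
}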

The reporting service is the only service with a graphical user interface. It generates a very simplistic HTML-based dashboard, which can be used by the business section of our company to get an overview of all the different locations. The data presented to the user is retrieved from the aggregator-service. Because this service is expected to be slow and prone to failure, a fallback is implemented which retrieves the last report from the aggregated-report-cache. With this, the user can always request a report within an acceptable response time even though it might be slightly outdated. This is a typical example of maintaining maximum service quality in case of partial failure.


The reporting “dashboard”.

We used a Spring Cloud Dashboard from the open source community for showing all registered services:


Spring Cloud Dashboard in action.

The circuit breaker within the reporting-service can be monitored from the Hystrix dashboard.


Hystrix dashboard for reporting service under load. All circuits are closed, but 19% of all getReport requests failed and were hence successfully redirected to the cached version.

Understanding the Bottleneck

When using Hystrix, all connectors to external services typically have a thread pool of limited size to isolate system resources. As a result, the number of concurrent (or “parallel”) calls from the reporting-service to the aggregator-service is limited by the size of the thread pool. This way we can easily overstress the capacity for on-demand generated reports, forcing the system to fall back to the cached report.

The relevant part of the reporting-service’s internal declaration looks as depicted in the following code snippet (note the descriptive URLs that are resolved by Eureka). The primary method getReport() is annotated with @HystrixCommand and configured to use the cached report as fallbackMethod:

@HystrixCommand(
    fallbackMethod = "getCachedReport",
    threadPoolKey = "getReportPool"
)
public Report getReport() {
    return restTemplate.getForObject("http://aggregator-service/", Report.class);
}

public Report getCachedReport() {
    return restTemplate.getForObject("http://aggregated-report-cache/", Report.class);
}
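
The size of the getReportPool thread pool can be set directly on the command via Hystrix’s thread pool properties. The snippet below is only a sketch (not our exact configuration), but it matches the pool size of 5 used in the tests described later:

import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import com.netflix.hystrix.contrib.javanica.annotation.HystrixProperty;

@HystrixCommand(
    fallbackMethod = "getCachedReport",
    threadPoolKey = "getReportPool",
    threadPoolProperties = {
        // at most 5 concurrent getReport calls; with the default queue settings,
        // additional requests are rejected and served by the fallback instead
        @HystrixProperty(name = "coreSize", value = "5")
    }
)
public Report getReport() {
    return restTemplate.getForObject("http://aggregator-service/", Report.class);
}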

In order to be able to distinguish primary and fallback calls from the end user’s point of view, we decided to include a timestamp in every served report to indicate the delta between the creation and serving time of a report. Thus, as soon as the reporting-service delegates incoming requests to the fallback method, the age of the served report starts to increase.
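
A minimal way to achieve this, assuming a hypothetical Report class, is to let every report carry its creation timestamp and derive the age at serving time:

import java.time.Duration;
import java.time.Instant;

// Sketch only; the real report class may look different.
public class Report {

    private final Instant createdAt = Instant.now(); // set when the report is generated

    /** Age of this report at serving time; it grows as soon as requests are served from the cache. */
    public Duration age() {
        return Duration.between(createdAt, Instant.now());
    }
}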

Testing

With our bottleneck set up, testing and observing the runtime behavior is fairly easy. Using JMeter we configured a testing scenario with simultaneous requests to the reporting-service.

Basic data of our scenario:

  • aggregator-service instances: 1
  • test duration: 60s
  • hit rate per thread: one request every 500ms
  • historize-job-rate: 30s
  • thread pool size for the getReport command: 5

Using the described setup we conducted different test runs with a JMeter thread pool size (= number of concurrent simulated users) of 3, 5 and 7. Analyzing the timestamps of the served reports leads us to the following conclusions:

Using a JMeter thread count below the size of the service’s thread pool results in a 100% success rate for the reporting-service calls. Setting both pool sizes equal already produces a small but noticeable error rate. Finally, setting the JMeter thread count higher than the service’s thread pool size results in growing numbers of failures and fallbacks, also forcing the circuit breaker into short-circuit states.

Our measured results are as follows (note that the average report age would be 15s when always using the cached version given our historize-job-rate of 30s):

  • 3 JMeter threads: 0.78s average report age
  • 5 JMeter threads: 1.08s average report age
  • 7 JMeter threads: 3.05s average report age

After gaining these results, we changed the setup in a way that eliminates the slow connection. We did so by deploying the current-data-service to the same machine as the aggregator-service. Thus, the slow connection has been removed and replaced with an internal, fast connection. With the new setup we conducted an additional test run, gaining the following result:

  • 7 JMeter threads, fast network: 0.74s average report age

By eliminating one part of our bottleneck, the report age drops significantly, to a figure just below that of the first test run.

Remedies

The critical point of the entire system is the aggregation due to its slow connection. To address the issue, different measures can be taken.

First, it is possible to scale out by adding additional service instances. Unfortunately, this was hard to test given the hardware at hand.

Second, another approach would be to optimize the slow connection, as seen in our additional measurements.

Last but not least, we could also design our application for always using the cache assuming that all users should see the same report. In our simplistic scenario this would work, but of course that is not what we wanted to analyze in the first place.

Our Lessons Learned

Instead, let us explain a few take-aways based on our humble experience of building a simple example from scratch.

Spring Boot makes it really easy to build and run dozens of services, but really hard to figure out what is wrong when things do not work out of the box. Unfortunately, the available Spring Cloud documentation is not always sufficient. Nevertheless, Eureka works like a charm when it comes to service discovery. Simply use the name of the target service in a URL and put it into a RestTemplate. That’s all! Everything else is handled transparently, including client-side load balancing with Ribbon! In another lab on distributed systems, we spent a lot of time working around this issue. This time, everything was just right.

Furthermore, our poor deployment environment (3 MacBooks…) made serious performance analysis very hard. Measuring the effect of scaling out is nearly impossible on a developer machine due to its physical resource limitations. Having multiple instances of the same services doesn’t give you anything if one of them already pushes the CPU to its limits. Luckily, there are almost infinite resources in the cloud nowadays which can be allocated in no time if required. It could be worth considering this option right away when working on microservice applications.

In Brief: Should you use Spring Cloud Netflix?

So what is our recommendation after all?

First, we were totally impressed by the way Eureka makes service discovery as easy as it can be. Given you are running Spring Boot, starting the Eureka server and making each microservice a Eureka client is nothing more than dependencies and annotations. On the other hand, we did not evaluate its integration in other environments.

Second, Hystrix is very useful for preventing cascading errors throughout the system, but it cannot be used in a production environment without suitable monitoring unless you have a soft spot for flying blind. It also introduces a few pitfalls during development. For example, when debugging a Hystrix command, the calling code will probably detect a timeout in the meantime, which can give you completely different behavior. However, if you have the tools and skills to handle the additional complexity, Hystrix is definitely a winner.

In fact, this restriction applies to microservice architectures in general. You have to go a long way to be able to run one – but once you are there, you can scale almost infinitely. Feel free to have a look at the code we produced on GitHub or discuss whatever you are up to at one of our user groups.


comSysto at Spark Summit Europe

At the end of October 2015 the first European Spark Summit took place at the Beurs van Berlage center in Amsterdam. The conference was the third of its kind this year dedicated to Apache Spark. Four of comSysto’s engineers traveled to Amsterdam for three intense days of Spark. This post summarizes highlights from the training and talks, as well as some of our general thoughts about Spark.


comSysto at Spark Summit Europe Amsterdam 2015

Keynotes

There were a total of 9 keynotes over two days; here are our favorites:

Matei Zaharia, the creator of Spark, gave a state-of-the-union keynote focusing on the rapid adoption and overall growth of Spark as an Apache Foundation project. Spark now has over 600 contributors and is one of the most active Apache projects. 51% of Spark users are deploying in the cloud. Python’s popularity as a Spark language grew by 20% and people are also picking up R as a fourth language choice. The introduction of the new DataFrame API was the main challenge this year; more performance optimizations are coming with Project Tungsten. Zaharia also gave a peek at the upcoming Spark 1.6 features: mainly a type-safe DataFrame API named the Dataset API, the integration of DataFrames into the Spark Streaming and GraphX APIs, and more Tungsten features (in-memory cache, SSD storage).

Martin Odersky gave a keynote on Spark being the “ultimate Scala collections”. Spark is an example of a Scala DSL that defines lazy collection operations and adds pairwise operations (e.g. reduceByKey). Scala will adopt some of these concepts, such as collection views, cachable collections and pairwise operations on sequences of pairs, as a result of Spark using them extensively. On the other hand, Spark can benefit from Scala’s rich type system as well as the upcoming Spores feature for compile-time checks of closure captures that might get distributed across nodes. There is obviously a lot of exchange between the two communities, which both can benefit from.

Talks

Magellan: Geospatial Analytics on Spark

It is promising to see a library addressing the handling of geospatial data and operations in Spark. There are many libraries available for encoding, parsing and storing geospatial data in various formats, however when trying to express more advanced operations such as geospatial joins, unions or intersections in a distributed fashion you were on your own. Spatial operations will often involve a join of multiple geospatial layers which maps well to RDD operations. Magellan provides optimized geospatial predicates and operations on top of Spark’s DataFrame API. For primitive spatial operations it depends on ESRI’s Geometry API and it aims at implementing the OpenGIS Simple Feature for SQL API.

Streaming Analytics with Spark, Kafka, Cassandra and Akka

Helena Edelson gave a presentation on rethinking classical data processing architectures to meet the flood of data we face today. LinkedIn, for example, generates 2.5 trillion events per day, amounting to 1 petabyte of streaming data. The Lambda Architecture style provides guidelines for handling both batch and stream processing of massive datasets, but implementing it is still hard. Edelson discussed some technology choices for implementing different aspects of Lambda: Spark/Scala for distributed computing, Mesos for cluster resource management, Akka for concurrent and fault-tolerant application logic, Cassandra for distributed data storage and Kafka for real-time ingestion of streaming data: the SMACK stack. The colocation of Cassandra and Spark nodes for data locality especially seems like a good choice. Code for her reference application killrweather can be found on GitHub.

Spark DataFrames

Michael Armbrust from Databricks talked about Spark’s DataFrame API and its integration with Spark ML. A DataFrame is a distributed collection of rows organized into named columns and a unified interface for interacting with data in Scala, Java, Python or R. The main advantage of DataFrames over RDDs is Spark’s ability to optimize program execution. Since DataFrames provide more information on the structure of the data, better performance can usually be achieved through optimization compared to regular RDDs. The DataFrame operations are also language-agnostic: Python code using DataFrames, for example, no longer needs to ship functions to the worker nodes to be executed by a slower Python interpreter. Regarding integration with Spark ML, a more streamlined version of MLlib built on top of the DataFrame API was presented. Databricks also introduced the Spark ML Pipeline abstraction: a practical machine learning pipeline often involves a sequence of data pre-processing, feature extraction, model fitting, and validation stages. This had to be done manually and was error-prone. Spark ML Pipelines provide an abstraction for those common data processing steps. It is nice to see that the programming interface has matured and we think we will see plenty of new features in the upcoming releases.
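
As a rough illustration of why the structural information matters, here is a Java sketch against the Spark 1.x DataFrame API (file path and column names are made up): an aggregation that would need a hand-written reduceByKey on an RDD becomes a declarative query over named columns, which Spark’s optimizer is free to rearrange.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class SalesByCountry {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("sales-by-country"));
        SQLContext sqlContext = new SQLContext(sc);

        // Named columns give Spark enough structure to optimize the whole query plan.
        DataFrame sales = sqlContext.read().json("hdfs:///data/sales.json");
        DataFrame perCountry = sales.groupBy("country").sum("amount");

        perCountry.show();
        sc.stop();
    }
}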

Productionizing Spark and the Spark Job Server

The talk by Evan Chan focused on setting up and tuning Spark clusters and how to avoid common pitfalls: from choosing the right cluster mode to debugging Spark applications and collecting Spark context metrics. Another step towards making Spark production ready is using the Spark Job Server, which turns a Spark cluster into a “cluster as a service” by adding a REST management interface. Spark Job Server provides its own metadata store for storing and sharing jobs, configurations and job jars. It sits on top of your streaming or batch workloads and manages jobs and Spark contexts for you. Since the Job Server is creating the context, an existing Spark context can be re-used or a new one can be created, allowing for low latency queries and RDD sharing among jobs. Security, Authentication and all cluster managers are supported. Spark Job Server also found its way into the latest DataStax Enterprise distribution.

Spark Training

On the first day Databricks offered four training sessions on Spark in parallel. We chose the “Data Science with Apache Spark” training by Jon Bates since our main use cases include exploratory data analysis and machine learning. Offering a training at that scale (hundreds of participants) is definitely a challenge, but it was well executed. Databricks provided access to their cloud platform for all participants, which gave everyone the opportunity to use browser-based “notebooks” for exploration and execution of lab code against their own Spark clusters in the cloud (AWS). Compared to small-scale trainings there were obviously fewer opportunities to ask questions, and the pace of presentation and the amount of material were tremendous: there was a lot to digest. However, the quality of the tutorial content and the opportunity to continue to use the platform for some weeks after the training made up for that.

Conclusion

Spark is a promising tool for handling all kinds of large-scale data processing tasks which are getting more and more common at companies across all industries. IBM calls Spark “Potentially the Most Significant Open Source Project of the Next Decade” and commits to Spark by investing $300 million over the next few years and by assigning more than 3,500 researchers and developers to work on Spark-related projects. Microsoft for instance is using Spark and Cassandra to process over 10TB of event data per day from its Office 365 products. The diverse ecosystem of languages and tools offered by Spark is definitely a unique feature, making the switch from exploratory data analysis to application development a lot smoother. Deploying complete stacks (such as SMACK) on a computing cluster or in the cloud seems challenging at the moment. The current focus lies on explorative tools (notebooks) and languages (Python) tailored towards data scientists as well as deployment topics. Discussions on developing full-stack applications and integrating Spark in existing systems, however, are still rare.

At comSysto we explore Spark during our labs, at data science challenges and by implementing prototypes. For data intensive projects and for implementing lambda architectures we currently regard Spark as one of the primary options.

You want to shape a fundamental change in dealing with data in Germany? Then join our Big Data Community Alliance!


ReactJs – Minesweeper App

Minesweeper is a well-known and quite popular computer game from the ’90s. For the purpose of this blog post I decided to revive it with ReactJs.

ReactJs is a JavaScript library providing a view for data rendered as HTML. React offers a model in which subcomponents cannot directly affect enclosing components (“data flows down”), and it is also very effective in refreshing a view when data changes. Its “virtual DOM” feature allows the framework to re-render only the parts of the view that actually need to change, which is why ReactJs reacts very quickly to user inputs and provides a great user experience in front-end applications.

MVC architecture

The best practice when using ReactJs is to create a hierarchy with one master component at the top. That component represents the application itself and holds the global application state. In MVC architecture, that component is used as a controller: it handles changes of the model state and triggers re-rendering when necessary.

Creating a good model is a key step in developing any good application. The model holds the data that represents the application’s current state. Each object should represent one specific element and contain as few outer dependencies as possible. A simpler model makes data handling easier when changes occur.

The view is implemented by creating low-level React components that can be reused in multiple places in the application. Each view component always represents a single model object and holds that object as a property. Each view component is also responsible for catching HTML events and forwarding them to the controller. In order to handle such events from input devices, each input device is also represented in the model. In Minesweeper’s case it is the MouseModel that holds information about the current mouse state and alerts the controller when actions are made.

Image 1: Minesweeper – MVC architecture


Handling state changes

The controller contains the whole global state of the application. It reacts to state changes of input devices, delegates actions to the game model and updates the application state when necessary. In order to update the application state, the controller must be aware of any state change in the model. For that purpose the ‘event listener’ pattern is implemented on the model.

Each model component implements an interface that contains two methods: addEventHandler(eventHandler) and fireEvent(eventName, event). Each model object sets an event handler callback on each of its children. So, when a model object changes its state, it can fire a ‘stateChanged’ event. Its parent handles that event and fires the same event to its own parent, and so on, until the event reaches the controller at the top of the hierarchy. The controller then handles the event by updating the global state and starting the render() procedure that updates the view.

Image 2: Minesweeper – handling state changes


This application was created as a pure front-end application and requires only a browser to run. It was developed for the Chrome browser and was not additionally adapted to other browsers. Although it is fully front-end, it also offers the option of adding a scoreboard, which would require a back-end REST service with some kind of persistence.

Live demo is available on:  http://dobilinovic.comsysto.com/minesweeper/

Source code is available on GitHub: https://github.com/Obee88/minesweeper-react

Hope you enjoy it!

Davor Obilinović

 

Interview with Talip Ozturk – founder & CTO of Hazelcast

Recently we had a very special guest in our Lightweight Java User Group. Talip Ozturk is the founder & CTO of Hazelcast. He is a pioneering startup founder, having created the first Silicon Valley startup from Turkey, and he has been working with enterprise Java since 1999. Talip gave a nice talk on Distributed Caching with JCache and beyond. After his talk we took the opportunity to ask a couple of questions about him and his work.

Check out the interview with Talip Ozturk below:

You gave the talk “Distributed Caching with JCache and Beyond” on Tuesday, October 3rd in our Lightweight Java User Group.

What’s your impression of the Lightweight Java User Group & comSysto?

Talip Ozturk: Very smart and interactive group of developers. I love it.

What is so special about JCache?

Talip Ozturk: It standardizes the caching in Java. No need to learn a new caching API. We are now free to switch to any caching provider. Life is much easier this way.

How did you come up with the idea to establish Hazelcast?

Talip Ozturk: I thought it would be really cool if we had distributed implementations of all data structures and services available in java concurrency package. This way we can build distributed applications easily.

Did you always dream of creating your own company?

Talip Ozturk: Yes, since I was in college. I was curious to see how far the rabbit hole goes.

What makes Hazelcast different from its competitors?

Talip Ozturk: Its elegant design, from its API to implementation to packaging, and also the fact that there is a great community around the product. It is built to give a “feels right” feeling.

In which direction is your company headed for 2016?

Talip Ozturk: Our new hot-restart persistence is a big deal for us. Also, we are working on making Hazelcast more cloud-friendly through better integration with OpenShift, Docker, Cloud Foundry etc.

If you could describe yourself in only 5 words, what would they be?

Talip Ozturk: passionate puzzled curious mind

Where do you see yourself in 5 years?

Talip Ozturk: building new technology products

What would you say is so special about your job?

Talip Ozturk: understanding customer/user experience.

What event did you most recently visit and why did you visit this event?

Talip Ozturk: JavaOne, it is great place to demo Hazelcast and meet developers.

How did you come across our meetup user group?

Talip Ozturk: Through Java community. We try to attend all active Java user groups.

What do you do in your free time if you don’t write codes?

Talip Ozturk: Playing soccer. Spending time with my kids.

Imagine you’d be the king of the java world for one day, what would you try to change?

Talip Ozturk: I would ask Java Unsafe to be made ‘safe’.

When and where can we expect to see you again? :)

Talip Ozturk: Another Java User Group or Conference

If you missed out on Talip’s talk please check out the video below:

The slides for Talip’s talk are available here.

If you have a great topic and would like to share it with the Lightweight Java User Group, please contact us here.

 

 

 

Anatomy of a large Angular application

Do I really need a strategy?

Yes.

A fresh application always starts out as that one application that’s going to be designed for easy maintenance and development.
Unfortunately, it’s just a matter of time until that application becomes non-trivial and needs reorganisation and/or a rewrite. In those moments, it helps if you’ve designed your application in a way that’s easy to refactor and, with some forethought (and luck), a reorganisation might not even be necessary. A bigger application usually also means a bigger team consisting of people with varying degree of front-end and Angular knowledge. Having a clear set of guidelines regarding the architecture and coding style pays off very fast.

The aforementioned problems are exactly the problems we faced while building an application that gets more than 10 million visitors each month. After a while, developing a feature becomes a chore. The same questions always pop up:

Where do I put this piece of code?

How do I modify data?

How come this event changed my data and state?

Why does modifying a piece of code suddenly break more than half of my unit tests?

It was clear — we needed a new direction.

Setting a direction

Our goal at that point was to have something that’s easy to develop, maintain and test. If we accomplish that, there’s a good chance that our application is going to be future-proof as well.

This article aims to tell the story of a better architecture but also to provide a working example of all the principles discussed here. That’s why you’ll find an accompanying repository with an interactive demo application. Details of the repository and how it relates to this article will be discussed later.

Separation of concerns

Looking at the problem from a different angle, we’ve noticed that the biggest problem was writing tests that are not too brittle. Easy testing means that mocking various parts of an application is easy, which led us to the conclusion that we need better separation of concerns.

That also meant we needed a better data flow; one where it’s completely clear who provides and modifies data and who (and how) triggers data changes. After a few initial sketches, we’ve come to a rough sketch of a data flow that resembled React’s Flux. It’s pretty clear how data flows in a flux(-like) application. In a nutshell — an event (e.g. user or callback) requests a data change from a service which modifies the data and propagates the changes to components that need that data. This in turn makes it easy to see who triggered a data change and there’s always one data source.

Better tooling

One thing that made our life easier was using a language that transpiles to JavaScript. That’s something I would seriously recommend. The top two contenders right now are TypeScript and Babel. We chose TypeScript because the tooling made it easier to notice errors at compile time and refactor bigger pieces of code.

Future proofing

Future proofing means having an application that’s easy to maintain but also reasonably easy to upgrade. It won’t be long until Angular 2 becomes production ready and a sane architecture with TypeScript goes a long way in making the gradual upgrade easier.

The bare necessities

What follows is a list of recommendations I expect developers of a sane Angular application to follow:

  • separate your concerns,
  • keep the flow of data unidirectional,
  • manage your UI state using data,
  • use a transpiled language,
  • have a build process in place,
  • test.

Let’s dive into each one of them.

Separating concerns

When each layer of an application can run as a separate entity, doesn’t know too much about the system (layers that aren’t in direct contact) and is easily testable, you’ll have an application that’s a joy to work with. Angular offers building blocks that lend themselves to such a separation of concerns. If you want a deep insight into the subject, check out this blog post.

Vertical separation

Concerns can be separated horizontally and vertically. Vertical separation happens when you split an application into verticals. Each vertical has a life of its own and internally should have horizontal separation. What worked best for us was completely separating parts of the application (e.g. separate home page, details page, configuration page, etc.) into standalone web pages that each initialise an Angular application. Communication between these modules is easy and achievable by using standard techniques like sessions, URL parameters, etc.


Horizontal separation

Where it gets interesting is horizontal separation. That’s where you actually build up your Angular application and place all its building blocks. It’s important to note that each layer (and block inside a layer) only knows about the layer above itself and doesn’t care about layers underneath that are going to consume its exposed functionalities.

Each vertical features a similar structure:

  • services layer,
  • facade layer,
  • components layer.


Components layer

The components layer is the layer that the users can interact with.
It contains directives with accompanying HTML templates and controllers. When testing (and conceptually designing), directives and HTML templates build one block and controllers build the other block of this layer.

The reason is simple — testing controllers is easy because they can be tested without a dependency on Angular. This exact feature of controllers makes them also the perfect place to put any functionality your directive requires. The preferred way then, would be to use controllerAs and bindToController in directives to build up components.

Blocks in this layer get parts of the facade layer injected and, through these, can pull data and request data modification.


A question often pops up in this layer — are we going to pass data to a component through isolated scope or inject a service and request it from there?

The answer to that question is not always clear and involves using common sense.
Smaller, reusable components without child components are a clear candidate for getting data through isolated scope and directly using that data.
Components featuring child components or more logic often benefit much more from getting their data through an injected service because they don’t get coupled to their parent.

Facade layer

The facade layer is an abstraction layer. A facade is defined as follows:

A facade can (…) reduce dependencies of outside code on the inner workings of a library, since most code uses the facade, thus allowing more flexibility in developing the system.

In our architecture, its only job is abstracting the back facing part (services layer) from the front facing part of your application (components layer). The blocks in this layer are services whose methods get called from the components layer and are then redirected to corresponding services in the services layer.

It’s that simple.

But also powerful, because such an abstraction is easy to split up and changes done to the services layer never affect your components layer.

Services layer

The services layer features all the smart things your application is supposed to do. Be it data modification, async fetching, UI state modification, etc. This layer is also the layer where your data lives and gets handed to the components layer through the facade layer.


This layer is typically going to feature:

  • services that handle your data or UI state (e.g. DataService and UIStateService),
  • services that assist them in doing so (e.g. DataFetchService or LocalStorageService) and
  • other services that you may need like a service that’s going to tell you at which breakpoint in a responsive layout you are.

Keeping the flow of data unidirectional

Now is the time to explain how all the layers and blocks fit together in a unidirectional flow of data.


Getting data

The services layer features services that know how to get data. The initial set of data is either already present as part of the HTML, asynchronously fetched or hardcoded. This data gets transformed into objects (your models) and is available through methods present on the services in your services layer.

The blocks in the components layer can now make a request for the data through the facade layer, get the already parsed data and display it. Easy.

Modifying data

If an event happens that should modify data, the blocks in the components layer make a request to the facade layer (e.g. “refresh list of users” or “update the contents of this article with this data”).

The facade layer passes the request to the correct service.

In the services layer, the request gets processed, the needed data gets modified and all the directives get the new data (because it was already bound to the directives). This works thanks to the digest cycle. Most events that happen are going to trigger a digest cycle which will then update the views. If you’ve got an event that doesn’t trigger the digest cycle (like a slider’s slide event), you can trigger a digest cycle manually.

Keep it flowing

As you can see, there’s only one place in your application that modifies your data (or a part of it). That same place provides that data and is the only part where something wrong with the data can happen which makes it much easier to debug.

Managing UI state using data

A larger Angular application is probably going to feature various states in which it can find itself. Clicking on a toggle can lead to the change of a tab, selection of a product and highlighting of a row in a table, all at the same time. Doing that on the DOM level (like jQuery manipulation) would be a bad idea because you lose the connection between your data and view.

Since we’ve already established a nice architecture, let’s use it to manage our UI state. You’d create a UIStateService in the services layer. That service would hold all relevant UI data and modify it if needed. Like already explained, that service would provide that data but also be in charge of modifying it. The facade layer would then delegate all needed changes to the correct service(s).

It’s important to note that a UIStateService might not be needed. Since views depend on data, most of the time it’s possible to just use that data and control the state of the views. A separate state service makes sense when you have to manage UI state that’s completely separated from your model.

Transpiling code

There are many benefits to transpiling from another language to JavaScript. A few obvious ones are:

  • using features that are coming in newer versions of ECMAScript,
  • abstraction of JavaScript quirks,
  • compile time errors,
  • better tooling…

You can transpile from future versions of ECMAScript with Babel or even add typing support with TypeScript or Flow. You can’t go wrong with either of these choices because, at the end of the day, you get usable JavaScript. If any of the tools no longer exist, you can continue working with the generated JavaScript.

TypeScript

Seeing as the Angular team has teamed up with Microsoft and is basing Angular 2 on TypeScript, it is safe to assume that the support for that stack is going to be really good. It therefore makes sense to get acquainted with TypeScript.

Aside from offering type safety, TypeScript has really good tooling support with editors like Sublime, Visual Studio Code or WebStorm which all offer autocompletion, inline documentation, refactoring, etc. Most of them also have a built-in TypeScript compiler so you can find compile-time errors while coding. The great autocompletion and inline documentation is possible because of type definition files. You would typically get a type definition file, put it in your project and reference it — the mentioned features work then out of the box. Visit DefinitelyTyped to see which libraries and frameworks are supported (hint: odds are, you’re going to find every library or framework you use there) and then use tsd to easily install them from the CLI.

The team at Angular is proposing a concept where libraries directly include the type definition files. The benefits of that approach are two-fold: there’s no need to search for type definition files and the type definition file you get with a version of a library always corresponds to the API of that version.

To get a quick look at all the benefits of developing with TypeScript, you can watch this video from Angular Connect.

A switch to TypeScript is mostly painless because valid JavaScript code is valid TypeScript code. Just change the file extensions to .ts, put a TypeScript compiler in your build process and you’re good to go.

Speaking of build process…

Having a build process in place

You do have a build process in place, don’t you?

If not, pick Grunt, Gulp, Webpack or whichever build/packaging tool you’d like to work with and get going. The repository accompanying this article uses Gulp, so you can get an idea how the code gets transpiled, packed for the web and tested. I won’t go into details on build tools because there are many articles out there detailing them.


Testing

You should test all parts of your application.

I see quite often that people leave out testing HTML templates because they’ve got integration tests. Unfortunately, Angular won’t let you know if you’ve got a typo somewhere in your template and integration tests can get big and slow very fast while still not covering enough ground (not to mention the time needed to maintain them).

The point is — with a good architecture in place, testing is easy because you only test code you’ve written and mock away all dependencies. Angular’s dependency injection plays a big role as well and testing with Angular is straightforward.

A combination of Karma as test runner and Jasmine as testing framework is probably going to be enough for all of your test cases. Testing in your build process (between transpiling and packaging) is also going to make sure you’re not introducing regression bugs.

Testing directives means separately testing the directive definition with its accompanying template and controllers.
Controllers are easy to test because they just get instantiated with all of their dependencies mocked away and you can get straight to testing their insides. Most of the time, you’ll just be testing whether your controllers delegated to the correct service in the facade layer.
Instantiating directives and mocking away their controller is also easy because the controller is present on the compiled element after Angular’s compilation. To test what’s happening in a template, change the controller or scope mock and run a digest cycle. The new values should then be present.

Testing services in the facade or services layer is just as easy because you can mock away every dependency and really test only the code that’s present.

That’s also the main take-away here — test code that’s present in the component you’re testing. Tests should fail if you modify the public methods of a component, but only tests that are associated with that component and not half of all your tests. If writing tests is hard, you’re either testing too much (and not mocking away enough) or having a problem with the architecture of your application.

Real world example


Heroes of Warcraft is a trademark and Hearthstone is a trademark or registered trademark of Blizzard Entertainment, Inc., in the U.S. and/or other countries.

As part of this article, you can check out and play with a demo application here.

It’s a deck management application for card games. Games like Hearthstone, Magic the Gathering and similar have players building decks from an ever-growing collection of cards and battle against each other. You can create and manage decks with a pre-built array of custom made cards taken from HearthCards.

Source repository

What we’ll discuss here is the repository from which the demo application was built and you can find that repository here. The idea behind this repository is to give you a working application that explores the ideas discussed in this article and a nice cheat sheet when you’re not sure how to implement a feature in Angular using TypeScript.

To get started, clone the repository and follow the README. That’s going to start up your server and serve the compiled Angular modules.

For easier work later, I recommend starting a watcher in each vertical by running gulp watch. Now, each time you modify a file inside of a vertical, Gulp is going to compile and test your changes.

Vertical separation

The application is divided into three verticals: common, deckmanager and deckbuilder. Each of these verticals is an Angular module. The common module is a utility module and gets injected into other modules.

Horizontal separation

All verticals feature a similar structure which follows what we’ve already discussed in this article. You’ll find the directories components and services: the components directory contains directives, controllers and templates, making it the components layer, while the services directory holds the facade and services layers.

Let’s explore the layers.

Services layer

The deckmanager vertical is a good candidate because it features a data managing service and a UI state managing service. Each of these services has its own model consisting of objects that they’ll manage and provide.

DataService, furthermore, gets the LocalStorageService from the common module. This is where separation of concerns pays off — the data (decks and the cards in the decks) is stored in local storage. Because our layers are decoupled, it’s easy to replace that storage service with something completely different.

If you take a look at the DataService in the deckbuilder vertical, you’ll see that we’re also injecting a PageValueExtractorService. That service allows us to have pre-populated data in HTML that gets parsed and used right away. This is a powerful technique that can make application startup much faster. Once again, it’s easy to see how trivial it is to combine data storage strategies and, if we decide to change the concept completely, our components won’t notice it. They just care about getting the right data, not how it got there.

Facade layer

Let’s look at the facade layer and see how it works in practice.

// ... imports

export default class FacadeService implements IFacadeService {
    private dataService:IDataService;
    private uiStateService:IUIStateService;

    constructor(dataService:IDataService, uiStateService:IUIStateService) {
        this.dataService = dataService;
        this.uiStateService = uiStateService;
    }

    public getDecks():IDeck[] {
        return this.dataService.getDecks();
    }

    public createNewDeck(name:string):void {
        this.dataService.createNewDeck(name);
        this.uiStateService.setShowNewDeckForm(false);
    }

    // ... rest of service
}

FacadeService.$inject = ['DataService', 'UIStateService'];

The FacadeService gets the DataService and UIStateService by injection and can then further delegate logic between the other two layers.

If you look at the createNewDeck() method, you can see that the FacadeService isn’t necessarily just a delegation class. It can also decide simple things. The main idea is that we want a layer between components and services so that they don’t know anything about each other’s implementation.

Components layer

The structure of components includes the directive definition, a template and a controller. The template and controller are optional but, more often than not, they’re going to be present.

You can notice that the components are, for lack of a better word, dumb. They get their data and request modifications from the facade layer. Such a structure yields two big wins: less complexity and easier testing.

Take a look at a controller:

// ... imports

export default class DeckController {
    private facadeService:IFacadeService;

    constructor(facadeService:IFacadeService) {
        this.facadeService = facadeService;
    }

    public getDecks():IDeck[] {
        return this.facadeService.getDecks();
    }
    
    public addDeck():void {
        this.facadeService.setShowNewDeckForm(true);
    }
    
    public editDeck(deck:IDeck):void {
        this.facadeService.editDeck(deck);
    }
    
    public deleteDeck(deck:IDeck):void {
        this.facadeService.deleteDeck(deck);
    }
}

DeckController.$inject = ['FacadeService'];

A quick glance makes it obvious that this component provides CRUD functionalities for our game decks and that it’s going to be really easy to test this class.

Data flow

As discussed in the article, the data flow is going to feature components using the facade layer which is going to delegate those requests to the correct services and deliver results.

Because of the digest cycle, every modification is going to also update the values in the components.

To clarify, consider the following image:


This image shows the data flow when a user clicks on a card in the Deck Builder. Even before the user interacts with the card gallery, the application has to read the contents of the current deck and all cards supported in the application. So, the first step is the initial pull of data that happens from the components through the facade to the services.

After a user clicks on a card the facade layer gets notified that a user action needs to be delegated. The services layer gets notified and does the needed actions (updating the model, persisting the changes, etc.).

Because a user click using ngClick triggers a digest cycle, the views are going to get updated with fresh data just like it happened in the first step.

Under consideration

The application is tested and features a simple build process. I’m not going to dive deep into these topics because the article is big enough as is, but they are self-explanatory.

The build process consists of a main Gulp configuration file and little configuration files for each vertical. The main Gulp file uses the vertical files to build each vertical. The files are also heavily annotated and shouldn’t be a problem to follow.

The tests try to be limited just to files that they’re concerned with and mock everything else away.

What now?

The application has lots of places where it could be improved upon:

  • additional filtering of cards by cost, hit points, attack points or card rarity
  • sorting by all possible criteria,
  • adding Bootstrap’s Affix to the chosen cards in the deck builder
  • developing a better Local Storage service which has much better object checking and casting
  • further improving the Page Value Extractor service to allow for metadata being included in the JSON for better type association
  • etc.

If you check the source code of the application, you’ll notice that there are comments marked with TODO. It’s possible to track these comments in IDEs and text editors (WebStorm and Visual Studio Code do it out of the box, Sublime has several plugins that support it). I’ve included several TODOs that range from new features to improvements and you’re very welcome to fix them and learn a few things along the way.

The devil is in the detail

The points discussed in this article mostly deal with big picture stuff.

If you want to find out about implementation details that can creep up while developing an Angular application, watch this entertaining video from Angular Connect about the usual errors in Angular applications.

Another great resource is this blog post by a developer who re-built the checkout flow at PayPal with Angular.


Back to the drawing board

We have a working application and an idea on how to structure our applications. It’s time to go back to the drawing board now and see if this can really be considered a win.

Consider the demo (tutorial) application that’s featured at the official Angular 2 page — John Papa’s Tour of Heroes. I’ve linked directly to the sources so you can click through the various parts of the application source code. What you’ll notice right away is how similar it feels to the application that’s part of this article. Also, you’ll notice that the take-aways from this article can easily be applied to this application as well — just take the logic out of the components and add layers for a better data flow.

The biggest advantage of developing a well-structured Angular application with TypeScript is the future-proofing that you get. Angular 2 is shaping up to be a great framework and easier to use than Angular 1 with lots of sugar (like annotating components).

Why not, then, upgrade our knowledge for things to come?

Machine Learning with Spark: Kaggle’s Driver Telematics Competition

Do you want to learn how to apply high-performance distributed computing to real-world machine learning problems? Then this article on how we used Apache Spark to participate in an exciting Kaggle competition might be of interest.

The Lab

At comSysto we regularly engage in labs, where we assess emerging technologies and share our experiences afterwards. While planning our next lab, kaggle.com came out with an interesting data science challenge:

AXA has provided a dataset of over 50,000 anonymized driver trips. The intent of this competition is to develop an algorithmic signature of driving type. Does a driver drive long trips? Short trips? Highway trips? Back roads? Do they accelerate hard from stops? Do they take turns at high speed? The answers to these questions combine to form an aggregate profile that potentially makes each driver unique.1

We signed up for the competition to take our chances and to get more hands on experience with Spark. For more information on how Kaggle works check out their data science competitions.

This first post describes our approach to explore the data set, the feature extraction process we used and how we identified drivers given the features. We were mostly using APIs and Libraries provided by Spark. Spark is a “fast and general computation engine for large scale data processing” that provides APIs for Python, Scala, Java and most recently R, as well as an interactive REPL (spark-shell). What makes Spark attractive is the proposition of a “unified stack” that covers multiple processing models on local machine or a cluster: Batch processing, streaming data, machine learning, graph processing, SQL queries and interactive ad-hoc analysis.

For computations on the entire data set we used a comSysto cluster with 3 nodes at 8 cores (i7) and 16GB RAM each, providing us with 24 cores and 48GB RAM in total. The cluster is running the MapR Hadoop distribution with MapR provided Spark libraries. The main advantage of this setup is a high-performance file system (mapr-fs) which also offers regular NFS access. For more details on the technical insights and challenges stay tuned for the second part of this post.

Telematic Data

Let’s look at the data provided for the competition. We first expected the data to contain different features regarding drivers and their trips, but the raw data only contained pairs of anonymized coordinates (x, y) of a trip: e.g. (1.3, 4.4), (2.1, 4.8), (2.9, 5.2), … The trips were re-centered to the same origin (0, 0) and randomly rotated around the origin (see Figure 1).

Figure 1: Anonymized driver data from Kaggle’s Driver Telematic competition1

At this point our enthusiasm got a little setback: How should we identify a driver simply by looking at anonymized trip coordinates?

Defining a Telematic Fingerprint

It seemed that if we wanted useful and significant machine learning data, we would have to derive it ourselves using the provided raw data. Our first approach was to establish a “telematic fingerprint” for each driver. This fingerprint was composed of a list of features that we found meaningful and distinguishing. In order to get the driver’s fingerprint we used the following features:

Distance: The summation of all the euclidean distances between every two consecutive coordinates.

Absolute Distance: The euclidean distance between the first and last point.

Trip’s total time stopped: The total time that the driver has stopped.

Trip’s total time: The total number of entries for a certain trip (if we assume that every trip’s records are recorded every second, the number of entries in a trip would equal the duration of that trip in seconds)

Speed: For calculating the speed at a certain point, we calculated the euclidean distance between one coordinate and the previous one. Assuming that the coordinate units are meters and that the entries are recorded at a frequency of one per second, this result is given in m/s. The absolute value is largely irrelevant, though, since we do not perform any semantic analysis on it and only compare it with other drivers/trips. For the speed we stored the percentiles 10, 25, 50, 80 and 98. We did the same for acceleration, deceleration and centripetal acceleration.

Acceleration: We set the acceleration to the difference between the speed at one coordinate and the speed at the previous one (when we are increasing speed).

Deceleration: We set the deceleration to the difference between the speed at one coordinate and the speed at the previous one (when we are decreasing speed).

Centripetal acceleration: We used the formula a = v² / r, where v is the speed and r is the radius of the circle that the turning curve path would form. We already have the speed at every point, so the only thing that is missing is the radius. For calculating the radius we take the current, previous and subsequent points (coordinates). This feature is an indicator of “aggressiveness” in driving style: a high average centripetal acceleration indicates turning at higher speeds.
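
To make the feature extraction a bit more concrete, here is a plain-Java sketch (not our actual Spark code) of the speed part of the fingerprint: euclidean distances between consecutive points plus a simple percentile helper. The same pattern is reused for acceleration, deceleration and centripetal acceleration.

import java.util.Arrays;

// Sketch: assumes coordinates are in meters and sampled once per second.
public class SpeedFeatures {

    /** Speed at point i is the euclidean distance to the previous point (m/s at 1 Hz sampling). */
    public static double[] speeds(double[] x, double[] y) {
        double[] speeds = new double[x.length - 1];
        for (int i = 1; i < x.length; i++) {
            double dx = x[i] - x[i - 1];
            double dy = y[i] - y[i - 1];
            speeds[i - 1] = Math.sqrt(dx * dx + dy * dy);
        }
        return speeds;
    }

    /** Simple nearest-rank percentile, sufficient for comparing drivers with each other. */
    public static double percentile(double[] values, double p) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int index = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
        return sorted[Math.max(0, index)];
    }
}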

From all derived features we computed a driver profile (“telematic fingerprint”) over all trips of that driver. From experience we know that the average speed varies between driving in the city compared to driving on the highway. Therefore the average speed over all trips for a driver is maybe not revealing too much. For better results we would need to map trip features such as average speed or maximum speed to different trip types like inner city trips, long distance highway trips, rural road trips, etc. 

Data Statistics: Around 2700 drivers with 200 trips each, resulting in about 540,000 trips. All trips together contain 360 million X/Y coordinates, which means – as they are tracked per second – we have 100,000 hours of trip data.

Machine Learning

After the initial data preparation and feature extraction we could turn towards selecting and testing machine learning models for driver prediction.

Clustering

The first task was to categorize the trips: we decided to use an automated clustering algorithm (k-means) to build categories which should reflect the different trip types. The categories were derived from all trips of all drivers, which means they are not specific to a certain driver. A first look at the extracted features and computed categories revealed that some of the categories are indeed dependent on the trip length, which is an indicator for the trip type. From the cross validation results we decided to use 8 categories for our final computations. The computed cluster IDs were added to the features of every trip and used for further analysis.
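
A minimal sketch of this clustering step using Spark MLlib’s k-means from the Java API could look as follows; the file path and the assumption that the features CSV contains only numeric columns are illustrative, not our exact pipeline code:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.clustering.KMeans;
import org.apache.spark.mllib.clustering.KMeansModel;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;

// Sketch: cluster trip feature vectors into 8 categories with MLlib k-means.
public class TripClusteringSketch {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("trip-clustering"));

        // Each line of the features CSV is assumed to hold only the numeric features of one trip.
        JavaRDD<Vector> features = sc.textFile("maprfs:///telematics/features.csv")
                .map(line -> {
                    String[] cols = line.split(",");
                    double[] values = new double[cols.length];
                    for (int i = 0; i < cols.length; i++) {
                        values[i] = Double.parseDouble(cols[i]);
                    }
                    return Vectors.dense(values);
                });
        features.cache();

        int numClusters = 8;     // chosen based on our cross-validation results, see text
        int maxIterations = 20;
        KMeansModel model = KMeans.train(features.rdd(), numClusters, maxIterations);

        // The cluster id is what gets appended to each trip's feature row for the later steps.
        JavaRDD<Integer> clusterIds = features.map(model::predict);
        clusterIds.take(5).forEach(System.out::println);

        sc.stop();
    }
}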

Prediction

For the driver prediction we used a Random Forest algorithm to train a model for each driver, which can predict the probability that a given trip (identified by its features) belongs to that driver. The first task was to build a training set. This was done by taking all (around 200) trips of a driver and labeling them with “1” (match), and then randomly choosing about the same number of trips of other drivers and labeling them with “0” (no match). This training set is then fed into the Random Forest training algorithm, which results in a Random Forest model per driver. Afterwards the model was used for cross-validation (i.e. evaluating the error rate on an unseen test data set) and to compute the submission for the Kaggle competition. Based on the cross-validation results we decided to use 10 trees and a maximum tree depth of 12 for the Random Forest model (having 23 features).
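
As a sketch, training such a per-driver model with MLlib’s Random Forest implementation could look like this; the helper name and the maxBins/seed values are illustrative assumptions, while the 10 trees and the maximum depth of 12 are the values mentioned above:

import java.util.HashMap;
import java.util.Map;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.tree.RandomForest;
import org.apache.spark.mllib.tree.model.RandomForestModel;

// Sketch: train one Random Forest per driver on labeled trips.
// A labeled trip is built as new LabeledPoint(1.0, Vectors.dense(featureValues)) for the
// driver's own trips and with label 0.0 for the randomly sampled trips of other drivers.
public class DriverModelSketch {

    public static RandomForestModel trainForDriver(JavaRDD<LabeledPoint> labeledTrips) {
        int numClasses = 2;
        Map<Integer, Integer> categoricalFeatures = new HashMap<>(); // all features treated as continuous
        int numTrees = 10;   // from our cross-validation
        int maxDepth = 12;   // from our cross-validation
        int maxBins = 32;    // illustrative default
        int seed = 42;       // illustrative

        return RandomForest.trainClassifier(labeledTrips, numClasses, categoricalFeatures,
                numTrees, "auto", "gini", maxDepth, maxBins, seed);
    }
}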

An interesting comparison between the different ensemble learning algorithms for prediction (Random Forest and Gradient-Boosted Trees (GBTs) from Spark’s Machine Learning Library, MLlib) can be found on the Databricks Blog.

Pipeline

Our workflow is split into several self-contained steps implemented as small Java applications that can be submitted directly to Spark via the “spark-submit” command. We used Hadoop Sequence files and CSV files for input and output. The steps are as follows:


Figure 2: ML pipeline for predicting drivers

Converting the raw input files: We are faced with about 550,000 small CSV files, each containing a single trip of one driver. Loading all these files for each run of our model would be a major performance issue, so we converted all input files into a single Hadoop Sequence file, which is served from the mapr-fs file system.
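
A minimal sketch of this conversion step (the maprfs paths and the key format are illustrative assumptions, not our exact job) could look like this:

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

// Sketch: bundle the many small trip CSVs into one Hadoop Sequence file,
// keyed by the original file path with the file content as value.
public class SequenceFileConverterSketch {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("csv-to-seqfile"));

        // wholeTextFiles yields (path, fileContent) pairs, one per small CSV file.
        sc.wholeTextFiles("maprfs:///telematics/drivers/*/*.csv")
          .mapToPair(pathAndContent ->
                  new Tuple2<>(new Text(pathAndContent._1()), new Text(pathAndContent._2())))
          .saveAsHadoopFile("maprfs:///telematics/trips.seq",
                  Text.class, Text.class, SequenceFileOutputFormat.class);

        sc.stop();
    }
}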

Extracting the features and computing statistics: We load the trip data from the sequence file, compute all the features described above as well as statistics such as variance and mean of features using the Spark RDD transformation API and write the results to a CSV file.

Computing the clusters: We load the trip features and statistics and use the Spark MLlib API to compute the clusters that categorize the trips using k-means. The features CSV is enriched with the clusterID for each trip.

Random Forest Training: For the actual model training we load the features for each trip together with some configuration values for the model parameters (e.g. maxDepth, crossValidation) and start a Random Forest model training for each driver with labeled training data and optional test data for cross-validation analysis. We serialize each Random Forest model to disk using Java serialization. In its current version, Spark provides native saving and loading of model instances as well as the option to configure alternative serialization strategies.
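
The serialization step itself is straightforward; a short sketch with illustrative names:

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

import org.apache.spark.mllib.tree.model.RandomForestModel;

// Sketch: persist a trained per-driver model with plain Java serialization.
// (Newer Spark versions also offer model.save(sc, path) / RandomForestModel.load(sc, path).)
public class ModelPersistenceSketch {
    public static void save(RandomForestModel model, String file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(model); // RandomForestModel is Serializable
        }
    }
}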

For the actual Kaggle submission we simply load the serialized models, predict the likelihood of each trip belonging to the respective driver and save the result in the required CSV format.
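
A sketch of this scoring step: MLlib’s RandomForestModel.predict() returns the majority-vote class rather than a probability, so one way to obtain a likelihood is to average the votes of the individual trees. The helper names and the driverId_tripId line layout are our own illustration of the submission format:

import java.io.FileInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.util.Arrays;

import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.tree.model.RandomForestModel;

// Sketch: load a serialized per-driver model and score a trip against it.
public class SubmissionSketch {

    public static RandomForestModel load(String file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (RandomForestModel) in.readObject();
        }
    }

    // Fraction of trees voting "match" serves as the likelihood in [0, 1].
    public static double likelihood(RandomForestModel model, Vector tripFeatures) {
        return Arrays.stream(model.trees())
                .mapToDouble(tree -> tree.predict(tripFeatures))
                .average()
                .orElse(0.0);
    }

    // One submission line per trip, e.g. "driverId_tripId,0.87".
    public static String submissionLine(String driverId, String tripId, double prob) {
        return driverId + "_" + tripId + "," + prob;
    }
}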

Results and Conclusions

This blog post describes our approach and methodology for solving the Kaggle Driver Telematics competition with Apache Spark. Our prediction model based on Random Forest decision trees was able to predict the driver with an accuracy of around 74 percent, which placed us at position 670 on the Kaggle leaderboard at the time of submission. Not bad for two days of work; however, we identified many possible improvements during the lab.

To learn more about the implementation details, technical challenges and lessons learned regarding Spark stay tuned for the second part of this post.

Do you want to shape a fundamental change in dealing with data in Germany? Then join our Big Data Community Alliance!

Sources:
1. https://www.kaggle.com/c/axa-driver-telematics-analysis

Introduction To The E-Commerce Backend commercetools platform

This blog post is an introduction to the e-commerce backend commercetools platform, a Platform as a Service (PaaS) offering by commercetools GmbH from Munich, and gives some ideas on how to use it.

First, the facts about commercetools and the commercetools platform:

commercetools GmbH is a Munich-based company situated in the north of the city near the Olympia Park, with further offices in Berlin and New York. The commercetools platform is a backend for all kinds of e-commerce use cases, including online pure players, mobile and point-of-sale applications, couch commerce and marketplaces. commercetools began developing its platform in 2006 and has never stopped since.
I will first give an overview of the platform’s UI with examples of how to use it, and then discuss the REST API it provides for accessing the data of an imaginary online shop.

User interface of commercetools platform

The sign-up process is fairly easy and completed in about 5 minutes. You create an account and associate a project with it. One account can hold several projects, and you can invite several accounts to one project. You will be asked whether you want to include test data in the project, which is advisable for your first project.


Dashboard commercetools platform

The self-explanatory UI provides access to all needed functionality, from Products to Orders to Settings and developer content. The first thing you will see is the dashboard, which shows revenue statistics for any given time period.

I will guide you through the account in the order in which a project should be set up:

  • Creating Product Types:
    At first you have to understand the difference between product types and categories. Product types are used to describe common characteristics and most importantly, common custom attributes, whereas categories are used to organize products in a hierarchical structure.


    Creating a product type

    Look at the product type “drink” I created. I added two attributes: alcohol as a boolean and volume as a number. Now every product created with this product type has to have these two attributes in addition to all the other attributes I will show you later.

  • Creating Categories:
    As mentioned, categories are used to organize the products in your project. This should be nothing spectacularly new.


    Creating categories

    I decided to use a root category containing all other categories as subcategories, to make my life easier later when retrieving the categories for the online shop. A category has just a name, a description, parents and children.

  • Creating Products:
    Now to the important part of the setup: the products themselves. When creating a product you have to choose one of the previously created product types. Note that a product can only be of one product type.


    Creating a product

    After entering the name, the description, the custom attributes and a few other fields, the product is created. You can now upload pictures, add categories, create product variants (for example for different colors), add prices and even define SEO attributes.

  • Everything else via API:
    Creating customers and orders is possible in the UI but is, in my opinion, more practical via API calls. This will be explained in the next part of this post.

REST API of commercetools platform

There are a lot of SDKs in different languages like Java, PHP and Node.js for accessing the API (check out the git repository), but I decided to code directly against the REST API. The API is fully documented here. I wrote a one-page app with AngularJS and used the Angular $http service for my API calls, which I will show you in this part of my post. Data is transported in both directions in JSON format, which allows fast and reliable handling.

Authorization

A client has to obtain an access token via an OAuth2 service. There are several access scopes, such as view_products, manage_orders and view_customers, which allow different kinds of interaction with the platform. Normally you would have to implement a small server which handles authentication and authorization. Otherwise the token would have to be stored on the client side, which is not safe, since with the manage_orders token a client can manage not only his own orders but all orders of the project. I ignored that for my test application and concentrated on the REST API.
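
To illustrate what such a small server-side component could look like, here is a minimal Java sketch of the OAuth2 client-credentials flow. The auth host, the scope string and the project key are assumptions based on my test project and the API documentation; treat it as a sketch, not a reference implementation:

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Scanner;

// Minimal sketch of the client-credentials flow. The client id/secret stay on the server;
// the browser only ever sees the short-lived access token (or proxied API calls).
public class TokenClientSketch {
    public static String fetchTokenJson(String clientId, String clientSecret, String projectKey)
            throws Exception {
        URL url = new URL("https://auth.sphere.io/oauth/token"); // assumed auth host
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);

        // HTTP Basic auth with the API client's credentials.
        String basic = Base64.getEncoder()
                .encodeToString((clientId + ":" + clientSecret).getBytes(StandardCharsets.UTF_8));
        conn.setRequestProperty("Authorization", "Basic " + basic);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");

        String body = "grant_type=client_credentials&scope=manage_project:" + projectKey;
        conn.getOutputStream().write(body.getBytes(StandardCharsets.UTF_8));

        // The response is a JSON document containing access_token, token_type and expires_in.
        try (InputStream in = conn.getInputStream(); Scanner s = new Scanner(in, "UTF-8")) {
            return s.useDelimiter("\\A").next();
        }
    }
}

The returned access_token is then sent as a Bearer token in the Authorization header of every API call.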

Getting Products

To obtain the products from the platform I used Angular’s http service:

function loadProducts(){
    $http.get('https://api.sphere.io/testshop-rw/product-projections?current=true')
        .success(function(data){$scope.loadProductsResponse = data;
                               handleLoadProductsResponse($scope.loadProductsResponse);
                                })
}

In response to this request you receive a list of products with all the parameters you could possibly need. Notably, the server’s response time never exceeded 200 ms.

Carts, Customers and Orders

The most important task for an online shop is the handling of customers and their carts and orders. My test implementation creates an anonymous cart for every new user that enters the website:

// Create an anonymous cart if this browser does not have one yet.
// Note: getItem() returns null for missing keys; bracket access would return undefined.
if(localStorage.getItem('cartId') === null){
    $http.post('https://api.sphere.io/testshop-rw/carts', {'currency':'EUR'/*,'customerId':localStorage['customerId']*/})
          .success(function(data){localStorage['cartId'] = data.id;})
}

As you can see, I use the localStorage feature to store data. That way the customer can come back later or refresh the website without losing previously obtained data. Once a customer logs in, the cart will be merged into the existing cart of the customer.

Registration for a customer is as simple as this:

function signUp(emailAddress, password, lastName, firstName, streetName, streetNumber, routingCode, city){
    // Build the customer draft; the anonymous cart is handed over via anonymousCartId
    // so it gets merged into the new customer's cart on sign-up.
    $scope.registerCustomer = {
        email: emailAddress,
        firstName: firstName,
        lastName: lastName,
        password: password,
        anonymousCartId: localStorage['cartId'],
        addresses: [{
            email: emailAddress,
            firstName: firstName,
            lastName: lastName,
            streetName: streetName,
            streetNumber: streetNumber,
            postalCode: routingCode,
            city: city,
            country: 'DE'
        }]
    };
    // $http serializes the request object to JSON automatically,
    // so no explicit angular.toJson() call is needed here.
    $http.post('https://api.sphere.io/testshop-rw/customers', $scope.registerCustomer)
        .success(function(data){
            $scope.signUpResponse = data;
            signUpSuccess($scope.signUpResponse);
        })
        .error(function(data){
            $scope.signUpResponse = data;
            handleError($scope.signUpResponse);
        });
}

The customer can add several addresses, including shipping and billing addresses, which allows one of them to be selected at checkout.

An order is created from a cart or an anonymous cart via POST:

function cartToOrder(updateCartResponse){
    // An order is created by POSTing the cart's id and current version.
    $scope.makeOrder = {
        id: updateCartResponse.id,
        version: updateCartResponse.version
    };
    // Again, $http takes care of the JSON serialization.
    $http.post('https://api.sphere.io/testshop-rw/orders', $scope.makeOrder)
        .success(function(data){
            $scope.cartToOrderResponse = data;
            orderSuccess($scope.cartToOrderResponse);
        });
}

The process a customer goes through until a product is ordered is fairly simple and only uses a few API calls.

Search

The commercetools platform gives you fast, built-in search and filtering capabilities. Using NoSQL technology, the API allows you to create comprehensive product searches, after-search navigation and configuration. In addition, every change made to the product catalog is automatically indexed.
With the built-in facet technology you can enhance customer experience and usability with extended search and navigation capabilities, so customers can find products faster – especially if you have a comprehensive and complex catalog.

The operator’s point of view

As the company operating the online shop you have a pretty easy job, too. All products can be uploaded and updated via CSV files, which allows you to manipulate all products at once instead of one after the other. There are a few different payment statuses which can be assigned to orders via the payment state.


Plug-in integrations

Orders can be downloaded as CSV or XML to feed them into your inventory control system and to your logistics provider.

Unfortunately, there are no plug-in payment methods yet, but there is a silver lining: commercetools is working on that right now. The same goes for the direct integration of Hippo CMS, which would allow you to manage all content via Hippo.
Other than that, there are several ways to integrate the commercetools platform into your existing IT landscape (see graphic).

For more information on the commercetools platform, here are a few links which might be useful:

All in all, I enjoyed working with commercetools because of the complete API documentation, the fast and very helpful support, and the very fast and easily accessible API. Just sign up for a free trial and see for yourself.

If you want to learn more about AngularJS, register now for our Training and get Early Bird Tickets.