Tag Archives: Web Application

Background of collaborative filtering with Mahout

In order to set up Apache Mahout, a Java library for scalable machine learning algorithms based on Hadoop, in the architecture of Mario’s fabulous online shop for pizza, pasta and co. (see blog post Building an Online-Recommendation Engine with MongoDB and Mahout), we’d like to know which recommendation strategy is best for our (so far fictional) use case: computing recommendations for 32 products and 101 users in real time. With this small amount of data we could also use other tools, e.g. Weka, but in an actual online shop the volume of data would be much larger than what we simulate here, which is why we chose Apache Mahout. Before we dive into coding details, let’s have a look at what Mahout’s collaborative filtering actually does.

Collaborative Filtering

In order to be able to transfer the recommendation logic to use cases of other businesses, we opt for collaborative filtering, a technique that produces recommendations solely from users’ preferences for products (instead of also including product features and/or user properties). Collaborative filtering can be user- or item-based. User-based recommendation promotes products to a user that were bought by users who are similar to her.

User-based Recommendation: recommend products to a user based on what similar users have bought
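To make the user-based idea concrete, here is a minimal, self-contained Java sketch. It is not Mahout’s implementation (in Mahout’s Taste API this role is played by classes such as GenericUserBasedRecommender combined with a user neighborhood). It assumes that preferences are purchase counts stored as userId → (itemId → count), uses the plain cosine of the count vectors as user similarity, and ignores any neighborhood size limit. The users and products are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal user-based collaborative filtering sketch, NOT Mahout's implementation.
// Assumptions: preferences are purchase counts (userId -> itemId -> count) and
// user similarity is the cosine of the users' count vectors.
final class UserBasedSketch {

    // Cosine of the angle between two sparse preference vectors (0 if either is empty).
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) dot += e.getValue() * other;
        }
        for (int v : b.values()) normB += v * v;
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    // Scores every item the target user has not bought yet by summing the
    // purchase counts of the other users, weighted by their similarity to her.
    static List<String> recommend(Map<String, Map<String, Integer>> prefs, String user, int howMany) {
        Map<String, Integer> mine = prefs.get(user);
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> other : prefs.entrySet()) {
            if (other.getKey().equals(user)) continue;
            double sim = cosine(mine, other.getValue());
            if (sim <= 0) continue; // users without any common purchase contribute nothing
            for (Map.Entry<String, Integer> item : other.getValue().entrySet()) {
                if (!mine.containsKey(item.getKey())) {
                    scores.merge(item.getKey(), sim * item.getValue(), Double::sum);
                }
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((x, y) -> Double.compare(scores.get(y), scores.get(x)));
        return new ArrayList<>(ranked.subList(0, Math.min(howMany, ranked.size())));
    }

    public static void main(String[] args) {
        // Hypothetical purchase counts.
        Map<String, Map<String, Integer>> prefs = Map.of(
            "anna", Map.of("pizza", 3, "pasta", 1),
            "ben", Map.of("pizza", 2, "pasta", 1, "tiramisu", 2),
            "carl", Map.of("salad", 4));
        System.out.println(recommend(prefs, "anna", 2)); // [tiramisu]
    }
}
```

With this data, Ben shares Anna’s pizza and pasta purchases and also bought tiramisu, so tiramisu is recommended; Carl has nothing in common with Anna and is ignored.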

Item-based recommendation proposes products that are similar to the ones the user has already bought.

Item-based Recommendation: recommend products to a user that are similar to the ones she already bought
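The item-based variant can be sketched the same way, again with assumed data structures (userId → (itemId → purchase count)) and with the cosine between two items’ columns standing in for whichever similarity measure is eventually chosen. Mahout’s counterpart is, to our knowledge, GenericItemBasedRecommender; the code below is written from scratch for illustration. Items the user has not bought are scored by their similarity to the items she did buy, weighted by her purchase counts.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Minimal item-based collaborative filtering sketch, NOT Mahout's implementation.
// Assumptions: preferences are purchase counts (userId -> itemId -> count) and
// item similarity is the cosine between two items' columns of that matrix.
final class ItemBasedSketch {

    // Cosine between the purchase-count columns of items i and j across all users.
    static double itemCosine(Map<String, Map<String, Integer>> prefs, String i, String j) {
        double dot = 0, normI = 0, normJ = 0;
        for (Map<String, Integer> row : prefs.values()) {
            int a = row.getOrDefault(i, 0);
            int b = row.getOrDefault(j, 0);
            dot += a * b;
            normI += a * a;
            normJ += b * b;
        }
        return (normI == 0 || normJ == 0) ? 0 : dot / Math.sqrt(normI * normJ);
    }

    // Scores each item the user has not bought by its similarity to the items
    // she did buy, weighted by how often she bought each of them.
    static List<String> recommend(Map<String, Map<String, Integer>> prefs, String user, int howMany) {
        Map<String, Integer> mine = prefs.get(user);
        Set<String> allItems = new TreeSet<>();
        prefs.values().forEach(row -> allItems.addAll(row.keySet()));
        Map<String, Double> scores = new HashMap<>();
        for (String candidate : allItems) {
            if (mine.containsKey(candidate)) continue;
            double score = 0;
            for (Map.Entry<String, Integer> owned : mine.entrySet()) {
                score += owned.getValue() * itemCosine(prefs, owned.getKey(), candidate);
            }
            if (score > 0) scores.put(candidate, score);
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((x, y) -> Double.compare(scores.get(y), scores.get(x)));
        return new ArrayList<>(ranked.subList(0, Math.min(howMany, ranked.size())));
    }
}
```

A design consideration often cited for item-based recommenders is that item-item similarities tend to change more slowly than user neighborhoods, which makes them easier to precompute; whether that matters for Mario’s shop is exactly what the planned evaluation should show.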

User-Item Preferences and Similarity

Alright, but what does “similar” mean in this context? In collaborative filtering, the similarity between users (for user-based recommendations) or between items (for item-based recommendations) is computed from the user-item preferences only. We use the number of times a user bought a product as a proxy for the user’s preference. It’s not a perfect proxy, but it does the trick and it’s easy to gather. One could also use the number of clicks or views, or a combination of these.
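A minimal sketch of how such purchase-count preferences could be derived from a raw order log. The Purchase record and the numeric ids are hypothetical (the record syntax needs Java 16 or later). The comma-separated userID,itemID,preference lines are, as far as we know, the layout Mahout’s FileDataModel reads, but check this against the Mahout version you use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: turn a raw purchase log into user-item preferences, using the number
// of purchases as the preference value. Purchase is a hypothetical record type.
final class PreferenceCounts {

    record Purchase(long userId, long productId) {}

    // userId -> (productId -> number of times the user bought the product)
    static Map<Long, Map<Long, Integer>> fromPurchases(List<Purchase> log) {
        Map<Long, Map<Long, Integer>> prefs = new TreeMap<>();
        for (Purchase p : log) {
            prefs.computeIfAbsent(p.userId(), u -> new TreeMap<>())
                 .merge(p.productId(), 1, Integer::sum);
        }
        return prefs;
    }

    // One "userID,itemID,preference" line per user-product pair, sorted by user and product.
    static List<String> toCsvLines(Map<Long, Map<Long, Integer>> prefs) {
        List<String> lines = new ArrayList<>();
        prefs.forEach((user, items) ->
            items.forEach((item, count) -> lines.add(user + "," + item + "," + count)));
        return lines;
    }
}
```

Swapping purchases for clicks or views would only change what goes into the log; the counting stays the same.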

Based on these user-item preferences we can use the Euclidean distance or the Pearson correlation to determine the similarity between users or between items (products). With the Euclidean distance, two users are similar if the distance between their preference vectors, viewed as points in a Cartesian coordinate system, is small. The Pearson correlation, in turn, equals the cosine of the angle between the demeaned (mean-centered) preference vectors. That is, two users are similar if the angle between their centered preference vectors is small or, put in terms of correlation and intuitively speaking, if they rate the same products high and the same other products low.
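The difference between the two notions can be checked numerically. The sketch below assumes dense preference vectors with one entry per product and 0 for “never bought”. Mahout’s EuclideanDistanceSimilarity and PearsonCorrelationSimilarity, as far as we know, only consider items both users have preferences for and may normalize differently, so the numbers are illustrative only. The 1 / (1 + d) mapping from distance to similarity is likewise an assumption.

```java
import java.util.Arrays;

// Sketch of the two rating-based similarity notions, on dense preference
// vectors (one entry per product, 0 = never bought; an assumption).
final class PreferenceSimilarity {

    // Euclidean distance turned into a similarity in (0, 1]; the 1 / (1 + d)
    // mapping is one common choice and is assumed here, not taken from Mahout.
    static double euclideanSimilarity(double[] x, double[] y) {
        double sum = 0;
        for (int k = 0; k < x.length; k++) {
            double d = x[k] - y[k];
            sum += d * d;
        }
        return 1.0 / (1.0 + Math.sqrt(sum));
    }

    // Cosine of the angle between two vectors (0 if either is the zero vector).
    static double cosine(double[] x, double[] y) {
        double dot = 0, nx = 0, ny = 0;
        for (int k = 0; k < x.length; k++) {
            dot += x[k] * y[k];
            nx += x[k] * x[k];
            ny += y[k] * y[k];
        }
        return (nx == 0 || ny == 0) ? 0 : dot / Math.sqrt(nx * ny);
    }

    // Pearson correlation computed as the cosine of the mean-centered vectors.
    static double pearson(double[] x, double[] y) {
        return cosine(demean(x), demean(y));
    }

    private static double[] demean(double[] v) {
        double mean = Arrays.stream(v).average().orElse(0);
        return Arrays.stream(v).map(a -> a - mean).toArray();
    }

    public static void main(String[] args) {
        double[] anna = {1, 2, 0}; // hypothetical purchase counts for three products
        double[] ben = {2, 4, 0};  // same proportions, twice the volume
        System.out.println(pearson(anna, ben));             // 1.0
        System.out.println(euclideanSimilarity(anna, ben)); // about 0.309
    }
}
```

Anna and Ben buy the same products in the same proportions; Ben just buys twice as much. The Pearson correlation therefore rates them as perfectly similar, while the Euclidean similarity penalizes the difference in volume.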

Difference between Euclidean and Cosine/Pearson User-Similarity

However, user-item preferences can also be (intentionally) reduced to pure associations, i.e. the user either buys the product or doesn’t (or views it or doesn’t, etc.). In this case, similarities between users or items can be computed with the Tanimoto coefficient or the log-likelihood ratio. Both assess how strongly two users’ associations coincide: whether they tend to be associated with the same items and not with the same other items.
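For boolean associations, the log-likelihood ratio compares how many items two users share with how many they would be expected to share by chance. The sketch below implements Ted Dunning’s entropy-based formulation of the ratio on a 2×2 contingency table. It is similar in spirit to Mahout’s LogLikelihoodSimilarity but written from scratch, and the final 1 - 1/(1 + llr) squashing into [0, 1) is an assumption for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of Dunning's log-likelihood ratio on a 2x2 contingency table of
// boolean preferences. The 1 - 1 / (1 + llr) mapping to [0, 1) is an assumption.
final class LogLikelihoodSketch {

    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized entropy: N log N - sum(k log k), with N the sum of all counts.
    private static double entropy(long... counts) {
        long sum = 0;
        double parts = 0;
        for (long c : counts) {
            sum += c;
            parts += xLogX(c);
        }
        return xLogX(sum) - parts;
    }

    // k11: items both users have, k12: only user A, k21: only user B, k22: neither.
    static double ratio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double colEntropy = entropy(k11 + k21, k12 + k22);
        double matEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point rounding.
        return Math.max(0.0, 2.0 * (rowEntropy + colEntropy - matEntropy));
    }

    // Similarity of two users from the sets of products they bought and the catalogue size.
    static double similarity(Set<Long> a, Set<Long> b, long numItems) {
        Set<Long> both = new HashSet<>(a);
        both.retainAll(b);
        long k11 = both.size();
        long k12 = a.size() - k11;
        long k21 = b.size() - k11;
        long k22 = numItems - a.size() - b.size() + k11;
        return 1.0 - 1.0 / (1.0 + ratio(k11, k12, k21, k22));
    }
}
```

For example, ratio(10, 10, 10, 10) is 0 because the two users’ purchases are independent, whereas ratio(10, 0, 0, 10), where both users bought exactly the same 10 of 20 items, is 40 ln 2 ≈ 27.7.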

Tanimoto similarity

The Tanimoto similarity between two users is the number of products both of them bought (respectively clicked or viewed) divided by the number of distinct products bought by at least one of them.
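Since the Tanimoto coefficient only needs the two users’ sets of products, a sketch is short. The product ids are hypothetical, and returning 0 for two users without any purchases is an assumption; Mahout’s own TanimotoCoefficientSimilarity may handle such edge cases differently.

```java
import java.util.HashSet;
import java.util.Set;

// Tanimoto (Jaccard) coefficient for boolean preferences: products both users
// bought divided by the distinct products bought by at least one of them.
final class TanimotoSketch {

    static double coefficient(Set<Long> a, Set<Long> b) {
        Set<Long> common = new HashSet<>(a);
        common.retainAll(b);
        int union = a.size() + b.size() - common.size();
        // Assumption: two users without any purchases are treated as dissimilar.
        return union == 0 ? 0.0 : (double) common.size() / union;
    }

    public static void main(String[] args) {
        // Hypothetical baskets: 2 shared products out of 4 distinct ones.
        System.out.println(coefficient(Set.of(1L, 2L, 3L), Set.of(2L, 3L, 4L))); // 0.5
    }
}
```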

This is not a detailed description of similarity measures, and it doesn’t need to be: even if one fully understands the concepts and computational details of these similarities, in the end one would probably still prefer a data-driven decision when choosing between them for the particular use case at hand.

So Mario decided to implement all of the above-mentioned recommenders, that is, user- and item-based recommenders each combined with one of the four similarity measures, plus the Slope One recommender, which doesn’t need any similarity measure as input at all. Once all nine Mahout recommendation strategies (2 × 4 + 1) are implemented, he wants to evaluate and compare them.
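Slope One is simple enough to sketch in a few lines. The version below is the weighted Slope One scheme written from scratch for illustration, not Mahout’s recommender class. It precomputes, for every pair of items, the average difference between the preferences of users who have both, and predicts a missing preference from the user’s own preferences plus those average differences. Preference values are assumed to be purchase counts stored as doubles; item and user names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the weighted Slope One scheme (not Mahout's code): no similarity
// measure, only average preference differences between item pairs.
// Preferences are assumed to be userId -> (itemId -> value), e.g. purchase counts.
final class SlopeOneSketch {

    // diffSum[j][i]: sum of (pref(j) - pref(i)) over users who have both items;
    // coCount[j][i]: number of such users.
    private final Map<String, Map<String, Double>> diffSum = new HashMap<>();
    private final Map<String, Map<String, Integer>> coCount = new HashMap<>();

    SlopeOneSketch(Map<String, Map<String, Double>> prefs) {
        for (Map<String, Double> row : prefs.values()) {
            for (Map.Entry<String, Double> j : row.entrySet()) {
                for (Map.Entry<String, Double> i : row.entrySet()) {
                    if (j.getKey().equals(i.getKey())) continue;
                    diffSum.computeIfAbsent(j.getKey(), k -> new HashMap<>())
                           .merge(i.getKey(), j.getValue() - i.getValue(), Double::sum);
                    coCount.computeIfAbsent(j.getKey(), k -> new HashMap<>())
                           .merge(i.getKey(), 1, Integer::sum);
                }
            }
        }
    }

    // Predicts a user's preference for item j as the co-count-weighted average of
    // (her preference for i + average difference j - i) over the items i she has.
    // Returns NaN when none of her items was ever co-purchased with j.
    double predict(Map<String, Double> userPrefs, String j) {
        Map<String, Double> diffs = diffSum.getOrDefault(j, Map.of());
        Map<String, Integer> counts = coCount.getOrDefault(j, Map.of());
        double numerator = 0;
        int denominator = 0;
        for (Map.Entry<String, Double> i : userPrefs.entrySet()) {
            Integer c = counts.get(i.getKey());
            if (c == null) continue;
            double avgDiff = diffs.get(i.getKey()) / c;
            numerator += (avgDiff + i.getValue()) * c;
            denominator += c;
        }
        return denominator == 0 ? Double.NaN : numerator / denominator;
    }
}
```

With data where Anna has pizza = 1.0 and pasta = 1.5 and Ben only has pizza = 2.0, the average pasta-minus-pizza difference is 0.5, so predicting Ben’s preference for pasta yields 2.5.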

Stay tuned for the coding details of how to integrate the open source recommendation framework Mahout into Mario’s online shop.

Please feel free to attend our talk “Building an Online-Recommendation Engine with MongoDB” at the Free GOTO NoSQL Munich – part II in Munich on April 9, 2013, for a live and comprehensive presentation of our online recommendation engine. Furthermore, we would love to meet you at the NoSQL Roadshow Munich 2013, a great place to learn more about NoSQL and Big Data technologies. To get a 30% discount, please use the comSysto code COMSYSTO30.

comSysto becomes a Hippo Partner

Since February 2012, comSysto has been proud to be one of Hippo’s direct partners. Hippo delivers context-aware CMS solutions that empower audiences to engage with content.

Hippo is a Dutch company based in Amsterdam that has been providing CMS-based solutions for over 10 years. Its main product, Hippo CMS, is the first web content management solution to deliver context-aware content for its customers. According to CMS Match, a wiki portal for content management systems, Hippo CMS is one of the most feature-rich CMS solutions, offering context awareness, multilingual, multichannel and multisite support, SEO, advanced search, reporting and an intuitive interface. For details, take a look at Hippo’s key capabilities.

From a technical perspective, it is based on open source technologies such as the Spring Framework, Apache Wicket and Apache Jackrabbit, and on open standards such as the Content Repository for Java Technology API (JCR, specified in JSR 170 and JSR 283). It integrates seamlessly with web frameworks such as Apache Wicket and Spring MVC and provides its own Hippo Site Toolkit (HST) for building CMS-based web applications. The CMS itself is extensible through a plugin architecture, and managed content can be accessed in various ways, e.g. via the Hippo Repository or REST services. For an overview of the technology behind Hippo, take a look at Hippo’s technology overview.

Do you want to know more? Then try the online demo and get in touch with us – we look forward to building and delivering the content web solution you need!

Apache Wicket Training by comSysto and JWeekend

On May 22 and 23, 2010, comSysto GmbH, together with JWeekend and the Wicket London User Group, is organizing the first Apache Wicket training in the German-speaking countries.

The content will be taught through practical examples, so that after two days the participants will have jointly developed an impressive web-based application.

During the halftime break of the training on Saturday evening, we will turn to another very important topic: football! In the Champions League final between Bayern Munich and Inter Milan, we will of course loudly cheer Bayern on their way to this year’s “treble”.

More information about the goals and schedule of the training can be found in our flyer:

comSysto Apache Wicket Training

If you are interested, please send a short email to kontakt[at]comsysto.com; only a few places are left.