This article is about one of WSO2's newest products, WSO2 Machine Learner (WSO2 ML). We have just shipped its very first general availability release. For those wondering when I moved from the Stratos team to the ML team: it happened in January this year (2015), at my request (yes, WSO2 was kind enough to accommodate it :-)). We are now a 7-member team (effectively 3 in R&D), led by Dr. Srinath Perera, VP Research. We also get assistance from a member of the UX team and a member of the documentation team.
What is Machine Learning?
“Machine learning is a subfield of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence. Machine learning explores the construction and study of algorithms that can learn from and make predictions on data.”
A simpler definition comes from Professor Andrew Ng of Stanford University:
“Machine learning is the science of getting computers to act without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome. Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it. Many researchers also think it is the best way to make progress towards human-level AI.” (source: https://www.coursera.org/course/ml)
In simple terms, with machine learning we are trying to make the computer learn patterns from a vast amount of historical data and then use the learnt patterns to make predictions.
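To make "learn patterns, then predict" a bit more concrete, here is a tiny sketch in plain Python. It fits a straight line to some made-up historical data and then uses that line to predict a new value. The function names and the data are mine, purely for illustration; this is not how WSO2 ML is implemented.

```python
def fit_line(xs, ys):
    """Learn a slope and intercept from historical (x, y) pairs by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learnt pattern (a line) to an unseen input."""
    slope, intercept = model
    return slope * x + intercept


# Made-up historical data that follows y = 2x + 1.
history_x = [1.0, 2.0, 3.0, 4.0]
history_y = [3.0, 5.0, 7.0, 9.0]

model = fit_line(history_x, history_y)   # the "learning" step
prediction = predict(model, 5.0)         # the "prediction" step
print(model, prediction)
```

Real datasets are noisier and have many more features, but the two steps stay the same: build a model from history, then ask it about new data.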
What is WSO2 Machine Learner?
WSO2 Machine Learner is a product that helps you manage and explore your data, build machine learning models by analyzing that data with machine learning algorithms, compare and manage the generated models, and make predictions using them. The following image depicts the high-level architecture of WSO2 ML.
WSO2 ML exposes all its operations via a REST API. We use the well-known Apache Spark to perform various operations on datasets in a scalable and efficient manner. Currently, we support a number of machine learning algorithms, covering regression and classification from supervised learning and clustering from unsupervised learning. All currently implemented algorithms are backed by Apache Spark's MLlib.
In this post, my main focus is to go through the feature list of the WSO2 ML 1.0.0 release, so that you can see whether it could improve the way you do machine learning.
Manage Your Datasets
We help you manage your data through our dataset versioning support. In a typical use case, you would have X amount of data now and would collect another Y amount of data in a month's time. With WSO2 ML you could create a dataset with version 1.0.0 which points to the X data, and in a month's time create version 1.1.0 which points to the (X+Y) data. Then you could pick these different dataset versions, run a machine learning analysis on top of them and generate models.
WSO2 ML accepts the CSV and TSV data formats, and the dataset files can reside in the file system or in HDFS. In addition to these storage options, we support pulling data from a data table generated by WSO2 Data Analytics Server [doc].
Explore Your Data
Once you have uploaded datasets into WSO2 ML, you can explore a few key details about them: the feature set, scatter plots to understand the relationship between two selected features, a histogram of each feature, parallel sets to explore categorical features, trellis charts and cluster diagrams [doc].
Manage Your ML Projects
WSO2 ML has a concept called 'Project', which is basically a logical grouping of the set of machine learning analyses you would perform on a selected dataset. Note that when I say a dataset, I mean all the dataset versions that belong to that particular dataset. WSO2 ML allows you to manage your machine learning projects based on datasets and also based on users.
Build and Manage Analyses
WSO2 ML has a concept called 'Analysis', which holds a pre-processed feature set, a selected machine learning algorithm and its calibrated set of hyper-parameters. Each analysis belongs to a project, and a project can have multiple analyses. Once you create an analysis, you cannot edit it, but you can view it and delete it. Analyses are created using the wizard provided by WSO2 ML.
Run Analyses and Manage Models
Once you have followed the wizard and generated an analysis, the final step is to pick a dataset version from the available versions of the project's dataset and run the analysis. The outcome of this process is a machine learning model. The same analysis can be run on different dataset versions to generate multiple models.
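If it helps to see how datasets, versions, projects, analyses and models relate to each other, here is a rough sketch of those concepts as Python dataclasses. To be clear, this is just my own illustration of the relationships described above; the class names, field names and the algorithm label are hypothetical and do not mirror WSO2 ML's actual internals or API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class DatasetVersion:
    dataset: str   # name of the dataset this version belongs to
    version: str   # e.g. "1.0.0"
    source: str    # where the data lives (hypothetical path)


@dataclass(frozen=True)  # frozen: an analysis cannot be edited once created
class Analysis:
    name: str
    features: Tuple[str, ...]
    algorithm: str
    hyper_parameters: Tuple[Tuple[str, float], ...]


@dataclass(frozen=True)
class Model:
    analysis: str
    dataset_version: str


@dataclass
class Project:
    name: str
    dataset: str
    analyses: List[Analysis] = field(default_factory=list)
    models: List[Model] = field(default_factory=list)

    def add_analysis(self, analysis: Analysis) -> None:
        self.analyses.append(analysis)

    def run(self, analysis: Analysis, version: DatasetVersion) -> Model:
        """Running an analysis on one dataset version yields one model."""
        if version.dataset != self.dataset:
            raise ValueError("version does not belong to the project's dataset")
        model = Model(analysis=analysis.name, dataset_version=version.version)
        self.models.append(model)
        return model


v1 = DatasetVersion("sales", "1.0.0", "/data/sales-x.csv")
v2 = DatasetVersion("sales", "1.1.0", "/data/sales-x-plus-y.csv")

project = Project(name="sales-forecast", dataset="sales")
analysis = Analysis(
    name="baseline",
    features=("region", "month"),
    algorithm="LINEAR_REGRESSION",            # hypothetical label
    hyper_parameters=(("iterations", 100.0),),
)
project.add_analysis(analysis)

# The same analysis run on two dataset versions gives two models.
project.run(analysis, v1)
project.run(analysis, v2)
print(project.models)
```

The frozen dataclass is my way of expressing the "view and delete, but not edit" rule for analyses; the actual training step is left out entirely.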
Once a model is generated, you can perform various operations on it, such as viewing the model summary, downloading the model object as a file, publishing the model into the WSO2 registry and predicting.
Your ultimate goal is to build an accurate model which can later be used for prediction. To help you here, i.e. to let you easily compare all the different models created using different analyses, we have a model comparison view.
For classification problems, we sort the models by their accuracy values, and for numerical prediction we sort based on the mean squared error.
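The ordering idea is simple enough to sketch in a few lines of Python. I am assuming here that "best first" means highest accuracy for classification and lowest mean squared error for numerical prediction; the model names and metric values below are made up.

```python
def rank_classification(models):
    """Best first: a higher accuracy is better."""
    return sorted(models, key=lambda m: m["accuracy"], reverse=True)


def rank_numerical(models):
    """Best first: a lower mean squared error is better."""
    return sorted(models, key=lambda m: m["mse"])


# Made-up models and metrics, purely for illustration.
classifiers = [
    {"name": "model-a", "accuracy": 0.81},
    {"name": "model-b", "accuracy": 0.93},
    {"name": "model-c", "accuracy": 0.88},
]
regressors = [
    {"name": "model-x", "mse": 4.2},
    {"name": "model-y", "mse": 1.7},
]

best_classifier = rank_classification(classifiers)[0]["name"]
best_regressor = rank_numerical(regressors)[0]["name"]
print(best_classifier, best_regressor)
```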
ML REST API
All the underlying WSO2 ML operations are exposed through the REST API; in fact, our UI client is built on top of the ML REST API [doc]. If you wish, you can write a client in any language on top of our REST API. It currently supports basic auth and session-based authentication.
Our Jaggery-based UI is built using the latest UX designs, as you have probably noticed from the screenshots seen thus far in this post.
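As a rough idea of what a basic auth client could look like, here is a minimal Python sketch that only builds the request and does not send it. The host, the resource path and the credentials are placeholders I made up, and the JSON `Accept` header is my assumption; please check the REST API documentation for the real endpoints.

```python
import base64
import urllib.request


def build_request(base_url, path, username, password):
    """Build a GET request carrying an HTTP basic auth header."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(base_url.rstrip("/") + path)
    request.add_header("Authorization", "Basic " + token)
    request.add_header("Accept", "application/json")  # assumption: JSON responses
    return request


# Placeholder host, path and credentials; none of these come from WSO2 ML.
request = build_request(
    "https://ml.example.com",
    "/api/hypothetical-resource",
    "user",
    "secret",
)
print(request.full_url, request.get_header("Authorization"))

# To actually call a server you would pass the request to
# urllib.request.urlopen(request) and read the response body.
```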
ML-WSO2 ESB Integration
We have written an ML-ESB mediator which can be used to run predictions on data collected from an incoming request against an ML model generated using WSO2 ML [doc].
ML-WSO2 CEP Integration
In addition to the ESB mediator, we have written an ML-CEP extension, which can be used to do real-time predictions against a generated model [doc].
External Spark Cluster Support
By default, WSO2 ML ships with an embedded Spark runtime, so that you can simply unzip the pack and start playing with it. It can also be configured to connect to an external Spark cluster [doc].
What's Next?
* Deep Learning algorithm support using H2O - this is currently underway as a GSoC project.
* Data pre-processing using DataWrangler - current GSoC project
* Recommendation algorithm support - current GSoC project
... and a whole lot of other new features and improvements.
This is basically a summary of what WSO2 ML 1.0 is all about. Please follow our GitHub repository for more information. You are most welcome to try it out and report any issues in our Jira.