In an ever-evolving technological landscape, business needs and outcomes can no longer be taken for granted. Organizations across most industries are adopting Artificial Intelligence (AI) systems to solve complex business problems, design intelligent and self-sustaining solutions, and, essentially, stay competitive. To this end, continuous efforts are being made to reinvent AI systems so that more can be achieved with less.
Adaptive AI is a key step in that direction. It could outpace traditional machine learning models in the near future because of its potential to help businesses achieve better outcomes while investing less time, less effort, and fewer resources.
Why is the traditional machine learning model not up to the task anymore?
A traditional Machine Learning (ML) model has two pipelines – Training and Prediction.
The Training pipeline collects and ingests data, taking it through the various stages of data cleaning, grouping, transformation, and so on. In the Prediction pipeline, the trained ML model analyzes new data to yield accurate insights and predictions for effective decision making.
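For concreteness, the split can be sketched as two separate functions; the data, the cleaning step, and the trivial "model" below are made-up illustrations, not Guavus code:

```python
# Minimal sketch of the two conventional ML pipelines.
# The model here is just a least-squares slope through the origin,
# chosen only so the example stays self-contained.

def training_pipeline(raw_rows):
    """Clean and transform data, then fit a model (run offline, in batch)."""
    cleaned = [r for r in raw_rows if r is not None]      # data cleaning
    xs = [float(x) for x, _ in cleaned]                   # transformation
    ys = [float(y) for _, y in cleaned]
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return {"slope": slope}

def prediction_pipeline(model, new_xs):
    """Apply the frozen model to fresh data to produce predictions."""
    return [model["slope"] * x for x in new_xs]

model = training_pipeline([(1, 2), (2, 4), None, (3, 6)])
preds = prediction_pipeline(model, [10, 20])   # model stays fixed until retrained
```

The key point of the sketch is that the model produced by the first function is frozen: the prediction side cannot change it until the next training run.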
However, having two pipelines to cover the miles between ingestion and insight comes with its share of downsides. In addition to the obvious, surface-level challenges, such as setting up elaborate infrastructure for the two pipelines and bearing the associated cost overheads, the turnaround time is almost always long.
Now, let’s say we have an organization operating under ideal conditions, i.e. one that reserves a generous budget for AI and is willing to invest enough time to let the two pipelines wrestle and wrangle all the data.
Does that solve problems across the board? Largely not, because the very nature of traditional AI poses a major challenge that any organization has to deal with on an ongoing basis.
In traditional Artificial Intelligence systems, the learning methodologies deployed in production are challenged when:
- the system’s operational environment changes; OR
- the underlying input to the system is altered; OR
- the outcome desired by the organization changes
Each of these conditions or events can significantly affect the functional accuracy and efficiency of a system.
Consider the following example.
You run a news website whose revenue is tied to the number of users who click on the news items posted throughout the day. A user’s browser history and cookies help you profile users and serve them interest-focused news content.
But then a large-scale event concerning national security takes place; let’s say tensions over the border with a neighbouring country escalate and there is a growing fear of war breaking out. The government announces that it will hold a press conference at some point during the day. As expected, everyone is interested in reading about national affairs, including those who normally restrict their dose of news to sports or finance. And herein lies the challenge for you.
Even if you had batch-trained your model every single day, it would still be serving items based on the content consumed the day before, since the model cannot adapt to the dramatic change in user preferences on the same day.
The following day, when data reflecting the heightened interest in national affairs is fed to the new training cycle, users start to receive related news recommendations. However, since that data is a day old, users may no longer be as interested in national affairs as they were on the day of the press conference.
So, where did the model fall behind?
While the model is doing its job of refreshing the type of content delivered on a per-day basis, what you would have wanted is for it to pick up the latest developments in the country and update the content type by the minute, or even by the second. This holds true for businesses of all stripes. In today’s highly competitive and unpredictable business environment, your business can’t afford to wait an entire day for your AI to adapt and deliver.
How is Guavus’ Adaptive AI Different?
Guavus’ Adaptive Learning method employs a single pipeline. With it, we have developed a continuously enriched learning approach that keeps the system up to date and helps it achieve high performance levels.
Guavus’ Adaptive Learning process monitors and learns from changes to the input and output values and their associated characteristics. It also learns from events that may alter market behavior in real time, and hence maintains its accuracy at all times. Adaptive AI accepts feedback from the operating environment and acts on it to make data-informed predictions.
In our assessment and experiments with multiple customers, we evaluate the results generated through our method in a qualitative and quantitative manner. The results obtained are consistently accurate, have excellent coverage, and lead to a significant impact on the performance of the learning system.
The process eliminates the hassle of creating a separate training pipeline for ML-AI systems. The system is designed to learn from new observations while still serving predictions based on older ones, keeping the process updated in real time. This flexibility removes the risk of learning systems becoming obsolete or relying on outdated training samples, which is what has made conventional methods inefficient.
Adaptive Learning tries to solve these problems while building ML models at scale. Because the model is trained via a streaming approach, sparsity is handled naturally, making it efficient for domains with highly sparse datasets where noise handling is important.
The pipeline is designed to handle billions of features across vast datasets, while each record can have hundreds of features, leading to sparse data records. This system works on a single pipeline, as opposed to the conventional ML pipelines that are divided into two parts, as discussed earlier. This enables quick proofs of concept and easy deployment to production. The initial performance is comparable to batch models, but the system goes on to surpass them by acting on and learning from the feedback it receives. This makes the process far more robust and sustainable in the long term.
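The single predict-then-update loop can be sketched as follows. The logistic model, the feature names, and the toy stream are our own illustrative choices, not Guavus’ actual pipeline; the point is only that every record is scored with the current model before its label is used to update it, so there is no separate training pipeline:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def online_logistic(stream, lr=0.5):
    """stream yields (features: dict[str, float], label: 0 or 1) pairs."""
    w = {}          # weights grow lazily; no offline training pipeline
    correct = 0
    for feats, label in stream:
        z = sum(w.get(f, 0.0) * v for f, v in feats.items())
        p = sigmoid(z)                        # predict with the model as-is
        correct += int((p >= 0.5) == bool(label))
        grad = p - label                      # then learn from the feedback
        for f, v in feats.items():
            w[f] = w.get(f, 0.0) - lr * grad * v
    return w, correct

# Toy stream: "sports" items are not clicked, "national" items are.
stream = [({"sports": 1.0}, 0), ({"national": 1.0}, 1)] * 50
w, correct = online_logistic(stream)
```

Because the update happens immediately after each prediction, a shift in the stream (such as the news-preference change in the example above) starts moving the weights on the very next record rather than on the next day’s batch run.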
Tech and Methodology
For supervised problems on streaming data, online algorithms are used, as they provide a row-by-row training mechanism. A data-processing step leads into feature hashing, followed by a predict or update step depending on whether we are in the training or prediction phase. The pipeline we developed initially is an ensemble of two categories of algorithms: discriminative and generative.
Feature Hashing –
The number of features (D) to use is defined beforehand so that, while learning sequentially, the feature space remains constant even when new feature-values are encountered. This is accomplished by storing the hash of each feature-value: a dictionary is built whose key is the hash of the feature-value and whose value is the frequency of that hash in the row. In the predict/update step for a data point, we only need to consider the hashes returned, not the overall feature space defined by D. This method can produce collisions; to minimise them, D should be large and a well-distributed hashing function should be used.
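The hashing step can be sketched as follows; the value of D, the hash function, and the sample record are illustrative assumptions, not the production choices:

```python
import hashlib

D = 2 ** 20   # fixed feature-space size; a larger D means fewer collisions

def bucket(feature_value):
    """Map a feature-value string to a stable bucket in [0, D)."""
    digest = hashlib.md5(feature_value.encode("utf-8")).hexdigest()
    return int(digest, 16) % D

def hash_row(row):
    """Return {bucket: frequency} for one record: only the hashes present
    in this row, never the full D-dimensional vector."""
    counts = {}
    for feature, value in row.items():
        h = bucket(f"{feature}={value}")
        counts[h] = counts.get(h, 0) + 1
    return counts

row = {"category": "sports", "country": "IN", "device": "mobile"}
hashed = hash_row(row)   # at most 3 non-zero buckets out of D
```

The predict/update step then only touches the few buckets in `hashed`, which is what keeps each row cheap even though D is in the millions.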
The two models that form the ensemble have different attributes. The ensemble helped overcome the shortcomings of the individual models, such as over-fitting and slow convergence, and consists of models that perform better on different parts of the data.
The discriminative model has a per-coordinate learning rate schedule, which modifies the learning rate as a function of how much data has been observed for a feature (i.e. the number of times we have encountered it). So, if a frequently occurring feature turns out to be noisy, its impact on the model is damped through the per-coordinate learning rate. It also supports L1 and L2 regularisation.
The generative model is based on a probit regression model that maps discrete or real-valued input features to probabilities. It maintains Gaussian beliefs over the weights of the model and performs Gaussian online updates derived from approximate message passing. It assumes that the probability of the target class is a function of the weighted linear combination of the features, the function being the cumulative distribution function of the Gaussian distribution, replacing the more standard sigmoid. As an online algorithm, it updates the weight coefficients for each sample in a way that minimises the “distance” between the new resulting distribution and the sample. As a Bayesian algorithm, it assumes the weights are indeed drawn from a Gaussian distribution, and it keeps track of their centres and standard deviations.
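One published instance of this family is the AdPredictor-style Bayesian probit update, and we use its equations for the sketch below; the function names, the prior N(0, 1), and the noise parameter beta are our assumptions, not Guavus’ actual settings:

```python
import math

def pdf(x):
    """Standard Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cdf(x):
    """Standard Gaussian CDF, the link function replacing the sigmoid."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def probit_update(belief, features, label, beta=1.0):
    """belief: {feature: (mu, s2)} Gaussian beliefs; label in {-1, +1}.
    One online update for the active (binary) features of a sample."""
    for f in features:
        belief.setdefault(f, (0.0, 1.0))         # prior belief N(0, 1)
    total_mu = sum(belief[f][0] for f in features)
    total_s2 = beta ** 2 + sum(belief[f][1] for f in features)
    sigma = math.sqrt(total_s2)
    t = label * total_mu / sigma
    v = pdf(t) / cdf(t)                          # mean correction factor
    w = v * (v + t)                              # variance correction, in (0, 1)
    for f in features:
        mu, s2 = belief[f]
        mu += label * (s2 / sigma) * v           # shift mean toward the label
        s2 *= 1.0 - (s2 / total_s2) * w          # belief always tightens
        belief[f] = (mu, s2)
    return belief

belief = {}
for _ in range(20):
    probit_update(belief, ["national_news"], +1)  # 20 positive samples
mu, s2 = belief["national_news"]
```

After repeated positive feedback the mean of the belief moves positive and its variance shrinks below the prior’s, so the model becomes both more confident and more accurate about that feature, per sample, without any batch retraining.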
Image attribution: metamorworks