Technical Blog

All the latest technical and engineering news from the world of Guavus

Guavus – state-of-the-art K8s integration and orchestration for your data science applications

By Guillaume Lebault, Principal Technologist & Kiran Subash, Director of Engineering at Guavus, a Thales company

Productizing data science applications is complex and requires deep expertise across domains, from development to integration to run mode. The monolithic application paradigm is no longer valuable or valid, and Guavus has the skills to support its customers in adopting and building reliable microservices-based architectures.

Modularity, interoperability, scalability, portability, hybrid architectures, microservices, and container orchestration: these are some of the key buzzwords in the IT press and on social networks, often associated with Kubernetes (K8s) architectures or platforms.

Kubernetes is the way to go when your applications must be modularized and split into microservices. It reduces the dependency problems of monolithic applications, where updating or changing one part often means touching many interdependent parts.

By microservice, we mean an independently deployable component that should be scalable, easy to manage, and quick to deploy.

Containerizing those microservices (e.g., with Docker) provides resource isolation and dependency management, while orchestration (e.g., with Kubernetes) abstracts away the underlying hardware resources.

5G, high availability, cloud-native services, and edge computing are some of the reasons that telecom service providers tend to break down complexity into independent components.

The same applies to data science and analytics applications, which are intrinsically complex in mobile and fixed networks with millions of subscribers. Telco service providers are therefore natural candidates for adopting microservices-based data analytics applications.

A typical data workflow consists of the following steps:

Data sources & Acquisition: Our customers face a huge variety of data sources, for which the acquisition scheme may differ depending on the nature of the data: batch files, real-time data streams, or Kafka topics. How can your application adapt to more data sources as input?

Preparation & Transformation: Preparing the data may require heavy processing when you need to enrich real-time data with contextual data. This processing can be CPU-intensive and create performance challenges. How can your application keep up?

Analysis: A single data science application may rely on multiple models, each requiring heavy computation. How can your data analytics application use the right compute engine at the right moment?

Actionable Insights: Your analysis is done, and the results have been computed and can be stored, but they are not yet reliably accessible to your users. How can all your users access your application with the best quality of service?
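The four steps above can be sketched as a chain of stages in which each new kind of data source only needs its own small adapter. This is a minimal illustration in Python, not Guavus code: the adapter registry, the two input formats (`csv_batch`, `kv_stream`), and the `prepare`/`analyze` stages are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Record = dict
Adapter = Callable[[Iterable[str]], Iterator[Record]]

# Hypothetical registry: one adapter per acquisition scheme.
SOURCE_ADAPTERS: Dict[str, Adapter] = {}


def source(kind: str) -> Callable[[Adapter], Adapter]:
    """Decorator that registers an adapter for one kind of data source."""
    def register(fn: Adapter) -> Adapter:
        SOURCE_ADAPTERS[kind] = fn
        return fn
    return register


@source("csv_batch")
def read_csv_lines(lines: Iterable[str]) -> Iterator[Record]:
    # Batch file lines such as "alice,2048"
    for line in lines:
        subscriber, value = line.split(",")
        yield {"subscriber": subscriber, "value": float(value)}


@source("kv_stream")
def read_kv_events(events: Iterable[str]) -> Iterator[Record]:
    # Stream events such as "sub=bob;val=512"
    for event in events:
        fields = dict(pair.split("=") for pair in event.split(";"))
        yield {"subscriber": fields["sub"], "value": float(fields["val"])}


def run_pipeline(inputs: List[Tuple[str, Iterable[str]]],
                 stages: List[Callable[[Record], Record]]) -> Iterator[Record]:
    """Acquisition first, then every stage (preparation, analysis, ...) in order."""
    for kind, raw in inputs:
        for record in SOURCE_ADAPTERS[kind](raw):
            for stage in stages:
                record = stage(record)
            yield record


def prepare(r: Record) -> Record:
    # Preparation: derive a normalized field
    return {**r, "value_kb": r["value"] / 1024}


def analyze(r: Record) -> Record:
    # Analysis: a trivial rule standing in for a real model
    return {**r, "heavy_user": r["value_kb"] > 1.0}


results = list(run_pipeline(
    [("csv_batch", ["alice,2048"]), ("kv_stream", ["sub=bob;val=512"])],
    [prepare, analyze],
))
```

With this shape, supporting another source means registering one more adapter; the preparation and analysis stages stay untouched.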

The Guavus Telco products, Ops-IQ and Service-IQ, deliver resource efficiency at scale. Compared to data analytics applications that rely on legacy big data infrastructures, the Guavus reference base architecture used within our products can offer:

  • A smaller hardware footprint (from 10x to 30x)
  • 20% reduction in memory consumption
  • 30% improvement in events processed per second, per core
  • Low latency, sub-millisecond response time
  • Distributed architecture for better utilization of deployed resources

Let’s have a look at the Guavus base architecture for data analytics applications:

[Figure: Guavus base architecture for data analytics applications]

Guavus relies on Kubernetes to orchestrate its workloads, which allows for easy scalability and resource optimization. This modular approach allows parts of the pipeline to be updated easily without disrupting the entire data pipeline.
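To make "scaled and updated independently" concrete, here is a minimal sketch of two pipeline components described as separate Kubernetes `apps/v1` Deployments, each with its own replica count and resource budget. The component names, image URLs, and sizes are hypothetical; only the manifest fields are standard Kubernetes.

```python
import json


def deployment(name: str, image: str, replicas: int, cpu: str, memory: str) -> dict:
    """Build a minimal apps/v1 Deployment manifest for one pipeline component."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "resources": {
                            "requests": {"cpu": cpu, "memory": memory},
                            "limits": {"cpu": cpu, "memory": memory},
                        },
                    }],
                },
            },
        },
    }


# Hypothetical components: each one is deployed, sized, and scaled on its own.
ingest = deployment("ingest", "registry.example.com/ingest:1.0", 2, "500m", "512Mi")
enrich = deployment("enrich", "registry.example.com/enrich:1.0", 4, "2", "4Gi")

# kubectl accepts JSON manifests, e.g. `python gen.py | kubectl apply -f -`
print(json.dumps(enrich, indent=2))
```

Because each component is its own Deployment, a heavier enrichment load can be absorbed with `kubectl scale deployment/enrich --replicas=6` without touching ingestion.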

Ingestion starts at the edge, where our proven SQLstream technology collects data from a huge variety of sources. Kafka transports the data from the edge to the core, and this transport layer can be scaled independently to handle growing data volumes. Data retained in Kafka also serves as an immutable record of the raw data, which is especially useful for streaming sources, whose events cannot be requested again from the source once they have passed.
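Why an immutable log helps can be shown with a toy model. The sketch below is an in-memory stand-in for a single Kafka partition, not the Kafka client API: records are only appended, each reader keeps its own offset (as Kafka consumer groups do), and any reader can seek back to reprocess the raw data. All class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class AppendOnlyLog:
    """Toy stand-in for one Kafka partition: records are only ever appended."""
    records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record

    def read(self, offset: int, max_records: int = 100) -> List[Tuple[int, Any]]:
        end = min(offset + max_records, len(self.records))
        return [(i, self.records[i]) for i in range(offset, end)]


@dataclass
class Consumer:
    """Each consumer tracks its own position in the log."""
    log: AppendOnlyLog
    position: int = 0

    def poll(self) -> List[Any]:
        batch = self.log.read(self.position)
        if batch:
            self.position = batch[-1][0] + 1  # advance past the last record read
        return [record for _, record in batch]

    def seek(self, offset: int) -> None:
        self.position = offset


raw = AppendOnlyLog()
for event in ["e1", "e2", "e3"]:
    raw.append(event)

live = Consumer(raw)
first = live.poll()      # everything appended so far
live.seek(0)             # e.g. after fixing a bug downstream
replayed = live.poll()   # the same raw events, unchanged
```

Reading never removes data, so a second pipeline can start from offset 0 at any time without affecting `live`.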

Core workloads are encapsulated as SQLstream pipelines, which perform highly optimized enrichment and data modeling on the fly, in real time. Multiple SQLstream pipelines can be orchestrated on Kubernetes infrastructure, reading from the same data sources to cater to many use cases. For ML and AI algorithms implemented in Scala, Java, or Python, we use Spark’s ability to run distributed applications on Kubernetes, which scales well.
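Enrichment of real-time events with contextual data is essentially a stream-to-table join. In SQLstream this would be expressed in streaming SQL; the Python below is only a language-neutral illustration of the idea, with a hypothetical cell-to-region lookup table and a second, hypothetical use case reusing the same enriched stream.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical contextual data: a slowly changing lookup table (cell -> attributes).
CELL_CONTEXT: Dict[str, Dict[str, str]] = {
    "cell-17": {"region": "north", "technology": "5G"},
    "cell-42": {"region": "south", "technology": "4G"},
}
UNKNOWN = {"region": "unknown", "technology": "unknown"}


def enrich(events: Iterable[dict], context: Dict[str, Dict[str, str]]) -> Iterator[dict]:
    """Stream-table join: attach context to each event as it arrives."""
    for event in events:
        yield {**event, **context.get(event["cell"], UNKNOWN)}


def bytes_per_region(enriched: Iterable[dict]) -> Dict[str, int]:
    """One downstream use case built on the enriched stream."""
    totals: Dict[str, int] = {}
    for e in enriched:
        totals[e["region"]] = totals.get(e["region"], 0) + e["bytes"]
    return totals


events = [
    {"cell": "cell-17", "bytes": 1200},
    {"cell": "cell-42", "bytes": 300},
    {"cell": "cell-99", "bytes": 50},   # no context available for this cell
]
enriched = list(enrich(events, CELL_CONTEXT))
totals = bytes_per_region(enriched)
```

Events with no matching context are kept and labeled `unknown` rather than dropped, which is one reasonable choice; a real pipeline might route them elsewhere instead.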

The architecture egresses enriched data to a variety of sinks using a rich set of connectors, providing the ability to write to destinations like Kafka, Elasticsearch, HDFS, RDBMS, S3, and more.
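A connector-based egress layer can be pictured as one small interface with one implementation per destination. The sketch below is hypothetical: `Sink`, `MemorySink`, and `egress` are illustrative names, and the in-memory sink stands in for real Kafka, Elasticsearch, HDFS, RDBMS, or S3 connectors.

```python
from abc import ABC, abstractmethod
from typing import Iterable, List


class Sink(ABC):
    """Hypothetical connector interface: one implementation per destination."""

    @abstractmethod
    def write(self, records: List[dict]) -> None: ...


class MemorySink(Sink):
    """Stand-in for a real destination such as Kafka, Elasticsearch, or S3."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.written: List[dict] = []

    def write(self, records: List[dict]) -> None:
        self.written.extend(records)


def egress(records: Iterable[dict], sinks: Iterable[Sink], batch_size: int = 2) -> None:
    """Write the same enriched records to every configured sink, in batches."""
    sinks = list(sinks)
    batch: List[dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            for sink in sinks:
                sink.write(batch)
            batch = []
    if batch:  # flush the final, partial batch
        for sink in sinks:
            sink.write(batch)


search, archive = MemorySink("search"), MemorySink("archive")
egress([{"id": i} for i in range(5)], [search, archive])
```

Adding a destination means adding one `Sink` implementation; the pipeline that produces the enriched records does not change.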

To support report generation and rich user interfaces, Presto distributes queries across a variety of data sinks, so that responsive user interfaces and user experiences can be built on top of them. Other tools, such as Tableau, can also be plugged in, or a customer’s existing tools can be used to access the enriched data.

This modular architecture allows solution architects to build only the necessary components and to reuse infrastructure that already exists, avoiding the cost of duplicating it.

The Guavus pipelines can easily be ported to commercial cloud providers, using their PaaS offerings.

Managed Kubernetes services such as Amazon EKS and Azure AKS offer a secure and reliable platform to run Guavus pipelines in the cloud, while managed Kafka and S3 or ADLS provide the backing data stores, both for data being transformed and for long-term storage. The data pipelines themselves move from bare metal or VM infrastructure to a cloud PaaS infrastructure with few modifications to the pipeline itself.
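One common way to keep pipeline code unchanged across bare metal and cloud is to pass storage locations in as configuration and let the URI scheme select the backing store. This is a minimal sketch under that assumption; the `PIPELINE_OUTPUT_URI` variable and the scheme table are hypothetical, although `hdfs://`, `s3a://`, and `abfss://` are the usual Hadoop-style schemes for HDFS, S3, and ADLS Gen2.

```python
import os
from urllib.parse import urlparse

# Hypothetical mapping from URI scheme to the kind of backing store.
STORE_KINDS = {
    "hdfs": "on-premises HDFS",
    "s3a": "Amazon S3 (Hadoop S3A connector)",
    "abfss": "Azure Data Lake Storage Gen2",
    "file": "local filesystem",
}


def resolve_store(uri: str) -> str:
    """Return the kind of store a URI points to, or fail on unknown schemes."""
    scheme = urlparse(uri).scheme
    if scheme not in STORE_KINDS:
        raise ValueError(f"unsupported storage scheme: {scheme!r}")
    return STORE_KINDS[scheme]


# The pipeline code stays the same; only the deployment's environment changes.
output_uri = os.environ.get("PIPELINE_OUTPUT_URI", "file:///tmp/enriched")
print(output_uri, "->", resolve_store(output_uri))
```

Moving to the cloud then becomes a change to the deployment configuration (for example, the environment of the Kubernetes Deployment) rather than to the pipeline.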

What are the benefits of such an architecture for Guavus customers?

As part of the evolution of Guavus applications, we have migrated the main application workloads into containers to take advantage of orchestration and scale. Microservices-based applications make it much easier to manage the lifecycle from development to deployment and operations. Kubernetes provides a standard mechanism to orchestrate the application components on premises as well as on public cloud providers such as AWS and Azure.

What are some of the best practices to consider?

One of the most important considerations when moving to containers is building a comprehensive CI/CD pipeline that lets developers ship code as quickly as possible while ensuring quality. Our CI/CD pipelines use Jenkins and Terraform to create and manage test environments, deploy artifacts, and destroy the environments when complete. This ensures efficient use of hardware resources, and the automation frees developers from constantly setting up and maintaining environments.
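The create, test, destroy cycle described above can be sketched as a helper that a CI job (for example, a Jenkins stage) might call. This is an illustration, not the Guavus pipeline: the function names and the `envs/test` directory are hypothetical, while `terraform -chdir=… init/apply/destroy` with `-auto-approve` and `-input=false` are standard Terraform CLI options. The command runner is injectable so the logic can be exercised without Terraform installed.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], None]


def run(cmd: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(list(cmd), check=True)


def with_test_environment(workdir: str, tests: Callable[[], None],
                          runner: Runner = run) -> None:
    """Create an environment with Terraform, run tests, and always destroy it."""
    tf = ["terraform", f"-chdir={workdir}"]
    runner(tf + ["init", "-input=false"])
    try:
        runner(tf + ["apply", "-auto-approve", "-input=false"])
        tests()
    finally:
        # Runs even if apply or the tests fail, so nothing is left allocated.
        runner(tf + ["destroy", "-auto-approve", "-input=false"])


# Dry run with a recording runner instead of invoking Terraform.
calls: List[List[str]] = []
with_test_environment("envs/test", tests=lambda: None,
                      runner=lambda c: calls.append(list(c)))
```

The `try`/`finally` is the key design choice: a failing test run still releases the environment, which is what keeps hardware usage efficient.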

How do you ensure security in such a modular framework?

The container is the building block of microservices on Kubernetes, so we pay particular attention to how containers are created and managed. Containers are built from standard base images that are maintained in-house and kept up to date with security updates. Tasks that need privileged user permissions are evaluated carefully. The Guavus engineering team evaluates official off-the-shelf (OTS) component images for security vulnerabilities, using a variety of tools to scan them and assess the risks. In-house containers are required to follow standard practices. Kubernetes best practices are followed when building out the deployment, and only necessary services are allowed access. The Guavus Quality Assurance team runs a variety of scans, including Nessus and OWASP ZAP, to find open vulnerabilities and address them.
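As an illustration of "tasks that need privileged permissions are evaluated carefully", here is a hypothetical pre-deployment check that flags containers in a pod spec lacking common hardening settings. It complements, and does not replace, scanners such as Nessus or OWASP ZAP. The function and example specs are assumptions; the `securityContext` fields are standard Kubernetes. For brevity it only inspects container-level settings and ignores the pod-level `securityContext`.

```python
from typing import List


def security_findings(pod_spec: dict) -> List[str]:
    """Flag containers that miss common hardening settings (container level only)."""
    findings: List[str] = []
    for c in pod_spec.get("containers", []):
        name = c.get("name", "<unnamed>")
        sc = c.get("securityContext", {})
        if sc.get("privileged", False):
            findings.append(f"{name}: runs privileged")
        # When unset, privilege escalation is effectively allowed.
        if sc.get("allowPrivilegeEscalation", True):
            findings.append(f"{name}: allows privilege escalation")
        if not sc.get("runAsNonRoot", False):
            findings.append(f"{name}: may run as root")
        if "ALL" not in sc.get("capabilities", {}).get("drop", []):
            findings.append(f"{name}: does not drop all capabilities")
        # Image is pinned if it has a digest, or a tag other than "latest".
        last = c.get("image", "").rsplit("/", 1)[-1]
        if "@" not in last and (":" not in last or last.endswith(":latest")):
            findings.append(f"{name}: image is not pinned to a version")
    return findings


hardened = {"containers": [{
    "name": "enrich",
    "image": "registry.example.com/enrich:1.0",
    "securityContext": {
        "privileged": False,
        "allowPrivilegeEscalation": False,
        "runAsNonRoot": True,
        "capabilities": {"drop": ["ALL"]},
    },
}]}
unhardened = {"containers": [{"name": "debug", "image": "busybox"}]}
```

Running such a check in the CI pipeline before `kubectl apply` is one way to make the review of privileged workloads systematic rather than ad hoc.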

Mastering microservices-based architectures for intensive data analytics applications is a key differentiator for putting your most valuable machine learning and AI components into a real production environment.

You can count on Guavus products and services to help you deploy complex data science applications, and to maintain and update them on dedicated architectures.

Posted by Mathilde Remy