A data platform in under five minutes.

Spin up a data platform from scratch to build and run analytic apps anywhere. Secure, governed and compliant.

We do for data platforms what Shopify did for e-commerce.

The result: empowered data teams and trusted, data-driven organisations.

Batteries Included

A frictionless start: no Kubernetes expertise or professional services required for installation.

Short feedback loop

Build on laptops and deploy on clusters. Rinse and repeat.

No vendor lock-in

Run in any cloud, fog or edge. Burst from on-prem to the cloud.

Best-of-breed

Opinionated open-source data & AI tools that work out of the box. Extend the stack with your preferred tools.

Air-gapped

Secure-by-default, self-hosted services, including LLMs (coming soon).

Compliant

IRAP assessed (coming mid-2025).

Mission

Make data infrastructure simple, secure and accessible to everyone, anywhere.

There is no simple, composable and vendor-neutral data platform that runs anywhere. kubox is our solution to make data infrastructure simple and boring, so you spend less time on tedious configuration and more time building awesome data products.

Chinkit Patel

Founder, kubox.ai

Blazing fast Kubernetes and a composable data and AI stack

An on-demand, repeatable and portable data platform as a service for everyone, from individual engineers to enterprises. It is unbelievably powerful and future-proof.

Distributed Compute

Use Dask, Spark and federated query engines like Trino for distributed data processing.
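For example, here is a minimal Dask sketch; it runs against a local cluster, and all names and data are illustrative rather than kubox defaults:

    import pandas as pd
    import dask.dataframe as dd
    from dask.distributed import Client

    # Local cluster for illustration; on a real deployment you would pass the
    # address of your cluster's Dask scheduler instead.
    client = Client()

    # Tiny illustrative dataset split into partitions that workers process in parallel.
    pdf = pd.DataFrame({"event_date": ["2024-01-01", "2024-01-01", "2024-01-02"],
                        "event_id": [1, 2, 3]})
    df = dd.from_pandas(pdf, npartitions=2)

    # The aggregation is planned lazily and executed across workers by compute().
    daily_counts = df.groupby("event_date")["event_id"].count().compute()
    print(daily_counts)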

Data Pipelines

Data pipelines written in Python with Dagster, tracked in version control and packaged as containers.
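As an illustration, a minimal Dagster pipeline looks like the sketch below; the asset names and data are hypothetical, not part of kubox:

    import pandas as pd
    from dagster import Definitions, asset

    @asset
    def raw_orders() -> pd.DataFrame:
        # Placeholder extract step; a real pipeline would read from a source system.
        return pd.DataFrame({"order_id": [1, 2, 3], "amount": [10.0, 25.5, 7.25]})

    @asset
    def order_totals(raw_orders: pd.DataFrame) -> pd.DataFrame:
        # Downstream asset derived from raw_orders and tracked in Dagster's lineage graph.
        return raw_orders.assign(amount_with_tax=raw_orders["amount"] * 1.1)

    # The Definitions object is what gets version-controlled and packaged as a container image.
    defs = Definitions(assets=[raw_orders, order_totals])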

Model Serving

ML models are deployed and served with Ray Serve, a framework-agnostic Python library that supports model composition, multiplexing and fractional GPUs.
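A minimal Ray Serve sketch is shown below; the model is a trivial stand-in and none of the names are kubox-specific:

    from ray import serve
    from starlette.requests import Request

    # Add ray_actor_options={"num_gpus": 0.5} to reserve a fractional GPU per replica.
    @serve.deployment(num_replicas=1)
    class SentimentModel:
        def __init__(self):
            # Load your framework of choice here (PyTorch, TensorFlow, scikit-learn, ...).
            self.positive_words = {"good", "great", "excellent"}

        async def __call__(self, request: Request) -> dict:
            text = (await request.json())["text"]
            hits = sum(word in self.positive_words for word in text.lower().split())
            return {"positive": hits > 0}

    # Starts Ray locally if needed and serves the model over HTTP on port 8000.
    serve.run(SentimentModel.bind())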

Vector Database

Use Qdrant as a self-hosted open-source vector database for next-generation AI-native semantic search and meaningful information extraction from unstructured data.
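The sketch below uses the qdrant-client library against an in-memory instance; on kubox you would point the client at the self-hosted Qdrant service instead, and the collection, vectors and payloads here are purely illustrative:

    from qdrant_client import QdrantClient
    from qdrant_client.models import Distance, PointStruct, VectorParams

    # In-memory mode for illustration; pass the URL of your Qdrant service instead.
    client = QdrantClient(":memory:")

    client.create_collection(
        collection_name="documents",
        vectors_config=VectorParams(size=4, distance=Distance.COSINE),
    )

    # In practice the vectors come from an embedding model, not hand-written numbers.
    client.upsert(
        collection_name="documents",
        points=[PointStruct(id=1, vector=[0.1, 0.2, 0.3, 0.4],
                            payload={"title": "quarterly report"})],
    )

    hits = client.search(collection_name="documents",
                         query_vector=[0.1, 0.2, 0.3, 0.4], limit=3)
    print(hits)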

LLMs & RAG

Build self-hosted Retrieval-Augmented Generation (RAG) pipelines to access your organisation's past and current knowledge in the form of multi-modal data.
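Conceptually, a RAG pipeline retrieves relevant context and feeds it to a generator. The toy sketch below uses an in-memory list and a placeholder model in place of a real vector database and self-hosted LLM, so every name in it is illustrative:

    KNOWLEDGE = [
        "The 2023 security audit was completed in March.",
        "The data platform runs on Kubernetes.",
    ]

    def retrieve(question: str, top_k: int = 1) -> list[str]:
        # Stand-in for embedding the question and querying a vector database.
        q_words = set(question.lower().split())
        return sorted(KNOWLEDGE,
                      key=lambda doc: len(q_words & set(doc.lower().split())),
                      reverse=True)[:top_k]

    def generate(prompt: str) -> str:
        # Stand-in for a call to a self-hosted LLM endpoint.
        return f"[answer generated from a prompt of {len(prompt)} characters]"

    def answer(question: str) -> str:
        context = "\n".join(retrieve(question))
        return generate(f"Answer using only this context:\n{context}\n\nQuestion: {question}")

    print(answer("What does the platform run on?"))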

Anything else

If it runs on Kubernetes, it will most likely run in kubox.

Our promise

We want to create delightful developer experiences. If, after using kubox AI for 3 months, you are not 100% satisfied, we will give you a full refund. T&Cs apply.

75%

of developers increase their productivity within 2 weeks

30%

Reduction in cloud and software licensing costs

$16M

Annual savings for organisations with more than 200 developers

Pricing plans

Early bird access and discounts are available. Pricing in AUD.

Basic
Starter
$0 for everyone, forever
AWS, Data & AI Tools, Self-hosted LLMs, GPU Support, Community Support

Professional
Basic plus:
$100 per developer, per month
Multi-Cloud, Multi-User, CI/CD and GitOps, Fast LLM Inference, Dedicated Support

Enterprise
Professional plus:
$500 per developer, per month
Australian IRAP Assessed, On-premises, Management Plane, Air-gapped, Security Hardened

Stay in touch

Or have a question?

By clicking Sign Up, you're confirming that you agree with our Terms and Conditions.

FAQs

If you don't find what you're looking for, we're keen to connect, clarify and learn.

How are the data and AI tools deployed?

Data and AI tools are deployed using kustomize and FluxCD. Best-of-breed, opinionated tools are already integrated, deployed and tested to work out of the box. However, those opinions are loosely held, and developers are free to deploy other tools as they see fit.

Which GPUs can be used?

Currently, the Basic plan supports AWS GPU instances such as those with NVIDIA L4 GPUs and the G4dn and P3 instance families.

What is kubox AI not fit for?

Engineering is about trade-offs. kubox AI is optimised for stateless or short-lived data analytics use cases such as exploratory data analysis, data pipeline execution, and ML model training and serving. However, it's not ideal for long-lived distributed databases that require high resilience, availability, or geo-distribution. The platform is designed for data teams with basic software engineering knowledge, rather than for no-code/low-code users.

How and where is data persisted?

While kubox clusters are capable of handling diverse stateful workloads and databases, they are designed for complete replacement, not ongoing maintenance. This means no patching or updates. Outputs from data pipelines should therefore be persisted in external storage such as S3.
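For example, a pipeline step might end with a write like the sketch below; the bucket and path are placeholders, and it assumes s3fs and AWS credentials are already configured:

    import pandas as pd

    # Output of some pipeline step; the content is illustrative.
    result = pd.DataFrame({"metric": ["rows_processed"], "value": [12345]})

    # pandas hands s3:// paths to s3fs, so the output survives cluster replacement.
    result.to_parquet("s3://example-analytics-bucket/outputs/run-2024-01-01.parquet",
                      index=False)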