Panini is a platform to serve ML/DL models at low latency.
Just upload your model and we will do the rest.
We take care of scaling as demand grows, and our backend is engineered for fast model inference.
We take care of caching and batching for you, providing you with low-latency prediction.
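To illustrate the idea behind request batching, here is a minimal sketch of a micro-batching loop: incoming requests are collected for a short window (or until a size cap) and served with a single batched prediction call. This is illustrative only; the function and parameter names (`micro_batch`, `max_batch`, `max_wait_s`) are hypothetical and not Panini's actual API.

```python
import time
from queue import Queue, Empty

def micro_batch(request_queue, predict_fn, max_batch=32, max_wait_s=0.005):
    """Collect requests for up to max_wait_s (or max_batch items),
    then run one batched prediction. Hypothetical helper for illustration."""
    batch = []
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch:
        timeout = deadline - time.monotonic()
        if timeout <= 0:
            break
        try:
            batch.append(request_queue.get(timeout=timeout))
        except Empty:
            break
    return predict_fn(batch) if batch else []

# Usage with a toy "model" that doubles each input.
q = Queue()
for x in range(10):
    q.put(x)
results = micro_batch(q, lambda xs: [2 * x for x in xs])
print(results)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Batching amortizes fixed per-request costs (parsing, dispatch, model invocation) across many inputs, which is where the latency and throughput wins come from.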
We provide many different deployment options. You can host your models on our Kubernetes cluster, or deploy Panini on your own servers via Helm or Docker.
The CIFAR dataset was used for the benchmark. A simple CIFAR classification model was uploaded, and a large request was sent to the server from a single thread.
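A single-threaded benchmark of this shape can be sketched as below. The `predict` callable stands in for an HTTP request to the serving endpoint; the helper name and the reported statistics are our illustrative choices, not the actual benchmark harness used.

```python
import time
import statistics

def benchmark(predict, payloads):
    """Send payloads one at a time from a single thread and
    record per-request latency in milliseconds."""
    latencies = []
    for p in payloads:
        start = time.perf_counter()
        predict(p)  # stand-in for a request to the model server
        latencies.append((time.perf_counter() - start) * 1000.0)
    ordered = sorted(latencies)
    return {
        "mean_ms": statistics.mean(latencies),
        "p99_ms": ordered[int(0.99 * (len(ordered) - 1))],
    }

# Usage: a cheap local stand-in for the model call.
stats = benchmark(lambda p: sum(p), [[1] * 100 for _ in range(1000)])
print(stats)
```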
Note: An optimized build of TensorFlow Serving would perform slightly better, but building an optimized TF Serving is poorly documented and non-trivial.
We used scikit-learn's implementations of SVM, Logistic Regression, and Random Forest on the MNIST benchmark dataset to compare latency with and without Panini's batching. For large batched requests, we observed throughput improvements of up to 15x.
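Why does batching help so much? Each prediction call carries a fixed overhead on top of the per-item work, so one batched call beats many single-item calls. The toy sketch below makes that concrete with a stub model; the overhead constant and the resulting speedup are simulated for illustration, not measurements from Panini.

```python
import time

PER_CALL_OVERHEAD_S = 0.001  # simulated fixed cost per call (parsing, dispatch)

def predict(batch):
    """Toy stand-in for a model: fixed per-call overhead plus per-item work."""
    time.sleep(PER_CALL_OVERHEAD_S)
    return [x * 2 for x in batch]

inputs = list(range(50))

# Non-batched: one call per input, paying the overhead every time.
t0 = time.perf_counter()
for x in inputs:
    predict([x])
unbatched = time.perf_counter() - t0

# Batched: a single call for all inputs, paying the overhead once.
t0 = time.perf_counter()
predict(inputs)
batched = time.perf_counter() - t0

print(f"speedup: {unbatched / batched:.1f}x")
```

With real models the per-item work also vectorizes (e.g. scikit-learn's `predict` over a 2-D array), so the gap grows further with batch size.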
Our platform is free for Beta Testers!
We are always trying to make it more cost effective.
Deploy in our Kubernetes
Or Deploy in your private server
The easiest way to get started is to deploy your model on our server. We use Kubernetes Engine to host your models and take extensive precautions to keep your models and files safe and secure. They are encrypted, and only you have access to them.
If you have sensitive data and prefer that your model and data never leave your infrastructure, you can deploy Panini to your private Kubernetes cluster via Helm, or to your private server via DockerHub. Contact us to get started.