OctoML, a platform for Machine Learning (ML) deployment, recently launched a substantial platform update that removes deployment barriers and accelerates the development of AI-powered apps. With the latest updates, IT operations teams and app developers can transform trained machine learning models into flexible, portable, production-ready software components that integrate with existing application stacks and DevOps processes.
Building trustworthy, efficient AI-powered apps is one of the main challenges in enterprise software development today. Deploying models is difficult because of interdependencies among the ML training framework, the model type, and the hardware required at each stage of the model lifecycle. Users need a solution that removes these dependencies, abstracts away the complexity, and delivers models as production-ready software functions.
Luis Ceze, CEO of OctoML, said, "AI has the potential to change the world, but it first needs to become sustainable and accessible." He continued, "Today's manual, specialized ML deployment workflows keep application developers, DevOps engineers and IT operations teams on the sidelines. Our new solution enables them to work with models like the rest of their application stack, using their DevOps workflows and tools. We aim to allow customers to transform models into performant, portable functions that can run on any hardware."
Models as functions can run at high performance anywhere, from the cloud to the edge, and remain stable and consistent even as the underlying hardware infrastructure changes. This DevOps-inclusive approach minimizes redundancy by unifying two previously parallel deployment streams, one for AI and one for traditional software. It also increases the return on prior investments in operations and model development.
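To make the "model as a function" idea concrete, here is a minimal Python sketch. Everything in it is illustrative: the `ModelFunction` class, its constructor arguments, and the dispatch logic are assumptions for exposition, not OctoML's actual API. The point is that application code calls a plain function while the packaged artifact hides hardware-specific details.

```python
import numpy as np

class ModelFunction:
    """Hypothetical wrapper illustrating a model-as-a-function:
    callers invoke a plain callable; hardware details stay inside."""

    def __init__(self, artifact_path: str, device: str = "cpu"):
        # A real packaged artifact would bundle the accelerated model
        # together with the runtime for the chosen device.
        self.artifact_path = artifact_path
        self.device = device

    def __call__(self, batch: np.ndarray) -> np.ndarray:
        # Stand-in for dispatch to the bundled runtime (e.g., a CPU- or
        # GPU-specific engine). Here it just returns dummy scores.
        print(f"running {self.artifact_path} on {self.device}")
        return np.zeros((batch.shape[0], 1000), dtype=np.float32)

# The call site is identical whether the function targets a CPU or a GPU;
# only the packaging step changes, not the application code.
predict = ModelFunction("resnet50_package/", device="cpu")
scores = predict(np.zeros((1, 3, 224, 224), dtype=np.float32))
```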
With the latest OctoML platform release, customers can keep their existing teams and tools. The platform works with each user's model, development environment, developer tools, CI/CD framework, application stack, and cloud, while maintaining cost and performance SLAs.
These are the primary platform expansion features:
- Machine learning for machine learning capabilities: automation detects and resolves dependencies, cleans and optimizes model code, and accelerates and packages the model for any hardware target.
- The OctoML CLI provides local access to OctoML's feature set and interfaces with the platform's SaaS capabilities to create accelerated, hardware-independent models-as-functions.
- A wide-ranging fleet of more than 80 deployment targets, including GPUs, CPUs, and NPUs from NVIDIA, Intel, AMD, ARM, and AWS Graviton, is used for automated hardware compatibility testing, performance analysis, and hardware optimization on real hardware. These targets are deployed at the edge and in the cloud (AWS, Azure, and GCP).
- Real-world performance and compatibility insights can be used to inform deployment decisions precisely and to ensure that SLAs for performance, cost, and user experience are met.
- A comprehensive software library containing the software stacks from chip manufacturers, major machine learning frameworks, and acceleration tools such as Apache TVM (see the compilation sketch after this list).
- NVIDIA Triton Inference Server is the integrated inference serving software included with every model-as-a-function created with the OctoML CLI or the OctoML platform.
- Combining NVIDIA Triton with OctoML lets users more quickly choose, integrate, and deploy Triton-powered inference from any framework on standard data center servers (see the client sketch after this list).
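As one example of what acceleration tooling like Apache TVM does, the sketch below compiles an ONNX model with TVM's standard Relay workflow and runs it on a CPU target. The model file name, input tensor name, and shape are assumptions for illustration; the TVM calls themselves (`relay.frontend.from_onnx`, `relay.build`, `graph_executor`) are the library's documented Python API.

```python
import numpy as np
import onnx
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Load an ONNX model (hypothetical file name) and import it into Relay.
onnx_model = onnx.load("resnet50.onnx")
input_name = "data"  # assumed input tensor name
shape_dict = {input_name: (1, 3, 224, 224)}
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

# Compile for a generic CPU target; a different target string
# (e.g., "cuda") retargets the same model to other hardware.
target = "llvm"
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

# Run the compiled module on the chosen device.
dev = tvm.device(target, 0)
module = graph_executor.GraphModule(lib["default"](dev))
module.set_input(input_name, np.random.rand(1, 3, 224, 224).astype("float32"))
module.run()
print(module.get_output(0).numpy().shape)
```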
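On the serving side, a packaged model running under Triton can be queried with NVIDIA's standard `tritonclient` library over HTTP. The model name, input/output tensor names, and shape below are illustrative assumptions that must match the deployed model's configuration; the client calls are Triton's documented Python API.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a local Triton Inference Server (default HTTP port 8000).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Prepare a request; "resnet50", "input", and "output" are assumed names.
batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
infer_input = httpclient.InferInput("input", list(batch.shape), "FP32")
infer_input.set_data_from_numpy(batch)
requested_output = httpclient.InferRequestedOutput("output")

# Run inference and read the result back as a NumPy array.
response = client.infer(
    model_name="resnet50",
    inputs=[infer_input],
    outputs=[requested_output],
)
print(response.as_numpy("output").shape)
```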
Shankar Chandrasekaran, Product Marketing Manager at NVIDIA, said, "NVIDIA Triton is the top choice for AI inference and model deployment for workloads of any size, across all major industries worldwide." He continued, "Its portability, versatility and flexibility make it an ideal companion for the OctoML platform."
"NVIDIA Triton enables users to leverage all major deep learning frameworks and acceleration technologies across GPUs and CPUs," said Jared Roesch, CTO, OctoML. "The OctoML workflow extends the user value of Triton-based deployments by seamlessly integrating OctoML acceleration technology, allowing you to get the most out of both the serving and model layers."