My Journey Learning To Build AI Apps on Azure (March 2025 to Feb 2026)

I started by learning how RAG works end-to-end: indexing documents, vectorizing them with embeddings, retrieving with hybrid search, and grounding LLM responses. Once I understood the mechanics, I leveled up to Semantic Kernel to introduce agent abstractions and plugin-based extensibility. From there, I explored Azure AI Foundry's hosted agents and prompt engineering patterns. Finally, I built a production multi-agent platform on AKS using the Microsoft Agent Framework SDK, routing five agents across three distinct backends: cloud APIs, on-cluster GPU inference via KAITO, and server-side RAG via KAITO RAGEngine. Each project was a building block toward understanding how enterprise AI applications are designed, orchestrated, and deployed at scale on AKS.
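Before diving into the individual posts, here is a toy sketch of that first building block, the core RAG loop (index, retrieve, ground). The bag-of-words embedder and prompt template are stand-ins I made up for illustration; a real pipeline would use an embedding model and hybrid vector-plus-keyword search:

```python
from collections import Counter
import math

# Toy "embedding": a bag-of-words vector. A real pipeline would call an
# embedding model here instead.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

# Cosine similarity between two sparse bag-of-words vectors.
def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1) Index: embed each document once and keep the vectors.
docs = [
    "KAITO provisions GPU nodes and deploys language models on AKS.",
    "RAGEngine indexes private documents and retrieves relevant passages.",
]
index = [(doc, embed(doc)) for doc in docs]

# 2) Retrieve: rank stored documents against the query embedding.
query = "How are private documents retrieved?"
best_doc, _ = max(index, key=lambda pair: cosine(pair[1], embed(query)))

# 3) Ground: put the retrieved context into the prompt for the LLM.
prompt = f"Answer using only this context:\n{best_doc}\n\nQuestion: {query}"
print(prompt)  # in a real app this prompt goes to the chat model
```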

Intro to KAITO RAG Engine on Azure Kubernetes Service

The Kubernetes AI Toolchain Operator (KAITO) for Azure Kubernetes Service (AKS) includes a RAG engine that lets users interact with private documents through a hosted language model such as Phi-4. The engine grounds AI responses by indexing documents and retrieving the relevant passages at query time, and as a managed AI platform capability it provides the control and scalability to support many generative AI applications.
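To make the interaction concrete, here is a minimal sketch of indexing and querying, assuming a RAGEngine service reachable in-cluster with /index and /query endpoints; the service URL and payload shapes are my assumptions and should be verified against the KAITO RAGEngine docs:

```python
import requests

# Assumed in-cluster address of the RAGEngine service; the real name and
# port come from your RAGEngine resource definition.
BASE = "http://workspace-ragengine.default.svc.cluster.local"

# Index a private document. The /index path and payload shape are my
# reading of the KAITO RAGEngine API; verify against the current docs.
requests.post(f"{BASE}/index", json={
    "index_name": "private-docs",
    "documents": [{"text": "Our VPN rollout completes in Q3."}],
}).raise_for_status()

# Query: RAGEngine retrieves the most relevant chunks and grounds the
# hosted model's (e.g. Phi-4's) answer in them.
resp = requests.post(f"{BASE}/query", json={
    "index_name": "private-docs",
    "query": "When does the VPN rollout finish?",
    "top_k": 3,
})
print(resp.json())
```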

Using Streamlit Chatbot UI with AKS KAITO Language Model Inferences

This blog post discusses setting up a chatbot UI with Streamlit in front of a language model inference service deployed on Azure Kubernetes Service (AKS). It walks through testing the inference service with curl commands, implementing a Streamlit app, and configuring ingress rules for external access, highlighting how well Streamlit's user-friendly primitives suit chatbot development.
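A minimal version of such an app could look like the sketch below. It assumes the KAITO workspace runs a vLLM-based preset serving an OpenAI-compatible /v1/chat/completions endpoint; the service URL and model name are placeholders:

```python
import requests
import streamlit as st

# Placeholder: the in-cluster (or ingress) URL of the KAITO inference service.
API_URL = "http://workspace-phi-4.default.svc.cluster.local/v1/chat/completions"

st.title("KAITO Chatbot")

if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the conversation so far.
for msg in st.session_state.messages:
    st.chat_message(msg["role"]).write(msg["content"])

if prompt := st.chat_input("Ask something"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    st.chat_message("user").write(prompt)

    # Forward the whole chat history to the OpenAI-compatible endpoint.
    resp = requests.post(API_URL, json={
        "model": "phi-4",  # placeholder model name
        "messages": st.session_state.messages,
    })
    answer = resp.json()["choices"][0]["message"]["content"]

    st.session_state.messages.append({"role": "assistant", "content": answer})
    st.chat_message("assistant").write(answer)
```

Running `streamlit run app.py` starts the UI locally; the ingress rules covered in the post then expose it outside the cluster.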

Running Open-Weight LLMs on AKS with KAITO: A Summary of Model Families

KAITO is an AI toolchain operator designed for deploying language models in Kubernetes. It features several model families: DeepSeek for advanced reasoning, Falcon for custom fine-tuning, Llama for general assistance, Mistral for efficiency, Phi for cost-sensitive tasks, and Qwen for programming. Because these are open-weight models, they can be run privately, fine-tuned, and governed in-house, making them well suited to enterprise workloads.
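Picking a family comes down to the preset name in the Workspace custom resource. Here is a sketch using the official kubernetes Python client; the API version, preset name, and GPU SKU are examples to check against the KAITO release you run:

```python
from kubernetes import client, config

config.load_kube_config()

# Minimal KAITO Workspace: the model family is chosen via the preset name.
# Field names follow the kaito.sh Workspace CRD; the API version (v1beta1),
# preset, and GPU SKU below are examples - verify against your KAITO release.
workspace = {
    "apiVersion": "kaito.sh/v1beta1",
    "kind": "Workspace",
    "metadata": {"name": "workspace-phi-4"},
    "resource": {
        "instanceType": "Standard_NC24ads_A100_v4",
        "labelSelector": {"matchLabels": {"apps": "phi-4"}},
    },
    "inference": {"preset": {"name": "phi-4"}},  # e.g. falcon-7b, mistral-7b-instruct, ...
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="kaito.sh", version="v1beta1", namespace="default",
    plural="workspaces", body=workspace,
)
```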

Deep Dive Into Fine-Tuning An LM Using KAITO on AKS – Part 3: Deploying the FT Model

Now that I have fine-tuned a model in Part 2, the next step is to deploy the fine-tuned model into a new KAITO workspace. This blog post is part of a series.

Part 1: Intro and overview of the KAITO fine-tuning workspace YAML
Part 2: Executing the Kubernetes Training Job
Part 3: Deploying the Fine-Tuned Model
Part 4: Evaluating the Fine-Tuned Model

Continue reading Deep Dive Into Fine-Tuning An LM Using KAITO on AKS – Part 3: Deploying the FT Model
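As quick orientation for this part of the series, the deployment step boils down to a new inference Workspace that layers the fine-tuned adapter on top of the base preset, roughly like this sketch (field names and image path are my assumptions, not the post's exact spec):

```python
# Sketch of a follow-on inference Workspace that serves the fine-tuned output:
# the base preset plus the adapter image the tuning job pushed to a registry.
# Field names (inference.adapters, source.image, strength) and the image path
# are my assumptions - the post walks through the exact spec.
ft_workspace = {
    "apiVersion": "kaito.sh/v1beta1",
    "kind": "Workspace",
    "metadata": {"name": "workspace-phi-4-ft"},
    "resource": {
        "instanceType": "Standard_NC24ads_A100_v4",
        "labelSelector": {"matchLabels": {"apps": "phi-4-ft"}},
    },
    "inference": {
        "preset": {"name": "phi-4"},
        "adapters": [{
            "source": {"name": "my-adapter",
                       "image": "myregistry.azurecr.io/phi-4-adapter:latest"},
            "strength": "1.0",
        }],
    },
}
# Create it with the same CustomObjectsApi call shown in the preset sketch above.
```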

Deep Dive Into Fine-Tuning An LM Using KAITO on AKS – Part 2: Execution

Continuing from Part 1, I will now execute the deployment of the fine-tuning workspace job. This blog post is part of a series.

Part 1: Intro and overview of the KAITO fine-tuning workspace YAML
Part 2: Executing the Kubernetes Training Job
Part 3: Deploying the Fine-Tuned Model
Part 4: Evaluating the Fine-Tuned Model

Let's start the fine …

Continue reading Deep Dive Into Fine-Tuning An LM Using KAITO on AKS – Part 2: Execution
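For readers skimming the series, the object being executed in this part is a Workspace with a tuning block instead of an inference block, roughly shaped like this sketch (preset, dataset URL, and registry path are invented placeholders):

```python
# Rough shape of the fine-tuning Workspace being executed here (Part 1 covers
# the real YAML). The tuning fields follow the kaito.sh spec as I understand
# it; the preset, dataset URL, and registry path are invented placeholders.
tuning_workspace = {
    "apiVersion": "kaito.sh/v1beta1",
    "kind": "Workspace",
    "metadata": {"name": "workspace-tuning-phi-4"},
    "resource": {
        "instanceType": "Standard_NC24ads_A100_v4",
        "labelSelector": {"matchLabels": {"apps": "tuning-phi-4"}},
    },
    "tuning": {
        "preset": {"name": "phi-4"},
        "method": "qlora",  # parameter-efficient fine-tuning
        "input": {"urls": ["https://example.com/train.parquet"]},
        "output": {
            "image": "myregistry.azurecr.io/phi-4-adapter:latest",
            "imagePushSecret": "acr-push-secret",
        },
    },
}
# Applying the equivalent YAML (kubectl apply) launches the Kubernetes training job.
```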

Effortlessly Setup Kaito v0.3.1 on Azure Kubernetes Service To Deploy A Large Language Model

KAITO simplifies the deployment of large language models (LLMs) on Azure Kubernetes Service (AKS) with preset GPU configurations. It automates the setup process, including GPU node provisioning and identity management, so engineers can focus on AI/ML model experimentation instead of infrastructure while staying within security and compliance guardrails. #azure #kubernetes #AI #genAI #mvpbuzz
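Once a Workspace is applied, provisioning runs unattended; a small poll like the sketch below can confirm when the automated GPU node and model deployment are ready (the condition names and API version are assumptions that vary by KAITO release):

```python
import time
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

# Poll the Workspace until KAITO reports it ready. The condition type names
# and the API version are assumptions (the v0.3.x era used kaito.sh/v1alpha1;
# newer releases use v1beta1) - check `kubectl describe workspace` on your cluster.
while True:
    ws = api.get_namespaced_custom_object(
        group="kaito.sh", version="v1alpha1", namespace="default",
        plural="workspaces", name="workspace-phi-4",
    )
    conditions = {c["type"]: c["status"]
                  for c in ws.get("status", {}).get("conditions", [])}
    if conditions.get("InferenceReady") == "True":
        print("GPU node provisioned and model serving.")
        break
    print("Still provisioning:", conditions)
    time.sleep(30)
```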