10xLab Accelerates Enterprise RAG Deployment with Inferdat on AWS
10xLab is an AI engineering firm that designs and implements production-grade RAG architectures for enterprise clients, with a core philosophy that each model in a pipeline should be selected for its specific role rather than general capability. 10xLab's flagship product, KnowledgeVault AI, is a hybrid multi-source RAG system combining private document retrieval with public real-time data feeds. The company serves clients across a spectrum of deployment requirements, from organizations with strict on-premise data sovereignty mandates to those who want the elasticity and managed services of the cloud.
The Opportunity
10xLab's fully on-premise RAG architecture, while well suited to clients with strict data sovereignty mandates, required significant local GPU infrastructure to run its model pipeline, including a 70B-parameter reasoning model. This created a ceiling: prospects who wanted to move faster or avoid managing GPU infrastructure had no path into 10xLab's product. 10xLab needed the same knowledge assistant to also run natively on AWS, without forking the codebase or building a second product.
“We kept losing deals where the prospect loved the product but didn't want to manage GPU infrastructure. Inferdat built us an AWS path that uses Bedrock for inference without forking the codebase. Same product, two deployment options. That opened up a whole segment we couldn't reach before.”
Our Approach
Inferdat kept RAGflow as the constant document intelligence layer across both deployment modes, handling parsing, chunking, and retrieval orchestration identically regardless of where the system runs. For the AWS deployment, Inferdat replaced the local Ollama-served model layer with Amazon Bedrock, providing managed foundation models for query routing, retrieval, reranking, and deep inference. The same hybrid public/private retrieval pattern was preserved: a PII classifier and query router determine what context is safe to send to public sources, private retrieval runs against Aurora PostgreSQL within the client's AWS account, and Bedrock handles inference on the fused context. Inferdat built the customer-facing knowledge assistant UI/UX once, with a deployment-agnostic experience that works identically whether the backend is on-premise or AWS.
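The hybrid public/private retrieval pattern described above can be illustrated with a minimal Python sketch. It is not 10xLab's or Inferdat's implementation: the regex-based `contains_pii` check is a toy stand-in for the PII classifier, and `gather_context`, `private_search`, and `public_search` are hypothetical names introduced only for illustration. The idea shown is simply that private retrieval always runs, while public sources are queried only when the query is judged safe to send out.

```python
import re
from typing import Callable, List

# Toy patterns standing in for a real PII classifier (assumption, not the
# production classifier): email addresses and US SSN-style numbers.
_PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
]


def contains_pii(text: str) -> bool:
    """Return True if any toy PII pattern matches the text."""
    return any(p.search(text) for p in _PII_PATTERNS)


def gather_context(
    query: str,
    private_search: Callable[[str], List[str]],
    public_search: Callable[[str], List[str]],
) -> List[str]:
    """Fuse private and (when safe) public retrieval results.

    Private retrieval always runs inside the trusted boundary. The public
    source is only called when the query carries no detected PII.
    """
    context = list(private_search(query))
    if not contains_pii(query):
        context.extend(public_search(query))
    return context
```

In a real deployment the two search callables would wrap the private store and the public data feeds; here they can be any functions that take a query string and return a list of text chunks.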
Architecture
The key architectural decision was isolating the model-serving layer as the only component that changes between deployment modes. RAGflow's document parsing and chunking, the PII boundary classifier, the hybrid public/private retrieval pattern, and the knowledge assistant UI are identical across both paths. In the on-premise configuration, Ollama serves open-weight models (Mistral 7B, Qwen 2.5-32B, DeepSeek-R1 70B) on local hardware. In the AWS configuration, Amazon Bedrock serves the equivalent pipeline roles, with on-demand pricing eliminating the idle GPU cost the on-premise path incurs. This gives 10xLab a consistent answer to infrastructure questions during sales conversations, with the choice coming down to the client's governance posture and budget rather than feature availability.
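One way to picture the isolated model-serving layer is a small interface that the rest of the pipeline depends on, with one implementation per deployment mode. The Python sketch below is an illustration under stated assumptions, not the actual codebase: `ModelBackend`, `StubBackend`, `answer_question`, the role names, and the model identifiers are all hypothetical placeholders, and the stub returns a tagged string instead of calling Ollama or Amazon Bedrock.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


class ModelBackend(Protocol):
    """The only seam that differs between deployment modes (hypothetical)."""

    def generate(self, role: str, prompt: str) -> str: ...


@dataclass
class StubBackend:
    """Stand-in for a real serving layer; it makes no network calls."""

    name: str
    models: Dict[str, str]  # pipeline role -> model identifier (placeholders)

    def generate(self, role: str, prompt: str) -> str:
        # A real backend would invoke the model mapped to this role.
        return f"[{self.name}/{self.models[role]}] {prompt}"


def answer_question(
    backend: ModelBackend, question: str, context_chunks: List[str]
) -> str:
    """Pipeline code written once; it never inspects which backend it has."""
    prompt = "\n".join(context_chunks) + "\n\nQuestion: " + question
    return backend.generate("reasoner", prompt)


# Placeholder configurations; the identifiers are illustrative only.
ON_PREM = StubBackend("ollama", {"reasoner": "local-reasoning-model"})
AWS = StubBackend("bedrock", {"reasoner": "managed-reasoning-model"})
```

Because `answer_question` takes the backend as a parameter, switching deployment modes in this sketch means passing `ON_PREM` or `AWS` and nothing else, which mirrors the "same product, two deployment options" design described above.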
The Outcome
10xLab now offers a single knowledge assistant product with two deployment paths, widening its addressable market to include clients who don't want to manage GPU infrastructure. The AWS deployment delivers query routing and retrieval in under two seconds, with Bedrock handling deep reasoning across 128K context windows for technical document analysis. Because RAGflow remains the constant ingestion layer, indexed client document sets are portable between deployment modes if a client's infrastructure requirements change.
AWS Services Used