Technical Leadership | AWS | Azure | GCP MLOps, Platform Engineering, Solutions Architecture
This article resumes from my initial article of 31 October, which argued that a clear understanding of the evaluation pipeline simplifies the subsequent stages of deploying LLMs to production.
I planned to cover the remaining steps, but Quantum Black’s comprehensive article has since addressed these topics fairly effectively (if you haven’t read it, I recommend giving it a look). Therefore, I’ll focus on designing generative AI solutions, translating the Quantum Black reference architecture into AWS services, and explaining the architectural choices. I welcome discussion to refine this design.
For me, the key takeaways from the article were the following points:
- Service-Based LLMs
- Separated Unified Reporting Layer
With my background in deploying various solutions across enterprise environments, certain unique enterprise considerations have become second nature.
In this article, I will highlight several of these that are typically essential for any enterprise-level deployment. These aspects are crucial for teams aiming to integrate Knowledge Assistants (KAs) into most enterprise settings.
To demonstrate this, let’s envision a hypothetical situation involving a fictitious company named CryoDyne, a global pharmaceutical powerhouse, taking a bold step to incorporate AI, including but not limited to Large Language Models (LLMs), into their enterprise strategy, while still bound by various compliance rules including GDPR and HIPAA.
This scenario is likely to be echoed globally this year in compliance-heavy environments (finance, health, public sector). In our hypothetical case, it places specific immutable requirements on all solutions and services developed for CryoDyne, including:
- CryoDyne operates a centralised or at least partly centralised API gateway, which serves various internal teams, each governed by unique Role-Based Access Control (RBAC).
- The integration of the knowledge assistant into this framework is a key initiative for the company, aimed at providing department-specific KAs, each adept in their particular domain language.
- CryoDyne upholds strict compliance standards, leading to processes such as separation of duties. As a result, the company’s reporting structures are intricately segmented, and data exchange often requires coordinating with a specialised data team or navigating several firewalls and accounts.
- As a pharmaceutical company, CryoDyne must adhere to stringent regulations such as HIPAA and GDPR in all software implementations. As an AWS client, the company already employs a robust enterprise cloud deployment strategy, incorporating services like AWS Config, AWS Control Tower, AWS Organizations, and Service Catalog. This is complemented by a multi-layered setup of Service Control Policies (SCPs) across their cloud accounts to ensure compliance and security.
- The company is committed to robust change management processes for its production deployments and expects similar automated processes to be in place on the part of their vendors, in line with industry best practices.
- CryoDyne’s enterprise architecture department requires a detailed peer review of any proposed solution designs.
- Security is paramount, with a stipulation that all services must be secure and only accessible to those within the company who have the necessary permissions.
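To make the layered SCPs in requirement 4 more concrete, here is a minimal sketch of one common guardrail: denying requests outside a set of approved regions. The region list, the exempted global services, and the function name `build_region_scp` are assumptions chosen for illustration, not anything CryoDyne (a fictitious company) actually runs.

```python
import json


def build_region_scp(allowed_regions, exempt_actions):
    """Return an SCP document that denies requests outside the allowed regions.

    exempt_actions lists global services (IAM, Organizations, ...) that are
    not tied to a region and would otherwise be blocked by the deny.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideApprovedRegions",
                "Effect": "Deny",
                "NotAction": sorted(exempt_actions),
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {
                        "aws:RequestedRegion": sorted(allowed_regions)
                    }
                },
            }
        ],
    }


# Hypothetical values: two EU regions and a handful of global services.
scp = build_region_scp(
    allowed_regions={"eu-west-1", "eu-central-1"},
    exempt_actions={"iam:*", "organizations:*", "sts:*", "support:*"},
)
scp_json = json.dumps(scp, indent=2)  # the text you would attach to an OU
```

A real policy would need a longer exemption list and review by the enterprise architecture team before being attached through AWS Organizations.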
Bearing these critical elements in mind, let’s delve into the “Quantum Black” article and examine how AWS services can be effectively utilised to platform this initiative.
The foundational article outlines a sophisticated data management strategy for modern enterprises, focusing on a Data Lake and Python-based data processing libraries like Kedro. This system facilitates parsing, chunking, metadata enrichment, and vectorisation, leading to an organised vector database.
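As a rough illustration of the chunking and metadata enrichment steps, the sketch below splits text into overlapping word windows and tags each chunk. It is plain Python rather than Kedro, LangChain, or LlamaIndex; the function names, the chunk sizes, and the metadata fields (`source`, `department`) are assumptions for the example.

```python
def chunk_text(text, chunk_size=200, overlap=40):
    """Split text into word-based chunks that overlap by a fixed word count."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def enrich(chunks, source, department):
    """Attach the metadata later used for filtering and access control."""
    return [
        {"id": f"{source}#{i}", "text": chunk, "source": source, "department": department}
        for i, chunk in enumerate(chunks)
    ]


# Hypothetical document and department names.
records = enrich(
    chunk_text("a b c d e f g h i j", chunk_size=4, overlap=1),
    source="sop-001.pdf",
    department="clinical",
)
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of some duplicated storage.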
- Separation of Concerns: A standalone account for data management activities reinforces security and adherence to compliance norms.
- Data Access Control: As with everything security-related, this needs to be layered. Vector databases typically interact with users over HTTP and APIs, so the natural fit was API Gateway with RBAC.
- Data Staging: We’ve selected Amazon S3 as our dependable staging (landing) platform for its capacity to handle a multitude of data formats efficiently and its “infinite” scalability.
- Data Processing: For data engineering tasks such as parsing, chunking, and vectorisation, we leverage AWS services like Step Functions or potentially AWS Managed Airflow. The article references Kedro, a Python framework by Quantum Black, which might indicate a bias. However, various other libraries exist (LangChain, LlamaIndex) for chunking and vectorisation. These tasks are time-consuming, involving data partitioning and iterative processing for each partition, making Step Functions or Airflow suitable choices.
- Vector Store: Here you have plenty to consider. Amazon offers a managed service called OpenSearch; however, many vector databases such as Qdrant and Chroma offer container versions that can be deployed with mounted volumes. This might need prototyping work to arrive at the correct final decision.
- Knowledge APIs: Knowledge APIs can be used alongside vector databases to enhance RAG applications, so Amazon Neptune could play a pivotal role in managing graph-based query operations.
- Scalability Concerns: In pursuit of handling vast data processing demands, we are investigating solutions that are compatible with S3’s scalable storage capabilities.
- Dynamic Data Access for LLMs: Function calls are designed to facilitate a dynamic and interactive engagement with LLMs within the datastore.
- Data Access Control Module: Central to our architecture is the API Gateway with RBAC, which ensures data layers are only accessible to authorised personnel.
- Vector Caching: There are many scenarios where semantically similar queries are not literally identical. For instance, “How much are the adidas trainers” and “How expensive are those Adidas” semantically require the same response. There is no point making a round trip for the latter if you have the cached response for the former. DynamoDB seems a good NoSQL option to implement this.
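The vector caching idea can be sketched as a lookup by embedding similarity rather than by exact string. In the example below an in-memory list stands in for the cache table, the embedding function is injected, and the three-dimensional vectors are made up by hand; the class name, the threshold of 0.95, and the cached answer text are all assumptions for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """In-memory stand-in for a cache table keyed by query embedding."""

    def __init__(self, embed, threshold=0.9):
        self.embed = embed          # callable: str -> list of floats
        self.threshold = threshold  # minimum similarity that counts as a hit
        self.entries = []           # (embedding, response) pairs

    def put(self, query, response):
        self.entries.append((self.embed(query), response))

    def get(self, query):
        vector = self.embed(query)
        best_score, best_response = 0.0, None
        for cached_vector, response in self.entries:
            score = cosine(vector, cached_vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


# Hand-written toy vectors; a real system would call an embedding model.
TOY_VECTORS = {
    "How much are the adidas trainers": [0.9, 0.1, 0.0],
    "How expensive are those Adidas": [0.85, 0.15, 0.05],
    "Where is my order": [0.0, 0.2, 0.9],
}
cache = SemanticCache(embed=TOY_VECTORS.__getitem__, threshold=0.95)
cache.put("How much are the adidas trainers", "cached pricing answer")
hit = cache.get("How expensive are those Adidas")   # similar enough: cache hit
miss = cache.get("Where is my order")               # unrelated: falls through
```

The linear scan is only workable for a small cache; the threshold also needs tuning, since too low a value returns the wrong cached answer.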
The Data Layer serves a dual role:
- It consumes embeddings from the LLM Gateway.
- It provides data and retrieval-augmented generation (RAG) capabilities to the Application Layer.
In addition to the immutable constraints, the skills required to maintain these vast amounts of data lie within the company’s centralised data team, which will lean on already tried and tested ETL/ELT techniques and tools.
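To illustrate how the Data Layer can serve retrieval for RAG while respecting RBAC, here is a minimal sketch that filters chunks by the caller's permitted departments before ranking by similarity. The role names, the role-to-department mapping, and the two-dimensional vectors are hypothetical; in the architecture described here the role would come from the API Gateway authoriser and the search from the vector store.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    department: str
    vector: list


# Hypothetical RBAC mapping: which departments' data each role may read.
ROLE_DEPARTMENTS = {
    "clinical-analyst": {"clinical"},
    "finance-analyst": {"finance"},
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vector, chunks, role, k=3):
    """Return the k most similar chunks that the role is allowed to see."""
    allowed = ROLE_DEPARTMENTS.get(role, set())
    visible = [c for c in chunks if c.department in allowed]
    ranked = sorted(visible, key=lambda c: cosine(query_vector, c.vector), reverse=True)
    return ranked[:k]


store = [
    Chunk("Trial protocol summary", "clinical", [1.0, 0.0]),
    Chunk("Quarterly budget notes", "finance", [0.9, 0.1]),
    Chunk("Dosage guidance", "clinical", [0.6, 0.8]),
]
clinical_hits = retrieve([1.0, 0.0], store, role="clinical-analyst")
finance_hits = retrieve([1.0, 0.0], store, role="finance-analyst")
unknown_hits = retrieve([1.0, 0.0], store, role="contractor")
```

Filtering before ranking means an unauthorised chunk can never reach the prompt, even when it is the closest match.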
The LLM Layer, central to our architecture as detailed in the Quantum Black paper, serves as the primary hub for processing language model requests. It comprises critical elements like the LLM API Gateway for scalable integrations and facilitating LLM Agnosticism, alongside a logging platform for data analytics and enhancement insights.
Adopting an ‘LLM gateway’ reflects a readiness to separate LLM APIs from applications, enabling swift replacement of LLMs, a crucial factor given the rapid evolution and diversity of models. This adaptability is vital, especially when integrating with the complex AWS environment.
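A minimal sketch of the gateway idea: applications call one `complete` method, and the model behind it can be swapped without touching application code. The class, the backend names `model-a` and `model-b`, and the stub functions are assumptions for illustration; real backends would wrap a SageMaker endpoint or a vendor API.

```python
class LLMGateway:
    """Routes prompts to a named backend so applications never call a model directly."""

    def __init__(self):
        self._backends = {}    # name -> callable taking a prompt, returning text
        self._default = None
        self.request_log = []  # stand-in for the logging platform

    def register(self, name, backend):
        self._backends[name] = backend
        if self._default is None:
            self._default = name  # first registered backend becomes the default

    def set_default(self, name):
        if name not in self._backends:
            raise KeyError(f"unknown backend: {name}")
        self._default = name

    def complete(self, prompt, model=None):
        name = model or self._default
        if name not in self._backends:
            raise KeyError(f"unknown backend: {name}")
        response = self._backends[name](prompt)
        self.request_log.append(
            {"model": name, "prompt_chars": len(prompt), "response_chars": len(response)}
        )
        return response


# Stub backends standing in for real model endpoints.
gateway = LLMGateway()
gateway.register("model-a", lambda prompt: f"[model-a] {prompt}")
gateway.register("model-b", lambda prompt: f"[model-b] {prompt}")

first = gateway.complete("Summarise the trial protocol")   # served by model-a
gateway.set_default("model-b")                              # swap the model
second = gateway.complete("Summarise the trial protocol")  # same call, new model
```

Because every request passes through one place, the same object is also a natural point to collect the usage logs the Reporting Layer needs.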
The LLM Layer leverages a comprehensive suite of AWS services to augment the lifecycle of machine learning models:
- Amazon SageMaker for building, training, and deploying models, offering a pipeline that can and should include evaluation steps, resulting in a production-ready SageMaker endpoint.
- Amazon DynamoDB, ensuring quick, consistent NoSQL database performance with easy scalability.
- Amazon CloudWatch for monitoring cloud resources and applications on AWS.
- Amazon API Gateway to efficiently create, manage, and secure APIs at scale.
- AWS CodePipeline for automated continuous integration and delivery.
- AWS CodeCommit for secure cloud-based code storage (more likely to be GitHub in most companies).
- AWS CodeBuild for continuous integration tasks like compiling, testing, and packaging software (GitHub Actions is also a likely candidate).
- AWS CodeDeploy for automated code deployments across environments.
- Amazon Elastic Container Registry (ECR) for managing Docker container images.
Data Science Account
In our enhanced model framework, I’ve implemented the ‘AI Factory’, a specialised environment dedicated to refining Large Language Models (LLMs) for a domain-specific lexicon; in the case of CryoDyne, possibly molecular structure analysis in drug discovery. We utilise Amazon SageMaker Studio and SageMaker JumpStart for the deployment of both standard machine learning models (which are often more appropriate for many use cases initially thought to be suitable for LLMs) and Hugging Face open-source LLMs. This layer acts as a central hub for LLM development, offering a dedicated testing space for different business sectors. It supports experimentation with both proprietary LLMs from companies like Anthropic and Cohere and open-source models, subject to legal approvals.
The AI Factory is designed to perpetuate a cycle of continuous improvement through an LLM Evaluation Pipeline. This could include A/B testing frameworks leveraging the model variants feature in SageMaker, allowing for comparative analysis and optimisation of different models. This setup ensures a dynamic and adaptable deployment of LLMs throughout various corporate functions, aligning with evolving enterprise requirements.
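As a sketch of the A/B testing setup, the function below builds the request for SageMaker's `create_endpoint_config` with two weighted production variants. The config name, model names, instance type, and weights are hypothetical, and the AWS call itself is left as a comment so the example runs without credentials.

```python
def build_ab_endpoint_config(config_name, variants):
    """Build kwargs for create_endpoint_config with weighted production variants.

    variants: list of (variant_name, model_name, instance_type, weight) tuples.
    """
    if not variants:
        raise ValueError("at least one variant is required")
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": variant_name,
                "ModelName": model_name,
                "InstanceType": instance_type,
                "InitialInstanceCount": 1,
                "InitialVariantWeight": float(weight),
            }
            for variant_name, model_name, instance_type, weight in variants
        ],
    }


def traffic_share(config):
    """Fraction of requests per variant: its weight over the sum of all weights."""
    variants = config["ProductionVariants"]
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


# Hypothetical models: the current one keeps 75% of traffic, the candidate 25%.
config = build_ab_endpoint_config(
    "ka-clinical-ab",
    [
        ("current", "ka-clinical-v1", "ml.g5.2xlarge", 3),
        ("candidate", "ka-clinical-v2", "ml.g5.2xlarge", 1),
    ],
)
shares = traffic_share(config)

# With credentials in place, the request would be sent as:
# boto3.client("sagemaker").create_endpoint_config(**config)
```

Comparing evaluation metrics per variant then tells you whether the candidate should take over more of the traffic.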
I have extended the interpretation of this account to be the service account, which also acts as the account factory, cost control room, and early warning account.
The reasoning for this lies in immutable requirement 4: all accounts are created using the Control Tower landing zone architecture, which already has centrally managed logs and seems an ideal place for cost reporting and FinOps operations.
The Reporting Layer, integral for transparency in costs, usage, and data analytics, is implemented using AWS services like CloudWatch and Cost Explorer. This layer is designed to provide comprehensive insights into the KA’s operational dynamics, crucial for both management and continuous improvement, as noted in the foundational article.
The following services were chosen:
- LLM API Gateway: This is a custom-named service in the architecture using Amazon API Gateway, which is used to create, publish, maintain, monitor, and secure reporting-related APIs.
- DynamoDB: A NoSQL database for user metadata.
- Cost Explorer: A service that enables you to visualise, understand, and manage your AWS costs and usage over time.
- Alarm: This refers to Amazon CloudWatch Alarms, used as an early warning system that watches for specific conditions in your application and sends notifications or takes automated actions.
- Budgets: AWS Budgets gives you the ability to set custom budgets that alert you when your costs or usage exceed (or are forecasted to exceed) your budgeted amount.
- Cost & Usage Report: This refers to AWS Cost and Usage Reports service that delivers detailed reports on your AWS costs and usage.
- AWS Organizations: Enables the logical grouping of accounts with similar policy-based management needs. This feature allows CryoDyne to automate compliance requirements across all its AWS accounts, aligning with their immutable requirements.
- CloudFormation: Infrastructure as code required for consistent infrastructure deployments across the enterprise. (Terraform also a popular choice)
- Config: AWS Config is a service that enables you to assess, audit, and evaluate the configurations of your AWS resources.
- Control Tower: AWS Control Tower is a service that provides the easiest way to set up and govern a new, secure, multi-account AWS environment for large organisations.
- Service Catalog: AWS Service Catalog allows organizations to create and manage catalogs of IT services that are approved for use on AWS.
- EventBridge: Amazon EventBridge is a serverless event bus service that you can use to connect your applications with data from a variety of sources.
- CloudWatch: Amazon CloudWatch acts as the central monitoring and observability service for AWS cloud resources, where dashboards of KA metrics can be collated and rendered.
This represents a typical AWS environment setup for monitoring and managing a multi-account structure with AWS governance services, particularly focusing on cost management, resource configuration, and service orchestration.
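The cost control and early warning role of this layer can be sketched in a few lines: aggregate LLM spend per department from usage logs, then flag departments whose actual or forecast spend crosses their budget, similar in spirit to what AWS Budgets alerts on. The record fields, prices, budgets, and the linear forecast are assumptions for illustration.

```python
from collections import defaultdict


def cost_by_department(usage_records, price_per_1k_tokens):
    """Sum LLM spend per department from gateway usage records."""
    totals = defaultdict(float)
    for record in usage_records:
        rate = price_per_1k_tokens[record["model"]]
        totals[record["department"]] += record["tokens"] / 1000 * rate
    return dict(totals)


def budget_alerts(spend, budgets, day_of_month, days_in_month, warn_ratio=0.8):
    """Flag departments whose actual or linearly forecast spend breaches budget."""
    alerts = {}
    for department, actual in spend.items():
        budget = budgets[department]
        forecast = actual / day_of_month * days_in_month  # naive run-rate forecast
        if actual >= budget:
            alerts[department] = "ACTUAL_EXCEEDED"
        elif forecast >= budget:
            alerts[department] = "FORECAST_EXCEEDED"
        elif actual >= warn_ratio * budget:
            alerts[department] = "WARNING"
    return alerts


# Hypothetical usage, prices (per 1,000 tokens), and monthly budgets.
usage = [
    {"department": "clinical", "model": "model-a", "tokens": 400_000},
    {"department": "clinical", "model": "model-b", "tokens": 100_000},
    {"department": "finance", "model": "model-a", "tokens": 50_000},
]
prices = {"model-a": 0.5, "model-b": 2.0}
spend = cost_by_department(usage, prices)
alerts = budget_alerts(
    spend, {"clinical": 1000.0, "finance": 500.0}, day_of_month=10, days_in_month=30
)
```

Here the clinical department has spent 400 by day 10, so its run rate projects to 1,200 against a budget of 1,000 and it is flagged early, while finance stays quiet.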
The Application Layer, where user interactions occur, comprises the frontend, operational stores, configuration stores, and backend. This is where the bulk of software development will be delivered. Ultimately this work will be software-based, so containers seem the logical choice. AWS compute and container orchestration services (EKS, ECS, Fargate, and Lambda) have been selected, along with a React-based frontend.
I decided to include them all, as the choice really depends on many factors including cost, scale, and expected average execution time.
- User Interface: The article talks about using the React framework. I also like to include the possibility of a chatbot, since chatbots represent the most natural way to converse, with emojis providing a wealth of evaluation feedback.
- EKS (Elastic Kubernetes Service): This is a managed service that makes it easy to run Kubernetes on AWS without needing to install and operate your own Kubernetes control plane.
- ECS (Elastic Container Service): A highly scalable, high-performance container management service that supports Docker containers and allows you to run applications on a managed cluster of Amazon EC2 instances.
- Fargate: A compute engine for Amazon ECS that allows you to run containers without having to manage servers or clusters.
- Lambda: A compute service that lets you run code without provisioning or managing servers. AWS Lambda executes your code only when needed and scales automatically.
- DynamoDB: A fast and flexible NoSQL database service for all applications that need consistent, single-digit millisecond latency at any scale.
- API Gateway: As explained in other layers.
- Application Container Registry: Although not an AWS service by this exact name, it is likely referring to Amazon Elastic Container Registry (ECR), a fully-managed Docker container registry that makes it easy for developers to store, manage, and deploy Docker container images.
- S3 Bucket for Log Storage: This bucket will be the central logging bucket with some of its content being shared to the reporting layer for cost management and the Data Science layer for feedback.
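To show how chatbot emoji reactions could become evaluation feedback in the central log bucket, here is a small sketch that turns one interaction into a JSON log line and derives an object key partitioned by department and date. The emoji-to-score mapping, the field names, and the key layout are assumptions; uploading to S3 is deliberately left out.

```python
import json
from datetime import datetime, timezone

# Hypothetical mapping from chatbot reactions to a numeric feedback score.
EMOJI_SCORES = {"👍": 1, "🎉": 1, "🤔": 0, "👎": -1}


def feedback_record(session_id, department, question, answer, emoji, now=None):
    """Build one JSON log line describing an interaction and its feedback."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return json.dumps(
        {
            "timestamp": timestamp,
            "session_id": session_id,
            "department": department,
            "question": question,
            "answer": answer,
            "emoji": emoji,
            "score": EMOJI_SCORES.get(emoji),  # None for unmapped reactions
        },
        ensure_ascii=False,
    )


def log_key(department, now):
    """Partition log objects by department and date for downstream consumers."""
    return f"ka-logs/department={department}/date={now:%Y-%m-%d}/events.jsonl"


# Fixed timestamp so the example is reproducible.
when = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
line = feedback_record(
    "s-123", "clinical", "What is the dosage?", "See section 4.", "👎", now=when
)
key = log_key("clinical", when)
```

Partitioning by department keeps the separation of duties intact when parts of the bucket are shared with the reporting and Data Science accounts.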
I will be in Copenhagen from Jan 29th to Jan 31st, New York from Jan 21st to Jan 28th (Seattle on the 24th), and Orlando from Feb 11th to Feb 14th. If you are about and want to talk about all things Cloud, MLOps and LLMs, or just talk tech in general, feel free to connect on LinkedIn. I am always happy to have conversations online.
Acknowledging Dr. Sokratis Kartakis and Heiko Hotz from AWS (Heiko is soon to be at Google DeepMind, congrats to him), who pioneered architectures for standardising LLM deployment and operations on AWS. This work draws significant inspiration from their insightful Twitch talk. They have worked on some great starter libraries for LLM pipelines at scale on SageMaker (disclaimer: might not work out of the box).
Tags: LLMs, Machine learning, MLOps