AI Cost Considerations for Enterprise AI Success
Taking AI to production means factoring in costs for strong profit margins and sustainable development. Enterprises typically create solutions incorporating a generative AI model (LLM) and customize it with fine-tuning or Retrieval-Augmented Generation (RAG) based on domain-specific context. Deploying LLMs on their own infrastructure gives organizations full control over the models and allows for extensive customization and integration with internal data.
This article explores the key factors determining AI cost in LLM application development and covers different AI cost management and optimization strategies.
Summary of AI cost concepts
Concept | Description |
---|---|
Infrastructure costs | Costs associated with computing resources necessary to keep AI models running. |
LLM costs | Costs driven by model selection, weighed against performance metrics such as throughput and latency. |
Data costs | Costs associated with data source integration pipelines for LLM customization. |
Human resource costs | Human resource requirements for planning, managing, and maintaining the LLM customization data pipelines. |
Budgeting | The process of establishing budget limits for generative AI models. |
Automation | Setting up alert thresholds on cost metrics to manage costs. |
AI cost estimates
Generative AI models run on top of IT infrastructure, such as CPU or GPU processing power, RAM, and storage. Two different ways of managing the infrastructure exist.
- You purchase or lease bare metal infrastructure upfront.
- You use cloud service providers like Amazon Web Services, Microsoft Azure, and Google Cloud Platform.
The infrastructure required for on-prem AI deployment
Let’s review the costs related to running the generative AI model on-premises. A production solution requires more than one cluster to provide high availability, so the total costs are given for a three-cluster setup. All costs are in US dollars.
Component | Single unit cost | Single cluster cost | Total costs |
---|---|---|---|
CPU | $350 – $2,500 | $1,500 – $10,000 | $4,500 – $30,000 |
GPU | $240 – $1,800 | $720 – $5,400 | $2,160 – $16,200 |
Memory | $160 – $240 | $640 – $960 | $1,920 – $2,880 |
Disk space | OS disks: $120 – $400; data disks: $120 – $1,200 | N/A | $1,440 – $14,400 |
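The three-cluster totals follow from multiplying each per-cluster range by the number of clusters. A minimal sketch of that arithmetic, using the CPU and memory rows from the table above (the function and variable names are illustrative, not from the article):

```python
# Scale per-cluster hardware cost ranges to a multi-cluster deployment.
CLUSTERS = 3  # the article's high-availability assumption

# (low, high) cost per cluster in USD, copied from the CPU and memory rows.
per_cluster = {
    "CPU": (1_500, 10_000),
    "Memory": (640, 960),
}

def total_range(cost_range, clusters=CLUSTERS):
    """Scale a (low, high) per-cluster range to the whole deployment."""
    low, high = cost_range
    return low * clusters, high * clusters

totals = {name: total_range(rng) for name, rng in per_cluster.items()}
for name, (low, high) in totals.items():
    print(f"{name}: ${low:,} - ${high:,}")
```

The same helper can be reused with a different `clusters` value to see how the hardware budget changes with the availability target.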
Additionally, indirect costs associated with running the on-premises infrastructure include physical space for infrastructure storage, electricity, security, software licensing, and human resource costs. As technology evolves, you also have to factor in the hardware lifecycle. For cutting-edge performance, you may have to upgrade hardware every five years or less.
If using the cloud, upfront costs are substantially lower. You can use the table below to generate estimated costs. (All prices in USD)
Cloud provider | Foundation model | Input pricing | Output pricing | Units | Input bill estimate (100K requests, 100M tokens) |
---|---|---|---|---|---|
Microsoft (OpenAI) | gpt-3.5-turbo-0125 | $0.50 | $1.50 | Per 1M tokens | $50 |
Microsoft (OpenAI) | gpt-4 | $30 | $60 | Per 1M tokens | $3,000 |
Amazon Web Services | Amazon Titan Text Embeddings V2 | $0.01 | N/A | Per 1M tokens | $10 |
Amazon Web Services | Claude 3 Sonnet | $0.003 | $0.015 | Per 1,000 tokens | $300 |
Amazon Web Services | Meta Llama 3 Instruct (70B) | $0.00265 | $0.0035 | Per 1,000 tokens | $265 |
Google Cloud Platform | Gemini 1.5 Pro Multimodal (Text) | $0.0025 | $0.0075 | Per 1,000 characters | $250 for 100M characters |
Google Cloud Platform | PaLM 2 for Text (Text Unicorn) | $0.0025 | $0.0075 | Per 1,000 characters | $250 for 100M characters |
Prices were last updated in May 2024. For the latest pricing, go to:
- https://openai.com/api/pricing
- https://aws.amazon.com/bedrock/pricing/
- https://cloud.google.com/vertex-ai/generative-ai/pricing
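The bill estimates in the pricing table can be reproduced by normalizing each price to its billing unit. The sketch below does this for two rows, using the May 2024 prices quoted above. The helper name `token_cost` and the 200 output tokens per request are assumptions added for illustration.

```python
def token_cost(tokens, price_usd, unit_tokens):
    """Cost in USD for `tokens` tokens priced at `price_usd` per `unit_tokens`."""
    return tokens / unit_tokens * price_usd

# 100K requests at roughly 1,000 input tokens each, as in the table header.
INPUT_TOKENS = 100_000 * 1_000

# Prices copied from the table; check the provider pages before relying on them.
gpt35_input = token_cost(INPUT_TOKENS, price_usd=0.5, unit_tokens=1_000_000)
sonnet_input = token_cost(INPUT_TOKENS, price_usd=0.003, unit_tokens=1_000)

# Hypothetical: each request also produces about 200 output tokens.
OUTPUT_TOKENS = 100_000 * 200
gpt35_total = gpt35_input + token_cost(
    OUTPUT_TOKENS, price_usd=1.5, unit_tokens=1_000_000
)

print(f"gpt-3.5-turbo input only: ${gpt35_input:,.2f}")
print(f"Claude 3 Sonnet input only: ${sonnet_input:,.2f}")
print(f"gpt-3.5-turbo input + output: ${gpt35_total:,.2f}")
```

Because output tokens are usually priced higher than input tokens, an input-only estimate is a lower bound on the real bill.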
As you can see, using a cloud provider is an effective way to handle rapid model provisioning while avoiding significant initial costs. Additionally, cloud providers offer support and Service Level Agreements (SLAs) that guarantee high availability at scale.
When the solution becomes profitable and generates tangible revenue, it may be wise to reevaluate your model hosting costs and strategy.
Key factors that drive costs in generative AI development
Some factors impacting AI costs are given below.
Infrastructure setup
Cloud infrastructure is often more cost-effective because you pay as you go without up-front investment. Some cloud providers offer long-term savings in exchange for a larger commitment. For example, AWS EC2 Reserved Instances provide a discount of up to 72% compared with on-demand pricing if you commit to a one- or three-year term. This can provide more reliable cost control and protection against inflation. On-premises expenditure is generally worth it only if you plan to build your own foundation model and offer it as a service.
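Whether a commitment pays off depends on how much of the time the capacity is actually used. The sketch below compares the two billing modes under a simplifying assumption: on-demand capacity is billed only for hours used, while committed capacity is billed for every hour of the term. The hourly rate is hypothetical, and real discounts vary by instance type, term, and payment option.

```python
HOURS_PER_YEAR = 8_760

def yearly_costs(on_demand_hourly, discount, utilization):
    """Return (on_demand, committed) yearly cost in USD.

    utilization is the fraction of the year the capacity is in use (0 to 1).
    """
    on_demand = on_demand_hourly * HOURS_PER_YEAR * utilization
    committed = on_demand_hourly * (1 - discount) * HOURS_PER_YEAR
    return on_demand, committed

def break_even_utilization(discount):
    """Utilization above which the commitment is cheaper than on-demand."""
    return 1 - discount

# Hypothetical $1.00/hour instance with the maximum 72% discount.
on_demand, committed = yearly_costs(1.00, discount=0.72, utilization=0.5)
print(f"On-demand at 50% utilization: ${on_demand:,.2f}")
print(f"Committed (always billed):    ${committed:,.2f}")
print(f"Break-even utilization: {break_even_utilization(0.72):.0%}")
```

Under these assumptions, a workload that runs less often than the break-even utilization is cheaper on demand, even with the maximum discount.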
Model selection
Most organizations prefer to select an existing foundation model and customize it further with their organization’s data. The foundation model you select also impacts costs, so it helps to understand how the model is billed and how fine-tuning is performed.
Consider the following matrix.
Model type | Billing type | Infrastructure | Output |
---|---|---|---|
Open source | Token-based | Cloud | Text |
Closed source | Character-based; image size in pixels (h × w) | On-premises | Visual content |
You can usually select models that have one attribute from each column. Various permutations are possible. In general, closed-source models offer fewer options than open-source models, and they tend to charge more for a given billing type.
Other factors impacting cost related to LLM selection include:
Throughput
The number of tasks executed per time unit (Tasks/Time). This metric is associated with the pricing units related to your LLM.
Latency
Time to respond. The longer the LLM takes to generate a response, the higher its underlying costs. This metric is also related to the number of input tokens or characters and the number of output tokens or characters generated.
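For a self-hosted model, throughput translates directly into unit cost: the same hourly infrastructure bill is spread over more or fewer generated tokens. A minimal sketch of that relationship, with a hypothetical hourly cost and throughput figures (neither comes from the article):

```python
def cost_per_million_tokens(hourly_cost_usd, tokens_per_second):
    """Infrastructure cost in USD per 1M generated tokens at a sustained throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3_600
    return hourly_cost_usd / tokens_per_hour * 1_000_000

# Hypothetical GPU node at $4.00/hour.
slow = cost_per_million_tokens(4.00, tokens_per_second=1_000)
fast = cost_per_million_tokens(4.00, tokens_per_second=2_000)
print(f"1,000 tokens/s: ${slow:.2f} per 1M tokens")
print(f"2,000 tokens/s: ${fast:.2f} per 1M tokens")
```

Doubling sustained throughput on the same hardware halves the cost per token, which is why batching and latency tuning show up on the bill.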
Data costs
Another key factor for evaluating costs is the data for LLM customization. It can be
- Data that is fed into the model for customization purposes or
- Data that adds context to improve the model response.
Large and sparse data, or frequently changing data, can be more expensive to handle than domain-specific data with well-known volumes. You must also consider training pipeline and Retrieval-Augmented Generation (RAG) costs. Both increase the costs of running the vector databases, ETL/ELT, and storage necessary for LLM customization.
Human resources
Human resource requirements for planning, managing, and maintaining the LLM customization data pipelines are expected to be high during initial development and may not decrease significantly over time. Most AI applications require teams to refresh the datasets continually with the latest information and feed it to the LLMs as needed. However, you can reduce human costs by automating mundane administrative tasks in LLM customization.
AI cost management
Next, let’s look at cost management strategies at every phase of the AI development process.
Preparation
The preparation phase typically involves model selection and training data preparation through exploratory analysis and data pipeline development. You may factor in cost when choosing a model, but the emphasis in the model selection process is more on performance for your use case.
If your generative AI workflows include RAG, you should consider vector database costs and the cost of requests made to the embedding model to set up the vector database. Both costs scale roughly linearly with the size of the overall solution and can add up quickly if not managed correctly.
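The embedding side of this cost can be estimated before any data is loaded. The sketch below assumes documents are split into chunks and each chunk is embedded once. The chunk counts, the average chunk size, and the $0.10 per 1M tokens price are hypothetical placeholders, not quoted prices.

```python
def embedding_cost(num_chunks, avg_tokens_per_chunk, price_per_million_tokens):
    """One-off cost in USD to embed a corpus for a vector database."""
    total_tokens = num_chunks * avg_tokens_per_chunk
    return total_tokens / 1_000_000 * price_per_million_tokens

def monthly_refresh_cost(initial_cost, refreshed_fraction):
    """Recurring cost if a fraction of the corpus is re-embedded each month."""
    return initial_cost * refreshed_fraction

# Hypothetical corpus: 1M chunks of about 500 tokens each.
initial = embedding_cost(1_000_000, 500, price_per_million_tokens=0.10)
refresh = monthly_refresh_cost(initial, refreshed_fraction=0.2)
print(f"Initial embedding run: ${initial:,.2f}")
print(f"Monthly refresh (20% of corpus): ${refresh:,.2f}")
```

Both figures grow in direct proportion to the number of chunks, so doubling the corpus doubles the embedding bill.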
Prompt engineering is also an important part of the preparation process. You should include a management module that quickly identifies how prompt engineering impacts model performance, accuracy, and the number of requests needed to deliver value to the customer. During prototyping, the impact on costs may seem negligible, but it can become significant in production once your application scales.
Implementation
Part of implementation cost management is setting up the necessary Infrastructure as Code (IaC) templates for your model environments. Script-based definitions automate the deployment of infrastructure and services that support the regular operations of your generative AI solutions. You can manage entire environments easily without manual resource configurations. IaC allows you to destroy or pause infrastructure that is not being used and save costs.
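To see what pausing unused infrastructure is worth, it helps to put a number on it. The sketch below is plain arithmetic rather than an IaC template. It assumes compute is billed per hour while running and ignores storage that persists when an environment is paused. The schedule is a hypothetical example.

```python
HOURS_PER_WEEK = 24 * 7

def pause_savings(hours_running_per_week):
    """Fraction of an always-on compute bill saved by running only part of the week."""
    if not 0 <= hours_running_per_week <= HOURS_PER_WEEK:
        raise ValueError("hours_running_per_week must be between 0 and 168")
    return 1 - hours_running_per_week / HOURS_PER_WEEK

# Hypothetical schedule: a development environment up 12 hours a day, Monday to Friday.
dev_savings = pause_savings(12 * 5)
print(f"Savings versus always-on: {dev_savings:.0%}")
```

Production environments rarely allow this kind of schedule, but development and test environments often do.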
Post-implementation
Let’s say you successfully launched your generative AI solution, and it gained popularity. However, your revenue margins are not growing as expected. Post-implementation cost management lets you discover and curb hidden costs eating into your profits.
First, you need to install monitoring tools that measure and visualize the appropriate metrics, such as:
- How AI infrastructure resources are being used
- Cost per unit resource
- Hourly, daily, and monthly utilization
Metrics generate data at scale, and automation becomes essential to gain meaningful insights. You can also trigger automated actions, such as database clean-up when useless data accumulates or auto-scaling functionality if bottlenecks introduce latency.
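As a rough illustration of the metrics above, the following sketch rolls raw usage records up into daily cost and utilization. The `UsageRecord` fields and the sample values are hypothetical. A real monitoring tool or billing export will have its own schema.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class UsageRecord:
    day: str                  # ISO date, e.g. "2024-05-01"
    resource: str             # resource name
    hours_used: float         # hours the resource did useful work
    hours_provisioned: float  # hours the resource was billed for
    cost_usd: float

def daily_summary(records):
    """Return {day: {"cost": total USD, "utilization": used / provisioned}}."""
    totals = defaultdict(lambda: {"cost": 0.0, "used": 0.0, "provisioned": 0.0})
    for r in records:
        t = totals[r.day]
        t["cost"] += r.cost_usd
        t["used"] += r.hours_used
        t["provisioned"] += r.hours_provisioned
    return {
        day: {
            "cost": t["cost"],
            "utilization": t["used"] / t["provisioned"] if t["provisioned"] else 0.0,
        }
        for day, t in totals.items()
    }

records = [
    UsageRecord("2024-05-01", "gpu-node", 6, 24, 96.0),
    UsageRecord("2024-05-01", "vector-db", 24, 24, 12.0),
]
summary = daily_summary(records)
print(summary["2024-05-01"])
```

A low utilization figure next to a high daily cost is the kind of signal that can drive an automated scale-down or clean-up.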
AI cost optimization strategies
Below are more cost optimization strategies that can keep your AI costs from spiraling out of control.
Budgeting
Budgets can be defined within infrastructure as code (IaC) templates to manage and control costs from the project’s outset. Organizations can group infrastructure resources by their intended environment and then define budget rules for each environment. For example, you can tag resources with key-value pairs such as Environment: Development, then ensure that the spending on each resource group stays within its budget limit.
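The tag-and-budget idea can be sketched independently of any cloud provider. In the example below, the resource inventory, costs, and budget limits are all hypothetical. In practice the inventory would come from a billing export or the provider's cost tooling.

```python
from collections import defaultdict

# Hypothetical inventory of tagged resources with their monthly cost in USD.
resources = [
    {"name": "gpu-node-1", "tags": {"Environment": "Development"}, "monthly_cost": 900.0},
    {"name": "vector-db-dev", "tags": {"Environment": "Development"}, "monthly_cost": 250.0},
    {"name": "gpu-node-2", "tags": {"Environment": "Production"}, "monthly_cost": 3200.0},
]

# Hypothetical budget limit per environment, in USD per month.
budgets = {"Development": 1000.0, "Production": 5000.0}

def spend_by_environment(resources):
    """Sum monthly cost per Environment tag; untagged resources are grouped together."""
    totals = defaultdict(float)
    for r in resources:
        env = r["tags"].get("Environment", "Untagged")
        totals[env] += r["monthly_cost"]
    return dict(totals)

def over_budget(resources, budgets):
    """Return {environment: overspend in USD} for environments above their limit."""
    spend = spend_by_environment(resources)
    return {
        env: total - budgets[env]
        for env, total in spend.items()
        if env in budgets and total > budgets[env]
    }

print(over_budget(resources, budgets))
```

Grouping untagged resources under their own key makes missing tags visible, which is often the first budgeting problem to fix.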
Automation
You can set up cost-related thresholds that trigger environmental automation processes. For example, set up:
- Limits on service utilization, such as the number of requests made to the models or available tokens per month.
- Notifications for when infrastructure costs reach certain milestones—like 25%, 50%, and 75% of the budget.
This way, teams are alerted in advance if spending approaches or exceeds thresholds. This helps prevent going over budget unexpectedly during a specific period. You can also set infrastructure guardrails as part of the budgeting definition to limit infrastructure services creation and unplanned expenses that could harm the project or the business.
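A minimal sketch of the milestone notifications described above: given two consecutive spend readings, it reports which budget thresholds were crossed in between, so each alert fires only once. The function names and the token limit check are illustrative assumptions, not part of any specific cloud service.

```python
THRESHOLDS = (0.25, 0.50, 0.75)  # fractions of the budget, as in the list above

def crossed_thresholds(previous_spend, current_spend, budget, thresholds=THRESHOLDS):
    """Return the thresholds crossed between two consecutive spend readings."""
    return [t for t in thresholds if previous_spend < t * budget <= current_spend]

def within_token_limit(tokens_used, monthly_token_limit):
    """True while usage is still under the monthly token allowance."""
    return tokens_used < monthly_token_limit

# Hypothetical readings against a $1,000 monthly budget.
for t in crossed_thresholds(previous_spend=200, current_spend=600, budget=1_000):
    print(f"Alert: spending passed {t:.0%} of the budget")

print(within_token_limit(tokens_used=80_000_000, monthly_token_limit=100_000_000))
```

Comparing against the previous reading, rather than only the current one, avoids sending the same alert on every check.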
Consolidation
As different teams build generative AI solutions, tools accumulate, resulting in overlapping functionality and increased expenditure. Tool management is key to controlling costs. Consolidating data and AI operations allows teams to identify unused tools and reduce the existing ones to those necessary.
Conclusion
AI cost is one of the pillars for designing and building enterprise generative AI applications. The cost of underlying infrastructure and services related to LLM deployment, processing of embeddings, vector database(s), and the RAG components that feed the model increases as AI usage grows.
Cost management requires planning and consideration at every stage of the development process. Cost optimization involves setting up a budget and introducing automation to evaluate costs proactively and take action if necessary.
As new LLMs are released, the opportunities for cost optimization keep widening. Choose the LLM that best fits your generative AI pipeline and the solution’s use case.