This article is adapted from an internal company blog post. Identifiers and costs have been anonymized/changed.
Infrastructure as Code (IaC) has become the backbone of modern cloud operations, but migrating existing cloud configurations to Terraform can be a time-consuming and error-prone process. This analysis examines a real-world project in which I used an agentic coding assistant (specifically Roo, though any capable agent should work) to migrate our entire cloud infrastructure configuration from manual management to a comprehensive Terraform codebase. The backing model was Claude 3.7 Sonnet, accessed via AWS Bedrock.
The project spanned approximately two weeks and involved migrating configurations for multiple domains, creating reusable Terraform modules, and establishing a maintainable infrastructure codebase. This post provides a transparent analysis (as transparent as I can be) of the costs, benefits, and lessons learned from this LLM-assisted development approach.
I was able to write this article because I used the same VSCode window throughout the project and exported all of the AI tasks associated with the project into a directory. I then had AI analyze the files in this directory and help contribute to this article (total cost: ~$4). Inception!
2. Project Overview
Scope: Complete migration of cloud infrastructure to Terraform
- Multiple domains (example.com, analytics.example.com, app.example.com, etc.)
- Various cloud services: DNS, WAF, Page Rules, Rate Limiting, Access Rules, Logging Jobs
- Creation of reusable Terraform modules for each service type
Timeline:
- Duration: Approximately 2 weeks
- Effort: Less than 8 work days
Total Sessions: ~50 AI sessions
Total Cost: Under $50
Work Days Analysis
While the project spanned approximately 2 weeks in calendar time, actual development work was concentrated into just 8 work days. This efficient pattern included:
- Week 1: 6 active work days with intensive module creation and initial implementations
- Week 2: 2 targeted follow-up days for final configurations and testing
The peak activity occurred mid-project, with one particularly intensive day accounting for approximately 30% of all development sessions. This concentrated work pattern demonstrates the efficiency of LLM-assisted development, where focused bursts of activity can accomplish substantial infrastructure migration work.
3. The LLM-Assisted Development Process
Systematic Module Creation
The AI assistant excelled at creating consistent, well-structured Terraform modules by following established patterns. The process typically involved:
- Documentation Analysis: The AI would read Terraform provider documentation to understand resource schemas
- Pattern Recognition: New modules were modeled after existing ones, ensuring consistency
- Code Generation: Complete modules with main.tf, variables.tf, outputs.tf, and README.md files
- Import Block Creation: Automated generation of Terraform import statements for existing resources
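To make the import step concrete, here is a sketch of what one generated import block looked like. The module name, resource type, and ID format are illustrative placeholders, not the real provider schema:

```hcl
# Illustrative Terraform 1.5+ import block adopting an existing
# cloud resource into state. The "to" address and "id" format
# are hypothetical.
import {
  to = module.primary_ratelimit.cloud_ratelimit_rule.this
  id = "<zone_id>/<rule_id>"
}
```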
Example: WAF Managed Rules Implementation
One of the most complex sessions involved implementing WAF managed rules:
“We are working on a new feature that allows us to manage cloud WAF managed rules in Terraform. This will require a new module to be made, named `cloud_waf_managed_rule`. This new module is to be modeled after the `cloud_ratelimit_rule` module since the underlying Terraform resource is similar.”
The AI successfully:
- Created a comprehensive module with extensive documentation
- Generated complete Terraform configuration
- Produced comprehensive output definitions
- Created all necessary import blocks for existing resources
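Each generated module followed the same four-file layout. Shown here schematically, with comments summarizing what each file typically contained:

```
modules/cloud_waf_managed_rule/
├── main.tf       # underlying ruleset resource definition
├── variables.tf  # zone_id, rule overrides, and other inputs
├── outputs.tf    # ruleset ID and rule metadata
└── README.md     # usage examples and input documentation
```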
Pattern-Based Development
The most efficient sessions occurred when the AI could leverage existing patterns. For example, after creating the initial page rules module, subsequent domain implementations became much faster and cheaper, often requiring minimal additional cost per session.
4. Cost Analysis
Total Investment Breakdown
- Total Cost: Under $50
- Sessions: ~50
- Average per Session: ~$1.00
- Timeline: ~2 weeks
- Daily Average: Under $4
Peak Activity Analysis
Peak Activity Day
- Cost: ~$15 (approximately 30% of total project cost)
- Sessions: ~15 (approximately 30% of total sessions)
- Focus: Core module creation and initial domain implementations
Most Expensive Sessions
- Page Rules Implementation - ~$7
- Created comprehensive page rules module
- Generated import blocks for existing resources
- Established pattern for subsequent implementations
- WAF Managed Rules - ~$6
- Most complex module due to ruleset complexity
- Required deep understanding of cloud API structure from both human and AI
- Initial Setup and Rate Limiting - ~$4
- Foundation work and complex rate limiting rules
Cost Comparison to Traditional Development
Estimated Traditional Development Time:
- Senior Infrastructure Engineer: $150/hour
- Estimated time for manual migration: 40-60 hours
- Traditional cost: $6,000 - $9,000
LLM-Assisted Development:
- AI cost: Under $50
- Human oversight time: ~8 hours
- Human cost (at $150/hour): $1,200
- Total cost: ~$1,250
Cost Savings: 80-85% reduction compared to traditional approach
5. Technical Achievements
Code Impact
- 7,000+ insertions - New code and configurations
- 500+ deletions - Cleanup and optimization
- 120+ files created - Complete infrastructure codebase
- 1,000+ import/moved blocks - Migration of existing resources
Module Library Created
cloud_account_access_rule
cloud_cache_rule
cloud_logpush_custom_fields
cloud_page_rule
cloud_ratelimit_rule
cloud_redirect_rule
cloud_logging_job
cloud_transform_request_headers
cloud_transform_response_headers
cloud_user_agent_blocking_rule
cloud_waf_custom_rule
cloud_waf_managed_rule
cloud_zone_access_rule
- DNS record modules (CNAME, MX, TXT)
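As an illustration of how these modules were consumed, here is a hypothetical call site; the module path, variable names, and values are illustrative:

```hcl
# Hypothetical call site for one of the generated modules.
module "office_allowlist" {
  source    = "./modules/cloud_zone_access_rule"
  zone_id   = var.zone_id
  mode      = "whitelist"
  target_ip = "203.0.113.0/24"  # documentation range, illustrative
  notes     = "Office egress (managed by Terraform)"
}
```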
Infrastructure Coverage
- Multiple domains fully migrated to Terraform
- 600+ rate limiting rules for primary domain
- 400+ custom WAF rules implemented
- Complete DNS management for all domains
- Comprehensive logging configuration with external system integration
6. Lessons Learned
LLM Capabilities
Strengths:
- Pattern Recognition: Excellent at modeling new modules after existing ones
- Documentation Comprehension: Effectively parsed Terraform provider documentation
- Consistency: Maintained coding standards across all modules
- Bulk Operations: Efficiently handled repetitive tasks like creating 1,000+ import blocks
- Error Correction: Quickly identified and fixed syntax or configuration issues
Limitations:
- Context Switching: Required clear instructions when moving between different types of work
- Complex Logic: Needed human guidance for business logic decisions
- API Nuances: Sometimes missed subtle API behaviour differences
- Testing Strategy: Required human oversight for comprehensive testing approaches
Human Oversight Requirements
Critical Human Inputs:
- Strategic Direction: Defining module structure and project organization
- Business Logic: Determining which rules and configurations to migrate
- Quality Assurance: Reviewing generated code for correctness
- Integration Testing: Ensuring modules work together properly
- Documentation Review: Verifying that generated documentation was accurate
Collaboration Pattern: The most effective approach was treating the AI as a highly skilled junior developer who could execute well-defined tasks but needed clear direction and oversight.
Optimal Use Cases
Best Results When:
- Clear patterns existed to follow
- Tasks were well-defined and scoped
- Documentation was available for reference
- Repetitive work needed to be done quickly
Challenges When:
- Requirements were ambiguous
- Novel approaches were needed
- Complex business logic was involved
- Integration between multiple systems was required
7. Challenges and Issues Encountered
While the project was largely successful, several recurring issues emerged that are worth documenting for future LLM-assisted infrastructure projects:
Configuration Translation Limitations
When translating large cloud configurations into Terraform resources, the AI would frequently stop processing after approximately 20 resources, citing brevity concerns. This required consistent re-prompting to complete entire implementations, adding overhead to what should have been automated tasks.
Web Browser Reading Failures
The AI encountered significant difficulties when attempting to read lengthy web documentation. The browser tool would make multiple attempts, scrolling and reading incrementally, before abandoning the task with loop detection errors. While ignoring these errors and proceeding with implementation didn’t appear to affect subsequent work quality, this limitation required manual documentation review for complex provider resources.
Token Limit Issues with Import Blocks
Large files containing numerous Terraform `import` blocks would occasionally cause the AI to enter processing loops, likely due to token limitations in the underlying language model. Early detection of these loops was crucial to prevent unnecessary cost accumulation, requiring active monitoring during bulk import operations.
Import Block Syntax Errors
The AI consistently generated incorrect `import` block definitions, particularly with resource identifiers and target attributes. This appeared to be a training data limitation rather than a contextual understanding issue. Fortunately, manual correction of these blocks was straightforward, but it represented a predictable overhead that needed to be factored into project timelines.
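Establishing a reference template made these corrections mechanical. A correct import block has the following shape; the resource address and ID format here are placeholders, since each provider documents its own (often composite) import ID:

```hcl
import {
  # "to" must be the full resource address as it will exist in state.
  to = cloud_waf_custom_rule.example
  # "id" must match the provider's documented import ID format,
  # often a composite like <zone_id>/<ruleset_id>.
  id = "<zone_id>/<ruleset_id>"
}
```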
Backend Configuration Misunderstanding
Despite being explicitly informed about the project’s remote backend configuration, the AI would frequently suggest running `terraform apply` commands. This demonstrated a limitation in maintaining context about project-specific constraints and required regular correction to prevent workflow disruptions.
Mitigation Strategies
These challenges led to several effective mitigation approaches:
- Breaking large translation tasks into smaller, manageable chunks
- Providing documentation excerpts directly rather than relying on web browsing
- Monitoring for processing loops and intervening early
- Establishing templates for correct import block syntax
- Consistently reinforcing backend configuration constraints
8. Effective Prompt Engineering Examples
The success of LLM-assisted development heavily depends on the quality and clarity of prompts provided to the AI. Analysis of the project sessions revealed distinct patterns between effective and ineffective prompting strategies. I cannot stress this enough: writing clear, actionable, and goal-oriented prompts will help AI assistants be more effective.
Examples of Clear, Detailed Prompts
1. Provider Upgrade with Step-by-Step Instructions
Please upgrade our cloud provider from v4.0 to v5.4.0 following these steps:
1. First, examine our current provider configuration and check the current version
2. Following the upgrade guide, implement a two-step upgrade:
- First upgrade to v4.52.0 (latest v4)
- Then upgrade to v5.4.0
3. Update the authentication method if needed using API tokens
4. Review all cloud resources in our code to identify any that might need updates
5. Make necessary changes to ensure compatibility with v5.4.0
6. Run terraform plan to test the changes and verify there are no errors
Please only perform the work outlined in these instructions. When complete, provide a concise summary of what changes were made and the results of testing.
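The two-step version pinning in that prompt corresponds to editing the required_providers block twice, roughly as follows; the provider source address is a placeholder, not the real one:

```hcl
terraform {
  required_providers {
    cloud = {
      source  = "example/cloud"  # placeholder source address
      version = "~> 4.52.0"      # step 1: latest v4; step 2 changes
    }                            # this constraint to "~> 5.4.0"
  }
}
```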
2. Module Creation with Specific Requirements
Create a new feature that allows us to manage cloud logging jobs in Terraform. This will require a new module to be made, named `cloud_logging_job`.
There are multiple domains with logging job configurations existing in the cloud provider already. This means you will need to create an import file and populate it with import blocks.
The documentation for the logging job resource is located at [provider documentation URL]. Please visit this page to understand how the resource should be defined.
When defining the infrastructure in Terraform for these existing resources, you can use the API JSON payloads I have added to the project directory. Do not stop for brevity, I want you to create all of the resources in each of the domains, plus their import blocks.
- domain1.com JSON payload file is called `logging-domain1.json`
- domain2.com JSON payload file is called `logging-domain2.json`
[Additional domain specifications...]
You will also need to ensure that the README.md file is updated in the root directory.
3. Comprehensive Module Specification
Create a new Terraform module for cloud access rules in the `modules/cloud_access_rule` directory. This module should:
1. Define a cloud access rule resource based on the provider documentation
2. Include the following files:
- main.tf (resource definition)
- variables.tf (input variables)
- outputs.tf (output values)
- README.md (module documentation)
3. The module should allow setting common values for all resources
4. Support all necessary parameters for the access rule resource, including:
- zone_id
- mode (challenge, block, etc.)
- configuration (target IP/IP range)
- notes (which should be standardized)
5. Include appropriate outputs for the created resources
When complete, provide a summary of the module you've created.
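A prompt like that would be expected to yield a variables.tf along these lines. The schema below is a sketch based on the listed requirements, not the real provider's:

```hcl
variable "zone_id" {
  description = "Zone the access rule applies to"
  type        = string
}

variable "mode" {
  description = "Action to take: challenge, block, whitelist, ..."
  type        = string
}

variable "target_ip" {
  description = "IP address or CIDR range to match"
  type        = string
}

variable "notes" {
  description = "Standardized note attached to the rule"
  type        = string
  default     = "(managed by Terraform)"
}
```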
4. Architecture Design with Context
Based on the analysis of the current cloud Terraform project, please design a modular structure for managing multiple domains in IaC. We need to support the following domains:
- primary.com
- staging.primary.com
- analytics.primary.com
- api.primary.com
[Additional domains...]
Requirements:
1. The structure should be modular and reusable across domains
2. Each domain should have its own configuration file
3. DNS records should include comments stating "(managed by Terraform)"
4. The structure should support different record types (A, CNAME, MX, TXT)
5. We should be able to specify if records are proxied through the cloud provider
We'll start with implementing subdomain.com which has a single DNS record: subdomain.com CNAME www.primary.com (proxied).
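Under the requested structure, that first record would land in a per-domain file roughly like this; the module path and variable names are illustrative:

```hcl
# Hypothetical subdomain-com.tf in the per-domain layout.
module "subdomain_com_root" {
  source  = "./modules/cloud_dns_cname"
  zone_id = var.subdomain_com_zone_id
  name    = "subdomain.com"
  value   = "www.primary.com"
  proxied = true
  comment = "(managed by Terraform)"
}
```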
5. Feature Implementation with Technical Context
Implement a new feature in this codebase that will allow us to manage cloud page rules for each domain. This will require you to create a module that can be reused by each domain when creating the page rules.
Make sure you first read the documentation on the cloud page rule Terraform resource.
Once the module has been created you will need to create the definitions for each domain. There is existing data from the cloud provider, so we will also need to import the resources, in a file called `import-pagerules.tf` located at the root of the project.
The existing cloud page rule data is available and I will provide it for each domain when you are ready. Just ask for the specific domain and I will provide it.
6. Specialized Module with Abstraction Requirements
Implement a new feature in this repository to allow Terraform to start managing request and response headers (aka transform rules).
You will need to build a module so that we can abstract many of the implementation details out of the underlying Terraform resource.
The module should be specific to transforming headers, even though what it's doing behind the scenes is creating a ruleset resource. This allows for a less leaky abstraction.
The documentation for this resource is here: [provider documentation URL]
Example Terraform code to create a transform rule is provided in the documentation.
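The abstraction goal means a caller only describes header operations while the module privately builds the underlying ruleset resource. A sketch of the intended call site, with hypothetical names and structure:

```hcl
module "request_headers" {
  source  = "./modules/cloud_transform_request_headers"
  zone_id = var.zone_id
  # Each entry becomes one rule in the module's internal ruleset.
  rules = [{
    expression = "(http.host eq \"app.example.com\")"
    set_headers = {
      "X-Source" = "terraform"
    }
  }]
}
```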
Key Characteristics of Effective Prompts
Clear Prompts Include:
- Specific technical requirements and constraints
- Step-by-step instructions or clear deliverables
- Reference documentation URLs
- Context about existing infrastructure
- Expected file structures and naming conventions
- Success criteria and completion requirements
- Explicit instructions about scope (“Do not stop for brevity”)
Ineffective Prompts Lack:
- Specific technical details
- Clear scope boundaries
- Context about existing systems
- Success criteria
- Actionable steps
The most successful sessions occurred when prompts provided comprehensive context, specific technical requirements, and clear success criteria, enabling the AI to generate precise, complete implementations without requiring multiple clarification rounds.
9. Human Oversight Requirements
Essential Human Responsibilities
1. Architecture Decisions
- Module structure and organization
- Naming conventions and standards
- Integration patterns between modules
2. Business Logic Translation
- Understanding existing cloud configurations
- Determining migration priorities
- Validating rule logic and conditions
3. Quality Control
- Code review for generated modules
- Testing strategy development
- Security configuration validation
4. Project Management
- Task prioritization and sequencing
- Progress tracking and milestone definition
- Risk assessment and mitigation
Collaboration Effectiveness
The project demonstrated that LLMs work best as force multipliers rather than replacements. The human developer provided strategic thinking and oversight while the AI handled the tactical implementation work. This division of labour proved highly effective, with human time focused on high-value activities while routine coding was automated.
10. Cost-Benefit Analysis
Return on Investment
Quantifiable Benefits:
- Time Savings: 30-50+ hours of development time saved
- Cost Reduction: $4,500-$7,500+ savings compared to traditional development
- Consistency: Standardized module structure across all components
- Documentation: Comprehensive README files for all modules
- Maintainability: Well-structured, reusable infrastructure code
Strategic Value:
- Accelerated Migration: 2-week timeline vs. estimated 4-6 weeks traditional approach
- Knowledge Transfer: Documented patterns for future infrastructure work
- Reduced Technical Debt: Clean, maintainable Terraform codebase
- Team Capability: Demonstrated effective LLM integration workflow
Risk Mitigation
Reduced Risks:
- Human Error: Automated generation reduced manual coding mistakes
- Inconsistency: Standardized patterns across all modules
- Documentation Debt: Comprehensive documentation generated automatically
Managed Risks:
- Code Quality: Human review ensured correctness
- Security: Manual validation of security configurations
- Integration: Testing verified module compatibility
11. Conclusion
This project demonstrates that LLMs can significantly accelerate infrastructure development while maintaining high quality standards. The modest investment in AI assistance generated substantial value through time savings and improved consistency.
Key Success Factors:
- Clear Task Definition: Well-scoped, specific requests to the LLM
- Pattern Establishment: Creating templates for the LLM to follow
- Human Oversight: Strategic guidance and quality control
- Iterative Approach: Building complexity gradually through multiple sessions
Strategic Implications:
- LLMs excel at tactical implementation when given clear direction
- Human expertise remains essential for architecture and business logic
- Cost savings of 80-85% are achievable for infrastructure automation projects
- The combination of human strategic thinking and LLM execution creates a powerful development approach