Somewhere between your third vendor call and your second technical whitepaper, someone told you the decision was simple. Pick RAG for dynamic knowledge. Pick fine-tuning for consistent behavior. Ship it.
If only it were that clean.
In practice, the teams that get this decision wrong don’t fail because they chose a bad technology. They fail because they chose the right technology for the wrong problem: fine-tuning a product catalog that changes every week, or trying to enforce brand voice through a retrieval prompt that breaks the moment a user goes off-script. Both look reasonable on a whiteboard. Both cost months to undo in production.
This guide exists to close that gap. It’s written specifically for CTOs and founders who are about to make an architecture decision that carries real budget and real timeline consequences and who want a framework grounded in production outcomes, not vendor marketing.
Everything here comes from patterns we’ve observed across 800+ production AI systems at Scalacode, spanning recruitment, legal, e-commerce, fintech, and enterprise SaaS. Where we cite numbers, they’re from our own delivery data. Where we share opinions, they’re earned from watching what breaks at scale and what doesn’t.
By the end of this article, you’ll know whether you have a knowledge problem, a behavior problem, or both, and exactly what to build for each. Let’s get into it.
RAG vs Fine-Tuning at a Glance
Most teams that come to us expecting a clean “pick one” answer leave with a more nuanced roadmap. The question isn’t which approach is better; it’s what problem you’re actually solving. RAG solves a knowledge problem. Fine-tuning solves a behavior problem. If you have both problems (and most production systems do), you likely need both solutions.
| Dimension | RAG | Fine-Tuning | Combined |
| What it solves | Knowledge gaps: model lacks info | Behavior gaps: inconsistent output style/format | Both: knowledge + behavior |
| Build cost | $18K-$45K (median $28K) | $22K-$55K (median $35K) | $35K-$80K (median $55K) |
| Deploy timeline | 7 to 10 weeks | 9 to 14 weeks | 12 to 18 weeks |
| Knowledge freshness | Real-time: re-index, no retrain | Stale: requires retraining | Real-time (via RAG layer) |
| Latency impact | +50 to 500ms retrieval overhead | None: no retrieval step | +50 to 500ms (retrieval) |
| Run cost at scale | Higher: frontier API per call | 70 to 90% cheaper at 200K+/mo | Most cost-efficient at high volume |
| Citations / provenance | Native: grounded in docs | Opaque: weights, not sources | Available via RAG layer |
| Data governance | Cleaner: data stays in vector store | Harder: sensitive data in weights | Mixed |
| Training data needed | Documents (any size) | 500 to 5,000 labeled examples | Both |
| When to pick it | Dynamic data, citations needed, fast deploy | Consistent behavior, high volume, stable info | Branded agents, RAFT, routing architecture |
What RAG Actually Does
Retrieval-augmented generation is a runtime pattern. When a user asks a question, the system retrieves relevant documents from a knowledge base, packs them into the model’s context window, and generates a response grounded in that retrieved content. The model doesn’t memorize your data; it reads it at inference time.
This distinction matters enormously. It means your knowledge can be updated by re-indexing a document, no retraining, no downtime, no model rollback. It also means every response can be traced back to a source, which is non-negotiable for compliance-heavy industries like legal, finance, and healthcare.
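To make the runtime pattern concrete, here is a minimal sketch of the retrieve-then-pack step. It is an illustration under loud assumptions: a toy bag-of-words similarity stands in for a real embedding model, and `KNOWLEDGE_BASE`, `retrieve`, and `build_prompt` are hypothetical names, not part of any library. The generation call itself is left out.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: dict[str, str], k: int = 2) -> list[str]:
    """Return the ids of the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(docs[d]))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: dict[str, str], k: int = 2) -> str:
    """Pack the retrieved documents into the context, keeping source ids for citation."""
    context = "\n".join(f"[{doc_id}] {docs[doc_id]}" for doc_id in retrieve(query, docs, k))
    return f"Answer using only the sources below and cite their ids.\n{context}\nQuestion: {query}"


# Hypothetical knowledge base. Editing an entry here is the "re-index" step:
# the next request reads the new text, with no retraining.
KNOWLEDGE_BASE = {
    "returns-policy": "Items can be returned within 30 days of delivery.",
    "shipping": "Standard shipping takes 3 to 5 business days.",
    "warranty": "Hardware carries a 12 month warranty.",
}

prompt = build_prompt("Can items be returned after delivery?", KNOWLEDGE_BASE, k=1)
print(prompt)
```

Because each packed document keeps its id, the generated answer can point back to a source, which is where the traceability comes from.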
The 2026 RAG Stack
| Layer | 2026 Standard Tools | Annual Cost (Est.) | Notes |
| Embedding | OpenAI text-embedding-3-large, BGE-M3 | $1,200-$8,000 | BGE-M3 wins for multilingual + code |
| Vector DB | Pinecone, Weaviate, pgvector | $1,800-$12,000 | pgvector at small scale; Pinecone for managed |
| Retrieval | Hybrid dense + sparse (BM25) | Included in DB | Hybrid consistently outperforms dense-only |
| Reranker | Cohere Rerank, BGE-Reranker | $600-$4,000 | Non-optional. 15 to 35% quality lift |
| Generator | GPT-5, Claude Sonnet 4.6, Llama 3.3 70B | $6,000-$48,000 | Self-hosted Llama for cost-conscious teams |
| Observability | LangSmith, Helicone, Arize Phoenix | $800-$6,000 | Non-negotiable for production debugging |
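One common way to combine the dense and sparse result lists in the retrieval layer is reciprocal rank fusion (RRF). The table above does not name a fusion method, so treat this as one assumed option rather than the stack's prescribed approach. The function name, document ids, and `k = 60` constant below are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    with rank starting at 1.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties break on doc id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of a dense (embedding) retriever and a sparse (BM25) retriever.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Documents that both retrievers rank well rise to the top. The fused list is what you would hand to the reranker.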
What Fine-Tuning Actually Does
Fine-tuning bakes new behavior into the model’s weights. After training, the fine-tuned model responds differently to new inputs without any retrieval at inference time. Tooling has matured around Hugging Face TRL/PEFT, NVIDIA NeMo, and frontier-model fine-tuning APIs (OpenAI, Anthropic).
The methods matter. LoRA and QLoRA offer parameter-efficient training on consumer hardware. DPO and ORPO are the go-to for behavior alignment, teaching the model to prefer certain response styles over others. Full fine-tuning is reserved for deep domain specialization, where the base model is fundamentally misaligned with your task.
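For intuition on what "teaching the model to prefer certain response styles" means in DPO, the sketch below computes the standard DPO loss for a single preference pair from summed log-probabilities. The function and argument names are illustrative, and `beta = 0.1` is an assumed value, not a recommendation from our delivery data. Real training would use a library such as Hugging Face TRL rather than this hand-rolled function.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one (prompt, chosen, rejected) preference pair.

    Inputs are summed log-probabilities of each response under the model being
    trained (policy) and under the frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written as log(1 + exp(-logits)).
    return math.log1p(math.exp(-logits))


# Policy identical to the reference: no preference learned yet.
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Policy now favors the chosen response and disfavors the rejected one.
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)
print(baseline, improved)
```

The loss falls as the policy raises the chosen response's likelihood relative to the reference and lowers the rejected one's. That is the mechanism behind the behavior shaping.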
The Decision Matrix: Which Should You Choose?
Run through these five questions before making any architecture decision. If more than one applies, you’re likely a combined-approach candidate.
- Is the information stable or does it change?
Static data (regulatory docs, classical research): both approaches work. Dynamic data (product catalog, customer records, news): RAG only. Fine-tuning on dynamic data creates stale models that require constant, expensive retraining cycles.
- Is the volume large or focused?
Large knowledge base (10K+ docs, 50M+ tokens): RAG is the right tool. Focused and stable knowledge base (<100K tokens): either can work, though fine-tuning alone is often overkill for small, stable corpora.
- Do you need behavior shaping, format consistency, or domain specialization?
If yes, fine-tuning is in the picture. A model that consistently uses your brand voice, follows your decision protocol, or returns a specific JSON schema needs fine-tuning (typically DPO or ORPO). Prompting alone breaks down under adversarial inputs and long conversations.
- What’s your data situation?
RAG needs documents (you almost certainly have them). Fine-tuning needs 500 to 5,000 high-quality labeled examples. Without labeled data, fine-tuning isn’t on the table, and generating synthetic labels requires careful quality control.
- What are your latency and cost constraints?
RAG adds 50 to 500ms, but doesn’t change per-token generation cost much. Fine-tuning has no retrieval overhead but locks you into a model. At 200K+ interactions/month, a fine-tuned smaller model (Llama 3.1 8B) can be 70 to 90% cheaper than frontier API calls alone.
| Your situation | Choose | Why |
| Model lacks current info (product catalog, docs, news) | RAG | Knowledge changes → vector store, not weights |
| Need consistent brand voice across all outputs | Fine-Tuning | Behavior shaping beyond what prompting reliably delivers |
| Compliance requires traceable source citations | RAG | Fine-tuning weights can’t produce citations natively |
| Volume is 200K+ interactions/month | Fine-Tuning | Smaller fine-tuned model dramatically cheaper than frontier API |
| Latency budget is <200ms end-to-end | Fine-Tuning | No retrieval step in the critical path |
| Branded customer-facing agent at scale | Both | Behavior (FT) + knowledge (RAG) are separate problems |
| Model uses retrieved context unreliably | RAFT | Fine-tune the model specifically to use retrieved docs well |
| High volume, mixed routine + edge case queries | Routing | Fine-tuned small model for routine; frontier API for edges |
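The matrix above can be roughly encoded as a small decision function. This is a deliberately simplified sketch: the `Requirements` fields and the `recommend` function are hypothetical names, the thresholds (200K interactions/month, 200ms, 500 labeled examples) are the ones quoted in this section, and a real decision would weigh conflicts between rows, such as a tight latency budget combined with a need for retrieval.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    knowledge_changes_often: bool    # e.g. product catalog, news
    needs_citations: bool            # compliance or audit traceability
    needs_consistent_behavior: bool  # brand voice, protocol, strict schema
    monthly_interactions: int
    latency_budget_ms: int
    labeled_examples: int


def recommend(req: Requirements) -> str:
    """Map requirements to 'RAG', 'Fine-Tuning', 'Both', or 'Prompting first'."""
    knowledge_problem = req.knowledge_changes_often or req.needs_citations
    behavior_problem = (
        req.needs_consistent_behavior
        or req.monthly_interactions >= 200_000
        or req.latency_budget_ms < 200
    )
    # Fine-tuning is only on the table with enough labeled data.
    can_fine_tune = req.labeled_examples >= 500

    if knowledge_problem and behavior_problem and can_fine_tune:
        return "Both"
    if knowledge_problem:
        return "RAG"
    if behavior_problem and can_fine_tune:
        return "Fine-Tuning"
    return "Prompting first"


branded_agent = Requirements(
    knowledge_changes_often=True,
    needs_citations=False,
    needs_consistent_behavior=True,
    monthly_interactions=300_000,
    latency_budget_ms=1500,
    labeled_examples=2000,
)
print(recommend(branded_agent))
```

Separating "knowledge problem" from "behavior problem" as two independent booleans is the point of the sketch: the recommendation follows from which of the two is true, not from a preference for either technology.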
The Pattern That Wins Most Often: RAG + Fine-Tuning
In production deployments across 2025 to 2026, roughly 60% of projects use both. Three combinations dominate, and knowing which one fits your architecture can save months of wasted iteration.
- Behavior + Knowledge Split
Fine-tune for behavior (brand voice, decision protocol, output structure). Use RAG to supply the specific information the fine-tuned model needs to act on. This is the default architecture for branded customer-facing agents.
- RAFT: Context-Faithful Training
Train the model on (question, retrieved-docs-with-distractors, correct-answer) triples. This sharply reduces irrelevant citations: on a legal research system, we cut them from 18% to 4% with no changes to the retrieval pipeline.
- Routing Architecture
A lightweight classifier routes routine queries to a fine-tuned Llama 3.1 8B model; edge cases go to GPT-5 or Claude Sonnet 4.6. Both paths can use RAG. This typically yields a 70 to 90% cost reduction versus a pure frontier API at high volume.
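The routing pattern can be sketched in a few lines. Everything here is a stand-in: `small_model` and `frontier_model` are hypothetical placeholders for real model calls, and the keyword check in `is_routine` takes the place of a trained classifier.

```python
import re
from typing import Callable


# Hypothetical backends. In practice these would call a fine-tuned small model
# and a frontier API; both receive the already-retrieved RAG context.
def small_model(query: str, context: str) -> str:
    return f"[small] {query}"


def frontier_model(query: str, context: str) -> str:
    return f"[frontier] {query}"


# Hypothetical stand-in for a trained intent classifier.
ROUTINE_KEYWORDS = {"order", "refund", "shipping", "password"}


def is_routine(query: str) -> bool:
    """Treat short queries that mention a known routine intent as routine."""
    words = re.findall(r"[a-z]+", query.lower())
    return len(words) <= 20 and bool(set(words) & ROUTINE_KEYWORDS)


def route(query: str, context: str = "") -> str:
    """Send routine queries to the small model and everything else to the frontier model."""
    handler: Callable[[str, str], str] = small_model if is_routine(query) else frontier_model
    return handler(query, context)


print(route("Where is my order?"))
print(route("Compare the indemnification clauses across these two contracts."))
```

The cost reduction comes from the share of traffic the classifier can safely keep on the small model, so the classifier's false-routine rate is the number to watch in evaluation.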
What Each Approach Actually Costs
These are real project economics from our delivery team, not vendor quotes. We track time and materials across all engagements, and costs below reflect median outcomes, not best-case scenarios.
RAG vs Fine-Tuning vs Combined Approach
| Feature | RAG | Fine-Tuning | RAG + Fine-Tuning |
| Estimated Cost | $18K-$45K | $22K-$55K | $35K-$80K |
| Median Cost | $28K | $35K | $55K |
| Timeline | 7 to 10 Weeks | 9 to 14 Weeks | 12 to 18 Weeks |
Cost Breakdown (Detailed Comparison)
| Component | RAG | Fine-Tuning | RAG + Fine-Tuning |
| Discovery & Architecture | $2.5K-$5.5K | Included in overall setup | Shared infra (1.6 to 1.8x, not 2x) |
| Data Ingestion / Preparation | $4.5K-$10K | $5K-$14K | Combined pipeline (optimized shared cost) |
| Core System / Training | $7.5K-$18K (Retrieval + Generation) | $3K-$8K (Training runs, LoRA/DPO) | Hybrid system (retrieval + tuned model) |
| Evaluation & Testing | $3.5K-$8K | $4K-$10K | Unified evaluation across both layers |
| Deployment & Monitoring | Included in system cost | $3K-$8K | Slightly higher due to dual system |
| Infrastructure Cost | Vector DB + API usage | GPU (H100) training infra | Shared infra (1.6 to 1.8x total) |
| Annual Run Cost | High (API-dependent) | $35K-$45K / H100 node | 30 to 50% cheaper than pure RAG |
| Scaling Efficiency | Moderate | Good after training | Best at scale (70 to 90% cheaper vs API) |
Key Insights
| Aspect | RAG | Fine-Tuning | RAG + Fine-Tuning |
| Cost Efficiency | Lower upfront cost | Moderate | Higher upfront, best long-term |
| Performance | Strong for dynamic data | Strong for behavior control | Best of both worlds |
| Scalability | Good | Moderate | Excellent |
| Run Cost | Higher (API-heavy) | Lower after training | 30 to 90% cheaper at scale |
| Best For | Knowledge-heavy apps | Behavior-specific tasks | Production-grade AI systems |
Real-World Build Stories
Talent Matched: When a “Simple Scoring Problem” Was Actually Two Problems
The Brief
Talent Matched came to us with a bold vision: build an intelligent recruitment platform that could match tech employers with top-tier candidates faster, smarter, and at scale. Their specific ask was an AI-powered scoring engine that could assess candidate fit based on skills, experience, and context. On paper, it sounded like a straightforward retrieval problem: take a job description, retrieve the most relevant candidates, and score and rank them.
We almost agreed with that framing. Then the discovery started.
What Discovery Actually Surfaced
Within the first week of discovery, two completely separate problems emerged, and mistaking one for the other would have built the wrong system entirely.
| Knowledge Problem → RAG | Behavior Problem → Fine-Tuning |
| Every employer on the platform had its own evolving job criteria. Role requirements changed week to week. A “Senior React Developer” for a fintech startup meant something completely different from the same title at an enterprise SaaS company. The model couldn’t be trained on this; it needed to read the current criteria at the moment of scoring. That’s a retrieval problem, not a training problem. | Across the platform, different recruiters were scoring the same candidate profiles differently. There was no consistent rubric. The model needed to apply a standardized evaluation framework regardless of which recruiter was using it, what phrasing the job description used, or how unusual the candidate profile was. Prompting alone was too brittle for this: it broke under adversarial inputs and drifted in edge cases. |
The Architecture We Built
The RAG layer handled the knowledge problem: vector embeddings for skills-similarity retrieval pulled current job criteria at inference time, so scoring was always grounded in the latest role requirements, not a training snapshot. The fine-tuning layer handled the behavior problem: a calibrated scoring rubric trained via DPO ensured evaluation quality was consistent regardless of recruiter, phrasing, or edge-case candidate profile.
On top of this sat a Whisper-powered voice screening layer that assessed communication clarity, tone, and confidence from recorded responses, a dimension no keyword-matching system could replicate. The whole system ran on a multi-tenant SaaS architecture with isolated data environments per employer, real-time analytics dashboards, and live integrations with LinkedIn, Google Jobs, and third-party job boards.
Why This Couldn’t Have Been Solved With Just One Approach
If we’d built RAG only, the retrieval would have been accurate, but the scoring would have been inconsistent: different employers, different phrasings, and wildly different output quality. If we’d built fine-tuning only, the behavior would have been consistent, but the knowledge would have been frozen at training time.
The moment the job criteria changed, the model would have been scoring against stale requirements. The combined architecture was the only path to a system that was both current and consistent simultaneously.
TourReview: When the Real Problem Was Data Chaos, Not Model Capability
The Brief
Tour operators were sitting on a goldmine of customer feedback, with reviews scattered across TripAdvisor, Google, and Booking.com, and had no way to act on it systematically. The ask was straightforward: build a platform that aggregates customer reviews and surfaces actionable insights. But the real challenge wasn’t analysis; it was the chaos of the data itself.
Reviews arrived in different formats, different languages, different rating scales, and at wildly different volumes depending on the season. Any AI layer you built on top of this would only be as good as the consistency of the data flowing into it. This is a problem that a lot of teams solve in the wrong order; they build the model first and wrestle with data quality later. We did it the other way around.
What the Architecture Actually Needed
This project was a clean example of the RAG-behavior split pattern. The knowledge layer needed to be a live, continuously updated feed of reviews from multiple sources, not a static dataset. The behavior layer needed to apply consistent classification, sentiment scoring, and actionability tagging regardless of how the source review was phrased, what language it was written in, or which platform it came from.
| Knowledge Layer → Real-Time Scraping + RAG | Behavior Layer → OpenAI Semantic Analysis Pipeline |
| Python-based real-time scrapers continuously captured review data from TripAdvisor, Google, and Booking.com. This data was normalized and indexed so that the AI layer always had access to the most current customer feedback, not a snapshot from last month. AWS handled hosting; PostgreSQL and MongoDB handled structured and unstructured data, respectively. | OpenAI-powered semantic analysis pipelines applied consistent classification and sentiment scoring across all incoming reviews. The behavior layer ensured that a 3-star review saying “the guide was knowledgeable, but the bus was late” was tagged with the same structured output format as a 5-star review praising the same guide, which made the dashboard comparable across operators and time periods. |
What Made This Work at Production Scale
The dashboard worked because two layers operated together. The knowledge layer ensured real-time freshness, showing sentiment from recent reviews, while the behavior layer standardized classification across platforms and languages, enabling accurate comparisons.
Together, they enabled actionability tagging, highlighting specific service issues like delays or poor communication, not just overall sentiment. This turned the dashboard into a decision-making tool.
Previously, operators relied on multiple platforms and manual tracking, which was time-consuming and incomplete. The platform streamlined everything into one unified, actionable view.
5 Mistakes That Lead to Wrong Decisions
These aren’t hypothetical edge cases. We see all five of these in client discovery calls multiple times a year, and they each cost teams 3 to 6 months of wasted effort.
- Fine-tuning to teach knowledge
“We need the model to know our product catalog, so we’ll fine-tune on it.” Catalogs change, fine-tunes get stale, and the model isn’t guaranteed to recall specific facts faithfully.
How to fix: Use RAG instead. Re-index when the catalog updates.
- RAG to enforce behavior
Style examples in the RAG prompt work partway. But prompted behavior breaks under unusual inputs, drifts in long conversations, and fails on adversarial probes.
How to fix: DPO or ORPO fine-tuning is the upgrade path.
- Skipping the eval set
Both approaches need rigorous evaluation. Teams that skip this have no way to know if a “fix” actually fixed anything and end up chasing regressions blind.
How to fix: Build 200 to 500 labeled examples before writing any code.
- Wrong embedding model for RAG
Multilingual content, code, scientific text, and long documents all have different optimal embeddings. Default English models fail on specialized data.
How to fix: Benchmark ≥2 embedding models on your actual eval set before committing.
- Ignoring the reranker
Most RAG quality wins in 2025 to 2026 came from better reranking, not better embedding. A cross-encoder reranker often improves quality by 15 to 35% with minimal engineering.
How to fix: Cohere Rerank or BGE-Reranker on top of hybrid retrieval.
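Benchmarking embedding models against an eval set can start with a metric as simple as recall@k. The sketch below is a minimal, assumed harness: the eval set, the document ids, and the two ranked result sets are made up for illustration, and recall@k is only one of several metrics you might track.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant document ids that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall_at_k(run: dict[str, list[str]], eval_set: dict[str, set[str]], k: int) -> float:
    """Average recall@k over every query in the labeled eval set."""
    return sum(recall_at_k(run.get(q, []), rel, k) for q, rel in eval_set.items()) / len(eval_set)


# Hypothetical labeled eval set: query id -> ids of the documents that answer it.
eval_set = {
    "q1": {"d1"},
    "q2": {"d4", "d5"},
}

# Hypothetical ranked results from two candidate embedding models.
model_a = {"q1": ["d1", "d2", "d3"], "q2": ["d6", "d4", "d7"]}
model_b = {"q1": ["d2", "d3", "d1"], "q2": ["d4", "d5", "d6"]}

for name, run in {"model_a": model_a, "model_b": model_b}.items():
    print(name, mean_recall_at_k(run, eval_set, k=2))
```

Running the same harness before and after adding a reranker is also a cheap way to check whether the reranker is earning its place.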
What Each Approach Costs
RAG project economics
- Discovery and architecture (1 to 2 weeks): $2,500-$5,500
- Data ingestion pipeline (2 to 3 weeks): $4,500-$10,000
- Retrieval and generation system (3 to 4 weeks): $7,500-$18,000
- Evaluation and productionization (1 to 2 weeks): $3,500-$8,000
- Total typical RAG project: $18K-$45K, median $28K
Fine-tuning project economics
- Total typical project: $22K-$55K, median $35K
- Annual run cost: $35K-$45K per H100 node, or $0.30-$2.40 per million generated tokens (managed)
Combined RAG + fine-tuning
- Total typical project: $35K-$80K, median $55K
- Engineering overhead is 1.6 to 1.8x a pure RAG or fine-tuning project (not 2x, because there’s shared infrastructure)
- At high volume, the combined approach can be 30 to 50% cheaper at run cost than pure RAG with frontier models
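A quick way to sanity-check these figures against your own volume is a break-even calculation. The build costs below are the medians from this article. The per-interaction cost is a made-up assumption for illustration only, and the 40% run-cost reduction is simply the midpoint of the 30 to 50% range above. Substitute your own numbers.

```python
def monthly_run_cost(interactions: int, cost_per_interaction: float) -> float:
    """Run cost for one month at a flat per-interaction price."""
    return interactions * cost_per_interaction


def breakeven_months(extra_build_cost: float, monthly_saving: float) -> float:
    """Months until run-cost savings pay back the extra build cost."""
    if monthly_saving <= 0:
        return float("inf")
    return extra_build_cost / monthly_saving


# Median build costs from this article.
RAG_BUILD = 28_000
COMBINED_BUILD = 55_000

# Assumptions for illustration only, not delivery data.
FRONTIER_COST_PER_INTERACTION = 0.02  # hypothetical dollars per interaction
RUN_COST_REDUCTION = 0.40             # midpoint of the 30 to 50% range
INTERACTIONS_PER_MONTH = 200_000

rag_monthly = monthly_run_cost(INTERACTIONS_PER_MONTH, FRONTIER_COST_PER_INTERACTION)
saving = rag_monthly * RUN_COST_REDUCTION
months = breakeven_months(COMBINED_BUILD - RAG_BUILD, saving)
print(f"RAG run cost ${rag_monthly:,.0f}/mo, saving ${saving:,.0f}/mo, break-even {months:.1f} months")
```

The result is very sensitive to the per-interaction cost and to volume, which is why the same architecture can be an obvious win for one team and premature for another.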
RAG vs Fine-Tuning vs Prompt Engineering: How They Compare
A third option often gets ignored in this debate: prompt engineering with frontier models. The three-way framing:
- Prompt engineering ($0 build, immediate): system prompts, few-shot examples, structured output schemas. It solves 60 to 80% of what teams initially scope as fine-tuning. Always try this first; never skip to RAG/FT before exhausting prompting.
- RAG ($18K-$45K, 7 to 10 weeks): the knowledge layer. The right answer when prompting can’t solve the problem because the model lacks information.
- Fine-tuning ($22K-$55K, 9 to 14 weeks): the behavior layer. The right answer when prompting + RAG can’t enforce the response format/voice/protocol you need.
The most expensive mistake in this space isn’t choosing the wrong tool. It’s choosing the right tool for the wrong problem: fine-tuning a product catalog that changes every week, or trying to enforce brand voice through a RAG prompt that breaks every time a user goes off-script. If you’re at this stage and need a team that’s done this before, our RAG Development Services are built specifically for production-grade retrieval systems, not prototypes.
Conclusion
If there’s one thing that 800+ production AI projects have taught us, it’s this: the teams that ship the best systems aren’t the ones who picked the right architecture on day one. They’re the ones who resisted the urge to commit to an architecture before they understood what problem they were actually solving.
RAG and fine-tuning are not rivals. They solve different problems at different layers of the same stack. RAG keeps your model informed. Fine-tuning keeps your model consistent. Most production systems eventually need both, but very few need both on day one. The staged sequence wins more often than any other pattern we’ve seen: exhaust prompt engineering first, add RAG when you hit a knowledge wall, layer in fine-tuning when production data shows you where behavior is breaking down.
The most expensive mistake in this space isn’t choosing the wrong tool. It’s choosing the right tool for the wrong problem: fine-tuning a product catalog that changes every week, or trying to enforce brand voice through a RAG prompt that breaks every time a user goes off-script. Both errors look plausible on a whiteboard and cost months in production.
How Scalacode Approaches This Decision
Our honest track record: 25% of fine-tuning engagements end with “don’t fine-tune yet.”
Across 800+ AI projects, the staged sequence wins more often than any other pattern: ship prompting+RAG fast, learn from production, and layer in fine-tuning when production data justifies it. When we say “don’t fine-tune yet,” we mean it, and we’d rather lose a scope expansion than set a client up for a failed delivery.
Our discovery sprints (2 weeks, $3K-$5K) deliver a written go/no-go-and-how recommendation: pure RAG, pure fine-tuning, or combined. Discovery cost credits against the production build. For pure RAG engagements, our builds typically close at $18K-$45K with a 7 to 10 week timeline.
Get expert guidance to choose the right approach for performance, scalability, and cost, tailored to your needs.
FAQs
1. What is the difference between RAG and fine-tuning?
RAG retrieves relevant documents at inference time and lets the model read them when generating a response: it gives the model new knowledge. Fine-tuning trains the model on labeled examples to change how it responds: it gives the model new behavior. RAG keeps knowledge current and provides citation/provenance; fine-tuning enforces consistent behavior and structured output without retrieval overhead. Most production systems in 2026 use both.
2. When should I choose RAG over fine-tuning?
Choose RAG when (1) the information the model needs is large, dynamic, or both; (2) you need citations or provenance for compliance/audit reasons; (3) data governance requires source data to stay separable from the model; (4) you don’t have labeled training data; (5) you want fast time-to-deploy. RAG is also typically the right starting point even when you’ll eventually need fine-tuning.
3. When should I choose fine-tuning over RAG?
Choose fine-tuning when (1) you need consistent behavior, brand voice, or non-standard structured output that prompting can’t reliably enforce; (2) your latency budget rules out the retrieval step; (3) volume is high enough that a fine-tuned smaller model is dramatically cheaper than a frontier API; (4) the information you need is stable. Fine-tuning is rarely the right first step.
4. Can I use RAG and fine-tuning together?
Yes, and most production systems do. Three common patterns: (1) fine-tune for behavior, RAG for knowledge; (2) RAFT: fine-tune on the pattern of using retrieved context well; (3) routing: fine-tune a smaller model for routine paths and route edge cases to a frontier API (both paths can use RAG). Combined approaches typically cost 60 to 80% more than pure builds but deliver materially better quality and unit economics at scale.
5. Is RAG cheaper than fine-tuning?
Build costs are comparable: a typical RAG project is $18K-$45K, and a typical fine-tuning project is $22K-$55K. Run costs differ: at low-to-medium volume (under ~100K interactions/month), RAG with a frontier API is usually cheaper. At high volume (200K+/month), a fine-tuned smaller model can be 70 to 90% cheaper per interaction. The combined approach (fine-tuned smaller model with RAG) is most cost-efficient at high volume.
6. Does RAG eliminate hallucination?
It reduces hallucination but doesn’t eliminate it. RAG hallucinates in three ways: (1) the model invents content not present in retrieved documents; (2) retrieval returns irrelevant documents and the model confidently uses them; (3) the model misinterprets ambiguous retrieved content. Mitigations: better reranking, explicit prompt instructions to cite sources, output validation against retrieved chunks, and (for severe cases) RAFT fine-tuning to teach the model to handle retrieved context faithfully.
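As a rough illustration of output validation against retrieved chunks, the sketch below flags answer sentences that no retrieved chunk covers. It uses plain token overlap, which is a crude proxy: production systems more often use an entailment model or an LLM judge. The function names and the `min_overlap = 0.6` threshold are assumptions.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def unsupported_sentences(answer: str, chunks: list[str], min_overlap: float = 0.6) -> list[str]:
    """Return answer sentences whose tokens are not sufficiently covered by any single chunk."""
    chunk_tokens = [tokens(c) for c in chunks]
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        sent_tokens = tokens(sentence)
        if not sent_tokens:
            continue
        # Best coverage of this sentence by any one retrieved chunk.
        coverage = max(
            (len(sent_tokens & ct) / len(sent_tokens) for ct in chunk_tokens),
            default=0.0,
        )
        if coverage < min_overlap:
            flagged.append(sentence)
    return flagged


# Hypothetical retrieved chunk and model answer.
retrieved = ["Refunds are issued within 14 days of the return being received."]
answer = "Refunds are issued within 14 days. Store credit is also available on request."

print(unsupported_sentences(answer, retrieved))
```

A flagged sentence does not prove a hallucination, but it is a cheap signal for routing the response to a stricter check or a fallback.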
7. How long does it take to ship a RAG system?
For a typical production RAG system: 7 to 10 weeks from contract to deployment. That covers discovery and architecture (1 to 2 weeks), data ingestion (2 to 3 weeks), retrieval and generation (3 to 4 weeks), and evaluation and productionization (1 to 2 weeks). A simple prototype can ship in 2 to 3 weeks; production-grade systems with proper evaluation, monitoring, and citation handling fall in the 7 to 10 week range.
8. What is RAFT and when do I need it?
RAFT (Retrieval-Augmented Fine-Tuning) trains the model specifically on the pattern of using retrieved context, including handling cases where retrieval returns some relevant and some irrelevant documents. It helps when your RAG system is suffering from “retrieval hallucination.” Consider RAFT when (1) you have a working RAG system with measurable retrieval-related failures; (2) you have 1,000+ labeled (query, retrieved-docs, correct-answer) triples; (3) failures are model-behavior issues, not retrieval-quality issues.
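A RAFT training triple can be assembled along these lines. This is a hedged sketch of the data-preparation step only: `build_raft_example` and its parameters are hypothetical, the corpus is made up, and the actual fine-tuning run is out of scope.

```python
import random
from typing import Optional


def build_raft_example(
    question: str,
    answer: str,
    relevant_doc: str,
    corpus: list[str],
    num_distractors: int = 2,
    rng: Optional[random.Random] = None,
) -> dict[str, object]:
    """Build one (question, retrieved-docs-with-distractors, correct-answer) triple."""
    rng = rng or random.Random()
    candidates = [d for d in corpus if d != relevant_doc]
    distractors = rng.sample(candidates, k=min(num_distractors, len(candidates)))
    docs = distractors + [relevant_doc]
    # Shuffle so the model cannot learn that position signals relevance.
    rng.shuffle(docs)
    return {"question": question, "documents": docs, "answer": answer}


# Hypothetical corpus and labeled question/answer pair.
corpus = [
    "Clause 7 limits liability to fees paid in the prior 12 months.",
    "Clause 3 defines the payment schedule.",
    "Clause 9 covers termination for convenience.",
    "Clause 12 sets the governing law.",
]

example = build_raft_example(
    question="What is the liability cap?",
    answer="Liability is limited to fees paid in the prior 12 months (Clause 7).",
    relevant_doc=corpus[0],
    corpus=corpus,
    rng=random.Random(0),
)
print(example["documents"])
```

The distractors are what distinguish this from ordinary supervised fine-tuning: the model is trained to answer from the relevant document while ignoring the ones that merely look plausible.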
9. What stack should I use for RAG in 2026?
Embedding: OpenAI text-embedding-3-large or open-source BGE-M3. Vector database: Pinecone, Weaviate, or pgvector for typical scale; Vespa or Milvus at very high scale. Retrieval: hybrid (dense + sparse/BM25). Reranker: Cohere Rerank or BGE-Reranker (non-optional). Generator: GPT-5, Claude Sonnet 4.6, or self-hosted Llama 3.3 70B. Observability: LangSmith, Helicone, or Arize Phoenix.
10. How do I evaluate a RAG vs fine-tuning recommendation from a vendor?
Ask three filter questions: (1) What’s the simplest version of this you can ship in 2 weeks, and what would it teach us? (Vendors who skip prototype-and-learn often overscope.) (2) What does your eval set look like, and how does it block bad deployments? (3) When have you recommended against fine-tuning for a project that initially scoped fine-tuning? (Vendors who haven’t said no don’t have the discipline to call decisions honestly.)