
Understanding Graded Relevance in RAG Pipelines
In the final installment of our RAG pipeline evaluation series, we explore graded relevance metrics that go beyond simple binary classification. Unlike binary metrics that categorize results as merely relevant or irrelevant, graded metrics recognize that relevance exists on a spectrum. Assigning each result a score (for example, 0 for irrelevant up to 3 for highly relevant) lets an evaluation distinguish a passage that fully answers the query from one that is only loosely related.
DCG@k: Measuring Graded Relevance with Ranking Penalties
Discounted Cumulative Gain (DCG@k) combines graded relevance with ranking awareness. It quantifies how useful the top k retrieved results are while accounting for the position of each result in the ranking.
The DCG@k Formula Explained
The DCG@k formula incorporates both relevance scoring and ranking penalties: DCG@k = Σ (i = 1 to k) rel_i / log₂(i + 1). Here, rel_i is the graded relevance score of the result at rank i, and the logarithmic denominator discounts items that appear at lower ranks. The result at rank 1 is not discounted at all, since log₂(2) = 1, and the discount grows slowly from there. As a consequence, a highly relevant result contributes more to the score when it appears near the top of the list than when it appears further down.
Practical DCG@k Calculation Example
Consider a retrieval scenario where the top five results have graded relevance scores of [3, 2, 3, 0, 1]. DCG@5 sums each score divided by log₂ of its rank plus one: 3/log₂(2) + 2/log₂(3) + 3/log₂(4) + 0/log₂(5) + 1/log₂(6) ≈ 3 + 1.262 + 1.5 + 0 + 0.387 ≈ 6.149. Note that the two results with score 3 contribute different amounts: 3 at rank 1 but only 1.5 at rank 3.
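The per-rank arithmetic for this example can be checked with a few lines of Python. This is only a minimal sketch: the variable names (relevance, contributions, dcg_at_5) are illustrative and not taken from the series.

```python
import math

# Graded relevance scores from the example, in retrieved order.
relevance = [3, 2, 3, 0, 1]

# Contribution of the result at 1-based rank i: rel_i / log2(i + 1).
contributions = [
    rel / math.log2(rank + 1)
    for rank, rel in enumerate(relevance, start=1)
]

for rank, (rel, gain) in enumerate(zip(relevance, contributions), start=1):
    print(f"rank {rank}: rel={rel}, discounted gain={gain:.3f}")

dcg_at_5 = sum(contributions)
print(f"DCG@5 = {dcg_at_5:.3f}")  # DCG@5 = 6.149
```

Printing the individual contributions makes the ranking penalty visible: the same relevance score of 3 is worth 3.000 at rank 1 and 1.500 at rank 3.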
Normalizing Performance with NDCG@k
While DCG@k provides valuable insights, it has a fundamental limitation: the score has no fixed upper bound. With non-negative relevance scores it can only grow as k increases, and it also depends on the grading scale and on how many relevant documents exist for a query, so raw DCG values are hard to compare across queries or retrieval set sizes. Normalized Discounted Cumulative Gain (NDCG@k) addresses this by normalizing against an ideal ranking.
Introducing IDCG@k: The Perfect Benchmark
Ideal Discounted Cumulative Gain (IDCG@k) is the maximum DCG@k achievable for a given set of relevance judgments, that is, the DCG@k of a perfectly ranked list. It is calculated by sorting the relevance scores in descending order and applying the DCG@k formula to the sorted list, which establishes the benchmark against which actual performance is measured.
NDCG@k: The Complete Solution
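NDCG@k = DCG@k / IDCG@k provides a normalized score ranging from 0 to 1, where 1 indicates that the results are ranked in the ideal order. For the scores [3, 2, 3, 0, 1], the ideal ordering [3, 3, 2, 1, 0] gives IDCG@5 ≈ 6.323, so NDCG@5 ≈ 6.149 / 6.323 ≈ 0.972. When no result has a relevance score above zero, IDCG@k is 0 and the ratio is undefined; a common convention is to report 0 in that case. This normalization enables meaningful comparisons across different k values, queries, and retrieval configurations, which makes NDCG@k well suited to RAG pipeline optimization.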
Python Implementation for Practical Application
An implementation can be organized as three functions: dcg_at_k calculates the discounted cumulative gain, idcg_at_k computes the ideal scenario by sorting relevance scores in descending order, and ndcg_at_k divides the first by the second. Kept this small, the code is easy to integrate into a RAG development workflow.
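The three functions named above might look like the following. This is a minimal sketch under stated assumptions rather than a definitive implementation: the signatures (a list of relevance scores in retrieved order plus k) are assumed, the ideal ranking is derived only from the scores passed in, and returning 0.0 when IDCG@k is zero is a convention chosen here.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Sum of rel_i / log2(i + 1) over the top-k results (i is the 1-based rank)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def idcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k of the same scores re-sorted in descending order (ideal ranking)."""
    return dcg_at_k(sorted(relevances, reverse=True), k)


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k / IDCG@k. Returns 0.0 when IDCG@k is 0 (assumed convention)."""
    ideal = idcg_at_k(relevances, k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Usage with the relevance scores from the worked example.
scores = [3, 2, 3, 0, 1]
print(round(dcg_at_k(scores, 5), 3))   # 6.149
print(round(idcg_at_k(scores, 5), 3))  # 6.323
print(round(ndcg_at_k(scores, 5), 3))  # 0.972
```

Because idcg_at_k sorts only the scores it receives, relevant documents that were never retrieved do not lower the result. If judgments exist for the whole candidate set, passing those to idcg_at_k instead would give a stricter ideal.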
Comprehensive RAG Evaluation Framework
This concludes our three-part series on RAG pipeline evaluation metrics. From binary order-unaware measures like Precision@K and Recall@K, through order-aware metrics like MRR and Average Precision, to the sophisticated graded relevance metrics DCG@k and NDCG@k, we’ve built a comprehensive toolkit for quantifying retrieval performance. These metrics form the foundation for building effective RAG systems that deliver accurate, well-ranked results grounded in relevant source documents.



