Metrics explained:
- faithfulness: How factually consistent the answer is with the retrieved context, i.e. whether the claims it makes are supported by that context.
- answer_relevancy: How relevant the answer is to the user's question.
- context_precision: How much of the retrieved context is actually useful for answering the question.
- context_recall: How much of the necessary context for answering the question was actually retrieved.
- bertscore_f1: Semantic similarity between the generated answer and the reference answer, reported as the F1 of BERTScore precision and recall over contextual token embeddings.
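
Several of the metrics above can be read as simple ratios, and the sketch below illustrates that reading. It is a simplified illustration only, assuming per-item yes/no judgments are already available (in practice a judge model or annotator would produce them). The helper names `ratio` and `f1` and all sample values are hypothetical and are not taken from the evaluation code; the actual metric implementations may differ, for example by taking chunk rank into account.

```python
from typing import Sequence


def ratio(flags: Sequence[bool]) -> float:
    """Fraction of True values; 0.0 for an empty sequence."""
    return sum(flags) / len(flags) if flags else 0.0


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (how an F1 score combines the two)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Hypothetical judgments for a single question (illustrative values only).
claims_supported = [True, True, False]     # is each claim in the answer backed by the context?
chunks_useful = [True, False, True, True]  # is each retrieved chunk useful for the question?
facts_retrieved = [True, True]             # is each fact needed for the answer present in the context?

faithfulness = ratio(claims_supported)     # supported claims / all claims
context_precision = ratio(chunks_useful)   # useful chunks / retrieved chunks
context_recall = ratio(facts_retrieved)    # retrieved facts / needed facts

print(f"faithfulness={faithfulness:.2f}")
print(f"context_precision={context_precision:.2f}")
print(f"context_recall={context_recall:.2f}")
```

Under this reading, faithfulness and context precision penalize unsupported or unhelpful material, while context recall penalizes missing material, which is why a retriever can score well on one and poorly on another.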
Conclusion: Hybrid retrieval is best overall, with the highest faithfulness, context precision, and context recall.