Fine-tuning large language models (LLMs) has been widely adopted by AI practitioners and researchers as a way to improve model performance on specific tasks. However, emerging research and critical evaluations suggest that this approach has significant drawbacks, particularly when it is used to teach models new knowledge, where it can increase the propensity for hallucination, that is, producing factually incorrect responses.
The Case Against Uninformed Fine-Tuning
A recent paper from Google Research raises critical questions about using fine-tuning to teach LLMs new knowledge. The study reveals that fine-tuning does not effectively teach models new information; instead, it tends to increase the likelihood of hallucinations. The research highlights a fundamental issue: fine-tuning on unfamiliar facts can push models toward generating incorrect information, undermining their reliability.
The traditional belief has been that fine-tuning allows models to learn new facts and adapt to specific applications. This process involves taking a pre-trained model and further training it on a specialized dataset. However, the Google Research paper argues that LLMs primarily acquire knowledge during their initial pre-training phase. Fine-tuning, rather than expanding the model's knowledge base, merely teaches it how to use pre-existing knowledge more efficiently.
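As a rough illustration of what that further-training step looks like in practice, the sketch below uses the Hugging Face Trainer on a small causal language model. The checkpoint name, dataset file, and hyperparameters here are placeholders for illustration, not the setup used in the paper.

```python
# Minimal supervised fine-tuning sketch with Hugging Face Transformers.
# "gpt2" and "qa_finetune.jsonl" are illustrative placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "gpt2"  # any causal LM checkpoint works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical specialized dataset: one JSON object per line with a "text" field.
dataset = load_dataset("json", data_files="qa_finetune.jsonl")["train"]
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="ft-out",
        per_device_train_batch_size=4,
        num_train_epochs=3,
        learning_rate=2e-5,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```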
Fine-Tuning and Hallucination: An Unintended Consequence
The Google Research paper presents a nuanced picture of how fine-tuning affects LLMs. The findings indicate that when models are exposed to new information during fine-tuning, they struggle to integrate it: new facts are learned substantially more slowly than facts consistent with the knowledge the model acquired during pre-training. The study further shows that as these new facts are eventually learned, the model's tendency to hallucinate increases.
The researchers used a controlled setup focused on closed-book Q&A tasks to demonstrate this phenomenon. By varying the proportion of fine-tuning examples that introduced new knowledge, they observed that the model's tendency to hallucinate increased linearly with the amount of new information. This suggests that fine-tuning on new knowledge not only fails to enhance the model's factual accuracy but also degrades its overall performance by fostering incorrect responses.
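For intuition, here is a toy sketch of that experimental knob: building fine-tuning splits in which a chosen fraction of examples carries knowledge the model does not already hold. The example data is synthetic and purely illustrative, and the actual training and closed-book evaluation steps are omitted.

```python
# Toy sketch of the knob the study varies: the fraction of fine-tuning
# examples that introduce knowledge the model does not already have.
# 'known' and 'unknown' are synthetic stand-ins, not the paper's data.
import random

known = [{"q": f"known question {i}", "a": f"answer {i}"} for i in range(500)]
unknown = [{"q": f"new-fact question {i}", "a": f"answer {i}"} for i in range(500)]

def build_split(unknown_fraction, size=200, seed=0):
    """Mix known and unknown Q&A examples at a fixed ratio, then shuffle."""
    rng = random.Random(seed)
    n_unknown = int(size * unknown_fraction)
    split = rng.sample(unknown, n_unknown) + rng.sample(known, size - n_unknown)
    rng.shuffle(split)
    return split

# Sweep the proportion of new knowledge; each split would then be used to
# fine-tune the model before measuring hallucination on held-out questions.
for frac in (0.0, 0.25, 0.5, 1.0):
    split = build_split(frac)
    n_new = sum(ex["q"].startswith("new-fact") for ex in split)
    print(f"unknown fraction {frac:.2f} -> {n_new}/{len(split)} new-knowledge examples")
```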
Alternative Approaches: Retrieval-Augmented Generation (RAG)
Given these findings, the paper points to alternative strategies for supplying new knowledge to LLMs. One promising approach is Retrieval-Augmented Generation (RAG), which retrieves relevant documents from an external knowledge source at inference time and supplies them as context for the model's answer. This lets the model ground its responses in retrieved text on the fly, rather than relying solely on the knowledge encoded in its weights during pre-training.
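A minimal sketch of the idea follows, using TF-IDF retrieval as a stand-in for a real vector store and leaving the final generation call as a placeholder for whichever LLM API you use.

```python
# Minimal RAG sketch: retrieve relevant passages from an external store and
# prepend them to the prompt, so the model answers from provided context.
# TF-IDF similarity stands in for a production vector database.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "The Eiffel Tower was completed in 1889 for the World's Fair in Paris.",
    "Mount Everest is the highest mountain above sea level, at 8,849 metres.",
    "The Great Barrier Reef is the world's largest coral reef system.",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)

def retrieve(query, k=2):
    """Return the k documents most similar to the query."""
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_vectors)[0]
    top = scores.argsort()[::-1][:k]
    return [documents[i] for i in top]

question = "When was the Eiffel Tower completed?"
context = "\n".join(retrieve(question))
prompt = (
    "Answer using only the context below.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)
print(prompt)  # pass `prompt` to whatever LLM you are using
```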
Mitigating Hallucinations: Early Stopping and Data Filtering
The research also explores techniques to mitigate the adverse effects of fine-tuning. One such technique is early stopping, a common practice in machine learning where training is halted once the model's performance on a validation set ceases to improve. The study found that early stopping can significantly reduce the risk of hallucinations by preventing the model from overfitting to the fine-tuning data.
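In its simplest form, early stopping is just a patience counter wrapped around the validation loop, as in the sketch below for a PyTorch-style model; train_one_epoch and validation_loss are placeholders for your own training and evaluation code. If you fine-tune with the Hugging Face Trainer, its built-in EarlyStoppingCallback provides the same behaviour.

```python
import copy

def fine_tune_with_early_stopping(model, train_one_epoch, validation_loss,
                                  max_epochs=20, patience=2):
    """Stop training once validation loss fails to improve `patience` times in a row."""
    best_loss = float("inf")
    best_state = None
    bad_epochs = 0

    for epoch in range(max_epochs):
        train_one_epoch(model)             # one pass over the fine-tuning data
        val_loss = validation_loss(model)  # loss on a held-out validation set

        if val_loss < best_loss:
            best_loss = val_loss
            best_state = copy.deepcopy(model.state_dict())  # keep best checkpoint
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break                      # stop before overfitting sets in

    if best_state is not None:
        model.load_state_dict(best_state)  # restore the best checkpoint
    return model, best_loss
```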
Additionally, the researchers introduced a methodology called SliCK (Sampling-based Categorization of Knowledge). This framework categorizes each fine-tuning example by how well the model already knows the underlying fact, into four types: unknown, weakly known, maybe known, and highly known. Using this categorization, they showed that fine-tuning on "maybe known" examples (facts the model answers correctly only some of the time) plays a crucial role in teaching the model to use its pre-existing knowledge effectively.
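The sketch below is one way to read that categorization: before fine-tuning, estimate how reliably the model already answers each question, using both greedy decoding and higher-temperature sampling, and bucket the example accordingly. Here ask_model is a hypothetical helper that queries the model under randomly chosen few-shot exemplars, and the thresholds follow the paper's qualitative definitions rather than its exact protocol.

```python
def categorize_knowledge(question, gold_answer, ask_model, n_prompts=10):
    """Bucket a Q&A pair by how well the model already knows the answer."""
    # Greedy decoding under several different few-shot prompts: does the model
    # reliably produce the gold answer when it is not allowed to explore?
    greedy_correct = sum(
        ask_model(question, few_shot_seed=s, temperature=0.0) == gold_answer
        for s in range(n_prompts)
    ) / n_prompts

    # Higher-temperature sampling probes weaker traces of the fact.
    sampled_correct = sum(
        ask_model(question, few_shot_seed=s, temperature=0.5) == gold_answer
        for s in range(n_prompts)
    ) / n_prompts

    if greedy_correct == 1.0:
        return "HighlyKnown"   # always correct with greedy decoding
    if greedy_correct > 0.0:
        return "MaybeKnown"    # sometimes, but not always, correct greedily
    if sampled_correct > 0.0:
        return "WeaklyKnown"   # only sampling ever recovers the answer
    return "Unknown"           # never answered correctly
```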
Conclusion: A Paradigm Shift in Fine-Tuning Practices
The insights from this research prompt a reconsideration of current fine-tuning practices. For practitioners, the key takeaway is that fine-tuning should not be used indiscriminately, especially when attempting to teach models entirely new information. Instead, leveraging retrieval-based methods and incorporating early stopping techniques can enhance model performance without increasing the risk of hallucinations.
Ultimately, the study underscores the importance of understanding the limitations and potential pitfalls of fine-tuning. By adopting more informed and nuanced approaches, the AI community can develop more reliable and accurate language models that better serve their intended applications.
You can download and read the paper here - https://arxiv.org/pdf/2405.05904v2