arxiv:2401.00544

A Reliable Knowledge Processing Framework for Combustion Science using Foundation Models

Published on Dec 31, 2023

Authors:

Vansh Sharma ,

Abstract

The integration of Retrieval-Augmented Generation (RAG) framework with large language models (LLMs) enhances scientific data assimilation in combustion science through efficient text processing and improved data accuracy and privacy.

AI-generated summary

This research explores the integration of large language models (LLMs) into scientific data assimilation, focusing on combustion science as a case study. Leveraging foundational models integrated with Retrieval-Augmented Generation (RAG) framework, the study introduces an approach to process diverse combustion research data, spanning experimental studies, simulations, and literature. The multifaceted nature of combustion research emphasizes the critical role of knowledge processing in navigating and extracting valuable information from a vast and diverse pool of sources. The developed approach minimizes computational and economic expenses while optimizing data privacy and accuracy. It incorporates prompt engineering and offline open-source LLMs, offering user autonomy in selecting base models. The study provides a thorough examination of text segmentation strategies, conducts comparative studies between LLMs, and explores various optimized prompts to demonstrate the effectiveness of the framework. By incorporating an external database, the framework outperforms a conventional LLM in generating accurate responses and constructing robust arguments. Additionally, the study delves into the investigation of optimized prompt templates for the purpose of efficient extraction of scientific literature. The research addresses concerns related to hallucinations and false research articles by introducing a custom workflow developed with a detection algorithm to filter out inaccuracies. Despite identified areas for improvement, the framework consistently delivers accurate domain-specific responses with minimal human oversight. The prompt-agnostic approach introduced holds promise for future deliberations. The study underscores the significance of integrating LLMs and knowledge processing techniques in scientific research, providing a foundation for advancements in data assimilation and utilization.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2401.00544 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2401.00544 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2401.00544 in a Space README.md to link it from this page.