Tailoring large language models for specific energy domains | SLB

Tailoring large language models for specific energy domains

by Monisha Manoharan, Prateek Srivastava, Sai Shravani Sistla, and Advaya Gupta

Large language models (LLMs) excel in diverse tasks due to their broad training. However, their general-purpose nature can hinder their performance in specialized energy domains, where handling sensitive and domain-specific information is crucial. The vast technical jargon and unique language across energy disciplines pose a challenge. To unlock the full potential of LLMs in these specialized areas, targeted adaptation strategies are essential.


Selecting the right LLM for your specific energy application requires answering a set of key questions, and the flowchart below can help. It presents a customization decision tree that guides these decisions based on your data and needs. You can then combine the outcome with an understanding of the different customization techniques to choose the right one for your use case.

A decision tree for large language model customization.

Let's explore the advantages and drawbacks of three of the key approaches mentioned in the flowchart: prompt engineering, retrieval-augmented generation (RAG), and fine-tuning.

Prompt engineering: Guiding with context and examples

Prompt engineering involves giving the LLM extra information in the input prompt to guide its responses toward a specific area. In practice, this means experimenting with different formats, phrases, and symbols to find the best way to direct the LLM for a meaningful interaction. Prompt engineering is a flexible tool that works well with other LLM customization techniques. It can be used on its own or alongside methods like RAG and synthetic data generation to improve model performance during fine-tuning.

Popular techniques

  • Instruction-based prompts—Use clear, direct commands to guide the model (e.g., "Summarize this text").
  • Few-shot learning—Provide a few examples to show the model the desired input-output relationship.
  • Chain-of-thought prompting—Encourage step-by-step reasoning to produce structured, logical responses.
  • Domain context—Include relevant domain-specific information directly in the prompt to ensure the model has a better understanding of the subject (e.g., "In the context of renewable energy, explain the benefits of solar power").
  • Persona prompts—Assign the model a specific role to influence its perspective (e.g., "You are a historian...").
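As a rough illustration, several of the techniques above can be combined in a single prompt. The sketch below is a minimal example, not a recommended template: the function name `build_prompt`, the prompt layout, and the sample question and answer text are all assumptions made for illustration.

```python
def build_prompt(question, context, examples, persona):
    """Assemble one prompt from a persona, domain context, few-shot examples,
    and a step-by-step cue. The layout is illustrative, not a standard."""
    parts = [persona.strip()]                        # persona prompt
    parts.append("Context:\n" + context.strip())     # domain context
    for ex_question, ex_answer in examples:          # few-shot examples
        parts.append(f"Q: {ex_question}\nA: {ex_answer}")
    # Instruction-based prompt with a chain-of-thought cue
    parts.append(
        f"Q: {question}\nThink step by step, then give the final answer.\nA:"
    )
    return "\n\n".join(parts)


prompt = build_prompt(
    question="Why does mud weight matter when drilling?",
    context="Mud weight is the density of the drilling fluid.",
    examples=[("What is porosity?",
               "The fraction of rock volume that is pore space.")],
    persona="You are a drilling engineer.",
)
print(prompt)
```

The resulting string would be sent to whichever LLM you use; swapping the persona, context, or examples changes the model's behavior without any retraining.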

Advantages

  • Data efficiency—The dataset required is significantly smaller than for fine-tuning, which can be manageable for manual creation.
  • Dynamic updates—New information can be easily incorporated by modifying the prompt.
  • Rapid development—No model training is required, leading to faster development cycles.
  • Controlled outputs—Responses are more focused, grounded in the provided context, and follow a consistent format.

Challenges

  • Increased inference time—Longer prompts result in slower processing times.
  • Higher costs—Longer prompts consume more tokens, potentially increasing inference costs.
  • Contextual limitations—Providing comprehensive context, especially from multiple sources, can be difficult.
  • Interface changes—Prompting slightly alters the standard inference interface.

Retrieval-augmented generation: Leveraging external knowledge

While prompt engineering can be effective for adapting LLMs to certain tasks, it may not be the best solution for entire domains. This is because prompt engineering relies on embedding all relevant domain information within a single prompt, which may not be feasible when extensive knowledge is required.

RAG offers a more comprehensive approach by connecting LLMs to external knowledge sources and domain-specific information. This allows LLMs to access relevant context and produce more accurate and domain-specific outputs. RAG leverages a knowledge-intensive strategy, utilizing an external database of domain-specific documents to provide context and enhance LLM responses.

Here's what that process looks like:

  1. Database creation—Domain-specific documents are encoded into embeddings using a language model, after which the embeddings are stored in a vector database.
  2. Query embedding—At inference time, the user's query is also converted into an embedding using the same language model.
  3. Document retrieval—The query embedding is compared against the embeddings in the vector database to retrieve the most relevant documents.
  4. Response generation—The top-ranked documents are provided to the LLM as context, enabling it to generate a more accurate and informative response.
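The four steps above can be sketched in a few lines of Python. This is a toy under stated assumptions: `embed` is a bag-of-words counter standing in for a real embedding model, a plain list stands in for the vector database, and the documents, function names, and prompt wording are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


documents = [
    "Porosity is the fraction of rock volume occupied by pore space.",
    "Permeability measures how easily fluid flows through rock.",
    "Mud weight is the density of the drilling fluid.",
]

# Step 1: database creation (a list of (document, embedding) pairs)
index = [(doc, embed(doc)) for doc in documents]


def retrieve(query, k=2):
    query_vec = embed(query)                               # Step 2: query embedding
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)                          # Step 3: document retrieval
    return [doc for doc, _ in ranked[:k]]


def build_rag_prompt(query, k=2):
    # Step 4: pass the top-ranked documents to the LLM as context
    context = "\n".join(f"- {doc}" for doc in retrieve(query, k))
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")


print(build_rag_prompt("How does permeability affect fluid flow?"))
```

In a real system, the embedding function would be a trained model and the index a vector database, but the flow of data is the same.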

Advantages

  • Automated context—Eliminates the need to manually provide context within the prompt.
  • Comprehensive answers—Access to the entire corpus allows more complete and informative responses.
  • Dynamic updates—The database can be easily updated with new information.
  • Interpretability—Examining retrieved documents provides insights into the LLM's reasoning process.
  • Modular design—Different LLMs can be switched for easy comparison and improvement.

Challenges

  • Cost—RAG systems involve higher computational and storage costs compared with stand-alone LLMs.
  • Latency—The multistep retrieval process increases response time.
  • Indexing time—Building and updating the database can be time-consuming.
  • Relevance challenges—Retrieved documents may not always be relevant or accurate.

Building domain-specific text embeddings

Text embeddings are essential for enabling AI to understand and process language. They convert words and sentences into numerical vectors that capture semantic meaning, facilitating applications such as content creation, translation, conversation, search, and information retrieval. While general-purpose text embeddings are widely available, they may not fully capture the nuances of specialized domains, limiting their effectiveness in specific applications. Domain-specific text embeddings address this by:

  • Capturing technical language—Understanding domain-specific terms and concepts is key for accurate interpretation.
  • Improving decision-making—More accurate representations of domain knowledge enhance decision processes.
  • Enhancing knowledge management—Better organization, retrieval, and utilization of specialized knowledge is enabled.
  • Boosting operational efficiency—Accurate understanding of domain-specific information streamlines operations.

These benefits highlight the importance of developing domain-specific text embeddings for various applications, including improving retrieval in RAG systems. Here's the framework we recommend for effectively creating domain-specific embeddings:

  1. Preparing data—Ensure your training data is well-formatted (e.g., plain text, sentence pairs, labeled data) and of high quality.
  2. Selecting architecture—Choose an appropriate architecture based on your data and task. Popular options include Sentence Transformers (SBERT), decoder models like LLM2Vec, or closed-source models from OpenAI and Google.
  3. Selecting the appropriate loss function—The Multiple Negatives Ranking Loss function is commonly used for fine-tuning. This function calculates the similarity between an anchor sentence and positive/negative samples within a batch.
  4. Evaluating and benchmarking
    1. Choose a relevant downstream task.
    2. Create a representative test dataset. This dataset should contain a list of data points that each has the following information relevant to your task: [question, answer, context, label].
    3. Select the appropriate metrics. Metrics can be absolute, or you can use an LLM as a judge for evaluation.
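To make the loss function in step 3 concrete, here is a minimal pure-Python sketch of a multiple negatives ranking loss. It assumes cosine similarity and a scale factor of 20, both of which are illustrative choices here; the toy two-dimensional vectors are made up. Each anchor's paired positive is the correct match, and every other positive in the batch serves as a negative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u))
                  * math.sqrt(sum(y * y for y in v)))


def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy over the batch, where positives[i] is the correct
    match for anchors[i] and all other positives act as in-batch negatives."""
    losses = []
    for i, anchor in enumerate(anchors):
        scores = [scale * cosine(anchor, p) for p in positives]
        max_score = max(scores)  # subtract the max for numerical stability
        log_sum_exp = max_score + math.log(
            sum(math.exp(s - max_score) for s in scores))
        losses.append(log_sum_exp - scores[i])  # -log softmax at the true index
    return sum(losses) / len(losses)


anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each positive is close to its anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each positive is close to the wrong anchor

print(multiple_negatives_ranking_loss(anchors, aligned))  # small loss
print(multiple_negatives_ranking_loss(anchors, swapped))  # large loss
```

Training lowers this loss by pulling matching pairs together and pushing mismatched pairs apart, which is why larger batches (more negatives) often help.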

We’ve used this framework to create specialized geoscience embeddings, for example, and the results are promising, which suggests that the framework delivers useful improvements.

How did we do it? We generated comprehensive datasets containing question-answer pairs, labels, and relevant context, along with the appropriate test datasets to evaluate our models. We experimented with different training methods, including supervised fine-tuning (SFT) and parameter-efficient fine-tuning (PEFT). We fine-tuned several open-source text embedding models, such as E5-mistral-7b-instruct and LLM2Vec-Mistral-7B-Instruct-v2-mntp-supervised, using our curated datasets.

We then evaluated retrieval performance on the test datasets using metrics such as hit rate and mean reciprocal rank (MRR), and we used the Ragas (Retrieval Augmented Generation Assessment) framework, which uses an LLM as a judge, to conduct a comprehensive analysis of the models' performance.
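For readers unfamiliar with these retrieval metrics, the sketch below shows one common way to compute hit rate and MRR at a cutoff k. It assumes a single relevant document per query; the function name and the document IDs are hypothetical, and this is not the evaluation code used in the work described here.

```python
def hit_rate_and_mrr(ranked_results, relevant_ids, k=3):
    """Compute hit rate@k and MRR@k, assuming one relevant document per query.

    ranked_results: list of ranked document-ID lists, one per query
    relevant_ids:   the relevant document ID for each query
    """
    hits = 0
    reciprocal_ranks = []
    for ranked, relevant in zip(ranked_results, relevant_ids):
        top_k = ranked[:k]
        if relevant in top_k:
            hits += 1
            # Ranks are 1-based: a relevant document in first place scores 1.0
            reciprocal_ranks.append(1.0 / (top_k.index(relevant) + 1))
        else:
            reciprocal_ranks.append(0.0)
    n = len(relevant_ids)
    return hits / n, sum(reciprocal_ranks) / n


ranked_results = [["d1", "d2", "d3"], ["d4", "d5", "d6"], ["d7", "d8", "d9"]]
relevant_ids = ["d1", "d5", "d0"]   # found at rank 1, found at rank 2, not found

hit_rate, mrr = hit_rate_and_mrr(ranked_results, relevant_ids, k=3)
print(hit_rate, mrr)
```

Hit rate only asks whether the relevant document appears in the top k, while MRR also rewards placing it near the top.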

Fine-tuning: Enhancing LLMs with specialized expertise

Fine-tuning is a more advanced technique for adapting LLMs to specific domains. The real power of fine-tuning lies in its ability to infuse LLMs with your domain knowledge, thereby accelerating the development of new workflows, improving existing processes, and unlocking new opportunities for innovation across domains including reservoir characterization, well planning, and risk assessment. This approach requires a substantial amount of clean, task-specific data to optimize the model’s performance. By leveraging transfer learning, fine-tuning adapts a pretrained LLM to a particular domain without starting from scratch, refining the model’s weights with domain-specific data to embed specialized knowledge.

Popular techniques

  • LLM alignment
    • Supervised fine-tuning (SFT)—Trains LLMs with curated datasets featuring industry-specific terminology and tasks.
    • Reinforcement learning from human feedback (RLHF)—Enhances SFT by refining the model based on feedback from industry experts, aligning its responses with real-world decision-making.
    • Advanced techniques
      • Proximal policy optimization (PPO)—Stabilizes learning by preventing drastic behavior changes.
      • Direct preference optimization (DPO)—Optimizes the model based on expert preferences (e.g., prioritizing carbon reduction).
  • Network training
    • Full fine-tuning—Updates all model parameters, demanding significant resources.
    • Parameter-efficient fine-tuning (PEFT)—Fine-tunes a subset of parameters, reducing computational demands and data needs.
    • Prompt-tuning (P-Tuning)—Learns trainable prompt embeddings while the model weights stay frozen, steering responses toward specific queries.
    • Low-rank adaptation (LoRA)—Adds small, trainable low-rank matrices alongside the frozen weights for task-specific adaptation with minimal changes.
  • Synthetic data generation
    • Address data scarcity—Generates additional training data to compensate for limited domain-specific data.
    • Methods
      • Self-instruct data generation—Creates new training data for tasks like drilling, production, and reservoir engineering.
      • Evolve instruct data generation—Iteratively refines synthetic data based on model performance, improving both data and model.

Advantages

  • Retention of general knowledge—Maintains the broad language skills from the original model.
  • Data efficiency—Requires less data than training from scratch.
  • Speed—Performs faster than training a model from the beginning.

Challenges

  • Complexity and overfitting—Fine-tuning can lead to overfitting and a loss of previously learned information.
  • Dataset size—A large dataset is still necessary for effective fine-tuning.
  • Limited adaptability—Updating the model with new information requires additional rounds of fine-tuning.

Energy domain LLMs

Here's a breakdown of the methodologies currently employed in the energy sector:

  • Continual pretraining—Pretrained foundation models are enhanced by continuously training them with large quantities of unstructured data, thereby updating model layers without adding new parameters. This has been applied to open-source LLMs using masked language modeling.
  • PEFT—Adapter modules are used during training to fine-tune models without altering their original parameters. PEFT methods, such as LoRA and QLoRA, have been employed on Mistral Nemo and other open-source LLMs, adapting datasets accordingly.
  • Instruction fine-tuning—Innovative dataset generation pipelines have been developed to create large-scale, high-quality datasets specifically tailored to the energy sector. These pipelines leverage advanced synthetic data generation techniques and LLMs to produce diverse and comprehensive datasets.
    • Techniques—Self-instruct and evolve instruct iteratively refine prompts and responses, generating high-quality question-answer pairs relevant to the energy industry.
    • Dataset format—[instruction, question, answer, context].
    • Fine-tuning techniques—PEFT (LoRA) and knowledge distillation.
    • Model evaluation
      • Industry benchmarks—Benchmarks curated specifically for the energy sector.
      • Performance metrics—BERTScore, ROUGE, and frameworks such as Ragas that use LLMs as judges.
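As an illustration of the dataset format listed above, the sketch below validates a record with the four fields and serializes it as one JSON Lines entry. The four field names come from the format above; everything else, including the function names, the sample record, and the training-text template, is a hypothetical example rather than the pipeline described here.

```python
import json

FIELDS = ("instruction", "question", "answer", "context")


def to_jsonl_line(record):
    """Validate that all four fields are present and non-empty, then serialize."""
    missing = [field for field in FIELDS if not record.get(field)]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return json.dumps({field: record[field] for field in FIELDS})


def to_training_text(record):
    """Render a record as a single training string (hypothetical template)."""
    return (f"{record['instruction']}\n\n"
            f"Context: {record['context']}\n"
            f"Question: {record['question']}\n"
            f"Answer: {record['answer']}")


record = {
    "instruction": "Answer the question using the context.",
    "question": "What is porosity?",
    "answer": "The fraction of rock volume that is pore space.",
    "context": "Porosity is the fraction of rock volume occupied by pore space.",
}

print(to_jsonl_line(record))
print(to_training_text(record))
```

Validating records before training catches incomplete synthetic examples early, which matters when the data comes from an automated generation pipeline.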

We’ve seen that an instruction-tuned model shows a marked improvement over the base model across all metrics, meaning that it learns to understand the specific needs of the domain. Achieving this requires a new dataset creation process that generates high-quality, tailored data for the energy sector, which is a valuable advantage when developing models that can address its unique challenges.

So, how do you choose the right approach for your domain?

Choosing the right LLM approach for your domain

Four large language model domain adaptation techniques, including retrieval-augmented generation, prompt engineering, fine-tuning, and hybrid.

While each method offers unique strengths, RAG generally proves more advantageous due to its

  • Flexibility—Adapts to data changes easily.
  • Grounded responses—Leverages external knowledge for more accurate and comprehensive answers.
  • Faster development cycles—Avoids lengthy model training procedures.
  • Interpretability—Provides insights into the reasoning behind the LLM’s responses.

Prompt engineering also offers significant benefits:

  • Low resource requirements—There’s no need for model retraining, thereby making it more cost-effective.
  • Task adaptability—It enables quick adjustments to handle different tasks by changing prompts.
  • Efficiency—Relevant results are delivered with minimal configuration.

However, fine-tuning may be preferable when you have

  • Sufficient data—A large, high-quality, static dataset is available for training.
  • Domain-specific language—Fine-tuning excels in handling specialized terminology.

A hybrid approach, combining fine-tuning and RAG, may prove optimal in scenarios with

  • Large, static knowledge base—Fine-tuning based on fundamental domain knowledge enhances overall performance.
  • Dynamic information stream—RAG integrates new information and ensures up-to-date responses.

The future of domain adaptation of LLMs

Domain adaptation of LLMs is a crucial step toward achieving truly versatile and reliable artificial intelligence systems. The techniques outlined in this article empower these models to adapt to new and diverse data distributions, unlocking their potential for a wider range of applications. While challenges remain, particularly in terms of resource efficiency and robust generalization, ongoing research in areas like fine-tuning, few-shot learning, and data augmentation holds promising prospects.

The future of LLMs hinges on their ability to seamlessly navigate the complexities of various domains (including ours), enabling them to empower industries, solve real-world problems, and ultimately, contribute to a more intelligent and adaptive world.

Special thanks to Neelansh Garg for his contributions to this article.

Contributors

Monisha Manoharan

More than a decade in data and AI

Monisha leads artificial intelligence (AI) for the Petrel Program, with a current focus on language modeling and generative AI. She specializes in natural language processing, knowledge management, unstructured data processing, and large language models, bridging research with real-world domain challenges.

Prateek Srivastava

Expert in large and vision language models

Prateek is a senior data scientist focusing on adapting large language models and vision language models for generative artificial intelligence applications in the energy industry. He holds Ph.D. and M.S. degrees from the University of Texas at Austin, with research interests spanning natural language processing, computer vision, statistics, optimization, and operations.

Sai Shravani Sistla

At the intersection of physics and data science

Sai is a data scientist in the Intelligent Systems Lab at the SLB Software Technology and Innovation Center (STIC). Since joining in 2021, her primary focus has been developing solutions using language tech, large language models, generative AI, and physics-inspired machine learning.

Advaya Gupta

Specialist in agentic large language models

As a machine learning engineer, Advaya’s work explores the development of multimodal as well as embodied large language model (LLM) agents for the energy domain. He also focuses on the safety of LLM applications through LLM guardrails and domain-specific evaluations. He holds an M.S. in Computer Science with a focus on Artificial Intelligence from Stanford University.
