
Automated Prompt Optimization with Gemini Flash

Learn to optimize prompts programmatically for enhanced model performance using Gemini 2.0 Flash.

Transitioning to Programmatic Prompt Crafting

In this tutorial, we shift from traditional prompt crafting to a more systematic, programmable approach by treating prompts as tunable parameters rather than static text. Instead of guessing which instruction or example works best, we build an optimization loop around Gemini 2.0 Flash that experiments, evaluates, and automatically selects the strongest prompt configuration.

Benefits of Data-Driven Optimization

In this implementation, we watch our model improve step by step, demonstrating how prompt engineering becomes far more powerful when we orchestrate it with data-driven search rather than intuition.

Setting Up Gemini 2.0 Flash

import google.generativeai as genai
import json
import random
from typing import List, Dict, Tuple, Optional
from dataclasses import dataclass
import numpy as np
from collections import Counter
 
def setup_gemini(api_key: Optional[str] = None):
    # Prompt for the key interactively if one isn't passed in.
    if api_key is None:
        api_key = input("Enter your Gemini API key: ").strip()
    genai.configure(api_key=api_key)
    model = genai.GenerativeModel('gemini-2.0-flash-exp')
    print("✓ Gemini 2.0 Flash configured")
    return model
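
Later snippets reference Example, PromptTemplate, and Prediction without defining them. A minimal set of dataclasses consistent with how they are used in this tutorial might look like the sketch below; the field names and the few-shot formatting are assumptions, not the original definitions:

@dataclass
class Example:
    text: str   # input sentence
    label: str  # "positive", "negative", or "neutral"

@dataclass
class Prediction:
    label: str         # parsed sentiment label
    raw_response: str  # unparsed model output, kept for debugging

@dataclass
class PromptTemplate:
    instruction: str
    examples: List[Example]

    def format(self, text: str) -> str:
        # Render the instruction, the few-shot demonstrations, then the query.
        demos = "\n".join(f"Text: {ex.text}\nSentiment: {ex.label}" for ex in self.examples)
        return f"{self.instruction}\n\n{demos}\n\nText: {text}\nSentiment:"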

Creating Your Dataset

We generate a small yet diverse sentiment dataset for training and validation using the create_dataset function:

def create_dataset() -> Tuple[List[Example], List[Example]]:
    train_data = [
        Example("This movie was absolutely fantastic! Best film of the year.", "positive"),
        Example("Terrible experience, waste of time and money.", "negative"),
        # ... more examples ...
    ]
    val_data = [
        Example("Absolutely love it, couldn't be happier!", "positive"),
        Example("Broken on arrival, very upset.", "negative"),
        # ... more examples ...
    ]
    return train_data, val_data

Implementing the Model

We wrap Gemini in a SentimentModel class that formats a prompt for each input, parses the reply into a label, and scores accuracy over a labeled dataset:

class SentimentModel:
    def __init__(self, model, prompt_template: PromptTemplate):
        self.model = model
        self.prompt_template = prompt_template

    def predict(self, text: str) -> Prediction:
        prompt = self.prompt_template.format(text)
        # ... logic for prediction ...

    def evaluate(self, dataset: List[Example]) -> float:
        # ... logic for evaluation ...
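
The elided bodies could be filled in along the following lines. This is a sketch, not the tutorial's exact logic: it uses the google-generativeai generate_content call (whose response exposes a .text attribute) and assumes the label is parsed by simple keyword matching.

    # Sketch bodies for the two methods above (inside SentimentModel).
    def predict(self, text: str) -> Prediction:
        prompt = self.prompt_template.format(text)
        response = self.model.generate_content(prompt)
        raw = response.text.strip().lower()
        # Map the free-form reply onto one of the three labels; default to neutral.
        label = next((l for l in ("positive", "negative", "neutral") if l in raw), "neutral")
        return Prediction(label=label, raw_response=raw)

    def evaluate(self, dataset: List[Example]) -> float:
        # Accuracy as a percentage over the labeled dataset.
        correct = sum(1 for ex in dataset if self.predict(ex.text).label == ex.label)
        return 100.0 * correct / len(dataset)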

Optimizing Prompts

We introduce the PromptOptimizer class:

class PromptOptimizer:
    def __init__(self, model):
        self.model = model
        self.instruction_candidates = [
            "Analyze the sentiment of the following text. Classify as positive, negative, or neutral.",
            # ... more instructions ...
        ]

    def select_best_examples(self, train_data: List[Example], val_data: List[Example], n_examples: int = 3) -> List[Example]:
        # ... logic to select examples ...

    def optimize_instruction(self, examples: List[Example], val_data: List[Example]) -> str:
        # ... logic to optimize instruction ...
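
One plausible way to fill in the elided search logic, shown as a sketch rather than the original implementation: sample a few random example subsets, score each on the validation set with a SentimentModel, then sweep the candidate instructions with the winning examples held fixed. The number of random trials (5) is an assumption:

    # Sketch bodies for the two methods above (inside PromptOptimizer).
    def select_best_examples(self, train_data: List[Example], val_data: List[Example], n_examples: int = 3) -> List[Example]:
        best_examples, best_score = train_data[:n_examples], -1.0
        for _ in range(5):  # assumed number of random subset trials
            candidate = random.sample(train_data, n_examples)
            template = PromptTemplate(self.instruction_candidates[0], candidate)
            score = SentimentModel(self.model, template).evaluate(val_data)
            if score > best_score:
                best_examples, best_score = candidate, score
        return best_examples

    def optimize_instruction(self, examples: List[Example], val_data: List[Example]) -> str:
        best_instruction, best_score = self.instruction_candidates[0], -1.0
        for instruction in self.instruction_candidates:
            template = PromptTemplate(instruction, examples)
            score = SentimentModel(self.model, template).evaluate(val_data)
            if score > best_score:
                best_instruction, best_score = instruction, score
        return best_instruction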

Compiling the Optimized Prompt

To finalize optimization, we compile results into a PromptTemplate:

    # Defined inside PromptOptimizer.
    def compile(self, train_data: List[Example], val_data: List[Example], n_examples: int = 3) -> PromptTemplate:
        best_examples = self.select_best_examples(train_data, val_data, n_examples)
        best_instruction = self.optimize_instruction(best_examples, val_data)
        # ... return optimized template ...
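
With compile in place, an end-to-end driver might look like the following sketch; the variable names are illustrative:

model = setup_gemini()
train_data, val_data = create_dataset()

optimizer = PromptOptimizer(model)
optimized_template = optimizer.compile(train_data, val_data, n_examples=3)
optimized_model = SentimentModel(model, optimized_template)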

Testing and Results

Finally, we evaluate the optimized model against a zero-shot baseline.
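
A minimal way to produce the two scores printed below, assuming the zero-shot baseline reuses the first candidate instruction with no few-shot examples:

baseline_template = PromptTemplate(optimizer.instruction_candidates[0], examples=[])
baseline_score = SentimentModel(model, baseline_template).evaluate(val_data)
optimized_score = optimized_model.evaluate(val_data)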

print(f"Baseline (zero-shot): {baseline_score:.1f}%")
print(f"Optimized (compiled): {optimized_score:.1f}%")

Conclusion

Through programmatic prompt optimization, we gain a robust, evidence-driven workflow for designing high-performing prompts, one that extends naturally to richer datasets and broader task coverage in AI applications.
