
How to Remove Semantic Duplicates from Customer Reviews Using Mirascope and LLMs

Learn how to use Mirascope and OpenAI's GPT-4o model to identify and remove semantically duplicate customer reviews, so that recurring feedback themes are easier to see.

Introduction to Mirascope and Its Capabilities

Mirascope is a Python library that provides a unified interface to Large Language Model (LLM) providers and gateways such as OpenAI, Anthropic, Mistral, Google (Gemini and Vertex AI), Groq, Cohere, LiteLLM, Azure AI, and Amazon Bedrock. It covers tasks ranging from text generation and structured data extraction to building complex AI workflows and agent systems.

Using Mirascope to Remove Semantic Duplicates

This guide uses Mirascope's OpenAI integration to detect and remove semantic duplicates from a list of customer reviews. Semantic duplicates are entries that differ in wording but convey the same meaning. For example, "Needs charging too often." and "Battery drains quickly -- not ideal for travel." both report short battery life.
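
To see why plain string matching is not enough for this task, here is a minimal sketch of an exact-match baseline. The helper name exact_dedupe is hypothetical and not part of Mirascope; it only lowercases and collapses whitespace before comparing, so it catches formatting variants but not paraphrases.

```python
def exact_dedupe(items: list[str]) -> list[str]:
    """Drop entries that are identical after lowercasing and collapsing whitespace."""
    seen: set[str] = set()
    unique: list[str] = []
    for item in items:
        # Normalize case and whitespace so trivial variants compare equal.
        key = " ".join(item.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


sample = [
    "Needs charging too often.",
    "needs  charging too often.",  # same text, different case and spacing
    "Battery drains quickly -- not ideal for travel.",  # same complaint, new wording
]
baseline = exact_dedupe(sample)
print(baseline)
```

The second entry is dropped, but the third survives even though it repeats the battery complaint. Closing that gap is what the LLM-based approach in the rest of this guide is for.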

Installing Dependencies

To begin, install Mirascope with OpenAI support:

pip install "mirascope[openai]"

Setting Up OpenAI API Key

Obtain your OpenAI API key from https://platform.openai.com/settings/organization/api-keys. New users might need to add billing information and make a minimum payment of $5 to activate API access.

Set the API key in your environment:

import os
from getpass import getpass
os.environ['OPENAI_API_KEY'] = getpass('Enter OpenAI API Key: ')
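
If the key may already be set (for example, when re-running a notebook), a small wrapper can skip the prompt. This is a sketch only: ensure_api_key and its parameters are hypothetical names introduced here, not part of Mirascope or the OpenAI SDK.

```python
import os
from getpass import getpass
from typing import Callable


def ensure_api_key(
    var_name: str = "OPENAI_API_KEY",
    prompt: Callable[[str], str] = getpass,
) -> str:
    """Return the key from the environment, prompting only when it is missing."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Ask interactively; getpass hides the typed value.
        key = prompt(f"Enter {var_name}: ").strip()
        if not key:
            raise ValueError(f"{var_name} must not be empty")
        os.environ[var_name] = key
    return key
```

Calling ensure_api_key() once before the first model call has the same effect as the snippet above, except that an existing key is reused and an empty entry is rejected.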

Defining Customer Reviews

Here is a sample list of customer reviews capturing various sentiments:

customer_reviews = [
    "Sound quality is amazing!",
    "Audio is crystal clear and very immersive.",
    "Incredible sound, especially the bass response.",
    "Battery doesn't last as advertised.",
    "Needs charging too often.",
    "Battery drains quickly -- not ideal for travel.",
    "Setup was super easy and straightforward.",
    "Very user-friendly, even for my parents.",
    "Simple interface and smooth experience.",
    "Feels cheap and plasticky.",
    "Build quality could be better.",
    "Broke within the first week of use.",
    "People say they can't hear me during calls.",
    "Mic quality is terrible on Zoom meetings.",
    "Great product for the price!"
]

These reviews highlight key themes such as sound quality, battery life issues, ease of use, build quality, microphone problems, and value for money.

Creating a Pydantic Schema for Deduplication

A Pydantic model defines the expected structure of the deduplication response: the groups of semantically equivalent reviews and a deduplicated list of reviews:

from pydantic import BaseModel, Field
 
class DeduplicatedReviews(BaseModel):
    duplicates: list[list[str]] = Field(
        ..., description="A list of semantically equivalent customer review groups"
    )
    reviews: list[str] = Field(
        ..., description="The deduplicated list of core customer feedback themes"
    )

Defining the Semantic Deduplication Function

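The following function uses Mirascope's @openai.call decorator with the GPT-4o model and the DeduplicatedReviews response model to cluster semantically similar reviews. The function body is left as ... because the decorators build the request from the prompt template: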

from mirascope.core import openai, prompt_template
 
@openai.call(model="gpt-4o", response_model=DeduplicatedReviews)
@prompt_template(
    """
    SYSTEM:
    You are an AI assistant helping to analyze customer reviews. 
    Your task is to group semantically similar reviews together -- even if they are worded differently.
 
    - Use your understanding of meaning, tone, and implication to group duplicates.
    - Return two lists:
      1. A deduplicated list of the key distinct review sentiments.
      2. A list of grouped duplicates that share the same underlying feedback.
 
    USER:
    {reviews}
    """
)
def deduplicate_customer_reviews(reviews: list[str]): ...

Executing the Deduplication and Displaying Results

The function is called with the customer reviews list, the response type is checked, and the results are printed:

response = deduplicate_customer_reviews(customer_reviews)
 
# Ensure response format
assert isinstance(response, DeduplicatedReviews)
 
# Print Output
print("Distinct Customer Feedback:")
for item in response.reviews:
    print("-", item)
 
print("\nGrouped Duplicates:")
for group in response.duplicates:
    print("-", group)

The output groups similar reviews and lists the distinct themes, which makes customer feedback easier to analyze by removing redundant entries. Because the grouping is produced by the model, the exact groups can vary between runs.
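
Model output is not guaranteed to reproduce the input strings verbatim, so it can help to check the groups against the original list before dropping anything. The sketch below uses hypothetical helper names (check_groups, keep_representatives) and works on plain lists such as response.duplicates. It assumes each group should contain only strings from the input and that no review is listed twice. The groups shown are hand-written for illustration, not real model output.

```python
def check_groups(original: list[str], groups: list[list[str]]) -> list[str]:
    """Return human-readable problems found in the grouped duplicates."""
    known = set(original)
    seen: set[str] = set()
    problems: list[str] = []
    for group in groups:
        for review in group:
            if review not in known:
                problems.append(f"not in input: {review!r}")
            elif review in seen:
                problems.append(f"listed more than once: {review!r}")
            seen.add(review)
    return problems


def keep_representatives(original: list[str], groups: list[list[str]]) -> list[str]:
    """Keep the first review of each group plus every ungrouped review, in input order."""
    dropped = {review for group in groups for review in group[1:]}
    return [review for review in original if review not in dropped]


reviews = [
    "Battery doesn't last as advertised.",
    "Needs charging too often.",
    "Great product for the price!",
]
# Hand-written example groups standing in for response.duplicates.
groups = [["Battery doesn't last as advertised.", "Needs charging too often."]]

print(check_groups(reviews, groups))
print(keep_representatives(reviews, groups))
```

Keeping the first member of each group preserves the customer's original wording, whereas response.reviews may contain summaries written by the model. Which of the two is preferable depends on how the deduplicated list will be used.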

Combining Mirascope's structured output support with an LLM in this way gives businesses a practical method for streamlining feedback analysis and gaining clearer insights from customer reviews.
