Build a Fast Semantic Search and RAG QA System with Together AI, FAISS, and LangChain

This guide shows how to build a semantic search and question-answering system using Together AI embeddings, FAISS vector search, and LangChain, all in a modular and efficient pipeline.

Leveraging Together AI for Semantic Search and QA

This tutorial demonstrates how to transform unstructured web-scraped text into a question-answering system that cites its sources using the Together AI ecosystem. The process involves scraping live web pages, splitting the content into manageable chunks, and embedding these chunks with the togethercomputer/m2-bert-80M-8k-retrieval model.

Preparing the Environment and Dependencies

To run the example smoothly, several libraries must be installed and upgraded: LangChain (with its core and community packages), the Together AI integration, FAISS for vector search, token-handling utilities, and HTML parsers:

!pip -q install --upgrade langchain langchain-core langchain-community langchain-together \
  faiss-cpu tiktoken beautifulsoup4 html2text

Managing API Credentials Securely

The tutorial ensures secure handling of the Together AI API key by checking if it's set in the environment and prompting the user to enter it securely if not:

import os, getpass
if "TOGETHER_API_KEY" not in os.environ:
    os.environ["TOGETHER_API_KEY"] = getpass.getpass(" Enter your Together API key: ")

This avoids hardcoding or exposing the API key in the code.
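If you are running the notebook in Colab, an alternative is to store the key in the notebook's Secrets panel and read it with the google.colab.userdata helper (a minimal sketch, assuming a secret named TOGETHER_API_KEY has been added there):

import os
from google.colab import userdata  # Colab-only helper; assumes a "TOGETHER_API_KEY" secret exists

os.environ["TOGETHER_API_KEY"] = userdata.get("TOGETHER_API_KEY")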

Loading and Processing Web Content

The WebBaseLoader fetches specified URLs, removes boilerplate, and returns clean text documents with metadata:

from langchain_community.document_loaders import WebBaseLoader
URLS = [
    "https://python.langchain.com/docs/integrations/text_embedding/together/",
    "https://api.together.xyz/",
    "https://together.ai/blog"  
]
raw_docs = WebBaseLoader(URLS).load()
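Note that WebBaseLoader reads the USER_AGENT environment variable when fetching pages and may log a warning if it is unset, so it is worth setting one before calling load() (the value below is only illustrative):

os.environ.setdefault("USER_AGENT", "rag-qa-tutorial/0.1")  # illustrative value; set before load()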

The documents are then split into chunks of roughly 800 characters with a 100-character overlap, so context is preserved across chunk boundaries:

from langchain_text_splitters import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)
docs = splitter.split_documents(raw_docs)
print(f"Loaded {len(raw_docs)} pages → {len(docs)} chunks after splitting.")

Creating Embeddings and Building the Vector Store

Using Together AI’s embedding model, each chunk is converted into a vector and stored in a FAISS index for efficient similarity search:

from langchain_together.embeddings import TogetherEmbeddings
embeddings = TogetherEmbeddings(
    model="togethercomputer/m2-bert-80M-8k-retrieval"  
)
from langchain_community.vectorstores import FAISS
vector_store = FAISS.from_documents(docs, embeddings)
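Before wiring up the QA chain, the index can be sanity-checked with a raw similarity search, and optionally saved to disk so the embedding step need not be repeated on the next run (the query and index path below are arbitrary):

# Raw semantic search: return the 2 chunks closest to the query
for hit in vector_store.similarity_search("How do I authenticate with Together AI?", k=2):
    print(hit.metadata["source"], "→", hit.page_content[:80])

# Optional persistence; reload later with FAISS.load_local(...)
vector_store.save_local("faiss_index")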

Setting Up the Chat Model

The ChatTogether wrapper uses a chat-optimized model from Together AI to generate answers with low randomness for stable responses:

from langchain_together.chat_models import ChatTogether
llm = ChatTogether(
    model="mistralai/Mistral-7B-Instruct-v0.3",        
    temperature=0.2,
    max_tokens=512,
)
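A one-line smoke test confirms the model name and API key work before any retrieval is attached (the prompt is arbitrary):

# ChatTogether returns an AIMessage; .content holds the text
print(llm.invoke("In one sentence, what is retrieval-augmented generation?").content)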

Combining Retrieval and Answer Generation

A RetrievalQA chain uses the FAISS retriever to fetch top relevant chunks and feeds them into the chat model. It also returns the source documents to support answer citations:

from langchain.chains import RetrievalQA
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vector_store.as_retriever(search_kwargs={"k": 4}),
    return_source_documents=True,
)
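The retriever is the natural tuning point here: k controls how many chunks reach the prompt, and LangChain retrievers also support maximal-marginal-relevance search, which trades a little relevance for diversity among the retrieved chunks (an optional variant, not required for the tutorial):

# MMR re-ranks fetch_k candidates down to k diverse chunks
retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 4, "fetch_k": 20},
)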

Querying the System

An example question is posed to the QA chain, which returns an answer and the sources used:

import textwrap
QUESTION = "How do I use TogetherEmbeddings inside LangChain, and what model name should I pass?"
result = qa_chain.invoke({"query": QUESTION})

print("\nAnswer:\n", textwrap.fill(result["result"], 100))
print("\nSources:")
for doc in result["source_documents"]:
    print(" •", doc.metadata["source"])

Flexible and Modular Architecture

This pipeline, implemented in about 50 lines of code, can be adapted by swapping components such as the vector store or embedding model. The unified Together AI backend simplifies management and provides fast, affordable embeddings with a generous free tier, ideal for building knowledge assistants, documentation bots, or research aides.
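As a concrete example of that modularity, swapping FAISS for another LangChain vector store is essentially a one-line change (a sketch, assuming the chromadb package is also installed):

from langchain_community.vectorstores import Chroma

# Same documents and embeddings; only the index backend changes
vector_store = Chroma.from_documents(docs, embeddings)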

Explore the provided Colab notebook and community channels for more resources and support.
