Agentic AI with Memory: Building Continuously Learning Agents with Episodic and Semantic Memory

Why Memory Matters

Memory lets an agent connect past interactions to future decisions. Instead of treating each user turn as isolated, an agent with memory stores episodes (specific interactions) and extracts semantic patterns (stable preferences and action strategies). The result is a system that plans, acts, revises, and reflects across sessions, so its responses become more personalized and more autonomous over time.

Designing Episodic Memory

Episodic memory records specific turns: the state, the action, the outcome, and a timestamp. The implementation below stores episodes, builds a simple embedding, and retrieves similar past experiences.

from collections import defaultdict
from datetime import datetime
 
 
class EpisodicMemory:
   def __init__(self, capacity=100):
       self.capacity = capacity
       self.episodes = []
      
   def store(self, state, action, outcome, timestamp=None):
       if timestamp is None:
           timestamp = datetime.now().isoformat()
       episode = {
           'state': state,
           'action': action,
           'outcome': outcome,
           'timestamp': timestamp,
           'embedding': self._embed(state, action, outcome)
       }
       self.episodes.append(episode)
       if len(self.episodes) > self.capacity:
           self.episodes.pop(0)
  
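   def _embed(self, state, action, outcome):
       # Placeholder "embedding": a single integer bucket derived from the text hash.
       # Python salts str hashes per process (unless PYTHONHASHSEED is set), so values
       # differ between runs, and nearby buckets do not imply similar text.
       text = f"{state} {action} {outcome}".lower()
       return hash(text) % 10000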
  
   def retrieve_similar(self, query_state, k=3):
       if not self.episodes:
           return []
       query_emb = self._embed(query_state, "", "")
       scores = [(abs(ep['embedding'] - query_emb), ep) for ep in self.episodes]
       scores.sort(key=lambda x: x[0])
       return [ep for _, ep in scores[:k]]
  
   def get_recent(self, n=5):
       return self.episodes[-n:]
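
The `hash(text) % 10000` embedding above is only a stand-in: the distance between two hash buckets says nothing about how similar the texts are. As a rough sketch of a retrieval step that does reflect textual overlap, the snippet below scores episodes by Jaccard similarity over word tokens. The helper names (`tokenize`, `jaccard`, and the free function `retrieve_similar`) and the sample episodes are assumptions made for illustration, not part of the tutorial's classes.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens (hyphenated words stay whole)."""
    return set(re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets: |a & b| / |a | b|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve_similar(episodes, query, k=3):
    """Return up to k episodes whose 'state' text overlaps most with the query."""
    query_tokens = tokenize(query)
    scored = [(jaccard(query_tokens, tokenize(ep["state"])), ep) for ep in episodes]
    # Sort by score only, highest first; the sort is stable, so ties keep insertion order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop episodes that share no tokens with the query.
    return [ep for score, ep in scored[:k] if score > 0]


# Hypothetical episodes shaped like the ones EpisodicMemory stores.
episodes = [
    {"state": "i really like sci-fi books", "action": "update_preference"},
    {"state": "can you recommend something?", "action": "recommend"},
    {"state": "hi, i'm looking for something to read", "action": "none"},
]
top = retrieve_similar(episodes, "recommend sci-fi", k=2)
for ep in top:
    print(ep["action"], "<-", ep["state"])
```

Token overlap is still crude (it ignores synonyms and word order), but unlike the hash bucket it is deterministic across runs and returns nothing when the query shares no words with any stored episode.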

Designing Semantic Memory

Semantic memory summarizes patterns and preferences across episodes. It tracks preference weights, context-action patterns, and basic success rates so that the agent can choose actions that have worked historically.

class SemanticMemory:
   def __init__(self):
       self.preferences = defaultdict(float)
       self.patterns = defaultdict(list)
       self.success_rates = defaultdict(lambda: {'success': 0, 'total': 0})
      
   def update_preference(self, key, value, weight=1.0):
       self.preferences[key] = 0.9 * self.preferences[key] + 0.1 * weight * value
  
   def record_pattern(self, context, action, success):
       pattern_key = f"{context}_{action}"
       self.patterns[context].append((action, success))
       self.success_rates[pattern_key]['total'] += 1
       if success:
           self.success_rates[pattern_key]['success'] += 1
  
   def get_best_action(self, context):
       if context not in self.patterns:
           return None
       action_scores = defaultdict(lambda: {'success': 0, 'total': 0})
       for action, success in self.patterns[context]:
           action_scores[action]['total'] += 1
           if success:
               action_scores[action]['success'] += 1
       best_action = max(action_scores.items(), key=lambda x: x[1]['success'] / max(x[1]['total'], 1))
       return best_action[0] if best_action[1]['total'] > 0 else None
  
   def get_preference(self, key):
       return self.preferences.get(key, 0.0)
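
The `update_preference` rule above is an exponential moving average with a decay of 0.9, so a single mention only moves a preference from 0.0 to 0.1, and repeated mentions approach the target value gradually. The sketch below replays that arithmetic in isolation; `ema_update` and `preference_after` are hypothetical helper names introduced here for illustration.

```python
def ema_update(current, value, weight=1.0, decay=0.9):
    """One update step: decay * old + (1 - decay) * weight * value."""
    return decay * current + (1.0 - decay) * weight * value


def preference_after(n_updates, value=1.0):
    """Apply n identical updates to a preference that starts at 0.0."""
    pref = 0.0
    for _ in range(n_updates):
        pref = ema_update(pref, value)
    return pref


for n in (1, 2, 10, 50):
    # Closed form for comparison: value * (1 - decay ** n)
    print(n, round(preference_after(n), 3), round(1 - 0.9 ** n, 3))
```

With a constant signal of 1.0, the preference after n updates is 1 - 0.9^n: about 0.19 after two mentions and about 0.65 after ten. This is why the learned preference values printed by the memory analysis stay small in a short demo.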

Perception and Planning

An agent with memory has to recognize the user's intent, consult episodic memory for context, and use semantic memory to form a plan. The MemoryAgent class below combines perception, planning, and memory access.

class MemoryAgent:
   def __init__(self):
       self.episodic_memory = EpisodicMemory(capacity=50)
       self.semantic_memory = SemanticMemory()
       self.current_plan = []
       self.session_count = 0
      
   def perceive(self, user_input):
       user_input = user_input.lower()
       if any(word in user_input for word in ['recommend', 'suggest', 'what should']):
           intent = 'recommendation'
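       # 'enjoy' and 'mood for' cover the preference statements in demo sessions 2 and 3.
       elif any(word in user_input for word in ['remember', 'prefer', 'like', 'favorite', 'enjoy', 'mood for']):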
           intent = 'preference_update'
       elif any(word in user_input for word in ['do', 'complete', 'finish', 'task']):
           intent = 'task_execution'
       else:
           intent = 'conversation'
       return {'intent': intent, 'raw': user_input}
  
   def plan(self, state):
       intent = state['intent']
       user_input = state['raw']
       similar_episodes = self.episodic_memory.retrieve_similar(user_input, k=3)
       plan = []
       if intent == 'recommendation':
           genre_prefs = {k: v for k, v in self.semantic_memory.preferences.items() if 'genre_' in k}
           if genre_prefs:
               best_genre = max(genre_prefs.items(), key=lambda x: x[1])[0]
               plan.append(('recommend', best_genre.replace('genre_', '')))
           else:
               plan.append(('recommend', 'general'))
       elif intent == 'preference_update':
           genres = ['sci-fi', 'fantasy', 'mystery', 'romance', 'thriller']
           detected_genre = next((g for g in genres if g in user_input), None)
           if detected_genre:
               plan.append(('update_preference', detected_genre))
       elif intent == 'task_execution':
           best_action = self.semantic_memory.get_best_action('task')
           if best_action:
               plan.append(('execute', best_action))
           else:
               plan.append(('execute', 'default'))
       self.current_plan = plan
       return plan
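
`perceive` matches keywords as plain substrings, so a short keyword such as 'do' also fires inside unrelated words ("window"), and 'like' fires inside "dislike". A minimal sketch of the same intent table using word-boundary regular expressions follows; the `INTENT_PATTERNS` table and `classify_intent` function are assumptions for illustration and reuse the intent names from the class above.

```python
import re

# Hypothetical pattern table: each keyword must start and end at a word boundary.
# \w* keeps stems such as "recommend" matching "recommendation".
INTENT_PATTERNS = [
    ("recommendation", re.compile(r"\b(recommend\w*|suggest\w*|what should)\b")),
    ("preference_update", re.compile(r"\b(remember|prefer\w*|like|favorite)\b")),
    ("task_execution", re.compile(r"\b(do|complete|finish|task)\b")),
]


def classify_intent(user_input):
    """Return the first intent whose pattern matches, or 'conversation'."""
    text = user_input.lower()
    for intent, pattern in INTENT_PATTERNS:
        if pattern.search(text):
            return intent
    return "conversation"


for sample in ["Give me a recommendation", "Close the window", "Please do the task"]:
    print(f"{sample!r} -> {classify_intent(sample)}")
```

The table order still decides ties, as in `perceive`: an input that matches several intents is assigned to the first one listed.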

Execution, Revision, and Reflection

The agent executes an action, can revise its plan after negative feedback, and records each experience in episodic memory and the resulting patterns in semantic memory. The methods below continue the MemoryAgent class.

   def act(self, action):
       action_type, param = action
       if action_type == 'recommend':
           if param == 'general':
               return f"Let me learn your preferences first! What genres do you enjoy?"
           return f"Based on your preferences, I recommend exploring {param}!"
       elif action_type == 'update_preference':
           self.semantic_memory.update_preference(f'genre_{param}', 1.0, weight=1.0)
           return f"Got it! I'll remember you enjoy {param}."
       elif action_type == 'execute':
           return f"Executing task with strategy: {param}"
       return "Action completed"
  
   def revise_plan(self, feedback):
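       # Match whole words so that "know" or "not" does not trigger on the substring "no".
       words = {w.strip('.,!?') for w in feedback.lower().split()}
       if 'no' in words or 'wrong' in words: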
           if self.current_plan:
               action_type, param = self.current_plan[0]
               if action_type == 'recommend':
                   genre_prefs = sorted(
                       [(k, v) for k, v in self.semantic_memory.preferences.items() if 'genre_' in k],
                       key=lambda x: x[1],
                       reverse=True
                   )
                   if len(genre_prefs) > 1:
                       new_genre = genre_prefs[1][0].replace('genre_', '')
                       self.current_plan = [('recommend', new_genre)]
                       return True
       return False
  
   def reflect(self, state, action, outcome, success):
       self.episodic_memory.store(state['raw'], str(action), outcome)
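       # plan() looks up task strategies under the 'task' context and expects a bare
       # strategy name, so task episodes are recorded that way; other intents keep
       # the intent name and the stringified action.
       if state['intent'] == 'task_execution':
           self.semantic_memory.record_pattern('task', action[1], success)
       else:
           self.semantic_memory.record_pattern(state['intent'], str(action), success)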
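
`revise_plan` above always falls back to the second-ranked genre, so a second rejection in a row would propose the same genre again. One possible alternative, sketched under the assumption that the agent keeps a set of genres the user has already rejected (the `rejected` set and the `next_best_genre` helper are hypothetical, not part of the tutorial code):

```python
def next_best_genre(preferences, rejected):
    """Return the highest-weighted genre not yet rejected, or None if none is left.

    preferences: mapping such as {'genre_sci-fi': 0.19, 'genre_fantasy': 0.1}
    rejected:    set of bare genre names the user has turned down
    """
    prefix = "genre_"
    candidates = [
        (key[len(prefix):], weight)
        for key, weight in preferences.items()
        if key.startswith(prefix) and key[len(prefix):] not in rejected
    ]
    if not candidates:
        return None
    # Pick the remaining genre with the largest learned weight.
    return max(candidates, key=lambda item: item[1])[0]


prefs = {"genre_sci-fi": 0.19, "genre_fantasy": 0.1, "theme_dark": 0.5}
rejected = {"sci-fi"}
print(next_best_genre(prefs, rejected))   # next choice after rejecting sci-fi
rejected.add("fantasy")
print(next_best_genre(prefs, rejected))   # nothing left to offer
```

Returning None when every known genre has been rejected gives the caller a clear signal to ask the user for new preferences instead of repeating a suggestion.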

Running Sessions and Evaluating Memory

A simple demo loop shows how the perceive → plan → act → reflect cycle repeats across turns and sessions. The demo below simulates three sessions, reports memory usage, and runs a retrieval test.

   def run_session(self, user_inputs):
       self.session_count += 1
       print(f"\n{'='*60}")
       print(f"SESSION {self.session_count}")
       print(f"{'='*60}\n")
       results = []
       for i, user_input in enumerate(user_inputs, 1):
           print(f"Turn {i}")
           print(f"User: {user_input}")
           state = self.perceive(user_input)
           plan = self.plan(state)
           if not plan:
               print("Agent: I'm not sure what to do with that.\n")
               continue
           response = self.act(plan[0])
           print(f"Agent: {response}\n")
           success = 'recommend' in plan[0][0] or 'update' in plan[0][0]
           self.reflect(state, plan[0], response, success)
           results.append({
               'turn': i,
               'input': user_input,
               'intent': state['intent'],
               'action': plan[0],
               'response': response
           })
       return results

def evaluate_memory_usage(agent):
   print("\n" + "="*60)
   print("MEMORY ANALYSIS")
   print("="*60 + "\n")
   print(f"Episodic Memory:")
   print(f"  Total episodes stored: {len(agent.episodic_memory.episodes)}")
   if agent.episodic_memory.episodes:
       print(f"  Oldest episode: {agent.episodic_memory.episodes[0]['timestamp']}")
       print(f"  Latest episode: {agent.episodic_memory.episodes[-1]['timestamp']}")
   print(f"\nSemantic Memory:")
   print(f"  Learned preferences: {len(agent.semantic_memory.preferences)}")
   for pref, value in sorted(agent.semantic_memory.preferences.items(), key=lambda x: x[1], reverse=True)[:5]:
       print(f"    {pref}: {value:.3f}")
   print(f"\n  Action patterns learned: {len(agent.semantic_memory.patterns)}")
   print(f"\n  Success rates by context-action:")
   for key, stats in list(agent.semantic_memory.success_rates.items())[:5]:
       if stats['total'] > 0:
           rate = stats['success'] / stats['total']
           print(f"    {key}: {rate:.2%} ({stats['success']}/{stats['total']})")
 
 
def compare_sessions(results_history):
   print("\n" + "="*60)
   print("CROSS-SESSION ANALYSIS")
   print("="*60 + "\n")
   for i, results in enumerate(results_history, 1):
       recommendation_quality = sum(1 for r in results if 'preferences' in r['response'].lower())
       print(f"Session {i}:")
       print(f"  Turns: {len(results)}")
       print(f"  Personalized responses: {recommendation_quality}")

def run_demo():
   agent = MemoryAgent()
   print("\n SCENARIO: Agent learns user preferences over multiple sessions")
   session1_inputs = [
       "Hi, I'm looking for something to read",
       "I really like sci-fi books",
       "Can you recommend something?",
   ]
   results1 = agent.run_session(session1_inputs)
   session2_inputs = [
       "I'm bored, what should I read?",
       "Actually, I also enjoy fantasy novels",
       "Give me a recommendation",
   ]
   results2 = agent.run_session(session2_inputs)
   session3_inputs = [
       "What do you suggest for tonight?",
       "I'm in the mood for mystery too",
       "Recommend something based on what you know about me",
   ]
   results3 = agent.run_session(session3_inputs)
   evaluate_memory_usage(agent)
   compare_sessions([results1, results2, results3])
   print("\n" + "="*60)
   print("EPISODIC MEMORY RETRIEVAL TEST")
   print("="*60 + "\n")
   query = "recommend sci-fi"
   similar = agent.episodic_memory.retrieve_similar(query, k=3)
   print(f"Query: '{query}'")
   print(f"Retrieved {len(similar)} similar episodes:\n")
   for ep in similar:
       print(f"  State: {ep['state']}")
       print(f"  Action: {ep['action']}")
       print(f"  Outcome: {ep['outcome'][:50]}...")
       print()
 
 
if __name__ == "__main__":
   print("="*60)
   print("MEMORY & LONG-TERM AUTONOMY IN AGENTIC SYSTEMS")
   print("="*60)
   run_demo()
   print("\n Tutorial complete! Key takeaways:")
   print("  • Episodic memory stores specific experiences")
   print("  • Semantic memory generalizes patterns")
   print("  • Agents improve recommendations over sessions")
   print("  • Memory retrieval guides future decisions")
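
The agent above keeps its memory in process, so everything it learned is lost when the script exits. One way to carry memory across runs is sketched below, assuming that episodes and preferences hold only JSON-serializable values (strings, numbers, lists, dicts). The `save_memory` and `load_memory` helpers and the file name are hypothetical.

```python
import json
import os
import tempfile


def save_memory(path, episodes, preferences):
    """Write episodes and preferences to a JSON file, replacing it atomically."""
    payload = {"episodes": episodes, "preferences": dict(preferences)}
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh, ensure_ascii=False, indent=2)
    # os.replace swaps the file in one step, so a crash never leaves a half-written file.
    os.replace(tmp_path, path)


def load_memory(path):
    """Return (episodes, preferences); both are empty if the file does not exist."""
    if not os.path.exists(path):
        return [], {}
    with open(path, encoding="utf-8") as fh:
        payload = json.load(fh)
    return payload.get("episodes", []), payload.get("preferences", {})


# Round trip in a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "agent_memory.json")
    save_memory(
        path,
        [{"state": "i really like sci-fi books", "action": "update_preference"}],
        {"genre_sci-fi": 0.1},
    )
    episodes, preferences = load_memory(path)
    missing = load_memory(os.path.join(tmp, "does_not_exist.json"))

print(len(episodes), preferences)
```

To use this with the agent, the loaded lists and dicts would need to be copied back into `episodic_memory.episodes` and `semantic_memory.preferences` at startup; that wiring is left out here.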

What This Approach Teaches

The runs and evaluations show two effects: episodic retrieval helps the agent reuse relevant past actions, and semantic memory steers it toward strategies that have worked in practice. Together they form a loop in which the agent gradually gets better at personalization and decision-making.
