Designing an Autonomous Agent with Hybrid Memory and Modular Tool Dispatch
Introduction
Modern autonomous agents must combine reasoning, memory, and action to operate independently. This article explores a modular architecture that fuses semantic vector search, keyword-based retrieval, and a flexible tool-dispatching loop. The result is an agent capable of maintaining its own long-term memory while dynamically selecting and executing tools. We will walk through the core design layers, from abstract interfaces to a live implementation powered by OpenAI's embeddings and chat models.

Architecture Overview
The system is built around three key abstractions: a memory backend, a language model provider, and a tool interface. Each is defined as an abstract class, ensuring clean separation of concerns and easy substitution. The agent uses a hybrid memory that combines dense vector similarity (via OpenAI embeddings) with sparse keyword matching (via BM25). Results are fused using Reciprocal Rank Fusion (RRF) to produce a single ranked list.
Memory Backend
The MemoryBackend abstract class defines how data is persisted and retrieved through three methods: store, search, and list_all. A concrete implementation, HybridMemory, stores memory chunks as MemoryChunk dataclass instances containing text, metadata, and optional embeddings. It maintains both a list of chunks and a BM25 index for keyword search. When a query arrives, it computes cosine similarity against stored embeddings and scores against the BM25 index, then combines the two result lists using RRF with a rank constant of 60.
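The fusion step can be sketched as follows. The function name `rrf_fuse` and its signature are illustrative (only the rank constant of 60 comes from the implementation itself): each retriever contributes a score of 1/(k + rank) per result, and the sums determine the final order.

```python
def rrf_fuse(dense_ranking, sparse_ranking, k=60):
    """Combine two ranked lists of chunk ids with Reciprocal Rank Fusion.

    Each input is a list of ids ordered best-first; the RRF score of an
    id is the sum of 1 / (k + rank) over every ranking it appears in.
    """
    scores = {}
    for ranking in (dense_ranking, sparse_ranking):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF only looks at ranks, not raw scores, it sidesteps the problem of comparing cosine similarities with BM25 scores, which live on incompatible scales. For example, an id ranked near the top of both lists outranks one that is first in a single list.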
Embedding and Keyword Search
Texts are embedded using OpenAI's text-embedding-3-small model via the _embed helper. Vectors are L2-normalized for cosine similarity. For keyword search, _tokenise lowercases and splits text on non-alphanumeric characters. The BM25 implementation from rank_bm25 fits on the tokenized corpus. These two retrieval methods complement each other: embeddings capture semantic meaning, while BM25 excels at exact keyword matching.
Tool and LLM Interfaces
The LLMProvider abstract interface requires only a complete method that takes messages and optional tool definitions and returns a response. This decouples the agent from any specific chat model. Similarly, each Tool subclass must define a name, a description, and a run method. A schema method auto-generates the function-calling JSON required by OpenAI. This makes adding new tools straightforward: implement the abstract methods and the rest comes for free.
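A minimal version of the Tool interface might look like the sketch below. The `parameters` class attribute and the exact `schema()` layout are assumptions modelled on OpenAI's function-calling format, not the article's literal code.

```python
from abc import ABC, abstractmethod

class Tool(ABC):
    name: str
    description: str
    parameters: dict  # JSON Schema describing the tool's arguments

    @abstractmethod
    def run(self, **kwargs) -> str:
        ...

    def schema(self) -> dict:
        """Auto-generate the function-calling JSON expected by OpenAI."""
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters,
            },
        }

class EchoTool(Tool):
    """Trivial example tool: repeats its input back to the caller."""
    name = "echo"
    description = "Repeat the input text back to the caller."
    parameters = {
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    }

    def run(self, text: str) -> str:
        return text
```

Keeping the schema generation on the base class means a new tool only declares its metadata and implements run; the dispatch layer never needs to know which concrete tools exist.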

Implementation Details
The environment setup installs openai, numpy, and rank_bm25. The API key is securely collected via getpass. Two constants define the embedding model and the chat model (gpt-4o-mini). The code is organized into classes that mirror the architecture described above:
- Abstract classes: MemoryBackend, LLMProvider, Tool.
- Data structures: the MemoryChunk dataclass.
- Helper functions: _embed, _tokenise.
- HybridMemory: concrete memory backend with RRF fusion.
The core loop (not fully shown) dispatches tools based on LLM function calls, stores new experiences as memory chunks, and retrieves relevant context before each reasoning step. This creates an agent that can improve over time by referencing its own stored knowledge.
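Since the core loop is not fully shown, here is one hedged way it could look. The stub provider, the dict-based tool registry, and the message shapes are all assumptions made so the sketch runs without an API key; the real loop would call the OpenAI chat API through an LLMProvider implementation.

```python
import json

def dispatch_loop(provider, tools, messages, max_rounds=5):
    """Loop while the model requests tool calls, feeding results back in."""
    for _ in range(max_rounds):
        reply = provider.complete(messages, [t["schema"] for t in tools.values()])
        call = reply.get("tool_call")
        if call is None:
            return reply["content"]  # model produced a final answer
        # Run the requested tool and append its output as a tool message
        result = tools[call["name"]]["run"](**json.loads(call["arguments"]))
        messages.append({"role": "tool", "name": call["name"], "content": result})
    raise RuntimeError("exceeded max tool rounds")

class StubProvider:
    """Stand-in for LLMProvider: requests the clock tool once, then answers."""
    def __init__(self):
        self.called = False

    def complete(self, messages, tool_schemas):
        if not self.called:
            self.called = True
            return {"tool_call": {"name": "clock", "arguments": "{}"}}
        return {"content": f"time is {messages[-1]['content']}", "tool_call": None}

tools = {"clock": {"schema": {}, "run": lambda: "12:00"}}
answer = dispatch_loop(StubProvider(), tools, [{"role": "user", "content": "time?"}])
```

Before each call to dispatch_loop, the agent would prepend context retrieved from HybridMemory and, afterwards, store the exchange as a new MemoryChunk, which is what lets it improve over time.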
Conclusion
By separating memory, LLM interaction, and tool execution into modular components, we build an agent that is both extensible and maintainable. The hybrid memory approach balances semantic understanding with precise keyword recall. With OpenAI's powerful models and a clean architecture, you can deploy autonomous agents that reason, remember, and act in complex environments. The full code demonstrates each principle in action, from abstract interfaces to a live agent.