Understanding how your AI Agent Remembers (Retrieval, Chunking, and Context)

When our customers use the Herald platform, they often marvel at how AI agents can recall specific data points with remarkable accuracy, yet sometimes overlook connections that seem obvious to humans. Let's break down how AI memory systems work.

Understanding Memory

When discussing memory, it’s important to distinguish between remembering information and reviewing a record of it. As humans, we are vulnerable to misremembering facts, forgetting information, and even creating memories of events that never happened. We can solve this problem by documenting events as they happen and referring to the recorded information later.

In much the same way, AI models are prone to “hallucination”, where they misremember facts, forget information, and even create memories of events that never happened. To combat this, AI companies such as Herald give the AI models records of your company’s documents, files, and databases. This technique is called Retrieval Augmented Generation (RAG).
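
To make this concrete, here is a minimal sketch of the RAG pattern in Python. The record store, its contents, and the function name are illustrative placeholders rather than Herald’s actual API; the point is simply that retrieved records are placed into the prompt so the model answers from them instead of from its training data alone.

```python
# Minimal sketch of the RAG pattern: ground the prompt in retrieved records.
# The records and names below are illustrative, not Herald's API.

RECORDS = [
    "Refund policy: refunds are issued within 14 days of purchase.",
    "Support hours: 9am to 5pm Eastern, Monday through Friday.",
]

def build_rag_prompt(question: str, records: list[str]) -> str:
    """Assemble a prompt that grounds the model in retrieved records."""
    context = "\n\n".join(records)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

print(build_rag_prompt("When are refunds issued?", RECORDS))
```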

For a deeper understanding of RAG and how Herald enables companies to leverage AI without needing to train it on their data, take a look at our previous blog post. 

How AI models retrieve information

Retrieval is more complex than simply feeding an entire document to an LLM. Two constraints affect how an AI model processes information:

  1. Context Windows: AI models have a limited context window, a hard cap on how much text they can take in at once. While context windows keep expanding, keeping the input short and focused generally yields better accuracy.

  2. Signal to Noise: Overloading the context window with unnecessary information can confuse AI models and degrade the accuracy of their answers. (A sketch of how a fixed budget is enforced follows this list.)
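
As a rough illustration of the first constraint, here is a sketch of how a retrieval system might pack ranked snippets into a fixed context budget. Counting words instead of tokens, and the budget value itself, are simplifying assumptions; real systems count tokens with the model’s own tokenizer.

```python
# Sketch: fit ranked snippets into a fixed context budget.
# Words stand in for tokens here; real systems use the model's tokenizer.

def pack_context(snippets: list[str], budget: int = 200) -> list[str]:
    """Keep the highest-ranked snippets that fit within the budget."""
    packed, used = [], 0
    for snippet in snippets:  # assumed sorted best-first by relevance
        cost = len(snippet.split())
        if used + cost > budget:
            break  # stop before overflowing the context window
        packed.append(snippet)
        used += cost
    return packed
```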

Given that most documents are larger than an LLM’s context window and contain superfluous information, it’s generally best practice to use a chunking system. Chunking breaks a document into smaller pieces that can be indexed and searched individually, so only the most relevant parts of the document are returned to the AI model.
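
A minimal chunker might split on a fixed word count with a small overlap between neighboring chunks, so context that straddles a boundary is not lost entirely. The sizes below are assumptions for illustration; production systems often split on semantic boundaries such as headings or paragraphs instead.

```python
# Sketch: fixed-size chunking with overlap. Sizes are illustrative.

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping word-count chunks."""
    words = text.split()
    step = max(chunk_size - overlap, 1)  # guard against a non-positive step
    return [
        " ".join(words[start:start + chunk_size])
        for start in range(0, len(words), step)
    ]
```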

For the sake of simplicity, let’s say each chunk is a single page of a document. When you ask a question, instead of handing the whole document to the AI model, the system returns only the relevant pages.
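
Continuing the page analogy, a toy retriever might score each page by how many words it shares with the question and return the top matches. Production systems typically use embedding similarity rather than this keyword overlap, but the effect is the same: only a handful of pages ever reach the model.

```python
# Sketch: keyword-overlap retrieval over "pages". Real systems would use
# embedding similarity; this toy scorer only shows the selection step.

def top_pages(question: str, pages: list[str], k: int = 2) -> list[str]:
    """Return the k pages sharing the most words with the question."""
    q_words = set(question.lower().split())
    return sorted(
        pages,
        key=lambda page: len(q_words & set(page.lower().split())),
        reverse=True,
    )[:k]
```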

Here lies the core of the memory problem: AI models only ever look at the pieces of information placed in front of them. The model may find the relevant page yet miss the context provided on the previous page of the document. Instead of a memory of the entire document, it has only specific pages in front of it.

AI Simon Says

A useful way to illustrate the differences between human memory and AI processing is by imagining a game of "Simon Says" with a twist. Suppose there are a hundred commands, and the first command states: "It is opposite day."

If you ask a human to recall the 75th command, most people would struggle to remember. Here, AI models show their clear advantage, as they can retrieve and recite the exact 75th command as given by Simon, showcasing precision in accessing specific data points.

Conversely, if you ask a human to perform the action described in the 100th command, most would not only execute it correctly but also remember the crucial twist introduced by the first command. An AI model, however, would likely return the 100th command verbatim, without considering the implication of the first command that it's opposite day.
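
You can see this failure mode in a few lines of code. The naive retriever below fetches exactly the command asked about and nothing else, so the opposite-day rule from the first command never reaches the model. The command list and retriever are, of course, contrived for illustration.

```python
# Contrived illustration: retrieval fetches the command you asked about,
# but not the earlier command that changes its meaning.

commands = ["It is opposite day."] + [f"Command number {i}." for i in range(2, 101)]

def retrieve(index: int, store: list[str]) -> list[str]:
    """A naive retriever: return only the chunk matching the query."""
    return [store[index - 1]]

print(retrieve(100, commands))  # ['Command number 100.'] -- no opposite-day rule
```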

This is the core weakness of AI memory: the model only recalls and reviews what you asked of it. It accesses only the "page" of notes relevant to your question and remains unaware of any additional context provided by other "pages." This selective memory can lead to gaps in understanding the broader contexts and nuances that humans naturally grasp.

If you're interested in learning how Herald's Discover platform resolves these issues, please schedule a meeting or reach out to us at support@heraldlabs.ai.

Schedule a call with the Herald team

Herald Labs © 2024
