Wrapping AI around Databases

Herald has developed the next generation of RAG toolsets to allow LLMs to interact with large organizational databases. Our THRAG framework lets users explore structured data beyond pre-set table values.

Current Landscape

Institutions are adopting LLMs to increase output: writing summaries, powering internal search tools, and running internal and external chatbots. Despite this, these tools have failed to live up to their claimed potential because they cannot tap into databases effectively.

Current AI toolsets such as Retrieval Augmented Generation (RAG) fail to interact with databases with the speed and effectiveness required for a good user experience. RAG solutions are incredibly effective at working with large amounts of unstructured data but fail to integrate well with structured data. 

The core issue is that RAG solutions attempt to interact with databases in the same way as traditional programs and therefore miss out on the context-aware decision-making involved in true intelligence.

The current RAG stack from a LlamaIndex application has the following flow (a minimal Python sketch follows the list):

  1. Transform the user query to SQL using a text-to-SQL model (often GPT)

  2. Get the table results from the SQL database

  3. Summarize the table and paste it into the LLM’s context window for an intelligent response

  4. Display the table data to the user on the front end
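
For concreteness, here is a minimal sketch of that flow in plain Python. It is an illustration under assumptions (the `companies` schema, the model name, and the prompts are invented for this example), not Herald's or LlamaIndex's actual implementation:

```python
import sqlite3

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical single-table schema, used only for this sketch.
SCHEMA = "companies(id INTEGER, name TEXT, region TEXT, ceo_email TEXT)"


def rag_sql_answer(question: str, db_path: str = "portfolio.db") -> str:
    # 1. Transform the user query to SQL with a text-to-SQL prompt.
    sql = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{
            "role": "user",
            "content": f"Schema: {SCHEMA}\nReturn only a single SQLite SELECT "
                       f"statement (no prose, no markdown) answering: {question}",
        }],
    ).choices[0].message.content.strip().strip("`")  # crude cleanup of fences

    # 2. Get the table results from the SQL database.
    rows = sqlite3.connect(db_path).execute(sql).fetchall()

    # 3. Summarize the rows into the LLM's context window for the response.
    #    (Step 4, displaying the table on the front end, is omitted here.)
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Question: {question}\nSQL results: {rows}\nAnswer concisely.",
        }],
    ).choices[0].message.content
```

Note that the LLM only ever sees the question, the generated SQL, and the raw rows; any context living outside the table never enters the loop.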

In a traditional RAG framework, the LLM can only interact with the database through SQL queries. This limits the application to the columns in the dataset, stripped of any surrounding context, and that lack of context limits any real actionable insight into organizational databases.

Imagine going through a CRM or database within an organization while knowing nothing about the business besides the columns. Employees need the unstructured information from calls, documents, and conversations to get any real insight.

Table Hybrid Retrieval Augmented Generation (THRAG)

Herald has developed a new technique (THRAG) to help solve the current issues around database search via LLMs. 

Here’s an overview of a THRAG workflow (a code sketch follows the list):

  1. Retrieve matches from a vector database of unstructured information

  2. Extract table metadata from the matches

  3. Get the table results from the SQL server

  4. Feed the vector matches and table results to the LLM for an intelligent response

  5. Display the table to the user on the front end 
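
Here is a hedged sketch of that workflow in plain Python. The `companies` table, the `row_id` metadata key, and the model names are all assumptions for illustration, not Herald's actual schema or stack:

```python
import sqlite3

import numpy as np
from openai import OpenAI

client = OpenAI()


def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])


def thrag_answer(question: str, chunks: list[dict],
                 db_path: str = "portfolio.db", k: int = 3) -> str:
    # 1. Retrieve the top-k matches from the vector store of unstructured text.
    doc_vecs = embed([c["text"] for c in chunks])
    q_vec = embed([question])[0]
    sims = doc_vecs @ q_vec / (np.linalg.norm(doc_vecs, axis=1)
                               * np.linalg.norm(q_vec))
    matches = [chunks[i] for i in np.argsort(-sims)[:k]]

    # 2. Extract the table metadata (here, a row id) stored on each match.
    row_ids = [c["row_id"] for c in matches]

    # 3. Get the corresponding rows from the SQL server.
    placeholders = ",".join("?" * len(row_ids))
    rows = sqlite3.connect(db_path).execute(
        f"SELECT * FROM companies WHERE id IN ({placeholders})", row_ids
    ).fetchall()

    # 4. Feed both the vector matches and the table rows to the LLM.
    #    (Step 5, displaying the table on the front end, is omitted here.)
    context = "\n".join(c["text"] for c in matches) + f"\nTable rows: {rows}"
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"{context}\n\nQuestion: {question}"}],
    ).choices[0].message.content
```

The crucial difference from the text-to-SQL sketch above is that the SQL lookup is driven by retrieval metadata rather than generated from the user's words.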

THRAG drastically shifts how databases are used with LLMs, as it bypasses the traditional approach of querying the database directly with SQL. By drawing on the unstructured data associated with each database entry, THRAG builds a fuller understanding of the entire database than the purely keyword-driven matching of traditional SQL queries.

Extracting the relevant information associated with the table enables a much richer analysis than basic RAG. The LLM can see the information associated with each entity and start to spot trends in the structured data that were missing from the originally retrieved chunk.
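
The linkage that makes this possible is set up at ingestion time: each unstructured chunk is stored with metadata pointing back to its table row. A minimal illustration, matching the sketch above (every field name and value here is invented):

```python
# Illustrative only: chunks are tagged with the database row they describe,
# so retrieval over text can bridge back to the structured record.
chunks = [
    {"text": "Q3 investor update: Acme Robotics extended runway as rates rose...",
     "row_id": 17},  # links to companies.id = 17 in the SQL database
    {"text": "Nordwind AI closed a bridge round led by existing investors...",
     "row_id": 42},  # links to companies.id = 42
]
```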

Combining structured and unstructured information answers questions at a much deeper level than previously possible and leads to more “human” responses to questions and searches.

Value of THRAG for customers 

Our customers have seen the value of THRAG firsthand, for example by querying a VC’s portfolio for companies vulnerable to rising interest rates.

Traditional RAG capabilities are limited by the information contained within the text chunks. The RAG system focused solely on the textual investor updates and failed to use any context present in the portfolio database on Airtable. 

The THRAG system scans both unstructured and structured information for deeper analysis. The LLM could utilize both the textual and contextual information for richer results:

  • The proper name of each company (PDF file title vs. real company name)

  • Region information for each startup

  • No duplicate responses (the plain RAG approach returned the same company twice despite seeing the same name)

  • The CEO’s contact information for follow-ups

More impressively, THRAG identified that most of the affected companies were in Europe, enabling the investor to contact other portfolio companies in the region to assist with fundraising.

Contact us

We've launched our new product LOCATE to take advantage of the THRAG framework, and we're excited for organizations to test our AI platform for faster and more accurate insights. If you’re interested in our solution, reach out to us via our website or at support@heraldlabs.ai.

Schedule a call with the Herald team

Herald Labs © 2024
