New tool builds RAG with the PageRank algorithm
November 19, 2024 12:49 PM

RAG: Retrieval Augmented Generation. They built a RAG tool using PageRank, the basic algorithm Google built web search on. They are offering 100 free trials of their tool. See below for a typical task analyzing Dickens's "A Christmas Carol". [RAG (Retrieval Augmented Generation) is a technique that grants generative artificial intelligence models information retrieval capabilities. It modifies interactions with a large language model (LLM) so that the model responds to user queries with reference to a specified set of documents.]
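
For anyone who wants the mechanics made concrete, here's a minimal sketch of the retrieve-then-generate loop that definition describes. The embed() and chat() callables are hypothetical stand-ins for whatever embedding model and LLM you happen to use, not any particular library's API.

# Minimal RAG sketch: find the documents most similar to the query,
# then hand them to the LLM as context for its answer.
import numpy as np

def retrieve(query, documents, embed, top_k=3):
    doc_vecs = np.array([embed(d) for d in documents])
    q_vec = np.array(embed(query))
    # Cosine similarity between the query and every document.
    scores = doc_vecs @ q_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec)
    )
    return [documents[i] for i in np.argsort(scores)[::-1][:top_k]]

def answer(query, documents, embed, chat):
    context = "\n\n".join(retrieve(query, documents, embed))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return chat(prompt)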

(I'd suggest opening a separate Browser Tab to Wikipedia: Knowledge Graph)

I was intrigued by:

"...we always loved the parallel between pagerank and the human memory. We believe that searching for memories is incredibly similar to searching the web."

************************
"Show HN: FastGraphRAG – Better RAG using good old PageRank (github.com/circlemind-ai)"

Page at HN


[snip]
Before we built this, Antonio was at Amazon, while Luca and Yuhang were finishing their PhDs at Oxford. We had been thinking about this problem for years and we always loved the parallel between pagerank and the human memory [2]. We believe that searching for memories is incredibly similar to searching the web.

Here’s how it works:

- Entity and Relationship Extraction: Fast GraphRAG uses LLMs to extract entities and their relationships from your data and stores them in a graph format [3].

- Query Processing: When you make a query, Fast GraphRAG starts by finding the most relevant entities using vector search, then runs a personalized PageRank algorithm to determine the most important “memories” or pieces of information related to the query [4].

- Incremental Updates: ...

- Faster: These design choices make our algorithm faster and more affordable to run than other graph-based RAG systems because we eliminate the need for communities and clustering.
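
To make the "Query Processing" step above a bit more concrete, here is a toy sketch of personalized PageRank over a hand-built entity graph, using networkx. The graph and the vector-search "seed" scores are invented for illustration and say nothing about Fast GraphRAG's actual internals.

# Toy personalized PageRank over a made-up entity graph. The
# `personalization` dict biases the random walk toward the entities the
# vector search matched, so entities well connected to them rank highly.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("Scrooge", "Marley"),
    ("Scrooge", "Bob Cratchit"),
    ("Bob Cratchit", "Tiny Tim"),
    ("Scrooge", "Ghost of Christmas Past"),
    ("Scrooge", "Fred"),
])

# Pretend vector search matched the query to these entities with these weights.
seeds = {"Scrooge": 0.7, "Tiny Tim": 0.3}

ranked = nx.pagerank(G, alpha=0.85, personalization=seeds)
for entity, score in sorted(ranked.items(), key=lambda kv: -kv[1]):
    print(f"{entity}: {score:.3f}")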

Suppose you’re analyzing a book and want to focus on character interactions, locations, and significant events:

from fast_graphrag import GraphRAG

DOMAIN = "Analyze this story and identify the characters. Focus on how they interact with each other, the locations they explore, and their relationships."

EXAMPLE_QUERIES = [
"What is the significance of Christmas Eve in A Christmas Carol?",
"How does the setting of Victorian London contribute to the story's themes?",
"Describe the chain of events that leads to Scrooge's transformation.",
"How does Dickens use the different spirits (Past, Present, and Future) to guide Scrooge?",
"Why does Dickens choose to divide the story into \"staves\" rather than chapters?"
]
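
The snippet stops at the prompt setup; as best I recall from the project's README, the rest looks roughly like the continuation below. Treat the GraphRAG constructor arguments, insert(), and .response as assumptions from memory rather than a verified API.

# Continues the snippet above: build the index once, then query it.
ENTITY_TYPES = ["Character", "Place", "Object", "Event"]

grag = GraphRAG(
    working_dir="./book_example",      # where the graph and indices are stored
    domain=DOMAIN,
    example_queries="\n".join(EXAMPLE_QUERIES),
    entity_types=ENTITY_TYPES,
)

with open("./a_christmas_carol.txt") as f:
    grag.insert(f.read())              # extract entities/relations into the graph

print(grag.query("Who is Scrooge, and how does he change?").response)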
posted by aleph (5 comments total) 7 users marked this as a favorite
 
I feel like I've read about countless large companies implementing (very sensible) policies around use of LLMs that basically instruct their workers, "don't blindly rely on what the LLM says. A human expert must verify its accuracy."

Then they turn around and pay millions for engineers to build RAG pipelines or integrate services like these with their "knowledge bases" so that people who don't know something can ask a machine to give them an answer that they then have to go find a human to verify, when they would have done better just to go do that last step in the first place.

In other words, I'm not sure how this project is doing more than trying to reinvent the search engine. At great cost.
posted by dsword at 5:47 PM on November 19, 2024 [1 favorite]


Companies always want to have some kind of knowledge base/“intranet”/wiki/whatever they’re calling it these days that people will use to figure out how to submit their expenses, when they can update their insurance preferences, and when they’re allowed to serve a client a glass of wine.

The problem is these things get out of date or accumulate contradictory info, and half the time the search feature doesn’t really work, and nobody’s really incentivized to keep them updated, technically or content-wise. So people end up asking a human anyway, and in cases where there’s any risk or complexity they may always do so.

It sounds good in theory that you could Slack a bot instead of an overworked human to get an answer. Time will tell if they can make the bots accurate enough and if they can keep the backend current enough to have this actually pay off.
posted by smelendez at 6:37 PM on November 19, 2024 [1 favorite]


I was intrigued by:

"...we always loved the parallel between pagerank and the human memory. We believe that searching for memories is incredibly similar to searching the web."

I'm old. The memory degradation and consequent struggles are *very* like what is described. This tells me they *may be* on to something.

But no, the current hype is not it. But each "AI Winter" cycle gets further and further along. It currently *is* more than hype, but also mostly hype.

edit: the comments at the HN page are people sharing tips on the new methods, with several reporting impressive results.
posted by aleph at 9:57 PM on November 19, 2024


"...we always loved the parallel between pagerank and the human memory. We believe that searching for memories is incredibly similar to searching the web."

So, easily-gamed by malicious actors, and subject to inevitable enshittification because of predatory venture capitalists?

...actually that might explain a lot.
posted by Mayor West at 8:54 AM on November 20, 2024 [1 favorite]


ask a machine to give them an answer that they then have to go find a human to verify, when they would have done better just to go do that last step in the first place

My experience is that LLMs work best on tasks where checking the answer is a lot faster and easier than creating it in the first place. Software is a good example of this: it’s often much faster to test whether it’s giving the right result than it is to write the code in the first place.

(Then again, this isn’t always true — especially for very complex software where subtle bugs can creep in. But for simpler tasks it does work really well.)
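
A concrete (if trivial) illustration of that asymmetry: the slugify() below stands in for something an LLM might have drafted, and the three assertions take seconds to read and run, which is far less effort than getting the edge cases right from scratch.

# Checking is cheap: a few assertions verify behavior that would take
# longer to write correctly from scratch. slugify() is a hypothetical
# stand-in for an LLM-drafted function.
def slugify(title):
    cleaned = "".join(c if c.isalnum() else " " for c in title.lower())
    return "-".join(cleaned.split())

assert slugify("A Christmas Carol") == "a-christmas-carol"
assert slugify("  Stave One: Marley's Ghost!  ") == "stave-one-marley-s-ghost"
assert slugify("") == ""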

By the same token, LLMs are really bad for tasks where checking the answer is time consuming and difficult. I think they’re a really bad idea for legal documents, for example, where every word needs to be correct but you can’t automate checking them.

I’m not sure where using RAG for search falls on this spectrum, and it probably depends more on the subject matter than on the tool itself.
posted by learning from frequent failure at 9:40 AM on November 20, 2024




This thread has been archived and is closed to new comments