How not to run a SaaS company
October 15, 2024 6:28 AM

Founder Mode. Somewhere someone said something and a meme was born. David Gerrells is a better developer than me. He flipped the switch on founder-mode /s and built a web-crawler-data-parser using Python(!) and SQLite(!!) to provide free* backlink analysis as an elaborate yet philosophical middle finger to SEO marketing companies like Ahrefs and Semrush because "backlinks are the digital road signs for the public and they should be freely and EASILY searchable by anyone." * You can "pay" to unlock AI-generated analytics reports, but Stripe is configured to auto-refund your payment. Clicking the payment button is enough to fake it out. It's for fun, not profit.

This is dense, and more than a bit technical. In short, David Gerrells built a tool to analyze roughly 2% of internet traffic captured by Common Crawl (a stupefying amount of data) using popsicle sticks, gum, and many SQLite databases.

David Gerrells previously
posted by device55 (5 comments total) 9 users marked this as a favorite
 
I'm neither a programmer nor a marketer, but I found this fascinating.

I've understood backlinks for a long time--most people in academia can relate to them as basically a citation score or H-index for a website.

But what's fascinating about this to me is how a (free) backlink API can be connected to an LLM to spot patterns in those links. I've been the budget owner for plenty of marketing initiatives, and I can see how this would save absolute TONS of money for possibly better insights.
posted by yellowcandy at 7:58 AM on October 15


But what's fascinating about this to me is how a (free) backlink API can be connected to an LLM to spot patterns in those links.

This is such a weirdly retro exercise. Goodhart's Law But URLs was a thing for a while, but HTTPS ditching referrers and pingbacks getting abandoned because of spammer infestations was, I thought, the end of it. Doing a round trip through the whole Common Crawl data and LLMs so you can relive 2000s-era SEO is a pretty strange idea.

If this is a better way of doing things today, you have to think that an awful lot of people in an awful lot of places have dropped a lot of balls.
posted by mhoye at 10:06 AM on October 15


TIL that Goodhart's Law refers to that measurement / metric problem. It's a day of learning!
This is such a weirdly retro exercise.
Oh it so is, and I love it.

I don't think it has any direct practical application. I think it's a bit* of fun and like "can I even do this?".

* lot of work, small fun?
posted by device55 at 12:56 PM on October 15


type 2 fun: work right now but you recall it being fun after the event

That aside, PageRank is still a great idea when not poisoned by spammers, so it's tempting to look for a "federated ranking" where your search results come from a mix of scores from organisations you trust to focus on backlinkers relevant to your interests. You might mash together or deny a spread of summaries for useful search results.
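
A rough sketch of what that mashing together might look like, assuming each organisation publishes a per-URL score between 0 and 1. The function name, source names, URLs, and weights are all made up for illustration; a trust weight of 0 is how you "deny" a source.

```python
from collections import defaultdict


def federated_rank(sources, trust):
    """Blend per-URL scores from several sources, weighted by trust.

    sources: {source_name: {url: score in [0, 1]}}  (hypothetical format)
    trust:   {source_name: weight}; a weight of 0 (or a missing entry)
             denies that source entirely.
    Returns a list of (url, blended_score), best first.
    """
    # Only sources that are both present and trusted count toward the total.
    weight_sum = sum(w for name, w in trust.items() if name in sources and w > 0)
    if weight_sum == 0:
        return []

    totals = defaultdict(float)
    for name, scores in sources.items():
        w = trust.get(name, 0.0)
        if w <= 0:
            continue  # denied source: its URLs never enter the results
        for url, score in scores.items():
            # A URL missing from a source simply gets no contribution from it.
            totals[url] += w * score / weight_sum

    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Illustrative data only: none of these organisations or scores are real.
sources = {
    "library_coop": {"a.example": 0.9, "b.example": 0.4},
    "hobby_forum": {"b.example": 0.8, "c.example": 0.6},
    "link_farm": {"spam.example": 1.0},
}
trust = {"library_coop": 2.0, "hobby_forum": 1.0, "link_farm": 0.0}

for url, score in federated_rank(sources, trust):
    print(f"{url}\t{score:.3f}")
```

The weighted average is the simplest possible choice here; anything spam-resistant would need more thought about how sources are vetted in the first place.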
posted by k3ninho at 2:23 PM on October 15


On that score, a custom index on Common Crawl is $60 and 30 hours of processing:

We all know that go and rust would be far better languages to use. As a matter of fact this little lad has a great write up on parsing it all in a day for $60 bones. Slick.
posted by k3ninho at 2:35 PM on October 15 [1 favorite]

