NYC Data #357: Building a Search Engine from Scratch, Reddit's Analytics Engineering, Hidden Door, FreshDirect, Suno, Credit Genie, AI is Bad at SQL
Plus, more on the GPT-5 rollout
Hi friends, I hope your summer is going well! I’m leaving on vacation this weekend, so no letter next week, but I’ll be back for the last (!) Friday in August.
As always, help me keep this space up-to-date: please send me posts, events, and job openings. If you know someone who might enjoy or benefit from this newsletter, please share it with them. [image credit: Jim Harper]
Good Local Posts
Reddit Engineering wrote a post about their Analytics Engineering team. Surprisingly (to me), the function is less than a year old! The post is a good overview of the mission, but also digs into incremental aggregates, sketches, and workload optimization. Highly recommended!
Hillary Mason’s new company, Hidden Door, is open for early access. She describes it as “a game where roleplay meets fanfiction” with generative components. I’m still on the waitlist; it looks fun!
Wilson Lin built a search engine from scratch and wrote an 8,000-word opus on it. So, so interesting, from the infrastructure layer up to the SERPs. This is also the first project I’ve ever seen built on Oracle’s cloud (and I worked for a company that’s now part of Oracle)! Really impressive.
Upcoming In-Person Events (new listings in bold)
8/20: NYC Data Exploration with Plotly Python
8/25 - 8/27: Data Science & AI Conference
8/28: Python and Data: Project Night with PyData
8/31: Enterprise AI Summit 2025
9/8: Building Scalable Systems with ClickHouse & Docker
9/9: Got Data, Now What? Storytelling Through Accessible Design
9/18: Data Management Summit
9/19: Cornell University Artificial Intelligence Investing Conference
9/25: What's the Big Deal with Postgres?
9/29 - 10/3: MLCon
10/22: Introduction to Analysis of Public Survey Data
Open Roles
Squarespace is hiring a Data Governance Lead and Database Engineers.
Ocrolus is looking for a Small Business Credit Analytics Lead.
FreshDirect is hiring a Manager, Data Analytics.
Credit Genie is seeking an Applied AI Engineer, but the role sounds a lot like a Data Scientist.
Suno is looking for a Sr. Data Scientist, Growth Marketing.
Etsy is seeking a Senior Software Engineer II, Machine Learning.
Miscellany
Tomasz Tunguz wrote about the Spider 2.0 comprehensive text-to-SQL benchmarks, which the big models have been surprisingly poor at. Tunguz says this points “to a fundamental truth about data work. Technical proficiency in SQL syntax is just the entry point. The real challenge lies in business context”. +1
The conversation about the GPT-5 rollout continues! I thought this thread from Steven Sinofsky (former head of Windows) about backlash to product rollouts was really interesting. My favorite line:
[I]n the early days there are tech enthusiasts. They love change. They embrace it. Then suddenly they don't want change right now because a product is important to them.
Thanks so much for being a subscriber. To see previous job listings (many of which are still open!) and blogs, check out the archive, which has emails from the tinyletter days. Feel free to forward this to anyone; they can subscribe here: