NYC Data #353: LLMs for Autocomplete at Reddit, Hidden AI Prompts, Etsy, Vevo, SecurityScorecard, Plotly & Python, Chobani, The Hottest Trains
Plus, cap and trade for NYC rents?
Hi friends, I hope your summer is going well and you’re staying cooler than I am; technicians are in my apartment fixing my HVAC system as we speak!
As always, help me keep this space up-to-date: please send me posts, events, and job openings. If you know someone who might enjoy or benefit from this newsletter, please share it with them. [image credit: The Public Theater]
Good Local Posts
Mike Wright of Reddit wrote about using LLMs to generate autocomplete options. We talked about this at work and like the design: use the slower LLMs offline to generate a key-value table, then do a fast lookup against it at query time. Mike also provides some improvement numbers, which is great. I imagine there’s a non-LLM, ‘all-inference’ way to do this (Google probably does that?); if anyone knows more about this, I’d love to hear.
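For anyone who wants to picture the offline/online split, here is a minimal sketch in Python. It is not Reddit's implementation: `fake_llm`, the canned suggestions, and the function names are all hypothetical stand-ins, and a real system would swap in an actual model call and a proper key-value store.

```python
from typing import Callable, Dict, List


def build_table(
    prefixes: List[str],
    generate: Callable[[str], List[str]],
    k: int = 3,
) -> Dict[str, List[str]]:
    """Offline step: call the slow generator once per prefix, keep the top k."""
    return {p.lower(): generate(p)[:k] for p in prefixes}


def autocomplete(table: Dict[str, List[str]], query: str) -> List[str]:
    """Query-time step: a plain dictionary lookup, no model call."""
    return table.get(query.strip().lower(), [])


def fake_llm(prefix: str) -> List[str]:
    # Hypothetical stand-in for a slow LLM call; the suggestions are made up.
    canned = {
        "best piz": ["best pizza in nyc", "best pizza dough recipe"],
        "how to": ["how to learn python", "how to read a lease"],
    }
    return canned.get(prefix, [])


# Built ahead of time (e.g. in a batch job), then served from memory.
TABLE = build_table(["best piz", "how to"], fake_llm, k=2)
```

At query time, `autocomplete(TABLE, "Best Piz")` only touches the dictionary, so latency does not depend on the model; the trade-off is that prefixes missing from the table get no suggestions until the next offline run.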
Andrew Gelman has a post up with examples of research papers containing hidden prompts directing AI tools to give them good reviews. I don’t know how effective these prompting strategies are, but this makes me want to think twice about just dumping a lease or other text into an LLM to summarize without a manual review.
This piece on implementing cap and trade for NYC rent stabilization was really fun and creative. After some thought, I probably wouldn’t support it because of its potential for abuse, but I love these market-policy initiatives, like Green Taxis (which I thought worked pretty well before Uber and Lyft made them redundant).
Upcoming In-Person Events (new listings in bold)
7/22 - 7/23: International Conference on Applied Statistics for Agricultural and Life Sciences
7/20: ASA AI Workshop NYC
7/21: NYC Data Exploration with Plotly & Python
7/23: Databricks Summer Meetup
7/24: Python and Data: Project Night with PyData
7/29: AI Meetup (July): Agentic AI and MCP
9/18: Data Management Summit
10/22: Introduction to Analysis of Public Survey Data
Open Roles
Squarespace is hiring a Data Governance Lead and Data Platform Engineers.
Vevo is hiring a Senior Manager, Data Analytics.
SecurityScorecard is looking for a Principal Data Scientist.
Etsy is seeking a Senior Data Scientist, Analytics.
NYSERDA is hiring a Business Analyst.
Chobani is looking for a Senior People Data Analyst.
Miscellany
How people figured out NYC cross-streets from addresses pre-smartphone. TIL! Though having been in NYC back in the old days, I can’t remember anyone failing to provide the nearest intersection.
Ramsey Khalifeh of Gothamist wrote about data on the Hottest Trains, and Mark Sidall normalized complaints by ridership.
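If you want to try the same kind of normalization yourself, here is a rough sketch in Python. The line names and counts are placeholders rather than real MTA figures, and "complaints per 100,000 riders" is just one reasonable choice of rate, not necessarily the one used in the posts above.

```python
from typing import Dict


def complaints_per_100k(
    complaints: Dict[str, int],
    ridership: Dict[str, int],
) -> Dict[str, float]:
    """Return complaints per 100,000 riders for each line with known ridership."""
    rates = {}
    for line, count in complaints.items():
        riders = ridership.get(line, 0)
        if riders > 0:  # skip lines with missing or zero ridership
            rates[line] = count * 100_000 / riders
    return rates


# Placeholder numbers for illustration only.
complaints = {"Line A": 50, "Line B": 30}
ridership = {"Line A": 1_000_000, "Line B": 200_000}

rates = complaints_per_100k(complaints, ridership)
ranked = sorted(rates, key=rates.get, reverse=True)
```

With these placeholder numbers, Line A has more raw complaints but Line B ranks first once ridership is taken into account, which is the whole point of normalizing.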
Thanks so much for being a subscriber. To see previous job listings (many of which are still open!) and blogs, check out the archive (which has emails from the tinyletter days!). Feel free to forward this to anyone: they can subscribe here: