On reliable LLMs
I didn’t initially pay attention to the o3 announcement, but the buzz built quickly. Ethan Mollick wrote insightfully, as usual, about some of the implications, but my takeaway is more prosaic. After close to 20 separate conversations, I’m finding the responses consistently quick, useful, and concise. I had previously found almost all models useful, but not reliable enough to induce me to change my habits. The liberal use of web search during the reasoning phase appears to ground the model and keep its output accurate and current. I’m also finding that I don’t have to spend effort thinking about how to prompt well; simple questions are clearly parsed for intent.
What to make of all this?
Similar to what Simon Willison said today, I anticipate defaulting to LLM queries instead of web searches going forward.1 For me, o3 crosses a threshold: I no longer expect that 15 or 20% of the time I’ll have to give up and go back to a search engine.2 The output isn’t perfect, but it’s reliably useful, and the mistakes are few and inconsequential. I recommend trying o3, particularly if you haven’t tried a recent model. You’re likely to be amazed, and to quickly think of real-world use cases.