As I mentioned at the beginning of the year, this would be the year of agents. Not only have we seen leading providers shift their focus to agentic capabilities like computer use, software engineering, and assistants that think before answering, but agents have also shown up in the products we use to do our work, a good example being the Notion AI agent announced a few weeks ago.

On the one hand, this is great news: AI is doing actual work, automating tasks for us and freeing us to do other things, or at least to do more of what we were already doing. At Mantis, for example, we routinely use AI coding assistants to draft solutions and review our code. We still have to carefully review what the AI suggests, but it does accelerate our development cycle. Ethan Mollick, in his newsletter (which I highly recommend), also mentioned that he can now use AI to reproduce results from academic papers, something that would previously have been infeasible for most research work.

Here's the catch, though: do we trust what the AI did enough to skip thoroughly checking the process and the results? Here's another example: just a few days ago, a project was delayed by a week, so I asked Notion AI to shift the resources and tasks by one week, and it did 😮 But did it do it correctly? To answer that, I need to spend time carefully checking what the AI did. Most of the time this still saves me time, but there are also times when it would have been quicker, or at least as quick, to do it manually.

This also lies at the core of why AI is not here to replace jobs but to make us more productive. And irrespective of how accurate AI really is, at the end of the day a human is accountable for the work done. The good news is that AI is pulling its weight lately, delivering actual value instead of being just a better search engine 💪