Capable AI agents will require new infrastructure. To make digital assistants useful sooner by granting them human-like access to systems, we need founders devoted to the API-ification of legacy systems, hardened sandboxes for AI agents, and agent authentication.
The libraries and frameworks for AI systems are still immature. Most applications are simple chatbots or some form of retrieval. But as use cases grow more complex, the current architecture won’t be sufficient. Anyone who has spent a few minutes thinking about AI has considered how useful a cheap personal assistant that could tackle complex tasks would be, or how much money businesses would save if they could replace significant headcount with AI agents.
For a personal use case, imagine ordering some food for pickup from a restaurant. A workplace example might be: go into the AWS console, spin up a virtual machine, and install a specific open-source project on it. These are real tasks that someone would want to hand to a real assistant.
For nearly every system that an agent would need to interact with to accomplish a task, there has to be an API or credentialed UI access (the capability to browse to the application with those credentials similar to how a human would). The infrastructure to do most of that doesn’t exist right now. Let’s showcase that through the examples above.
For the food example, the AI needs:

- The restaurant’s current menu and prices (which most restaurants don’t expose via an API)
- A way to place the order, either through an ordering API or by driving the restaurant’s website with credentials
- A payment method it’s authorized to use

For the AWS example, the AI needs:

- Credentials for the AWS account, ideally scoped to just what the task requires
- Either the AWS API or credentialed UI access to the console
- A safe environment to run in, so a mistake or a hijacked agent can’t touch anything else
I think the most secure, efficient, and accurate infrastructure would be entirely API-based, but I don’t see us getting there soon. And AI agents are too valuable for us to “just wait” for the API-ification to get better. So what do we do? We build both.
The most popular stand-alone “AI device” was recently launched: the Rabbit R1. In the keynote, the founder reveals that they’ve built a credentialed sandbox virtual machine for the AI to use to accomplish tasks. He says you’ll simply sign in to your personal accounts during an onboarding flow, and the credentials will be stored securely, etc. Personally, I find it quite unsafe to grant an AI agent that kind of access to my services. A prompt injection in that setup would grant an attacker full control of every credentialed application.
Here’s how I think we could solve each of the limiting factors for good and secure AI agents.
So let’s assume the agent needs access to a service without a functional API. Securely giving the agent credentialed access requires a few things:

- A hardened, isolated sandbox where the agent browses the service, so a compromised agent can’t reach anything else
- Credentials scoped to the specific service and task, rather than blanket access to your accounts
- Logging and monitoring of what the agent actually does with that access
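To make the credential-scoping idea concrete, here’s a minimal sketch of a hypothetical `CredentialBroker` that lives outside the sandbox, only releases a credential for an explicitly approved service, and records every request. The class name, fields, and flow are my assumptions, not an existing product:

```python
import time
from dataclasses import dataclass, field

@dataclass
class CredentialBroker:
    """Holds credentials outside the sandbox; releases them per-service."""
    vault: dict                       # service name -> credential
    allowed: set                      # services this agent may access
    audit_log: list = field(default_factory=list)

    def release(self, agent_id: str, service: str) -> str:
        # Refuse anything outside the approved scope -- a prompt-injected
        # agent asking for "bank.example" simply gets nothing.
        if service not in self.allowed:
            self.audit_log.append((time.time(), agent_id, service, "DENIED"))
            raise PermissionError(f"{service} not in scope for {agent_id}")
        self.audit_log.append((time.time(), agent_id, service, "RELEASED"))
        return self.vault[service]

broker = CredentialBroker(
    vault={"restaurant.example": "session-token-abc"},
    allowed={"restaurant.example"},
)
print(broker.release("agent-1", "restaurant.example"))  # session-token-abc
```

The point of the design is that even a fully hijacked agent can only ever act on the one service it was provisioned for, and the audit log shows exactly what it touched.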
We need a way to grant an AI agent authorized access to various services. Traditional authentication methods are a poor fit for agents: if you decide it’s unsafe to let an agent use your own account, you end up creating a second account on every service, either with your own data or with fake data, which will often break the terms of service. Even when it doesn’t, you’re managing multiple email addresses or something else weird.
To address this issue, I think a new authentication framework is needed (or something built on top of OAuth, etc.). The framework could include some or all of the following components:

- Agent identity: a way to register an agent as an agent, tied to the human or business accountable for it
- Delegated, scoped permissions: the human grants the agent specific actions on specific services, not blanket account access
- Short-lived, revocable tokens, so access can be cut off the moment something looks wrong
- An audit trail of every action taken under the agent’s identity
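As a rough sketch of what such delegation might look like, here’s a toy HMAC-signed token that names the agent, the service, the allowed scopes, and an expiry. The claim names and flow are assumptions for illustration, not an existing standard (a real version would likely build on OAuth token exchange or JWTs):

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"issuer-signing-key"  # held by the (hypothetical) auth provider

def issue_agent_token(agent_id: str, service: str, scopes: list, ttl: int = 300) -> str:
    """Sign a short-lived, scoped grant delegating access to an agent."""
    claims = {"agent": agent_id, "service": service,
              "scopes": scopes, "exp": time.time() + ttl}
    payload = base64.urlsafe_b64encode(json.dumps(claims).encode())
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{sig}"

def verify_agent_token(token: str, service: str, scope: str) -> bool:
    """Check the signature, then the service, scope, and expiry claims."""
    payload, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return (claims["service"] == service
            and scope in claims["scopes"]
            and time.time() < claims["exp"])

token = issue_agent_token("agent-1", "restaurant.example", ["order:create"])
print(verify_agent_token(token, "restaurant.example", "order:create"))   # True
print(verify_agent_token(token, "restaurant.example", "account:delete")) # False
```

The key property is that the service can verify exactly which agent is acting, on whose behalf, and with what permissions, without the agent ever holding the human’s real credentials.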
One of my favorite ideas for enabling AI agents to perform more complex tasks is automated API-ification of services that currently lack one. A good example is the restaurant industry, where menus and prices change constantly, and most restaurants aren’t going to develop and maintain their own API. But there’s an opportunity for a third party to come in and do it for them.
A company could use an LLM with vision capabilities to parse restaurant menus from various sources, such as websites, social media, or even photos of physical menus in reviews. By using vision models, the company could extract relevant information from the menus and present it in a standardized API format. This API would then be accessible to AI agents, allowing them to query the latest menu items, prices, and specials from any restaurant.
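For illustration, the standardized output of such a service might look something like this. The schema and the `normalize_menu` helper are hypothetical; in practice the raw dicts would come from a vision model’s structured output rather than being hard-coded:

```python
from dataclasses import dataclass, asdict

@dataclass
class MenuItem:
    name: str
    price_cents: int          # normalize all prices to integer cents
    description: str = ""

@dataclass
class Menu:
    restaurant: str
    items: list

def normalize_menu(restaurant: str, raw_items: list) -> Menu:
    """Turn loosely extracted {name, price} dicts into a uniform schema."""
    items = [
        MenuItem(
            name=i["name"].strip(),
            price_cents=round(float(str(i["price"]).lstrip("$")) * 100),
            description=i.get("description", ""),
        )
        for i in raw_items
    ]
    return Menu(restaurant=restaurant, items=items)

# e.g. what a vision model extracted from a photo of a physical menu
raw = [{"name": " Margherita Pizza ", "price": "$12.50"},
       {"name": "Garlic Bread", "price": 4.0}]
menu = normalize_menu("Example Pizzeria", raw)
print(asdict(menu)["items"][0])  # {'name': 'Margherita Pizza', 'price_cents': 1250, ...}
```

Once every restaurant is normalized into one schema like this, an agent can query any of them the same way, which is the whole value of the third-party API layer.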
This is just one example. There are other industries where similar API-ification would unlock new AI agent functionality.
So yeah, getting human-like capabilities in our AI agents securely will require new infrastructure. We need to create hardened sandboxes for secure UI access, potentially build authentication protocols made specifically for AI agents, and start chipping away at the API problem.
Which part will you build?
Thanks,
- Joseph
To know when I drop a new post, join my email list.