2024 has been the year of artificial intelligence (AI), so much so that most tools have added AI capabilities to improve their customers' productivity and overall experience.
At our recent conference, Dr. David Yang, co-founder of Newo.ai, took us through a relatively new concept: AI digital employees (human-like, autonomous AI agents).
These agents take on complex workflows, giving you back time (and money) to focus on the things that matter most.
This article is based on that talk and covers:
- What AI agents and AI digital employees are and how they function in modern businesses.
- How digital employees can augment human capabilities and improve productivity.
- The key benefits of browserless technology for deploying AI agents.
What are AI digital employees, and how are they different from regular AI agents?
AI agents help you carry out specific tasks, answer questions, and automate processes. They range from simple rule-based bots to complex AI systems. They aren't necessarily autonomous, though they can be. Autonomous AI agents, on the other hand, handle these tasks with little human intervention.
AI digital employees are a special, elite class of AI agents: agents that have “human-like interfaces in the workplace.” For example, if you book a restaurant table by calling a receptionist, a digital employee can take over the receptionist's role and handle the entire call end to end.
These intelligent agents can:
- Communicate over the phone
- Send and receive SMS
- Correspond via email
- Engage in conversations through corporate messengers
They can also operate web browsers, simulating keyboard and mouse actions to retrieve and enter data in any existing corporate system (CRM, ERP, finance, booking, helpdesk, etc.) without costly integrations.
They don't need expensive integrations because they aren't integrated at all; like a person, they're hired to do a job. Hence the term “digital employees.”
From a technical standpoint, this class of agents is characterised by four elements:
- Presence in the physical world: These agents can function within physical environments, interacting with users via kiosks, phone calls or embedded systems.
- Omnichannel capabilities: They can communicate with customers across various platforms, including phone, email, and chat, creating a seamless experience.
- Omni flow capabilities: They handle complex workflows that involve switching between multiple tasks or systems without losing context.
- Omni user capabilities: These agents manage interactions with multiple users simultaneously, ensuring that service quality is consistent across all conversations.
In short, they're designed to fit into existing work environments and perform tasks that traditionally require human workers: working across multiple platforms, handling complex workflows, and engaging with users in a human-like way.
To make them work for your business, though, you need a platform that scales. Newo.ai, for example, offers features like:
- Low-code development environments
- Pre-built workflows for standard business processes
- Integration with popular business tools and services
- Scalable architecture to support growing business needs
Digital employees can also be customised at multiple levels:
- Business-level design: Allows non-technical users to create workflows using pre-built modules and visual interfaces.
- Low-code level design: Provides more advanced customisation options for users with basic programming knowledge.
They lower the barrier to entry and let businesses of all sizes take advantage of advanced technologies to improve their business processes. For instance, digital employees can be used in restaurants to handle table bookings, take food orders, and manage large group reservations, allowing human staff to focus on more complex, value-driven tasks.
Types of AI agents
Depending on their complexity and how they make decisions, AI agents generally fall into these common types:
1. Simple reflex agents
Simple reflex agents act only on the current input, with no memory of previous inputs. A set of condition-action rules decides the output: the agent perceives the current state of its input or environment, finds a matching rule, and performs the associated action.
Example: Basic email spam filters that block certain messages.
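To make the condition-action idea concrete, here is a minimal Python sketch of a rule-based spam filter. The rules, the `email` dictionary shape, and the `@spam.example` domain are hypothetical; real spam filters use far richer signals.

```python
# Hypothetical condition-action rules, checked in order. Each condition looks
# only at the current email; the agent keeps no memory between emails.
RULES = [
    (lambda email: "lottery" in email["subject"].lower(), "block"),
    (lambda email: email["sender"].endswith("@spam.example"), "block"),
]


def simple_reflex_filter(email: dict) -> str:
    """Return the action of the first rule whose condition matches."""
    for condition, action in RULES:
        if condition(email):
            return action
    return "deliver"  # default action when no rule fires
```

For instance, `simple_reflex_filter({"subject": "You won the LOTTERY", "sender": "a@b.com"})` returns `"block"`, while an ordinary meeting email falls through to `"deliver"`.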
2. Model-based reflex agents
Model-based reflex agents maintain an internal state that tracks the parts of their environment they can't currently observe. When they perceive the current state, they update that internal state using a model of how the environment behaves, then act on it.
Example: Estimated time of arrival (ETA) apps that monitor traffic conditions and update your ETA as you travel.
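A toy version of the ETA example shows what "internal state" means in practice. This is a sketch under a stated assumption: the agent's model of the world is simply that traffic speed changes gradually, so it blends each new speed reading into a running estimate (exponential smoothing). The class name and smoothing factor are illustrative.

```python
class EtaAgent:
    """Toy model-based reflex agent that tracks an estimated travel speed."""

    def __init__(self, initial_speed_kmh: float, smoothing: float = 0.3):
        self.speed_kmh = initial_speed_kmh  # internal state, persists between readings
        self.smoothing = smoothing          # weight given to each new observation

    def perceive(self, observed_speed_kmh: float) -> None:
        # Assumed world model: speed drifts gradually, so blend the new
        # reading into the running estimate instead of replacing it.
        self.speed_kmh = (
            self.smoothing * observed_speed_kmh
            + (1 - self.smoothing) * self.speed_kmh
        )

    def eta_minutes(self, remaining_km: float) -> float:
        # Act on the internal state, not just on the latest reading.
        return remaining_km / self.speed_kmh * 60
```

Starting at 60 km/h, a single 30 km/h reading moves the estimate to 51 km/h rather than halving it, so one brief slowdown doesn't make the ETA jump wildly.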
3. Goal-based agents
Goal-based agents take the future into account and work toward a goal you configure. They perceive the current state, consider possible sequences of actions, and choose one that achieves the goal.
Example: Chess-playing AI — its goal is to win the game (checkmate the opponent’s king).
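Chess is far too large to show here, so the sketch below swaps in a toy domain: reach a target number from a start number using two actions, "add one" and "double". Breadth-first search plays the role of "considering possible action sequences" and returns the shortest one that reaches the goal. The domain and action names are purely illustrative.

```python
from collections import deque
from typing import List, Optional

# Toy domain standing in for chess: two actions on non-negative integers.
ACTIONS = {
    "add_one": lambda n: n + 1,
    "double": lambda n: n * 2,
}


def plan(start: int, goal: int) -> Optional[List[str]]:
    """Breadth-first search for the shortest action sequence from start to goal."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        for name, act in ACTIONS.items():
            nxt = act(state)
            # For non-negative n neither action decreases n, so any state
            # above the goal is a dead end and can be pruned.
            if nxt <= goal and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [name]))
    return None  # goal unreachable from start
```

`plan(1, 10)` returns `["add_one", "double", "add_one", "double"]` (1 → 2 → 4 → 5 → 10), and `plan(5, 3)` returns `None` because neither action can reduce the number.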
4. Utility-based agents
They're similar to goal-based agents, but the key difference is that they use a utility function to measure how desirable different states are. This lets them make trade-offs between conflicting goals and handle uncertainty.
These agents perceive the current state of the environment, consider possible action sequences, and choose the one that maximises the utility function.
Example: Stock-trading AI that balances multiple factors, such as profit potential and risk, before deciding whether to buy, sell, or hold a stock.
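Here is a deliberately simplified sketch of the trading example. It scores single actions rather than whole sequences, and it assumes a basic utility of "expected return minus a risk penalty". The `Option` fields and all numbers are hypothetical, and this is not investment logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Option:
    action: str             # "buy", "sell" or "hold"
    expected_return: float  # hypothetical expected profit, in percent
    risk: float             # hypothetical risk score, e.g. volatility in percent


def utility(option: Option, risk_aversion: float) -> float:
    # Simple mean-risk trade-off: reward expected profit, penalise risk.
    return option.expected_return - risk_aversion * option.risk


def choose_action(options: List[Option], risk_aversion: float) -> str:
    # Pick the option with the highest utility.
    return max(options, key=lambda o: utility(o, risk_aversion)).action
```

With `options = [Option("buy", 8.0, 10.0), Option("sell", 2.0, 1.0), Option("hold", 0.0, 0.0)]`, a low `risk_aversion` of 0.2 picks `"buy"`, while 1.0 picks `"sell"`. The same goal (make money) yields different decisions depending on how the utility function weighs risk, which a plain goal-based agent can't express.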
5. Autonomous agents
Autonomous agents operate independently, making decisions and taking actions without constant human oversight. Unlike the previous types, they can also adapt to new situations. They continuously perceive their environment, process information, and take the actions needed to achieve their primary goal.
Example: Newo.ai's digital employees, which make independent decisions to achieve the goal you give them, such as booking a table or ordering food.
6. Collaborative agents
Collaborative agents are designed to work together with other agents (AI or human) to achieve shared goals. These agents share information, coordinate actions, and solve problems together.
They typically communicate to share information and goals, coordinate and agree on actions, and adapt their behaviour to what other agents do.
Example: Smart traffic management systems that use multiple AI agents to optimise traffic flow. Each intersection has an agent controlling its traffic lights, and the agents coordinate to decide which signal each one displays.
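The coordination idea can be sketched with intersection agents along one east-west road that pass their decision to the next intersection downstream. The "green wave" rule below is a hypothetical illustration of agents adapting to each other, not how any particular traffic system works: an agent follows its upstream neighbour's east-west green unless its own north-south queue is much longer.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class IntersectionAgent:
    name: str
    queue_ns: int  # cars waiting north-south
    queue_ew: int  # cars waiting east-west

    def local_preference(self) -> str:
        # Without coordination, serve whichever direction has the longer queue.
        return "NS" if self.queue_ns >= self.queue_ew else "EW"


def coordinate(corridor: List[IntersectionAgent], threshold: int = 5) -> Dict[str, str]:
    """Agents along an east-west corridor decide in order, each told its
    upstream neighbour's choice. After an upstream east-west green, the next
    agent extends the green wave unless its north-south queue is longer by
    more than `threshold` cars."""
    decisions: Dict[str, str] = {}
    upstream: Optional[str] = None  # message received from the previous agent
    for agent in corridor:
        choice = agent.local_preference()
        if upstream == "EW" and agent.queue_ns - agent.queue_ew <= threshold:
            choice = "EW"
        decisions[agent.name] = choice
        upstream = choice  # share the decision with the next agent
    return decisions
```

With three intersections A (2 NS, 10 EW), B (6 NS, 3 EW) and C (20 NS, 1 EW), B would choose north-south on its own but joins A's green wave, while C's much longer north-south queue overrides the message.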
7. Copilot agents
These agents assist or augment humans rather than operating independently. They provide suggestions, automate routine tasks, and help improve decision-making.
The agent observes your input within the required context and provides relevant suggestions or information. It also has the ability to learn from your feedback and preferences.
Example: GitHub Copilot, an AI-powered code completion tool that helps developers write and test code with suggestions tailored to their coding style and preferences.
How do AI autonomous agents work?
Autonomous AI agents work as a multi-agent system. In this architecture, a higher-level "super agent" directs a set of specialised sub-agents, which lets the system handle complex tasks by breaking them into smaller chunks, each managed by a sub-agent.
The system as a whole relies on:
- A multi-agent architecture
- Browserless’s technology
- Natural language processing
- Omnichannel capabilities
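The super-agent pattern can be sketched in a few lines of Python. This is an illustration under simplifying assumptions: the sub-agents are hypothetical plain functions, and the request arrives already split into subtasks, whereas a real super agent would more likely use a language model to decompose it.

```python
from typing import Callable, Dict, List


# Hypothetical sub-agents, each specialised in one narrow task.
def booking_agent(task: dict) -> str:
    return f"Booked a table for {task['party_size']} at {task['time']}"


def ordering_agent(task: dict) -> str:
    return f"Ordered {', '.join(task['items'])}"


class SuperAgent:
    """Routes each subtask of a request to the matching specialised sub-agent."""

    def __init__(self) -> None:
        self.sub_agents: Dict[str, Callable[[dict], str]] = {
            "booking": booking_agent,
            "ordering": ordering_agent,
        }

    def decompose(self, request: dict) -> List[dict]:
        # Assumption: the request already lists its subtasks.
        return request["subtasks"]

    def handle(self, request: dict) -> List[str]:
        results = []
        for task in self.decompose(request):
            agent = self.sub_agents.get(task["type"])
            if agent is None:
                # No specialist available: escalate instead of guessing.
                results.append(f"No sub-agent for {task['type']}; escalating to a human")
            else:
                results.append(agent(task))
        return results
```

A request containing a booking for four at 19:00 and an order for pizza and salad comes back as two results, one from each sub-agent. An unknown task type is escalated rather than silently dropped.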
Browserless lets these agents interact with web-based systems through managed headless browsers, without needing a full browser interface of their own. You can control those browsers with libraries such as Playwright or Puppeteer, for example for test generation or visual analysis. The main benefit is that agents run faster and use fewer resources.
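As a rough illustration of driving a remote browser instead of a local one, here is a Python sketch using Playwright's `connect_over_cdp`. The endpoint host and the `token` query parameter are assumptions about how the hosted service authenticates; check the Browserless documentation for the exact connection URL for your account.

```python
from urllib.parse import urlencode


def build_ws_endpoint(base_url: str, token: str) -> str:
    """Append the API token as a query parameter (assumed auth scheme)."""
    return f"{base_url}?{urlencode({'token': token})}"


def fetch_title(ws_endpoint: str, url: str) -> str:
    """Open a page in a remote browser and return its title."""
    # Imported here so the URL helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # Attach to a remote Chromium over the Chrome DevTools Protocol
        # instead of launching a browser on this machine.
        browser = p.chromium.connect_over_cdp(ws_endpoint)
        try:
            page = browser.new_page()
            page.goto(url)
            return page.title()
        finally:
            browser.close()
```

You would call it as `fetch_title(build_ws_endpoint("wss://<your-browserless-endpoint>", "YOUR_API_TOKEN"), "https://example.com")`. The agent's process only holds a WebSocket connection; the browser itself runs on the remote pool.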
In a restaurant, for instance, AI agents using browserless technology can handle:
- Table booking: Integrating with systems like OpenTable or Yelp, digital employees can manage reservations, check availability, and confirm bookings with customers.
- Food ordering: Processing takeout or delivery orders, including special requests and dietary requirements.
- Large group orders: Handling complex orders for events or large parties, ensuring accuracy and efficiency.
That said, it's their ability to understand natural language that sets these agents apart from the other types. Because they can generate human-like responses in text and voice, you can offload routine tasks to them. Like human staff, they can adapt to unexpected situations, such as a customer giving the wrong input, and fall back on error-handling mechanisms when something goes wrong.
Because they're omnichannel, they can handle queries across multiple channels (phone, email, chat, etc.) without losing context, all in real time, much like a normal human interaction.
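One simple way to keep context across channels is to store a single conversation history per customer, whatever channel each message arrived on. The sketch below assumes the customer has already been identified across channels (matching a phone number to an email address is a separate problem), and the class and field names are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Message:
    channel: str  # e.g. "phone", "sms", "email", "chat"
    text: str


class ConversationStore:
    """One shared history per customer, regardless of the channel used."""

    def __init__(self) -> None:
        self._history: Dict[str, List[Message]] = {}

    def add(self, customer_id: str, channel: str, text: str) -> None:
        self._history.setdefault(customer_id, []).append(Message(channel, text))

    def context(self, customer_id: str, last_n: int = 5) -> List[str]:
        # Most recent messages from every channel, oldest first, ready to be
        # included in the agent's next prompt.
        if last_n <= 0:
            return []
        recent = self._history.get(customer_id, [])[-last_n:]
        return [f"[{m.channel}] {m.text}" for m in recent]
```

If a customer books by phone and then texts a change, the SMS handler sees the phone request in `context()`, so the agent can answer "moved to 7:30" instead of asking for the booking details again.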
Roadblocks to expect with AI agents
AI agents are helpful, but you can expect to run into a few challenges. Here are some of the most common ones:
Non-deterministic behaviour
Since these agents interact with external systems like third-party software or websites, they could behave unpredictably.
A service outage, or a change to a user interface or API, can have downstream effects. This becomes a real problem when you're running hundreds of tasks at scale, because it's hard to pinpoint what went wrong.
Let's say you're using an agent to book a flight, and it has to deal with a new popup on the airline's website that asks for vaccine information or details for a coupon code. If you haven't added the necessary steps to tackle these popups in your workflow, the booking might not happen, or you could end up with booking errors.
To prevent such issues, build in error-handling mechanisms and fallback scenarios.
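A common pattern for this is "retry with backoff, then fall back". The sketch below assumes that failures worth retrying (a timeout, an outage, an unexpected popup the step couldn't dismiss) are raised as a hypothetical `TransientError`; anything else propagates immediately. The fallback might hand the task to a human.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying."""


def run_with_fallback(
    step: Callable[[], T],
    fallback: Callable[[], T],
    attempts: int = 3,
    base_delay_s: float = 0.5,
) -> T:
    """Retry a flaky step with exponential backoff, then fall back."""
    for attempt in range(attempts):
        try:
            return step()
        except TransientError:
            if attempt < attempts - 1:
                # Wait 0.5 s, 1 s, 2 s, ... between attempts.
                time.sleep(base_delay_s * 2 ** attempt)
    # Every attempt failed: take the fallback path instead of failing silently.
    return fallback()
```

Making the fallback explicit means a failed flight booking ends in a known state (for example, "escalated to a human") rather than a half-finished booking that's hard to trace when you're running hundreds of tasks.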
Multi-step workflows
Complex agents often run workflows that require a lot of back-and-forth between multiple users or systems. The longer these interactions run, the harder it becomes for the agent to maintain context across them.
For example, an agent helping a user with a mortgage application may need to pull financial data from several systems, such as credit checking, underwriting, and your own application systems. Doing that requires the right context and decision-making flow.
But what do you do if one step fails or there isn't enough data to process the application? Keep these possibilities in mind while building your agents.
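One way to keep those possibilities visible is to run each step over a shared context and stop with an explicit status when a step fails or lacks data. In this sketch the steps, field names, and the "loan up to four times income" rule are hypothetical placeholders, not real lending criteria.

```python
from typing import Callable, List, Tuple


class MissingData(Exception):
    """Raised when a step lacks the inputs it needs."""


Step = Callable[[dict], None]


def require(ctx: dict, *keys: str) -> None:
    missing = [k for k in keys if k not in ctx]
    if missing:
        raise MissingData(", ".join(missing))


# Hypothetical steps standing in for calls to real credit and underwriting systems.
def credit_check(ctx: dict) -> None:
    require(ctx, "applicant_id")
    ctx["credit_ok"] = True  # placeholder: a real step would query a credit bureau


def underwriting(ctx: dict) -> None:
    require(ctx, "credit_ok", "income", "loan_amount")
    # Placeholder rule, not a real lending criterion.
    ctx["approved"] = ctx["credit_ok"] and ctx["loan_amount"] <= 4 * ctx["income"]


MORTGAGE_STEPS: List[Tuple[str, Step]] = [
    ("credit_check", credit_check),
    ("underwriting", underwriting),
]


def run_workflow(steps: List[Tuple[str, Step]], ctx: dict) -> dict:
    """Run steps in order over one shared context, stopping at the first failure."""
    for name, step in steps:
        try:
            step(ctx)
        except MissingData as exc:
            # Not enough data: pause and ask the user instead of guessing.
            return {"status": "needs_info", "failed_step": name, "missing": str(exc), "context": ctx}
        except Exception as exc:  # any other failure in a downstream system
            return {"status": "error", "failed_step": name, "error": repr(exc), "context": ctx}
    return {"status": "complete", "context": ctx}
```

Because the partial context is returned with the status, the agent can resume from the failed step once the user supplies the missing income figure, instead of restarting the whole application.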
Optimising token usage and response time
As you scale your AI usage, token usage and response times can become a problem. Even if you run the same process, a growing website and user base means each query involves more actions over time.
This means more data to parse, more resource usage, and longer response times.
That's why your workflows should include only the necessary steps. Here are a few other ways to optimise token usage:
- Cache frequently used information
- Use a tiered response system
- Use smaller, task-specific models if appropriate
- Use shorter and more precise prompts
- Request more efficient output formats (bullets, tables)
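The first two ideas combine naturally. The sketch below reads "tiered response system" as: answer known questions from a cheap lookup first, call the model only when needed, and cache model answers so repeated questions cost nothing. That reading, the FAQ content, and `call_model` (a stand-in for a real language model call) are assumptions for illustration.

```python
from functools import lru_cache

# Hypothetical tier-1 answers that cost no model tokens.
FAQ = {
    "what are your opening hours?": "We're open from 11:00 to 22:00 every day.",
}

CALLS = {"model": 0}  # counts (simulated) model calls


def call_model(prompt: str) -> str:
    # Stand-in for a real language model call; this is where tokens are spent.
    CALLS["model"] += 1
    return f"(model answer to: {prompt})"


def normalise(question: str) -> str:
    # Lower-case and collapse whitespace so near-identical questions share a cache entry.
    return " ".join(question.lower().split())


@lru_cache(maxsize=1024)
def _answer(question: str) -> str:
    if question in FAQ:          # tier 1: canned answer
        return FAQ[question]
    return call_model(question)  # tier 2: the model, memoised by lru_cache


def answer(question: str) -> str:
    return _answer(normalise(question))
```

Asking "Do you have vegan options?" and then "do you have vegan options?" triggers only one model call. In production you would also want cache expiry, since answers such as availability go stale.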
Reducing latency for verbal communication
In human-to-human conversation, the gap between one person finishing an utterance and the other responding ranges from about -280 ms (a negative value means the reply overlaps the end of the previous utterance) to +758 ms, with an average of about +239 ms. Most AI agents take much longer because they have to process each query. If a response doesn't start within roughly 500 ms, it can sound unnatural and put users off.
For example, if you ask the agent to book a table for four people, you don't want to wait a seemingly long time for an answer. It's awkward, and you'll wonder whether the bot understood you.
So, to prevent this, consider using methods like:
- Optimising speech recognition and synthesis algorithms
- Using predictive processing
- Using edge computing for faster local processing
- Developing better natural language models
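To decide which of these optimisations matters most, it can help to split a voice reply into stages and check the total against the roughly 500 ms target. This sketch assumes the stages run strictly one after another (no streaming overlap), and the stage names and millisecond figures in the usage example are hypothetical, not measurements.

```python
from typing import Dict, Tuple

TARGET_MS = 500  # rough threshold for a natural-sounding reply


def check_latency(stages_ms: Dict[str, float], target_ms: float = TARGET_MS) -> Tuple[float, bool, str]:
    """Return total response time, whether it meets the target, and the slowest stage."""
    total = sum(stages_ms.values())
    slowest = max(stages_ms, key=stages_ms.get)
    return total, total <= target_ms, slowest
```

With hypothetical timings of 150 ms for speech-to-text, 400 ms for the language model, and 120 ms for text-to-speech, `check_latency` reports 670 ms, over budget, with the language model as the slowest stage. Cutting that stage to 200 ms brings the total to 470 ms.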
AI agents are the future
AI agents are transforming how businesses of all sizes operate, and they're clearly here to stay. If you build agents that follow the workflows you already use, you can expect cost savings, productivity gains, and more satisfied employees and customers in the long run.
Solutions like Newo.ai make these agents easy to build, and you need only minimal coding skills to get started.
So, why not give it a shot?
Want an unlocked browser for your AI to interact with?
Browserless offers a pool of managed browsers, ready to be controlled through libraries such as Puppeteer and Playwright, or by AI agents.
Give your AI the ability to browse live websites to collect information or submit forms and purchases. We even offer stealth options such as our /unblock API to help you get past bot detectors.