Copilot Vision takes the generative AI concept in a new direction: Instead of creating text or images based on prompts, it can understand and react to visual input and provide context and explanations. Copilot Vision is currently in limited preview in the Edge web browser, which runs on Android, iOS, macOS, and Windows.
I got to try Copilot Vision firsthand, and it’s like nothing you’ve ever seen in a web browser. The Google Lens feature in Chrome bears a slight resemblance, letting you highlight objects on a page and get search results in a side panel, but it’s not conversational. Copilot Vision, by contrast, is a real browsing companion. It takes in everything visual and textual on a page and verbally converses with you about it. I'm here to walk you through how to get it and how it works.
How to Set Up Copilot Vision
For now, Copilot Vision works only for select Copilot Pro subscribers ($20 per month) and is entirely opt-in. Microsoft hasn’t stated whether the feature will be available to free users. Upon first use, you must go through a simple startup process. Since the feature is speech-based (even though it’s called Vision), you need to enable your mic for it. Once you hit the Copilot Vision button on the final page of the setup process, the lifelike AI speech of Copilot Voice welcomes you. Clicking the settings gear icon gives you a choice of four voice personalities: Canyon, Grove, Meadow, or Wave. I stuck with the default Canyon.
Copilot Vision can describe and give you info on what you’re seeing on a site, but also just chat about anything. Here’s a small taste of what it looks and sounds like:
As you can see, the interface for Copilot Vision is completely different from the Edge browser’s Copilot sidebar, which is a standard ChatGPT-like chatbot interface. Vision appears as a bar at the bottom of the web page and collapses to a tiny indicator bar at the bottom center of the screen when not in use.
Getting Started With Copilot Vision
When I first clicked the Copilot Vision screen-share icon, the edges of the browser window took on a slight tint, which is more evident in Light mode than in Dark mode. The mic icon also turned red to show it's active. The friendly Canyon voice said, “Hey Michael, how are you doing today? What’s on your mind? Or should I surprise you with something fun?”
The main Copilot Vision page displays sample things to ask. For example, it showed a web page with four cute dogs and suggested I ask it to “tell me more about these breeds.” The next suggestion was “Summarize these articles,” which showed that the tool works with both images and text on web pages. Then, it demonstrated that it knows about geography by showing four cityscapes and prompting, “Which of these cities has the oldest buildings?” Finally, it said, “Now it’s your turn,” and suggested some sites like Amazon, Target, Tripadvisor, and Wikipedia to get started.
One of the sites Copilot proposed was GeoGuessr, which has its own World Cup. I told it that I wasn’t interested in the soccer World Cup, and it assured me that this wasn't related.
When I stopped speaking with it for a while, I got a message saying, “Sorry, nodded off for a second! Try reconnecting.” That’s actually a good thing, since you don’t want it to keep listening if you accidentally leave it on. When I asked if I could provide feedback to Copilot Vision directly with my voice, I was impressed by its reply: “Your feedback will be passed on to my developers.” If you interrupt Copilot Vision, its voice politely steps aside.
What's It Like to Use Copilot Vision?
Interestingly, Copilot Vision told me that it sees only the part of the browser that you can see, and not anything beyond the visible window. But when I went to a page on Everyday Health, it summarized content far below what I could see. Since the standard Copilot sidebar in Edge can summarize entire web pages, it’s possible that Vision was tapping into it for this info.
When I navigated to my photos on OneDrive, Copilot Vision told me that it couldn’t see what was on the page. I asked why, and the response was, “I can’t see any photos on personal websites.” It did, however, perfectly describe a public photo on Flickr of a group of people enjoying a view of a valley. It also knew that a highly distorted photo was of a spider, which I couldn’t recognize at first. It didn’t see content on Instagram or other social networks.
There’s a mute button to make the AI stop listening to you, but I wish there were a button to silence it when it goes on a bit too long about background information on a page or in response to a query. Instead, you can say, “Quiet!” This stops it from speaking and from seeing what you're browsing. That said, it remained active one time in testing when I told it to stop watching. In that case, you can simply hit the X in its control panel at the bottom of the screen to close it.
If you close Copilot Vision, you see a text box at the bottom of the web page in which you can type queries to the regular Copilot; its answers appear in Edge’s sidebar. You can minimize this text box further, and a discreet, thin bar appears along the bottom of the browser, from which you can open the text box anytime. This bar even appears when you view a website in full screen. To start Copilot Vision again, open the text box and click the screen-share icon to the right of it. When minimized, Copilot Vision doesn't see what's on your screen.
Co-Gaming
Don’t let this section get your hopes up too much: Copilot Vision can be active while you play games on the web, but not as a competitor or partner. Instead, it provides strategy tips or commentary about what’s on the screen. When I started playing Mr. Mine on CrazyGames.com, Copilot Vision knew how to play the game and what the goal was. When I asked how it knew about this little-known game, Copilot said it could read what was on my screen and added, “Yeah, I’ve got a knack for games.”
What Can't Copilot Vision See or Do?
I asked Copilot Vision whether it would be watching if I went on a pornography website and got a thoughtful answer saying, “for safety and privacy, I don’t store or share personal info.” Microsoft’s documentation states that Copilot Vision doesn’t use input for AI training. The tool also doesn't see sensitive or legally protected information, including bank account credentials or passwords, and it doesn't view private websites, meaning any content behind logins or paywalls. When I navigated to the Ally online bank website, Copilot Vision stopped working. I got a message that offered the option to reconnect. However, it wouldn't reconnect until I was on a public website.
Copilot Vision sees only the browser tab you start it on. It can’t open a new web page either. That's good, since you don't want the AI to take action by itself should things go haywire. The tool also can't detect your cursor position, which might actually be a disadvantage for gaming advice.
Copilot Vision technically can’t see video, but it can provide feedback based on still frames from a video. It can’t hear and interpret web page audio either.
Finally, Copilot Vision can't provide a written transcript of your interactions with it; it would be nice to be able to see its answers. The regular Copilot, whether in the Edge sidebar or a separate app, does this.
Is Copilot Vision Useful?
Copilot Vision is excellent at providing detailed spoken descriptions of what’s on a web page, alongside rich background and context. It speaks sort of like a friend who has no opinions of their own—something you may appreciate! It has many protections and limitations, too, most of which are for the better.
However, I’m not sure why Microsoft didn’t make Copilot Vision a part of the existing sidebar-based Copilot. The conversations you have with each have the same shape, after all, with the only difference being Vision’s ability to see what’s on the current web page. I also hope that Microsoft extends Copilot Vision’s capabilities beyond the Edge browser.
For more on Copilot, check out our comparison between Copilot and Copilot+ and cool things you can do with a Copilot+ PC.