Comparing Microsoft Copilot Vision and Google Lens

If you've ever seen something while browsing the web and wondered what it was or where to find it, you're not alone. Browser vision tools like Google Lens in Chrome and Microsoft Copilot Vision in Edge can help, but which is better?

Which tool helps you find things faster?

To compare how quickly and accurately each tool identifies objects and offers helpful suggestions, I ran the same web pages through Google Lens and Copilot Vision. I used a blog post about shirt materials and a post about trees shared in a Facebook group. For the blog post, I focused on the section about Oxford fabric.

Both Google Lens and Copilot Vision recognized the objects immediately, and both identified the tree as a Moringa tree. They differed in how they described the Oxford fabric: Google Lens labeled it Nylon Black Oxford Fabric, while Copilot Vision called it Oxford Shirt and Fabric.

The real difference is in what they do next.

Google Lens is much more useful if you're trying to find or buy what you're looking at. It suggests similar items, provides clickable product links, and directs you to stores and blog posts where you can learn more or make a purchase. The layout, with all results appearing in a sidebar, makes it easy to browse without leaving the page.

Copilot Vision doesn't make product recommendations or direct you to outside sources. It recognizes what's on the page and can answer any questions you have about it. I used it to learn about the health benefits of Moringa oleifera, the tree it identified in the Facebook post. When I asked whether I could grow it in my living room, it replied, 'Probably not.' In this regard, Copilot Vision is useful if you just want to understand something you're looking at, but not great if you're hoping to explore or buy it.

Copy, translate and ask questions about text

Next, I tested both tools on text-related tasks, such as copying, translating, and asking follow-up questions. I used a bilingual German-English learning PDF and a scanned image of the back of one of my ID cards to see how well each tool handled text in different formats.

Google Lens excels at extracting and translating text from images and documents. You can copy text from an image and translate it instantly in the sidebar, which is handy if you're working with documents in a foreign language or want to grab a phone number, name, or ID number without typing it out. You can also use the sidebar to explore search results, get a quick definition, or type in additional keywords to find related information. Everything happens in the sidebar, and you can easily see what you're highlighting.

Copilot Vision, on the other hand, doesn't let you copy text and only provides spoken translations, so you can't copy them or take notes from them as you can with Google Lens. But it handles real-time interactions with text surprisingly well. For example, when I opened my ID photo (which was intentionally upside down), I asked Copilot Vision to read what was on it. It suggested rotating the photo and zooming in. After I did so, it read the text aloud and even provided a German translation when asked.

Which app gives better insights into web pages and PDFs?

Copilot Vision really shines when analyzing entire websites. I tested both tools on a full PDF book and a YouTube video page, specifically MKBHD's WWDC impressions, to see how well they could summarize and provide insights into broader content.

Google Lens is mostly limited to the individual elements you select (text, objects, and images). When you highlight something, it can show you additional information or similar results, but it doesn't process entire pages or PDFs. With a book, for example, the most you can do is highlight the title or the cover to get similar results.

Meanwhile, Copilot Vision is designed to interpret everything on the page at once. It answered questions about the author's main idea, navigated to the main section, and even highlighted relevant sentences (although it started to glitch and reject subsequent requests after that, likely due to the large file size).

Its performance is sometimes slow with large files, but it's clearly built to handle whole pages and longer content.

Google Lens or Copilot Vision, which tool is best for you?

Both Google Lens and Microsoft Copilot Vision are powerful tools, but they serve fundamentally different purposes. Rather than one tool being better all around, the right choice depends on how you use your browser vision tool.

Choose Google Lens if you want to:

  1. Instantly identify products, clothing or plants and find where to buy them
  2. Copy and translate text directly from images, web pages or documents
  3. Use the clean sidebar layout to browse links and definitions without leaving the page
  4. Get fast visual search results and Google AI Overviews without much interaction
  5. Rely on a simple, easy-to-understand tool that's ideal for quick answers about what you see while browsing

Choose Copilot Vision if you want to:

  1. Interact with complex documents, videos, or entire websites
  2. Ask detailed follow-up questions about what you are reading or watching
  3. Summarize, paraphrase, or discuss text with a friendly AI chatbot
  4. Use the dock interface (Highlights) to get additional information based on what's on the screen (if you have a recent update)

Copilot Vision doesn't just identify what you're looking at. It wants to have a full conversation with you about it.
