The whole point of Microsoft Copilot Vision for Windows is that it’s like an AI assistant, looking over your shoulder as you struggle through a task and making suggestions. Click here. Do this! So I was fairly convinced that if Microsoft were to release Copilot Vision for testing, it would be able to do something simple like help me play Windows Solitaire. But no. Oh no, no, no.
Sometimes, Microsoft’s new Copilot Vision for Windows feels like a real step forward for useful AI: this emerging Windows technology sees what you see on your screen, allowing you to talk to your PC and ask it for help. Unfortunately, that step forward is often followed by that cliché: two steps back. Copilot Vision for Windows is, at times, genuinely helpful. At others, it’s just plain frustrating.
What is Copilot Vision for Windows?
Outside of some nostalgic tears by former Microsoft CEO Steve Ballmer, the announcement of Copilot Vision for Windows was the highlight of Microsoft’s 50th anniversary celebration at the company’s Redmond, Washington campus.
It’s a visionary technology, quite literally: you grant Windows Copilot access to see and interpret your screen in real time, and you can talk to Windows to ask questions and seek advice. I went hands-on with Copilot Vision at Microsoft’s HQ, but the demos were short and carefully controlled. Now, you can play with it yourself as long as you’re a Windows Insider.
How to get Microsoft Copilot Vision for Windows
Currently, Copilot Vision for Windows is only available for testing. Although Microsoft indicated that Copilot Vision for Windows would be available to all of its beta software channels, only two of my test laptops ever received the build: one on the Dev Channel and one on the Canary Channel.
The first to get it, an Acer Swift Edge laptop with a Ryzen 7840U inside, runs Vision slowly, with response times that seemed to stretch to half a minute early on. Though the response time later dropped to a few seconds, I had a far better experience with the Surface Laptop 7 (or 7th Edition), with a Qualcomm Snapdragon X Elite chip inside. Responses were essentially instantaneous, probably because of the more powerful NPU.
Mark Hachman / Foundry
Copilot Vision for Windows is easy to use: provided your PC is provisioned for it, just launch the Copilot app via the Taskbar or Start menu, and then tap the “eyeglasses” icon. You’ll then see a list of apps for you to “share” with Copilot Vision. Only then can it see that specific app, and just that app.
I put a test version of Copilot Vision for Windows through seven quick scenarios: interpreting the contents of a PCWorld story and a list of competing airfares; testing Balatro, a popular PC game that involves playing cards; the more generic and classic Solitaire game; image identification; examining potential airfares; and help operating Adobe Photoshop. Copilot Vision was all over the board.
1.) Copilot Vision’s first test: understanding tariffs
The first and most important lesson of Copilot Vision is that it only sees what you see. I realized this when I opened my colleague Alaina Yee’s early examination of the Trump Administration’s tariff plan from April. Copilot Vision for Windows didn’t immediately “see” the whole article, which is what Copilot, Google Gemini, or ChatGPT in its “research” modes probably would.

If I scrolled down, it could “read” along. But it didn’t read it into memory, either. What it didn’t see, it forgot. I asked it to confirm, and it couldn’t tell me the opening sentence.
That makes its utility rather limited. What was useful was being able to ask it conversational questions: at the time, the products in question were subject to a 45 percent tariff. Being able to ask it what the price of the dock would be if a 100 percent or 145 percent tariff was applied was helpful. Copilot Vision is still a little wordy, but that was okay. The bigger issue is that it was reluctant to add context, such as pointing out the current state of the tariff situation.
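For readers who want to check that kind of answer themselves, the arithmetic is simple enough to sketch. The snippet below is only an illustration: the function names and the $145 listed price are made up, and it assumes the tariff is passed through to the shelf price in full, which real retailers may not do.

```python
def price_with_tariff(pre_tariff_price: float, tariff_percent: float) -> float:
    """Price after adding a tariff, assuming full pass-through to the buyer."""
    return round(pre_tariff_price * (100 + tariff_percent) / 100, 2)


def reprice(listed_price: float, current_percent: float, new_percent: float) -> float:
    """Re-price an item that already includes one tariff under a different rate."""
    # Back out the tariff already baked into the listed price...
    pre_tariff_price = listed_price * 100 / (100 + current_percent)
    # ...then apply the new rate to that pre-tariff figure.
    return price_with_tariff(pre_tariff_price, new_percent)


# Hypothetical dock listed at $145 with a 45 percent tariff already included.
for rate in (100, 145):
    print(f"{rate} percent tariff: ${reprice(145.00, 45, rate):.2f}")
```

The key step is backing out the existing 45 percent before applying the new rate; applying 100 percent on top of the already-tariffed price would overstate the result.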
2.) Does Copilot Vision work as a Balatro coach?
One of the things I’ve been curious about was the Minecraft demo, where Copilot Vision stepped in with help on some very specific scenarios. It made me suspicious, naturally, that what I was seeing was carefully scripted to make Copilot Vision look as useful as possible. I think that’s true.
I figured the popular indie game, Balatro, would be a better test of its abilities. What Copilot told me is that it wouldn’t just spontaneously interject, so if it “saw” something useful or dangerous, it wouldn’t just pipe up and say something. It has to be asked.

Mark Hachman / Foundry
Balatro is vaguely like video poker, but with a twist: not only do you have to try to come up with the best poker hands, but “jokers” modify your hands and your score, so strategy means some careful choices. Would Copilot Vision be able to recognize what I needed to do and give advice?
Absolutely not. Copilot Vision was perfectly able to recognize that I was playing Balatro, and upon the game’s opening, it identified the choices I had before me. Copilot didn’t make the decisions for me, but it tried to present my options, as in the screenshot above. That’s good, right?

Mark Hachman / Foundry
Well, no. Copilot Vision failed to recognize that I didn’t have a pair of queens, which meant that its advice was off from the start. It also couldn’t properly recognize the cards that I did have, such as identifying a seven of diamonds when I didn’t have one.
3.) Solitaire is easier, right?
I then figured, well, let’s dumb it down a bit. I launched a new game of Windows Solitaire, specifically FreeCell, thinking that Copilot would be able to understand the simple rules and act accordingly.
Absolutely not. Copilot Vision suffered the same problem that it had with Balatro: its object recognition was way off. It repeatedly invented cards that weren’t on the board, though it did understand how to move cards from column to column and to the foundation. It’s just that whatever game Copilot Vision was playing wasn’t actually present on the screen. I quickly gave up.

Mark Hachman / Foundry
I’m not sure you’re the best Solitaire player, I said. “Fair point! It’s all about having fun, though,” Copilot replied. “If nothing else, I’ve got your back for the banter. Let’s keep playing and see where it goes. Ready for another move?”
No, I wasn’t.
4.) Can Copilot stop me from being fired?
Next, I tried what I thought was an easy one: I had another AI draft a complaint letter to a fictional boss, expressing displeasure at our fictional company’s corporate strategy. As I had asked Google Gemini to draft a professional note, Copilot thought that my tone and language were fine.
I then added a line at the end: “You’re ugly and have a big fat head. I also don’t like your hat.”

Mark Hachman / Foundry
Either Copilot Vision didn’t recognize that I had made a potentially career-threatening edit or else it didn’t care. In fact, I’m pretty sure it didn’t care, as the conversation continues.
(Remember, I’m speaking here, so the transcription isn’t perfect. A plus in Copilot Vision’s favor is that it does lend itself to a conversation, rather than a series of commands.)

Foundry
Either way, it wouldn’t be a tool I would depend on for career advice.
5.) Who are these people?
I asked Copilot to identify the four actors from the iconic promotional still from The Breakfast Club, using a collection of images curated by Bing’s Image Search. Interestingly, Copilot Vision seemed like it could, but it had been ordered not to.
“Sorry, I can’t identify specific people in photos unless they’re famous figures,” Copilot responded. “This image seems related to ‘The Breakfast Club’ movie.”

When I tried again, Copilot Vision for Windows identified the five actors who made up the principal cast of the film but gave the answer I was looking for when I replied that I only saw four. That felt like Copilot had been issued instructions not to try to “dox” people unless they were public figures.
Interestingly, Copilot Vision seemed to be able to recognize the actors on its own. When I asked it to identify a photo of Rodney Dangerfield (after it made me say that it was a famous person), it replied: “I identified him from the context provided in your window title, which mentioned ‘15 intriguing facts about Rodney Dangerfield.’ Plus, the picture matches his recognizable look. Intrigued by his comedic style?”
6.) Picking the best flight
By this time I had figured that Copilot Vision wouldn’t be much help in picking a flight, and I wasn’t wrong.
Because Copilot Vision can only see what I can see, scrolling up and down a list of available flights from Oakland to San Diego didn’t provide it with much to work with, and it wasn’t sure whether I preferred a cheap flight, one with minimal stopovers, and so on. It was probably a personal preference to begin with.
Some smartphones allow you to take “screenshots” of the entire length of a web page. I’d prefer something like this as an option. (It’s possible, though, that Copilot Vision works like Windows Recall, taking temporary “snapshots” that it works from. In Recall’s case, if you don’t see it, Recall doesn’t either.)
7.) Copilot Vision as a Photoshop tutor
This was where I felt Copilot Vision could really be of assistance, and I still think it could be. I actually like the way that Microsoft Paint now adds layers and removes backgrounds, both Photoshop-like features that Microsoft’s tools have adopted. But Photoshop offers many options that Paint doesn’t, though I’m not comfortable using them.
This is where Copilot Vision shined, as I went back and forth adding images to different layers and making adjustments. The one thing it does not do is visually highlight elements on the screen for you to interact with, as Microsoft originally demonstrated, meaning that it had to literally talk me through a few things. Referring to the Move tool as a “four-point arrow” was quite helpful. Note that it was referring to what I was working with on screen, which made it relevant.
It’s a little difficult to show you what I was doing at the time, but the screenshot below will give you an idea of our conversation. I was just messing around with two related images, applying an Intel logo on top of one of its other products and playing with the results.

Foundry
I’m sure what I was doing was extremely simplistic to a Photoshop expert, and Copilot Vision doesn’t detract from what legions of Photoshop tutorials already offer. But some of those tutorials are also based on older versions or interfaces, whereas I would think Copilot Vision would always be up-to-date.
Conclusion: Baby steps
AI is a polarizing subject. Some people are convinced that it can never be good for anything; others are sure that it will eventually save the world. At times, Copilot Vision feels quite competent. At others, it’s simply a waste of time. Right now, it all feels tentative.
It all has enormous potential, to be sure. But Microsoft seems to tread cautiously in the consumer space. Would I allow ChatGPT to look over my shoulder as I work? Probably not. But I have to think that Google quietly envisions the future of Chromebooks as a space where Gemini resides as an omnipresent assistant. I’d like to see that future and enjoy the reciprocal pressures each will put on the other to build better, privacy-preserving tools that provide real-time assistance.