What is Computer Vision?

Computer vision is the branch of AI that gives machines the ability to interpret and understand images and video. It's how phones unlock with your face, cars detect pedestrians, and apps identify plants from a photo.

How It Works:

  1. An image is represented as a grid of pixel values
  2. A model (often a convolutional neural network) scans for patterns
  3. Early layers detect edges and textures; deeper layers detect objects
  4. The model outputs labels, boxes, or pixel masks
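The first three steps can be sketched in plain Python. This is a minimal illustration, not how a real network is implemented: the 5x5 image and the `filter2d` helper are made up for the example, and the hand-written Sobel kernel stands in for the edge filters that a CNN's early layers would learn from data.

```python
# A tiny 5x5 grayscale "image": 0 = black, 255 = white.
# The left columns are dark and the right columns are bright,
# so there is a vertical edge between them.
image = [
    [0, 0, 255, 255, 255],
    [0, 0, 255, 255, 255],
    [0, 0, 255, 255, 255],
    [0, 0, 255, 255, 255],
    [0, 0, 255, 255, 255],
]

# Sobel kernel that responds to left-to-right brightness changes.
# (Assumption: a fixed kernel chosen by hand; a CNN learns its kernels.)
SOBEL_X = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]


def filter2d(img, kernel):
    """Slide the kernel over the image (no padding) and sum the products.

    This is the cross-correlation that CNN "convolution" layers compute.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(img) - kh + 1
    out_w = len(img[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += img[r + i][c + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out


edges = filter2d(image, SOBEL_X)
for row in edges:
    print(row)
```

Each output row is `[1020, 1020, 0]`: the response is large where the window straddles the dark-to-bright boundary and zero where the window covers a uniform region.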

Common Tasks:

  • Image classification: What is in this picture?
  • Object detection: Where are the objects (bounding boxes)?
  • Segmentation: Which pixels belong to which object?
  • Face recognition: Who is this person?
  • OCR: Reading text from images
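To make the difference between detection and segmentation output concrete, here is a small sketch that derives a bounding box from a pixel mask. The mask values and the `mask_to_box` helper are hypothetical, chosen only to illustrate the two output shapes.

```python
# Hypothetical segmentation output: 1 marks object pixels, 0 is background.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]


def mask_to_box(mask):
    """Return (x_min, y_min, x_max, y_max) around the object pixels.

    Returns None when the mask contains no object pixels.
    """
    coords = [
        (x, y)
        for y, row in enumerate(mask)
        for x, value in enumerate(row)
        if value
    ]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


box = mask_to_box(mask)                  # detection-style output
area = sum(sum(row) for row in mask)     # number of object pixels

print(box)   # (1, 1, 4, 3)
print(area)  # 8
```

The box covers 4 x 3 = 12 pixels, but only 8 of them belong to the object. A mask carries that extra per-pixel detail, which is why segmentation is the more precise (and more expensive) task.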

Where It's Used:

  • Healthcare: Analyzing scans and X-rays
  • Automotive: Self-driving perception
  • Retail: Cashier-less checkout
  • Security: Surveillance and access control

FAQ

What model powers most computer vision?

Historically, convolutional neural networks (CNNs) have powered most computer vision systems, though vision transformers (ViTs) are increasingly popular for state-of-the-art results.

Why do lighting and angle matter so much?

Models learn from the data they're shown. If training images differ a lot from real-world lighting, angles, or backgrounds, accuracy can drop. This challenge is called distribution shift.
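A toy example shows the effect. The "model" here is deliberately simplistic (a single brightness threshold), and all images, labels, and helper names are invented for illustration; real models fail in subtler ways, but the mechanism is the same.

```python
def mean_brightness(img):
    """Average pixel value of a grayscale image stored as a list of rows."""
    pixels = [p for row in img for p in row]
    return sum(pixels) / len(pixels)


def fit_threshold(images, labels):
    """'Train' by taking the midpoint between the two classes' mean brightness."""
    pos = [mean_brightness(im) for im, y in zip(images, labels) if y == 1]
    neg = [mean_brightness(im) for im, y in zip(images, labels) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(images, labels, threshold):
    """Fraction of images where 'brighter than threshold' matches the label."""
    correct = sum(
        (mean_brightness(im) > threshold) == bool(y)
        for im, y in zip(images, labels)
    )
    return correct / len(labels)


def darken(img, factor):
    """Simulate poor lighting by scaling every pixel value."""
    return [[p * factor for p in row] for row in img]


# Label 1 = bright object present, label 0 = empty dark scene.
train_images = [
    [[200, 220], [210, 230]],
    [[180, 190], [200, 210]],
    [[40, 50], [60, 50]],
    [[20, 30], [40, 30]],
]
train_labels = [1, 1, 0, 0]

test_images = [
    [[190, 210], [200, 220]],
    [[30, 40], [50, 40]],
]
test_labels = [1, 0]

threshold = fit_threshold(train_images, train_labels)

# Same lighting as training: the threshold separates the classes.
acc_same = accuracy(test_images, test_labels, threshold)

# Same scenes at half the brightness: bright objects now fall below the threshold.
dim_images = [darken(im, 0.5) for im in test_images]
acc_dim = accuracy(dim_images, test_labels, threshold)

print(threshold, acc_same, acc_dim)  # 122.5 1.0 0.5
```

Nothing about the objects changed between the two test sets, only the lighting, yet accuracy falls from 1.0 to 0.5. Training on images that cover the lighting conditions expected in deployment is the usual way to reduce this gap.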
