ARKit is Apple's proprietary software framework that provides developers with the core technologies to build augmented reality (AR) experiences for iOS and iPadOS. It abstracts complex computer vision and sensor fusion tasks, offering robust world tracking, scene understanding, and face tracking capabilities. By leveraging the device's camera and motion sensors, ARKit establishes a correspondence between the digital content and the physical world in real time, enabling virtual objects to interact convincingly with real surfaces and lighting.
ARKit
What is ARKit?
ARKit is Apple's foundational software framework for building augmented reality (AR) applications on iOS and iPadOS devices.
The framework's architecture is built on a small set of spatial computing primitives. Visual-Inertial Odometry (VIO) fuses camera data with input from the device's Inertial Measurement Unit (IMU) to estimate the device's pose in six degrees of freedom (6DoF). ARKit also detects horizontal and vertical planes, creates a coarse world mesh on devices with a LiDAR Scanner, and manages persistent spatial anchors. This lets developers focus on application logic while the framework handles simultaneous localization and mapping (SLAM), sensor calibration, and integration with real-time renderers.
Core Capabilities of ARKit
ARKit is Apple's foundational software framework for iOS and iPadOS that enables developers to build augmented reality experiences by providing robust, device-native spatial computing capabilities.
How ARKit Works: The Technical Pipeline
ARKit is Apple's integrated software framework that enables iOS devices to understand and interact with the physical world, creating a foundation for augmented reality applications.
ARKit's pipeline begins with sensor fusion, combining data from the device's motion sensors and camera. It performs Visual-Inertial Odometry (VIO) to track the device's 6DoF pose in real time: visual features are matched across camera frames, while inertial data fills the gaps during rapid movement or poor lighting, when visual matching is unreliable. This continuous pose estimation is the core of world tracking, allowing virtual objects to stay locked in place.
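The fusion idea described above can be illustrated with a one-dimensional complementary filter. This is a deliberately simplified sketch, not ARKit's estimator, whose internals Apple does not publish. The function name `fuse_heading`, the blend weight `alpha`, and the sample values are all assumptions made for illustration.

```python
def fuse_heading(prev_angle, gyro_rate, dt, visual_angle=None, alpha=0.98):
    """Blend a gyro-integrated angle with an optional camera-derived angle.

    prev_angle   -- previous fused estimate, in degrees
    gyro_rate    -- angular velocity from the IMU, in degrees per second
    dt           -- time step, in seconds
    visual_angle -- angle from visual feature matching, or None when the
                    camera frame is unusable (blur, darkness)
    alpha        -- hypothetical weight given to the inertial prediction
    """
    predicted = prev_angle + gyro_rate * dt  # dead-reckon from the IMU
    if visual_angle is None:
        return predicted  # coast on inertial data alone
    # The visual measurement pulls the estimate back, limiting gyro drift.
    return alpha * predicted + (1.0 - alpha) * visual_angle


angle = 0.0
# (gyro rate in deg/s, visual angle in degrees or None), sampled at 100 Hz
samples = [(10.0, 0.1), (10.0, None), (10.0, 0.3)]
for rate, visual in samples:
    angle = fuse_heading(angle, rate, dt=0.01, visual_angle=visual)
```

The second sample has no usable camera measurement, so the estimate advances on gyro data alone. That mirrors, in miniature, how inertial data carries tracking through a blurred frame.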
Concurrently, ARKit runs scene understanding processes. It performs plane detection to identify horizontal and vertical surfaces and executes light estimation to match virtual lighting to the environment. On devices with a LiDAR Scanner, it also creates a coarse world mesh for physics and occlusion. Processing is optimized for Apple silicon, and some features, such as People Occlusion, rely on the Neural Engine in A12 Bionic and later chips, which lets these computer vision tasks run efficiently on mobile hardware.
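As a rough illustration of how a light estimate might be consumed, the sketch below scales a virtual object's color by an ambient intensity value. It relies on ARKit's documented convention that an ambient intensity of 1000 lumens represents neutral lighting. The function names and the clamp range are assumptions, not part of any Apple API.

```python
NEUTRAL_INTENSITY = 1000.0  # lumens; documented by ARKit as neutral lighting


def light_scale(ambient_intensity, lo=0.0, hi=2.0):
    """Map an ambient intensity estimate to a brightness multiplier.

    lo and hi are hypothetical clamp limits chosen for this sketch.
    """
    scale = ambient_intensity / NEUTRAL_INTENSITY
    return max(lo, min(hi, scale))


def shade(base_rgb, ambient_intensity):
    """Scale an RGB color (components in 0..1) by the estimated brightness."""
    k = light_scale(ambient_intensity)
    return tuple(min(1.0, c * k) for c in base_rgb)


# A room at half the neutral brightness darkens the object by half.
dim_room = shade((0.8, 0.6, 0.4), ambient_intensity=500.0)
```

In a real app a renderer such as RealityKit or SceneKit would usually apply this adjustment to its light sources rather than to individual colors.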
ARKit Evolution: Key Version Capabilities
A technical comparison of core capabilities introduced across major versions of Apple's ARKit framework, highlighting the progression of spatial computing features for iOS.
| Core Capability / Feature | ARKit 1-2 (2017-2018) | ARKit 3-4 (2019-2020) | ARKit 5-6 (2021-2022) | ARKit Latest (2023-Present) |
|---|---|---|---|---|
| World Tracking (6DoF Pose) | | | | |
| Plane Detection (Horizontal) | | | | |
| Plane Detection (Vertical) | | | | |
| Scene Geometry / Mesh Generation | Low-res mesh | Real-time mesh | Real-time mesh | |
| People Occlusion | | | | |
| Motion Capture (Body) | | | | |
| Simultaneous Front & Back Camera | | | | |
| Face Tracking (Front Camera) | | | | |
| Multiple Face Tracking | | | | |
| Image Tracking | | | | |
| Object Scanning & Detection | | | | |
| Location Anchors (GPS + City Data) | | | | |
| Raycasting (Coarse) | Scene understanding | | | |
| Raycasting (Fine) | Scene geometry mesh | | | |
| Collaborative Sessions | | | | |
| Video Textures | | | | |
| Depth API (LiDAR Scanner) | | | | |
| Instant AR (LiDAR) | | | | |
| Room Capture (LiDAR) | | | | |
| Spatial Audio | | | | |
| App Clip Code Tracking | | | | |
| RealityKit Integration | Native support | Native support | Native support | |
| USDZ File Format Support | Native support | Native support | Native support | |
Primary Use Cases and Industries
ARKit provides the foundational spatial computing capabilities that enable a diverse range of augmented reality applications across consumer, enterprise, and industrial sectors by leveraging device sensors for real-time environment understanding.
Gaming & Interactive Entertainment
ARKit is used to create immersive games that blend digital objects with the real world, utilizing world tracking, plane detection, and image recognition for interactive gameplay.
- Pokémon GO: Uses ARKit for improved rendering and placement of Pokémon in the environment.
- Minecraft Earth: Allowed players to build creations on real-world surfaces (the game was discontinued in 2021).
- The Machines: A real-time strategy game where the battlefield is your tabletop, placed using horizontal plane detection.
Industrial Design & Architecture
Professionals use ARKit for design review, on-site visualization, and collaborative planning. It allows architects and engineers to overlay proposed designs, such as building structures or machinery, onto physical job sites for scale verification and client presentations.
- Shapr3D: A CAD tool that uses AR for visualizing 3D models in real space.
- ARki: An interactive real-time augmented reality visualization service for architectural models.
- Use Case: Overlaying HVAC ductwork or electrical conduits onto a construction site to check for clashes.
Education & Training
ARKit creates interactive learning experiences by bringing 3D educational models and historical reconstructions into the classroom or home. It supports procedural training, such as simulating assembly or maintenance tasks on physical equipment.
- Froggipedia: Allows students to interactively explore the anatomy of a frog.
- JigSpace: Provides step-by-step, interactive 3D explanations of how complex objects work.
- Medical Training: Overlaying anatomical models onto mannequins or spaces for surgical planning demonstrations.
Marketing & Live Events
Brands deploy ARKit for interactive advertising, product launches, and enhanced live experiences. This includes creating AR filters, placing branded content in specific locations (geo-anchored experiences), or adding digital layers to physical posters and packaging.
- AMC: Used ARKit to create promotional experiences for its television series The Walking Dead.
- Posters & Packaging: Scanning a movie poster with an app triggers a character to appear in AR.
- Sports Events: Overlaying real-time stats and player information onto the live field view for broadcast.
Healthcare & Medical Visualization
ARKit assists in patient education, surgical planning, and physical therapy. It can visualize complex medical data, such as CT or MRI scans, as 3D models overlaid on the camera view of a patient's body, aiding in diagnosis and pre-operative planning.
- AccuVein: A dedicated near-infrared device, not an ARKit app, that projects a map of veins onto the skin's surface. It is listed here because it applies a similar overlay principle.
- EyeDecide: Uses the camera to simulate the impact of eye conditions on vision.
- Anatomy Visualization: Medical students can walk around and examine detailed, life-sized 3D models of organs.
Frequently Asked Questions
ARKit is Apple's foundational software framework for building augmented reality experiences on iOS and iPadOS devices. These questions address its core capabilities, technical architecture, and role within the spatial computing ecosystem.
What is ARKit and how does it work?
ARKit is Apple's software framework that enables developers to build augmented reality (AR) applications for iOS and iPadOS by providing real-time sensor fusion, world tracking, and scene understanding. It works by fusing data from the device's camera, Inertial Measurement Unit (IMU), and, on supported hardware, the LiDAR Scanner to create a live model of the environment. At its core, ARKit performs Visual-Inertial Odometry (VIO), continuously estimating the device's 6DoF pose (position and orientation) while simultaneously building a spatial map of the surroundings. This allows virtual objects to be placed and anchored realistically in the physical world.
Related Terms in Spatial Computing
ARKit integrates several foundational computer vision and sensor fusion techniques to enable robust augmented reality. Understanding these related concepts is key for developers building advanced spatial applications.
Visual-Inertial Odometry (VIO)
Visual-Inertial Odometry (VIO) is the core tracking technology in ARKit. It fuses data from the device's camera (visual) and Inertial Measurement Unit (IMU: gyroscope and accelerometer) to estimate the device's 6DoF pose (position and orientation) in real time. The IMU provides high-frequency motion data that compensates for rapid device movement or a temporary loss of visual features (e.g., motion blur or a briefly covered lens), keeping tracking stable and continuous. This sensor fusion is what allows virtual objects to stay locked in place as you move your iPhone or iPad.
Plane Detection
Plane detection is ARKit's process of identifying flat horizontal and vertical surfaces in the physical environment, such as floors, tables, and walls. Using data from the camera and motion sensors, ARKit groups feature points that lie on a common surface to find large, continuous planes. This is a prerequisite for placing virtual objects that appear to rest realistically on real-world surfaces. Developers can access properties like the plane's extent (boundaries), alignment (horizontal/vertical), and center to anchor content accurately.
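To make the idea of grouping feature points into a plane concrete, here is a simplified sketch that finds the best-supported horizontal plane by binning points by height. ARKit's actual plane estimator is not public, so this histogram approach is a stand-in. The function name, `bin_size`, `min_points`, and the sample points are hypothetical. The returned `center` and `extent` only mirror the properties named above.

```python
from collections import defaultdict


def detect_horizontal_plane(points, bin_size=0.02, min_points=4):
    """Return (height, center, extent) of the best-supported horizontal plane.

    points     -- iterable of (x, y, z) feature points in metres, y pointing up
    bin_size   -- hypothetical height tolerance used to group points
    min_points -- hypothetical minimum support before a plane is reported
    """
    bins = defaultdict(list)
    for p in points:
        bins[round(p[1] / bin_size)].append(p)  # group by quantised height
    best = max(bins.values(), key=len, default=[])
    if len(best) < min_points:
        return None  # not enough evidence for any plane
    xs = [p[0] for p in best]
    zs = [p[2] for p in best]
    height = sum(p[1] for p in best) / len(best)
    center = ((min(xs) + max(xs)) / 2, height, (min(zs) + max(zs)) / 2)
    extent = (max(xs) - min(xs), max(zs) - min(zs))  # width along x, length along z
    return height, center, extent


# Four points on a tabletop near y = 0.70 m, plus two unrelated points.
cloud = [(0.0, 0.700, 0.0), (1.0, 0.702, 0.0), (0.0, 0.698, 0.5),
         (1.0, 0.700, 0.5), (0.3, 1.2, 0.1), (0.5, 0.0, 0.2)]
plane = detect_horizontal_plane(cloud)
```

Here the two stray points fall into other height bins and are ignored, so the reported plane covers only the tabletop.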
World Tracking & World Mapping
World Tracking is ARKit's continuous process of establishing and maintaining a correspondence between the device's coordinate system and the real world. World Mapping is the creation of a persistent, shareable spatial map of an environment. ARKit generates a sparse point cloud of recognized features and can save this map data. In multi-user AR sessions, this allows multiple devices to share a common coordinate frame, enabling shared experiences where all users see virtual content in the same physical location.
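The notion of a common coordinate frame can be sketched with plain rigid transforms. The example assumes two devices have each located the same physical anchor in their own world coordinates, and re-expresses a point from device A's frame in device B's frame. All function names and poses are hypothetical, and rotation is limited to the vertical axis to keep the math short.

```python
import math


def make_pose(yaw_deg, tx, ty, tz):
    """Build a 4x4 rigid transform: rotation about the vertical (y) axis plus a translation."""
    c, s = math.cos(math.radians(yaw_deg)), math.sin(math.radians(yaw_deg))
    return [[c, 0.0, s, tx],
            [0.0, 1.0, 0.0, ty],
            [-s, 0.0, c, tz],
            [0.0, 0.0, 0.0, 1.0]]


def invert_rigid(m):
    """Invert a rigid transform: transpose the rotation, then negate the rotated translation."""
    r_t = [[m[j][i] for j in range(3)] for i in range(3)]
    t = [-sum(r_t[i][j] * m[j][3] for j in range(3)) for i in range(3)]
    return [r_t[i] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def apply(m, p):
    """Transform a 3D point by a 4x4 matrix."""
    return tuple(sum(m[i][j] * p[j] for j in range(3)) + m[i][3] for i in range(3))


def a_to_b(point_in_a, anchor_in_a, anchor_in_b):
    """Re-express a point from device A's world frame in device B's world frame."""
    in_anchor = apply(invert_rigid(anchor_in_a), point_in_a)  # A world -> anchor
    return apply(anchor_in_b, in_anchor)                      # anchor -> B world


anchor_in_a = make_pose(0.0, 1.0, 0.0, 0.0)   # where A sees the shared anchor
anchor_in_b = make_pose(90.0, 0.0, 0.0, 2.0)  # where B sees the same anchor
shared = a_to_b((2.0, 0.0, 0.0), anchor_in_a, anchor_in_b)
```

Because both devices agree on where the anchor is physically, any content positioned relative to it appears in the same real-world spot for each user.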
Scene Understanding
Scene Understanding in ARKit extends beyond basic plane detection to provide a higher-level semantic and geometric parse of the environment. This includes:
- Scene Geometry: Generating a coarse, real-time mesh of the environment.
- Image Anchoring: Detecting and tracking known 2D images (like posters) in the world.
- Object Anchoring: Detecting and tracking known 3D objects.
- Raycasting: Sending a ray from the screen into the world to find intersections with detected planes or the scene mesh, used for precise object placement.
- Light Estimation: Approximating the environment's ambient light intensity and color temperature to light virtual objects realistically.
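The raycasting item in the list above reduces, in its simplest form, to a ray-plane intersection. The sketch below shows that geometry in isolation. It treats the plane as infinite, whereas a real raycast can also take the detected plane's extent or the scene mesh into account. The function name and sample values are assumptions.

```python
def raycast_to_plane(origin, direction, plane_point, plane_normal, eps=1e-9):
    """Return the point where a ray hits an infinite plane, or None on a miss."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    denom = dot(direction, plane_normal)
    if abs(denom) < eps:
        return None  # ray runs parallel to the plane
    diff = tuple(p - o for p, o in zip(plane_point, origin))
    t = dot(diff, plane_normal) / denom
    if t < 0:
        return None  # plane lies behind the ray origin
    return tuple(o + t * d for o, d in zip(origin, direction))


# Camera 1.5 m above the floor, looking forward and down at 45 degrees.
hit = raycast_to_plane(origin=(0.0, 1.5, 0.0),
                       direction=(0.0, -1.0, -1.0),
                       plane_point=(0.0, 0.0, 0.0),
                       plane_normal=(0.0, 1.0, 0.0))
```

The resulting point is where a virtual object would be placed so that it appears to sit on the floor under the tapped screen location.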
Face Tracking & ARFaceAnchor
ARKit's Face Tracking uses the TrueDepth camera system (on supported iPhones and iPads) to create a detailed 3D model of a user's face in real time. It provides an ARFaceAnchor object that contains:
- Geometry: A topology of 3D vertices representing the face's shape.
- Blend Shapes: A set of coefficients (over 50) that parameterize specific facial expressions like blinking or smiling.
- Transform: The position, orientation, and scale of the face relative to the world.
This enables applications like animated Memoji, virtual makeup try-ons, and facial performance capture for avatars.
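A common way to use blend shape coefficients is to drive a character rig, where each coefficient weights a set of per-vertex offsets. The sketch below shows that weighted sum. ARKit supplies only the coefficients (values from 0 to 1, keyed by names such as `jawOpen`); the neutral mesh and offset data here are hypothetical stand-ins for an app's own avatar model, and the function name is an assumption.

```python
def apply_blend_shapes(neutral, deltas, weights):
    """Deform a neutral mesh by a weighted sum of per-expression vertex offsets.

    neutral -- list of (x, y, z) vertices for the relaxed face
    deltas  -- {shape_name: list of (dx, dy, dz)} offsets at full strength
    weights -- {shape_name: coefficient}, expected in the range 0..1
    """
    result = [list(v) for v in neutral]
    for name, w in weights.items():
        w = max(0.0, min(1.0, w))  # keep coefficients inside 0..1
        for i, d in enumerate(deltas[name]):
            for axis in range(3):
                result[i][axis] += w * d[axis]
    return [tuple(v) for v in result]


# A toy two-vertex "face": one chin vertex and one mouth-corner vertex.
neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
deltas = {
    "jawOpen": [(0.0, -1.0, 0.0), (0.0, 0.0, 0.0)],
    "mouthSmileLeft": [(0.0, 0.0, 0.0), (0.0, 0.5, 0.0)],
}
posed = apply_blend_shapes(neutral, deltas, {"jawOpen": 0.5, "mouthSmileLeft": 1.0})
```

With the jaw half open and a full smile, the chin vertex drops by half its maximum offset and the mouth corner rises by its full offset.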
Collaborative Sessions & Multipeer Connectivity
ARKit supports collaborative sessions where multiple users in the same physical space can share a synchronized AR experience. This is built on two key components:
- ARKit's World Mapping: Each device creates a local map; ARKit merges these into a shared world map when a common frame of reference is established.
- Multipeer Connectivity: Apple's framework for discovering nearby devices and exchanging data over Wi-Fi or Bluetooth. Developers use this to transmit ARWorldMap data or the session's collaboration data (ARSession.CollaborationData) and to synchronize the state (position, animation) of virtual objects across all connected devices, enabling multi-user interaction.

About the author
Prasad Kumkar
CEO & MD, Inference Systems
Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.
His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.