How does Drik process 200+ camera feeds in real time?

Drik uses on-premises GPU infrastructure with RTSP ingest, NVDEC hardware decoding, CUDA preprocessing, and a TensorRT-optimized inference cluster (Dhruva) running YOLOv8+ detection at FP4/INT8 precision. It handles 200+ simultaneous camera feeds on four RTX PRO 5000 GPUs, with a full-path latency of about 17ms per violation event.

What are the five levels of visual reasoning?

The five levels are: L1 Detection (object detection and classification), L2 Recognition (license plate reading and vehicle re-identification), L3 Description (natural language scene understanding), L4 Reasoning (cause-effect analysis and violation detection), and L5 Prediction (anticipating future events using the CVWM vision world model).

What hardware does Drik run on?

Drik runs on on-premises GPU server racks with four RTX PRO 5000 GPUs, using NVDEC hardware decoding, fused CUDA preprocessing kernels, and TensorRT for optimized inference at FP4 and INT8 precision. Data is stored in PostgreSQL, with Redis caching and a 100TB NAS archive.

How does the license plate recognition (ANPR) work?

DrikNetra, the ANPR system, uses YOLOv8-nano for plate detection, multi-frame super-resolution (5 crops with homography alignment and ESRGAN 4x upscaling), CRNN+CTC OCR for text extraction, and Vahan API lookup for vehicle owner, registration, and insurance details.
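A CRNN+CTC recognizer emits a probability distribution over characters at each time step, and a CTC decoder turns that sequence into plate text. The sketch below is a minimal best-path (greedy) CTC decoder, assuming index 0 is the blank symbol; the alphabet and the example plate string are hypothetical, and DrikNetra may well use beam search or a different decoder.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # assumption: index 0 of the alphabet is the CTC blank


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: List[str], blank: int = BLANK) -> str:
    """Best-path CTC decoding: argmax per time step, collapse repeats, then drop blanks."""
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    collapsed = [k for k, _ in groupby(best_path)]  # "11" -> "1", but "1-1" stays two symbols
    return "".join(alphabet[k] for k in collapsed if k != blank)


def one_hot(indices: List[int], size: int) -> List[List[float]]:
    """Build fake per-step probabilities for the example below."""
    return [[1.0 if i == k else 0.0 for i in range(size)] for k in indices]


alphabet = ["<blank>", "M", "H", "1", "2"]  # hypothetical, truncated alphabet
probs = one_hot([1, 1, 0, 2, 0, 3, 3, 0, 3, 4], len(alphabet))
print(ctc_greedy_decode(probs, alphabet))  # MH112
```

The blank between the two runs of "1" is what lets CTC emit a genuinely repeated character instead of collapsing it.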

Architecture Documentation

The world has a billion cameras. None of them can reason.

Drik processes 200+ camera feeds simultaneously on on-premises GPU infrastructure, generating legally admissible violation evidence and city-scale traffic intelligence in real time.

200+ cameras

~17ms full-path latency

FP4/INT8 inference

On-Premises deployment

Reasoning Progress

Five Levels of Visual Reasoning

We measure our AI across five levels of visual understanding — from basic detection to predictive reasoning. Here's where we stand today.

L1 Detect · L2 Recognize · L3 Describe · L4 Reason · L5 Predict

Level 3 achieved — Level 4 in progress

Level 1 (LIVE)

Detect — It sees every object

Every vehicle, every person, every object — detected, classified, and counted in real time across every frame.

Level 2 (LIVE)

Recognize — It reads every plate, every face

License plates extracted. Vehicle makes and models identified. Colors, modifications, damage — all catalogued instantly.

Level 3 (LIVE)

Describe — It describes what it sees in words

Not just labels. Full natural-language descriptions of every scene, every event, every anomaly. Searchable. Queryable.

Level 4 (IN PROGRESS)

Reason — It understands cause and effect

Why did traffic stop? What caused the accident? Which vehicle triggered the chain reaction? The AI builds causal chains.

Level 5

Predict — It anticipates what happens next

Pattern recognition across time and space. Congestion forecasts. Accident risk zones. Before it happens, the system sees it forming.

Product Universe

The Drik Ecosystem

All products are named after Sanskrit stars and cosmic concepts. Dhruva, the Pole Star, powers everything.

ध्रुव

Dhruva

Core Engine LIVE

Pole Star — always present, always on

The foundational inference engine powering all products. SSM-based continuous video reasoning across 200+ camera feeds simultaneously.

अश्विनी

Ashvinī

Traffic Intelligence LIVE

Twin Horsemen of dawn — swift justice

Traffic violation detection, ANPR, challan generation, and enforcement intelligence built on top of Dhruva.

चित्रा

Chitrā

Annotation Tool LIVE

Brightest star Spica — precision in labeling

AI-assisted video annotation with active learning propagation. Open-sourced as DrikLabel.

माया

Māyā

Synthetic Data IN DEV

Cosmic illusion — creates training worlds

Unreal Engine pipeline generating photorealistic synthetic training data with perfect ground truth labels.

अग्नि

Agni

Edge Deploy LIVE

Sacred fire — forges edge engines

Edge deployment kit with TensorRT optimization, FP4/INT8 quantization, and OTA model updates.

Future Horizons

Rohiṇī (Smart City), Vishākhā (Manufacturing), Jyeshthā (Security), Brahmāṇḍa (World Model), Ākāsha (Cloud API)

System Architecture

L1 Container Architecture

Six deployable containers running on-premises — from RTSP ingest to challan delivery, all on a single GPU server rack.

Ingest → Inference → Events + ANPR → Delivery

Under the Hood

Low-Level Architecture

Seven core modules power the Drik pipeline — from raw video ingestion to predictive intelligence.

01

Ingestion Layer

LIVE

RTSP Pool · CUDA Decode · Adaptive Sampler

02

Detection Engine

LIVE

YOLOv8+ · TensorRT FP4 · Feature Pyramid

03

Multi-Object Tracking

LIVE

ByteTrack · Kalman Filter · Re-ID

04

License Plate Recognition

LIVE

DrikNetra · Super-Res · PaddleOCR · Vahan API

05

Violation Rule Engine

LIVE

Rule Engine · Zone Geometry · Signal State

06

Evidence Pipeline

IN DEV

Evidence Packager · Challan Gen · Vehicle ReID

07

CVWM (Vision World Model)

PLANNED

Mamba-2 SSM · FastViT Encoder · ~63M params
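CVWM lists a Mamba-2 state space model (SSM), and Dhruva is described as SSM-based. The appeal for continuous video is that an SSM carries a fixed-size state forward frame by frame, so memory does not grow with stream length. The sketch below shows only the underlying linear recurrence, assuming a diagonal, time-invariant parameterization and one scalar feature per frame; Mamba-2 differs substantially (input-dependent "selective" parameters, a scalar-times-identity A per head, multi-dimensional inputs, and a parallel hardware-aware scan). The function name is hypothetical.

```python
from typing import List, Sequence


def ssm_scan(xs: Sequence[float], a: Sequence[float], b: Sequence[float], c: Sequence[float]) -> List[float]:
    """Run a diagonal linear SSM over a scalar input sequence.

    h_t = a * h_{t-1} + b * x_t   (elementwise; state size N = len(a))
    y_t = sum(c * h_t)
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("a, b and c must share the state size")
    h = [0.0] * len(a)  # state stays O(N) however long the stream runs
    ys = []
    for x in xs:
        h = [ai * hi + bi * x for ai, hi, bi in zip(a, h, b)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


# An impulse decays by the factor a at every step.
print(ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0]))  # [1.0, 0.5, 0.25]
```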

Performance

330x Faster Than Naive

Five layers of optimization turn a 50ms-per-frame baseline into 0.15ms amortized — enough to handle 200+ cameras on a single GPU server.

2.5x frame reduction

Adaptive Sampling

200 cameras at 25 FPS produce 5,000 frames/sec; sampling at 10 FPS reduces that to 2,000. Sampling is motion-adaptive: static scenes drop to 2 FPS.
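A per-camera decimator is one way to implement this. The sketch below takes the 25/10/2 FPS rates from the text, but the motion score, its threshold, and the class and function names are hypothetical. It keeps an integer "frame credit" so that kept frames are spread evenly instead of arriving in bursts.

```python
# Rates from the text; the motion threshold and scoring are hypothetical.
NATIVE_FPS = 25          # camera frame rate
ACTIVE_FPS = 10          # default sampling rate
STATIC_FPS = 2           # sampling rate for static scenes
MOTION_THRESHOLD = 0.02  # hypothetical: fraction of changed pixels that counts as motion


def target_fps(motion_score: float) -> int:
    """Pick the sampling rate for a camera from its current motion score."""
    return ACTIVE_FPS if motion_score >= MOTION_THRESHOLD else STATIC_FPS


class AdaptiveSampler:
    """Per-camera decimator: forwards target_fps out of every NATIVE_FPS decoded frames."""

    def __init__(self, native_fps: int = NATIVE_FPS) -> None:
        self.native_fps = native_fps
        self.credit = 0  # integer accumulator avoids floating-point drift

    def keep(self, motion_score: float) -> bool:
        self.credit += target_fps(motion_score)
        if self.credit >= self.native_fps:
            self.credit -= self.native_fps
            return True
        return False


cameras = 200
print(cameras * NATIVE_FPS, "->", cameras * ACTIVE_FPS, "frames/sec")  # 5000 -> 2000 frames/sec
```

Over 25 decoded frames, a busy camera forwards 10 and a static one forwards 2, which is the 2.5x (and, for static scenes, 12.5x) reduction described above.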

16x throughput gain

Dynamic Batching

Collate 32 frames from different cameras into a single GPU batch. GPU utilization jumps from ~20% to saturated.
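Dynamic batching usually means "fill the batch, but never wait too long for it." The sketch below collects frames from a shared queue that every camera feeds, up to the batch size of 32 from the text. The 5 ms wait cap and the function name are hypothetical, and a production version would stack tensors into pinned GPU memory rather than return a Python list.

```python
import queue
import time
from typing import Any, List

MAX_BATCH = 32       # batch size from the text
MAX_WAIT_S = 0.005   # hypothetical: how long a partial batch may wait for more frames


def next_batch(frames: "queue.Queue[Any]", max_batch: int = MAX_BATCH, max_wait_s: float = MAX_WAIT_S) -> List[Any]:
    """Block until one frame arrives, then gather more until the batch is full or the wait budget runs out."""
    batch = [frames.get()]
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(frames.get(timeout=remaining))
        except queue.Empty:
            break
    return batch


# Frames from many cameras share one queue, so a batch mixes cameras.
q: "queue.Queue[Any]" = queue.Queue()
for i in range(40):
    q.put((f"CAM-{i % 12 + 1:03d}", i))  # (camera id, frame placeholder)
print(len(next_batch(q)), len(next_batch(q)))  # 32 8
```

The wait cap trades a little latency for fuller batches; under heavy load batches fill before the deadline, and at low load the cap keeps single frames from stalling.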

8x speed improvement

FP4 Quantization

Native FP4 Tensor Core support on RTX PRO 5000. Mixed-precision: FP4 backbone + FP16 heads. Less than 2% accuracy loss.
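FP4 here refers to the E2M1 format (1 sign, 2 exponent, 1 mantissa bit), whose magnitudes are limited to 0, 0.5, 1, 1.5, 2, 3, 4 and 6, so weights are rescaled per block before rounding. The sketch below is a quantize-dequantize ("fake quantization") simulation of the kind used to estimate accuracy loss before deployment. The block size of 16, the float scale, and the tie rule are assumptions that differ from the NVFP4/MXFP4 hardware formats, and TensorRT's FP4 path is configured through its own tooling, not this code.

```python
from typing import List, Tuple

# Magnitudes representable in FP4 E2M1.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 16  # assumption: block size; the scale is kept as a Python float here


def quantize_block(xs: List[float]) -> Tuple[float, List[float]]:
    """Scale the block so its largest |x| maps to 6.0, then round each value to the nearest E2M1 magnitude.

    Ties resolve to the smaller magnitude (first grid entry), unlike round-to-nearest-even.
    """
    amax = max((abs(x) for x in xs), default=0.0)
    scale = amax / 6.0 if amax > 0 else 1.0
    codes = []
    for x in xs:
        mag = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        codes.append(mag if x >= 0 else -mag)
    return scale, codes


def fake_quant_fp4(xs: List[float], block: int = BLOCK) -> List[float]:
    """Quantize then dequantize, block by block, to measure FP4 rounding error in full precision."""
    out: List[float] = []
    for i in range(0, len(xs), block):
        scale, codes = quantize_block(xs[i:i + block])
        out.extend(c * scale for c in codes)
    return out


print(fake_quant_fp4([6.0, -3.0, 1.0, 0.4]))  # [6.0, -3.0, 1.0, 0.5]
```

In the mixed-precision setup described above, only the backbone would go through this path; the FP16 heads are left unquantized.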

2-5x additional speedup

TensorRT Compilation

Layer fusion (Conv+BN+ReLU), kernel auto-tuning, memory planning, and multi-stream execution.
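Conv+BN fusion works because inference-mode BatchNorm is a fixed per-channel affine transform, so it can be folded into the preceding convolution's weights and bias, leaving ReLU to run in the same kernel's epilogue. TensorRT performs fusions like this during engine building; the sketch below only shows the arithmetic, using a 1x1 convolution (a matrix-vector product) for brevity. The same per-output-channel scaling applies to larger kernels. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def fold_bn(w: Matrix, b: Sequence[float], gamma: Sequence[float], beta: Sequence[float],
            mean: Sequence[float], var: Sequence[float], eps: float = 1e-5) -> Tuple[Matrix, List[float]]:
    """Fold BatchNorm (inference statistics) into the conv: w' = w*s, b' = (b - mean)*s + beta."""
    w_f, b_f = [], []
    for o in range(len(w)):
        s = gamma[o] / math.sqrt(var[o] + eps)
        w_f.append([wi * s for wi in w[o]])
        b_f.append((b[o] - mean[o]) * s + beta[o])
    return w_f, b_f


def conv1x1(w: Matrix, b: Sequence[float], x: Sequence[float]) -> List[float]:
    return [sum(wi * xi for wi, xi in zip(row, x)) + bo for row, bo in zip(w, b)]


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    return [g * (v - m) / math.sqrt(s + eps) + bt for v, g, bt, m, s in zip(y, gamma, beta, mean, var)]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, t) for t in v]


w, b = [[1.0, 2.0], [-1.0, 0.5]], [0.1, -0.2]
gamma, beta, mean, var = [1.5, 0.8], [0.0, 0.3], [0.2, -0.1], [4.0, 0.25]
x = [0.7, -1.3]
unfused = relu(batch_norm(conv1x1(w, b, x), gamma, beta, mean, var))  # three ops
fused = relu(conv1x1(*fold_bn(w, b, gamma, beta, mean, var), x))     # one op plus epilogue
print(all(abs(p - q) < 1e-9 for p, q in zip(unfused, fused)))  # True
```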

33% latency reduction

Pipeline Parallelism

Three-stage pipeline on separate CUDA streams: preprocess, infer, and postprocess overlap in time.
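The benefit of overlapping stages can be estimated with a simple schedule model: each stage (for example, its own CUDA stream) processes one batch at a time, and a batch enters a stage once it has left the previous stage and that stage has finished the prior batch. The sketch below compares sequential and pipelined completion times; the stage durations in the example are arbitrary, and the function names are hypothetical.

```python
from typing import Sequence


def sequential_makespan(stage_ms: Sequence[float], n_batches: int) -> float:
    """Every batch runs all stages before the next batch starts."""
    return sum(stage_ms) * n_batches


def pipelined_makespan(stage_ms: Sequence[float], n_batches: int) -> float:
    """Linear pipeline with one worker per stage; returns when the last batch leaves the last stage."""
    finish = [0.0] * len(stage_ms)  # finish time of the most recent batch in each stage
    for _ in range(n_batches):
        t = 0.0  # time this batch leaves the previous stage
        for s, d in enumerate(stage_ms):
            t = max(t, finish[s]) + d
            finish[s] = t
    return finish[-1]


stages = [1.0, 1.0, 1.0]  # preprocess, infer, postprocess (arbitrary durations)
print(sequential_makespan(stages, 3), pipelined_makespan(stages, 3))  # 9.0 5.0
```

The gain depends on how balanced the stages are: in steady state, throughput is bounded by the slowest stage, so overlapping helps most when preprocess and postprocess are a sizeable share of inference time.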

End-to-End Latency Budget

Full path with ANPR + ReID — per violation event

Decode: 1ms
Preprocess: 0.5ms
Backbone: 3ms
NMS: 0.5ms
Tracking: 0.3ms
Rules: 0.2ms
ANPR: 8ms
ReID: 3ms
Output: 0.5ms

Total (full path): ~17ms

Fast path (no violation): ~5.5ms

Per-frame amortized (B=32): ~0.17ms

Naive: ~50ms/frame, 1 camera

Optimized: ~0.15ms/frame, 200+ cameras
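The budget totals can be checked by summing the stages. The sketch below assumes the fast path runs decode through rules and skips ANPR, ReID and output (this reproduces the stated ~5.5ms), and that the amortized figure divides the fast path by the batch size of 32; both are interpretations, not documented definitions.

```python
import math

# Stage budget in milliseconds, per violation event.
STAGES_MS = {
    "decode": 1.0, "preprocess": 0.5, "backbone": 3.0, "nms": 0.5,
    "tracking": 0.3, "rules": 0.2, "anpr": 8.0, "reid": 3.0, "output": 0.5,
}
# Assumption: frames with no violation stop after the rule engine.
FAST_PATH = ("decode", "preprocess", "backbone", "nms", "tracking", "rules")
BATCH = 32


def path_ms(stages) -> float:
    return sum(STAGES_MS[s] for s in stages)


full_ms = path_ms(STAGES_MS)
fast_ms = path_ms(FAST_PATH)
amortized_ms = fast_ms / BATCH  # assumption about how "per-frame amortized" is computed

print(f"full path: {full_ms:.1f} ms")       # full path: 17.0 ms
print(f"fast path: {fast_ms:.1f} ms")       # fast path: 5.5 ms
print(f"amortized: {amortized_ms:.2f} ms")  # amortized: 0.17 ms
```

ANPR alone accounts for nearly half of the full-path budget, which is why it runs only when a violation is detected.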

How It Works

Your cameras. Our intelligence. Any scale.

Deploy Drik on-premises, in the cloud, or both. Drik nodes scale with your camera network, from a single building to an entire city.

CAM-001 through CAM-012

DRIK NODE

~17ms Full-Path Latency

4x RTX PRO 5000 GPU Cluster

200+ Cameras

On-Premises Deployment

Open Source

Built in the Open

Our tools, models, datasets, and benchmarks are open-source. We build for the community and with the community.

DrikLabel

tools

AI-assisted video annotation tool with active learning propagation.

TypeScript

DrikSynth

tools

Unreal Engine based synthetic video data generator for training and validation.

C++ / Python

DrikNetra

models

Multi-frame license plate super-resolution from video clips.

Python

drik-bench

benchmarks

Benchmark suite for evaluating AI models on Indian traffic scenarios.

Python

indian-plate-dataset

datasets

Diverse Indian license plate dataset with varied formats, fonts, and languages.

Dataset

traffic-chaos-100

datasets

100 challenging Indian traffic scenes for model evaluation.

Dataset

drik-track

models

Vehicle tracking toolkit optimized for Indian traffic conditions.

Python

drik-detect

models

Detection model configs and weights for Indian traffic objects.

Python


Stack

Built With

From GPU kernels to dashboards — the full production and training stack.

ML & Training

PyTorch 2.x, PyTorch FSDP, Weights & Biases, DVC, Unreal Engine 5

Inference

TensorRT, CUDA C++, ONNX Runtime, FP4 / INT8, NVDEC

Video & Vision

FFmpeg, GStreamer, OpenCV, PaddleOCR, ByteTrack

Backend

Python 3.11+, FastAPI, Redis Streams, gRPC, MQTT

Frontend

Next.js, React, React Native, Tailwind CSS

Infrastructure

Docker, PostgreSQL, Redis, Prometheus, Grafana, Nginx

Hardware

RTX PRO 5000, AMD EPYC, NVLink, 10GbE