Case Study

Contech Media Video Processing Platform

Built a multi-tiered video processing platform that provides AI-powered OCR, SDH subtitle, and Audio Description services on a serverless, multi-cloud architecture.

3 Processing Phases
12 Microservices
21 Firebase Functions
Multi-Cloud Storage Strategy

Project Overview

Contech Media (also called Phonetic) needed a video processing platform that could deliver several AI-automated enhancement services. The platform had to process videos for OCR text detection, SDH subtitles, and Audio Description generation.

Multi-Phase Processing

Three distinct processing phases with specialized AI services for each requirement

Real-time Updates

Live processing status updates with Firebase real-time communication

Multi-Cloud Architecture

Intelligent routing across AWS S3, Google Cloud Storage, and Azure Blob

Technical Stack

Frontend: Angular 15
Backend: AWS Lambda
Database: Firebase Firestore
AI Services: OpenAI + PyAnnote
Storage: Multi-Cloud

System Architecture

A multi-tiered system that combines an Angular frontend, serverless microservices, AI services, and multi-cloud storage.

Frontend Layer

Angular 15 Single Page Application
117KB+ components with comprehensive functionality
Real-time Firebase listeners for live updates
Multi-cloud upload support (AWS S3, GCP, Azure)

Backend Microservices

12 microservices across 3 processing phases
Node.js 20 and Python 3.9 runtimes
AI Integration: OpenAI, AWS Rekognition, PyAnnote.ai
FFmpeg for video and audio processing

Bridge Services

21 Firebase cloud functions
Payment processing with Stripe integration
File transfers and AI operations
44KB, 1482 lines of serverless code

Processing Phases

Three distinct processing phases, each with specialized AI services and microservices architecture.

1. OCR (Optical Character Recognition)

Phase 1 - Text Detection in Video Frames

Technology Stack

Node.js 20 runtime
AWS Rekognition for text detection
SRT subtitle file generation

Process Flow

S3 upload trigger
Video metadata extraction
Text detection in frames
SRT generation & upload
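
The last step of this flow, turning timed text detections into an SRT file, can be sketched roughly as follows. This is a minimal illustration, not the platform's actual code: the TextDetection record and the to_srt helper are hypothetical names, and the sketch assumes each detection already carries start and end times in milliseconds.

```python
from dataclasses import dataclass


@dataclass
class TextDetection:
    # Hypothetical record: one piece of on-screen text and when it is visible.
    start_ms: int
    end_ms: int
    text: str


def format_timestamp(ms: int) -> str:
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(detections: list[TextDetection]) -> str:
    """Build SRT content: numbered cues, ordered by start time."""
    cues = []
    ordered = sorted(detections, key=lambda d: d.start_ms)
    for index, det in enumerate(ordered, start=1):
        time_range = f"{format_timestamp(det.start_ms)} --> {format_timestamp(det.end_ms)}"
        cues.append(f"{index}\n{time_range}\n{det.text}\n")
    # Cues are separated by a blank line.
    return "\n".join(cues)
```

The resulting string could then be written to a .srt file and uploaded alongside the source video.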

2. SDH (Subtitles for Deaf and Hard of Hearing)

Phase 2 - Audio Transcription & Speaker Diarization

Service 1: Audio Transcription
• Python 3.9 runtime
• OpenAI Whisper integration
• PyAnnote.ai diarization
Service 2: Speaker Processing
• Speaker assignment
• Data combination
• PyAnnote webhook processing
Service 3: Enhanced SRT
• Sound classification
• Speaker video extraction
• Enhanced subtitle generation
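
The speaker assignment and data combination steps in Service 2 can be illustrated with a simple overlap rule: each transcribed segment takes the speaker whose diarization turn overlaps it the most. This is only a sketch under that assumption. Segment, SpeakerTurn, and assign_speakers are hypothetical names, and the platform's real matching logic may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    # Hypothetical transcript segment (times in seconds).
    start: float
    end: float
    text: str


@dataclass
class SpeakerTurn:
    # Hypothetical diarization turn (times in seconds).
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals, or 0 if disjoint."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: list[Segment], turns: list[SpeakerTurn]
) -> list[tuple[Optional[str], Segment]]:
    """Pair each segment with the speaker of its most-overlapping turn."""
    labelled = []
    for seg in segments:
        best = max(
            turns,
            key=lambda t: overlap(seg.start, seg.end, t.start, t.end),
            default=None,
        )
        if best is not None and overlap(seg.start, seg.end, best.start, best.end) > 0:
            labelled.append((best.speaker, seg))
        else:
            # No turn overlaps this segment, so leave the speaker unknown.
            labelled.append((None, seg))
    return labelled
```

Segments with no overlapping turn are kept with a speaker of None rather than dropped, so the subtitle text is never lost.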

3. AD (Audio Description)

Phase 3 - Scene Analysis & Audio Description Generation

AI-Powered Services

Video Segmentation

Intelligent video segmentation with non-dialogue detection
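
One plausible way to find non-dialogue windows is to look for gaps between dialogue intervals taken from the Phase 2 transcript. The sketch below assumes exactly that. The function name and the 2-second minimum gap are illustrative choices, not values from the platform.

```python
def find_non_dialogue_gaps(
    dialogue: list[tuple[float, float]],
    video_duration: float,
    min_gap: float = 2.0,
) -> list[tuple[float, float]]:
    """Return (start, end) windows, in seconds, where nobody is speaking.

    `dialogue` holds (start, end) intervals of speech; windows shorter than
    `min_gap` are skipped because they leave no room for a description.
    """
    gaps = []
    cursor = 0.0  # end of the latest speech seen so far
    for start, end in sorted(dialogue):
        if start - cursor >= min_gap:
            gaps.append((cursor, start))
        # max() handles overlapping or nested dialogue intervals.
        cursor = max(cursor, end)
    if video_duration - cursor >= min_gap:
        gaps.append((cursor, video_duration))
    return gaps
```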

Frame Extraction

Frame extraction at 1 FPS for scene analysis
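
Extracting one frame per second maps directly onto FFmpeg's fps filter. The helper below only builds the command line. Its name, the output file pattern, and the JPEG quality setting are assumptions made for illustration.

```python
from pathlib import Path


def build_frame_extraction_command(
    video_path: str, output_dir: str, fps: int = 1
) -> list[str]:
    """Build an ffmpeg command that writes `fps` frames per second as JPEGs."""
    pattern = str(Path(output_dir) / "frame_%05d.jpg")  # frame_00001.jpg, ...
    return [
        "ffmpeg",
        "-i", video_path,
        "-vf", f"fps={fps}",  # resample the video to the target frame rate
        "-q:v", "2",          # high JPEG quality (lower value = better)
        pattern,
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True) on a machine where ffmpeg is installed and the output directory already exists.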

GPT-4 Vision

Scene analysis and description generation

Final Integration

Audio Integration

Text-to-speech audio descriptions integrated with original video

Final Video Generation

FFmpeg filter complex for audio replacement and final video output
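
The exact filter graph is not given here, so the following is only one way such an audio replacement could be expressed: silence the original track during the description window, delay the text-to-speech clip to the window's start, and mix the two. The function name and parameters are hypothetical. The volume, adelay, and amix filters are standard FFmpeg filters.

```python
def build_ad_mix_command(
    video_path: str,
    description_path: str,
    output_path: str,
    start_s: float,
    end_s: float,
) -> list[str]:
    """Build an ffmpeg command that swaps in a description clip for one window."""
    delay_ms = int(start_s * 1000)
    filter_complex = (
        # Silence the original audio between start_s and end_s.
        f"[0:a]volume=enable='between(t,{start_s},{end_s})':volume=0[bg];"
        # Shift the description clip so it begins at start_s (both channels).
        f"[1:a]adelay={delay_ms}|{delay_ms}[ad];"
        # Mix both streams, keeping the length of the original audio.
        f"[bg][ad]amix=inputs=2:duration=first[aout]"
    )
    return [
        "ffmpeg",
        "-i", video_path,
        "-i", description_path,
        "-filter_complex", filter_complex,
        "-map", "0:v",      # keep the original video stream
        "-map", "[aout]",   # use the mixed audio
        "-c:v", "copy",     # do not re-encode video
        output_path,
    ]
```

Note that amix may scale input volumes depending on the FFmpeg version, so levels would need checking on real output. A video with several descriptions would need one delayed input per clip, or a pre-assembled description track.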

Technical Highlights

Advanced technical implementation ensuring high availability, scalability, and user satisfaction.

Frontend Excellence

Responsive Design

Bootstrap 5.3.3 and Material Design for modern, mobile-first UI

Advanced Video Player

Videogular 7.0.1 with synchronized subtitle support

Payment Integration

Stripe.js 1.54.2 for seamless payment processing

Multi-Cloud Support

AWS SDK, Google Cloud Storage, Azure Blob integration

Backend Architecture

Serverless Microservices

AWS Lambda with independent scaling and resource allocation

AI Service Integration

OpenAI GPT-4, Whisper, PyAnnote.ai, AWS Rekognition

Real-time Database

Firebase Firestore with live listeners and updates

Security & Performance

Firebase Authentication, CORS, encryption, rate limiting

Results & Impact

The platform successfully delivered enterprise-grade video processing capabilities with AI-powered automation.

Scalable Architecture

Multi-tiered system that scales independently based on demand

AI-Powered Processing

Multiple AI providers for optimal results across all processing phases

Real-time Updates

Immediate status updates and feedback through Firebase integration

Multi-Cloud Strategy

Flexible, resilient storage options with intelligent routing

Technical Excellence Achieved

Performance Metrics

  • 1GB memory allocation per Lambda function
  • 15-minute timeout for processing operations
  • Independent scaling based on processing demand

Quality Assurance

  • Comprehensive error handling and logging
  • Real-time monitoring and alert systems
  • Graceful degradation for service failures
