Contech Media Case Study - Video Processing Platform

Project Overview

Contech Media (also called Phonetic) needed a video processing platform that could automate several video enhancement services with AI. The platform had to process videos for OCR text detection, SDH subtitles, and Audio Description generation.

Multi-Phase Processing

Three distinct processing phases with specialized AI services for each requirement

Real-time Updates

Live processing status updates with Firebase real-time communication

Multi-Cloud Architecture

Intelligent routing across AWS S3, Google Cloud Storage, and Azure Blob
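The case study does not describe how routing decisions are made. As a minimal sketch, assuming the provider is inferred from the file's URI, the dispatch could look like the following. The function name `detect_provider` and the returned provider labels are hypothetical and used only for illustration.

```python
from urllib.parse import urlparse


def detect_provider(uri: str) -> str:
    """Guess the storage provider from a URI (hypothetical helper)."""
    parsed = urlparse(uri)
    scheme = parsed.scheme.lower()
    host = (parsed.hostname or "").lower()

    # s3://bucket/key, or an S3 HTTPS endpoint such as
    # bucket.s3.us-east-1.amazonaws.com
    if scheme == "s3":
        return "aws_s3"
    if host.endswith(".amazonaws.com") and (host.startswith("s3.") or ".s3." in host):
        return "aws_s3"

    # gs://bucket/object, or the public Google Cloud Storage host
    if scheme == "gs" or host == "storage.googleapis.com":
        return "gcs"

    # Azure Blob endpoints look like <account>.blob.core.windows.net
    if host.endswith(".blob.core.windows.net"):
        return "azure_blob"

    raise ValueError(f"Unrecognized storage location: {uri}")
```

A caller would then pick the matching SDK client for the returned label; that wiring is left out here because the source gives no detail on it.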

Technical Stack

Frontend: Angular 15

Backend: AWS Lambda

Database: Firebase Firestore

AI Services: OpenAI + PyAnnote

Storage: Multi-Cloud

System Architecture

A sophisticated, multi-tiered system that integrates modern frontend technologies, microservices architecture, AI capabilities, and multi-cloud strategies.

Frontend Layer

Angular 15 Single Page Application

117KB+ components with comprehensive functionality

Real-time Firebase listeners for live updates

Multi-cloud upload support (AWS S3, GCP, Azure)

Backend Microservices

12 microservices across 3 processing phases

Node.js 20 and Python 3.9 runtimes

AI Integration: OpenAI, AWS Rekognition, PyAnnote.ai

FFmpeg for video and audio processing

Bridge Services

21 Firebase cloud functions

Payment processing with Stripe integration

File transfers and AI operations

44KB, 1482 lines of serverless code

Processing Phases

Three distinct processing phases, each with specialized AI services and microservices architecture.

OCR (Optical Character Recognition)

Phase 1 - Text Detection in Video Frames

Technology Stack

Node.js 20 runtime

AWS Rekognition for text detection

SRT subtitle file generation

Process Flow

S3 upload trigger

Video metadata extraction

Text detection in frames

SRT generation & upload
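The last step of the flow above, turning timed text detections into an SRT file, can be sketched as follows. The OCR service itself runs on Node.js; Python is used here only to illustrate the format. The `Cue` type and the function names are hypothetical, and the sketch assumes each detection has already been reduced to a start time, an end time, and a text string.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Cue:
    """One subtitle entry (hypothetical shape; times in milliseconds)."""
    start_ms: int
    end_ms: int
    text: str


def format_timestamp(ms: int) -> str:
    """Render milliseconds as the SRT timestamp HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(cues: Iterable[Cue]) -> str:
    """Build SRT text: numbered cues in time order, separated by blank lines."""
    blocks = []
    ordered = sorted(cues, key=lambda c: c.start_ms)
    for index, cue in enumerate(ordered, start=1):
        timing = f"{format_timestamp(cue.start_ms)} --> {format_timestamp(cue.end_ms)}"
        blocks.append(f"{index}\n{timing}\n{cue.text}\n")
    return "\n".join(blocks)
```

For example, `to_srt([Cue(0, 1500, "EXIT")])` yields a single cue numbered 1 that runs from `00:00:00,000` to `00:00:01,500`.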

SDH (Subtitles for Deaf and Hard of Hearing)

Phase 2 - Audio Transcription & Speaker Diarization

Service 1: Audio Transcription

• Python 3.9 runtime

• OpenAI Whisper integration

• PyAnnote.ai diarization

Service 2: Speaker Processing

• Speaker assignment

• Data combination

• PyAnnote webhook processing
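The source does not say how transcript segments are matched to diarization results. One common approach, shown here as a sketch and not as the platform's confirmed method, is to label each transcript segment with the speaker whose turn overlaps it the most. The `Segment` and `Turn` types and the `assign_speakers` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    """A transcribed span of speech (times in seconds)."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """A diarization turn: who spoke, and when (times in seconds)."""
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two intervals, or 0.0 if disjoint."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: List[Segment], turns: List[Turn]
) -> List[Tuple[Optional[str], str]]:
    """Pair each segment's text with the speaker of its best-overlapping turn."""
    labeled = []
    for seg in segments:
        best_speaker: Optional[str] = None
        best_overlap = 0.0
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        # Segments that overlap no turn keep a speaker of None.
        labeled.append((best_speaker, seg.text))
    return labeled
```

Leaving unmatched segments as `None` keeps the gap visible so a later step can decide how to render it, instead of guessing a speaker.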

Service 3: Enhanced SRT

• Sound classification

• Speaker video extraction

• Enhanced subtitle generation

AD (Audio Description)

Phase 3 - Scene Analysis & Audio Description Generation

AI-Powered Services

Video Segmentation

Intelligent video segmentation with non-dialogue detection

Frame Extraction

Frame extraction at 1 FPS for scene analysis

GPT-4 Vision

Scene analysis and description generation
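As a rough illustration of the first two services, the sketch below finds non-dialogue gaps from a list of speech intervals and builds an FFmpeg command that samples one frame per second using the `fps` filter. The function names, the `min_gap` threshold of 2 seconds, and the input shape are assumptions made for this example; the source does not specify them.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def find_non_dialogue_gaps(
    speech: List[Interval], duration: float, min_gap: float = 2.0
) -> List[Interval]:
    """Return stretches of at least min_gap seconds with no speech."""
    gaps: List[Interval] = []
    cursor = 0.0  # end of the latest speech seen so far
    for start, end in sorted(speech):
        if start - cursor >= min_gap:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    # Trailing silence after the last speech interval.
    if duration - cursor >= min_gap:
        gaps.append((cursor, duration))
    return gaps


def frame_extract_args(video_path: str, out_pattern: str) -> List[str]:
    """FFmpeg arguments that write one frame per second of video."""
    return ["ffmpeg", "-i", video_path, "-vf", "fps=1", out_pattern]
```

The argument list can be passed to `subprocess.run`; an output pattern such as `frame_%04d.jpg` makes FFmpeg number the extracted frames.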

Final Integration

Audio Integration

Text-to-speech audio descriptions integrated with original video

Final Video Generation

FFmpeg filter_complex graph for audio replacement and final video output
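The exact filter graph is not given in the source. A minimal sketch, assuming each text-to-speech clip is delayed to its start time with `adelay` and mixed over the original track with `amix`, so that the mixed track replaces the original audio stream in the output, might build the command like this. The function name and the `(path, start_ms)` input shape are hypothetical.

```python
from typing import List, Tuple


def build_ad_mix_command(
    video: str, descriptions: List[Tuple[str, int]], output: str
) -> List[str]:
    """Build FFmpeg arguments that mix timed description clips into a video.

    descriptions holds (audio_path, start_ms) pairs (assumed shape).
    """
    if not descriptions:
        raise ValueError("At least one description clip is required")

    args = ["ffmpeg", "-i", video]
    for path, _ in descriptions:
        args += ["-i", path]

    chains = []
    labels = ["[0:a]"]  # original audio is always the first mix input
    for i, (_, start_ms) in enumerate(descriptions, start=1):
        # Delay both stereo channels of clip i by its start offset.
        chains.append(f"[{i}:a]adelay={start_ms}|{start_ms}[ad{i}]")
        labels.append(f"[ad{i}]")

    # Mix everything; keep the length of the original audio track.
    chains.append(f"{''.join(labels)}amix=inputs={len(labels)}:duration=first[aout]")

    args += [
        "-filter_complex", ";".join(chains),
        "-map", "0:v",      # video stream from the original file
        "-map", "[aout]",   # mixed track replaces the original audio
        "-c:v", "copy",     # do not re-encode the video
        output,
    ]
    return args
```

Because `amix` can lower the level of each input, a production pipeline would likely add volume adjustments or lower the original track under each description; those details are outside what the source describes.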

Technical Highlights

Advanced technical implementation ensuring high availability, scalability, and user satisfaction.

Frontend Excellence

Responsive Design

Bootstrap 5.3.3 and Material Design for modern, mobile-first UI

Advanced Video Player

VideoGular 7.0.1 with synchronized subtitle support

Payment Integration

Stripe.js 1.54.2 for seamless payment processing

Multi-Cloud Support

AWS SDK, Google Cloud Storage, Azure Blob integration

Backend Architecture

Serverless Microservices

AWS Lambda with independent scaling and resource allocation

AI Service Integration

OpenAI GPT-4, Whisper, PyAnnote.ai, AWS Rekognition

Real-time Database

Firebase Firestore with live listeners and updates

Security & Performance

Firebase Authentication, CORS, encryption, rate limiting

Results & Impact

The platform successfully delivered enterprise-grade video processing capabilities with AI-powered automation.

Scalable Architecture

Multi-tiered system that scales independently based on demand

AI-Powered Processing

Multiple AI providers for optimal results across all processing phases

Real-time Updates

Immediate status updates and feedback through Firebase integration

Multi-Cloud Strategy

Flexible, resilient storage options with intelligent routing

Technical Excellence Achieved

Performance Metrics

1GB memory allocation per Lambda function
15-minute timeout for processing operations
Independent scaling based on processing demand

Quality Assurance

Comprehensive error handling and logging
Real-time monitoring and alert systems
Graceful degradation for service failures

