Design YouTube using IcePanel

📝 Introduction

In this post, we’ll share an example architecture for YouTube – the largest video-sharing and streaming platform where users can watch and upload videos. We’ll design this as software architects and create four hierarchical diagrams: Context, Container, Component, and Code (not sure what these are? Read this). We’ll annotate the building blocks and highlight user interaction (data flow) within the system using IcePanel Flows.

Let’s first set the scope by defining the behaviour of the system (functional requirements) and the qualities the system should have (non-functional requirements). Afterwards, we’ll go through each layer of the C4 model.

You can view the final architecture at this link: https://s.icepanel.io/DWnaysJ3cbCQqg/qmVa

🔎 Scope

We have two simple requirements for a video-sharing platform. Users should be able to:

Watch videos
Upload videos

For non-functional requirements, the system should be:

99.99% available for watching videos.
Eventually consistent for uploads (will handle video processing asynchronously).
Highly performant with different resolutions for video streaming.
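To make the availability target concrete, here is a quick back-of-the-envelope calculation of the downtime budget that 99.99% implies. The function name and the 30-day month are our own choices for illustration.

```python
# Downtime budget implied by an availability target.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(availability_percent: float,
                            period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the maximum downtime (in minutes) allowed over the period."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return period_minutes * (1 - availability_percent / 100)


yearly = downtime_budget_minutes(99.99)
monthly = downtime_budget_minutes(99.99, period_minutes=30 * 24 * 60)
print(f"per year: {yearly:.1f} min, per 30-day month: {monthly:.2f} min")
```

In other words, the watch path can be unavailable for less than an hour per year, which is one reason to keep it separate from the slower, asynchronous upload path.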

Let’s start with the first diagram, Context.

Level 1 - Context

This high-level view defines the main actors interacting with our internal system (YouTube). In this case, we have two:

Viewer: End user who watches videos on the platform.
Content Creator: End user who uploads videos on the platform.

Level 1 - Context diagram

Given our defined scope, we don’t have any external systems in this view. The core business logic is running within the YouTube system.

Level 2 - Container

This diagram is where we model a collection of independently deployable or runnable applications or data stores that are essential for the overall software system to function. This could be a web application, server, datastore, or a serverless function like AWS Lambda.

We’ll design this system as an event-based architecture. This approach gives us several benefits that align with our non-functional requirements, such as:

Asynchronous processing: Long-running tasks like video transcoding and analysis can be processed in the background without blocking user requests or other services.
Improved scalability: Each service can scale independently based on its specific workload, and message queues naturally handle load buffering during traffic spikes.
Better resource utilization: Workers can process tasks from queues at their own pace, preventing system overload and enabling more efficient use of computing resources.
Easier debugging and monitoring: Event flows create a clear audit trail, making it simpler to track video processing stages and identify bottlenecks.
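The load-buffering idea can be sketched in a few lines. This is a minimal in-process stand-in, assuming Python's standard `queue` and `threading` modules in place of a real message broker and Lambda workers; `run_workers` and its parameters are hypothetical names.

```python
import queue
import threading


def run_workers(jobs, handler, worker_count=2):
    """Process jobs from a shared queue with a fixed pool of worker threads."""
    job_queue = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            job = job_queue.get()
            if job is None:  # sentinel: no more work for this worker
                job_queue.task_done()
                return
            try:
                outcome = ("done", handler(job))
            except Exception as exc:  # record the failure, keep the worker alive
                outcome = ("failed", repr(exc))
            with results_lock:
                results.append(outcome)
            job_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for job in jobs:  # the producer only enqueues; it never waits on a worker
        job_queue.put(job)
    for _ in threads:  # one sentinel per worker
        job_queue.put(None)
    job_queue.join()
    for thread in threads:
        thread.join()
    return results


print(sorted(run_workers(["clip-1", "clip-2", "clip-3"], str.upper)))
```

The producer finishes enqueueing regardless of how busy the workers are, so a traffic spike shows up as a longer queue rather than as blocked requests.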

Our system is composed of the following Containers:

Upload Service: Manages video uploads to pre-processed storage and writing metadata to the database.
Raw Videos / YouTube Videos: S3 buckets for storing raw / processed videos.
Chunker: Event-based worker responsible for segmenting raw videos into 5-10 sec clips for later processing.
Video Processing Service: Orchestrates the video processing workflow by calling validation services and kicking off processing jobs in parallel.
Video Inspection Service: Handles video content inspection using the AWS Rekognition service.
Policy Validation Service: AI-powered solution for inspecting video context (text, audio, etc.) using AWS Bedrock.
Asynchronous Workers: A group of AWS Lambdas that scale up or down based on demand and event traffic. It is a virtual cluster of transcoding, audio processing, and transcript generation jobs.
Watch Service: Handles video playback requests and serves video metadata to viewers.
Video Metadata Database: Stores video metadata (e.g., title, description, thumbnail, date) and URLs of processed videos at different resolutions (e.g., 480p, 720p).
Video Metadata Cache: Caches frequently accessed videos for faster load time.
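As an illustration of what the Chunker computes, here is a minimal sketch that splits a video's duration into fixed-length segments. The function name and the 6-second default are assumptions on our part. A real chunker would typically cut on keyframes, and the final segment here can be shorter than the target length.

```python
from typing import List, Tuple


def segment_boundaries(duration_seconds: float,
                       segment_seconds: float = 6.0) -> List[Tuple[float, float]]:
    """Return (start, end) offsets in seconds covering the whole video."""
    if duration_seconds <= 0 or segment_seconds <= 0:
        raise ValueError("durations must be positive")
    boundaries = []
    start = 0.0
    while start < duration_seconds:
        # Clamp the last segment so it ends exactly at the end of the video.
        end = min(start + segment_seconds, duration_seconds)
        boundaries.append((start, end))
        start = end
    return boundaries


print(segment_boundaries(20))
```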

In this architecture, we use the following technologies:

AWS API Gateway: A scalable and secure entry point for all API requests coming from the frontend.
AWS EC2: Web service that provides reliable and secure compute for the different microservices.
AWS Lambda: A lightweight, serverless compute service that is cost-effective, event-driven, and automatically scales on demand.
PostgreSQL: An ACID-compliant relational database that ensures data integrity and strong consistency. Ideal for pre-defined schemas like a video metadata table.
Redis: A fast in-memory cache that improves response times and reduces database load.

Level 2 - App diagram

We’ve designed two common data flows using IcePanel. Check out these flows and play them step by step to see how our system works.

Upload a new YouTube video: https://s.icepanel.io/DWnaysJ3cbCQqg/h1Au
Watch a YouTube video: https://s.icepanel.io/DWnaysJ3cbCQqg/eyRu

Level 3 - Component

In the C4 model, a component is a grouping of related functionality encapsulated behind a well-defined interface. For example, a collection of classes behind an interface. Let’s look at the Components in YouTube.

1. Upload Service

The diagram below shows how a user interacts with the Upload Service to upload a new video. When a user sends a POST /upload request, the Upload API first creates a database entry for the upload status and metadata, then calls the S3 Client to generate a pre-signed URL for secure raw video storage. This pre-signed URL is returned to the user, who uses it to upload their video directly to the Raw Videos S3 bucket. Once the upload is finished, the Upload Repository writes the new S3 URL to the Video Metadata Database (PostgreSQL), along with the upload date, thumbnail, other metadata, and the initial processing status for subsequent workflow stages.

Level 3 - Component diagram for Upload service
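To show why a pre-signed URL lets the client upload directly without holding credentials, here is a simplified sketch of the idea. It is not the AWS signing scheme: it uses a plain HMAC over the object key and expiry time, and `SECRET_KEY`, `BUCKET_URL`, and both function names are hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"demo-secret"  # hypothetical; known only to the server and storage
BUCKET_URL = "https://raw-videos.example.com"  # hypothetical bucket endpoint


def _sign(key: str, expires_at: int) -> str:
    """Sign the operation, object key, and expiry so none can be altered."""
    message = f"PUT\n{key}\n{expires_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def generate_presigned_url(video_id: str, expires_in_seconds: int, now=None) -> str:
    """Build a time-limited upload URL for one specific object key."""
    now = time.time() if now is None else now
    expires_at = int(now) + expires_in_seconds
    key = f"raw/{video_id}"
    query = urlencode({"expires": expires_at, "signature": _sign(key, expires_at)})
    return f"{BUCKET_URL}/{key}?{query}"


def is_upload_allowed(url: str, now=None) -> bool:
    """Storage-side check: the signature must match and the URL must not be expired."""
    now = time.time() if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    expected = _sign(parsed.path.lstrip("/"), expires_at)
    return hmac.compare_digest(signature, expected) and now < expires_at


url = generate_presigned_url("vid123", expires_in_seconds=900)
print(url, is_upload_allowed(url))
```

Because the signature covers the key and the expiry, the client cannot reuse the URL for a different object or after it has expired, and the video bytes never pass through the Upload Service.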

2. Video Processing Service

This service operates as an event-driven orchestrator for video processing workflows. When a new video is uploaded to the Raw Videos S3 bucket, the upload triggers an event, and the video is chunked into 5-10 sec clips that are sent for processing via a message queue. The Processing Worker listens to S3 bucket events and initiates the workflow by calling the Video Orchestrator. The Video Orchestrator coordinates all video processing tasks sequentially, starting with the Video Analyser for content inspection. If the video fails the inspection stage, the workflow is blocked and processing terminates. Upon passing inspection, the orchestrator calls the Policy Checker to validate the content against community guidelines. If the video fails the policy check, the workflow is similarly blocked, and no further processing occurs. Only after the video passes both inspection and policy checks does the orchestrator dispatch jobs to the Job Manager. The Job Manager sends video processing jobs via message queues to the Asynchronous Workers, which handle the compute-intensive transcoding, audio processing, and transcript generation tasks without blocking other operations. The processed video is then stored in the YouTube Videos S3 bucket at different resolutions, and the database is updated with the S3 URLs used to serve viewers.

Level 3 - Component diagram for Video processing service
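The gating behaviour of the orchestrator can be sketched as follows. This is a minimal illustration, assuming the inspection, policy check, and job dispatch are passed in as plain callables; `orchestrate`, `WorkflowResult`, `JOB_TYPES`, and the status strings are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List

JOB_TYPES = ["transcode", "audio", "transcript"]  # illustrative job names


@dataclass
class WorkflowResult:
    status: str
    dispatched_jobs: List[str] = field(default_factory=list)


def orchestrate(video_id: str,
                inspect: Callable[[str], bool],
                check_policy: Callable[[str], bool],
                dispatch: Callable[[str], None]) -> WorkflowResult:
    """Run the checks in order and dispatch jobs only if both pass."""
    if not inspect(video_id):
        return WorkflowResult(status="blocked_inspection")
    if not check_policy(video_id):  # only reached after inspection passes
        return WorkflowResult(status="blocked_policy")
    jobs = [f"{job_type}:{video_id}" for job_type in JOB_TYPES]
    for job in jobs:
        dispatch(job)  # e.g. publish to a message queue
    return WorkflowResult(status="dispatched", dispatched_jobs=jobs)


outbox = []
result = orchestrate("v1", lambda v: True, lambda v: True, outbox.append)
print(result.status, outbox)
```

Running the cheaper checks first means no transcoding capacity is spent on a video that would be rejected anyway.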

3. Watch Service

The Watch Service handles all video streaming requests in our YouTube system. When a viewer sends a GET request to watch a video, the Watch API receives the request and first checks whether the manifest file for the video is already in the cache. If it is not, the Watch API calls the Watch Repository to query the manifest file and video metadata from the Video Metadata Database (PostgreSQL). Once the manifest file is retrieved, the viewer watches the video by repeatedly requesting the video segments listed in the manifest file from the main S3 bucket through adaptive HTTP streaming protocols (e.g., DASH or HLS). The viewer does not download the whole video; instead, they download the chunks sequentially as they watch.

Level 3 - Component diagram for Watch service
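The read path described above follows the cache-aside pattern. Here is a minimal sketch, assuming a plain dictionary in place of Redis and a callable in place of the Watch Repository; `get_watch_data` and `load_from_database` are hypothetical names.

```python
from typing import Callable, Dict, Optional


def get_watch_data(video_id: str,
                   cache: Dict[str, dict],
                   load_from_database: Callable[[str], Optional[dict]]) -> dict:
    """Return manifest and metadata, consulting the cache before the database."""
    cached = cache.get(video_id)
    if cached is not None:
        return cached  # cache hit: no database query
    record = load_from_database(video_id)
    if record is None:
        raise KeyError(f"unknown video: {video_id}")
    cache[video_id] = record  # populate the cache for the next viewer
    return record


database = {"v1": {"title": "Demo", "manifest_url": "https://example.com/v1/master.m3u8"}}
cache = {}
print(get_watch_data("v1", cache, database.get)["title"])
print("v1" in cache)
```

Popular videos are requested far more often than they change, so most requests never reach PostgreSQL.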

Let’s go one level deeper with the source code in the Code layer.

Level 4 - Code

This is where we can view implementation details at the code level. We don’t recommend creating extensive diagrams for this; instead, we link directly to the code. However, here’s the general structure of these components for reference. We’ll briefly go over two Components: Upload and Watch.

Code Classes in Upload Component

This component handles the video upload workflow. The Upload API class provides RESTful endpoints for users to initiate video uploads and obtain secure upload URLs. The S3 Client manages interactions with the S3 bucket, generating pre-signed URLs that allow users to upload videos directly to the Raw Videos bucket without routing the file through the application server. The Upload Repository serves as the data layer for creating and managing video metadata entries in the database.

Upload API

POST /upload - Initiate a new video upload and receive pre-signed URL
GET /upload/:videoId/status - Check upload status for a specific video
POST /upload/:videoId/complete - Notify system that upload is complete

S3 Client

generatePresignedUrl(videoId, expiryTime) - Create secure S3 upload URL with expiration

Upload Repository

createVideoEntry(userId, videoId, metadata) - Create new video record in database
updateVideoStatus(videoId, status) - Update video upload state (pending, completed, failed)
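For reference, the Upload Repository methods could look roughly like this. It is a sketch only: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, and the table layout, the `get_status` helper, and the error handling are our assumptions.

```python
import sqlite3

VALID_STATUSES = {"pending", "completed", "failed"}


class UploadRepository:
    """Data layer for video upload records (SQLite stands in for PostgreSQL)."""

    def __init__(self, connection: sqlite3.Connection):
        self.connection = connection
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS videos ("
            "video_id TEXT PRIMARY KEY, user_id TEXT NOT NULL, "
            "title TEXT, status TEXT NOT NULL)"
        )

    def create_video_entry(self, user_id: str, video_id: str, metadata: dict) -> None:
        """Insert a new record; every upload starts in the 'pending' state."""
        with self.connection:  # commits on success, rolls back on error
            self.connection.execute(
                "INSERT INTO videos (video_id, user_id, title, status) "
                "VALUES (?, ?, ?, ?)",
                (video_id, user_id, metadata.get("title"), "pending"),
            )

    def update_video_status(self, video_id: str, status: str) -> None:
        """Move an existing record to pending, completed, or failed."""
        if status not in VALID_STATUSES:
            raise ValueError(f"invalid status: {status}")
        with self.connection:
            cursor = self.connection.execute(
                "UPDATE videos SET status = ? WHERE video_id = ?",
                (status, video_id),
            )
        if cursor.rowcount == 0:
            raise KeyError(video_id)

    def get_status(self, video_id: str):
        """Return the current status, or None if the video is unknown."""
        row = self.connection.execute(
            "SELECT status FROM videos WHERE video_id = ?", (video_id,)
        ).fetchone()
        return row[0] if row else None


repo = UploadRepository(sqlite3.connect(":memory:"))
repo.create_video_entry("user-1", "vid-1", {"title": "My first video"})
repo.update_video_status("vid-1", "completed")
print(repo.get_status("vid-1"))
```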

Code Classes in Watch Component

This component manages video streaming and playback requests. The Watch API class exposes RESTful endpoints for retrieving video manifest files and metadata. The API first checks if the requested video files are already cached in Redis before querying the database. The Watch Repository serves as the data layer for fetching video information and streaming URLs from the Video Metadata Database.

These classes have roughly the following methods:

Watch API

GET /watch/:videoId - Retrieve video details (title, description, etc.) and manifest files.
GET /videos/:videoId/:segmentId - Serve individual video segment files directly from S3 (or CDN).

Video Cache Manager

getCachedVideo(videoId) - Retrieve video metadata & manifest file from cache
cacheVideo(videoId, videoData, ttl) - Store video metadata & manifest file with time-to-live
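To illustrate the time-to-live behaviour, here is a small in-memory sketch of the Video Cache Manager. In the architecture above this role is played by Redis; the injectable `clock` parameter is our own addition so that expiry is easy to test.

```python
import time


class VideoCacheManager:
    """In-memory cache whose entries expire after a time-to-live in seconds."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # video_id -> (video_data, expires_at)

    def cache_video(self, video_id, video_data, ttl):
        """Store the data together with the time at which it expires."""
        self._entries[video_id] = (video_data, self._clock() + ttl)

    def get_cached_video(self, video_id):
        """Return cached data, or None if it is missing or expired."""
        entry = self._entries.get(video_id)
        if entry is None:
            return None
        video_data, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[video_id]  # evict lazily on read
            return None
        return video_data


cache = VideoCacheManager()
cache.cache_video("v1", {"title": "Demo"}, ttl=60)
print(cache.get_cached_video("v1"))
```

A TTL bounds how long viewers can see stale metadata after a creator edits a title or description, without requiring explicit invalidation.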

Watch Repository

getVideoById(videoId) - Fetch video metadata from database.
getVideoManifestFile(videoId) - Retrieve manifest file for streaming.
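If the manifest is served as HLS, the file returned by getVideoManifestFile could be a master playlist that lists one stream per resolution. The sketch below builds such a playlist. The helper name, the rendition fields, and the bandwidth figures are illustrative assumptions, not values from the design.

```python
def build_master_playlist(renditions):
    """Build an HLS master playlist with one variant stream per rendition."""
    lines = ["#EXTM3U"]
    for rendition in sorted(renditions, key=lambda r: r["bandwidth"]):
        lines.append(
            "#EXT-X-STREAM-INF:"
            f"BANDWIDTH={rendition['bandwidth']},"
            f"RESOLUTION={rendition['width']}x{rendition['height']}"
        )
        lines.append(rendition["uri"])  # the media playlist for this resolution
    return "\n".join(lines) + "\n"


playlist = build_master_playlist([
    {"bandwidth": 2800000, "width": 1280, "height": 720, "uri": "720p/index.m3u8"},
    {"bandwidth": 1400000, "width": 854, "height": 480, "uri": "480p/index.m3u8"},
])
print(playlist)
```

The player picks a variant based on measured bandwidth and then requests the segments listed in that variant's media playlist, which is how the different resolutions end up being served.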

Conclusion

In this post, we designed a video-streaming platform using the C4 model on IcePanel. We began with the core requirements and modeled the system from the top down, starting with the Context layer, followed by the Container, Component, and finally the Code layer.

P.S. We’ve also designed Ticketmaster on IcePanel! Check out: https://icepanel.io/blog/2025-10-20-design-ticketmaster-using-icepanel

Let us know which system you’d like to see modeled next on IcePanel!
