What is BitTorrent?
BitTorrent is a peer-to-peer (P2P) protocol that enables efficient file distribution by allowing users to download pieces from multiple sources simultaneously while uploading to others.
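Key insight: Every downloader also uploads, so the swarm's total capacity grows with the number of peers. This is the reverse of client-server distribution, where each extra downloader adds load to the server.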
The Problem: Central Server Distribution
Traditional: 100 users download a 1 GB file from one server
- Data the server must send: 100 GB total
- Time: Slow (bottleneck at the server's uplink)
BitTorrent: 100 users download from each other
- Upload capacity: Spread across up to 100 peers
- Time: Fast (capacity scales with the number of users)
How BitTorrent Works
Architecture
graph TB
T[Tracker] -->|Peer List| P1[Peer 1<br/>Seeder]
T -->|Peer List| P2[Peer 2<br/>Leecher]
T -->|Peer List| P3[Peer 3<br/>Leecher]
P1 -.->|Pieces| P2
P1 -.->|Pieces| P3
P2 -.->|Pieces| P3
P3 -.->|Pieces| P2
Key Components
1. Torrent File (.torrent)
{
  "announce": "http://tracker.example.com:6969/announce",
  "info": {
    "name": "movie.mp4",
    "piece length": 262144,   # 256 KiB (the key contains a space, not an underscore)
    "pieces": "<20-byte SHA1 hashes concatenated>",
    "length": 734003200,      # 700 MiB (single-file torrents)
    "files": [...]            # Multi-file torrents only; replaces "length"
  }
}
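The structure above is shown in a JSON-like notation, but on disk a .torrent file is Bencoded. The following is a minimal sketch of a Bencode encoder in pure Python; `bencode` is a hypothetical helper written for illustration and is not the API of any particular library.

```python
def bencode(value):
    """Encode ints, byte strings, lists and dicts using Bencode rules."""
    if isinstance(value, bool):
        raise TypeError("Bencode has no boolean type")
    if isinstance(value, int):
        # Integers: i<decimal>e
        return b"i" + str(value).encode() + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        # Byte strings: <length>:<data>
        return str(len(value)).encode() + b":" + value
    if isinstance(value, list):
        # Lists: l<items>e
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionaries: d<key><value>...e with keys as sorted byte strings
        items = [
            (k.encode("utf-8") if isinstance(k, str) else k, v)
            for k, v in value.items()
        ]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"Cannot bencode {type(value).__name__}")


info = {"name": "movie.mp4", "piece length": 262144, "length": 734003200}
print(bencode(info))
# b'd6:lengthi734003200e4:name9:movie.mp412:piece lengthi262144ee'
```

Because keys are always emitted in sorted order, the same dictionary always encodes to the same bytes, which is what makes hashing the encoded `info` dictionary a stable identifier.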
2. Tracker
- Role: Coordinates peers
- Knows: Which peers have the file
- Doesn't: Store the actual file
3. Peers
- Seeder: Has the complete file, only uploads
- Leecher: Still downloading the file, uploads the pieces it already has
Protocol Implementation
Torrent File Parsing
import hashlib
import bencodepy  # BitTorrent uses Bencode encoding

class TorrentFile:
    def __init__(self, torrent_path):
        with open(torrent_path, 'rb') as f:
            self.data = bencodepy.decode(f.read())
        self.tracker_url = self.data[b'announce'].decode()
        self.info = self.data[b'info']
        self.piece_length = self.info[b'piece length']
        self.pieces = self.info[b'pieces']
        self.file_length = self.info[b'length']
        self.file_name = self.info[b'name'].decode()
        # Calculate info_hash (unique identifier)
        self.info_hash = hashlib.sha1(
            bencodepy.encode(self.info)
        ).digest()

    def get_piece_hashes(self):
        """Extract individual piece SHA1 hashes"""
        hashes = []
        for i in range(0, len(self.pieces), 20):
            hashes.append(self.pieces[i:i+20])
        return hashes

    def num_pieces(self):
        return len(self.pieces) // 20

# Usage
torrent = TorrentFile('movie.torrent')
print(f"File: {torrent.file_name}")
print(f"Size: {torrent.file_length} bytes")
print(f"Pieces: {torrent.num_pieces()}")
print(f"Info Hash: {torrent.info_hash.hex()}")
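The piece hashes extracted above are what let a client reject corrupted or malicious data. A minimal sketch of that check follows; `split_piece_hashes` and `verify_piece` are hypothetical helper names, and the two 32-byte "pieces" are made-up sample data.

```python
import hashlib


def split_piece_hashes(pieces):
    """Split the concatenated 'pieces' field into 20-byte SHA-1 digests."""
    if len(pieces) % 20 != 0:
        raise ValueError("pieces length must be a multiple of 20")
    return [pieces[i:i + 20] for i in range(0, len(pieces), 20)]


def verify_piece(piece_index, data, piece_hashes):
    """Return True if the downloaded data matches the expected digest."""
    return hashlib.sha1(data).digest() == piece_hashes[piece_index]


# Build a fake 'pieces' field from two sample pieces
piece_a, piece_b = b"a" * 32, b"b" * 32
pieces_field = hashlib.sha1(piece_a).digest() + hashlib.sha1(piece_b).digest()
hashes = split_piece_hashes(pieces_field)

print(verify_piece(0, piece_a, hashes))  # True
print(verify_piece(1, piece_a, hashes))  # False: wrong data for piece 1
```

A client would typically discard a piece that fails this check and request it again, possibly from a different peer.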
Tracker Communication
import os
import urllib.parse

import bencodepy
import requests

class TrackerClient:
    def __init__(self, torrent, peer_id, port=6881):
        self.torrent = torrent
        self.peer_id = peer_id  # Unique 20-byte ID
        self.port = port

    def announce(self, uploaded=0, downloaded=0, left=None, event='started'):
        """Announce to tracker, get peer list"""
        if left is None:
            left = self.torrent.file_length
        params = {
            'info_hash': self.torrent.info_hash,
            'peer_id': self.peer_id,
            'port': self.port,
            'uploaded': uploaded,
            'downloaded': downloaded,
            'left': left,
            'compact': 1,  # Compact peer list format
            'event': event  # 'started', 'completed' or 'stopped'
        }
        url = f"{self.torrent.tracker_url}?{urllib.parse.urlencode(params, safe='')}"
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        data = bencodepy.decode(response.content)
        # Trackers report errors inside the Bencoded body
        if b'failure reason' in data:
            raise RuntimeError(data[b'failure reason'].decode(errors='replace'))
        # Parse peers
        peers = self.parse_peers(data[b'peers'])
        interval = data[b'interval']  # Re-announce interval
        return {
            'peers': peers,
            'interval': interval
        }

    def parse_peers(self, peers_data):
        """Parse compact peer list (6 bytes per peer: 4-byte IPv4 + 2-byte port)"""
        peers = []
        for i in range(0, len(peers_data), 6):
            ip = '.'.join(str(b) for b in peers_data[i:i+4])
            port = int.from_bytes(peers_data[i+4:i+6], 'big')
            peers.append({'ip': ip, 'port': port})
        return peers

# Usage
tracker = TrackerClient(torrent, peer_id=b'-PY0001-' + os.urandom(12))
response = tracker.announce()
print(f"Found {len(response['peers'])} peers")
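One detail that is easy to get wrong in the announce request is that `info_hash` and `peer_id` are raw bytes and must be percent-encoded byte by byte. The sketch below builds the announce URL without any network access; `build_announce_url` is a hypothetical helper and the hash and peer ID are made-up values.

```python
from urllib.parse import quote, urlencode


def build_announce_url(tracker_url, info_hash, peer_id, port, left, event="started"):
    """Build a tracker announce URL with raw bytes percent-encoded."""
    params = {
        "info_hash": info_hash,  # 20 raw bytes, not a hex string
        "peer_id": peer_id,      # 20 raw bytes
        "port": port,
        "uploaded": 0,
        "downloaded": 0,
        "left": left,
        "compact": 1,
        "event": event,
    }
    # quote() leaves only letters, digits and "_.-~" unescaped
    query = urlencode(params, quote_via=quote, safe="")
    separator = "&" if "?" in tracker_url else "?"
    return tracker_url + separator + query


info_hash = b"\x12\x34\xab" + b"A" * 17  # made-up 20-byte hash
peer_id = b"-PY0001-" + b"0" * 12        # made-up 20-byte peer ID
url = build_announce_url(
    "http://tracker.example.com:6969/announce", info_hash, peer_id, 6881, 734003200
)
print(url)
```

Note that the byte `0x34` is the ASCII digit `4`, so it appears unescaped, while `0x12` and `0xab` become `%12` and `%AB`.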
Peer Wire Protocol
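import socket
import struct

class PeerConnection:
    def __init__(self, peer_ip, peer_port, info_hash, peer_id):
        self.peer_ip = peer_ip
        self.peer_port = peer_port
        self.info_hash = info_hash
        self.peer_id = peer_id
        self.socket = None
        # Both sides start out choked and not interested
        self.am_choking = True
        self.am_interested = False
        self.peer_choking = True
        self.peer_interested = False
        self.bitfield = None

    def connect(self):
        """Establish connection and handshake"""
        self.socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.socket.connect((self.peer_ip, self.peer_port))
        # Send handshake
        handshake = self.create_handshake()
        self.socket.sendall(handshake)
        # Receive handshake (always 68 bytes)
        response = self._recv_exact(68)
        self.verify_handshake(response)
        return True

    def _recv_exact(self, n):
        """Read exactly n bytes; recv() may return fewer than requested"""
        data = b''
        while len(data) < n:
            chunk = self.socket.recv(n - len(data))
            if not chunk:
                raise ConnectionError('Peer closed the connection')
            data += chunk
        return data

    def create_handshake(self):
        """BitTorrent handshake message"""
        protocol = b'BitTorrent protocol'
        # 1-byte length, protocol string, 8 reserved bytes, info_hash, peer_id
        return struct.pack(
            '!B19s8x20s20s',
            len(protocol),
            protocol,
            self.info_hash,
            self.peer_id
        )

    def verify_handshake(self, response):
        """Check the protocol string and that the peer serves the same torrent"""
        if response[0] != 19 or response[1:20] != b'BitTorrent protocol':
            raise ConnectionError('Invalid handshake')
        if response[28:48] != self.info_hash:
            raise ConnectionError('Peer is serving a different torrent')

    def request_piece(self, piece_index, begin, length):
        """Request a block from peer"""
        message = struct.pack(
            '!IBIII',
            13,  # Message length
            6,   # Request message ID
            piece_index,
            begin,
            length
        )
        self.socket.sendall(message)

    def send_piece(self, piece_index, begin, block):
        """Send a block to peer"""
        message_id = 7  # Piece message
        message = struct.pack(
            '!IBII',
            9 + len(block),
            message_id,
            piece_index,
            begin
        ) + block
        self.socket.sendall(message)

    def send_have(self, piece_index):
        """Announce that we have a piece"""
        message = struct.pack('!IBI', 5, 4, piece_index)
        self.socket.sendall(message)

    def send_interested(self):
        """Tell peer we're interested"""
        self.socket.sendall(struct.pack('!IB', 1, 2))
        self.am_interested = True

    def send_choke(self):
        """Choke peer (stop serving its requests)"""
        self.socket.sendall(struct.pack('!IB', 1, 0))
        self.am_choking = True

    def send_unchoke(self):
        """Unchoke peer (allow downloads)"""
        self.socket.sendall(struct.pack('!IB', 1, 1))
        self.am_choking = False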
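The class above only sends messages. On the receiving side, every message after the handshake is a 4-byte big-endian length prefix followed by a 1-byte message ID and a payload, and TCP may deliver them split or merged. The following is a rough sketch of a buffer-based parser under that framing; `parse_messages` and `bitfield_to_pieces` are hypothetical helper names.

```python
import struct

MESSAGE_NAMES = {
    0: "choke", 1: "unchoke", 2: "interested", 3: "not interested",
    4: "have", 5: "bitfield", 6: "request", 7: "piece", 8: "cancel",
}


def parse_messages(buffer):
    """Split a receive buffer into complete messages.

    Returns (messages, leftover), where leftover holds the bytes of an
    incomplete trailing message to be kept until more data arrives.
    """
    messages = []
    offset = 0
    while len(buffer) - offset >= 4:
        (length,) = struct.unpack_from("!I", buffer, offset)
        if len(buffer) - offset - 4 < length:
            break  # Message not fully received yet
        if length == 0:
            messages.append(("keep-alive", b""))
        else:
            msg_id = buffer[offset + 4]
            payload = buffer[offset + 5:offset + 4 + length]
            messages.append((MESSAGE_NAMES.get(msg_id, f"unknown({msg_id})"), payload))
        offset += 4 + length
    return messages, buffer[offset:]


def bitfield_to_pieces(bitfield, num_pieces):
    """List piece indices set in a bitfield (high bit of byte 0 is piece 0)."""
    return [
        i for i in range(num_pieces)
        if i // 8 < len(bitfield) and bitfield[i // 8] & (0x80 >> (i % 8))
    ]


stream = (
    struct.pack("!IB", 1, 1)            # unchoke
    + struct.pack("!IBI", 5, 4, 7)      # have piece 7
    + struct.pack("!IB", 2, 5) + b"\xa0"  # bitfield 1010 0000
    + b"\x00\x00"                       # start of an incomplete message
)
messages, leftover = parse_messages(stream)
print(messages)
print(bitfield_to_pieces(b"\xa0", 4))  # [0, 2]
```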
Piece Selection Strategies
1. Rarest First
def select_next_piece(self, peer_bitfields, my_bitfield):
    """Select rarest piece among peers"""
    piece_counts = {}
    # Count availability of each piece
    for peer_bf in peer_bitfields:
        for piece_idx in range(len(peer_bf)):
            if peer_bf[piece_idx] and not my_bitfield[piece_idx]:
                piece_counts[piece_idx] = piece_counts.get(piece_idx, 0) + 1
    # Select rarest piece
    if piece_counts:
        rarest = min(piece_counts.items(), key=lambda x: x[1])
        return rarest[0]
    return None

# Why rarest first?
# - Prevents pieces from becoming unavailable
# - Improves swarm health
# - Seeders can leave earlier
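When several pieces are equally rare, always taking the lowest index would send every peer after the same piece. A common refinement is to break ties randomly. The sketch below is a standalone variant of the selection above under that assumption; `rarest_first` is a hypothetical function name and the bitfields are made-up lists of 0/1 flags.

```python
import random
from collections import Counter


def rarest_first(peer_bitfields, my_bitfield, rng=random):
    """Pick a missing piece held by the fewest peers, ties broken at random."""
    counts = Counter()
    for bitfield in peer_bitfields:
        for idx, has_piece in enumerate(bitfield):
            if has_piece and not my_bitfield[idx]:
                counts[idx] += 1
    if not counts:
        return None  # Nothing we need is available
    lowest = min(counts.values())
    candidates = [idx for idx, n in counts.items() if n == lowest]
    return rng.choice(candidates)


peers = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
]
mine = [1, 0, 0, 0]
# Piece 1 is held by two peers; pieces 2 and 3 by one peer each
print(rarest_first(peers, mine))  # 2 or 3
```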
2. Random First Piece
import random

def select_first_piece(self, available_pieces):
    """Random selection for first piece"""
    # Get something fast to start uploading
    return random.choice(available_pieces)
3. End Game Mode
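BLOCK_SIZE = 16384  # 16 KiB, the conventional request size

def end_game_mode(self, pieces_left):
    """When close to completion, request the remaining blocks from every peer"""
    if len(pieces_left) < 5:  # Last few pieces
        for piece in pieces_left:
            # Simplification: assumes self.piece_length is set and that every
            # remaining piece is full-size (the final piece is usually shorter)
            for begin in range(0, self.piece_length, BLOCK_SIZE):
                length = min(BLOCK_SIZE, self.piece_length - begin)
                # Request the same block from multiple peers
                for peer in self.connected_peers:
                    peer.request_piece(piece, begin, length)
        # When a block arrives, cancel the duplicates (cancel is message ID 8)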
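Requests on the wire are made per block rather than per piece, so end game mode needs two pieces of bookkeeping: the block layout of each piece and a record of which peers were asked for each block. The following is a minimal sketch of both; `piece_size`, `blocks_for_piece` and `EndGame` are hypothetical names, the 16 KiB block size is the conventional value rather than something the text above specifies, and peers are represented by plain strings.

```python
BLOCK_SIZE = 16 * 1024  # Conventional request size (assumption)


def piece_size(piece_index, piece_length, total_length):
    """Size of a piece in bytes; only the last piece may be shorter."""
    num_pieces = -(-total_length // piece_length)  # Ceiling division
    if not 0 <= piece_index < num_pieces:
        raise IndexError("piece index out of range")
    if piece_index == num_pieces - 1:
        return total_length - piece_index * piece_length
    return piece_length


def blocks_for_piece(piece_index, piece_length, total_length, block_size=BLOCK_SIZE):
    """Return the (begin, length) pairs needed to request one piece."""
    size = piece_size(piece_index, piece_length, total_length)
    return [(begin, min(block_size, size - begin)) for begin in range(0, size, block_size)]


class EndGame:
    """Tracks duplicate requests so they can be cancelled."""

    def __init__(self):
        self.outstanding = {}  # (piece_index, begin) -> set of peers asked

    def request(self, block, peer):
        self.outstanding.setdefault(block, set()).add(peer)

    def block_received(self, block, from_peer):
        """Return the peers that should now be sent a cancel message."""
        peers = self.outstanding.pop(block, set())
        return sorted(peers - {from_peer})


total = 262144 * 2 + 20000  # Two full pieces plus a short final piece
print(blocks_for_piece(2, 262144, total))  # [(0, 16384), (16384, 3616)]

end_game = EndGame()
for peer in ("peer1", "peer2", "peer3"):
    end_game.request((2, 0), peer)
print(end_game.block_received((2, 0), "peer2"))  # ['peer1', 'peer3']
```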
Tit-for-Tat Algorithm
Incentivize sharing: Upload to peers who upload to you.
import random

class TitForTat:
    def __init__(self, all_peers):
        self.all_peers = all_peers  # Connected PeerConnection objects
        self.peer_rates = {}  # peer -> download rate we get from that peer
        self.unchoked_peers = []
        self.optimistic_unchoke_peer = None

    def update_rates(self):
        """Every 10 seconds, update who we unchoke"""
        # Sort peers by download rate from them
        sorted_peers = sorted(
            self.peer_rates.items(),
            key=lambda x: x[1],
            reverse=True
        )
        # Unchoke top 4 peers
        self.unchoked_peers = [p[0] for p in sorted_peers[:4]]
        for peer in self.all_peers:
            if peer in self.unchoked_peers:
                peer.send_unchoke()
            elif peer is not self.optimistic_unchoke_peer:
                # Leave the optimistic slot alone until its own timer fires
                peer.send_choke()

    def optimistic_unchoke(self):
        """Every 30 seconds, try a random peer"""
        # Give newcomers a chance
        choked_peers = [p for p in self.all_peers if p.am_choking]
        if choked_peers:
            self.optimistic_unchoke_peer = random.choice(choked_peers)
            self.optimistic_unchoke_peer.send_unchoke()

# Effect:
# - Fast uploaders get fast downloads
# - Prevents freeloading
# - Optimistic unchoke discovers fast new peers
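The selection logic can be exercised without sockets by working on plain data. The sketch below is a simplified model of one choking round, assuming four regular slots plus one optimistic slot as described above; `choose_unchoked` and `pick_optimistic` are hypothetical function names, and the peer names and rates are made-up sample values.

```python
import random


def choose_unchoked(download_rates, regular_slots=4, optimistic=None):
    """Return the set of peers to unchoke this round.

    download_rates maps each peer to the rate at which it uploads to us.
    """
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    unchoked = set(ranked[:regular_slots])
    if optimistic is not None:
        unchoked.add(optimistic)
    return unchoked


def pick_optimistic(all_peers, unchoked, rng=random):
    """Pick one currently choked peer at random, or None if there is none."""
    candidates = [p for p in all_peers if p not in unchoked]
    return rng.choice(candidates) if candidates else None


# Bytes per second received from each peer (sample values)
rates = {"a": 500, "b": 120, "c": 0, "d": 900, "e": 300, "f": 50}

regular = choose_unchoked(rates)
print(sorted(regular))  # ['a', 'b', 'd', 'e']

optimistic = pick_optimistic(rates, regular)
print(optimistic)  # 'c' or 'f'
print(sorted(choose_unchoked(rates, optimistic=optimistic)))
```

Peer `c` has uploaded nothing yet, so it can only ever be served through the optimistic slot; that is how a newcomer with no pieces gets its first data.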
Distributed Hash Table (DHT)
Trackerless torrents using Kademlia DHT.
class DHTNode:
    def __init__(self, node_id, ip, port):
        self.node_id = node_id  # 160-bit ID
        self.ip = ip
        self.port = port
        self.routing_table = {}  # Kademlia routing table
        self.peer_storage = {}  # info_hash -> [peers]

    # send_get_peers() and send_announce_peer() are the network RPCs;
    # their implementation is omitted here

    def find_peers(self, info_hash):
        """Find peers for a torrent"""
        # 1. Look in local storage
        if info_hash in self.peer_storage:
            return self.peer_storage[info_hash]
        # 2. Query closest nodes, closest first, never the same node twice
        to_query = self.find_closest_nodes(info_hash)
        queried = set()
        while to_query:
            node = to_query.pop(0)
            if node.node_id in queried:
                continue
            queried.add(node.node_id)
            response = self.send_get_peers(node, info_hash)
            if 'peers' in response:
                return response['peers']
            elif 'nodes' in response:
                # Iteratively query the closer nodes that were returned
                to_query.extend(response['nodes'])
                to_query.sort(key=lambda n: self.distance(n.node_id, info_hash))
        return []

    def announce_peer(self, info_hash, port):
        """Announce that we have this torrent"""
        closest_nodes = self.find_closest_nodes(info_hash)
        for node in closest_nodes:
            self.send_announce_peer(node, info_hash, port)

    def distance(self, id1, id2):
        """XOR distance (Kademlia)"""
        return int.from_bytes(id1, 'big') ^ int.from_bytes(id2, 'big')

    def find_closest_nodes(self, target_id, count=8):
        """Find K closest nodes to target"""
        all_nodes = list(self.routing_table.values())
        all_nodes.sort(key=lambda n: self.distance(n.node_id, target_id))
        return all_nodes[:count]

# Magnet link format:
# magnet:?xt=urn:btih:<info_hash>&dn=<name>&tr=<tracker>
# DHT eliminates need for tracker!
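A magnet link in the format shown above carries just enough information to start a DHT lookup. The following is a minimal parsing sketch using only the standard library; `parse_magnet` is a hypothetical helper, the example link is made up, and the code assumes the info hash is given either as 40 hex characters or as 32 Base32 characters.

```python
import base64
from urllib.parse import parse_qs


def parse_magnet(uri):
    """Extract the info hash, display name and trackers from a magnet link."""
    if not uri.startswith("magnet:?"):
        raise ValueError("not a magnet link")
    params = parse_qs(uri[len("magnet:?"):])

    info_hash = None
    for xt in params.get("xt", []):
        if xt.startswith("urn:btih:"):
            encoded = xt[len("urn:btih:"):]
            if len(encoded) == 40:
                info_hash = bytes.fromhex(encoded)
            elif len(encoded) == 32:
                info_hash = base64.b32decode(encoded.upper())
            break
    if info_hash is None:
        raise ValueError("no BitTorrent info hash found")

    return {
        "info_hash": info_hash,                  # 20 raw bytes
        "name": params.get("dn", [None])[0],     # Optional display name
        "trackers": params.get("tr", []),        # Optional tracker URLs
    }


link = (
    "magnet:?xt=urn:btih:" + "ab" * 20
    + "&dn=example+file"
    + "&tr=http%3A%2F%2Ftracker.example.com%3A6969%2Fannounce"
)
result = parse_magnet(link)
print(result["info_hash"].hex())
print(result["name"])      # example file
print(result["trackers"])  # ['http://tracker.example.com:6969/announce']
```

The resulting 20-byte `info_hash` is the key a client would pass to a DHT lookup such as `find_peers`.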
Real-World Applications
1. Linux Distributions
Ubuntu 22.04 ISO:
- Official torrent: 10,000+ seeders
- Direct download: Single server bottleneck
BitTorrent:
- Download speed: 50 MB/s (from multiple peers)
- Server load: Minimal
- Cost: Free bandwidth from users
2. Blizzard Games
World of Warcraft patches:
- 10GB patch to 10M players
- Traditional CDN: $$$
- BitTorrent: Players upload to each other
Result:
- Faster downloads
- Lower server costs
- Scalable to millions
3. Facebook Live Video
Facebook uses BitTorrent-like P2P:
- Users watching same stream share chunks
- Reduces CDN bandwidth by 80%
- Lower latency
Performance Analysis
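Scenario: 100 users download a 1 GB file
Traditional server:
- Server bandwidth: 1 Gbps
- Time to serve 100 users: ~800 seconds (100 GB = 800 Gb through a 1 Gbps link)
- Server cost: $$$
BitTorrent (ideal):
- Each user uploads at 50% of download speed
- Effective bandwidth: Grows as every peer that finishes a piece starts serving it
- Time for 100 users: Close to the time one peer needs to move 1 GB over its own link, plus a short ramp-up while the first pieces spread
- Server cost: Minimal (initial seed)
Formula:
- Traditional: T = (N * FileSize) / ServerBandwidth
- BitTorrent: T ≈ FileSize / AveragePeerBandwidth + O(log N) ramp-up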
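The scenario can be checked with a few lines of arithmetic. The client-server function below is the traditional formula from above. For the P2P side, the sketch uses the standard fluid-model lower bound found in networking textbooks rather than the approximation above, and the peer speeds (100 Mbps down, 50 Mbps up) are assumptions chosen to match "uploads at 50% of download speed".

```python
def client_server_time(n_users, file_bits, server_bps):
    """Seconds for one server to send the whole file to every user."""
    return n_users * file_bits / server_bps


def p2p_lower_bound(n_users, file_bits, seed_bps, peer_down_bps, peer_up_bps):
    """Fluid-model lower bound on P2P distribution time in seconds.

    The swarm cannot finish faster than any of these three limits:
    - the seed must upload one full copy,
    - each peer must download one full copy,
    - total upload capacity must carry n_users copies.
    """
    total_upload = seed_bps + n_users * peer_up_bps
    return max(
        file_bits / seed_bps,
        file_bits / peer_down_bps,
        n_users * file_bits / total_upload,
    )


FILE_BITS = 8 * 10**9  # 1 GB expressed in bits
GBPS = 10**9
MBPS = 10**6

print(client_server_time(100, FILE_BITS, 1 * GBPS))  # 800.0
print(round(p2p_lower_bound(100, FILE_BITS, 1 * GBPS, 100 * MBPS, 50 * MBPS), 1))  # 133.3
```

Under these assumed speeds the swarm is limited by peer upload capacity, and the bound barely changes as more peers join, because each new peer brings its own upload bandwidth with it.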
Interview Tips 💡
When discussing BitTorrent in system design interviews:
- Problem: "How to distribute 10GB file to 1M users without expensive CDN?"
- P2P advantage: "Users become servers - bandwidth scales with users..."
- Tit-for-tat: "Prevents freeloading - upload to get downloads..."
- Rarest first: "Ensures all pieces remain available even if seeders leave..."
- DHT: "Modern torrents don't need trackers - fully decentralized..."
- Real examples: "Blizzard uses P2P for game patches, Facebook for live video..."
Related Concepts
- DHT (Distributed Hash Table) — Decentralized peer discovery
- P2P Networks — Peer-to-peer architectures
- CDN — Content delivery comparison
- Gossip Protocol — Similar distribution pattern
- Load Balancing — Contrasting approach