TCP


What is TCP?

TCP = Transmission Control Protocol

TCP is one of the two main transport protocols of the internet, alongside UDP. It carries the Web (HTTP/1.1, HTTP/2), databases, email (SMTP, IMAP, POP3), remote access (SSH), file transfers (FTP, SFTP), and many other applications.

Whenever data must arrive complete, intact, and in order, TCP is the usual choice.


Computers communicate by sending packets (small chunks of data). The internet is like a highway system where data packets travel through many routers to reach their destination.

TCP is a set of rules for how two computers have a reliable, ordered conversation over an unreliable network (the internet).

The core problem TCP solves: IP (Layer 3) just throws packets at a destination with no guarantees. Packets can:

  • Get lost

  • Arrive out of order

  • Get duplicated

  • Get corrupted

TCP makes this chaos look like a clean, ordered stream of bytes to the applications above it.

Think of it this way: suppose you want to send a 1000-page book to a friend but can't mail it as one package. You could:

  1. Split it into smaller packages

  2. Number each package

  3. Get confirmation each one arrived

  4. Resend any that got lost

That's exactly what TCP does with your data!
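The four steps above can be sketched as a tiny simulation. This is only an illustration under simplifying assumptions, not how a real TCP stack is written: the function name `send_book`, the `loss_rate` parameter, and the seeded random "network" are all hypothetical, only outgoing packages can be lost (never confirmations), and the sender waits for each confirmation before sending the next package.

```python
import random

def send_book(pages, loss_rate=0.3, seed=0):
    """Deliver numbered pages over a lossy channel, resending until each is confirmed."""
    rng = random.Random(seed)        # seeded so every run behaves the same
    received = {}                    # receiver side: page number -> content
    attempts = 0
    for number, page in enumerate(pages, start=1):   # steps 1-2: split and number
        while True:
            attempts += 1
            if rng.random() < loss_rate:
                continue             # package lost, no confirmation: resend (step 4)
            received[number] = page  # package arrived and is confirmed (step 3)
            break
    # The receiver rebuilds the book in page-number order.
    book = [received[n] for n in sorted(received)]
    return book, attempts
```

Calling `send_book(["page 1", "page 2", "page 3"])` returns the pages in order plus the number of mailings it took, which is at least one per page.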

How TCP Works: The Three-Way Handshake

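Before sending data, TCP establishes a connection using a "three-way handshake":

  1. SYN: The client asks to open a connection and announces its initial sequence number

  2. SYN/ACK: The server acknowledges the request and announces its own initial sequence number

  3. ACK: The client acknowledges the server's sequence number

This handshake ensures both sides are ready to communicate, and have agreed on starting sequence numbers, before any data is sent.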
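Applications never send the handshake packets themselves; the operating system does it when a program calls `connect()`. The sketch below, which assumes a listener on the local machine and uses the hypothetical name `loopback_handshake`, shows that once `connect()` returns, both ends already agree on who is talking to whom.

```python
import socket

def loopback_handshake():
    """Open a TCP connection to a local listener and return both ends' view of the client address."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        listener.listen(1)
        listener.settimeout(5)
        server_addr = listener.getsockname()

        # create_connection() returns only after the kernel has completed the handshake.
        with socket.create_connection(server_addr, timeout=5) as client:
            conn, peer_addr = listener.accept()   # pick up the already-established connection
            with conn:
                return client.getsockname(), peer_addr
```

The two returned addresses are the same (IP, port) pair: the client's own address as it sees it, and the client's address as the server sees it.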

Data Transfer: How TCP Ensures Reliability

Once connected, TCP breaks your data into segments, numbers them, and tracks each one until the receiver confirms it.
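As a rough illustration of the "breaks your data into segments" step, the sketch below cuts a byte string into chunks and labels each with the number of its first byte. The function name `segment` is hypothetical, and the default of 1460 bytes is an assumption based on the common maximum segment size on Ethernet; real connections negotiate their own value.

```python
def segment(data: bytes, mss: int = 1460, initial_seq: int = 0):
    """Split data into (sequence_number, payload) pairs of at most mss bytes each."""
    return [
        (initial_seq + offset, data[offset:offset + mss])  # label = number of the first byte
        for offset in range(0, len(data), mss)
    ]
```

For a 3000-byte payload this yields three segments starting at bytes 0, 1460, and 2920.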

Key TCP Features Explained

1. Sequence Numbers

Every byte of data gets a sequence number. This allows TCP to:

  • Detect missing packets

  • Reorder packets that arrive out of sequence

  • Remove duplicates

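Example: If you're downloading a 1 MB file, think of TCP numbering the bytes 1 to 1,000,000 (real connections start from a random initial number rather than 1). If segments arrive as 1-500, 1001-1500, 501-1000, TCP puts them back in order before handing the data to the application.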
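The reordering in this example can be sketched in a few lines. This is a simplified model, not a real TCP implementation: the name `reassemble` is hypothetical, and it assumes segments never partially overlap, so a repeated starting number always means an exact duplicate.

```python
def reassemble(segments, first_seq=1):
    """Rebuild a byte stream from (first_byte_number, payload) pairs given in any order."""
    pending = {}
    for seq, payload in segments:
        pending.setdefault(seq, payload)   # a repeated sequence number is a duplicate: keep one copy
    stream = bytearray()
    next_seq = first_seq
    while next_seq in pending:             # stop at the first gap (a missing segment)
        payload = pending.pop(next_seq)
        stream += payload
        next_seq += len(payload)
    return bytes(stream), next_seq         # next_seq is the first byte still missing
```

Feeding it segments that start at bytes 1, 1001, and 501 (500 bytes each) returns the 1500 bytes in order and 1501 as the next byte expected.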

2. Acknowledgments (ACKs)

The receiver sends acknowledgments back that say, in effect, "I have received every byte up to X; send byte X next." If the sender doesn't get an ACK covering a segment within a timeout period, it resends that segment.
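A minimal sketch of the receiving side may make the acknowledgment number concrete. The class name `Receiver` and its methods are hypothetical, and the model ignores timers, window limits, and selective acknowledgments; it only shows that each ACK names the next byte the receiver is waiting for.

```python
class Receiver:
    """Tracks in-order delivery and answers every segment with a cumulative ACK."""

    def __init__(self, first_seq=0):
        self.next_expected = first_seq   # first byte not yet received in order
        self.buffered = {}               # out-of-order segments waiting for a gap to fill
        self.delivered = bytearray()     # bytes handed to the application

    def on_segment(self, seq, payload):
        if seq >= self.next_expected:    # older data is a duplicate and is ignored
            self.buffered.setdefault(seq, payload)
        while self.next_expected in self.buffered:
            chunk = self.buffered.pop(self.next_expected)
            self.delivered += chunk
            self.next_expected += len(chunk)
        return self.next_expected        # ACK number = next byte wanted
```

If bytes 0-3 arrive, the ACK is 4. If bytes 8-11 arrive next, the ACK is still 4, which tells the sender that bytes 4-7 are missing. Once those arrive, the ACK jumps to 12.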

3. Flow Control

TCP prevents the sender from overwhelming the receiver. The receiver tells the sender "I can handle X more bytes right now" using a window size.

Real-world analogy: Like a teacher asking "Can everyone keep up?" before moving to the next topic.
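In code, the window rule boils down to one subtraction. The sketch below is an assumption-laden simplification (the name `bytes_allowed` and its parameters are hypothetical, and congestion control is ignored): the sender may only have as many unacknowledged bytes in flight as the receiver's advertised window allows.

```python
def bytes_allowed(receive_window, last_byte_sent, last_byte_acked):
    """How many more bytes the sender may put on the wire right now."""
    in_flight = last_byte_sent - last_byte_acked   # sent but not yet acknowledged
    return max(0, receive_window - in_flight)      # never negative: 0 means "wait"
```

With a 4096-byte window and 2000 bytes in flight, the sender may send 2096 more bytes; if the receiver shrinks the window to 1000, the sender must pause until more data is acknowledged.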

4. Congestion Control

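TCP detects network congestion and slows down transmission to avoid making it worse. It starts with a small sending rate, ramps up quickly at first (slow start), then increases more cautiously. When packets start getting lost, it treats that as a sign of congestion and backs off.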
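The ramp-up and back-off pattern can be sketched with a toy model of the congestion window, the number of segments the sender allows in flight. This is a simplified, Reno-style illustration only; the name `simulate_cwnd`, its parameters, and the starting threshold of 16 are assumptions, and modern algorithms such as CUBIC or BBR behave differently.

```python
def simulate_cwnd(rounds, loss_rounds, ssthresh=16):
    """Return the congestion window (in segments) at the start of each round trip."""
    cwnd = 1
    history = []
    for rtt in range(rounds):
        history.append(cwnd)
        if rtt in loss_rounds:
            ssthresh = max(cwnd // 2, 2)     # remember half the window that caused loss
            cwnd = ssthresh                  # back off (multiplicative decrease)
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start: double every round trip
        else:
            cwnd += 1                        # congestion avoidance: one more segment per round trip
    return history
```

`simulate_cwnd(10, {6})` returns `[1, 2, 4, 8, 16, 17, 18, 9, 10, 11]`: fast growth, then cautious growth, a halving after the loss in round 6, and cautious growth again.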

Simple Example: Visiting a Website

Let me walk you through what happens when you type www.example.com in your browser:

  1. Three-way handshake: Your computer and the web server establish a TCP connection

  2. HTTP Request: Your browser sends "GET /index.html" broken into TCP segments

  3. Server Response: The server sends back the HTML (and, on later requests, CSS and images), all broken into TCP segments

  4. TCP manages: Ordering, retransmission of lost segments, flow control

  5. Connection Close: A four-way handshake (FIN → ACK, FIN → ACK) closes the connection when done

All of this happens in milliseconds!
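The walkthrough above can be reproduced with a plain socket. To keep the sketch self-contained it talks to a tiny local server instead of www.example.com; `HelloHandler`, `fetch`, and `demo` are hypothetical names, and the response body is made up for the example.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the web server in the walkthrough."""

    def do_GET(self):
        body = b"<h1>hello</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet

def fetch(host, port, path="/index.html"):
    """Send one HTTP GET over a fresh TCP connection and return the raw response bytes."""
    request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    with socket.create_connection((host, port), timeout=5) as sock:  # step 1: handshake
        sock.sendall(request.encode("ascii"))                        # step 2: request
        chunks = []
        while True:                        # steps 3-4: read the ordered byte stream
            chunk = sock.recv(4096)
            if not chunk:                  # step 5: the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)

def demo():
    """Start the local server, fetch one page from it, and shut it down again."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)   # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch("127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()
```

Note that `fetch` never deals with segments, ordering, or retransmission. It writes bytes and reads bytes, and TCP handles the rest underneath.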

The TCP Packet Structure

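Here's a simplified view of what a TCP segment contains:

  • Source port and destination port (16 bits each)

  • Sequence number (32 bits)

  • Acknowledgment number (32 bits)

  • Header length and flags (SYN, ACK, FIN, RST, PSH, URG)

  • Window size (16 bits)

  • Checksum and urgent pointer (16 bits each)

  • Options (optional), followed by the data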
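For readers who want to see the header as bytes, the sketch below packs and parses the fixed 20-byte part of a TCP header using Python's `struct` module. The names `TCP_HEADER`, `FLAG_BITS`, and `parse_tcp_header` are hypothetical, options are ignored, and the checksum is read but not verified.

```python
import struct

# Fixed 20-byte TCP header (no options), network byte order:
# source port, destination port, sequence number, acknowledgment number,
# data offset + reserved bits + flags, window size, checksum, urgent pointer
TCP_HEADER = struct.Struct("!HHIIHHHH")

FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}

def parse_tcp_header(raw: bytes) -> dict:
    """Decode the first 20 bytes of a TCP segment into named fields."""
    src, dst, seq, ack, offset_flags, window, checksum, urgent = TCP_HEADER.unpack(raw[:20])
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len": (offset_flags >> 12) * 4,   # data offset counts 32-bit words
        "flags": [name for name, bit in FLAG_BITS.items() if offset_flags & bit],
        "window": window,
        "checksum": checksum,
        "urgent": urgent,
    }
```

Packing a header with `TCP_HEADER.pack(54321, 80, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)` and parsing it back gives source port 54321, destination port 80, a 20-byte header, and the single flag SYN, which is what the first packet of a handshake looks like.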


TCP vs UDP: When NOT to Use TCP

TCP has a counterpart called UDP (User Datagram Protocol). Compared with TCP, UDP has:

  • Lower latency and overhead, but no reliability

  • No connection setup

  • No guaranteed delivery

  • No ordering

UDP is used for:

  • Live video/audio streaming (losing a frame is okay)

  • Online gaming (old data is worthless)

  • DNS lookups (one simple request/response)

TCP is used when you need every byte to arrive correctly: emails, file downloads, web pages.
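The "no connection setup" difference is visible in code. The sketch below (the name `udp_roundtrip` is hypothetical) sends a single datagram between two sockets on the local machine. There is no `listen()`, `accept()`, or `connect()`, and nothing would tell the sender if the datagram were lost, which is why the receiver sets its own timeout.

```python
import socket

def udp_roundtrip(message: bytes) -> bytes:
    """Send one datagram to a local UDP socket without any connection setup."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        receiver.settimeout(2)            # UDP never reports loss, so give up after 2 seconds
        sender.sendto(message, receiver.getsockname())   # fire and forget
        data, _addr = receiver.recvfrom(2048)
        return data
```

On the loopback interface the datagram practically always arrives; across a real network the same code could time out, and it would be up to the application to decide whether to retry.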

Practical Analogy: Sending a Book

Imagine you want to send a book to a friend, page by page:

TCP approach:

  1. Handshake: Call your friend: "Ready to receive my book?" They confirm.

  2. Numbering: Number each page (1, 2, 3, ...)

  3. Send & Confirm: Mail page 1, wait for "Got page 1!" text

  4. Lost pages: If page 5 confirmation never comes, send page 5 again

  5. Order matters: Friend waits for page 3 before reading page 4

  6. Flow control: Friend says "Slow down, my mailbox is full!"

  7. Finish: When all pages confirmed, both agree "Book complete!"

UDP approach (for comparison):

  1. No call ahead - just start mailing pages

  2. No tracking - if pages get lost, oh well

  3. Friend reads pages as they arrive, even out of order

  4. Much faster, but your friend might get an incomplete book!


TCP/IP

TCP/IP is the fundamental suite of protocols that runs the internet.

  • IP handles addressing + routing (getting packets to the right machine).

  • TCP handles reliable delivery (ordered, no loss).

  • UDP handles fast delivery (no guarantee).

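TCP provides:

  • Connection setup with a three-way handshake (SYN → SYN/ACK → ACK)

  • Retransmission of lost packets

  • Reordering of packets that arrive out of sequence

  • Reliable delivery: data either arrives intact or the application gets an error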

For example, all these run over TCP:

  • Databases (Postgres, MySQL, SQL Server)

  • Kafka

  • Spark internal communication

  • APIs (HTTP)

  • SSH tunnels

  • SFTP transfers

If the TCP handshake fails (wrong host or port, a firewall rule, a service that is down), nothing above it can run, so your pipeline fails.
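A quick way to test this before a pipeline starts is to attempt the handshake and nothing else. The helper below is a minimal sketch with a hypothetical name (`can_connect`) and an assumed default timeout of 3 seconds; it only proves that the port accepts TCP connections, not that the service behind it is healthy.

```python
import socket

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake with host:port completes within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:   # refused, timed out, unreachable, or name lookup failed
        return False
```

For example, `can_connect("127.0.0.1", 5432)` reports whether anything is accepting connections on the default Postgres port of the local machine.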

