Systems Design · 2024-09-08 · 4 min read · Naveen RK

How Instagram Handles Massive Spikes


Imagine this: Justin Bieber posts a new photo on Instagram. Within seconds, millions of fans worldwide rush to hit the “like” button. Now, think about the technical challenge this presents.

How does Instagram handle this sudden surge without crashing?

Believe it or not, in 2019 a flood of likes on Justin Bieber’s posts almost crashed Instagram. Whoa!


In this article, we dive into the ingenious solutions that Instagram has implemented to tackle this problem.

In the beginning, Instagram had a straightforward approach to tracking likes on posts.

Here’s a simple example of the posts and likes tables:

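/* POSTS TABLE */
CREATE TABLE posts (
  id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  user_id BIGINT NOT NULL,
  image_url TEXT,
  caption TEXT,
  created_at TIMESTAMP NOT NULL DEFAULT NOW()
);
/* LIKES TABLE */
CREATE TABLE likes (
  id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  post_id BIGINT NOT NULL REFERENCES posts (id),
  user_id BIGINT NOT NULL,
  created_at TIMESTAMP NOT NULL DEFAULT NOW(),
  /* One like per user per post; the index behind it also serves post_id lookups */
  UNIQUE (post_id, user_id)
);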

In this schema, the posts table stores post details, while the likes table records every like with a many-to-one relationship to posts. To get the total likes on a post, Instagram would run a query like this:

SELECT COUNT(*) AS total_likes
FROM likes
WHERE post_id = ?;
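A minimal sketch of this naive approach, assuming Python’s built-in sqlite3 module with an in-memory database as a stand-in for Instagram’s PostgreSQL (so the column types are simplified). The total_likes() helper is hypothetical; the point is that every call recounts the matching rows, so its cost grows with the number of likes on the post:

```python
import sqlite3

# In-memory SQLite standing in for PostgreSQL (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (
  id INTEGER PRIMARY KEY,
  user_id INTEGER,
  image_url TEXT,
  caption TEXT,
  created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE likes (
  id INTEGER PRIMARY KEY,
  post_id INTEGER REFERENCES posts(id),
  user_id INTEGER,
  created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
""")

conn.execute("INSERT INTO posts (id, user_id, caption) VALUES (1, 42, 'new photo')")
# Simulate 10,000 fans liking post 1: one row per like.
conn.executemany(
    "INSERT INTO likes (post_id, user_id) VALUES (?, ?)",
    [(1, fan_id) for fan_id in range(10_000)],
)
conn.commit()


def total_likes(post_id):
    """Naive approach: recount every like row each time the total is needed."""
    (count,) = conn.execute(
        "SELECT COUNT(*) AS total_likes FROM likes WHERE post_id = ?", (post_id,)
    ).fetchone()
    return count


print(total_likes(1))  # 10000
```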

This query worked fine for everyday posts, but it fell short when celebrities like Bieber posted. The sudden influx of likes would overwhelm the database, leading to slowdowns and crashes.

Instagram’s first significant improvement was the introduction of denormalized counters. Instead of dynamically counting likes, they decided to store the total like count directly in the posts table:

ALTER TABLE posts ADD COLUMN like_count BIGINT DEFAULT 0;

With this new structure, every time a user liked a post, two operations occurred:

1. Insert the like into the likes table:

INSERT INTO likes (post_id, user_id, created_at) VALUES (?, ?, NOW());

2. Increment the like count in the posts table:

UPDATE posts SET like_count = like_count + 1 WHERE id = ?;
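These two steps only stay in sync if they succeed or fail together. Below is a minimal sketch under stated assumptions: SQLite via Python’s sqlite3 stands in for PostgreSQL, like_post() is a hypothetical helper, and the UNIQUE (post_id, user_id) constraint is an added assumption so a repeated tap cannot be counted twice. Both statements run in one transaction, and the counter is bumped only when a new like row was actually inserted:

```python
import sqlite3

# In-memory SQLite standing in for PostgreSQL (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (
  id INTEGER PRIMARY KEY,
  like_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE likes (
  id INTEGER PRIMARY KEY,
  post_id INTEGER NOT NULL REFERENCES posts(id),
  user_id INTEGER NOT NULL,
  created_at TEXT DEFAULT CURRENT_TIMESTAMP,
  UNIQUE (post_id, user_id)  -- assumed: one like per user per post
);
INSERT INTO posts (id) VALUES (1);
""")


def like_post(post_id, user_id):
    """Insert the like and increment the counter atomically; True if the like is new."""
    with conn:  # one transaction: commit on success, roll back on error
        cur = conn.execute(
            # SQLite syntax; PostgreSQL would use INSERT ... ON CONFLICT DO NOTHING
            "INSERT OR IGNORE INTO likes (post_id, user_id) VALUES (?, ?)",
            (post_id, user_id),
        )
        if cur.rowcount == 0:  # duplicate like: leave the counter untouched
            return False
        conn.execute(
            "UPDATE posts SET like_count = like_count + 1 WHERE id = ?",
            (post_id,),
        )
        return True


like_post(1, 100)
like_post(1, 101)
like_post(1, 100)  # same user again: ignored
print(conn.execute("SELECT like_count FROM posts WHERE id = 1").fetchone()[0])  # 2
```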

Similarly, when a like was removed, the operations were:

1. Delete the like from the likes table:

DELETE FROM likes WHERE post_id = ? AND user_id = ?;

2. Decrement the like count in the posts table:

UPDATE posts SET like_count = like_count - 1 WHERE id = ?;
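The unlike path needs the mirror-image guard. In this sketch (again assuming SQLite via sqlite3 in place of PostgreSQL, with a hypothetical unlike_post() helper), the counter is decremented only when the DELETE actually removed a row, so a retried or duplicated unlike request cannot push like_count out of sync or below zero:

```python
import sqlite3

# Minimal SQLite stand-in: one post that already has two likes recorded.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, like_count INTEGER NOT NULL DEFAULT 0);
CREATE TABLE likes (
  id INTEGER PRIMARY KEY,
  post_id INTEGER NOT NULL REFERENCES posts(id),
  user_id INTEGER NOT NULL,
  UNIQUE (post_id, user_id)
);
INSERT INTO posts (id, like_count) VALUES (1, 2);
INSERT INTO likes (post_id, user_id) VALUES (1, 100), (1, 101);
""")


def unlike_post(post_id, user_id):
    """Delete the like and decrement the counter atomically; True if a like was removed."""
    with conn:  # both statements commit together or not at all
        cur = conn.execute(
            "DELETE FROM likes WHERE post_id = ? AND user_id = ?",
            (post_id, user_id),
        )
        if cur.rowcount == 0:  # this user had not liked the post
            return False
        conn.execute(
            "UPDATE posts SET like_count = like_count - 1 WHERE id = ?",
            (post_id,),
        )
        return True


unlike_post(1, 100)
unlike_post(1, 100)  # second attempt finds no row, so the counter is untouched
print(conn.execute("SELECT like_count FROM posts WHERE id = 1").fetchone()[0])  # 1
```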

This approach allowed Instagram to quickly retrieve the total likes for a post without performing expensive COUNT(*) operations, significantly enhancing performance and scalability.

But is COUNT(*) really that slow? Yes. In simple words, a COUNT(*) query has to visit every row it counts. Without an index on post_id, that means scanning the entire likes table row by row (a sequential scan); even with an index, the database still walks one index entry per like, which for a celebrity post means millions of entries on every read.

In contrast, an insert or a single-row update touches only the rows it changes, and indexes such as the primary key let the database locate those rows directly instead of scanning the table.

This is why paying a small, constant cost on every like (one insert plus one update) is far cheaper than counting millions of rows every time the total is displayed.
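One way to see the difference is to ask the database which access path it picks. This sketch uses SQLite’s EXPLAIN QUERY PLAN through Python’s sqlite3 module; PostgreSQL’s EXPLAIN output looks different, and SQLite’s exact wording varies by version, so the comments describe the plans rather than quote them. The plan() helper and the likes_post_id_idx index name are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, like_count INTEGER NOT NULL DEFAULT 0);
CREATE TABLE likes (
  id INTEGER PRIMARY KEY,
  post_id INTEGER NOT NULL,
  user_id INTEGER NOT NULL
);
""")


def plan(sql):
    """Return the 'detail' column of SQLite's query plan for a one-parameter query."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (1,))]


count_sql = "SELECT COUNT(*) FROM likes WHERE post_id = ?"
counter_sql = "SELECT like_count FROM posts WHERE id = ?"

scan_plan = plan(count_sql)  # no index on post_id: a full SCAN of likes
conn.execute("CREATE INDEX likes_post_id_idx ON likes (post_id)")
index_plan = plan(count_sql)  # index SEARCH, still one entry read per matching like
counter_plan = plan(counter_sql)  # one-row lookup via the integer primary key

for label, steps in [("count, no index", scan_plan),
                     ("count, with index", index_plan),
                     ("denormalized counter", counter_plan)]:
    print(f"{label}: {steps}")
```

The middle case is the one to notice: the index removes the full-table scan, but the database still reads one index entry per like, which is why a precomputed like_count stays cheaper for very popular posts.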

Handling millions of likes is one thing, but keeping data consistent across regions and caches without overloading the database is another challenge. For this, Instagram turned to PgQ, a PostgreSQL extension that queues and processes events reliably.

When a like was added or removed, an event was queued in PgQ:

/* Example: the third argument (the liked post's ID) becomes the event's ev_data */
SELECT pgq.insert_event('likes_queue', 'like_added', ?::TEXT);
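PgQ itself runs only inside PostgreSQL, so the following is a rough emulation rather than PgQ’s API: a hypothetical like_events table in SQLite plays the role of likes_queue, and the hypothetical like_post() helper records the like and enqueues a 'like_added' event in the same transaction instead of updating the posts row directly. What it shows is that the hot posts row is not written on the request path at all:

```python
import sqlite3

# SQLite stand-in: a hypothetical like_events table emulates 'likes_queue'.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, like_count INTEGER NOT NULL DEFAULT 0);
CREATE TABLE likes (
  id INTEGER PRIMARY KEY,
  post_id INTEGER NOT NULL,
  user_id INTEGER NOT NULL,
  UNIQUE (post_id, user_id)
);
CREATE TABLE like_events (
  ev_id INTEGER PRIMARY KEY,  -- increasing id, like PgQ's ev_id
  ev_type TEXT NOT NULL,      -- 'like_added' or 'like_removed'
  ev_data TEXT NOT NULL       -- the post id, stored as text
);
INSERT INTO posts (id) VALUES (1);
""")


def like_post(post_id, user_id):
    """Record the like and enqueue an event; the counter is updated later by a worker."""
    with conn:  # the like row and its event commit together
        cur = conn.execute(
            "INSERT OR IGNORE INTO likes (post_id, user_id) VALUES (?, ?)",
            (post_id, user_id),
        )
        if cur.rowcount == 1:  # only genuinely new likes produce an event
            conn.execute(
                "INSERT INTO like_events (ev_type, ev_data) VALUES ('like_added', ?)",
                (str(post_id),),
            )


for fan in range(5):
    like_post(1, fan)
like_post(1, 0)  # duplicate like: no second event

# The posts row has not been written yet; five events are waiting for the worker.
print(conn.execute("SELECT like_count FROM posts WHERE id = 1").fetchone()[0])  # 0
print(conn.execute("SELECT COUNT(*) FROM like_events").fetchone()[0])  # 5
```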

A background worker consumes these events and updates the like count in the posts table:

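1. Fetch the next batch of events. The worker registers once as a consumer (likes_counter is a hypothetical consumer name); pgq.next_batch then returns a batch ID, or NULL when no new batch is ready, and pgq.get_batch_events lists the events in that batch:

SELECT pgq.register_consumer('likes_queue', 'likes_counter'); /* once */
SELECT pgq.next_batch('likes_queue', 'likes_counter');
SELECT ev_id, ev_type, ev_data FROM pgq.get_batch_events(?); /* ? = batch ID */

2. Update the like count for each event:

/* For 'like_added' events; a 'like_removed' event decrements instead,
   and other operations can be queued the same way */
UPDATE posts SET like_count = like_count + 1 WHERE id = ev_data::BIGINT;

3. Mark the batch as processed:

SELECT pgq.finish_batch(?); /* ? = batch ID from step 1 */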

**Disclaimer:** The PgQ steps above are a simplified version of the more robust mechanism Instagram implemented, but they are enough to explain the idea behind it.
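The consumer side can be emulated the same way. This sketch is not PgQ: a hypothetical consumer_position table tracks the last event ID processed (PgQ does this bookkeeping for you through next_batch and finish_batch), and process_batch() is an illustrative helper. It also swaps in one design choice the steps above do not show: instead of one UPDATE per event, it sums each batch into a net delta per post, so a burst of likes on a single hot post becomes one UPDATE per batch. Whether Instagram aggregated events this way is an assumption:

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, like_count INTEGER NOT NULL DEFAULT 0);
CREATE TABLE like_events (
  ev_id INTEGER PRIMARY KEY,
  ev_type TEXT NOT NULL,
  ev_data TEXT NOT NULL
);
-- Hypothetical bookkeeping: the last event id this consumer has processed.
CREATE TABLE consumer_position (consumer TEXT PRIMARY KEY, last_ev_id INTEGER NOT NULL);
INSERT INTO consumer_position VALUES ('likes_counter', 0);
INSERT INTO posts (id) VALUES (1), (2);
""")

# Queue a burst: 1,000 likes on post 1, 3 on post 2, then one unlike on post 1.
conn.executemany(
    "INSERT INTO like_events (ev_type, ev_data) VALUES (?, ?)",
    [("like_added", "1")] * 1000 + [("like_added", "2")] * 3 + [("like_removed", "1")],
)
conn.commit()


def process_batch(consumer="likes_counter", batch_size=500):
    """Apply one batch of events; return how many events were consumed."""
    with conn:  # counter updates and the new position commit together
        (last,) = conn.execute(
            "SELECT last_ev_id FROM consumer_position WHERE consumer = ?", (consumer,)
        ).fetchone()
        events = conn.execute(
            "SELECT ev_id, ev_type, ev_data FROM like_events "
            "WHERE ev_id > ? ORDER BY ev_id LIMIT ?",
            (last, batch_size),
        ).fetchall()
        if not events:
            return 0
        # Collapse the batch into one net delta per post ...
        deltas = Counter()
        for _, ev_type, ev_data in events:
            deltas[int(ev_data)] += 1 if ev_type == "like_added" else -1
        # ... so a hot post gets one UPDATE per batch, not one per like.
        conn.executemany(
            "UPDATE posts SET like_count = like_count + ? WHERE id = ?",
            [(delta, post_id) for post_id, delta in deltas.items()],
        )
        conn.execute(
            "UPDATE consumer_position SET last_ev_id = ? WHERE consumer = ?",
            (events[-1][0], consumer),
        )
        return len(events)


while process_batch():
    pass
print(conn.execute("SELECT id, like_count FROM posts ORDER BY id").fetchall())
# [(1, 999), (2, 3)]
```

The trade-off is freshness: the displayed count lags by up to one batch, in exchange for far fewer writes to the hottest rows.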

With these architectural changes, Instagram was able to handle the massive spikes in likes generated by high-profile posts without significant performance issues or outages.

The combination of denormalized counters and reliable event processing with PgQ provided a robust answer to sudden surges in activity, which is how Instagram can now handle the likes of Bieber’s posts, and many others, with ease.

So, next time you see a celebrity’s post gobbling millions of likes in real-time, you’ll know the tech magic working behind the scenes to keep everything running smoothly!


🕘 Next Read

a-20-tool-vs-a-191000-bill

It started with a phone call no family ever wants to receive. A man was rushed to the hospital after a heart attack. Four hours later, in the emergency room, he passed away. Everything happened too…

AI4 min read
2024-12-01
ai-driven-development

Look, I’m going to be honest with you. If you’re using Cursor or any AI code editor, you’re probably doing it wrong. And I say this as someone who uses it every single day. You know the drill: paste…

AI7 min read
2024-11-19