
To be sorted

Calculate network bandwidth requirement

How would you calculate network bandwidth requirements for a video-sharing site like YouTube? Assume an average video size of 100 MB, 500,000 users uploading 1 video per day, and 5 million daily active users watching 5 videos per day.
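A back-of-the-envelope estimate for those numbers, assuming decimal units and traffic spread evenly over the day (a sketch, not a capacity plan):

    # Back-of-the-envelope bandwidth estimate for the numbers above
    # (average sustained rate; real traffic is peaky, so apply a peak factor).
    MB = 10**6                     # decimal units for simplicity
    SECONDS_PER_DAY = 24 * 60 * 60

    avg_video_size = 100 * MB      # bytes
    daily_uploads = 500_000        # videos uploaded per day
    daily_views = 5_000_000 * 5    # 5M DAU watching 5 videos each

    upload_bps = daily_uploads * avg_video_size * 8 / SECONDS_PER_DAY
    download_bps = daily_views * avg_video_size * 8 / SECONDS_PER_DAY

    print(f"ingress ~ {upload_bps / 1e9:.1f} Gbps")    # ~4.6 Gbps
    print(f"egress  ~ {download_bps / 1e9:.1f} Gbps")  # ~231.5 Gbps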

You will probably not upload all videos to one DC and one host. How many DCs and hosts will you use? Let’s assume we need this globally, so 4 availability zones, 2 DCs each, 2 hosts per DC.
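Splitting those totals across the assumed topology (an even split across hosts is itself an assumption):

    # Dividing the totals from the previous sketch over the assumed topology.
    total_ingress_gbps = 4.6       # from the upload estimate above
    total_egress_gbps = 231.5      # from the download estimate above
    hosts = 4 * 2 * 2              # 4 AZs x 2 DCs x 2 hosts = 16

    print(total_ingress_gbps / hosts)   # ~0.3 Gbps ingress per host
    print(total_egress_gbps / hosts)    # ~14.5 Gbps egress per host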

Design YouTube view history

Design YouTube view history. Discuss system architecture, data storage, etc.

What is the storage system you would use, why and how?
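One common answer is a wide-column store (Cassandra-style) partitioned by user and clustered by watch time, so "recent history for a user" is a single-partition read. A minimal sketch of that data model in plain Python, with no specific product's API assumed:

    import time
    from collections import defaultdict
    from bisect import insort

    # Wide-column-style model: partition key = user_id, clustering key = watched_at,
    # so "last N videos for a user" is one cheap, bounded read.
    view_history = defaultdict(list)   # user_id -> [(watched_at, video_id), ...]

    def record_view(user_id: str, video_id: str) -> None:
        insort(view_history[user_id], (time.time(), video_id))

    def recent_views(user_id: str, limit: int = 10):
        return list(reversed(view_history[user_id]))[:limit]

    record_view("u1", "v42")
    record_view("u1", "v7")
    print(recent_views("u1"))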

Sync data for large number of devices

Design how data is synced when there is a large number of devices and the data is updating rapidly. For example, we wear fitness bands; data is updated in the band with every step and then synced with global storage.

What is the point of syncing on every step? There are multiple reasons not to do that:

  1. it's expensive
  2. the device doesn't have an internet connection all the time
  3. it consumes a lot of battery
  4. and probably the main reason: such low sync latency simply isn't needed. It's not an HFT use case.

Then how is data synced behind the scenes? What is the architecture? I'd upload the data (at whatever precision is required) to a cell phone via Bluetooth and just send a batch update to the server. The DB depends on requirements; it might be a wide-column or time-series DB.
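A minimal sketch of that batching idea, assuming a made-up HTTPS endpoint and payload shape:

    import json, time, urllib.request

    # Buffer readings on the phone and flush them as one batch when the buffer
    # is full or a time window elapses. Endpoint and payload are illustrative.
    class StepBuffer:
        def __init__(self, max_items=500, max_age_s=300):
            self.items, self.started = [], time.time()
            self.max_items, self.max_age_s = max_items, max_age_s

        def add(self, steps: int, ts: float) -> None:
            self.items.append({"ts": ts, "steps": steps})
            too_old = time.time() - self.started > self.max_age_s
            if len(self.items) >= self.max_items or too_old:
                self.flush()

        def flush(self) -> None:
            if not self.items:
                return
            body = json.dumps({"device": "band-123", "readings": self.items}).encode()
            req = urllib.request.Request(
                "https://api.example.com/sync", data=body,
                headers={"Content-Type": "application/json"})
            urllib.request.urlopen(req)   # retry/backoff omitted in this sketch
            self.items, self.started = [], time.time()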

Consistent Hashing
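A minimal hash ring with virtual nodes, as a reminder of the core idea; node names, vnode count, and the MD5 hash are illustrative choices:

    import bisect, hashlib

    # Hash ring with virtual nodes: adding or removing a node only remaps the
    # keys adjacent to its positions on the ring, not the whole keyspace.
    class HashRing:
        def __init__(self, nodes, vnodes=100):
            self.ring = sorted(
                (self._hash(f"{node}#{i}"), node)
                for node in nodes for i in range(vnodes)
            )
            self.keys = [h for h, _ in self.ring]

        @staticmethod
        def _hash(key: str) -> int:
            return int(hashlib.md5(key.encode()).hexdigest(), 16)

        def get_node(self, key: str) -> str:
            idx = bisect.bisect(self.keys, self._hash(key)) % len(self.ring)
            return self.ring[idx][1]

    ring = HashRing(["cache-a", "cache-b", "cache-c"])
    print(ring.get_node("user:42"))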

Fanout Implementation

What are your thoughts on a fanout implementation that listens to a change stream + Lambda instead of a queue + consumer, given high write throughput? I wanted to understand the limitations of Lambda, and I got the answer. Example: timeline updates for followers in a Twitter-style system. If you want to listen to a change stream, you end up needing a queue + consumer internally anyway.
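A toy sketch of the queue + consumer variant, using an in-process queue and in-memory timelines purely for illustration; in production the queue and timelines would be real infrastructure (message broker, cache):

    import queue, threading
    from collections import defaultdict

    # Fanout-on-write: the write path only enqueues a "post created" event;
    # a consumer fans it out to each follower's timeline.
    events = queue.Queue()
    followers = {"alice": ["bob", "carol"]}       # author -> follower list
    timelines = defaultdict(list)                 # user -> list of post ids

    def fanout_worker():
        while True:
            author, post_id = events.get()
            for follower in followers.get(author, []):
                timelines[follower].append(post_id)
            events.task_done()

    threading.Thread(target=fanout_worker, daemon=True).start()

    events.put(("alice", "post-1"))               # the write path just enqueues
    events.join()
    print(timelines["bob"])                       # ['post-1']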

Lambda

I don't recommend Lambda for low-latency, HA, and high-throughput use cases. Lambda cold starts are pretty well-known time consumers, and you'd have to account for the latency they add (sometimes seconds).

https://aws.amazon.com/blogs/compute/new-for-aws-lambda-predictable-start-up-times-with-provisioned-concurrency/

You won't have cold starts with Provisioned Concurrency, but you're going to pay for it and it's not cheap. Basically, what Provisioned Concurrency does is keep some additional containers warmed up (more containers, more $), ready to serve requests. Also, say you have 100 containers ready and you get a burst of 101 requests: that last call is still a cold start.
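For reference, this is roughly how you'd reserve warm capacity with boto3; the function name and alias below are made-up placeholders:

    import boto3

    # Reserve 100 warm execution environments for an alias. You pay for them
    # whether or not they serve traffic, and if a burst brings 101 concurrent
    # requests, the 101st still cold-starts.
    lambda_client = boto3.client("lambda")
    lambda_client.put_provisioned_concurrency_config(
        FunctionName="timeline-fanout",          # placeholder function name
        Qualifier="live",                        # alias or version
        ProvisionedConcurrentExecutions=100,
    )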

Other References

Design Instagram vs Twitter

Can I say that the system design for Twitter vs. Instagram is the same except that Twitter has a 140-character limit? The rest seems the same: users post text, images & video; fan out to followers, with celebrities as a special case; read-heavy, say 10x reads over writes; eventual consistency is fine.

  • The follow model is totally different in Twitter and Instagram.
  • Instagram is more media-heavy than Twitter.
  • The way the news feed is computed is totally different in the two. The main idea of any news feed is fanout.

If we have more reads than writes, it is better to spend more time on the write path to precompute data so that reads become cheaper.
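A tiny sketch of that trade-off combined with the celebrity special case: regular authors are fanned out at write time, celebrities are merged in at read time. All names and data are illustrative:

    from collections import defaultdict
    import heapq

    # Regular authors: fan out at write time, so reads are a cheap list lookup.
    # Celebrities: skip write-time fanout and merge their posts in at read time.
    precomputed = defaultdict(list)      # user -> [(ts, post_id)] filled on write
    celebrity_posts = defaultdict(list)  # celebrity -> [(ts, post_id)]

    def read_timeline(user, followed_celebrities, limit=20):
        candidates = precomputed[user] + [
            post for c in followed_celebrities for post in celebrity_posts[c]
        ]
        return heapq.nlargest(limit, candidates)  # newest first by timestamp

    precomputed["bob"].append((101, "post-from-friend"))
    celebrity_posts["taylor"].append((102, "post-from-celebrity"))
    print(read_timeline("bob", ["taylor"]))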

Kafka

It is worth noting that typically only one consumer within a consumer group is subscribed to a partition [1]; this is how Kafka achieves high message-processing throughput. So, even in the case of multiple partitions, messages within a single partition are truly processed in the order they were sent.
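A minimal kafka-python consumer illustrating this; the topic name, group id, and broker address are placeholders:

    from kafka import KafkaConsumer  # kafka-python

    # Two processes running this with the same group_id split the topic's
    # partitions between them; within each partition, messages arrive in order.
    consumer = KafkaConsumer(
        "view-events",
        group_id="view-history-writers",
        bootstrap_servers=["localhost:9092"],
    )
    for msg in consumer:
        print(msg.partition, msg.offset, msg.value)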

Also worth noting is that partitions are specific to Kafka. For example, Google Cloud Pub/Sub doesn’t expose partitions to the user — they are there, but they are behind the scenes. [2].

In a systems design interview, it is sufficient to just talk about topics and subscribers. There is no need to go deeper than topics or to mention specific products. In fact, I am told by an insider that mentioning specific products is frowned upon at Google, which has its own version of all the open-source projects (and more).

[1] “Each partition is connected to at most one consumer from a group.” https://blog.cloudera.com/scalability-of-kafka-messaging-using-consumer-groups/

[2] “Partitions are not exposed to users.” https://medium.com/google-cloud/google-cloud-pub-sub-ordered-delivery-1e4181f60bc8

Distributed Consensus

To make Redis fault-tolerant for this use case, you need to give up the idea of a partially synchronous system, or sync up state using transactions, or use a Redis master node. Basically, having consistent state in Redis is difficult, and to get it you need to give up a lot.

So one answer is a distributed consensus algorithm, where at least a majority of nodes agree on state. This eliminates a lot of the problems and possible bottlenecks that Redis might have.
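The core of the majority idea, as a tiny sketch (quorum arithmetic only, not a full Raft/Paxos implementation):

    # A value counts as committed only once a majority of replicas acknowledge
    # it; any two majorities overlap, so they cannot disagree about the value.
    def is_committed(acks: int, replicas: int) -> bool:
        return acks >= replicas // 2 + 1

    print(is_committed(acks=2, replicas=3))   # True:  2 of 3 is a majority
    print(is_committed(acks=2, replicas=5))   # False: need 3 of 5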

But replicated state machines, configuration stores, leader election, and distributed locking are all very good use cases for distributed consensus. Distributed consensus algorithms, in general, let you sidestep some of the problems that "normal" consensus patterns might have.

There's no free lunch, of course: these algorithms have serious costs in round-trip times, among other concerns. It's not some magic solution.

If that data can't change, it reduces the complexity and need for these algos.

https://sre.google/sre-book/managing-critical-state/

Open Questions

  • Low-level design (LLD) for a Log4j-style logging library

Really Random things