URL Shortener System Design (Explained with Two proposed Designs with Pro and Cons of each)

URL Shortener System Design (Explained with Two proposed Designs with Pro and Cons of each)

20.Jun.2021

what is going on guys my name is russain
and in this video i want to build with
you guys a URL shortener so we're gonna
go through as system design interview
questions or this is one of them most
interesting ones right it's very common
and it's really easy to build but Before
we jump into the video guys I want you
to always for any of these exercises
design system design 
you are the system designer and I want
you to treat whatever this is this is
the blank canvas that you put your own
unique design in okay and be proud of
whatever you put in there because it is
an art right I consider system design as
an art some people might not agree but
there are literally thousands and
thousands and thousands of way to solve
any given problem and today's problem is
URL shortener so whatever I'm gonna come
up with is not gonna be perfect whenever
a 30 year experience professional either
it's not perfect because there is always
pros and cons for every system design
and wherever you come up with own it
enjoy the process because it's gonna be
your baby and we feels damn good if you
designed something your own without
actually looking through the best
practices right before you jump into
best practices just do something do a
bad design and start to improve it keep
in mind that everything can be improved
with that said how about we jump into
the video and start building our all URL
shortener so what is a URL shortener
it's a system that you give me a whole
large URL and I'm making it into a tiny
URL and if you give me that tiny URL I'm
gonna give you or redirect back to the
original long URL very useful for social
media cuz I don't or or very useful for
hey take this URL it's easy to remember
then like you
this big right all right that being said
the first design that comes to my mind
and it's no longer it's not it's not
it's not perfect by any means but the
first design is I'm gonna have a
database here okay and and excuse me
guys as I navigate this thing because
it's the first time I do this but here
there's a database I actually like the
other database oh let's put that I
bought for you guys a special thing here
all right so we have a database guys all
right this is a beautiful database and
alright we have a database how about we
add computer guys right as I had another
computer as the client tried let's do
this sounds good
looks good so we have a database we have
a client but since this is kind of a web
application we kind of need also a web
server correct how about we add a web
server here without a web server
hey go that looks like a web server
right so that's where our application
logic will lie this is my database and
this was the climb that request a URL to
be shortened and the same logic that
revert a short URL back to a long one
will live into the same codebase right
yeah I think so unless you guys disagree
all right so this is a database and
here's the first design that comes to my
mind I am going to use a table like this
in the database right and it's gonna be
like an ID and there will be another
field called URL and pretty much that's
it that's it
now you are the long URL will come here
right that's the long URL right long URL
excuse my handwriting here guys but I'm
still working on the sketching pad thing
and this guy is gonna be a serial
feel and and post because there is a
serial field and essentially I see what
it feels like every time you add a row a
new row here this is will be
auto-generated for you to the enemy one
right or two and three and so on
and then it's gonna be a 64-bit so the
maximum number we're gonna get is 64
minus 1 which is a huge number
how about we actually check how big is
this number that's the biggest 64-bit
engine look at this
it's really really huge right it's
around 20 digits right 20 characters a
few little so obviously it's a huge
number right so this is how many rows I
can have in my table right but it is big
that you can store this much URL right
so what will happen here is if you made
a request right
you will just specify the URL and I
think that's enough we're gonna walk
through that right so let's say I'm
gonna shorten a big URL and you're gonna
start writing this here and you're gonna
get a number right so let's walk through
an example how about that right I'm
gonna make a post request here right
post request slash and you're gonna give
a URL right yeah okay that the post
request will be received by the server
and the server will establish a TCP
connection to the database right I'm
assuming then use my sequel or the post
grace I'm I'm going with relational for
consistency reasons here so now I'm
gonna establish a connection here and
I'm gonna write right so this is an
insert statement immediately I'm gonna
do an insert sequel's insert and I'm
gonna specify the URL and what the
database will give me back is a number
either one to do the give me which
sequence ID and that I'll take that to
sequence ID and I'll return it back to
the client so the URL will be / 1 / 2
theta 2 theta / 17 and so on alright so
that's the original design right let's
go through that and we're going to talk
about the pros and cons of this design
sly guys but that's that's one design
right if you don't really care about
having custom URLs that work right but
let's say break down the database design
a little bit here since this is a
database poster a relational database I
want this to have an index I change this
color yeah I can't change this guy how
about that I'm gonna have this as an
index I need an index on this right
probably a primary key that's that's
gonna be my primary key right and it's
gonna by default your primary key is
gonna have an index so look up on this
guy is gonna be extremely fast because
it's an integer it's only 64-bit so the
index size is so tiny right
plus we don't really make a get request
example showing you a good request so
that's the walk through out a get
request example so the client will make
a request says hey get slash seven right
I want the URL slash stab and it's gonna
be the domain slash seven right and that
server will receive that is gonna make a
query and this time is gonna be select
write to the database and the database
will say select where ID is equals seven
my god how fast is this because this is
an indexed column and it's an integer so
the index is so tiny right so the query
will be extremely fast and we're gonna
get back the results okay okay this is
the URL the door and the the server will
return that and work on the URL to the
user suite
so that's that seems okay that seems
okay right so the only index technically
that we need is just this we don't need
another index and search extremely fast
so let's start to scale this up a little
bit I am writing I'm posting a lot of
URLs right and I'm getting this
incremental IDs right so I'm gonna get
up to like a million or two million
three million Ryan does this database
can handle definitely can't handle their
bases can't handle three four billion
records easily and ROI
tink wool wool writing become slower if
I have a kabillion rose here absolutely
not
well we're going to talk about that
because writing here I'm not querying to
write and I am guaranteed that this idea
is always gonna be unique right I'm not
querying the database to check if it
exists or not
the database I'm only writing so writing
is absolutely fast and every time I
write something I'm gonna always get a
new ID right so I don't really need to
query how awesome is this right
immediately get an ID every time whether
it's a billion row whether it's a three
billion or trillion right it's gonna be
fast because right at that at the end
now if you want to get a little bit
technical here this database if it's a
b-tree right based database like that
the engine the storage engine is a
b-tree then writes are gonna get slower
and slower the larger you get because
this index is a b-tree index and as you
guys know with b-trees we try to balance
those streets right and I don't got to
make a video about beech trees maybe and
as they start writing right and running
those B trees was gonna be a get slower
to rebalance because we need to
reshuffle the trees and move the data
structures around so Ryan is gonna be
slower the more data you have but if
you're using an LSM tree right like a
Cassandra let's take no sequel out of
the equation let's say you're using the
rocks DB right and my sequel persona for
example rocks DB or my rocks right so my
sequel can be configured with an LS
engine and LS engine is in this case is
gonna be the best because writing is
gonna be so fast now let's jump into I
think we talked about the URI will talk
about the table design we talked about
the querying here and looks good
this isn't I'm not interested in any
caching or anything guys right you can't
reduce caching but let's just make it
simple this code is gonna be so short
the the code to generate it and I'm
gonna
show you the code as well hopefully I
don't know if I'm gonna do go through
the Kobus the code is really easy to
write here right so that's the first
design is this perfect
absolutely not because Husain I don't
like this / 7 / arias lot of students
like that because it's very productive
all right that predictable design is
gonna you can predict the next URL that
is simply and that's not a good idea for
hackers like if you if you give don't
ever get a bad a cur the clue of this
incremental predictable IDs because they
can do bad things with this all right
but if you don't really care right and
they use it like this these yours are
public anyway right that's okay but what
they can do is essentially they can scan
your database right because they can
just write a code to loop through all
the IDs from 1 to 1 billion and they can
retrieve all your URLs I don't know why
they're gonna do that but they can write
so the predictability is essentially for
attackers is bad so some people don't
like this I am absolutely fine with it
sometimes especially with the benefits
that gives us like this give us extreme
good reads and extreme good writes as
well right
reads are good and writes are good as
well isn't that fantastic right let's
change the color back to whack all right
so that's what we want to do isn't that
awesome guys so that's the first design
their numbers designer okay the pros we
talked about pros very fast write and
read but their cons is the
predictability right and also they're no
custom URL there is no custom your own
this stuff yeah there's no ideas like oh
give me
I want URL to say oh this is a Kindle
Amazon whatever this is Amazon is an
Amazon one I know this is my
presentation for the executives you can
undo custom URLs
if you don't want to do that if you want
to support custom us this design is
useless right so how about we jump in to
another design let's clear out this mess
guys look you'll see how about that
we're clean
guys we're clean we're back alright so
design number two which is what what are
we designing guys here we are designing
a system to support custom URLs in this
case right and and also we don't want we
don't like these URLs that start with
numbers because I don't want to give the
attackers ideas of that's predictability
right but the original design was good
for like automated Twitter URL that the
only system will auto-generate this URL
and store them in that way and the user
will never actually see them short
they always gonna see them long right so
you can use that original design if the
user is not going to see those right but
it's absolutely beneficial right system
number two is I want to support custom
URL so that means if I want to post
right if I on a post here right make a
post request and then slash this URL and
then I want to also specify the custom
URL also I want to specify the custom
URL and yep custom equal whatever ABC
right so I want to specify how the URL
will be stored in short form right so
this is a post request well how are we
gonna do this let's think of the table
design how are we gonna store this thing
well in this case I still need the URL
the actual long URL and there is no
escape
we need the short URL s URL right
that's how it's gonna look like okay
and I'm not sure gonna need a prime
primary key will come to that but these
are the two columns in this case right
so now if I come in with a URL and I
want a post I mean I want to write I
want to generate take that you are a
long URL and make it into a short URL so
what are you gonna do so many ideas the
first that comes we're going to show it
to hash it right and generate a now to
56 by tor which is like a equal to 32 by
try which is a use your character right
and then take for example the first
eight characters if you want just an
eighth fixed eight byte characters right
and you can do a base64 right so they
look like this YouTube URL right which
is like am let me show you so this is
how base 64-yard look like guys right
it's a characters from a lowercase to Z
lowercase from a capital to Z capital
and then all the numbers and plus signs
and all the garbage right so that's how
a base64
nice and YouTube only use this much but
it can serve them forever really
it can give them a lot of characters
right so yeah let's just enjoy and have
fun guys doing this stuff all right so
we're gonna start what eight characters
right so eight bytes take the first egg
white and make them into base64 so
that's the character so we're storing
eight technically eight bytes fear right
and just like ABC whatever and this is
the long URL okay and technically what I
want to do here is I'm gonna make this
into let's use another color I'm gonna
make this into a primary key right so I
want to prevent multiple same URLs from
getting collide because now in the
previous design you don't have collision
you will never have collision the
problem of collision doesn't exist guys
right but here now you have a problem of
collision because now you Shou in
hashing and you getting taking a huge
string and making in the smaller string
which is the idea of collision always
exists so
two ways of solving this problem okay
take that post request do this operation
give that a character's and attempt to
immediately insert in the database
just insert okay don't query because
query is expensive you insert and the
database will take care of this if the
insert was successful you're gonna get a
a nice short URL right we're gonna get
that obviously you know the short y'all
in this case unlike the first design
which you only go in the number right
and then you will get a return back the
URL for you if that failed right because
it can't fail right the insert can fail
because of the duplicate primary key
then the web server here is responsible
of regenerating a brand-new one and how
do you knew that guys right how do you
generate a new one I just used a URL and
and Shou it and took the base64 and I'm
cool with if I'd regenerate it I'm gonna
get the same one so you need the idea of
salting here right so that's one way of
solving it so you generate a random salt
and insert it with the sha and then
generate a brand new base64 take the
first eight characters and then write it
down but the moment you do that you do
you really need to store the salt I'm
not sure that was it we'll think about
it
so guys I didn't really think all of
this through and I'm prepare the best
design for you I'm just walking through
your shoes and doing it as if it's the
first time I don't want you to just
explore everything that's the fun of
this you're a shortener is or any system
design is a fun activity I want you to
have fun right so let's say I'm not
gonna store the salt because do we
really need to I'm not sure right so I'm
gonna store it and then yeah we can now
have a brand new URL right short URL and
then I'm now returning that short URL to
the user right so there are there could
be let's change the color a little bit
here and to
and there could be some sort of a
feedback loop here right going back and
forth back and forth back and forth
because hey I'll try got a collision
because it failed I'm gonna try again
failed I'm gonna try again fail I'm
gonna try again fail them okay you got
an idea so there could be a failure here
but you're relying on the database to
give you the failures very fast because
this is a primary key and it's obviously
indexed right so you can you can have
this loop of failure and this is good
good take a long time to resolve if you
have if you have a bad salt algorithm
essentially right but that's another
idea and now you can just take that you
can also take that URL right if so
that's the idea of in case of I do not
want I want a random short URL but if
you want to take the users input URL and
hey hey use this as a URL that's easy
right you just inserted it you try to
attempt to answer it if it exists you're
gonna get an error and you in that
particular case you need to return the
error to the user right cuz hey this
user this you supplied a custom URL and
that is already exists but in case if
the user did not specify that you don't
want to error alright so you could just
became more complicated just because you
want to support this custom URL thingy
is it worth it maybe yes I'm saying I'm
sorry but I need to do this I need to
support custom yours so you have to pay
the price of this loop you have to pay
the price of this branching of the code
oh if the custom URL return an error or
else do this right so there's like a lot
of stuff happening here okay plus your
primary key index is now bigger guys
right I mean nobody cares about storage
these days but how big it really it is
but now it's a buddy you can do the math
but your index is larger now your right
is gonna be slower because it's a
strength versus it's eight bytes for
God's sake versus 64-bit Walter de
borbón
but integers versus strings yet and this
is the same same index I so I apologize
for that but yeah it's a character
versus strength versus just numbers
right but now it's a better design isn't
it right
I have characters I have just random
strings and instead of just predictable
numbers that users can scan my database
I don't want them to scan my date it was
to get all the URLs right so this is
also obviously better what do you guys
think let me know in the comment section
you might say hey the first design is
good and the second design is yeah I
know Mortimer say no this first design
definitely does not work in the second
design is better you might say I'm
saying both designs suck here's a better
way of doing it and I would love to hear
those that's why we are here this is a
canvas of artists you is us English you
are an artist paint guys just paint
enjoy this process don't take things
seriously we're all gonna die eventually
so ok ok so that's enjoy this process
all right definitely faster it's a good
process guard so that's these two
designs and now I am supporting custom
URLs with this right and I only needed
two fields do you need another fields
all right guys how about we go through a
read operation right since we went all
right let's do a read this thing is
painful guys by the way just deleting
cleaning things if you have a book a
better way of doing a sketching online
let me know in the comment section right
I'm struggling around so let's write our
table again we have the URL right we
have the short URL which is this 8 byte
thing you write and that's our table I
think that's only thing we need to store
and let's walk through this I want to
make a get request here and this is my
shorty I'm reading those URLs now this
Miami
my shorty Allah says Amazon right and
then we take that character we take this
string and we query the database right
let's bless obviously guys we need to do
as much sanitation as possible do not
like because someone was a Rea Amazon
sequel injection nickel and drop table
URLs they can do that and screw a whole
our database so sanitize your input and
this is easier than ever I check out my
videos on sequel injections and
sanitation all that jazz so yeah we are
do a select statement here guys where
the blablabla equal Amazon and then
we're gonna immediately query that right
and that's fast because there is an
index here right we build an index so so
now this huge 1 billion rows is now
smaller but good I mean people can do
shorting and all that stuff but I really
recommend you not to unless your system
is big enough to really need shorting
and I talked about when to short and
when not to short check out the videos
there well yeah I'm gonna query that if
I find it I'm gonna return the actual
URL which is this guy right and then I'm
gonna return it to the user
and by return I mean literally just do a
301 was it redirect 301 or 302 and do a
location equal des so in order to
immediately redirect that stuff right so
maybe another video I'm gonna actually
sit down actually do that code for you I
just want to make the video real bit
short I don't want to make the video
long you all shortened is that we give
two designs and now I want you to just
while walking through the scenarios what
else can we do can we use no sequel
databases definitely can you use caching
can you use Redis definitely you can do
a lot of stuff to optimize this so
what's the pros of this design the pros
well I have my beautiful custom your
and I don't no longer have this scanning
problem the cones
it's 100% slower than the first design
right at least in the writing process
and you might say I don't care who's
saying I don't care about the writing
process and you're absolutely right
because writing how often writing
happens compared to reads readers gonna
be always more than the right right so I
always think about that how many people
click on shorty others versus how many
people generate short URLs right well
unless your Twitter if you're if you
care about reads writes right like like
for Twitter right every time you go to a
tweet and put a long URL or twitter
automatically shortens it so and Twitter
have a lot of right all right so now in
this case you might prefer the first
design because you're gonna store that
in a short term manner and you're gonna
retrieve it also in a short manner but
you're gonna show the user always the
long URL so that doesn't they don't even
know the short URL actually exists right
so you might remember first design my
work but if you're building an actual
tiny URL custom for the user or bitly
this design applause absolutely works
right and I didn't talk about scaling
per se across the whole world for this
right so let's talk about that a little
bit before we in the video so if I am if
I want to scale this design right let's
talk about this a little bit scaling all
right scaling let's talk about scaling
for both design 1 & 2 it's the same I
think there's the same problem right so
let's talk about they're scaling right
because that's what no no sequel people
have against relational databases that
they don't scale so scaling in this case
what we're gonna do guys and we can
clear all that garbage we don't really
need it but it is the thing we will make
one master database and multiple backup
databases all right
let's change that can I change like a
lot of the stuff I don't think I can't
huh yeah never mind
all right I can't change the color but
here's the thing guys so this thing is
gonna be the back up and this guy is
also gonna be a backup what does that
mean a backup means it's designed just
for reads right because that's why do
you want to scale you want to scale
reads right and the right always goes to
one single database so it's one thing of
database in Central America
I don't know North Virginia right and
there is a process that pushes the
rights to these read replicas okay and
this is called the replication the idea
of replication right and this is like
any relational database supports this by
default right so now you can scale those
guys right there web servers definitely
can be scaled right you can put a web
server here you can put another web
server right here and you can point read
so you can absolutely point reads when I
say reads I mean get request right hey
give me this URL read to this point
reads to this right and point post
request to the server right this is the
master or whatever right and this is the
post request and post of course I mean
what I mean actual rights of short URLs
right so you always configure your
system to have this Ram reads and you
configure it to have rights right here
okay so this can scale very easily right
because you remove the reads and now you
associated this only with the rights
correct
so now this guy the all the reads can go
through this server and all the risk and
go to the server and you can do this all
day right you can do this like if you're
in the u.s. you can do this or the West
and this guy is
the East right and reading is gonna be
fine
right only there will be some sort of an
eventual consistency even yes guys
eventual consistency can happen in
relational databases right not only no
sequel okay but I am guaranteed that my
rights are always consistent here but
there will be a lag where if I write
something right and at the same time I
try to read it it's gonna be a lag right
so sometimes you want this guy to always
accept reads as well if they are they're
coming from the same ip address but
that's a detail that you can finesse
guys right so that's the replication
idea you might say Husain that's
impossible
one database cannot handle all of this
stuff well that's absolutely fine as
well you can now
if your rights can no longer scale then
you can go play with the idea of
shorting right which we discussed right
here go check the video so you can short
this database only if you absolutely
cannot handle the rights guys again you
gotta really be careful when do you want
a short because it's a very complex
operation and you can use my Sequoia you
can use a Oracle Oracle so expensive he
can use my sequel Postgres all those
relational beautiful databases you can
also use mmm no sequel databases but I
don't I do not like the eventual
consistency in mice no sequel databases
especially with rights as that's just
personal opinion because I prefer to
have consistent rights even in a single
instance right but that databases don't
even give you that no sequel database
doesn't give you that no I'll take that
back so yeah you can replicate this with
a MongoDB right with rights your rights
is gonna be fast but it really depends
on how you're storing this information
the indexing or no sequel databases I
don't know anything about now no sequel
I know very little so I cannot possibly
tell you
to use no sequel database like if
I don't actually understand the
underlining functionality right I know
that not all no sequel support indexes
which is some something I just don't
like right if you don't support indexing
that's just a big no for me right I need
an index so I can quickly query my stuff
right I need things that are predictable
right that's just me guys don't let my
opinion force you at all right just I
won't just watched you understand
different thing I'm open to different
opinions right so this is an idea of
scaling what did I say wrong what did I
say right and by the way guys I forgot
to mention something very critical here
there will be a badass proxy right here
Oh about that we do have a proxy nice so
all of this stuff we're gonna be done
with a reverse proxy right that could be
load balanced right to these master
services right so technically all that
stuff right right all of these get
requests are going through the proxy
first rise going through the proxy and
the proxy decides hey go here go here or
go here right so that's the reverse
proxy right and you can make unlimited
copy of those reverse proxies right the
first proxy not act on doesn't only act
like a load balancer purely load
balancer and just do
the funnel the request to all of them no
it actually looks at the de esos
probably a layer seven load balancer
here and it looks at the data this is it
oh this is a good request I'll take you
here oh this is a post request post
request definitely have to be here right
but get requests has to go here and has
to go to this server right and probably
there is a reverse proxy a niche near to
each datacenter like in China probably
gonna have one in Asia and in Central
America and probably in Europe and each
one take it to the nearest server and
maybe the master server is gonna sit in
America right so yeah obviously right
it's gonna be slow for scaling purposes
but maybe you don't really care about
writes match right but now comes to the
shorting yeah I'm gonna show my database
I'm gonna put a shard in Europe
so rights are gonna be faster right Oh
about trade off guys it's all about
trail all right guys I think that's a
I'm gonna stop here what do you think
what do you guys think did you like this
episode how do you like this interaction
of this pen thingy hope you liked it
give me your opinion give me any
critique that you might have in the
comment section below because I love to
be better and do better with you guys
right and then let's have a discussion
what do you guys think is the better way
of doing things right
everything has a pros and cons it's gums
comes back to simplicity simplicity
sometimes rules and maybe simple thing
can create the best design really while
having complexity can having complex
design beliefs to complex implementation
which leads to the things that we all
hate bugs
we hate bugs guys okay so simple design
needs a simple elegant implementation
with code and then once you have a
simple implementation you have a rigid
rigid you have a solid badass system and
when you have a badass system you are
unstopped
unstoppable guys
right so paying it guys sit down and
paint what do you guys think if you know
what to start start somewhere right
something right anything that comes to
your head okay and that's why when I
talk about back-end engineering and
systems I need you to understand how
databases work I need you to understand
how web servers work so read read read
read read keep consuming learn and learn
and learn because we are all learning
guys here I want to know we're not gonna
stop learning all right guys I'm gonna
leave you here see you in the next one
leave a like subscribe and I'm gonna see
you in the next one
you guys stay awesome

We are social