The monolith is not dead, and it scales
Find out why Amazon Prime moved part of its application to a monolith and how that helped them scale it
A few days ago, Amazon Prime said that they moved one part of their application from serverless to a monolith. It is a tool that monitors and fixes live video streams; they call it Video Quality Analysis.
It wasn’t designed from the start to handle a high load. So it’s kind of expected that it failed, right?
Well, no. Everybody says that you should go serverless if you want high scale. So what went wrong?
Running it at high scale was too expensive for them. Yes, you read that correctly: it was too expensive for the Amazon Prime team to run something on AWS.
They also had scalability issues: the service could scale only up to about 5% of the expected load!
Orchestration costs
Initially, the tool used AWS Step Functions. In case you don’t know them, they let you orchestrate workflows on AWS, typically by chaining Lambda functions.
You define your state machine and run it when you need it. It was a good choice initially because with Step Functions you can build and orchestrate workflows quickly.
With Step Functions, you can also see every workflow execution, and you get a visualization of what your workflow looks like.
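To make the "define your state machine" part a bit more concrete, here is a minimal sketch of what such a definition could look like in Amazon States Language, built as a plain Python dict. The state names, Lambda function names, account ID and the two detectors are made up for illustration; this is not the Prime team's actual workflow.

```python
import json

# Hypothetical Lambda ARNs: region, account ID and function names are placeholders.
SPLIT_FRAMES_ARN = "arn:aws:lambda:us-east-1:123456789012:function:split-frames"
DETECT_FREEZE_ARN = "arn:aws:lambda:us-east-1:123456789012:function:detect-freeze"
DETECT_CORRUPTION_ARN = "arn:aws:lambda:us-east-1:123456789012:function:detect-corruption"


def detector_branch(name, arn):
    """Build one branch of a Parallel state: a single Lambda task that ends the branch."""
    return {
        "StartAt": name,
        "States": {name: {"Type": "Task", "Resource": arn, "End": True}},
    }


# A two-step workflow: split the video into frames, then run detectors in parallel.
state_machine = {
    "Comment": "Illustrative sketch only",
    "StartAt": "SplitFrames",
    "States": {
        "SplitFrames": {
            "Type": "Task",
            "Resource": SPLIT_FRAMES_ARN,
            "Next": "RunDetectors",
        },
        "RunDetectors": {
            "Type": "Parallel",
            "Branches": [
                detector_branch("DetectFreeze", DETECT_FREEZE_ARN),
                detector_branch("DetectCorruption", DETECT_CORRUPTION_ARN),
            ],
            "End": True,
        },
    },
}

# The JSON string is what you would hand to Step Functions as the definition.
definition_json = json.dumps(state_machine, indent=2)
print(definition_json)
```

Every arrow between states in a definition like this is a state transition, which is the detail that matters for the cost discussion.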
Now that we know what Step Functions are, here is where the problem was. Imagine you have a live stream and every second of the video has to be analyzed.
The workflow performed multiple state transitions for every second of the stream, so they quickly ran into account limits, and on top of that Step Functions charges per state transition. Yes, they could obviously get the limits raised since they are Amazon, but that would eat resources and cost a lot.
And even if you are Amazon, going for a solution that is not cost-effective is not an option.
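A rough back-of-the-envelope sketch shows how this adds up when a workflow is billed per state transition, as Step Functions Standard workflows are. All numbers here are assumptions for illustration: the price is a placeholder in the ballpark of published pricing (check the current pricing page), and the number of transitions per second of stream is a guess, not a figure from the Prime team.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Assumed values, not taken from the original announcement.
ASSUMED_PRICE_PER_TRANSITION = 0.000025  # USD, placeholder price per state transition
ASSUMED_TRANSITIONS_PER_SECOND = 5       # state transitions per second of one stream


def daily_transition_cost(streams, transitions_per_stream_second, price_per_transition):
    """Cost in USD of one day of continuous analysis, counting only state transitions."""
    transitions = streams * transitions_per_stream_second * SECONDS_PER_DAY
    return transitions * price_per_transition


one_stream = daily_transition_cost(
    1, ASSUMED_TRANSITIONS_PER_SECOND, ASSUMED_PRICE_PER_TRANSITION
)
thousand_streams = daily_transition_cost(
    1000, ASSUMED_TRANSITIONS_PER_SECOND, ASSUMED_PRICE_PER_TRANSITION
)

print(f"1 stream:     ${one_stream:.2f} per day")
print(f"1000 streams: ${thousand_streams:.2f} per day")
```

With these made-up inputs, a single stream is cheap, but the cost grows linearly with the number of streams, and that is before paying for the Lambda invocations themselves.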
Caching costs
During this process, they had to convert the stream into something that can be analyzed.
The initial step was to convert the video into images and store them in S3. After that, they would start a Step Functions workflow that analyzed the frames in parallel.
Caching makes sense in this situation because you don’t want to extract the data from the video over and over again.
We usually think that’s the best approach, but the problem was the number of calls to the S3 bucket that followed. At scale, these calls got too expensive.
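To get a feel for how quickly the S3 calls pile up, here is a small counting sketch. It assumes one write per extracted frame and one read per frame for every analysis step that needs it. The frame rate and the number of analysis steps are invented for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def s3_requests_per_day(streams, frames_per_second, analysis_steps):
    """Count S3 requests for one day of continuous analysis.

    Assumes each frame is written once (1 PUT) and then read once
    by every analysis step (analysis_steps GETs).
    """
    requests_per_frame = 1 + analysis_steps
    return streams * frames_per_second * requests_per_frame * SECONDS_PER_DAY


# Assumed workload: 1 analyzed frame per second, 3 analysis steps.
single = s3_requests_per_day(streams=1, frames_per_second=1, analysis_steps=3)
many = s3_requests_per_day(streams=1000, frames_per_second=1, analysis_steps=3)

print(f"1 stream:     {single:,} requests per day")
print(f"1000 streams: {many:,} requests per day")
```

Each request is billed, so the bucket that looked like a cheap cache becomes a per-frame cost multiplied by every step that touches the frame.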
Going for ECS
What they ended up doing is packaging these workflows into a single containerized service and deploying it as an ECS task. They went from multiple decoupled services to one coupled service, so the frames can be passed between steps in memory instead of going through S3.
ECS is a container orchestration service that can run your containers on EC2 instances. With it, you can also leverage Savings Plans, where you commit to a certain amount of compute usage for a period of time in exchange for lower prices. In the end, this helped to reduce the cost of running this service.
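The "one coupled service" idea can be sketched as a single process where the frames never leave memory. This is only an illustration of the shape of such a design: the Frame and Defect types, the fake frame splitting and both detectors are hypothetical stand-ins, not the Prime team's code.

```python
from typing import Callable, Iterable, Iterator, List, NamedTuple


class Frame(NamedTuple):
    index: int
    pixels: bytes


class Defect(NamedTuple):
    frame_index: int
    kind: str


Detector = Callable[[Frame], List[Defect]]


def split_into_frames(chunks: Iterable[bytes]) -> Iterator[Frame]:
    """Stand-in for real video decoding: treats each chunk as one frame."""
    for index, chunk in enumerate(chunks):
        yield Frame(index, chunk)


def detect_black_frame(frame: Frame) -> List[Defect]:
    """Toy detector: flags a non-empty frame whose bytes are all zero."""
    if frame.pixels and all(byte == 0 for byte in frame.pixels):
        return [Defect(frame.index, "black_frame")]
    return []


def detect_empty_frame(frame: Frame) -> List[Defect]:
    """Toy detector: flags a frame that carries no data at all."""
    return [Defect(frame.index, "empty_frame")] if not frame.pixels else []


def analyze(chunks: Iterable[bytes], detectors: List[Detector]) -> List[Defect]:
    """Run every detector on every frame inside one process, with no external storage."""
    defects: List[Defect] = []
    for frame in split_into_frames(chunks):
        for detect in detectors:
            defects.extend(detect(frame))
    return defects


defects = analyze([b"\x10\x20", b"\x00\x00", b""], [detect_black_frame, detect_empty_frame])
print(defects)
```

The orchestration is now an ordinary function call and the "cache" is a local variable, so there are no per-transition or per-request charges. The trade-off is that all steps scale together as one unit.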
What is interesting to me is that if you asked any engineer whether caching makes sense early on, all of us would go for it for sure. Why not? All of us are taught that caching is good, but it depends on how it will be used.
Architecture
There was a “battle” on Twitter, and people keep saying that serverless is dead and that microservices are a bad architecture.
My two cents on this: people tend to get confused about what architecture is. Yes, serverless and microservices didn’t work in this case, but not everything should be serverless or microservices.
The same goes for everything in engineering. You don’t throw every design pattern at a problem when you don’t need them.
You don’t use async communication for everything. Instead, you understand what you are trying to achieve and then use the proper communication protocol.
You don’t use CQRS on all applications. If you do that, good luck.
But people sometimes tend to pick new and shiny things because it seems like all the cool kids on the block are using them. Architecture has to be revisited as the project and business needs change; that’s the only way. That’s what the Amazon Prime team did, and that’s what great teams do.
Imagine you are at an early-stage startup with 5 engineers, and you go for microservices. What benefits do you get, other than complexity? You don’t even have product-market fit, but you choose to go for microservices and Kubernetes for deployment?
In the industry, there are also a lot of incentives to go for microservices and complex systems. It makes you employable, because a lot of companies ask for them in job interviews, and if you tell them that you worked on a monolith they don’t think you are an amazing engineer, worthy of their company. It’s just plain stupid.
The problem is that architectures are used in the wrong way, not that the architecture itself is bad. Every architecture is an iteration, evolving over time into something that fits the current business needs. Something that solves user problems, is efficient, allows teams to move and ship fast, and doesn’t cost a fortune to run.
So please, let your architecture evolve and be open to different architectures. Monolith applications are far from dead.
Stop debating whether an architecture is good or bad, and stop trying to kill it. We still haven’t killed jQuery on the frontend, but you want to kill microservices and serverless? C’mon, let’s be adults.