How to Design Software — Image Uploaders
Learn how to design a scalable image uploading system!
In many startups I’ve worked with, image uploading was a part of their web application’s workflow. From user avatars to uploadable inventory pictures, it was a common-enough feature to be present in almost every system.
Rather unsurprisingly, many of those startups’ image upload solutions suffered from the same issues. Logic to handle image uploads was ad hoc. File processing happened synchronously inside the upload request, causing the application server’s request queue to back up.
In short, it wasn’t so much a system as a bunch of disparate workflows cobbled together (typical of most startups). The result? Bugs in image handling, mysterious crashes that couldn’t be traced, and random timeouts.
I’m here to show you a better way.
Understanding the technical concerns
Image uploading can be complex, and use-cases vary depending on the system.
There are a lot of concerns with images that most people don’t think of when they first dive into building an uploader.
Images in a web application can easily touch on the following technical concerns:
Displaying the image on the front-end
Authorizing users to download the image
Authorizing users to upload the image
Uploading the image to your server in a scalable way
Validating the image data
Processing the image, performing cropping, optimization, and other tasks
Creating image variants, such as banners and thumbnails
Storing the images
Associating the images with whatever records you’re uploading them for (such as user avatars or campaign banners)
Not every system needs all of these, but a solid system will be able to flex and adapt to support these use cases with minimal changes.
The Tradeoffs
When thinking of a solution that addresses these concerns, it quickly becomes evident that it will be more complex than adding a column to your user model, adding an endpoint to upload the image, and calling it a day.
It’s helpful to walk through the tradeoffs a solution has to consider.
Scaling
Image uploads are notorious for crashing servers or causing timeouts. If a user attempts to upload a 10 megabyte image, that’s a lot of resource usage:
10 megabytes of your server’s memory tied up
A request handler being tied up for the entire amount of time it takes to upload 10 megabytes
CPU usage to deal with the image upload
If you’re dealing with 1 or 10 users uploading images, it’s not a big deal. However, once your system sees real traffic, that resource usage will quickly spiral out of control.
As a result, the architecture must reduce the time your web server spends handling an image upload request to almost nothing, offloading the actual upload to another service (such as S3 or an in-house service dedicated to uploads). You don’t want image uploads consuming all of your web server’s capacity.
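To put rough numbers on this, here is a back-of-the-envelope calculation. The concurrency, image size, and 10 Mbps client upload speed are illustrative assumptions, and the model assumes the worst case, where each request buffers the whole image in memory.

```python
# Back-of-the-envelope cost of handling uploads on the web server itself.
# All inputs are illustrative assumptions, not measurements.


def upload_cost(concurrent_uploads: int, image_mb: float, client_mbps: float) -> tuple:
    """Return (total memory held in MB, seconds each request handler stays busy)."""
    # Worst case: every in-flight request buffers its whole image in memory.
    memory_mb = concurrent_uploads * image_mb
    # Convert megabytes to megabits (x8), then divide by the client's upload speed.
    seconds_per_upload = image_mb * 8 / client_mbps
    return memory_mb, seconds_per_upload


memory_mb, seconds = upload_cost(concurrent_uploads=100, image_mb=10, client_mbps=10)
print(f"{memory_mb:.0f} MB buffered, each handler busy ~{seconds:.0f} s")
```

With those inputs, 100 simultaneous 10 MB uploads hold roughly 1 GB of memory, and each one occupies a request handler for about 8 seconds before any image processing even starts.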
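One common way to take the web server out of the data path is to have it only authorize the request and hand back a short-lived, signed upload ticket; the client then sends the bytes directly to the upload or storage service, which verifies the ticket before accepting anything. The sketch below shows the idea with an HMAC-signed token. It is hypothetical: `issue_upload_ticket`, `verify_upload_ticket`, and the shared `SECRET` are illustrative names, and while the approach is in the spirit of S3 presigned URLs, it is not the S3 API. With S3 itself you would use the AWS SDK’s presigned URL support (for example, boto3’s `generate_presigned_url`).

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical secret shared by the web app and the upload service.
SECRET = b"replace-with-a-real-secret"


def issue_upload_ticket(user_id: str, kind: str, max_bytes: int, ttl_s: int = 300) -> str:
    """Web server side: authorize once, sign a short-lived ticket, return immediately."""
    claims = {"user": user_id, "kind": kind, "max_bytes": max_bytes,
              "exp": int(time.time()) + ttl_s}
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    signature = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + signature


def verify_upload_ticket(ticket: str) -> dict:
    """Upload service side: check signature and expiry before accepting any bytes."""
    payload, _, signature = ticket.rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(signature, expected):
        raise PermissionError("bad signature")
    claims = json.loads(base64.urlsafe_b64decode(payload))
    if claims["exp"] < time.time():
        raise PermissionError("ticket expired")
    return claims
```

The web server’s share of the work is now a cheap signature computation, regardless of how large the image is; the slow transfer happens against a service built to absorb it.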
Security
Image uploads and processing are a massive source of security holes. Any endpoint that lets a client tie up a large amount of resources is vulnerable to denial-of-service attacks, whether intentional or not.
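A basic defense is to cap how many bytes a single upload may consume and to enforce that cap while reading, rather than trusting a client-supplied `Content-Length` header. A minimal sketch, assuming the upload arrives as a file-like stream; the 64 KiB chunk size and the `UploadTooLarge` exception are illustrative choices.

```python
import io
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024  # read 64 KiB at a time instead of the whole body at once


class UploadTooLarge(Exception):
    """Raised as soon as an upload exceeds its byte budget."""


def read_limited(stream: BinaryIO, max_bytes: int) -> bytes:
    """Read an upload in chunks, aborting once more than max_bytes have arrived."""
    received = bytearray()
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:  # end of stream
            return bytes(received)
        received += chunk
        if len(received) > max_bytes:
            raise UploadTooLarge(f"upload exceeds {max_bytes} bytes")


print(len(read_limited(io.BytesIO(b"a" * 1000), max_bytes=4096)))
```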
On an even more worrisome note, image processing is a poorly understood aspect of engineering that has led to some tragic security flaws, providing random users the ability to execute arbitrary commands as a root-level user.
Our system architecture has to handle images securely, preserving both the availability of the service and the integrity of user data.
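A cheap first line of defense is to check a file’s leading “magic bytes” against the formats you actually support, instead of trusting the file extension or the client’s `Content-Type`. The helper below is a hypothetical sketch, not a complete defense: a file can carry a valid header and still contain a malicious payload, so the image processing library itself still needs to be kept up to date. The allowed-format policy (PNG, JPEG, WebP) is an assumption.

```python
from typing import Optional

# Signatures at the start of common image formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
}

# Hypothetical policy: the formats this system agrees to process.
ALLOWED = {"image/png", "image/jpeg", "image/webp"}


def sniff_image_type(data: bytes) -> Optional[str]:
    """Guess the format from the file's first bytes; None if unrecognized."""
    for magic, mime in SIGNATURES.items():
        if data.startswith(magic):
            return mime
    # WebP: "RIFF", a 4-byte size, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None


def validate_upload(data: bytes, claimed_type: str) -> str:
    """Reject uploads whose content is unsupported or doesn't match the claimed type."""
    actual = sniff_image_type(data)
    if actual is None or actual not in ALLOWED:
        raise ValueError("unsupported or unrecognized image format")
    if actual != claimed_type:
        raise ValueError(f"claimed {claimed_type}, but content looks like {actual}")
    return actual
```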
Authorization
On the security front, there’s also business logic specific to our system. Not all images should be publicly available. Perhaps users should not be able to download images uploaded by other users, or there are rules about who can upload an image in the first place.
Our system has to be able to handle these domain-specific authorization cases easily.
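One way to keep these rules easy to change is to make authorization a per-kind hook that the shared upload workflow consults, rather than logic scattered across endpoints. A minimal sketch; the `User` and `Image` shapes and the specific rules (admin-only banner uploads, non-public banners visible only to their owner) are hypothetical examples.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class User:
    id: str
    is_admin: bool = False


@dataclass(frozen=True)
class Image:
    owner_id: str
    kind: str
    public: bool


UploadRule = Callable[[User], bool]
DownloadRule = Callable[[User, Image], bool]

# Each image kind plugs in its own rules; the workflow only calls can_upload/can_download.
UPLOAD_RULES: Dict[str, UploadRule] = {
    "avatar": lambda user: True,                    # any signed-in user
    "campaign_banner": lambda user: user.is_admin,  # hypothetical: admins only
}

DOWNLOAD_RULES: Dict[str, DownloadRule] = {
    "avatar": lambda user, image: True,
    "campaign_banner": lambda user, image: image.public or user.id == image.owner_id,
}


def can_upload(user: User, kind: str) -> bool:
    rule = UPLOAD_RULES.get(kind)
    return rule is not None and rule(user)  # unknown kinds are denied by default


def can_download(user: User, image: Image) -> bool:
    rule = DOWNLOAD_RULES.get(image.kind)
    return rule is not None and rule(user, image)
```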
Consistency
I’ve written about the value of consistency in the past. We don’t want a different way to upload an image for every kind of image we want to upload. Image upload use-cases like user avatars and campaign banners should all flow through the same image uploading workflow and should require minimal or no code change to support.
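A sketch of what that can look like: each use case is a declarative entry in one registry, and every upload passes through a single check. The kinds, size limits, and variant dimensions below are hypothetical; the point is that supporting a new kind of image means adding an entry, not writing a new workflow.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class UploadKind:
    max_bytes: int
    allowed_types: FrozenSet[str]
    # Variant name -> (max width, max height) in pixels.
    variants: Dict[str, Tuple[int, int]] = field(default_factory=dict)


UPLOAD_KINDS: Dict[str, UploadKind] = {
    "avatar": UploadKind(
        max_bytes=2_000_000,
        allowed_types=frozenset({"image/png", "image/jpeg"}),
        variants={"thumb": (64, 64), "profile": (256, 256)},
    ),
    "campaign_banner": UploadKind(
        max_bytes=10_000_000,
        allowed_types=frozenset({"image/png", "image/jpeg", "image/webp"}),
        variants={"banner": (1600, 400), "mobile": (800, 200)},
    ),
}


def check_upload(kind: str, content_type: str, size: int) -> UploadKind:
    """The single entry point every image use case goes through."""
    spec = UPLOAD_KINDS.get(kind)
    if spec is None:
        raise KeyError(f"unknown upload kind: {kind}")
    if content_type not in spec.allowed_types:
        raise ValueError(f"{content_type} not allowed for {kind}")
    if size > spec.max_bytes:
        raise ValueError(f"{size} bytes exceeds {spec.max_bytes} for {kind}")
    return spec
```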
Variants
When an image is uploaded, there might be multiple places and ways it is used. Thumbnails, backgrounds, and profile pictures might all use the same image in different ways. We might also serve a lower-resolution image to mobile users to save bandwidth.
Whatever the case, we have to support the creation and usage of variants in our upload system. Creating variants can take a lot of processing power, and it’s not something we want our primary web server to do. The architecture has to offload that to a separate, asynchronous service.
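The shape of that offloading can be sketched with an in-process queue and worker thread. This is only a stand-in: in production the queue would typically be a message broker or job system and the worker a separate service. The worker here only computes each variant’s target dimensions (fit inside a bounding box, preserving aspect ratio, never upscaling); actual resizing would use an image library, for example Pillow’s `Image.thumbnail`. The job fields and variant names are hypothetical.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Dict, Tuple

Size = Tuple[int, int]


@dataclass
class VariantJob:
    image_id: str
    variant: str
    source_size: Size
    max_size: Size


def fit_within(size: Size, box: Size) -> Size:
    """Scale (w, h) down to fit inside box, preserving aspect ratio; never upscale."""
    w, h = size
    if w <= 0 or h <= 0:
        raise ValueError("image dimensions must be positive")
    scale = min(box[0] / w, box[1] / h, 1.0)
    return max(1, round(w * scale)), max(1, round(h * scale))


jobs: queue.Queue = queue.Queue()
results: Dict[Tuple[str, str], Size] = {}


def worker() -> None:
    # Runs outside the request cycle; a real worker would resize and store each file.
    while True:
        job = jobs.get()
        if job is None:  # sentinel value: shut down
            break
        results[(job.image_id, job.variant)] = fit_within(job.source_size, job.max_size)


def enqueue_variants(image_id: str, source_size: Size, variants: Dict[str, Size]) -> None:
    """Called by the web server after an upload: cheap, returns immediately."""
    for name, box in variants.items():
        jobs.put(VariantJob(image_id, name, source_size, box))


t = threading.Thread(target=worker)
t.start()
enqueue_variants("img-1", (4000, 3000), {"thumb": (200, 200), "mobile": (1080, 1080)})
jobs.put(None)  # tell the worker to stop once the queue drains
t.join()
print(results)
```

For a 4000×3000 source, the `thumb` variant comes out at 200×150 and `mobile` at 1080×810. The web server’s only cost is putting jobs on the queue.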
Growth
All subsystems should be able to grow independently of the rest of the system. You never know when you’ll see increased demand. By keeping well-defined boundaries in your system, you can easily convert subsystems into microservices or separate deployments that scale horizontally.
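In code, a boundary can be as simple as an interface that the rest of the application depends on. The sketch below uses a `typing.Protocol`; `ImageStore`, `InMemoryImageStore`, and `save_avatar` are hypothetical names. Swapping the in-memory store for one backed by S3 or a separate upload service would not require touching `save_avatar`.

```python
from typing import Dict, Protocol


class ImageStore(Protocol):
    """The boundary the rest of the app depends on; implementations can change freely."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryImageStore:
    """Local/test implementation; a production one might wrap S3 or a separate service."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def save_avatar(store: ImageStore, user_id: str, data: bytes) -> str:
    """Application code only knows about ImageStore, not where the bytes live."""
    key = f"avatars/{user_id}"
    store.put(key, data)
    return key


store = InMemoryImageStore()
key = save_avatar(store, "alice", b"\x89PNG...")
print(key, len(store.get(key)))
```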
The Architecture
With these trade-offs in mind, let’s now take a look at an architecture for uploading images that fulfills the criteria.
Subscribe to Joseph Gefroh to keep reading this post and get 7 days of free access to the full post archives.