How to Design Software — Tags and Groups
A feature you often need in web applications is the ability to order, filter, group, or organize records based on some arbitrary…
Learn how to design a tagging system and provide the ability to order, filter, group, or organize records based on some arbitrary properties.
A feature you often need in web applications is the ability to order, filter, group, or organize records based on some arbitrary properties. In many situations, it’s simply called a “tag”, but the purpose and behavior is the same whether it’s called categories, metadata, groups, or descriptors.
Take a look at the storefront for the popular PC gaming platform STEAM. STEAM allows users to add text tags to games. Users can add any tag to a game, marking it as a “strategy” game, “painful” to play, or even just “goat”. Users can use these tags to filter their search to only show similarly tagged games, making it much easier to find the games they want.
Let’s take a look at several ways to approach tagging.
The Naive Approach
The naive approach you can do is to store the tags in as a comma separated string within a single database column, as shown below:
PRO: It’s dead simple.
Let’s face it. Adding a single column is really the easiest thing you can do.
CON: Querying is a pain.
How do you find potatoes that are delicious and warm, but not buttery or just delicious or just warm? Writing the query to do that would be difficult and cumbersome, especially as the number of possible tags increased.
CON: Prone to formatting errors
If a user adds a comma to their tag, you suddenly have a problem — that tag gets split into two tags.
CON: Limited by field length
Database columns have a maximum length and you might hit it.
Separate Columns
An evolution of the first approach is to store each tag of a record in its own column, conveniently labeled tag1
, tag2
, tag3
, and so on.
PRO: It looks right.
Having multiple columns for multiple values seems to be following sound database practices…right? (not in this case).
CON: Querying might actually be harder.
It might look like a good idea on the surface, but querying is actually harder. Now you have to include every single column in the query. Additionally, your code would have to check for whether a tag existed in every possible tag column since there’s no guarantee that a tag would be in any specific column for any record. tag1
might have “buttery” in Record 1, but Record 2 might have “warm” in tag1
and “buttery” in tag2
.
Keep reading with a 7-day free trial
Subscribe to Joseph Gefroh to keep reading this post and get 7 days of free access to the full post archives.