Caching an API with 50+ million users

For the past few years, I have worked for a company that created and maintained APIs serving data from a database with 50+ million users. Caching played a big part in making all of that work, so here’s what I’ve learned from the experience.

Why use a cache

  1. Fetching data from the database is usually slow (performance)

  2. Handling many requests per second puts pressure on the database (resilience)

  3. If the database is down, your application stops working completely (availability)

The problem is that caching is not that simple. You have to decide what data to cache, which caching system to use, and when to use it. All of that depends on your application, the data you are working with, and how it is used.

Caching systems

First, I will talk about the caching systems I have used and my thoughts on each.

1) go-cache

For simple applications with small databases, go-cache is a very efficient in-memory caching system.

Pros:

  • Since it is in-memory, it doesn’t need to go over the network to fetch data;
  • Using it is as simple as running go get for the library (see the sketch below);

Cons:

  • Not scalable (if you have more than one instance of your app);
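
Here’s roughly what basic usage looks like (a minimal sketch; the key and value are made up for the example):

```go
package main

import (
	"fmt"
	"time"

	"github.com/patrickmn/go-cache"
)

func main() {
	// Default TTL of 5 minutes, purge expired entries every 10 minutes.
	c := cache.New(5*time.Minute, 10*time.Minute)

	// Store a value under a key using the default TTL.
	c.Set("user:42:name", "Alice", cache.DefaultExpiration)

	// Read it back; the boolean tells us whether it was a hit.
	if name, found := c.Get("user:42:name"); found {
		fmt.Println("cache hit:", name)
	}
}
```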

2) Memcached

Memcached is a simple yet robust solution. It’s free, open source, and very high-performance (it’s also used by Netflix 👀).

Pros:

  • High performance;
  • Open source;
  • Very mature and used by a lot of big companies;

Cons:

  • Not good for small apps;

For scalability, it is possible to run multiple Memcached instances. To route requests between them, we use McRouter, a Memcached protocol router created by Facebook with a focus on performance.
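
Here’s a minimal sketch of talking to Memcached from Go with the gomemcache client. The addresses are placeholders; they could just as well point at a single McRouter instance, since it speaks the Memcached protocol:

```go
package main

import (
	"fmt"

	"github.com/bradfitz/gomemcache/memcache"
)

func main() {
	// The client shards keys across the listed Memcached instances.
	mc := memcache.New("cache-1:11211", "cache-2:11211")

	// Cache a serialized user for 60 seconds.
	err := mc.Set(&memcache.Item{
		Key:        "user:42",
		Value:      []byte(`{"id":42,"name":"Alice"}`),
		Expiration: 60,
	})
	if err != nil {
		fmt.Println("set failed:", err)
		return
	}

	// Read it back; a miss returns memcache.ErrCacheMiss.
	item, err := mc.Get("user:42")
	if err != nil {
		fmt.Println("miss or error:", err)
		return
	}
	fmt.Println("cache hit:", string(item.Value))
}
```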

3) Redis

Redis is the most complex of the three, which also makes it the most resource-expensive. It can act as a data store, a cache, and a message broker, and it supports a variety of data structures.

Pros:

  • High performance;
  • Open source;
  • Very mature and used by a lot of big companies;

Cons:

  • It’s overkill for small apps;
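
Basic usage from Go looks much the same. A minimal sketch with the go-redis client (address, key and value are placeholders):

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	// Cache a serialized user for 5 minutes.
	if err := rdb.Set(ctx, "user:42", `{"id":42,"name":"Alice"}`, 5*time.Minute).Err(); err != nil {
		fmt.Println("set failed:", err)
		return
	}

	// A miss returns the sentinel error redis.Nil.
	val, err := rdb.Get(ctx, "user:42").Result()
	if err == redis.Nil {
		fmt.Println("cache miss")
		return
	} else if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("cache hit:", val)
}
```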

What to store in the cache

Now that you have chosen the caching system that works best for your app, you have to decide what to store in it.

  • All query results?
  • Entities?

It is important to maximize the hit rate, and that is where your use cases come into play. If we cache every request as-is, for example, the hit rate will be very low. That is why we have to use deliberate caching strategies.
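
Whatever you decide to store, the pattern behind these decisions is cache-aside: check the cache first, fall back to the database on a miss, and store the result for next time. A minimal sketch, assuming go-cache and a simplified, made-up user entity:

```go
package cacheaside

import (
	"context"
	"fmt"
	"time"

	"github.com/patrickmn/go-cache"
)

// User is a simplified entity for the example.
type User struct {
	ID   int64
	Name string
}

var userCache = cache.New(5*time.Minute, 10*time.Minute)

// fetchUserFromDB stands in for the real database query.
func fetchUserFromDB(ctx context.Context, id int64) (*User, error) {
	return &User{ID: id, Name: "Alice"}, nil
}

// GetUser implements cache-aside: check the cache, fall back to the
// database on a miss, then populate the cache for the next request.
func GetUser(ctx context.Context, id int64) (*User, error) {
	key := fmt.Sprintf("user:%d", id)

	if cached, found := userCache.Get(key); found {
		return cached.(*User), nil // cache hit
	}

	u, err := fetchUserFromDB(ctx, id) // cache miss: hit the database
	if err != nil {
		return nil, err
	}

	userCache.Set(key, u, cache.DefaultExpiration) // store for next time
	return u, nil
}
```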

Problems and solutions

Here are some examples of problems I’ve faced and how I tried to solve them.

  1. Very specific keys

It is not a good idea to make cache keys very specific. One example is filtering a user’s most watched videos by category, tags, and location: that would lead to a huge number of keys and a low hit rate.

Solution:

Make the filtering queries fetch only object ids and cache that list, then use a generic query to fetch the objects by id, caching each object as well. Having two queries may seem counterintuitive, but this way the per-object cache keys are reused far more often, increasing the hit rate.

Another good optimization is to cache the results of a broader filter and narrow them down in code after they’ve been fetched. That can also sound worse, but it’s a lot faster to pull objects from the cache and filter them in memory than to fetch them from the database.
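
Here’s a sketch of how that two-step approach can look. The entity, queries and key formats are simplified assumptions; the point is that every filtered query ends up reusing the same generic per-id entries:

```go
package videos

import (
	"context"
	"fmt"
	"time"

	"github.com/patrickmn/go-cache"
)

// Video is a simplified entity for the example.
type Video struct {
	ID    int64
	Title string
}

var c = cache.New(5*time.Minute, 10*time.Minute)

// Stand-ins for the real database queries.
func queryMostWatchedIDs(ctx context.Context, userID int64, category string) ([]int64, error) {
	return []int64{1, 2, 3}, nil
}
func queryVideoByID(ctx context.Context, id int64) (*Video, error) {
	return &Video{ID: id, Title: "example"}, nil
}

// getVideoByID caches individual videos under a generic "video:<id>" key,
// so every filtered query that needs this video reuses the same entry.
func getVideoByID(ctx context.Context, id int64) (*Video, error) {
	key := fmt.Sprintf("video:%d", id)
	if cached, found := c.Get(key); found {
		return cached.(*Video), nil
	}
	v, err := queryVideoByID(ctx, id)
	if err != nil {
		return nil, err
	}
	c.Set(key, v, cache.DefaultExpiration)
	return v, nil
}

// GetMostWatched caches the filtered query only as a list of ids, then
// resolves each id through the generic per-id cache above.
func GetMostWatched(ctx context.Context, userID int64, category string) ([]*Video, error) {
	idsKey := fmt.Sprintf("most-watched:%d:%s", userID, category)

	var ids []int64
	if cached, found := c.Get(idsKey); found {
		ids = cached.([]int64)
	} else {
		var err error
		ids, err = queryMostWatchedIDs(ctx, userID, category)
		if err != nil {
			return nil, err
		}
		c.Set(idsKey, ids, cache.DefaultExpiration)
	}

	videos := make([]*Video, 0, len(ids))
	for _, id := range ids {
		v, err := getVideoByID(ctx, id)
		if err != nil {
			return nil, err
		}
		videos = append(videos, v)
	}
	return videos, nil
}
```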

  2. Invalidating caches after code changes

Imagine you have a user cached with an id and a name, then you deploy your app with a change that adds an email field to the user. How do you invalidate that cache entry?

Solution:

It is good practice to include a hash of the object’s structure in the cache key. That way, the keys are automatically invalidated when the structure changes, and the old entries die off after their TTL.
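
A minimal sketch of that idea, using reflection to hash the struct’s field names and types into a short version tag (the key format is just an example):

```go
package cachekeys

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"reflect"
)

// structVersion returns a short hash of a struct's field names and types.
// When a field like Email is added, the hash changes, so every key built
// with it points to a fresh entry and the old ones simply expire via TTL.
func structVersion(v interface{}) string {
	t := reflect.TypeOf(v)
	h := sha1.New()
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		fmt.Fprintf(h, "%s:%s;", f.Name, f.Type.String())
	}
	return hex.EncodeToString(h.Sum(nil))[:8]
}

type User struct {
	ID    int64
	Name  string
	Email string // adding this field changes the hash, and therefore the keys
}

func userKey(id int64) string {
	return fmt.Sprintf("user:%d:%s", id, structVersion(User{}))
}
```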

  3. Empty cache

Sometimes there is a problem with the cache server and the cache ends up empty. That can be dangerous, because a flood of uncached requests could overload your app.

Solution:

For big apps, it is good to have a cache-warming routine that runs and fills the cache with the most used queries.
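
A minimal sketch of such a routine; what counts as a “most used query” is left as a placeholder, since that depends entirely on your app:

```go
package warmup

import (
	"context"
	"log"
	"time"
)

// hotQuery stands in for whatever "most used query" means in your app:
// a name plus a function that knows how to fetch and cache its data.
type hotQuery struct {
	name string
	load func(ctx context.Context) error
}

// WarmCache runs once at startup and then on a fixed interval, so that
// after a cache wipe the hottest entries are repopulated before user
// traffic has to do it the hard way.
func WarmCache(ctx context.Context, queries []hotQuery, every time.Duration) {
	warm := func() {
		for _, q := range queries {
			if err := q.load(ctx); err != nil {
				log.Printf("cache warm-up for %s failed: %v", q.name, err)
			}
		}
	}

	warm() // warm immediately on startup

	ticker := time.NewTicker(every)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			warm()
		}
	}
}
```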

What if there is a problem with the database?

One big advantage of having a cache in your app is that, when there is a problem with the database, most of the app can keep working and many users won’t even be affected.
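
One way to push this further is to keep a longer-lived “stale” copy next to the normal entry and fall back to it when the database errors. This is only a sketch of one possible approach, using go-cache and made-up helpers:

```go
package fallback

import (
	"context"
	"fmt"
	"time"

	"github.com/patrickmn/go-cache"
)

type User struct {
	ID   int64
	Name string
}

var userCache = cache.New(5*time.Minute, 10*time.Minute)

// fetchUserFromDB stands in for the real database query,
// which may fail during an outage.
func fetchUserFromDB(ctx context.Context, id int64) (*User, error) {
	return &User{ID: id, Name: "Alice"}, nil
}

// GetUser keeps a long-lived stale copy next to the normal entry and
// falls back to it when the database errors, so cached users keep
// working through an outage. The stale copy is one possible approach.
func GetUser(ctx context.Context, id int64) (*User, error) {
	freshKey := fmt.Sprintf("user:%d", id)
	staleKey := fmt.Sprintf("user:%d:stale", id)

	if cached, found := userCache.Get(freshKey); found {
		return cached.(*User), nil // served entirely from cache
	}

	u, err := fetchUserFromDB(ctx, id)
	if err != nil {
		if stale, found := userCache.Get(staleKey); found {
			return stale.(*User), nil // database is down, serve the stale copy
		}
		return nil, err // nothing cached for this user
	}

	userCache.Set(freshKey, u, cache.DefaultExpiration)
	userCache.Set(staleKey, u, 24*time.Hour) // long-lived safety net
	return u, nil
}
```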

Conclusions

  • Caching is a great way to make your app faster and more resilient
  • Caching should be used wisely
  • Always remember to invalidate caches
