AWS Serverless - Feature Flags

Background of this project

Recently, I implemented feature flags so that we can selectively deliver new features to users. The whole system runs on an AWS serverless architecture, including API Gateway, Lambda, and DynamoDB.

I assume you are familiar with API Gateway custom authorisers and DynamoDB.

There are many ways to implement feature flags. In this article, we explore and compare two approaches.

Solutions

We have two approaches to implementing feature flags. Let's start with the easiest one.

1. Retrieve and verify in the custom authoriser

The simplest solution is to retrieve and verify the user's feature flags in the custom authoriser. With this approach, each time a user accesses an API, we validate the request against the latest feature flags. That means when we turn on a feature flag for a user, that user can access the new feature instantly.

The architecture is as follows:
[Figure: ff-general, architecture of approach 1]

Pros:

  • Turning feature flags on or off always takes effect instantly.
  • The implementation is straightforward.

Cons:

  • It adds extra latency to the custom authoriser, which must retrieve the feature flags on every request.
  • Because all APIs go through the custom authoriser, that latency affects every API.
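
A minimal Python sketch of the authoriser logic for this approach is shown below. It assumes a hypothetical REQUIRED_FLAGS mapping from routes to the flag each route needs, and a hypothetical load_flags(user_id) function that, in a real deployment, would read the merged flags from the FeatureFlags table on every request (which is exactly where the extra latency comes from). The return value follows the IAM policy shape that API Gateway Lambda authorisers return; extracting user_id and the route from the incoming event is left out.

```python
from typing import Callable, Dict

# Hypothetical mapping from "METHOD /path" to the feature flag the route requires.
REQUIRED_FLAGS: Dict[str, str] = {
    "GET /feature-a": "new_feature_A",
}


def build_policy(principal_id: str, effect: str, resource: str) -> dict:
    """Return a Lambda authoriser response that allows or denies one method ARN."""
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {"Action": "execute-api:Invoke", "Effect": effect, "Resource": resource}
            ],
        },
    }


def authorise(
    user_id: str,
    route: str,
    method_arn: str,
    load_flags: Callable[[str], Dict[str, bool]],  # hypothetical: reads DynamoDB per request
) -> dict:
    required = REQUIRED_FLAGS.get(route)
    if required is None:
        # Route is not behind a feature flag.
        return build_policy(user_id, "Allow", method_arn)
    # Approach 1: fetch the latest flags on every request, so changes apply instantly.
    flags = load_flags(user_id)
    effect = "Allow" if flags.get(required, False) else "Deny"
    return build_policy(user_id, effect, method_arn)
```

If authoriser result caching is enabled, API Gateway reuses a cached policy until its TTL expires, so flag changes would no longer take effect instantly with this design.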

2. Retrieve from the JWT and verify in the custom authoriser

Wicked! This is the most interesting part. To avoid the extra latency in the custom authoriser, we embed the feature flags in the JWT that users receive when they log in. When a user accesses an API, we validate the feature flags in the JWT. This way, we only need to retrieve the feature flags from the database when a user logs in, which removes the extra latency from the custom authoriser. The disadvantage is that if we turn a feature flag on or off and want the change to take effect immediately, we need to invalidate the user's JWT.

The architecture is as follows:
[Figure: ff-with-jwt, architecture of approach 2]

Pros:

  • No extra latency in the custom authoriser.
  • Less performance overhead, since the feature flags are read from the database only at login.

Cons:

  • The implementation is slightly more complicated, since you need to invalidate a user's JWT when you want to refresh their feature flags.
  • Turning feature flags on or off doesn't take effect immediately; the user must log in again to get a new JWT.

Database Design

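We use DynamoDB to store feature flags. Below is the schema of the FeatureFlags table: user_id is the partition key (PK), feature_flag is the sort key (SK), and updated_at and created_at are Unix timestamps in seconds.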

user_id (PK)          feature_flag (SK)   is_enabled   updated_at   created_at
system_global_admin   new_feature_A       false        1687882000   1687882000
user_A                new_feature_A       true         1687882010   1687882010

The table above contains two feature flags. The item with user_id system_global_admin is a global feature flag, which applies to all users by default. The item with user_id user_A is a user-specific feature flag. A user-specific feature flag takes priority over the global one, so new_feature_A is enabled for user_A even though it is disabled globally.
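
Reading one partition of this table could look like the following sketch. It assumes is_enabled is stored as a DynamoDB Boolean and that client is a boto3 low-level DynamoDB client (boto3.client("dynamodb")); query_params, items_to_flags, and load_partition are hypothetical helper names. Calling load_partition once with system_global_admin and once with the user's ID yields the global and the user-specific flags.

```python
from typing import Any, Dict, List

TABLE_NAME = "FeatureFlags"


def query_params(partition_key: str) -> Dict[str, Any]:
    """Arguments for the low-level client's query(): every flag item in one partition."""
    return {
        "TableName": TABLE_NAME,
        "KeyConditionExpression": "user_id = :pk",
        "ExpressionAttributeValues": {":pk": {"S": partition_key}},
    }


def items_to_flags(items: List[Dict[str, Any]]) -> Dict[str, bool]:
    """Convert low-level items such as {"feature_flag": {"S": ...}, "is_enabled": {"BOOL": ...}}."""
    return {item["feature_flag"]["S"]: item["is_enabled"]["BOOL"] for item in items}


def load_partition(client: Any, partition_key: str) -> Dict[str, bool]:
    """Query one partition, following LastEvaluatedKey until every page is read."""
    params = query_params(partition_key)
    flags: Dict[str, bool] = {}
    while True:
        response = client.query(**params)
        flags.update(items_to_flags(response["Items"]))
        if "LastEvaluatedKey" not in response:
            return flags
        # Resume the query from where the previous page stopped.
        params["ExclusiveStartKey"] = response["LastEvaluatedKey"]
```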

In this article, we use a dedicated table for feature flags. However, it is often good practice to store feature flags in the same table as the other data you need. You can read The What, Why, and When of Single-Table Design with DynamoDB to find out more.

API Design

You probably want to create an API for the frontend to retrieve feature flags. This API should combine the global feature flags and user-specific feature flags in one response.

For example, suppose user_A has the following user-specific feature flag:

{
  "new_feature_A": true
}

and we have two global feature flags:

{
  "new_feature_A": false,
  "new_feature_B": false
}

then the API that retrieves user_A's feature flags should return the following response, where the user-specific value of new_feature_A overrides the global one:

{
  "new_feature_A": true,
  "new_feature_B": false
}

That means we can toggle feature flags with the global scope or per-user scope!
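
The merge rule, where user-specific flags override global ones, could be sketched as the helper below for a Lambda behind this API. The load_partition argument is a hypothetical function returning {flag_name: is_enabled} for one partition key (for example, backed by a DynamoDB query); the return value uses the API Gateway Lambda proxy integration response format.

```python
import json
from typing import Callable, Dict

GLOBAL_PK = "system_global_admin"  # partition key holding the global flags, as in the table above


def merge_flags(global_flags: Dict[str, bool], user_flags: Dict[str, bool]) -> Dict[str, bool]:
    """Start from the global defaults, then let user-specific flags override them."""
    merged = dict(global_flags)  # copy so the global defaults are not mutated
    merged.update(user_flags)    # user-specific values take priority
    return merged


def get_feature_flags_response(
    user_id: str, load_partition: Callable[[str], Dict[str, bool]]
) -> dict:
    """Build a proxy-integration response containing the merged flags."""
    merged = merge_flags(load_partition(GLOBAL_PK), load_partition(user_id))
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(merged),
    }
```

With the example above, merge_flags({"new_feature_A": False, "new_feature_B": False}, {"new_feature_A": True}) returns {"new_feature_A": True, "new_feature_B": False}.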

Conclusion

In the early stages, it should be fine to go with the first solution: implement the straightforward approach and get the benefits of feature flags as soon as possible. If you later need to reduce latency, the second solution is worth looking into.

Embedding feature flags in the JWT was an inspiring idea for me: it uses the JWT's ability to carry claims to deliver the information we need. I hope it inspires you as well.

Feature flags can have two scopes, global and user-specific, which gives you finer control over the whole system.