FinTech - EC2 deployment

Background of this solution

We need a safe way to deploy applications that run on EC2 instances (VMs).
I came up with two approaches:
- Progressive deployment, which splits instances into batches.
- Blue/Green deployment, which switches traffic between two Auto Scaling Groups.

Version 1 - Progressive deployment

High-level workflow

(Figure: deployment1)

Users submit a change via a PR. After the PR is merged, the workflow analyzes the change and finds the corresponding Ansible inventory. It then splits the inventory into multiple deployment groups, one per Auto Scaling Group.

The workflow deploys the deployment groups in parallel. Within each deployment group, it splits the instances in the Auto Scaling Group into multiple batches, so that the batches can be deployed one after another with a configurable interval between them.

The flow looks like:

- Get the Auto Scaling Group in the specific environment.
- Get the instances in that Auto Scaling Group.
- Split the instances into multiple batches.
- Deploy each batch.
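
The splitting step can be sketched roughly as follows. This is a minimal TypeScript sketch, not the workflow's actual code: the function name `splitIntoRanges` and its rounding rules are assumptions. It turns a host count and a percentage list into Ansible host range patterns, allocating from the end of the group so that the smallest batches are deployed first, which mirrors the expected output of the jest test shown later in this document.

```typescript
// Hypothetical helper: split an Ansible group of `hostCount` hosts into
// range patterns such as "nginx_asg[6:7]", one pattern per batch.
function splitIntoRanges(group: string, hostCount: number, hostsPerGroup: string): string[] {
  // "10%,10%,20%,20%,40%" -> [10, 10, 20, 20, 40]
  const percents = hostsPerGroup.split(",").map((p) => parseInt(p.trim(), 10));
  const ranges: string[] = [];
  let end = hostCount - 1; // highest host index not yet assigned to a batch

  percents.forEach((percent, i) => {
    if (end < 0) return; // no hosts left to assign
    const remaining = end + 1;
    const isLast = i === percents.length - 1;
    // Assumption: round to the nearest host, use at least one host per batch,
    // and let the last batch take whatever remains.
    const wanted = Math.max(1, Math.round((hostCount * percent) / 100));
    const size = isLast ? remaining : Math.min(remaining, wanted);
    const start = end - size + 1;
    ranges.push(`${group}[${start}:${end}]`);
    end = start - 1;
  });
  return ranges;
}

console.log(splitIntoRanges("nginx_asg", 10, "10%,10%,20%,20%,40%"));
```

Each returned pattern can be passed to Ansible as a host pattern, so a batch is deployed by limiting the run to that range.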

How to use this workflow?

- Users can specify the interval between batches in the PR title. For example, with [interval: 300, 180, 60, 10] in the PR title, the workflow deploys the instances in 5 batches: it deploys the first batch and waits 300 seconds, then deploys the second batch and waits 180 seconds, and so on.
- Users can add [revert] or [hotfix] to the title to skip progressive deployment in an emergency.
- Merging the PR into the master branch triggers the deployment.
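
To illustrate the title conventions, here is a minimal TypeScript sketch of how such a title could be parsed. The names `DeployPlan` and `parsePrTitle`, the regular expressions, and the case-insensitive matching are assumptions made for illustration, not the workflow's real implementation.

```typescript
interface DeployPlan {
  skipProgressive: boolean; // true when the title contains [revert] or [hotfix]
  intervals: number[]; // seconds to wait after each batch except the last
}

// Hypothetical parser for the PR-title conventions described above.
function parsePrTitle(title: string): DeployPlan {
  const skipProgressive = /\[(revert|hotfix)\]/i.test(title);
  const match = title.match(/\[interval:\s*([\d\s,]+)\]/i);
  const intervals = match
    ? (match[1] ?? "")
        .split(",")
        .map((s) => parseInt(s.trim(), 10))
        .filter((n) => !isNaN(n))
    : [];
  return { skipProgressive, intervals };
}

const plan = parsePrTitle("Bump nginx config [interval: 300, 180, 60, 10]");
// Four intervals separate five batches.
console.log(plan.intervals.length + 1); // 5
```

A title such as "[hotfix] Fix upstream timeout" would yield `skipProgressive: true` and an empty interval list, which the workflow could treat as "deploy everything at once".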

Use unit tests to verify the workflow result

I implemented most of the workflow logic in JavaScript and TypeScript.

This makes it easy to write unit tests with jest. High test coverage is what lets me release new workflow features with confidence.
(Figure: workflow-result)

I use jest to mock fs and child_process so that each test case sees the output it needs. An example jest test file looks like this:

describe('group_inventory', () => {
  // JavaScript in a GitHub Actions workflow receives its inputs through
  // environment variables, so each test case sets them too.
  const OLD_ENV = process.env;

  beforeEach(() => {
    jest.resetModules(); // clears the cache
    process.env = { ...OLD_ENV }; // Make a copy
  });

  afterAll(() => {
    process.env = OLD_ENV; // Restore old environment
  });

  // test case 1
  test('createInventoryGroup reads an ASG with 10 hosts, hosts per group is 10%,10%,20%,20%,40%', async () => {
    // set up
    jest.mock('fs', () => {
      // ... truncated ...
    });
    jest.mock('child_process', () => {
      // ... truncated ...
    });
    process.env.ANSIBLE_INVENTORY = 'inventory/aws_ec2.yml';
    process.env.ANSIBLE_HOSTS_PER_GROUP = '10%,10%,20%,20%,40%';
    // ... truncated ...

    // import
    // here is the function we want to test
    const group_inventory = require('../group_inventory');

    // invoke
    const resp = await group_inventory({ github: {}, context: {} });
    const shouldBe = [
      {
        asg_group: 'nginx_asg',
        asg_group_with_ranges: [
          'nginx_asg[9:9]',
          'nginx_asg[8:8]',
          'nginx_asg[6:7]',
          'nginx_asg[4:5]',
          'nginx_asg[0:3]',
        ],
      },
    ];
    expect(resp).toStrictEqual(shouldBe);
  });
});

Version 2 - Blue/Green deployment

High-level workflow

(Figure: nginx-blue-green)

After implementing the version 1 workflow, I thought the solution could be even better. For an EC2-based deployment, Blue/Green deployment makes it possible to roll back a service within seconds. That is why I implemented the version 2 workflow.

This workflow uses the Blue/Green deployment approach to deploy changes to instances.

We create one additional ASG alongside each existing ASG and group the two ASGs into one Blue/Green deployment group. Because every ASG has its own Target Group, we can control how much traffic goes to each Target Group through the weights on the ALB rule.
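
As an illustration of the weighting idea, the TypeScript sketch below builds the weighted "forward" action that the ELBv2 ModifyRule API accepts for an ALB listener rule. The helper name `buildForwardAction`, the percentage-based weights, and the placeholder ARNs are assumptions; how the real workflow applies the weights is not shown in this document.

```typescript
// Shape of a weighted "forward" action on an ALB listener rule.
interface ForwardAction {
  Type: "forward";
  ForwardConfig: {
    TargetGroups: { TargetGroupArn: string; Weight: number }[];
  };
}

// Hypothetical helper: send `greenPercent` of traffic to the green Target
// Group and the rest to the blue one.
function buildForwardAction(blueArn: string, greenArn: string, greenPercent: number): ForwardAction {
  if (!(greenPercent >= 0 && greenPercent <= 100)) {
    throw new RangeError("greenPercent must be between 0 and 100");
  }
  return {
    Type: "forward",
    ForwardConfig: {
      TargetGroups: [
        { TargetGroupArn: blueArn, Weight: 100 - greenPercent },
        { TargetGroupArn: greenArn, Weight: greenPercent },
      ],
    },
  };
}

// Placeholder ARNs, for illustration only.
const cutover = buildForwardAction("arn:blue-target-group", "arn:green-target-group", 100);
const rollback = buildForwardAction("arn:blue-target-group", "arn:green-target-group", 0);
console.log(JSON.stringify(cutover.ForwardConfig.TargetGroups));
console.log(JSON.stringify(rollback.ForwardConfig.TargetGroups));
```

Under this scheme a rollback only swaps the two weights on the rule, with no redeployment of instances, which is why it can be fast.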

This approach improves on the version 1 workflow in the following ways:

- We can update the EC2 AMI automatically.
- We can roll back a service within 1 minute.