Step Functions in the Wild

I was glad to see Step Functions getting some love at re:Invent this year, and thought I’d share some of my favourite things about them. Step Functions is a workflow orchestrator, and while AWS markets its visual aspect quite heavily, it’s really the variety of state types, and the ways they can be combined, that make it so powerful.

As a recap, the states on offer are:

  • Pass
  • Task
  • Choice
  • Wait
  • Succeed
  • Fail
  • Parallel
  • Map

I won't go into detail about what most of them do. The two I’m going to focus on here are the Parallel/Map pair and the Wait state.

Wait

The Wait state does what it says on the tin: it pauses your workflow until either a specified timestamp is reached or a specified duration has elapsed. Either value can be fixed in the state machine definition or read from the state's input at run time. This may sound simple, but it abstracts away a lot of heavy lifting that you'd otherwise have to do yourself, such as polling or sleeping in a loop.

One use for a Wait state is when you collect information in real time but can only act on it at a specific time. In that case the workflow simply waits until that timestamp. If your business process instead required waiting for a set duration (such as a cooling-off period), you would specify a duration in the Wait state.
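
To make the two modes concrete, here is a minimal sketch in Python that builds Amazon States Language (ASL) definitions as plain dictionaries, which can be serialised to the JSON that Step Functions accepts. The state names, the `actAt` input field and the `wait_state` helper are hypothetical; `Seconds` (a fixed duration) and `TimestampPath` (an ISO 8601 timestamp read from the state's input) are standard ASL fields.

```python
import json


def wait_state(next_state, *, seconds=None, timestamp_path=None):
    """Build an ASL Wait state (hypothetical helper).

    Exactly one of `seconds` (a fixed duration) or `timestamp_path` (a JSONPath
    into the state input holding an ISO 8601 timestamp) must be given.
    """
    if (seconds is None) == (timestamp_path is None):
        raise ValueError("pass exactly one of seconds or timestamp_path")
    state = {"Type": "Wait", "Next": next_state}
    if seconds is not None:
        state["Seconds"] = seconds
    else:
        state["TimestampPath"] = timestamp_path
    return state


# Act at a specific time supplied in the execution input,
# e.g. {"actAt": "2024-01-02T09:00:00Z"}.
act_at_time = {
    "StartAt": "WaitUntilActTime",
    "States": {
        "WaitUntilActTime": wait_state("ActOnInformation", timestamp_path="$.actAt"),
        # Placeholder for the real work: a Pass state just forwards its input.
        "ActOnInformation": {"Type": "Pass", "End": True},
    },
}

# The cooling-off variant only changes the Wait configuration: a fixed 24 hours.
cooling_off = {
    "StartAt": "CoolingOff",
    "States": {
        "CoolingOff": wait_state("ActOnInformation", seconds=24 * 60 * 60),
        "ActOnInformation": {"Type": "Pass", "End": True},
    },
}

print(json.dumps(act_at_time, indent=2))
```

The rest of the workflow is identical in both cases; only the Wait state's configuration decides whether it waits for a moment in time or for a length of time.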

Parallel/Map

These two states are conceptually similar: they let you take a single branch of the workflow and split it into several. The difference is that a Map state splits your workflow into n branches that all run the same logic, whereas a Parallel state splits it into n distinct branches of logic. A Map state is therefore fed an array, creating one branch per element and running the same sub-flow on each. A Parallel state doesn't need an array input: in its simplest form, it passes its input (the previous state's output) to each of the n distinct sub-flows you define.

An example use case for a Map state would be if you have a list of items (say, financial transactions) and each of them needs to be processed the same way. You define the Map state's iterator once and start the workflow with a list of transactions; each transaction is then processed by its own run of the iterator.
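
A minimal sketch of such a Map state, again building the ASL definition in Python. The `transactions` input field, the state names and the Lambda ARN are hypothetical placeholders. `ItemsPath`, `MaxConcurrency` and `ItemProcessor` are standard ASL fields; `ItemProcessor` is what current ASL calls the iterator (older definitions use an `Iterator` field instead).

```python
import json

# Placeholder ARN for a hypothetical Lambda function that processes one transaction.
PROCESS_TRANSACTION_ARN = "arn:aws:lambda:us-east-1:123456789012:function:ProcessTransaction"

definition = {
    "StartAt": "ProcessEachTransaction",
    "States": {
        "ProcessEachTransaction": {
            "Type": "Map",
            "ItemsPath": "$.transactions",  # the array to fan out over
            "MaxConcurrency": 10,  # optional cap on concurrent iterations (0 means no limit)
            "ItemProcessor": {
                # The sub-flow is defined once and run once per array element;
                # each run receives a single transaction as its input.
                "ProcessorConfig": {"Mode": "INLINE"},
                "StartAt": "ProcessTransaction",
                "States": {
                    "ProcessTransaction": {
                        "Type": "Task",
                        "Resource": PROCESS_TRANSACTION_ARN,
                        "End": True,
                    }
                },
            },
            "End": True,
        }
    },
}

print(json.dumps(definition, indent=2))
```

Starting an execution with input such as `{"transactions": [{"id": "t1"}, {"id": "t2"}]}` would produce two iterations. The Map state's output is an array holding each iteration's output, in the same order as the input array.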

A comparable use case for a Parallel state is a single financial transaction on which you need to perform several different checks that can run concurrently. You define each check as a sub-flow and make each sub-flow a branch of the Parallel state.
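
A minimal sketch of that Parallel state in the same style. The check names, the ARN prefix and the `single_task_branch` helper are hypothetical; `Branches` is the standard ASL field, and each branch is a self-contained sub-flow with its own `StartAt` and `States`.

```python
import json

# Placeholder ARN prefix for hypothetical Lambda functions, one per check.
FUNCTION_ARN_PREFIX = "arn:aws:lambda:us-east-1:123456789012:function:"


def single_task_branch(check_name):
    """Build a one-step branch (hypothetical helper) that runs a single check."""
    return {
        "StartAt": check_name,
        "States": {
            check_name: {
                "Type": "Task",
                "Resource": FUNCTION_ARN_PREFIX + check_name,
                "End": True,
            }
        },
    }


# Hypothetical checks; each becomes one distinct branch of logic.
CHECKS = ["CheckFraud", "CheckSanctions", "CheckLimits"]

definition = {
    "StartAt": "RunChecks",
    "States": {
        "RunChecks": {
            "Type": "Parallel",
            # Every branch receives the same input: the single transaction.
            "Branches": [single_task_branch(name) for name in CHECKS],
            "Next": "ProceedWithTransaction",
        },
        # Placeholder for whatever happens once all checks have completed.
        "ProceedWithTransaction": {"Type": "Pass", "End": True},
    },
}

print(json.dumps(definition, indent=2))
```

The Parallel state's output is an array with one element per branch, in the order the branches are listed, so the next state can inspect every check's result.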

The best thing about these states in the context of Step Functions is that you can easily delay the next step until all the branches have completed. In our Parallel example, you may need to perform the aforementioned checks and ensure they all pass before proceeding. This is trivial in Step Functions: a Parallel state only completes, and hands its output to the next state, once all of its branches have finished, and the same holds for a Map state and its iterations. If any branch fails, the Parallel state as a whole fails (unless you catch the error), so the workflow never proceeds on a failed check.
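
For readers more used to application code, the same fan-out/fan-in shape can be sketched locally with Python's `asyncio.gather`. This is only an analogy, not Step Functions: the check functions and the transaction fields are hypothetical. One difference worth knowing is that when a branch fails, Step Functions stops the remaining branches, whereas `gather` with its default arguments propagates the first exception without cancelling the other awaitables.

```python
import asyncio

# Local analogy only: plain asyncio, not Step Functions.
# The checks and the transaction shape are hypothetical.


async def check_fraud(txn):
    await asyncio.sleep(0.01)  # stand-in for a real call
    return {"check": "fraud", "passed": txn["amount"] < 10_000}


async def check_sanctions(txn):
    await asyncio.sleep(0.02)  # stand-in for a real call
    return {"check": "sanctions", "passed": txn["country"] != "XX"}


async def process(txn):
    # Fan out: run both checks concurrently. gather returns only once all of
    # them have finished, with results in argument order, much like the
    # output array of a Parallel state.
    results = await asyncio.gather(check_fraud(txn), check_sanctions(txn))
    # Fan in: this "next step" only runs after every check has completed.
    if all(r["passed"] for r in results):
        return "proceed", results
    return "reject", results


decision, results = asyncio.run(process({"amount": 250, "country": "SG"}))
print(decision, [r["check"] for r in results])
```

In Step Functions you get this join behaviour from the state machine itself, without writing or hosting the coordinating code.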

Map and Wait: An Example

I have implemented a workflow that is greatly simplified by combining Map and Wait states. In this workflow, we process mutual fund trades for a number of users. Trades are grouped by user and order, where an order is a group of trades placed by a single user.

For each order, we submit each trade to the market for execution and then wait for the trades to settle, a process that takes on the order of days, with each trade possibly settling on a different date. A specific requirement here is that redemption orders (where the user is selling their holdings for cash) must be paid out in a single payment once all trades for that order have settled.

To achieve this, we define the list of trades as a property of an order and define a Map state that operates on that list. We then define the iterator, the set of steps we run for each trade. Within the iterator, we submit the trade for execution to a third-party platform and receive back a value representing its settlement date. We pass the settlement date into a Wait state (remember we are still within the Map iterator at this stage), and that iteration pauses until the settlement date is reached. Once the settlement dates for all trades in the order have been reached, the Map state completes automatically and we can proceed to pay the user.
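
Put together, the shape of that workflow can be sketched as the ASL definition below. This is a minimal sketch, not the production definition: the ARNs, state names and input fields are hypothetical, and it assumes the submission task returns the settlement date as an ISO 8601 timestamp string (for example `"2024-01-05T00:00:00Z"`).

```python
import json

# Placeholder ARNs for hypothetical functions that call the trading platform
# and the payments service.
SUBMIT_TRADE_ARN = "arn:aws:lambda:us-east-1:123456789012:function:SubmitTrade"
PAY_USER_ARN = "arn:aws:lambda:us-east-1:123456789012:function:PayUser"

definition = {
    "Comment": "Submit each trade, wait for every trade to settle, then pay once",
    "StartAt": "ProcessTrades",
    "States": {
        "ProcessTrades": {
            "Type": "Map",
            "ItemsPath": "$.trades",  # one iteration per trade in the order
            "ItemProcessor": {
                "ProcessorConfig": {"Mode": "INLINE"},
                "StartAt": "SubmitTrade",
                "States": {
                    "SubmitTrade": {
                        "Type": "Task",
                        "Resource": SUBMIT_TRADE_ARN,
                        # Assumed to return an ISO 8601 settlement timestamp,
                        # stored on the trade object under settlementDate.
                        "ResultPath": "$.settlementDate",
                        "Next": "WaitForSettlement",
                    },
                    "WaitForSettlement": {
                        "Type": "Wait",
                        # Each iteration waits on its own trade's settlement date.
                        "TimestampPath": "$.settlementDate",
                        "End": True,
                    },
                },
            },
            # Keep the order's own fields (user, order id) and attach the
            # per-trade results, so PayUser still sees who to pay.
            "ResultPath": "$.settledTrades",
            # The Map state only completes when every iteration has ended,
            # i.e. once the last trade in the order has settled.
            "Next": "PayUser",
        },
        "PayUser": {"Type": "Task", "Resource": PAY_USER_ARN, "End": True},
    },
}

print(json.dumps(definition, indent=2))
```

The same definition handles an input such as `{"orderId": "o-1", "userId": "u-1", "trades": [{"fund": "A"}, {"fund": "B"}]}` or one with any other number of trades. Because settlement takes days, this would need to run as a Standard Workflow: Express Workflows are limited to five minutes per execution.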

Conclusion

What would otherwise have been a very tiresome workflow to build is greatly simplified by Step Functions, and in particular by the Map and Wait states. The workflow is flexible with regard to both the number of trades per order and their settlement dates: either can take on pretty much any value without a change to the workflow definition. In other words, we can handle orders with any number of trades and arbitrary settlement dates, within the quotas of Step Functions (a Standard Workflow execution, for example, can run for at most one year).

I find Step Functions somewhat underrated, and love the ability to implement complex workflows without having to build the plumbing to support them. I am a big advocate of focusing engineering effort on the value-generating aspects of the business rather than on necessary but ultimately generic functionality like the orchestration we've discussed here.

As I mentioned at the start of this post, it’s great to see AWS continuing to invest in Step Functions, especially with the new ability to redrive failed executions from the point of failure, and I'm looking forward to watching the service continue to evolve.

© 2023 Thomas Chia 🇸🇬