Wait for Callback: Step Functions
The simple art of orchestrating your workflow
In my previous article, I talked about how Prime Video moved from a serverless microservice architecture (primarily using Step Functions) to a monolith, which reduced their costs by 90%. But as I always say, in this world of tech, everything is a trade-off: what is not useful for you in one context can be useful for someone else in another. So we should never rule out a service, a tool, or a technology based on one specific experience. We should always weigh the pros and cons, embrace the trade-offs that result from our choices, and deal with them. We don’t have much choice here 🤷‍♂️
That being said, Step Functions remains an interesting service with a lot of cool features. Today we are going to explore one of them: the wait for callback feature.
I) What the hell are Step Functions in the first place?
In simple terms, AWS Step Functions helps you automate and coordinate tasks across different AWS services, making it easier to build and manage complex applications. It enables you to design, execute, and visualize workflows that involve various AWS services, such as Lambda functions, ECS tasks, SNS notifications, and more. This makes it particularly useful for orchestrating tasks in applications, data processing pipelines, and other distributed systems.
To take a concrete example: you are selling books, and you want to handle the order workflow from the client’s order to shipping. With Step Functions you can design and define this simply, compared to the classic way where you have 1000 lines of code with multiple if-else statements, which, you know, is very hard to maintain and can get messy pretty quickly 😵‍💫
II) Step Functions features
Step Functions offers many features. Just by looking at a workflow like the book order above, we can see that:
• There’s an error-handling mechanism: define what action to take depending on whether a step succeeds or fails
• Choice state: based on the output of a step, we can define which step to take next
• Parallel state: execute steps in parallel, and more ...
Another feature that Step Functions offers, which does not appear in that workflow, is “wait for callback”. In the next part, we are going to dive deep into this feature and break it down.
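To make these features concrete, here is a minimal sketch of what a book-order workflow could look like in Amazon States Language, built as a Python dictionary. This is only an illustration: the state names, the `$.inStock` field, and the Lambda ARNs are assumptions made up for the example, not the actual workflow of this article.

```python
import json

# Hypothetical Lambda ARN prefix: replace with your own account, region, and functions.
LAMBDA_PREFIX = "arn:aws:lambda:eu-west-1:123456789012:function:"


def task(function_name: str, **extra) -> dict:
    """Build a Task state that invokes a (hypothetical) Lambda function."""
    return {"Type": "Task", "Resource": LAMBDA_PREFIX + function_name, **extra}


def build_order_workflow() -> dict:
    """Return an illustrative state machine definition for a book order."""
    return {
        "Comment": "Book order workflow (illustrative sketch)",
        "StartAt": "CheckStock",
        "States": {
            # Error handling: any failure in this step jumps to OrderFailed.
            "CheckStock": task(
                "check-stock",
                Catch=[{"ErrorEquals": ["States.ALL"], "Next": "OrderFailed"}],
                Next="InStock?",
            ),
            # Choice state: pick the next step based on the previous output.
            "InStock?": {
                "Type": "Choice",
                "Choices": [
                    {
                        "Variable": "$.inStock",
                        "BooleanEquals": True,
                        "Next": "PrepareOrder",
                    }
                ],
                "Default": "OrderFailed",
            },
            # Parallel state: both branches run at the same time.
            "PrepareOrder": {
                "Type": "Parallel",
                "Branches": [
                    {
                        "StartAt": "PackBook",
                        "States": {"PackBook": task("pack-book", End=True)},
                    },
                    {
                        "StartAt": "SendInvoice",
                        "States": {"SendInvoice": task("send-invoice", End=True)},
                    },
                ],
                "Next": "ShipOrder",
            },
            "ShipOrder": task("ship-order", End=True),
            "OrderFailed": {
                "Type": "Fail",
                "Error": "OrderFailed",
                "Cause": "The order could not be processed",
            },
        },
    }


if __name__ == "__main__":
    # Print the JSON you could paste into the Step Functions code editor.
    print(json.dumps(build_order_workflow(), indent=2))
```

Compared to nested if-else statements, each decision here is a named state, which is what makes the workflow easy to read and visualize.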
III) Wait for callback feature
Let’s go back to our book-ordering workflow and add a payment step that requires validation from the user, on their phone for example. In this case, we need to wait for the user’s confirmation, for 2 minutes let’s say, before finishing the process and shipping the book, right? How can we implement this cleanly? How do we wait for the user’s validation?
IV) Technical implementation — Architecture
Since Step Functions can interact with many AWS services, there are plenty of ways to implement this. One way is to have an SQS queue that triggers a Lambda function (which here stands in for the user and performs the validation). If the validation succeeds, we proceed with the rest of the workflow, otherwise we abort it. The code for this is implemented in this repo, but for now, let’s implement this together.
IV.1) Implementing this together
1. Add an SQS step and check the “Wait for callback” checkbox (this is where the magic happens). In the Message section, select “Enter message” and paste this JSON:
{
"myTaskToken.$": "$$.Task.Token",
"Input.$": "$"
}
What this JSON means: send the task token, as well as the input the state received. The task token is a unique identifier that Step Functions generates for this paused task; whoever sends it back resumes the execution.
2. Add error handling: this tells the workflow which step to execute if the user doesn’t respond within 2 minutes.
3. In the error handling section, add a catcher, choose “States.Timeout” in the Error section (error names are case-sensitive), and Fail as the fallback state. Scroll down to the timeout section and set HeartbeatSeconds to 120 seconds (2 minutes, but you can put whatever you want). Finally, go back to the configuration, scroll down to the next state, and select “Go to end”.
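For readers who prefer code over console clicks, the three steps above roughly correspond to the following state machine definition, sketched here as a Python dictionary. The queue URL and the state names are hypothetical placeholders; the `.waitForTaskToken` suffix on the resource is what the “Wait for callback” checkbox sets for you.

```python
import json

# Hypothetical queue URL: replace with the URL of your own SQS queue.
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/payment-validation"


def build_wait_for_callback_state(queue_url: str, heartbeat_seconds: int = 120) -> dict:
    """Build the SQS task state that pauses until the task token is sent back."""
    return {
        "Type": "Task",
        # The ".waitForTaskToken" suffix turns the task into a callback task.
        "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
        "Parameters": {
            "QueueUrl": queue_url,
            "MessageBody": {
                # Task token taken from the context object ($$).
                "myTaskToken.$": "$$.Task.Token",
                # Whole input of the state.
                "Input.$": "$",
            },
        },
        # Without any response within this delay, the task times out.
        "HeartbeatSeconds": heartbeat_seconds,
        # On timeout, go to the Fail state instead of continuing.
        "Catch": [{"ErrorEquals": ["States.Timeout"], "Next": "Fail"}],
        "End": True,
    }


def build_definition(queue_url: str) -> dict:
    """Wrap the callback state in a minimal state machine definition."""
    return {
        "StartAt": "WaitForUserValidation",  # hypothetical state name
        "States": {
            "WaitForUserValidation": build_wait_for_callback_state(queue_url),
            "Fail": {
                "Type": "Fail",
                "Error": "ValidationTimeout",
                "Cause": "No validation received in time",
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_definition(QUEUE_URL), indent=2))
```

In the real book-ordering workflow, the callback state would point to the shipping step with a `Next` field instead of `"End": True`.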
Congrats 🥳 We have now implemented the callback logic. Let’s test what we have created so far. Click on save and click on execute. Keep everything as default and click on start execution.
At this stage, the SQS step should be in a pending state, waiting for the callback. This is the result we are expecting.
Resume the process:
- Now that we have seen that our process is working as expected, the question left is: how can we resume it?
To answer this question, we should look for the “TaskScheduled” event at the bottom and find the “myTaskToken” attribute.
- Copy that token, open CloudShell or use your CLI, and copy-paste this command. Replace YOUR_TASK_TOKEN with your actual token.
aws stepfunctions send-task-success --task-output '{}' --task-token YOUR_TASK_TOKEN
- The process should now resume (if you did all this in less than 2 minutes, of course 😅) and the step should turn green.
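The same call can be made from Python. The sketch below assumes you have boto3 installed and AWS credentials configured; `resume_execution` is a helper name made up for this example, while `send_task_success` is the boto3 Step Functions client method behind the CLI command.

```python
import json


def resume_execution(sfn_client, task_token: str, output=None):
    """Resume a paused callback task by sending its task token back.

    sfn_client is expected to be a boto3 Step Functions client, passed in
    so the function can also be tried with a stand-in object.
    """
    return sfn_client.send_task_success(
        taskToken=task_token,
        # The output must be a JSON string; default to an empty JSON object.
        output=json.dumps(output if output is not None else {}),
    )


# Usage (needs boto3 and AWS credentials, so it is left as a comment):
#
#   import boto3
#   resume_execution(boto3.client("stepfunctions"), "YOUR_TASK_TOKEN")
```

To abort the workflow instead, the client also exposes `send_task_failure`, which takes the same task token along with an error name and a cause.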
IV.2) One last touch
Now that we have seen how to trigger a callback, let’s make this process a little more realistic, because we know our user is not going to look for this task token in the events 😅
What I will do now is make the SQS queue trigger a Lambda function whenever it gets a message. This Lambda function should receive the task token, run some logic, and, if everything is fine, make the same API call we made with the CLI to resume the process. The Lambda code is available in the repo.
When creating the Lambda function, please don’t forget to give its execution role the appropriate permissions for SQS and Step Functions (for example sqs:ReceiveMessage, sqs:DeleteMessage, and sqs:GetQueueAttributes to read from the queue, and states:SendTaskSuccess and states:SendTaskFailure to answer the workflow).
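The actual Lambda code lives in the repo. As a rough idea of what such a function can look like, here is a minimal sketch. It assumes the message body is the JSON we defined earlier (with the `myTaskToken` and `Input` keys); `validate_payment` and `process_records` are hypothetical names invented for this example, and the validation logic is a placeholder.

```python
import json


def validate_payment(order_input) -> bool:
    """Hypothetical check standing in for the user's validation."""
    return order_input is not None


def process_records(event: dict, sfn_client) -> list:
    """Answer the workflow for every SQS message received in the event."""
    tokens = []
    for record in event.get("Records", []):
        # SQS delivers the message body as a string, so parse it first.
        body = json.loads(record["body"])
        token = body["myTaskToken"]
        print("myTaskToken:", token)  # ends up in the CloudWatch logs

        if validate_payment(body.get("Input")):
            # Same API call as the CLI command: resume the workflow.
            sfn_client.send_task_success(
                taskToken=token,
                output=json.dumps({"validated": True}),
            )
        else:
            # Tell the workflow that the validation failed.
            sfn_client.send_task_failure(
                taskToken=token,
                error="ValidationFailed",
                cause="The payment could not be validated",
            )
        tokens.append(token)
    return tokens


def lambda_handler(event, context):
    """Lambda entry point: boto3 is provided by the AWS Lambda Python runtime."""
    import boto3

    return process_records(event, boto3.client("stepfunctions"))
```

Passing the client as a parameter to `process_records` is a design choice that keeps the logic testable locally with a stand-in object, without calling AWS.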
IV.3) Final execution
If we execute our workflow again, we should see the process going green. And if we take a look at the CloudWatch logs of our Lambda function, we should see an invocation of the function as well as the myTaskToken printed out. This means that our Lambda function successfully received the task token and made the API call to Step Functions.
V) Conclusion
Step Functions is an interesting service: it helps orchestrate complex workflows cleanly and simply. However, we should be careful about how we use it, otherwise our bill can skyrocket pretty quickly. Check my last article for more details about this.