When you’re dealing with a real-world project rather than documentation,
Many systems in healthtech and fintech face the same challenge: processes that don’t finish within 15 minutes. An application waits for approval from a compliance officer. A transaction must be rolled back because a downstream API has failed. User onboarding stretches across weeks. Until now, the typical solutions involved Step Functions, database polling, or heavyweight orchestration services. AWS Lambda Durable Functions, announced in late 2025, change that equation.
What is AWS Lambda and why is this reminder important?
Lambda is a serverless service that runs code in response to events within the AWS architecture. A simplified definition I often use during onboarding sessions is this: Lambda is code that wakes up, does its job, and shuts down. You don’t have to worry about scaling—whether you’re handling one request or a million, AWS automatically adjusts the number of executions without any intervention on your part.
Typical use cases include processing documents after they are uploaded to an S3 bucket, asynchronous processing of SQS queues (the classic example: a new order arrives in a queue and Lambda handles it in the background), or building serverless REST APIs using the API Gateway + Lambda architecture, where Lambda replaces a traditional application server.
The timeline of this service is relatively short but packed with innovation. In November 2014, AWS announced Lambda in preview at re:Invent. In 2018, SQS became a native event source, allowing Lambda to evolve beyond being just a backend for APIs. In 2022, SnapStart was introduced, significantly reducing cold start times for Java-based functions. Then, at re:Invent 2025, AWS unveiled Durable Functionsa feature that takes Lambda beyond the traditional “fire-and-forget” model.
Standard Lambda vs. Lambda Durable: what’s the difference?
A standard Lambda function does not maintain state between executions. If something interrupts it, progress is lost and the process must start from scratch. It also has a hard execution limit of 15 minutes.
Lambda Durable works differently. It uses a context mechanism (DurableContext) that records the result of every step. When the workflow resumes, the platform replays the code from the beginning, but instead of re-executing completed steps, it injects their previously stored results from the checkpoint log and skips ahead. New computation begins only from the first unfinished step.
This has one major practical consequence: Lambda Durable can enter a suspended state, wait for an external signal for weeks or even months, and then resume exactly where it left off.
Equally important from a serverless cost perspective, you do not pay for compute time while a Lambda Durable workflow is idle and not executing code. Additional costs, such as durable operations and checkpoint storage, are typically negligible in real-world workloads. With relatively short retention periods and small payload sizes, they usually account for only a tiny fraction of the total solution cost.
The maximum execution lifespan of a Lambda Durable workflow is nearly one year.
What does this look like in code?
There are two changes compared to a standard handler. The handler is wrapped with withDurableExecution, imported from the new SDK, available for Python, JavaScript, and Java. The second argument of the handler is DurableContext, which manages state between steps and is responsible for storing results.
The saga pattern, including rolling back steps after an error, which is essential in distributed transactions, can be built directly with nested try-catch blocks. The catch block calls compensating steps in reverse order. Business logic stays where it belongs: in the application code, not in a separate configuration file.
For processes that require a human decision or a response from an external system, the callback ID and callback URL mechanism comes into play. Lambda generates a unique identifier for the suspended invocation, the callback token, and an endpoint, the callback URL, which the external system must call to wake it up. Without this, it would not be possible to resume a specific waiting instance.
Working example: semi-automated approval of training requests with AI
A few weeks ago, I attended AWS Summit Warsaw, where Durable Functions were one of the main topics. The subject intrigued me enough to build a POC and present it during our internal FireLab, a recurring series of meetings where Fireup.pro engineers showcase technologies they are currently exploring.
I chose a scenario that can be easily translated into real-world use cases: a semi-automated system for handling training requests. An employee uploads a request to an S3 bucket. This triggers Lambda Durable, which sends the document to Amazon Bedrock using the Claude Sonnet 4.6 model. AI evaluates how well the training matches the applicant’s role and returns a percentage score.
The decision logic has three paths: below 50% means automatic rejection, so Lambda does not wait, continues execution, and closes the case. Above 76% means automatic approval, following the same flow. The challenge lies in the grey area: 50–75%, where the model is not confident enough to act independently.
This is where Lambda Durable pauses execution. It generates a callback ID and callback URL, stores the state, and goes to sleep: resources are released, and compute costs drop to zero. The HR team receives a notification, reviews the request, and calls either the sendSuccess or sendFailure endpoint. Lambda wakes up exactly where it left off.
The default timeout for the HR decision in this POC is 72 hours, but it can be configured for any duration up to one year. In the demo, I submitted six fictional requests: hiring specialist, junior developer, driver, accountant, marketing specialist, and senior developer. Five were completed automatically. One went to manual review and remained in the running state for several minutes during the presentation until I manually approved it. Lambda came back to life, approved the request, and the result landed in the bucket. The entire cycle was visible in CloudWatch.
This pattern, AI as the first filter and a human as the decision-maker for ambiguous cases, can be directly applied to clinical processes, underwriting in insurtech, or compliance workflows in regulated fintech environments. If you’re interested in how fireup.pro designs similar architectures for healthcare and financial clients, contact us directly.
Lambda Durable vs. Step Functions: when should you choose each?
Step Functions is a mature, battle-tested tool. I say that without irony. For complex pipelines involving multiple services, requiring rich workflow visualization and independent infrastructure-level orchestration, Step Functions continues to be an excellent choice.
The difference is structural. Step Functions requires you to define the workflow as a separate configuration file using Amazon States Language—a layer of abstraction that exists outside the application code. Lambda Durable keeps the entire orchestration logic inside the function itself. One codebase, one deployment, one source of truth.
For scenarios where orchestration is tightly intertwined with domain logic—which is the case for most healthcare and fintech systems I build—Lambda Durable removes the configuration overhead and maintains consistency between what you see in the code and what actually happens at runtime.
A similar philosophy of “keeping logic in code rather than configuration files” was something we discussed in our article on project-oriented versus process-oriented approaches to software development, particularly in the context of designing complex workflows.
