Netflix Conductor | The tip of the Iceberg – Marcos Soares

Microservices are here to stay, and application development trends show that they dominate the market. On almost every project, scalability and resilience eventually become a challenge. There are several solutions to address these challenges, and today we are exploring one of them – Netflix Conductor.

Netflix has become well known for publishing open-source software under the Netflix OSS (Open Source Software) umbrella. Netflix Conductor is one of those projects.

Goal

Show the bare minimum of Netflix Conductor's capabilities by learning its definitions and getting hands-on experience. With tons of useful screenshots!

Starting point

“Conductor is a Workflow Orchestration engine that runs in the cloud” – per the Netflix documentation

In my own words: it manages all the steps (microservices/APIs) required for a given business workflow (e.g. Create order) end to end, with scalability, traceability and resiliency.

Conductor Terminology

Understanding the basic terminology is crucial before going further in this post:

Definitions (aka Metadata or Blueprints)
Like classes in the OOP paradigm, or templates. You define them once and reuse them for every “instance”.
Workflow Definition
The “class/template” for a business process (e.g. Fulfill order). It contains the list of all Tasks, their configuration and how they interact with each other to complete a business flow.
Workflow
The “instance” of a Workflow Definition: the actual work being done, step by step.
Task Definition
The “class/template” for a single step (e.g. Send an e-mail). It contains configuration such as inputs and outputs, timeouts, retries, etc.
Tasks
The “instance” of a Task Definition: the actual work being done for a single step. Tasks fall into two types:
- System tasks
  Predefined tasks executed entirely by Conductor. Examples: IF conditions, HTTP requests, Joins, Forks, etc.
- Worker tasks
  These are your subscribers: tasks that must be executed by a “worker/machine/computer” of your own. They can be anything your business processes require. Examples: Create Order, Send E-mails, Generate Invoice, etc.
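
To make the “class vs. instance” analogy concrete, here is a minimal sketch in Python. The class and field names (TaskDefinition, Task, retry_count and so on) are my own simplification for illustration and are not Conductor's actual data model:

```python
from dataclasses import dataclass, field
import uuid


@dataclass(frozen=True)
class TaskDefinition:
    """The reusable blueprint: defined once, shared by every instance."""
    name: str
    retry_count: int = 3
    timeout_seconds: int = 300


@dataclass
class Task:
    """One execution of a TaskDefinition, with its own id, input and status."""
    definition: TaskDefinition
    input_data: dict
    status: str = "SCHEDULED"
    task_id: str = field(default_factory=lambda: str(uuid.uuid4()))


# One definition, two independent instances of it.
send_email = TaskDefinition(name="Send_Email")
first = Task(send_email, {"orderId": "1"})
second = Task(send_email, {"orderId": "2"})
```

Both tasks point at the same definition, but each carries its own id, input and status, which is roughly the relationship between a Task Definition and a Task.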

Now that we understand the basic Conductor terminology, let’s look at a high-level diagram:

There are many other components in this architecture that are not needed to understand the “bare minimum”, so we will skip them for now.

High level Conductor Process

1. An outside entity requests a Workflow to start, then Conductor schedules a Task. For now, think of the trigger as an API request or a message (e.g. via RabbitMQ) – there are many other possibilities.
2. A Worker polls for a “Scheduled Task” so that it can execute it.
3. The Worker reports its results back to Conductor, for example by marking the task as COMPLETED.
4. Profit – That’s it, the basic process 🙂
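
The cycle above can be illustrated with a toy, in-memory stand-in. This is only a sketch of the schedule / poll / report idea: ToyConductor and its method names are hypothetical and have nothing to do with how the real engine is implemented.

```python
from collections import deque


class ToyConductor:
    """In-memory stand-in for the schedule / poll / report cycle (not the real engine)."""

    def __init__(self):
        self.queues = {}    # task type -> deque of scheduled tasks
        self.results = {}   # task id -> (status, output)
        self._next_id = 0

    def start_workflow(self, first_task_type, workflow_input):
        """Step 1: an outside entity starts a workflow, so the first task gets scheduled."""
        self._next_id += 1
        task = {
            "taskId": f"task-{self._next_id}",
            "taskType": first_task_type,
            "inputData": dict(workflow_input),
        }
        self.queues.setdefault(first_task_type, deque()).append(task)
        return task["taskId"]

    def poll(self, task_type):
        """Step 2: a worker asks for the next scheduled task of a given type."""
        queue = self.queues.get(task_type)
        return queue.popleft() if queue else None

    def update(self, task_id, status, output):
        """Step 3: the worker reports back with a status and its output."""
        self.results[task_id] = (status, output)


conductor = ToyConductor()
task_id = conductor.start_workflow("fulfill_order", {"orderId": "987654321"})
task = conductor.poll("fulfill_order")
conductor.update(task["taskId"], "COMPLETED", {"ok": True})
```

The point to take away is that workers pull work by task type; the orchestrator never pushes work to them.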

Conductor provides a UI that makes it easy to interact with the tool. This is where engineers spend most of their time: developing, analyzing inputs and outputs, troubleshooting, etc.

Hands-on experience

Pre-requisites: Git & Docker Desktop installed.

Clone repo:

git clone https://github.com/Netflix/conductor.git

Inside the conductor/docker folder, run:

/d/Projects/conductor/docker (master)
$ docker-compose up

Your goal is to have the 4 Docker containers below in a running state. For now we will focus only on docker_conductor-ui and docker_conductor-server.

❗ Common issue: if you are running on Windows, docker-compose up may fail with this error:

 ...
---> b2b927953ec5
Step 5/16 : RUN ./gradlew build -x test
 ---> Running in c49714c655e0
/bin/sh: 1: ./gradlew: not found
The command '/bin/sh -c ./gradlew build -x test' returned a non-zero code: 127
...

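💡 Error summary: Git's line-ending conversion differs between Linux and Windows, so on Windows the gradlew script can be checked out with Windows (CRLF) line endings that the Linux container cannot run. The solution is to delete the repo, run the git command below and clone it again: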

git config --global core.autocrlf false

Full details are in this thread, where I participated as well 😉 => https://github.com/Netflix/conductor/issues/1814
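
If you want to confirm that line endings are the culprit before deleting anything, a small check like the one below can help. It is only a sketch: the helper names are mine, and the example path in the comment is an assumption about where the wrapper script sits in your clone.

```python
from pathlib import Path


def has_crlf(data: bytes) -> bool:
    """Return True if the raw bytes contain Windows-style (CRLF) line endings."""
    return b"\r\n" in data


def file_has_crlf(path: str) -> bool:
    """Read a file in binary mode (no newline translation) and check it for CRLF."""
    return has_crlf(Path(path).read_bytes())


# Example (path is an assumption, adjust it to your clone):
# file_has_crlf("conductor/gradlew")
```

If the function returns True for the gradlew script, the checkout was converted to CRLF and the fix above applies.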

At this point, you should be able to open two URLs via Browser:

http://localhost:5000/ – Conductor UI on Workflows executions screen

http://localhost:8080/ – Conductor APIs – Swagger with all APIs available

Feel free to explore the UI and APIs for a couple of minutes before moving on to the next step.

Creating the first Workflow

Our Workflow goal will be:

Create E-commerce customer order (Workflow definition)
1. Fulfill Customer Order (Worker Task)
  Update inventory, generate the invoice, check whether the customer is entitled to a next order discount, etc.
2. Check if customer is entitled to a next order discount (System Task)
  - If Yes => Send e-mail discount on next order (Worker Task)
  - If No => End of workflow.

Postman will be used to emulate all steps in this lab, as the Netflix Conductor UI currently lacks capabilities such as creating new Definitions. As a matter of fact, most of the UI functionality is view-only, with very few actions available, as described later in this post. Feel free to download the collection and import it into your local Postman:

Conductor-Postman Download

Creating Task definitions

POST
http://localhost:8080/api/metadata/taskdefs 
[
  {
    "name": "Slipmp_Fulfill_Customer_Order",
    "retryCount": 3,
    "retryLogic": "FIXED",
    "retryDelaySeconds": 10,
    "timeoutSeconds": 300,
    "timeoutPolicy": "TIME_OUT_WF",
    "responseTimeoutSeconds": 180,
    "ownerEmail":"[email protected]"
  },
  {
    "name": "Slipmp_Send_Email_Discount_Next_Order",
    "retryCount": 3,
    "retryLogic": "FIXED",
    "retryDelaySeconds": 10,
    "timeoutSeconds": 300,
    "timeoutPolicy": "TIME_OUT_WF",
    "responseTimeoutSeconds": 180,
    "ownerEmail":"[email protected]"
  }
]

Creating two Task Definitions via Postman

Viewing the two recently created Task Definitions on the Conductor UI

Note that System Tasks do not appear in this list, in our case “Check if customer is entitled to a next order discount”. They are created as part of the Workflow definition.
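
If you prefer a script over Postman, the same task definition request can be sketched with Python's standard library. The helper names (build_task_def, post_json) are mine, and owner@example.com is a placeholder e-mail that you should replace with your own:

```python
import json
import urllib.request

CONDUCTOR_API = "http://localhost:8080/api"  # same base URL used in this post


def build_task_def(name, owner_email):
    """Return a task definition with the same settings as the request body above."""
    return {
        "name": name,
        "retryCount": 3,
        "retryLogic": "FIXED",
        "retryDelaySeconds": 10,
        "timeoutSeconds": 300,
        "timeoutPolicy": "TIME_OUT_WF",
        "responseTimeoutSeconds": 180,
        "ownerEmail": owner_email,
    }


def post_json(path, payload):
    """POST a JSON payload to the Conductor API and return the raw response text."""
    request = urllib.request.Request(
        CONDUCTOR_API + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# "owner@example.com" is a placeholder, not the address used in the original request.
TASK_DEFS = [
    build_task_def(name, "owner@example.com")
    for name in (
        "Slipmp_Fulfill_Customer_Order",
        "Slipmp_Send_Email_Discount_Next_Order",
    )
]
```

With the Docker containers running, calling post_json("/metadata/taskdefs", TASK_DEFS) should register both definitions, just like the Postman request.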

Creating Workflow definitions

Shall we first take a look at what we are about to create? It will certainly be more didactic 🙂

Left => Visual representation of the Workflow Definition. Note how the System Task renders as a familiar IF statement. | Right => Conductor stores everything as a JSON DSL

Postman HTTP request to generate our first Workflow Definition:

POST
http://localhost:8080/api/metadata/workflow
{
    "name": "Slipmp_Create_Ecommerce_Customer_Order",
    "description": "Imagine this is an Order creation process for an E-commerce application. Updating inventory, generating invoice, check if customer is entitled for next order discount, etc.",
    "ownerEmail":"[email protected]",
    "version": 2,
    "schemaVersion": 2,
    "tasks": [
        {
            "name": "Slipmp_Fulfill_Customer_Order",
            "taskReferenceName": "t1_fulfill_order",
            "inputParameters": {
                "orderId": "${workflow.input.orderId}"
            },
            "type": "SIMPLE"
        },
        {
            "name": "Slipmp_Check_Discount_Elegibility",
            "taskReferenceName": "t2_check_discount_elegibility",
            "inputParameters": {
                "case_value_param": "${t1_fulfill_order.output.is_entitled_result}"
            },
            "type": "DECISION",
            "caseValueParam": "case_value_param",
            "decisionCases": {
                "true": [
                    {
                        "name": "Slipmp_Send_Email_Discount_Next_Order",
                        "taskReferenceName": "t3_sending_next_order_discount_email",
                        "inputParameters": {
                            "orderId": "${workflow.input.orderId}",
                            "discountAmount": "${t1_fulfill_order.output.discountAmount}"
                        },
                        "type": "SIMPLE"
                    }
                ]
            }
        }
    ]
}

Successfully created a new Workflow definition

A few points are not so intuitive and are worth discussing:

name: The name of the originating Definition, either Workflow or Task. Example: Slipmp_Fulfill_Customer_Order must match the name given in the Task Definition created earlier.
taskReferenceName: How you reference a task within your workflow, for example when retrieving a value from a previous task. Think of it as a variable name; as such, taskReferenceName must be unique within the workflow.
Workflow input: The first input of the first task in the workflow is:
"orderId": "${workflow.input.orderId}"
This value is supplied when the workflow starts.
System Task: This example uses a DECISION system task, which works like an IF statement. It reads the variable case_value_param, populated from the first Worker Task's output via "${t1_fulfill_order.output.is_entitled_result}".
Output reuse: Values can be passed from one Task to another. For example, Slipmp_Send_Email_Discount_Next_Order uses "discountAmount": "${t1_fulfill_order.output.discountAmount}", which is a result of the first Worker Task.
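
To get a feel for how these "${...}" references and the DECISION task fit together, here is a toy model in Python. It is a deliberately simplified sketch and not Conductor's implementation: it only handles whole-string dotted paths, and the function names (resolve, pick_decision_branch) are mine.

```python
import re

_EXPR = re.compile(r"^\$\{([^}]+)\}$")


def resolve(value, context):
    """Resolve a whole-string '${a.b.c}' reference by walking nested dicts."""
    match = _EXPR.match(value) if isinstance(value, str) else None
    if not match:
        return value  # literals pass through unchanged
    current = context
    for key in match.group(1).split("."):
        current = current[key]
    return current


def pick_decision_branch(decision_task, context):
    """Return the task list of the matching case, or the (empty) default branch."""
    params = {
        name: resolve(expr, context)
        for name, expr in decision_task["inputParameters"].items()
    }
    # Case keys are strings, so the boolean True is compared as "true".
    case_value = str(params[decision_task["caseValueParam"]]).lower()
    return decision_task["decisionCases"].get(
        case_value, decision_task.get("defaultCase", [])
    )


DECISION_TASK = {
    "taskReferenceName": "t2_check_discount_elegibility",
    "type": "DECISION",
    "inputParameters": {
        "case_value_param": "${t1_fulfill_order.output.is_entitled_result}"
    },
    "caseValueParam": "case_value_param",
    "decisionCases": {
        "true": [{"name": "Slipmp_Send_Email_Discount_Next_Order"}],
    },
}

# What the references would see after the first Worker Task completes.
context = {
    "workflow": {"input": {"orderId": "987654321"}},
    "t1_fulfill_order": {
        "output": {"is_entitled_result": True, "discountAmount": "25%"}
    },
}
branch = pick_decision_branch(DECISION_TASK, context)
```

Here branch contains the e-mail task because is_entitled_result is true. With false there is no matching case, so the toy returns an empty branch and nothing further would be scheduled.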

Triggering a Workflow

Conductor is now ready to orchestrate this Workflow. As stated in the high-level process, a workflow can be started from multiple sources. For didactic purposes, let’s imagine that a business process somewhere in this fictional e-commerce application has called a Conductor API to trigger the order fulfillment workflow.

{
    "name": "Slipmp_Create_Ecommerce_Customer_Order",
    "version": 2,
    "correlationId": "Slipmp_meaningful_unique_correlationId",
    "input": {
        "orderId": "987654321"
    }
}

workflowInstanceId has been auto-generated, representing this work.
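
The start request above can also be sent from a script. In this sketch I am assuming the body is POSTed to /api/workflow on the local server and that the response body is the generated workflow id; the post does not spell out the endpoint, so verify it against the Swagger page. The function names are mine.

```python
import json
import urllib.request

# Assumed endpoint: not shown in this post, please confirm it in Swagger.
START_URL = "http://localhost:8080/api/workflow"


def build_start_request(order_id, correlation_id, version=2):
    """Build the start-workflow body shown above for a given order."""
    return {
        "name": "Slipmp_Create_Ecommerce_Customer_Order",
        "version": version,
        "correlationId": correlation_id,
        "input": {"orderId": order_id},
    }


def start_workflow(start_request, url=START_URL):
    """POST the start request and return the response text (assumed to be the workflow id)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(start_request).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8").strip()


start_request = build_start_request(
    "987654321", "Slipmp_meaningful_unique_correlationId"
)
```

With the server running, start_workflow(start_request) should hand back the id that then shows up in the UI as a running workflow.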

List of running Workflows. Click on a workflow to view details.

Conductor now has 1 task to be completed and is just waiting for a willing worker to pick up the work and complete it.

In the HTTP request below, we are impersonating a Worker, which in theory is constantly polling Conductor for tasks to complete. A Worker needs to indicate the type of work it is seeking by providing the Task Definition name:

GET
http://localhost:8080/api/tasks/poll/Slipmp_Fulfill_Customer_Order

The most important fields in the response body are the Workflow inputData, the workflowInstanceId and the taskId.

Assume the impersonated Worker eventually completes the task. It must then signal to Conductor that the task is COMPLETED:
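
Here is the same poll as a Python sketch, using only the standard library. The helper names are mine, and I am assuming that an empty response body means there is currently nothing to do:

```python
import json
import urllib.request
from urllib.parse import quote

CONDUCTOR_API = "http://localhost:8080/api"  # same base URL used in this post


def poll_url(task_type):
    """Build the poll URL for a given Task Definition name."""
    return f"{CONDUCTOR_API}/tasks/poll/{quote(task_type, safe='')}"


def poll(task_type):
    """GET one scheduled task of the given type, or None if the response is empty."""
    with urllib.request.urlopen(poll_url(task_type)) as response:
        raw = response.read()
    return json.loads(raw) if raw else None


def extract_essentials(task):
    """Keep only the fields a worker needs to do the work and report back."""
    return {key: task[key] for key in ("workflowInstanceId", "taskId", "inputData")}
```

A real worker would call poll in a loop, with a short pause between attempts, and use the extracted ids when reporting the result.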

{
  "workflowInstanceId": "013ae212-f953-4b22-97f2-e494378d79e5",
  "taskId": "7f6b1079-f65d-4696-a847-9830a9e7fd6f",
  "reasonForIncompletion": "",
  "callbackAfterSeconds": 0,
  "workerId": "Marcos-Worker-2020",
  "status": "COMPLETED",
  "outputData": {
    "is_entitled_result": true,
    "discountAmount": "25%"
  }
}

Please note that taskId and workflowInstanceId must match the values returned by the poll.

As per the Workflow definition, the first task Slipmp_Fulfill_Customer_Order returned "is_entitled_result": true, instructing Conductor, through the DECISION task (IF statement), to execute Slipmp_Send_Email_Discount_Next_Order, which is now SCHEDULED:
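
Since the ids must come from the polled task, it is safer to build the result body from that task rather than typing ids by hand. A minimal sketch follows; the function name is mine, and the endpoint mentioned in the comment is my assumption, since the post only shows the body:

```python
ALLOWED_STATUSES = {"IN_PROGRESS", "COMPLETED", "FAILED", "FAILED_WITH_TERMINAL_ERROR"}


def build_task_result(polled_task, status, output_data, worker_id, reason=""):
    """Build the result body for a polled task, copying its ids verbatim."""
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"unsupported status: {status}")
    return {
        "workflowInstanceId": polled_task["workflowInstanceId"],
        "taskId": polled_task["taskId"],
        "reasonForIncompletion": reason,
        "callbackAfterSeconds": 0,
        "workerId": worker_id,
        "status": status,
        "outputData": output_data,
    }


# Assumption: the body is then POSTed to http://localhost:8080/api/tasks
polled_task = {
    "workflowInstanceId": "013ae212-f953-4b22-97f2-e494378d79e5",
    "taskId": "7f6b1079-f65d-4696-a847-9830a9e7fd6f",
}
result = build_task_result(
    polled_task,
    "COMPLETED",
    {"is_entitled_result": True, "discountAmount": "25%"},
    "Marcos-Worker-2020",
)
```

Passing "FAILED" together with a reason produces the body for the unhappy path.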

You know the drill, let’s have a “virtual fake worker” poll this task:

GET
http://localhost:8080/api/tasks/poll/Slipmp_Send_Email_Discount_Next_Order

Finally, the worker notifies Conductor that the last Task of the Workflow is completed:

{
  "workflowInstanceId": "013ae212-f953-4b22-97f2-e494378d79e5",
  "taskId": "c6015dc0-c788-4ee6-b9b5-805715ac6b81",
  "reasonForIncompletion": "",
  "callbackAfterSeconds": 0,
  "workerId": "Marcos-Worker-2020",
  "status": "COMPLETED",
  "outputData": {
    "message": "E-mail with a next order 25% discount was sent successfully to customer"
  }
}

The “message” property is an output of the last Task. It is meant for viewing and logging purposes only, as it will not be consumed now that the workflow is completed.
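
Everything we did by hand in Postman (poll, do the work, report back) is what a real worker does in a loop. Here is a minimal sketch of one iteration, with the HTTP calls injected as plain functions so the logic stays testable. The function and parameter names are mine; this is not the official Conductor client.

```python
def run_worker_once(poll, execute, update, worker_id="example-worker"):
    """Poll once, run the task, and report COMPLETED or FAILED.

    poll() returns a task dict or None, execute(input_data) returns the
    output dict, and update(result) sends the result back to Conductor.
    Returns the result that was sent, or None when there was no work.
    """
    task = poll()
    if task is None:
        return None
    result = {
        "workflowInstanceId": task["workflowInstanceId"],
        "taskId": task["taskId"],
        "workerId": worker_id,
    }
    try:
        result["outputData"] = execute(task["inputData"])
        result["status"] = "COMPLETED"
    except Exception as error:
        # Report the failure so the retry settings of the Task Definition can apply.
        result["outputData"] = {}
        result["status"] = "FAILED"
        result["reasonForIncompletion"] = str(error)
    update(result)
    return result
```

In a real worker, poll and update would wrap the HTTP requests used throughout this post, and run_worker_once would sit inside a loop with a short sleep between iterations.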

You’ve been challenged

A Happy Path has been explored, but how about testing the false branch of the DECISION System Task (the IF condition)? Exploring more System Tasks? Marking Tasks as FAILED, or letting them time out (TIMED_OUT)? etc.

Spend some time exploring the UI and Swagger capabilities and witness Conductor’s powerful perks:

What else is there, down to the bottom of the Iceberg?

Remember, what was presented in this post is just the tip of the Iceberg, as Conductor is so much more. Here are some example subjects, among others, that should be covered in future sessions:

Conductor Extensibility: Extend or modify any detail of Conductor there is, e.g. data sources, security components, instrumentation, creating your own System Tasks, etc.
Scalability: Which components are scalable, and how easy is it to scale them?
Event Handlers: Ability to trigger workflows through external events such as SQS, RabbitMQ, etc.
Deployment: How can we deploy Conductor and have a well-defined CI/CD DevOps pipeline?
Deep dive on Workers: How are workers set up? What about polling configuration and thread management? Which programming languages are available for clients (Java and Python)? Can we develop our own worker, e.g. in .NET Core (yes)? etc.
Bold idea of not using Workers: Wait, what?! => What if, instead of having Workers constantly polling for tasks, we leverage the HTTP System Task so that Conductor calls internal APIs itself? Each step could then be an HTTP call made to one of several internal systems. That could be a massive Software Architecture advantage.
Production Engineering: What are the best ways to deploy Conductor and its components? VMs vs Cloud vs Docker? Load balancing? etc.
Software Development Community support: Last but not least, an adoption decision factor => What is the community support like? Is it active? Is it worth investing in? Other than Netflix, which known companies are using it?
Endless inquiries…
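
As a teaser for the “no Workers” idea, here is roughly what a workflow step could look like as an HTTP System Task instead of a SIMPLE worker task. Treat it as a sketch to check against the Conductor documentation: the helper name, the task reference name and the internal URL are all hypothetical.

```python
def http_task(reference_name, uri, method="POST", body=None):
    """Build a workflow task entry of type HTTP (a System Task, so no worker polls it)."""
    http_request = {"uri": uri, "method": method}
    if body is not None:
        http_request["body"] = body
    return {
        "name": reference_name,
        "taskReferenceName": reference_name,
        "type": "HTTP",
        "inputParameters": {"http_request": http_request},
    }


send_email_step = http_task(
    "t3_send_discount_email_http",              # hypothetical reference name
    "http://email-service.internal/discounts",  # hypothetical internal API
    body={
        "orderId": "${workflow.input.orderId}",
        "discountAmount": "${t1_fulfill_order.output.discountAmount}",
    },
)
```

Such an entry would take the place of the SIMPLE task inside decisionCases, and Conductor itself would make the call once the step is reached.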

Take action, enrich yourself and keep learning about this marvelous tool. Please share? 🙂

References

Netflix OSS (Open Source Software)
https://netflix.github.io/
Netflix Conductor documentation
https://netflix.github.io/conductor/
Netflix Conductor Github
https://github.com/Netflix/conductor

4 Comments


abhishek mandal
November 18, 2022 at 3:23 am


Hey, you did not define workers then how are you generating the output of a task.

- Marcos Soares
  November 20, 2022 at 5:14 pm
  
  
  Hi Abhishek, thank you for sending this message!
  
  For this exercise, I am using Postman to impersonate a Worker. See the “Triggering a Workflow” section, where a Postman request polls Conductor for the task and a second one reports its output.
  
carlos de la penha
March 5, 2023 at 12:10 pm


the son of a bitch who invented this conductor needs to review the performance part for complex flows, it’s horrible and I stopped using the conductor because of that

- Marcos Soares
  March 6, 2023 at 12:44 pm
  
  
  This is sad to hear, Carlos. Netflix Conductor is a well-maintained project, with new features being released constantly.
  When I used it, there were no performance problems, although to be fair the throughput was not that high.
  The main advantage I see is visualizing a workflow diagram. I would use Conductor when each task may take whatever time it needs. If performance is crucial, I would try to leverage another tool, like Kafka, distributed applications, etc.
  
  Maybe create an issue on Netflix Conductor (https://github.com/Netflix/conductor/issues) with details about your benchmark, so they can provide feedback?

