Advanced System Architecture for n8n Hosting: Engineering Considerations
Introduction
Deploying large-scale automation workloads introduces a range of engineering challenges that extend beyond application logic and workflow design. When orchestrating enterprise-grade processes with n8n, infrastructure decisions impact throughput, fault tolerance, observability, and compliance. Achieving predictable performance at scale requires a holistic understanding of distributed systems design, resource orchestration, and workload isolation.
This article provides a deep technical examination of what it takes to implement robust n8n hosting in production environments, focusing on concurrency models, state persistence, cluster coordination, and runtime observability.
Execution Model and Event Loop Mechanics
At its core, n8n is a Node.js application driven by an asynchronous event loop. Execution of workflows leverages non-blocking I/O, callback queues, and Promise chains. At high concurrency, the event loop becomes a bottleneck if not backed by proper resource allocation.
Concurrency in n8n workflows must be managed with attention to:
Event loop saturation
Microtask queue backpressure
Threadpool utilization (libuv default of 4 threads)
Worker lifecycles for parallel execution
In expert deployments, n8n instances are monitored for event loop latency (e.g., using histogram timers or low-latency monitoring hooks), ensuring that asynchronous operations do not starve the loop and cause unpredictable backpressure.
With n8n hosting, a common practice is to decouple workflow triggers from execution processes using dedicated worker services or a worker pool that scales independently of the main event listener. This prevents near-synchronous workloads — such as webhook floods — from overwhelming the scheduler.
Process Isolation and Containerization
Given the single-threaded nature of Node.js, horizontal scaling at the process level is mandatory for high throughput. Experts diverge from monolithic n8n processes and adopt one of the following:
Cluster mode: Multiple Node.js worker processes under a process manager
Process per workflow type: Isolated containers for CPU-intensive or long-running flows
Worker pools with message brokers: Using dedicated queues (e.g., Redis) to distribute executions
In a containerized orchestration platform (e.g., Kubernetes), n8n hosting should consider:
Pod anti-affinity to reduce noisy neighbor effects
CPU pinning for predictable compute slices
Network policy enforcement at CNI layer
Node-level taints to isolate automation traffic
Failure to implement robust process isolation can lead to cascading failures when a single workflow type monopolizes resources.
Distributed Workflow Execution and Queuing
High-velocity workloads demand a decoupled architecture where event ingestion — such as webhooks or cron triggers — is separated from execution engines. Utilizing message brokers with queue semantics enables:
Reliable retries
Backpressure management
Prioritized execution
Graceful throttling
In expertly configured n8n hosting systems, queues such as Redis Streams, RabbitMQ, or Kafka are chosen based on throughput and semantics. Redis Streams works well for smaller clusters due to its in-memory performance, while Kafka is preferred for persistent, high-throughput ecosystems.
Execution workers poll queues and use distributed locks or partition assignments to prevent duplicate execution across nodes. This approach mitigates race conditions and ensures idempotent behavior in distributed states.
State Persistence and Database Optimization
n8n persists workflow metadata, credentials, execution logs, and retry states in a relational database. SQLite is inadequate beyond minimal experimentation; production systems require:
ACID-compliant engines like PostgreSQL or MariaDB
Connection pooling (PgBouncer)
Schema versioning
Index optimization on execution history tables
Database performance directly influences workflow latency — inefficient join paths or missing indices cause exponential slowdowns under load. For expert deployments, they implement:
Partitioned execution logs
Normalized credential vaults
Sharded tables for high ingest rates
Connection pool sizing tailored to worker concurrency
With n8n hosting, database tuning is a continuous activity, as schema expansion and high cardinality execution logs can induce lock contention if not carefully managed.
Observability and Performance Telemetry
True observability in automation infrastructure requires metrics at every layer — not just application logs. Observability stacks integrate:
Event loop latency histograms
CPU and memory profiles per container
Distributed tracing (OpenTelemetry)
Queue lag metrics
Database slow query logs
Expert implementers adopt telemetry aggregation backends like Prometheus and Grafana or datastores like ClickHouse for long-term trend analysis. Alerts are tied to thresholds that indicate:
Backpressure buildup
Memory exhaustion
High queue residency
Event loop stalls
For n8n hosting, correlation between traces (workflow step timing) and infrastructure metrics (CPU steal, memory saturation) is indispensable for diagnosing complex failure modes.
Security Hardening and Credential Vaulting
Automation workflows often interact with sensitive infrastructure and third-party APIs. Security policies must ensure that secrets never exist in plain text:
Environment variables are scoped and encrypted
Credential storage is backed by HSM or KMS
Role-based access control at Kubernetes or VM level
Network policies prevent lateral movement
When configuring n8n hosting, experts integrate secret management frameworks such as Vault, AWS KMS, or GCP Secret Manager. These systems ensure that credential encryption adheres to compliance standards and reduces blast radius in the event of a breach.
High Availability and Fault Domains
Distributed workflows require resilient infrastructure. High availability is typically achieved through:
Stateful sets with persistent volumes
Multi-AZ deployments
Automatic failover for database replicas
Circuit breakers for external dependencies
Unlike stateless services, n8n workflows that interact with long-running external systems must be designed with idempotency, retries, and checkpointing in mind. Without this, partial failures induce inconsistent execution states.
With n8n hosting, architectures often include active-active clusters with health probes and immutable rollout strategies to minimize downtime.
Conclusion
Implementing production-grade n8n infrastructure is a multidisciplinary challenge that spans event loop behavior, process isolation, distributed queuing, database tuning, observability, and security hardening. Simply deploying workflows on shared infrastructure is insufficient when performance, reliability, and compliance are required.
Expert architects approaching n8n hosting must treat it as a distributed system with stateful execution paths, real-time performance constraints, and complex failure modes. Only by addressing these areas through engineering discipline can automation scale with robustness and efficiency in mission-critical environments.
Strangulating bare-metal infrastructure to Containers
Change is inevitable. Change for the better is a full-time job ~ Adlai Stevenson I
We run a successful digital platform for one of our clients. It manages huge amounts of data aggregation and analysis in Out of Home advertising domain.
The platform had been running successfully for a while. Our original implementation was focused on time to market. As it expanded across geographies and impact, we decided to shift our infrastructure to containers for reasons outlined later in the post. Our day to day operations and release cadence needed to remain unaffected during this migration. To ensure those goals, we chose an approach of incremental strangulation to make the shift.
Strangler pattern is an established pattern that has been used in the software industry at various levels of abstraction. Documented by Microsoft and talked about by Martin Fowler are just two examples. The basic premise is to build an incremental replacement for an existing system or sub-system. The approach often involves creating a Strangler Facade that abstracts both existing and new implementations consistently. As features are re-implemented with improvements behind the facade, the traffic or calls are incrementally routed via new implementation. This approach is taken until all the traffic/calls go only via new implementation and old implementation can be deprecated. We applied the same approach to gradually rebuild the infrastructure in a fundamentally different way. Because of the approach taken our production disruption was under a few minutes.
This writeup will explore some of the scaffolding we did to enable the transition and the approach leading to a quick switch over with confidence. We will also talk about tech stack from an infrastructure point of view and the shift that we brought in. We believe the approach is generic enough to be applied across a wide array of deployments.
The as-is
###Infrastructure
We rely on Amazon Web Service to do the heavy lifting for infrastructure. At the same time, we try to stay away from cloud-provider lock-in by using components that are open source or can be hosted independently if needed. Our infrastructure consisted of services in double digits, at least 3 different data stores, messaging queues, an elaborate centralized logging setup (Elastic-search, Logstash and Kibana) as well as monitoring cluster with (Grafana and Prometheus). The provisioning and deployments were automated with Ansible. A combination of queues and load balancers provided us with the capability to scale services. Databases were configured with replica sets with automated failovers. The service deployment topology across servers was pre-determined and configured manually in Ansible config. Auto-scaling was not built into the design because our traffic and user-base are pretty stable and we have reasonable forewarning for a capacity change. All machines were bare-metal machines and multiple services co-existed on each machine. All servers were organized across various VPCs and subnets for security fencing and were accessible only via bastion instance.
###Release cadence
Delivering code to production early and frequently is core to the way we work. All the code added within a sprint is released to production at the end. Some features can span across sprints. The feature toggle service allows features to be enabled/disable in various environments. We are a fairly large team divided into small cohesive streams. To manage release cadence across all streams, we trigger an auto-release to our UAT environment at a fixed schedule at the end of the sprint. The point-in-time snapshot of the git master is released. We do a subsequent automated deploy to production that is triggered manually.
CI and release pipelines
Code and release pipelines are managed in Gitlab. Each service has GitLab pipelines to test, build, package and deploy. Before the infrastructure migration, the deployment folder was co-located with source code to tag/version deployment and code together. The deploy pipelines in GitLab triggered Ansible deployment that deployed binary to various environments.
Figure 1 — The as-is release process with Ansible + BareMetal combination
The gaps
While we had a very stable infrastructure and matured deployment process, we had aspirations which required some changes to the existing infrastructure. This section will outline some of the gaps and aspirations.
Cost of adding a new service
Adding a new service meant that we needed to replicate and setup deployment scripts for the service. We also needed to plan deployment topology. This planning required taking into account the existing machine loads, resource requirements as well as the resource needs of the new service. When required new hardware was provisioned. Even with that, we couldn’t dynamically optimize infrastructure use. All of this required precious time to be spent planning the deployment structure and changes to the configuration.
Lack of service isolation
Multiple services ran on each box without any isolation or sandboxing. A bug in service could fill up the disk with logs and have a cascading effect on other services. We addressed these issues with automated checks both at package time and runtime however our services were always susceptible to noisy neighbour issue without service sandboxing.
Multi-AZ deployments
High availability setup required meticulous planning. While we had a multi-node deployment for each component, we did not have a safeguard against an availability zone failure. Planning for an availability zone required leveraging Amazon Web Service’s constructs which would have locked us in deeper into the AWS infrastructure. We wanted to address this without a significant lock-in.
Lack of artefact promotion
Our release process was centred around branches, not artefacts. Every auto-release created a branch called RELEASE that was promoted across environments. Artefacts were rebuilt on the branch. This isn’t ideal as a change in an external dependency within the same version can cause a failure in a rare scenario. Artefact versioning and promotion are more ideal in our opinion. There is higher confidence attached to releasing a tested binary.
Need for a low-cost spin-up of environment
As we expanded into more geographical regions rapidly, spinning up full-fledged environments quickly became crucial. In addition to that without infrastructure optimization, the cost continued to mount up, leaving a lot of room for optimization. If we could re-use the underlying hardware across environments, we could reduce operational costs.
Provisioning cost at deployment time
Any significant changes to the underlying machine were made during deployment time. This effectively meant that we paid the cost of provisioning during deployments. This led to longer deployment downtime in some cases.
Considering containers & Kubernetes
It was possible to address most of the existing gaps in the infrastructure with additional changes. For instance, Route53 would have allowed us to set up services for high availability across AZs, extending Ansible would have enabled multi-AZ support and changing build pipelines and scripts could have brought in artefact promotion.
However, containers, specifically Kubernetes solved a lot of those issues either out of the box or with small effort. Using KOps also allowed us to remained cloud-agnostic for a large part. We decided that moving to containers will provide the much-needed service isolation as well as other benefits including lower cost of operation with higher availability.
Since containers differ significantly in how they are packaged and deployed. We needed an approach that had a minimum or zero impact to the day to day operations and ongoing production releases. This required some thinking and planning. Rest of the post covers an overview of our thinking, approach and the results.
The infrastructure strangulation
A big change like this warrants experimentation and confidence that it will meet all our needs with reasonable trade-offs. So we decided to adopt the process incrementally. The strangulation approach was a great fit for an incremental rollout. It helped in assessing all the aspects early on. It also gave us enough time to get everyone on the team up to speed. Having a good operating knowledge of deployment and infrastructure concerns across the team is crucial for us. The whole team collectively owns the production, deployments and infrastructure setup. We rotate on responsibilities and production support.
Our plan was a multi-step process. Each step was designed to give us more confidence and incremental improvement without disrupting the existing deployment and release process. We also prioritized the most uncertain areas first to ensure that we address the biggest issues at the start itself.
We chose Helm as the Kubernetes package manager to help us with the deployments and image management. The images were stored and scanned in AWS ECR.
The first service
We picked the most complicated service as the first candidate for migration. A change was required to augment the packaging step. In addition to the existing binary file, we added a step to generate a docker image as well. Once the service was packaged and ready to be deployed, we provisioned the underlying Kubernetes infrastructure to deploy our containers. We could deploy only one service at this point but that was ok to prove the correctness of the approach. We updated GitLab pipelines to enable dual deploy. Upon code check-in, the binary would get deployed to existing test environments as well as to new Kubernetes setup.
Some of the things we gained out of these steps were the confidence of reliably converting our services into Docker images and the fact that dual deploy could work automatically without any disruption to existing work.
Migrating logging & monitoring
The second step was to prove that our logging and monitoring stack could continue to work with containers. To address this, we provisioned new servers for both logging and monitoring. We also evaluated Loki to see if we could converge tooling for logging and monitoring. However, due to various gaps in Loki given our need, we stayed with ElasticSearch stack. We did replace logstash and filebeat with Fluentd. This helped us address some of the issues that we had seen with filebeat our old infrastructure. Monitoring had new dashboards for the Kubernetes setup as we now cared about both pods as well in addition to host machine health.
At the end of the step, we had a functioning logging and monitoring stack which could show data for a single Kubernetes service container as well across logical service/component. It made us confident about the observability of our infrastructure. We kept new and old logging & monitoring infrastructure separate to keep the migration overhead out of the picture. Our approach was to keep both of them alive in parallel until the end of the data retention period.
Addressing stateful components
One of the key ingredients for strangulation was to make any changes to stateful components post initial migration. This way, both the new and old infrastructure can point to the same data stores and reflect/update data state uniformly.
So as part of this step, we configured newly deployed service to point to existing data stores and ensure that all read/writes worked seamlessly and reflected on both infrastructures.
Deployment repository and pipeline replication
With one service and support system ready, we extracted out a generic way to build images with docker files and deployment to new infrastructure. These steps could be used to add dual-deployment to all services. We also changed our deployment approach. In a new setup, the deployment code lived in a separate repository where each environment and region was represented by a branch example uk-qa,uk-prod or in-qa etc. These branches carried the variables for the region + environment. In addition to that, we provisioned a Hashicorp Vault to manage secrets and introduced structure to retrieve them by region + environment combination. We introduced namespaces to accommodate multiple environments over the same underlying hardware.
Crowd-sourced migration of services
Once we had basic building blocks ready, the next big step was to convert all our remaining services to have a dual deployment step for new infrastructure. This was an opportunity to familiarize the team with new infrastructure. So we organized a session where people paired up to migrate one service per pair. This introduced everyone to docker files, new deployment pipelines and infrastructure setup.
Because the process was jointly driven by the whole team, we migrated all the services to have dual deployment path in a couple of days. At the end of the process, we had all services ready to be deployed across two environments concurrently.
Test environment migration
At this point, we did a shift and updated the Nameservers with updated DNS for our QA and UAT environments. The existing domain started pointing to Kubernetes setup. Once the setup was stable, we decommissioned the old infrastructure. We also removed old GitLab pipelines. Forcing only Kubernetes setup for all test environments forced us to address the issues promptly.
In a couple of days, we were running all our test environments across Kubernetes. Each team member stepped up to address the fault lines that surfaced. Running this only on test environments for a couple of sprints gave us enough feedback and confidence in our ability to understand and handle issues.
Establishing dual deployment cadence
While we were running Kubernetes on the test environment, the production was still on old infrastructure and dual deployments were working as expected. We continued to release to production in the old style.
We would generate images that could be deployed to production but they were not deployed and merely archived.
Figure 2 — Using Dual deployment to toggle deployment path to new infrastructure
As the test environment ran on Kubernetes and got stabilized, we used the time to establish dual deployment cadence across all non-prod environments.
Troubleshooting and strengthening
Before migrating to the production we spent time addressing and assessing a few things.
We updated the liveness and readiness probes for various services with the right values to ensure that long-running DB migrations don’t cause container shutdown/respawn. We eventually pulled out migrations into separate containers which could run as a job in Kubernetes rather than as a service.
We spent time establishing the right container sizing. This was driven by data from our old monitoring dashboards and the resource peaks from the past gave us a good idea of the ceiling in terms of the baseline of resources needed. We planned enough headroom considering the roll out updates for services.
We setup ECR scanning to ensure that we get notified about any vulnerabilities in our images in time so that we can address them promptly.
We ran security scans to ensure that the new infrastructure is not vulnerable to attacks that we might have overlooked.
We addressed a few performance and application issues. Particularly for batch processes, which were split across servers running the same component. This wasn’t possible in Kubernetes setup, as each instance of a service container feeds off the same central config. So we generated multiple images that were responsible for part of batch jobs and they were identified and deployed as separate containers.
Upgrading production passively
Finally, with all the testing we were confident about rolling out Kubernetes setup to the production environment. We provisioned all the underlying infrastructure across multiple availability zones and deployed services to them. The infrastructure ran in parallel and connected to all the production data stores but it did not have a public domain configured to access it. Days before going live the TTL for our DNS records was reduced to a few minutes. Next 72 hours gave us enough time to refresh this across all DNS servers.
Meanwhile, we tested and ensured that things worked as expected using an alternate hostname. Once everything was ready, we were ready for DNS switchover without any user disruption or impact.
DNS record update
The go-live switch-over involved updating the nameservers’ DNS record to point to the API gateway fronting Kubernetes infrastructure. An alternate domain name continued to point to the old infrastructure to preserve access. It remained on standby for two weeks to provide a fallback option. However, with all the testing and setup, the switch over went smooth. Eventually, the old infrastructure was decommissioned and old GitLab pipelines deleted.
Figure 3 — DNS record update to toggle from legacy infrastructure to containerized setup
We kept old logs and monitoring data stores until the end of the retention period to be able to query them in case of a need. Post-go-live the new monitoring and logging stack continued to provide needed support capabilities and visibility.
Observations and results
Post-migration, time to create environments has reduced drastically and we can reuse the underlying hardware more optimally. Our production runs all services in HA mode without an increase in the cost. We are set up across multiple availability zones. Our data stores are replicated across AZs as well although they are managed outside the Kubernetes setup. Kubernetes had a learning curve and it required a few significant architectural changes, however, because we planned for an incremental rollout with coexistence in mind, we could take our time to change, test and build confidence across the team. While it may be a bit early to conclude, the transition has been seamless and benefits are evident.
Quick Links: GitHub | Documentation A few weeks ago we open sourced Faust, a Python stream processing library that we built at Robinhood to make it extremely easy to build and deploy traditionally complex streaming architectures. As Robinhood has grown and we have added more and more functionality to our product, our infrastructure has also evolved. We have added numerous internal services and technologies to help us solve different problems. This has resulted in a typical application often needing to interact with one or many different services. Typical streaming frameworks such as Spark require external dependencies to be packaged with the app in specific ways, and submitted into the Yarn/Mesos cluster that is running the application. This is usually a detour from how Python applications typically manage dependencies — virtualenv and pip. We built Faust as a library to allow for it to be used with any existing tools you may be using. Simply install Faust, and use it to develop Python applications as you typically would. We use Python Asyncio to achieve high performance I/O. In this blog post we will walk through some examples of how we use Faust to interact with various different services using off-the-shelf libraries. Faust + Redis Redis has established itself as an in-memory data store of choice owing to its data structures, amazing query speeds and simplicity. We use Redis on Robinhood’s Data team across a variety of use cases. Following is an example, showing how we use Redis to cache messages on the Robinhood Feed. We can install aredis and Faust using pip:pip install aredis
pip install faust Upon installing the dependencies, let’s first define our Faust application, Kafka topic and models:import datetime
import faustclass Activity(faust.Record, isodates=True): user: str message: str timestamp: datetime.datetimeapp = faust.App("redis_example", broker="kafka://localhost:9092")
activities_topic = app.topic("feed_activities", value_type=Activity) We can now create an agent which reads feed activities coming in through this topic, and adds the messages to the user’s Redis sorted set as follows:import [email protected](activities_topic)
async def save_activities(activities): async for activity in activities: client = aredis.StrictRedis(host="localhost", port=6379) await client.zadd(activity.user, activity.timestamp, activity.message) As shown above, we use Redis as you would use it in any app. Faust doesn’t require any special drivers or modes for using Redis. All it needs is a Redis library that’s compatible with Python Asyncio. Faust + HTTP We often use streaming apps that need to talk to other services over HTTP. Below is an example of how we use the Python aiohttp library from a Faust streaming app for one of our use cases at Robinhood. First, let us install the Python library we will use for HTTP requests:pip install aiohttp We skip the app and model definition which is similar to the previous, and straightaway look at how we would design our agent. We create an agent which processes orders and uses a third part HTTP API to send order confirmation emails to our customers:import aiohttpasync def send_confirmation(order): url = f"https://emailer.robinhood.com/" data = { "user": order.user_id, "subject": "Order Confirmation", "body" f"Order: {order.quantity} shares of {order.symbol}", } async with aiohttp.ClientSession() as session: await session.post(url, json=data)@app.agent(orders_topic)
async def add_symbol(orders): async for order in orders: await send_confirmation(order) A lot of our internal services export REST APIs. The ability to easily integrate aiohttp with Faust apps allows us to break down a variety of our backend systems into simple and isolated streaming apps. Faust + InfluxDB Robinhood operates on large amounts of time series data such as tick by tick price data for each stock symbol. We use InfluxDB to store some of these time series. Below is an example of how we query InfluxDB from a Faust application. Again, as before, let us install the Python library we will use to query InfluxDB:pip install aioinflux We now create an agent which looks at the orders topic from above and looks at the time series in InfluxDB for the particular stock to get the price at which the order executed was the price in the market at the time. We do this to ensure that we are giving the best quality of executions to our customers.import [email protected](orders_topic)
async def add_symbol(orders): async for order in orders: client = aioinflux.InfluxDBClient() query = f"SELECT price FROM marketdata WHERE symbol = {order.symbol} AND timestamp
Microservices is an architecture paradigm. In this architectural style, small and independent components work together as a system. Despite its higher operational complexity, the paradigm has seen a rapid adoption. It is because it helps break down a complex system into manageable services. The services embrace micro-level concerns like single responsibility, separation of concerns, modularity, etc.
Patterns for microservices is a series of blogs. Each blog will focus on an architectural pattern of microservices. It will reason about the possibilities and outline situations where they are applicable. All that while keeping in mind various system design constraints that tug at each other.
Inter-service communication and execution flow is a foundational decision for a distributed system. It can be synchronous or asynchronous in nature. Both the approaches have their trade-offs and strengths. This blog attempts to dissect various choices in detail and understand their implications.
Dimensions
Each implementation style has trade-offs. At the same time, there can be various dimensions to a system under consideration. Evaluating trade-offs against these constraints can help us reason about approaches and applicability. There are various dimensions of a system that impact the execution flow and the communication style of a system. Let’s look at some of them.
Consumers
Consumers of a system can be external programs, web/mobile interfaces, IoT devices etc. Consumer applications often deal with the server synchronously and expect the interface to support that. It is also desirable to mask the complexity of a distributed system with a unified interface for consumers. So it is imperative that our communication style allows us to facilitate it.
Workflow management
With many participating services, the management of a business-workflow is crucial. It can be implicit and can happen at each service and therefore remain distributed across services. Alternatively, it can be explicit. An orchestrator service can own up the responsibility for orchestrating the business-flows. The orchestration is a combination of two things. A workflow specification, that lays out the sequence of execution and the actual calls to the services. The latter is tightly bound to the communication paradigm that the participating services follow. Communication style and execution flow drive the implementation of an orchestrator.
A third option is an event-choreography based design. This substitutes an orchestrator via an event bus that each service binds to.
All these are mechanisms to manage a workflow in a system. We will cover workflow management in detail, later in this series. However, we will consider constraints associated with them in the current context as we evaluate and select a communication paradigm.
Read/Write frequency bias
Read/Write frequency of the system can be a crucial factor in its architecture. A read-heavy system expects a majority of operations to complete synchronously. A good example would be a public API for a weather forecast service that operates at scale. Alternatively, a write-heavy system benefits from asynchronous execution. An example would be a platform where numerous IoT devices are constantly reporting data. And of course, there are systems in between. Sometimes it is useful to favor a style because of the read-write skew. At other times, it may make sense to split reads and writes into separate components.
As we look through various approaches we need to keep these constraints in perspective. These dimensions will help us distill the applicability of each style of implementation.
Synchronous
Synchronous communication is a style of communication where the caller waits until a response is available. It is a prominent and widely used approach. Its conceptual simplicity allows for a straightforward implementation making it a good fit for most of the situations.
Synchronous communication is closely associated with HTTP protocol. However, other protocols remain an equally reasonable way to implement synchronous communication. A good example of an alternative is RPC calls. Each component exposes a synchronous interface that other services call.
An interceptor near the entry point intercepts the business flow request. It then pushes the request to downstream services. All the subsequent calls are synchronous in nature. These calls can be parallel or sequential until processing is complete. Handling of the calls within the system can vary in style. An orchestrator can explicitly orchestrate all the calls. Or calls can percolate organically across components. Let’s look at few possible mechanisms.
Variations
Within synchronous systems, there are several approaches that an architecture can take. Here is a quick rundown of the possibilities.
De-centralized and synchronous
A de-centralized and synchronous communication style intercepts a flow at the entry point. The interceptor forwards the request to the next step and awaits a response. This cycle continues downstream until all services have completed their execution. Each service can execute one or more downstream service sequentially or in parallel. While the implementation is straightforward, the flow details remain distributed in the system. This results in coupling between components to execute a flow.
The calls remain synchronous throughout the system. Thus, the communication style can fulfill the expectations of a synchronous consumer. Because of distributed workflow nature, the approach doesn’t allow room for flexibility. It is not well suited for a complex workflow that is susceptible to change. Since each request to the system can block services simultaneously, it is not ideal for a system with high read/write frequency.
Orchestrated, synchronous and sequential
A variation of a synchronous communication is with a central orchestrator. The orchestrator remains the intercepting service. It processes the incoming request with workflow definition and forwards it to downstream services. Each service, in turn, responds back to the orchestrator. Until the processing of a request, orchestrator keeps making calls to services.
Among the constraints listed at the beginning, the workflow management is more flexible in this approach. The workflow changes remain local to orchestrator and allow for flexibility. Since the communication is synchronous, synchronous consumers can communicate without a mediating component. However, orchestrator continues to hold all active requests. This burdens orchestrator more than other services. It is also susceptible to being a single point of failure. This style of architecture is still suitable for a read-heavy system.
Orchestrated, synchronous and parallel
A small improvement on the previous approach is to make independent requests parallel. This leads to higher efficiency and performance. Since this responsibility falls within the realms of orchestration, it is easy to do. Workflow management is already centralized. It only requires changes in the declaration to distinguish between parallel and sequential calls.
This can allow for faster execution of a flow. With shorter response times, orchestrator can have a higher throughput.
Workflow management is more complex than the previous approach. It still might be a reasonable trade off since it improves both throughput and performance. All that, while keeping the communication synchronous for consumers. Due to its synchronous nature, the system is still better for a read-heavy architecture.
Trade offs
Although, synchronous calls are simpler to grasp, debug and implement, there are certain trade-offs which are worth acknowledging in a distributed setup.
Balanced capacity
It requires a deliberate balancing of the capacity for all the services. A temporary burst at one component can flood other services with requests. In asynchronous style, queues can mitigate temporary bursts. Synchronous communication lacks this mediation and requires service capacity to match up during bursts. Failing this, a cascading failure is possible. Alternatively, resilience paradigms like circuit breakers can help mitigate a traffic burst in a synchronous system.
Risk of cascading failures
Synchronous communication leaves upstream services susceptible to cascading failure in a microservices architecture. If downstream service fail or worst yet, take too long to respond back, the resources can deplete quickly. This can cause a domino effect for the system. A possible mitigation strategy can involve consistent error handling, sensible timeouts for connections and enforcing SLAs. In a synchronous environment, the impact of a deteriorating service ripple through other services immediately. As mentioned previously, prevention of cascading errors can happen by implementing a bulkhead architecture or with circuit breakers.
Increased load balancing & service discovery overhead
The redundancy and availability needs for a participating service can be addressed by setting them up behind a load balancer. This adds a level of indirection per service. Additionally, each service needs to participate in a central service discovery setup. This allows it to push its own address and resolve the address of the downstream services.
Coupling
A synchronous system can exhibit much tighter coupling over a period of time. Without abstractions in between, services bind directly to the contracts of the other services. This develops a strong coupling over a period of time. For simple changes in the contract, the owning service is forced to adopt versioning early on. Thereby increasing the system complexity. Or it trickles down a change to all consumer services which are coupled to the contract.
With emerging architectural paradigms like service mesh, it is possible to address some of the stated issues. Tools like Istio, Linkerd, Envoy etc. allow for a service mesh creation. This space is maturing and remains promising. It can help build systems that are synchronous, more decoupled and fault tolerant.
Asynchronous
Asynchronous communication is well suited for a distributed architecture. It removes the need to wait for a response thereby decoupling the execution of two or more services. Implementation of asynchronous communication is possible with several variations. Direct calls to a remote service over RPC (for instance grpc) or via a mediating message bus are few examples. Both orchestrated message passing and event choreography use message bus as a channel.
One of the advantages of a central message bus is consistent communication and message delivery semantics. This can be a huge benefit over direct asynchronous communication between services. It is common to use a medium like a message bus that facilitates communication consistently across services. The variations of asynchronous communications discussed below will assume a central message pipeline.
Variations
The asynchronous communication deals better with sporadic bursts of traffic. Each service in the architecture either produces messages, consumes messages or does both. Let’s look at different structural styles of this paradigm.
Choreographed asynchronous events
In this approach, each component listens to a central message bus and awaits an event. The arrival of an event is a signal for execution. Any context needed by execution is part of the event payload. Triggering of downstream events is a responsibility that each service owns. One of the goals in event-based architecture is to decouple the components. Unfortunately, the design needs to be responsible to cater to this need.
A notification component may expect an event to trigger an email or SMS. It may seem pretty decoupled since all that the other services need to do is produce the event. However, someone does need to own the responsibility of deciding type of notification and content. Either notification can make that decision based on an incoming event info. If that happens then we have established a coupling between notifications and upstream services. If upstream services include this as part of the payload, then they remain aware of flows downstream.
Even so, event choreography is a good fit for implicit actions that need to happen. Error handling, notifications, search-indexing etc. It follows a decentralized workflow management. The architecture scales well for a write-heavy system. The downside is that synchronous reads need mediation and workflow is spread through the system.
Orchestrated, asynchronous and sequential
We can borrow a little from our approach in orchestrated synchronous communication. We can build an asynchronous communication with orchestrator at the center.
Each service is a producer and consumer to the central message bus. Responsibilities of orchestrator involve routing messages to their corresponding services. Each component consumes an incoming event or message and produces the response back on the message queue. Orchestrator consumes this response and does transformation before routing ahead to next step. This cycle continues until the specified workflow has reached its last state in the system.
In this style, the workflow management is local to the orchestrator. The system fares well with write-heavy traffic. And mediation is necessary for synchronous consumers. This is something that is prevalent in all asynchronous variations.
The solution to choreography coupling problem is more elegant in the orchestrated system. The workflow is with orchestrator in this case. A rich workflow specification can capture information like notification type and content template. Any changes to workflow remain with orchestrator service.
Hybrid with orchestration and event choreography
Another successful variation is hybrid systems with orchestration and event choreography both. The orchestration is excellent for explicit flow execution, while choreography can handle implicit execution. Execution of leaf nodes in a workflow can be implicit. Workflow specification can facilitate emanation of events at specific steps. This can result in the execution of tasks like notifications, indexing, et-cetera. The orchestration can continue to drive explicit execution.
This amalgamation of two approaches provides best of both worlds. Although, there is a need for precaution to ensure they don’t overlap responsibilities and clear boundaries dictate their functioning.
Overview
Asynchronous style of architecture addresses some of the pitfalls that synchronous systems have. An asynchronous set-up fares better with temporary bursts of requests. Central queues allow services to catch up with a reasonable backlog of requests. This is useful both when a lot of requests come in a short span of time or when a service goes down momentarily.
Each service connects to a message queue as a consumer or producer. Only the message queue requires service discovery. So the need for a central service discovery solution is less pressing. Additionally, since multiple instances of a service are connected to a queue, external load balancing is not required. This prevents another level of indirection that otherwise, load balancer introduces. It also allows services to linear scale seamlessly.
Trade Offs
Service flows that are asynchronous in nature can be hard to follow through the system. There are some trade-offs that a system adopting asynchronous communication will make. Let’s look at some of them.
Higher system complexity
Asynchronous systems tend to be significantly more complex than synchronous ones. However, the complexity of system and demands of performance and scale are justified for the overhead. Once adopted both orchestrator and individual components need to embrace the asynchronous execution.
Reads/Queries require mediation
Unless handled specifically synchronous consumers are most affected by an asynchronous architecture. Either the consumers need to adapt to work with an asynchronous system, or the system should present a synchronous interface for the consumers. Asynchronous architecture is a natural fit for the write-heavy system. However, it needs mediation for synchronous reads/queries. There are several ways to manage this need. Each one has certain complexity associated with it.
Sync wrapper
Simplest of all approaches is building a sync wrapper over an async system. This is an entry point that can invoke asynchronous flows downstream. At the same time, it holds the request awaiting until the response returns or a timeout occurs. A synchronous wrapper is a stateful component. An incoming request ties itself to the server it lands on. The response from downstream services needs to arrive at the server where original request is waiting. This isn’t ideal for a distributed system, especially one that operates at scale. However, it is simple to write and easy to manage. For a system with reasonable scaling and performance needs it can fit the bill. A sync wrapper should be a consideration before a more drastic restructuring.
CQRS
CQRS is an architectural style that separates reads from writes. CQRS brings a significant amount of risk and complexity to a system. It is a good fit for systems that operate at scale and requite heavy reads and writes. In CQRS architecture, data from write database streams to a read database. Queries run on a read-optimized database. Read/Write layers are separate and the system remains eventually consistent. Optimization of both the layers is independent. A system like this is far more complex in structure but it scales better. Moreover, the components can remain stateless (unlike sync wrappers).
Dual support
There is a middle ground here between a sync wrapper and a CQRS implementation. Each service/component can support synchronous queries and asynchronous writes. This works well for a system which is operating at a medium scale. So read queries can hop between components to finish reads synchronously. Writes to the system, on the other hand, will flow down asynchronous channels. There is a trade-off here though. The optimization of a system for both reads and writes independently is not possible. Something, that is beneficial for a system operating at high traffic.
Message bus is a central point of failure
This is not a trade-off, but a precaution. In the asynchronous communication style, message bus is the backbone of the system. All services constantly produce-to and consume-from the message bus. This makes the message bus the Achilles heel of the system as it remains a central point of failure. It is important for a message bus to support horizontal scaling otherwise it can work against the goals of a distributed system.
Eventual consistency
An asynchronous system can be eventually consistent. It means that results in queries may not be latest, even though the system has issued the writes. While this trade-off allows the system to scale better, it is something to factor-in into system’s design and user experience both.
Hybrid
It is possible to use both asynchronous and synchronous communication together. When done the trade-offs of both approaches overpower their advantages. The system has to deal with two communication styles interchangeably. The synchronous calls can cascade degradation and failures. On the other hand, the asynchronous communication will add complexity to the design. In my experience, choosing one approach in isolation is more fruitful for a system design.
Verdict
Martin Fowler has a great blog on approaching the decision to build microservices. Once decided, a microservice architecture requires careful deliberation around its execution flow style. For a write, heavy system asynchronous is the best bet with a sync-over-async wrapper. Whereas, for a read-heavy system, synchronous communication works well.
For a system that is both read and write heavy, but has moderate scale requirements, a synchronous design will go a long way in keeping the design simple. If a system has significant scale and performance needs, asynchronous design with CQRS pattern might be the way to go.
Lets say that we have a cluster of nodes with out SPOF. There should be a node in the cluster ready to receive the value from client and save it (disk or memory). How do we find that node,hence we need to select a node from the cluster as Master and that in turn replicates to other nodes. Instead of us making the decision of hand-pick the node, this alogrithm is going to pick node as master, this process is called Leader Election and is a common use case in Distributed Systems.
I had implemented a simple Leader Election algorithm that does this using Akka. This post discusses the implementation.
Algorithm
Each node is considered as a State machine in this algorithm where in it could be in any of the following state
Idle - node start with this state.
Candidate - when node becomes ready for election.
Leader - the master of the cluster.
Idle to Candidate
on start of each node it should be aware ofthe other nodes
a timer executes and it checks if master is elected
if elected it updates its reg to the master
Else it starts the election processs where in it initaites "Election" msg which of type
// Election(ActorRef , Long)
is sent to other nodes here by it (and other nodes) un-become "Idle" i.e it goes back to the Idle state.
Candidate to Leader
While its candidate it also receives election message from other nodes and it stores all such messages in a cache.
// List(Election)
At this state another scheduler kicks in to elect the leader such that all nodes finds the actor ref for corresponding oldest timestamp basically a simple search in the cache and is sent a Leader Elected message.
The node which gets the Leader elected message is the new Leader thus it becomes "Leader" now the leader send other nodes Leader message.
Other nodes remains as candidate and update their master reference to the sender of the "Leader" message.
When Master node goes down
Now that master is elected and oher nodes acting in Idle state. Its also possible hat Master node might go down in such situations and other nodes should be aware of the state master in equal intervals of time.
Each node sends a Heart-Beat message to the master for every 2 secs and receive a Ack msg from the master within 2 secs.
If the Ack is received the node considers that master is alive if not the node gets a time out or something it starts a election thus becomes candidate.
More steps
So far I have implmented only the alorithm and its not complete with a implementation.
Code
Below is the complete implmentation of the algorithm in Akka. This could be run in a cluster of nodes using Akka cluster.
class Node extends Actor { val cluster = Cluster(context.system) val peersBuffer = ListBuffer[ActorRef]() val nodeData = new NodeData() var masterElected = false var electedMaster: ActorRef = null var lastTimeStamp = 0l // subscribe to cluster changes, MemberUp // re-subscribe when restart override def preStart(): Unit = { cluster.subscribe(self, classOf[MemberUp]) cluster.subscribe(self, classOf[UnreachableMember]) } override def postStop(): Unit = cluster.unsubscribe(self) def receive = idle import DurationImplicits._ import scala.concurrent.ExecutionContext.Implicits._ context.system.scheduler.schedule(2.toSecs, 2.toSecs, self, IsMasterElected) context.system.scheduler.schedule(4.toSecs, 10.toSecs, self, ElectionOver) context.system.scheduler.schedule(4.toSecs, 2.toSecs, self, CheckMasterHB) /** * this stage node collects its peers before it can conduct election * when the node */ def idle: Receive = { case MemberUp(m) => register(m) case UnreachableMember(x) => case IndexerNodeUp => if (masterElected) { sender() ! PreElectedMaster(electedMaster) } peersBuffer.+=(sender()) println(s"A Indexer node is brought up now the peers are $peersBuffer") case a: PreElectedMaster => masterElected = true electedMaster = a.actorRef case IsMasterElected => println("schduler kicked in to elect master") if (!masterElected) { println("Cluster's master does not exist , we may need election") lastTimeStamp = System.currentTimeMillis() println(s"Sending Election message to peers $lastTimeStamp -> $peersBuffer") println("Iam a candidate now contesting on election") context.become(candidate) peersBuffer // .filter(ar => ar.path != self.path) .foreach(ar => ar ! Election(lastTimeStamp)) } else context.become(candidate) case DataRequest => println("back to Idle state") case CheckMasterHB => try { Await.result(electedMaster ? HBMaster, 2.toSecs) } catch { case e: Exception => // this means master has not responsed withtin 2 secs hence master failure println("master failed time for the node to become candidate and contest election") } finally {} } def candidate: Receive = { case IndexerNodeUp => if (masterElected) { sender() ! PreElectedMaster(electedMaster) } peersBuffer.+=(sender()) println(s"A Indexer node is brought up now the peers are $peersBuffer") case election: Election => println("Received vote from peer") nodeData.addVote(sender(), election.ts) case ElectionOver => println(s"schduler kicked in to announce election over master $masterElected") if (!masterElected) nodeData.findOldest ! LeaderElected case LeaderElected => println("Oh my god ... Iam elected as leader ") masterElected = true println("Sending the new leader") peersBuffer.foreach(ar => ar ! NewLeader) case NewLeader => println("Welcome new leader") masterElected = true electedMaster = sender() if (sender().path == self.path) { println("Iam elected Let me sworn in as leader ") context.become(leader) self ! "First Msg" } nodeData.invalidateVotes() context.unbecome() // self ! DataRequest } def leader: Receive = { case s: String => println(s"After elected as a leader $s") case HBMaster => sender() ! MasterHBAck case IndexerNodeUp => if (masterElected) { sender() ! PreElectedMaster(electedMaster) } peersBuffer.+=(sender()) println(s"A Indexer node is brought up now the peers are $peersBuffer") } def register(member: Member): Unit = if (member.hasRole("indexer")) context.actorSelection(RootActorPath(member.address) / "user" / "indexBackend") ! IndexerNodeUp } /** * Class to hold election data for a node */ class NodeData { private val votes = scala.collection.mutable.Map[Long, ActorRef]() def addVote(actorRef: ActorRef, ts: Long) = votes.+=(ts -> actorRef) def findOldest = { val oldest = votes.keySet.toSeq.sortWith((a, b) => a < b).head votes(oldest) } def invalidateVotes() = votes.clear() }
The messages handled by the Actor are
import akka.actor.ActorRef case object IndexerNodeUp case object IsMasterElected case class Election(ts: Long) case object ElectionOver case object LeaderElected case object NewLeader case class PreElectedMaster(actorRef: ActorRef) case object DataRequest case object CheckMasterHB case object HBMaster case object MasterHBAck
The entire code is hosted in github . Suggestions & PRs are welcome !!!!!
More to come
Now that we have the algorithm and a skelteton implmentation. I will further extend this post by explaining the Akka code and there by write a small cluster to build a distributed index.
Serf is a service discovery and orchestration tool that is decentralized, highly available, and fault tolerant. Serf runs on every major platform: Linux, Mac OS X, and Windows. It is extremely lightweight: it uses 5 to 10 MB of resident memory and primarily communicates using infrequent UDP messages.
> To provide the best possible streaming experience for our members, it is critical for us to keep the API online and serving traffic at all times. Maintaining high availability and resiliency for a system that handles a billion requests a day is one of the goals of the API team, and we have made great progress toward achieving this goal over the last few months.