Deploying Infisical on-premise with high availability requires deep knowledge in areas like networking, container orchestration, and database management. This guide presents a reference architecture that outlines how to achieve such a deployment effectively. For organizations that do not have the necessary resources or expertise, we recommend opting for managed, dedicated Infisical instances or engaging professional services to mitigate the complexities.

System Overview

[Figure: On-premise high availability reference architecture]

The architecture above utilizes a combination of Kubernetes for orchestrating stateless components and virtual machines (VMs) or bare metal for stateful components. The infrastructure spans multiple data centers for redundancy and load distribution, enhancing availability and disaster recovery capabilities. You may replicate the architecture in additional data centers and join them via Consul to increase availability; if one data center goes down, the remaining active data centers take over its workloads.
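As a minimal sketch of how data centers might be joined, the Consul server configuration below federates a server in one data center with servers in another over the WAN. Consul accepts HCL configuration like this; the data center names and addresses are hypothetical.

    # server.hcl: Consul server in data center A (names and addresses are examples)
    datacenter       = "dc-a"
    server           = true
    bootstrap_expect = 3                            # three servers per DC for Raft quorum
    retry_join       = ["10.0.1.10", "10.0.1.11"]   # LAN peers within dc-a
    retry_join_wan   = ["10.0.2.10", "10.0.2.11"]   # WAN-join the Consul servers in dc-b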

Stateful vs stateless workloads

To reduce the challenges of managing state within Kubernetes, including storage provisioning, persistent volume management, and intricate data backup and recovery processes, we strongly recommend deploying stateful components on Virtual Machines (VMs) or bare metal. As depicted in the architecture, Infisical is intentionally deployed on Kubernetes to leverage its strengths in managing stateless applications. Being stateless, Infisical fully benefits from Kubernetes’ features like horizontal scaling, self-healing, and rolling updates and rollbacks.

Core Components

Kubernetes Cluster

Infisical is deployed on a Kubernetes cluster, which provides container management, auto-scaling, and self-healing capabilities. A load balancer sits in front of the Kubernetes cluster, directing traffic and ensuring even load distribution across the application nodes; it is the entry point through which all other services interact with Infisical.
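As a minimal sketch (rather than the official Helm chart), the manifest below deploys Infisical behind a LoadBalancer Service. The infisical/infisical image is public; the port 8080, the Secret name infisical-secrets, and the environment variable names are assumptions to adapt to your installation.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: infisical
    spec:
      replicas: 3                    # spread across nodes for availability
      selector:
        matchLabels:
          app: infisical
      template:
        metadata:
          labels:
            app: infisical
        spec:
          containers:
            - name: infisical
              image: infisical/infisical:latest    # pin an exact version in production
              ports:
                - containerPort: 8080              # assumed default application port
              envFrom:
                - secretRef:
                    name: infisical-secrets        # holds DB_CONNECTION_URI, REDIS_URL, etc.
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: infisical
    spec:
      type: LoadBalancer             # the entry point in front of the cluster
      selector:
        app: infisical
      ports:
        - port: 80
          targetPort: 8080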

Consul as the Networking Backbone

Consul is a critical component in the reference architecture, serving as a unified service networking layer that links and controls services across different environments and data centers. It functions as the common communication channel between data centers for stateless applications on Kubernetes and stateful services such as databases on dedicated VMs or bare metal.
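For example, the Consul agent on a database VM can register the service it hosts so that workloads in Kubernetes discover it by name; the service definition below is illustrative.

    # postgres.hcl: registered with the local Consul agent on the database VM
    service {
      name = "postgres"
      port = 5432

      check {
        tcp      = "localhost:5432"    # basic reachability health check
        interval = "10s"
      }
    }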

Postgres with Patroni

The database layer is powered by Postgres, with Patroni providing automated management to create a high availability setup. Patroni leverages Consul for several critical operations (a configuration sketch follows this list):

  • Redundancy: By managing a cluster of one primary and multiple secondary Postgres nodes, the architecture ensures redundancy. The primary node handles all the write operations, and secondary nodes handle read operations and are prepared to step up in case of primary failure.

  • Failover and Service Discovery: Consul is integrated with Patroni for service discovery and health checks. When Patroni detects that the primary node is unhealthy, it uses Consul to elect a new primary node from the secondaries, thereby ensuring that the database service remains available.

  • Data Center Awareness: Patroni configured with Consul is aware of the multi-data center setup and can handle failover across data centers if necessary, which further enhances the system’s availability.
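As an illustrative sketch of this integration, a patroni.yml excerpt for one Postgres node might point Patroni at the local Consul agent. The cluster scope, node name, and addresses below are hypothetical.

    # patroni.yml (excerpt): one node of the Postgres cluster
    scope: infisical-postgres          # cluster name shared by all nodes
    name: pg-dc-a-1                    # unique name for this node

    consul:
      host: 127.0.0.1:8500             # local Consul agent
      register_service: true           # publish the cluster in Consul's service catalog

    bootstrap:
      dcs:
        ttl: 30                        # leader lease; failover starts when it expires
        loop_wait: 10
        retry_timeout: 10

    postgresql:
      listen: 0.0.0.0:5432
      connect_address: 10.0.1.21:5432  # address other nodes use to reach this one
      data_dir: /var/lib/postgresql/data

With register_service enabled, Patroni tags nodes by role in Consul, so clients can typically resolve the current primary via Consul DNS (for example, master.infisical-postgres.service.consul).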

Redis with Redis Sentinel

For caching and message brokering (a sample Sentinel configuration follows this list):

  • Redis is deployed with a primary-replica setup.
  • Redis Sentinel monitors the Redis nodes, providing automatic failover and service discovery.
  • Write operations go to the primary node, and replicas serve read operations, ensuring data integrity and availability.
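A minimal sentinel.conf sketch is shown below; run Sentinel on at least three nodes so that it has its own quorum. The master name and address are examples.

    # sentinel.conf: two Sentinels must agree the primary is down before failover
    sentinel monitor infisical-redis 10.0.1.31 6379 2
    sentinel down-after-milliseconds infisical-redis 5000
    sentinel failover-timeout infisical-redis 60000
    sentinel parallel-syncs infisical-redis 1    # replicas re-synced at a time after failover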

Multi-data center deployment

Infisical can be deployed across multiple data centers to increase both performance and resiliency in disaster scenarios. For mission-critical deployments of Infisical, we recommend at least three data centers to reduce downtime in the event of a complete data center failure: three sites let the system tolerate the loss of any one data center while still retaining redundancy, and give quorum-based components a tie-breaker to avoid split-brain.

Data Center A

Data Center A houses the primary nodes of both Postgres and Redis, which handle all write operations. The secondary nodes and replicas serve as hot standbys for failover. Consul servers maintain the state of the cluster, elect a leader, and facilitate service discovery.

Nth data center

The nth data center acts as a performance and disaster recovery site, featuring a mesh gateway that enables cross-data center service discovery and configuration. It houses additional secondary nodes for Postgres and Redis replicas, which are ready to be promoted in case the primary data center fails. Additionally, this data center can reduce the latency of applications that need to interact with Infisical, particularly if those applications or services are geographically closer to this data center.
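As an illustrative sketch, the server configuration below enables Consul Connect and federates this data center with the primary one through mesh gateways; the data center names and gateway address are hypothetical, and the gateway proxy itself is typically launched with consul connect envoy -gateway=mesh.

    # server.hcl: Consul server in the nth data center, federated via mesh gateways
    datacenter         = "dc-n"
    server             = true
    primary_datacenter = "dc-a"        # the authoritative data center
    connect {
      enabled = true                   # required for mesh gateways
    }
    primary_gateways = ["gw.dc-a.example.com:8443"]   # reach dc-a through its mesh gateway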

Considerations

The complexity of an on-premise deployment scales with the level of availability required. This reference architecture provides a robust framework for organizations aiming for high availability and disaster resilience. However, it’s important to recognize that this is not a one-size-fits-all solution.

Organizations with less stringent Recovery Time Objectives (RTO) might find that simpler deployment methods using tools such as Docker Compose are adequate. Such setups can still provide a reasonable level of service continuity without the complexities involved in managing a multi-data center environment with Kubernetes, Consul, and other high-availability components.
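As a minimal sketch of such a single-node setup, assuming the public infisical/infisical image and illustrative credentials, a Compose file might look like the following. This trades the availability guarantees above for operational simplicity.

    # docker-compose.yml: single-node sketch with no high availability
    services:
      infisical:
        image: infisical/infisical:latest   # pin an exact version in production
        ports:
          - "80:8080"                       # assumed default application port
        env_file: .env                      # DB_CONNECTION_URI, REDIS_URL, keys, etc.
        depends_on:
          - db
          - redis
      db:
        image: postgres:16
        environment:
          POSTGRES_PASSWORD: example        # illustrative only
        volumes:
          - pg_data:/var/lib/postgresql/data
      redis:
        image: redis:7
    volumes:
      pg_data: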

Ultimately, the choice of architecture should be guided by a thorough analysis of business needs, available resources, and expertise.