How can manufacturers affordably ensure the availability of ERP systems and databases without compromising performance?
High availability clustering can deliver 99.99% uptime for just-in-time inventory management and shipping systems.
By Ian Allton, Solutions Architect at SIOS Technology
In manufacturing, ERP systems built on technologies such as SAP, SAP HANA, Oracle, and SQL Server are essential for centralized management of processes and operations—including production scheduling, just-in-time inventory management, shipping, accounting, and many more. Because these ERPs and databases are so tightly interconnected, even a few minutes of downtime in one of them can be highly disruptive and very costly.
But protecting these applications from disruption raises a variety of questions. Would moving these workloads to a public cloud—AWS EC2, Microsoft Azure, or Google Cloud Platform (GCP)—achieve the desired level of high availability (HA), which is defined as being available 99.99% of the time? Would a configuration designed to ensure HA reduce or compromise application performance? Would running a complex SAP or HANA landscape in an HA environment introduce configuration limitations or add risk in an unforeseen dimension?
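To make the "four nines" target concrete, it helps to translate availability percentages into the downtime they actually permit per year. The short calculation below shows why the jump from 99.9% to 99.99% matters: it shrinks the annual downtime budget from roughly eight hours to under an hour.

```python
# Downtime budget implied by an availability target.
# "Four nines" (99.99%) allows roughly 52.6 minutes of downtime per year.

MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, including leap years

def downtime_minutes_per_year(availability_pct: float) -> float:
    """Minutes of permitted downtime per year for a given availability %."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

for pct in (99.0, 99.9, 99.99, 99.999):
    print(f"{pct}% availability -> {downtime_minutes_per_year(pct):.1f} min/year")
```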
The short answer is that poor planning can lead to undesirable outcomes in all these areas. But if a manufacturer understands how best to configure an ERP system and its databases for HA, a cloud-based ERP system can perform as well as—or better than—its on-premises counterparts. And it can do so flexibly, reliably, and without any critical limitations.
Silver Linings in the Cloud
Whether a company has an on-premises IT infrastructure or uses the public cloud, the most common way to protect critical applications, databases, and ERPs is to run them in a failover cluster configuration. In this configuration, the critical workload is run on a primary node and connected to a secondary node or nodes to eliminate single points of failure. Clustering software monitors the “health” of the critical workload and, in the event of a failure, moves operation of that workload to one of the secondary nodes, restoring operation to end users. To provide full HA protection, the nodes of a failover cluster must be spread across at least two separate data centers to protect operation from local and sitewide interruptions—and across two separate geographic regions to create a Disaster Recovery (DR) solution that can protect against regional catastrophes. In cloud environments, this means locating the cluster nodes in different cloud availability zones and regions.
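The monitor-and-failover loop described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the `check_health` and `promote` callables are hypothetical hooks standing in for the workload-specific probes and start scripts a real clustering product would run, and real products additionally use quorum and fencing to avoid split-brain.

```python
class FailoverMonitor:
    """Sketch of a failover cluster's health-monitoring loop.

    Tracks consecutive failed health probes of the active node and,
    past a threshold, promotes the next standby node to active.
    check_health(node) -> bool and promote(node) are hypothetical hooks.
    """

    def __init__(self, nodes, check_health, promote, max_missed=3):
        self.active = nodes[0]            # primary node runs the workload
        self.standbys = list(nodes[1:])   # secondaries eliminate single points of failure
        self.check_health = check_health
        self.promote = promote
        self.max_missed = max_missed
        self.missed = 0

    def probe(self):
        """Run one health check; fail over after max_missed consecutive misses."""
        if self.check_health(self.active):
            self.missed = 0
            return self.active
        self.missed += 1
        if self.missed >= self.max_missed and self.standbys:
            self.active = self.standbys.pop(0)  # next node takes over
            self.promote(self.active)           # start the workload there
            self.missed = 0
        return self.active
```

In practice the probe runs on a timer, and the nodes being probed sit in different availability zones or regions, exactly as described above.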
While some companies opt to build out their own multi-data center HA and DR infrastructures, others use AWS, Azure, and GCP in full-cloud or hybrid cloud environments. Note that a cloud SLA of 99.99% availability covers only the underlying virtual machine infrastructure. It does not guarantee that the application or ERP running on a VM will be operational. To achieve operational HA, your cloud infrastructure needs failover clustering software that, in the event of an outage, will automate and orchestrate the immediate failover of your active workloads to the secondary infrastructure.
Failing over critical applications and ERP systems from the primary node to the secondary node is a complex undertaking, requiring orchestration of application-specific requirements for boot order, networking, location of database services, and a variety of best practices unique to each workload.
This kind of automated, orchestrated failover is critical when configuring infrastructure to support complex ERP systems. When clustering software detects that the primary VM is unresponsive, it automatically orchestrates failover to the secondary infrastructure far faster than any human could do manually.
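One concrete piece of that orchestration is boot order: services must come up in dependency order on the secondary node (for example, storage before the database, and the database before the application servers). A minimal sketch, using Python's standard-library topological sort and a hypothetical dependency map for a generic ERP stack:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service lists the services that
# must already be running before it can start on the failover node.
DEPENDENCIES = {
    "virtual_ip": [],
    "storage":    [],
    "database":   ["storage"],
    "app_server": ["database", "virtual_ip"],
}

def startup_order(deps):
    """Return a start sequence that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())

print(startup_order(DEPENDENCIES))
```

Clustering software effectively encodes such maps (plus timeouts, retries, and workload-specific checks) for each supported application, which is why automated failover is both faster and more reliable than a manual runbook.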
Ensuring Access to Your Data
If you use local storage instead of shared storage for your cluster, you will need data replication services to synchronize data across all nodes. From an HA standpoint, look for replication software that is SAP-certified and supports failover across both availability zones and cloud regions. This ensures that data written on the active node is synchronously written to storage attached to the secondary node(s), enabling operations to continue on a secondary node with no data loss.
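The defining property of synchronous replication is that a write is not acknowledged to the application until every replica has confirmed it. The sketch below illustrates that contract only; it is not a real replication product, and the replica objects (with a durable `append()` method) are hypothetical stand-ins. Real systems also handle partial acknowledgements, network partitions, and resynchronization.

```python
def replicated_write(record, local_store, replicas):
    """Synchronous-replication sketch: acknowledge a write only after
    every replica confirms it has durably stored the record.

    local_store is a list standing in for primary storage; each replica
    is a hypothetical object whose append(record) returns True on success.
    """
    local_store.append(record)
    acks = [replica.append(record) for replica in replicas]
    if not all(acks):
        # A replica failed to confirm: reject the write rather than let
        # the nodes diverge (real systems handle this more gracefully).
        local_store.pop()
        raise IOError("replication not acknowledged; write rejected")
    return True  # only now is it safe to report success to the application
```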
Tips on Avoiding Failover Altogether
Finally, look for failover clustering software that can help you avoid even needing to failover. Most incidents that prompt a failover have their roots in application and operating system errors (and not the sudden failure of infrastructure). Certain failover clustering solutions are application-aware. That is, they include application and database-specific modules that automate configuration steps to ensure failovers are reliable and maintain application best practices. They are designed to validate configuration inputs, prevent human error, and detect and proactively resolve common issues (such as a stalled process or queue). Resolving such issues early prevents them from escalating into failures that trigger a failover.
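The "recover locally first, fail over last" logic can be sketched as an escalation ladder. Again, this is an illustration of the idea rather than any product's behavior; `is_healthy`, `restart_service`, and `trigger_failover` are hypothetical hooks for product-specific probes and actions.

```python
def recover(service, is_healthy, restart_service, trigger_failover,
            max_restarts=2):
    """Application-aware recovery sketch: attempt cheap local restarts
    of a stalled service first; escalate to a full failover only if
    local recovery fails. All callables are hypothetical hooks.
    """
    if is_healthy(service):
        return "healthy"
    for _ in range(max_restarts):
        restart_service(service)       # e.g. clear a stalled process or queue
        if is_healthy(service):
            return "recovered_locally"
    trigger_failover(service)          # last resort: move to a secondary node
    return "failed_over"
```

Because a local restart typically completes in seconds while a full failover takes longer and disrupts users, resolving issues at the bottom of this ladder is what lets application-aware clustering raise availability beyond what failover alone provides.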
While deploying clustering software that ensures rapid failover is critical, deploying a solution that also includes automated application recovery services can increase availability while decreasing the likelihood of downtime and of failover itself.
About the Author
Ian Allton, Solutions Architect at SIOS Technology, has more than 20 years of experience in helping enterprise IT teams implement high availability and disaster protection strategies to protect their mission-critical Linux applications from downtime. Before joining SIOS, Ian worked for Hitachi, where he served as Principal Technical Consultant in the Americas File and Content Global Services Practice and as Master Pre-Sales Technical Consultant.