The niche for RAC

Recently, I received the following question from a respected colleague about Oracle’s Real Application Clusters (RAC) software…

I found out from Oracle reps that we have a license for RAC but are not using it. Our DBA confirmed we have no RAC databases.  All 11gR2, but non-RAC.

So what are the primary reasons you would use RAC anyway?

My response was lengthy, which wouldn’t surprise anyone who knows me, but in summary I concluded…

Unless their current server platform cannot scale to meet the needs of the application, convert the RAC licenses into licenses for other Oracle products, if possible.

If that is not possible, leave the RAC licenses unused and write them off. The least cost-effective course of action would be to employ the RAC licenses just because they’re there, as that would be throwing good money after bad.

I know it sounds like I’m a real hater, but is there any validity behind that assessment?

Oracle Corporation markets Real Application Clusters (RAC) as both a high-availability (HA) solution and a high-performance solution.  Let’s address both of those sales points…

The reality is that RAC is not a high-availability solution. All of the instances in a RAC cluster share a single, non-redundant copy of the database, so RAC was not designed for high availability. Any HA capabilities that Oracle RAC might seem to provide are better provided by Oracle Data Guard, which, unlike RAC, is designed with complete database redundancy for high availability. So HA as a reason for using RAC must come off the table completely. If you’re serious about HA, use Oracle Data Guard.

That leaves performance, for which RAC was obviously designed: it brings the resources of two or more servers to bear on a single database. However, let’s define what we mean by “performance”. It is easy to demonstrate that RAC does not increase the performance of individual transactions or queries. On the contrary, it reduces the performance of an individual transaction by adding waits on “global cache” (a.k.a. “gc” or “gcs”) class events on top of I/O events, and by adding waits on “global enqueue” (a.k.a. “ges”) class events to certain concurrency events. This RAC overhead on otherwise ordinary, non-RAC operations is easy to see in the “Performance Graph” of Oracle Enterprise Manager: in the stacked chart on that page, the “gray” band representing events in class “Cluster” closely tracks the “blue” band representing events in class “User I/O” and the “red” band representing events in class “Concurrency”.

While RAC decreases performance by adding overhead in the form of “cluster” waits, it does permit many more transactions and queries to access the database, because it enlarges the pool of server resources by allowing operations to execute on more than one server.

And this is the niche for RAC.

RAC is not HA, and RAC decreases the performance of individual operations, yet RAC enables more operations to execute by scaling across multiple servers.

Therefore, the *only* technical reason I would ever employ RAC is when my database workload exceeds the maximum capacity of a single server on my hardware platform, whether the limiting resource is CPU, I/O, memory, or a limit imposed by kernel parameters.
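That decision rule can be written down as a small check. This is a minimal Python sketch under stated assumptions: the `Resources` fields, their units, and the idea of comparing measured peak demand against the largest single server you could buy are illustrative, not an Oracle tool. Limits imposed by kernel parameters are left out because they don’t reduce to one number.

```python
from dataclasses import dataclass, fields


@dataclass
class Resources:
    """Hypothetical capacity figures; demand and capacity must use the same units."""
    cpu_cores: float
    io_mb_per_sec: float
    memory_gb: float


def exceeded_resources(peak_demand: Resources, largest_server: Resources) -> list:
    """Names of resources where peak demand exceeds the largest single server available."""
    return [
        f.name
        for f in fields(Resources)
        if getattr(peak_demand, f.name) > getattr(largest_server, f.name)
    ]


def rac_is_justified(peak_demand: Resources, largest_server: Resources) -> bool:
    """Per the rule in the text: RAC only when no single server can carry the workload."""
    return bool(exceeded_resources(peak_demand, largest_server))


if __name__ == "__main__":
    largest = Resources(cpu_cores=64, io_mb_per_sec=4000, memory_gb=1024)  # hypothetical
    demand = Resources(cpu_cores=48, io_mb_per_sec=3000, memory_gb=512)    # hypothetical
    print(exceeded_resources(demand, largest))  # []
    print(rac_is_justified(demand, largest))    # False
```

Every resource is checked separately because any one of them can be the ceiling; a workload that fits on CPU but not on memory still fails the single-server test.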

The heyday of RAC was the era of commodity 32-bit Windows and Linux platforms, from the turn of the century until about 2004. Given the drastic limitations on the size and configuration of these 32-bit platforms, notably the 4 GB address space available to a 32-bit process, clustering servers with RAC was a viable way to put these commodity servers to work on larger database workloads. But with the advent of 64-bit Windows and Linux, these servers can scale much larger, providing more CPU and memory and dramatically reducing the need for clustering.

Yet, Oracle continues to promote RAC as a general-purpose solution, striving to convince the market that running Oracle database without RAC is entry-level, while RAC is the advanced solution for visionaries to optimize both availability and performance, obtaining total data processing creaminess, avoiding the chewy chunks of degradation.

So characterizing the employment of unused RAC licenses as “throwing good money after bad” is not negative hyperbole or product bashing; it is just common sense if the technical reason for using RAC isn’t relevant. In addition to the 50% uplift in licensing and support costs, the total cost of ownership (TCO) for RAC includes the additional hardware resources needed to compensate for the waits on “cluster” events, and the additional time and effort required of not only database administrators but also systems administrators to install and maintain shared storage, additional networks, clusterware, and RAC databases.
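A back-of-the-envelope version of that TCO comparison, as a Python sketch. Only the roughly 50% RAC uplift comes from the text; the processor count, per-processor price, support rate, ownership period, and the yearly hardware and administration figures are hypothetical placeholders, not Oracle list prices. It also assumes support is a flat annual percentage of the license fee, and it ignores the processors in any additional RAC nodes, which would themselves need licensing.

```python
from dataclasses import dataclass


@dataclass
class CostInputs:
    # Every figure here is a hypothetical placeholder; substitute real quotes.
    processors: int                    # processors licensed for the database
    license_per_processor: float       # database license price per processor
    support_rate: float                # annual support as a fraction of the license fee
    years: int                         # ownership period being compared
    rac_uplift: float = 0.5            # RAC adds roughly 50% to license and support (from the text)
    extra_hw_per_year: float = 0.0     # hypothetical: hardware to absorb "cluster" waits
    extra_admin_per_year: float = 0.0  # hypothetical: added DBA and sysadmin effort


def licensing_and_support(c: CostInputs, rac: bool) -> float:
    """License fee plus support over the period; the uplift carries into support."""
    license_fee = c.processors * c.license_per_processor
    if rac:
        license_fee *= 1 + c.rac_uplift
    return license_fee + license_fee * c.support_rate * c.years


def total_cost(c: CostInputs, rac: bool) -> float:
    """Licensing and support, plus RAC-only hardware and administration costs."""
    total = licensing_and_support(c, rac)
    if rac:
        total += (c.extra_hw_per_year + c.extra_admin_per_year) * c.years
    return total


if __name__ == "__main__":
    inputs = CostInputs(processors=8, license_per_processor=10_000.0,
                        support_rate=0.25, years=3,
                        extra_hw_per_year=5_000.0, extra_admin_per_year=10_000.0)
    print(total_cost(inputs, rac=False))  # 140000.0
    print(total_cost(inputs, rac=True))   # 255000.0
```

Because support is priced off the license fee, the uplift is paid again every year, and the RAC-only operating costs grow with the length of ownership as well.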

RAC is the right tool for the job, but be sure it is the right job.  The number of occasions when it is needed is quite small, and getting smaller as the state of the art moves forward.