Azure host-caching with Oracle database workloads on Azure VMs

There are almost 2,000 Azure customers running Oracle database workloads on Azure VMs, and that number is growing rapidly.  Running Oracle on Azure is not the first idea to occur to many people, but it should be: there are several types of VMs appropriate for Oracle workloads within Azure, as well as multiple choices for storage, ranging from inexpensive to high-performance.  Overall, Oracle performs very well in Azure, and there are several great features in Azure that make running Oracle even easier, such as Azure VM Backup (a topic for another post).

One of the most common configurations is running one or more Oracle databases on an Azure VM with premium SSD managed disks.  When attaching a managed disk to a VM, one has the option of enabling ReadOnly host-caching or ReadWrite host-caching for the managed disk device.

What is host-caching?  Essentially it is locally-attached SSD on the physical host underlying the virtual machines, and it is documented HERE.  When the VM is allocated to a physical host and running, locally-attached SSD is used with a simple LRU (least recently used) caching algorithm for I/O between the VM and the virtualized network-attached managed disk device.  ReadOnly host-caching caches only read I/O from the VM to the managed disk device, while ReadWrite host-caching caches both read and write I/O operations.
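To picture what that means, here is a minimal LRU read-cache sketch in Python.  It is purely illustrative: the class name, capacity, and write-through behavior are my own assumptions for the sketch, not Azure’s actual implementation.

```python
from collections import OrderedDict

class LRUReadCache:
    """Toy model of a ReadOnly host cache: caches reads, ignores writes."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()       # block_id -> data, in LRU order
        self.hits = self.misses = 0

    def read(self, block_id, backing_store):
        if block_id in self.cache:
            self.cache.move_to_end(block_id)   # mark most-recently-used
            self.hits += 1
            return self.cache[block_id]
        self.misses += 1
        data = backing_store[block_id]         # fetch from the managed disk
        self.cache[block_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)     # evict least-recently-used
        return data

    def write(self, block_id, data, backing_store):
        backing_store[block_id] = data         # writes go straight to disk
        self.cache.pop(block_id, None)         # invalidate any stale copy
```

Repeat reads are served from the local SSD, while writes bypass the cache entirely; that asymmetry is at the heart of the redo-log discussion that follows.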

Please note that ReadWrite host-caching is intended either as the default setting for the VM OsDisk, or for data disks using filesystems with barrier functionality enabled, as cited HERE.  It is not a great idea to use ReadWrite host-caching with Oracle database files, because of possible data-integrity issues resulting from lost writes should the VM crash.

But the rest of this article pertains to the use of ReadOnly host-caching for managed disk for Oracle database workloads.

If the Oracle online redo log files are isolated from other database files and reside on their own ReadOnly host-cached disk, then…

Generally, each block in an online redo log file is written once and then read at most once.  ReadOnly host-caching ignores the write altogether and does the work to cache the single read, as it is designed to do.  However, if there is never a second, third, or subsequent read, then the maintenance of the cache chains and cache buffers performed at the first read is entirely wasted.  Worse still, this happens with every single block written to the online redo log files, over and over again;  the cache is constantly manipulated, consuming compute resources, but to no benefit.  I don’t believe this activity slows I/O operations for the online redo log files significantly, but it does increase the consumption of compute resources which could otherwise be used productively, which of course means that the overall system is capable of accomplishing less work.

If the Oracle online redo log files are not isolated and reside on the same ReadOnly host-cached disk as other database files…

This is where actual performance decreases can result.  Oracle datafiles, tempfiles, controlfiles, BFILEs, files for external tables, block change tracking files, and Flashback log files do in fact use ReadOnly host-caching appropriately, with numerous reads of the same block after each write, not to mention the buffer caching within the RDBMS itself.  If online redo log file activity is mixed into the same workload as the datafiles, tempfiles, controlfiles, etc., then ReadOnly host-caching is forced to cache the online redo log files as well, even though there is zero benefit.  As a result, blocks stay in the cache for less time due to the constant rollover of buffers forced by the sequential online redo log workload, making the ReadOnly cache less effective for all files and causing more overhead, as blocks needlessly flushed from the cache by the sequentially-written redo need to be re-acquired.  This is the scenario where ReadOnly host-caching is noticeably less effective, and the average I/O latency does not improve.
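The pollution effect is easy to demonstrate with a toy simulation (entirely illustrative: the block counts, cache capacity, and access patterns are invented for the sketch).  A hot set of repeatedly-read blocks achieves a near-perfect hit rate on its own, but interleaving a sequential, never-re-read stream such as redo drags it down.

```python
import random
from collections import OrderedDict

def hit_rate(accesses, capacity):
    """Fraction of accesses served from a simple LRU cache."""
    cache, hits = OrderedDict(), 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)
        else:
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)
    return hits / len(accesses)

random.seed(42)
# Datafile-style workload: 20,000 reads over a hot set of 150 blocks.
hot = [("data", random.randrange(150)) for _ in range(20_000)]
# Redo-style workload: 20,000 blocks, each read exactly once.
redo = [("redo", i) for i in range(20_000)]
# The two workloads interleaved on one shared cache.
mixed = [block for pair in zip(hot, redo) for block in pair]

print(f"datafiles alone: {hit_rate(hot, 200):.0%}")    # hot set fits: ~99%
print(f"mixed with redo: {hit_rate(mixed, 200):.0%}")  # much lower
```

With the seed above, the isolated datafile workload hits almost every time, while the mixed workload’s hit rate drops sharply because the redo stream continually evicts the hot blocks, mirroring the behavior described above.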

Also, if using the D_v3 or E_v3 or M series of VMs…

ReadOnly host-caching actually provides slower I/O throughput on these VM instance types than uncached I/O, a problem fixed in the Dds_v4 and Eds_v4 series.  For example, the table below summarizes from the documentation pages the differences in the IOPS and I/O throughput limits of the Standard_E32s_v3 and Standard_E32ds_v4 instance types, as an illustration…

Size              | vCPU | Memory (GB) | Max cached and temp storage IOPS | Max cached and temp storage throughput (MBps) | Max uncached disk IOPS | Max uncached disk throughput (MBps)
Standard_E32s_v3  | 32   | 256         | 64,000                           | 512                                           | 51,200                 | 768
Standard_E32ds_v4 | 32   | 256         | 308,000                          | 1,936                                         | 51,200                 | 768
Comparison of cached throughput limits of the same VM instance type in v3 vs v4

As you can see, I/O throughput for cached storage is lower than the I/O throughput for uncached storage for v3, while a more intuitive result is apparent for v4.

The same is true for other size instance types, of course; I just didn’t display them.
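For readers who prefer code to tables, the same comparison can be expressed as a quick check; the numbers below are copied from the documentation table above.

```python
# Throughput limits (MBps) copied from the Azure VM documentation table above.
limits = {
    "Standard_E32s_v3":  {"cached_mbps": 512,  "uncached_mbps": 768},
    "Standard_E32ds_v4": {"cached_mbps": 1936, "uncached_mbps": 768},
}

for size, mbps in limits.items():
    verdict = "slower" if mbps["cached_mbps"] < mbps["uncached_mbps"] else "faster"
    print(f"{size}: cached I/O limit is {verdict} than uncached "
          f"({mbps['cached_mbps']} vs {mbps['uncached_mbps']} MBps)")
```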

In summary, if you wish to reap the expected benefits of ReadOnly host-caching, do not use D_v3 or E_v3 instances or anything in the M-series instance types.

So, to wrap up, I think these are good rules for using Azure VM host-caching with Oracle database workloads…

  1. Don’t use ReadWrite host-caching with Oracle database files
  2. Isolate the transaction logs from other database files to separate managed disk devices and disable host-caching
  3. Use ReadOnly host-caching for all other Oracle database files
  4. Do not use the D_v3, E_v3, or M-series VMs, because their host-caching and ephemeral SSD are slow compared to the Dds_v4 and Eds_v4 series VMs

Oracle dNFS on NFS v4.1

If you search the Oracle Support website or the internet for information on Oracle’s direct NFS (dNFS), you’ll find a decent amount of useful information, including code examples. However, if you look a little more closely, all of those examples show NFS v3.0 in use, and none show higher versions of NFS.

Oracle 19c documentation states that Direct NFS Client supports the NFSv3, NFSv4, NFSv4.1, and pNFS protocols to access the NFS server, but you have to search more deeply to learn that Oracle v11.2 supports only NFS v3.0, while NFS v4.0, v4.1, and parallel NFS (pNFS) are supported from Oracle v12.1 onward.

The same 19c documentation page also states that Direct NFS Client can obtain NFS mount points either from the operating system mount entries or from the oranfstab file, but it omits the important detail that obtaining NFS mount point information from the operating system is possible only with NFS v3.0, while the oranfstab configuration file is required for higher versions of NFS. That detail should be documented, but is not.

Using dNFS with all versions of NFS requires running the command make -f ins_rdbms.mk dnfs_on in the $ORACLE_HOME/rdbms/lib subdirectory, as well as ensuring that the oradism executable in the $ORACLE_HOME/bin subdirectory is owned by root and has the SETUID file permission set.
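Those two oradism prerequisites are easy to get wrong, so a quick check can help.  This Python sketch is only an illustration, and the default $ORACLE_HOME path in it is a made-up example:

```python
import os
import stat

def check_oradism(path):
    """Return (owned_by_root, has_setuid) for the given executable."""
    st = os.stat(path)
    return st.st_uid == 0, bool(st.st_mode & stat.S_ISUID)

# The default path here is a made-up example; use your own $ORACLE_HOME.
oracle_home = os.environ.get("ORACLE_HOME",
                             "/u01/app/oracle/product/19.0.0/dbhome_1")
oradism = os.path.join(oracle_home, "bin", "oradism")
if os.path.exists(oradism):
    root_owned, setuid = check_oradism(oradism)
    print(f"{oradism}: root-owned={root_owned}, setuid={setuid}")
```

If either value comes back False, fix the ownership and permissions as root before expecting dNFS to behave.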

But creating an oranfstab configuration file is optional only with NFS v3.0, where the dNFS client can obtain information about NFS mount points from the standard UNIX/Linux /etc/mtab file. For NFS v4.x, the oranfstab file must be created, either in the global /etc directory or in the installation-specific $ORACLE_HOME/dbs subdirectory.

Documentation for the contents of the oranfstab file is found HERE, but there is another important detail that is not included in the documentation: the ordering of the attributes, listed as name/value pairs, matters.

For example, if the contents of the oranfstab file looks like this, with the properties ordered as follows, then all works well…

server: anfv41
local:  10.0.0.4
path:   10.1.0.4
export: /ora-anfv41-vol01 mount: /u02
nfs_version: NFSv4.1
security_default: sys

However if, for example, the path and local properties were specified in reversed order, then dNFS does not work properly…

server: anfv41
path:   10.1.0.4
local:  10.0.0.4
export: /ora-anfv41-vol01 mount: /u02
nfs_version: NFSv4.1
security_default: sys

The server attribute must always be specified first as the “header” of the following attributes describing a specific mount-point. Of course, the oranfstab file can contain information about multiple mount points, each of which is initiated by the server attribute.

The local attribute must come next, followed by the path attribute.  The export and mount attributes must be on the same line, following path.  I have tried different orderings of the nfs_version and security_default attributes with no adverse effect; it seems logical to specify them last, and their relative order does not seem to matter.
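Since the documentation does not spell out this ordering requirement, it can be worth validating an oranfstab file before relying on it.  The following Python sketch is a hypothetical checker built only from the ordering rules described above; it is not an Oracle-supplied tool:

```python
def check_oranfstab(text):
    """Validate attribute ordering per mount entry: server, local, path,
    then 'export: ... mount: ...' on one line; returns a list of problems."""
    problems = []
    expected = ["server", "local", "path", "export"]
    pos = -1
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or ":" not in line:
            continue
        key = line.split(":", 1)[0].strip()
        if key == "server":
            pos = 0                      # a new mount entry begins
        elif key in ("local", "path", "export"):
            want = expected[pos + 1] if 0 <= pos < 3 else None
            if key != want:
                problems.append(f"line {lineno}: '{key}' out of order "
                                f"(expected '{want}')")
            pos = expected.index(key)
            if key == "export" and "mount:" not in line:
                problems.append(f"line {lineno}: 'mount:' must be on the "
                                "same line as 'export:'")
        # nfs_version / security_default ordering appears not to matter
    return problems
```

Run against the two examples above, the correct ordering passes cleanly, while the reversed local/path ordering is flagged.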

As you can imagine, this behavior can easily lead to insidious problems. When dNFS is misconfigured this way, the result is not that dNFS is disabled, but that dNFS is enabled and performing far from optimally. In general, performance is abysmal; in cases I have observed, a database startup that normally completes in seconds took upwards of 30 minutes, effectively rendering the database unusable.

In summary, upgrading from NFS v3.0 to NFS v4.1 is a good thing to do today, for the improvements in security (e.g. encryption and authentication), performance (e.g. parallel NFS), stability (e.g. integrated locking with leases), and simplicity (e.g. more firewall-friendly by using fewer ports).

Best of all, even though Oracle is light on documentation for higher versions of NFS, you don’t have to abandon Oracle’s direct NFS (dNFS) while taking advantage of the improvements offered by NFS v4.1.

But it would be nice if Oracle documented all of this more coherently.

Delphix XPP explained

This article was originally posted on the Delphix Support blog on 15-Nov 2015, but with the deprecation of the XPP feature with the new 6.0 release and higher, it was decided best to remove this article.

So, I have saved it and posted it here instead, almost 4 years to the day after it was originally posted…

The topic of converting Oracle databases from one of the proprietary
UNIX platforms (i.e. Solaris, AIX, or HP-UX) to Linux seems at first pretty
esoteric and far-fetched.

Meh. Right?

Plus, a lot of folks who have used Solaris, AIX, and HP-UX over the years can
argue that those operating systems have far more capabilities and
technical advantages than Linux, and they may be absolutely correct.  Who wants to get caught up in a religious debate?

Meh again.  Right?

But interestingly enough, the big issue here is not the relative technical merit.

The Cloud is about the commoditization of servers and services.  Unless you are using the hosting services or Cloud services offered by the operating system vendors themselves (i.e. Oracle for Solaris, IBM for AIX, or HP for HP-UX), the only operating systems offered by hosting or Cloud vendors are Linux or Windows on Intel x86 or x86_64.

So, converting an Oracle database from UNIX to Linux is a prerequisite to moving it to the Cloud.  Betcha didn’t think of it that way, eh?

This is why Delphix uses the term modernization to describe the capability of cross-platform provisioning (i.e. XPP), also known as UNIX-to-Linux (i.e. U2L).  It is not because Linux on x86/x86_64 platforms is any more modern than Solaris on SPARC, AIX on Power, or HP-UX on PA-RISC/Itanium, but because modernization involves moving the data center to hosting or the Cloud.

I’m not willing to engage in an argument about which platform has more technical merit than another.  However, I will assert that if you’re not running Oracle on Linux, then you are not positioned for the future, just based on the economics.  If you really think technical merit beats economics, then I have a nice VAX running VMS for you.

So, Delphix XPP a.k.a. U2L is the on-ramp to the Cloud.  I prefer the term “XPP” rather than “U2L”, so I’m going to use that going forward…

Now let’s be clear on a few things.  Only Oracle can write the software to migrate an existing Oracle database from one operating system to another, from one “endian” chipset to another.  So Delphix has not written, and could not have written, the basic processes to perform conversion, because Oracle datafile formats are themselves proprietary.  But what Delphix has done is take the conversion processes created by Oracle and actually made them feasible.

Here’s how…

The capability of cross-platform migration was introduced to Oracle RMAN in Oracle10g database release 1 (i.e. 10.1).  This was the CONVERT DATABASE functionality, and it was capable of conversions only between same-endian chipsets.

The terms “big endian” and “little endian” originate in Jonathan Swift’s satirical novel “Gulliver’s Travels”, first published in 1726, and were applied to computing by Danny Cohen in his 1980 note “On Holy Wars and a Plea for Peace”.

The terms refer to a civil war in the novel between the people of Lilliput, some of whom ate their soft-boiled eggs by cracking the blunt or “big” end of an egg, and others who ate theirs by cracking the pointy or “little” end.  This wickedly sharp satire aptly portrays the controversy and conflict over the ordering of bytes within words chosen by processor manufacturers in the 1960s and 1970s, which has been propagated forward to this day.

All three of the proprietary UNIX variants (i.e. Solaris on SPARC, AIX on Power, and HP-UX on both PA-RISC and Itanium) are big-endian.  All x86 and x86_64 chips are little-endian, so XPP involves converting numerics from big-endian to little-endian.
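The endian difference itself is nothing more than byte ordering within a word, which Python’s struct module illustrates for a 32-bit integer (a generic illustration, not Oracle’s datafile format):

```python
import struct

value = 0x0A0B0C0D

big = struct.pack(">I", value)      # big-endian: SPARC, Power, PA-RISC
little = struct.pack("<I", value)   # little-endian: x86, x86_64

print(big.hex())     # 0a0b0c0d  (most significant byte first)
print(little.hex())  # 0d0c0b0a  (least significant byte first)

# Converting a field is simply reversing its byte order:
assert struct.unpack("<I", big[::-1])[0] == value
```

Of course, a real conversion must do this field-by-field with knowledge of the datafile block layout, which is precisely why only Oracle can implement it.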

In Oracle10g release 2 (i.e. 10.2), Oracle added the CONVERT TABLESPACE and CONVERT DATAFILE functionality, which permits conversion between endian formats (i.e. big to little or little to big).

So, at this point in time, XPP is functionality that has existed in Oracle for about 10 years now, or as long as Oracle10gR2 has been available.

But XPP hasn’t been a frequently-used feature over that time, as you are no doubt aware.  I know lots of people have known about it, but very few have actually done it.  And even fewer have done it outside of playgrounds, out in real-life databases.

There are two reasons for this: 1) the trend toward commoditization of x86/x86_64 is only now accelerating with the maturation of the Cloud and 2) there are substantial obstacles.

The most substantial obstacle is the fact that the functionality involving endian conversion (i.e. CONVERT DATAFILE/TABLESPACE) also includes Oracle’s transportable tablespace (a.k.a. TTS) functionality.  TTS requires that the source database be in read-only state during any tablespace transport operations.

Now, if you’re trying to convert a Linux copy of a UNIX production database, you don’t want to have to interrupt service in the production database by making it read-only.  In the IT biz, we call that down time and it is bad.  Instead, what you’re going to have to do is create a full copy of the UNIX production database, and then make that copy read-only.

This sounds innocuous enough, but let’s think it through.  If our production database on UNIX is several terabytes or larger, and we wish to simply test the conversion to Linux using CONVERT DATAFILE / TABLESPACE, then we need enough space for two additional copies of the production database:  one for the source UNIX copy (to be set to read-only) and one for the target converted Linux copy.  To paraphrase an old saying, “a couple terabytes here and a couple terabytes there, and pretty soon you’re talking real storage”.  And it is not just about allocating twice the volume of the production storage for this test; there is also the time involved in copying two complete sets of the production database.  Needless to say, this is not an exercise to enter into lightly.  Nor repeatedly.  And so, this 10-year-old functionality remains generally a big mystery.

Enter Delphix data virtualization.

Delphix can create a virtual database (VDB) for the read-only UNIX copy of the production database, requiring only a few minutes to completely provision regardless of the size of the database.  Of course, practically no additional storage is required for this virtual database which is set to read-only almost immediately and stays read-only throughout its brief lifetime.

Even better, Delphix can also create a VDB for the converted Linux copy of the database, and because the Delphix file-system is able to transparently deduplicate (or mooch) blocks that contain the same contents but in different endian format, even the converted Linux copy of the database consumes very little storage as a result of the conversion.

To use numbers from actual experience, converting a 5 TB production UNIX database without Delphix requires the allocation and copying of 10 TB of additional storage.  Converting the same 5 TB production UNIX database using Delphix requires less than 12 GB of additional storage.

Please note the change from “TB” to “GB” in that last sentence.  10 TB versus 12 GB is a factor of roughly 850, nearly three orders of magnitude.

Admittedly, while the storage savings are consistent, XPP still involves potentially long-running operations.  Without Delphix there are three; with Delphix, the first is eliminated, leaving at most two.  Those long-running operations are…

  1. Copying the production UNIX database to create the non-production UNIX database copy
  2. Copying database metadata for TTS operations from the non-production UNIX database copy to the converted Linux database
  3. Using the RMAN CONVERT DATAFILE command to convert all datafile blocks from UNIX to Linux

Item #1 is a long-running and expensive step only without Delphix XPP; with XPP, it is fast and inexpensive.

Item #2 is only an issue for certain databases supporting enormous applications like Peoplesoft, which contain millions of objects such as tables and indexes.  For most applications, there are only several thousand objects, so the single-threaded export/import of database metadata is not a concern.

Item #3 might be an issue if the database is very large, because the RMAN CONVERT DATAFILE operation is converting every single block in the database.

Both items #2 and #3 can be issues regardless of whether Delphix XPP is used or not.

So please understand from the beginning that XPP might be a long-running process because of those one or two long-running steps.  At least with Delphix XPP there are only the one or two potentially long-running steps; without Delphix XPP there is always one more.

Also, please understand that without the orchestration provided by Delphix, there are about a dozen non-trivial smaller steps to be performed manually in addition to the long-running steps mentioned above.  For example, after the RMAN CONVERT DATAFILE processes are complete, there are several small post-processing tasks, such as recompiling all PL/SQL objects in the database.  Doing these steps manually as documented by Oracle is prone to human error, so if Delphix is not used, investing the additional time and effort to automate these steps is strongly recommended.

From a cookbook standpoint, there are two major phases of a Delphix XPP operation:  1) validation and 2) conversion, as documented online here.

The validation step, or validating an Oracle dSource for XPP, is documented online here.  It automates the creation of a UNIX database copy to be set to read-only so that validation procedures like the CHECK_DB and CHECK_EXTERNAL procedures in the DBMS_TDB package and the TRANSPORT_SET_CHECK procedure in the DBMS_TTS package can be executed automatically.  These procedures will fail if the database is not in read-only mode, and will verify whether the database contains any of the many restrictions listed in Oracle documentation for conversions online here and for transporting tablespaces online here.  Validation with Delphix XPP is fast and easy, so it can be performed over and over, to ensure that any restrictions encountered have been addressed and eliminated, one way or the other.  Delphix XPP also allows the upload of a SQL script to fix issues identified during validation, if it is not feasible to fix the problem in the source production database itself.  The creation and upload of scripts to fix identified XPP issues is documented online here.

The conversion step, or provisioning a Linux VDB from a UNIX dSource, is fully automated and is documented online here.  Conversion is possible only after validation is successfully completed.  If a SQL script was uploaded to address restrictions encountered during validation, then that script should continue to be used during conversion.  During conversion, Delphix automates the many small and large steps, including the provisioning of a read-only UNIX database copy, the export/import of database metadata, and the RMAN CONVERT DATAFILE processes.

So you can see that a lot of blood, sweat, and tears have been put into this feature.

XPP is another example of the enabling power of data virtualization.  Without it, Oracle cross-platform migration is a complicated, expensive, and slow process with many manual steps.  With Delphix data virtualization, XPP is simple and inexpensive.  It is still not a fast process due to the long-running steps described earlier, but even so it may be fast or easy enough to become feasible as your organization moves toward migration into the Cloud.

Oak Table World 2018

Oak Table World 2018 (OTW) just completed at the Children’s Creativity Museum in San Francisco.  The event website is “https://otw18.org”.

This year, it was my turn to organize this remarkable little un-conference on behalf of the Oak Table Network, which is often described as a drinking society with an Oracle problem.

What is an “un-conference”?  The dictionary definition is…

un·con·fer·ence   /ˈənkänf(ə)rəns/

noun: unconference; plural noun: unconferences

a loosely structured conference emphasizing the
informal exchange of information and ideas between
participants, rather than following a conventionally
structured program of events. Generally this manifests
as a "participant-driven" event.

OTW started out in 2010 with Mogens Norgaard holding a set of tables in the back room of the Chevy’s Tex-Mex Bar & Grill in San Francisco (since closed), just across 3rd St from the Moscone Center where Oracle OpenWorld was being held.

Rather than drinking from the flood pouring from the corporate marketing machines across the street, friends joined Mogens for beer and Tex-Mex to discuss and argue technical topics.

Thus was born “Oracle ClosedWorld” as a true unconference.  The following year in 2011, Oracle ClosedWorld was held in the upstairs room at The Thirsty Bear, a few doors down from Chevy’s on Howard St, this time with an agenda and a more structured format.

However, the ever-vigilant legal department at Oracle Corporation was not in the least amused by the gentle gibe of the name “Oracle ClosedWorld” from a bunch of drunk geeks, and after the event they quickly regurgitated a letter offering detailed demands and dire consequences.  Of course we sweetly complied, and the unconference was renamed “Oak Table World” (OTW).  To date, Oracle’s legal team has yet to impose its will on our continued use of the word “World”, perhaps at least until Oracle achieves actual world domination.  So we wait with bated breath for their next move.

The following year, in 2012, Oak Table World found a new home at the Children’s Creativity Museum, which is located at the corner of 4th Street and Howard Street, smack dab in the middle of the Moscone complex.  The unconference was then organized by Kyle Hailey, who continued to do so in 2013, 2014, and 2015.  In 2016, it was organized by Kellyn Pot’vin-Gorman, in 2017 by Jeremiah Wilton, and in 2018 by Tim Gorman.

The event seems to be slipping away from its original un-conference format, becoming more professionalized, becoming more of a conference-within-a-bigger-conference.  I’m not sure if that is a good thing or a bad thing, but once you start laying out money for a venue and catering, things get more conventional pretty quickly.

Achieving a balance

Please think this through carefully.

Cite a single occasion when a woman did not receive shaming, threats of job termination, counter-accusations, death threats, and personal shunning after she finally found herself unable to continue in the abrasive hell created for her, solely because she appealed to (or didn’t appeal to) a self-obsessed self-righteous forever-child?


The assertion that women are doing this to “enrich themselves” or “because they want to” is simply ludicrous.  Just look at the facts…

  • Who ever benefited financially from bringing harassment charges?  Nobody.
  • Who wins? Nobody.
  • What is gained? Justice, nothing more.
  • Who loses? Everybody, at one time or another.

It has always been easier to just bury it and move on, which has been the time-worn advice always given, and which is why so many of these atrocities have gone unresolved for so long.  Many will remain unresolved forever.  Whether they remain unresolved should be the decision of the victim: not the state, not the employer, and certainly not the perpetrator.

The courage required to speak out and live with the consequences is staggering.  Those who do so are heroes, nothing less.  These charges are not brought frivolously, but after endless soul-searching.

Women simply want to do their jobs just as men are able, without fear or favor, and with recourse other than quitting or moving on when a man targets them.

Rape culture, which blames victims as well as perpetrators, forcing women to adapt rather than forcing men to be human, is the reason so many feel outrage at recent developments.

Are you an opponent, or an advocate, of rape culture?  Can you prove so by word or deed, and not just by a knee-jerk reaction to the question?

What is happening is long overdue, and hopefully will finally achieve a real and lasting balance.

Automating Stored Procedures When Refreshing Virtual SQL Server Databases

So… you’re using a Delphix virtualization engine to provision SQL Server virtual databases (VDBs), but you’re finding that your SQL account passwords are being reset to those used in production after every refresh?  Likewise, all of the settings in the application supported by the database?

Provisioning a new virtual database is easy.  Go into the Delphix administrative console, find the dSource you want, find the snapshot for the point-in-time you want on that dSource, click the Provision button, specify the database server where the new database should reside, and hey presto!  In 10 minutes or less, a new virtual database is ready for use.

Except not quite…

That new VDB still has all of the data from the source database, and if that source database is production, then that means that it still has production passwords.  And it still has confidential data.  All of that has to be fixed before the database can be used by developers or testers.

So, even though you can now clone from production in minutes instead of hours or days, you still have to do the same post-clone and post-refresh processing tasks you’ve always had to do prior to Delphix.  Now, you just do them immediately rather than later.

If you’re willing to automate those tasks, whether in T-SQL stored procedures or in Windows Powershell scripts, then Delphix can help by embedding them as part of the operations of provision or refresh.

Delphix offers hooks, which are programmatic callouts that fire before and after certain actions by the Delphix virtualization engine, such as a refresh action…

Hooks provide the ability to execute Powershell code on the target Windows server as the Windows domain account registered with Delphix, either before or after the successful completion of an action.

Here are the actions for which hooks can be entered…

  • Provision
    • Configure Clone hook fires after the action completes
  • Refresh
    • Pre-Refresh hook fires before the action begins
    • Post-Refresh hook fires after the action completes successfully
    • Configure Clone hook fires after the Post-Refresh hook completes successfully
  • Rewind
    • Pre-Rewind hook fires before the action begins
    • Post-Rewind hook fires after the action completes successfully
  • Snapshot
    • Pre-Snapshot hook fires before the action begins
    • Post-Snapshot hook fires after the action completes successfully
  • Start
    • Pre-Start hook fires before the action begins
    • Post-Start hook fires after the action completes successfully
  • Stop
    • Pre-Stop hook fires before the action begins
    • Post-Stop hook fires after the action completes successfully

So back to the problem at hand…

We want some actions to take place automatically each time we refresh our virtual database (VDB).  As it turns out, we have two Transact-SQL stored procedures already coded to do the job…

  1. stored procedure MSDB.DBO.CAPTURE_ACCTS_PASSWDS
    • Saves all of the current accounts and account passwords to a set of tables in the MSDB system database
  2. stored procedure MSDB.DBO.REAPPLY_ACCTS_PASSWDS
    • Re-applies all of the current accounts and account passwords from the information previously stored in the MSDB system database

“@DatabaseName”, which is the name of the VDB, is the only parameter for these stored procedures.

I haven’t posted the T-SQL code for these stored procedures, partly because it will always be highly customized to its environment, but mostly because I am not proficient with T-SQL myself and would just be copying someone else’s code.

So looking at our list of Delphix hooks, it should be clear that we need to call the CAPTURE_ACCTS_PASSWDS stored procedure during the Pre-Refresh hook, and call the REAPPLY_ACCTS_PASSWDS stored procedure during the Post-Refresh hook.  Since hooks only call Powershell code and not T-SQL, here is some Powershell code we can use…

#======================================================================
# File:      callsp.ps1
# Type:      powershell script
# Author:    Delphix Professional Services
# Date:      02-Nov 2015
#
# Copyright and license:
#
#       Licensed under the Apache License, Version 2.0 (the "License");
#       you may not use this file except in compliance with the
#       License.
#
#       You may obtain a copy of the License at
#     
#               http://www.apache.org/licenses/LICENSE-2.0
#
#       Unless required by applicable law or agreed to in writing,
#       software distributed under the License is distributed on an
#       "AS IS" basis, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
#       either express or implied.
#
#       See the License for the specific language governing permissions
#       and limitations under the License.
#     
#       Copyright (c) 2015 by Delphix.  All rights reserved.
#
# Description:
#
#    Call the appropriate stored procedure within the DBO schema in
#    the MSDB database on behalf of the VDB.  The stored procedure
#    takes the name of the database as a parameter called
#    "@DatabaseName".
#
# Command-line parameters:
#
#    $fqSpName        fully-qualified stored procedure name
#
# Environment inputs expected:
#
#    VDB_DATABASE_NAME    SQL Server database name for the VDB
#    VDB_INSTANCE_NAME    SQL Server instance name for the VDB
#    VDB_INSTANCE_PORT    SQL Server instance port number for the VDB
#    VDB_INSTANCE_HOST    SQL Server instance hostname for the VDB
#
# Note:
#
# Modifications:
#    TGorman    02nov15    first version
#======================================================================
param([string]$fqSpName = "~~~")
#
#----------------------------------------------------------------------
# Verify the "$fqSpName" command-line parameter value...
#----------------------------------------------------------------------
 if ( $fqSpName -eq "~~~" ) {
     throw "Command-line parameter 'fqSpName' not found"
 } 
#
#----------------------------------------------------------------------
# Clean up a log file to capture future output from this script...
#----------------------------------------------------------------------
 $dirPath = [Environment]::GetFolderPath("Desktop")
 $timeStamp = Get-Date -UFormat "%Y%m%d_%H%M%S"
 $logFile = $dirPath + "\" + $env:VDB_DATABASE_NAME + "_" + $timeStamp + "_SP.LOG"
 "logFile is " + $logFile
#
#----------------------------------------------------------------------
# Output the variable names and values to the log file...
#----------------------------------------------------------------------
 "INFO: dirPath = '" + $dirPath + "'" | Out-File $logFile
 "INFO: fqSpName = '" + $fqSpName + "'" | Out-File $logFile -Append
 "INFO: env:VDB_INSTANCE_HOST = '" + $env:VDB_INSTANCE_HOST + "'" | Out-File $logFile -Append
 "INFO: env:VDB_INSTANCE_NAME = '" + $env:VDB_INSTANCE_NAME + "'" | Out-File $logFile -Append
 "INFO: env:VDB_INSTANCE_PORT = '" + $env:VDB_INSTANCE_PORT + "'" | Out-File $logFile -Append
 "INFO: env:VDB_DATABASE_NAME = '" + $env:VDB_DATABASE_NAME + "'" | Out-File $logFile -Append
#
#----------------------------------------------------------------------
# Housekeeping: remove any existing log files older than 15 days...
#----------------------------------------------------------------------
 "INFO: removing log files older than 15 days..." | Out-File $logFile -Append
 $ageLimit = (Get-Date).AddDays(-15)
 $logFilePattern = $env:VDB_DATABASE_NAME + "_*_SP.LOG"
 "INFO: logFilePattern = '" + $logFilePattern + "'" | Out-File $logFile -Append
 Get-ChildItem -Path $dirPath -recurse -include $logFilePattern |
     Where-Object { !$_.PSIsContainer -and $_.CreationTime -lt $ageLimit } |
     Remove-Item
#
#----------------------------------------------------------------------
# Run the stored procedure...
#----------------------------------------------------------------------
 "INFO: Running stored procedure '" + $fqSpName + "' within database '" +
     $env:VDB_DATABASE_NAME + "'..." | Out-File $logFile -Append
 try {
     "INFO: open SQL Server connection..." | Out-File $logFile -Append
     $sqlServer = $env:VDB_INSTANCE_HOST + "\" + $env:VDB_INSTANCE_NAME + ", " + $env:VDB_INSTANCE_PORT
     "INFO: sqlServer = '" + $sqlServer + "'" | Out-File $logFile -Append
#     [System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.Smo") | Out-Null;
     $conn = New-Object System.Data.SqlClient.SqlConnection
     $conn.ConnectionString = "Server=$sqlServer; Database=MSDB; Integrated Security=SSPI;"
     "INFO: conn.ConnectionString = '" + $conn.ConnectionString + "'" | Out-File $logFile -Append
     $conn.Open()
     $cmd1 = New-Object System.Data.SqlClient.SqlCommand($fqSpName, $conn)
     $cmd1.CommandType = [System.Data.CommandType]::StoredProcedure
     $cmd1.Parameters.AddWithValue('@DatabaseName', $env:VDB_DATABASE_NAME) | Out-Null
     "INFO: calling " + $fqSpName + ", @DatabaseName = " + $env:VDB_DATABASE_NAME | Out-File $logFile -Append
     $exec1 = $cmd1.ExecuteReader()
     $exec1.Close()
     $conn.Close()
 } catch {
     $Error[0].Exception.Message | Out-File $logFile -Append
     Throw $Error[0].Exception.Message
 }
 #
 "INFO: completed stored procedure '" + $fqSpName + "' within database '" +
     $env:VDB_DATABASE_NAME + "' successfully" | Out-File $logFile -Append
#
#----------------------------------------------------------------------
# Exit with success status...
#----------------------------------------------------------------------
exit 0

The formatting of the code is a little constrained by the narrow viewing area, so if you want the actual code correctly formatted, it can be downloaded from HERE.

Of course, this code does more than just call the stored procedure with the database name as the sole parameter;  it also validates the command-line parameters, creates and updates a log file, and handles possible error conditions.

Once this script is saved on the target Windows host server where the VDB resides, then we can call it from the Pre-Refresh and Post-Refresh hooks.

Here we see the Pre-Refresh hook being called with a call to the CALLSP.PS1 powershell script located within the C:\DELPHIX\SCRIPTS directory on the target Windows server, to call the CAPTURE_ACCTS_PASSWDS stored procedure…
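In text form, the hook body amounts to a one-line PowerShell invocation, something like the following sketch (the script path follows the directory mentioned above; the msdb.dbo schema qualification is an assumption based on the script’s description):

```powershell
# Pre-Refresh hook body (sketch): call the script with the fully-qualified
# stored procedure name.  The "msdb.dbo" qualification is an assumption.
& "C:\Delphix\Scripts\callsp.ps1" -fqSpName "msdb.dbo.CAPTURE_ACCTS_PASSWDS"
```

The Post-Refresh hook body would be identical except for passing the REAPPLY_ACCTS_PASSWDS procedure name instead.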

Likewise, we see here how the Post-Refresh hook is constructed to call the REAPPLY_ACCTS_PASSWDS stored procedure…

And finally, with both hooks created, we can see them in the Configuration panel together like this…

So now, whenever a refresh operation is performed on the VDB named VV11, these hooks will execute the PowerShell script, which will in turn execute the specified stored procedures, so that when the refresh operation is complete, the account and password settings which were present prior to the refresh will still be present.

I hope this is useful.


This post can also be found on the Delphix website HERE


If you have any questions about this topic, or any questions about data virtualization or data masking using Delphix, please feel free to contact me at “tim.gorman@delphix.com”.

SQL Saturday 588 – May 20 in NYC

I’m really excited about speaking at SQL Saturday #588 at the Microsoft office in Times Square, where I’ll be discussing Accelerating TDM Using Data Virtualization.  This is my favorite topic, because for the 20+ years that I worked as a database administrator (DBA), I ran into the same challenges time and again.

The challenges of data volume.

More than 20 years ago, a friend named Cary Millsap was asked to define the term very large database, because its acronym (VLDB) had become popular.  Knowing that numbers are relative and that they obsolesce rapidly, he refused to cite a number.  Instead, he replied, “A very large database is one that is too large for you to manage comfortably“, which of course states the real problem, as well as the solution.

The problem of data volume has, if anything, become more severe, and there is no sign that it will abate and allow storage and networking technology to catch up.

So it becomes necessary to cease doing things the way we’ve done them for the past several decades, and find a more clever way.

And that is why I work for the solution.  Delphix data virtualization addresses the root of the problem, providing a far faster, less expensive way to provision and refresh huge volumes of data.  The result is revolutionary for the beleaguered software testing industry.

Every mishap or delay in the software development life-cycle means time pressure on the last step:  testing.  Creating full-size and fully-functional copies of the environment to be tested is also time-consuming, putting more time pressure on testing to prevent the slippage of delivery dates.  The end result is untested or poorly tested systems in production, despite heroic efforts by under-appreciated testers.

What constrains the creation of test data is data volume, which is growing beyond the capability of most enterprises.  “Storage is cheap” is not merely a glib way to dismiss the problem without solving it; it is irrelevant and beside the point.

The real issue is time, because it takes a lot of time to push terabytes around from place to place.  It is just physics.  It is time which is most expensive and, by the way, storage really isn’t cheap, especially not in the quality that is demanded, and certainly not in the quantities in which it is being demanded.

Provisioning a full environment for each tester, one for each task, of each project, seems unrealistic when each environment might require terabytes of storage.  As a result, testers are limited to working in shared DEV or TEST environments which are refreshed only every few months, if ever.  Code quality suffers because testing is incomplete, and the pace of delivery fails to live up to business needs.

Server virtualization unleashed a surge of productivity in IT infrastructure over the past ten years.  But each virtual server still requires actual storage, making storage and data the constraint which holds everything back.

Data virtualization is the solution.

Come learn how to accelerate testing and how to accelerate the pace of innovation.

Come learn how to inspire more A-ha! moments by liberating QA and testing, and eliminating the constraint of data under which the software development life-cycle — from waterfall to agile — has labored over the past decades.

Liberate testing now!

SQL Saturday – 25-March 2017 – Colorado Springs CO

I’m really excited about speaking at SQL Saturday on my favorite topic!

SQLSaturday is a free training event for Microsoft Data Platform professionals and those wanting to learn about SQL Server, Business Intelligence and Analytics.  I’m all for that!

At this event, I’ll be discussing Accelerating DevOps and TDM Using Data Virtualization.  This is my favorite topic, because for the 20 years that I worked as a DBA, I ran into the same roadblock time and again.  The roadblock of data volume.

“DevOps” is a conflation of parts of the words “development” and “operations”, and it represents the grass-roots movement to merge application development, application testing, and IT operations into one continuous stream.  All tasks from coding to testing to operations must be automated so that new features and fixes can be delivered on a continual flow.

“TDM” is Test Data Management, the optimization of software quality testing to ensure that applications are built and maintained to operate according to their specifications.

What constrains both these goals is data volume, which is growing beyond the capability of most enterprises.  “Storage is cheap” is not merely a glib way to dismiss the problem without solving it; it is irrelevant and beside the point.

The real issue is time, because it takes a lot of time to push terabytes around from place to place; it is just physics.  It is time which is most expensive and, by the way, storage really isn’t cheap, especially not in the quality that is demanded, and certainly not in the quantities in which it is being demanded.

Provisioning a full environment for each developer or tester, one for each task, of each project, seems unrealistic when each environment might require terabytes of storage.  As a result, developers and testers are limited to working in shared DEV or TEST environments which are refreshed only every few months, if ever.  Code quality suffers, testing suffers, and the pace of delivery fails to live up to business needs.

Server virtualization unleashed a surge of productivity in IT infrastructure over the past ten years.  But each virtual server still requires actual storage, making storage and data the constraint which holds everything back.

Data virtualization is the solution.

Come learn how to accelerate development, how to accelerate testing, and how to accelerate the pace of innovation.

Come learn how to inspire more “a-ha!” moments by eliminating the constraint of data under which the software development lifecycle has labored over the past decades.

Singing The NoCOUG Blues

This is a re-post of an interview between myself and Iggy Fernandez, editor of the Journal of the Northern California Oracle Users Group (NoCOUG), Oracle ACE, OakTable member, blogger, and simply an amazing person.  The interview starts on page 4 of the August 2014 issue of the NoCOUG Journal, and demonstrates how a gifted interviewer can make someone being interviewed more interesting and coherent.

Singing The NoCOUG Blues

You are old, father Gorman (as the young man said) and your hair has become very white. You must have lots of stories. Tell us a story!

Well, in the first place, it is not my hair that is white. In point of fact, I’m as bald as a cue ball, and it is my skin that is pale from a youth misspent in data centers and fluorescent-lit office centers.

It is a mistake to think of wisdom as something that simply accumulates over time. Wisdom accumulates due to one’s passages through the world, and no wisdom accumulates if one remains stationary. It has been said that experience is what one receives soon after they need it, and experience includes both success and failure. So wisdom accumulates with experience, but it accumulates fastest as a result of failure.

About four years ago, or 26 years into my IT career, I dropped an index on a 60 TB table with 24,000 hourly partitions; the index was over 15 TB in size. It was the main table in that production application, of course.

Over a quarter-century of industry experience as a developer, production support, systems administrator, and database administrator: if that’s not enough time to have important lessons pounded into one’s head, then how much time is needed?

My supervisor at the time was amazing. After the shock of watching it all happen and still not quite believing it had happened, I called him at about 9:00 p.m. local time and told him what occurred. I finished speaking and waited for the axe to fall—for the entirely justified anger to crash down on my head. He was silent for about 3 seconds, and then said calmly, “Well, I guess we need to fix it.”

And that was it.

No anger, no recriminations, no humiliating micro-management. We launched straight into planning what needed to happen to fix it.

He got to work notifying the organization about what had happened, and I got started on the rebuild, which eventually took almost 2 weeks to complete.


Side note…

I didn’t explain how we rebuilt the index within the NoCOUG article, but I did so recently in a recent post to the ORACLE-L email forum…

Here is a high-level description of what worked to rebuild it…

1) Run “create partitioned index … unusable” to create the partitioned index with all partitions empty.

2) Create a shell-script to run NN SQL*Plus processes simultaneously, where “NN” is a number of your choice, each process doing the following…

alter index <index-name> rebuild partition <partition-name> parallel <degree> nologging compute statistics

We ordered the SQL*Plus calls inside the shell-script so that the most recent partitions (the table was partitioned by a DATE column) were populated first, and then let the builds progress back in time.  Depending on the application, you can continue some or all of the normal activities on the table during the rebuild.  Our assumption (which proved correct) was that all DML occurs against the newest partitions, so those were the partitions that needed to be made “usable” first.

This approach won’t eliminate downtime or performance problems, but it will likely minimize them.
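The shell-script generation described in step 2 can be sketched as follows; the index name, partition list, worker count, and parallel degree are all hypothetical placeholders, and each generated worker file would be fed to its own SQL*Plus session:

```shell
#!/bin/sh
# Sketch: distribute "alter index ... rebuild partition" statements
# round-robin across NN worker scripts, newest partitions first.
# All names below are hypothetical placeholders.
INDEX_NAME="my_big_index"
NN=2          # number of concurrent SQL*Plus sessions
DEGREE=4      # parallel degree for each rebuild
PARTITIONS="p2024 p2023 p2022 p2021"   # newest first

rm -f worker_*.sql
i=0
for p in $PARTITIONS; do
    w=$(( (i % NN) + 1 ))
    echo "alter index $INDEX_NAME rebuild partition $p parallel $DEGREE nologging;" \
        >> "worker_${w}.sql"
    i=$(( i + 1 ))
done

# Each worker file would then be run in its own session, e.g.:
#   nohup sqlplus / as sysdba @worker_1.sql &
```

Listing PARTITIONS newest-first is what lets the DML-active partitions become usable first.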


It truly happens to all of us. And anyone who pretends otherwise simply hasn’t been doing anything important.

How did I come to drop this index? Well, I wasn’t trying to drop it; it resulted from an accident. I was processing an approved change during an approved production outage. I was trying to disable a unique constraint that was supported by the index. I wanted to do this so that a system-maintenance package I had written could perform partition exchange operations (which were blocked by an enabled constraint) on the table. When I tested the disabling of the constraint in the development environment, I used the command ALTER TABLE … DISABLE CONSTRAINT and it indeed disabled the unique constraint without affecting the unique index. Then I tested the same operation again in the QA/Test environment successfully. But when it came time to do so in production, it dropped the index as well.

Surprise!

I later learned that the unique constraint and the supporting unique index had been created out of line in the development and QA/test environments. That is, first the table was created, then the unique index was created, and finally the table was altered to create the unique constraint on the already-existing unique index.

But in production, the unique constraint and the supporting unique index had been created in-line. When the table was created, the CREATE TABLE statement had the unique constraint within, along with the USING INDEX clause to create the unique index.

So when I altered the table in production, disabling the constraint also caused the index to be dropped.

After the mishap, I found the additional KEEP INDEX syntax, which could have been added to the end of the ALTER TABLE … DISABLE CONSTRAINT command; Oracle provides it precisely because of this difference in default behaviors.

But that was a discovery I experienced after I needed it.
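The difference can be illustrated with a small, hypothetical example (the table, index, and constraint names here are made up); the two creation styles look similar but behave differently on a plain DISABLE CONSTRAINT:

```sql
-- Out-of-line (as in dev and QA/test): the index exists independently
-- of the constraint, so disabling the constraint leaves it in place.
create table t1 (id number);
create unique index t1_uk_ix on t1 (id);
alter table t1 add constraint t1_uk unique (id) using index t1_uk_ix;
alter table t1 disable constraint t1_uk;              -- index survives

-- In-line (as in production): the index is created implicitly by the
-- constraint, so disabling the constraint drops it by default.
create table t2 (id number,
                 constraint t2_uk unique (id) using index);
alter table t2 disable constraint t2_uk keep index;   -- KEEP INDEX prevents
                                                      -- the index drop
```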

As to why my supervisor was so calm and matter-of-fact throughout this disaster, I was not surprised; he was always that way, it seemed. What I learned over beers long after this incident is that, in his early life, he learned the true meaning of the words “emergency” and “catastrophe.”

He was born in Afghanistan, and he was a young child during the 1980s after the Soviets invaded. His family decided to take refuge in Pakistan, so they sought the help of professional smugglers, similar to what we call “coyotes” on the Mexican-American border. These smugglers moved through the mountains bordering Afghanistan and Pakistan at night on foot, using camels to carry baggage and the very old, the sick and injured, and the very young.

My supervisor was about 9 years old at the time, so the smugglers put him on a camel so he would not slow them down. During the night, as they were crossing a ridge, they were spotted by the Soviets, who opened fire on them using machine guns with tracer bullets. Everyone in the caravan dove to the ground to take cover. Unfortunately, they all forgot about the 9-year-old boy on top of the 8-foot-high camel. My supervisor said he saw the bright tracer bullets arching up toward him from down below in the valley, passing over his head so close that he felt he could just reach up and grab them. He wanted to jump down, but he was so high off the ground he was terrified. Finally, someone realized that he was exposed and they pulled him down off the camel.

As he told this story, he laughed and commented that practically nothing he encountered in IT rose to the level of what he defined as an emergency. The worst that could happen was no more catastrophic than changing a tire on a car.

I’ve not yet been able to reach this level of serenity, but it is still something to which I aspire.

We love stories! Tell us another story!

A little over 10 years ago, I was working in downtown L.A. and arrived in the office early (5:00 a.m.) to start a batch job. I had a key card that got me into the building and into the office during the day, but I was unaware that at night they were locking doors in the elevator lobby. I banged on the doors and tried calling people, to no avail. Finally, after a half-hour, out of frustration, I grabbed one of the door handles and just yanked hard.

It popped open.

I looked at it in surprise, thought “sweet!”, walked in to the cubicle farm, sat down, and started my batch job. All was good.

Around 7:00 a.m., the LAPD showed up. There were about a dozen people in the office now, so the two officers began questioning folks nearest the door. From the opposite side of the room, I stood up, called out “Over here,” and ’fessed up.

They told me that if I hadn’t called them over immediately, they would have arrested me by the time they got to me.

Have a nice day, sir.

The NoCOUG Blues

NoCOUG membership and conference attendance have been declining for years. Are user groups still relevant in the age of Google? Do you see the same trends in other user groups? What are we doing wrong? What can we do to reverse the dismal trend? Give away free stuff like T-shirts and baseball caps? Bigger raffles? Better food?

Yes, the same trends are occurring in other users groups. IT organizations are lean and can’t spare people to go to training. The IT industry is trending older as more and more entry-level functions are sent offshore.

Users groups are about education. Education in general has changed over the past 20 years as online searches, blogs, and webinars have become readily available.

The key to users groups is the quality of educational content that is offered during live events as opposed to online events or written articles.

Although online events are convenient, we all know that we, as attendees, get less from them than we do from face-to-face live events. They’re better than nothing, but communities like NoCOUG have the ability to provide the face-to-face live events that are so effective.

One of the difficulties users groups face is fatigue. It is difficult to organize events month after month, quarter after quarter, year after year. There is a great deal of satisfaction in running such an organization, especially one with the long and rich history enjoyed by NoCOUG. But it is exhausting. Current volunteers have overriding work and life conflicts.

New volunteers are slow to come forward.

One thing to consider is reaching out to the larger national and international Oracle users groups, such as ODTUG, IOUG, OAUG, Quest, and OHUG. These groups have similar missions and most have outreach programs. ODTUG and IOUG in particular organize live onsite events in some cities, and have webinar programs as well. They have content, and NoCOUG has the membership and audience. NoCOUG members should encourage the board to contact these larger Oracle users groups for opportunities to partner locally.

Another growing trend is meet-ups, specifically through Meetup.com. This is a resource that has been embraced by all manner of tech-savvy people, from all points on the spectrum of the IT industry. I strongly urge all NoCOUG members to join Meetup.com, indicate your interests, and watch the flow of announcements arrive in your inbox. The meet-ups run the gamut from Hadoop to Android to Oracle Exadata to In-Memory to Big Data to Raspberry Pi to vintage Commodore. I think the future of local technical education lies in the online organization of local face-to-face interaction, which is exactly what Meetup.com facilitates.

Four conferences per year puts a huge burden on volunteers. There have been suggestions from multiple quarters that we organize just one big conference a year like some other user groups. That would involve changing our model from an annual membership fee of less than $100 for four single-day conferences (quarterly) to more than $300 for a single multiple-day conference (annual), but change is scary and success is not guaranteed. What are your thoughts on the quarterly vs. annual models?

I disagree with the idea that changing the conference format requires increasing annual dues. For example, RMOUG in Colorado (http://rmoug.org/) has one large annual conference with three smaller quarterly meetings, and annual dues are $75 and have been so for years. RMOUG uses the annual dues to pay for the three smaller quarterly education workshops (a.k.a. quarterly meetings) and the quarterly newsletter; the single large annual “Training Days” conference pays for itself with its own separate registration fees, which of course are discounted for members.

Think of a large annual event as a self-sufficient, self-sustaining organization within the organization, open to the public with a discount for dues-paying members.

Other Oracle users groups, such as UTOUG in Utah (http://utoug.org/), hold two large conferences annually (in March and November), and this is another way to distribute scarce volunteer resources. This offers a chance for experimentation as well, by hiring one conference-coordinator company to handle one event and another to handle the other, so that not all eggs are in one basket.

The primary goal of larger conferences is ongoing technical education of course, but a secondary goal is to raise funds for the continued existence of the users group and to help subsidize and augment the website, the smaller events, and the newsletter, if necessary.

It costs a fortune to produce and print the NoCOUG Journal, but we take a lot of pride in our unbroken 28-year history, in our tradition of original content, and in being one of the last printed publications by Oracle user groups. Needless to say it also takes a lot of effort. But is there enough value to show for the effort and the cost? We’ve been called a dinosaur. Should we follow the other dinosaurs into oblivion?

I don’t think so. There are all kinds of formats for publication, from tweets to LinkedIn posts to blogs to magazines to books. Magazines like the NoCOUG Journal are an important piece of the educational ecosystem. I don’t think that any of the Oracle users groups who no longer produce newsletters planned to end up this way. They ceased publishing because the organization could no longer sustain them.

I think today the hurdle is that newsletters can no longer be confined within the users group. Both NoCOUG and RMOUG have independently come to the realization that the newsletter must be searchable and findable online by the world, which provides the incentive for authors to submit content. Today, if it cannot be verified online, it isn’t real. If it isn’t real, then there is little incentive for authors to publish.

So making the NoCOUG Journal available online has been key to its own viability, and NoCOUG membership entitles one to a real hard-copy issue, which is a rare and precious bonus in this day and age.

Oracle Database 12c

Mogens Norgaard (the co-founder of the Oak Table Network) claims that “since Oracle 7.3, that fantastic database has had pretty much everything normal customers need,” but the rest of us are not confirmed Luddites. What are the must-have features of Oracle 12c that give customers the incentive to upgrade from 11g to 12c? We’ve heard about pluggable databases and the in-memory option, but they are extra-cost options, aren’t they?

I know for a fact that the Automatic Data Optimization (ADO) feature obsolesces about 3,000 lines of complex PL/SQL code that I had written for Oracle 8i, 9i, 10g, and 11g databases. The killer feature within ADO is the ability to move partitions online, without interrupting query operations. Prior to Oracle 12c, accomplishing that alone consumed hundreds of hours of code development, testing, debugging, and release management.

Combining ADO with existing features like OLTP compression and HCC compression truly makes transparent “tiers” of storage within an Oracle database feasible and practical. The ADO feature alone is worth the effort of upgrading to Oracle 12c for an organization with longer data retention requirements for historical analytics or regulatory compliance.

What’s not to love about pluggable databases? How different is the pluggable database architecture from the architecture of SQL Server, DB2, and MySQL?

I think that first, in trying to explain Oracle pluggable databases, most people make it seem more confusing than it should be.

Stop thinking of an Oracle database as consisting of software, a set of processes, and a set of database files.

Instead, think of a database server as consisting of an operating system (OS) and the Oracle 12c container database (CDB) software; a set of Oracle processes; and the basic control files, log files, and a minimal set of data files. When “gold images” of Oracle database servers are created, whether for jumpstart servers or for virtual machines, the Oracle 12c CDB should be considered part of that base operating system image.

Pluggable databases (PDBs) then are the data files installed along with the application software they support. PDBs are just tablespaces that plug into the working processes and infrastructure of the CDBs.
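As a concrete illustration of that plug-in model (the PDB name and file paths here are hypothetical), moving a PDB from one CDB to another in Oracle 12c looks roughly like this:

```sql
-- On the source CDB: close and unplug the PDB, producing an XML manifest.
alter pluggable database sales_pdb close immediate;
alter pluggable database sales_pdb unplug into '/u01/app/oracle/sales_pdb.xml';
drop pluggable database sales_pdb keep datafiles;

-- On the target CDB: plug it in, reusing the existing data files.
create pluggable database sales_pdb using '/u01/app/oracle/sales_pdb.xml'
    nocopy tempfile reuse;
alter pluggable database sales_pdb open;
```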

When PDBs are plugged in, all operational activities involving data protection—such as backups or redundancy like Data Guard replication—are performed at the higher CDB level.

Thus, all operational concerns are handled at the CDB level, separating the operational infrastructure from the PDBs and the applications they support.

Once the discussion is shifted to that high level, the similarities become more visible between the Oracle 12c database and other multitenant databases, such as SQL Server and MySQL. Of course there will always be syntactic and formatting differences, but functionally Oracle 12c has been heavily influenced by those earlier multitenant implementations.

Bonus Question
Do you have any career advice for the younger people reading this interview so that they can be like you some day? Other than actively participating in user groups!

This sounds corny and trite, but there is no such thing as a useless experience, and while it may be frustrating, it presents the opportunity to build. Understand that everyone starts at the bottom, and enjoy the climb.

Understand that learning causes stress. Stress is stress and too much can be unhealthy, but if it is a result of learning something new, then recognize it for what it is, know it is temporary and transitory, tough it out, and enjoy knowing the outcome when it arrives.

Also, don’t voice a complaint unless you are prepared to present at least one viable solution, if not several. Understand what makes each solution truly viable and what makes it infeasible. If you can’t present a solution to go with the complaint, then more introspection is needed. The term “introspection” is used deliberately, as it implies looking within rather than around.

Help people. Make an impact. Can we go wrong in pursuing either of those as goals? Sometimes I wish I had done more along these lines. Never do I wish I had done less.

Presenting “How Data Recovery Mechanisms Complicate Security, and What To Do About It” at #c16lv

Personally-identifiable information (PII) and classified information stored within an Oracle database is quite secure, as the capabilities of Oracle database ensure that it is accessible only by authenticated and authorized users.

But the definition of authentication and authorization can change, in a sense.

In a production application, authentication means verifying that one is a valid application user and authorization means giving that application user the proper privileges to access and manipulate data.

But now let’s clone the data from that production application to non-production systems (i.e. development, QA, testing, training, etc), so that we can develop new application functionality, fix existing functionality, test it, and deliver user training.

By doing so, in effect the community of valid users has changed, and now instead of being production application users, the community of valid users is developers and testers.  Or it is newly-hired employees being trained using “live” information from production.

And the developers and testers and trainees are indeed authenticated properly, and are thus authorized to access and manipulate this same data in their non-production systems.  This is the same information which can be used to defraud, steal identities, and cause all manner of mayhem, if one so chose.  This honor system is the way things have been secured in the information technology (IT) industry for decades.

Now of course, I can hear security officers from every point of the compass screaming and wildly gesticulating, “NO! That’s NOT true! The definition of authentication and authorization does NOT change, just because systems and data are copied from production to non-production!”  OK, then you explain what has been happening for the past 30 years?

In the case of classified data, these developers and testers go through a security clearance process to ensure that they can indeed be trusted with this information and that they will never misuse it.  Violating one’s security clearance in any respect is grounds for criminal prosecution, and for most people that is a scary enough prospect to wipe out any possible temptation.

Outside of government, organizations have simply relied on professionalism and trust to prevent this information from being used and abused.  And for the most part, for all these many decades, that has been effective.  I know that I have never even been tempted to share PII or classified data with which I’ve come in contact.  I’m not bragging; it is just part of the job.

Now, that doesn’t mean that I haven’t been tempted to look up my own information.  This is essentially the same as googling oneself, except here it is using non-public information.

I recall a discussion with the CFO of an energy company, who was upset that the Sarbanes-Oxley Act of 2002 held her and her CEO criminally liable for any information breaches within her company.  She snarled, “I’ll be damned if I’m going to jail because some programmer gets greedy.”  I felt this was an accurate analysis of the situation, though I had scant sympathy (“That’s why you’re paid the big bucks“).  Since that time we have all seen efforts to rely less on honor and trust, and more on securing data in non-production.  Because I think everyone realizes that the bucks just aren’t big enough for that kind of liability.

This has given rise to products and features which use encryption for data at rest and data in flight.  But as pointed out earlier, encryption is no defense against a change in the definition of authentication and authorization.  I mean, if you authenticate correctly and are authorized, then encryption is reversible obfuscation, right?

To deal with this reality, it is necessary to irreversibly obfuscate PII and classified data, a practice also known as data masking.  There are many vendors of data masking software, leading to a robust and growing industry.  If you irreversibly obfuscate PII and classified information within production data as it is cloned to non-production, then you have completely eliminated the risk.
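At its core, a masking job is usually just SQL run against the cloned, non-production schema.  Here is a minimal sketch; the table and column names are hypothetical, but the DBMS_RANDOM calls are from Oracle's standard supplied package:

```sql
-- Hypothetical masking job run against a non-production clone.
-- Table and column names are illustrative only.
UPDATE employees
SET    ssn       = LPAD(TRUNC(DBMS_RANDOM.VALUE(0, 1e9)), 9, '0'),
       last_name = DBMS_RANDOM.STRING('U', 10);
COMMIT;
```

Once this commits, the real values are gone from the current version of the table, which is exactly the point of irreversible obfuscation.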

After all, from a practical standpoint, it is extremely difficult as well as conceptually questionable to completely artificially generate life-like data from scratch.  It’s a whole lot easier to start with real, live production data, then just mask the PII and classified data out of it.

Problem solved!

Or is it?

Unfortunately, there is one more problem, and it has nothing to do with the concept of masking and irreversible obfuscation.  Instead, it has to do with the capabilities of the databases in which data is stored.

Most (if not all) of this PII and classified data is stored within databases, which have mechanisms built in for the purpose of data recovery in the event of a mishap or mistake.  In Oracle database, such mechanisms include LogMiner and Flashback Database.

Since data masking mechanisms use SQL within Oracle databases, data recovery mechanisms might be used to recover the PII and classified data that existed before the masking jobs were executed.  This is not a flaw in the masking mechanism; rather, it is the perversion of a database feature for an unintended purpose.
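To make the exposure concrete, consider Flashback Query.  The table and column names below are hypothetical, but the AS OF TIMESTAMP syntax is standard Oracle: as long as sufficient undo is retained, any authorized user can read straight past the masking job's COMMIT.

```sql
-- Suppose the masking UPDATE committed two hours ago.  Flashback Query
-- returns the row images as of the given timestamp, i.e. the original,
-- unmasked values (table/column names hypothetical):
SELECT ssn, last_name
FROM   employees
AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '2' HOUR);
```

No special hacking is required here; only the FLASHBACK and SELECT privileges and enough retained undo, which is why the slidedeck's mitigations matter.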

Ouch.

This topic became a 10-minute “lightning talk” presentation on “How Oracle Data Recovery Mechanisms Complicate Data Security, and What To Do About It” on Wednesday 13-April 2016 at the Collaborate 2016 conference in Las Vegas.  Please read the slidedeck for my notes on how to deal with the situation presented here.

You can download the slidedeck here [wpdm_package id=’660′].