Tag Archives: [kccpb_sanity_check_2]

ASM normal redundancy and failure groups spanning SANs

Julian Dyke has started an interesting thread on the Oak Table mailing list after the latest UKOUG RAC and HA SIG. Unfortunately I couldn’t attend that event, I wish I had, and I knew it would be great.

Anyway, the question revolved around an ASM disk group created with normal redundancy spanning two storage arrays. This should in theory protect against the failure of an array, although at a high price. All ASM disks exported from an array would be 1 failure group. Remember that disks in a failure group all fail if the supporting infrastructure (network, HBA, controller etc) fails. So what would happen with such a setup, if you followed these steps:

  • Shutdown the array for failure group 2
  • Stop the database
  • Shutdown the second array – failure group 1
  • Do some more maintenance…
  • Startup failgroup B SAN
  • Start the database
  • Startup failgroup A SAN

ASM can tolerate the failure of one failgroup (capacity permitting), so the failure of failgroup 2 should not bring the disk group down, which would result in immediate loss of service. But what happens if it comes up after the data in the other failure group has been modified? Will there be data corruption?

Continue reading