A cache holdup battery is nearing its expiration date and needs to be replaced.
What Caused the Problem?
A battery is nearing the end of its useful life. The Recovery Guru Details area provides specific information you will need as you follow the recovery steps.
Caution: Electrostatic charges can damage sensitive components. Always use proper antistatic protection when handling components. Touching components without using a proper ground may damage the equipment.
Caution: Risk of Data Loss. If you remove a battery while it is in a Battery Nearing Expiration state and write caching is enabled on a volume, you risk losing cached data if power to the storage array fails. If power fails before the storage array can detect the removed battery and write all cached data to the disks, the cached data is lost. Writing all cached data typically takes about 2 minutes, with a maximum of 10 minutes.
Important Notes
Contact your Technical Support Representative if you do not have a replacement battery available.
The purpose of the battery is to preserve cached data in the event of a power failure.
Refer to the Smart battery field in the Details area to determine if this battery is SBD (Smart Battery Data)-capable.
If the field is no, then write caching will be automatically suspended when the battery reaches its expiration date (if previously enabled). Write caching will be reinstated (if applicable for each volume) once you replace the battery and the replacement battery is charged to a sufficient level to support cached data in the event of a power failure.
If the field is yes, then write caching will not be affected when the battery reaches its expiration date provided that it can still hold sufficient charge to preserve cached data in the event of a power failure. If the battery can NOT hold sufficient charge to preserve cached data, then 1) write caching will be disabled and 2) a "Battery Replacement Required" problem will be reported in the Recovery Guru Summary area.
To see how many days remain until replacement is required, select the Hardware > Tray > Change > Battery Settings menu option, and then select the battery that is listed in the Details area.
The battery replacement procedure varies depending on the type of storage array you have. Some batteries are directly accessible, while others are inside the controller, which requires you to remove the controller before servicing the battery. Consult the appropriate hardware manual if you need details on locating and replacing the battery.
A "Battery Nearing Expiration" problem may still occur if you do not reset the battery age after you replace the battery. Make sure you reset the battery age for the replacement battery.
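The Smart battery notes above amount to a small decision rule. The Python sketch below restates that rule for illustration only. The function name `write_caching_at_expiration`, its parameters, and the returned labels are hypothetical and are not part of the storage management software.

```python
def write_caching_at_expiration(smart_battery: bool, holds_sufficient_charge: bool) -> tuple:
    """Restate the documented write-caching behavior at battery expiration.

    Returns (write_caching_state, additional_problem). All names and labels
    are assumptions made for this sketch. additional_problem is None when
    the notes do not mention an additional Recovery Guru problem.
    """
    if not smart_battery:
        # Smart battery field is "no": write caching is suspended automatically
        # (if it was enabled) until a charged replacement battery is installed.
        return ("suspended", None)
    if holds_sufficient_charge:
        # Smart battery field is "yes" and the battery can still preserve
        # cached data: write caching is not affected.
        return ("unaffected", None)
    # Smart battery that can no longer hold sufficient charge.
    return ("disabled", "Battery Replacement Required")
```

For example, `write_caching_at_expiration(True, False)` returns the "disabled" state together with the "Battery Replacement Required" problem named in the notes.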
You can resolve this problem in the following ways:
Replace the battery now and assume the risk of losing cached data as outlined in the Risk of Data Loss caution. As long as you have the replacement battery readily accessible, the replacement procedure should take less than a minute to complete.
Ensure that write cached data is preserved before replacing the battery by disabling write caching for all volumes in the storage array.
Wait until the battery reaches its expiration date before you replace it. If you choose this option, you do not need to continue with the recovery steps, but the storage array will 1) remain in a Needs Attention condition, 2) continue to display this problem in the Summary area, and 3) continue to log a MEL event every 24 hours.
To ensure a complete configuration restore (both disk pool and traditional volume group), it is highly recommended that storage array configuration data is backed up prior to executing this procedure. This is especially important for simplex storage arrays, and controllers that operate without the use of batteries. To save your configuration, open either the Command Line Interface (CLI), or the Script Editor from the Enterprise Management Window (EMW), and execute the following command:
save storageArray dbmDatabase sourceLocation=onboard controller[a] contentType=all file="hostfile.zip";
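If the backup command needs to be generated for more than one array or output file, a small helper can assemble it. The sketch below only builds the command string shown above. The function name `build_save_dbm_command` is hypothetical, and accepting controller `b` is an assumption, since the original command shows only `controller[a]`. Running the command through the CLI or Script Editor is left to the operator.

```python
def build_save_dbm_command(controller: str = "a", file_name: str = "hostfile.zip") -> str:
    """Assemble the configuration backup command as a string.

    Assumption: controllers are labelled "a" or "b". The source document
    only shows controller[a].
    """
    if controller not in ("a", "b"):
        raise ValueError(f"unexpected controller label: {controller!r}")
    return (
        "save storageArray dbmDatabase sourceLocation=onboard "
        f'controller[{controller}] contentType=all file="{file_name}";'
    )
```

Called with its defaults, the helper reproduces the command given in the text.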
Recovery Steps
Refer to the Component requiring service field in the Details area to determine which recovery steps you need to complete.
If... | Then... |
The component requiring service is the battery | Go to Recovery Steps for a Directly-Accessible Battery. |
The component requiring service is the controller that contains the battery | Go to Recovery Steps for a Battery Inside a Controller. |
Recovery Steps for a Directly-Accessible Battery
1 | |
2 | Remove the affected battery. Note: The Service Action Allowed status in the Details area is always NO for this problem because the component is not yet expired or failed. In this situation, it is acceptable to remove the component even though the Service Action Allowed is NO. |
3 | Insert the new battery securely into place. |
4 | Record the installation date (today's date) and the new replacement date (according to the battery's warranty). |
5 | Select the Hardware > Tray > Change > Battery Settings menu option. |
6 | Select the battery you just replaced, and then click Reset to set the affected battery's age to zero. Note: |
7 | |
8 | Click the Recheck button to rerun the Recovery Guru. The failure should no longer appear in the Summary area. If the failure appears again, contact your Technical Support Representative. |
Recovery Steps for a Battery Inside a Controller
If... | Then... |
Your storage array has one controller | Go to Procedure for Storage Arrays with One Controller. |
Your storage array has two controllers | Go to Procedure for Storage Arrays with Two Controllers. |
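The two If/Then tables above can be read as a single routing rule. The following Python sketch is one way to express it. The function name `select_procedure` and the input labels `"battery"` and `"controller"` are hypothetical stand-ins for the value shown in the Component requiring service field.

```python
def select_procedure(component_requiring_service: str, controller_count: int = 0) -> str:
    """Return the name of the recovery procedure to follow.

    component_requiring_service: "battery" or "controller" (labels assumed
    for this sketch). controller_count only matters when the battery sits
    inside a controller.
    """
    if component_requiring_service == "battery":
        return "Recovery Steps for a Directly-Accessible Battery"
    if component_requiring_service == "controller":
        if controller_count == 1:
            return "Procedure for Storage Arrays with One Controller"
        if controller_count == 2:
            return "Procedure for Storage Arrays with Two Controllers"
        raise ValueError("expected a storage array with one or two controllers")
    raise ValueError(f"unknown component: {component_requiring_service!r}")
```

Inputs outside the documented cases raise an error rather than guessing a procedure.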
Procedure for Storage Arrays with One Controller
1 | Stop all I/O from all hosts to this storage array. When the Cache Active LED on the controller is no longer active (up to 5 minutes), proceed to step 2. Caution: Risk of Data Loss. You must wait for the Cache Active LED to stop blinking to ensure that all cache has been written to the drives in the storage array. |
2 | Click the Save As button in the Recovery Guru dialog to save the remaining steps to a file. These steps may no longer be accessible from the Recovery Guru dialog after you complete step 3. |
3 | Remove the controller canister that contains the affected battery. |
4 | Replace the affected battery with a new replacement battery. Refer to your hardware documentation for the battery replacement procedure. |
5 | Insert the controller canister (containing the new battery) securely into place. After the controller appears on the Hardware tab of the AMW, go to step 6. |
6 | Record the installation date (today's date) and the new replacement date (according to the battery's warranty). |
7 | Select the Hardware > Tray > Change > Battery Settings menu option. |
8 | Select the battery you just replaced, and then click Reset to set the affected battery's age to zero. Note: |
9 | Click the Recheck button to rerun the Recovery Guru. The failure should no longer appear in the Summary area. If the failure appears again, contact your Technical Support Representative. |
Procedure for Storage Arrays with Two Controllers
1 | If any hosts connected to this storage array are NOT running a host-based, multi-path failover driver, stop I/O to the storage array from each of these hosts. |
2 | Place the affected controller offline. |
3 | Click the Save As button in the Recovery Guru dialog to save the remaining steps to a file. These steps may no longer be accessible from the Recovery Guru dialog after you complete step 4. |
4 | Click the Recheck button to rerun the Recovery Guru. There should be an "Offline Controller" problem reported in the Summary area. |
5 | Follow the "Offline Controller" recovery steps until you have removed the controller. After you have removed the controller, do not continue with the "Offline Controller" recovery steps until you are instructed to do so later in this procedure. |
6 | Replace the affected battery with a new replacement battery. Refer to your hardware documentation for the battery replacement procedure. |
7 | Complete the remaining "Offline Controller" recovery steps, then go to step 8. |
8 | Record the installation date (today's date) and the new replacement date (according to the battery's warranty). |
9 | Select the Hardware > Tray > Change > Battery Settings menu option. |
10 | Select the battery you just replaced, and then click Reset to set the affected battery's age to zero. Note: |
11 | Click the Recheck button to rerun the Recovery Guru. The failure should no longer appear in the Summary area. If the failure appears again, contact your Technical Support Representative. |
Target | NetAppESeries.StorageArray |
Parent Monitor | NetAppESeries.StorageArrayAvailability |
Category | Custom |
Enabled | True |
Alert Generate | True |
Alert Severity | Error |
Alert Priority | Normal |
Alert Auto Resolve | True |
Monitor Type | NetAppESeries.FailureUnitMonitorType |
Remotable | True |
Accessibility | Internal |
Alert Message | |
RunAs | Default |
Comment | Machine generated entity |
<UnitMonitor ID="NetAppESeries.FailureID_0005_Monitor" Accessibility="Internal" Enabled="true" Target="NetAppESeries.StorageArray" ParentMonitorID="NetAppESeries.StorageArrayAvailability" Remotable="true" Priority="Normal" TypeID="NetAppESeries.FailureUnitMonitorType" ConfirmDelivery="true" Comment="Machine generated entity">
  <Category>Custom</Category>
  <AlertSettings AlertMessage="NetAppESeries.REC_BATTERY_NEAR_EXPIRATION_AlertMessageResourceID">
    <AlertOnState>Error</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>Error</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Data/Context/Property[@Name='FailureDescription']$</AlertParameter1>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="NetAppESeries.StateIdA5B983F5DF6C038EA2AAAD7B23F41E5E" MonitorTypeStateID="NoIssue" HealthState="Success"/>
    <OperationalState ID="NetAppESeries.StateIdF0C0D0BF118883E5B3428CB756C3D4E0" MonitorTypeStateID="IssueFound" HealthState="Error"/>
  </OperationalStates>
  <Configuration>
    <FailureID>5</FailureID>
    <IntervalSeconds>59</IntervalSeconds>
    <TimeoutSeconds>300</TimeoutSeconds>
    <Trace>0</Trace>
  </Configuration>
</UnitMonitor>
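To cross-check the monitor definition above against the property table, the fragment can be read with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch that assumes the fragment is available as a standalone string without XML namespaces. The function name `summarize_monitor` and the shortened copy of the XML are illustrative only.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the monitor definition, kept only for this example.
MONITOR_XML = """
<UnitMonitor ID="NetAppESeries.FailureID_0005_Monitor" Enabled="true" Target="NetAppESeries.StorageArray">
  <AlertSettings AlertMessage="NetAppESeries.REC_BATTERY_NEAR_EXPIRATION_AlertMessageResourceID">
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>Error</AlertSeverity>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="NetAppESeries.StateIdA5B983F5DF6C038EA2AAAD7B23F41E5E" MonitorTypeStateID="NoIssue" HealthState="Success"/>
    <OperationalState ID="NetAppESeries.StateIdF0C0D0BF118883E5B3428CB756C3D4E0" MonitorTypeStateID="IssueFound" HealthState="Error"/>
  </OperationalStates>
  <Configuration>
    <FailureID>5</FailureID>
    <IntervalSeconds>59</IntervalSeconds>
    <TimeoutSeconds>300</TimeoutSeconds>
    <Trace>0</Trace>
  </Configuration>
</UnitMonitor>
"""


def summarize_monitor(xml_text: str) -> dict:
    """Extract the monitor ID, alert severity, state mapping, and configuration."""
    root = ET.fromstring(xml_text.strip())
    # Every child of <Configuration> in this fragment holds an integer value.
    config = {child.tag: int(child.text) for child in root.find("Configuration")}
    # Map each monitor-type state to the health state it produces.
    states = {
        state.get("MonitorTypeStateID"): state.get("HealthState")
        for state in root.iter("OperationalState")
    }
    return {
        "id": root.get("ID"),
        "severity": root.findtext("AlertSettings/AlertSeverity"),
        "auto_resolve": root.findtext("AlertSettings/AutoResolve") == "true",
        "states": states,
        "config": config,
    }


summary = summarize_monitor(MONITOR_XML)
```

The resulting dictionary exposes the same values as the property table (severity, auto-resolve) alongside the polling configuration, which makes mismatches between the two easy to spot.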