Failed to change NetApp volume to data-protection (volume busy)

This relationship had been intentionally broken for some testing on the destination volume, and when a resync was issued, it failed with a volume-busy error.

                            Healthy: false
                   Unhealthy Reason: Scheduled update failed to start. (Destination volume must be a data-protection volume.)
           Constituent Relationship: false
            Destination Volume Node: 
                    Relationship ID: aa9b0b54-64d9-11e5-be3f-00a0984ad3aa
               Current Operation ID: 1bed480d-1554-11e7-aa85-00a098a230de
                      Transfer Type: resync
                     Transfer Error: -
                   Current Throttle: 103079214
          Current Transfer Priority: normal
                 Last Transfer Type: resync
                Last Transfer Error: Failed to change the volume to data-protection. (Volume busy)

To check the snapshots on the volume for busy status and dependency:
snapshot show -vserver 'vserver_name' -volume 'volume_name' -fields busy,owners  
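The check above can also be scripted. Below is a minimal sketch that picks out busy snapshots and their owners from the command's tabular output; note the sample text is an assumed, illustrative shape of the `-fields` output, not a verbatim ONTAP capture.

```python
# Sketch: flag busy snapshots from `snapshot show -fields busy,owners` output.
# The sample below is an assumed/illustrative layout, not verbatim ONTAP output.

def busy_snapshots(cli_output):
    """Return (snapshot, owners) tuples for rows where busy is true."""
    hits = []
    for line in cli_output.strip().splitlines():
        parts = line.split()
        # Skip the header row and the dashed separator row.
        if not parts or parts[0] == "vserver" or parts[0].startswith("-"):
            continue
        if len(parts) >= 4 and parts[3].lower() == "true":
            owners = parts[4] if len(parts) > 4 else "-"
            hits.append((parts[2], owners))
    return hits

sample = """\
vserver  volume  snapshot                 busy   owners
-------- ------- ------------------------ -----  ------
svm1     vol1    daily.2017-03-30_0010    false  -
svm1     vol1    snapmirror.aa9b0b54      true   dump
"""
```

A busy snapshot owned by `dump` points at an NDMP backup holding the volume, which is exactly what turned out to be the problem here.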

In this case, a running NDMP backup session was preventing the resync.

To list NDMP backup sessions:
system services ndmp status  

The system services ndmp status command lists all NDMP sessions in the cluster, along with basic details about each active session.

To list details for an NDMP backup session:
system services ndmp status -node 'node_name' -session-id 'session-id'  

From here you can confirm which NDMP session you need to kill by checking the ‘Data Path’ field, which should be the path to the volume failing the resync.
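That matching step is simple enough to sketch in code. The session fields below are illustrative stand-ins (not the exact CLI output keys), assuming each session's data path is compared against the affected volume's path:

```python
# Sketch: pick the NDMP session(s) whose Data Path matches the volume
# failing the resync. Field names here are illustrative, not ONTAP's.

def find_sessions_to_kill(sessions, volume_path):
    """Return session IDs whose data path falls under the given volume path."""
    return [s["session_id"] for s in sessions
            if s.get("data_path", "").startswith(volume_path)]

sessions = [
    {"session_id": "2000", "data_path": "/vserver1/othervol"},
    {"session_id": "2001", "data_path": "/vserver1/busyvol"},
]
find_sessions_to_kill(sessions, "/vserver1/busyvol")  # -> ["2001"]
```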

To kill an NDMP backup session:
system services ndmp kill 'session-id' -node 'node_name'  

The system services ndmp kill command is used to terminate a specific NDMP session on a particular node in the cluster. This command is not supported on Infinite Volumes.

After clearing the busy-snapshot dependency, I was able to issue the resync successfully as per normal operations.

VMware SRM Error: “Failed to create snapshots of replica devices”

Today I encountered the VMware SRM error: “Failed to create snapshots of replica device. Cause: SRA command ‘testFailoverStart’ failed. Storage port not found. Either Storage port information provided in NFS list is incorrect else Verify the ‘isPv4’ option in ontap_config file matches the ipaddress in NFS field.”

I found the solution in KB article 000016026:

The cause turned out to be the firewall-policy of the SVM data LIFs. These were set to “mgmt”, and according to the KB article, LIFs with that policy are not detected by the SRA.

To change the firewall-policy from “mgmt” to “data”:
net int modify -vserver [vserver_name] -lif [data_lif_name] -firewall-policy data  
To list LIFs by firewall-policy:
net int show -fields firewall-policy  
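To find every LIF still set to the “mgmt” policy programmatically, a small parse of that listing works. The sample rows below are illustrative with an assumed column layout, not a verbatim ONTAP capture:

```python
# Sketch: from `net int show -fields firewall-policy` output, list LIFs
# still set to the "mgmt" policy. Assumed columns: vserver, lif, policy.

def lifs_with_policy(cli_output, policy="mgmt"):
    """Return (vserver, lif) pairs whose firewall-policy matches."""
    hits = []
    for line in cli_output.strip().splitlines():
        parts = line.split()
        # Header and separator rows never match the policy name, but skip
        # them explicitly for clarity.
        if len(parts) != 3 or parts[0] == "vserver" or parts[0].startswith("-"):
            continue
        if parts[2] == policy:
            hits.append((parts[0], parts[1]))
    return hits

sample = """\
vserver  lif        firewall-policy
-------- ---------- ---------------
svm1     data_lif1  mgmt
svm1     data_lif2  data
"""
```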

The article also advises checking the ontap_config file on the SRM server to ensure that the NFS IP address on the controller is correct and the IP address format mentioned in the NFS address field matches the value set for the isipv4 option in the ontap_config file.

By default, the configuration file is located at install_dir\Program Files\VMware\VMware vCenter Site Recovery Manager\storage\sra\ONTAP\ontap_config.txt. You’ll look for the “isPv4” option.

NetApp OCUM Error: “Cluster cannot be deleted when discovery is in progress.”

I opened a case recently because I had a cluster in OnCommand Unified Manager that was no longer polling. When I tried to delete and re-add it, I received the error “Cluster cannot be deleted when discovery is in progress.” I am running version 7.0.

Turns out I had hit a Burt. The Burt number is 1053008, and you can view it by logging into the support site with your credentials.

The fix is here:

The link doesn’t seem to be working right now, but I copied the details below.

Note: for vApps you will need to use the diag shell; instructions can be found here:

• Error unable to discover cluster, Cluster already exists.
• When a cluster is added to the OnCommand Unified Manager (UM) Dashboard, this event gets logged:

Failed to add cluster An internal error has occurred. Contact technical support. Details: Cannot update server (com.netapp.oci.server.UpdateTaskException [68-132-983])

When the user attempts to remove the cluster, it fails, indicating that it is being acquired.
Navigating to Health, Settings, Manager datasources shows that the datasource is failing.
• For UM, the ocumserver-debug log may contain:
2016-12-19 08:06:14,583 DEBUG [oncommand] [reconcile-0] [c.n.dfm.collector.OcieJmsListener] OCIE JMS notification message received: {DatasourceName=Unknown, DatasourceID=-1, ClusterId=3387647, ChangeType=ADDED, UpdateTime=1482152623430, MessageType=CHANGE}
• When a cluster is added to the UM Dashboard this message may be displayed indicating that the issue is in the OnCommand Performance Manager (OPM) database:
Cluster in a MetroCluster configuration is added only to Unified Manager. Cluster add failed for Performance Manager.

Note: The MetroCluster part of the message is not relevant but is included as that full message is possible.
• For OPM, the ocfserver-debug log may contain:
2016-12-19 09:15:00,013 ERROR [system] [taskScheduler-5] [o.s.s.s.TaskUtils$LoggingErrorHandler] Unexpected error occurred in scheduled task.
com.netapp.ocf.collector.OcieException: com.onaro.sanscreen.acquisition.sessions.AcquisitionUnitException [35]
Failed to getById id:-1
Caused by: com.onaro.sanscreen.acquisition.sessions.AcquisitionUnitException: Failed to getById id:-1

This is under investigation in Documented Issue 1053008.

Because the cluster is successfully removed from the datasource tables but not the inventory tables, when the cluster is re-added there is a disconnect between these two tables. Attempts to re-add the cluster to inventory and performance fail due to duplicate entries tied to the old objects in the database, as the values are not unique.
1. Shutdown the OPM host.
2. Shutdown the UM host.
3. Take a VMware snapshot or other backup per your company policy.
4. Boot UM
5. When the UM WebUI is accessible, boot OPM.
6. Check MySQL to determine which hosts have the invalid datasource ID.
For vApps
Use KB 000030068 to get to the diag shell.
diag@OnCommand:~# sudo mysql -e "select datasourceId, name, managementIp from netapp_model.cluster where datasourceId = -1;"
| datasourceId | name| managementIp |
| -1 | clusterName | |


A. Open a Windows Command Prompt window
B. Browse to the MySQL\MySQL Server 5.6\bin directory
EXAMPLE: > cd "C:\Program Files\MySQL\MySQL Server 5.6\bin"
Then authenticate MySQL to access the database: > mysql -u <username> -p
C. When you press [ENTER] the system will prompt you to enter the user’s password.
a. NOTE: The user and password were created when MySQL was first installed on the Windows host. There is no NetApp default user that can be used to authenticate MySQL.
mysql> select datasourceId, name, managementIp from netapp_model.cluster where datasourceId = -1;
| datasourceId | name| managementIp |
| -1 | clusterName | |

  1. Download the attached script appropriate for the version of UM and OPM.
    For vApps,
    A. Use an application such as FileZilla or WinSCP to upload the script to the /upload directory on the vApp.
    B. Use KB 000030068 to get to the diag shell.
    C. Add the execute attribute to the script.
    a. Syntax: # sudo chmod +x /jail/upload/<script_name>
    For RHEL
    A. Be sure to have sudo or root access to the host.
    B. Move the script to /var/logs/ocum/
    C. Add the execute attribute to the script.
    a. Syntax: # sudo chmod +x /var/logs/ocum/<script_name>
    For Windows (UM only)
    A. Browse to the directory where MySQL is installed: MySQL\MySQL Server 5.6\bin
    a. EXAMPLE: C:\Program Files\MySQL\MySQL Server 5.6\bin
    B. Save a copy of the script in the ‘MySQL Server 5.6\bin’ directory: BURT1053008_um_70_Windows

  2. Execute the script.
    A. For vApps: sudo /jail/upload/<script_name>
    B. For RHEL: sudo /var/logs/ocum/<script_name>
    C. For Windows: from the MySQL\MySQL Server 5.6\bin directory, run mysql.exe -u <username> -p < <script_name>

  3. Confirm that the datasources with ID -1 are gone:
    vApps and RHEL: diag@OnCommand:~# sudo mysql -e "select datasourceId, name, managementIp from netapp_model.cluster where datasourceId = -1;"
    A. For Windows: authenticate MySQL as per the steps outlined above, then run:
    select datasourceId, name, managementIp from netapp_model.cluster where datasourceId = -1;
    B. Type exit to exit MySQL.
  4. Reboot the host
  5. Once UM is back up, perform this same process with OPM.
  6. Once UM and OPM have been corrected, perform a discovery of the cluster and verify that it is showing up within the WebUI.
  7. If the same failure occurs, please contact NetApp Technical Support for further assistance.
After I ran the scripts provided in the Burt, on both the OnCommand Unified Manager and OnCommand Performance Manager servers, the cluster no longer showed up in inventory and I could add it successfully.
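The heart of the KB procedure above is one SQL predicate: find cluster rows whose datasourceId is -1. The sketch below demonstrates that query logic against an in-memory SQLite stand-in; the table and column names come from the KB's netapp_model.cluster query, but the database engine and sample rows are swapped in for illustration only.

```python
import sqlite3

# Demonstrate the KB's orphan-check query against an in-memory SQLite
# stand-in for MySQL's netapp_model.cluster table (schema simplified).
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE cluster (datasourceId INTEGER, name TEXT, managementIp TEXT)"
)
con.executemany("INSERT INTO cluster VALUES (?, ?, ?)", [
    (12, "healthy-cluster", "10.0.0.5"),
    (-1, "orphaned-cluster", ""),  # the inventory/datasource disconnect
])

# Same SELECT as in the KB, just run against the stand-in table.
orphans = con.execute(
    "SELECT datasourceId, name, managementIp FROM cluster "
    "WHERE datasourceId = -1"
).fetchall()
print(orphans)  # -> [(-1, 'orphaned-cluster', '')]
```

The attached Burt scripts then delete those orphaned rows so the cluster can be re-added cleanly.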

NetApp ACP status showing “Partial Connectivity” after IOM6 firmware upgrade

Had an issue this week on a pair of FAS8080s running 8.2.2P1 Cluster-Mode. During firmware upgrade for IOM6 module from version 0208 to 0209, I ran into ACP (Alternate Control Path) Connectivity Status showing “Partial Connectivity”.

I followed the recommended action plan:
1. Disable the ACP feature in ONTAP:
>options acp.enabled off
2. Reseat the IOM module with the unresponsive ACP processor.
3. Reenable the ACP feature:
>options acp.enabled on

That, however, did not resolve the issue, and we had to replace the module to correct it. I did not disable ACP prior to replacement.

After replacement, ACP shows status “Additional Connectivity”, then “Full Connectivity” with module status “inactive (upgrading firmware)”. After the firmware upgrade from 02.08 to 02.09, the module reboots, reports 02.09 firmware with status “inactive (initializing)”, and concludes with status “active”.

I didn’t expect hardware replacement to resolve this, since no ACP issue had been reported prior to the IOM6 firmware upgrade, but that’s what fixed it.

Enabling tracking quotas in NetApp Cluster-Mode

Tracking quotas are like regular quotas but with no limits enforced. They enable you to generate disk and file capacity reports, and when used in conjunction with enforced quotas they are helpful because you can resize quota values without having to reinitialize quotas (turning them off and back on to activate changes).

I recently used tracking quotas on volumes dedicated to user home directories to automate a chargeback report of user directory folder sizes with the Data ONTAP PowerShell Toolkit. But more on that later. First, we need to get tracking quotas enabled.

We begin by creating a quota policy:

::> quota policy create -vserver vserver_name -policy-name quotatrackingpolicy

Create tracking quota rule(s) (this can be qtrees or volumes, I prefer using volumes):

::> quota policy rule create -vserver vserver_name -policy-name quotatrackingpolicy -volume uservol1 -type tree -target "" 

::> quota policy rule create -vserver vserver_name -policy-name quotatrackingpolicy -volume uservol2 -type tree -target "" 

::> quota policy rule create -vserver vserver_name -policy-name quotatrackingpolicy -volume uservol3 -type tree -target "" 

Configure the vserver to use the policy you created:

::> vserver modify -vserver vserver_name -quota-policy quotatrackingpolicy

Enable the quotas on the volume(s):

::> quota modify -vserver vserver_name -volume uservol1 -state on
[Job 4992] Job is queued: "quota on" performed for quota policy "quotatrackingpolicy" on volume "uservol1" in Vserver "vserver_name".

::> quota modify -vserver vserver_name -volume uservol2 -state on
[Job 4993] Job is queued: "quota on" performed for quota policy "quotatrackingpolicy" on volume "uservol2" in Vserver "vserver_name".

::> quota modify -vserver vserver_name -volume uservol3 -state on
[Job 4994] Job is queued: "quota on" performed for quota policy "quotatrackingpolicy" on volume "uservol3" in Vserver "vserver_name".

Test volume quota report:

::> quota report -volume uservol1 -vserver vserver_name
Vserver: vserver_name  
                                    ----Disk----  ----Files-----   Quota
Volume   Tree      Type    ID        Used  Limit    Used   Limit   Specifier  
-------  --------  ------  -------  -----  -----  ------  ------   ---------
uservol1            user    *                        0B        -        0      -   *
uservol1            user    BUILTIN\Administrators   78.74GB   -   163733      -
uservol1            user    root                     0B        -        2      -
uservol1            user    ADDOMAIN\user1           495.3MB   -    13087      -   *
uservol1            user    ADDOMAIN\user2           3.88GB    -    49889      -   *
uservol1            user    ADDOMAIN\user3           38.03MB   -      301      -   *
uservol1            user    ADDOMAIN\user4           3.33GB    -     9079      -   *
uservol1            user    ADDOMAIN\user5           3.18GB    -    37629      -   *
uservol1            user    ADDOMAIN\user6           612.0MB   -     4815      -   *
uservol1            user    ADDOMAIN\user7           83.76MB   -      989      -   *
uservol1            user    ADDOMAIN\user8           260.4MB   -     5378      -   *
11 entries were displayed.  
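The chargeback idea mentioned earlier boils down to rolling report rows like these up into a per-user total. A minimal sketch, with hand-written example rows standing in for data you would pull from `quota report` or the Data ONTAP PowerShell Toolkit:

```python
# Sketch: roll tracking-quota report rows up into a per-user usage summary.
# The rows below are hand-written examples, not real report data.

def chargeback(rows):
    """rows: iterable of (volume, user, used_bytes) -> {user: total_bytes}."""
    totals = {}
    for _volume, user, used in rows:
        totals[user] = totals.get(user, 0) + used
    return totals

rows = [
    ("uservol1", "ADDOMAIN\\user1", 519_372_800),    # ~495.3MB
    ("uservol1", "ADDOMAIN\\user2", 4_166_118_604),  # ~3.88GB
    ("uservol2", "ADDOMAIN\\user1", 1_073_741_824),  # 1GB on another volume
]
chargeback(rows)["ADDOMAIN\\user1"]  # -> 1_593_114_624
```

Summing across volumes per user is the design choice here: a home directory spread over several uservol volumes still bills as one line per user.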

For more information, visit here:
And here: