Error Attempting to Bounce Clustered Host

Topics: Server Deployment
Mar 25, 2014 at 4:21 PM
I'm using BTDF 5.5.

When attempting to restart a clustered host the deployment fails while trying to start the inactive node:
        Target BounceBizTalk:
            Target BounceAllBizTalkHosts:
                Host list (BizTalkHosts ItemGroup) not customized.
                Querying NJES1S7334 to request active cluster node...
                Active cluster node reported by NJES1S7334 is NJES1S7334.
                Stopping host instance: BiDiClusteredHost_MSMQ on NJES1S7334
                Stopped host instance : BiDiClusteredHost_MSMQ on NJES1S7334
                Starting host instance: BiDiClusteredHost_MSMQ on NJES1S7334
                C:\Program Files (x86)\GCMS Interfaces\CIC.GCMS.Common.NC.Out.Collect\Deployment\Framework\BizTalkDeploymentFramework.targets(1181,5): error : The BizTalk Host instance "BiDiClusteredHost_MSMQ" on server "NJES1S7333" was not started. However the cluster which this is part of was brought online on node "NJES1S7334".\r
                C:\Program Files (x86)\GCMS Interfaces\CIC.GCMS.Common.NC.Out.Collect\Deployment\Framework\BizTalkDeploymentFramework.targets(1181,5): error :    at System.Runtime.InteropServices.Marshal.ThrowExceptionForHRInternal(Int32 errorCode, IntPtr errorInfo)\r
                C:\Program Files (x86)\GCMS Interfaces\CIC.GCMS.Common.NC.Out.Collect\Deployment\Framework\BizTalkDeploymentFramework.targets(1181,5): error :    at System.Runtime.InteropServices.Marshal.ThrowExceptionForHR(Int32 errorCode)\r
                C:\Program Files (x86)\GCMS Interfaces\CIC.GCMS.Common.NC.Out.Collect\Deployment\Framework\BizTalkDeploymentFramework.targets(1181,5): error :    at System.Management.ManagementObject.InvokeMethod(String methodName, ManagementBaseObject inParameters, InvokeMethodOptions options)\r
                C:\Program Files (x86)\GCMS Interfaces\CIC.GCMS.Common.NC.Out.Collect\Deployment\Framework\BizTalkDeploymentFramework.targets(1181,5): error :    at System.Management.ManagementObject.InvokeMethod(String methodName, Object[] args)\r
                C:\Program Files (x86)\GCMS Interfaces\CIC.GCMS.Common.NC.Out.Collect\Deployment\Framework\BizTalkDeploymentFramework.targets(1181,5): error :    at DeploymentFramework.BuildTasks.ControlBizTalkHostInstance.ControlHostInstances(String query)\r
                C:\Program Files (x86)\GCMS Interfaces\CIC.GCMS.Common.NC.Out.Collect\Deployment\Framework\BizTalkDeploymentFramework.targets(1181,5): error :    at DeploymentFramework.BuildTasks.ControlBizTalkHostInstance.Execute()
            Done building target "BounceAllBizTalkHosts" in project "Deployment.btdfproj" -- FAILED.
        Done building target "BounceBizTalk" in project "Deployment.btdfproj" -- FAILED.
Coordinator
Mar 25, 2014 at 5:18 PM
Edited Mar 25, 2014 at 5:20 PM
Does restarting BiDiClusteredHost_MSMQ work properly in BizTalk Server Admin? The log indicates that NJES1S7334 is the active node, and the host instance was stopped successfully on that node, but failed trying to restart on that node. The only time the passive node NJES1S7333 appears is in the error message from attempting the start on the active node 334. I don't know why it's even mentioning 333.

I'm not sure what to make of the error. It should be able to successfully stop and start the host instance on the active node, and it appears that that's what it was trying to do. I could only find one mention of this error message.

There's also a script on TechNet for starting host instances that is supposed to handle a cluster. It's only a start, not a restart, but you could tweak it and use it as another way to debug.

Thanks,
Tom
Mar 25, 2014 at 5:39 PM
Within BizTalk Server Admin I can stop/start/restart the active node (334) without any issues. However, when attempting to start the inactive node (333) I get the same error. It's not really an error though, just informing me that I was trying to start the wrong node and that it started the active node instead.

From the error during the deploy it seems to be the same thing (trying to restart the inactive node).
Coordinator
Mar 27, 2014 at 5:18 PM
An undocumented ClusterInstanceType value of 3... When I went through the BizTalk Admin MMC code, I found that their query explicitly filtered ClusterInstanceType on values of 0, 1 or 2 and thought it was odd. As a result, BizTalk Admin never includes the instances with a value of 3 in the UI. We didn't discover this early on because my original print-the-host-instances script only printed out the documented values and didn't have a catch-all Else.

I added the same filter into my query so the loop will never include the undocumented instances. Let's hope that solves it once and for all. I just uploaded a new build.

Thanks for your patience and persistence.
Tom
Mar 27, 2014 at 11:26 PM
Oh yes, it still works. I have it in my script; I just took it out to test what happens when attempting to stop/start the non-active clustered host instance.

Here's the full script:
Option Explicit
 
CONST ForReading = 1
' Host Instance status number
CONST HostInstServiceState_Stopped = 1
' is the host clustered?
CONST HostIsNotClustered = 0
CONST HostIsClustered = 1
CONST HostIsClusteredVirtual = 2
CONST HostIsClusteredManager = 3
 
Dim oFS: Set oFS = CreateObject("Scripting.FileSystemObject")
Dim thewholelist
Dim arrReplayList
Dim arrListRow
Dim sRow
Dim strGlobalhost

Call PrintHostInstances()

CheckForFile("HostInstancesList.txt")
' read in the list
thewholelist = oFS.OpenTextFile("HostInstancesList.txt", ForReading).ReadAll
arrReplayList = Split(thewholelist, vbCrLf)
' now we're going to do this in an inefficient way - for each record in the list,
' we query the management database and iterate through the whole list
' we could keep the collections around and just query once
' but this is a read-only operation and very low resource impact, so it's not a big deal
For Each sRow In arrReplayList
    If Len(sRow)>0 Then
        strGlobalhost=sRow
        Call HostInstanceStart(sRow)
        ' pause a second between each one
        WScript.Sleep(1000)
    End If
Next
 
WScript.Echo("Finished starting up host instances")
WScript.Quit 0
 
' Support functions, if you are adding/modifying please keep these generic/re-usable.
Sub CheckForFile(thefile)
    Dim ocheckFS: Set ocheckFS = CreateObject("Scripting.FileSystemObject")
 
    If (Not ocheckFs.FileExists(thefile)) Then
        WScript.Echo thefile + " is missing"
        WScript.Quit 1
    End If
End Sub

Sub PrintHostInstances()
    On Error Resume Next
    Dim objWMIService
    Dim colHostInstances
    Dim objHostInstance
  
    Set objWMIService = GetObject("winmgmts://./root/MicrosoftBizTalkServer")
    Set colHostInstances = objWMIService.ExecQuery("SELECT * FROM MSBTS_HostInstance WHERE HostType=1")
    If (colHostInstances.Count <= 0) Then
        Wscript.Echo "Cannot find any enabled hosts on this system"
    Else
        For Each objHostInstance in colHostInstances
            Wscript.Echo objHostInstance.HostName & " (" & objHostInstance.RunningServer & ") - Cluster Type = " & objHostInstance.ClusterInstanceType & "; Service State = " & objHostInstance.ServiceState
        Next
    End If

    Set objWMIService = Nothing
    Wscript.Echo ""
    Wscript.Echo "------------------------------------"
    Wscript.Echo ""
End Sub

Sub HostInstanceStart(HostName)
    On Error Resume Next
    Dim objWMIService
    Dim colHostInstances
    Dim objHostInstance
    Dim strActiveClusterNode
    Dim iStartThis
    Dim iFoundName
  
    ' get a WMI object to hook in to the management database
    Set objWMIService = GetObject("winmgmts://./root/MicrosoftBizTalkServer")
  
    ' query BizTalk host instances that are of type In-Process (within BizTalk Server installation) and enabled
    iFoundName = 0
    Set colHostInstances = objWMIService.ExecQuery("SELECT * FROM MSBTS_HostInstance WHERE HostType=1 AND ClusterInstanceType!=3")
 
    ' If any host instance is found check name
    If (colHostInstances.Count <= 0) Then
        Wscript.Echo "Cannot find any enabled hosts on this system"
    Else
        Wscript.Echo("Checking status of " + HostName + ":")

        For Each objHostInstance in colHostInstances
            ' check the name
            If Ucase(HostName) = Ucase(objHostInstance.HostName) Then
                iFoundName = 1
                iStartThis = 0
                'WScript.Echo "...cluster instance type = " & objHostInstance.ClusterInstanceType
                WScript.Echo "...on " & objHostInstance.RunningServer
                
                ' is it clustered?
                If (objHostInstance.ClusterInstanceType = HostIsNotClustered) Then
                    iStartThis = 1
                Else
                    ' is it on the active node?
                    strActiveClusterNode = GetActiveClusterNode()

                    If (Ucase(objHostInstance.RunningServer) <> Ucase(strActiveClusterNode)) Then
                        WScript.Echo "...this is not the active host instance in the cluster"
                    Else
                        If (objHostInstance.ClusterInstanceType = HostIsClusteredManager) Then
                            WScript.Echo "...this is the cluster manager node, not valid for restarting"
                        Else
                            iStartThis = 1
                        End If
                    End If
                End If
     
                If iStartThis = 1 Then
                    Wscript.Echo "...stopping """ & objHostInstance.HostName & """ on " & objHostInstance.RunningServer
                    objHostInstance.Stop
                    CheckWMIError
                    Wscript.Echo "...starting """ & objHostInstance.HostName & """ on " & objHostInstance.RunningServer
                    objHostInstance.Start
                    CheckWMIError
                End If
            End If
        Next
        
        If iFoundName = 0 Then
            Wscript.Echo "Cannot find any enabled hosts instance for host " + HostName + ". Check to see if host instance is disabled or name is spelled wrong."
        End If
    End If

    Set objWMIService = Nothing
End Sub
 
Function GetActiveClusterNode()
    On Error Resume Next
    Dim strnodename
    Dim strnodename1
    Dim strComputer
    Dim wshShell
    Dim computername
    Dim arrComputers
    Dim objWMIService
    Dim Count, count1
    Dim colItems,objItem
    Dim strcomputername
  
    Set wshShell = WScript.CreateObject("WScript.Shell")
    strComputerName = wshShell.ExpandEnvironmentStrings("%COMPUTERNAME%")
 
    computername = Trim(strcomputername)
    Count = Len(strcomputername)
    Count1 = Len(strcomputername)
    Count = Count + 1
    Const wbemFlagReturnImmediately = &h10
    Const wbemFlagForwardOnly = &h20
    arrComputers = Array("localhost")

    For Each strComputer In arrComputers
        Set objWMIService = GetObject("winmgmts:\\" & strComputer & "\root\MSCluster")
        Set colItems = objWMIService.ExecQuery("SELECT * FROM MSCluster_NodeToActiveGroup", "WQL", _
                                                wbemFlagReturnImmediately + wbemFlagForwardOnly)
        For Each objItem In colItems
            strnodename = right(objItem.GroupComponent, Count)
            strnodename1 = left(strnodename, Count1)

            If Len(strnodename1) > 0 Then
                'WScript.Echo "...active cluster node is " & strnodename1
                GetActiveClusterNode = strnodename1
            End If
        Next
    Next

    'WScript.Echo "...active cluster node is " + strnodename1
    Set objWMIService = Nothing
    Set wshShell = Nothing
End Function
 
Sub CheckWMIError()
    On Error Resume Next
    
    If Err <> 0 Then
        Dim strErrDesc: strErrDesc = Err.Description
        Dim ErrNum: ErrNum = Err.Number
        Dim WMIError: Set WMIError = CreateObject("WbemScripting.SwbemLastError")
        Dim FinalMessage
 
        If (TypeName(WMIError) = "Empty" ) Then
            FinalMessage = strErrDesc & " (HRESULT: " & Hex(ErrNum) & ")."
        Else
            FinalMessage = WMIError.Description & " (HRESULT: " & Hex(ErrNum) & ")."
        End If
        
        Set WMIError = nothing
        WScript.Echo FinalMessage
        ' if you want to quit on err: wscript.quit 0
    End If
End Sub
Here are the results:
Microsoft (R) Windows Script Host Version 5.8
Copyright (C) Microsoft Corporation. All rights reserved.

BiDiClusteredHost_MSMQ (NJES1S7334) - Cluster Type = 3; Service State = 4
BiDiClusteredHost_MSMQ (NJES1S7333) - Cluster Type = 1; Service State = 1
BiDiClusteredHost_MSMQ (NJES1S7334) - Cluster Type = 2; Service State = 4
ReceivingClusteredHost (NJES1S7334) - Cluster Type = 3; Service State = 4
ReceivingClusteredHost (NJES1S7333) - Cluster Type = 1; Service State = 1
ReceivingClusteredHost (NJES1S7334) - Cluster Type = 2; Service State = 4
ReceivingHost (NJES1S7333) - Cluster Type = 0; Service State = 4
ReceivingHost (NJES1S7334) - Cluster Type = 0; Service State = 4
SendingHost (NJES1S7333) - Cluster Type = 0; Service State = 4
SendingHost (NJES1S7334) - Cluster Type = 0; Service State = 4
SqlHost (NJES1S7333) - Cluster Type = 0; Service State = 4
SqlHost (NJES1S7334) - Cluster Type = 0; Service State = 4
BiDiClusteredHost_FTP (NJES1S7334) - Cluster Type = 3; Service State = 4
BiDiClusteredHost_FTP (NJES1S7333) - Cluster Type = 1; Service State = 1
BiDiClusteredHost_FTP (NJES1S7334) - Cluster Type = 2; Service State = 4
IsolatedHost (NJES1S7333) - Cluster Type = 0; Service State = 4
IsolatedHost (NJES1S7334) - Cluster Type = 0; Service State = 4
ProcessingHost (NJES1S7333) - Cluster Type = 0; Service State = 4
ProcessingHost (NJES1S7334) - Cluster Type = 0; Service State = 4
SendingHost_SMTP (NJES1S7333) - Cluster Type = 0; Service State = 4
SendingHost_SMTP (NJES1S7334) - Cluster Type = 0; Service State = 4
TrackingHost (NJES1S7333) - Cluster Type = 0; Service State = 4
TrackingHost (NJES1S7334) - Cluster Type = 0; Service State = 4

------------------------------------

Checking status of BiDiClusteredHost_MSMQ:
...on NJES1S7333
...this is not the active host instance in the cluster
...on NJES1S7334
...stopping "BiDiClusteredHost_MSMQ" on NJES1S7334
...starting "BiDiClusteredHost_MSMQ" on NJES1S7334
Finished starting up host instances
Mar 28, 2014 at 12:50 PM
Just tried your latest patch (5.5.105.0) and it succeeded.

Here are the results:
BounceAllBizTalkHosts:
  Host list (BizTalkHosts ItemGroup) not customized.
  Host instance BiDiClusteredHost_MSMQ has cluster type Clustered.
  Querying localhost to request active cluster node...
  Active cluster node reported by localhost is NJES1S7334.
  Skipping passive clustered host instance: BiDiClusteredHost_MSMQ on NJES1S7333
  Host instance BiDiClusteredHost_MSMQ has cluster type ClusteredVirtual.
  Stopping host instance: BiDiClusteredHost_MSMQ on NJES1S7334
  Stopped host instance : BiDiClusteredHost_MSMQ on NJES1S7334
  Starting host instance: BiDiClusteredHost_MSMQ on NJES1S7334
  Started host instance : BiDiClusteredHost_MSMQ on NJES1S7334
  Host instance ReceivingClusteredHost has cluster type Clustered.
  Skipping passive clustered host instance: ReceivingClusteredHost on NJES1S7333
  Host instance ReceivingClusteredHost has cluster type ClusteredVirtual.
  Stopping host instance: ReceivingClusteredHost on NJES1S7334
  Stopped host instance : ReceivingClusteredHost on NJES1S7334
  Starting host instance: ReceivingClusteredHost on NJES1S7334
  Started host instance : ReceivingClusteredHost on NJES1S7334
  Stopping host instance: ReceivingHost on NJES1S7333
  Stopped host instance : ReceivingHost on NJES1S7333
  Starting host instance: ReceivingHost on NJES1S7333
  Started host instance : ReceivingHost on NJES1S7333
  Stopping host instance: ReceivingHost on NJES1S7334
  Stopped host instance : ReceivingHost on NJES1S7334
  Starting host instance: ReceivingHost on NJES1S7334
  Started host instance : ReceivingHost on NJES1S7334
  Stopping host instance: SendingHost on NJES1S7333
  Stopped host instance : SendingHost on NJES1S7333
  Starting host instance: SendingHost on NJES1S7333
  Started host instance : SendingHost on NJES1S7333
  Stopping host instance: SendingHost on NJES1S7334
  Stopped host instance : SendingHost on NJES1S7334
  Starting host instance: SendingHost on NJES1S7334
  Started host instance : SendingHost on NJES1S7334
  Stopping host instance: SqlHost on NJES1S7333
  Stopped host instance : SqlHost on NJES1S7333
  Starting host instance: SqlHost on NJES1S7333
  Started host instance : SqlHost on NJES1S7333
  Stopping host instance: SqlHost on NJES1S7334
  Stopped host instance : SqlHost on NJES1S7334
  Starting host instance: SqlHost on NJES1S7334
  Started host instance : SqlHost on NJES1S7334
  Host instance BiDiClusteredHost_FTP has cluster type Clustered.
  Skipping passive clustered host instance: BiDiClusteredHost_FTP on NJES1S7333
  Host instance BiDiClusteredHost_FTP has cluster type ClusteredVirtual.
  Stopping host instance: BiDiClusteredHost_FTP on NJES1S7334
  Stopped host instance : BiDiClusteredHost_FTP on NJES1S7334
  Starting host instance: BiDiClusteredHost_FTP on NJES1S7334
  Started host instance : BiDiClusteredHost_FTP on NJES1S7334
  Stopping host instance: IsolatedHost on NJES1S7333
  Stopped host instance : IsolatedHost on NJES1S7333
  Starting host instance: IsolatedHost on NJES1S7333
  Started host instance : IsolatedHost on NJES1S7333
  Stopping host instance: IsolatedHost on NJES1S7334
  Stopped host instance : IsolatedHost on NJES1S7334
  Starting host instance: IsolatedHost on NJES1S7334
  Started host instance : IsolatedHost on NJES1S7334
  Stopping host instance: ProcessingHost on NJES1S7333
  Stopped host instance : ProcessingHost on NJES1S7333
  Starting host instance: ProcessingHost on NJES1S7333
  Started host instance : ProcessingHost on NJES1S7333
  Stopping host instance: ProcessingHost on NJES1S7334
  Stopped host instance : ProcessingHost on NJES1S7334
  Starting host instance: ProcessingHost on NJES1S7334
  Started host instance : ProcessingHost on NJES1S7334
  Stopping host instance: SendingHost_SMTP on NJES1S7333
  Stopped host instance : SendingHost_SMTP on NJES1S7333
  Starting host instance: SendingHost_SMTP on NJES1S7333
  Started host instance : SendingHost_SMTP on NJES1S7333
  Stopping host instance: SendingHost_SMTP on NJES1S7334
  Stopped host instance : SendingHost_SMTP on NJES1S7334
  Starting host instance: SendingHost_SMTP on NJES1S7334
  Started host instance : SendingHost_SMTP on NJES1S7334
  Stopping host instance: TrackingHost on NJES1S7333
  Stopped host instance : TrackingHost on NJES1S7333
  Starting host instance: TrackingHost on NJES1S7333
  Started host instance : TrackingHost on NJES1S7333
  Stopping host instance: TrackingHost on NJES1S7334
  Stopped host instance : TrackingHost on NJES1S7334
  Starting host instance: TrackingHost on NJES1S7334
  Started host instance : TrackingHost on NJES1S7334
Done Building Project "C:\Program Files (x86)\GCMS Interfaces\CIC.GCMS.Common.NC.Out.Collect\Deployment\Deployment.btdfproj" (Deploy target(s)).


Build succeeded.
Coordinator
Mar 28, 2014 at 2:35 PM
Edited Mar 28, 2014 at 3:21 PM
I was beginning to wonder if we'd ever get to the end of this one. Thanks so much for your help!!

One thing to consider: you're restarting all hosts including TrackingHost which really isn't necessary. My guess is you could exclude others like SendingHost_SMTP. If you add an ItemGroup with a BizTalkHosts element you can list only those hosts that need a restart based on this application.
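
As a rough sketch of that suggestion, an ItemGroup like the following could be added to Deployment.btdfproj. The host names are taken from the log above, but which of them this application actually uses is an assumption, so substitute your own list.

```xml
<!-- Sketch only: restart just the hosts this application runs in.
     The three names below are an assumed selection from the hosts in the log. -->
<ItemGroup>
  <BizTalkHosts Include="ProcessingHost" />
  <BizTalkHosts Include="ReceivingClusteredHost" />
  <BizTalkHosts Include="SendingHost" />
</ItemGroup>
```

With a customized list like this, hosts that are not named (TrackingHost, for example) should be left running during the deploy.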

The output is clearly quite verbose for the full restart. Would you rather see it as just "Restarting host instance: ..." and "Restarted host instance: ..." or is it fine as-is?

This issue is fixed in v6.0 (not yet available) and a patch is available for v5.5 on the Downloads page.

Thanks!
Tom
Marked as answer by tfabraham on 3/28/2014 at 8:21 AM
Mar 28, 2014 at 2:47 PM
Thank you so much for your help getting to the bottom of this. It's greatly appreciated!

Good point about restarting all hosts. I was just trying to get the initial deploy to succeed, so I avoided customizing things too much. I may go back and do that now.

The restart output is a little much, but up to you if you want to clean it up or not.

No problems with cleaning up the thread, do what you need to help others.


I have a couple of other "minor" issues I've encountered with BTDF, but I'll start new threads for those...


Thanks again!
Mar 28, 2014 at 3:10 PM
tfabraham wrote:
One thing to consider: you're restarting all hosts including TrackingHost which really isn't necessary. My guess is you could exclude others like SendingHost_SMTP. If you add an ItemGroup with a BizTalkHosts element you can list only those hosts that need a restart based on this application.
BTW, just attempted this and the btdfproj does not support token replacement from the environment settings.

For example:
  <ItemGroup>
    <BizTalkHosts Include="${ProcessHost}" />
    <BizTalkHosts Include="${ReceiveHostMSMQ}" />
    <BizTalkHosts Include="${ReceiveHostFILE}" />
    <BizTalkHosts Include="${SendHostSQL}" />
  </ItemGroup>
Does not work.

The reason we would want to use tokens is that the host names can change from environment to environment. Not a big deal, just pointing it out.
Coordinator
Mar 28, 2014 at 3:18 PM
Yeah, that's due to the order of evaluation in MSBuild. MSBuild has already evaluated the ItemGroup before the properties are pulled in from the spreadsheet.
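
To illustrate the evaluation order in plain MSBuild terms, here is a generic standalone sketch. It is not BTDF's code and not the workaround mentioned below. The names HostName, StaticHosts, DynamicHosts, LoadSettings and Show are hypothetical, and the LoadSettings target only stands in for the step that reads the settings spreadsheet. It also uses standard MSBuild `$(...)` property syntax.

```xml
<Project DefaultTargets="Show" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

  <!-- Project-level items are evaluated when the project is loaded.
       HostName has no value yet, so this expands to nothing. -->
  <ItemGroup>
    <StaticHosts Include="$(HostName)" />
  </ItemGroup>

  <!-- Hypothetical stand-in for loading settings at build time. -->
  <Target Name="LoadSettings">
    <PropertyGroup>
      <HostName>ProcessingHost</HostName>
    </PropertyGroup>
  </Target>

  <Target Name="Show" DependsOnTargets="LoadSettings">
    <!-- An ItemGroup inside a target is evaluated when the target runs,
         after LoadSettings has set the property. -->
    <ItemGroup>
      <DynamicHosts Include="$(HostName)" />
    </ItemGroup>
    <Message Text="Static hosts: @(StaticHosts)" />
    <Message Text="Dynamic hosts: @(DynamicHosts)" />
  </Target>

</Project>
```

Running this file with msbuild should show an empty static list and a populated dynamic list, which is the same effect as a project-level BizTalkHosts ItemGroup referring to values that are only loaded later in the build.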

In the next release v6.0, your example will "just work" as-is. In the meantime, it's more of a pain. Here's the workaround.

Thanks,
Tom
Mar 28, 2014 at 3:32 PM
Good to know.

Thanks again, Tom. Keep up the great work!