markw.dev: blog


runspaces explained

by mark wilkinson, 2020-02-06

estimated read time: 8-10 minutes

tag(s): powershell, runspaces, programming

PowerShell runspaces are a great, if often confusing, feature of PowerShell. If you need to get a lot of work done fast, and have capacity to do lots of work in parallel, runspaces can help you out.

In this series of posts on runspaces I hope to give you the information you need to understand, use, and troubleshoot runspaces more effectively.

simple runspaces example

Below is the typical code you might see when reading about working with runspaces:

$RunspacePool = [RunspaceFactory]::CreateRunspacePool(1, 5)
$RunspacePool.Open()

$ScriptBlock = { Get-Random }

$Runspaces = @()
(1..10) | ForEach-Object {
    $Runspace = [powershell]::Create().AddScript($ScriptBlock)
    $Runspace.RunspacePool = $RunspacePool
    $Runspaces += New-Object PSObject -Property @{
        Runspace = $Runspace
        State = $Runspace.BeginInvoke()
    }
}

while ( $Runspaces.State.IsCompleted -contains $False) { Start-Sleep -Milliseconds 10 }

$Results = @()

$Runspaces | ForEach-Object {
    $Results += $_.Runspace.EndInvoke($_.State)
}

This code executes a code block ten times (just returning a random number), allowing 5 executions to run at a time via a runspace pool. It will run just fine, and in many cases you can probably just copy and paste it into your script and be good to go. But that would be a pretty lame way to end a blog post, so let's dig a little deeper and see what all of this code does.

but first, pedantics

So I have a problem with some of the posts I've read about runspaces. It all comes down to a small detail that I think makes a big difference in your understanding of them.

$Runspace = [powershell]::Create()

This code looks innocent. What does it do? You'd probably think it's creating a new runspace, but it's not. This code is instead creating a fresh instance of PowerShell. If you run this code and run Get-Runspace you'll see there is still just one listed, the one attached to your current session. So what is this instance we just created?

A PowerShell instance handles almost everything about executing PowerShell code, except executing the actual commands. A PowerShell instance is a "wrapper" of sorts that abstracts a lot of the functionality related to the runspace that is doing all the work. The PowerShell instance handles creating the command pipeline (think of it like a queue of commands to run) that the runspace will use, and also handles adding commands to it. A quick example script can show how you might do this manually without directly using the instance:

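$PowerShell = [powershell]::Create()
# Reach past the instance and build a pipeline on its runspace by hand
$Pipeline = $PowerShell.Runspace.CreatePipeline()
# Commands.Add() expects a command name (or Command object), not a scriptblock
$Pipeline.Commands.Add('Get-Variable')
$Pipeline.Invoke()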

A new PowerShell instance comes with a default runspace (created the first time it is needed), and in this script we are using that runspace directly to do some work. This approach is pretty verbose and can get really complicated, so instead we typically use the PowerShell instance itself to do this work:

$PowerShell = [powershell]::Create().AddScript({Get-Variable})
$PowerShell.Invoke()
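As a further illustration of the instance building the pipeline for you, here is a minimal sketch that adds a command and a parameter separately instead of passing a whole script. It relies on the AddCommand() and AddParameter() methods of the PowerShell class; the -Maximum value of 10 is an arbitrary choice for the example and is not part of the original script.

```powershell
# Build the pipeline one piece at a time: a command name, then a parameter.
# (The Maximum value of 10 is an arbitrary, illustrative choice.)
$PowerShell = [powershell]::Create()
$null = $PowerShell.AddCommand('Get-Random').AddParameter('Maximum', 10)

# Invoke() runs synchronously on the instance's default runspace
# and returns a collection of output objects.
$Result = $PowerShell.Invoke()

# Release the instance (and its runspace) when finished.
$PowerShell.Dispose()

$Result
```

Either way, the instance is the thing assembling the pipeline; the runspace only runs it.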

The distinction between instances and runspaces isn't important for simple examples, but as we get deeper in future posts it will make it easier to understand more complex examples. Now that we have that out of the way we can dive into the example in a little more depth.

example explained

Starting with the first two lines in our example we are creating a runspace pool, and then opening it.

$RunspacePool = [RunspaceFactory]::CreateRunspacePool(1, 5)
$RunspacePool.Open()

A runspace pool is a mechanism to control the number of active runspaces executing at any given time. Think of it as a simple concurrency limiter. A runspace pool is attached to any number of PowerShell instances, and in turn those instances communicate with the pool to ensure only a certain number of runspaces execute code at a time. In this case we are creating a pool with a minimum of 1 executing runspace, and a maximum of 5. If we attempt to execute more they will wait for slots to become available as other runspaces complete their work.
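To make the "concurrency limiter" idea concrete, here is a rough sketch that times four sleeping scriptblocks against a pool capped at two runspaces. The pool size, the number of jobs, and the 300 ms sleep are all made-up values for the demonstration. Because only two jobs can run at once, the four jobs need at least two rounds, so the elapsed time should land at roughly 600 ms or more rather than around 300 ms.

```powershell
# Hypothetical sizes for the demonstration: at most 2 runspaces, 4 jobs.
$Pool = [RunspaceFactory]::CreateRunspacePool(1, 2)
$Pool.Open()
$MaxRunspaces = $Pool.GetMaxRunspaces()

$Timer = [System.Diagnostics.Stopwatch]::StartNew()

# Queue all four jobs right away; the pool decides when each one may start.
$Jobs = foreach ($i in 1..4) {
    $Instance = [powershell]::Create().AddScript({ Start-Sleep -Milliseconds 300 })
    $Instance.RunspacePool = $Pool
    [PSCustomObject]@{ Instance = $Instance; State = $Instance.BeginInvoke() }
}

# EndInvoke() blocks until the given job has finished.
foreach ($Job in $Jobs) {
    $null = $Job.Instance.EndInvoke($Job.State)
    $Job.Instance.Dispose()
}

$Timer.Stop()
$ElapsedMs = $Timer.ElapsedMilliseconds

$Pool.Close()
$Pool.Dispose()

"Pool maximum: $MaxRunspaces, elapsed: $ElapsedMs ms"
```

Raising the pool maximum to 4 should bring the elapsed time back down toward a single 300 ms round.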

Next we are creating an array to hold our instances as they are created (named $Instances here rather than $Runspaces, in keeping with the distinction made above), and then entering a loop using the PowerShell range operator (1..10). The range operator is a quick way to generate an iterable array of a given size in PowerShell. In this case the operator generates an array with 10 elements in it, the integers from 1 to 10, which means the code within the loop will be executed 10 times:

$Instances = @()
(1..10) | ForEach-Object {
    $Instance = [powershell]::Create().AddScript({Get-Random})
    $Instance.RunspacePool = $RunspacePool
    $Instances += New-Object PSObject -Property @{
        Instance = $Instance
        State = $Instance.BeginInvoke()
    }
}

Within the loop we:

Create a new instance, and add a scriptblock to it: {Get-Random}
Bind the instance to the runspace pool we created
Add the new instance to our $Instances array (it's more complicated than this, but we'll discuss it more below)

The code we are using to add the instance to the array is a little odd:

$Instances += New-Object PSObject -Property @{
    Instance = $Instance
    State = $Instance.BeginInvoke()
}

Here we are creating a custom object, PSCustomObject, with two properties:

Instance - This is the new PowerShell instance we created
State - This is the output of BeginInvoke()

When we call BeginInvoke() we are telling the instance to execute its scriptblock asynchronously and return an object we can use to determine the state of that execution. The object returned is an AsyncResult object. It has an IsCompleted property that tells us whether the script is complete, and it also stores the final results of the execution when it completes.
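A small sketch of that object's life cycle, using a single instance without a pool. The 200 ms sleep and the 'done' string are placeholders chosen for this example. The first IsCompleted check will usually (though not necessarily) still be $False, because the scriptblock is sleeping on another thread.

```powershell
# Placeholder workload: sleep briefly, then emit a string.
$Instance = [powershell]::Create().AddScript({ Start-Sleep -Milliseconds 200; 'done' })

# BeginInvoke() returns immediately with an object implementing IAsyncResult.
$State = $Instance.BeginInvoke()
$IsAsyncResult = $State -is [System.IAsyncResult]

# Checked right away, so the script is most likely still running.
$CompletedEarly = $State.IsCompleted

# EndInvoke() waits for completion (if needed) and hands back the output.
$Output = $Instance.EndInvoke($State)
$CompletedLate = $State.IsCompleted

$Instance.Dispose()

"Early: $CompletedEarly, late: $CompletedLate, output: $Output"
```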

Now back to the rest of the script. After all of our instances have been added to the array we enter a while loop and use the object we got back from BeginInvoke() to wait until all of our instances have finished executing:

while ( $Instances.State.IsCompleted -contains $False) { Start-Sleep -Milliseconds 10 }

Specifically we are generating an array of IsCompleted properties for all of the instances we created, and then seeing if that array contains $False, which would indicate something is still running.
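The expression works because of member enumeration (PowerShell 3.0 and later): asking an array for a property returns that property from every element. The sketch below uses hand-made stand-in objects instead of real instances, so the property values are fixed and the behavior is easy to see.

```powershell
# Stand-ins for the objects in $Instances; only the property shape matters here.
$Items = @(
    [PSCustomObject]@{ State = [PSCustomObject]@{ IsCompleted = $true } }
    [PSCustomObject]@{ State = [PSCustomObject]@{ IsCompleted = $false } }
)

# Member enumeration: one IsCompleted value per element of $Items.
$Flags = $Items.State.IsCompleted

# True as long as at least one element reports $false.
$StillRunning = $Flags -contains $false

"Flags: $Flags, still running: $StillRunning"
```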

Many examples omit the sleep statement in the while loop. This can lead to lots of extra CPU usage. When you omit the sleep you enter a tight loop where the computer will check the completion state of your instances as fast as it can. Adding the sleep slows this process down and can reduce CPU consumption considerably. In simple tests I have seen scripts go from consuming 25% CPU during this loop to not really consuming any noticeable amount at all, just by adding this sleep statement.
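If you would rather not poll at all, one alternative (not used in the original example) is to block on the AsyncWaitHandle that every IAsyncResult exposes. The sketch below waits on each handle in turn; the three jobs and the 100 ms sleep are arbitrary values for illustration.

```powershell
$Pool = [RunspaceFactory]::CreateRunspacePool(1, 5)
$Pool.Open()

# Arbitrary workload: three short jobs.
$Instances = foreach ($i in 1..3) {
    $Instance = [powershell]::Create().AddScript({ Start-Sleep -Milliseconds 100; Get-Random })
    $Instance.RunspacePool = $Pool
    [PSCustomObject]@{ Instance = $Instance; State = $Instance.BeginInvoke() }
}

# WaitOne() blocks the current thread until that job signals completion,
# so no CPU is spent re-checking IsCompleted in a loop.
foreach ($Entry in $Instances) {
    $null = $Entry.State.AsyncWaitHandle.WaitOne()
}

$AllDone = $Instances.State.IsCompleted -notcontains $false

# Collect (and here discard) the output, then clean up.
foreach ($Entry in $Instances) {
    $null = $Entry.Instance.EndInvoke($Entry.State)
    $Entry.Instance.Dispose()
}
$Pool.Close()
$Pool.Dispose()

"All done: $AllDone"
```

The handles are waited on one at a time on purpose: waiting on several handles at once with WaitHandle.WaitAll() is not supported on STA threads, which is what the Windows PowerShell console uses by default.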

Once everything has completed we break out of our while loop and finally loop through the instances and get our results:

$Results = @()

$Instances | ForEach-Object {
    $Results += $_.Instance.EndInvoke($_.State)
}

To get the results from a completed instance you have to call the EndInvoke() method. EndInvoke() is kind of a misleading name: it isn't ending anything. Instead it retrieves whatever output was generated by an asynchronous invocation on an instance. If you recall, when we started the executions on our instances we called BeginInvoke(), which returned an AsyncResult object that we then stored in the State property of each object in our $Instances array. So the above code loops through each of our instances and calls EndInvoke() with the State object for that instance. It then takes whatever data is returned and puts it into a $Results array for later use.
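Here is a compact sketch of the collection step with cleanup added, which the example above leaves out. Disposing each instance and closing the pool is my suggestion rather than part of the original script, and the job count and -Maximum value are arbitrary. Note that EndInvoke() will simply wait if a job has not finished yet, so it is safe to call without a separate wait loop.

```powershell
$Pool = [RunspaceFactory]::CreateRunspacePool(1, 5)
$Pool.Open()

# Arbitrary workload: three random numbers below 100.
$Instances = foreach ($i in 1..3) {
    $Instance = [powershell]::Create().AddScript({ Get-Random -Maximum 100 })
    $Instance.RunspacePool = $Pool
    [PSCustomObject]@{ Instance = $Instance; State = $Instance.BeginInvoke() }
}

# Letting the foreach statement emit the output avoids rebuilding
# the array with += on every iteration.
$Results = foreach ($Entry in $Instances) {
    try {
        # Returns the output collection for this job, waiting if necessary.
        $Entry.Instance.EndInvoke($Entry.State)
    }
    finally {
        # Release the instance whether or not EndInvoke() threw.
        $Entry.Instance.Dispose()
    }
}

$Pool.Close()
$Pool.Dispose()

$Results
```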

While not required by any means, if you want to read a bit more on the async objects being passed around for this to work, take a look at this: Microsoft Docs: IAsyncResult

summary

In this post we covered a few key concepts related to runspaces and instances:

Instance - A PowerShell object, created inside the current process, that wraps a runspace: it builds the command pipeline and hands it to the runspace to execute
Runspace - The execution environment, normally running on its own thread, that executes PowerShell code on behalf of a PowerShell instance
RunspacePool - Controls the number of runspaces that can be executing at any given time
BeginInvoke() - Method that can be called on any instance object to start execution. This method call returns an AsyncResult object that can be used to track completion and obtain the output of the script
IsCompleted - Property of the AsyncResult object returned by BeginInvoke(). This tells you when an instance has completed execution
EndInvoke() - Method that can be called on any instance object to wait for execution to finish (if it hasn't already) and return the results. It expects an AsyncResult object as an argument

conclusion

This was a relatively quick introduction to runspaces and instances. Hopefully you've come away with a better understanding of what they do and why you should think about using them. In future posts we'll go into more advanced topics like passing data into your instances, sharing data between instances, and debugging methods. When this series wraps up we will go over a more complex structure I developed for breaking out of work being done on parallel instances early, so you can fail fast and not waste time waiting for everything to finish.
