Long Running ASP.net Processes - a simple example
27 Nov 2010
Sometimes, your web application needs to do something that takes a really long time - perhaps process a batch of files, backup or archive data, gather a bunch of data from external sources, or similar. When dealing with this situation, you're faced with a few challenges:
- browser and other timeout settings - web frameworks aren't designed to take more than a few seconds to process a request and send a response back to the user.
- user feedback - the user needs some indication that the system is working as intended and has not frozen or encountered an error.
- user productivity - the user may want to do something else within your app while waiting for your process to finish.
I had to solve this problem myself a little while ago and thought I'd share my solution, which has a few concepts I did not find while searching for articles on the topic:
- Status updates in the form of a log or history of the process, rather than just a single percent-complete number shown in a progress bar,
- Passing parameters into the long running process, and
- Getting access to the HttpContext during the long running process.
The key technologies used to solve the challenges listed above are:
- Ajax in the browser, to update the UI dynamically with the current status in a smooth and expected manner for the user. I used the excellent jQuery library for this.
- Threads on the server, to spawn a process that continues to run after a response has been sent to the user.
- The ASP.net Cache, as a way of communicating between the long running process and the rest of the web application.
- JSON data, to pass information between the browser and the server.
I've chosen to use ASP.net MVC in my example because it provides a very easy way to work with JSON requests and responses (and the framework rocks!). However, the same approach could be used with ASP.net Web Forms; the MVC framework is not core to the solution. In fact, this approach could be used in a Java application just as well, given the ability to create threads and some means of sharing state between the worker thread and the request handlers (session variables or similar).
The design is pretty straightforward. Let's look at the code in the browser first.
In the Browser
I won't show the HTML markup here, but it is very simple - just a jQuery UI button for the user to click. When the user clicks the button, we display a dialog box to hold the status updates, and the process is kicked off with an Ajax request. The server responds with a unique process ID.
The browser stores that processID and creates a timer that polls the server every second for an update on the status of that process:
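As a minimal sketch of this polling step (not the code from the downloadable example), the timer logic might look like the following. The function name `startStatusPolling`, the option names, and the URL in the usage comment are all assumptions made for illustration; `fetchStatus` stands in for whatever Ajax call your page uses.

```javascript
// Hypothetical helper: poll the server for the status of one process.
// fetchStatus(processId, callback) stands in for the Ajax call.
function startStatusPolling(options) {
  var processId = options.processId;
  var fetchStatus = options.fetchStatus;
  var onStatus = options.onStatus;
  var intervalMs = options.intervalMs || 1000; // once per second by default
  // The timer functions are injectable so the logic can be exercised without waiting.
  var schedule = options.setInterval || setInterval;
  var cancel = options.clearInterval || clearInterval;

  var stopped = false;

  function stop() {
    if (stopped) { return; }
    stopped = true;
    cancel(timerId);
  }

  var timerId = schedule(function () {
    fetchStatus(processId, function (status) {
      // Ignore responses that arrive after polling has been stopped.
      if (!stopped) {
        onStatus(status, stop);
      }
    });
  }, intervalMs);

  return stop;
}

// Example wiring in the browser (the URL and element id are assumptions):
// var stop = startStatusPolling({
//   processId: processId,
//   fetchStatus: function (id, cb) { $.getJSON('/Home/GetStatus', { id: id }, cb); },
//   onStatus: function (status) { $('#status').html(status); }
// });
```

Returning a `stop` function keeps the timer handle private to the poller, which is convenient when several processes are being polled at once.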
Polling every second may not be ideal in every application, so adjust the frequency to suit your needs.
When the response with the updated status returns, the UI is updated:
When the process is completed, we let the user know and trigger an optional cleanup routine on the server, which I'll describe in more detail later. In this example I'm throwing an alert dialog to make it obvious to the user that the process is done, but this is just an example - you might want to do something a bit more user-friendly than that in your application.
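The update and completion handling described above can be sketched roughly as follows. This is an illustration under assumptions, not the example's actual code: the marker text `PROCESS_COMPLETE` and the `ui` object with its four callbacks are hypothetical names, and in the browser those callbacks would wrap the dialog update, the timer, the alert, and the cleanup request.

```javascript
// Assumed marker text; the actual string used by the example may differ.
var DONE_MARKER = 'PROCESS_COMPLETE';

// Handle one status response from the server.
// ui is a hypothetical object supplying the side effects:
//   showStatus(html), stopPolling(), notifyDone(), requestCleanup()
function handleStatusResponse(statusHtml, ui) {
  // Always show the latest status, including the final one.
  ui.showStatus(statusHtml);

  var finished = statusHtml.indexOf(DONE_MARKER) !== -1;
  if (finished) {
    ui.stopPolling();    // no more requests are needed
    ui.notifyDone();     // e.g. an alert or a friendlier message
    ui.requestCleanup(); // ask the server to drop its cache entry
  }
  return finished;
}
```

Stopping the timer before notifying the user avoids further polls piling up while a blocking alert is open.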
Now, let's see what's happening on the server.
On the Server
Here is the controller action that triggers the long running process and returns a unique ID:
That is pretty straightforward and not very interesting, other than to point out that the only job of this action is to trigger the long running process, not to do any of the actual work. More interesting is how it triggers the process:
Here we create a new Thread to do the actual work. 'MyLongRunningProcess' is the method we created to do that work. The 'ParameterizedThreadStart' delegate allows us to pass a single parameter to that method, so we use a simple object to hold whatever data we want to use during the process.
There are a few interesting points about the parameters:
- Because we've used a simple object to hold the data passed to the process, we can pass in whatever complex information and objects we might need.
- I've passed in a GUID processID that the long running process can use to tag its status updates, and I've passed in the current HttpContext so that the long running process can have access to the environment of the web application. This is particularly useful if you are migrating controller code that may have assumptions about, or dependencies on, the HttpContext.
Here is the long running process itself:
Notice that I've assigned the HttpContext to the System.Web.HttpContext.Current property. I was actually surprised that I was able to do this. We are now keeping a handle to that context, which would otherwise be disposed of after the original request completed. I am assuming that when this thread finishes execution, things will be cleaned up during the normal .NET garbage collection process, so I don't explicitly null System.Web.HttpContext.Current at the end of the process. Perhaps I should - anyone with a bit more insight into this, please leave a comment!
The process leaves status updates in the ASP.net Cache:
All we are doing is appending a new message to the end of the existing status, which is simply an HTML string. This is very simple and works really well, but there are a few issues that I want to highlight:
- Because of the polling design, we won't know when the user has the very last status update, so I rely on a final action triggered by the browser to clean up the cache entry. Obviously this isn't guaranteed to happen - the user might close the browser or navigate to a different page, or the network might fail and not deliver the cleanup request.
- However, because the ASP.net Cache will eventually evict the entry based on its expiry rules, we don't have to worry about a permanent buildup of garbage like we would if we were storing status in a file or other more permanent resource.
- The amount of data passed from the server to the client grows with each update to the status, and the majority of that data is redundant. Depending on the polling frequency, if your status update includes lists of thousands of files or similar, this could quickly become a real performance issue.
- This design could be refactored to retrieve only the status updates that the client hasn't already received. Because HTTP requests and responses can be lost, we can't just delete the status information once we send it back to the browser. Instead, each request for a status update would have to include some type of pointer to the last update received (a datetime might suffice, but I think an ID of some sort would be more reliable), and the server would send only the updates that occurred after that point.
- The example stores the status as HTML (it includes <br> tags as line separators). This is not a good separation of design/layout and content/logic.
- An improvement would be to separate the updates with some other token (newlines), parse the data on the client, and apply whatever layout styling is appropriate, perhaps using jQuery templates.
- A variation would be to have the updateStatus method apply some type of template rendering (HTML.RenderPartial()?) to the status before it is inserted into the cache.
- The example uses a magic string in the returned status to determine when the process is completed. This is also not a good separation of design/layout and content/logic, and the browser could incorrectly conclude that the process is finished if a status update includes that magic keyword before the process is done - for example, if the keyword appears in a filename.
- An improvement would be to return the 'complete or not' status (or a %age complete amount) as an additional parameter in the status update response, which the browser could check. This could be stored in a separate cache element (make sure to use the processID in the key).
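Two of the improvements suggested in this list (sending only updates the client hasn't seen, and returning an explicit "complete" flag) can be sketched together. The server side of this example is C#; JavaScript is used here only to illustrate the shape of the data and the merge logic. The field names `id`, `message`, `entries`, and `complete`, and all three function names, are assumptions, and the sketch assumes entries are stored in increasing `id` order.

```javascript
// Server-side idea: keep status entries as records with increasing ids
// instead of one growing HTML string, and return only the unseen ones.
function entriesAfter(log, lastSeenId) {
  return log.filter(function (entry) { return entry.id > lastSeenId; });
}

// Build the response for one poll. Completion is an explicit flag,
// so no magic keyword has to appear in the message text.
function buildStatusResponse(log, lastSeenId, isComplete) {
  return { entries: entriesAfter(log, lastSeenId), complete: isComplete };
}

// Client side: merge a response into the local state and return the new state.
function applyStatusResponse(state, response) {
  var messages = state.messages.slice();
  var lastSeenId = state.lastSeenId;
  response.entries.forEach(function (entry) {
    // Skip entries already applied, e.g. when a response is delivered twice.
    if (entry.id > lastSeenId) {
      messages.push(entry.message);
      lastSeenId = entry.id;
    }
  });
  return { messages: messages, lastSeenId: lastSeenId, complete: response.complete };
}
```

The client sends its `lastSeenId` with each poll, so the server never has to guess what was delivered, and the messages stay plain text that the page can lay out however it likes.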
Here is the action used to return the status update to the browser:
And here is the action used to cleanup the cache entry when we're done:
General notes and comments
In this example, the user can trigger multiple processes by clicking the button again before existing processes are finished. Each status dialog will get its own updates appropriately. The user can also close the dialogs before the process is finished. This only hides the dialog. The updates continue to happen; the user just can't see them. You might need to provide a way for the user to bring the dialog back up again. Obviously the choice of a dialog in the first place is just one of convenience - you can put the status updates wherever you like.
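One possible way to let the user reopen a hidden dialog is to track each running process in a small registry keyed by processID. This is only a sketch of the bookkeeping; `createProcessRegistry` and its method names are hypothetical, and the actual showing and hiding of dialogs is left to the page.

```javascript
// Hypothetical registry of running processes, keyed by processID.
function createProcessRegistry() {
  var processes = {}; // processId -> { status: string, visible: boolean }

  return {
    add: function (processId) {
      processes[processId] = { status: '', visible: true };
    },
    // Keep storing updates even while the dialog is hidden.
    update: function (processId, status) {
      if (processes[processId]) { processes[processId].status = status; }
    },
    hide: function (processId) {
      if (processes[processId]) { processes[processId].visible = false; }
    },
    // Mark the dialog visible again and return the latest status to display.
    show: function (processId) {
      if (!processes[processId]) { return null; }
      processes[processId].visible = true;
      return processes[processId].status;
    },
    // Ids the page could list in a "running processes" menu.
    hiddenIds: function () {
      return Object.keys(processes).filter(function (id) {
        return !processes[id].visible;
      });
    },
    remove: function (processId) {
      delete processes[processId];
    }
  };
}
```

Because updates are stored regardless of visibility, a reopened dialog can show the current status immediately instead of waiting for the next poll.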
Note - this process still runs inside the ASP.net application pool, so it is subject to issues like app-pool recycling, which can happen for various reasons and would terminate your long running process. If you have a really long running process, or need one that is guaranteed to stay running through application restarts, you need to create a Windows service, run the thread there, and communicate to and from the web application in a completely different manner - something beyond the scope of this blog post. 🙂
I hope this helps someone needing to do something similar in their application. Feel free to leave questions or comments!
Here is a link to the source code: Long Running Process Example