Backup script for Robocopy and external hard disk drives


If you have data that you value on external disk drives, you have a couple of options for backup.  One is to put the drives in a RAID array and let the chosen RAID architecture take care of the backup.  Another is to ‘roll your own’ backup strategy.  This brings with it all of the usual pitfalls: not running the backups, not verifying that the backups are working, or letting the backups get out of date.

With these caveats noted, one benefit is the ability to add storage incrementally and at low cost when it is needed, rather than having to match it to the RAID architecture or the enclosure you have selected.  Bare internal HDDs are the cheapest way to buy storage, and simple USB docks for them can be purchased for £20 ($30) or less.

What size should the partitions be?

Partitioning a source drive can help to fit backups on to target backup drives.  For source drives, a partition size of 465 GB (the size Windows reports for a nominal 500 GB drive) allows drives to be partitioned with minimal lost space (e.g. a 3 TB drive will hold six 465 GB partitions plus change).  It also means you do not need to match the sizes of your source and backup drives.  The partition is the unit of backup: six 465 GB partitions can be spread over three 1 TB backup drives, or over two 2 TB drives leaving about 1 TB spare.  465 GB is also a reasonable size to recover (as a secondary data recovery route).  Backup drives do not need to be partitioned and can be a single large volume.  If you are recovering a backup drive, something has gone seriously wrong with your backup strategy.

How to back up?

The following script provides one method.  It is provided as-is: try it and test it yourself.  I accept no responsibility for any data loss.

It works on a data disk with the label “Disk1.Partition1” and a folder “Data”.  It expects to write to a disk with label “BackupDisk.1” and into a folder “Mirrors\Disk1.Partition1\Data”.

The two sections that use WMIC map the label you provide to the drive letter where that volume is mounted (if it is plugged in and switched on).  This removes the problem of drives being mapped to different drive letters between backup sessions.

If the source and target (backup) drives are found, the script constructs the full source and target path names and echoes these to the output.

Backup is done in two phases to avoid disk size issues.  If data has been both added to and removed from the source, and the folder walk encounters the added data first, then a single-step use of robocopy may attempt to mirror more data than the backup drive has space for.

The /PURGE line removes all data in the backup folder that no longer exists in the source folder, freeing the space for added data.

The /XJ parameter skips NTFS junction points.

The second robocopy invocation, with /MIR, performs the backup by copying all new or changed data from the source to the target.  This process can be interrupted and resumed (by re-running it) without losing (much) progress.

The script is written to be run from a batch file.  If you place the batch file inside the folder you are backing up (e.g. as backMeUp.bat), then the backup process becomes a matter of connecting the source and backup disks to a system, then double-clicking the backMeUp.bat file.

For more on robocopy, run robocopy /? at a command prompt or search online.

A final disclaimer in case you missed the one above: this script is provided as-is.  Try it and test it yourself.  I accept no responsibility for any data loss.

@echo off

setlocal enabledelayedexpansion

:: Change these four lines to match your source and backup locations
@set sourceLabel=Disk1.Partition1
@set sourceSubPath=Data
@set targetLabel=BackupDisk.1
@set targetSubPath=Mirrors\Disk1.Partition1\Data

FOR /F "skip=1 tokens=1 delims= " %%A IN ('WMIC volume WHERE Label^="%sourceLabel%" Get DriveLetter^,Label /Format:table') DO (
       IF %%A GTR 0 (
	Set sourceDrive=%%A
       )
    )

FOR /F "skip=1 tokens=1 delims= " %%A IN ('WMIC volume WHERE Label^="%targetLabel%" Get DriveLetter^,Label /Format:table') DO (
       IF %%A GTR 0 (
	Set targetDrive=%%A
       )
    )

if "%sourceDrive%"=="" (
  echo Cannot find the SOURCE drive "%sourceLabel%" - please insert and retry
)

if "%targetDrive%"=="" (
  echo Cannot find the BACKUP drive "%targetLabel%" - please insert and retry
)

if not "%sourceDrive%"=="" (
 if not "%targetDrive%"=="" (

  set newsourcePath=%sourceDrive%\%sourceSubPath%
  set newtargetPath=%targetDrive%\%targetSubPath%

  @echo source Path is !newsourcePath!
  @echo target Path is !newtargetPath!

  echo on

  rem Quote the paths so that folder names containing spaces are handled
  robocopy "!newsourcePath!" "!newtargetPath!" /PURGE /E /NOCOPY /XJ

  robocopy "!newsourcePath!" "!newtargetPath!" /MIR /XJ

  pause

  GOTO:EOF 

) )

@echo on

pause

Asynchronous methods in C# using generics


Asynchronous methods provide a means to execute long-running operations in parallel. One strategy for implementing these is to define generic methods that handle the asynchronous actions, and to pass in the methods to be called asynchronously as parameters. A simple example follows.

The generic support methods

A class is created with static methods to support start and completion of asynchronous methods. The support methods for a single parameter with return type are shown. Other methods can be added to handle other delegate signature requirements.

    // Requires: using System;
    //           using System.Runtime.Remoting.Messaging;   (for AsyncResult)

    // Parameter and return type async method helper
    public static IAsyncResult BeginAsync<T1, TResult>(Func<T1, TResult> method,
                                                       T1 parameter)
    {
        if (method == null)
            throw new ArgumentNullException("method");

        // Start the asynchronous operation
        // No callback or state object
        return method.BeginInvoke(parameter, null, null);
    }


    // Parameter and return type async method helper - async done
    public static TResult EndAsync<T1, TResult>(IAsyncResult aResult)
    {
        Func<T1, TResult> method =
            (Func<T1, TResult>) ((AsyncResult) aResult).AsyncDelegate;

        // Retrieve the result
        return method.EndInvoke(aResult);
    }

Asynchronous running

With these methods in place it becomes a simple process to use the asynchronous operations in code. Each operation calls the appropriate generic method to invoke the function, and a generic method to retrieve the result. A blocking strategy is shown here (so not applicable for use in a UI thread) but the example can easily be modified to use callbacks on completion.

    private int LongComputation(String data)
    {
        // Some long operation
        Thread.Sleep(10000);

        return data.Length;
    }

    private int MultiComputation()
    {
        IAsyncResult aRes1 =
            AsyncMethods.BeginAsync<String, int>(LongComputation, "Visions");
        IAsyncResult aRes2 =
            AsyncMethods.BeginAsync<String, int>(LongComputation, "of");
        IAsyncResult aRes3 =
            AsyncMethods.BeginAsync<String, int>(LongComputation, "Software");

        // Blocks for async call completion - not for use on UI thread
        int result = AsyncMethods.EndAsync<String, int>(aRes1);
        result += AsyncMethods.EndAsync<String, int>(aRes2);
        result += AsyncMethods.EndAsync<String, int>(aRes3);

        return result;
    }
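
A note for newer runtimes: delegate BeginInvoke / EndInvoke is a .NET Framework feature, and on .NET Core and later it throws a PlatformNotSupportedException. The following is a minimal sketch of the same fan-out using the Task Parallel Library instead (a swapped-in technique, not the helper methods above). The names TaskSketch and ReportWhenDone are hypothetical, and the delay is shortened for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TaskSketch
{
    // Stand-in for the long-running operation in the example above
    public static int LongComputation(string data)
    {
        Thread.Sleep(200);   // shortened delay for the sketch
        return data.Length;
    }

    // Starts the three computations in parallel and sums the results
    public static int MultiComputation()
    {
        Task<int> t1 = Task.Run(() => LongComputation("Visions"));
        Task<int> t2 = Task.Run(() => LongComputation("of"));
        Task<int> t3 = Task.Run(() => LongComputation("Software"));

        // Blocks until all three complete - not for use on a UI thread
        Task.WaitAll(t1, t2, t3);
        return t1.Result + t2.Result + t3.Result;
    }

    // Callback variant: the continuation runs once the computation completes
    public static Task ReportWhenDone(string data, Action<int> onCompleted)
    {
        return Task.Run(() => LongComputation(data))
                   .ContinueWith(t => onCompleted(t.Result));
    }

    public static void Main()
    {
        Console.WriteLine(MultiComputation());
        ReportWhenDone("callback", n => Console.WriteLine(n)).Wait();
    }
}
```

The blocking version mirrors MultiComputation above. The callback version hands the result to a delegate rather than blocking the caller, which is closer to what a UI thread would need.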

WCF servers and clients without configuration files


WCF servers and clients can be set up using a declarative mechanism in web.config and app.config files. It is also possible to set up WCF servers and clients without using config files. A simple example follows.

For these examples the WCF server and client will be hosted in simple WPF applications.

WCF Server

A reference to the System.ServiceModel assembly / dll is added to link the associated namespaces. Then a ServiceContract is defined:

using System.ServiceModel;

...

[ServiceContract]
public interface IWCFContract
{
    [OperationContract]
    int Add(int value1, int value2);

    [OperationContract]
    int Multiply(int value1, int value2);
}

Next a concrete implementation of the service contract needs to be defined:

public class WCFConcrete : IWCFContract
{
    public int Add(int value1, int value2)
    {
        return (value1 + value2);
    }

    public int Multiply(int value1, int value2)
    {
        return (value1 * value2);
    }
}

Then the service host is created programmatically, metadata publication is configured and the endpoints are added:

using System.ServiceModel;
using System.ServiceModel.Description;

...

public ServiceHost CreateAndOpenHost()
{
    ServiceHost newHost =
        new ServiceHost(typeof(WCFConcrete),
                        new Uri("http://localhost:8002/SimpleService"));

    // Configure publication of service metadata
    ServiceMetadataBehavior behaviour = new ServiceMetadataBehavior();
    behaviour.HttpGetEnabled = true;
    behaviour.MetadataExporter.PolicyVersion = PolicyVersion.Policy15;
    newHost.Description.Behaviors.Add(behaviour);

    // Add a MEX (metadata exchange) endpoint
    newHost.AddServiceEndpoint(ServiceMetadataBehavior.MexContractName,
                               MetadataExchangeBindings.CreateMexHttpBinding(),
                               "mex");

    // Add an application endpoint using a WSHttpBinding
    newHost.AddServiceEndpoint(typeof(IWCFContract), new WSHttpBinding(), "");

    // Open the service host
    newHost.Open();

    return newHost;
}

Appropriate permissions will need to be given to the user running the application. This can be achieved by running the following in an elevated / administrator command window:

netsh http add urlacl url=http://+:8002/SimpleService user=DOMAIN\username

The service can now be run. The running service can be verified by browsing to it in a web browser such as Internet Explorer:

http://localhost:8002/SimpleService

WCF Client

To create the client class, run svcutil.exe (located at C:\Program Files\Microsoft SDKs\Windows\{version}\Bin) on the WSDL for the service (while the service is running):

svcutil.exe http://localhost:8002/SimpleService?wsdl

This will create a WCFConcrete.cs file (along with a config file, which is not needed for this example). This class file is added to the client project. Again, a reference to the System.ServiceModel assembly will need to be added to the client project.

The ChannelFactory is then used to create a client to the service:

using System.ServiceModel;

...

public static IWCFContract CreateWCFClient(String endpointAddress)
{
    IWCFContract result = null;

    // Set up the binding and endpoint.
    WSHttpBinding binding = new WSHttpBinding();
    EndpointAddress endpoint = new EndpointAddress(endpointAddress);

    // Create a channel factory.
    ChannelFactory<IWCFContract> channelFactory =
                  new ChannelFactory<IWCFContract>(binding, endpoint);

    // Create a channel.
    result = channelFactory.CreateChannel();

    return result;
}

With this in place a client can simply be created and used to access the service (when it is running):

...

IWCFContract client = CreateWCFClient("http://localhost:8002/SimpleService");

int addResult = client.Add(2, 3);
int multiplyResult = client.Multiply(2, 3);

The adapter pattern and presenter-first UIs


The adapter pattern provides a technique to bind together the presenter logic and the UI screens of a presenter-first model (and other loosely coupled models) of user interface implementation. An example follows.

An application to double an integer requires a presenter function to perform the doubling on request:

public class Presenter
{
    public IInput Input { get; set; }
    public IOutput Output { get; set; }

    public void PerformDoubling()
    {
        int inputValue = Input.GetValue();
        int result = inputValue * 2;
        Output.SetValue(result);
    }
}

The presenter uses interfaces to access the user interface, which allows presenter unit tests to be built with mocks, independently of the development of the UI:

public interface IInput
{
    int GetValue();
}

public interface IOutput
{
    void SetValue(int newValue);
}
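
To make the testing claim concrete, here is a minimal sketch of a presenter check that uses hand-written test doubles in place of a mocking library. FakeInput, RecordingOutput and PresenterCheck are hypothetical names introduced for illustration. The Presenter and the two interfaces are repeated only so that the sketch compiles on its own.

```csharp
using System;

public interface IInput { int GetValue(); }
public interface IOutput { void SetValue(int newValue); }

// Same shape as the Presenter above, repeated so the sketch is self-contained
public class Presenter
{
    public IInput Input { get; set; }
    public IOutput Output { get; set; }

    public void PerformDoubling()
    {
        Output.SetValue(Input.GetValue() * 2);
    }
}

// Hypothetical test double: supplies a fixed input value
public class FakeInput : IInput
{
    private readonly int m_value;
    public FakeInput(int value) { m_value = value; }
    public int GetValue() { return m_value; }
}

// Hypothetical test double: records the last value sent to the output
public class RecordingOutput : IOutput
{
    public int? LastValue { get; private set; }
    public void SetValue(int newValue) { LastValue = newValue; }
}

public static class PresenterCheck
{
    // Runs the presenter against the doubles and returns what it wrote
    public static int? DoubleViaPresenter(int input)
    {
        var output = new RecordingOutput();
        var presenter = new Presenter
        {
            Input = new FakeInput(input),
            Output = output
        };
        presenter.PerformDoubling();
        return output.LastValue;
    }

    public static void Main()
    {
        Console.WriteLine(DoubleViaPresenter(21));   // prints 42
    }
}
```

No window or XAML is involved, so a check like this can run before any UI exists.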

Binding the interfaces to a deployed UI can then take the form of UI-specific adapters to the IInput and IOutput interfaces. For example, take a Window1 class containing input and output accessors (implemented using explicit marshalling as described in this blog entry):

public partial class Window1 : Window
{
    ...

    public String ThreadSafeGetInputValue() ...
    public void ThreadSafeSetOutputValue(String newValue) ...
}

The following two adapters can be used to convert the UI to match the required interfaces without the need to modify the UI code. (Null and error checking has been excluded below to reduce the code footprint.)

public class AdapterIInputWindow1 : IInput
{
    private Window1 Window1 { get; set; }

    public AdapterIInputWindow1(Window1 wrappedWindow)
    {
        Window1 = wrappedWindow;
    }

    // IInput implementation - excluding error checking
    public int GetValue()
    {
        int result = Convert.ToInt32(Window1.ThreadSafeGetInputValue());
        return result;
    }
}


public class AdapterIOutputWindow1 : IOutput
{
    private Window1 Window1 { get; set; }

    public AdapterIOutputWindow1(Window1 wrappedWindow)
    {
        Window1 = wrappedWindow;
    }

    // IOutput implementation - excluding error checking
    public void SetValue(int newValue)
    {
        Window1.ThreadSafeSetOutputValue(newValue.ToString());
    }
}

Finally the UI can initialize its presenter and set the IInput and IOutput interfaces to the appropriate adapters:

public partial class Window1 : Window
{
    private Presenter Presenter { get; set; }

    public Window1()
    {
        InitializeComponent();

        Presenter = new Presenter();
        Presenter.Input = new AdapterIInputWindow1(this);
        Presenter.Output = new AdapterIOutputWindow1(this);
    }


    public void XamlAction(object sender, EventArgs args)
    {
        Presenter.PerformDoubling();
    }
    ...
}

Adapting agile techniques to open source and personal projects


Agile techniques are a great way to build applications in a flexible manner – allowing requirements to change during the lifetime of a project and delivering frequent working releases. Most agile methods are geared towards corporate development environments – where all contributors are available for face-to-face or instant messaging-based communication and each can guarantee a consistent and high level of contribution.

I’ve recently been looking at how agile techniques can be adapted to the benefit of both open source collaborations and personal projects.

Some key benefits of agile are:

  • Sustained engagement of the stakeholders – by allowing the stakeholders visibility into the project, frequent working releases and the ability to change things if requirements change.
  • Frequently available working releases – by time-boxing activities and test-driven development methods.
  • A high level of confidence in the delivered software – by test-driven development methods, a large set of automated unit tests and pair programming.
  • An engaged, productive development team – by frequent short meetings, pair programming and repeated complete delivery cycles.

There are a number of obstacles that open source and personal projects put in the way of using pure corporate agile techniques:

  • A highly distributed team – it is unlikely that people can be brought together in person, or even online at a scheduled time.
  • Lumpy time commitments – these projects are typically background work for the contributors who cannot commit a specific amount of time during any given period.
  • Contributor churn – existing contributors may disappear from the project at any given point, and new contributors may join.
  • A variable level / patchwork coverage of skills – dependent on the team of contributors, the areas that require effort may have no skills coverage or a variable level of skills coverage.
  • Loose or non-existent chain of command – due to the community nature of open source projects and potential churn of original project members there may be no formal day-to-day direction of the project.

Using collaborations and interactions

The distributed nature of the contributors to an open source project means that in most cases it is not feasible to have the frequent face-to-face meetings of standard agile implementations.

An adaptation of the methodology is needed and one option is to use online collaborations (shared documents / spreadsheets / etc.) and interactions to connect contributors.

The methodology is focused on scope-boxing the changes to the project (rather than time-boxing where a consistent level of effort can be sustained.) This means that each contributor should ensure that the project remains in a releasable state (does not degrade) with every contribution. Automated unit-tests (preferably integrated with the IDE build environment) should be created for each contribution to enable future contributors to ensure the same.

The methodology also aims to be as lightweight as possible – so that contributors can focus on progressing the project.

Collaborations

  • The project backlog – Mimicking the product backlog of Scrum, the project backlog contains the set of stories that define the future modifications considered or targeted for the project. Stories can include: user stories – the raw features and interactions for end users, design stories – object model creation, code-internal refactorings and migration of code to use a given pattern, bug stories – the tracking list of bugs in the project.
    Any contributor can add any type of story to the project backlog. The priority of each story is maintained in the backlog and can be set either by those directing the effort or by a summation of contributor votes.
  • Personal treadmills – Each contributor maintains a community-visible personal treadmill, which contains the story they are currently working on along with the history of stories they have previously completed. When a contributor claims a story and puts it on their treadmill, the fact that it has been claimed is registered within the project backlog.
  • The delivery log – As each story is completed and submitted to the code repository it is removed from the project backlog and added to the appropriate section of the delivery log. This contains the set of features and changes that have occurred on the project. Sections within the delivery log include features (for user stories), designs (for design stories) and fixes (for bug stories.)

Interactions

  • Adding a story – Any contributor can add a story to the project backlog. The story is tagged with the type of story it is (user, design, etc.) and contains a description of the desired state of the application once the story is completed. Stories should be as atomic as possible such that they lead to treadmill work with a short duration.
  • Prioritizing stories – A community-wide method of prioritizing stories is to allow each contributor a number of movable votes (e.g. 5) which they can then apply to separate outstanding stories in the project backlog. The stories are then prioritized by the count of votes for each story. As stories get completed contributor votes get freed up to be applied to other outstanding stories in the backlog.
  • Contribution – A contributor claims an outstanding story and adds it to their treadmill. The story is marked as claimed in the project backlog and tagged with the date for the claim. Each contribution should be delivered such that a releasable project is available at the end of the contribution. The use of test-driven techniques strongly supports the continual verification of existing functionality as contributors add stories.
  • Release phase lockout – If needed a release phase lockout can be performed on the project backlog. In the run up to a release a line can be drawn under the set of user stories targeted for the release. Only bug stories are allowed to be added above the line. Contributors working on the release branch should focus on completing stories above the line.
  • Exceptions – Any stories that have been claimed by a contributor who has subsequently become inactive will be available for claim by other contributors. An appropriate timescale is agreed by the community (e.g. a week) before a story can be claimed by another contributor. Any stories on the backlog that are persistently being left unclaimed can have their priority increased.

The same methodology easily scales down to personal projects.

Use explicit marshalling to update a WPF UI from a non-UI thread


One option for updating a WPF UI from a non-UI thread (including a background worker) is to perform explicit marshalling using the dispatcher. A simple example follows.

A separate blog entry details how to update a UI using a background worker’s implicit marshalling.

Let’s assume there is a C# window mediator class that has a reference to a pair of WPF controls: one for user input and one for user reporting. The WPF window constructs the mediator and sets the two control properties during its construction. Two functions provide access to the data and may be called from any thread:

public partial class MainWindow : Window
{
    private WindowMediator m_mediator = null;

    public MainWindow()
    {
        InitializeComponent();
   
        m_mediator = new WindowMediator();

        // Controls declared in the window's XAML
        m_mediator.IncomingDataControl = m_xamlTextBox;
        m_mediator.ReportDataControl = m_xamlTextBlock;
        ...
    }
    ...
}

public class WindowMediator
{
    // Controls.  A TextBox to retrieve data and a TextBlock to report data
    public TextBox IncomingDataControl { private get; set; }
    public TextBlock ReportDataControl { private get; set; }

    // Access functions to retrieve and set data (also see below)
    public String GetIncomingData(bool reformat) { ... }
    public void SetReportData(String newReport) { ... }
}

When updating the values of a WPF control, the code needs to be executed on the UI thread, i.e. the thread that owns the WPF control. The control’s dispatcher provides a function CheckAccess to determine whether the call is currently executing on the UI thread. It is the counterpart of the Windows Forms property InvokeRequired, but with the opposite sense: CheckAccess returns true when the caller is already on the UI thread, whereas InvokeRequired returns true when it is not.

If not – the Invoke method of the dispatcher can be used to execute a delegate on the appropriate thread. The Action framework class can be used to generate a Delegate from the current method (or from an anonymous method) and pass the parameters across:

public void SetReportData(String newReport)
{
    if (!ReportDataControl.Dispatcher.CheckAccess())
    {
       // Switch threads and recurse
       ReportDataControl.Dispatcher.Invoke(
          System.Windows.Threading.DispatcherPriority.Normal,
          new Action<String>(SetReportData), newReport);
    }
    else
    {
        ReportDataControl.Text = newReport;
    }
}

A similar method can be used to retrieve data from a WPF control. The generic framework class Func can be used to add a return type:

public String GetIncomingData(bool reformat)
{
    String result = "";

    if (!IncomingDataControl.Dispatcher.CheckAccess())
    {
       // Switch threads and recurse
       result = (String) IncomingDataControl.Dispatcher.Invoke(
          System.Windows.Threading.DispatcherPriority.Normal,
          new Func<bool, String>(GetIncomingData), reformat);
    }
    else
    {
        if (reformat)
        {
             result = "--" + IncomingDataControl.Text;
        }
        else
        {
             result = IncomingDataControl.Text;
        }
    }

    return result;
}

Garbage collection, generations and the large object heap


The .Net CLR maintains a managed heap used to dynamically allocate and garbage collect objects. This heap is divided into two address spaces – one used by a generational garbage collector for small objects and a second used for large objects.

The three generations

The first address space (sometimes called the small object heap – SOH) holds the three GC generations (0, 1 and 2) and is used for small objects (less than 85,000 bytes in size.) Each generation has a memory budget which can change over the lifetime of the application and is used to trigger collection of that generation.

The garbage collector will collect objects in generation 0 when its memory budget is exceeded. Any survivors of a GC on generation 0 are promoted to generation 1 (any non-survivors have their memory reclaimed.) Each generation is a contiguous address space – so promotion to generation 1 includes moving the objects into the address space allocated to generation 1 – compacting the memory used.

The GC will continue collecting generation 0 until the memory budget for generation 1 also gets exceeded. Once this occurs both generation 0 and generation 1 are collected. Any survivors of the generation 1 collection are promoted to generation 2. Generation 2 contains the oldest / longest lived objects. Again promotion includes compacting the memory by moving the objects into the address space allocated for generation 2.

This process continues with frequent collections of generation 0, less frequent collections of generation 1 and infrequent collections of generation 2. As the garbage collector performs each collection it also adapts to the memory usage patterns of the application – changing the memory budget allocations for each generation to optimise performance.
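
Promotion can be observed from code. The following is a small sketch that assumes forced collections via GC.Collect() are acceptable for demonstration purposes (they are rarely a good idea in production code). GenerationSketch and ObserveGenerations are hypothetical names. On common desktop runtimes the reported generation tends to climb with each collection the object survives, but the exact values are a runtime detail and are not guaranteed.

```csharp
using System;

public static class GenerationSketch
{
    // Reports the generation of one object before and after forced collections
    public static int[] ObserveGenerations()
    {
        var survivor = new object();
        var observed = new int[3];

        observed[0] = GC.GetGeneration(survivor);   // freshly allocated

        GC.Collect();                               // forced collection of all generations
        observed[1] = GC.GetGeneration(survivor);

        GC.Collect();
        observed[2] = GC.GetGeneration(survivor);

        GC.KeepAlive(survivor);                     // keep the object reachable up to here
        return observed;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", ObserveGenerations()));
        Console.WriteLine("Highest generation: " + GC.MaxGeneration);
        Console.WriteLine("Gen 0 collections so far: " + GC.CollectionCount(0));
    }
}
```

GC.CollectionCount can be called for each generation to see how much more often generation 0 is collected than generation 2 in a real application.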

The large object heap

The second address space contains the large object heap (LOH) and is used for large objects (85,000 bytes and larger.)

Objects on the LOH are considered part of generation 2 and are collected with this generation. This means that short-lived large objects will only be collected when the generational GC collects generation 2, when the LOH exceeds its memory budget or when the user programmatically invokes a collection of generation 2.

Due to the cost of moving large objects, the CLR does not compact the memory space for the LOH. Large objects will remain where they were originally allocated. This is an implementation artefact of the CLR GC, and as such it may be changed in the future. The size boundary of objects considered large may also change, so if you require a fixed memory location for an object you will need to pin it.
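
The sketch below illustrates both points: a freshly allocated large array already reports the oldest generation, and an object can be pinned with a GCHandle when a fixed address is required. It also shows the on-demand LOH compaction setting that later runtimes added (GCSettings.LargeObjectHeapCompactionMode, available from .NET Framework 4.5.1). LargeObjectSketch and its method names are hypothetical, and the 100,000 byte size is simply chosen to sit comfortably over the threshold.

```csharp
using System;
using System.Runtime;
using System.Runtime.InteropServices;

public static class LargeObjectSketch
{
    // Allocates a byte array of the given size and reports its GC generation
    public static int GenerationOfNewArray(int byteCount)
    {
        byte[] buffer = new byte[byteCount];
        int generation = GC.GetGeneration(buffer);
        GC.KeepAlive(buffer);
        return generation;
    }

    // Pins a large array so the GC cannot move it while the handle is held
    public static bool PinAndRead()
    {
        byte[] buffer = new byte[100000];
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            // The address stays valid until the handle is freed
            return handle.AddrOfPinnedObject() != IntPtr.Zero;
        }
        finally
        {
            handle.Free();
        }
    }

    public static void Main()
    {
        Console.WriteLine(GenerationOfNewArray(1000));     // small object
        Console.WriteLine(GenerationOfNewArray(100000));   // large object
        Console.WriteLine(PinAndRead());

        // Request a one-off LOH compaction during the next blocking
        // generation 2 collection (.NET Framework 4.5.1 and later)
        GCSettings.LargeObjectHeapCompactionMode =
            GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect();
    }
}
```

Pinned handles should be freed promptly, since pinned objects get in the way of compaction in the generational heap.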