Saturday, December 28, 2013

HDInsight - Hadoop on Windows Azure!!

In the past year, I’ve been working with the HDInsight team on making the Hadoop ecosystem available on Windows Server and Windows Azure. We built services that enable Azure users to quickly deploy elastic Hadoop clusters (based on Hortonworks’s Hadoop Distribution Package for Windows) on Windows Azure. By harnessing the parallel processing power of Hadoop, HDInsight clusters let users efficiently analyze terabytes and petabytes of data stored in Azure Blob Storage.

We released an installation package that lets developers quickly and easily install a single-node HDInsight cluster (the HDInsight Emulator) on their dev box.

We also developed an open-source SDK that includes PowerShell cmdlets (now integrated with Windows Azure PowerShell) and .NET APIs that make it easier to deploy, manage, and run jobs against HDInsight clusters.

To help you get started, this post demonstrates how to:

  1. Deploy and interact with an HDInsight cluster on Windows Azure.
  2. Install the HDInsight Emulator on a local dev box.

HDInsight Service

The HDInsight Service enables users to deploy Hadoop clusters on Azure. These clusters are used to analyze data stored in Azure Blob Storage. To make this work, Azure HDInsight clusters are configured to use ASV (Azure Storage Vault), an HDFS-compatible file system implementation that reads, writes, and streams data via Azure Blob Storage instead of the local file system.

Using Azure Blob Storage instead of local storage often raises questions about network latency and the loss of data locality. Fortunately, HDInsight clusters and storage accounts are deployed on the Azure Q10 infrastructure, which features incredibly low networking overhead. As a result, for clusters of up to 50 worker nodes, reading from Azure Blob Storage is just as fast as reading from the local disk.

Storing the data in Azure storage instead of on the workers' local storage has many benefits. In addition to geo-replication and faster writes, the most obvious gain is that the data is not tied to the cluster. This lets you create and delete clusters without having to migrate the data. Brad Sarsfield and Denny Lee did a great job explaining this in ‘Why use Blob Storage with HDInsight on Azure?’

Creating an HDInsight cluster

Once you have a Windows Azure subscription, deploying an HDInsight cluster is only a couple of clicks away. You can use the management portal to create a storage account (which will store the data to be processed by your HDInsight cluster) and an HDInsight cluster associated with that account.

Once your cluster is ready, it will appear under the HDInsight tab.


Because your new HDInsight cluster uses its associated storage account as its distributed file system, all the files you would expect to see in HDFS (if you have used Hadoop in a non-cloud environment) are stored in Blob Storage, under a container named after your cluster.
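In other words, a file the cluster sees as /example/data/input.txt is just a blob named example/data/input.txt inside that container. The minimal C# sketch below builds the fully qualified ASV URI for such a path. It assumes the asv://<container>@<account>.blob.core.windows.net/<path> URI form; the `AsvPaths` helper and the sample account, container, and path names are hypothetical and not part of the HDInsight SDK.

```csharp
using System;

// Hypothetical helper (not part of the HDInsight SDK): builds an ASV URI,
// assuming the asv://<container>@<account>.blob.core.windows.net/<path> form.
public static class AsvPaths
{
    public static string ToAsvUri(string storageAccount, string container, string path)
    {
        if (string.IsNullOrEmpty(storageAccount))
            throw new ArgumentException("A storage account name is required.", nameof(storageAccount));
        if (string.IsNullOrEmpty(container))
            throw new ArgumentException("A container name is required.", nameof(container));

        // Blob Storage has no real directories: "/example/data/input.txt" is simply
        // a blob named "example/data/input.txt", so drop the leading slash.
        string blobName = (path ?? string.Empty).TrimStart('/');
        return "asv://" + container + "@" + storageAccount + ".blob.core.windows.net/" + blobName;
    }
}
```

For example, `AsvPaths.ToAsvUri("mystorage", "mycluster", "/example/data/input.txt")` returns `asv://mycluster@mystorage.blob.core.windows.net/example/data/input.txt`.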


Interacting with HDInsight cluster

You can use the Windows Azure PowerShell module to deploy and delete HDInsight clusters and to run jobs on them. Click here to install it via the Web Platform Installer.

Once the installation is complete, launch the Windows Azure PowerShell window.

You can quickly authenticate with your Windows Azure account (you can also use a management certificate). Type:

PS C:\> Add-AzureAccount

The sign-in window will appear; enter your credentials and click Continue.

Select your subscription:

PS C:\> $subscriptionName ="Visual Studio Ultimate with MSDN"

PS C:\> Select-AzureSubscription $subscriptionName

Query for available clusters:

Once the appropriate subscription is selected, you can list your HDInsight clusters by running:

PS C:\> Get-AzureHDInsightCluster


Run a 10 GB GraySort (TeraGen/TeraSort/TeraValidate) Job

Since hadoop-examples.jar comes with the HDInsight cluster, you can run any of the jobs in that examples package.

Follow the instructions here to run the GraySort mini benchmark, which generates 10 GB of data, sorts it, and validates the results. Works like a champ!
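The 10 GB figure comes from simple arithmetic: TeraGen writes fixed 100-byte rows, so 10 GB (10^10 bytes) is 100,000,000 rows. The C# sketch below computes that row count and lists the three stages in order with their arguments (teragen <rows> <output dir>, terasort <input dir> <output dir>, teravalidate <sorted dir> <report dir>). The `GraySortPlan` helper and the directory names are hypothetical; use the paths from the instructions you follow.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper that sizes and sequences a TeraGen/TeraSort/TeraValidate run.
public static class GraySortPlan
{
    // TeraGen produces fixed-size 100-byte records.
    public const long BytesPerRow = 100;

    public static long RowsFor(long totalBytes)
    {
        if (totalBytes <= 0) throw new ArgumentOutOfRangeException(nameof(totalBytes));
        return totalBytes / BytesPerRow;
    }

    // Returns (example program name, arguments) pairs in the order they must run.
    public static List<KeyValuePair<string, string[]>> Build(long totalBytes, string baseDir)
    {
        string input = baseDir + "/input";      // TeraGen output, TeraSort input
        string sorted = baseDir + "/output";    // TeraSort output, TeraValidate input
        string report = baseDir + "/validate";  // TeraValidate report

        string rows = RowsFor(totalBytes).ToString(CultureInfo.InvariantCulture);

        return new List<KeyValuePair<string, string[]>>
        {
            // 1. Generate the data set.
            new KeyValuePair<string, string[]>("teragen", new[] { rows, input }),
            // 2. Sort it.
            new KeyValuePair<string, string[]>("terasort", new[] { input, sorted }),
            // 3. Verify that the output is globally sorted.
            new KeyValuePair<string, string[]>("teravalidate", new[] { sorted, report }),
        };
    }
}
```

Each pair corresponds to one hadoop-examples.jar job, and the stages are strictly sequential: TeraSort reads what TeraGen wrote, and TeraValidate reads what TeraSort wrote.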




HDInsight Emulator

The HDInsight Emulator is a single-node HDInsight deployment that lets developers develop and debug jobs on their local development box. You can install the HDInsight Emulator via the Web Platform Installer from here. Any missing prerequisites will be detected and installed automatically!


Once the installation is complete, you will notice that all the supported Hadoop services are running as Windows services on your local machine.

You are good to go! Follow the instructions here to learn how to run MapReduce/Hive/Pig jobs on your local HDInsight cluster. You will notice that HDFS is configured as the default distributed file system. You can, however, change core-site.xml to point to your Azure Blob Storage account.
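Pointing the emulator at Blob Storage comes down to adding your storage account key to core-site.xml and, optionally, making an ASV container the default file system. The C# sketch below does this with LINQ to XML. It assumes Hadoop's usual configuration/property/name/value layout, the fs.azure.account.key.<account>.blob.core.windows.net key property, and the Hadoop 1.x fs.default.name property; verify these names against the emulator's own configuration before relying on them. `CoreSiteEditor` is a hypothetical helper.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: sets Hadoop properties in a core-site.xml document.
public static class CoreSiteEditor
{
    // Adds the property if it is missing; otherwise overwrites its value.
    public static void SetProperty(XDocument coreSite, string name, string value)
    {
        XElement configuration = coreSite.Root;
        if (configuration == null || configuration.Name != "configuration")
            throw new InvalidOperationException("Expected a <configuration> root element.");

        XElement property = configuration.Elements("property")
            .FirstOrDefault(p => (string)p.Element("name") == name);

        if (property == null)
        {
            configuration.Add(new XElement("property",
                new XElement("name", name),
                new XElement("value", value)));
            return;
        }

        XElement valueElement = property.Element("value");
        if (valueElement == null) property.Add(new XElement("value", value));
        else valueElement.Value = value;
    }

    // Assumed property names (see the lead-in): account key, then ASV as the default FS.
    public static void UseBlobStorage(XDocument coreSite, string account, string key, string container)
    {
        SetProperty(coreSite, "fs.azure.account.key." + account + ".blob.core.windows.net", key);
        SetProperty(coreSite, "fs.default.name", "asv://" + container + "@" + account + ".blob.core.windows.net");
    }
}
```

Typical use would be `XDocument.Load` on the emulator's core-site.xml, a call to `UseBlobStorage`, then `Save`, followed by restarting the Hadoop services so they pick up the change.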

Monday, January 2, 2012

Tracking Performance Degradation with Visual Studio Load Testing Framework and Cruise Control .NET

In previous posts, we created a load test for the BookStore service and saw how it can be executed continuously via 'Cruise Control .NET' (CCNet).

In this post, we'll go a step further and add a custom task to CCNet that queries the database and generates a custom XML summary. The summary includes a performance comparison with previous runs, performance counter measurements per machine (divided into logical groups), and exception details for all failed tests. We will also extend the CCNet portal to present the custom content.
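The performance comparison boils down to comparing each transaction's current average response time with its average in recent previous runs. The XmlGenerator that does this isn't listed in the post, so the C# below is only a minimal sketch of the idea, decoupled from the LINQ to SQL types. It assumes that a higher LoadTestRunId means a more recent run (the same ordering GetPrevTransactions uses later in this post); the `RunComparer` name, the five-run window, and the 10% tolerance are illustrative choices, not the original implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical result of comparing one transaction against one previous run.
public sealed class Comparison
{
    public int PreviousRunId { get; set; }
    public double PercentChange { get; set; }   // positive = slower than that run
    public bool WithinTolerance { get; set; }   // true = no significant degradation
}

public static class RunComparer
{
    // previousAverages: run id -> average response time of the same transaction.
    public static List<Comparison> Compare(
        double currentAverage,
        IDictionary<int, double> previousAverages,
        int maxRuns = 5,
        double tolerancePercent = 10.0)
    {
        return previousAverages
            .Where(p => p.Value > 0)            // skip runs with no usable average
            .OrderByDescending(p => p.Key)      // most recent run first
            .Take(maxRuns)
            .Select(p =>
            {
                double change = (currentAverage - p.Value) / p.Value * 100.0;
                return new Comparison
                {
                    PreviousRunId = p.Key,
                    PercentChange = Math.Round(change, 1),
                    WithinTolerance = change <= tolerancePercent
                };
            })
            .ToList();
    }
}
```

A comparison with `WithinTolerance == false` is the kind of value you would render in red on the dashboard.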


Here's a screenshot of the resulting portal:

image

The source code can be downloaded from http://bookstoreservice.codeplex.com/SourceControl/list/changesets

Getting Started!

Generating the Summary XML

Let's review the process that generates the summary XML from the LoadTest2010 database (where the load test data is saved).

Here's the class diagram:

image

We used LINQ to SQL to query data from the LoadTest2010 database.


Using Server Explorer, we can connect to the LoadTest2010 database and drag and drop the appropriate tables/views onto the dbml designer surface. For each table/view, matching entity classes and CRUD operations are generated automatically.

The auto-generated classes (LoadTest2010DataContext, LoadTestTestResult, LoadTestMessageSummary, LoadTestRun, and LoadTestTransactionResult) are used by DataAccessService for all interactions with the database.


Here's the DataAccessService class.

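using System;
using System.Collections.Generic;
using System.Linq;

// Wraps the LINQ to SQL data context and exposes the queries used to build the summary.
class DataAccessService : IDisposable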
{
    private readonly LoadTest2010DataContext context;
    private readonly List<LoadTestRun> loadTestRuns;
 
    public DataAccessService(string connectionString)
    {
        context = new LoadTest2010DataContext(connectionString);
        
        loadTestRuns = context.LoadTestRuns.OrderByDescending(
            run => run.StartTime).ToList();
    }
 
    public List<LoadTestRun> LoadTestRuns
    {
        get { return loadTestRuns; }
    }
 
    internal LoadTestRun GetLastTestRun()
    {
        if (loadTestRuns.Count == 0)
        {
            throw new Exception("Cannot find LoadTestRunId in LoadTestRun table");
        }
        LoadTestRun loadTestRun = loadTestRuns[0];
        return loadTestRun;
 
    }
 
    public IEnumerable<LoadTestTestResult> GetResults(int id)
    {
        List<LoadTestTestResult> results = context.LoadTestTestResults.Where(
            result => result.LoadTestRunId == id).ToList();
 
        return results;
    }
 
    public IList<LoadTestMessageSummary> GetErrorMessages(int id)
    {
        List<LoadTestMessageSummary> summaries =
            context.LoadTestMessageSummaries.Where(
            result => 
                result.LoadTestRunId == id && 
                result.SubType == "TestError").ToList();
 
        return summaries;
    }
 
    public IEnumerable<LoadTestTransactionResult> GetTransactions(int id)
    {
        List<LoadTestTransactionResult> results = 
            context.LoadTestTransactionResults.Where(
            result => result.LoadTestRunId == id).OrderBy(
            result => result.ScenarioName).ToList();
        
        return results;
    }
 
    public IGrouping<int, LoadTestTransactionResult>[] GetPrevTransactions(int id)
    {
        // Get all the transactions for prev runs
        var transactionResults = context.LoadTestTransactionResults.Where(
            result => result.LoadTestRunId < id).ToList();
        
        List<IGrouping<int, LoadTestTransactionResult>> transactionsById = 
            transactionResults.GroupBy(
            result => result.LoadTestRunId).OrderByDescending(
            results => results.Key).ToList();
 
        return transactionsById.ToArray();
    }
 
    public void Dispose()
    {
        context.Dispose();
    }
}

Here's the program main workflow:

static void Main(string[] args)
{
    string connectionString;
    if (args.Length > 0)
    {
        connectionString = args[0];
    }
    else
    {
        connectionString = Settings.Default.LoadTest2010ConnectionString;
    }
 
    const string xmlFileName = "LoadTestSummary.xml";
 
    using (var dataAccessService = new DataAccessService(connectionString))
    {
        LoadTestRun lastTestRun = dataAccessService.GetLastTestRun();
        int loadTestRunId = lastTestRun.LoadTestRunId;
        var prevTransactions = dataAccessService.GetPrevTransactions(
            loadTestRunId);
 
        var loadTestRuns = dataAccessService.LoadTestRuns;
 
        var xmlGenerator = new XmlGenerator(lastTestRun, prevTransactions, loadTestRuns);
 
        var loadTestTestResults = 
            dataAccessService.GetResults(loadTestRunId);
        
        var testMessageSummaries = 
            dataAccessService.GetErrorMessages(loadTestRunId);
        
        var transactions = 
            dataAccessService.GetTransactions(loadTestRunId);
 
        foreach (var testResult in loadTestTestResults)
        {
            var testCaseName = testResult.TestCaseName;
            int errors = testMessageSummaries.Count(summary =>
                summary.TestCaseName == testCaseName);
 
            xmlGenerator.AddResult(testResult, errors);
        }
 
        foreach (var transaction in transactions)
        {
            xmlGenerator.AddTransaction(transaction, prevTransactions);
        }
 
        var messageSummaries = testMessageSummaries.OrderBy(
            summary => summary.TestCaseName);
        
        foreach (var messageSummary in messageSummaries)
        {
            xmlGenerator.AddException(messageSummary);
        }
 
        xmlGenerator.Save(xmlFileName);
    }
}

As shown above, we use DataAccessService to query the database and the XmlGenerator class to add the aggregated data to the XML summary.
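One possible refinement to the Main loop: it re-scans testMessageSummaries once per test result, which costs O(results × messages). Counting the error messages once per test case avoids that. Here is a minimal sketch with a hypothetical `ErrorCounts` helper that works on plain test case names rather than on the LINQ to SQL entities:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: counts error messages per test case in a single pass.
public static class ErrorCounts
{
    public static Dictionary<string, int> ByTestCase(IEnumerable<string> errorTestCaseNames)
    {
        return errorTestCaseNames
            .Where(name => name != null)            // dictionary keys cannot be null
            .GroupBy(name => name)
            .ToDictionary(group => group.Key, group => group.Count());
    }

    // Test cases without errors are simply absent from the dictionary.
    public static int For(IDictionary<string, int> counts, string testCaseName)
    {
        int count;
        return testCaseName != null && counts.TryGetValue(testCaseName, out count) ? count : 0;
    }
}
```

In Main, you could build `var errorCounts = ErrorCounts.ByTestCase(testMessageSummaries.Select(s => s.TestCaseName));` once before the loop and then pass `ErrorCounts.For(errorCounts, testResult.TestCaseName)` to `AddResult`.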

Cruise Control .NET Configuration

We'll start by installing CCNet from here.

Once the installation is complete, we need to add the appropriate project and tasks to the CCNet configuration file, located at %ProgramFiles%\CruiseControl.NET\server\ccnet.config:

<cruisecontrol xmlns:cb="urn:ccnet.config.builder">
 
  <project name="LoadTesting-BookStoreService">
    <!-- Run tests every 4 hours-->
    <triggers>
      <intervalTrigger
        name="continuous"
        seconds="14400"
        buildCondition="ForceBuild"
        initialSeconds="30" />
    </triggers>
 
    <workingDirectory>
      C:\CodePlex\BookStoreService\bookstoreservice
    </workingDirectory>
 
    <tasks>
      <exec>
        <executable>DeleteResults.bat</executable>
        <description>Delete previous results</description>
        <!-- Timeout after 1 minute-->
        <buildTimeoutSeconds>60</buildTimeoutSeconds>
      </exec>
 
      <msbuild>
        <executable>
          C:\WINDOWS\Microsoft.NET\Framework\v4.0.30319\MSBuild.exe
        </executable>
        <projectFile>BookStore.sln</projectFile>
        <targets>Build</targets>
        <!-- Timeout after 15 minutes -->
        <timeout>900</timeout>
      </msbuild>
 
      <!-- Tests -->
      <exec>
        <executable>
          C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE\MSTest.exe
        </executable>
        <buildArgs>
          /testcontainer:Tests\BookStoreLoadTest.loadtest
        </buildArgs>
        <description>Run load tests</description>
        <!-- Timeout after 20 minutes-->
        <buildTimeoutSeconds>1200</buildTimeoutSeconds>
      </exec>
 
      <!-- Generate LoadTestSummary.xml from database -->
      <exec>
        <executable>
          ContinuousIntegration\LoadTestResultPublisher\bin\Debug\LoadTestDbToXml.exe
        </executable>
        <description>Generate summary xml from database</description>
        <!-- Timeout after 20 minutes-->
        <buildTimeoutSeconds>1200</buildTimeoutSeconds>
      </exec>
 
    </tasks>
 
    <publishers>
      <merge>
        <files>
          <!-- Add the result file to the build log -->
          <file>LoadTestSummary.xml</file>
        </files>
      </merge>
      <xmllogger />
    </publishers>
 
  </project>
</cruisecontrol>

The difference between this configuration and the one in the previous post is the extra task that runs LoadTestDbToXml.exe (described above). This process queries the LoadTest2010 database and creates an XML file named LoadTestSummary.xml. In addition, instead of merging the test result .trx file into the build log, we merge LoadTestSummary.xml.
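The XmlGenerator source isn't listed in the post. As a rough, hypothetical sketch of the Transactions part of LoadTestSummary.xml, the C# below uses LINQ to XML to emit the element and attribute names that the XSL stylesheet later in this post reads: LoadTestCustomResults, then Transactions/Transaction with CompTitileN, CompValueN, and CompStatusN (keeping the stylesheet's `CompTitile` spelling). The `ComparisonCell`, `TransactionRow`, and `SummaryXml` types are assumptions; the real generator also writes Outcome/Duration/Comment attributes and the Results and Exceptions sections.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Hypothetical cell: one comparison against a previous run.
public sealed class ComparisonCell
{
    public string Title { get; set; }   // column header, e.g. "Run 41"
    public string Value { get; set; }   // displayed value, e.g. "+4.0%"
    public bool Ok { get; set; }        // true = no degradation (rendered green)
}

// Hypothetical flattened view of one transaction.
public sealed class TransactionRow
{
    public string Name { get; set; }
    public int Runs { get; set; }
    public double Minimum { get; set; }
    public double Average { get; set; }
    public double Maximum { get; set; }
    public List<ComparisonCell> Comparisons { get; } = new List<ComparisonCell>();
}

public static class SummaryXml
{
    // The XSL renders five comparison columns.
    private const int MaxComparisons = 5;

    private static string Num(double value) =>
        value.ToString("0.###", CultureInfo.InvariantCulture);

    public static XElement Transactions(IEnumerable<TransactionRow> rows)
    {
        List<TransactionRow> list = rows.ToList();
        var transactions = new XElement("Transactions");

        // Column titles are attributes of <Transactions>, taken from the first row.
        if (list.Count > 0)
        {
            List<ComparisonCell> titles = list[0].Comparisons.Take(MaxComparisons).ToList();
            for (int i = 0; i < titles.Count; i++)
                transactions.SetAttributeValue("CompTitile" + (i + 1), titles[i].Title);
        }

        foreach (TransactionRow row in list)
        {
            var transaction = new XElement("Transaction",
                new XAttribute("Name", row.Name),
                new XAttribute("Runs", row.Runs),
                new XAttribute("Minimum", Num(row.Minimum)),
                new XAttribute("Average", Num(row.Average)),
                new XAttribute("Maximum", Num(row.Maximum)));

            List<ComparisonCell> cells = row.Comparisons.Take(MaxComparisons).ToList();
            for (int i = 0; i < cells.Count; i++)
            {
                transaction.SetAttributeValue("CompValue" + (i + 1), cells[i].Value);
                // The XSL paints a cell green only when this attribute equals 'true'.
                transaction.SetAttributeValue("CompStatus" + (i + 1), cells[i].Ok ? "true" : "false");
            }

            transactions.Add(transaction);
        }

        return transactions;
    }

    public static XDocument Document(string name, int loadTestRunId, XElement transactions)
    {
        return new XDocument(
            new XElement("LoadTestCustomResults",
                new XAttribute("Name", name),
                new XAttribute("LoadTestRunId", loadTestRunId),
                transactions));
    }
}
```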

With the configuration above, CCNet will do the following (every 4 hours, or on demand):

  1. Delete the last LoadTestSummary.xml
  2. Build the BookStore.sln
  3. Run the load test.
  4. Run LoadTestDbToXml.exe (this run will generate LoadTestSummary.xml)
  5. Add LoadTestSummary.xml to the build log (so we can present it in the portal)

Now we need to extend the CCNet portal to present the results. We need to:

  1. Create an XSL file that transforms LoadTestSummary.xml into HTML.
  2. Copy the XSL file to '%ProgramFiles%\CruiseControl.NET\webdashboard\xsl'.
  3. Add the XSL file to the buildReportBuildPlugin's xslFileNames list in 'Program Files (x86)\CruiseControl.NET\webdashboard\dashboard.config'.
  4. Restart IIS.

Here's the xsl:

<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html"/>
 
  <xsl:template match="/">
    <xsl:apply-templates select="/cruisecontrol/build/*[local-name()='LoadTestCustomResults']" />
  </xsl:template>
 
  <xsl:template match="/cruisecontrol/build/*[local-name()='LoadTestCustomResults']">
    <p />
    <table border="1" cellSpacing="0" cellPadding="5" >
      <thead style="text-align: center;">
        <td>Name</td>
        <td>Run #</td>
        <td>Outcome</td>
        <td style="background-color: darkblue; color: white;">Duration</td>
        <td>Comment</td>
      </thead>
      <tr>
        <td>
          <xsl:value-of select="@Name"/>
        </td>
        <td>
          <xsl:value-of select="@LoadTestRunId"/>
        </td>
        <td>
          <xsl:choose>
            <xsl:when test="@Outcome ='Completed'">
              <span style="color: forestGreen; font-weight: bold;">
                <xsl:value-of select="@Outcome" />
              </span>
            </xsl:when>
            <xsl:otherwise>
              <span style="color: Red; font-weight: bold;">
                <xsl:value-of select="@Outcome" />
              </span>
            </xsl:otherwise>
          </xsl:choose>
        </td>
        <td>
          <xsl:value-of select="@Duration"/>
        </td>
        <td>
          <xsl:value-of select="@Comment"/>
        </td>
      </tr>
    </table>
 
    <h2>
      Transactions
    </h2>
    <xsl:apply-templates select="*[local-name()='Transactions']">
    </xsl:apply-templates>
 
    <h2>
      Test Results
    </h2>
    <xsl:apply-templates select="*[local-name()='Results']">
    </xsl:apply-templates>
 
    <h2>
      Exceptions
    </h2>
    <xsl:apply-templates select="*[local-name()='Exceptions']">
    </xsl:apply-templates>
 
  </xsl:template>
 
  <xsl:template match="*[local-name()='Transactions']">
    <table border="1" cellSpacing="0" cellPadding="5" >
      <thead style="text-align: center;">
        <td>Name</td>
        <td>Runs</td>
        <td>Minimum</td>
        <td style="background-color: darkblue; color: white;">Average</td>
        <td>Maximum</td>
        <td style="width: 70px">
          <xsl:value-of select="@CompTitile1"/>
        </td>
        <td style="width: 70px">
          <xsl:value-of select="@CompTitile2"/>
        </td>
        <td style="width: 70px">
          <xsl:value-of select="@CompTitile3"/>
        </td>
        <td style="width: 70px">
          <xsl:value-of select="@CompTitile4"/>
        </td>
        <td style="width: 70px">
          <xsl:value-of select="@CompTitile5"/>
        </td>
      </thead>
      <xsl:apply-templates select="./*" />
    </table>
  </xsl:template>
 
  <xsl:template match="*[local-name()='Transaction']">
    <tr>
      <td>
        <xsl:value-of select="@Name"/>
      </td>
      <td>
        <xsl:value-of select="@Runs"/>
      </td>
      <td>
        <xsl:value-of select="@Minimum"/>
      </td>
      <td>
        <xsl:value-of select="@Average"/>
      </td>
      <td>
        <xsl:value-of select="@Maximum"/>
      </td>
      <td>
        <xsl:choose>
          <xsl:when test="@CompStatus1 = 'true'">
            <span style="color: forestGreen; font-weight: bold;">
              <xsl:value-of select="@CompValue1"/>
            </span>
          </xsl:when>
          <xsl:otherwise>
            <span style="color: Red; font-weight: bold;">
              <xsl:value-of select="@CompValue1"/>
            </span>
          </xsl:otherwise>
        </xsl:choose>
      </td>
      <td>
        <xsl:choose>
          <xsl:when test="@CompStatus2 = 'true'">
            <span style="color: forestGreen; font-weight: bold;">
              <xsl:value-of select="@CompValue2"/>
            </span>
          </xsl:when>
          <xsl:otherwise>
            <span style="color: Red; font-weight: bold;">
              <xsl:value-of select="@CompValue2"/>
            </span>
          </xsl:otherwise>
        </xsl:choose>
      </td>
      <td>
        <xsl:choose>
          <xsl:when test="@CompStatus3 = 'true'">
            <span style="color: forestGreen; font-weight: bold;">
              <xsl:value-of select="@CompValue3"/>
            </span>
          </xsl:when>
          <xsl:otherwise>
            <span style="color: Red; font-weight: bold;">
              <xsl:value-of select="@CompValue3"/>
            </span>
          </xsl:otherwise>
        </xsl:choose>
      </td>
      <td>
        <xsl:choose>
          <xsl:when test="@CompStatus4 = 'true'">
            <span style="color: forestGreen; font-weight: bold;">
              <xsl:value-of select="@CompValue4"/>
            </span>
          </xsl:when>
          <xsl:otherwise>
            <span style="color: Red; font-weight: bold;">
              <xsl:value-of select="@CompValue4"/>
            </span>
          </xsl:otherwise>
        </xsl:choose>
      </td>
      <td>
        <xsl:choose>
          <xsl:when test="@CompStatus5 = 'true'">
            <span style="color: forestGreen; font-weight: bold;">
              <xsl:value-of select="@CompValue5"/>
            </span>
          </xsl:when>
          <xsl:otherwise>
            <span style="color: Red; font-weight: bold;">
              <xsl:value-of select="@CompValue5"/>
            </span>
          </xsl:otherwise>
        </xsl:choose>
      </td>
    </tr>
  </xsl:template>
 
  <xsl:template match="*[local-name()='Results']">
    <table border="1" cellSpacing="0" cellPadding="5" >
      <thead style="text-align: center;">
        <td>Name</td>
        <td>Total</td>
        <td style="background-color: fireBrick; color: white;">Failed</td>
        <td style="background-color: darkblue; color: white;">Duration</td>
      </thead>
      <xsl:apply-templates select="./*" />
    </table>
  </xsl:template>
 
  <xsl:template match="*[local-name()='Result']">
    <tr>
      <td>
        <xsl:value-of select="@TestCaseName"/>
      </td>
      <td>
        <xsl:value-of select="@Runs"/>
      </td>
      <td>
        <xsl:value-of select="@Errors"/>
      </td>
      <td>
        <xsl:value-of select="@Average"/>
      </td>
    </tr>
  </xsl:template>
 
  <xsl:template match="*[local-name()='Exceptions']">
    <table border="1" cellSpacing="0" cellPadding="5" >
      <thead style="text-align: center;">
        <td>Test Name</td>
        <td style="background-color: fireBrick; color: white;">Exception</td>
      </thead>
      <xsl:apply-templates select="./*" />
    </table>
  </xsl:template>
 
  <xsl:template match="*[local-name()='Exception']">
    <tr>
      <td>
        <xsl:value-of select="@TestCaseName"/>
      </td>
      <td colspan="4" bgcolor="#FF9900">
        <b>
          <xsl:value-of select="@MessageText"/>
        </b>
        <br />
        <xsl:value-of select="@StackTrace"/>
      </td>
    </tr>
  </xsl:template>
 
</xsl:stylesheet>
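Before copying the stylesheet into the dashboard, it can help to render it locally. Because the merge publisher places LoadTestSummary.xml inside the build log, the stylesheet's templates look for /cruisecontrol/build/*[local-name()='LoadTestCustomResults']. The sketch below mimics that by wrapping the summary in a minimal cruisecontrol/build envelope and running System.Xml.Xsl.XslCompiledTransform. The envelope is a simplification of the real CCNet build log, and the `XslPreview` name is hypothetical.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Xsl;

// Hypothetical helper: applies the dashboard stylesheet to a summary outside CCNet.
public static class XslPreview
{
    public static string Render(XmlReader stylesheet, XDocument summary)
    {
        // Simplified stand-in for the CCNet build log: the XSL templates match
        // /cruisecontrol/build/*[local-name()='LoadTestCustomResults'].
        var buildLog = new XDocument(
            new XElement("cruisecontrol",
                new XElement("build", new XElement(summary.Root))));

        var xslt = new XslCompiledTransform();
        xslt.Load(stylesheet);

        using (XmlReader input = buildLog.CreateReader())
        using (var output = new StringWriter())
        {
            // The TextWriter overload honours the stylesheet's <xsl:output method="html"/>.
            xslt.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

For a quick check, pass `XmlReader.Create` on MsTestLoadReportCustom2010.xsl together with `XDocument.Load("LoadTestSummary.xml")`, write the returned string to an .html file, and open it in a browser.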
 

Here's the modified dashboard.config:

<?xml version="1.0" encoding="utf-8"?>
<dashboard>
  <remoteServices>
    <servers>
      <!-- Update this list to include all the servers you want to connect to. NB - each server name must be unique -->
      <server 
        name="local" 
        url="tcp://localhost:21234/CruiseManager.rem" 
        allowForceBuild="true" 
        allowStartStopBuild="true" 
        backwardsCompatible="false" />
    </servers>
  </remoteServices>
  <plugins>
    <farmPlugins>
      <farmReportFarmPlugin categories="false" />
      <cctrayDownloadPlugin />
      <administrationPlugin password="Pa$$word1" />
    </farmPlugins>
    <serverPlugins>
      <serverReportServerPlugin />
    </serverPlugins>
    <projectPlugins>
      <projectStatisticsPlugin xslFileName="xsl\StatisticsGraphs.xsl" />
      <projectReportProjectPlugin />
      <viewProjectStatusPlugin />
      <latestBuildReportProjectPlugin />
      <viewAllBuildsProjectPlugin />
    </projectPlugins>
    <buildPlugins>
      <buildReportBuildPlugin>
        <xslFileNames>
          <xslFile>xsl\header.xsl</xslFile>
          <xslFile>xsl\modifications.xsl</xslFile>
          <xslFile>xsl\MsTestReport2008.xsl</xslFile>
          <!-- Updated! Adding xsl for presenting load test results -->
          <xslFile>xsl\MsTestLoadReportCustom2010.xsl</xslFile>          
        </xslFileNames>
      </buildReportBuildPlugin>
      <buildLogBuildPlugin />
    </buildPlugins>
    <securityPlugins>
      <simpleSecurity />
    </securityPlugins>
  </plugins>
</dashboard>
We are done!

Now we can start the 'Cruise Control .NET' service, or run the executable directly from %ProgramFiles%\CruiseControl.NET\server\ccnet.exe.

Once the CCNet service/process is started, we can go ahead and trigger a build. The simplest way is to browse to the portal (installed and deployed to IIS during the installation) and force a build.
