Azure Vault Storage in HDInsight: A Robust and Low Cost Storage Solution

HDInsight tries to provide the best of both worlds in how it manages its data.

Azure Vault Storage (ASV) and the Hadoop Distributed File System (HDFS)
implemented by HDInsight on Azure are distinct file systems that are optimized,
respectively, for the storage of data and for computation on that data. ASV provides
a highly scalable, highly available, low cost, long term, and shareable storage
option for data that is to be processed using HDInsight. HDFS, as deployed on the
Hadoop clusters that HDInsight creates, is optimized for running Map/Reduce (M/R)
computational tasks on that data.

HDInsight clusters are deployed in Azure on compute nodes to execute M/R
tasks and are dropped once these tasks have been completed. Keeping the data in
the HDFS clusters after computations have been completed would be an expensive
way to store this data. ASV provides a full-featured HDFS file system over
Azure Blob storage (ABS). ABS is a robust, general purpose Azure storage
solution, so storing data in ABS enables the clusters used for computation to
be safely deleted without losing user data. ASV is not only low cost: it has been
designed as an HDFS extension that gives customers a seamless experience by
enabling the full set of components in the Hadoop ecosystem to operate directly
on the data it manages.

In the upcoming release of HDInsight on Azure, ASV will be
the default file system. In the current developer preview on www.hadooponazure.com, data stored in
ASV can be accessed directly from the Interactive JavaScript Console by
using asv:// as the protocol scheme in the URI of the assets you are accessing.
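For example, assuming a blob container named mycontainer that holds a file named example.txt (both names are hypothetical), the file could be read from the console with:

#cat asv://mycontainer/example.txt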

To use this feature in the current release, you will need
an HDInsight account and a Windows Azure Blob Storage account. To access your storage
account from HDInsight, go to your cluster's page and click on the Manage Cluster tile.

Click on the Set up ASV button.

Enter the credentials (Name and Passkey) for your Windows Azure Blob Storage account.


Then return to the Cluster and click on the Interactive Console tile to access the JavaScript console.
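Once the console is open, you can confirm that the storage account is connected by listing the contents of one of its containers. Assuming a container named mycontainer, a command such as the following should show its contents:

#ls asv://mycontainer/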

Now, to run the Hadoop wordcount job against data in an ASV container named hadoop, use:
hadoop jar hadoop-examples-1.1.0-SNAPSHOT.jar wordcount asv://hadoop/ outputfile

The scheme for accessing data in ASV is asv://container/path
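Because ASV is exposed as a full HDFS file system, job output can also be written back to an ASV container so that the results outlive the cluster. As a sketch, assuming the same hadoop container and a hypothetical output path:

hadoop jar hadoop-examples-1.1.0-SNAPSHOT.jar wordcount asv://hadoop/ asv://hadoop/output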

To see the data in ASV, use:
#cat asv://hadoop2/data