Editorial & Analysis
02 Jan 2013
New High Storage service offers fast access to large amounts of data for complex data-crunching tasks such as seismic analysis, log processing and high-end data warehousing.
Intent on helping its customers meet their big data objectives using the cloud, Amazon Web Services (AWS) has introduced a new storage package, High Storage, which offers fast access to large volumes of data.
Part of the Amazon Elastic Compute Cloud (EC2) service, High Storage is designed to run data-intensive analysis jobs - the company mentions “seismic analysis, log processing and data warehousing” - on a parallel file system architecture, in which workloads are distributed across multiple disks to achieve higher throughput.
“Customers whose applications require high sequential read and write performance over very large data sets can take advantage of the capabilities of this new Amazon EC2 instance type,” said the company in a statement. “High Storage instances are especially well-suited for customers who use Hadoop, data warehouses and parallel file systems to process and analyse large data sets in the AWS cloud.”
In other words, AWS is positioning High Storage as the complement to its Elastic MapReduce service, which provides a platform for Hadoop big data analysis.
An ‘instance’ is the term used by AWS to refer to a bundle of computing resources - processing, storage, memory and so on - configured to meet the needs of a particular type of workload.
A High Storage instance, for example, provides customers with 35 EC2 Compute Units (ECUs) of compute capacity, 117 gigabytes (GB) of RAM and 48 terabytes (TB) of storage. That storage is spread across 24 hard disk drives, so that no single disk becomes a bottleneck for data transfers, and together the drives can deliver more than 2.4 gigabytes per second (GB/s) of sequential I/O performance.
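To put those figures in perspective, the short Python sketch below works through the arithmetic. It assumes decimal units (1 TB = 1,000 GB), treats the quoted 2.4 GB/s as a sustained rate, and takes the number of drives as a parameter (24 is used in the example call); the function names are illustrative and are not part of any AWS tool.

```python
def full_scan_seconds(capacity_tb: float, throughput_gb_per_s: float) -> float:
    """Seconds needed to read the full capacity once at the given aggregate rate."""
    # Decimal units are assumed: 1 TB = 1,000 GB.
    return capacity_tb * 1000 / throughput_gb_per_s


def per_drive_mb_per_s(throughput_gb_per_s: float, drive_count: int) -> float:
    """Average rate each drive must sustain if the load is spread evenly."""
    # Decimal units are assumed: 1 GB = 1,000 MB.
    return throughput_gb_per_s * 1000 / drive_count


if __name__ == "__main__":
    seconds = full_scan_seconds(capacity_tb=48, throughput_gb_per_s=2.4)
    print(f"Full scan of 48 TB: {seconds / 3600:.1f} hours")

    # The drive count is an input; 24 is the value assumed for this example.
    share = per_drive_mb_per_s(throughput_gb_per_s=2.4, drive_count=24)
    print(f"Average load per drive: {share:.0f} MB/s")
```

Under these assumptions, a single pass over the full 48 TB takes roughly five and a half hours, and each drive has to sustain only about 100 MB/s, which is why spreading the data across many disks matters more than the speed of any one of them.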
“As customers move every imaginable workload to AWS, we continue to provide them with additional instance families to meet the requirements of their applications,” said Peter DeSantis, vice president of Amazon EC2. High Storage instances are the ninth Amazon EC2 instance family (others include Cluster Compute and High I/O) and they are used to power Amazon’s own cloud-based data warehousing service, Redshift, announced in early December.
Right now, the new High Storage instance family is available only in the US East Region, but the company has promised that it will be available in other AWS regions “in the coming months”. When that happens, customers there will be able to launch High Storage instances from the AWS Management Console, from the EC2 or Elastic MapReduce command-line tools, or through the AWS SDK (software development kit) or third-party libraries.
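For readers curious what a launch through an SDK might look like, here is a minimal Python sketch. It rests on several assumptions that are not in the article: it uses boto3, a Python SDK for AWS that was released after this piece was written; it uses `hs1.8xlarge`, the API name AWS gave this instance type, which has since been retired; and the machine image ID is a placeholder that must be replaced with a real one. The helper names are hypothetical.

```python
def build_launch_request(ami_id: str, instance_type: str = "hs1.8xlarge",
                         count: int = 1) -> dict:
    """Assemble the parameters for an EC2 RunInstances call."""
    return {
        "ImageId": ami_id,              # machine image to boot from
        "InstanceType": instance_type,  # assumed API name for High Storage
        "MinCount": count,              # launch exactly `count` instances
        "MaxCount": count,
    }


def launch_high_storage(ami_id: str, region: str = "us-east-1") -> dict:
    """Send the request to EC2. Needs boto3 and valid AWS credentials."""
    import boto3  # imported here so the request builder works without it

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.run_instances(**build_launch_request(ami_id))


if __name__ == "__main__":
    # "ami-xxxxxxxx" is a placeholder, not a real image ID.
    print(build_launch_request("ami-xxxxxxxx"))
```

Keeping the request builder separate from the network call makes it possible to inspect the parameters without credentials; `us-east-1` is the identifier for the US East Region mentioned above.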