Experimenting with Hadoop MapReduce on the AWS Platform - Part I

Summary: In this article, we present a solution architecture suitable as a foundation for a Hadoop MapReduce pilot on the AWS platform. We start by considering the circumstances under which an organisation may want to manage its own Hadoop MapReduce platform on EC2 instead of simply using Amazon Elastic MapReduce …

more ...

Interactive Data Preparation

Summary: In this article, we look at the burgeoning field of Interactive Data Preparation. We examine what it is, why it's becoming increasingly valuable, some common use cases, recent commercial activity, and why industry trends will continue to fuel investment in this space.

more ...


Data Provenance

Data provenance, also known as data lineage, refers to the process of recording the origin and transformation history of data throughout its lifetime. In this short article, we take a look at data lineage from the perspective of data analytics, specifically with a focus on large [Big …

more ...