Summary: In this short article we look at why Hadoop MapReduce is considered a necessary foundation of an agile Enterprise Data Analytics platform.
In this short article we look at why Hadoop MapReduce is considered a necessary foundation of an agile Enterprise Data Analytics platform.
Earlier this week I spoke with an Enterprise data analytics platform manager within a financial services organisation. He shared some of the operational challenges that his team faces and expressed a strong desire for the data analytics platform to provide the same levels of agility that the business has come to expect from development teams. Agile software development has been a focus of software developers since it was first introduced as part of the Agile Manifesto in 2001. An agile development process is based on iterative and incremental development, whereby the solution and requirements evolve throughout the lifetime of a software development project. The agile development model replaced the earlier waterfall software development model, a model that relies on detailed requirements to be known upfront and development flows on through a series of well defined stages. An agile development model brings a number of advantages including accelerating delivery of business value, continuous realignment with evolving business needs, increased and more accurate visibility to the business, improved quality and reducing the risk of failure.
Let us consider how agility has in part fuelled the adoption of NoSQL databases. NoSQL databases have two main advantages over traditional Relational DataBase Management Systems (RDBMS) ; they are far easier to scale and do not require a schema to be associated with each table before data is stored. RDBMS are and will continue to be very important for a number of reasons - refer to "Why do we need relational database management?" systems for a good summary. Not withstanding the limitations of performing higher order functions on data stored within a NoSQL database, they do provide a great deal of agility. For example, if during an agile development process, the business requirements evolve and result in particular table needing to store additional fields, then in a NoSQL database, there is no need to modify a schema because it does not exist.
So how does Hadoop MapReduce provide agility within an Enterprise data analytics platform? The first thing to consider is the platform will be used by a number of different internal customers. In a large Enterprise organisation, a number of data products or services are often built on top of the data analytics platform. While some of them may be well understood during the planning and architecture stage, the majority will exist at some point in the future. How can you possibly design a platform that will support higher order services that you as yet have no requirements for? This plays to Hadoop MapReduce's strength as the Swiss Army knife of the distributed computing world. While it may not be the best tool for any one purpose, it does everything good enough. This tradeoff between agility vs performance allows data platform teams to remain nimble and flexible in supporting the business while doing so in a cost efficient and performant manner.