In this marketing tech in transition series, my team and I highlight four imperatives: Big Data, Cloud, Identity Resolution and Artificial Intelligence, and share how applying these imperatives will transform your marketing tech. To dive deeper into our Big Data journey over the last five years, I sat down with Epsilon's Enterprise Architecture leads, Prashanth Athota, Krishna Kodali and Jayendra Gurram.
Nearly every client I speak to falls into one of two camps: 1) they implemented a Big Data solution in some form five-plus years ago, or 2) they have dabbled with Big Data architectures but are now ready to jump "full cannonball" into a Data Lake. These two camps are obviously at different points of maturity, but they actually face very similar challenges.
Both camps are looking for value, not just expense reduction. The first big wave of implementations of Hadoop and its cohort of technologies was justified by cost reduction. Once that was done, however, that new cost run rate became the norm, and any new project must now be justified by accretive value. The second camp never really committed to Big Data in a transformational way and is now playing catch-up. Cost reduction is assumed, but they also need to solve immediate, time-sensitive business problems that are best solved through Big Data architectures. In other words, they have to pursue transformation to deliver value and remain relevant.
I wish I could easily impart our journey to them. We started that journey specifically to address a variety of business needs across an array of market verticals, and we had to build an architecture that scaled to meet marketers' needs and could support an ever-changing array of consumer data. I sat down with Prashanth Athota and his team and we reminisced about the past five years. A few highlights are included here.
The first market reality for us was that we would be serving an array of user personas, and, in general, there are cognitive hurdles in moving from SQL-style thinking and operating to noSQL techniques. Overcoming this requires an API-driven approach that abstracts the data access layer, coupled with user experience design planned carefully around the business operations of the marketer. Yes, there will be sophisticated users who can work with the underlying access layers natively, but for most users we can optimize their data access requirements through our data management and API layers.
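To make the abstraction idea concrete, here is a minimal sketch of an API layer that lets users think in business terms (segments, audiences) while the backend translates to whatever store holds the data. All names here (`AudienceAPI`, `InMemoryStore`) are illustrative assumptions, not Epsilon's actual implementation.

```python
# Hypothetical sketch: a thin API layer that hides backend query mechanics
# from marketer-facing tools. The names are illustrative, not a real product.

class InMemoryStore:
    """Stand-in for a real backend (HBase, Hive, etc.) to keep the sketch runnable."""

    def __init__(self, rows):
        self.rows = rows

    def scan(self):
        # A fresh iterator per call, like a new backend scan.
        return iter(self.rows)


class AudienceAPI:
    """Expose simple, business-oriented calls; translate them to each store's idiom."""

    def __init__(self, backends):
        self.backends = backends  # e.g. {"profiles": <store>}

    def count_audience(self, segment):
        # The caller thinks in segments, not in scans, filters or SQL dialects.
        store = self.backends["profiles"]
        return sum(1 for row in store.scan() if row.get("segment") == segment)


api = AudienceAPI({"profiles": InMemoryStore([
    {"id": 1, "segment": "loyal"},
    {"id": 2, "segment": "new"},
    {"id": 3, "segment": "loyal"},
])})
print(api.count_audience("loyal"))  # → 2
```

The point of the design is that swapping the backend changes only the store class, never the business-facing call.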
The underlying data management layer required intense forensics, trial and adaptation to reach the right performance and economics. One of the main capabilities we provided was ingesting and dicing data in a variety of ways, and no single data management design pattern was going to accomplish that. It was not just a matter of throwing a bunch of data on top of Hadoop; that's the mistake many early-stage Big Data projects make.
True, traditional Relational Database Management Systems (RDBMS) face economic challenges at scale, but they still have their place and should not be immediately discarded during your Big Data initiative. To support the mission properly, you should get comfortable with running a heterogeneous environment. You'll have a big problem with row-level updates on HDFS. HBase can handle those, but you aren't going to be doing more than sequential scans there. Impala can be used for interaction with the UI, while work actually gets passed off to Hive to handle the underlying jobs. Cassandra can effectively manage real-time events and aggregations. And that trusty RDBMS you've had for data warehousing might still be the best place to manage the structured data needed for traditional business intelligence and marketing automation tools.
We also had to tackle the extremes of managing massive amounts of data at rest being actively used by thousands of users, while also dealing with billions of data-signals-in-motion in real time. Having a robust enterprise messaging system (EMS) was critical. The legacy software we used was limited to a single-node, license-based approach: to scale, we couldn't just spin up a node automatically without licensing it. Economically, that just didn't work. We needed elasticity. Moving to Kafka eliminated this constraint. With a combination of Kafka and Memcached, we can maintain our thread pool and scale our event topics to handle the volatility and variety of data and events at a scale that aligns to our business growth.
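The elasticity comes from partitioned topics: events hash to partitions by key, and adding consumers simply rebalances which consumer owns which partitions, with no per-node licensing step. The pure-Python simulation below shows the mechanics under stated assumptions; a real deployment would rely on Kafka's own partitioner and consumer-group rebalancing rather than these hypothetical helpers.

```python
# Self-contained sketch of why partitioned topics give elasticity.
# Real systems use Kafka's partitioner and consumer groups; this simulates them.

import hashlib

def partition_for(key, num_partitions):
    # Stable hash: the same event key always lands on the same partition,
    # preserving per-key ordering as the cluster grows.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_partitions

def assign_partitions(num_partitions, consumers):
    # Round-robin assignment, roughly what a consumer-group rebalance does.
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

# Scaling from 2 to 4 consumers just rebalances partitions; no relicensing.
print(assign_partitions(8, ["c1", "c2"]))
print(assign_partitions(8, ["c1", "c2", "c3", "c4"]))
```

The key property is that capacity scales by adding partitions and consumers, so the cost curve follows business growth rather than a license schedule.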
Finally, if you're a technical architect trying to create value by supporting your sales, marketing, analytics and customer experience teams, you may find yourself lacking good case studies to pull from. For example, our users can create tens of thousands of tables in HBase, but HBase has a practical limitation of around 200 per node. We had to figure out how to manage that, and there weren't (and aren't) published cases for it.
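One known HBase schema pattern for living under a per-node table budget is to fold many logical tables into a single physical table by prefixing each row key with the logical table's name, so each logical table's rows stay contiguous. The helper names below are illustrative, not from a published case study or Epsilon's actual solution.

```python
# Sketch of consolidating many logical tables into one physical key space.
# A NUL separator keeps logical-table prefixes unambiguous for plain names.

def physical_key(logical_table, row_key):
    # Prefixing keeps each logical table's rows contiguous in the physical table,
    # so a logical-table scan is a bounded range scan on the physical table.
    return f"{logical_table}\x00{row_key}"

def split_key(key):
    logical_table, _, row_key = key.partition("\x00")
    return logical_table, row_key

store = {}  # stand-in for the single physical HBase table
store[physical_key("campaign_42", "user_7")] = {"clicks": 3}

print(split_key(physical_key("campaign_42", "user_7")))  # → ('campaign_42', 'user_7')
```

The trade-off is that per-table HBase features (separate compaction settings, per-table ACLs) now have to be handled at the application layer.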
Also, when should you use SSDs, and how do you optimize their use economically? You'll additionally find bugs or limitations in the base code of the different data management components, and you'll need people who can rip the code open and pinpoint a recommendation or correct a flaw. You'll make that contribution yourself or ask your distribution vendor to do so. In the end, though, it's all worth it: you'll transform the way you serve your internal constituents and set yourself up to deliver continuous value.
Don't be sucked into the notion that there's a one-size-fits-all approach. Learn and adapt. Be prepared to implement a variety of techniques. Don't be afraid to contribute what you've learned, to lean on others and to apply their experience.
Tomorrow I sit down with other members of the team to discuss our adaptation to the Cloud.