The Reality of Delivering Big Data

As with most things I am faced with, I like to distill reality into as few workable blocks as possible. Some people may challenge me for oversimplifying the complex. I take the approach that complexity can be distilled into many simpler elements. By doing so, the solution becomes easier to manage. It also becomes easier to communicate to stakeholders, and Big Data is no exception. Engineering diagrams are a must for many team members but not necessarily for all stakeholders.

Recently, I have witnessed many conversations regarding the processing, analyzing, and dashboarding of what is being called real-time data. I even struggle with that term, because it is essentially near-real-time operational data. That is, however, not relevant for this discussion. What is relevant is this: how do you design, build, test, and implement for a complex data stream while also presenting it graphically?

These questions are the crux of this simple discussion. Before you create your first draft of the schema or the architecture, you must put into perspective the problem you are being asked to solve. You must deal with the Three “V’s” of Big Data: Volume, Velocity, and Variety.

The Three “V’s” of Big Data

Volume

How much data are you going to get? How wide and how deep the data is are both critical considerations, because they drive data storage and the type of storage used. Do you need SSDs, or will large-capacity traditional drives work? This is where it becomes critical. If you do not understand the volume of data you will be dealing with, how will you store it in a way that lets you retrieve it?
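As a rough illustration of the width-and-depth question, a back-of-the-envelope estimate can be sketched in a few lines. The function name, the feed figures, and the 30% index overhead below are all assumptions made for the example, not measurements from any real system.

```python
def estimate_storage_bytes(rows_per_day: int, avg_row_bytes: int,
                           retention_days: int, index_overhead: float = 0.3) -> int:
    """Rough storage need: depth (rows kept) x width (bytes per row), plus index overhead.

    index_overhead is an assumed fraction; real overhead depends on the schema and engine.
    """
    raw_bytes = rows_per_day * avg_row_bytes * retention_days
    return round(raw_bytes * (1 + index_overhead))


# Hypothetical feed: 50 million rows a day, 200 bytes wide, kept for 90 days.
total_bytes = estimate_storage_bytes(50_000_000, 200, 90)
print(f"Estimated storage: {total_bytes / 1e12:.2f} TB")
```

Even a crude number like this is usually enough to start the conversation about storage tiers before any schema work begins.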

Additionally, you need to understand how the information will be presented. The goal here is to know how the data is going to be used, because that will impact indexes, schemas, and normalization levels. This is often overlooked, and organizations are then forced to throw more hardware at the problem when a simpler and less elegant data schema could have resolved the issue. Do not let perfection get in the way of good. In this context, ‘good’ is giving the user what they asked for, rather than the perfectly engineered lab experiment.

A pure data model may contain too many narrow tables, and a pure business object model may contain too many unnecessary related tables. Never lose sight of the problem you are trying to solve.

If possible, and depending on how the data is coming to you, you may want to look at changes in the protocols. For example, changing the inbound process from XML to JSON can drop the payload volume by up to 40%. Other simple techniques of reducing tag label names can have profound impacts on the volume of data. Although bigger and faster pipes work, they can also lead to more expensive ongoing costs. In most cases, a simpler transfer protocol could have been implemented.
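To get a feel for the effect of the wire format and of tag label length, the same record can be serialized three ways and measured. This is a minimal sketch: the record, its field names, and the short-label mapping are invented for the example, and the actual savings depend entirely on the shape of your data.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sensor reading, used only to compare encodings.
record = {
    "deviceIdentifier": "pump-17",
    "temperatureCelsius": 71.5,
    "pressureKilopascal": 204.0,
}


def to_xml(rec: dict, root_tag: str = "reading") -> bytes:
    """Serialize a flat record as XML, one child element per field."""
    root = ET.Element(root_tag)
    for key, value in rec.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root)


def to_json(rec: dict) -> bytes:
    """Serialize a record as compact JSON with no extra whitespace."""
    return json.dumps(rec, separators=(",", ":")).encode("utf-8")


# Assumed short-label mapping agreed with the data source.
SHORT_LABELS = {"deviceIdentifier": "id", "temperatureCelsius": "t", "pressureKilopascal": "p"}
short_record = {SHORT_LABELS[key]: value for key, value in record.items()}

xml_size = len(to_xml(record))
json_size = len(to_json(record))
short_json_size = len(to_json(short_record))

print(f"XML: {xml_size} bytes, JSON: {json_size} bytes, JSON with short labels: {short_json_size} bytes")
```

XML repeats every label in an opening and a closing tag, which is where most of the difference comes from. Multiplied across millions of messages, the per-record saving becomes a real reduction in transfer and storage cost.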

Velocity

Will the data be coming at you like a garden hose or a fire hydrant? When a problem arises in production, I always look to see if the speed of the data is causing the issue. This typically has an impact within a few areas. For example, is the inbound process architected to consume the data quickly? Is the pipe big enough to handle the load? I tend to lean towards a queuing mechanism when the data is so large that a parallel process approach is needed. Others may call it buffering, but the end result is the same. Take the massive inbound load, dump it into a working area, and then apply the complex Extraction, Transformation, and Load (ETL) against this working set. This allows for simple resets. It also provides an excellent way of keeping the network data pipes uncluttered.
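The "dump it into a working area, then apply the ETL" idea can be sketched with an in-memory SQLite database. The table names, the message format, and the cleaning rule are assumptions made for illustration; a production system would use its own staging store and rules.

```python
import json
import sqlite3

staging_db = sqlite3.connect(":memory:")
staging_db.execute("CREATE TABLE staging (id INTEGER PRIMARY KEY, payload TEXT NOT NULL)")
staging_db.execute("CREATE TABLE readings (device TEXT NOT NULL, temp_c REAL NOT NULL)")


def land(raw_messages):
    """Step 1: store the inbound load as-is, with no parsing on the inbound path."""
    staging_db.executemany(
        "INSERT INTO staging (payload) VALUES (?)", [(msg,) for msg in raw_messages]
    )
    staging_db.commit()


def transform_and_load():
    """Step 2: run the heavier ETL against the working set, away from the inbound path."""
    cleaned = []
    for (payload,) in staging_db.execute("SELECT payload FROM staging ORDER BY id"):
        rec = json.loads(payload)
        if rec.get("temp_c") is None:  # example rule: skip incomplete records
            continue
        cleaned.append((rec["device"].strip().lower(), float(rec["temp_c"])))
    # A reset is simple: clear the target and rebuild it from the staged data.
    staging_db.execute("DELETE FROM readings")
    staging_db.executemany("INSERT INTO readings VALUES (?, ?)", cleaned)
    staging_db.commit()
    return len(cleaned)


land([
    '{"device": " Pump-17 ", "temp_c": 71.5}',
    '{"device": "pump-18", "temp_c": null}',
    '{"device": "PUMP-19", "temp_c": "68"}',
])
loaded = transform_and_load()
print(f"Loaded {loaded} cleaned rows from staging")
```

Because the raw payloads stay in the staging table, a bad transformation can be corrected and rerun without asking the source to resend anything.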

If the speed is slower but the volume is higher (as in a complex transactional set), then you can optimize the ETL process to make light work of the slower, larger volume. If it is high volume and high velocity, then typically a parallel inbound process feeding a common working queue gives the most reliable and flexible approach. This approach also allows the functions to be abstracted, clearly defined, and managed separately.
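A parallel inbound process feeding a common working queue might look like the following sketch. Threads stand in for separate inbound processes here, and the feed names and the doubling "transformation" are placeholders invented for the example.

```python
import queue
import threading


def receiver(source_name, messages, work_queue):
    """One inbound process: accept messages and enqueue them, nothing more."""
    for msg in messages:
        work_queue.put({"source": source_name, "value": msg})


def etl_worker(work_queue, results, lock):
    """Consumer that applies the transformation to items from the shared queue."""
    while True:
        item = work_queue.get()
        if item is None:  # sentinel: no more work is coming
            break
        transformed = (item["source"], item["value"] * 2)  # placeholder transform
        with lock:
            results.append(transformed)


def run(feeds):
    # Bounded queue: receivers block when it is full instead of flooding memory.
    work_queue = queue.Queue(maxsize=100)
    results, lock = [], threading.Lock()

    consumer = threading.Thread(target=etl_worker, args=(work_queue, results, lock))
    consumer.start()

    receivers = [
        threading.Thread(target=receiver, args=(name, msgs, work_queue))
        for name, msgs in feeds.items()
    ]
    for thread in receivers:
        thread.start()
    for thread in receivers:
        thread.join()

    work_queue.put(None)  # all receivers are done, so tell the consumer to stop
    consumer.join()
    return results


results = run({"feed_a": [1, 2, 3], "feed_b": [10, 20]})
print(sorted(results))
```

The receivers know nothing about the transformation and the worker knows nothing about the sources, which is the separation of functions described above.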

Variety

The reality of Extraction, Transformation, and Load (ETL) comes into play when questioning the complexity and variation of the actual data. Is the data a simple narrow collection of a few elements? Or is this a very layered, parent-child transactional dataset? How much transformation is needed? How many business rules are needed to make sense of the data to create an information set? The engagements where I get called in to address these questions can take some time to optimize, especially when the data was not thoughtfully considered from the beginning. In most cases, the database teams tend to drift towards an easy approach rather than a more holistic approach. There is no right or wrong answer here, because the overall picture has to be considered.

Use creative, pragmatic approaches. Do you create functions? Do you use complex joins, indexes, or views? How are triggers being impacted? These are the types of questions that need to be addressed. One of the bigger questions you should ask is: how dynamic are the business rules? If the business rules change frequently, then you need to create an approach that allows for database control of the rules, and NOT allow developers to control the execution of the rules. If you do use the database to control the rules, then it will typically impact performance in a negative way. It is a balancing act. Do not forget to inquire about whether the data can be cleaned at the source.
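One way to read "database control of the rules" is to keep the rules as rows in a table and have the code evaluate whatever is there at run time. The sketch below assumes that interpretation; the table layout, rule names, and thresholds are hypothetical.

```python
import operator
import sqlite3

# Comparison operators a rule row is allowed to reference.
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge, "==": operator.eq}

rules_db = sqlite3.connect(":memory:")
rules_db.execute(
    "CREATE TABLE business_rules (name TEXT, field TEXT, op TEXT, threshold REAL, active INTEGER)"
)
rules_db.executemany(
    "INSERT INTO business_rules VALUES (?, ?, ?, ?, ?)",
    [
        ("overheat", "temp_c", ">", 90.0, 1),
        ("low_pressure", "pressure_kpa", "<", 100.0, 1),
    ],
)


def violated_rules(record, db):
    """Read the active rules on every call, so a data change takes effect without a deploy."""
    hits = []
    query = "SELECT name, field, op, threshold FROM business_rules WHERE active = 1 ORDER BY name"
    for name, field, op, threshold in db.execute(query):
        if field in record and OPS[op](record[field], threshold):
            hits.append(name)
    return hits


reading = {"temp_c": 95.0, "pressure_kpa": 150.0}
before = violated_rules(reading, rules_db)

# The business raises the threshold by editing data, not code.
rules_db.execute("UPDATE business_rules SET threshold = 99.0 WHERE name = 'overheat'")
after = violated_rules(reading, rules_db)

print(before, after)
```

The extra query on every evaluation is a small-scale version of the performance cost mentioned above. Caching the rules for a short interval is one possible compromise, at the price of slower rule changes.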

In Conclusion

The good news is that in many cases, a thoughtful design (one that respects the variety and understands the volume and velocity) can be achieved with much less effort than trying to deal with these issues in production.

Need some help with your big data project? MID-RANGE offers a full range of managed IT services and infrastructure to meet your big data needs.

Big Data does not have to be scary, for big ideas are just a collection of smaller ideas.

Tim Lalonde

Tim Lalonde is the VP of Technical Operations at Mid-Range. He works with leading-edge companies to be more competitive and effective in their industries. He specializes in developing business roadmaps leveraging technology that create and support change from within — with a focus on business process re-engineering, architecture and design, business case development and problem-solving.

With over 30 years of experience in IT, Tim’s guiding principle remains simple: See a problem, fix a problem.
