The Reality of Delivering Big Data

As with most things I am faced with, I like to distill reality into as few workable blocks as possible. Some people may challenge me for oversimplifying the complex. I take the approach that complexity can be distilled into many simpler elements, and that doing so makes the solution easier to manage. It also makes it easier to communicate to stakeholders, and Big Data is no exception. Engineering diagrams are a must for many team members but not necessarily for all stakeholders.

Recently, I have witnessed many conversations regarding the processing, analyzing, and dashboarding of what is being called real-time data. I even struggle with that term, because it is essentially near-real-time operational data. That, however, is not relevant for this discussion. What is relevant is this: how do you design, build, test, and implement for a complex data stream while also presenting it graphically?

That question is the crux of this simple discussion. Before you create your first draft of the schema or the architecture, you must put into perspective the problem you are being asked to solve. You must deal with the Three “V’s” of Big Data: Volume, Velocity, and Variety.

The Three “V’s” of Big Data

Volume

How much data are you going to get? How wide and how deep the data is are both critical considerations, because they drive the amount of storage and the type of storage used. Do you need SSDs, or will large-capacity traditional drives work? If you do not understand the volume of data you have to deal with, how will you be able to store it and retrieve it?
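A quick back-of-the-envelope calculation is often enough to start the storage conversation. The sketch below is only an illustration: the function name, the sample figures, and the 1.5 overhead factor (a placeholder allowance for indexes and working space) are assumptions, not measurements from any real system.

```python
def estimate_storage_gb(rows_per_day: int, avg_row_bytes: int,
                        retention_days: int, overhead_factor: float = 1.5) -> float:
    """Rough storage estimate in gigabytes (1 GB = 10**9 bytes).

    Depth  = rows_per_day * retention_days (how many rows you keep)
    Width  = avg_row_bytes (how wide each row is)
    overhead_factor is an assumed allowance for indexes and working space.
    """
    raw_bytes = rows_per_day * avg_row_bytes * retention_days
    return raw_bytes * overhead_factor / 10**9


# Hypothetical feed: 5 million rows a day, 400 bytes wide, kept for a year.
estimate = estimate_storage_gb(rows_per_day=5_000_000, avg_row_bytes=400, retention_days=365)
print(f"Estimated storage: {estimate:.1f} GB")
```

Swap in your own row counts and widths; the point is to have a number before choosing drives.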

Additionally, you need to understand how the information will be presented. The goal here is to know how the data is going to be used, because this will impact indexes, schemas, and normalization levels. This is often overlooked, and organizations are then forced to throw more hardware at the problem when a simpler, less elegant data schema could have resolved it. Do not let perfection get in the way of good. In this context, ‘good’ is giving the users what they asked for, rather than the perfectly engineered lab experiment.
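To make the link between usage and indexes concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, the index name, and the "dashboard" query are all hypothetical; the idea is simply that once you know which query the presentation layer will run, you can check whether the database has a cheap way to answer it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (device_id INTEGER, taken_at TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [(i % 100, f"2020-01-01T00:00:{i % 60:02d}", float(i)) for i in range(10_000)],
)


def query_plan(connection, sql, params=()):
    """Return SQLite's plan description for a query as one string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


# The query we assume the dashboard will issue most often.
dashboard_query = "SELECT * FROM readings WHERE device_id = ?"

plan_before = query_plan(conn, dashboard_query, (42,))
conn.execute("CREATE INDEX idx_readings_device ON readings (device_id)")
plan_after = query_plan(conn, dashboard_query, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording varies between SQLite versions, but the second plan should name the index while the first one cannot.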

A pure data model may contain too many narrow tables, and a pure business object model may contain too many unnecessary related tables. Never lose sight of the problem you are trying to solve.

If possible, and depending on how the data is coming to you, you may want to look at changes in the protocols. For example, changing the inbound process from XML to JSON can drop the payload volume by up to 40%. Other simple techniques, such as shortening tag and label names, can have a profound impact on the volume of data. Although bigger and faster pipes work, they also lead to higher ongoing costs. In most cases, a simpler transfer protocol could have been implemented instead.
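You can measure this for your own payloads rather than take a percentage on faith. The sketch below serializes one hypothetical record three ways (XML, JSON, and JSON with shortened names) and compares the byte counts. The record, the field names, and the short-name mapping are made up for illustration, and the savings you see will depend entirely on the shape of your data.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical inbound record with verbose field names.
record = {"deviceIdentifier": "pump-17", "temperatureCelsius": 71.5, "pressureKilopascal": 310.2}

# Hypothetical mapping agreed with the sender to shorten the labels.
SHORT_NAMES = {"deviceIdentifier": "d", "temperatureCelsius": "t", "pressureKilopascal": "p"}


def to_xml_bytes(rec, root_tag="reading"):
    """Serialize a flat record as XML, one child element per field."""
    root = ET.Element(root_tag)
    for key, value in rec.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode").encode("utf-8")


def to_json_bytes(rec):
    """Serialize a record as compact JSON (no spaces after separators)."""
    return json.dumps(rec, separators=(",", ":")).encode("utf-8")


xml_size = len(to_xml_bytes(record))
json_size = len(to_json_bytes(record))
short_json_size = len(to_json_bytes({SHORT_NAMES[k]: v for k, v in record.items()}))

for label, size in [("XML", xml_size), ("JSON", json_size), ("JSON, short names", short_json_size)]:
    saving = 100 * (xml_size - size) / xml_size
    print(f"{label:18} {size:4d} bytes ({saving:.0f}% smaller than XML)")
```

Run it against a representative sample of real messages before deciding whether a format change is worth the effort.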

Velocity

Will the data be coming at you like a garden hose or a fire hydrant? When a problem arises in production, I always look to see if the speed of the data is causing the issue. This typically has an impact within a few areas. For example, is the inbound process architected to consume the data quickly? Is the pipe big enough to handle the load? I tend to lean towards a queuing mechanism when the data is so large that a parallel process approach is needed. Others may call it buffering, but the end result is the same. Take the massive inbound load, dump it into a working area, and then apply the complex Extraction, Transformation, and Load (ETL) against this working set. This allows for simple resets. It also provides an excellent way of keeping the network data pipes uncluttered.
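The "dump it, then transform it" pattern can be sketched in a few lines. This example uses an in-memory SQLite database as the working area; the table names, the payload fields, and the Fahrenheit-to-Celsius transformation are all assumptions chosen to keep the illustration small.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Working area: raw payloads are stored exactly as they arrived.
conn.execute("CREATE TABLE staging (id INTEGER PRIMARY KEY, raw TEXT NOT NULL)")
# Target table that the dashboards would read from.
conn.execute("CREATE TABLE readings (device TEXT, celsius REAL)")


def land_raw(connection, payloads):
    """Step 1: dump the inbound load into the working area, untouched."""
    connection.executemany("INSERT INTO staging (raw) VALUES (?)", [(p,) for p in payloads])


def run_etl(connection):
    """Step 2: rebuild the target table from the working set."""
    connection.execute("DELETE FROM readings")  # the simple reset
    rows = connection.execute("SELECT raw FROM staging ORDER BY id").fetchall()
    for (raw,) in rows:
        rec = json.loads(raw)
        celsius = (rec["fahrenheit"] - 32) * 5 / 9
        connection.execute("INSERT INTO readings VALUES (?, ?)", (rec["device"], celsius))
    return len(rows)


land_raw(conn, [
    '{"device": "pump-17", "fahrenheit": 212}',
    '{"device": "pump-18", "fahrenheit": 32}',
])
loaded = run_etl(conn)
# After fixing a transformation bug, rerun against the same working set;
# nothing has to be requested from the source again.
reloaded = run_etl(conn)
```

Because the raw payloads stay in the working area, a reset costs a rerun of the ETL, not another trip across the network.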

If the speed is slower but the volume is higher (as in a complex transactional set), then you can optimize the ETL process to make light work of the slower, larger volume. If it is high volume and high velocity, then typically a parallel inbound process feeding a common working queue gives the most reliable and flexible approach. This approach also allows for the abstraction of functions to be clearly defined and managed.
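Here is a minimal sketch of parallel inbound processes feeding a common working queue, using only Python's standard library. The feed names, message counts, and the trivial "transformation" are placeholders; a production system would more likely use a message broker, but the separation of functions is the same.

```python
import queue
import threading

# Common working queue; the bound applies back-pressure if the worker falls behind.
work_queue = queue.Queue(maxsize=1000)
SENTINEL = None  # marker telling the worker that no more items will arrive


def inbound_feed(feed_name, count):
    """Inbound side: only receives and enqueues, no transformation."""
    for seq in range(count):
        work_queue.put((feed_name, seq))


def etl_worker(results):
    """Processing side: drains the queue and applies the (placeholder) ETL step."""
    while True:
        item = work_queue.get()
        if item is SENTINEL:
            break
        feed_name, seq = item
        results.append(f"{feed_name}:{seq}")


results = []
worker = threading.Thread(target=etl_worker, args=(results,))
worker.start()

# Three hypothetical feeds arriving in parallel.
feeds = [threading.Thread(target=inbound_feed, args=(name, 50))
         for name in ("feed-a", "feed-b", "feed-c")]
for feed in feeds:
    feed.start()
for feed in feeds:
    feed.join()

work_queue.put(SENTINEL)  # all feeds are done, let the worker finish
worker.join()
print(f"processed {len(results)} items")
```

The inbound functions know nothing about the ETL, and the worker knows nothing about where the data came from, which is the clear abstraction of functions this approach buys you.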

Variety

The reality of ETL comes into play when questioning the complexity and variation of the actual data. Is the data a simple, narrow collection of a few elements? Or is it a very layered, parent-child transactional dataset? How much transformation is needed? How many business rules are needed to make sense of the data and turn it into an information set? When I get called in to address these types of engagements, they can take some time to optimize, especially when the variety of the data was not thoughtfully considered from the beginning. In most cases, database teams tend to drift towards an easy approach rather than a more holistic one. There is no right or wrong answer here, because the overall picture has to be considered.

Use creative, pragmatic approaches. Do you create functions? Do you use complex joins, indexes, or views? How are triggers being impacted? These are the types of questions that need to be addressed. One of the bigger questions you should ask is: how dynamic are the business rules? If the business rules change frequently, then you need to create an approach that allows for database control of the rules, and NOT allow developers to control the execution of the rules. If you do use the database to control the rules, then it will typically impact performance in a negative way. It is a balancing act. Do not forget to inquire about whether the data can be cleaned at the source.
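One way to picture database control of the rules is a rules table that a small, generic piece of code reads at run time. The sketch below uses SQLite and a deliberately tiny rule format (field, comparison, threshold); the table layout, rule names, and thresholds are all hypothetical.

```python
import operator
import sqlite3

# Comparisons a rule is allowed to use; anything else is rejected with a KeyError.
OPERATORS = {">": operator.gt, "<": operator.lt, ">=": operator.ge,
             "<=": operator.le, "==": operator.eq}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE business_rules (name TEXT, field TEXT, op TEXT, threshold REAL, active INTEGER)"
)
conn.executemany("INSERT INTO business_rules VALUES (?, ?, ?, ?, ?)", [
    ("overheat", "celsius", ">", 90.0, 1),
    ("low_pressure", "kpa", "<", 200.0, 1),
    ("retired_rule", "celsius", ">", 50.0, 0),  # switched off in data, not in code
])


def triggered_rules(connection, record):
    """Return the names of active rules that the record satisfies."""
    rows = connection.execute(
        "SELECT name, field, op, threshold FROM business_rules "
        "WHERE active = 1 ORDER BY name"
    )
    return [name for name, field, op, threshold in rows
            if field in record and OPERATORS[op](record[field], threshold)]


reading = {"celsius": 95.0, "kpa": 310.0}
flags = triggered_rules(conn, reading)

# The business changes its mind: update a row, deploy nothing.
conn.execute("UPDATE business_rules SET threshold = 99.0 WHERE name = 'overheat'")
flags_after_change = triggered_rules(conn, reading)
```

Notice that every call goes back to the rules table. That lookup is the performance cost of keeping the rules under database control, and caching them trades some of the flexibility back for speed.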

In Conclusion

The good news is that, in many cases, a thoughtful design that respects the variety and understands the volume and velocity takes much less effort than trying to deal with these issues once you are in production.

Need some help with your big data project? MID-RANGE offers a full range of managed IT services and infrastructure to meet your big data needs.

Big Data does not have to be scary, for big ideas are just a collection of smaller ideas.
