The Reality of Delivering Big Data

As with most things I am faced with, I like to distill reality into as few workable blocks as possible. Some people may challenge me for oversimplifying the complex. I take the approach that complexity can be distilled into many simpler elements. By doing so, the solution becomes easier to manage. It also becomes easier to communicate to stakeholders, and Big Data is no exception. Engineering diagrams are a must for many team members but not necessarily for all stakeholders.

Recently, I have witnessed many conversations regarding the processing, analyzing, and dashboarding of what is being called real-time data. I even struggle with that term, because it is essentially near-real-time operational data. That is, however, not relevant for this discussion. What is relevant is: how do you design, build, test, and implement for a complex data stream while also presenting it graphically?

These questions are the crux of this simple discussion. Before you create your first draft of the schema or the architecture, you must put into perspective the problem you are being asked to solve. You must deal with the Three “V’s” of Big Data: Volume, Velocity, and Variety.

The Three “V’s” of Big Data

Volume

How much data are you going to get? How wide and how deep the data is are critical elements to consider, because they drive how the data is stored and the type of storage used. Do you need SSDs, or will large-capacity traditional drives work? If you do not understand the volume of data you are dealing with, how will you store it in a way that lets you retrieve it?

Additionally, you need to understand how the information will be presented. The goal here is to know how the data is going to be used. This will impact indexes, schemas, and normalization levels. This is often overlooked, and organizations are then forced to throw more hardware at the problem when a simpler, less elegant data schema could have resolved the issue. Do not let perfect get in the way of good. In this context, ‘good’ is giving the user what they asked for, rather than the perfectly engineered lab experiment.

A pure data model may contain too many narrow tables, and a pure business object model may contain too many unnecessary related tables. Never lose sight of the problem you are trying to solve.
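To make the point about usage driving indexes a little more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `readings` table, its columns, and the index name are my own illustrative assumptions, not something from a real project. It simply asks the database how it plans to answer one known query, before and after adding an index that matches how the data is actually read.

```python
import sqlite3

# Hypothetical table of device readings; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings ("
    "id INTEGER PRIMARY KEY, device_id INTEGER, recorded_at TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO readings (device_id, recorded_at, value) VALUES (?, ?, ?)",
    [(i % 50, f"2024-01-01T00:{i % 60:02d}:00", i * 0.1) for i in range(5000)],
)

# The query the dashboard is assumed to run most often.
QUERY = "SELECT recorded_at, value FROM readings WHERE device_id = ?"


def query_plan(connection):
    """Return SQLite's plan description for QUERY as a single string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, (7,)).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn)  # no suitable index yet, so a full table scan
conn.execute("CREATE INDEX idx_readings_device ON readings (device_id)")
plan_after = query_plan(conn)   # the plan now names the new index

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan varies between SQLite versions, so treat the printed text as a hint rather than a contract. The point is only that knowing the read pattern up front tells you which index is worth its storage and write cost.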

If possible, and depending on how the data is coming to you, you may want to look at changes in the protocols. For example, changing the inbound process from XML to JSON can drop the payload volume by up to 40%. Other simple techniques, such as shortening tag and label names, can have a profound impact on the volume of data. Although bigger and faster pipes work, they can also lead to higher ongoing costs. In most cases, a simpler transfer protocol could have been implemented.
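A quick way to check this on your own data is to serialize the same records both ways and compare byte counts. The sketch below uses only the Python standard library. The record layout and the short key names are made up for illustration, and the savings you see will depend entirely on the shape of your real payload.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sensor readings; field names are illustrative only.
readings = [
    {"device_id": i, "temperature": 20.5 + i, "status": "ok"}
    for i in range(100)
]


def to_xml(rows):
    """Serialize rows as one XML element per field."""
    root = ET.Element("readings")
    for row in rows:
        item = ET.SubElement(root, "reading")
        for key, value in row.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="utf-8")


def to_json(rows):
    """Serialize rows as compact JSON (no extra whitespace)."""
    return json.dumps(rows, separators=(",", ":")).encode("utf-8")


def shorten_keys(rows, mapping):
    """Rename each field to a shorter label before sending."""
    return [{mapping[key]: value for key, value in row.items()} for row in rows]


SHORT_NAMES = {"device_id": "d", "temperature": "t", "status": "s"}

xml_bytes = to_xml(readings)
json_bytes = to_json(readings)
short_json_bytes = to_json(shorten_keys(readings, SHORT_NAMES))

print("XML bytes:            ", len(xml_bytes))
print("JSON bytes:           ", len(json_bytes))
print("JSON with short keys: ", len(short_json_bytes))
```

Shorter labels trade readability for size, so both ends of the pipe need to agree on the mapping.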

Velocity

Will the data be coming at you like a garden hose or a fire hydrant? When a problem arises in production, I always look to see if the speed of the data is causing the issue. This typically has an impact in a few areas. For example, is the inbound process architected to consume the data quickly? Is the pipe big enough to handle the load? I tend to lean towards a queuing mechanism when the data is so large that a parallel process approach is needed. Others may call it buffering, but the end result is the same. Take the massive inbound load, dump it into a working area, and then apply the complex Extraction, Transformation, and Load (ETL) against this working set. This allows for simple resets. It also provides an excellent way of keeping the network data pipes uncluttered.
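The "dump it into a working area, then run ETL against it" idea can be sketched in a few lines. This is a minimal illustration using an in-memory SQLite database. The `staging` and `orders` tables and the message format are assumptions of mine. Landing the data does no transformation at all, and the ETL step rebuilds the target from the working set, which is what makes a reset simple.

```python
import json
import sqlite3

# Hypothetical schema: a raw staging area plus one transformed target table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER PRIMARY KEY, payload TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, total_cents INTEGER NOT NULL)"
)


def land(raw_messages):
    """Dump the inbound load into the working area without transforming it."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO staging (payload) VALUES (?)",
            [(message,) for message in raw_messages],
        )


def run_etl():
    """Rebuild the target table from the working set; safe to re-run."""
    with conn:
        conn.execute("DELETE FROM orders")  # the "simple reset"
        rows = conn.execute("SELECT payload FROM staging ORDER BY id").fetchall()
        for (payload,) in rows:
            record = json.loads(payload)
            total_cents = round(float(record["total"]) * 100)
            conn.execute(
                "INSERT INTO orders (order_id, total_cents) VALUES (?, ?)",
                (record["order_id"], total_cents),
            )


land(['{"order_id": 1, "total": "19.99"}', '{"order_id": 2, "total": "5.00"}'])
run_etl()
run_etl()  # a second run gives the same result because the raw data is still staged

order_rows = conn.execute(
    "SELECT order_id, total_cents FROM orders ORDER BY order_id"
).fetchall()
print(order_rows)
```

In a real system the working area would more likely be a message queue or a landing table on separate storage, but the separation of "receive" from "transform" is the same.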

If the speed is slower but the volume is higher (as in a complex transactional set), then you can optimize the ETL process to make light work of the slower, larger volume. If it is high volume and high velocity, then a parallel inbound process feeding a common working queue typically gives the most reliable and flexible approach. This approach also allows the abstraction of functions to be clearly defined and managed.
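Here is a small sketch of that high-volume, high-velocity shape: several inbound workers feeding one common working queue, with a separate worker draining it. It uses Python threads and `queue.Queue` purely for illustration. The worker functions, the message layout, and the queue size are all assumptions, and a production system would likely use a dedicated message broker instead.

```python
import queue
import threading

# Bounded queue: fast producers block instead of exhausting memory.
work_queue = queue.Queue(maxsize=1000)
SENTINEL = None  # marker telling the ETL worker to stop


def inbound_worker(source_id, count):
    """Simulate one inbound feed pushing raw messages onto the common queue."""
    for seq in range(count):
        work_queue.put({"source": source_id, "seq": seq})


def etl_worker(results):
    """Drain the common queue and apply a (trivial) transformation."""
    while True:
        item = work_queue.get()
        if item is SENTINEL:
            break
        results.append((item["source"], item["seq"] * 2))


results = []
consumer = threading.Thread(target=etl_worker, args=(results,))
consumer.start()

# Four parallel inbound processes, each delivering 50 messages.
producers = [
    threading.Thread(target=inbound_worker, args=(source_id, 50))
    for source_id in range(4)
]
for producer in producers:
    producer.start()
for producer in producers:
    producer.join()

work_queue.put(SENTINEL)  # all feeds are done, so let the ETL worker finish
consumer.join()

print(len(results))
```

Because receiving and transforming are separate functions that only share the queue, each side can be scaled or replaced without touching the other.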

Variety

The reality of Extraction, Transformation, and Load (ETL) comes into play when questioning the complexity and variation of the actual data. Is the data a simple, narrow collection of a few elements? Or is this a very layered, parent-child transactional dataset? How much transformation is needed? How many business rules are needed to make sense of the data and create an information set? When I get called in to address these types of engagements, they can take some time to optimize. This is especially true when the variety was not thoughtfully considered from the beginning. In most cases, the database teams tend to drift towards an easy approach rather than a more holistic approach. There is no right or wrong answer here, because the overall picture has to be considered.

Use creative, pragmatic approaches. Do you create functions? Do you use complex joins, indexes, or views? How are triggers being impacted? These are the types of questions that need to be addressed. One of the bigger questions you should ask is: how dynamic are the business rules? If the business rules change frequently, then you need an approach that lets the database control the rules, and NOT one that leaves developers in control of how the rules execute. If you do use the database to control the rules, then it will typically impact performance in a negative way. It is a balancing act. Do not forget to inquire about whether the data can be cleaned at the source.
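One way to picture database-controlled rules is to store each rule as a row and have the code do nothing more than evaluate whatever rows are currently active. The sketch below is a deliberately small illustration using SQLite. The `business_rules` table, its columns, and the rule names are hypothetical, and real rules are usually richer than a single field comparison.

```python
import operator
import sqlite3

# Comparison operators a rule row is allowed to reference.
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}

# Hypothetical rules table: the rules live in data, not in application code.
rules_db = sqlite3.connect(":memory:")
rules_db.execute(
    "CREATE TABLE business_rules ("
    "name TEXT, field TEXT, op TEXT, threshold REAL, active INTEGER)"
)
rules_db.executemany(
    "INSERT INTO business_rules VALUES (?, ?, ?, ?, ?)",
    [
        ("high_value", "amount", ">=", 1000.0, 1),
        ("bulk_order", "quantity", ">", 50.0, 1),
        ("retired_rule", "amount", "<", 0.0, 0),
    ],
)


def load_rules(connection):
    """Fetch the currently active rules; this lookup is the performance cost."""
    return connection.execute(
        "SELECT name, field, op, threshold FROM business_rules "
        "WHERE active = 1 ORDER BY name"
    ).fetchall()


def matching_rules(record, rules):
    """Return the names of all rules that the record satisfies."""
    return [
        name
        for name, field, op, threshold in rules
        if field in record and OPERATORS[op](record[field], threshold)
    ]


order = {"amount": 1500.0, "quantity": 10}
before_change = matching_rules(order, load_rules(rules_db))

# The business changes a rule with a data update; no code is redeployed.
rules_db.execute("UPDATE business_rules SET threshold = 5 WHERE name = 'bulk_order'")
after_change = matching_rules(order, load_rules(rules_db))

print(before_change)  # ['high_value']
print(after_change)   # ['bulk_order', 'high_value']
```

Reloading the rules on every evaluation is where the performance hit comes from. Caching them for a short period is one possible compromise, at the price of rule changes taking a little longer to apply.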

In Conclusion

The good news is that in many cases, a thoughtful design (respectful of the variety and understanding of the volume and velocity) can be achieved with much less effort than trying to deal with these issues once you are in production.

Need some help with your big data project? MID-RANGE offers a full range of managed IT services and infrastructure to meet your big data needs.

Big Data does not have to be scary, for big ideas are just a collection of smaller ideas.

Tim Lalonde

Tim Lalonde is the Director of Business Development at Mid-Range. He works with leading-edge companies to be more competitive and effective in their industries. He specializes in developing business roadmaps leveraging technology that create and support change from within, with a focus on business process re-engineering, architecture and design, business case development and problem-solving.

With over 30 years of experience in IT, Tim’s guiding principle remains simple: See a problem, fix a problem.
