Before you start planning for your big data requirements and asking for a big data budget, make sure that what you think is big data isn’t really just big problems. That’s my best advice, from a more enlightened perspective on how organizations are responding to processing problems in their larger information systems and data warehouse applications.
What is Big Data
Previously I tried to define what big data is. I mashed up the definitions from the big data players while leaving myself an out by including a catch-all definition of big data.
Today, I am wondering if I had it backwards. The catch-all definition was:
“Any data that is too large to process may be enough to consider it big data”
This interpretation of the big data definition may be what is actually behind so many of the big data initiatives, especially the ones that are experiencing big problems.
Big Data ODS or EDW
If you accept the catch-all definition of big data, then you might imagine that a 2TB ODS or EDW could be classified as big data. You might even imagine hitting a processing limitation at much less than 2TB.
For those of you in higher education, that would mean even a small or mid-sized institution might legitimately claim it has big data by building a data warehouse to support a learning analytics solution. Many might conclude they have big data simply by implementing the ODS or EDW solution for their SIS or ERP platform.
My experience with higher education ODS and EDW products is limited to the Ellucian Banner ODS EDW, so take it with a grain of salt when I say this: I understand why some institutions implementing an ODS or EDW feel like they have a big data problem. They hit processing limitations at even modest volume, velocity, and variety of data.
Big problems come in all shapes and sizes. They can be single-source problems or ones with multiple variables and factors. Problems are problems, and they are as varied as you might imagine.
- Sometimes an initiative comes up short in delivering the expected value.
- Sometimes an IT department is missing on service delivery expectations.
- Sometimes something is simply stuck and needs a firm nudge.
- Sometimes things are just broken, as in not working, and have to be fixed.
People, Process and Technology
In any one of these situations there is always a mix of factors that fit easily into the People, Process, and Technology categories.
If you are a more pragmatic person, you might look a little deeper to understand whether your technology factors are really people or process factors. You may even want to consider whether your process factors are people factors that are compounded by technology factors.
The point of taking that perspective is that someone made a decision about the technology and how to implement and support it. Processes help people make and carry out those decisions, and those processes depend on people too.
Root Cause of Big Data
So when you have a big data problem that stems from big problems in your technology stack or in your IT processes, consider what the root causes are, then address them, remembering that it may not be your data that’s the problem. Consider:
- You can have configuration errors at every level of the stack.
- You can have a resource capacity constraint.
- You can have bad code, bad SQL, and even bad process.
- And you can have bad processes that should have prevented or detected every one of these.
In all likelihood, you have a little bit of everything going on, making your ODS or EDW seem like a big data challenge.
I suppose I should go a little further and challenge you to shed the modern transformational leadership mumbo-jumbo and realize that sometimes you simply have to address the people issues that are creating or compounding the process and technology issues. I know this is easier said than done. But that doesn’t mean you get a pass on it.
In any event, because a big data problem that is really a big problem has so many factors, you will have to take a very disciplined approach: systematically assess the entire environment and remediate any weaknesses or issues. Be prepared for an ongoing, iterative process, given the dynamic nature of troubleshooting complex problems.
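If it helps to make "disciplined and systematic" concrete, here is a minimal Python sketch of an assessment checklist. It is only an illustration, not a prescribed tool: the category names are my own shorthand for the root causes listed earlier plus the people issues, and the `Assessment` and `Finding` classes are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field

# Assumed categories: the first four mirror the root-cause list in this post,
# and "people" reflects the people issues discussed alongside it.
CATEGORIES = ("configuration", "capacity", "code_and_sql", "process", "people")


@dataclass
class Finding:
    """One issue discovered while reviewing a category (hypothetical type)."""
    category: str
    description: str
    resolved: bool = False


@dataclass
class Assessment:
    """Tracks which categories were reviewed and what still needs fixing."""
    reviewed: set = field(default_factory=set)
    findings: list = field(default_factory=list)

    def review(self, category, issues=()):
        """Mark a category as reviewed and record any issues found in it."""
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        self.reviewed.add(category)
        for description in issues:
            self.findings.append(Finding(category, description))

    def remaining(self):
        """Categories not yet reviewed, in checklist order."""
        return [c for c in CATEGORIES if c not in self.reviewed]

    def open_findings(self):
        """Findings that still need remediation."""
        return [f for f in self.findings if not f.resolved]
```

Each pass through the environment calls `review()` for a category, and you keep iterating until `remaining()` and `open_findings()` are both empty. Because troubleshooting is iterative, a category can be reviewed again later and new findings added to it.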