Basics: a glimpse into Big Data Analytics
In the last instalment of our “BigData” buffet, we served a generous portion of definitions and generic challenges. We introduced the most important “V”s of BigData and stripped them down to the core, so you’d get a 360° perspective of data processing. We covered the Hows, tiptoed around some Whys, and there’s more coming about the Whats. Today we’ll focus a bit more on the “Why” angle of this data processing habit, in an attempt to outline the output of your efforts. More specifically, we’ll stop at BigData Analytics: its essence, purpose and variety, and how it fuels predictive business decisions.
Connecting the dots
As we’ve already mentioned, BigData isn’t one big thing; there is simply a ton of it. As a concept, it’s the combination of structured, semi-structured and unstructured data that organisations collect for their projects, predictive modelling and other analytics uses.
BigData can stream from business transaction systems, customer databases, clickstreams, logs, social networks, medical records, and many other sources. There’s no numerical standard to define “big”, but BigData is historically defined by three “V”s: Volume, typically referring to terabytes, petabytes and exabytes; Variety of data types; and the Velocity at which the data is generated, collected and processed. IBM introduced new “V”s that outline the Veracity (or trustworthiness) of data, the Variability of ways businesses can use and format data, and the business Value that analytics deliver. Some people include even more “V”s, but that’s rather off-point today.
Businesses use BigData for all manner of analysis. When used correctly, it helps companies improve operations, enhance customer service, create personalised marketing campaigns, and increase profitability. BigData helps medical researchers identify disease risk factors and diagnose illnesses, enables energy companies to better identify drilling and oil locations, and aids governments with emergency response and crime prevention.
Looking past the opportunities, BigData presents challenges like cost, storage capacity, data security, availability, quality assurance, compliance issues, and the skills required to create, deploy and manage big data systems. We’ll take a closer look at those in some of our future buffet sessions. For now, let’s focus on the juicy bits that power efficient and predictive decision making: introducing BigData Analytics.
Defining BigData Analytics
To define this one, we’ll have to take a few steps back to the definitions we covered a while back on our blog. In essence, this field is like your liver trying to make sense of any alcohol you ingest. The harder the liquor, the more complex the analytical processes.
In this case, “the liver” is tasked with making sense of all those streams of data and representing them in a business-friendly way, outlining a cause-effect pattern that powers predictive results. If you have a client spending more than 4 minutes on your product page (on average), you can predict an easy onboarding path. If you’re a healthcare worker and the average body temperature on the floor is above 37° Celsius, you can assume there’s an infection going on and isolation measures could be required. If you’re a stockbroker and one of your paired assets starts losing value, it may be time to sell.
What we’re trying to say here is that BDA consists of complex procedures for examining data streams to identify valuable business input: micro-strategy metrics that power your customer retention, hidden patterns that point to a likely financial loss, or specific market trends that influence audience behaviour. Put briefly, BDA is about organising structured data, unstructured data, or data lacking a meta-model into a collection of metrics that can help any organisation run a predictive environment and make informed business decisions.
Big data analytics is a form of advanced analytics, which involves complex applications with elements such as predictive models, statistical algorithms and what-if analysis powered by analytics systems.
In essence, this “generally accepted” definition encapsulates the entire “why” of this BigData processing adventure. Without it, there would be no purpose in eating up all that virtual RAM and storage within those fluffy public clouds.
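To make the cause-effect idea a bit more tangible, here is a minimal Python sketch of the first two rules above. The 4-minute and 37° Celsius thresholds come from the examples; the function names and the sample readings are hypothetical and made up purely for illustration.

```python
from statistics import mean

# Thresholds taken from the examples above; everything else is illustrative.
ENGAGED_SECONDS = 4 * 60   # more than 4 minutes on the product page
FEVER_CELSIUS = 37.0       # average temperature on the floor above 37 °C


def predicts_easy_onboarding(seconds_on_page):
    """Return True when the average time on the page exceeds 4 minutes."""
    return mean(seconds_on_page) > ENGAGED_SECONDS


def suggests_infection(temperatures_celsius):
    """Return True when the average body temperature is above 37 °C."""
    return mean(temperatures_celsius) > FEVER_CELSIUS


# Hypothetical sample readings, invented for this sketch.
visits = [310, 200, 275]           # seconds spent per visit
floor_temps = [36.6, 37.4, 37.9]   # degrees Celsius

print(predicts_easy_onboarding(visits))
print(suggests_infection(floor_temps))
```

Real analytics would of course rely on far richer models than a single average, but the shape is the same: raw readings go in, a business-friendly signal comes out.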
BigData Analytics growth and evolution
If we look at BDA over the past few decades, we’ll find that most of the Business Intelligence (BI) approaches were somewhat traditional, focusing on highly organised data kept in multiple warehouses across Business Units. Just to prove this has been around since the early ’80s, try to remember: what was Chandler Bing’s job? Such titles were somewhat novel, so it is no wonder that none of his “Friends” (from the sitcom of the same name) could remember what his job was. Chandler was a Statistical Analysis and Data Reconfiguration Officer, and at some point he cared dearly about his WENUS (Weekly Estimated Net Usage Systems). Despite his concern, back then BDA was defined by highly integrated systems, designed and optimised for handling big data queries in a very efficient manner, hence the ease of getting the right analytics within a reasonable timeframe.
Everything started to change around the 2000s, when all of us began to identify as Netizens (citizens of the internet), transforming data processing as well. Analytics spread across multiple warehouses with different meta-models, data variety grew at a dizzying pace, and our Instant Messaging got populated with emoticons that brought ambiguity to every piece of information.
Previous software approaches were simply not enough, and a new way of processing emerged. That’s when Apache Hadoop (a collection of open-source software utilities) came to the rescue. In addition to its “integrated systems” approach, this BDA kit brought more openness to the table.
More openness in terms of the type of data that it could handle, the emergence of new data types and habits such as Bring Your Own Data (BYOD). More openness in terms of generating various derived types of analytics, delivering analytics libraries and languages that can be easily supported, learned and distributed across teams. More openness and flexibility in terms of hardware, enriching deployment options by letting you bring your own custom gear or even use heterogeneous configurations, setting everyone free to use whatever fits them best.
Fast-forward to our time and observe yet another metamorphosis with regard to BigData Analytics. Welcome to the Cloud era! Don’t set your hopes too high: there’s nothing fluffy about this, though you might have to deal with some “fuzzy” streams here.
Naturally, Hadoop came to be (partially or sometimes entirely) replaced by another trend: BDA powered by cloud-enabled, serverless Functions as a Service. If we think about it for a while, we can align this trend with our own behaviour.
Our consumption habits are now focused on a sharing economy. We use rideshares instead of car rentals, we book an Airbnb for an overnight trip instead of calling in for hotel reservations, and we use UberEats instead of ordering directly from a restaurant. This consumer pattern is now applied to IT, and in fact, “serverless” might as well be the UberEats of tech’s sharing economy, powered by the cloud. Now, if you’re a regular reader of our blog you might have heard of serverless and Functions as a Service (FaaS). Just to recap, serverless and FaaS are two different things, so make sure to avoid mixing them up.
In essence, FaaS is quite similar to the Imperius Curse for the cloud (yes, a Harry Potter reference, but hey, it’s the perfect analogy). As a business unit, you’re telling the cloud: “Bro! I have my beautiful code and I need to use it. This is my business logic, please run it for me, and maybe run it for me <this many> times”. All of this, without having to care about managing dedicated systems, dedicated hardware, or even dedicated software. FaaS hides the fact that there are servers behind this magic, hence the confusion with “serverless”.
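As a rough illustration of that conversation with the cloud, here is a small Python sketch. The handler(event, context) shape is a common FaaS convention rather than any specific provider’s API, and fake_platform_invoke is a hypothetical stand-in for the platform that would normally run your code for you.

```python
def handler(event, context=None):
    """Business logic only: total up the order amounts found in the event."""
    amounts = event.get("amounts", [])
    return {"count": len(amounts), "total": sum(amounts)}


def fake_platform_invoke(function, events):
    """Hypothetical stand-in for the cloud: run the function once per event."""
    return [function(event) for event in events]


# "Please run it for me, and maybe run it for me <this many> times."
results = fake_platform_invoke(
    handler,
    [{"amounts": [10, 20]}, {"amounts": [5]}],
)
print(results)
```

Notice that the handler knows nothing about machines, scaling or scheduling. In a real deployment, everything inside fake_platform_invoke would be the provider’s problem, not yours.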
Truth be told, “serverless” is more than FaaS, especially when we’re talking BDA. Sooner or later, you’ll hear about “Object Storage”, a term related to keeping your data safe, sound and immutable within “the cloud”. In essence, this is a cloud-hosted “hard drive”; only here you won’t have to deal with partitioning, disk volumes, file system types, etc. You just BYOD and let the “serverless” cloud figure out how to store it, how to distribute it, how to make it highly available and so on. The details here are highly abstracted: you just have a REST API designed for data upload and download, and you can bring anything from kilobyte-sized batches up to terabytes of data. For the “serverless” architecture the process is the same, since the system itself scales according to your needs.
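To show what “upload and download, nothing else” feels like from the caller’s side, here is a toy, in-memory Python sketch. ToyObjectStore, its put and get methods, and the bucket and key names are all hypothetical; a real object storage service would expose similar operations over its REST API and keep the data in the cloud instead of a dictionary.

```python
class ToyObjectStore:
    """In-memory stand-in for an object storage service (not a real API)."""

    def __init__(self):
        self._objects = {}

    def put(self, bucket, key, data):
        # Objects are written whole; keep an immutable copy of the bytes.
        self._objects[(bucket, key)] = bytes(data)

    def get(self, bucket, key):
        # Download returns the object exactly as it was uploaded.
        return self._objects[(bucket, key)]


store = ToyObjectStore()
store.put("logs", "small.txt", b"a few bytes")
store.put("logs", "large.bin", bytes(1024 * 1024))  # 1 MiB, same call

print(store.get("logs", "small.txt"))
print(len(store.get("logs", "large.bin")))
```

The point of the sketch is the interface: the caller uses the same two calls whether the object is a few bytes or much larger, with no partitions, volumes or file systems in sight.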
Another thing that makes “serverless” similar to UberEats is its “Pay As You Go” consumption model. You use it as you go, hence you pay as you go. This means you’re just paying for the gigabytes that you’re storing at this point, right now. And if you store less, you will be paying less, in a very elastic, completely seamless way. If you have your data-centricity ducks in a row, chances are you’d be better off with a serverless model, cutting down on operational and hardware maintenance costs.
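The arithmetic behind “store less, pay less” is simple enough to sketch in Python. The price below is a hypothetical figure chosen only for illustration, not a quote from any provider, and real bills usually include other items such as requests and data transfer.

```python
# Hypothetical price per gigabyte per month, assumed for this sketch only.
PRICE_PER_GB_MONTH = 0.02


def monthly_storage_cost(gigabytes_stored, price_per_gb_month=PRICE_PER_GB_MONTH):
    """Pay-as-you-go: the bill is proportional to what is stored right now."""
    return gigabytes_stored * price_per_gb_month


# Storing 500 GB this month, then trimming down to 200 GB next month.
print(monthly_storage_cost(500))
print(monthly_storage_cost(200))
```

There is no fixed hardware cost in this model, so the bill follows your usage down as well as up.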
What you need to get out of this is that BDA is ever-evolving. Just think about it: two decades ago, you couldn’t run a data-centric environment without specific hardware and talent. Ten years later, hardware became less of a problem, shifting the challenges towards your business and system administration resources. Today, both of those challenges are virtualised via Cloud, Serverless and FaaS approaches, leaving you with fewer challenges on the talent side and a heavier emphasis on building efficient meta-models from your business perspective.
90% of the data available today has been created over the last 2 years. We find ourselves on the verge of a BigData era, and BDA might be the key to finding the right answers at the right time. The immense variety of patterns, veracity and complexity of these analytics may converge into a wider picture of our world, giving us the possibility to prevent disasters and mitigate risks at the business level, and maybe drive a sustainability trend at the global level.
We’ve only scratched the surface of BDA, just to give you a basic outlook on this term, and of course there’s more coming up about this. We’re getting close to the end of our Basics series, and from there on, our engineers will guide you through “the IT Chronicles of Narnia”, delivering insights and advice on front-end, back-end, API, mobile and data-centricity approaches. We’ll be off for a couple of weeks, so until our next blog post, make sure to explore our entire Basics chain and maybe drop us a comment or two.