
Go Jump in a Data Lake

You’ve all heard the expression before. “Go jump in a lake.” It means the same thing as “Get outta town” or “I don’t believe you.” These days, many organizations are jumping into a lake – a data lake – a relatively new feature of the data topography of many companies. It’s all in the data.

Bob Seiner, Pittsburgh's Data Guy, KIK Consulting

What is a data lake? Some sources describe it as a place that holds a vast amount of raw data, in its native format, until it is needed. When data lakes are not managed appropriately, organizations have come to call them “data swamps” (or even “data cesspools”). Your organization may, in fact, be setting sail onto its first data lake.

But that’s not what this column is about. I want to talk to you about “data lakes” being the latest example of the slang used in the data management industry.

Data people are unique. Believe me, I know this firsthand. People in my industry come up with awkward names for different aspects of data or managing data. To the data people, these names seem totally logical. To non-data people, these terms just muddy the water.

In this column, I want to share with you some of the slang-iest data terms in use today, along with definitions that will help you understand what the heck your data people are talking about. Here is a brief list of the most-searched data terms and a simple definition of what each means in most situations:

Meta Data is a term that has been newsworthy in recent times. Meta data (or metadata, as it is more commonly written) is data about the data. In the news cycle, that meant information about the Snowden phone calls rather than the conversations themselves. Metadata is data documentation, so to speak, that defines such things as what the data is called, a business description of the data, where the data has come from and the valid values that the data can take on. Metadata is the backbone of successful data management because it improves the value and understanding of the data, resulting in better usage of the data.
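For readers who think in code, here is a minimal sketch of what one piece of metadata might look like. It records the four things listed above: the name, the business description, the source and the valid values. The class name `ColumnMetadata`, the `order_status` field and its values are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMetadata:
    """Data about one piece of data (hypothetical structure)."""
    name: str                   # what the data is called
    business_description: str   # what it means to the business
    source: str                 # where the data has come from
    valid_values: frozenset     # the values the data can take on

    def is_valid(self, value) -> bool:
        # A value is acceptable only if the metadata lists it.
        return value in self.valid_values


# Hypothetical metadata record for an order status field.
order_status = ColumnMetadata(
    name="order_status",
    business_description="Current stage of a customer order",
    source="orders system (assumed for this example)",
    valid_values=frozenset({"NEW", "SHIPPED", "CANCELLED"}),
)

print(order_status.is_valid("SHIPPED"))
print(order_status.is_valid("LOST"))
```

The point is that the metadata says nothing about any particular order. It only describes what the data is and what it is allowed to be.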

Big Data is just what you might expect. Big data is the name given to sets of data that are so large and complex that the traditional tools used to process data for applications and software packages are inadequate to deal with them. Big Data is often considered in terms of several Vs – volume, velocity, variety and veracity.

Small Data needs explanation. Small data is a tiny (by comparison) and finely-tuned set of data that is used to serve a specific purpose for a selected audience. Small data is a term that has only recently made it into the data management industry. Small data will become a more interesting topic once organizations begin showing value from their Big Data initiatives.

Smart Data is less straightforward. Smart data is data that is formatted so it can be acted upon both where it is collected and then downstream in an analytical platform. In the analytical platform, further data consolidation and analytics take place. What makes this data smart is the advanced thought and design that is put into how the data will be immediately fit-for-purpose.

Data Warehouse is the grand-daddy of all data terms. A data warehouse is not a building where the data is stored. A data warehouse is a data resource, or a system designed specifically for reporting and data analysis. The data warehouse is considered a core component of a business intelligence strategy. Metadata about the data in the warehouse is another core component, needed to deliver a successful return on investments in data warehouses.

Data Lakes is also a relatively new term. As I wrote earlier, a data lake is another place to keep your data, one that holds a vast amount of raw data, in its native format, until it is needed. Data lakes are initially undocumented (little metadata) until the time comes when the data must be made digestible to the business and analytical communities.
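As a rough illustration of "raw data, in its native format, until it is needed," the toy sketch below stores each payload untouched and only interprets it at read time. Data people sometimes call this approach schema-on-read, which is my label and not a term from the definition above. The `land` and `read` helper names and the sample device readings are made up for this example.

```python
import csv
import io
import json

# A toy "lake": raw payloads kept as-is, tagged only with their native format.
lake = []


def land(raw: str, fmt: str) -> None:
    """Store the payload untouched; nothing is parsed or documented on write."""
    lake.append({"format": fmt, "raw": raw})


def read(entry: dict) -> list:
    """Interpret a payload only at the moment someone needs it."""
    if entry["format"] == "json":
        return [json.loads(entry["raw"])]
    if entry["format"] == "csv":
        return list(csv.DictReader(io.StringIO(entry["raw"])))
    raise ValueError(f"unknown format: {entry['format']}")


# Hypothetical payloads arriving in two different native formats.
land('{"device": "thermostat", "temp_c": 21.5}', "json")
land("device,temp_c\nfridge,4.0\n", "csv")

rows = [row for entry in lake for row in read(entry)]
print(rows)
```

Notice that the CSV reading comes back as text while the JSON reading comes back as a number. That kind of inconsistency is exactly what has to be cleaned up and documented before the lake becomes digestible.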

And finally, …

The Internet of Things is a term that is popping up more and more these days. The Internet of Things (or IoT for short) is the network of physical devices, vehicles, home appliances and other items embedded with electronics, software, sensors and network connectivity that enable these objects to connect and exchange data. You can think of the IoT as D2D (Device to Device) data exchange, in the same way that B2B is focused on business-to-business exchange.

There are new words, terms and phrases being added to the data landscape year after year. Hopefully this quick fix of new and old data terms will help you to understand the data people’s lingo and get you started with asking the questions about how these “data things” are related and can add value for you and your organization. As I have told you before … It’s all in the data.