From NSA surveillance and data collection, to grocery store reward cards and what you put out with the trash, everything is a story waiting to be interpreted. New ways of collecting and interpreting data are changing business practices, providing insights into consumers and allowing data scientists to answer questions they never knew to ask. This emerging science – or art – is big data.

Information is constantly being produced through a variety of channels, from bank transactions and market reports to orders and invoicing, surveys, email, social media, news, and weather or traffic reports. Simple website clicks have given way to smart thermostats and scales, even blood-sugar monitoring systems that all report streams of data to the cloud. Approximately 100 hours of video are uploaded to YouTube every minute.

By 2011, 2.3 trillion megabytes of data was created every day, according to an IBM study. That amount of data would fill the hard drives of more than 4 million of today’s run-of-the-mill home or office computers. Talk about big data!
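As a rough check on that comparison, the arithmetic can be sketched as follows. The two input figures come from the paragraph above; the decimal unit conversion (1 GB = 1,000 MB) and the even split across computers are assumptions made for illustration.

```python
# Figures quoted in the article.
DAILY_DATA_MB = 2.3e12     # 2.3 trillion megabytes created per day
COMPUTERS = 4_000_000      # "more than 4 million" home or office computers

# Assumption: decimal units, so 1 GB = 1,000 MB.
MB_PER_GB = 1_000


def implied_drive_size_gb(total_mb: float, computers: int) -> float:
    """Return the drive size in GB each computer needs to hold an equal share."""
    return total_mb / computers / MB_PER_GB


drive_gb = implied_drive_size_gb(DAILY_DATA_MB, COMPUTERS)
print(f"Implied drive size: {drive_gb:.0f} GB per computer")
```

Under those assumptions the comparison implies a hard drive of roughly 575 GB per computer.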

St. Thomas faculty and research centers are integrating big data into curriculum and scholarly work.

Big data “is an emerging science,” said Richard Rexeisen, Ph.D., professor of marketing. As such, it is difficult to define. “You have to think about data in its most abstract form.”

“There is no universal definition of big data,” said John Olson, Ph.D., professor and chair of the Operations and Supply Chain Management Department. “There are connections and relationships that we can discover with large volumes of data that cannot be found with small volumes of data.”

Daniel McLaughlin, M.H.A., director of the Center for Health and Medical Affairs, explained that big data has three important attributes:

  • Volume: It consists of very large data sets.
  • Velocity: It is being produced at a tremendous speed.
  • Variety: It contains data from many sources, both structured and unstructured.

This much information requires new ways to manage, interpret and act upon it. Given its ability to affect commerce and operations across a business, big data is a business priority. The tools and analyses possible with big data can connect the dots in ways never considered before. “In addition to providing solutions to long-standing business challenges,” the IBM report notes, “big data inspires new ways to transform processes, organizations, entire industries and even society itself.”

“We’ve gone beyond the rudimentary observations, or the classic ways of doing a survey and asking people questions,” Rexeisen said. “We’re now in the realm of inferential mathematics.”

What does that mean? In traditional statistics we look for clues on causation, Olson explained. “A leads to B. In big data we look for universal correlations. A and B are related, but we just may not know how.” The correlations are at such a high level that they resemble the truth. “From a data science perspective, we try to move from absolute precision to general trends that explain large amounts of behavior.”
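To make the idea of "A and B are related, but we may not know how" concrete, here is a minimal sketch of one common measure of association, the Pearson correlation coefficient. The article does not say which measures the researchers use, so this choice is an assumption, and the two data series are made-up numbers for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation: +1 or -1 means perfectly related, 0 means no linear relation."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a series never varies")
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical data: behavior A and behavior B observed for five customers.
behavior_a = [1, 2, 3, 4, 5]
behavior_b = [2, 4, 5, 4, 5]

r = pearson(behavior_a, behavior_b)
print(f"correlation between A and B: {r:.2f}")
```

A value near 1 says the two behaviors rise together. It does not say whether A causes B, B causes A, or something else drives both, which is the distinction Olson draws.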

“The newest, hottest profession is data scientist,” McLaughlin said. “It’s just a new name for business analyst, and we’re teaching this skill.”

Betting on Big Data

The gaming and casino industry has been a leader in using big data. The Venetian Casino and Hotel in Las Vegas uses big data related to its customer relationship management system and guest preferences. “By measures of association they are able to understand trends in the data,” Olson said. “They never know exactly why a guest spends A, B or C, just that there are behaviors at certain times of year with certain groups of people.”

Harrah’s Casinos looked through its data and discovered a pain point for a group of customers: If women 21 to 34 years old lost more than $700, they never came back, McLaughlin explained. The casino’s response is to monitor each gambler and when that demographic hits $600 in losses, McLaughlin said, “They send over a ‘luck ambassador’ who says, ‘Hey, Sally, we see you’re having a bad day. Why don’t you have dinner on us?’ This helped them retain a customer they would have lost.”
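The rule McLaughlin describes can be sketched as a simple threshold check. The age range and dollar amounts come from the paragraph above; the `Gambler` record, its field names and the function name are hypothetical, and nothing here reflects how the casino's actual system is built.

```python
from dataclasses import dataclass

# Figures from the article.
AT_RISK_GENDER = "F"
AT_RISK_AGES = range(21, 35)   # ages 21 through 34
WALKAWAY_LOSS = 700            # past this loss, the group never came back
INTERVENTION_LOSS = 600        # the casino steps in at this level


@dataclass
class Gambler:
    """Hypothetical record for one guest's current visit."""
    name: str
    gender: str
    age: int
    losses_today: float


def needs_luck_ambassador(guest: Gambler) -> bool:
    """Flag guests in the at-risk group once losses reach the intervention level."""
    in_group = guest.gender == AT_RISK_GENDER and guest.age in AT_RISK_AGES
    return in_group and guest.losses_today >= INTERVENTION_LOSS


sally = Gambler(name="Sally", gender="F", age=28, losses_today=620)
if needs_luck_ambassador(sally):
    print(f"Offer {sally.name} dinner before losses reach ${WALKAWAY_LOSS}")
```

Setting the trigger $100 below the walk-away point gives staff time to act before the customer is lost.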

“The art of data mining is to find that little cluster – those people who have the same behaviors,” McLaughlin said. On the Internet, Google and Amazon have complete models of us. They know exactly what we like and what we don’t like. It is really knowledge mining: figuring out, when you look at the data, whether it makes sense or not.

Amazon.com uses big data in recommending products. Based on each shopper’s browsing and buying habits, Amazon can, “with a high degree of probability, make predictions about other products and services you might want to take a look at,” Rexeisen said.

Amazon compares each consumer’s activity on its site with that of other customers, and uses that comparison to recommend other items of interest. “Recommendations change regularly based on a number of factors, including when you purchase, rate or like a new item, as well as changes in the interests of other customers like you,” Rexeisen said.
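The general idea of comparing one customer with others to suggest items can be sketched with a simple similarity score. This is not Amazon's method, which the article does not describe. It is a minimal illustration that assumes purchase histories are plain sets of item names and measures overlap with Jaccard similarity. All customer and product names are invented.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of items two customers have in common, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target: str, purchases: Dict[str, Set[str]], top_n: int = 3) -> List[str]:
    """Suggest items bought by customers whose histories resemble the target's."""
    mine = purchases[target]
    scores: Dict[str, float] = {}
    for other, items in purchases.items():
        if other == target:
            continue
        similarity = jaccard(mine, items)
        # Credit each item the target lacks with the other customer's similarity.
        for item in items - mine:
            scores[item] = scores.get(item, 0.0) + similarity
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, score in ranked if score > 0][:top_n]


# Invented purchase histories.
purchases = {
    "ana": {"tent", "stove", "lantern"},
    "ben": {"tent", "stove", "sleeping bag"},
    "cy": {"novel", "bookmark"},
}

print(recommend("ana", purchases))
```

Here "ana" shares two items with "ben" and none with "cy", so only ben's remaining purchase is suggested. As customers buy new items, the scores shift, which matches the description of recommendations changing with the activity of similar customers.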

Disease and Pain Management

McLaughlin has been researching – and using – big data in the health care arena. He said health care doesn’t have the types of customer data of Internet giants and casinos – yet. “We’ve got a medical record, but we don’t have the behavior down like Google does,” he said. “If you’re a health care provider, do you want to get that information? Is it ethical to get that information?”

McLaughlin worked with the St. Cloud Hospital on a project to better assist patients’ pain management. Traditionally, patients with pain would bounce around the system getting unsatisfactory care and spending a lot of time and money. The hospital built a model to improve care and develop specialized pain clinics. Its next step was to mine its own data and to calibrate its model, McLaughlin explained. It found $2 million in annual savings.

Allina Health looked to its data to see if it could better manage chronic disease. Using just its medical record information, McLaughlin said, an analysis helped to show three groups of clients: people who are generally healthy, people who are sick and being seen regularly, and those who don’t know they’re sick. “With something like 80 percent accuracy they could predict whether they would be admitted or in the ER,” he said.

Big data is just one of the tools for medical research, McLaughlin said. The ethics and privacy practices already in place in medicine carry through.

He wondered about combining medical data with marketing data on consumer behavior to improve compliance – doing what the doctor says. “Could you get better behavior out of your patients? Hence, better health for them, and lower costs for providers. But does that cross an ethical line?” he asked.

Ethical Use of Data

Big data will afford our society many tremendous benefits, Rexeisen said, “[but] I’m not sure we’ve slowed down enough to think about all of the ethical questions that have been raised. We’ve not yet contemplated the domain of questions we must ask ourselves about this discipline.”

Your data says a lot about you, including your age, gender, marital status, income level, health, hobbies, habits and plenty of other things that you likely consider private. “I’m beginning to increasingly include this as a conversational element in my consumer behavior class,” Rexeisen said. “I don’t think any of us know what data is truly private, and with these tools, I don’t need much more than your public data to infer much of the private information.”

He believes that St. Thomas is well positioned to take up the cause of raising these questions for conversation as well as beginning to address the issues that arise. “We have a mandate of our mission to address big data within this ethical framework. What are the ethical questions that we need to begin asking ourselves to get ahead of a rapidly emerging, highly valuable technology?”