by Jānis Gulbis
Data Mining – A Threat, Evolution, or Opportunity?
Our world continues to move into the digital era. With the proliferation of devices capable of collecting data, we are experiencing a data explosion. Every Internet-connected device, from a smartphone to a cash register, creates data.
Data scientists and data miners are highly valued by companies around the world. According to LinkedIn, data mining was one of the hottest jobs in 2014, and it continues to be one of the highest-paying jobs.
Back To Small Data
But let us get down to earth. I believe that it is Small Data that helps entrepreneurs, business owners, and managers make data-driven decisions.
Small data is data in a volume and format that makes it accessible, informative and actionable, connecting people with timely and meaningful insights for everyday tasks.
Before stepping into Big Data, businesses and entrepreneurs have to learn how to use Small Data first. It is the primary skill required to navigate this increasingly complex world.
In this article, I will review several interesting cases of data analysis. Whether you think data mining is a good thing or a bad one, these cases might change your perspective:
- How data analysis changed the whole baseball industry;
- How data can help us control the spread of malaria or other epidemic diseases;
- How phone companies are tracking your every step; and
- How someone can tell that you're pregnant before you know it.
The Beginning of Data Mining
While it may sound overwhelming, data mining is not a new concept. The practice of collecting data predates the computer. Industries and government institutions have been collecting data for centuries.
In 1763, a probability theorem by Thomas Bayes, now called Bayes’ theorem, was published posthumously. It remains a fundamental concept in data mining and probability theory even today.
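To make the theorem concrete, here is a minimal Python sketch of how it updates a belief when new evidence arrives. The function name and all the numbers are assumptions chosen for illustration; they do not come from any real data set.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) using Bayes' theorem.

    prior               -- P(H), belief in the hypothesis before seeing evidence
    likelihood          -- P(E | H), chance of the evidence if H is true
    false_positive_rate -- P(E | not H), chance of the evidence if H is false
    """
    # Total probability of seeing the evidence at all.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% of customers belong to some group, a signal
# fires for 90% of that group and for 5% of everyone else.
p = posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
print(f"{p:.3f}")  # prints 0.154
```

Even with a signal that looks reliable, the posterior stays near 15 percent because the group is rare to begin with. That is the kind of correction the theorem provides.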
Data mining is used in almost every industry:
- Science: space and ocean research, statistical probability;
- Medicine: clinical trials or genome sequencing;
- Business: credit card transactions, purchasing behavior, stock market movements;
- Government: national security, tax collection, and fiscal policy.
Those are just a few of the data mining applications.
Data has changed industries, and still changes them. Those who master data analytics define the new rules.
Evolution of Baseball – The Moneyball Principle
Moneyball is the story of a United States baseball team, the Oakland Athletics. It shows that a statistical approach can help assemble a winning team.
In professional baseball, there are large-market and small-market organizations. The New York Yankees (large-market) and the Oakland Athletics (small-market) are good examples. Both have to make important decisions based on their economic status.
Before the success of the Oakland Athletics, scouts did not pay particular attention to statistics. They traveled all over the country to evaluate players based on five tools: speed, quickness, arm strength, hitting ability, and mental toughness.
This general theory is now considered the “old” scouting theory.
The New Approach To Baseball Scouting
The Moneyball theory placed no emphasis on the body of the athlete. It did not matter what physical tools the athlete possessed. The theory was based mainly on on-base percentage, a measure of how often a batter reaches base.
From 1999 to 2003, on-base percentage was a significant predictor of wins, but it wasn’t a significant predictor of individual player salaries. Good on-base players were undervalued. At the same time, sluggers, players who consistently hit home runs and doubles, were overvalued.
The Oakland Athletics used this knowledge to gain an advantage. They sold overvalued assets and purchased undervalued ones. This eventually brought them to the playoffs in 2002 and 2003, with only about one-third of the payroll of the highest-spending teams.
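On-base percentage is simple arithmetic: times on base divided by the plate appearances that count toward it. The Python sketch below computes it with the standard formula and then compares two players by on-base percentage per million dollars of salary. The players, their statistics, their salaries, and the per-dollar comparison are all made up for illustration; this is not the method the Athletics actually used.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    hits: int
    walks: int
    hit_by_pitch: int
    at_bats: int
    sacrifice_flies: int
    salary_millions: float  # hypothetical salary figure


def on_base_percentage(p):
    """Standard OBP: (H + BB + HBP) / (AB + BB + HBP + SF)."""
    times_on_base = p.hits + p.walks + p.hit_by_pitch
    opportunities = p.at_bats + p.walks + p.hit_by_pitch + p.sacrifice_flies
    return times_on_base / opportunities if opportunities else 0.0


def obp_per_million(p):
    """Illustrative value measure: how much OBP each salary dollar buys."""
    return on_base_percentage(p) / p.salary_millions


# Two invented players: a patient hitter on a small contract and a
# slugger on a large one.
players = [
    Player("Patient Hitter", 150, 90, 5, 500, 5, salary_millions=1.0),
    Player("Slugger", 160, 30, 2, 550, 8, salary_millions=8.0),
]

# Rank from most to least OBP per salary dollar.
for p in sorted(players, key=obp_per_million, reverse=True):
    print(f"{p.name}: OBP {on_base_percentage(p):.3f}, "
          f"OBP per $1M {obp_per_million(p):.3f}")
```

In this invented example the patient hitter reaches base more often and costs far less, which is the kind of gap the Moneyball approach looked for.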
Spread and Control of Epidemic Diseases
Research findings, published in October 2012, revealed how human travel patterns contribute to the spread of malaria.
Malaria is a mosquito-borne disease caused by a parasite. This disease kills about 1 million people each year; ninety percent of them are children under the age of five. Malaria is a threat to over three billion people globally.
Researchers from the Harvard School of Public Health analyzed cell phone data from 15 million people in Kenya. They combined this data with the regional incidence of malaria.
Research showed that malaria, in large part, emanates from Kenya’s Lake Victoria region and spreads east, primarily toward Nairobi, the capital of Kenya.
By using this data, researchers can now build a map of malaria parasite movements between “source” and “sink” areas. This information can help public health officials decide how to control this disease.
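The idea of "source" and "sink" areas can be sketched in a few lines of Python. The sketch assumes we already have, for each pair of regions, an estimated number of travelers and the share of travelers from the origin who carry the parasite. The region names and every number are invented, and the real study used far more detailed models than this net-flow calculation.

```python
from collections import defaultdict


def net_parasite_flow(movements):
    """Return the estimated net parasite export for each region.

    movements -- iterable of (origin, destination, travelers, prevalence),
                 where prevalence is the assumed infected share of
                 travelers leaving the origin.
    """
    net = defaultdict(float)
    for origin, destination, travelers, prevalence in movements:
        carried = travelers * prevalence  # estimated infected travelers
        net[origin] += carried            # exported from the origin
        net[destination] -= carried       # imported by the destination
    return dict(net)


def classify(net):
    """Label regions that export more than they import as sources."""
    return {
        region: "source" if value > 0 else "sink" if value < 0 else "neutral"
        for region, value in net.items()
    }


# Invented movement estimates between three hypothetical regions.
movements = [
    ("region_a", "region_b", 10_000, 0.30),
    ("region_b", "region_a", 8_000, 0.02),
    ("region_c", "region_b", 2_000, 0.10),
]

print(classify(net_parasite_flow(movements)))
```

A region that sends out more infected travelers than it receives is labeled a source, and a region that receives more than it sends is labeled a sink.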
Privacy Concerns – The Other End of The Stick
The collection of data is no longer limited to the information we put on the Internet. We have become more exposed and vulnerable as our personal information becomes more readily accessible. Even the simple act of purchasing an item from a store leaves a personal behavior pattern, and such patterns can reveal information that one might not want to reveal.
That raises a concern. Is data mining a threat to our privacy? Or is it just an evolution to which we all have to adapt?
Congratulations! You Will Have a Baby…
Target, the U.S. retailer, analyzed a group of 25 key products. When purchased together, these products could predict that a woman was likely pregnant. Based on these predictions, Target sent out relevant coupons.
In one widely reported case, a teenage girl started to receive coupons for baby clothes, cribs, and other maternity-related products. When the girl’s father saw these coupons, he got angry. He went straight to the manager of a Target store to express his frustration in person. But after the father talked things over with his daughter, it turned out that Target was right. The girl was indeed pregnant.
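A prediction built on a basket of key products can be imagined as a simple score. The Python sketch below adds up a weight for each key product found in a customer's purchases and flags the customer once the total passes a threshold. The product names, weights, and threshold are hypothetical, and Target's actual model is not public in this form.

```python
# Hypothetical key products and weights; the real list of 25 products
# and their importance are not given in this article.
PRODUCT_WEIGHTS = {
    "product_a": 30,
    "product_b": 25,
    "product_c": 20,
    "product_d": 10,
}

# Assumed cut-off above which coupons would be sent.
THRESHOLD = 50


def prediction_score(purchases):
    """Sum the weights of the key products present in the purchases."""
    # set() makes sure a repeated product counts only once.
    return sum(PRODUCT_WEIGHTS.get(item, 0) for item in set(purchases))


def should_send_coupons(purchases):
    """Flag the customer when the score reaches the threshold."""
    return prediction_score(purchases) >= THRESHOLD


basket = ["product_a", "product_b", "bread"]
print(prediction_score(basket), should_send_coupons(basket))
```

No single product gives the customer away in this sketch. The combination does, which is the point of analyzing the products together.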
You Are Being Tracked! Anywhere You Go
Even if you don't share any information on the Internet, if you have a mobile phone in your pocket, you're being tracked.
In 2006, the European Union adopted the Data Retention Directive (2006/24/EC). It required phone companies and Internet service providers to collect and keep a wide range of information about their customers for between six months and two years.
A German citizen named Malte Spitz requested this data from his phone company. After many unanswered requests and a lawsuit, Spitz eventually received his data: 35,830 lines of records that gave a detailed, nearly minute-by-minute account of half a year of his life.
With the help of data visualization tools, Spitz was able to create a visualization of those six months. Geographical location, calls made and received, number of SMS messages, and Internet usage: everything was recorded. Spitz tells the story in more detail in his TED talk.
These are only a few examples, but the trend is obvious. We leave data patterns everywhere we go. This makes us vulnerable because our personal information is exposed. On the other hand, it is an opportunity for companies and organizations that can use this data to better understand their customers.
Like most topics out there, data mining is a stick with two ends. When you pick up one end of this stick, you inevitably pick up the other end as well.