Wednesday, October 8, 2014

An Overview of Big Data and Hadoop - Part 1

Big Data has come to Indonesia. One of my customers asked about it. I don't know whether the need is real or only a passing enthusiasm imported from abroad. Still, I think I need to improve my knowledge of Big Data and Hadoop. I found the book The Executive's Guide To Big Data & Apache Hadoop by Robert D. Schneider very good and insightful. It contains everything you need to understand and get started with Big Data and Hadoop. You can find the eBook for free through Google, but if you want something more condensed, please read my summary below.

Introducing Big Data

Big Data has the potential to transform the way you run your organization. When used properly it will create new insights and more effective ways of doing business, such as:
  • How you design and deliver your products to the market
  • How your customers find and interact with you
  • Your competitive strengths and weaknesses
  • Procedures you can put to work to boost the bottom line

What Turns Plain Old Data into Big Data?

From Robert D. Schneider's perspective, organizations that are actively working with Big Data show each of the following five traits in comparison to those that don’t:

  1. Larger amounts of information
  2. More types of data
  3. Data that’s generated by more sources
  4. Data that’s retained for longer periods
  5. Data that’s utilized by more types of applications

1. Larger Amounts of Information
Enterprises are capturing, storing, managing, and using more data than ever before. Generally, these events aren’t confined to a single organization; they’re happening everywhere: 

  • On average, over 500 million Tweets occur every day
  • Worldwide, there are over 1.1 million credit card transactions every second
  • There are almost 40,000 ad auctions per second on Google AdWords
  • On average, 4.5 billion “likes” occur on Facebook every day
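
The figures above mix per-day and per-second rates, which makes them hard to compare at a glance. As a rough illustration (my own arithmetic, not taken from the book), this small Python sketch converts each figure to both units. The numbers are the ones quoted above; the helper names are only for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in one day

# Figures quoted in the list above: (name, count, unit the count is given in)
rates = [
    ("Tweets", 500_000_000, "day"),
    ("Credit card transactions", 1_100_000, "second"),
    ("Google AdWords ad auctions", 40_000, "second"),
    ("Facebook likes", 4_500_000_000, "day"),
]


def per_second(count, unit):
    """Return the rate expressed as events per second."""
    return count / SECONDS_PER_DAY if unit == "day" else float(count)


def per_day(count, unit):
    """Return the rate expressed as events per day."""
    return count * SECONDS_PER_DAY if unit == "second" else count


for name, count, unit in rates:
    print(f"{name}: about {per_second(count, unit):,.0f} per second, "
          f"{per_day(count, unit):,} per day")
```

Put on the same scale, 500 million Tweets a day is roughly 5,800 per second, while 1.1 million transactions per second adds up to about 95 billion a day.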

[Figure: Comparing Database Sizes]


2. More Types of Data
Structured data – regularly generated by enterprise applications and amassed in relational databases – is usually clearly defined and straightforward to work with. On the other hand, enterprises are now interacting with enormous amounts of unstructured – or semi-structured – information, such as:

  • Clickstreams and logs from websites 
  • Photos 
  • Video 
  • Audio 
  • XML documents 
  • Freeform blocks of text such as email messages, Tweets, and product reviews
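
To make the difference concrete, here is a minimal Python sketch (my own illustration, not from the book) of what working with semi-structured data can look like: a web log line and a small XML document are turned into structured records with named fields. The sample log line, the XML review, and the field names are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical web server log line (Common Log Format style).
LOG_LINE = ('203.0.113.7 - - [08/Oct/2014:10:15:32 +0700] '
            '"GET /products/42 HTTP/1.1" 200 5120')

# Named groups give the free-text line a structure we can query.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+)'
)


def parse_log_line(line):
    """Turn one log line into a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["size"] = int(record["size"])
    return record


# Hypothetical XML product review.
REVIEW_XML = ("<review product='42' rating='4'>"
              "<text>Fast delivery, good quality.</text></review>")


def parse_review(xml_text):
    """Extract the attributes and text of a review element into a dict."""
    root = ET.fromstring(xml_text)
    return {
        "product": root.get("product"),
        "rating": int(root.get("rating")),
        "text": root.findtext("text"),
    }


print(parse_log_line(LOG_LINE))
print(parse_review(REVIEW_XML))
```

Photos, video, and audio need far heavier processing than this, but the idea is the same: structure has to be extracted before the data can be analyzed alongside relational records.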

3. Generated by More Sources
Enterprise applications continue to produce transactional and web data, but there are many new conduits for generating information, including: 

  • Smartphones 
  • Medical devices 
  • Sensors
  • GPS location data 
  • Machine-to-machine, streaming communication

4. Retained for Longer Periods
Government regulations, industry standards, company policies, and user expectations are all contributing to enterprises keeping their data for lengthier amounts of time. Many IT leaders also recognize that there are likely to be future use cases that will be able to profit from historical information, so carelessly throwing data away isn’t a sound business strategy. However, hoarding vast and continually growing amounts of information in core application storage is prohibitively expensive. Instead, migrating information to Hadoop is significantly less costly, plus Hadoop is capable of handling a much bigger variety of data.


5. Utilized by More Types of Applications
Faced with a flood of new information, many enterprises are following a “grab the data first, and then figure out what to do with it later” approach. This means that there are countless new applications being developed to work with all of this diverse information. Such new applications are widely varied, yet must satisfy requirements such as bigger transaction loads, faster speeds, and enormous workload variability.


Big Data is also shaking up the analytics landscape. Structured data analysis has historically been the prime player, since it works well with traditional relational database-hosted information. However, driven by Big Data, unstructured information analysis is quickly becoming equally important. Several new techniques work with data from manifold sources such as:

  • Blogs 
  • Facebook 
  • Twitter 
  • Web traffic logs 
  • Text messages 
  • Yelp reviews
  • Support desk calls 
  • Call center calls

Implications of Not Handling Big Data Properly

Failing to keep pace with the immense data volumes, mushrooming number of information sources and categories, longer data retention periods, and expanding suite of data-hungry applications has impeded many Big Data plans, and is resulting in:

  • Delayed or faulty insights 
  • An inability to detect and manage risk 
  • Diminished revenue 
  • Increased cost 
  • Opportunity costs of missing new applications along with operational use of data 
  • A weakened competitive position


Checklist: How to Tell When Big Data Has Arrived

1. You’re getting overwhelmed with raw data from mobile or medical devices, sensors, and/or machine-to-machine communications. Additionally, it’s likely that you’re so busy simply capturing this data that you haven’t yet found a good use for it.

2. You belatedly discover that people are having conversations about your company on Twitter. Sadly, not all of this dialogue is positive.

3. You’re keeping track of a lot more valued information from many more sources, for longer periods of time. You realize that maintaining such extensive amounts of historical data might present new opportunities for deeper awareness into your business.

4. You have lots of silos of data, but can’t figure out how to use them together. You may already be deriving some advantages from limited, standalone analysis, but you know that the whole is greater than the sum of the parts.

5. Your internal users – such as data analysts – are clamoring for new solutions to interact with all this data. They may already be using one-off analysis tools such as spreadsheets, but these ad-hoc approaches don’t go nearly far enough.


6. Your organization seeks to make real-time business decisions based on newly acquired information. These determinations have the potential to significantly impact daily operations.

7. You’ve heard rumors (or read articles) about how your competitors are using Big Data to gain an edge, and you fear being left behind.

8. You’re buying lots of additional storage each year. These supplementary resources are expensive, yet you’re not putting all of this extra data to work.

9. You’ve implemented – either willingly or by necessity – new information management technologies, often from startups or other cutting-edge vendors. However, many of these new solutions are operating in isolation from the rest of your IT portfolio.



Continued in Part 2.
