Data is a collection of facts, such as numbers, words, measurements, observations or just descriptions of things.

Or, put another way, data as a general concept means that some existing information or knowledge is represented or coded in a form suitable for easier use or processing.

For example:

The images, videos and text we upload to any social media platform are one kind of data.

The SMS messages and emails we send are another kind of data.

So, have you ever wondered how much data your mobile phone generates in the form of texts, phone calls, emails, photos, videos, searches and music?

By one often-quoted estimate, a single smartphone user generates approximately 40 exabytes (EB) of data every month. Amazed? That figure is hard to take literally, since one exabyte is a billion gigabytes, but it gives a sense of the scale involved. Now imagine this number multiplied by 5 billion smartphone users: 40 EB x 5,000,000,000 = 200,000,000,000 EB. That is a very, very large amount of data. In fact, it is far more than traditional systems can handle, and this massive amount of data is what we call Big Data.
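The multiplication above is easy to check in a few lines of Python. This is only a sketch of the arithmetic: the 40 EB figure and the 5 billion user count are taken from the text as given, and the variable names are illustrative.

```python
# Figures taken from the text as given; the names are illustrative.
EB_PER_USER_PER_MONTH = 40
SMARTPHONE_USERS = 5_000_000_000

total_eb = EB_PER_USER_PER_MONTH * SMARTPHONE_USERS
print(f"{total_eb:,} EB per month")  # 200,000,000,000 EB per month

# One exabyte is 10**18 bytes, so the same total expressed in bytes is:
total_bytes = total_eb * 10**18
```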

Data generated on the internet every minute:

-> 2.1 million snaps are created on Snapchat

-> 3.8 million search queries are made on Google

-> 1.5 million people log on to Facebook

-> 4.5 million videos are watched on YouTube

-> 188 million emails are sent

and the same is happening on many more platforms.

In short, almost everything we use in our daily life is a type of data, so there is a lot of it.

Data can be qualitative or quantitative:

1) Qualitative data is descriptive information (it describes something)

2) Quantitative data is numerical information (numbers)

Everyone uses data in their day-to-day life, but most people do not know where this huge amount of data is actually stored or how it is maintained.

As you have noticed, the data is huge, and in technical terms this huge amount of data causes a lot of problems. Taken together, those problems are what we call the Big Data problem. Big Data is not a technology; it is just the name of the problem.

What is Big Data?

Why Big Data?

So, first let's look at what launched the big data era.

According to an influential 2013 report by the consulting firm McKinsey, data science will be the number one catalyst for economic growth. McKinsey identified one of the new opportunities that contributed to the launch of the big data era: a growing torrent of data.

This refers to the idea that data seems to be coming continuously and at a fast rate. Think about this: today you can buy a hard drive that stores all the music in the world for only $600. That is an amazing storage capability compared with any previous form of music storage.

In 2010 there were 5 billion mobile phones in use. You can be sure that there are more today, and as I'm sure you will understand, these phones and the apps we install on them are a big source of big data, contributing to this torrent all the time, every day.

And Facebook, which recently set a record of one billion people logging in on a single day, has more than 30 billion pieces of content shared every month. Well, that number is from 2013, so I'm sure it is much higher than that now.

Does it make you think about how many Facebook shares you made last month? All this leads to projections of serious growth: 40% per year in global data, and 5% in global IT spending. This much data has surely pushed the data science field to start remaking itself and the business world of today.

But there is something else contributing to the catalyzing power of data science. It is called cloud computing, also known as on-demand computing. Cloud computing is one of the ways in which computing has become something we can do anytime and anywhere.

You may be surprised to know that some of your favorite apps come from businesses being run from coffee shops. This new ability, combined with our torrent of data, gives us the opportunity to perform novel, dynamic and scalable data analysis that tells us new things about our world and ourselves.

To summarize, a new torrent of big data combined with computing capability anytime, anywhere has been at the core of the launch of the big data era.

Why is Big Data Important?

>Time Reductions

>Understand the market conditions

>Control online reputation

>Using Big Data Analytics to Boost Customer Acquisition and Retention

>Using Big Data Analytics to Solve Advertisers' Problems and Offer Marketing Insights

>Big Data Analytics As a Driver of Innovations and Product Development.

Let's take a deeper look at how some of the biggest companies in the world, such as Microsoft, Apple, Amazon, Alphabet and Facebook, are using and managing Big Data.

How Amazon uses big data:

To keep customers from being overwhelmed by the size of its catalogue, Amazon uses Big Data gathered from customers while they browse to build and fine-tune its recommendation engine. The more Amazon knows about you, the better it can predict what you want to buy. And once the retailer knows what you might want, it can streamline the process of persuading you to buy it, for example by recommending various products instead of making you search through the whole catalogue.

Amazon’s recommendation technology is based on collaborative filtering, which means it decides what it thinks you want by building up a picture of who you are, then offering you products that people with similar profiles have purchased.

Amazon gathers data on every one of its customers while they use the site. As well as what you buy, the company monitors what you look at, your shipping address (Amazon can take a surprisingly good guess at your income level based on where you live), and whether you leave reviews/feedback.

This mountain of data is used to build up a “360-degree view” of you as an individual customer. Amazon can then find other people who fit into the same precise customer niche (employed males between 18 and 45, living in a rented house with an income of over $30,000 who enjoy foreign films, for example) and make recommendations based on what those other customers like.
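The idea of "offering you products that people with similar profiles have purchased" can be sketched with a toy example. This is not Amazon's actual system, which the text does not describe in detail. It is a minimal user-based collaborative filter that assumes similarity is measured with the Jaccard index on purchase histories; the customers, products and function names are all hypothetical.

```python
from collections import Counter

# Hypothetical purchase histories: customer -> set of product names.
purchases = {
    "alice": {"camera", "tripod", "sd_card"},
    "bob": {"camera", "sd_card", "camera_bag"},
    "carol": {"novel", "bookmark"},
    "dave": {"camera", "tripod", "sd_card", "lens"},
}


def jaccard(a, b):
    """Similarity of two sets: size of the overlap divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(customer, purchases, top_n=3):
    """Suggest products bought by similar customers but not yet by `customer`."""
    own = purchases[customer]
    scores = Counter()
    for other, items in purchases.items():
        if other == customer:
            continue
        similarity = jaccard(own, items)
        if similarity == 0:
            continue  # nothing in common, so this customer tells us nothing
        for item in items - own:
            # Products from more similar customers get a higher score.
            scores[item] += similarity
    return [item for item, _ in scores.most_common(top_n)]


print(recommend("alice", purchases))  # ['lens', 'camera_bag']
```

Here dave's history overlaps most with alice's, so his extra purchase ranks first, while carol shares nothing with alice and contributes no recommendations. A real recommender would also use browsing behaviour, location and reviews, as described above.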

Let’s now see the Traditional Approach of Storing and Processing Big Data:

In the traditional approach, the data generated by organizations such as banks, stock markets or hospitals is given as input to an ETL (Extract, Transform and Load) system.

An ETL system extracts this data, transforms it (that is, converts it into the proper format) and finally loads it into a database.

Once this process is completed, the end users would be able to perform various operations, such as generate reports and perform analytics by querying this data.
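The three ETL steps and the final querying step can be sketched in a few lines using only the Python standard library. This is a minimal illustration under stated assumptions, not a real ETL product: the CSV content, the table name `transactions` and the function names are all made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (e.g. a bank's transaction log).
RAW_CSV = """account,amount,currency
 A-100 ,250.00,usd
A-101,  75.50,USD
A-100,not_a_number,usd
"""


def extract(text):
    """Extract: read rows from the raw CSV text."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, normalise case, drop rows that fail to parse."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows with a malformed amount
        clean.append((row["account"].strip(), amount, row["currency"].strip().upper()))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions (account TEXT, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

# End users can now query the loaded data, e.g. the total amount per account.
totals = conn.execute(
    "SELECT account, SUM(amount) FROM transactions GROUP BY account ORDER BY account"
).fetchall()
print(totals)
```

Everything here runs on a single machine against a single database, which is exactly why the approach becomes hard to manage as the data grows.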

But as this data grows, it becomes a challenging task to manage and process this data using this traditional approach.

This is one of the reasons for not using the traditional approach for storing and processing the Big Data.

Now, let’s try to understand, some of the major drawbacks associated with using the traditional approach for storing and processing the Big Data.

One drawback is scalability. As the data grows, expanding this system becomes a challenging task.

Another drawback is that it is time-consuming. It takes a lot of time to process this data and extract valuable information from it, because the system is designed and built on legacy computing systems.

I hope this makes clear why the traditional approach, or legacy computing systems, are not used to store and process Big Data.

Challenges Associated with Big Data

1. Input/output processing:

It includes:

  1. Data collection
  2. Data preparation
  3. Data input
  4. Processing
  5. Data output/interpretation
  6. Data storage

2. Volume :

3. Velocity: Have you ever wondered why Google is so fast? The simple answer is velocity.

4. Costing: Sometimes cost also becomes a challenge, depending on the company or business.

The solution to the Big Data problem is:

Commodity hardware, used through the concept of DISTRIBUTED STORAGE:

What is Distributed Storage?

Think of it this way: you have 4 laptops or 4 storage servers, typically known as slave nodes or data nodes. Every laptop is connected over the network to one main laptop, typically known as the master node or name node. Now suppose each server has 10 GB of storage. If 40 GB of data arrives, we cannot store it on any one server, and this is where distributed storage comes into play.

  • The master always receives the data and distributes it among the slaves. That means we no longer have to worry about the volume problem: no matter how big the data is, we can distribute it across the slaves, and we do not need to purchase bigger storage.
  • Because we are not purchasing bigger storage, our costs also decrease. We can purchase lots of small storage servers and attach them to the master. If the data grows even larger in the future, we purchase more storage servers and keep attaching them to the master.
  • The final thing is speed. Suppose one storage server takes 10 minutes to store 10 GB of data. Because multiple storage servers work in parallel, storing 40 GB across 4 servers (10 GB on each) takes only 10 minutes, whereas a single server would need 40 minutes for the same 40 GB. It is also not only about storing the data; it is about how fast you can read it, and the same reasoning applies to reads. These are simple examples; in industry these architectures are much bigger, with lots of components attached to each other.
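The 4-server example above can be sketched as a small simulation. This is only an illustration of the idea, not how HDFS actually places blocks (there is no replication or failure handling here); the class names, the 1 GB block size and the round-robin placement are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """A storage server (slave node) with a fixed capacity in GB."""
    name: str
    capacity_gb: int
    blocks: list = field(default_factory=list)

    @property
    def used_gb(self):
        return sum(size for _, size in self.blocks)


class MasterNode:
    """Receives data and spreads it over the data nodes in fixed-size blocks."""

    def __init__(self, nodes, block_gb=1):
        self.nodes = nodes
        self.block_gb = block_gb

    def store(self, file_name, size_gb):
        free = sum(n.capacity_gb - n.used_gb for n in self.nodes)
        if size_gb > free:
            raise ValueError("not enough space in the cluster; add more data nodes")
        remaining, block_id, i = size_gb, 0, 0
        while remaining > 0:
            node = self.nodes[i % len(self.nodes)]  # round-robin over the nodes
            i += 1
            size = min(self.block_gb, remaining, node.capacity_gb - node.used_gb)
            if size <= 0:
                continue  # this node is full, try the next one
            node.blocks.append((f"{file_name}#{block_id}", size))
            block_id += 1
            remaining -= size


# Four 10 GB servers, as in the text, receiving 40 GB of data.
nodes = [DataNode(f"datanode-{n}", capacity_gb=10) for n in range(1, 5)]
master = MasterNode(nodes)
master.store("big_file", 40)

for node in nodes:
    print(node.name, node.used_gb, "GB used")

# Timing from the text: one server writes 10 GB in 10 minutes (1 GB per minute).
MINUTES_PER_GB = 1
single_server_minutes = 40 * MINUTES_PER_GB
parallel_minutes = max(n.used_gb for n in nodes) * MINUTES_PER_GB
print(single_server_minutes, parallel_minutes)
```

Each node ends up holding 10 GB, and because the nodes write in parallel, the total time is set by the busiest node rather than by the total size. Adding capacity means appending another `DataNode` to the list, which mirrors attaching one more storage server to the master.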

>This master-slave setup is also called a topology. The entire setup working as a team is called a cluster.

>One of the technologies used to implement distributed storage is Hadoop, which uses the Hadoop Distributed File System (HDFS).

The Wrap-Up

As we all know, the world runs on data. Because the data is huge and companies cannot simply delete it, storing it is a very big challenge for them, which leads us to the world of Big Data.

In the coming days I am going to publish lots of articles on Big Data tools and technologies, so do follow me on Medium.

Here is my LinkedIn profile. If you have any queries, comment below or DM me on LinkedIn.

https://www.linkedin.com/in/abhay-agarwal-637b801a2/

I am a tech enthusiast fascinated by technology and its various disciplines, including Big Data, Hadoop, Web Development, Competitive Programming, ML, etc.