What is Data?

Data can be qualitative or quantitative:

What is Big Data?

Why is Big Data Important?

How Amazon uses big data:

Now, let’s try to understand some of the major drawbacks of using the traditional approach to store and process Big Data.

Challenges Associated with Big Data

1. Input/output processing:

  1. Data collection
  2. Data preparation
  3. Data input
  4. Processing
  5. Data output/interpretation
  6. Data storage

2. Volume:

3. Velocity: Have you ever wondered why Google is so fast? The simple answer is velocity.

4. Cost: this can also become a challenge, depending on the company or business.
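The six stages listed under input/output processing can be pictured as a chain of small functions, each handing its result to the next. The following is only a minimal sketch: the sample readings and every function name are hypothetical, chosen for illustration rather than taken from any particular tool.

```python
from statistics import mean

# Hypothetical raw readings, as they might arrive from a collection step.
RAW_READINGS = ["21.5", " 22.0", "bad", "19.5", ""]


def collect():
    """Stage 1, data collection: return the raw records."""
    return list(RAW_READINGS)


def prepare(records):
    """Stage 2, data preparation: strip whitespace and drop unusable records."""
    cleaned = []
    for record in records:
        text = record.strip()
        try:
            float(text)
        except ValueError:
            continue  # skip blanks and non-numeric values
        cleaned.append(text)
    return cleaned


def load(records):
    """Stage 3, data input: convert cleaned text into numbers."""
    return [float(r) for r in records]


def process(values):
    """Stage 4, processing: compute a simple summary."""
    return {"count": len(values), "average": mean(values)}


def interpret(summary):
    """Stage 5, data output/interpretation: make the result readable."""
    return f"{summary['count']} valid readings, average {summary['average']:.1f}"


def store(report, storage):
    """Stage 6, data storage: keep the report for later use."""
    storage.append(report)
    return storage


def run_pipeline():
    """Run all six stages in order and return the storage list."""
    values = load(prepare(collect()))
    report = interpret(process(values))
    return store(report, [])


print(run_pipeline())
```

With a handful of records, one machine runs every stage comfortably. The challenge appears when each stage has to handle more data than a single machine can read, process, or write in a reasonable time.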

The solution to the Big Data problem is:

Commodity hardware, implemented through the concept of DISTRIBUTED STORAGE:

What is Distributed Storage?

  • The master always receives the data and distributes it among the slaves. This means we no longer have to worry about the volume problem: no matter how big the data is, we can spread it across the slaves, and we don’t need to purchase bigger storage.
  • Because we are not purchasing bigger storage, our costs also decrease. We can buy many small storage servers and attach them to the master. If the data grows in the future, we purchase more storage servers and keep attaching them to the master.
  • Finally, speed. Suppose one storage server takes 10 minutes to store 10 GB of data. With four storage servers working in parallel, storing 40 GB (10 GB on each server) still takes only 10 minutes. It is not only about storing the data, but also about how fast you can read it: a single storage server would take about 40 minutes to handle the same 40 GB. These are simple examples; in industry, these architectures are much larger, with many components attached to each other.
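The arithmetic in the speed example can be checked with a few lines of Python. This is a rough sketch that assumes the data splits evenly and that the servers work fully in parallel with no coordination overhead; the function name `transfer_minutes` is made up for illustration.

```python
MINUTES_PER_10_GB = 10  # rate taken from the example above


def transfer_minutes(total_gb, servers):
    """Minutes to store or read total_gb when split evenly across parallel servers."""
    gb_per_server = total_gb / servers
    return gb_per_server / 10 * MINUTES_PER_10_GB


single = transfer_minutes(40, 1)
parallel = transfer_minutes(40, 4)
print(f"1 server: {single:.0f} min, 4 servers: {parallel:.0f} min")
# prints "1 server: 40 min, 4 servers: 10 min"
```

Real clusters fall short of this ideal because the network and the master add overhead, but the trend holds: more servers working in parallel means less time per server.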

>This master-slave setup is also called a TOPOLOGY. The entire setup working together as a team is called a CLUSTER.
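To make the master and slave roles concrete, here is a toy, in-memory sketch of such a cluster. The `Master` and `Slave` classes, the round-robin placement, and the tiny chunk size are all assumptions made for illustration; this is not how any particular product implements distributed storage.

```python
class Slave:
    """A storage server that holds the chunks the master sends it."""

    def __init__(self, name):
        self.name = name
        self.chunks = {}  # chunk index -> bytes

    def store(self, index, chunk):
        self.chunks[index] = chunk


class Master:
    """Receives data, splits it into chunks, and spreads them over the slaves."""

    def __init__(self, slaves, chunk_size):
        self.slaves = slaves
        self.chunk_size = chunk_size
        self.locations = {}  # chunk index -> slave holding that chunk

    def add_slave(self, slave):
        """Grow the cluster by attaching one more storage server."""
        self.slaves.append(slave)

    def write(self, data):
        """Split data into fixed-size chunks and place them round-robin."""
        for index, start in enumerate(range(0, len(data), self.chunk_size)):
            slave = self.slaves[index % len(self.slaves)]
            slave.store(index, data[start:start + self.chunk_size])
            self.locations[index] = slave

    def read(self):
        """Reassemble the data by fetching every chunk in order."""
        return b"".join(
            self.locations[i].chunks[i] for i in sorted(self.locations)
        )


cluster = Master([Slave("s1"), Slave("s2"), Slave("s3"), Slave("s4")], chunk_size=10)
cluster.write(b"x" * 40)
print({s.name: len(s.chunks) for s in cluster.slaves})
```

Each of the four slaves ends up holding one 10-byte chunk, and `cluster.read()` returns the original 40 bytes. When the data outgrows the cluster, `add_slave` attaches another server instead of replacing the existing ones with bigger storage.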

>One of the products used to implement distributed storage is Hadoop, which uses HDFS (Hadoop Distributed File System).

The Wrap-Up

I am a tech enthusiast fascinated by technology and its various disciplines, including Big Data, Hadoop, Web Development, Competitive Programming, ML, etc.

Abhayagarwal
