Ever wondered how big Facebook really is, and how it ingests thousands of terabytes of data every day?
Here are some of my thoughts and research on how big MNCs like Facebook, Google and Instagram manage and store thousands of terabytes of data with high speed and efficiency.
In this era of growing social media, everyone uses these platforms to share views, thoughts and ideas. The users supply the ideas, thoughts, data and information, and the platforms have to store all of it, thousands of terabytes every day.
According to commonly cited figures, Google processes over 40,000 search queries per second on average and has been reported to process over 20 petabytes of data per day. Facebook also processes thousands of terabytes a day. But where is such a large amount of data stored? How is it kept secure, and how do these big MNCs manage it?
PROBLEM - BIG DATA..!!
The size of the data keeps increasing day by day, and storing such a large amount of data securely is a major concern. If a company keeps all of its data on a single hard disk, another problem appears: the Input/Output (I/O) problem. Every read and every write has to go through that one disk, and hard disk I/O is very slow. So Big Data is a challenging problem with two major factors: Volume, which is the sheer size of the data, and Velocity, which is the speed at which that data has to be stored and read back securely.
Size and speed are not the only problems. Cost and many other factors are a big part of it as well.
SOLUTION TO MANAGE BIG DATA PROBLEM..!!!
To overcome these problems of volume, velocity, cost and more, DISTRIBUTED STORAGE is used. Distributed storage is at the core of solving the issues related to Big Data.
Distributed storage means dividing the data across several independent machines: instead of storing everything on a single PC, we store it on multiple PCs. For example, say one laptop has to hold 400 GB of data. We can divide that into 4 blocks of 100 GB and give one block to each of 4 different PCs. Because the data is divided into blocks, no single machine has to hold all of it, so the Volume problem is solved. The Velocity problem is eased too, because all four PCs work in parallel and each one only has to read or write a quarter of the data.
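To make the 400 GB example concrete, here is a small Python sketch of the same idea. It is only an illustration under assumptions of my own: 400 bytes stand in for 400 GB, threads stand in for the four PCs, and the function names `split_into_blocks` and `store_block` are made up for this post.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def split_into_blocks(data: bytes, num_blocks: int) -> List[bytes]:
    """Split data into at most num_blocks contiguous, nearly equal blocks."""
    if not data:
        return []
    block_size = -(-len(data) // num_blocks)  # ceiling division
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def store_block(pc_id: int, block: bytes) -> Tuple[int, int]:
    """Hypothetical stand-in for one PC writing its block to its own disk."""
    return pc_id, len(block)


data = bytes(400)                      # 400 bytes stand in for 400 GB
blocks = split_into_blocks(data, 4)    # 4 blocks of 100 bytes each

# Each "PC" (here just a thread) stores its own block at the same time.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(store_block, range(len(blocks)), blocks))

print(results)
```

Each worker only touches its own 100-unit block, which is the whole point: the work per machine shrinks as you add machines.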
The model used for distributing the storage is called the Master-Slave Model. The Master is the one machine that coordinates how the data is divided among the slaves; in Hadoop it is called the NameNode. The many other machines are the Slaves, or DataNodes, and each of them contributes its hard disk to the master for storing the distributed data. Together they form a CLUSTER, known as a MASTER-SLAVE CLUSTER.
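The master-slave idea can be sketched as a toy Python model. This is not Hadoop code: the class and method names are my own, the round-robin placement is just one simple choice, and for brevity the master forwards the blocks itself, whereas in real HDFS the NameNode only keeps the metadata and clients send blocks to the DataNodes directly.

```python
class DataNode:
    """A slave: contributes its 'disk' (a dict here) to the cluster."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

    def store(self, block_id, block):
        self.blocks[block_id] = block

    def read(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """The master: remembers which block of which file lives on which slave."""

    def __init__(self, data_nodes, block_size):
        self.data_nodes = data_nodes
        self.block_size = block_size
        self.block_map = {}  # filename -> list of (block_id, DataNode)

    def write(self, filename, data):
        locations = []
        starts = range(0, len(data), self.block_size)
        for index, start in enumerate(starts):
            block_id = f"{filename}#{index}"
            # Round-robin placement: an assumption made for simplicity.
            node = self.data_nodes[index % len(self.data_nodes)]
            node.store(block_id, data[start:start + self.block_size])
            locations.append((block_id, node))
        self.block_map[filename] = locations

    def read(self, filename):
        # Reassemble the file by asking each slave for its block, in order.
        return b"".join(
            node.read(block_id) for block_id, node in self.block_map[filename]
        )


slaves = [DataNode(f"slave-{i}") for i in range(4)]
master = NameNode(slaves, block_size=100)

payload = bytes(range(200)) * 2          # 400 bytes of sample data
master.write("photos.bin", payload)

print([len(s.blocks) for s in slaves])   # one block per slave
print(master.read("photos.bin") == payload)
```

The master never needs a disk big enough for the whole file. It only needs to remember where each block went.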
All of these challenges can be solved with a Master-Slave Cluster, and the software used to implement one is HADOOP. Hadoop connects the machines into a cluster over the network and uses HDFS, the Hadoop Distributed File System, to store the data in a distributed manner.
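To get a feel for the numbers, here is a rough back-of-the-envelope sketch in Python. It assumes the HDFS defaults of a 128 MB block size and a replication factor of 3 (both are configurable, and neither figure comes from the example in this post), and the function name `hdfs_block_count` is made up for illustration.

```python
MB = 1024 ** 2
GB = 1024 ** 3

DEFAULT_BLOCK_SIZE = 128 * MB   # assumed HDFS default block size
DEFAULT_REPLICATION = 3         # assumed HDFS default replication factor


def hdfs_block_count(file_size_bytes, block_size_bytes=DEFAULT_BLOCK_SIZE):
    """Number of blocks a file of the given size is split into."""
    return -(-file_size_bytes // block_size_bytes)  # ceiling division


file_size = 400 * GB
blocks = hdfs_block_count(file_size)
replicas = blocks * DEFAULT_REPLICATION

print(blocks)     # blocks the 400 GB file is split into
print(replicas)   # block copies stored across the cluster
```

So under these assumptions a 400 GB file becomes a few thousand blocks spread over the DataNodes, with extra copies so that losing one machine does not lose the data.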
So we can say that the more commodity hardware computers we add to the cluster, the more data can be stored, quickly and efficiently.
And from this we can say that HADOOP helps in managing and storing huge amounts of data by splitting up the thousands of terabytes that users generate every day on the platforms of big MNCs like Facebook and Google.
I hope this blog was helpful and informative, and that it got you thinking about something you may never have thought about before: where all this data is stored and how these big MNCs manage such a huge amount of it. It is well said: “Knowledge has a beginning but no end.” So do share this, so that it becomes the beginning of someone else’s knowledge, because it’s all about Right Education..!!!
Thank You..!! :)