Steps to learn BIG DATA
Big data is a vast field, and it is not something that can be learnt in a month. To master it, you have to work through various software tools in several stages. Technically, big data combines programming skills, analytical skills, database skills, data structures and algorithms, machine learning and more. Hence you should know exactly what you are going to do.
You need to go through several stages if you want to become a master in big data:
- Linux operating system / Shell scripting: Ubuntu is a good choice of operating system for learning big data tools like Hadoop. Since Ubuntu is a Linux distribution, you should have an overall idea of Linux. And since Linux is a Unix-like operating system, some familiarity with Unix also helps. The shell is the command-line interface between the user and the operating system, and shell scripting lets you automate the commands you would otherwise type by hand.
- Java / Python: The Hadoop framework is written in Java. But to learn Hadoop you can also work in Python or even C++; programs for Hadoop do not have to be coded in Java. For example, Hadoop Streaming lets the map and reduce steps be written in any language that reads standard input and writes standard output. Python has some advantages over Java here, which are explained in the Spark point below.
- Learning Hadoop: Hadoop is an open-source big data tool. First you need to understand that it is not a single product. It is made up of four core modules, as follows:
- Hadoop Distributed File System (HDFS): It is the data storage and file handling system.
- Hadoop YARN: It is the resource management and job scheduling technology.
- Hadoop MapReduce: It manages parallel processing in the Hadoop environment.
- Hadoop Common: It supports the other three modules mentioned above.
- Spark: It is another big data tool. Compared to Hadoop, it can provide higher speed and more tools. Spark can be programmed in Scala, Java, Python or R, and Python (through PySpark) is a common choice. So if you learn Python rather than Java, you will have an advantage when switching to Spark after working on Hadoop.
- While learning Hadoop, getting your hands dirty with actual coding will be a big advantage.
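As a first hands-on exercise, the idea behind Hadoop MapReduce can be tried in plain Python before installing anything. The sketch below is only an illustration of the map, shuffle and reduce steps on a single machine; it does not use Hadoop itself, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Group all values by key, roughly what happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data is big", "data tools process big data"]
    print(word_count(sample))
```

In a real cluster the map and reduce steps would run in parallel on many machines, with the input stored in HDFS. Here everything runs in one process so the flow of data is easy to follow.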
Online courses available for Big data
You may go through:
- https://cognitiveclass.ai/learn/big-data
- https://www.coursera.org/learn/big-data-introduction
- https://intellipaat.com/big-data-hadoop-training
- https://www.simplilearn.com/big-data-and-analytics/big-data-and-hadoop-training