This course introduces the fundamentals of modern data processing for data engineers, analysts, and IT professionals. You will learn the basics of Hadoop MapReduce, including how it works, how to compile and run Java MapReduce programs, and how to debug and extend them with other languages. Practical exercises include word counts across multiple files, log-file analysis, and large-scale text processing with datasets such as Wikipedia. You will also explore advanced MapReduce features and use tools such as YARN and the Job Browser. The course then covers higher-level tools such as Apache Pig and HiveQL for managing data workflows and running SQL-like queries. Finally, you will work with Apache Spark and PySpark to gain experience with modern data-analytics platforms. By the end of the course, you will have practical skills for working with big data in a variety of environments.
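To give a flavor of the word-count exercises mentioned above, here is a minimal sketch of the map/shuffle/reduce pattern in plain Python. It is illustrative only; the course's actual exercises use Hadoop's Java MapReduce API, and the sample lines are made up for this sketch.

```python
from collections import defaultdict

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in a line of text."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle: group all values by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reducer: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

# Hypothetical input standing in for files in HDFS.
lines = ["Hadoop and Spark", "Spark and PySpark"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle_phase(pairs))
# e.g. "and" and "spark" each appear twice across the two lines
```

In real Hadoop, the map and reduce functions run in parallel across the cluster and the shuffle happens over the network, but the data flow is the same.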



Hadoop and Spark Fundamentals: Unit 2
This course is part of Hadoop and Spark Fundamentals Specialization

Instructor: Pearson
What you'll learn
Understand and implement Hadoop MapReduce for distributed data processing, including compiling, running, and debugging applications.
Apply advanced MapReduce techniques to real-world scenarios such as log analysis and large-scale text processing.
Utilize higher-level tools like Apache Pig and HiveQL to streamline data workflows and perform complex queries.
Gain hands-on experience with Apache Spark and PySpark for modern, scalable data analytics.
Details to know
- Shareable certificate: add to your LinkedIn profile
- August 2025
- 4 assignments
Build your subject-matter expertise
- Learn new concepts from industry experts
- Gain a foundational understanding of a subject or tool
- Develop job-relevant skills with hands-on projects
- Earn a shareable career certificate

There is 1 module in this course
This module introduces the core components of big data processing with Hadoop and Spark. It covers the fundamentals of Hadoop MapReduce, including its operation, programming, and debugging, followed by practical examples such as word count, log analysis, and benchmarking. The module then explores higher-level tools like Apache Pig and Hive for simplified data processing. Finally, it introduces Apache Spark and its Python interface, PySpark, highlighting Spark’s growing role in data analytics.
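The log-analysis example mentioned above follows the same map/reduce pattern as word count: extract a key from each log line, then count occurrences per key. A minimal plain-Python sketch of that idea, using a hypothetical log format (not a dataset from the course):

```python
from collections import Counter

# Hypothetical access-log lines; the final field is the HTTP status code.
log_lines = [
    '10.0.0.1 - - "GET /index.html" 200',
    '10.0.0.2 - - "GET /missing" 404',
    '10.0.0.1 - - "POST /login" 200',
]

def extract_status(line):
    """Mapper step: pull the trailing status code out of one log line."""
    return line.rsplit(" ", 1)[-1]

# Reduce step: count how often each status code occurs.
status_counts = Counter(extract_status(line) for line in log_lines)
```

On a cluster, the same extraction would run as a mapper over log files in HDFS, with the counting done by reducers; in Spark, it collapses to a short chain of `map` and `reduceByKey` transformations.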
What's included
20 videos, 4 assignments
Earn a career certificate
Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.
Frequently asked questions

Can I preview the course before I enroll?
Yes, you can preview the first video and view the syllabus before you enroll. You must purchase the course to access content not included in the preview.

When will I have access to the lectures and assignments?
If you decide to enroll in the course before the session start date, you will have access to all of the lecture videos and readings for the course. You'll be able to submit assignments once the session starts.

What will I get when I enroll?
Once you enroll and your session begins, you will have access to all videos and other resources, including reading items and the course discussion forum. You'll be able to view and submit practice assessments, and complete required graded assignments to earn a grade and a Course Certificate.