Posts
De Manejar

Project Socket Stream with Spark Streaming

In this post, we walk through a small Spark Streaming example. The task is to build a project that listens on port 7777, filters the lines containing the word “error”, and prints them to the console. ...

Summary of questions about Apache Hadoop

The main goals of Apache Hadoop are open data storage, powerful data processing, and lower costs when storing and processing large amounts of data. You can see more details about Hadoop’s goals HERE Ha...

Hadoop MapReduce and basic WordCount program with MapReduce

MapReduce is a processing technique and a programming model for distributed computing, used to process big data. Hadoop MapReduce is Hadoop’s data processing framework, built on the idea of ...

Commands for manipulating files and directories on HDFS

The commands on HDFS are generally quite similar to the commands on Linux, both in their functions and in their names. If you are familiar with Linux/Ubuntu, you probably don’t need to learn mu...

HDFS

Hadoop Distributed File System (HDFS) is a distributed storage system designed to run on commodity hardware. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high-throu...

Install and deploy Hadoop single node

Many major industries are adopting Apache Hadoop as a standard framework for big data processing and storage. Hadoop is designed to be deployed across a network of hundreds or even thousands of...

An overview of Hadoop

Hadoop is a framework, based on ideas published by Google, for storing and processing large datasets. Hadoop uses the MapReduce programming model to process input data in parallel. In short, Hadoop is used to develop app...

MapReduce programming model for Big Data

MapReduce is a processing technique and a programming model for distributed computing, used to process big data. MapReduce contains two important tasks: map and reduce. WordCount is a typical ...
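
To make the map and reduce tasks concrete, here is a minimal WordCount sketch in plain Python. It is not Hadoop code and does not use the Hadoop API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are hypothetical, chosen only to illustrate the two tasks and the grouping step that a framework would perform between them.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key (a stand-in for the framework's shuffle step)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce: sum all the counts collected for a single word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce sequentially on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


# Hypothetical sample input, for illustration only.
result = word_count(["big data big ideas", "data moves"])
print(result)
```

In a real cluster the map calls would run in parallel on different blocks of the input and the grouping would happen across the network. Here everything runs in one process so that the data flow is easy to follow.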