Project Socket Stream with Spark Streaming

In this post, we walk through a small Spark Streaming example. The task is to build a project in which Spark Streaming listens on port 7777, filters the lines that contain the word “error”, and prints them to the console. ...

Sep 23, 2021 (1 min read)

Summary of questions about Apache Hadoop

The main goals of Apache Hadoop: open data storage and powerful data processing, and lower costs when storing and processing large amounts of data. You can see more details about Hadoop’s goals HERE Ha...

Aug 9, 2021 (3 min read)

Hadoop MapReduce and basic WordCount program with MapReduce

MapReduce is a processing technique and programming model for distributed computing, used to process big data. Hadoop MapReduce is Hadoop’s data processing framework, built on the idea of ...

Aug 3, 2021 (8 min read)

Commands for manipulating files and directories on HDFS

HDFS commands are generally quite similar to Linux commands, in both function and name. If you are familiar with Linux/Ubuntu, you probably don’t need to learn mu...

Jul 6, 2021 (1 min read)

HDFS

The Hadoop Distributed File System (HDFS) is a distributed storage system designed to run on commodity hardware. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high-throu...

Jul 4, 2021 (5 min read)

Install and deploy Hadoop single node

Many major industries are adopting Apache Hadoop as a standard framework for big data processing and storage. Hadoop is designed to be deployed across a network of hundreds or even thousands of...

Jul 1, 2021 (4 min read)

An overview of Hadoop

Hadoop is a framework, based on a solution from Google, for storing and processing large data sets. Hadoop uses the MapReduce programming model to process input data in parallel. In short, Hadoop is used to develop app...

Jun 29, 2021 (2 min read)

MapReduce programming model for Big Data

MapReduce is a processing technique and programming model for distributed computing, used to process big data. MapReduce consists of two main tasks: map and reduce. WordCount is a typical ...

Jun 24, 2021 (3 min read)
