De Manejar

Crawl Housing Data from Alonhadat with Scrapy

In this article, I will introduce in detail how to create a project with Scrapy and use it to analyze and extract housing data from the Alonhadat website. If your machine doesn’t have Scrapy yet, y...
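As a rough illustration of the kind of spider such a project contains, here is a minimal Scrapy sketch; the spider name, start URL, and CSS selectors are assumptions for illustration and are not taken from the article itself.

```python
# A minimal, hypothetical Scrapy spider sketch. The spider name, start URL,
# and CSS selectors below are illustrative assumptions, not the article's code.
import scrapy


class AlonhadatSpider(scrapy.Spider):
    name = "alonhadat"
    start_urls = ["https://alonhadat.com.vn/nha-dat/can-ban.html"]  # assumed listing page

    def parse(self, response):
        # Each listing is assumed to sit inside a "div.content-item" block.
        for item in response.css("div.content-item"):
            yield {
                "title": item.css("div.ct_title a::text").get(),
                "price": item.css("span.ct_price::text").get(),
                "area": item.css("span.ct_dt::text").get(),
            }
```

Inside a project created with `scrapy startproject`, a spider like this would be run with something like `scrapy crawl alonhadat -o houses.json`.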

Why Don't Website Owners Fully Protect Their Websites from Being Crawled?

Websites today are no longer as easy to extract data from as they used to be, because their structure has also changed a great deal. They no longer have clearly defined parts for quick analy...

Crawlers: Some Things I Want to Share About Crawlers, and the Upcoming Crawler Series

Crawler, web scrape, web scraping, data collection, data scraping, … are probably the words we use most when talking about writing programs that analyze and extract data from websites. There is definite...

A Socket Stream Project with Spark Streaming

In this post, we look at a small example with Spark Streaming. The task is to create a Spark Streaming project that listens on port 7777, filters the lines containing the word “error”, and prints them to the console. ...
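A minimal PySpark sketch of that setup, assuming the classic DStream API; the application name and the 1-second batch interval are illustrative choices, not details from the post (which may well use Scala instead):

```python
# Sketch: listen on port 7777, keep only lines containing "error",
# and print them to the console.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "SocketErrorFilter")
ssc = StreamingContext(sc, 1)  # 1-second micro-batches (assumed)

lines = ssc.socketTextStream("localhost", 7777)
errors = lines.filter(lambda line: "error" in line)
errors.pprint()

ssc.start()
ssc.awaitTermination()
```

For a quick local test, text can be fed into port 7777 with a tool such as `nc -lk 7777`.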

Summary of Questions About Apache Hadoop

The main goals of Apache Hadoop are open data storage, powerful data processing, and cost savings when storing and processing large amounts of data. You can see more details about Hadoop’s goals HERE. Ha...

Hadoop MapReduce and a Basic WordCount Program with MapReduce

MapReduce is both a processing technique and a programming model for distributed computing, used to deploy and process big data. Hadoop MapReduce is Hadoop’s data processing framework, built on the idea of ...
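The article’s WordCount is presumably written against the Java MapReduce API; purely as a sketch of the same map and reduce idea, here is a WordCount written for Hadoop Streaming in Python (the script name and structure are my assumptions, not the article’s code):

```python
# wordcount_streaming.py: WordCount as a Hadoop Streaming job (illustrative sketch).
# The same script is run once as the mapper ("map") and once as the reducer ("reduce").
import sys


def mapper():
    # Emit "word\t1" for every word read from stdin.
    for line in sys.stdin:
        for word in line.strip().split():
            print(f"{word}\t1")


def reducer():
    # Hadoop sorts mapper output by key, so identical words arrive consecutively.
    current_word, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t", 1)
        if word == current_word:
            count += int(value)
        else:
            if current_word is not None:
                print(f"{current_word}\t{count}")
            current_word, count = word, int(value)
    if current_word is not None:
        print(f"{current_word}\t{count}")


if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```

Such a script would typically be submitted through the hadoop-streaming jar, passing it once as the mapper (`map` argument) and once as the reducer (`reduce` argument).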

Docker Basics and Practice

This article is referenced and adapted from the Docker practice series on Viblo. Link to the series: Docker Practice from Basics. I. Installing Docker and Some Basic Concepts. Installation can be re...

Understanding Apache NiFi

Apache NiFi is used to automate and control data flows between systems. It provides a web-based interface for collecting, processing, and analyzing data. NiFi is known for its ability to buil...

Kafka In Depth

While working on a big data storage and processing project at school, I learned about Kafka and used it in that project. At the time, however, I only knew in simple terms that it was a message queue to pour d...

Commands for Manipulating Files and Directories on HDFS

The commands on HDFS are generally quite similar to the commands on Linux, both in their functions and their names. If you are already familiar with Linux/Ubuntu, you probably don’t need to learn mu...
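As a small illustration of that similarity, the sketch below wraps a few common `hdfs dfs` subcommands from Python via `subprocess`; the paths are made up, and it assumes the `hdfs` client is on the PATH with a reachable NameNode.

```python
# Sketch: HDFS shell commands mirror familiar Linux ones (mkdir, ls, cat, rm),
# invoked here via subprocess purely for illustration.
import subprocess


def hdfs(*args):
    """Run an 'hdfs dfs' subcommand and return its output as text."""
    result = subprocess.run(
        ["hdfs", "dfs", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    hdfs("-mkdir", "-p", "/user/demo")                 # like: mkdir -p /user/demo
    hdfs("-put", "local_file.txt", "/user/demo")       # copy a local file into HDFS
    print(hdfs("-ls", "/user/demo"))                   # like: ls /user/demo
    print(hdfs("-cat", "/user/demo/local_file.txt"))   # like: cat <file>
    hdfs("-rm", "-r", "/user/demo")                    # like: rm -r /user/demo
```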