Posts Project Socket Stream with Spark Streaming
Post
Cancel

Project Socket Stream with Spark Streaming

In this post, we consider a small example with Spark Streaming. My work is creating a project with Spark Streaming listen in port 7777 and filter line contain “error” word and print it to console.

Project preparation

This is a simple project write by Scala. You can see all project in https://github.com/demanejar/socket-stream, please clone this project to your local before run it.

There are 2 main file in this project, first is SocketStream.scala file:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
import org.apache.spark._
import org.apache.spark.SparkContext._
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.Duration
import org.apache.spark.streaming.Seconds

object SocketStream {
	def main(args : Array[String]){
		val conf = new SparkConf().setAppName("Socket-Stream")
		val ssc = new StreamingContext(conf, Seconds(1))
		val lines = ssc.socketTextStream("localhost", 7777)
		val errorLines = lines.filter(_.contains("error"))
		errorLines.print()
		ssc.start()
		ssc.awaitTermination()
	}
}
  • If you run project in cluster with multi node, change hostname from localhost to master node IP.
  • The task of this file is creating StreamingContext listening in port 7777, filter line contain error word and print it to console.

The remaining file is build.sbt:

name := "socket-stream"
version := "0.0.1"
scalaVersion := "2.11.12"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.4.1" % "provided",
"org.apache.spark" %% "spark-streaming" % "2.4.1"
)
  • We must build project to file .jar before submit it to Spark cluster with spark-submit.
  • If your computer no install sbt, you can refer how to install sbt in https://www.scala-sbt.org/download.html.

Run project and see result

Start Spark and check status via UI address of master node in port 8080 (spark://PC0628:7077):

Build project to .jar file with sbt command:

1
sbt clean package

Run spark-submit with .jar file just create:

1
spark-submit --master spark://PC0628:7077 --class SocketStream target/scala-2.11/socket-stream_2.11-0.0.1.jar
  • master is address of master node
  • class is path to main function of project

Open other terminal and run command below to start sending text through port 7777:

1
nc -l 7777

Result in console is very fast:

Open port 4040 to see again detail job just done (localhost:4040):

Refer: Learning Spark - Zaharia M., et al. (trang 184)

This post is licensed under CC BY 4.0 by the author.