In this post, we consider a small example with Spark Streaming. My work is creating a project with Spark Streaming listen in port 7777 and filter line contain “error” word and print it to console.
Project preparation
This is a simple project write by Scala. You can see all project in https://github.com/demanejar/socket-stream, please clone this project to your local before run it.
There are 2 main file in this project, first is SocketStream.scala
file:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
import org.apache.spark._
import org.apache.spark.SparkContext._
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.Duration
import org.apache.spark.streaming.Seconds
object SocketStream {
def main(args : Array[String]){
val conf = new SparkConf().setAppName("Socket-Stream")
val ssc = new StreamingContext(conf, Seconds(1))
val lines = ssc.socketTextStream("localhost", 7777)
val errorLines = lines.filter(_.contains("error"))
errorLines.print()
ssc.start()
ssc.awaitTermination()
}
}
- If you run project in cluster with multi node, change hostname from
localhost
to master node IP. - The task of this file is creating
StreamingContext
listening in port 7777, filter line containerror
word and print it to console.
The remaining file is build.sbt
:
name := "socket-stream"
version := "0.0.1"
scalaVersion := "2.11.12"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.4.1" % "provided",
"org.apache.spark" %% "spark-streaming" % "2.4.1"
)
- We must build project to file
.jar
before submit it to Spark cluster with spark-submit. - If your computer no install
sbt
, you can refer how to installsbt
in https://www.scala-sbt.org/download.html.
Run project and see result
Start Spark and check status via UI address of master node in port 8080 (spark://PC0628:7077
):
Build project to .jar
file with sbt
command:
1
sbt clean package
Run spark-submit with .jar
file just create:
1
spark-submit --master spark://PC0628:7077 --class SocketStream target/scala-2.11/socket-stream_2.11-0.0.1.jar
master
is address of master nodeclass
is path to main function of project
Open other terminal and run command below to start sending text through port 7777:
1
nc -l 7777
Result in console is very fast:
Open port 4040 to see again detail job just done (localhost:4040
):
Refer: Learning Spark - Zaharia M., et al. (trang 184)