Spark is
- A lightweight, fast cluster-computing framework
- A fast, in-memory data processing engine
- Available with development APIs in Scala, Java, and Python
- A good fit when we need to execute machine learning algorithms efficiently
- Written in Scala; Scala is also one of the languages you can write Spark applications in
- Well integrated with the Hadoop ecosystem and its data sources (HDFS, Amazon S3, Hive, HBase, Cassandra, etc.)
Here we will walk through a sample that scans a log file and then reports statistics based on what we need.
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class ErrorScanner {
    public static void main(String[] args) {
        String logFile = "/../../error_sample.log";

        SparkConf conf = new SparkConf().setAppName("The Techie house spark app").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Load the log file as an RDD of lines
        JavaRDD<String> inputFile = sc.textFile(logFile);

        // Keep only the lines that contain "ERROR"; cache() because we reuse this RDD
        JavaRDD<String> errors = inputFile.filter(new Function<String, Boolean>() {
            public Boolean call(String s) {
                return s.contains("ERROR");
            }
        }).cache();

        // Total number of error lines
        System.out.println("Errors: " + errors.count());

        // Error lines that also mention MySQL
        JavaRDD<String> mysqlErrors = errors.filter(new Function<String, Boolean>() {
            public Boolean call(String s) {
                return s.contains("MySQL");
            }
        });
        System.out.println("MySQL errors: " + mysqlErrors.count());

        // Bring the error lines back to the driver and print them
        for (String line : errors.collect()) {
            System.out.println(line);
        }

        sc.stop();
    }
}
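The core of the job above is just a filter-then-count pattern. As a plain-Java illustration of that same pattern (no Spark cluster required), here is a minimal sketch using the JDK Stream API on a small hypothetical list of log lines standing in for error_sample.log:

```java
import java.util.Arrays;
import java.util.List;

public class FilterDemo {
    public static void main(String[] args) {
        // Hypothetical sample lines standing in for error_sample.log
        List<String> lines = Arrays.asList(
                "INFO starting up",
                "ERROR MySQL connection refused",
                "ERROR disk full",
                "INFO shutting down");

        // Same predicates the Spark job applies, but on an in-memory stream
        long errorCount = lines.stream()
                .filter(s -> s.contains("ERROR"))
                .count();
        long mysqlErrors = lines.stream()
                .filter(s -> s.contains("ERROR"))
                .filter(s -> s.contains("MySQL"))
                .count();

        System.out.println(errorCount + " errors, " + mysqlErrors + " MySQL-related");
    }
}
```

The difference with Spark is where the work runs: a JavaRDD distributes the same filters across the cluster and only actions like count() or collect() trigger execution.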
Get earthquake alerts!
Spark Streaming can apply the same kind of filtering to live data. For example, with the Twitter integration (in Scala):

TwitterUtils.createStream(...)
  .filter(status =>
    status.getText.contains("Earthquake now!")
      || (status.getText.contains("earthquake") && status.getText.contains("shaking")))
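Note the operator precedence in that filter: && binds tighter than ||, so a tweet matches if it contains the exact phrase "Earthquake now!", or contains both "earthquake" and "shaking". A small plain-Java sketch of that predicate, with hypothetical sample texts and no Spark required:

```java
public class QuakePredicate {
    // Mirrors the streaming filter above: exact alert phrase, OR both keywords
    static boolean matches(String text) {
        return text.contains("Earthquake now!")
                || (text.contains("earthquake") && text.contains("shaking"));
    }

    public static void main(String[] args) {
        System.out.println(matches("Earthquake now! in the bay area"));    // true: exact phrase
        System.out.println(matches("the ground is shaking, earthquake?")); // true: both keywords
        System.out.println(matches("earthquake drill at school"));         // false: no "shaking"
    }
}
```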