Sunday, January 17, 2016

Spark: How to use it?

Spark is:

- A lightweight, fast cluster-computing framework
- A fast in-memory data processing engine
- Available with development APIs in Scala, Java, and Python
- A good fit when we need to execute machine learning algorithms efficiently
- Written in Scala (Scala is the primary implementation language of Spark)
- Well integrated with the Hadoop ecosystem and data sources (HDFS, Amazon S3, Hive, HBase, Cassandra, etc.)
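One detail worth knowing before reading the sample below: Spark transformations such as filter are lazy, and the work only happens when an action such as count or collect runs. Java 8 streams behave the same way, so a plain-Java sketch (no Spark needed; all names here are illustrative) can demonstrate the idea:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

// Plain-Java analogy for Spark's lazy evaluation (illustrative names only):
// the filter predicate runs only when a terminal operation is invoked,
// just as a Spark filter() runs only when an action like count() does.
public class LazyDemo {

    // Build a filtered pipeline; the counter records each predicate call.
    static Stream<String> errorLines(List<String> lines, AtomicInteger calls) {
        return lines.stream()
                .filter(s -> { calls.incrementAndGet(); return s.contains("ERROR"); });
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("INFO ok", "ERROR boom", "INFO done");
        AtomicInteger calls = new AtomicInteger();

        Stream<String> errors = errorLines(lines, calls);
        // Nothing has been evaluated yet:
        System.out.println("predicate calls before count: " + calls.get()); // 0

        long n = errors.count(); // the terminal operation triggers the work
        System.out.println("errors=" + n + ", predicate calls after: " + calls.get()); // errors=1, calls=3
    }
}
```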

Here we will walk through a sample that scans a log file and then reports statistics based on what we need.



import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class ErrorScanner {
    public static void main(String[] args) {
        String logFile = "/../../error_sample.log";
        SparkConf conf = new SparkConf().setAppName("The Techie house spark app").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Load the log file as an RDD of lines.
        JavaRDD<String> inputFile = sc.textFile(logFile);

        // Keep only the lines that contain "ERROR".
        JavaRDD<String> errors = inputFile.filter(new Function<String, Boolean>() {
            public Boolean call(String s) {
                return s.contains("ERROR");
            }
        });
        System.out.println("Total ERROR lines: " + errors.count());

        // Narrow the errors down to the MySQL-related ones.
        JavaRDD<String> mysqlErrors = errors.filter(new Function<String, Boolean>() {
            public Boolean call(String s) {
                return s.contains("MySQL");
            }
        });
        System.out.println("MySQL ERROR lines: " + mysqlErrors.count());

        // Bring all the error lines back to the driver and print them.
        for (String line : errors.collect()) {
            System.out.println(line);
        }
        sc.stop();
    }
}
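Spark isn't required to follow the logic of the job above. The sketch below uses plain Java 8 streams with a hypothetical in-memory stand-in for error_sample.log (an assumption, since the real file's contents aren't shown in the post) to apply the same ERROR/MySQL predicate chain locally:

```java
import java.util.Arrays;
import java.util.List;

public class ErrorScannerLocal {
    // Hypothetical sample lines standing in for error_sample.log.
    static final List<String> LINES = Arrays.asList(
            "INFO  starting up",
            "ERROR MySQL connection refused",
            "ERROR disk full",
            "WARN  retrying job",
            "ERROR MySQL lock wait timeout");

    // Same predicate as the Spark job's first filter.
    static long countErrors(List<String> lines) {
        return lines.stream().filter(s -> s.contains("ERROR")).count();
    }

    // Same chained predicates as the Spark job's second filter.
    static long countMySqlErrors(List<String> lines) {
        return lines.stream()
                .filter(s -> s.contains("ERROR"))
                .filter(s -> s.contains("MySQL"))
                .count();
    }

    public static void main(String[] args) {
        System.out.println("Total ERROR lines: " + countErrors(LINES));      // 3
        System.out.println("MySQL ERROR lines: " + countMySqlErrors(LINES)); // 2
    }
}
```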
Get earthquake alerts!

The same filtering idea extends to live data with Spark Streaming. This Scala snippet, using the Spark Streaming Twitter integration, filters a tweet stream for earthquake reports:

TwitterUtils.createStream(...)
    .filter(status =>
        status.getText.contains("Earthquake now!")
        || (status.getText.contains("earthquake") && status.getText.contains("shaking")))
