Sunday, January 17, 2016

Spark: How to use it?

Spark is:

- A lightweight, fast cluster-computing framework
- A fast in-memory data processing engine
- Available with development APIs in Scala, Java, and Python
- A good fit when we need to execute machine learning algorithms efficiently
- Written in Scala (Scala is the primary implementation language of Spark)
- Well integrated with the Hadoop ecosystem and data sources (HDFS, Amazon S3, Hive, HBase, Cassandra, etc.)
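One detail worth knowing before reading the sample below: Spark transformations such as filter are lazy, and the work only happens when an action such as count or collect runs. Java 8 streams behave the same way, so a plain-Java sketch (no Spark needed; all names here are illustrative) can demonstrate the idea:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

// Plain-Java analogy for Spark's lazy evaluation (illustrative names only):
// the filter predicate runs only when a terminal operation is invoked,
// just as a Spark filter() runs only when an action like count() does.
public class LazyDemo {

    // Build a filtered pipeline; the counter records each predicate call.
    static Stream<String> errorLines(List<String> lines, AtomicInteger calls) {
        return lines.stream()
                .filter(s -> { calls.incrementAndGet(); return s.contains("ERROR"); });
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("INFO ok", "ERROR boom", "INFO done");
        AtomicInteger calls = new AtomicInteger();

        Stream<String> errors = errorLines(lines, calls);
        // Nothing has been evaluated yet:
        System.out.println("predicate calls before count: " + calls.get()); // 0

        long n = errors.count(); // the terminal operation triggers the work
        System.out.println("errors=" + n + ", predicate calls after: " + calls.get()); // errors=1, calls=3
    }
}
```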

Here we will walk through a sample that scans a log file and then reports statistics based on what we need.



import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class ErrorScanner {
    public static void main(String[] args) {
        String logFile = "/../../error_sample.log";
        SparkConf conf = new SparkConf().setAppName("The Techie house spark app").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Load the log file as an RDD of lines.
        JavaRDD<String> inputFile = sc.textFile(logFile);

        // Keep only the lines that contain "ERROR".
        JavaRDD<String> errors = inputFile.filter(new Function<String, Boolean>() {
            public Boolean call(String s) {
                return s.contains("ERROR");
            }
        });
        System.out.println("Total ERROR lines: " + errors.count());

        // Narrow the errors down to the MySQL-related ones.
        JavaRDD<String> mysqlErrors = errors.filter(new Function<String, Boolean>() {
            public Boolean call(String s) {
                return s.contains("MySQL");
            }
        });
        System.out.println("MySQL ERROR lines: " + mysqlErrors.count());

        // Bring all the error lines back to the driver and print them.
        for (String line : errors.collect()) {
            System.out.println(line);
        }
        sc.stop();
    }
}
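Spark isn't required to follow the logic of the job above. The sketch below uses plain Java 8 streams with a hypothetical in-memory stand-in for error_sample.log (an assumption, since the real file's contents aren't shown in the post) to apply the same ERROR/MySQL predicate chain locally:

```java
import java.util.Arrays;
import java.util.List;

public class ErrorScannerLocal {
    // Hypothetical sample lines standing in for error_sample.log.
    static final List<String> LINES = Arrays.asList(
            "INFO  starting up",
            "ERROR MySQL connection refused",
            "ERROR disk full",
            "WARN  retrying job",
            "ERROR MySQL lock wait timeout");

    // Same predicate as the Spark job's first filter.
    static long countErrors(List<String> lines) {
        return lines.stream().filter(s -> s.contains("ERROR")).count();
    }

    // Same chained predicates as the Spark job's second filter.
    static long countMySqlErrors(List<String> lines) {
        return lines.stream()
                .filter(s -> s.contains("ERROR"))
                .filter(s -> s.contains("MySQL"))
                .count();
    }

    public static void main(String[] args) {
        System.out.println("Total ERROR lines: " + countErrors(LINES));      // 3
        System.out.println("MySQL ERROR lines: " + countMySqlErrors(LINES)); // 2
    }
}
```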
Get earthquake alerts!

The same filtering idea extends to live data with Spark Streaming. This Scala snippet, using the Spark Streaming Twitter integration, filters a tweet stream for earthquake reports:

TwitterUtils.createStream(...)
    .filter(status =>
        status.getText.contains("Earthquake now!")
        || (status.getText.contains("earthquake") && status.getText.contains("shaking")))
