Sunday, January 17, 2016

Spark: Best Use Cases

The world is moving towards data-oriented apps, where the end user quickly gets the analysed result and business decisions are made from it.

Below are the use cases that call for a fast, Spark-based computing platform.
  • Early alerts: e.g. earthquake alerts from tweets.
  • Finance industry: fraud detection systems.
  • E-commerce: analyse user reviews and customer comments, then take a business decision.
  • Sports: identify patterns and respond accordingly.
  • Weather forecasting: use the capability to process huge amounts of data.




Users: Amazon, eBay, and Yahoo.
The largest known cluster running Spark has 8000 nodes.

Spark: How to use?

Spark is

-A lightweight, fast cluster computing framework
-A fast in-memory data processing engine
-Development APIs are available in Scala, Java and Python
-Can be used when we need to execute machine learning algorithms efficiently
-Spark itself is written in Scala

-Integrates well with the Hadoop ecosystem and data sources (HDFS, Amazon S3, Hive, HBase, Cassandra, etc.)

Here we will go through a sample that scans a log file and then reports stats based on the need.



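import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class ErrorScanner {
    public static void main(String[] args) {
        String logFile = "/../../error_sample.log";
        SparkConf conf = new SparkConf().setAppName("The Techie house spark app").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Load the log file as an RDD with one element per line.
        JavaRDD<String> inputFile = sc.textFile(logFile);

        // Keep only the lines that contain "ERROR".
        JavaRDD<String> errors = inputFile.filter(new Function<String, Boolean>() {
            public Boolean call(String s) {
                return s.contains("ERROR");
            }
        });
        System.out.println("Total errors: " + errors.count());

        // Narrow the errors down to the ones that mention MySQL.
        JavaRDD<String> mysqlErrors = errors.filter(new Function<String, Boolean>() {
            public Boolean call(String s) {
                return s.contains("MySQL");
            }
        });
        System.out.println("MySQL errors: " + mysqlErrors.count());

        // Bring the MySQL error lines back to the driver and print them.
        for (String line : mysqlErrors.collect()) {
            System.out.println(line);
        }

        // Print every error line.
        for (String line : errors.collect()) {
            System.out.println(line);
        }

        sc.stop();
    }
}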
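
The filter-and-count steps of the sample above can be tried on a few lines without starting Spark. Below is a minimal sketch in plain Java streams; the class name LogStats, the helper linesContaining and the sample log lines are all made up for illustration and are not part of Spark. It only mirrors the logic, so treat it as a quick sanity check rather than a replacement for the Spark job.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Spark: mirrors the filter/count steps locally.
class LogStats {
    // Keep only the lines that contain the given marker text.
    static List<String> linesContaining(List<String> lines, String marker) {
        return lines.stream()
                .filter(line -> line.contains(marker))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample lines, for illustration only.
        List<String> sample = Arrays.asList(
                "INFO server started",
                "ERROR MySQL connection refused",
                "ERROR disk full");

        List<String> errors = linesContaining(sample, "ERROR");
        List<String> mysqlErrors = linesContaining(errors, "MySQL");

        System.out.println("errors=" + errors.size() + ", mysql=" + mysqlErrors.size());
    }
}
```

With the three sample lines this prints errors=2, mysql=1, which is the same pair of numbers the two count() calls in the Spark sample would give for that input.
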

Get an earthquake alert!

TwitterUtils.createStream(...)
            .filter(status =>
                status.getText.contains("Earthquake now!") ||
                (status.getText.contains("earthquake") && status.getText.contains("shaking")))
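
The condition used in the filter above can be checked on its own before it is wired to a live stream. Here is a minimal sketch in plain Java, assuming the tweet text is already available as a String; the class name TweetAlert and the sample tweets are hypothetical, and no Twitter or Spark API is used.

```java
import java.util.function.Predicate;

// Hypothetical helper: the same text condition as the stream filter, on plain strings.
class TweetAlert {
    // Matches an explicit alert phrase, or a tweet that mentions
    // both "earthquake" and "shaking". The match is case-sensitive.
    static final Predicate<String> IS_EARTHQUAKE = text ->
            text.contains("Earthquake now!")
                    || (text.contains("earthquake") && text.contains("shaking"));

    public static void main(String[] args) {
        System.out.println(IS_EARTHQUAKE.test("Earthquake now!"));                  // true
        System.out.println(IS_EARTHQUAKE.test("earthquake? the floor is shaking")); // true
        System.out.println(IS_EARTHQUAKE.test("reading about an earthquake"));      // false
    }
}
```

Because contains is case-sensitive, a tweet such as "EARTHQUAKE, everything is shaking" would not match; lower-casing the text first is one option if that matters.
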

Monday, January 11, 2016

Code coverage

Code coverage plays an essential part in development and helps to ensure that code is delivered with quality.

Below are the code coverage tools that can be used during development:

1. Eclipse coverage plugin (EclEmma): http://eclemma.org/