As a Red-Blooded American Capitalist, Why Read Karl Marx?

Some of you may be asking yourselves why I have read Marx, and why I am even willing to go see his statue while I’m in town in Berlin. The answer is simple: “Know thy self, know thy enemy. A thousand battles, a thousand victories.” – Sun Tzu, The Art of War. And yes, Socialism in all its forms is my enemy. Why? Because socialism is the enemy of individual freedom by its very definition. And an enemy of freedom is also my enemy…

Know thy Enemy…

~Robert; Germany, July 2018

Posted in Philosophy, Politics, Society

We Need An Internet First Amendment NOW!

Let’s get the Hashtag: #InternetFirstAmendment trending!

Today, Alex Jones’ InfoWars was removed from the social media platforms Facebook, Spotify, YouTube, and Apple iTunes Podcasts. This is a major attack on freedom of expression on today’s Internet.

You don’t need to like Alex Jones or agree with him, but he does have the right to his opinions, and he should have the protected right to say what he wishes on the Internet.

It should be up to the consumer to “change the channel” if they don’t like what his brand of content says. We cannot surrender our right to share ideas and to read other people’s ideas to a handful of big tech firms. The Internet has become the new Public Forum, and therefore our speech and writing on the Internet need to be protected by the First Amendment.

When I started creating content on the Internet back in 1994, I was in high school, creating HTML pages by hand on a web hosting platform known as GeoCities.

On today’s Internet, users do not need to know how to code even a simple markup language like HTML; instead they can use “Social Media” tools like Facebook and Twitter, or post videos to YouTube and other video hosting sites.

It has become easier than ever to use the Internet to share our ideas, and for most of us, at least in the Western world, the majority of our daily communications are now done on the Internet, usually via a couple dozen web sites at most.

Internet Censorship is on the rise, and we need to put a stop to it once and for all.

I am a capitalist through and through, but more than that I’m an American and somewhat of a Constitutionalist, especially when it comes to our rights, like Freedom of Speech and Freedom of the Press.

I know the First Amendment is meant to protect speech in the public square; however, these Social Media sites are the new public square.

So I’m asking everyone who believes in their own right to share their thoughts, their passions, and their opinions, to start making calls and writing to your Senators and Representatives, at the Federal level but also at the State level. Please ask them to work towards a bill, and hopefully an Amendment, that basically says that if a Technology Company, Web Hosting Service, Domain Registrar, Social Media Platform, or Media Sharing Platform like YouTube and Apple’s iTunes, as well as Search Engines like Google and Bing, wants to continue to operate within the United States, it MUST respect the First Amendment.

We aren’t talking about making private companies government owned; but just like any other telecommunications company, such as your Land Line Phone Company, Cellular Phone Company, your Cable Company, and Broadcast Radio and TV, among others, they need to be regulated to prevent them from removing anyone’s content.

Let it fall to the realm of the US Courts to determine if someone’s account violates an actual Law. This way everyone’s Due Process Rights are protected, and everyone’s rights to a fair and free Public Forum on the Internet are protected.

Levels of Internet Censorship Slide:


Posted in Politics, Society, Technology

Red Tide: Prologue

By the year 2045, the Democrat Party in the United States was in shambles; a mere shadow of its former self. After approximately four major US Senate election cycles and seven US Presidential election cycles, the Democratic Party had only continued to fracture between what in 2018 were known as the Neo-Liberals and the “Alt-Left” (also known as the Far Left in certain circles). The DSA (Democratic Socialists of America) continued to gain traction among the Alt-Left, but was never able to win more than a handful of seats in both houses of Congress and never had a truly viable contender for the Presidency. However, the Alt-Left on both coasts of the United States, and in a handful of other states where liberals had migrated in overwhelming numbers, bringing their socialist ideas along with them, became ever more disgruntled with their battle to wage a political war using the System. There had been cries from the Alt-Left since before President Donald J. Trump’s first election win in November 2016 that the “System” was too broken to fix from within; protesters and the far left had said the “Only Solution is Revolution”, and that call only grew stronger in the 29 years since that monumental election of 2016.

Meanwhile, the policies of President Trump, the Republicans, and those few Democrats who decided to break with party lines, reaching across the aisle and joining the new Renaissance in America, created a very powerful Economy for the American People. The picture was not as rosy for the rest of the world, and wherever Socialism gained power, those nations fell into Economic and Social decay. Still, with all this proof, both at home and abroad, the Alt-Left ignored the prosperity they saw around them, especially in the more classically liberal and therefore libertarian States within the Union. The arrogance of the “Alt-Left” and all other socialists (coastal elites) was that they “knew” what was better for our country, and that the “corrupt” system of Capitalism, at least in their view, was somehow keeping itself afloat by the top 5% of income earners pushing down on the “bottom” 95%. What started as a fight against the 1% by Occupy in 2011 slowly turned into the “other 98%”, then into the fight against the 5%. Some of us knew this was inevitable; the socialists tried to push for a fight against the top 10% but faced resistance at that level, yet they were able to sustain something of a political battle against the top 5% in the United States, starting and continuing to wage a Class War in the Alt-Left strongholds like California and New York.

While the country seemed united under economic prosperity, dissent, disdain, and even hatred continued to grow in these Socialist States, brought on by a sense that Socialism would never take hold within the United States at the Federal level. By 2045 this hatred from the Socialist States on the coasts for the Capitalist Libertarian States in the rest of the country had reached a fever pitch. Movements like Calexit were coming up in political debates, and major protests, sometimes violent, erupted across all the States where Socialism had taken hold of their major cities; for it was only in the cities that a lack of true freedom of the individual, and instead Groupthink, prevailed…

Posted in Red Tide

What is Red Tide?

Red Tide is a fictional universe I’m creating, based on this premise: the main story starts in the year 2045. Since the first election of Donald J. Trump, America has enjoyed a very strong economy with lower taxes and deregulation. For the more libertarian or “classical liberal” States within the Union, prosperity was clearly visible, while in the States, especially on the two coasts, that embraced a more Socialist attitude, economic and social decay advanced. In the 29 years between 2016 and 2045, the Alt-Left continued to become disillusioned with the American System of Government, because despite all their work trying to gain political power through our free elections, they only managed to gain a handful of seats in either of the Congressional Houses and never put forth a viable candidate for the Presidency. So the Alt-Left turned further inward, continuing to espouse the slogan they have used since 2016: “The Only Solution is Revolution”.

The Red Tide fictional universe is about how the Alt-Left destroyed the Democratic Party in the United States, and it follows their attempt to tear apart our great nation from within; attempting to replace our legitimately elected Government with a Communist Regime, using subversive social tactics to rile up their small base of Socialists to lead a legion made up of students and illegal migrants who have been brainwashed by the ever-growing and dangerous group of Marxist Radicals within the Public K-12 Education System, the University System, and the Far-Left-Leaning State Governments (Coastal Elites) within the United States.

This story is supposed to be controversial. It’s supposed to make you think. And my hope, in the end, is that it helps to establish a dialog on why Socialism in all its forms is inherently against our basic human rights to Life, Liberty, and the Pursuit of Happiness.

Posted in Red Tide

Calculating the value of Phi aka the Golden Ratio in Perl

I was watching a video by 3blue1brown and decided to code up the Continued Fraction of “phi”, aka “The Golden Ratio”. It’s pretty simple: it’s basically a recursive function (here’s a link to the Mathematical Definition of Recursion and the Programming Definition of Recursion) of 1 + 1/x, but you can also code it as a loop. I did a quick Perl script using an infinite loop. You first hit the golden ratio at 9 iterations, and at 11 iterations of the loop the value becomes stable at a precision of 6 decimal places. Here’s the script and output:

Perl Script to calculate Phi using the Continued Fraction (a function call in a loop here):
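Something along these lines; this is a minimal sketch of the loop I described above, not necessarily the exact original script, and the 21-iteration cutoff simply mirrors where I paused the output:

#!/usr/bin/perl
use strict;
use warnings;

# Continued fraction for Phi: repeatedly apply x = 1 + 1/x
my $x = 1;
my $i = 0;

while (1) {
  $i++;
  $x = 1 + (1 / $x);
  printf("Iteration %2d: %.6f\n", $i, $x);

  last if $i >= 21;  # stop here for this example; the real run was an infinite loop I paused
}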

Standard Output paused at 21 iterations of the loop:

Posted in Science

Wisdom is Real and Meaningful

Back in College, a humanities professor of mine postulated that Wisdom is meaningless: there is an assumption that just because someone is older they are wise, when in fact someone who is younger and educated via the modern western education system is “smarter” than an older person who is supposedly wise. At the time I agreed with him; however, given my own life experience, I believe he and I were both wrong in this assumption. While not all older people are wise, true wisdom does exist.

We see similar terms all the time, such as Street Smarts or Common Sense. These are real. And this type of knowledge is sometimes difficult to explain via writing, in the traditional sense of how western education now takes place. In the past we valued apprenticeships, and I believe this was a type of education that imparted wisdom and knowledge onto the apprentice from the master. So yes, Wisdom is real, and it is extremely valuable. Wisdom, and teaching styles such as Apprenticeship, are extremely useful tools in the passing on and development of knowledge.

Happy Easter!

~Robert

Posted in Philosophy

Master Map-Reduce Job – The One and Only ETL Map-Reduce Job you will ever have to write!

It’s fitting that my first article on Big Data would be titled the “Master Map-Reduce Job”. I believe it truly is the one and only Map-Reduce job you will ever have to write, at least for ETL (Extract, Transform and Load) processes. I have been working with Big Data, and specifically with Hadoop, for about two years now, and I achieved my Cloudera Certified Developer for Apache Hadoop (CCDH) certification almost a year before the writing of this post.

So what is the Master Map-Reduce Job? Well, it is a concept I started to architect: a framework-level Map-Reduce job implementation that by itself is not a complete job, but instead uses Dependency Injection, i.e. a plugin-like framework, to configure a Map-Reduce job specifically for ETL load processes.

Like most frameworks, you can write your process without it; however, what the Master Map-Reduce Job (MMRJ) does is break down certain critical sections of the standard Map-Reduce job program into plugins that are named more specifically for ETL processing, so it makes the jump from non-Hadoop based ETL to Hadoop based ETL easier for non-Hadoop-initiated developers.

I think this job is also extremely useful for the Map-Reduce pro who is implementing ETL jobs, or for groups of ETL developers that want to create consistent Map-Reduce based loaders, and that’s the real point of the MMRJ: to create a framework for developers to use that will enable them to create robust, consistent, and easily maintainable Map-Reduce based loaders. It follows my SFEMS – Stable, Flexible, Extensible, Maintainable, Scalable – development philosophy.

The point of the Master Map Reduce concept framework is to break down the Driver, Mapper, and Reducer into parts that non-Hadoop/Map-Reduce programmers are already familiar with, especially in the ETL world. It is easy for Java developers who build Loaders for a living to understand vocabulary like Validator, Transformer, Parser, OutputFormatter, etc. They can focus on writing business-specific logic, and they do not have to worry about the finer points of Map-Reduce.

As a manager, you can now hire a single senior Hadoop/Map-Reduce developer and hire normal core Java developers for the rest of your team, or better yet reuse your existing team. You can have the one senior Hadoop developer maintain your version of the Master Map-Reduce Job framework code, while the rest of your developers focus on developing feed-level loader processes using the framework. In the end all developers can learn Map-Reduce, but you do not need to know Map-Reduce to get started writing loaders that will run on the Hadoop cluster using this framework.

The design is simple and can be shown in this one diagram:

Master_Map-Reduce_Job_Diagram

One of the core concepts that separates the Master Map-Reduce Job Conceptual Framework from a normal Map-Reduce Job is how the Mapper and Reducer are structured: the logic that would normally be written directly in the map and reduce functions is externalized into classes that use vocabulary natively familiar to ETL Java Developers, such as Validator, Parser, Transformer, and Output Formatter. It is this externalization that simplifies ETL Map-Reduce development. I believe that what confuses developers about how to make Map-Reduce jobs work as robust ETL processes is that it’s too low level. A developer who does not have experience writing complex Map-Reduce jobs will take one look at a map function and a reduce function and say it’s too low level, and perhaps even “I’m not sure exactly what they expect me to do with this.” Developers can be quickly turned off by the raw, low-level, although tremendously powerful, interface that Map-Reduce exposes.
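The plugin interfaces themselves are not listed in this post, but based on how they are used in the Mapper and Reducer code below, a rough sketch of what they might look like is as follows (the method signatures are inferred from the calls in the code, so treat this as an assumption rather than the framework’s actual source; each interface would live in its own source file):

//Plugin contracts used by the Master Mapper and Reducer.
//MasterMapReduceException is the framework's checked exception seen in the catch blocks below.
public interface Validator {
  boolean validateRecord(String record) throws MasterMapReduceException;
  boolean validateFields(String[] fields) throws MasterMapReduceException;
}

public interface Parser {
  Object[] parse(String record) throws MasterMapReduceException;
}

public interface Transformer {
  Object[] runMapSideTransform(Object[] fields) throws MasterMapReduceException;
  Object runReduceSideTransform(Object data) throws MasterMapReduceException;
}

public interface OutputFormatter {
  void writeMapSideFormat(Object key, Object[] fields, org.apache.hadoop.mapred.OutputCollector output) throws MasterMapReduceException;
  void writeReduceSideFormat(Object data, org.apache.hadoop.mapred.OutputCollector output) throws MasterMapReduceException;
}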

The code below is the most valuable architectural asset of the framework: in the Master Map-Reduce Job Conceptual Framework, the map method of the Mapper class is broken down into a very simple process flow of FIVE steps that will make sense to any ETL Developer. Please read through the comments for each step. Also note that the same thing is done for the Reducer, but only the Transformer and Output Formatter are used.

The Map Function turned into an ETL Process Goldmine:


@Override
public void map(LongWritable key, Text value, OutputCollector output, Reporter reporter) throws IOException {
  String record;
  String[] fields;

  try {
    //First validate the record
    record = value.toString();

    if (validator.validateRecord(record)) {
      //Second Parse valid records into fields
      fields = (String[]) parser.parse(record);

      //Third validate individual tokens or fields
      if (validator.validateFields(fields)) {
        //Fourth run transformation logic
        fields = (String[]) transformer.runMapSideTransform(fields);

        //Fifth output transformed records
        outputFormatter.writeMapSideFormat(key, fields, output);
      }
      else {
        //One or more fields are invalid!
        //For now just record that
        reporter.getCounter(MasterMapReduceCounters.VALIDATION_FAILED_RECORD_CNT).increment(1);
      }
    } //End if validator.validateRecord
    else {
      //Record is invalid!
      //For now just record, but perhaps more logic
      //to stop the loader if a threshold is reached
      reporter.getCounter(MasterMapReduceCounters.VALIDATION_FAILED_RECORD_CNT).increment(1);
    }
  } //End try block
  catch (MasterMapReduceException e) {
    throw new IOException(e);
  }
}

Source Code for the Master Map-Reduce Concept Framework:

The source code here should be considered a work in progress. I make no claims as to whether this actually works, nor has it been stress tested in any way; it should only be used as a reference. Do not use it directly in mission critical or production applications.

All Code on this page is released under the following open source license:

Copyright 2016 Robert C. Ilardi
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

MasterMapReduceDriver.java – This class is a generic Map-Reduce Driver program which makes use of two classes from the MasterMapReduce concept framework: the “MasterMapReduceConfigDao” and the “PluginController”. Both are responsible for returning configuration data to the MasterMapReduceDriver, as well as (as we will see later on) to the Master Mapper and Master Reducer. The MasterMapReduceConfigDao is a standard Data Access Object implementation that wraps data access to HBase, where configuration tables are created that use a “Feed Name” as the row key and have various columns that represent class names or other configuration information such as Job Name, Reducer Task count, etc. The PluginController is a higher level wrapper around the DAO itself; whereas the DAO is responsible for low level data access to HBase, the PluginController does the class creation and other high level functions that make use of the data returned by the DAO. We do not present the implementations of the DAO or the PluginController here because they are simple POJOs that you should implement based on your configuration strategy. Instead of HBase, for example, it could be done via a set of plain text files on HDFS or even the local file system.

The Master Map Reduce Driver is responsible for setting up the Map-Reduce Job just like any other standard Map-Reduce Driver. The main difference is that it has been written to make use of the Plugin architecture to configure the job’s parameters dynamically.
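Since the DAO and PluginController are left for you to implement, here is a bare-bones outline of the methods the Driver, Mapper, and Reducer below expect the PluginController to provide (the signatures are inferred from how it is used in the listings; this is a skeleton for reference, not the actual implementation):

package com.roguelogic.mrloader;

//Skeleton of the PluginController expected by the MMR Driver, Mapper, and Reducer.
//A real implementation would read plugin class names and job settings via the MasterMapReduceConfigDao.
public class PluginController {

  private MasterMapReduceConfigDao confDao;

  public void setConfigurationDao(MasterMapReduceConfigDao confDao) {
    this.confDao = confDao;
  }

  public void init() {
    //Load plugin class names and job settings for the feed via confDao here
  }

  //Driver-level plugins and settings
  public Class getMapper() { return null; }
  public Class getReducer() { return null; }
  public Class getInputFormat() { return null; }
  public Class getOutputFormat() { return null; }
  public Class getOutputKey() { return null; }
  public Class getOutputValue() { return null; }
  public Class getPartitioner() { return null; }
  public Class getCombiner() { return null; }
  public boolean hasExplicitReducerCount() { return false; }
  public int getReducerCount() { return 0; }

  //Mapper/Reducer-level plugins
  public Validator getValidator() { return null; }
  public Parser getParser() { return null; }
  public Transformer getTransformer() { return null; }
  public OutputFormatter getOutputFormatter() { return null; }
}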

/**
 * Created Feb 1, 2016
 */
package com.roguelogic.mrloader;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * @author Robert C. Ilardi
 */
public class MasterMapReduceDriver extends Configured implements Tool {

  public static final String MMR_FEED_NAME = "RL.MasterMapReduce.FeedName";

  private MasterMapReduceConfigDao confDao;
  private PluginController pluginController;

  private String feedName;
  private String mmrJobName;
  private String inputPath;
  private String outputPath;

  public MasterMapReduceDriver() {
    super();
  }

  public synchronized void init(String feedName) {
    System.out.println("Initializing MasterMapReduce Driver for Feed Name: " + feedName);

    this.feedName = feedName;

    //Create MMR Configuration DAO (Data Access Object)
    confDao = new MasterMapReduceConfigDao();
    confDao.init(feedName); //Initialize Config DAO for specific Feed Name

    //Read Driver Level Properties
    mmrJobName = confDao.getLoaderJobNameByFeedName();
    inputPath = confDao.getLoaderJobInputPath();
    outputPath = confDao.getLoaderJobOutputPath();

    //Configure MMR Plugin Controller
    pluginController = new PluginController();
    pluginController.setConfigurationDao(confDao);
    pluginController.init();
  }

  @Override
  public int run(String[] args) throws Exception {
    JobConf jConf;
    Configuration conf;
    int res;

    conf = getConf();

    jConf = new JobConf(conf, this.getClass());
    jConf.setJarByClass(this.getClass());

    //Set some shared parameters to send to Mapper and Reducer
    jConf.set(MMR_FEED_NAME, feedName);

    configureBaseMapReduceComponents(jConf);
    configureBaseMapReduceOutputFormat(jConf);
    configureBaseMapReduceInputFormat(jConf);

    res = startMapReduceJob(jConf);

    return res;
  }

  private void configureBaseMapReduceInputFormat(JobConf jConf) {
    Class clazz;

    clazz = pluginController.getInputFormat();
    jConf.setInputFormat(clazz);

    FileInputFormat.setInputPaths(jConf, new Path(inputPath));
  }

  private void configureBaseMapReduceOutputFormat(JobConf jConf) {
    Class clazz;

    clazz = pluginController.getOutputKey();
    jConf.setOutputKeyClass(clazz);

    clazz = pluginController.getOutputValue();
    jConf.setOutputValueClass(clazz);

    clazz = pluginController.getOutputFormat();
    jConf.setOutputFormat(clazz);

    FileOutputFormat.setOutputPath(jConf, new Path(outputPath));
  }

  private void configureBaseMapReduceComponents(JobConf jConf) {
    Class clazz;
    int cnt;

    //Set Mapper Class
    clazz = pluginController.getMapper();
    jConf.setMapperClass(clazz);

    //Optionally Set Custom Reducer Class
    clazz = pluginController.getReducer();
    if (clazz != null) {
      jConf.setReducerClass(clazz);
    }

    //Optionally explicitly set number of reducers if available
    if (pluginController.hasExplicitReducerCount()) {
      cnt = pluginController.getReducerCount();
      jConf.setNumReduceTasks(cnt);
    }

    //Set Partitioner Class if a custom one is required for this Job
    clazz = pluginController.getPartitioner();
    if (clazz != null) {
      jConf.setPartitionerClass(clazz);
    }

    //Set Combiner Class if a custom one is required for this Job
    clazz = pluginController.getCombiner();
    if (clazz != null) {
      jConf.setCombinerClass(clazz);
    }
  }

  private int startMapReduceJob(JobConf jConf) throws IOException {
    int res;
    RunningJob job;

    job = JobClient.runJob(jConf);
    res = 0;

    return res;
  }

  public static void main(String[] args) {
    int exitCd;
    MasterMapReduceDriver mmrDriver;
    Configuration conf;
    String feedName;

    if (args.length < 1) {
      exitCd = 1;
      System.err.println("Usage: java " + MasterMapReduceDriver.class + " [FEED_NAME]");
    }
    else {
      try {
        feedName = args[0];

        conf = new Configuration();
        mmrDriver = new MasterMapReduceDriver();
        mmrDriver.init(feedName);

        exitCd = ToolRunner.run(conf, mmrDriver, args);
      } //End try block
      catch (Exception e) {
        exitCd = 1;
        e.printStackTrace();
      }
    }

    System.exit(exitCd);
  }

}



BaseMasterMapper.java – This class is an abstract base class that implements the configure method of the Mapper, making use of the DAO and PluginController already described above. It should be extended by all the Mapper implementations you use when creating a Map-Reduce job with the Master Map Reduce concept framework. In the future we might create additional helper functions in this class for the mappers to use. In the end you only need a finite number of Mapper implementations; it is envisioned that the number of mappers is related more to the number of file formats you have, not the number of feeds. The idea of the framework is that you do not have to write the lower level components of a Map-Reduce job at the feed level; instead developers should focus on the business logic, such as Validation logic and Transformation logic. The fact that this logic runs in a Map-Reduce job is simply because it needs to run on the Hadoop cluster. Otherwise these loader jobs execute logic like any other standard Loader job running outside of the Hadoop cluster.

/**
 * Created Feb 1, 2016
 */
package com.roguelogic.mrloader;

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

/**
 * @author Robert C. Ilardi
 */
public abstract class BaseMasterMapper extends MapReduceBase {

  protected String feedName;

  protected MasterMapReduceConfigDao confDao;
  protected PluginController pluginController;

  protected Validator validator; //Used to validate Records and Fields
  protected Parser parser; //Used to parse records into fields
  protected Transformer transformer; //Used to run transformation logic on fields
  protected OutputFormatter outputFormatter; //Used to write out formatted records

  public BaseMasterMapper() {
    super();
  }

  @Override
  public void configure(JobConf conf) {
    feedName = conf.get(MasterMapReduceDriver.MMR_FEED_NAME);

    confDao = new MasterMapReduceConfigDao();
    confDao.init(feedName);

    pluginController = new PluginController();
    pluginController.setConfigurationDao(confDao);
    pluginController.init();

    validator = pluginController.getValidator();
    parser = pluginController.getParser();
    transformer = pluginController.getTransformer();
    outputFormatter = pluginController.getOutputFormatter();
  }

}



BaseMasterReducer.java – Just like on the Mapper side, this class is the base class for all Reducer implementations that are used with the Master Map-Reduce Job framework. Like the BaseMasterMapper class, it implements the configure method and provides access to the DAO and PluginController for reducer implementations. Again, in the future we may expand this to include additional helper functions.

/**
 * Created Feb 1, 2016
 */
package com.roguelogic.mrloader;

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

/**
 * @author Robert C. Ilardi
 */
public abstract class BaseMasterReducer extends MapReduceBase {

  protected String feedName;

  protected MasterMapReduceConfigDao confDao;
  protected PluginController pluginController;

  protected Transformer transformer; //Used to run transformation logic on fields
  protected OutputFormatter outputFormatter; //Used to write out formatted records

  public BaseMasterReducer() {
    super();
  }

  @Override
  public void configure(JobConf conf) {
    feedName = conf.get(MasterMapReduceDriver.MMR_FEED_NAME);

    confDao = new MasterMapReduceConfigDao();
    confDao.init(feedName);

    pluginController = new PluginController();
    pluginController.setConfigurationDao(confDao);
    pluginController.init();

    transformer = pluginController.getTransformer();
    outputFormatter = pluginController.getOutputFormatter();
  }

}



StringRecordMasterMapper.java – This is an example implementation of what a Master Mapper implementation would look like. Note that it has nothing to do with the Feed; instead it is related to the file format. Specifically, this class would make sense as a mapper for a delimited text file format.


/**
 * Created Feb 1, 2016
 */
package com.roguelogic.mrloader;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

/**
 * @author Robert C. Ilardi
 */
public class StringRecordMasterMapper extends BaseMasterMapper implements Mapper {

  public StringRecordMasterMapper() {
    super();
  }

  @Override
  public void map(LongWritable key, Text value, OutputCollector output, Reporter reporter) throws IOException {
    String record;
    String[] fields;

    try {
      //First validate the record
      record = value.toString();

      if (validator.validateRecord(record)) {
        //Second Parse valid records into fields
        fields = (String[]) parser.parse(record);

        //Third validate individual tokens or fields
        if (validator.validateFields(fields)) {
          //Fourth run transformation logic
          fields = (String[]) transformer.runMapSideTransform(fields);

          //Fifth output transformed records
          outputFormatter.writeMapSideFormat(key, fields, output);
        }
        else {
          //One or more fields are invalid!
          //For now just record that
          reporter.getCounter(MasterMapReduceCounters.VALIDATION_FAILED_RECORD_CNT).increment(1);
        }
      } //End if validator.validateRecord
      else {
        //Record is invalid!
        //For now just record, but perhaps more logic
        //to stop the loader if a threshold is reached
        reporter.getCounter(MasterMapReduceCounters.VALIDATION_FAILED_RECORD_CNT).increment(1);
      }
    } //End try block
    catch (MasterMapReduceException e) {
      throw new IOException(e);
    }
  }

}



StringRecordMasterReducer.java – This is an example implementation of what the Master Reducer would look like. It complements the StringRecordMasterMapper from above, in that it works well with text line / delimited file formats. The idea here is that the Mapper parses and transforms raw feed data into a canonical data model and outputs that transformed data in a similar delimited text file format. Most likely the Reducer implementation can simply be a pass-through. It’s possible that a reducer in this case is not even needed, and we could configure the Master Map Reduce Driver to run a Map-Only job.


/**
 * Created Feb 1, 2016
 */
package com.roguelogic.mrloader;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

/**
 * @author Robert C. Ilardi
 */
public class StringRecordMasterReducer extends BaseMasterReducer implements Reducer {

  public StringRecordMasterReducer() {
    super();
  }

  @Override
  public void reduce(LongWritable key, Iterator values, OutputCollector output, Reporter reporter) throws IOException {
    String data;
    Text txt;

    try {
      while (values.hasNext()) {
        txt = (Text) values.next(); //Cast required because the raw Iterator returns Object
        data = txt.toString();

        //First run transformation logic
        data = (String) transformer.runReduceSideTransform(data);

        //Second output transformed records
        outputFormatter.writeReduceSideFormat(data, output);
      } //End while (values.hasNext())
    } //End try block
    catch (MasterMapReduceException e) {
      throw new IOException(e);
    }
  }

}



Conclusion

In the end, some may ask how much value a framework like this adds. Isn’t Map-Reduce simple enough? Well, the truth is, we need to ask this of all frameworks and wrappers we use: is their inclusion worth it? I think in this case the Master Map Reduce framework does add value. It breaks down the Driver, Mapper, and Reducer into parts that non-Hadoop/Map-Reduce programmers are already familiar with, especially in the ETL world. It is easy for Java developers who build Loaders for a living to understand vocabulary like Validator, Transformer, Parser, OutputFormatter, etc. They can focus on writing business-specific logic, and they do not have to worry about the finer points of Map-Reduce. Combine this with the fact that this framework creates an environment where you can create hundreds of Map-Reduce programs, one for each feed you are loading, with each program having the exact same Map-Reduce structure, and I believe this framework is well worth it.

Just Another Stream of Random Bits…
– Robert C. Ilardi
Posted in Big Data, Development

Synthetic Transactions and Capability Monitoring of your Enterprise Architecture

Back in my days at Lehman Brothers, I was introduced to the concept of “Synthetic Transactions”. That is, an automated action scheduled to execute periodically to monitor the performance and availability of one or more components in your enterprise architecture.

Most architects will use SNMP, simple pinging of servers, routers, networks, etc., and monitoring of things like Disk Space, CPU Usage, and Memory Usage; pretty much anything that can be recorded via HP OpenView / HP BTO (Business Technology Optimization). I believe this is OK for infrastructure monitoring, but for application monitoring, which I believe gives you a better view into the health of your Enterprise Architecture as it matters to the real users and clients, Synthetic Transactions are far superior.

Synthetic Transactions go further than simple network or infrastructure monitoring, and even further than simple application performance metrics monitoring with, say, a tool like ITRS’s Geneos. A Synthetic Transaction is really about testing the capabilities of your systems and applications from the viewpoint of an end user or a calling client system, to ensure that the system is available with the capabilities and performance profile agreed upon by the contract set in your requirements.

Synthetic Transactions are not always easy to implement, and great care must be put into planning their inclusion from the beginning of system design and architecture analysis; they should be part of your Non-Functional Requirements.

Also, in terms of Information Security and Intrusion Detection, Synthetic Transactions are a way to start implementing the next phase of network defenses. As you all know, in today’s world firewalls are no longer sufficient to keep the hackers out of your systems. More and more hackers have turned to attacking specific application weaknesses instead of going after the raw network infrastructure, as the infrastructure was the first and easiest thing for organizations to shore up in their security.

While Synthetic Transactions won’t prevent cyber attacks or increase security by themselves, the detailed component-level monitoring and performance metrics collection that Synthetic Transactions provide can potentially help identify applications, or components of applications, that are under attack or have been compromised, based on performance or behavioral issues caused by hackers attacking your applications.

Microsoft has a good outline of what a Synthetic Transaction is; although they relate it to their Operations Manager product, the general information is valid regardless of whether you use a tool or develop your own Synthetic Transaction Agents. Specifically, Microsoft states in this article: “Synthetic transactions are actions, run in real time, that are performed on monitored objects. You can use synthetic transactions to measure the performance of a monitored object and to see how Operations Manager reacts when synthetic stress is placed on your monitoring settings. For example, for a Web site, you can create a synthetic transaction that performs the actions of a customer connecting to the site and browsing through its pages. For databases, you can create transactions that connect to the database. You can then schedule these actions to occur at regular intervals to see how the database or Web site reacts and to see whether your monitoring settings, such as alerts and notifications, also react as expected.”

Another good definition, though more of a summary than what Microsoft outlined, is available on Wikipedia in the Operational Intelligence article, specifically the section on System Monitoring, where it states: “Capability monitoring usually refers to synthetic transactions where user activity is mimicked by a special software program, and the responses received are checked for correctness.”

Although Wikipedia does not have a lot of direct information about Synthetic Transactions, I do like their term “Capability Monitoring”, which is exactly what Synthetic Transactions attempt to do: monitor the capabilities of your system at any given moment, to give you, your developers, and your operations support staff a dashboard-level view into how your system is performing, which components are available, and, through the performance measures, the health of each of your system’s components and therefore the overall health of your system and applications.

Back at Lehman, and as you can see in the Microsoft description, most times a Synthetic Transaction focuses on a single aspect of the System; for example, checking if you are able to open a connection to a database. While this is a valid Synthetic Transaction, it is extremely simple, and may not provide you with enough information to tell if your application is actually available from an end user or client system standpoint.

What I developed as a model for Synthetic Transactions back in 2006 was the ability for my Transactions to interact with multiple tiers of my architecture, if not all tiers.

The application for which I was developing Synthetic Transactions was a Reference Data system that included Desktop and Web based Front Ends, a JavaEE (J2EE at the time) based Middleware, a Relational Database, a Workflow Engine, and a Message Publisher, among other supporting components such as ETL processes and other batch processing.

The most useful test in this case would be one that touched the Middleware, interacted with the workflow engine, retrieved data from the database and potentially updated test records, and had those test messages published and received by the Synthetic Transaction Agent to verify the full flow of the system.

Creating the Agent:

To create the Agent that would initiate the Transactions, I used a Job scheduler such as Autosys or Control-M to kick off the process every couple of hours to collect metrics. (Since the application was a global app used 24 x 7, it was important that the application was not only available but performant around the clock, and we needed to be alerted if the application was performing outside of an acceptable range, and which component was affected.)

The Agent itself was a client of the middleware. Since all services such as the Database and the Workflow Engine were wrapped by the middleware, we could have the agent invoke different APIs that would perform a Database Search and record metrics, and call an API that would create a Workflow request, and move it automatically through the workflow steps.

At the end of the workflow, we were able to trigger the messaging publisher to broadcast a message. Since our Data Model allowed for Test records, and we built into our requirements that consumers generally filter out or otherwise ignore Test records in the message flow, we were able to send out test messages in the production environment that would not affect any of our downstream clients.

However, our Agent process could start up a message listener and listen for test records specifically. The Agent, by recording the start time of the workflow transaction and the receive time of the test record message, could then calculate the round trip time of data flowing through the system.

Each individual API call from invocation to return can also be timed to test how each different API was performing.

In terms of ETL, since the Data Model again allowed for test records, we were able to create a small file of test records and trigger the ETL process as well to load the test records. The records in the database would be updated, in some cases with just a timestamp update, but it would still be a valid test, and valid metrics can still be collected.

Together this gave us a good dashboard view of the system’s availability and performance at a given time. If we wanted to increase the resolution, all we had to do was decrease the period between each job start of the Agents.

We recorded the metrics in a database table, and created a simple web page, which production support teams could use to monitor the Synthetic Transactions and their reported metrics.

On a side note: if your APIs and libraries are written in Java and already record metrics that your developers use for debugging and Unit Testing, you can expose these directly via JMX, which can be accessed and used directly if your Synthetic Transaction Agent process(es) are also written in Java. Or you can create a separate function or API that returns the internal metrics recorded by your libraries, frameworks, and API deployments.
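As a rough illustration of the JMX approach, something like the following would work; the class, metric, and ObjectName here are made-up examples, not my actual metrics model:

import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

//Standard MBean: the interface name must be the class name plus the "MBean" suffix
interface ApiMetricsMBean {
  long getLastSearchTimeMillis();
}

public class ApiMetrics implements ApiMetricsMBean {
  private final AtomicLong lastSearchTimeMillis = new AtomicLong();

  public void recordSearchTime(long millis) {
    lastSearchTimeMillis.set(millis);
  }

  @Override
  public long getLastSearchTimeMillis() {
    return lastSearchTimeMillis.get();
  }

  //Register the metrics object so the Synthetic Transaction Agent (or JConsole) can read it
  public void register() throws Exception {
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    server.registerMBean(this, new ObjectName("myapp.metrics:type=ApiMetrics"));
  }
}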

A number of years ago, I developed a Performance Metrics object model and small set of helper functions for Java that I have been using for over a decade and I find that even today they are still the most useful performance metrics I can collect. Perhaps I will write up an article on collecting performance metrics in the applications you develop and share that simple object model and helper functions.

Automated alerts, such as paging the on call support staff could also be accomplished by simply specifying how many seconds or milliseconds a call to an API should take, and if that period is exceeded, the Agent would send out emails and paging alerts.
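To make the timing-and-threshold idea concrete, a stripped-down version of such a check might look like the sketch below; the API call, threshold value, and alert hook are placeholders, not the actual implementation we used:

//Minimal sketch of a Synthetic Transaction check: time one API call,
//record the metric, and alert if it exceeds an agreed threshold.
public class SyntheticTransactionCheck {

  private static final long SEARCH_THRESHOLD_MILLIS = 2000; //from the performance requirements

  public static void main(String[] args) {
    long start = System.currentTimeMillis();

    performTestSearch(); //placeholder for a real middleware API call using test records

    long elapsed = System.currentTimeMillis() - start;

    //In the real agent this would be written to a metrics table feeding the dashboard web page
    System.out.println("Search API round trip: " + elapsed + " ms");

    if (elapsed > SEARCH_THRESHOLD_MILLIS) {
      sendAlert("Search API exceeded threshold: " + elapsed + " ms");
    }
  }

  private static void performTestSearch() {
    //Invoke the middleware search API with a known test record here
  }

  private static void sendAlert(String message) {
    //Hook into email/paging here; printed for illustration only
    System.err.println("ALERT: " + message);
  }
}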

In the end, a lot of organizations have a Global Technology and Architecture Principle that mandates all their applications have some sort of automated system testing.

This can be accomplished by using the Synthetic Transaction paradigm.

It is worth noting that creating an architecture that supports Synthetic Transactions is not simple. You need to ensure that all components, and especially your data and information models, allow for test records.

A way around the information model requirement is to roll back all transactions on your database instead of committing them. This would force you to have a flag or a special API, separate from the normal data flow in your system, to ensure data is not permanently written to your database. The issue here is that if you implement it this way, you cannot have a true end-to-end flow of test records in production. Still, you will be able to get most of the metrics you need.

Also, if your organization only mandates a certain level of automated testing or performance and availability monitoring, then perhaps true end-to-end data flow through your system is not required.

It is my experience, however, that even if the company I work for does not mandate true end-to-end testing, as a responsible application owner I prefer to have true end-to-end data flow testing available to me, so I can monitor my systems more accurately and give proper answers to stakeholders when users and client systems complain about performance or system availability.

Just Another Stream of Random Bits…
– Robert C. Ilardi
Posted in Architecture

Lightweight User Reference Object for Securing APIs

Back in 2005, I was faced with developing a secure set of APIs that could run in multiple deployment configurations. At the time we were heavily developing EJBs, specifically Stateless Session Beans. We were also starting to deploy SOAP based Web Services, and we were also packaging these same APIs in the form of standalone Libraries.

On a side note this will be my first article on Information Security Topics and developing Secure Applications. I recently have become increasingly interested in Penetration Testing and other Information Security topics, and I am even enrolled in classes and other forms of training. I have created the Security Category on my blog to organize security related topics on this web site. Hope they will help all of us create more secure applications.

Combined with what I call the Data Services Architecture and the Resource Bundle / App Resource Manager framework, I was able to create an architecture leveraging Factories, Mediators, Data Access Objects, and Facades to hide from the calling clients which “Mode” the APIs were running in, whether it was EJB, Web Services, or simply running from a Locally deployed Library on the classpath.

I was faced with the challenge of ensuring that, no matter which operating or deployment mode was used, these APIs, which number in the hundreds of individual API methods, were all secure. Not only did a calling application or user have to Authenticate with a Single Sign On Service provided by the firm, I also needed to create an Entitlements framework that would allow fine grained Authorization, down to the individual method level, for each API.

As any good Developer with exposure to basic Information Security and Defensive Programming techniques knows, we only want to log in once, so that we do not have to pass credentials to each API we call, and the established design for doing so is the assignment of a securely randomized, unguessable Session ID. This ID does NOT have to be the HTTP Session ID, which in the case of my requirements was technically only available when developing the Web Services.

Also, depending on your Application Server configuration and firm standards, you are probably running on a multi node cluster, and some load balancers do not play very nicely with HTTP Session Replication; depending on firm development standards, they may not even allow you to turn on Session Replication. Some may even have a requirement NOT to turn on session stickiness.

My solution was to develop two components. One is called the Stateless User Cache, which is responsible for creating and managing Sessions across clusters of Application Servers without App Server Session Replication, and which also operates correctly in standalone, locally classpath-deployed environments such as Library Mode.

We will go over the Stateless User Cache in a future blog article in more detail, but I wanted to mention it here because it is tied to the Lightweight User Reference Object.

So basically I provide an API, usually called ssoLogin, which wraps the firm’s Single Sign On Service, whether it’s something like authenticating against LDAP or Active Directory, or a vendor product such as SiteMinder.

The ssoLogin method will NOT return a User object which contains all entitlements, but instead will leverage the Stateless User Cache to create a new “Session”, store the User object in that session, and return a “Reference” or “Pointer” object to that session.

In this case you can think of it as an Object form of a Session ID.

The Class looks something like this:

public class UserRef implements Serializable {

  private String sessionId;
  private long loginTimestamp;
  private long lastTouchTimestamp;
  private String userId; //Insecure if the user id is Private, see notes below.

  //Getter and Setter methods…

  //HashCode and Equals methods…

}

Basically as you can see the UserRef object provides 3 to 4 bits of information. The fourth, being the userId, can be the username or a unique surrogate key or even better a transient key that does not map to the real database stored user id.

However, it can be the real username or surrogate key depending on the application’s security requirements. Let’s take for example the case of an Instant Messaging Application. The username is public information, and it makes sense for the client to have a list of usernames the currently logged in user has on their buddy or friends or contact list. In this case there is no real security issue with storing the username in this field, because it is public, shared information.

However, in applications where usernames and ids are not required or never need to be shared, we should leave this field null, or remove it from the UserRef object itself.

One advantage to having the userId in the UserRef is when the same user or application logs in more than once, and you want to tie the different Session Ids back to the same user, and, for whatever requirement you have, the client needs to be able to look up the other sessions or in some way communicate with them.

Now as a side note: technically this user id, whether real, or transient and securely generated and mapped on the server side to the real underlying user id, does not need to be sent back to the client. The unique session id is good enough, and you can store the user id for sessions owned by the same user on the server side, which is much more secure. But I have found from experience that in my enterprise applications I sometimes need to expose the username or user id to the client side, and I usually do this through UserRef. Again, you need to perform Security Use Cases to determine if having this bit of information opens any vulnerabilities in your applications and whether any potential exploits can be created to take advantage of that vulnerability. One vulnerability this may open up is username reconnaissance and collection, and potential Spear Phishing attacks, or User Id enumeration if the Ids are insecurely generated, such as simple sequence numbers.

In any case, the UserRef with at minimum the sessionId field is required, and the other information can be added or removed as you require for your applications. However, the more the client side needs to do without communicating with the server, especially if the API suite is used not only by Web Applications but also by Desktop Applications, or perhaps Batch Applications and Server side Daemon Processes, the more information you may need to include in your lightweight User Reference Object.

The next step is to require all developers on your team to include UserRef as a mandatory parameter in all their API methods.

Then you can use a combination of the Stateless User Cache, if you have something similar to it, or the HTTP Session, with the UserRef object as the key to look up the full User object, which contains the User’s Entitlements.

In a future post I will do a write up on my Entitlements Object Model so you can see how I store Entitlements or Authorization information in memory.

Usually I will create methods such as public boolean hasAccess(UserRef userRef, String apiName) throws AppSecurityException;

I then require all my developers to ensure, in the Mediator or Facade code that hides all Data Access Objects and other Service Handler objects, that they first check if the user has access to the method by making a call to “hasAccess”.
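A facade method following this pattern might look something like the sketch below; aside from UserRef, hasAccess, and AppSecurityException, the class and method names are placeholders I made up for illustration, with the collaborators stubbed so the sketch is self-contained:

//Hypothetical collaborators, stubbed here so the sketch compiles on its own
interface SecurityService {
  boolean hasAccess(UserRef userRef, String apiName) throws AppSecurityException;
}

interface ReferenceDataDao {
  String[] searchSecurities(String query);
}

public class ReferenceDataFacade {

  private final SecurityService securityService;
  private final ReferenceDataDao referenceDataDao;

  public ReferenceDataFacade(SecurityService securityService, ReferenceDataDao referenceDataDao) {
    this.securityService = securityService;
    this.referenceDataDao = referenceDataDao;
  }

  //Every public API method takes UserRef as a required parameter and checks entitlements first
  public String[] searchSecurities(UserRef userRef, String query) throws AppSecurityException {
    if (!securityService.hasAccess(userRef, "searchSecurities")) {
      throw new AppSecurityException("User is not entitled to call searchSecurities");
    }

    //Only after the entitlement check do we delegate to the DAO / service handler
    return referenceDataDao.searchSecurities(query);
  }
}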

It’s easy to do a code review, or even write a script that automatically scans your source code, to ensure every method has a call to hasAccess.

One important note: the login method, in this case “ssoLogin”, would normally be the only method that does not make a call to hasAccess, as all users should have implicit access to this method; users that do not exist in your security databases or LDAP directories will simply get a login failed message.

Remember, do not give potential hackers hints about whether they guessed a username correctly. Instead use the generic login failure message: “Username and/or Password are invalid.”

In this case the system does not give them a hint as to whether the username actually exists or if they simply got the password incorrect.

Finally, since the UserRef object is small, it has a smaller impact on I/O when transferring the object remotely via EJB or Web Services calls; a much smaller I/O footprint than passing the entire User object, which, besides being highly insecure, can also be a performance issue.

Let me know what you think of my User Reference Object and solutions to securing APIs or for that matter any method you want secure. I would love to hear from Developers and Penetration Testers alike!

Finally, and I will probably write an entire post on this, but you can find plenty of information out there on the web: make sure when you generate your own session id, you use secure randomization so the Session Id token is unguessable and incapable of being enumerated through a simple algorithm.

In Java there are two very simple solutions: use the SecureRandom class, and NOT Math.random or the Random class; or you can even use the UUID class to create a globally unique identifier.
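For example, either of the following would produce an unguessable session id; this is a quick sketch, not hardened production code:

import java.security.SecureRandom;
import java.util.UUID;

public class SessionIdGenerator {

  private static final SecureRandom RANDOM = new SecureRandom();

  //Option 1: SecureRandom bytes rendered as hex
  public static String newSecureRandomId() {
    byte[] bytes = new byte[32];
    RANDOM.nextBytes(bytes);

    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      sb.append(String.format("%02x", b));
    }

    return sb.toString();
  }

  //Option 2: a randomly generated (type 4) UUID, which is also backed by a secure RNG
  public static String newUuidId() {
    return UUID.randomUUID().toString();
  }

  public static void main(String[] args) {
    System.out.println(newSecureRandomId());
    System.out.println(newUuidId());
  }
}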

Just another stream of Random Bits…

-Robert C. Ilardi

Posted in Architecture, Development, Security

Writing a Good Job Description for Hiring Core Java Developers

As a Development Manager or a Team Lead, you often need to write up Job Descriptions which include a brief description of the Team, and the Role’s responsibilities. Some people also include a description of the company, but I usually don’t find this necessary if you work for a large well known company. However if you are a startup or smaller business you might want to include a short paragraph on your company.

I mostly focus on stating in the description the role’s responsibilities and often include the following phrase: “This role requires hands on coding in [LANGUAGE] on a daily basis. This is NOT a Lead, Manager, or Architect position. However you may be required to participate in architectural discussions as needed. This is a [Senior | Mid-Level | Junior] Software Developer Role.”

Often when hiring Senior Developers, you will find many candidates have already ventured into leadership, management, or architecture roles and do not really want to code on a daily basis, and you need to make it clear to all prospective candidates that you require hard-core development.

When hiring you want to ensure you not only get candidates that CAN code, but candidates that WANT to code. I can’t tell you how many times I’ve seen groups hire very smart people who can develop, but just don’t have the drive anymore to code on a daily basis, and when you need developers you have to ensure you hire committed coders.

In addition to Team Description, Role Responsibilities, and potentially a short description on the company, the most important section of the Job Description write up is the required skill set.

When hiring Core Java Developers, which is my preferred developer title to hire for both Back End and Middleware work, I normally use a list of skills that includes the following. I added comments to the list for the purpose of this article, to help not only hiring managers but also potential candidates understand the method behind my madness:

  • Required Skills
    • Core Java (Java Version: 7+ ; or whatever version your organization requires)
    • OOP/OOD in Java
      • Interfaces, Classes, Polymorphism, Inheritance
      • Question I usually ask: Can you have an empty Interface (also called a Marker Interface), and what is it used for? What is the most popular Marker Interface that ships standard with Java?
      • Design Patterns (GoF Patterns)
        • I always test for the Singleton Pattern.
        • Other common ones I frequently test for: Factory, Observer/Observable, Visitor
        • Less Frequently I’ll test for: Command and Chain of Responsibility
        • There are many other GoF (Gang of Four) Patterns, but I would get a hang of Java first.
        • Note: GoF Patterns are not limited to Java. You can implement these design patterns in any language. The 4 authors of the GoF book (which is where the “Four” in Gang of Four comes from) even state that they really just gave names to Design Patterns programmers had been using for decades, and they are right: a lot of the patterns you will recognize as designs you implemented yourself without even knowing the “official” name.
    • Collections (Lists, Maps, Sets)
      • Also able to code with Arrays directly
      • Able to use Arrays in the place of: Lists, Maps, and Sets efficiently
      • We test deep knowledge of collections, like how to make a custom key work correctly in a HashMap (see the sketch after this list).
      • Note on Arrays:
        • Because of the complex Data Structures I use, Arrays are also very important; even though Arrays may seem simpler, they are actually harder to use correctly and to your advantage, which is why Collections were invented in the first place.
        • In a traditional Computer Science program you will learn and use Arrays before you even talk about collections.
        • Simple collections can be made from Arrays, which is why I state above, “able to use Arrays in place of the various collections”.
    • Exception Handling
      • Knowing how to handle the different types of Exceptions:
        • Checked (Any Exception that Extends from the class Exception)
        • Unchecked (Any Exception that Extends from the class RuntimeException)
      • What a Throwable is
      • What an Error object is and how it is different from an Exception.
      • Logging of exceptions. It’s a plus to know Log4J, but I usually don’t test for it, as long as the person knows why it’s important to log exceptions.
    • Direct JDBC experience
      • Ability to use JDBC to Call Direct SQL Statements for both Query and Updating
      • Ability to use JDBC to Call Stored Procedures
      • Knowledge of Transaction Control using JDBC
      • Basically know what the following classes are and how to use them: DriverManager, Connection, Statement, PreparedStatement, CallableStatement, ResultSet, big plus: ResultsetMetadata.
      • I don’t usually care about ORM (object-relational mapping) Frameworks, such as Hibernate. Actually I hate Hibernate and frameworks like them, even though in some projects they are very popular.
      • Note on Connection Pools:
        • When you are building Web Apps and other applications hosted on a Web Server or, more generally, an Application Server like WebLogic, WebSphere, or JBoss, you will usually use a JDBC Connection Pool. Knowledge of Connection Pools technically falls into the JavaEE space, and not the Core Java space, but in the real world, you will most likely be using connection pools to create your connections and NOT the JDBC DriverManager directly. However, besides Connection creation, everything else is the same when doing Direct JDBC or JDBC via a Connection Pool.
        • I state “Direct JDBC Experience” because this is how you can separate the Men from the Boys. A real developer will know how to create Connections directly from the JDBC DriverManager. More junior programmers may be part of a team that hides all the JDBC stuff from them and they just somehow magically get a connection to a Database in their code. Perhaps by using a Utility class a more senior developer on the team created for the team to use. In my projects we always do this, I architected an entire framework called App Resource Manager to handle JDBC connection management, whether it’s using Pools or Direct DriverManager.
    • Strings and I/O
      • Ability to read large raw data files and parse them into usable tokens for DB Loading or other processing
      • String Matching and Manipulation
        • Matching: Basically the String object’s indexOf, startsWith, endsWith, lastIndexOf, plus RegEx (Regular Expressions) ie. the “matches” method.
        • Manipulation: Building of Strings using StringBuffer or StringBuilder, plus the String’s split, substring, trim and replace methods.
        • String Parsing / Tokenizing
          • The String’s “Split” method is used more and more over the StringTokenizer class these days.
          • Basically if you are reading a Delimited Text File, like a “|” Pipe/Vertical Bar or a Comma Delimited File, or anything similar, you will be splitting or tokenizing each line.
          • It’s all about File Parsing or User Input Parsing.
        • Reading and Writing from/to Properties Files, XML, Plain Text Files.
    • Experience with Multi-Threading
      • Synchronization
        • Block Level
        • Method Level
        • Static Method / Class Level
      • Thread creation and control
        • Runnables and the Thread class
          • Creating a Thread using the Runnable Interface versus extending the Thread Class.
        • Wait, Notify, NotifyAll
          • The Classic Consumer/Producer Problem.
    • Basic SQL Knowledge is a must.
      • DML: Ability to write Select, Insert, Update, Delete, statements
      • DDL: Ability to Create Tables is a plus but not required. I usually either do the data modeling and table creation myself or have one of a handful of trusted developers design good tables, indexes, and constraints.
      • Stored Procedures – Writing Stored Procs is a plus, but not required, I usually have a good SQL developer on my teams.
    • XML
      • Familiarity with JAXB
      • Knowledge of : SAX, DOM, STAX
    • Knowledge of Java Annotations
      • I used to test for this less often, but Annotations have become so widely used that I started testing this more.
      • Mostly someone just needs to know how to apply annotations, not create their own annotations. I have created my own annotations in projects before, but most development solutions probably won’t need custom annotations; just use the ones that come with Java or were created for a particular Library.
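As promised in the Collections item above, here is a minimal sketch of the custom HashMap key point, i.e. what I expect a candidate to understand: a key class must override both equals and hashCode consistently, or lookups will silently fail. The class and field names here are just for illustration:

import java.util.HashMap;
import java.util.Map;

public class TradeKey {

  private final String book;
  private final long tradeId;

  public TradeKey(String book, long tradeId) {
    this.book = book;
    this.tradeId = tradeId;
  }

  //equals and hashCode must be based on the same fields, or HashMap lookups will miss
  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof TradeKey)) {
      return false;
    }
    TradeKey other = (TradeKey) o;
    return tradeId == other.tradeId && book.equals(other.book);
  }

  @Override
  public int hashCode() {
    return 31 * book.hashCode() + (int) (tradeId ^ (tradeId >>> 32));
  }

  public static void main(String[] args) {
    Map<TradeKey, String> trades = new HashMap<>();
    trades.put(new TradeKey("EQUITY-NY", 1001L), "IBM 100 @ 143.25");

    //Works because equality is based on the key's fields, not object identity
    System.out.println(trades.get(new TradeKey("EQUITY-NY", 1001L)));
  }
}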

After the Required Skills section, I usually add an Optional Skills section listing things that are a plus, which usually includes more JavaEE, Web Services, Linux/Unix scripting and command line, and, depending on what products we are using, a DBMS, Hadoop Ecosystem Components, Workflow Engines, Messaging Services, and whatever else is specific to the project I’m hiring for.

Hope this post helps both hiring managers write better Job Descriptions and Candidates or Students of Programming who are looking for a syllabus to study to get ready to apply to jobs and enter the Software Development Job Market.

Just Another Stream of Random Bits…

– Robert C. Ilardi

Posted in Software Management