AVRO to POJO converter

The AVRO to POJO converter is a code generation tool that converts AVRO schemas into POJOs (plain old Java objects). For more details about AVRO schemas, see http://www.cloudera.com/blog/2009/11/avro-a-new-format-for-data-interchange/ and http://avro.apache.org/docs/1.3.2/spec.html

Example AVRO schema

 {
  "type": "map",
  "values":
  {
   "type": "record",
   "name": "person",
   "fields":
   [
    {
     "name": "personal",
     "type":
     {
      "name": "personalData",
      "type": "record",
      "fields":
      [
       {"name": "first_name", "type": "string"},
       {"name": "last_name", "type": "string"},
       {"name": "age", "type": "int"},
       {"name": "marital_status", "type": "string", "default": "single"},
       {"name": "address", "type": ["string", "null"]}
      ]
     }
    },
    {
     "name": "professional",
     "type":
     {
      "name": "professionalData",
      "type": "record",
      "fields":
      [
       {"name": "school", "type": "string"},
       {"name": "college", "type": ["string", "null"]},
       {"name": "occupation", "type": "string"},
       {"name": "job_role", "type": "string"},
       {"name": "achievements", "type": "string", "default": "none"}
      ]
     }
    }
   ]
  }
 }

The above is an AVRO schema of type map: each key is an instance id (Avro map keys are always strings), and each value is a record that in turn consists of two records, one for the person's personal details and one for his professional details.
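While walking a schema like this, the generator needs a rule for turning each Avro field type into a Java field type. The article does not spell out the tool's own mapping, so the following is only a minimal sketch under stated assumptions: Avro primitives map to the Java types shown, bytes maps to java.nio.ByteBuffer, and a two-branch union with "null" (such as ["string","null"]) maps to a boxed type so the field can hold null. AvroTypeMapper is a hypothetical name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of the original tool): maps an Avro type to a Java type name.
final class AvroTypeMapper {
    // Avro primitive type names and the Java types assumed for them.
    private static final Map<String, String> PRIMITIVES = new HashMap<>();
    // Boxed equivalents, used when a union with "null" makes the field nullable.
    private static final Map<String, String> BOXED = new HashMap<>();

    static {
        PRIMITIVES.put("null", "Void");
        PRIMITIVES.put("boolean", "boolean");
        PRIMITIVES.put("int", "int");
        PRIMITIVES.put("long", "long");
        PRIMITIVES.put("float", "float");
        PRIMITIVES.put("double", "double");
        PRIMITIVES.put("bytes", "java.nio.ByteBuffer");
        PRIMITIVES.put("string", "String");
        BOXED.put("boolean", "Boolean");
        BOXED.put("int", "Integer");
        BOXED.put("long", "Long");
        BOXED.put("float", "Float");
        BOXED.put("double", "Double");
    }

    private AvroTypeMapper() {}

    /** Maps a primitive type name such as "string" or "int". */
    static String javaType(String avroType) {
        String javaType = PRIMITIVES.get(avroType);
        if (javaType == null) {
            throw new IllegalArgumentException("Not an Avro primitive type: " + avroType);
        }
        return javaType;
    }

    /** Maps a nullable union such as ["string","null"] to the boxed non-null branch. */
    static String javaType(List<String> union) {
        if (union.size() != 2 || !union.contains("null")) {
            throw new IllegalArgumentException("Only [T, \"null\"] unions are handled: " + union);
        }
        String other = union.get(0).equals("null") ? union.get(1) : union.get(0);
        String primitive = javaType(other);
        return BOXED.getOrDefault(other, primitive);
    }
}
```

For example, javaType("int") gives int and javaType(Arrays.asList("int", "null")) gives Integer, while a name that is not an Avro primitive, such as "integer", is rejected with an IllegalArgumentException.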

High level details of this tool

– parsing the AVRO schema files, which are written in JSON. A JSON parsing API (the Jackson API) is used to parse the input AVRO schemas.

– generating the Java code using the JCodeModel API.

Advantages of this tool

– auto-generates POJO code for a list of schema files, one POJO class per schema file. Writing these classes by hand is time-consuming for POJOs with many attributes, and when the data model changes frequently, the generator can reflect the changes very quickly, which saves time and improves productivity.

– optionally generates a custom builder method that creates and populates objects from an input map.

– integrates a relational model into the POJOs, based on known parent-child relationships from a metamodel represented in GraphML format. This integration is controlled by a tool configuration property.

– the tool can be extended to take input from other sources, not just AVRO schemas; for example, a database table description could be used as input.

A JSON parsing API to parse AVRO

– I chose the Jackson API from among the many JSON parsing libraries available.

– build the root JsonNode of the schema using Jackson's ObjectMapper API, as in the snippet below:

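 // imports: java.io.*, org.codehaus.jackson.JsonNode, org.codehaus.jackson.map.ObjectMapper
 try (InputStream is = new FileInputStream("src/test/resources/schema/PERSON")) {
     JsonNode node = new ObjectMapper().readTree(is); // the stream is closed automatically
 }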

– iterate over the {key, value} mappings in the JSON node and register a handler for each kind of key. For example, when a schema name is encountered while parsing the JSON, start creating the Java class; whenever a field is encountered, add a corresponding field to the Java class, along with a getter and a setter for it.

– to check whether a JSON node has a given key, call JsonNode.has("key").

– to get the value of an existing key in a JSON node, call JsonNode.get("key").

– to check whether a JSON node is an array node, use JsonNode.isArray(); to check whether a node is null-valued, use JsonNode.isNull().

– to get the text value of a JSON node, use JsonNode.getTextValue().

– to iterate over the elements of an array node, use JsonNode.getElements(), which returns an Iterator<JsonNode>.

– explore the org.codehaus.jackson.JsonNode API at http://jackson.codehaus.org/1.7.9/javadoc/org/codehaus/jackson/JsonNode.html for other methods, including utility and convenience methods.

– when the corresponding configuration property is set to true, also generate a builder method that populates objects from an input map of attributes.
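The key-driven traversal described above can be sketched without Jackson by walking the schema after it has been read into plain java.util.Map and List values (for example with Jackson's ObjectMapper.readValue(is, Map.class); that this is how the input is obtained is an assumption). The sketch handles only the keys used in the example schema (type, values, items, name, fields) and, instead of emitting code, just records each record name with its field names. SchemaWalker is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: collects record names and their field names from a schema
// that has already been parsed into nested Maps and Lists.
final class SchemaWalker {
    // record name -> field names, in the order they appear in the schema
    final Map<String, List<String>> records = new LinkedHashMap<>();

    void walk(Object node) {
        if (node instanceof List) {
            // a union such as ["string","null"]: visit every branch
            for (Object branch : (List<?>) node) {
                walk(branch);
            }
            return;
        }
        if (!(node instanceof Map)) {
            return; // a primitive type name such as "string": no class to generate
        }
        Map<?, ?> map = (Map<?, ?>) node;
        Object type = map.get("type");
        if ("record".equals(type)) {
            handleRecord(map);
        } else if ("map".equals(type) || "array".equals(type)) {
            // container types: descend into the value (map) or element (array) schema
            walk(map.get("map".equals(type) ? "values" : "items"));
        } else if (type != null) {
            walk(type); // e.g. {"type": {...nested record...}}
        }
    }

    private void handleRecord(Map<?, ?> record) {
        String name = (String) record.get("name");
        List<String> fieldNames = new ArrayList<>();
        records.put(name, fieldNames); // where the real tool would start creating the class
        for (Object f : (List<?>) record.get("fields")) {
            Map<?, ?> field = (Map<?, ?>) f;
            fieldNames.add((String) field.get("name")); // where a field, getter and setter would be added
            walk(field.get("type")); // nested records become classes of their own
        }
    }
}
```

In the real tool the same dispatch would run over JsonNode (has, get, isArray, getElements), and each handler would call JCodeModel instead of recording names.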

Usage of the JCodeModel API

– JCodeModel is a Java API for generating Java source code. The API documentation is here: http://codemodel.java.net/nonav/apidocs/com/sun/codemodel/JCodeModel.html

– this tool uses the JCodeModel API to auto-generate the data model classes.

– the JCodeModel API can create a new Java class, add fields to it, and add getter and setter methods, business logic methods, and practically any other Java language construct.

– to create a new class inside a package:

         JCodeModel cm = new JCodeModel();
         JPackage jp = cm._package("com.example");
         // throws JClassAlreadyExistsException if com.example.MyClass is already defined
         JDefinedClass jc = jp._class("MyClass");

– to add a Javadoc comment at the class level:

        JDocComment jd = jc.javadoc();
        jd.add("my class");

– to add fields to the class:

        JFieldVar v = jc.field(JMod.PRIVATE, cm.INT, "var1", JExpr.lit(5));
        // JExpr.lit("100") is a String literal; JExpr.lit(100) would generate String var2 = 100, which does not compile
        JFieldVar v2 = jc.field(JMod.PRIVATE, cm.ref(String.class), "var2", JExpr.lit("100"));

the above code snippet adds an int field and a String field to the class:

        private int var1 = 5;
        private String var2 = "100";

– to build an invocation of a static method, here Integer.parseInt(var2):

        JInvocation i1 = cm.ref(Integer.class).staticInvoke("parseInt").arg(v2);

– to add a method to a class

          JMethod jm = jc.method(JMod.PUBLIC, String.class, "m1");
          jm.param(String.class,"data");

This defines a public method named m1 that takes a String parameter named data and returns a String.

– to add statements to the body of the method:

          JBlock jb = jm.body();
          JVar var1 = jb.decl(cm.ref(Integer.class).unboxify(), "i1");
          jb.assign(var1, i1);

– to write a raw statement verbatim into the generated code:

        jb.directStatement("System.out.println(\"Hello world\");");

– to add a print statement

         JClass stringRef = cm.ref(String.class); 
         JVar stringVar = jb.decl(stringRef, "stringVar");
         stringVar.init(JExpr.lit("abc"));
         JInvocation print = cm.directClass("System").staticRef("out").invoke("println").arg(stringVar);      
         jb.add(print);

– to add conditional statements to the method:

         JConditional _if = jb._if(stringVar.ne(JExpr._null()));
         JBlock then = _if._then();
         // String.valueOf is static, so invoke it on the String class rather than on the variable
         JInvocation valueOf = cm.ref(String.class).staticInvoke("valueOf").arg(JExpr.lit(200));
         then.assign(stringVar, valueOf);

– to define a data type with generic type arguments, such as Map<String,Object>:

        JClass propertiesMapType = cm.ref(Map.class).narrow(cm.ref(String.class), cm.ref(Object.class));
        JClass propertiesMapImplType = cm.ref(HashMap.class).narrow(cm.ref(String.class), cm.ref(Object.class));
        JFieldVar field = jc.field(JMod.PRIVATE, propertiesMapType, "properties");
        field.init(JExpr._new(propertiesMapImplType));

the above code snippet generates the following field:

         private Map<String,Object> properties = new HashMap<String,Object>();

– to return a value from the method:

        jb._return(stringVar);

– and finally, to write the generated source files under a target directory:

        cm.build(new File("target.path"));

Explore the JCodeModel API further for other features, such as annotations at the class, field, and method level, generics support, and other utility and convenience methods.
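Before calling jc.field(...) and jc.method(...) for each schema field, the generator has to derive the getter and setter names. A minimal sketch, assuming the usual JavaBeans convention (get/set plus the capitalized field name, with is for primitive boolean getters); AccessorNames is a hypothetical helper, and its results are the strings that would be passed as the method name to JDefinedClass.method(...).

```java
// Hypothetical helper: derives JavaBeans-style accessor names for a generated field.
final class AccessorNames {
    private AccessorNames() {}

    private static String capitalize(String fieldName) {
        if (fieldName.isEmpty()) {
            throw new IllegalArgumentException("empty field name");
        }
        return Character.toUpperCase(fieldName.charAt(0)) + fieldName.substring(1);
    }

    /** "firstName" -> "getFirstName"; primitive boolean fields get "is" instead of "get". */
    static String getter(String fieldName, boolean primitiveBoolean) {
        return (primitiveBoolean ? "is" : "get") + capitalize(fieldName);
    }

    /** "firstName" -> "setFirstName" */
    static String setter(String fieldName) {
        return "set" + capitalize(fieldName);
    }
}
```

With these names, the getter body is just getter.body()._return(field), and the setter, created with jc.method(JMod.PUBLIC, cm.VOID, AccessorNames.setter(name)), assigns its parameter to JExpr._this().ref(field).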

Add a custom builder method

– a custom builder method can also be generated inside the POJO, controlled by a configuration property, to populate objects from an input map.

– the advantage of a generated builder is that it calls the setters directly, which is typically much faster than populating objects via reflection; and since it is auto-generated, there is no need to write a builder by hand.

– again, the JCodeModel API is used to write the builder method: it reads the JSON properties from the input map and invokes the corresponding setter methods on the POJO.

Example Builder method

The builder method used to populate objects can look like this:

 public static Person buildPerson(Map<String, Object> personMap) {
   Person p = new Person();
   // map values are Objects, so each one is cast to the setter's parameter type
   p.setFirstName((String) personMap.get("first_name"));
   p.setLastName((String) personMap.get("last_name"));
   p.setAge((Integer) personMap.get("age"));
   p.setMaritalStatus((String) personMap.get("marital_status"));
   return p;
 }
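For reference, here is a self-contained version of the kind of class such a builder lives in. It is a hand-written sketch, not actual output of the tool: the Person fields are an assumption (a flattening of the personalData record from the example schema), age is read through Number so that either an Integer or a Long from the JSON parser is accepted, and the fallback to "single" mirrors the schema default for marital_status, which is a design choice of this sketch.

```java
import java.util.Map;

// Hand-written sketch of a generated POJO with a static builder (not actual tool output).
class Person {
    private String firstName;
    private String lastName;
    private int age;
    private String maritalStatus;
    private String address; // ["string","null"] union in the schema, so it may be null

    public String getFirstName() { return firstName; }
    public void setFirstName(String firstName) { this.firstName = firstName; }
    public String getLastName() { return lastName; }
    public void setLastName(String lastName) { this.lastName = lastName; }
    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
    public String getMaritalStatus() { return maritalStatus; }
    public void setMaritalStatus(String maritalStatus) { this.maritalStatus = maritalStatus; }
    public String getAddress() { return address; }
    public void setAddress(String address) { this.address = address; }

    /** Populates a Person from a map keyed by the Avro field names. */
    public static Person buildPerson(Map<String, Object> personMap) {
        Person p = new Person();
        p.setFirstName((String) personMap.get("first_name"));
        p.setLastName((String) personMap.get("last_name"));
        // age is a required int; a missing value throws NullPointerException here
        p.setAge(((Number) personMap.get("age")).intValue());
        // apply the schema default when marital_status is absent
        Object status = personMap.get("marital_status");
        p.setMaritalStatus(status != null ? (String) status : "single");
        p.setAddress((String) personMap.get("address"));
        return p;
    }
}
```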

Integrate relational model into the POJO

– given metamodel information that enumerates parent-child relationships, the tool can be extended to integrate those relationships into the POJOs.

– the metamodel data must be in GraphML format; the prefuse library provides an API to parse GraphML data.

– parse the GraphML using the prefuse API and get all the nodes in the graph.

– iterate over the nodes, and for each node get its out-degree/out-neighbors and in-degree/in-neighbors. From this information, build a map of parent-to-child relationships for each context node; for example, key = the context node's objectName, value = the list of child objectNames.

– set the configuration property generate.relational.pojo to true or false. If true, initialize the metamodel and build the parent-to-child map.

– when a schema file is parsed and its POJO is being created, add instance variables representing the parent and the children of that context POJO. Children form a one-to-many mapping, so they are held in a List; the parent is a one-to-one mapping, so it is a field of the parent type.
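The parent and child lookups can be sketched independently of prefuse. The example below assumes the GraphML has already been reduced to a list of directed edges (parent objectName, child objectName), which stands in for the prefuse node and neighbor iteration described above; Relations is a hypothetical class. It builds both lookups the generator needs: the children of a node (one-to-many, hence a List) and its single parent (one-to-one).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: builds parent->children and child->parent lookups
// from directed edges, each given as {parentName, childName}.
final class Relations {
    final Map<String, List<String>> children = new HashMap<>();
    final Map<String, String> parent = new HashMap<>();

    Relations(List<String[]> edges) {
        for (String[] edge : edges) {
            String from = edge[0];
            String to = edge[1];
            // the out-neighbors of a node are its children
            children.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
            // the in-neighbor of a node is its parent; a one-to-one mapping allows only one
            String previous = parent.put(to, from);
            if (previous != null && !previous.equals(from)) {
                throw new IllegalArgumentException(to + " has two parents: " + previous + ", " + from);
            }
        }
    }

    List<String> childrenOf(String name) {
        return children.getOrDefault(name, Collections.emptyList());
    }

    String parentOf(String name) {
        return parent.get(name); // null for a root node
    }
}
```

When generating the POJO for a given objectName, childrenOf(...) would drive a List-typed field for the children, and parentOf(...) a field of the parent's type.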

Tool configuration properties

A tool-config.properties file configures this tool. The following properties are defined:

source.directory = the source directory containing the AVRO schema files

target.directory = the target directory where the POJO classes are generated

target.package = the target package for the POJO classes

word.delimiters = the word delimiter(s) in the JSON/AVRO property names, used to convert JSON property names into camel-case Java bean property names. For example, if the _ (underscore) character is the word delimiter in the JSON property names, specify it here; a JSON property named attr_name then becomes attrName in the POJO.

generate.builders = true/false; if true, builder methods are generated.

generate.relational.pojo = true/false; if true, relational POJOs are generated. Note: relational POJOs can be built only if metamodel information enumerating parent-child relationships is available in GraphML format.
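Reading this file needs nothing beyond java.util.Properties. A minimal sketch, assuming the property names listed above, that unset boolean flags default to false, and that word.delimiters holds one or more delimiter characters (defaulting to _ here, which is this sketch's choice); ToolConfig and toCamelCase are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

// Hypothetical sketch: loads tool-config.properties and converts JSON names to camel case.
final class ToolConfig {
    final String sourceDirectory;
    final String targetDirectory;
    final String targetPackage;
    final String wordDelimiters;
    final boolean generateBuilders;
    final boolean generateRelationalPojo;

    ToolConfig(Reader in) throws IOException {
        Properties p = new Properties();
        p.load(in);
        sourceDirectory = p.getProperty("source.directory");
        targetDirectory = p.getProperty("target.directory");
        targetPackage = p.getProperty("target.package");
        wordDelimiters = p.getProperty("word.delimiters", "_");
        // Properties keeps trailing spaces in values, so trim before parsing the flags
        generateBuilders = Boolean.parseBoolean(p.getProperty("generate.builders", "false").trim());
        generateRelationalPojo = Boolean.parseBoolean(p.getProperty("generate.relational.pojo", "false").trim());
    }

    /** "attr_name" -> "attrName" when '_' is one of the configured delimiters. */
    String toCamelCase(String jsonName) {
        StringBuilder sb = new StringBuilder();
        boolean upperNext = false;
        for (char c : jsonName.toCharArray()) {
            if (wordDelimiters.indexOf(c) >= 0) {
                // drop the delimiter; capitalize the next letter unless nothing was emitted yet
                upperNext = sb.length() > 0;
            } else {
                sb.append(upperNext ? Character.toUpperCase(c) : c);
                upperNext = false;
            }
        }
        return sb.toString();
    }
}
```

With word.delimiters=_ this turns attr_name into attrName and marital_status into maritalStatus; consecutive or leading delimiters are simply skipped.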