Camel development series – Part 2

Mapping a CSV file to a JSON file

This is the second part of the series. If you are a complete beginner, go to part 1. This part will touch upon mapping, specifically how to map from a very simple CSV (comma-separated values) file to a JSON structure.

I will keep everything simple and add more complexity in subsequent parts. So, some prerequisites:

  1. The CSV file will only contain a single row. Handling multiple rows will be in part 3 of the series.
  2. The JSON structure will only contain field names and values.
  3. No error handling is done.

So, what do we want to do? We want to pick up a file from a specific folder, map the single row to an object in order to extract certain values, and then map it to the output JSON format. Finally, we save the JSON string to a file.

Camel vs IBM Integration Bus on mapping

There is something to be said for those like me coming from another background in the integration world, specifically having used IBM WMB/Integration Bus. Learning Camel and how routes and the RouteBuilder work is manageable once you get used to it. The error handling and logging are fine too, but the big jump from a conceptual point of view is how you do mapping.

Message Parsers

In IBM Integration Bus you use parsers. Parsers are the key to mapping. You have the XMLNSC parser for parsing XML data, the JSON parser for parsing JSON data and the DFDL parser for parsing anything else. These are extremely powerful parsers, and the DFDL parser in particular is incredible because you are able to model any kind of data as long as it can be described in some way. You can in practice model EDIFACT, MARC, or other complicated formats. Yes, it takes time, but it can be done, and then you use the parser to match the incoming data with the model you created. The DFDL parser can be debugged and tested as well, which makes it very versatile. What is the big drawback? Price. It is great for large corporations who don’t mind paying large license fees and have a heavy IBM presence. Of course, it ultimately means you are locked in to IBM as well, since it is a proprietary tool. These are factors to consider when you choose your integration tool.

Everything is an object

In the Camel world, since it is Java based, there is no concept of parsers or data modelling. This was a big conceptual change for me. Instead, it is going back to basics and understanding that everything is an object. This includes a single row in a CSV file. That row is an object as well, and multiple rows are simply a list of objects. By objects we of course mean Java objects defined in a class. Understanding this helps you understand the mapping, as sketched below. The big advantage is that for relatively simple formats there is built-in support in Camel, such as bindy or beanio, so you can create your object model and do your mapping. However, these components lack the powerful modelling that exists in the DFDL parser, which can model any type of data format. So for very complicated data formats you may need to go back to standard Java code implemented in a Processor. There is of course an advantage to this as well: it is easy to read, it is not tied to a tool and it is cheaper. Again, this is something to consider when selecting your integration tool.
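To make this concrete, here is a minimal sketch of what "everything is an object" means in code. It assumes the Identity model class that we will build later in this post:

// A single CSV row unmarshalled by bindy arrives as one Java object
// (Identity is the model class defined later in this post).
Identity row = exchange.getIn().getBody(Identity.class);

// Multiple rows would simply arrive as a list of such objects
// (handling them is the topic of part 3).
List<?> rows = exchange.getIn().getBody(List.class);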

Project POM file

So let’s start with the actual code. As in part 1, I am basing this on the structure of letting Blueprint start my RouteBuilder and inject beans, whilst I write the main code in Java.

My POM file dependencies look as follows:

<dependency>
  <groupId>org.apache.camel</groupId>
  <artifactId>camel-core</artifactId>
  <version>2.16.1</version>
</dependency>
<dependency>
  <groupId>org.apache.camel</groupId>
  <artifactId>camel-blueprint</artifactId>
  <version>2.16.1</version>
</dependency>
<dependency>
  <groupId>org.apache.camel</groupId>
  <artifactId>camel-bindy</artifactId>
  <version>2.16.1</version>
</dependency>
<dependency>
  <groupId>org.apache.camel</groupId>
  <artifactId>camel-jackson</artifactId>
  <version>2.16.1</version>
</dependency>

The two new dependencies are camel-bindy for modelling the CSV data and camel-jackson for creating the JSON structure.

Blueprint file

My blueprint.xml is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<blueprint xmlns="http://www.osgi.org/xmlns/blueprint/v1.0.0"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="
           http://www.osgi.org/xmlns/blueprint/v1.0.0 http://www.osgi.org/xmlns/blueprint/v1.0.0/blueprint.xsd
           http://camel.apache.org/schema/blueprint http://camel.apache.org/schema/blueprint/camel-blueprint.xsd">

  <bean id="CsvToJsonRouteBuilder" class="org.souciance.integration.CsvToJson.CSVToJsonRouteBuilder">
  </bean>
  <bean id="IdentityToJson" class="org.souciance.integration.CsvToJson.IdentityToJson">
  </bean>

  <camelContext xmlns="http://camel.apache.org/schema/blueprint">
    <routeBuilder ref="CsvToJsonRouteBuilder" />
  </camelContext>
</blueprint>

The main thing to notice is that we use the routeBuilder tag to reference our CsvToJsonRouteBuilder class, and that we register the bean IdentityToJson, which refers to the class IdentityToJson. This class is where we map to the JSON structure.

RouteBuilder class

The CsvToJsonRouteBuilder class looks like this:

package org.souciance.integration.CsvToJson;

import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.dataformat.bindy.csv.BindyCsvDataFormat;

/**
 * A RouteBuilder which defines the CSV-to-JSON route
 */
public class CsvToJsonRouteBuilder extends RouteBuilder {

	@Override
	public void configure() throws Exception {
		// create a bindy CSV data format based on our Identity model class
		BindyCsvDataFormat bindy = new BindyCsvDataFormat(org.souciance.integration.CsvToJson.Identity.class);
		from("file:C:/test/?fileName=input.csv")
		.unmarshal(bindy)
		.to("bean:IdentityToJson")
		.to("file:C:/test/?fileName=output.json")
		.log("done!")
		.end();
	}
}

A couple of things to notice here.

Firstly, we create a new bindy data format which is CSV based, referring to our Identity class. Then we pick up the file, called input.csv, using the file component.

Secondly, in the DSL we do an unmarshal(bindy), which means we want to map the incoming data to the structure defined in the Identity class.

Thirdly, we send the Identity object to the bean endpoint "bean:IdentityToJson", which refers to the bean defined in blueprint.xml, to map to the JSON structure.

Finally, we save it as a file called output.json and log "done!".

That is all that is needed.
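As a side note, since camel-jackson is already on the classpath, an alternative would be to skip the hand-written Processor and let Camel marshal the POJO to JSON directly in the route. A minimal sketch of that variant (same endpoints, with the bean step replaced by a marshal step):

// Alternative sketch: let camel-jackson serialize the Identity POJO,
// instead of building the JSON by hand in a Processor.
from("file:C:/test/?fileName=input.csv")
.unmarshal(bindy)
.marshal().json(org.apache.camel.model.dataformat.JsonLibrary.Jackson)
.to("file:C:/test/?fileName=output.json");

In this post we do the mapping by hand in a Processor instead, because it shows the mechanics explicitly and gives full control over the output field names.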

Data model class: Identity

Let us look at the Identity model class where bindy is used.

package org.souciance.integration.CsvToJson;

import org.apache.camel.dataformat.bindy.annotation.CsvRecord;
import org.apache.camel.dataformat.bindy.annotation.DataField;

@CsvRecord(separator = ",")
public class Identity {

	@DataField(pos=1)
	private int identity;
	@DataField(pos=2)
	private String firstname;
	@DataField(pos=3)
	private String lastname;
	@DataField(pos=4)
	private int phone;
	@DataField(pos=5)
	private String country;
	public int getIdentity() {
		return identity;
	}
	public void setIdentity(int identity) {
		this.identity = identity;
	}
	public String getFirstname() {
		return firstname;
	}
	public void setFirstname(String firstname) {
		this.firstname = firstname;
	}
	public String getLastname() {
		return lastname;
	}
	public void setLastname(String lastname) {
		this.lastname = lastname;
	}
	public int getPhone() {
		return phone;
	}
	public void setPhone(int phone) {
		this.phone = phone;
	}
	public String getCountry() {
		return country;
	}
	public void setCountry(String country) {
		this.country = country;
	}

}

This is a very simple way of modelling data in Camel using the bindy component.

Step 1) you use the annotation @CsvRecord(separator = ",") to say that this is a CSV model and that the separator delimiting each field is a comma. This needs to be done above the class declaration.

Step 2) you write down the fields of the CSV row. You annotate each field with @DataField(pos=X), where pos is the order in which the field appears in the actual row.

Step 3) you add the getters and setters for each field.

That is all that is required by bindy for our simple row of data. For more on bindy and all the annotations you can use, see the Camel bindy documentation.
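Just to give a flavour of what else bindy can do, here is a hypothetical variant of a model class using a few more of its annotation attributes. None of these are needed for our single, simple row, and you should verify the exact attributes against the bindy documentation for your Camel version:

// Hypothetical model showing a few more bindy options.
@CsvRecord(separator = ",", skipFirstLine = true) // ignore a header row
public class IdentityWithHeader {

	@DataField(pos = 1, trim = true) // strip surrounding whitespace
	private String firstname;

	@DataField(pos = 2, pattern = "dd-MM-yyyy") // parse this field as a date
	private java.util.Date birthdate;

	// getters and setters omitted for brevity
}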

Mapping to JSON

Finally, we have the IdentityToJson class, where we map the Identity object to JSON format. Here is what that class looks like:

package org.souciance.integration.CsvToJson;

import java.io.ByteArrayOutputStream;

import org.apache.camel.Exchange;
import org.apache.camel.Processor;

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.JsonNodeFactory;
import com.fasterxml.jackson.databind.node.ObjectNode;

public class IdentityToJson implements Processor {

	@Override
	public void process(Exchange exchange) throws Exception {
		// initialize Jackson
		JsonNodeFactory factory = new JsonNodeFactory(false);
		JsonFactory jsonFactory = new JsonFactory();
		ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
		JsonGenerator generator = jsonFactory.createGenerator(outputStream);
		ObjectMapper mapper = new ObjectMapper();

		// get the POJO with the CSV data
		Identity identity = exchange.getIn().getBody(Identity.class);

		// map to JSON
		ObjectNode id = factory.objectNode();
		id.put("identity", identity.getIdentity());
		id.put("firstname", identity.getFirstname());
		id.put("lastname", identity.getLastname());
		id.put("phone", identity.getPhone());
		id.put("country", identity.getCountry());

		// write the JSON string to the exchange
		mapper.writeTree(generator, id);
		String json = outputStream.toString();
		exchange.getIn().setBody(json);
	}
}

The main aspects are related to the Jackson code. The first part is the Jackson initialization, where we create a JsonNodeFactory, a JsonFactory and a ByteArrayOutputStream, because we want to write the data to the stream, which will be transformed to a string at the end.

In the second part we convert the body of the exchange to an instance of Identity. This is the crucial part, because now we have access to the row data and can do our mapping.

The next part creates a node and starts putting values into the JSON structure.

Finally, we write the JSON string to the exchange.
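As an aside, for a small, flat structure like this, Jackson can also produce the string in one call, without the explicit generator and output stream. A minimal sketch of that shorter variant:

// Shorter sketch: serialize the ObjectNode straight to a string.
ObjectMapper mapper = new ObjectMapper();
ObjectNode id = mapper.createObjectNode();
id.put("identity", identity.getIdentity());
// ...the remaining fields as above...
String json = mapper.writeValueAsString(id);
exchange.getIn().setBody(json);

The generator and stream based approach in the class above does the same thing, but scales better if you ever need to stream larger documents.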

Input and Output

When you run this with input.csv containing the row:
12345,souciance,eqdam rashti,012458478,Sweden

your output.json should be
{
"identity":12345,
"firstname":"souciance",
"lastname":"eqdam rashti",
"phone":12458478,
"country":"Sweden"
}

Note that the leading zero of the phone number is lost: since phone is modelled as an int in the Identity class, the value 012458478 is parsed as the number 12458478.

Feel free to leave comments. In part 3 we will go through mapping multiple rows of data. Stay tuned!
