Convert Flat File to XML

Previous tutorials explained how to use XQuery to deal with EDI messages, for example, how to transform a directory of EDI messages, create EDI messages out of your database, converting tab delimited files into EDI, etc. But sometimes you're not dealing with EDI; companies still frequently rely on proprietary flat file formats to exchange or circulate information.

A Flat File to XML Example

Consider the following example of a user who needed to convert a file in a proprietary flat file format into a specific XML structure. The sample file looks like this:

10       0123456789MR JOHN            SMITH
20       0123456789ACCT1
30       0123456789SALARY       500000
30       0123456789BONUS          1000
20       0123456789ACCT2
30       0123456789OTHER           100
10       1234566790MR DAVID           DOE
20       1234567890ACCT1
30       1234567890SALARY        10000

This isn't EDI, CSV, tab delimited or any of the other standard or semi-standard ways to represent information in text files; it's a proprietary format based on a specific interpretation of the size and position of the fields. The following is the description of records and fields in the proprietary flat file format:

one block is made of:
  one record 10 (customer)
  1 to N record 20 per record 10 (accounts)
  1 to N record 30 per record 20 (transactions)

record 10
  record type on 10 characters
  customer id on 10 characters
  customer lastname on 20 characters
  customer firstname on 30 characters

record 20
  record type on 10 characters
  customer id on 10 characters
  account id on 10 characters

record 30
  record type on 10 characters
  customer id on 10 characters
  label on 10 characters
  value on 10 characters

How can you deal with such a flat file, without building our own parser? DataDirect XML Converters and Stylus Studio can definitely help! XML Converters support "custom conversions"; a custom conversion defines how files belonging to the same format family should be converted to XML by XML Converters. As the definition of a custom conversion can be a tedious task, Stylus Studio provides an editor that allows you to create custom conversion through an intuitive graphical interface. The easiest way to explain how that would work is to watch short online video tutorial which describes how to convert a flat file to XML using Stylus Studio.

Thanks to the custom conversion created using Stylus Studio, we are now able to manipulate the proprietary format file as XML; but, as mentioned above, the user needs to convert the file into a specific XML format. Using the example above, the desired XML result should look like this:

 <root>
<record_10>
<type>10</type>
<customer_id>0123456789</customer_id>
<lastname>MR JOHN</lastname>
<firstname>SMITH</firstname>
</record_10>
<record_20>
<type>20</type>
<customer_id>0123456789</customer_id>
<account_id>ACCT1</account_id>
</record_20>
<record_30>
<type>30</type>
<customer_id>0123456789</customer_id>
<label>SALARY</label>
<value>500000</value>
</record_30>
<record_30>
<type>30</type>
<customer_id>0123456789</customer_id>
<label>BONUS</label>
<value>1000</value>
</record_30>
<record_20>
<type>20</type>
<customer_id>0123456789</customer_id>
<account_id>ACCT2</account_id>
</record_20>
<record_30>
<type>30</type>
<customer_id>0123456789</customer_id>
<label>OTHER</label>
<value>100</value>
</record_30>
<record_10>
<type>10</type>
<customer_id>1234566790</customer_id>
<lastname>MR DAVID</lastname>
<firstname>DOE</firstname>
</record_10>
<record_20>
<type>20</type>
<customer_id>1234567890</customer_id>
<account_id>ACCT1</account_id>
</record_20>
<record_30>
<type>30</type>
<customer_id>1234567890</customer_id>
<label>SALARY</label>
<value>10000</value>
</record_30></root>

Now that we've converted the proprietary flat file into a specific XML stucture, the question becomes, what next?

Processing Converted Flat Files from XQuery

Once a flat file to XML custom conversion has been configured, the combination of DataDirect XQuery and XML Converters proves particularly helpful: not only we are able to move from a proprietary flat file format to well formed XML, but we also have all the power, performance and scalability of XQuery to deal with such XML to transform it (potentially, even augment it!) into the final format that we need. The following XQuery — which implicitly relies on the custom converter described above — does the trick; notice that we created a couple of functions to take care of the positional grouping which is required to generate the desired output; that makes the XQuery much easier to read and more reusable:

declare function local:getRelated20s($item) {
let $nextItem :=$item/following-sibling::*[local-name()!="record_20"and local-name()!="record_30"][1]
for $related in $item/following-sibling::*[local-name()="record_20"]
where if ($nextItem) then $related << $nextItem else true()
return $related
};
declare function local:getRelated30s($item) {
let $nextItem := $item/following-sibling::*[local-name()!="record_30"][1]
for $related in $item/following-sibling::*[local-name()="record_30"]
where if($nextItem) then $related << $nextItem else true()
return $related
};
<DataSets>
{
for $record_10 in doc('converter:file:///c:/blog1/sample.conv?sampleinput.txt')/root/record_10
return
<DataSet>
<CustomerId>
{$record_10/customer_id/text()}
</CustomerId>
<CustomerName>
{$record_10/lastname/text()}
</CustomerName>
<CustomerFirstName>
{$record_10/firstname/text()}
</CustomerFirstName>
<Accounts>
{
for $record_20 in local:getRelated20s($record_10)
return
<Account>
<AccountId>
{$record_20/account_id/text()}
</AccountId>
<Transactions> {
for $record_30 in local:getRelated30s($record_20)
return
<Transaction>
<Label>{$record_30/label/text()}</Label>
<Value>{$record_30/value/text()}</Value>
</Transaction>
} </Transactions>
</Account>
}
</Accounts>
</DataSet>
}
</DataSets>

Once again, thanks to the fact that DataDirect XQuery and XML Converters make it possible to manipulate a wide variety of data sources (even proprietary flat files!), you are able to deal with transformation (and possibly aggregation) problems from an XML point of view, shielding as much as possible those tasks from the low level details of the data sources.

How to Convert Proprietary Flat Files to XML

In conclusion, DataDirect XML Converters and/or DataDirect XQuery greatly simplifies accessing and working with any non-XML data sources directly from your Java or .NET applications. Download a free trial today!