txt2xml

txt2xml is a simple Java library for parsing arbitrarily structured text input into well-formed XML output as SAX, DOM, JDOM, or through an OutputStream.


Packages
org.txt2xml.cli  
org.txt2xml.config  
org.txt2xml.core  
org.txt2xml.driver  
org.txt2xml.gui  

 

The project was inspired by the article "Using SAX to Read Other Formats" by Claude Duguay, published in XML Magazine in March 2002.

txt2xml is useful in integration problems in which a variety of text formats need to be handled in a uniform manner: XML is a reasonable common ground since it is well supported by APIs and tooling.

The strategy in txt2xml is to let "Processors" parse the input, writing XML as they go, and pass each matched fragment to sub-processors for further processing. A Processor can be configured to repeat across the entire input text, or to match only once before passing control to a subsequent Processor.
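The nesting idea can be sketched in plain Java. The class below is a hypothetical illustration built on java.util.regex and is not the txt2xml Processor API: the name DelimitedSketch and its methods are assumptions made for this sketch. It models only the repeating, delimiter-splitting case, in which each fragment is wrapped in an element and handed to an optional sub-processor.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; this is NOT the txt2xml Processor class.
public class DelimitedSketch {
    private final String element;      // element name wrapped around each fragment
    private final Pattern delimiter;   // regex that separates fragments
    private final DelimitedSketch sub; // optional sub-processor, may be null

    public DelimitedSketch(String element, String regex, DelimitedSketch sub) {
        this.element = element;
        this.delimiter = Pattern.compile(regex);
        this.sub = sub;
    }

    // Splits the text at every delimiter match and writes one element per fragment.
    public void process(String text, StringBuilder out) {
        for (String fragment : delimiter.split(text)) {
            out.append('<').append(element).append('>');
            if (sub != null) {
                sub.process(fragment, out); // further processing of the matched fragment
            } else {
                out.append(escape(fragment));
            }
            out.append("</").append(element).append('>');
        }
    }

    // Escapes the characters that are not allowed in XML character data.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Lines are split at newlines, then each line is split at commas.
    public static String convert(String text) {
        DelimitedSketch fields = new DelimitedSketch("field", ",\\s*", null);
        DelimitedSketch lines = new DelimitedSketch("line", "\n", fields);
        StringBuilder out = new StringBuilder("<txt2xml>");
        lines.process(text, out);
        return out.append("</txt2xml>").toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("1, 2, 3\n5, 6, 7"));
    }
}
```

The sketch builds a string for brevity; a real conversion would more plausibly emit events to a handler so that the same processing can feed different kinds of output.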

A simple configuration mechanism allows a conversion to be defined in an XML document: see the example below.

Drivers handle output of the resulting XML in a number of ways, including creating a DOM or JDOM Document, driving a SAX ContentHandler, or writing text to an OutputStream.
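To illustrate why one stream of SAX events can serve several kinds of output, here is a minimal sketch that uses only the standard JAXP classes shipped with the JDK. It does not use the txt2xml Driver classes: DriverSketch, emit, toStream and toDom are hypothetical names, and emit merely stands in for a conversion.

```java
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Result;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.ContentHandler;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical illustration only; this is NOT the txt2xml Driver API.
public class DriverSketch {

    // Stand-in for a conversion: emits <txt2xml><field>text</field></txt2xml> as SAX events.
    public static void emit(ContentHandler handler, String text) throws Exception {
        AttributesImpl none = new AttributesImpl();
        handler.startDocument();
        handler.startElement("", "txt2xml", "txt2xml", none);
        handler.startElement("", "field", "field", none);
        handler.characters(text.toCharArray(), 0, text.length());
        handler.endElement("", "field", "field");
        handler.endElement("", "txt2xml", "txt2xml");
        handler.endDocument();
    }

    // Returns a ContentHandler that forwards the SAX events it receives to the given Result.
    private static TransformerHandler handlerFor(Result result) throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.setResult(result);
        return handler;
    }

    // Serializes the events as XML text on an OutputStream.
    public static void toStream(String text, OutputStream out) throws Exception {
        emit(handlerFor(new StreamResult(out)), text);
    }

    // Builds a DOM Document from the same events.
    public static Document toDom(String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        emit(handlerFor(new DOMResult(doc)), text);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        toStream("1", System.out);
        System.out.println();
        System.out.println(toDom("1").getDocumentElement().getNodeName());
    }
}
```

Because the event producer depends only on ContentHandler, choosing a different output is a matter of supplying a different handler.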

Example

To turn a "comma separated values" file into XML, configure txt2xml as follows:

<txt2xml>

    <!-- Processor to split into lines -->
    <processor type="RegexDelimited">
        <element>line</element>
        <regex>\n</regex>

        <!-- Sub-processor to process each line by splitting at commas -->
        <processor type="RegexDelimited">
            <element>field</element>
            <regex>,\s*</regex>
        </processor>

    </processor>

</txt2xml>
Applied to the following comma-separated values text:
1, 2, 3
5, 6, 7
this configuration produces the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<txt2xml>
    <line>
        <field>1</field>
        <field>2</field>
        <field>3</field>
    </line>
    <line>
        <field>5</field>
        <field>6</field>
        <field>7</field>
    </line>
</txt2xml>



Copyright © 2002 Steve Meyfroidt. All Rights Reserved.