Modeling CSV Data

The CData JDBC Driver for CSV hides the complexity of processing local and remote CSV data, from connecting over wire protocols to modeling the data as tables. However, you also have control over these layers.

Customizing Automatic Schema Discovery

The driver dynamically derives schemas from CSV based on the connection properties specified. The available connection properties give you control over many aspects of how CSV data is modeled as tables. See Connecting to CSV Data Sources for more information on configuring the connection. When working with local CSV, you can also configure the column definitions and file format with Schema.ini, the configuration used by the Microsoft Jet driver.

For more granular control over the columns reported and other aspects of modeling the data as tables, you can define your own schemas or extend the generated ones. Schemas are defined in extendable configuration files. See Generating Schema Files to save the detected schemas to configuration files, which you can then easily edit.

The following sections show how to customize schemas or write your own from scratch.

Authoring Custom Schemas

Tables and views are defined by authoring schema files in API Script. API Script is a simple configuration language that allows you to define the columns and the behavior of the table. It also has built-in operations that enable you to process CSV.

In addition to these data processing primitives, API Script is a full-featured language with constructs for conditionals, looping, etc. However, as shown by the example schema, for most table definitions you will not need to use these features.

Below is a fully functional table schema that models the Person entity in the popular Northwind sample database. It contains all the components you will need to access your data source through SQL. You can find more information on using these components in Column Definitions and SELECT Execution.

<api:script>
  <!-- See Column Definitions to define column behavior. -->
   <api:info title="CSVPersons" desc="Parse the CSV Persons feed.">
    <attr name="ID"           xs:type="int" key="true"  />
    <attr name="EmployeeID"   xs:type="int"             />
    <attr name="Name"         xs:type="string"          />
    <attr name="TotalExpense" xs:type="double"          />
    <attr name="HireDate"     xs:type="datetime"        />
    <attr name="Salary"       xs:type="int"             />
  </api:info>

  <api:set attr="uri" value="http://pathtocsvstream" /> 
  
  <!-- The GET method corresponds to SELECT. The results of processing are pushed to the schema's output. See SELECT Execution for more information. -->
  <api:script method="GET" >
    <api:call  op="csvproviderGet"/>
  </api:script>

  <!-- Not implemented -->
  <api:script method="POST">
    <api:call op="csvproviderInsert">
      <api:push/>
    </api:call>
  </api:script>

  <!-- Not implemented -->
  <api:script method="MERGE">
    <api:call op="csvproviderUpdate">
      <api:push/>
    </api:call>
  </api:script>

  <!-- Not implemented -->
  <api:script method="DELETE">
    <api:call op="csvproviderDelete">
      <api:push/>
    </api:call>
  </api:script>
</api:script>