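Data Model

The CData SSIS Components for Elasticsearch model Elasticsearch entities as relational tables, views, and stored procedures.

Tables

Table definitions are retrieved dynamically. When you connect, the component connects to Elasticsearch and retrieves the schemas, the list of tables, and the table metadata by querying the Elasticsearch REST server.

Searching with SQL describes in detail how tables are dynamically retrieved.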

Views

Views are created from Elasticsearch aliases, and their definitions are dynamically retrieved. When you connect, the component connects to Elasticsearch and retrieves the list of views and their metadata by querying the Elasticsearch REST server.

Views are treated in a similar manner to tables and thus exhibit similar behavior. There are some differences behind the scenes, though, which are a direct result of how aliases work within Elasticsearch. (Note: In the following description, 'alias', 'index', 'type', and 'field' refer to the Elasticsearch objects and not directly to anything within the component.)

Views (aliases) are tied to an index and thus span all the types within that index. Additionally, aliases can span multiple indices. Therefore, you may see an alias (view) listed multiple times under different schemas (indices). When you query the view, regardless of the schema specified, data is retrieved and returned for all indices and types associated with the corresponding alias. Thus, the generated metadata contains a column for each field within each type of each index associated with the alias.
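The way a view's columns are the union of the fields across every index and type behind an alias can be sketched as follows. This is only an illustration, not the component's implementation: the function name view_columns, the input structure {index: {type: {field: data type}}}, and the index names are all hypothetical.

```python
def view_columns(alias_mappings):
    """Return {field: data type} for every field found under an alias.

    alias_mappings is a hypothetical structure:
    {index_name: {type_name: {field_name: es_data_type}}}
    """
    columns = {}
    for index_name, types in alias_mappings.items():
        for type_name, fields in types.items():
            for field_name, es_type in fields.items():
                # Assumption: keep the first data type seen for a field name.
                columns.setdefault(field_name, es_type)
    return columns


# Two hypothetical indices that share one alias.
mappings = {
    "policies_2015": {"insured": {"name": "string", "insured_ages": "integer"}},
    "policies_2016": {"insured": {"name": "string", "premium": "double"}},
}

print(sorted(view_columns(mappings)))
# ['insured_ages', 'name', 'premium']
```

The view exposes 'premium' even though only one of the two indices defines it, which is why a view over heterogeneous indices can have many sparsely populated columns.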

Searching with SQL describes in detail how views are dynamically retrieved.

The ModifyIndexAliases stored procedure can be used to create index aliases within Elasticsearch.

In addition to the Elasticsearch aliases, an '_all' view is returned, which enables querying the _all endpoint to retrieve data for all indices in a single query. Given how many indices and documents the '_all' view could cover, certain queries against it could be very expensive. Additionally, the scan for table metadata, as governed by RowScanDepth, will be less accurate for '_all' views that cover very large or very heterogeneous indices. The schema inference described next explains how this metadata is generated.

The component automatically infers a relational schema by retrieving the mapping of the Elasticsearch type. The columns and data types are generated from the retrieved mapping.

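Detecting Arrays

Any field in Elasticsearch can be an array of values, but this is not explicitly defined in the mapping. To account for this, the component queries the data to detect whether any fields contain arrays. The number of Elasticsearch documents retrieved during this array scan is set by the RowScanDepth property.

Elasticsearch nested types are special types that denote an array of objects, and are therefore treated as such when generating the metadata.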
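A minimal sketch of such an array scan is shown below, assuming the documents have already been fetched as Python dictionaries. The function detect_array_fields and the sample documents are hypothetical and only illustrate the idea of inspecting a limited number of documents; they do not describe how the component performs the scan internally.

```python
def detect_array_fields(documents, row_scan_depth):
    """Return the names of top-level fields that hold an array in any scanned document."""
    array_fields = set()
    # Only the first row_scan_depth documents are inspected.
    for doc in documents[:row_scan_depth]:
        for field, value in doc.get("_source", {}).items():
            if isinstance(value, list):
                array_fields.add(field)
    return array_fields


# Hypothetical documents: the first one stores a single value in insured_ages.
docs = [
    {"_source": {"name": "Jane Doe", "insured_ages": 30}},
    {"_source": {"name": "John Smith", "insured_ages": [17, 43, 45],
                 "vehicles": [{"year": 2015, "make": "Dodge"}]}},
]

print(sorted(detect_array_fields(docs, 1)))  # []
print(sorted(detect_array_fields(docs, 2)))  # ['insured_ages', 'vehicles']
```

With a scan depth of 1 the array is missed because the first document happens to contain a single value, which illustrates why a shallow scan can be less accurate on heterogeneous data.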

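Detecting Columns

The columns identified during the discovery process depend on the FlattenArrays and FlattenObjects properties.

Example Data Set

As an example of how these options work, consider the following mapping (where 'insured' is the name of the table):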

{
  "insured": {
    "properties": {
      "name": { "type":"string" },
      "address": {
        "properties": {
          "street": { "type":"string" },
          "city": { "type":"string" },
          "state": { "type":"string" }
        }
      },
      "insured_ages": { "type": "integer" },
      "vehicles": {
        "type": "nested",
        "properties": {
          "year": { "type":"integer" },
          "make": { "type":"string" },
          "model": { "type":"string" },
          "body_style": { "type": "string" }
        }
      }
    }
  }
}

Also consider the following sample data for the mapping above:

{
  "_source": {
    "name": "John Smith",
    "address": {
      "street": "Main Street",
      "city": "Chapel Hill",
      "state": "NC"
    },
    "insured_ages": [ 17, 43, 45 ], 
    "vehicles": [
      {
        "year": 2015,
        "make": "Dodge",
        "model": "RAM 1500",
        "body_style": "TK"
      },
      {
        "year": 2015,
        "make": "Suzuki",
        "model": "V-Strom 650 XT",
        "body_style": "MC"
      },
      {
        "year": 2012,
        "make": "Honda",
        "model": "Accord",
        "body_style": "4D"
      }
    ]
  }
}

Using FlattenObjects

If FlattenObjects is set, all nested objects are flattened into a series of columns. The example above is represented by the following columns:

Column Name	Data Type	Sample Value
name	String	John Smith
address.street	String	Main Street
address.city	String	Chapel Hill
address.state	String	NC
insured_ages	String	[ 17, 43, 45 ]
vehicles	String	[ { "year":"2015", "make":"Dodge", ... }, { "year":"2015", "make":"Suzuki", ... }, { "year":"2012", "make":"Honda", ... } ]

If FlattenObjects is not set, the address.street, address.city, and address.state columns are not broken apart. Instead, a single address column of type String represents the entire object. Its value would be the following:

{street: "Main Street", city: "Chapel Hill", state: "NC"}

See JSON Functions for more details on working with JSON aggregates.
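The object flattening described above can be sketched as a small recursive function. This is an illustration under assumptions, not the component's code: the function name flatten_objects is hypothetical, and arrays are simply serialized to a JSON string to mirror the String columns in the table.

```python
import json


def flatten_objects(source, prefix=""):
    """Flatten nested objects into dotted column names; arrays stay as JSON text."""
    columns = {}
    for key, value in source.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # A nested object contributes one column per child field.
            columns.update(flatten_objects(value, prefix=f"{name}."))
        elif isinstance(value, list):
            # Arrays are not flattened here; they are kept as a single string.
            columns[name] = json.dumps(value)
        else:
            columns[name] = value
    return columns


doc = {
    "name": "John Smith",
    "address": {"street": "Main Street", "city": "Chapel Hill", "state": "NC"},
    "insured_ages": [17, 43, 45],
}

for column, value in flatten_objects(doc).items():
    print(column, "=", value)
```

Running this lists name, address.street, address.city, address.state, and insured_ages, matching the column names shown in the FlattenObjects table.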

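Using FlattenArrays

The FlattenArrays property can be used to flatten array values into columns of their own. This is recommended only for arrays that are expected to be short. It is better to leave unbounded arrays as they are and extract the data from them as needed using JSON Functions.

Note: Only the top-most array is flattened. Any subarrays are represented as the entire array.

The FlattenArrays property can be set to 3 to represent the arrays in the example above as follows (this example assumes FlattenObjects is not set):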


Column Name	Data Type	Sample Value
insured_ages	String	[ 17, 43, 45 ]
insured_ages.0	Integer	17
insured_ages.1	Integer	43
insured_ages.2	Integer	45
vehicles	String	[ { "year":"2015", "make":"Dodge", ... }, { "year":"2015", "make":"Suzuki", ... }, { "year":"2012", "make":"Honda", ... } ]
vehicles.0	String	{ "year":"2015", "make":"Dodge", "model":"RAM 1500", "body_style":"TK" }
vehicles.1	String	{ "year":"2015", "make":"Suzuki", "model":"V-Strom 650 XT", "body_style":"MC" }
vehicles.2	String	{ "year":"2012", "make":"Honda", "model":"Accord", "body_style":"4D" }
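The array flattening shown in this table can be approximated with the sketch below. It is a simplified illustration: flatten_arrays and its max_elements parameter are hypothetical names standing in for the FlattenArrays setting, and objects are serialized to JSON strings because FlattenObjects is assumed to be off.

```python
import json


def flatten_arrays(source, max_elements):
    """Add one column per array element (up to max_elements) next to the full array."""
    columns = {}
    for key, value in source.items():
        if isinstance(value, list):
            # The whole array remains available as a single string column.
            columns[key] = json.dumps(value)
            for i, element in enumerate(value[:max_elements]):
                if isinstance(element, (dict, list)):
                    # Objects and subarrays inside the array are kept as JSON text.
                    columns[f"{key}.{i}"] = json.dumps(element)
                else:
                    columns[f"{key}.{i}"] = element
        elif isinstance(value, dict):
            columns[key] = json.dumps(value)
        else:
            columns[key] = value
    return columns


doc = {"insured_ages": [17, 43, 45]}

print(flatten_arrays(doc, 3))
# {'insured_ages': '[17, 43, 45]', 'insured_ages.0': 17, 'insured_ages.1': 43, 'insured_ages.2': 45}
```

Elements beyond max_elements get no column of their own, which is why leaving long or unbounded arrays unflattened keeps the column list manageable.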

Using Both FlattenObjects and FlattenArrays

If FlattenObjects is set along with FlattenArrays (set to 1 for brevity), the vehicles field is represented as follows:


Column Name	Data Type	Sample Value
vehicles	String	[ { "year":"2015", "make":"Dodge", ... }, { "year":"2015", "make":"Suzuki", ... }, { "year":"2012", "make":"Honda", ... } ]
vehicles.0.year	Integer	2015
vehicles.0.make	String	Dodge
vehicles.0.model	String	RAM 1500
vehicles.0.body_style	String	TK
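How the two settings interact can be sketched with one function that takes both options. This is a rough model under assumptions, not the component's algorithm: flatten_document and its parameters are hypothetical, and only the top-most array on a path is expanded, in line with the note about subarrays.

```python
import json


def flatten_document(source, flatten_objects=True, flatten_arrays=0):
    """Flatten objects and the first flatten_arrays elements of top-most arrays."""
    columns = {}

    def visit(name, value, in_array):
        if isinstance(value, dict) and flatten_objects:
            for key, child in value.items():
                visit(f"{name}.{key}", child, in_array)
        elif isinstance(value, list) and flatten_arrays > 0 and not in_array:
            # Keep the whole array, then expand its leading elements.
            columns[name] = json.dumps(value)
            for i, element in enumerate(value[:flatten_arrays]):
                visit(f"{name}.{i}", element, True)
        elif isinstance(value, (dict, list)):
            # Anything not flattened is kept as JSON text.
            columns[name] = json.dumps(value)
        else:
            columns[name] = value

    for key, value in source.items():
        visit(key, value, False)
    return columns


doc = {
    "vehicles": [
        {"year": 2015, "make": "Dodge", "model": "RAM 1500", "body_style": "TK"},
        {"year": 2015, "make": "Suzuki", "model": "V-Strom 650 XT", "body_style": "MC"},
    ]
}

print(list(flatten_document(doc, flatten_objects=True, flatten_arrays=1)))
# ['vehicles', 'vehicles.0.year', 'vehicles.0.make', 'vehicles.0.model', 'vehicles.0.body_style']
```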


Stored Procedures

Stored procedures are function-like interfaces to Elasticsearch that can be used to perform a variety of tasks.