CSV Database

1 Connecting to CSV Data Sources

The CSV connector can connect to local and remote CSV resources. Set the URI property to the location of the CSV resource, along with any other properties your data source requires.
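As a minimal sketch of the connection-string format, the following Python helper joins property names and values into the semicolon-separated Name=Value; pairs used throughout this guide. The helper name build_connection_string is hypothetical and not part of the connector's API. The sketch assumes values never contain semicolons, because quoting rules for such values are not covered here.

```python
def build_connection_string(props: dict) -> str:
    """Join properties into the 'Name=Value;' form used in this guide (hypothetical helper)."""
    parts = []
    for name, value in props.items():
        if isinstance(value, bool):
            # The examples in this guide spell booleans as True/False.
            value = "True" if value else "False"
        text = str(value)
        if ";" in text:
            # Assumption: quoting/escaping of ';' is out of scope, so reject such values.
            raise ValueError(f"value for {name!r} contains ';'")
        parts.append(f"{name}={text};")
    return " ".join(parts)


print(build_connection_string({"URI": r"C:\folder", "AggregateFiles": True}))
# URI=C:\folder; AggregateFiles=True;
```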

1.1 Connecting to Local Files

Set the URI to a folder containing CSV files.

Below is an example connection string:

URI=C:\folder1;

To combine multiple CSV files that share the same schema into a single table, set AggregateFiles to True. For example:

URI=C:\folder; AggregateFiles=True;

To expose each CSV file as a separate table instead, leave AggregateFiles set to False:

URI=C:\folder; AggregateFiles=False;
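To illustrate what aggregating files with a shared schema means, here is a rough Python sketch that concatenates several CSV texts with identical header rows into one table and rejects a file whose header differs. It is not the connector's implementation; the function name aggregate_csv and the in-memory file contents are hypothetical.

```python
from __future__ import annotations

import csv
import io


def aggregate_csv(sources: dict[str, str]) -> tuple[list[str], list[list[str]]]:
    """Concatenate CSV texts that share one header row into a single table.

    `sources` maps a (hypothetical) file name to that file's CSV text.
    """
    header: list[str] | None = None
    rows: list[list[str]] = []
    for name, text in sources.items():
        reader = csv.reader(io.StringIO(text))
        file_header = next(reader, [])
        if header is None:
            header = file_header
        elif file_header != header:
            # A file with a different header does not share the schema.
            raise ValueError(f"{name} has a different header: {file_header}")
        rows.extend(reader)
    return header or [], rows


header, rows = aggregate_csv({
    "jan.csv": "id,amount\n1,10\n2,20\n",
    "feb.csv": "id,amount\n3,30\n",
})
print(header, rows)
# ['id', 'amount'] [['1', '10'], ['2', '20'], ['3', '30']]
```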

1.2 Connecting to HTTP CSV Streams

Set the URI to the HTTP or HTTPS URL of the CSV resource you want to access as a table. For example:

URI=http://www.host1.com/streamname1;

To authenticate, set AuthScheme and the corresponding properties. To send additional HTTP headers, set CustomHeaders; to modify the query string, set CustomUrlParams.

To query the CSV stream, reference streamedtable as the table name.

SELECT * FROM streamedtable
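CustomUrlParams supplies a query string for the request. Assuming it is appended after any query string already present in the URI (the exact merge behavior is not documented here), a Python sketch of the resulting request URL could look like this. The function name request_url and the parameters format and limit are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def request_url(uri: str, custom_url_params: str = "") -> str:
    """Append a CustomUrlParams-style query string to the stream URI (assumed merge rule)."""
    parts = urlsplit(uri)
    # Keep any query already in the URI, then add the custom parameters after it.
    query = "&".join(q for q in (parts.query, custom_url_params.lstrip("?&")) if q)
    return urlunsplit(parts._replace(query=query))


print(request_url("http://www.host1.com/streamname1", "format=csv&limit=100"))
# http://www.host1.com/streamname1?format=csv&limit=100
```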

1.3 Connecting to Amazon S3

Set the URI to the bucket and folder. Additionally, set the following properties to authenticate:

  • AWSAccessKey: Set this to an Amazon Web Services Access Key (a username).

  • AWSSecretKey: Set this to an Amazon Web Services Secret Key.

Optionally, also set AWSRegion. For example:

URI=s3://bucket1/folder1; AWSAccessKey=token1; AWSSecretKey=secret1; AWSRegion=OHIO;

Note: You can also connect to S3-compatible services by setting StorageBaseURL to the service's base URL. For example, if the StorageBaseURL connection property is set to https://s3.%region%.myservice.com and Region is set to region-1, the connector generates request URLs such as https://s3.region-1.myservice.com/bucket/... or, if the UseVirtualHosting property is true, https://bucket.s3.region-1.myservice.com/...
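The URL generation described in the note above can be sketched in Python as follows. The function name and the object key data.csv are hypothetical; the sketch only reproduces the %region% substitution and the choice between path-style and virtual-hosted-style URLs, not the connector's actual request logic. The default of False mirrors the note: path-style URLs unless UseVirtualHosting is true.

```python
def s3_request_url(storage_base_url: str, region: str, bucket: str, key: str,
                   use_virtual_hosting: bool = False) -> str:
    """Build an S3-compatible request URL from a StorageBaseURL template (sketch)."""
    # Replace the %region% placeholder with the configured Region value.
    base = storage_base_url.replace("%region%", region).rstrip("/")
    scheme, sep, host = base.partition("://")
    if use_virtual_hosting:
        # Virtual-hosted style: the bucket becomes a subdomain of the service host.
        return f"{scheme}{sep}{bucket}.{host}/{key}"
    # Path style: the bucket is the first segment of the path.
    return f"{scheme}{sep}{host}/{bucket}/{key}"


template = "https://s3.%region%.myservice.com"
print(s3_request_url(template, "region-1", "bucket", "data.csv"))
# https://s3.region-1.myservice.com/bucket/data.csv
print(s3_request_url(template, "region-1", "bucket", "data.csv", use_virtual_hosting=True))
# https://bucket.s3.region-1.myservice.com/data.csv
```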

1.4 Connecting to Oracle Cloud Object Storage

Set the URI to the bucket and folder. Additionally, set the following properties to authenticate:

  • AWSAccessKey: Set this to an Oracle Cloud access key.

  • AWSSecretKey: Set this to an Oracle Cloud secret key.

  • OracleNamespace: Set this to an Oracle Cloud namespace.

Optionally, also set Region. For example:

URI=os://bucket/remotePath/; AccessKey=token1; SecretKey=secret1; OracleNamespace=myNamespace; Region=us-ashburn-1;

1.5 Connecting to Wasabi

Set the URI to the bucket and folder. Additionally, set the following properties to authenticate:

  • AWSAccessKey: Set this to a Wasabi Access Key (a username).

  • AWSSecretKey: Set this to a Wasabi Secret Key.

Optionally, also set AWSRegion. For example:

URI=wasabi://bucket1/folder1; AWSAccessKey=token1; AWSSecretKey=secret1; AWSRegion=OHIO;

1.6 Connecting to Azure Blob Storage

Set the URI to the name of your container and the name of the blob. Additionally, set the following properties to authenticate:

  • AzureStorageAccount: Set this to the account associated with the Azure blob.

  • AzureAccessKey: Set this to the access key associated with the Azure blob.

For example:

URI=azureblob://mycontainer/myblob/; AzureStorageAccount=myAccount; AzureAccessKey=myKey;

You can also use OAuth authentication to connect to Azure Blob Storage. For example:

URI=azureblob://mycontainer/myblob/; AzureStorageAccount=myAccount; AuthScheme=AzureAD; InitiateOAuth=GETANDREFRESH;

If you are connecting from an Azure VM that has permissions for Azure Blob Storage, you can simply set AuthScheme to AzureMSI. For example:

URI=azureblob://mycontainer/myblob/; AzureStorageAccount=myAccount; AuthScheme=AzureMSI;

To authenticate with a service principal using a client certificate instead of a client secret, set AuthScheme to AzureServicePrincipal and set the following properties:

  • InitiateOAuth: Set this to GETANDREFRESH. You can use InitiateOAuth to avoid repeating the OAuth exchange and manually setting the OAuthAccessToken.

  • AzureTenant: Set this to the tenant you wish to connect to.

  • OAuthGrantType: Set this to CLIENT.

  • OAuthClientId: Set this to the Client Id in your app settings.

  • OAuthJWTCert: Set this to the JWT Certificate store.

  • OAuthJWTCertType: Set this to the type of the certificate store specified by OAuthJWTCert.

For example:

URI=azureblob://mycontainer/myblob/; AzureStorageAccount=myAccount; AuthScheme=AzureServicePrincipal; InitiateOAuth=GETANDREFRESH; OAuthGrantType=CLIENT; OAuthClientId=MyClientId; AzureTenant=MyAzureTenant; OAuthJWTCert=MyOAuthJWTCert; OAuthJWTCertType=PFXFile;
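Each authentication option in this section needs a different set of properties. A rough Python check, assuming the property lists in this section are complete, could parse a connection string and report what is missing. The label AccessKey (for the access-key option, which the examples use without setting AuthScheme) and all function names are hypothetical; this is not the connector's own validation.

```python
# Authentication-specific properties listed in this section for each Azure Blob option.
# Assumption: these lists are complete; "AccessKey" is a hypothetical label for the
# access-key option, which the examples use without setting AuthScheme.
REQUIRED_BY_OPTION = {
    "AccessKey": {"AzureStorageAccount", "AzureAccessKey"},
    "AzureAD": {"AzureStorageAccount", "InitiateOAuth"},
    "AzureMSI": {"AzureStorageAccount"},
    "AzureServicePrincipal": {
        "InitiateOAuth", "AzureTenant", "OAuthGrantType",
        "OAuthClientId", "OAuthJWTCert", "OAuthJWTCertType",
    },
}


def parse_connection_string(conn: str) -> dict:
    """Split 'Name=Value;' pairs; empty segments (such as ';;') are ignored."""
    props = {}
    for segment in conn.split(";"):
        name, sep, value = segment.strip().partition("=")
        if sep:
            props[name.strip()] = value.strip()
    return props


def missing_properties(conn: str) -> list:
    """Return the properties this section lists but the connection string omits."""
    props = parse_connection_string(conn)
    option = props.get("AuthScheme", "AccessKey")
    if option not in REQUIRED_BY_OPTION:
        raise ValueError(f"AuthScheme {option!r} is not covered by this sketch")
    required = {"URI"} | REQUIRED_BY_OPTION[option]
    return sorted(required - set(props))


print(missing_properties(
    "URI=azureblob://mycontainer/myblob/; AzureStorageAccount=myAccount; AuthScheme=AzureAD;"
))
# ['InitiateOAuth']
```

For instance, a service principal string that omits OAuthGrantType is reported as missing that property.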

1.7 Connecting to Azure Data Lake Store Gen 2

Set the URI to the name of the file system and the name of the folder that contains your CSV files. Additionally, set the following properties to authenticate:

  • AzureStorageAccount: Set this to the account associated with the Azure data lake store.

  • AzureAccessKey: Set this to the access key associated with the Azure data lake store.

For example:

URI=abfs://myfilesystem/folder1; AzureStorageAccount=myAccount; AzureAccessKey=myKey;

URI=abfss://myfilesystem/folder1; AzureStorageAccount=myAccount; AzureAccessKey=myKey;

You can also use OAuth authentication to connect to Azure Data Lake Store Gen 2. For example:

URI=abfss://myfilesystem/folder1; AzureStorageAccount=myAccount; AuthScheme=AzureAD; InitiateOAuth=GETANDREFRESH;

If you are connecting from an Azure VM with permissions to connect to Azure Data Lake Store Gen 2, you can simply set AuthScheme to AzureMSI. For example:

URI=abfss://myfilesystem/folder1; AzureStorageAccount=myAccount; AuthScheme=AzureMSI;

To authenticate with a service principal using a client certificate instead of a client secret, set AuthScheme to AzureServicePrincipal and set the following properties:

  • InitiateOAuth: Set this to GETANDREFRESH. You can use InitiateOAuth to avoid repeating the OAuth exchange and manually setting the OAuthAccessToken.

  • AzureTenant: Set this to the tenant you wish to connect to.

  • OAuthGrantType: Set this to CLIENT.

  • OAuthClientId: Set this to the Client Id in your app settings.

  • OAuthJWTCert: Set this to the JWT Certificate store.

  • OAuthJWTCertType: Set this to the type of the certificate store specified by OAuthJWTCert.

For example:

URI=abfss://myfilesystem/folder1; AzureStorageAccount=myAccount; AuthScheme=AzureServicePrincipal; InitiateOAuth=GETANDREFRESH; OAuthGrantType=CLIENT; OAuthClientId=MyClientId; AzureTenant=MyAzureTenant; OAuthJWTCert=MyOAuthJWTCert; OAuthJWTCertType=PFXFile;

1.8 Connecting to Azure File Storage

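Set the URI to the name of your Azure file share and the name of the resource. Additionally, set the following properties to authenticate. AzureStorageAccount is required; as the credential, set either AzureAccessKey or AzureSharedAccessSignature: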

  • AzureStorageAccount (Required): Set this to the account associated with the Azure file.

  • AzureAccessKey: Set this to the access key associated with the Azure file.

  • AzureSharedAccessSignature: Set this to the shared access signature associated with the Azure file.

For example, using an access key:

URI=azurefile://fileShare/remotePath/; AzureStorageAccount=myAccount; AzureAccessKey=myAccessKey;

Using a shared access signature:

URI=azurefile://fileShare/remotePath/; AzureStorageAccount=myAccount; AzureSharedAccessSignature=mySharedSignature;

1.9 Connecting to Box

Set the URI to the path of a folder containing CSV files. To authenticate to Box, use OAuth. See the Box Connector for an authentication guide.

For example:

URI=box://folder1; InitiateOAuth=GETANDREFRESH; OAuthClientId=oauthclientid1; OAuthClientSecret=oauthclientsecret1; CallbackUrl=http://localhost:12345;

1.10 Connecting to Dropbox

Set the URI to the path of a folder containing CSV files. To authenticate to Dropbox, use OAuth; see the Dropbox Connector for an authentication guide. You can authenticate with a user account or a service account. In the user account flow, you do not need to set any connection properties for your user credentials, as the following connection string shows:

URI=dropbox://folder1/; InitiateOAuth=GETANDREFRESH; OAuthClientId=oauthclientid1; OAuthClientSecret=oauthclientsecret1; CallbackUrl=http://localhost:12345;

1.11 Connecting to Google Drive

Set the URI to the path of the folder that contains your CSV files. To access files shared with you, use SharedWithMe as the folder name, for example URI=gdrive://SharedWithMe/remotePath. To authenticate to Google APIs, use OAuth. You can authorize the provider to connect to Google APIs on behalf of individual users or on behalf of a domain. See the Google Drive Connector for an authentication guide.

For example:

URI=gdrive://folder1; InitiateOAuth=GETANDREFRESH;

1.12 Connecting to SharePoint Online SOAP

Set the URI to a document library containing CSV files. To authenticate, set User and Password, and set StorageBaseURL to the URL of your SharePoint site.

For example:

URI=sp://Documents/folder1; User=user1; Password=password1; StorageBaseURL=https://subdomain.sharepoint.com;

Note that this connection method may not work if the StorageBaseURL ends with "-my.sharepoint.com". Use the onedrive:// scheme for these sites instead, because they do not support the SharePoint components that the provider needs to download files.

1.13 Connecting to SharePoint Online REST

Set the URI to a document library containing CSV files. StorageBaseURL is optional; if it is not provided, the provider works with the root drive. To authenticate, use OAuth.

For example:

URI=sp://Documents/folder1; InitiateOAuth=GETANDREFRESH; StorageBaseURL=https://subdomain.sharepoint.com;

Note that this connection method may not work if the StorageBaseURL ends with "-my.sharepoint.com". Use the onedrive:// scheme for these sites instead, because they do not support the SharePoint components that the provider needs to download files.
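The "-my.sharepoint.com" rule in the two SharePoint sections can be expressed as a small Python helper that picks the URI scheme from StorageBaseURL. This is a sketch that assumes only the host name matters; the function name and the contoso tenant name are hypothetical.

```python
from urllib.parse import urlsplit


def sharepoint_uri_scheme(storage_base_url: str) -> str:
    """Pick the URI scheme for a SharePoint Online site (sketch of the rule above)."""
    host = (urlsplit(storage_base_url).hostname or "").lower()
    # Sites whose host ends with "-my.sharepoint.com" should use onedrive:// instead of sp://.
    return "onedrive" if host.endswith("-my.sharepoint.com") else "sp"


print(sharepoint_uri_scheme("https://subdomain.sharepoint.com"))  # sp
print(sharepoint_uri_scheme("https://contoso-my.sharepoint.com"))  # onedrive
```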

1.14 Connecting to FTP

Set the URI to the address of the server followed by the path to the folder to be used as the root folder. To authenticate, set User and Password.

For example:

URI=ftps://localhost:990/folder1; User=user1; Password=password1;

1.15 Connecting to Google Cloud Storage

Set the URI to the bucket and folder that contain your CSV files. To authenticate to Google APIs, provide a ProjectId.

For example:

URI=gs://bucket/remotePath/; ProjectId=PROJECT_ID;

2 Securing CSV Connections

By default, the connector attempts to negotiate SSL/TLS by checking the server's certificate against the system's trusted certificate store. To specify a different certificate, set SSLServerCert; see that property for the supported formats.

3 Connection Properties

The following are the connection properties for CSVDatabase, grouped by category. Not all properties are required; set only those that apply to your installation. Several properties are automatically initialized with the appRules defaults.

Authentication

  • AuthScheme: The type of authentication to use when connecting to remote services.

  • AWSAccessKey: Your AWS account access key. This value is accessible from your AWS security credentials page.

  • AWSRegion: The hosting region for your Amazon Web Services.

  • AWSRoleARN: The Amazon Resource Name of the role to use when authenticating.

  • AWSSecretKey: Your AWS account secret key. This value is accessible from your AWS security credentials page.

  • Password: The password used to authenticate the user.

  • URL: The URL of the cloud storage service provider.

  • User: The CSV user account used to authenticate.

AWS Authentication

  • MFASerialNumber: The serial number of the MFA device if one is being used.

  • MFAToken: The temporary token available from your MFA device.

Azure Authentication

  • AzureAccessKey: The storage key associated with your Azure Blob storage account.

  • AzureAccount: The name of your Azure Blob storage account.

  • AzureSharedAccessSignature: A shared access key signature that may be used for authentication.

  • AzureTenant: The Microsoft Online tenant being used to access data. If not specified, your default tenant is used.

Caching

  • CacheTolerance: The tolerance for stale data in the cache, in seconds, when using AutoCache.

Connection

  • OracleNamespace: The Oracle Cloud Object Storage namespace to use.

  • Region: The hosting region for your S3-like Web Services.

  • URI: The URI of the CSV resource location.

  • UseVirtualHosting: If true (the default), buckets are referenced in the request using the virtual-hosted-style request: http://yourbucket.s3.amazonaws.com/yourobject. If false, the connector uses the path-style request: http://s3.amazonaws.com/yourbucket/yourobject. Note that this property is set to false for an S3-based custom service when the CustomURL is specified.

Data

  • IgnoreBlankRows: Indicates whether to skip empty rows.

  • NullValues: A comma-separated list of values that are replaced with null when they are found in the CSV file.

  • PushEmptyValuesAsNull: Indicates whether to read empty values as empty or as null.
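The three Data properties above can be illustrated with a Python sketch that applies them to parsed rows. The function name is hypothetical, the defaults in its signature are illustrative rather than the connector's defaults, and treating a row made only of empty fields as blank is an assumption.

```python
import csv
import io
from typing import List, Optional


def read_rows(text: str, null_values: str = "", push_empty_values_as_null: bool = False,
              ignore_blank_rows: bool = False) -> List[List[Optional[str]]]:
    """Apply NullValues, PushEmptyValuesAsNull and IgnoreBlankRows-style rules (sketch)."""
    # NullValues is a comma-separated list; surrounding spaces are stripped here (assumption).
    nulls = {v.strip() for v in null_values.split(",") if v.strip()}
    rows: List[List[Optional[str]]] = []
    for row in csv.reader(io.StringIO(text)):
        # Assumption: a row is blank if it has no fields or only empty fields.
        if ignore_blank_rows and all(cell == "" for cell in row):
            continue
        rows.append([
            None if cell in nulls or (push_empty_values_as_null and cell == "") else cell
            for cell in row
        ])
    return rows


sample = "id,name\n1,N/A\n\n2,\n"
print(read_rows(sample, null_values="N/A,NULL", push_empty_values_as_null=True, ignore_blank_rows=True))
# [['id', 'name'], ['1', None], ['2', None]]
```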

Firewall

  • FirewallPassword: A password used to authenticate to a proxy-based firewall.

  • FirewallPort: The TCP port for a proxy-based firewall.

  • FirewallServer: The name or IP address of a proxy-based firewall.

  • FirewallType: The protocol used by a proxy-based firewall.

  • FirewallUser: The user name to use to authenticate with a proxy-based firewall.

JWTOAuth

  • OAuthJWTCert: The JWT Certificate store.

  • OAuthJWTCertPassword: The password for the OAuth JWT certificate.

  • OAuthJWTCertSubject: The subject of the OAuth JWT certificate.

  • OAuthJWTCertType: The type of key store containing the JWT Certificate.

  • OAuthJWTIssuer: The issuer of the JSON Web Token.

  • OAuthJWTSubject: The user subject for which the application is requesting delegated access.

Kerberos

  • KerberosKDC: The Kerberos Key Distribution Center (KDC) service used to authenticate the user.

  • KerberosKeytabFile: The Keytab file containing your pairs of Kerberos principals and encrypted keys.

  • KerberosRealm: The Kerberos realm used to authenticate the user.

  • KerberosServiceKDC: The Kerberos KDC of the service.

  • KerberosServiceRealm: The Kerberos realm of the service.

  • KerberosSPN: The service principal name (SPN) for the Kerberos Domain Controller.

  • KerberosTicketCache: The full file path to an MIT Kerberos credential cache file.

Logging

  • Logfile: A file path that designates the name and location of the log file.

  • LogModules: Core modules to be included in the log file.

  • MaxLogFileCount: A string specifying the maximum number of log files. When the limit is reached, a new log file is created in the same folder with the date and time appended to its name, and the oldest log file is deleted.

  • MaxLogFileSize: A string specifying the maximum size in bytes for a log file (for example, 10 MB). When the limit is reached, a new log file is created in the same folder with the date and time appended to its name.

  • Verbosity: The verbosity level that determines the amount of detail included in the log file.

Misc

  • AggregateFiles: When set to true, the provider aggregates all of the files located in the URI directory into a single table called AggregatedFiles.

  • AzureEnvironment: The Azure Environment to use when establishing a connection.

  • ConnectionLifeTime: The maximum lifetime of a connection in seconds. Once the time has elapsed, the connection object is disposed.

  • ConnectionString

  • Culture: Specifies culture settings that determine how the provider interprets certain data types that are passed into the provider. For example, setting Culture='de-DE' outputs German formats even on an American machine.

  • CustomHeaders: User-defined headers to include in the request (optional).

  • CustomUrlParams: The custom query string to be included in the request.

  • DirectoryRetrievalDepth: Limits how many levels of subfolders are scanned recursively when IncludeSubdirectories is enabled.

  • ExcludeFileExtensions: Set to true to exclude file extensions from table names.

  • ExtendedProperties: The Microsoft Jet OLE DB 4.0-compatible extended properties for text files.

  • FMT: The format used to parse all text files.

  • GenerateHiveDDL: Specifies a directory in which the provider stores the DDL statements required to query the data generated by INSERT queries. This is valid only for the S3 target.

  • GenerateSchemaFiles: Indicates the user preference as to when schemas should be generated and saved.

  • HDR: Whether to get column names from the first line of the specified files.

  • IncludeColumnHeaders: Whether to get column names from the first line of the specified files.

  • IncludeFiles: Comma-separated list of file extensions to include in the set of files modeled as tables.

  • IncludeSubdirectories: Whether to read files from nested folders. In the case of a name collision, table names are prefixed by the underscore-separated folder names.

  • MaxRows: Limits the number of rows returned when no aggregation or GROUP BY is used in the query. This helps avoid performance issues at design time.

  • MetadataDiscoveryURI: Used together with AggregateFiles, this property specifies the file from which to read the schema of the AggregatedFiles result set.

  • Other: These hidden properties are used only in specific use cases.

  • PoolIdleTimeout: The allowed idle time for a connection before it is closed.

  • PoolMaxSize: The maximum number of connections in the pool.

  • PoolMinSize: The minimum number of connections in the pool.

  • PoolWaitTime: The maximum number of seconds to wait for an available connection.

  • ProjectId: The ID of the project where your Google Cloud Storage instance resides.

  • PseudoColumns: Indicates whether to include pseudo columns as columns in the table.

  • QuoteCharacter: The character used to quote values.

  • QuoteEscapeCharacter: The character used to escape quotes.

  • Readonly: Enforces read-only access to CSV from the provider.

  • RowDelimiter: The character used to detect the end of a CSV row.

  • RowScanDepth: The number of rows to scan when dynamically determining columns for the table.

  • SharepointURL: The URL required for the SharePoint cloud storage service provider.

  • SimpleUploadLimit: The threshold, in bytes, above which the provider performs a multipart upload rather than uploading everything in one request.

  • SkipHeaderComments: If set to true, skips rows at the top of the file that begin with #.

  • SkipTop: The number of rows to skip, starting from the top of the file.

  • SSLServerCert: The certificate to be accepted from the server when connecting using TLS/SSL.

  • SupportEnhancedSQL: Enhances SQL functionality beyond what the API supports directly by enabling in-memory client-side processing.

  • Timeout: The value in seconds until the timeout error is thrown, canceling the operation.

  • TrimSpaces: Set to True to have the provider trim leading and trailing spaces in a cell containing a quoted value.

  • TruncateOnInserts: Set to True to have the provider truncate on every (batch) insert.

  • TypeDetectionScheme: Determines how the data types of columns are detected.

  • UseConnectionPooling: Enables connection pooling.

  • UseRowNumbers: Set to true if you are deleting or updating in CSV and do not want to specify a custom schema. This creates a new column named RowNumber, which is used as the key for that table.

  • UseTempFile: Set to True to use temporary files when inserting into a CSV file.
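The table-naming rules for IncludeSubdirectories and ExcludeFileExtensions can be sketched in Python as below. The sketch assumes that only colliding names receive the folder prefix and that the prefix is joined to the file name with an underscore; the connector's exact rules may differ, and the function name and example paths are hypothetical.

```python
from collections import Counter
from pathlib import PurePosixPath


def table_names(paths, exclude_file_extensions=False):
    """Map relative file paths to table names (sketch of the documented collision rule)."""
    parsed = [PurePosixPath(p) for p in paths]

    def base_name(path):
        # ExcludeFileExtensions: drop the extension from the table name.
        return path.stem if exclude_file_extensions else path.name

    counts = Counter(base_name(p) for p in parsed)
    names = {}
    for raw, path in zip(paths, parsed):
        name = base_name(path)
        if counts[name] > 1:
            # Name collision: prefix with the folder names, separated by underscores.
            name = "_".join([*path.parent.parts, name])
        names[raw] = name
    return names


print(table_names(["sales.csv", "2023/q1/sales.csv", "orders.csv"]))
# {'sales.csv': 'sales.csv', '2023/q1/sales.csv': '2023_q1_sales.csv', 'orders.csv': 'orders.csv'}
```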

OAuth

  • AuthKey: The authentication secret used to request and obtain the OAuth Access Token.

  • AuthToken: The authentication token used to request and obtain the OAuth Access Token.

  • CallbackURL: The OAuth callback URL to return to when authenticating. This value must match the callback URL you specify in your app settings.

  • InitiateOAuth: Set this property to initiate the process to obtain or refresh the OAuth access token when you connect.

  • OAuthAccessToken: The access token for connecting using OAuth.

  • OAuthAccessTokenSecret: The OAuth access token secret for connecting using OAuth.

  • OAuthAccessTokenURL: The URL to retrieve the OAuth access token from.

  • OAuthAuthorizationURL: The authorization URL for the OAuth service.

  • OAuthClientId: The client ID assigned when you register your application with an OAuth authorization server.

  • OAuthClientSecret: The client secret assigned when you register your application with an OAuth authorization server.

  • OAuthExpiresIn: The lifetime in seconds of the OAuth AccessToken.

  • OAuthGrantType: The grant type for the OAuth flow.

  • OAuthParams: A comma-separated list of other parameters to submit in the request for the OAuth access token in the format paramname=value.

  • OAuthRefreshToken: The OAuth refresh token for the corresponding OAuth access token.

  • OAuthRefreshTokenURL: The URL to refresh the OAuth token from.

  • OAuthRequestTokenURL: The URL the service provides to retrieve request tokens from. This is required in OAuth 1.0.

  • OAuthSettingsLocation: The location of the settings file where OAuth values are saved when InitiateOAuth is set to GETANDREFRESH or REFRESH. Alternatively, this can be held in memory by specifying a value starting with memory://.

  • OAuthTokenTimestamp: The Unix epoch timestamp in milliseconds when the current Access Token was created.

  • OAuthVerifier: The verifier code returned from the OAuth authorization URL.

  • OAuthVersion: The version of OAuth being used.
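OAuthTokenTimestamp (in milliseconds) and OAuthExpiresIn (in seconds) together determine when the current access token expires. A minimal Python sketch of that check follows; the function name is hypothetical, and the 60-second skew margin is an assumption rather than a documented connector setting.

```python
import time
from typing import Optional


def access_token_expired(oauth_token_timestamp_ms: int, oauth_expires_in_s: int,
                         now_ms: Optional[int] = None, skew_s: int = 60) -> bool:
    """Return True if the token created at the timestamp has expired (or is about to)."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    # OAuthTokenTimestamp is in milliseconds; OAuthExpiresIn is in seconds.
    expires_at_ms = oauth_token_timestamp_ms + oauth_expires_in_s * 1000
    # Assumption: treat the token as expired skew_s seconds early to absorb clock skew.
    return now_ms >= expires_at_ms - skew_s * 1000


created = 1_700_000_000_000
print(access_token_expired(created, 3600, now_ms=created + 30 * 60 * 1000))  # False
print(access_token_expired(created, 3600, now_ms=created + 59 * 60 * 1000))  # True
```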

Proxy

  • ProxyAuthScheme: The authentication type to use to authenticate to the ProxyServer proxy.

  • ProxyAutoDetect: Indicates whether to use the system proxy settings. This takes precedence over other proxy settings, so set ProxyAutoDetect to FALSE in order to use custom proxy settings.

  • ProxyExceptions: A semicolon-separated list of destination hostnames or IPs that are exempt from connecting through the ProxyServer.

  • ProxyPassword: A password to be used to authenticate to the ProxyServer proxy.

  • ProxyPort: The TCP port the ProxyServer proxy is running on.

  • ProxyServer: The hostname or IP address of a proxy to route HTTP traffic through.

  • ProxySSLType: The SSL type to use when connecting to the ProxyServer proxy.

  • ProxyUser: A user name to be used to authenticate to the ProxyServer proxy.
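As a sketch of how a ProxyExceptions value might be interpreted, the following Python function splits the semicolon-separated list and checks a destination host against it. It assumes exact, case-insensitive matching with no wildcard support; the connector may match differently, and the function name and example hosts are hypothetical.

```python
def bypasses_proxy(host: str, proxy_exceptions: str) -> bool:
    """True if host appears in a semicolon-separated ProxyExceptions value (sketch)."""
    # Assumption: exact, case-insensitive matching; no wildcards or CIDR ranges.
    exempt = {entry.strip().lower() for entry in proxy_exceptions.split(";") if entry.strip()}
    return host.strip().lower() in exempt


exceptions = "intranet.example.com; 10.0.0.5"
print(bypasses_proxy("Intranet.Example.com", exceptions))  # True
print(bypasses_proxy("www.host1.com", exceptions))  # False
```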

Schema

  • SchemaIniLocation: A path to the directory that contains the schema.ini file.

SFTP

  • SSLMode: The authentication mechanism to be used when connecting to the FTP or SFTP server.

SSH

  • SSHAuthMode: The authentication method to be used to log on to an SFTP server.

  • SSHClientCert: A certificate to be used for authenticating the user.

  • SSHClientCertPassword: The password of the SSHClientCert certificate if it has one.

  • SSHClientCertType: The type of SSHClientCert certificate.

SSL

  • SSLClientCert: The TLS/SSL client certificate store for SSL Client Authentication (2-way SSL).

  • SSLClientCertPassword: The password for the TLS/SSL client certificate.

  • SSLClientCertSubject: The subject of the TLS/SSL client certificate.

  • SSLClientCertType: The type of key store containing the TLS/SSL client certificate.
