CSV Database
1 Connecting to CSV Data Sources
The connector for CSV allows connecting to local and remote CSV resources. Set the URI property to the CSV resource location, in addition to any other properties necessary to connect to your data source.
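As a rough illustration of how several properties can be combined into one connection string, the sketch below joins property names and values as `Name=Value;` pairs. The semicolon-separated layout, the helper name `build_connection_string`, and the sample folder path are assumptions made for this example; check the product's own examples for the exact syntax it accepts.

```python
def build_connection_string(properties):
    """Join property names and values into 'Name=Value;' pairs.

    Assumption: the connector accepts semicolon-separated Name=Value pairs.
    """
    parts = []
    for name, value in properties.items():
        text = str(value)
        if ";" in text:
            # A raw semicolon would be read as a separator in this simple layout.
            raise ValueError(f"value for {name} contains ';'")
        parts.append(f"{name}={text}")
    return ";".join(parts) + ";"


# Hypothetical values, for illustration only.
example = build_connection_string({"URI": "C:/data/csv", "AggregateFiles": True})
```

The helper rejects values containing a semicolon rather than guessing at an escaping rule, since this guide does not describe one.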
1.1 Connecting to Local Files
Set the URI to a folder containing CSV files.
Below is an example connection string:
|
You can also connect to multiple CSV files which share the same schema. Below is an example connection string:
|
If you would prefer to expose each individual CSV file as its own table instead, leave the AggregateFiles property set to False.
|
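To make the idea of aggregating files that share a schema more concrete, here is a minimal Python sketch that reads every CSV file in a folder as one logical table. It illustrates the concept only and is not the connector's implementation; the function name `aggregate_csv_folder` and the strict header comparison are assumptions of this example.

```python
import csv
from pathlib import Path


def aggregate_csv_folder(folder):
    """Yield rows from every *.csv file in `folder` as one logical table."""
    expected_header = None
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            if expected_header is None:
                # The first file read defines the shared schema.
                expected_header = reader.fieldnames
            elif reader.fieldnames != expected_header:
                raise ValueError(
                    f"{path.name} does not share the schema {expected_header}"
                )
            for row in reader:
                yield row
```

Files are visited in sorted name order so that the combined row order is predictable.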
1.2 Connecting to HTTP CSV Streams
Set the URI to the HTTP or HTTPS URL of the CSV resource you want to access as a table. For example:
|
To authenticate, set AuthScheme and the corresponding properties. Specify additional headers in CustomHeaders. To modify the query string, set CustomUrlParams.
To query the CSV stream, reference streamedtable as the table name.
|
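The effect of extra headers and query-string parameters on an HTTP request can be sketched in plain Python. The example below only builds the request object and does not send it. The helper name `build_csv_request` and the example.com URL are placeholders, and the connector's own handling of CustomHeaders and CustomUrlParams may differ.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import Request


def build_csv_request(url, custom_headers=None, custom_url_params=None):
    """Return a Request with extra headers and extra query parameters applied."""
    parts = urlsplit(url)
    # Keep any parameters already present in the URL, then append the new ones.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.extend((custom_url_params or {}).items())
    full_url = urlunsplit(parts._replace(query=urlencode(query)))
    return Request(full_url, headers=dict(custom_headers or {}))


# Placeholder URL and values, for illustration only.
request = build_csv_request(
    "https://example.com/data.csv?x=1",
    custom_headers={"Accept": "text/csv"},
    custom_url_params={"limit": "10"},
)
```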
1.3 Connecting to Amazon S3
Set the URI to the bucket and folder. Additionally, set the following properties to authenticate:
AWSAccessKey: Set this to an Amazon Web Services Access Key (a username).
AWSSecretKey: Set this to an Amazon Web Services Secret Key.
For example:
|
Optionally, specify AWSRegion in addition.
Note: It is also possible to connect to S3-compatible services by specifying the service's base URL in StorageBaseURL. For example, if the StorageBaseURL connection property is set to http://s3.%region%.myservice.com and Region is region-1, then the connector generates request URLs like https://s3.region-1.myservice.com/bucket/... (or like https://bucket.s3.region-1.myservice.com/..., if the UseVirtualHosting property is true).
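The URL pattern described in the note can be sketched as follows. This is an illustration of the substitution and of the two addressing styles, not the connector's actual code. The function name `s3_compatible_url` and the object key are assumptions, and the sketch simply keeps whatever scheme the base URL uses.

```python
from urllib.parse import urlsplit, urlunsplit


def s3_compatible_url(storage_base_url, region, bucket, key, use_virtual_hosting=False):
    """Build a request URL from a base URL containing a %region% placeholder."""
    base = storage_base_url.replace("%region%", region)
    parts = urlsplit(base)
    key = key.lstrip("/")
    if use_virtual_hosting:
        # Hosted style: the bucket becomes part of the host name.
        netloc, path = f"{bucket}.{parts.netloc}", f"/{key}"
    else:
        # Path style: the bucket is the first path segment.
        netloc, path = parts.netloc, f"/{bucket}/{key}"
    return urlunsplit((parts.scheme, netloc, path, "", ""))


base_url = "https://s3.%region%.myservice.com"
path_style = s3_compatible_url(base_url, "region-1", "bucket", "folder/file.csv")
hosted_style = s3_compatible_url(base_url, "region-1", "bucket", "folder/file.csv", True)
```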
1.4 Connecting to Oracle Cloud Object Storage
Set the URI to the bucket and folder. Additionally, set the following properties to authenticate:
AWSAccessKey: Set this to an Oracle cloud Access Key.
AWSSecretKey: Set this to an Oracle cloud Secret Key.
OracleNamespace: Set this to an Oracle cloud namespace.
For example:
|
Optionally, specify Region in addition.
1.5 Connecting to Wasabi
Set the URI to the bucket and folder. Additionally, set the following properties to authenticate:
AWSAccessKey: Set this to a Wasabi Access Key (a username).
AWSSecretKey: Set this to a Wasabi Secret Key.
Optionally, specify AWSRegion in addition.
For example:
|
1.6 Connecting to Azure Blob Storage
Set the URI to the name of your container and the name of the blob. Additionally, set the following properties to authenticate:
AzureStorageAccount: Set this to the account associated with the Azure blob.
AzureAccessKey: Set this to the access key associated with the Azure blob.
For example:
|
You can also use OAuth authentication to connect to Azure Blob Storage. For example:
|
If you are connecting from an Azure VM with permissions for Azure Blob Storage, you can simply set AuthScheme to AzureMSI. For example:
|
If you would like to authenticate as a service principal with a client certificate instead of a client secret, set the following properties:
InitiateOAuth: Set this to GETANDREFRESH. You can use InitiateOAuth to avoid repeating the OAuth exchange and manually setting the OAuthAccessToken.
AzureTenant: Set this to the tenant you wish to connect to.
OAuthGrantType: Set this to CLIENT.
OAuthClientId: Set this to the Client Id in your app settings.
OAuthJWTCert: Set this to the JWT Certificate store.
OAuthJWTCertType: Set this to the type of the certificate store specified by OAuthJWTCert.
For example:
|
1.7 Connecting to Azure Data Lake Store Gen 2
Set the URI to the name of the file system and the name of the folder which contains your CSV files. Additionally, set the following properties to authenticate:
AzureStorageAccount: Set this to the account associated with the Azure data lake store.
AzureAccessKey: Set this to the access key associated with the Azure data lake store.
For example:
|
You can also use OAuth authentication to connect to Azure Data Lake Store Gen 2. For example:
|
If you are connecting from an Azure VM with permissions to connect to Azure Data Lake Store Gen 2, you can simply set AuthScheme to AzureMSI. For example:
|
If you would like to authenticate as a service principal with a client certificate instead of a client secret, set the following properties:
InitiateOAuth: Set this to GETANDREFRESH. You can use InitiateOAuth to avoid repeating the OAuth exchange and manually setting the OAuthAccessToken.
AzureTenant: Set this to the tenant you wish to connect to.
OAuthGrantType: Set this to CLIENT.
OAuthClientId: Set this to the Client Id in your app settings.
OAuthJWTCert: Set this to the JWT Certificate store.
OAuthJWTCertType: Set this to the type of the certificate store specified by OAuthJWTCert.
For example:
|
1.8 Connecting to Azure File Storage
Set the URI to the name of your Azure file share and the name of the resource. Additionally, set the following properties to authenticate:
AzureStorageAccount (Required): Set this to the account associated with the Azure file.
AzureAccessKey: Set this to the access key associated with the Azure file.
AzureSharedAccessSignature: Set this to the shared access signature associated with the Azure file.
For example:
|
|
1.9 Connecting to Box
Set the URI to the path to a folder containing CSV files. To authenticate to Box, use the OAuth authentication standard. See the Box Connector for an authentication guide.
For example:
|
1.10 Connecting to Dropbox
Set the URI to the path to a folder containing CSV files. To authenticate to Dropbox, use the OAuth authentication standard. See the Dropbox Connector for an authentication guide. You can authenticate with a user account or a service account. In the user account flow, you do not need to set any connection properties for your user credentials, as shown in the connection string below:
|
1.11 Connecting to Google Drive
Set the URI to the name of the file system and the name of the folder which contains your CSV files. To access shared files, set SharedWithMe as the name of the folder which contains your CSV files. For example: URI=gdrive://SharedWithMe/remotePath. To authenticate to Google APIs, use the OAuth authentication standard. You can authorize the provider to connect to Google APIs on behalf of individual users or on behalf of a domain. See the Google Drive Connector data source for details.
For example:
|
1.12 Connecting to SharePoint Online SOAP
Set the URI to a document library containing CSV files. To authenticate, set User, Password, and StorageBaseURL.
For example:
|
Note that this connection method may not work if the StorageBaseURL ends with "-my.sharepoint.com". You should use the onedrive:// scheme when connecting to these sites because they do not support the SharePoint components that the provider needs to download files.
1.13 Connecting to SharePoint Online REST
Set the URI to a document library containing CSV files. StorageBaseURL is optional. If not provided, the driver will work with the root drive. To authenticate, use the OAuth authentication standard.
For example:
|
Note that this connection method may not work if the StorageBaseURL ends with "-my.sharepoint.com". You should use the onedrive:// scheme when connecting to these sites because they do not support the SharePoint components that the provider needs to download files.
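A small pre-flight check can catch the "-my.sharepoint.com" case before a connection is attempted. The sketch below is a hypothetical helper, not part of the connector, and the host names in the usage lines are placeholders.

```python
from urllib.parse import urlsplit


def needs_onedrive_scheme(storage_base_url):
    """Return True when the host name ends with '-my.sharepoint.com'."""
    url = storage_base_url.strip()
    if "//" not in url:
        # Let urlsplit treat a bare host name as the network location.
        url = "//" + url
    host = urlsplit(url).hostname or ""
    return host.lower().endswith("-my.sharepoint.com")


# Placeholder tenant names, for illustration only.
personal_site = needs_onedrive_scheme("https://contoso-my.sharepoint.com/personal/someone")
team_site = needs_onedrive_scheme("https://contoso.sharepoint.com")
```

The check looks at the host name only, so a trailing path in the URL does not affect the result.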
1.14 Connecting to FTP
Set the URI to the address of the server followed by the path to the folder to be used as the root folder. To authenticate, set User and Password.
For example:
|
1.15 Connecting to Google Cloud Storage
Set the URI to the name of the file system and the name of the folder which contains your CSV files. To authenticate to Google APIs, provide a ProjectId.
For example:
|
2 Securing CSV Connections
By default, the connector attempts to negotiate SSL/TLS by checking the server's certificate against the system's trusted certificate store. To specify another certificate, see the SSLServerCert property for the available formats.
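For readers who want to see the same idea outside the connector, Python's standard `ssl` module follows a comparable pattern: the default context verifies the server certificate against the system trust store, and a specific CA file can be supplied instead. This is an analogy only. It does not show how SSLServerCert is implemented, and the helper names are assumptions of this example.

```python
import ssl


def default_tls_context():
    """Context that verifies server certificates against the system trust store."""
    return ssl.create_default_context()


def custom_ca_tls_context(ca_file):
    """Context that trusts only the certificates found in `ca_file` (PEM format)."""
    # When cafile is given, the system default certificates are not loaded.
    return ssl.create_default_context(cafile=ca_file)


context = default_tls_context()
```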
The following are the connection properties for CSVDatabase. Not all properties are required. Enter only property values pertaining to your installation. Several properties will be automatically initialized with the appRules defaults.
Property | Description |
Authentication | |
AuthScheme | The type of authentication to use when connecting to remote services. |
AWSAccessKey | Your AWS account access key. This value is accessible from your AWS security credentials page. |
AWSRegion | The hosting region for your Amazon Web Services. |
AWSRoleARN | The Amazon Resource Name of the role to use when authenticating. |
AWSSecretKey | Your AWS account secret key. This value is accessible from your AWS security credentials page. |
Password | The password used to authenticate the user. |
URL | The URL of the cloud storage service provider. |
User | The CSV user account used to authenticate. |
AWS Authentication | |
MFASerialNumber | The serial number of the MFA device if one is being used. |
MFAToken | The temporary token available from your MFA device. |
Azure Authentication | |
AzureAccessKey | The storage key associated with your Azure Blob storage account. |
AzureAccount | The name of your Azure Blob storage account. |
AzureSharedAccessSignature | A shared access key signature that may be used for authentication. |
AzureTenant | The Microsoft Online tenant being used to access data. If not specified, your default tenant will be used. |
Caching | |
CacheTolerance | The tolerance, in seconds, for stale data in the cache when using AutoCache. |
Connection | |
OracleNamespace | The Oracle Cloud Object Storage namespace to use. |
Region | The hosting region for your S3-like Web Services. |
URI | This property specifies a URI for the CSV resource location. |
UseVirtualHosting | If true (default), buckets will be referenced in the request using the hosted-style request: http://yourbucket.s3.amazonaws.com/yourobject. If set to false, the connector will use the path-style request: http://s3.amazonaws.com/yourbucket/yourobject. Note that this property will be set to false for an S3-based custom service when the CustomURL is specified. |
Data | |
IgnoreBlankRows | Indicates whether to skip the empty rows. |
NullValues | A comma-separated list of values that will be replaced with nulls if they are found in the CSV file. |
PushEmptyValuesAsNull | Indicates whether to read the empty values as empty or as null. |
Firewall | |
FirewallPassword | A password used to authenticate to a proxy-based firewall. |
FirewallPort | The TCP port for a proxy-based firewall. |
FirewallServer | The name or IP address of a proxy-based firewall. |
FirewallType | The protocol used by a proxy-based firewall. |
FirewallUser | The user name to use to authenticate with a proxy-based firewall. |
JWTOAuth | |
OAuthJWTCert | The JWT Certificate store. |
OAuthJWTCertPassword | The password for the OAuth JWT certificate. |
OAuthJWTCertSubject | The subject of the OAuth JWT certificate. |
OAuthJWTCertType | The type of key store containing the JWT Certificate. |
OAuthJWTIssuer | The issuer of the JSON Web Token. |
OAuthJWTSubject | The user subject for which the application is requesting delegated access. |
Kerberos | |
KerberosKDC | The Kerberos Key Distribution Center (KDC) service used to authenticate the user. |
KerberosKeytabFile | The Keytab file containing your pairs of Kerberos principals and encrypted keys. |
KerberosRealm | The Kerberos Realm used to authenticate the user with. |
KerberosServiceKDC | The Kerberos KDC of the service. |
KerberosServiceRealm | The Kerberos realm of the service. |
KerberosSPN | The service principal name (SPN) for the Kerberos Domain Controller. |
KerberosTicketCache | The full file path to an MIT Kerberos credential cache file. |
Logging | |
Logfile | A filepath which designates the name and location of the log file. |
LogModules | Core modules to be included in the log file. |
MaxLogFileCount | A string specifying the maximum file count of log files. When the limit is hit, a new log is created in the same folder with the date and time appended to the end and the oldest log file will be deleted. |
MaxLogFileSize | A string specifying the maximum size in bytes for a log file (for example, 10 MB). When the limit is hit, a new log is created in the same folder with the date and time appended to the end. |
Verbosity | The verbosity level that determines the amount of detail included in the log file. |
Misc | |
AggregateFiles | When set to true, the provider will aggregate all of the files located in the URI directory into a single table called AggregatedFiles. |
AzureEnvironment | The Azure Environment to use when establishing a connection. |
ConnectionLifeTime | The maximum lifetime of a connection in seconds. Once the time has elapsed, the connection object is disposed. |
ConnectionString | *** |
Culture | This setting can be used to specify culture settings that determine how the provider interprets certain data types that are passed into the provider. For example, setting Culture='de-DE' will output German formats even on an American machine. |
CustomHeaders | Other headers as determined by the user (optional). |
CustomUrlParams | The custom query string to be included in the request. |
DirectoryRetrievalDepth | Limit the subfolders recursively scanned when IncludeSubdirectories is enabled. |
ExcludeFileExtensions | Set to true if file extensions should be excluded from table names. |
ExtendedProperties | The Microsoft Jet OLE DB 4.0-compatible extended properties for text files. |
FMT | The format to be used to parse all text files. |
GenerateHiveDDL | Specifies a directory in which the provider will store the DDL statements required to query the data generated by INSERT queries. This is only valid for the S3 target. |
GenerateSchemaFiles | Indicates the user preference as to when schemas should be generated and saved. |
HDR | Whether to get column names from the first line of the specified files. |
IncludeColumnHeaders | Whether to get column names from the first line of the specified files. |
IncludeFiles | Comma-separated list of file extensions to include into the set of the files modeled as tables. |
IncludeSubdirectories | Whether to read files from nested folders. In the case of a name collision, table names are prefixed by the underscore-separated folder names. |
MaxRows | Limits the number of rows returned when no aggregation or group by is used in the query. This helps avoid performance issues at design time. |
MetadataDiscoveryURI | Used together with AggregateFiles, this property specifies the file from which to read the schema of the AggregatedFiles result set. |
Other | These hidden properties are used only in specific use cases. |
PoolIdleTimeout | The allowed idle time for a connection before it is closed. |
PoolMaxSize | The maximum connections in the pool. |
PoolMinSize | The minimum number of connections in the pool. |
PoolWaitTime | The max seconds to wait for an available connection. |
ProjectId | The id of the project where your Google Cloud Storage instance resides. |
PseudoColumns | This property indicates whether or not to include pseudo columns as columns in the table. |
QuoteCharacter | Determines the character which will be used to quote values. |
QuoteEscapeCharacter | Determines the character which will be used to escape quotes. |
Readonly | You can use this property to enforce read-only access to CSV from the provider. |
RowDelimiter | The character which will be used to detect the end of a CSV row. |
RowScanDepth | The number of rows to scan when dynamically determining columns for the table. |
SharepointURL | The URL required for the Sharepoint cloud storage service provider. |
SimpleUploadLimit | This setting specifies the threshold, in bytes, above which the provider will choose to perform a multipart upload rather than uploading everything in one request. |
SkipHeaderComments | If set to true, skips rows at the top of the file beginning with #. |
SkipTop | Skips the specified number of rows, starting from the top. |
SSLServerCert | The certificate to be accepted from the server when connecting using TLS/SSL. |
SupportEnhancedSQL | This property enhances SQL functionality beyond what can be supported through the API directly, by enabling in-memory client-side processing. |
Timeout | The value in seconds until the timeout error is thrown, canceling the operation. |
TrimSpaces | Set to True if you want the provider to trim leading and trailing spaces in a cell containing a quoted value. |
TruncateOnInserts | Set to True if you want the provider to truncate on every (batch) insert. |
TypeDetectionScheme | Specifies how the provider determines the data types of columns. |
UseConnectionPooling | This property enables connection pooling. |
UseRowNumbers | Set this to true if you are deleting or updating in CSV and you do not want to specify a custom schema. This will create a new column with the name RowNumber which will be used as key for that table. |
UseTempFile | Set to True if you want to use temp files when inserting in a CSV file. |
OAuth | |
AuthKey | The authentication secret used to request and obtain the OAuth Access Token. |
AuthToken | The authentication token used to request and obtain the OAuth Access Token. |
CallbackURL | The OAuth callback URL to return to when authenticating. This value must match the callback URL you specify in your app settings. |
InitiateOAuth | Set this property to initiate the process to obtain or refresh the OAuth access token when you connect. |
OAuthAccessToken | The access token for connecting using OAuth. |
OAuthAccessTokenSecret | The OAuth access token secret for connecting using OAuth. |
OAuthAccessTokenURL | The URL to retrieve the OAuth access token from. |
OAuthAuthorizationURL | The authorization URL for the OAuth service. |
OAuthClientId | The client ID assigned when you register your application with an OAuth authorization server. |
OAuthClientSecret | The client secret assigned when you register your application with an OAuth authorization server. |
OAuthExpiresIn | The lifetime in seconds of the OAuth AccessToken. |
OAuthGrantType | The grant type for the OAuth flow. |
OAuthParams | A comma-separated list of other parameters to submit in the request for the OAuth access token in the format paramname=value. |
OAuthRefreshToken | The OAuth refresh token for the corresponding OAuth access token. |
OAuthRefreshTokenURL | The URL to refresh the OAuth token from. |
OAuthRequestTokenURL | The URL the service provides to retrieve request tokens from. This is required in OAuth 1.0. |
OAuthSettingsLocation | The location of the settings file where OAuth values are saved when InitiateOAuth is set to GETANDREFRESH or REFRESH. Alternatively, this can be held in memory by specifying a value starting with memory://. |
OAuthTokenTimestamp | The Unix epoch timestamp in milliseconds when the current Access Token was created. |
OAuthVerifier | The verifier code returned from the OAuth authorization URL. |
OAuthVersion | The version of OAuth being used. |
Proxy | |
ProxyAuthScheme | The authentication type to use to authenticate to the ProxyServer proxy. |
ProxyAutoDetect | This indicates whether to use the system proxy settings or not. This takes precedence over other proxy settings, so you'll need to set ProxyAutoDetect to FALSE in order to use custom proxy settings. |
ProxyExceptions | A semicolon-separated list of destination hostnames or IPs that are exempt from connecting through the ProxyServer. |
ProxyPassword | A password to be used to authenticate to the ProxyServer proxy. |
ProxyPort | The TCP port the ProxyServer proxy is running on. |
ProxyServer | The hostname or IP address of a proxy to route HTTP traffic through. |
ProxySSLType | The SSL type to use when connecting to the ProxyServer proxy. |
ProxyUser | A user name to be used to authenticate to the ProxyServer proxy. |
Schema | |
SchemaIniLocation | A path to the directory that contains the schema.ini file. |
SFTP | |
SSLMode | The authentication mechanism to be used when connecting to the FTP or SFTP server. |
SSH | |
SSHAuthMode | The authentication method to be used to log on to an SFTP server. |
SSHClientCert | A certificate to be used for authenticating the user. |
SSHClientCertPassword | The password of the SSHClientCert certificate if it has one. |
SSHClientCertType | The type of SSHClientCert certificate. |
SSL | |
SSLClientCert | The TLS/SSL client certificate store for SSL Client Authentication (2-way SSL). |
SSLClientCertPassword | The password for the TLS/SSL client certificate. |
SSLClientCertSubject | The subject of the TLS/SSL client certificate. |
SSLClientCertType | The type of key store containing the TLS/SSL client certificate. |
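Several of the parsing properties in the table above (SkipHeaderComments, SkipTop, NullValues, IgnoreBlankRows) are easier to picture with a small example. The Python sketch below imitates those options on in-memory text. It is an approximation for illustration, not the connector's parser. The function name, the parameter names, and the order in which the options are applied (comment lines first, then SkipTop, then the header row) are all assumptions.

```python
import csv
import io


def read_csv_text(text, skip_header_comments=False, skip_top=0,
                  null_values=(), ignore_blank_rows=True):
    """Parse CSV text into a list of dicts, imitating a few parsing options."""
    lines = text.splitlines()
    if skip_header_comments:
        # Drop leading lines that begin with '#'.
        while lines and lines[0].startswith("#"):
            lines.pop(0)
    lines = lines[skip_top:]  # Skip a fixed number of rows from the top.
    reader = csv.reader(io.StringIO("\n".join(lines)))
    header = next(reader, None)  # First remaining row supplies column names.
    if header is None:
        return []
    rows = []
    for record in reader:
        if ignore_blank_rows and not any(cell.strip() for cell in record):
            continue
        # Cells matching a configured null marker become None.
        rows.append({name: (None if cell in null_values else cell)
                     for name, cell in zip(header, record)})
    return rows


sample = "# exported\n# v2\nid,name\n1,Ann\n\n2,NULL\n"
parsed = read_csv_text(sample, skip_header_comments=True, null_values=("NULL",))
```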