csvDB Documentation
Sections
What is csvDB?

csvDB is an SQL engine for CSV (comma separated values) and Apache web server log files, allowing files to be searched as if they were database tables. It includes both a library and a client, allowing it to be used programatically and from the command line.

Back to Menu

Basic Info
csvdb [-f <file>]  [-a <file>] [-n <alias>] [-e <query>]

If no -e query is given on the command line, the csvdb interactive shell will open.

-f <file>
    Load <file> as a table. Multiple -f arguments can be used to include multiple files

-a <file>
    Load apache log file <file> as a table. Multiple -a arguments can be used to include multiple files

-n <alias>
    Set <alias> as a name alias for the previous file loaded with -f or -a

-e <query>
    Execute <query>

Back to Menu

Examples
csvdb -f test.csv -e "DESCRIBE FILE"

    Load test.csv and describe (list) it's columns

csvdb -f test.csv -n test

    Load test.csv with alias test, and start the csvdb shell

csvdb -f test.csv -f /home/user/example.csv

    Load test.csv and /home/user/example.csv, and start the csvdb shell

csvdb

    Start the csvdb shell

Back to Menu

Queries

csvDB has a fair - but nowhere near complete - subset of SQL available.

The query documentation that follows applies to csvDB version 0.4 and above.

Back to Menu

SELECT
SELECT
    [ALL | DISTINCT]
    select_expr [, select_expr ...]
    FROM {table_reference | file_reference} [AS tbl_alias] [{ , | JOIN} {table_reference | file_reference} [AS tbl_alias]]
    [WHERE where_condition]
    [GROUP BY {col_name | expr} [, ... ]
      [HAVING where_condition]]
    [ORDER BY {col_name | expr}
      [AS {INT | STRING}] [ASC | DESC], ...]
    [LIMIT {[offset,] row_count | row_count OFFSET offset}]
    [INTO OUTFILE 'file_name']

SELECT is used to retrieve rows selected from tables list in table_reference

The most commonly used clauses of SELECT statements are:

The list of select_expr terms comprises the list of columns to retrieve. Terms specify a column or expression, or * can be used as shorthand to select all columns:

SELECT * FROM table;

table_name.* can be used to select all columns from the named table:

SELECT tbl1.* , tlb2.col1 FROM tbl1 JOIN tbl2;

The * expression is specially optimised, so is more efficient, and faster than specifying all rows individually.

A select_expr can be given an alias using AS alias_name. The alias is used as the expression's column name and can be used in GROUP BY, ORDER BY, and WHERE clauses. For example:

SELECT columna AS c FROM mytable ORDER BY c;

The AS keyword is required when aliasing a select_expr with an identifier, therefore this will cause an error:

SELECT columna c FROM mytable ORDER BY c;

Column aliases on dynamic values (such as COUNT()) cannot be used in a WHERE clause.

Columns can be either a column name, a dynamic value (such as COUNT()) or a position reference using COLUMN(num) where num is a numeric value indicating the position of the column in the table.

SELECT COLUMN(1) FROM mytable;

This selects the first column from mytable.

Column names and position references can optionally be prefixed with table_name.

SELECT tbl1.COLUMN(1) , tlb2.col1 FROM tbl1 JOIN tbl2;

The DISTINCT keyword will select only rows where the column has a unique value. ALL will select all rows regardless of duplicate values, this is the default behaviour.

The table_reference or file_reference indicates the table or file from which to retrieve rows.

table_reference or file_reference can be either a filename, an alias for a table or FILE. The FILE keyword references the first table in the current session's table list, see SHOW TABLES to check the table list.

A table_reference can be given an alias using AS alias_name. The alias is used as the table's name and can be used in GROUP BY, ORDER BY, and WHERE clauses. For example:

SELECT columna FROM mytable AS t;

Columns from multiple tables can be retrieved using implicit or explicit JOINs:

SELECT * FROM table1 , table2;
SELECT * FROM table1 JOIN table2;

The ON keyword for conditional joins is not currently supported. Rows are always joined by the tables' internal keys.

LEFT, RIGHT, INNER, OUTER keywords for JOINs are currently ignored.

where_condition is used to compare a column with a value, or another column:

SELECT columna FROM mytable WHERE columnb = 'something';
SELECT columna FROM mytable WHERE columnb LIKE 'something';
SELECT columna FROM mytable WHERE columnb IS 'something';
SELECT columna FROM mytable WHERE columnb != 'something';
SELECT columna FROM mytable WHERE columnb NOT LIKE 'something';
SELECT columna FROM mytable WHERE columnc < 3;
SELECT columna FROM mytable WHERE columnc > 3;
SELECT columna FROM mytable WHERE columnc <= 3;
SELECT columna FROM mytable WHERE columnc >= 3;
SELECT columna FROM mytable WHERE columnb = columna;

The keyword NULL will only match against a value which does not exist. This typically only happens where a row in a csv file has less columns than the table.

SELECT columna FROM mytable WHERE columnb IS NULL;

The wildcard '%' can also be used to match part of a value:

SELECT columna FROM mytable WHERE columnb LIKE 'some%';

'%%' can be used to match a literal '%' character:

SELECT columna FROM mytable WHERE columnb = '100%%';

Multiple conditions can be used with AND/OR groupings:

SELECT columna FROM mytable WHERE columnb NOT LIKE 'something' AND columnc < 3 OR columnc > 6 AND columna IS NOT NULL;

This will return all results that match columnb NOT LIKE 'something' and columna IS NOT NULL and either columnc < 3 or columnc > 6

col_name or expr in ORDER BY and GROUP BY clauses can be the column's name, an alias as specified in a select_expr, or a position reference using COLUMN(num) where num is a numeric value indicating the position of the column in the table.

SELECT columna AS c FROM mytable ORDER BY c;

ORDER BY can order the values using string or integer comparison.

SELECT columna AS c FROM mytable ORDER BY c AS STRING;
SELECT columna AS c FROM mytable ORDER BY c AS INT;

Where neither is specified, STRING is the default behaviour.

The LIMIT clause can be used to constrain the number of rows returned by the SELECT statement. LIMIT takes one or two numeric arguments.

With two arguments, the first argument specifies the offset of the first row to return, and the second specifies the maximum number of rows to return.

SELECT * FROM table LIMIT 2,10;

Will return 10 rows, starting from the 3rd row. To retrieve all rows from a certain offset up to the end of the result set, you can use -1 for the second parameter.

SELECT * FROM table LIMIT 95,-1;

With one argument, the value specifies the number of rows to return from the beginning of the result set, so that offset is 0:

SELECT * FROM table LIMIT 5;
SELECT * FROM table LIMIT 0,5;

These queries are equivalent and will return the same results.

file_name should be a string containing a valid filename, in a writable directory.

Back to Queries Menu

LOAD DATA
LOAD DATA INFILE [WITHOUT NAMES | APACHE] 'file_name'
    [AS alias]
    [INTO TABLE 'outfile_name']
    [FIELDS
        [TERMINATED BY 'char']
        [[OPTIONALLY] ENCLOSED BY 'char']
        [ESCAPED BY 'char']
    ]

The LOAD DATA INFILE statement reads rows from a text file into a table. file_name must be a string containing a path and file name for an existing csv file.

LOAD DATA INFILE 'data.csv';
LOAD DATA INFILE '/home/user/example.csv';

An alias may be specified for a table using the AS keyword:

LOAD DATA INFILE 'data.csv' AS data;

The table may then be referenced as alias in queries.

By default csvDB will read the first line of the file as the column names, stripping any non-numeric characters. If WITHOUT NAMES is specified when loading all lines will be loaded as table rows, and csvDB will build a set of column names to suit.

LOAD DATA INFILE WITHOUT NAMES 'data.csv' AS data;

The APACHE keyword is used to specify that the file is an Apache web server log file, csvDB will load the file and automatically assign column names based on the file's format.

LOAD DATA INFILE APACHE /var/log/apache2/access.log

outfile_name should be a string containing a valid filename, in a writable directory, this will effectively copy the data into the new file, then load that file.

Back to Queries Menu

DROP
DROP [TEMPORARY | PERMANENT] TABLE [IF EXISTS]
    tbl_name

DROP TABLE will unload a table from the current csvDB session.

The PERMANENT keyword will also delete the table's file from the file system. The file will only be deleted from the file system if the table is successfully unloaded first.

The TEMPORARY keyword will only unload the table, without deleting it from the filesystem, this is the default behaviour if neither keyword is specified.

DROP TABLE table;

The IF EXISTS clause will prevent an error occuring if the table is not currently loaded. The table must still be loaded for PERMANENT to take effect, IF EXISTS does not effect the behaviour.

tbl_name can be either a filename, an alias for a table or FILE. The FILE keyword references the first table in the current session's table list, see SHOW TABLES to check the table list.

Back to Queries Menu

SHOW
SHOW TABLES
    [LIKE 'pattern']

SHOW TABLES lists tables loaded in the current csvDB session, and any aliases set for those tables.

The LIKE clause, if present, will result in only tables matching pattern being listed.

SHOW SETTINGS

SHOW SETTINGS lists the current session settings.

Back to Queries Menu

DESCRIBE
SHOW COLUMNS FROM  tbl_name
{DESCRIBE | DESC} tbl_name

SHOW COLUMNS FROM lists columns in a table.

DESCRIBE is a shortcut for SHOW COLUMNS FROM.

tbl_name can be either a filename, an alias for a table or FILE. The FILE keyword references the first table in the current session's table list, see SHOW TABLES to check the table list.

Back to Queries Menu

UPDATE
UPDATE [IGNORE] table_reference
    SET col_name1=expr1 [, col_name2=expr2] ...
    [WHERE where_condition]
    [ORDER BY ...]
    [LIMIT row_count]

The UPDATE statement updates columns of existing rows in the named table with new values. The SET clause indicates which columns to modify and the values they should be given. Each value can be given as an expression.

The IGNORE keyword currently has no effect.

table_reference can be either a filename, an alias for a table or FILE. The FILE keyword references the first table in the current session's table list, see SHOW TABLES to check the table list.

col_name* can be the column's name, or a position reference using COLUMN(num) where num is a numeric value indicating the position of the column in the table.

The WHERE clause, if given, specifies the conditions that identify which rows to update. With no WHERE clause, all rows are updated.

If the ORDER BY clause is specified, the rows are updated in the order that is specified.

The LIMIT clause places a limit on the number of rows that can be updated.

Back to Queries Menu

INSERT
INSERT [IGNORE]
    [INTO] tbl_name (col_name,...)
    {VALUES | VALUE} ({value},...),(...),...

INSERT inserts new rows into an existing table.

tbl_name can be either a filename, an alias for a table or FILE. The FILE keyword references the first table in the current session's table list, see SHOW TABLES to check the table list.

col_name can be the column's name, or a position reference using COLUMN(num) where num is a numeric value indicating the position of the column in the table.

Where not all column values are explicitly set, the column will be set to '' (empty string).

Back to Queries Menu

CREATE
CREATE [TEMPORARY] TABLE [IF NOT EXISTS] tbl_name
    (create_definition,...)

CREATE TABLE creates a table with the name tbl_name.

The TEMPORARY keyword currently has no effect.

tbl_name must be a unique identifier or file name, if IF NOT EXISTS is specified, and a table named tbl_name no error will be generated.

create_definition should be a column name, csvDB does not use datatypes and will ignore any datatype keywords, datatype keywords will not however cause an error.

Back to Queries Menu

DELETE
DELETE [IGNORE] FROM tbl_name
    [WHERE where_condition]
    [ORDER BY ...]
    [LIMIT row_count]

The DELETE statement deletes rows from tbl_name and returns a count of the number of deleted rows.

The IGNORE keyword currently has no effect.

tbl_name can be either a filename, an alias for a table or FILE. The FILE keyword references the first table in the current session's table list, see SHOW TABLES to check the table list.

The WHERE clause, if given, specifies the conditions that identify which rows to delete. With no WHERE clause, all rows are deleted.

If the ORDER BY clause is specified, the rows are deleted in the order that is specified.

The LIMIT clause places a limit on the number of rows that can be deleted.

Back to Queries Menu

ALTER TABLE
ALTER [IGNORE] TABLE tbl_name
  { ADD [COLUMN] col_name column_definition
        [FIRST | AFTER col_name ]
  | CHANGE [COLUMN] col_name new_col_name column_definition
        [FIRST|AFTER col_name]
        [FIRST | AFTER col_name]
  | DROP [COLUMN] col_name
  | RENAME [TO] new_tbl_name}

ALTER TABLE changes the structure of a table. For example, you can add or delete columns, or rename columns or the table itself.

tbl_name can be either a filename, an alias for a table or FILE. The FILE keyword references the first table in the current session's table list, see SHOW TABLES to check the table list.

col_name must refer to an existing column in the table.

new_col_name is the name to change the column to, new_col_name must be unique within the table.

If the IGNORE keyword is set, no error will be generated if a specified modification has previously been applied.

new_tbl_name can either be a string containing a valid filename, in a writable directory, or a new alias for the table.

If new_tbl_name is a filename (recognised by a .csv ending), the table data will be copied to the new file, the old file deleted, and the table dropped from the current session. The new file will then be loaded. Renaming with a filename is therefore functionally equivalent to:

SELECT * FROM tbl_name INTO OUTFILE new_tbl_name;
DROP PERMANENT TABLE tbl_name;
LOAD DATA INFILE new_tbl_name;

column_definition is mostly ignored, however if a DEFAULT is set when adding a column, all rows will have the new column set to this default value.

Back to Queries Menu

SET
SET setting {ON | OFF}

SET is used to change the state of a session configuration setting.

setting may be one of:

In all cases OFF is the default setting, and settings will not be kept across sessions.

Back to Queries Menu