General Usage Guide#
dbastable intends to create and manipulate tables in SQLite databases in the same way you interact with Table from astropy package.
But, as an SQLite database can contain numerous tables, you need first to create an SQLDatabase object, which will be the entry point to the database. This database can be created in memory or on disk. If you pass a filename to the SQLDatabase constructor, the database will be created on disk. If you pass ':memory:' as filename, the database will be created in memory.
>>> from dbastable import SQLDatabase
>>> # this will create a database on disk
>>> db = SQLDatabase('mydatabase.db')
>>> # this will create a database in memory
>>> db = SQLDatabase(':memory:')
Another useful argument is the autocommit, which enable sqlite3 to commit changes automatically after each command. If you set it to False, you will need to call commit() method to commit changes to the database. While autocommit reduces the need of manual method calls, it can reduce the performance of the database if a lot of small operations are done.
Data Creation, Manipulation and Deletion#
Creating Tables#
Once you have a database, you can create a table in it. The creation is performed by the SQLDatabase.add_table method. This method takes a table name as the main argument.
>>> # this creates a table called 'table_1' in the database
>>> db.add_table('table_1')
>>> # to access the created table, you can use the '[]' operator
>>> db['table_1']
SQLTable 'table_1' in database ':memory:':(0 columns x 0 rows)
<No columns>
You can also create a table with a list of columns and/or data. The columns can be created as a list of strings, which are the column names. The data is a list of tuples, where each tuple represents a row of data. The rows must be ordered in the same way as the columns.
>>> # this creates a table called 'table_2' with 2 columns and 2 rows
>>> db.add_table('table_with_data', columns=['name', 'birth'],
... data=[('Someone', 2001), ('No one', 2002)])
>>> db['table_with_data']
SQLTable 'table_with_data' in database ':memory:':(2 columns x 2 rows)
name birth
------- -----
Someone 2001
No one 2002
>>> # this creates a table called 'table_with_columns' with 2 columns but not data
>>> db.add_table('table_with_columns', columns=['name', 'birth'])
>>> db['table_with_columns']
SQLTable 'table_with_columns' in database ':memory:':(2 columns x 0 rows)
name birth
---- -----
Adding Columns and Data#
To add columns to your table, there is a method called SQLDatabase.add_column. The main arguments are the table name and the column name. You can also pass a data argument, which is a list of data to be added to the column. The data must be ordered in the same way as the rows of the already existing table. If no data is passed, the column will be created with None values.
>>> # column 'email' will be filled with None
>>> db.add_column('table_with_data', 'email')
>>> db.add_column('table_with_data', 'age', data=[22, 21])
>>> db['table_with_data']
SQLTable 'table_with_data' in database ':memory:':(4 columns x 2 rows)
name birth email age
------- ----- ----- ---
Someone 2001 None 22
No one 2002 None 21
You can also create new columns using __setitem__ features.
>>> # create a new column 'email' with some data
>>> db['table_with_columns']['email'] = ['test@test.org']
>>> db['table_with_columns']
SQLTable 'table_with_columns' in database ':memory:':(3 columns x 1 rows)
name birth email
---- ----- -------------
None None test@test.org
Adding Rows to the Table#
To add rows to the table, you can use the SQLDatabase.add_rows method. A table name must be passed as the first argument. The data fo the rows can be one of two types:
A
listoftuple, where each tuple represents a row of data. The rows must be ordered in the same way as the columns.>>> # this will add 2 rows to the table using a list of tuples >>> db.add_rows('table_with_data', [('Someone else', 2003, 'test@yes.no', 20), ... ('Another one', 2004, 'test@no.yes', 19)]) >>> db['table_with_data'] SQLTable 'table_with_data' in database ':memory:':(4 columns x 4 rows) name birth email age ------------ ----- ----------- --- Someone 2001 None 22 No one 2002 None 21 Someone else 2003 test@yes.no 20 Another one 2004 test@no.yes 19
A
dict, where each key represents the data of a columns. All the elements must be scalars or 1d-lists with the same length.>>> # this will add 2 rows to the table using a list of tuples >>> db.add_rows('table_with_data', {'name': ['Mr. Mr', 'Mrs. Mrs'], ... 'birth': [1990, 1991], ... 'email': [None, 'test@test.test'], ... 'age': [30, 29]}) >>> db['table_with_data'] SQLTable 'table_with_data' in database ':memory:':(4 columns x 6 rows) name birth email age ------------ ----- -------------- --- Someone 2001 None 22 No one 2002 None 21 Someone else 2003 test@yes.no 20 Another one 2004 test@no.yes 19 Mr. Mr 1990 None 30 Mrs. Mrs 1991 test@test.test 29
When adding a row with a dict, you can optionally choose to add the missing columns to the table. This is done by passing add_columns=True to the SQLDatabase.add_rows method. The missing columns will be created with None values.
>>> db.add_rows('table_with_data', {'name': 'From Future', 'birth': 3000,
... 'flying_cars': 2}, add_columns=True)
>>> db['table_with_data']
SQLTable 'table_with_data' in database ':memory:':(5 columns x 7 rows)
name birth email age flying_cars
------------ ----- -------------- ---- -----------
Someone 2001 None 22 None
No one 2002 None 21 None
Someone else 2003 test@yes.no 20 None
Another one 2004 test@no.yes 19 None
Mr. Mr 1990 None 30 None
Mrs. Mrs 1991 test@test.test 29 None
From Future 3000 None None 2
Acessing the Data#
The access of data wase designed to use as mutch as possible the [] (__getitem__) operator. But, as SQLite interacts with a very different way with the data, we developed a series of viewer class that will interface the queries in the database. These classes are SQLTable, SQLColumn and SQLRow. These three viewer classes are also returned by the SQLDatabase.get_table, SQLDatabase.get_column and SQLDatabase.get_row methods.
When working with the [] operator, it’s mandatory that the first element of the item to get is a table name, just like we did above to show the results. But you can access a column, a row or a cell with the same way.
To get a table:
>>> # get the table
>>> table = db['table_with_data']
>>> print(table)
SQLTable 'table_with_data' in database ':memory:':(5 columns x 7 rows)
...
To work with a column of a table:
>>> col = db['table_with_data', 'name']
>>> print(col)
SQLColumn 'name' in table 'table_with_data' (7 rows)
>>> print(col.values)
['Someone', 'No one', 'Someone else', 'Another one', 'Mr. Mr', 'Mrs. Mrs', 'From Future']
>>> # or, you can get it from the SQLTable directly
>>> col = table['name']
>>> print(col)
SQLColumn 'name' in table 'table_with_data' (7 rows)
>>> print(col.values)
['Someone', 'No one', 'Someone else', 'Another one', 'Mr. Mr', 'Mrs. Mrs', 'From Future']
To work with a row of a table:
>>> row = db['table_with_data', 0]
>>> print(row)
SQLRow 0 in table 'table_with_data' {'name': 'Someone', 'birth': 2001, 'email': None, 'age': 22, 'flying_cars': None}
>>> print(row.values)
('Someone', 2001, None, 22, None)
>>> # or, you can get it from the SQLTable directly
>>> row = table[3]
>>> print(row)
SQLRow 3 in table 'table_with_data' {'name': 'Another one', 'birth': 2004, 'email': 'test@no.yes', 'age': 19, 'flying_cars': None}
When the depth level of the _getitem__ reaches a cell, only the value is returned and not a full new viewer object. So, when you call for a row of a column or a column of a row, you will get the value of the cell.
>>> print(db['table_with_data', 0, 'name'])
Someone
>>> print(db['table_with_data', 'birth', 2])
2003
>>> # Also, you can operate rows as slices!
>>> print(db['table_with_data', 'name', 1:3])
['No one', 'Someone else']
Changing the Data#
In the same way, you can use the [] operator (__setitem___) to set data for cells, entire columns or entire rows. This operator, as mentioned earlier, can also be used to create new columns. It will perform the data setting by calling SQLDatabase.set_item, SQLDatabase.set_column or SQLDatabase.set_row methods.
>>> # everyone has a flying car now!
>>> db['table_with_data', 'flying_cars'] = [2, 1, 3, 4, 1, 2, 6]
>>> # Someone else was born in 1995
>>> db['table_with_data', 2] = ('Someone else', 1995, 'someone@noplace.org', 28, 1)
>>> # Mr. Mr now has an email
>>> db['table_with_data', 4, 'email'] = 'mistermister@nomail.com'
>>> print(db['table_with_data'])
SQLTable 'table_with_data' in database ':memory:':(5 columns x 7 rows)
name birth email age flying_cars
------------ ----- ----------------------- ---- -----------
Someone 2001 None 22 2
No one 2002 None 21 1
Someone else 1995 someone@noplace.org 28 1
Another one 2004 test@no.yes 19 4
Mr. Mr 1990 mistermister@nomail.com 30 1
Mrs. Mrs 1991 test@test.test 29 2
From Future 3000 None None 6
It’s not possible to change the name of a column or a table. If you need to do that, you will need to create a new column or table with the new name and copy the data from the old one. Also, it’s not possible to change all the data of a table at once. If you need to do that, you will need to drop the table and create a new one with the same name.
Deleting The Data#
You can delete (drop) a table, delete columns or rows. This is done by calling the SQLDatabase.drop_table, SQLDatabase.delete_column or SQLDatabase.delete_row methods. The del methods where not implemented due to data integrity concerns. So, by calling these methods directly you are more aware of what you are doing.
>>> # we don't need the table_with_columns anymore
>>> db.drop_table('table_with_columns')
>>> # I don't care if they have a flying car or not
>>> db.delete_column('table_with_data', 'flying_cars')
>>> # From Future returned to future
>>> db.delete_row('table_with_data', 6)
>>> print(db['table_with_data'])
SQLTable 'table_with_data' in database ':memory:':(4 columns x 6 rows)
name birth email age
------------ ----- ----------------------- ---
Someone 2001 None 22
No one 2002 None 21
Someone else 1995 someone@noplace.org 28
Another one 2004 test@no.yes 19
Mr. Mr 1990 mistermister@nomail.com 30
Mrs. Mrs 1991 test@test.test 29
Additional Features and Properties#
The SQLDatabase class has some additional features and properties that can be useful. You can interact with the table in a more SQL-like way, but without the need to write SQL commands. You can also get the list of tables in the database and the list of columns in a table, or the lenght of a table.
Properties#
The two main properties that you may use to interact with the database are SQLDatabase.table_names and SQLDatabase.column_names.
To get the list of tables in the database, you can use the SQLDatabase.table_names property. This property returns a list of strings, where each string is a table name. I the first time you call it, it will query the database to get the list of tables, and them, build a internal cache with the names. The next time you call it, it will use the cache instead of querying the database again. Also, modifying the table also leads to changes in the cache. Because of this, do not change the database externally while the db is opened here! This reduces the need of querying the database every time you need the list of tables, speeding up a lot the process.
>>> # let's add some more tables to get a list of tables
>>> db.add_table('table_2')
>>> db.add_table('table_3')
>>> # get the list of tables in the database
>>> db.table_names
['table_1', 'table_with_data', 'table_2', 'table_3']
A similar method is used to get the list of columns in a table. The SQLDatabase.column_names method receives a table name and return the list of the columns in that table. It also caches the names to speed up the process, so the same warning mentioned earlier is also valid here.
>>> # get the list of columns in a table
>>> db.column_names('table_with_data')
['name', 'birth', 'email', 'age']
SQL Methods#
There are two high level methods that you may want to use: SQLDatabase.select and SQLDatabase.count.
select#
SQLDatabase.select perform the SELECT operation inside the database. However, do not need to write the command by yourself. You pass a bunch of pre-defined arguments to the method and it will build the command for you. The main arguments are:
table: The name of the table to be selected.columns: A list of columns to be selected. If not passed, all the columns will be selected.where: Conditions to select a row or not.order: Columns to be used to sort the rows.limitandoffset: Limits the number of rows to be selected.
Only values will be returned, so do not expect a full table. The result will be a list of tuple, with each tuple being a row of data.
Look that not all features of SQL SELECT can be used. This is intentional to keep the method simple. If you need more complex queries, you need to write the commands by yourself and use the proper functions described below.
>>> # select all the columns from the table
>>> db.select('table_with_data', columns=['name', 'birth', 'age'],
... order=['birth'], limit=4, offset=1)
[('Mrs. Mrs', 1991, 29), ('Someone else', 1995, 28),
('Someone', 2001, 22), ('No one', 2002, 21)]
Where Statemens can be used. A better destription is found in the link, but here is a quick example:
>>> from dbastable import Where
>>> # select all the columns from the table where age is greater than 25
>>> db.select('table_with_data', columns=['name', 'birth', 'age'],
... where=Where('age', '>', 25), order=['age'])
[('Someone else', 1995, 28), ('Mrs. Mrs', 1991, 29), ('Mr. Mr', 1990, 30)]
count#
SQLDatabase.count perform the COUNT operation inside the database. In addition to the table name, it just receives a where argument, which is a Where Statement.
>>> # count the number of rows in the table
>>> db.count('table_with_data')
6
>>> # count the number of rows where age is greater than 25
>>> db.count('table_with_data', where=Where('age', '>', 25))
3
execute and executemany#
It you have an SQL command that you manually wrote and want to execute, ou can use the SQLDatabase.execute and SQLDatabase.executemany method. They are the lower level methods called by all other methods and wraps execute and executemany methods. The first one is used to execute a single command, while the second one is used to execute a command multiple times, with different parameters. Both accept the arguments parameter if you want to use the ? placeholder in the command and avoid SQL injection hacks. Learn more with Little Bob.
The direct use of these methods may not be needed, since we already handle almost all operations you may need. But, if you need a very specific case, like SUM columns, you can use them.
>>> # sum the age of all people in the table
>>> db.execute('SELECT SUM(age) FROM table_with_data')
[(149,)]
>>> # sum the age of all people in the table where age is greater than 25
>>> db.execute('SELECT SUM(age) FROM table_with_data WHERE age > ?',
... arguments=[25])
[(87,)]
commit#
The SQLDatabase.commit method is used to apply changes to the database. When a database is changed, sqlite3 do not change the original database as default. It save the changes to a separated diff file and only merge it when the commit command is called. It’s called automatically if autocommit is set to True. If not, you will need to call it manually.