Exploring the IBL Data Pipeline
Here we will introduce some useful DataJoint tools and concepts to help you explore the IBL data pipeline. Before proceeding make sure that you have installed the IBL python environment and set up your Datajoint credentials
A few definitions
First of all, let’s define a few basic definitions:
Table - collection of rows and columns that contain data
Schema - a collection of related tables
Module - script where schema and associated tables are defined
Pipeline - collection of schemas
Example nomenclature would be to say that we want to get data from the Subject table stored in the ibl_subjects schema which are together defined in the subject module in the IBL pipeline.
In practice, modules are often referred to as schemas and so we would refer to the subject module as the subject schema
Let’s now proceed by importing Datajoint and some schemas from the IBL pipeline
[ ]:
import datajoint as dj
dj.config['display.limit'] = 5
from ibl_pipeline import reference, subject, action, acquisition, data, behavior
from ibl_pipeline.analyses import behavior as behavior_analyses
1. Browsing schemas
The IBL pipeline contains a number of different schemas. To browse which schemas are available, we can use the dj.list_schemas() command,
[ ]:
dj.list_schemas()
Major schemas include:
ibl_reference, ibl_subject, ibl_action, ibl_acquisition, and ibl_dataibl_behavior and ibl_ephysibl_analyses_behavior2. Browsing tables in a schemas
We can see what tables are defined in a schema using the dj.Diagram command. For example, to see the tables defined in the subject schema we can type,
dj.Diagram(subject)
Note
For more information about the what the different colours and line types mean, please refer to this more comprehensive tutorial
We can also use the following code snippet to list the tables that are defined in a schema
[ ]:
from datajoint.user_tables import UserTable
import inspect
def list_tables(schema):
for k in dir(schema):
t = getattr(schema, k)
if inspect.isclass(t) and issubclass(t, UserTable):
print(k)
[ ]:
list_tables(subject)
3. Getting the detailed definition of a table
To find out details about a table, we can use the describe method
[ ]:
subject.Subject.describe()
4. Browsing data in tables - queries
Query all subjects
[ ]:
subject.Subject()
Restriction &: filtering data
Restriction: Query one subject
[ ]:
# restrict by string
subject.Subject & 'subject_nickname="ibl_witten_10"'
[ ]:
# restrict by dictionary
from uuid import UUID
subject.Subject & {'subject_uuid': UUID('00c60db3-74c3-4ee2-9df9-2c84acf84e92')}
Restriction: Query all male subjects
[ ]:
subject.Subject & {'sex': 'm'}
Restriction: Query subjects born after a date
[ ]:
subject.Subject & 'subject_birth_date > "2019-01-01"'
Restriction: Query subjects within a range of dates
[ ]:
subject.Subject & 'subject_birth_date between "2019-01-01" and "2019-04-01"'
Restriction: Query subjects on multiple attributes
[ ]:
subject.Subject & 'subject_birth_date > "2019-01-01"' & 'sex="M"'
Restriction: Query subjects restricted by other tables
[ ]:
# subjects with Behavioural sessions
subject.Subject & acquisition.Session
[ ]:
# subjects without Behavioural sessions
subject.Subject - acquisition.Session
Join *: gather information from different tables
[ ]:
subject.Subject * acquisition.Session
Projection .proj(): focus on attributes of interest
[ ]:
subject.Subject.proj()
[ ]:
subject.Subject.proj('subject_birth_date', 'sex')
rename attribute with proj()
[ ]:
subject.Subject.proj('sex', dob='subject_birth_date')
perform simple computations with proj
Example 1: Get the date of a session
[ ]:
sessions_with_date = acquisition.Session.proj(session_date='date(session_start_time)')
[ ]:
sessions_with_date
Example 2: Get the age of the animal at each session
[ ]:
# First get the date of birth and the session date into the same query
q = subject.Subject * acquisition.Session
[ ]:
# Then compute the age
q_with_age = q.proj(age='datediff(session_start_time, subject_birth_date)')
q_with_age
Aggregation .aggr(): simple computation of one table against another table
Example: How many sessions has each subject done so far?
[ ]:
subject.Subject.aggr(acquisition.Session, 'subject_nickname', n='count(*)')
5. Fetching data
Fetch all fields: fetch()
[ ]:
# fetch all data from a table
subjs = subject.Subject.fetch()
subjs[:5]
[ ]:
subjs['subject_uuid'][:5]
[ ]:
subjs['subject_birth_date'][:5]
[ ]:
# fetch as a list of dictionaries
subjs_dict = subject.Subject.fetch(as_dict=True)
subjs_dict[:5]
[ ]:
# fetch as pandas dataframe
subjs_df = subject.Subject.fetch(format='frame')
subjs_df[:5]
[ ]:
# fetch the primary key
pk = subject.Subject.fetch('KEY')
pk[:5]
[ ]:
# fetch specific attributes
dob, sex = subject.Subject.fetch('subject_birth_date', 'sex')
dob[:5]
[ ]:
info = subject.Subject.fetch('subject_birth_date', 'sex', as_dict=True)
info[:5]
Fetch data from only one entry: fetch1
[ ]:
ibl_witten_10 = (subject.Subject & {'subject_nickname': 'ibl_witten_10'}).fetch1('KEY')
[ ]:
ibl_witten_10
[ ]:
IBL_10 = (subject.Subject & {'subject_nickname': 'IBL_10'}).fetch1()
[ ]:
IBL_10