Exploring the IBL Data Pipeline

Here we will introduce some useful DataJoint tools and concepts to help you explore the IBL data pipeline. Before proceeding make sure that you have installed the IBL python environment and set up your Datajoint credentials

A few definitions

First of all, let’s define a few basic definitions:

Table - collection of rows and columns that contain data
Schema - a collection of related tables
Module - script where schema and associated tables are defined
Pipeline - collection of schemas

Example nomenclature would be to say that we want to get data from the Subject table stored in the ibl_subjects schema which are together defined in the subject module in the IBL pipeline.

In practice, modules are often referred to as schemas and so we would refer to the subject module as the subject schema

Let’s now proceed by importing Datajoint and some schemas from the IBL pipeline

[ ]:

import datajoint as dj
dj.config['display.limit'] = 5
from ibl_pipeline import reference, subject, action, acquisition, data, behavior
from ibl_pipeline.analyses import behavior as behavior_analyses

1. Browsing schemas

The IBL pipeline contains a number of different schemas. To browse which schemas are available, we can use the dj.list_schemas() command,

[ ]:

dj.list_schemas()

Major schemas include:

Meta data from Alyx: ibl_reference, ibl_subject, ibl_action, ibl_acquisition, and ibl_data
Imported data from FlatIron: ibl_behavior and ibl_ephys
Computed analzyed results: ibl_analyses_behavior

2. Browsing tables in a schemas

We can see what tables are defined in a schema using the dj.Diagram command. For example, to see the tables defined in the subject schema we can type,

dj.Diagram(subject)

Note

For more information about the what the different colours and line types mean, please refer to this more comprehensive tutorial

We can also use the following code snippet to list the tables that are defined in a schema

[ ]:

from datajoint.user_tables import UserTable
import inspect


def list_tables(schema):
    for k in dir(schema):
        t = getattr(schema, k)
        if inspect.isclass(t) and issubclass(t, UserTable):
            print(k)

[ ]:

list_tables(subject)

3. Getting the detailed definition of a table

To find out details about a table, we can use the describe method

[ ]:

subject.Subject.describe()

4. Browsing data in tables - queries

Query all subjects

[ ]:

subject.Subject()

Restriction `&`: filtering data

Restriction: Query one subject

[ ]:

# restrict by string
subject.Subject & 'subject_nickname="ibl_witten_10"'

[ ]:

# restrict by dictionary
from uuid import UUID
subject.Subject & {'subject_uuid': UUID('00c60db3-74c3-4ee2-9df9-2c84acf84e92')}

Restriction: Query all male subjects

[ ]:

subject.Subject & {'sex': 'm'}

Restriction: Query subjects born after a date

[ ]:

subject.Subject & 'subject_birth_date > "2019-01-01"'

Restriction: Query subjects within a range of dates

[ ]:

subject.Subject & 'subject_birth_date between "2019-01-01" and "2019-04-01"'

Restriction: Query subjects on multiple attributes

[ ]:

subject.Subject & 'subject_birth_date > "2019-01-01"' & 'sex="M"'

Restriction: Query subjects restricted by other tables

[ ]:

# subjects with Behavioural sessions
subject.Subject & acquisition.Session

[ ]:

# subjects without Behavioural sessions
subject.Subject - acquisition.Session

Join `*`: gather information from different tables

[ ]:

subject.Subject * acquisition.Session

Projection `.proj()`: focus on attributes of interest

[ ]:

subject.Subject.proj()

[ ]:

subject.Subject.proj('subject_birth_date', 'sex')

rename attribute with proj()

[ ]:

subject.Subject.proj('sex', dob='subject_birth_date')

perform simple computations with proj

Example 1: Get the date of a session

[ ]:

sessions_with_date = acquisition.Session.proj(session_date='date(session_start_time)')

[ ]:

sessions_with_date

Example 2: Get the age of the animal at each session

[ ]:

# First get the date of birth and the session date into the same query
q = subject.Subject * acquisition.Session

[ ]:

# Then compute the age
q_with_age = q.proj(age='datediff(session_start_time, subject_birth_date)')
q_with_age

Aggregation `.aggr()`: simple computation of one table against another table

Example: How many sessions has each subject done so far?

[ ]:

subject.Subject.aggr(acquisition.Session, 'subject_nickname', n='count(*)')

5. Fetching data

Fetch all fields: `fetch()`

[ ]:

# fetch all data from a table
subjs = subject.Subject.fetch()
subjs[:5]

[ ]:

subjs['subject_uuid'][:5]

[ ]:

subjs['subject_birth_date'][:5]

[ ]:

# fetch as a list of dictionaries
subjs_dict = subject.Subject.fetch(as_dict=True)
subjs_dict[:5]

[ ]:

# fetch as pandas dataframe
subjs_df = subject.Subject.fetch(format='frame')
subjs_df[:5]

[ ]:

# fetch the primary key
pk = subject.Subject.fetch('KEY')
pk[:5]

[ ]:

# fetch specific attributes
dob, sex = subject.Subject.fetch('subject_birth_date', 'sex')
dob[:5]

[ ]:

info = subject.Subject.fetch('subject_birth_date', 'sex', as_dict=True)
info[:5]

Fetch data from only one entry: `fetch1`

[ ]:

ibl_witten_10 = (subject.Subject & {'subject_nickname': 'ibl_witten_10'}).fetch1('KEY')

[ ]:

ibl_witten_10

[ ]:

IBL_10 = (subject.Subject & {'subject_nickname': 'IBL_10'}).fetch1()

[ ]:

IBL_10