{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Exploring the IBL Data Pipeline\n", "\n", "Here we will introduce some useful DataJoint tools and concepts to help you explore the IBL data pipeline. Before proceeding make sure that you have installed the [IBL python environment](../../02_installation.md) and set up your [Datajoint credentials](../../dj_docs/dj_credentials.md)\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A few definitions\n", "First of all, let's define a few basic definitions:\n", "\n", "- Table - collection of rows and columns that contain data\n", "- Schema - a collection of related tables \n", "- Module - script where schema and associated tables are defined\n", "- Pipeline - collection of schemas\n", "\n", "Example nomenclature would be to say that we want to get data from the `Subject table` stored in the `ibl_subjects schema` which are together defined in the `subject module` in the `IBL pipeline`. \n", "\n", "In practice, `modules` are often referred to as schemas and so we would refer to the `subject module` as the `subject schema`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's now proceed by importing Datajoint and some schemas from the IBL pipeline" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import datajoint as dj\n", "dj.config['display.limit'] = 5\n", "from ibl_pipeline import reference, subject, action, acquisition, data, behavior\n", "from ibl_pipeline.analyses import behavior as behavior_analyses" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Browsing schemas\n", "The IBL pipeline contains a number of different schemas. To browse which schemas are available, we can use the `dj.list_schemas()` command," ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dj.list_schemas()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Major schemas include: \n", "Meta data from **Alyx**: `ibl_reference`, `ibl_subject`, `ibl_action`, `ibl_acquisition`, and `ibl_data` \n", "Imported data from **FlatIron**: `ibl_behavior` and `ibl_ephys` \n", "Computed analzyed results: `ibl_analyses_behavior`\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Browsing tables in a schemas\n", "We can see what tables are defined in a schema using the `dj.Diagram` command. For example, to see the tables defined in the `subject` schema we can type," ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```\n", "dj.Diagram(subject)\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "Note\n", "\n", "For more information about the what the different colours and line types mean, please refer to this more comprehensive [tutorial](https://github.com/int-brain-lab/IBL-pipeline/blob/master/notebooks/notebooks_tutorial/201909_code_camp/1-Explore%20IBL%20data%20pipeline%20with%20DataJoint.ipynb)\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also use the following code snippet to list the tables that are defined in a schema" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from datajoint.user_tables import UserTable\n", "import inspect\n", "\n", "\n", "def list_tables(schema):\n", " for k in dir(schema):\n", " t = getattr(schema, k)\n", " if inspect.isclass(t) and issubclass(t, UserTable):\n", " print(k)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list_tables(subject)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Getting the detailed definition of a table\n", "To find out details about a table, we can use the `describe` method" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "subject.Subject.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Browsing data in tables - queries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Query all subjects" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "subject.Subject()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Restriction `&`: filtering data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Restriction: Query one subject" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# restrict by string\n", "subject.Subject & 'subject_nickname=\"ibl_witten_10\"'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# restrict by dictionary\n", "from uuid import UUID\n", "subject.Subject & {'subject_uuid': UUID('00c60db3-74c3-4ee2-9df9-2c84acf84e92')}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Restriction: Query all male subjects" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "subject.Subject & {'sex': 'm'}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Restriction: Query subjects born after a date" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "subject.Subject & 'subject_birth_date > \"2019-01-01\"'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Restriction: Query subjects within a range of dates" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "subject.Subject & 'subject_birth_date between \"2019-01-01\" and \"2019-04-01\"'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Restriction: Query subjects on multiple attributes" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "subject.Subject & 'subject_birth_date > \"2019-01-01\"' & 'sex=\"M\"'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Restriction: Query subjects restricted by other tables" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# subjects with Behavioural sessions\n", "subject.Subject & acquisition.Session" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# subjects without Behavioural sessions\n", "subject.Subject - acquisition.Session" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Join `*`: gather information from different tables" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "subject.Subject * acquisition.Session" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Projection `.proj()`: focus on attributes of interest" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "subject.Subject.proj()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "subject.Subject.proj('subject_birth_date', 'sex')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### rename attribute with ***proj()***" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "subject.Subject.proj('sex', dob='subject_birth_date')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### perform simple computations with ***proj***" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Example 1: Get the date of a session**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sessions_with_date = acquisition.Session.proj(session_date='date(session_start_time)')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sessions_with_date" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Example 2: Get the age of the animal at each session**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# First get the date of birth and the session date into the same query\n", "q = subject.Subject * acquisition.Session" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Then compute the age\n", "q_with_age = q.proj(age='datediff(session_start_time, subject_birth_date)')\n", "q_with_age" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Aggregation `.aggr()`: simple computation of one table against another table" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Example: How many sessions has each subject done so far?**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "subject.Subject.aggr(acquisition.Session, 'subject_nickname', n='count(*)')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Fetching data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Fetch all fields: `fetch()`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# fetch all data from a table\n", "subjs = subject.Subject.fetch()\n", "subjs[:5]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "subjs['subject_uuid'][:5]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "subjs['subject_birth_date'][:5]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# fetch as a list of dictionaries\n", "subjs_dict = subject.Subject.fetch(as_dict=True)\n", "subjs_dict[:5]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# fetch as pandas dataframe\n", "subjs_df = subject.Subject.fetch(format='frame')\n", "subjs_df[:5]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# fetch the primary key\n", "pk = subject.Subject.fetch('KEY')\n", "pk[:5]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# fetch specific attributes\n", "dob, sex = subject.Subject.fetch('subject_birth_date', 'sex')\n", "dob[:5]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "info = subject.Subject.fetch('subject_birth_date', 'sex', as_dict=True)\n", "info[:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Fetch data from only one entry: `fetch1`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ibl_witten_10 = (subject.Subject & {'subject_nickname': 'ibl_witten_10'}).fetch1('KEY')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ibl_witten_10" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "IBL_10 = (subject.Subject & {'subject_nickname': 'IBL_10'}).fetch1()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "IBL_10" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" } }, "nbformat": 4, "nbformat_minor": 2 }