{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Datajoint Introductory Tutorial\n",
    "In this tutorial we will use datajoint to create a simple psychometric behavior plot.\n",
    "\n",
    "This tutorial assumes that you have setup the unified ibl environment [IBL python environment](../../02_installation.md) and set up your [Datajoint credentials](../../dj_docs/dj_credentials.md).\n",
    "\n",
    "First let's import datajoint"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import datajoint as dj\n",
    "# for the purposes of tutorial limit the table print output to 5\n",
    "dj.config['display.limit'] = 5"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can access datajoint tables by importing schemas from the IBL-pipeline. Let's import the `subject` schema"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from ibl_pipeline import subject"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Within this schema there is a datajoint table called `Subject`. This holds all the information about subjects registered on Alyx under IBL projects. Let's access this table and look at the first couple of entries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "subjects = subject.Subject()\n",
    "subjects"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we will find the entry in the table for the same subject that we looked at in the ONE tutorial, KS022. To do this we will restrict the `Subject` table by the `subject nickname`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "subjects & 'subject_nickname = \"KS022\"'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We now want to find information about the behavioural sessions. This information is stored in a table `Session` defined in the `acquisition` schema. Let's import this schema, access the table and display the first few entries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from ibl_pipeline import acquisition\n",
    "sessions = acquisition.Session()\n",
    "sessions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we look at the primary keys (columns with black headings) in the `Subjects` and `Sessions` table, we will notice that both contain `subject_uuid` as a primary key. This means that these two tables can be joined using *****.\n",
    "\n",
    "We want to find information about all the sessions that KS022 did in the training phase of the IBL training pipeline. When combining the tables we will therefore restrict the `Subject` table by the `subject nickname` (as we did before) and the `Sessions` table by the `task protocol`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "(subjects & 'subject_nickname = \"KS022\"') * (sessions & 'task_protocol LIKE \"%training%\"')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There is a lot of information in this table and we are not interested in all of it for the purposes of our analysis. Let's just use the **proj** method to restrict the data presented. We do not want any columns from the `Subject` table (apart from the primary keys) and only want `session_uuid` from the `Sesssions table`. So we can write,"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "((subjects & 'subject_nickname = \"KS022\"').proj() *\n",
    " (sessions & 'task_protocol LIKE \"%training%\"')).proj('session_uuid')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "\n",
    "Note\n",
    "\n",
    "In the above expression we first used **proj** and then joined the tables using ** * **. We could have also joined the tables first and then used **proj**,\n",
    "\n",
    "```\n",
    " ((subjects & 'subject_nickname = \"KS022\"') * (sessions & 'task_protocol LIKE \"%training%\"')\n",
    ").proj('session_uuid')\n",
    "```\n",
    "\n",
    "</div>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the `session_uuid` corresponds to the [eID string in ONE](https://int-brain-lab.github.io/ONE/notebooks/experiment_ids.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Up until now we have been inspecting the content of the tables but do not actually have access to the content. This is because we have not actually read the data from the tables into memory yet. For this we would need to use the **fetch** command. Let's fetch the `session uuid` information into a pandas dataframe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "eids = ((subjects & 'subject_nickname = \"KS022\"').proj() *\n",
    "        (sessions & 'task_protocol LIKE \"%training%\"').proj('session_uuid')).fetch(format='frame')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "eids"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have access to our list of session eIDs, we can get trial information associated with these sessions. The output from the trials dataset is stored in a table called `PyschResults`. We can import this from the `analyses.behaviour` schema"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from ibl_pipeline.analyses import behavior\n",
    "trials = behavior.PsychResults()\n",
    "trials"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's get the trial information for the first day KS022 trained. We will restrict the `Sessions` table by the day 1 eID and combine this with the `trials` table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "eid_day1 = dict(session_uuid=eids['session_uuid'][0])\n",
    "trials_day1 = ((sessions & eid_day1).proj() * trials).fetch(format='frame')\n",
    "trials_day1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice how a lot of the metrics that we manually computed from the trials dataset in the previous ONE tutorial have already been computed for us and are available in the `trials` table. This is one advantage of Datajoint, common computations such as performance can be computed when the data is ingested and stored in tables.\n",
    "\n",
    "We can find out which visual stimulus contrasts were presented to KS022 on day 1 and how many of each contrast by looking at the `signed_contrasts` and `n_trials_stim` attributes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "contrasts = trials_day1['signed_contrasts'].to_numpy()[0]\n",
    "n_contrasts = trials_day1['n_trials_stim'].to_numpy()[0]\n",
    "print(f\"Visual stimulus contrasts on day 1 = {contrasts * 100}\")\n",
    "print(f\"No. of each contrast on day 1 = {n_contrasts}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can easily extract the performance by typing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\"Correct = { trials_day1['performance'].to_numpy()[0] * 100} %\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can plot the peformance at each contrast by looking at the `prob_choose_right` attribute. The results stored in Datajoint are already expressed in terms of rightward choice, so we don't need worry about converting any computations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "contrast_performance = trials_day1['prob_choose_right'].to_numpy()[0]\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "plt.plot(contrasts * 100, contrast_performance * 100, 'o-', lw=3, ms=10)\n",
    "plt.ylim([0, 100])\n",
    "plt.xticks([*(contrasts * 100)])\n",
    "plt.xlabel('Signed contrast (%)')\n",
    "plt.ylabel('Rightward choice (%)')\n",
    "\n",
    "print(contrast_performance)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's now repeat this for day 15 of training"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "eid_day15 = dict(session_uuid=eids['session_uuid'][14])\n",
    "trials_day15 = ((sessions & eid_day15).proj() * trials).fetch(format='frame')\n",
    "contrasts = trials_day15['signed_contrasts'].to_numpy()[0]\n",
    "n_contrasts = trials_day15['n_trials_stim'].to_numpy()[0]\n",
    "print(f\"Visual stimulus contrasts on day 1 = {contrasts * 100}\")\n",
    "print(f\"No. of each contrast on day 1 = {n_contrasts}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\"Correct = { trials_day15['performance'].to_numpy()[0] * 100} %\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "contrast_performance = trials_day15['prob_choose_right'].to_numpy()[0]\n",
    "\n",
    "plt.plot(contrasts * 100, contrast_performance * 100, 'o-', lw=3, ms=10)\n",
    "plt.ylim([0, 100])\n",
    "plt.xticks([*(contrasts * 100)])\n",
    "plt.xlabel('Signed contrast (%)')\n",
    "plt.ylabel('Rightward choice (%)')\n",
    "plt.xticks(rotation=90)\n",
    "\n",
    "print(contrast_performance)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we compare the results with the ONE tutorial, we will find that we have replicated those results using Datajoint, congratulations!\n",
    "\n",
    "You should now be comfortable with the basics of exploring the Datajoint IBL pipeline. More Datajoint tutorials can be found in the [IBL-Pipeline github](https://github.com/int-brain-lab/IBL-pipeline/tree/master/notebooks/notebooks_tutorial) or on the [Datajoint jupyter](https://jupyter.internationalbrainlab.org/)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}