{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Pandas Functions\n",
    "\n",
    "So far, we have looked at the basics of using the programming language, python. As we proceed to learning how python is used in data analysis, let's study some common functions used in pandas library for data manipulation (or data cleaning). It is important to note that this list of methods are not exhastive. Let's look at some commonly used pandas functions.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# This code just loads up the data we need for the examples below\n",
    "import sys\n",
    "sys.path.append('..')\n",
    "from src.data import load_data"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. [.drop()](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html?highlight=drop#pandas.DataFrame.drop): In a dataset there can be columns with missing values or columns that are not just useful for your analysis. For example, you might have a New York city taxi dataset containing information (trip id, datetime, trip distance, payment types, passenger count) but want to focus on analyzing trip trends in a day. In this case, the payment types and passenger count are not relevant for our analysis. Having them in our dataset will take up unnecessary space and processing time.\n",
    "\n",
    "When working with pandas we can solve this by using the drop() method. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/workspaces/py4a/book/data/taxi_zone_lookup.csv\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Borough</th>\n",
       "      <th>Zone</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>LocationID</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>EWR</td>\n",
       "      <td>Newark Airport</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Queens</td>\n",
       "      <td>Jamaica Bay</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Bronx</td>\n",
       "      <td>Allerton/Pelham Gardens</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Manhattan</td>\n",
       "      <td>Alphabet City</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Staten Island</td>\n",
       "      <td>Arden Heights</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                  Borough                     Zone\n",
       "LocationID                                        \n",
       "1                     EWR           Newark Airport\n",
       "2                  Queens              Jamaica Bay\n",
       "3                   Bronx  Allerton/Pelham Gardens\n",
       "4               Manhattan            Alphabet City\n",
       "5           Staten Island            Arden Heights"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#index, specifies that we can use the LocationID column in the data as the dataframe index\n",
    "import pandas as pd\n",
    "zone=load_data('taxi_zone_lookup',index_col='LocationID')\n",
    "\n",
    "#Here, we are dropping all the columns that are not needed for our analysis\n",
    "#axis=1 identifies the column\n",
    "#inplace=True does not return a new table but retains the existing table\n",
    "zone.drop(['service_zone'], axis=1, inplace=True)\n",
    "zone.head()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2. [`.fillna()`][fillna]: Sometimes, instead of dropping missing values, we might want to replace cells with N/A with some value. To achieve this we use the [.fillna()][def] method. [replace()][replace] and [interpolate()][interpolate] are also used in replacing missing values.\n",
    "\n",
    "[fillna]: http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html?highlight=fillna#pandas.DataFrame.fillna\n",
    "[replace]: http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.replace.html?highlight=replace#pandas.DataFrame.replace\n",
    "[interpolate]: http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.interpolate.html?highlight=interpolate#pandas.DataFrame.interpolate"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. [.loc[]][loc_def]: This is an indexing method that is used to simply select particular rows or columns. It selects columns or rows using label names. [.loc[]][loc_def] parameter accepts strings, integers, or list. Another indexing method is [.iloc[]][iloc_def] which selects columns or rows using purely integer-location based indexing for selection by positions. For example, let's select the Borough columns with some rows from the New York City taxi zone data using loc().\n",
    "\n",
    "[loc_def]: http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html?highlight=iloc#pandas.DataFrame.iloc\n",
    "[iloc_def]: http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html?highlight=loc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Borough</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>LocationID</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Bronx</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Staten Island</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Queens</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>Queens</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                  Borough\n",
       "LocationID               \n",
       "3                   Bronx\n",
       "6           Staten Island\n",
       "9                  Queens\n",
       "15                 Queens"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#displays the column \"Borough\", but only with the rows 3,6,9,15\n",
    "zone.loc[[3,6,9,15],['Borough']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Borough</th>\n",
       "      <th>Taxi Zone</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>LocationID</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Manhattan</td>\n",
       "      <td>Alphabet City</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Staten Island</td>\n",
       "      <td>Arden Heights</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Staten Island</td>\n",
       "      <td>Arrochar/Fort Wadsworth</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Queens</td>\n",
       "      <td>Astoria</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Queens</td>\n",
       "      <td>Astoria Park</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Queens</td>\n",
       "      <td>Auburndale</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Queens</td>\n",
       "      <td>Baisley Park</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                  Borough                Taxi Zone\n",
       "LocationID                                        \n",
       "4               Manhattan            Alphabet City\n",
       "5           Staten Island            Arden Heights\n",
       "6           Staten Island  Arrochar/Fort Wadsworth\n",
       "7                  Queens                  Astoria\n",
       "8                  Queens             Astoria Park\n",
       "9                  Queens               Auburndale\n",
       "10                 Queens             Baisley Park"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# display all the data for the rows 3-10\n",
    "zone[3:10]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "LocationID\n",
       "1                EWR\n",
       "2             Queens\n",
       "3              Bronx\n",
       "4          Manhattan\n",
       "5      Staten Island\n",
       "           ...      \n",
       "261        Manhattan\n",
       "262        Manhattan\n",
       "263        Manhattan\n",
       "264          Unknown\n",
       "265          Unknown\n",
       "Name: Borough, Length: 265, dtype: object"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# finally just display the Borough column\n",
    "zone['Borough']"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "4. [.rename()][rename_def]: Sometimes, column names are written in a way that it's not easy to understand. In such case, we need to rename such column names. The method [.rename()][rename_def] takes in couple parameters one of wihich is a dictionary-like object. This dictionary-like object contains a mapping of the old names and new names.\n",
    "\n",
    "[rename_def]: http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rename.html?highlight=rename#pandas.DataFrame.rename"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Borough</th>\n",
       "      <th>Taxi Zone</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>LocationID</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>EWR</td>\n",
       "      <td>Newark Airport</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Queens</td>\n",
       "      <td>Jamaica Bay</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Bronx</td>\n",
       "      <td>Allerton/Pelham Gardens</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Manhattan</td>\n",
       "      <td>Alphabet City</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Staten Island</td>\n",
       "      <td>Arden Heights</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                  Borough                Taxi Zone\n",
       "LocationID                                        \n",
       "1                     EWR           Newark Airport\n",
       "2                  Queens              Jamaica Bay\n",
       "3                   Bronx  Allerton/Pelham Gardens\n",
       "4               Manhattan            Alphabet City\n",
       "5           Staten Island            Arden Heights"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Rename the Zone column with Taxi Zone\n",
    "zone.rename(columns={'Zone':'Taxi Zone'}, inplace=True)\n",
    "zone.head()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "5. `.columns`: While not technically a pandas function, this property is helpful when working with a large dataset with so many columns, you want to see all the columns in that dataset. `.columns` displays all the columns in the dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/workspaces/py4a/book/data/CharlesBookClub.csv\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Seq#</th>\n",
       "      <th>ID#</th>\n",
       "      <th>Gender</th>\n",
       "      <th>M</th>\n",
       "      <th>R</th>\n",
       "      <th>F</th>\n",
       "      <th>FirstPurch</th>\n",
       "      <th>ChildBks</th>\n",
       "      <th>YouthBks</th>\n",
       "      <th>CookBks</th>\n",
       "      <th>...</th>\n",
       "      <th>ItalCook</th>\n",
       "      <th>ItalAtlas</th>\n",
       "      <th>ItalArt</th>\n",
       "      <th>Florence</th>\n",
       "      <th>Related Purchase</th>\n",
       "      <th>Mcode</th>\n",
       "      <th>Rcode</th>\n",
       "      <th>Fcode</th>\n",
       "      <th>Yes_Florence</th>\n",
       "      <th>No_Florence</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>25</td>\n",
       "      <td>1</td>\n",
       "      <td>297</td>\n",
       "      <td>14</td>\n",
       "      <td>2</td>\n",
       "      <td>22</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>29</td>\n",
       "      <td>0</td>\n",
       "      <td>128</td>\n",
       "      <td>8</td>\n",
       "      <td>2</td>\n",
       "      <td>10</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>46</td>\n",
       "      <td>1</td>\n",
       "      <td>138</td>\n",
       "      <td>22</td>\n",
       "      <td>7</td>\n",
       "      <td>56</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>47</td>\n",
       "      <td>1</td>\n",
       "      <td>228</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>51</td>\n",
       "      <td>1</td>\n",
       "      <td>257</td>\n",
       "      <td>10</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 24 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   Seq#  ID#  Gender    M   R  F  FirstPurch  ChildBks  YouthBks  CookBks  \\\n",
       "0     1   25       1  297  14  2          22         0         1        1   \n",
       "1     2   29       0  128   8  2          10         0         0        0   \n",
       "2     3   46       1  138  22  7          56         2         1        2   \n",
       "3     4   47       1  228   2  1           2         0         0        0   \n",
       "4     5   51       1  257  10  1          10         0         0        0   \n",
       "\n",
       "   ...  ItalCook  ItalAtlas  ItalArt  Florence  Related Purchase  Mcode  \\\n",
       "0  ...         0          0        0         0                 0      5   \n",
       "1  ...         0          0        0         0                 0      4   \n",
       "2  ...         1          0        0         0                 2      4   \n",
       "3  ...         0          0        0         0                 0      5   \n",
       "4  ...         0          0        0         0                 0      5   \n",
       "\n",
       "   Rcode  Fcode  Yes_Florence  No_Florence  \n",
       "0      4      2             0            1  \n",
       "1      3      2             0            1  \n",
       "2      4      3             0            1  \n",
       "3      1      1             0            1  \n",
       "4      3      1             0            1  \n",
       "\n",
       "[5 rows x 24 columns]"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "books_df = load_data('CharlesBookClub')\n",
    "books_df.head()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice here how their is a `...` between `Cookbks` and `ItalCook`?  This shows that not all 24 columns can be displayed in the notebook at once.  This can lead us to believe that there are only 20 columns.  Using the `.columns` property gives us a [pandas series][series], (which we can think of as a one column of a dataframe).\n",
    "\n",
    "[series]: http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html?highlight=series#pandas.Series"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Index(['Seq#', 'ID#', 'Gender', 'M', 'R', 'F', 'FirstPurch', 'ChildBks',\n",
      "       'YouthBks', 'CookBks', 'DoItYBks', 'RefBks', 'ArtBks', 'GeogBks',\n",
      "       'ItalCook', 'ItalAtlas', 'ItalArt', 'Florence', 'Related Purchase',\n",
      "       'Mcode', 'Rcode', 'Fcode', 'Yes_Florence', 'No_Florence'],\n",
      "      dtype='object')\n"
     ]
    }
   ],
   "source": [
    "#this prints all the columns in the bankruptcy dataset\n",
    "print(books_df.columns)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "6. `len()`: (Not strictly a pandas fuction).  Provides the length (number of rows) of the dataframe. For example"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4000"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(books_df)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "7. `.dtypes`: Because it is useful to know the data type of our variables before analysis.  This function will show what kind of data is expected in each column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Seq#                int64\n",
      "ID#                 int64\n",
      "Gender              int64\n",
      "M                   int64\n",
      "R                   int64\n",
      "F                   int64\n",
      "FirstPurch          int64\n",
      "ChildBks            int64\n",
      "YouthBks            int64\n",
      "CookBks             int64\n",
      "DoItYBks            int64\n",
      "RefBks              int64\n",
      "ArtBks              int64\n",
      "GeogBks             int64\n",
      "ItalCook            int64\n",
      "ItalAtlas           int64\n",
      "ItalArt             int64\n",
      "Florence            int64\n",
      "Related Purchase    int64\n",
      "Mcode               int64\n",
      "Rcode               int64\n",
      "Fcode               int64\n",
      "Yes_Florence        int64\n",
      "No_Florence         int64\n",
      "dtype: object\n"
     ]
    }
   ],
   "source": [
    "# Print the datatypes of the columns in the books dataframe\n",
    "print(books_df.dtypes)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Again the list of pandas functions that are useful in data manipulation is non-exhastive. You can find more pandas functions [here](https://pandas.pydata.org/docs/reference/frame.html)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "base",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.12"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "d4d1e4263499bec80672ea0156c357c1ee493ec2b1c70f0acce89fc37c4a6abe"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}