{
"cells": [
{
"cell_type": "markdown",
"id": "5bd00677-eab0-45c0-999b-a89aa239098e",
"metadata": {},
"source": [
"# Manipulating census data with pandas\n",
"## ZCTA5 - Age Data\n",
"\n",
"For my project, I choose the Zip Code dataset. I found the dataset from the Census Bureau [age dataset](https://data.census.gov/) \n",
"\n",
"For the raw data extraction select the above census bureau link and apply the filter 'All 5-digit zip code'. Next, select the code 'S0101AGE AND SEX' from the vast number of available datasets. The data is available from 2010-2020. In this project we have considered the last 4 years 2017-2020. Download the data by selecting the years. Once download is complete unzip the file and there are three files under the zip. First, the metadata file which has all the information about the columns. Second, table notes which describes about the table and finally the actual data which is a csv file. Inside the csv file there are 33,146 records and 434 columns. For this proect we are selecting the subset of columns which cosniders the total population across the age groups."
]
},
{
"cell_type": "markdown",
"id": "8c22c86b-e20a-4255-9b9d-28e4181f7cd1",
"metadata": {
"tags": []
},
"source": [
"Below are the steps I have followed to transform and load the dataset to the sql database:\n",
"\n",
"\n",
"Import the necessary libraries required:\n",
"1) Pandas - It presents a diverse range of utilities, ranging from parsing multiple file formats to converting an entire data table into a NumPy matrix array\n",
"2) Numpy - NumPy aims to provide an array object that is up to 50x faster than traditional Python lists.\n",
"3) Sys - The sys module provides various functions and variables that are used to manipulate different parts of the Python runtime environment."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "45db09e5-87c4-4035-b186-2dbdac5706db",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"import sys"
]
},
{
"cell_type": "markdown",
"id": "f8b6084e-fa9e-41b2-92d7-baaee56298ca",
"metadata": {},
"source": [
"\n",
"Load the csv file as dataframe using read_csv"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "bed52702-df0e-414e-b84b-ce1e363800fb",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Geography | \n",
" Geographic Area Name | \n",
" Estimate!!Total!!Total population | \n",
" Annotation of Estimate!!Total!!Total population | \n",
" Margin of Error!!Total MOE!!Total population | \n",
" Annotation of Margin of Error!!Total MOE!!Total population | \n",
" Estimate!!Total!!Total population!!AGE!!Under 5 years | \n",
" Annotation of Estimate!!Total!!Total population!!AGE!!Under 5 years | \n",
" Margin of Error!!Total MOE!!Total population!!AGE!!Under 5 years | \n",
" Annotation of Margin of Error!!Total MOE!!Total population!!AGE!!Under 5 years | \n",
" ... | \n",
" Margin of Error!!Total MOE!!Total population!!AGE!!75 to 79 years | \n",
" Annotation of Margin of Error!!Total MOE!!Total population!!AGE!!75 to 79 years | \n",
" Estimate!!Total!!Total population!!AGE!!80 to 84 years | \n",
" Annotation of Estimate!!Total!!Total population!!AGE!!80 to 84 years | \n",
" Margin of Error!!Total MOE!!Total population!!AGE!!80 to 84 years | \n",
" Annotation of Margin of Error!!Total MOE!!Total population!!AGE!!80 to 84 years | \n",
" Estimate!!Total!!Total population!!AGE!!85 years and over | \n",
" Margin of Error!!Total MOE!!Total population!!AGE!!85 years and over | \n",
" Annotation of Margin of Error!!Total MOE!!Total population!!AGE!!85 years and over | \n",
" Annotation of Estimate!!Total!!Total population!!AGE!!85 years and over | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 8600000US00601 | \n",
" ZCTA5 00601 | \n",
" 17599 | \n",
" NaN | \n",
" 261 | \n",
" NaN | \n",
" 938 | \n",
" NaN | \n",
" 40 | \n",
" NaN | \n",
" ... | \n",
" 96 | \n",
" NaN | \n",
" 349 | \n",
" NaN | \n",
" 83 | \n",
" NaN | \n",
" 328 | \n",
" 95 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 1 | \n",
" 8600000US00602 | \n",
" ZCTA5 00602 | \n",
" 39209 | \n",
" NaN | \n",
" 128 | \n",
" NaN | \n",
" 1794 | \n",
" NaN | \n",
" 75 | \n",
" NaN | \n",
" ... | \n",
" 159 | \n",
" NaN | \n",
" 789 | \n",
" NaN | \n",
" 142 | \n",
" NaN | \n",
" 648 | \n",
" 164 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 2 | \n",
" 8600000US00603 | \n",
" ZCTA5 00603 | \n",
" 50135 | \n",
" NaN | \n",
" 805 | \n",
" NaN | \n",
" 2469 | \n",
" NaN | \n",
" 92 | \n",
" NaN | \n",
" ... | \n",
" 245 | \n",
" NaN | \n",
" 1315 | \n",
" NaN | \n",
" 229 | \n",
" NaN | \n",
" 727 | \n",
" 158 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 3 | \n",
" 8600000US00606 | \n",
" ZCTA5 00606 | \n",
" 6304 | \n",
" NaN | \n",
" 255 | \n",
" NaN | \n",
" 312 | \n",
" NaN | \n",
" 45 | \n",
" NaN | \n",
" ... | \n",
" 64 | \n",
" NaN | \n",
" 74 | \n",
" NaN | \n",
" 48 | \n",
" NaN | \n",
" 174 | \n",
" 87 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 4 | \n",
" 8600000US00610 | \n",
" ZCTA5 00610 | \n",
" 27590 | \n",
" NaN | \n",
" 163 | \n",
" NaN | \n",
" 1256 | \n",
" NaN | \n",
" 4 | \n",
" NaN | \n",
" ... | \n",
" 146 | \n",
" NaN | \n",
" 576 | \n",
" NaN | \n",
" 112 | \n",
" NaN | \n",
" 365 | \n",
" 115 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 5 | \n",
" 8600000US00612 | \n",
" ZCTA5 00612 | \n",
" 62566 | \n",
" NaN | \n",
" 1480 | \n",
" NaN | \n",
" 3260 | \n",
" NaN | \n",
" 241 | \n",
" NaN | \n",
" ... | \n",
" 319 | \n",
" NaN | \n",
" 1433 | \n",
" NaN | \n",
" 238 | \n",
" NaN | \n",
" 1578 | \n",
" 271 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 6 | \n",
" 8600000US00616 | \n",
" ZCTA5 00616 | \n",
" 10687 | \n",
" NaN | \n",
" 1192 | \n",
" NaN | \n",
" 469 | \n",
" NaN | \n",
" 216 | \n",
" NaN | \n",
" ... | \n",
" 106 | \n",
" NaN | \n",
" 326 | \n",
" NaN | \n",
" 160 | \n",
" NaN | \n",
" 320 | \n",
" 138 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 7 | \n",
" 8600000US00617 | \n",
" ZCTA5 00617 | \n",
" 24508 | \n",
" NaN | \n",
" 296 | \n",
" NaN | \n",
" 1340 | \n",
" NaN | \n",
" 25 | \n",
" NaN | \n",
" ... | \n",
" 188 | \n",
" NaN | \n",
" 450 | \n",
" NaN | \n",
" 159 | \n",
" NaN | \n",
" 503 | \n",
" 177 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 8 | \n",
" 8600000US00622 | \n",
" ZCTA5 00622 | \n",
" 7405 | \n",
" NaN | \n",
" 1118 | \n",
" NaN | \n",
" 224 | \n",
" NaN | \n",
" 140 | \n",
" NaN | \n",
" ... | \n",
" 150 | \n",
" NaN | \n",
" 334 | \n",
" NaN | \n",
" 151 | \n",
" NaN | \n",
" 241 | \n",
" 129 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 9 | \n",
" 8600000US00623 | \n",
" ZCTA5 00623 | \n",
" 42321 | \n",
" NaN | \n",
" 1118 | \n",
" NaN | \n",
" 1945 | \n",
" NaN | \n",
" 140 | \n",
" NaN | \n",
" ... | \n",
" 266 | \n",
" NaN | \n",
" 1184 | \n",
" NaN | \n",
" 213 | \n",
" NaN | \n",
" 737 | \n",
" 189 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
10 rows × 78 columns
\n",
"
"
],
"text/plain": [
" Geography Geographic Area Name Estimate!!Total!!Total population \\\n",
"0 8600000US00601 ZCTA5 00601 17599 \n",
"1 8600000US00602 ZCTA5 00602 39209 \n",
"2 8600000US00603 ZCTA5 00603 50135 \n",
"3 8600000US00606 ZCTA5 00606 6304 \n",
"4 8600000US00610 ZCTA5 00610 27590 \n",
"5 8600000US00612 ZCTA5 00612 62566 \n",
"6 8600000US00616 ZCTA5 00616 10687 \n",
"7 8600000US00617 ZCTA5 00617 24508 \n",
"8 8600000US00622 ZCTA5 00622 7405 \n",
"9 8600000US00623 ZCTA5 00623 42321 \n",
"\n",
" Annotation of Estimate!!Total!!Total population \\\n",
"0 NaN \n",
"1 NaN \n",
"2 NaN \n",
"3 NaN \n",
"4 NaN \n",
"5 NaN \n",
"6 NaN \n",
"7 NaN \n",
"8 NaN \n",
"9 NaN \n",
"\n",
" Margin of Error!!Total MOE!!Total population \\\n",
"0 261 \n",
"1 128 \n",
"2 805 \n",
"3 255 \n",
"4 163 \n",
"5 1480 \n",
"6 1192 \n",
"7 296 \n",
"8 1118 \n",
"9 1118 \n",
"\n",
" Annotation of Margin of Error!!Total MOE!!Total population \\\n",
"0 NaN \n",
"1 NaN \n",
"2 NaN \n",
"3 NaN \n",
"4 NaN \n",
"5 NaN \n",
"6 NaN \n",
"7 NaN \n",
"8 NaN \n",
"9 NaN \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!Under 5 years \\\n",
"0 938 \n",
"1 1794 \n",
"2 2469 \n",
"3 312 \n",
"4 1256 \n",
"5 3260 \n",
"6 469 \n",
"7 1340 \n",
"8 224 \n",
"9 1945 \n",
"\n",
" Annotation of Estimate!!Total!!Total population!!AGE!!Under 5 years \\\n",
"0 NaN \n",
"1 NaN \n",
"2 NaN \n",
"3 NaN \n",
"4 NaN \n",
"5 NaN \n",
"6 NaN \n",
"7 NaN \n",
"8 NaN \n",
"9 NaN \n",
"\n",
" Margin of Error!!Total MOE!!Total population!!AGE!!Under 5 years \\\n",
"0 40 \n",
"1 75 \n",
"2 92 \n",
"3 45 \n",
"4 4 \n",
"5 241 \n",
"6 216 \n",
"7 25 \n",
"8 140 \n",
"9 140 \n",
"\n",
" Annotation of Margin of Error!!Total MOE!!Total population!!AGE!!Under 5 years \\\n",
"0 NaN \n",
"1 NaN \n",
"2 NaN \n",
"3 NaN \n",
"4 NaN \n",
"5 NaN \n",
"6 NaN \n",
"7 NaN \n",
"8 NaN \n",
"9 NaN \n",
"\n",
" ... Margin of Error!!Total MOE!!Total population!!AGE!!75 to 79 years \\\n",
"0 ... 96 \n",
"1 ... 159 \n",
"2 ... 245 \n",
"3 ... 64 \n",
"4 ... 146 \n",
"5 ... 319 \n",
"6 ... 106 \n",
"7 ... 188 \n",
"8 ... 150 \n",
"9 ... 266 \n",
"\n",
" Annotation of Margin of Error!!Total MOE!!Total population!!AGE!!75 to 79 years \\\n",
"0 NaN \n",
"1 NaN \n",
"2 NaN \n",
"3 NaN \n",
"4 NaN \n",
"5 NaN \n",
"6 NaN \n",
"7 NaN \n",
"8 NaN \n",
"9 NaN \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!80 to 84 years \\\n",
"0 349 \n",
"1 789 \n",
"2 1315 \n",
"3 74 \n",
"4 576 \n",
"5 1433 \n",
"6 326 \n",
"7 450 \n",
"8 334 \n",
"9 1184 \n",
"\n",
" Annotation of Estimate!!Total!!Total population!!AGE!!80 to 84 years \\\n",
"0 NaN \n",
"1 NaN \n",
"2 NaN \n",
"3 NaN \n",
"4 NaN \n",
"5 NaN \n",
"6 NaN \n",
"7 NaN \n",
"8 NaN \n",
"9 NaN \n",
"\n",
" Margin of Error!!Total MOE!!Total population!!AGE!!80 to 84 years \\\n",
"0 83 \n",
"1 142 \n",
"2 229 \n",
"3 48 \n",
"4 112 \n",
"5 238 \n",
"6 160 \n",
"7 159 \n",
"8 151 \n",
"9 213 \n",
"\n",
" Annotation of Margin of Error!!Total MOE!!Total population!!AGE!!80 to 84 years \\\n",
"0 NaN \n",
"1 NaN \n",
"2 NaN \n",
"3 NaN \n",
"4 NaN \n",
"5 NaN \n",
"6 NaN \n",
"7 NaN \n",
"8 NaN \n",
"9 NaN \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!85 years and over \\\n",
"0 328 \n",
"1 648 \n",
"2 727 \n",
"3 174 \n",
"4 365 \n",
"5 1578 \n",
"6 320 \n",
"7 503 \n",
"8 241 \n",
"9 737 \n",
"\n",
" Margin of Error!!Total MOE!!Total population!!AGE!!85 years and over \\\n",
"0 95 \n",
"1 164 \n",
"2 158 \n",
"3 87 \n",
"4 115 \n",
"5 271 \n",
"6 138 \n",
"7 177 \n",
"8 129 \n",
"9 189 \n",
"\n",
" Annotation of Margin of Error!!Total MOE!!Total population!!AGE!!85 years and over \\\n",
"0 NaN \n",
"1 NaN \n",
"2 NaN \n",
"3 NaN \n",
"4 NaN \n",
"5 NaN \n",
"6 NaN \n",
"7 NaN \n",
"8 NaN \n",
"9 NaN \n",
"\n",
" Annotation of Estimate!!Total!!Total population!!AGE!!85 years and over \n",
"0 NaN \n",
"1 NaN \n",
"2 NaN \n",
"3 NaN \n",
"4 NaN \n",
"5 NaN \n",
"6 NaN \n",
"7 NaN \n",
"8 NaN \n",
"9 NaN \n",
"\n",
"[10 rows x 78 columns]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_csv('contrib_data/Age_Data_Zip.csv') #read the file\n",
"df.head(10) #print first 10 records"
]
},
{
"cell_type": "markdown",
"id": "e790cfad-2443-46a9-9287-94458585a2e6",
"metadata": {},
"source": [
"\n",
"Select the necessary columns from the dataset using iloc -> Integer-location based indexing for selection by position."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "b9e485fb-697f-4423-a292-dd83947f94bb",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Geographic Area Name | \n",
" Estimate!!Total!!Total population | \n",
" Estimate!!Total!!Total population!!AGE!!Under 5 years | \n",
" Estimate!!Total!!Total population!!AGE!!5 to 9 years | \n",
" Estimate!!Total!!Total population!!AGE!!10 to 14 years | \n",
" Estimate!!Total!!Total population!!AGE!!15 to 19 years | \n",
" Estimate!!Total!!Total population!!AGE!!20 to 24 years | \n",
" Estimate!!Total!!Total population!!AGE!!25 to 29 years | \n",
" Estimate!!Total!!Total population!!AGE!!30 to 34 years | \n",
" Estimate!!Total!!Total population!!AGE!!35 to 39 years | \n",
" Estimate!!Total!!Total population!!AGE!!40 to 44 years | \n",
" Estimate!!Total!!Total population!!AGE!!45 to 49 years | \n",
" Estimate!!Total!!Total population!!AGE!!50 to 54 years | \n",
" Estimate!!Total!!Total population!!AGE!!55 to 59 years | \n",
" Estimate!!Total!!Total population!!AGE!!60 to 64 years | \n",
" Estimate!!Total!!Total population!!AGE!!65 to 69 years | \n",
" Estimate!!Total!!Total population!!AGE!!70 to 74 years | \n",
" Estimate!!Total!!Total population!!AGE!!75 to 79 years | \n",
" Estimate!!Total!!Total population!!AGE!!80 to 84 years | \n",
" Estimate!!Total!!Total population!!AGE!!85 years and over | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" ZCTA5 00601 | \n",
" 17599 | \n",
" 938 | \n",
" 1046 | \n",
" 1278 | \n",
" 1250 | \n",
" 1218 | \n",
" 1148 | \n",
" 1073 | \n",
" 1121 | \n",
" 918 | \n",
" 1147 | \n",
" 1257 | \n",
" 1215 | \n",
" 1093 | \n",
" 1080 | \n",
" 616 | \n",
" 524 | \n",
" 349 | \n",
" 328 | \n",
"
\n",
" \n",
" 1 | \n",
" ZCTA5 00602 | \n",
" 39209 | \n",
" 1794 | \n",
" 2320 | \n",
" 2113 | \n",
" 2706 | \n",
" 2751 | \n",
" 2537 | \n",
" 2350 | \n",
" 2715 | \n",
" 2227 | \n",
" 2749 | \n",
" 2942 | \n",
" 2833 | \n",
" 2810 | \n",
" 2181 | \n",
" 1753 | \n",
" 991 | \n",
" 789 | \n",
" 648 | \n",
"
\n",
" \n",
" 2 | \n",
" ZCTA5 00603 | \n",
" 50135 | \n",
" 2469 | \n",
" 2778 | \n",
" 3260 | \n",
" 3443 | \n",
" 3194 | \n",
" 3264 | \n",
" 3021 | \n",
" 3296 | \n",
" 3099 | \n",
" 3070 | \n",
" 3336 | \n",
" 2982 | \n",
" 3323 | \n",
" 3306 | \n",
" 2489 | \n",
" 1763 | \n",
" 1315 | \n",
" 727 | \n",
"
\n",
" \n",
" 3 | \n",
" ZCTA5 00606 | \n",
" 6304 | \n",
" 312 | \n",
" 304 | \n",
" 420 | \n",
" 479 | \n",
" 443 | \n",
" 302 | \n",
" 329 | \n",
" 279 | \n",
" 557 | \n",
" 457 | \n",
" 483 | \n",
" 317 | \n",
" 555 | \n",
" 345 | \n",
" 303 | \n",
" 171 | \n",
" 74 | \n",
" 174 | \n",
"
\n",
" \n",
" 4 | \n",
" ZCTA5 00610 | \n",
" 27590 | \n",
" 1256 | \n",
" 1346 | \n",
" 1895 | \n",
" 1939 | \n",
" 1924 | \n",
" 1624 | \n",
" 1575 | \n",
" 1638 | \n",
" 1944 | \n",
" 1937 | \n",
" 1910 | \n",
" 1748 | \n",
" 1744 | \n",
" 1629 | \n",
" 1400 | \n",
" 1140 | \n",
" 576 | \n",
" 365 | \n",
"
\n",
" \n",
" 5 | \n",
" ZCTA5 00612 | \n",
" 62566 | \n",
" 3260 | \n",
" 3077 | \n",
" 4249 | \n",
" 4397 | \n",
" 4346 | \n",
" 3621 | \n",
" 3586 | \n",
" 3550 | \n",
" 4225 | \n",
" 4312 | \n",
" 4021 | \n",
" 3607 | \n",
" 3903 | \n",
" 3812 | \n",
" 3038 | \n",
" 2551 | \n",
" 1433 | \n",
" 1578 | \n",
"
\n",
" \n",
" 6 | \n",
" ZCTA5 00616 | \n",
" 10687 | \n",
" 469 | \n",
" 417 | \n",
" 405 | \n",
" 678 | \n",
" 901 | \n",
" 784 | \n",
" 574 | \n",
" 661 | \n",
" 687 | \n",
" 877 | \n",
" 787 | \n",
" 558 | \n",
" 533 | \n",
" 753 | \n",
" 712 | \n",
" 245 | \n",
" 326 | \n",
" 320 | \n",
"
\n",
" \n",
" 7 | \n",
" ZCTA5 00617 | \n",
" 24508 | \n",
" 1340 | \n",
" 1401 | \n",
" 1702 | \n",
" 1686 | \n",
" 1709 | \n",
" 1607 | \n",
" 1570 | \n",
" 1719 | \n",
" 1766 | \n",
" 1584 | \n",
" 1434 | \n",
" 1195 | \n",
" 1521 | \n",
" 1390 | \n",
" 1117 | \n",
" 814 | \n",
" 450 | \n",
" 503 | \n",
"
\n",
" \n",
" 8 | \n",
" ZCTA5 00622 | \n",
" 7405 | \n",
" 224 | \n",
" 296 | \n",
" 489 | \n",
" 477 | \n",
" 428 | \n",
" 243 | \n",
" 352 | \n",
" 382 | \n",
" 436 | \n",
" 477 | \n",
" 579 | \n",
" 604 | \n",
" 365 | \n",
" 519 | \n",
" 588 | \n",
" 371 | \n",
" 334 | \n",
" 241 | \n",
"
\n",
" \n",
" 9 | \n",
" ZCTA5 00623 | \n",
" 42321 | \n",
" 1945 | \n",
" 2599 | \n",
" 2457 | \n",
" 2906 | \n",
" 2795 | \n",
" 2528 | \n",
" 2279 | \n",
" 2790 | \n",
" 3075 | \n",
" 3050 | \n",
" 2580 | \n",
" 2298 | \n",
" 2596 | \n",
" 2652 | \n",
" 2073 | \n",
" 1777 | \n",
" 1184 | \n",
" 737 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Geographic Area Name Estimate!!Total!!Total population \\\n",
"0 ZCTA5 00601 17599 \n",
"1 ZCTA5 00602 39209 \n",
"2 ZCTA5 00603 50135 \n",
"3 ZCTA5 00606 6304 \n",
"4 ZCTA5 00610 27590 \n",
"5 ZCTA5 00612 62566 \n",
"6 ZCTA5 00616 10687 \n",
"7 ZCTA5 00617 24508 \n",
"8 ZCTA5 00622 7405 \n",
"9 ZCTA5 00623 42321 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!Under 5 years \\\n",
"0 938 \n",
"1 1794 \n",
"2 2469 \n",
"3 312 \n",
"4 1256 \n",
"5 3260 \n",
"6 469 \n",
"7 1340 \n",
"8 224 \n",
"9 1945 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!5 to 9 years \\\n",
"0 1046 \n",
"1 2320 \n",
"2 2778 \n",
"3 304 \n",
"4 1346 \n",
"5 3077 \n",
"6 417 \n",
"7 1401 \n",
"8 296 \n",
"9 2599 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!10 to 14 years \\\n",
"0 1278 \n",
"1 2113 \n",
"2 3260 \n",
"3 420 \n",
"4 1895 \n",
"5 4249 \n",
"6 405 \n",
"7 1702 \n",
"8 489 \n",
"9 2457 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!15 to 19 years \\\n",
"0 1250 \n",
"1 2706 \n",
"2 3443 \n",
"3 479 \n",
"4 1939 \n",
"5 4397 \n",
"6 678 \n",
"7 1686 \n",
"8 477 \n",
"9 2906 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!20 to 24 years \\\n",
"0 1218 \n",
"1 2751 \n",
"2 3194 \n",
"3 443 \n",
"4 1924 \n",
"5 4346 \n",
"6 901 \n",
"7 1709 \n",
"8 428 \n",
"9 2795 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!25 to 29 years \\\n",
"0 1148 \n",
"1 2537 \n",
"2 3264 \n",
"3 302 \n",
"4 1624 \n",
"5 3621 \n",
"6 784 \n",
"7 1607 \n",
"8 243 \n",
"9 2528 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!30 to 34 years \\\n",
"0 1073 \n",
"1 2350 \n",
"2 3021 \n",
"3 329 \n",
"4 1575 \n",
"5 3586 \n",
"6 574 \n",
"7 1570 \n",
"8 352 \n",
"9 2279 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!35 to 39 years \\\n",
"0 1121 \n",
"1 2715 \n",
"2 3296 \n",
"3 279 \n",
"4 1638 \n",
"5 3550 \n",
"6 661 \n",
"7 1719 \n",
"8 382 \n",
"9 2790 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!40 to 44 years \\\n",
"0 918 \n",
"1 2227 \n",
"2 3099 \n",
"3 557 \n",
"4 1944 \n",
"5 4225 \n",
"6 687 \n",
"7 1766 \n",
"8 436 \n",
"9 3075 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!45 to 49 years \\\n",
"0 1147 \n",
"1 2749 \n",
"2 3070 \n",
"3 457 \n",
"4 1937 \n",
"5 4312 \n",
"6 877 \n",
"7 1584 \n",
"8 477 \n",
"9 3050 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!50 to 54 years \\\n",
"0 1257 \n",
"1 2942 \n",
"2 3336 \n",
"3 483 \n",
"4 1910 \n",
"5 4021 \n",
"6 787 \n",
"7 1434 \n",
"8 579 \n",
"9 2580 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!55 to 59 years \\\n",
"0 1215 \n",
"1 2833 \n",
"2 2982 \n",
"3 317 \n",
"4 1748 \n",
"5 3607 \n",
"6 558 \n",
"7 1195 \n",
"8 604 \n",
"9 2298 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!60 to 64 years \\\n",
"0 1093 \n",
"1 2810 \n",
"2 3323 \n",
"3 555 \n",
"4 1744 \n",
"5 3903 \n",
"6 533 \n",
"7 1521 \n",
"8 365 \n",
"9 2596 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!65 to 69 years \\\n",
"0 1080 \n",
"1 2181 \n",
"2 3306 \n",
"3 345 \n",
"4 1629 \n",
"5 3812 \n",
"6 753 \n",
"7 1390 \n",
"8 519 \n",
"9 2652 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!70 to 74 years \\\n",
"0 616 \n",
"1 1753 \n",
"2 2489 \n",
"3 303 \n",
"4 1400 \n",
"5 3038 \n",
"6 712 \n",
"7 1117 \n",
"8 588 \n",
"9 2073 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!75 to 79 years \\\n",
"0 524 \n",
"1 991 \n",
"2 1763 \n",
"3 171 \n",
"4 1140 \n",
"5 2551 \n",
"6 245 \n",
"7 814 \n",
"8 371 \n",
"9 1777 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!80 to 84 years \\\n",
"0 349 \n",
"1 789 \n",
"2 1315 \n",
"3 74 \n",
"4 576 \n",
"5 1433 \n",
"6 326 \n",
"7 450 \n",
"8 334 \n",
"9 1184 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!85 years and over \n",
"0 328 \n",
"1 648 \n",
"2 727 \n",
"3 174 \n",
"4 365 \n",
"5 1578 \n",
"6 320 \n",
"7 503 \n",
"8 241 \n",
"9 737 "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df1 = df.iloc[:,[1,2,6,10,14,18,22,26,30,34,38,42,46,50,54,58,62,66,70,74]] \n",
"\n",
"df1.head(10) #print first 10 records"
]
},
{
"cell_type": "markdown",
"id": "2dbf366d-6d5e-4478-9ee9-0153bc3433c5",
"metadata": {},
"source": [
"Replace the column values which has 'ZCTA5 00001' to 00001 using replace function."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "ee0cf0b2-3e67-4bfc-a985-b29231230d05",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Geographic Area Name | \n",
" Estimate!!Total!!Total population | \n",
" Estimate!!Total!!Total population!!AGE!!Under 5 years | \n",
" Estimate!!Total!!Total population!!AGE!!5 to 9 years | \n",
" Estimate!!Total!!Total population!!AGE!!10 to 14 years | \n",
" Estimate!!Total!!Total population!!AGE!!15 to 19 years | \n",
" Estimate!!Total!!Total population!!AGE!!20 to 24 years | \n",
" Estimate!!Total!!Total population!!AGE!!25 to 29 years | \n",
" Estimate!!Total!!Total population!!AGE!!30 to 34 years | \n",
" Estimate!!Total!!Total population!!AGE!!35 to 39 years | \n",
" Estimate!!Total!!Total population!!AGE!!40 to 44 years | \n",
" Estimate!!Total!!Total population!!AGE!!45 to 49 years | \n",
" Estimate!!Total!!Total population!!AGE!!50 to 54 years | \n",
" Estimate!!Total!!Total population!!AGE!!55 to 59 years | \n",
" Estimate!!Total!!Total population!!AGE!!60 to 64 years | \n",
" Estimate!!Total!!Total population!!AGE!!65 to 69 years | \n",
" Estimate!!Total!!Total population!!AGE!!70 to 74 years | \n",
" Estimate!!Total!!Total population!!AGE!!75 to 79 years | \n",
" Estimate!!Total!!Total population!!AGE!!80 to 84 years | \n",
" Estimate!!Total!!Total population!!AGE!!85 years and over | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 00601 | \n",
" 17599 | \n",
" 938 | \n",
" 1046 | \n",
" 1278 | \n",
" 1250 | \n",
" 1218 | \n",
" 1148 | \n",
" 1073 | \n",
" 1121 | \n",
" 918 | \n",
" 1147 | \n",
" 1257 | \n",
" 1215 | \n",
" 1093 | \n",
" 1080 | \n",
" 616 | \n",
" 524 | \n",
" 349 | \n",
" 328 | \n",
"
\n",
" \n",
" 1 | \n",
" 00602 | \n",
" 39209 | \n",
" 1794 | \n",
" 2320 | \n",
" 2113 | \n",
" 2706 | \n",
" 2751 | \n",
" 2537 | \n",
" 2350 | \n",
" 2715 | \n",
" 2227 | \n",
" 2749 | \n",
" 2942 | \n",
" 2833 | \n",
" 2810 | \n",
" 2181 | \n",
" 1753 | \n",
" 991 | \n",
" 789 | \n",
" 648 | \n",
"
\n",
" \n",
" 2 | \n",
" 00603 | \n",
" 50135 | \n",
" 2469 | \n",
" 2778 | \n",
" 3260 | \n",
" 3443 | \n",
" 3194 | \n",
" 3264 | \n",
" 3021 | \n",
" 3296 | \n",
" 3099 | \n",
" 3070 | \n",
" 3336 | \n",
" 2982 | \n",
" 3323 | \n",
" 3306 | \n",
" 2489 | \n",
" 1763 | \n",
" 1315 | \n",
" 727 | \n",
"
\n",
" \n",
" 3 | \n",
" 00606 | \n",
" 6304 | \n",
" 312 | \n",
" 304 | \n",
" 420 | \n",
" 479 | \n",
" 443 | \n",
" 302 | \n",
" 329 | \n",
" 279 | \n",
" 557 | \n",
" 457 | \n",
" 483 | \n",
" 317 | \n",
" 555 | \n",
" 345 | \n",
" 303 | \n",
" 171 | \n",
" 74 | \n",
" 174 | \n",
"
\n",
" \n",
" 4 | \n",
" 00610 | \n",
" 27590 | \n",
" 1256 | \n",
" 1346 | \n",
" 1895 | \n",
" 1939 | \n",
" 1924 | \n",
" 1624 | \n",
" 1575 | \n",
" 1638 | \n",
" 1944 | \n",
" 1937 | \n",
" 1910 | \n",
" 1748 | \n",
" 1744 | \n",
" 1629 | \n",
" 1400 | \n",
" 1140 | \n",
" 576 | \n",
" 365 | \n",
"
\n",
" \n",
" 5 | \n",
" 00612 | \n",
" 62566 | \n",
" 3260 | \n",
" 3077 | \n",
" 4249 | \n",
" 4397 | \n",
" 4346 | \n",
" 3621 | \n",
" 3586 | \n",
" 3550 | \n",
" 4225 | \n",
" 4312 | \n",
" 4021 | \n",
" 3607 | \n",
" 3903 | \n",
" 3812 | \n",
" 3038 | \n",
" 2551 | \n",
" 1433 | \n",
" 1578 | \n",
"
\n",
" \n",
" 6 | \n",
" 00616 | \n",
" 10687 | \n",
" 469 | \n",
" 417 | \n",
" 405 | \n",
" 678 | \n",
" 901 | \n",
" 784 | \n",
" 574 | \n",
" 661 | \n",
" 687 | \n",
" 877 | \n",
" 787 | \n",
" 558 | \n",
" 533 | \n",
" 753 | \n",
" 712 | \n",
" 245 | \n",
" 326 | \n",
" 320 | \n",
"
\n",
" \n",
" 7 | \n",
" 00617 | \n",
" 24508 | \n",
" 1340 | \n",
" 1401 | \n",
" 1702 | \n",
" 1686 | \n",
" 1709 | \n",
" 1607 | \n",
" 1570 | \n",
" 1719 | \n",
" 1766 | \n",
" 1584 | \n",
" 1434 | \n",
" 1195 | \n",
" 1521 | \n",
" 1390 | \n",
" 1117 | \n",
" 814 | \n",
" 450 | \n",
" 503 | \n",
"
\n",
" \n",
" 8 | \n",
" 00622 | \n",
" 7405 | \n",
" 224 | \n",
" 296 | \n",
" 489 | \n",
" 477 | \n",
" 428 | \n",
" 243 | \n",
" 352 | \n",
" 382 | \n",
" 436 | \n",
" 477 | \n",
" 579 | \n",
" 604 | \n",
" 365 | \n",
" 519 | \n",
" 588 | \n",
" 371 | \n",
" 334 | \n",
" 241 | \n",
"
\n",
" \n",
" 9 | \n",
" 00623 | \n",
" 42321 | \n",
" 1945 | \n",
" 2599 | \n",
" 2457 | \n",
" 2906 | \n",
" 2795 | \n",
" 2528 | \n",
" 2279 | \n",
" 2790 | \n",
" 3075 | \n",
" 3050 | \n",
" 2580 | \n",
" 2298 | \n",
" 2596 | \n",
" 2652 | \n",
" 2073 | \n",
" 1777 | \n",
" 1184 | \n",
" 737 | \n",
"
\n",
" \n",
" 10 | \n",
" 00624 | \n",
" 23238 | \n",
" 1396 | \n",
" 1604 | \n",
" 1562 | \n",
" 1609 | \n",
" 1684 | \n",
" 1651 | \n",
" 1399 | \n",
" 1653 | \n",
" 1108 | \n",
" 1477 | \n",
" 1504 | \n",
" 1602 | \n",
" 1413 | \n",
" 1103 | \n",
" 1079 | \n",
" 806 | \n",
" 363 | \n",
" 225 | \n",
"
\n",
" \n",
" 11 | \n",
" 00627 | \n",
" 32936 | \n",
" 1557 | \n",
" 1770 | \n",
" 2109 | \n",
" 2333 | \n",
" 2377 | \n",
" 2024 | \n",
" 1886 | \n",
" 2366 | \n",
" 1949 | \n",
" 2320 | \n",
" 2269 | \n",
" 2051 | \n",
" 2102 | \n",
" 2022 | \n",
" 1540 | \n",
" 1109 | \n",
" 410 | \n",
" 742 | \n",
"
\n",
" \n",
" 12 | \n",
" 00631 | \n",
" 1555 | \n",
" 34 | \n",
" 54 | \n",
" 54 | \n",
" 75 | \n",
" 133 | \n",
" 116 | \n",
" 103 | \n",
" 56 | \n",
" 105 | \n",
" 78 | \n",
" 105 | \n",
" 110 | \n",
" 138 | \n",
" 114 | \n",
" 87 | \n",
" 68 | \n",
" 68 | \n",
" 57 | \n",
"
\n",
" \n",
" 13 | \n",
" 00637 | \n",
" 23610 | \n",
" 1210 | \n",
" 1199 | \n",
" 1629 | \n",
" 1665 | \n",
" 1607 | \n",
" 1368 | \n",
" 1309 | \n",
" 1384 | \n",
" 1576 | \n",
" 1461 | \n",
" 1424 | \n",
" 1501 | \n",
" 1432 | \n",
" 1329 | \n",
" 1432 | \n",
" 1056 | \n",
" 527 | \n",
" 501 | \n",
"
\n",
" \n",
" 14 | \n",
" 00638 | \n",
" 17202 | \n",
" 1025 | \n",
" 1170 | \n",
" 993 | \n",
" 1076 | \n",
" 1232 | \n",
" 1077 | \n",
" 1041 | \n",
" 930 | \n",
" 1042 | \n",
" 993 | \n",
" 1178 | \n",
" 1086 | \n",
" 1305 | \n",
" 991 | \n",
" 772 | \n",
" 552 | \n",
" 497 | \n",
" 242 | \n",
"
\n",
" \n",
" 15 | \n",
" 00641 | \n",
" 28332 | \n",
" 1359 | \n",
" 1546 | \n",
" 1760 | \n",
" 1975 | \n",
" 2037 | \n",
" 1832 | \n",
" 1532 | \n",
" 1579 | \n",
" 1645 | \n",
" 1766 | \n",
" 1985 | \n",
" 1937 | \n",
" 2080 | \n",
" 1717 | \n",
" 1358 | \n",
" 1092 | \n",
" 494 | \n",
" 638 | \n",
"
\n",
" \n",
" 16 | \n",
" 00646 | \n",
" 38008 | \n",
" 2033 | \n",
" 2570 | \n",
" 2658 | \n",
" 2730 | \n",
" 2448 | \n",
" 2078 | \n",
" 2240 | \n",
" 2610 | \n",
" 3112 | \n",
" 2767 | \n",
" 2477 | \n",
" 2020 | \n",
" 2349 | \n",
" 2258 | \n",
" 1405 | \n",
" 913 | \n",
" 843 | \n",
" 497 | \n",
"
\n",
" \n",
" 17 | \n",
" 00647 | \n",
" 5610 | \n",
" 226 | \n",
" 204 | \n",
" 264 | \n",
" 357 | \n",
" 411 | \n",
" 241 | \n",
" 406 | \n",
" 232 | \n",
" 280 | \n",
" 403 | \n",
" 398 | \n",
" 332 | \n",
" 423 | \n",
" 502 | \n",
" 357 | \n",
" 285 | \n",
" 117 | \n",
" 172 | \n",
"
\n",
" \n",
" 18 | \n",
" 00650 | \n",
" 14664 | \n",
" 848 | \n",
" 886 | \n",
" 939 | \n",
" 1175 | \n",
" 1147 | \n",
" 819 | \n",
" 847 | \n",
" 903 | \n",
" 963 | \n",
" 1058 | \n",
" 915 | \n",
" 715 | \n",
" 876 | \n",
" 851 | \n",
" 692 | \n",
" 460 | \n",
" 330 | \n",
" 240 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Geographic Area Name Estimate!!Total!!Total population \\\n",
"0 00601 17599 \n",
"1 00602 39209 \n",
"2 00603 50135 \n",
"3 00606 6304 \n",
"4 00610 27590 \n",
"5 00612 62566 \n",
"6 00616 10687 \n",
"7 00617 24508 \n",
"8 00622 7405 \n",
"9 00623 42321 \n",
"10 00624 23238 \n",
"11 00627 32936 \n",
"12 00631 1555 \n",
"13 00637 23610 \n",
"14 00638 17202 \n",
"15 00641 28332 \n",
"16 00646 38008 \n",
"17 00647 5610 \n",
"18 00650 14664 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!Under 5 years \\\n",
"0 938 \n",
"1 1794 \n",
"2 2469 \n",
"3 312 \n",
"4 1256 \n",
"5 3260 \n",
"6 469 \n",
"7 1340 \n",
"8 224 \n",
"9 1945 \n",
"10 1396 \n",
"11 1557 \n",
"12 34 \n",
"13 1210 \n",
"14 1025 \n",
"15 1359 \n",
"16 2033 \n",
"17 226 \n",
"18 848 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!5 to 9 years \\\n",
"0 1046 \n",
"1 2320 \n",
"2 2778 \n",
"3 304 \n",
"4 1346 \n",
"5 3077 \n",
"6 417 \n",
"7 1401 \n",
"8 296 \n",
"9 2599 \n",
"10 1604 \n",
"11 1770 \n",
"12 54 \n",
"13 1199 \n",
"14 1170 \n",
"15 1546 \n",
"16 2570 \n",
"17 204 \n",
"18 886 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!10 to 14 years \\\n",
"0 1278 \n",
"1 2113 \n",
"2 3260 \n",
"3 420 \n",
"4 1895 \n",
"5 4249 \n",
"6 405 \n",
"7 1702 \n",
"8 489 \n",
"9 2457 \n",
"10 1562 \n",
"11 2109 \n",
"12 54 \n",
"13 1629 \n",
"14 993 \n",
"15 1760 \n",
"16 2658 \n",
"17 264 \n",
"18 939 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!15 to 19 years \\\n",
"0 1250 \n",
"1 2706 \n",
"2 3443 \n",
"3 479 \n",
"4 1939 \n",
"5 4397 \n",
"6 678 \n",
"7 1686 \n",
"8 477 \n",
"9 2906 \n",
"10 1609 \n",
"11 2333 \n",
"12 75 \n",
"13 1665 \n",
"14 1076 \n",
"15 1975 \n",
"16 2730 \n",
"17 357 \n",
"18 1175 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!20 to 24 years \\\n",
"0 1218 \n",
"1 2751 \n",
"2 3194 \n",
"3 443 \n",
"4 1924 \n",
"5 4346 \n",
"6 901 \n",
"7 1709 \n",
"8 428 \n",
"9 2795 \n",
"10 1684 \n",
"11 2377 \n",
"12 133 \n",
"13 1607 \n",
"14 1232 \n",
"15 2037 \n",
"16 2448 \n",
"17 411 \n",
"18 1147 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!25 to 29 years \\\n",
"0 1148 \n",
"1 2537 \n",
"2 3264 \n",
"3 302 \n",
"4 1624 \n",
"5 3621 \n",
"6 784 \n",
"7 1607 \n",
"8 243 \n",
"9 2528 \n",
"10 1651 \n",
"11 2024 \n",
"12 116 \n",
"13 1368 \n",
"14 1077 \n",
"15 1832 \n",
"16 2078 \n",
"17 241 \n",
"18 819 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!30 to 34 years \\\n",
"0 1073 \n",
"1 2350 \n",
"2 3021 \n",
"3 329 \n",
"4 1575 \n",
"5 3586 \n",
"6 574 \n",
"7 1570 \n",
"8 352 \n",
"9 2279 \n",
"10 1399 \n",
"11 1886 \n",
"12 103 \n",
"13 1309 \n",
"14 1041 \n",
"15 1532 \n",
"16 2240 \n",
"17 406 \n",
"18 847 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!35 to 39 years \\\n",
"0 1121 \n",
"1 2715 \n",
"2 3296 \n",
"3 279 \n",
"4 1638 \n",
"5 3550 \n",
"6 661 \n",
"7 1719 \n",
"8 382 \n",
"9 2790 \n",
"10 1653 \n",
"11 2366 \n",
"12 56 \n",
"13 1384 \n",
"14 930 \n",
"15 1579 \n",
"16 2610 \n",
"17 232 \n",
"18 903 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!40 to 44 years \\\n",
"0 918 \n",
"1 2227 \n",
"2 3099 \n",
"3 557 \n",
"4 1944 \n",
"5 4225 \n",
"6 687 \n",
"7 1766 \n",
"8 436 \n",
"9 3075 \n",
"10 1108 \n",
"11 1949 \n",
"12 105 \n",
"13 1576 \n",
"14 1042 \n",
"15 1645 \n",
"16 3112 \n",
"17 280 \n",
"18 963 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!45 to 49 years \\\n",
"0 1147 \n",
"1 2749 \n",
"2 3070 \n",
"3 457 \n",
"4 1937 \n",
"5 4312 \n",
"6 877 \n",
"7 1584 \n",
"8 477 \n",
"9 3050 \n",
"10 1477 \n",
"11 2320 \n",
"12 78 \n",
"13 1461 \n",
"14 993 \n",
"15 1766 \n",
"16 2767 \n",
"17 403 \n",
"18 1058 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!50 to 54 years \\\n",
"0 1257 \n",
"1 2942 \n",
"2 3336 \n",
"3 483 \n",
"4 1910 \n",
"5 4021 \n",
"6 787 \n",
"7 1434 \n",
"8 579 \n",
"9 2580 \n",
"10 1504 \n",
"11 2269 \n",
"12 105 \n",
"13 1424 \n",
"14 1178 \n",
"15 1985 \n",
"16 2477 \n",
"17 398 \n",
"18 915 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!55 to 59 years \\\n",
"0 1215 \n",
"1 2833 \n",
"2 2982 \n",
"3 317 \n",
"4 1748 \n",
"5 3607 \n",
"6 558 \n",
"7 1195 \n",
"8 604 \n",
"9 2298 \n",
"10 1602 \n",
"11 2051 \n",
"12 110 \n",
"13 1501 \n",
"14 1086 \n",
"15 1937 \n",
"16 2020 \n",
"17 332 \n",
"18 715 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!60 to 64 years \\\n",
"0 1093 \n",
"1 2810 \n",
"2 3323 \n",
"3 555 \n",
"4 1744 \n",
"5 3903 \n",
"6 533 \n",
"7 1521 \n",
"8 365 \n",
"9 2596 \n",
"10 1413 \n",
"11 2102 \n",
"12 138 \n",
"13 1432 \n",
"14 1305 \n",
"15 2080 \n",
"16 2349 \n",
"17 423 \n",
"18 876 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!65 to 69 years \\\n",
"0 1080 \n",
"1 2181 \n",
"2 3306 \n",
"3 345 \n",
"4 1629 \n",
"5 3812 \n",
"6 753 \n",
"7 1390 \n",
"8 519 \n",
"9 2652 \n",
"10 1103 \n",
"11 2022 \n",
"12 114 \n",
"13 1329 \n",
"14 991 \n",
"15 1717 \n",
"16 2258 \n",
"17 502 \n",
"18 851 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!70 to 74 years \\\n",
"0 616 \n",
"1 1753 \n",
"2 2489 \n",
"3 303 \n",
"4 1400 \n",
"5 3038 \n",
"6 712 \n",
"7 1117 \n",
"8 588 \n",
"9 2073 \n",
"10 1079 \n",
"11 1540 \n",
"12 87 \n",
"13 1432 \n",
"14 772 \n",
"15 1358 \n",
"16 1405 \n",
"17 357 \n",
"18 692 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!75 to 79 years \\\n",
"0 524 \n",
"1 991 \n",
"2 1763 \n",
"3 171 \n",
"4 1140 \n",
"5 2551 \n",
"6 245 \n",
"7 814 \n",
"8 371 \n",
"9 1777 \n",
"10 806 \n",
"11 1109 \n",
"12 68 \n",
"13 1056 \n",
"14 552 \n",
"15 1092 \n",
"16 913 \n",
"17 285 \n",
"18 460 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!80 to 84 years \\\n",
"0 349 \n",
"1 789 \n",
"2 1315 \n",
"3 74 \n",
"4 576 \n",
"5 1433 \n",
"6 326 \n",
"7 450 \n",
"8 334 \n",
"9 1184 \n",
"10 363 \n",
"11 410 \n",
"12 68 \n",
"13 527 \n",
"14 497 \n",
"15 494 \n",
"16 843 \n",
"17 117 \n",
"18 330 \n",
"\n",
" Estimate!!Total!!Total population!!AGE!!85 years and over \n",
"0 328 \n",
"1 648 \n",
"2 727 \n",
"3 174 \n",
"4 365 \n",
"5 1578 \n",
"6 320 \n",
"7 503 \n",
"8 241 \n",
"9 737 \n",
"10 225 \n",
"11 742 \n",
"12 57 \n",
"13 501 \n",
"14 242 \n",
"15 638 \n",
"16 497 \n",
"17 172 \n",
"18 240 "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df2 = df1.replace('ZCTA5 ','', regex=True) \n",
"\n",
"df2"
]
},
{
"cell_type": "markdown",
"id": "1031ec69-0831-4769-9312-888da1fb26bd",
"metadata": {},
"source": [
"Rename the column names to the standard names using rename function."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "f14aac09-e55e-4a29-bd31-8c965c51cec4",
"metadata": {},
"outputs": [],
"source": [
"df2.rename(columns={'Geographic Area Name':'ZCTA5','Estimate!!Total!!Total population':'total_population','Estimate!!Total!!Total population!!AGE!!Under 5 years':'age_5under','Estimate!!Total!!Total population!!AGE!!5 to 9 years':'age_5to9y','Estimate!!Total!!Total population!!AGE!!10 to 14 years':'age_10to14y','Estimate!!Total!!Total population!!AGE!!15 to 19 years':'age_15to19y'}, inplace=True)\n",
"\n",
"df2.rename(columns={'Estimate!!Total!!Total population!!AGE!!20 to 24 years':'age_20to24y','Estimate!!Total!!Total population!!AGE!!25 to 29 years':'age_25to29y','Estimate!!Total!!Total population!!AGE!!30 to 34 years':'age_30to34y','Estimate!!Total!!Total population!!AGE!!35 to 39 years':'age_35to39y','Estimate!!Total!!Total population!!AGE!!40 to 44 years':'age_40to44y','Estimate!!Total!!Total population!!AGE!!45 to 49 years':'age_45to49y'}, inplace=True)\n",
"\n",
"df2.rename(columns={'Estimate!!Total!!Total population!!AGE!!50 to 54 years':'age_50to54y','Estimate!!Total!!Total population!!AGE!!55 to 59 years':'age_55to59y','Estimate!!Total!!Total population!!AGE!!60 to 64 years':'age_60to64y','Estimate!!Total!!Total population!!AGE!!65 to 69 years':'age_65to69y','Estimate!!Total!!Total population!!AGE!!70 to 74 years':'age_70to74y','Estimate!!Total!!Total population!!AGE!!75 to 79 years':'age_75to79y','Estimate!!Total!!Total population!!AGE!!80 to 84 years':'age_80to84y','Estimate!!Total!!Total population!!AGE!!85 years and over':'age_above85y'}, inplace=True)"
]
},
{
"cell_type": "markdown",
"id": "0b5e137c-6cb9-4b02-8426-ba5a7a090e63",
"metadata": {},
"source": [
"Drop the duplicates from the dataframe using drop_duplicates function"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "dd592397-ec50-40b8-8297-278ae6f35586",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" ZCTA5 | \n",
" total_population | \n",
" age_5under | \n",
" age_5to9y | \n",
" age_10to14y | \n",
" age_15to19y | \n",
" age_20to24y | \n",
" age_25to29y | \n",
" age_30to34y | \n",
" age_35to39y | \n",
" age_40to44y | \n",
" age_45to49y | \n",
" age_50to54y | \n",
" age_55to59y | \n",
" age_60to64y | \n",
" age_65to69y | \n",
" age_70to74y | \n",
" age_75to79y | \n",
" age_80to84y | \n",
" age_above85y | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 00601 | \n",
" 17599 | \n",
" 938 | \n",
" 1046 | \n",
" 1278 | \n",
" 1250 | \n",
" 1218 | \n",
" 1148 | \n",
" 1073 | \n",
" 1121 | \n",
" 918 | \n",
" 1147 | \n",
" 1257 | \n",
" 1215 | \n",
" 1093 | \n",
" 1080 | \n",
" 616 | \n",
" 524 | \n",
" 349 | \n",
" 328 | \n",
"
\n",
" \n",
" 1 | \n",
" 00602 | \n",
" 39209 | \n",
" 1794 | \n",
" 2320 | \n",
" 2113 | \n",
" 2706 | \n",
" 2751 | \n",
" 2537 | \n",
" 2350 | \n",
" 2715 | \n",
" 2227 | \n",
" 2749 | \n",
" 2942 | \n",
" 2833 | \n",
" 2810 | \n",
" 2181 | \n",
" 1753 | \n",
" 991 | \n",
" 789 | \n",
" 648 | \n",
"
\n",
" \n",
" 2 | \n",
" 00603 | \n",
" 50135 | \n",
" 2469 | \n",
" 2778 | \n",
" 3260 | \n",
" 3443 | \n",
" 3194 | \n",
" 3264 | \n",
" 3021 | \n",
" 3296 | \n",
" 3099 | \n",
" 3070 | \n",
" 3336 | \n",
" 2982 | \n",
" 3323 | \n",
" 3306 | \n",
" 2489 | \n",
" 1763 | \n",
" 1315 | \n",
" 727 | \n",
"
\n",
" \n",
" 3 | \n",
" 00606 | \n",
" 6304 | \n",
" 312 | \n",
" 304 | \n",
" 420 | \n",
" 479 | \n",
" 443 | \n",
" 302 | \n",
" 329 | \n",
" 279 | \n",
" 557 | \n",
" 457 | \n",
" 483 | \n",
" 317 | \n",
" 555 | \n",
" 345 | \n",
" 303 | \n",
" 171 | \n",
" 74 | \n",
" 174 | \n",
"
\n",
" \n",
" 4 | \n",
" 00610 | \n",
" 27590 | \n",
" 1256 | \n",
" 1346 | \n",
" 1895 | \n",
" 1939 | \n",
" 1924 | \n",
" 1624 | \n",
" 1575 | \n",
" 1638 | \n",
" 1944 | \n",
" 1937 | \n",
" 1910 | \n",
" 1748 | \n",
" 1744 | \n",
" 1629 | \n",
" 1400 | \n",
" 1140 | \n",
" 576 | \n",
" 365 | \n",
"
\n",
" \n",
" 5 | \n",
" 00612 | \n",
" 62566 | \n",
" 3260 | \n",
" 3077 | \n",
" 4249 | \n",
" 4397 | \n",
" 4346 | \n",
" 3621 | \n",
" 3586 | \n",
" 3550 | \n",
" 4225 | \n",
" 4312 | \n",
" 4021 | \n",
" 3607 | \n",
" 3903 | \n",
" 3812 | \n",
" 3038 | \n",
" 2551 | \n",
" 1433 | \n",
" 1578 | \n",
"
\n",
" \n",
" 6 | \n",
" 00616 | \n",
" 10687 | \n",
" 469 | \n",
" 417 | \n",
" 405 | \n",
" 678 | \n",
" 901 | \n",
" 784 | \n",
" 574 | \n",
" 661 | \n",
" 687 | \n",
" 877 | \n",
" 787 | \n",
" 558 | \n",
" 533 | \n",
" 753 | \n",
" 712 | \n",
" 245 | \n",
" 326 | \n",
" 320 | \n",
"
\n",
" \n",
" 7 | \n",
" 00617 | \n",
" 24508 | \n",
" 1340 | \n",
" 1401 | \n",
" 1702 | \n",
" 1686 | \n",
" 1709 | \n",
" 1607 | \n",
" 1570 | \n",
" 1719 | \n",
" 1766 | \n",
" 1584 | \n",
" 1434 | \n",
" 1195 | \n",
" 1521 | \n",
" 1390 | \n",
" 1117 | \n",
" 814 | \n",
" 450 | \n",
" 503 | \n",
"
\n",
" \n",
" 8 | \n",
" 00622 | \n",
" 7405 | \n",
" 224 | \n",
" 296 | \n",
" 489 | \n",
" 477 | \n",
" 428 | \n",
" 243 | \n",
" 352 | \n",
" 382 | \n",
" 436 | \n",
" 477 | \n",
" 579 | \n",
" 604 | \n",
" 365 | \n",
" 519 | \n",
" 588 | \n",
" 371 | \n",
" 334 | \n",
" 241 | \n",
"
\n",
" \n",
" 9 | \n",
" 00623 | \n",
" 42321 | \n",
" 1945 | \n",
" 2599 | \n",
" 2457 | \n",
" 2906 | \n",
" 2795 | \n",
" 2528 | \n",
" 2279 | \n",
" 2790 | \n",
" 3075 | \n",
" 3050 | \n",
" 2580 | \n",
" 2298 | \n",
" 2596 | \n",
" 2652 | \n",
" 2073 | \n",
" 1777 | \n",
" 1184 | \n",
" 737 | \n",
"
\n",
" \n",
" 10 | \n",
" 00624 | \n",
" 23238 | \n",
" 1396 | \n",
" 1604 | \n",
" 1562 | \n",
" 1609 | \n",
" 1684 | \n",
" 1651 | \n",
" 1399 | \n",
" 1653 | \n",
" 1108 | \n",
" 1477 | \n",
" 1504 | \n",
" 1602 | \n",
" 1413 | \n",
" 1103 | \n",
" 1079 | \n",
" 806 | \n",
" 363 | \n",
" 225 | \n",
"
\n",
" \n",
" 11 | \n",
" 00627 | \n",
" 32936 | \n",
" 1557 | \n",
" 1770 | \n",
" 2109 | \n",
" 2333 | \n",
" 2377 | \n",
" 2024 | \n",
" 1886 | \n",
" 2366 | \n",
" 1949 | \n",
" 2320 | \n",
" 2269 | \n",
" 2051 | \n",
" 2102 | \n",
" 2022 | \n",
" 1540 | \n",
" 1109 | \n",
" 410 | \n",
" 742 | \n",
"
\n",
" \n",
" 12 | \n",
" 00631 | \n",
" 1555 | \n",
" 34 | \n",
" 54 | \n",
" 54 | \n",
" 75 | \n",
" 133 | \n",
" 116 | \n",
" 103 | \n",
" 56 | \n",
" 105 | \n",
" 78 | \n",
" 105 | \n",
" 110 | \n",
" 138 | \n",
" 114 | \n",
" 87 | \n",
" 68 | \n",
" 68 | \n",
" 57 | \n",
"
\n",
" \n",
" 13 | \n",
" 00637 | \n",
" 23610 | \n",
" 1210 | \n",
" 1199 | \n",
" 1629 | \n",
" 1665 | \n",
" 1607 | \n",
" 1368 | \n",
" 1309 | \n",
" 1384 | \n",
" 1576 | \n",
" 1461 | \n",
" 1424 | \n",
" 1501 | \n",
" 1432 | \n",
" 1329 | \n",
" 1432 | \n",
" 1056 | \n",
" 527 | \n",
" 501 | \n",
"
\n",
" \n",
" 14 | \n",
" 00638 | \n",
" 17202 | \n",
" 1025 | \n",
" 1170 | \n",
" 993 | \n",
" 1076 | \n",
" 1232 | \n",
" 1077 | \n",
" 1041 | \n",
" 930 | \n",
" 1042 | \n",
" 993 | \n",
" 1178 | \n",
" 1086 | \n",
" 1305 | \n",
" 991 | \n",
" 772 | \n",
" 552 | \n",
" 497 | \n",
" 242 | \n",
"
\n",
" \n",
" 15 | \n",
" 00641 | \n",
" 28332 | \n",
" 1359 | \n",
" 1546 | \n",
" 1760 | \n",
" 1975 | \n",
" 2037 | \n",
" 1832 | \n",
" 1532 | \n",
" 1579 | \n",
" 1645 | \n",
" 1766 | \n",
" 1985 | \n",
" 1937 | \n",
" 2080 | \n",
" 1717 | \n",
" 1358 | \n",
" 1092 | \n",
" 494 | \n",
" 638 | \n",
"
\n",
" \n",
" 16 | \n",
" 00646 | \n",
" 38008 | \n",
" 2033 | \n",
" 2570 | \n",
" 2658 | \n",
" 2730 | \n",
" 2448 | \n",
" 2078 | \n",
" 2240 | \n",
" 2610 | \n",
" 3112 | \n",
" 2767 | \n",
" 2477 | \n",
" 2020 | \n",
" 2349 | \n",
" 2258 | \n",
" 1405 | \n",
" 913 | \n",
" 843 | \n",
" 497 | \n",
"
\n",
" \n",
" 17 | \n",
" 00647 | \n",
" 5610 | \n",
" 226 | \n",
" 204 | \n",
" 264 | \n",
" 357 | \n",
" 411 | \n",
" 241 | \n",
" 406 | \n",
" 232 | \n",
" 280 | \n",
" 403 | \n",
" 398 | \n",
" 332 | \n",
" 423 | \n",
" 502 | \n",
" 357 | \n",
" 285 | \n",
" 117 | \n",
" 172 | \n",
"
\n",
" \n",
" 18 | \n",
" 00650 | \n",
" 14664 | \n",
" 848 | \n",
" 886 | \n",
" 939 | \n",
" 1175 | \n",
" 1147 | \n",
" 819 | \n",
" 847 | \n",
" 903 | \n",
" 963 | \n",
" 1058 | \n",
" 915 | \n",
" 715 | \n",
" 876 | \n",
" 851 | \n",
" 692 | \n",
" 460 | \n",
" 330 | \n",
" 240 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" ZCTA5 total_population age_5under age_5to9y age_10to14y age_15to19y \\\n",
"0 00601 17599 938 1046 1278 1250 \n",
"1 00602 39209 1794 2320 2113 2706 \n",
"2 00603 50135 2469 2778 3260 3443 \n",
"3 00606 6304 312 304 420 479 \n",
"4 00610 27590 1256 1346 1895 1939 \n",
"5 00612 62566 3260 3077 4249 4397 \n",
"6 00616 10687 469 417 405 678 \n",
"7 00617 24508 1340 1401 1702 1686 \n",
"8 00622 7405 224 296 489 477 \n",
"9 00623 42321 1945 2599 2457 2906 \n",
"10 00624 23238 1396 1604 1562 1609 \n",
"11 00627 32936 1557 1770 2109 2333 \n",
"12 00631 1555 34 54 54 75 \n",
"13 00637 23610 1210 1199 1629 1665 \n",
"14 00638 17202 1025 1170 993 1076 \n",
"15 00641 28332 1359 1546 1760 1975 \n",
"16 00646 38008 2033 2570 2658 2730 \n",
"17 00647 5610 226 204 264 357 \n",
"18 00650 14664 848 886 939 1175 \n",
"\n",
" age_20to24y age_25to29y age_30to34y age_35to39y age_40to44y \\\n",
"0 1218 1148 1073 1121 918 \n",
"1 2751 2537 2350 2715 2227 \n",
"2 3194 3264 3021 3296 3099 \n",
"3 443 302 329 279 557 \n",
"4 1924 1624 1575 1638 1944 \n",
"5 4346 3621 3586 3550 4225 \n",
"6 901 784 574 661 687 \n",
"7 1709 1607 1570 1719 1766 \n",
"8 428 243 352 382 436 \n",
"9 2795 2528 2279 2790 3075 \n",
"10 1684 1651 1399 1653 1108 \n",
"11 2377 2024 1886 2366 1949 \n",
"12 133 116 103 56 105 \n",
"13 1607 1368 1309 1384 1576 \n",
"14 1232 1077 1041 930 1042 \n",
"15 2037 1832 1532 1579 1645 \n",
"16 2448 2078 2240 2610 3112 \n",
"17 411 241 406 232 280 \n",
"18 1147 819 847 903 963 \n",
"\n",
" age_45to49y age_50to54y age_55to59y age_60to64y age_65to69y \\\n",
"0 1147 1257 1215 1093 1080 \n",
"1 2749 2942 2833 2810 2181 \n",
"2 3070 3336 2982 3323 3306 \n",
"3 457 483 317 555 345 \n",
"4 1937 1910 1748 1744 1629 \n",
"5 4312 4021 3607 3903 3812 \n",
"6 877 787 558 533 753 \n",
"7 1584 1434 1195 1521 1390 \n",
"8 477 579 604 365 519 \n",
"9 3050 2580 2298 2596 2652 \n",
"10 1477 1504 1602 1413 1103 \n",
"11 2320 2269 2051 2102 2022 \n",
"12 78 105 110 138 114 \n",
"13 1461 1424 1501 1432 1329 \n",
"14 993 1178 1086 1305 991 \n",
"15 1766 1985 1937 2080 1717 \n",
"16 2767 2477 2020 2349 2258 \n",
"17 403 398 332 423 502 \n",
"18 1058 915 715 876 851 \n",
"\n",
" age_70to74y age_75to79y age_80to84y age_above85y \n",
"0 616 524 349 328 \n",
"1 1753 991 789 648 \n",
"2 2489 1763 1315 727 \n",
"3 303 171 74 174 \n",
"4 1400 1140 576 365 \n",
"5 3038 2551 1433 1578 \n",
"6 712 245 326 320 \n",
"7 1117 814 450 503 \n",
"8 588 371 334 241 \n",
"9 2073 1777 1184 737 \n",
"10 1079 806 363 225 \n",
"11 1540 1109 410 742 \n",
"12 87 68 68 57 \n",
"13 1432 1056 527 501 \n",
"14 772 552 497 242 \n",
"15 1358 1092 494 638 \n",
"16 1405 913 843 497 \n",
"17 357 285 117 172 \n",
"18 692 460 330 240 "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df3 = df2.drop_duplicates()\n",
"\n",
"df3"
]
},
{
"cell_type": "markdown",
"id": "fe8c6a6d-5020-4dd8-9481-ab9665e17f8b",
"metadata": {},
"source": [
"Add new column to the dataframe using insert function for the year which we are loading data."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "8442ce68-2858-4d4a-9d41-331f4323d1eb",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" ZCTA5 | \n",
" Year | \n",
" total_population | \n",
" age_5under | \n",
" age_5to9y | \n",
" age_10to14y | \n",
" age_15to19y | \n",
" age_20to24y | \n",
" age_25to29y | \n",
" age_30to34y | \n",
" ... | \n",
" age_40to44y | \n",
" age_45to49y | \n",
" age_50to54y | \n",
" age_55to59y | \n",
" age_60to64y | \n",
" age_65to69y | \n",
" age_70to74y | \n",
" age_75to79y | \n",
" age_80to84y | \n",
" age_above85y | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 00601 | \n",
" 2020 | \n",
" 17599 | \n",
" 938 | \n",
" 1046 | \n",
" 1278 | \n",
" 1250 | \n",
" 1218 | \n",
" 1148 | \n",
" 1073 | \n",
" ... | \n",
" 918 | \n",
" 1147 | \n",
" 1257 | \n",
" 1215 | \n",
" 1093 | \n",
" 1080 | \n",
" 616 | \n",
" 524 | \n",
" 349 | \n",
" 328 | \n",
"
\n",
" \n",
" 1 | \n",
" 00602 | \n",
" 2020 | \n",
" 39209 | \n",
" 1794 | \n",
" 2320 | \n",
" 2113 | \n",
" 2706 | \n",
" 2751 | \n",
" 2537 | \n",
" 2350 | \n",
" ... | \n",
" 2227 | \n",
" 2749 | \n",
" 2942 | \n",
" 2833 | \n",
" 2810 | \n",
" 2181 | \n",
" 1753 | \n",
" 991 | \n",
" 789 | \n",
" 648 | \n",
"
\n",
" \n",
" 2 | \n",
" 00603 | \n",
" 2020 | \n",
" 50135 | \n",
" 2469 | \n",
" 2778 | \n",
" 3260 | \n",
" 3443 | \n",
" 3194 | \n",
" 3264 | \n",
" 3021 | \n",
" ... | \n",
" 3099 | \n",
" 3070 | \n",
" 3336 | \n",
" 2982 | \n",
" 3323 | \n",
" 3306 | \n",
" 2489 | \n",
" 1763 | \n",
" 1315 | \n",
" 727 | \n",
"
\n",
" \n",
" 3 | \n",
" 00606 | \n",
" 2020 | \n",
" 6304 | \n",
" 312 | \n",
" 304 | \n",
" 420 | \n",
" 479 | \n",
" 443 | \n",
" 302 | \n",
" 329 | \n",
" ... | \n",
" 557 | \n",
" 457 | \n",
" 483 | \n",
" 317 | \n",
" 555 | \n",
" 345 | \n",
" 303 | \n",
" 171 | \n",
" 74 | \n",
" 174 | \n",
"
\n",
" \n",
" 4 | \n",
" 00610 | \n",
" 2020 | \n",
" 27590 | \n",
" 1256 | \n",
" 1346 | \n",
" 1895 | \n",
" 1939 | \n",
" 1924 | \n",
" 1624 | \n",
" 1575 | \n",
" ... | \n",
" 1944 | \n",
" 1937 | \n",
" 1910 | \n",
" 1748 | \n",
" 1744 | \n",
" 1629 | \n",
" 1400 | \n",
" 1140 | \n",
" 576 | \n",
" 365 | \n",
"
\n",
" \n",
" 5 | \n",
" 00612 | \n",
" 2020 | \n",
" 62566 | \n",
" 3260 | \n",
" 3077 | \n",
" 4249 | \n",
" 4397 | \n",
" 4346 | \n",
" 3621 | \n",
" 3586 | \n",
" ... | \n",
" 4225 | \n",
" 4312 | \n",
" 4021 | \n",
" 3607 | \n",
" 3903 | \n",
" 3812 | \n",
" 3038 | \n",
" 2551 | \n",
" 1433 | \n",
" 1578 | \n",
"
\n",
" \n",
" 6 | \n",
" 00616 | \n",
" 2020 | \n",
" 10687 | \n",
" 469 | \n",
" 417 | \n",
" 405 | \n",
" 678 | \n",
" 901 | \n",
" 784 | \n",
" 574 | \n",
" ... | \n",
" 687 | \n",
" 877 | \n",
" 787 | \n",
" 558 | \n",
" 533 | \n",
" 753 | \n",
" 712 | \n",
" 245 | \n",
" 326 | \n",
" 320 | \n",
"
\n",
" \n",
" 7 | \n",
" 00617 | \n",
" 2020 | \n",
" 24508 | \n",
" 1340 | \n",
" 1401 | \n",
" 1702 | \n",
" 1686 | \n",
" 1709 | \n",
" 1607 | \n",
" 1570 | \n",
" ... | \n",
" 1766 | \n",
" 1584 | \n",
" 1434 | \n",
" 1195 | \n",
" 1521 | \n",
" 1390 | \n",
" 1117 | \n",
" 814 | \n",
" 450 | \n",
" 503 | \n",
"
\n",
" \n",
" 8 | \n",
" 00622 | \n",
" 2020 | \n",
" 7405 | \n",
" 224 | \n",
" 296 | \n",
" 489 | \n",
" 477 | \n",
" 428 | \n",
" 243 | \n",
" 352 | \n",
" ... | \n",
" 436 | \n",
" 477 | \n",
" 579 | \n",
" 604 | \n",
" 365 | \n",
" 519 | \n",
" 588 | \n",
" 371 | \n",
" 334 | \n",
" 241 | \n",
"
\n",
" \n",
" 9 | \n",
" 00623 | \n",
" 2020 | \n",
" 42321 | \n",
" 1945 | \n",
" 2599 | \n",
" 2457 | \n",
" 2906 | \n",
" 2795 | \n",
" 2528 | \n",
" 2279 | \n",
" ... | \n",
" 3075 | \n",
" 3050 | \n",
" 2580 | \n",
" 2298 | \n",
" 2596 | \n",
" 2652 | \n",
" 2073 | \n",
" 1777 | \n",
" 1184 | \n",
" 737 | \n",
"
\n",
" \n",
" 10 | \n",
" 00624 | \n",
" 2020 | \n",
" 23238 | \n",
" 1396 | \n",
" 1604 | \n",
" 1562 | \n",
" 1609 | \n",
" 1684 | \n",
" 1651 | \n",
" 1399 | \n",
" ... | \n",
" 1108 | \n",
" 1477 | \n",
" 1504 | \n",
" 1602 | \n",
" 1413 | \n",
" 1103 | \n",
" 1079 | \n",
" 806 | \n",
" 363 | \n",
" 225 | \n",
"
\n",
" \n",
" 11 | \n",
" 00627 | \n",
" 2020 | \n",
" 32936 | \n",
" 1557 | \n",
" 1770 | \n",
" 2109 | \n",
" 2333 | \n",
" 2377 | \n",
" 2024 | \n",
" 1886 | \n",
" ... | \n",
" 1949 | \n",
" 2320 | \n",
" 2269 | \n",
" 2051 | \n",
" 2102 | \n",
" 2022 | \n",
" 1540 | \n",
" 1109 | \n",
" 410 | \n",
" 742 | \n",
"
\n",
" \n",
" 12 | \n",
" 00631 | \n",
" 2020 | \n",
" 1555 | \n",
" 34 | \n",
" 54 | \n",
" 54 | \n",
" 75 | \n",
" 133 | \n",
" 116 | \n",
" 103 | \n",
" ... | \n",
" 105 | \n",
" 78 | \n",
" 105 | \n",
" 110 | \n",
" 138 | \n",
" 114 | \n",
" 87 | \n",
" 68 | \n",
" 68 | \n",
" 57 | \n",
"
\n",
" \n",
" 13 | \n",
" 00637 | \n",
" 2020 | \n",
" 23610 | \n",
" 1210 | \n",
" 1199 | \n",
" 1629 | \n",
" 1665 | \n",
" 1607 | \n",
" 1368 | \n",
" 1309 | \n",
" ... | \n",
" 1576 | \n",
" 1461 | \n",
" 1424 | \n",
" 1501 | \n",
" 1432 | \n",
" 1329 | \n",
" 1432 | \n",
" 1056 | \n",
" 527 | \n",
" 501 | \n",
"
\n",
" \n",
" 14 | \n",
" 00638 | \n",
" 2020 | \n",
" 17202 | \n",
" 1025 | \n",
" 1170 | \n",
" 993 | \n",
" 1076 | \n",
" 1232 | \n",
" 1077 | \n",
" 1041 | \n",
" ... | \n",
" 1042 | \n",
" 993 | \n",
" 1178 | \n",
" 1086 | \n",
" 1305 | \n",
" 991 | \n",
" 772 | \n",
" 552 | \n",
" 497 | \n",
" 242 | \n",
"
\n",
" \n",
" 15 | \n",
" 00641 | \n",
" 2020 | \n",
" 28332 | \n",
" 1359 | \n",
" 1546 | \n",
" 1760 | \n",
" 1975 | \n",
" 2037 | \n",
" 1832 | \n",
" 1532 | \n",
" ... | \n",
" 1645 | \n",
" 1766 | \n",
" 1985 | \n",
" 1937 | \n",
" 2080 | \n",
" 1717 | \n",
" 1358 | \n",
" 1092 | \n",
" 494 | \n",
" 638 | \n",
"
\n",
" \n",
" 16 | \n",
" 00646 | \n",
" 2020 | \n",
" 38008 | \n",
" 2033 | \n",
" 2570 | \n",
" 2658 | \n",
" 2730 | \n",
" 2448 | \n",
" 2078 | \n",
" 2240 | \n",
" ... | \n",
" 3112 | \n",
" 2767 | \n",
" 2477 | \n",
" 2020 | \n",
" 2349 | \n",
" 2258 | \n",
" 1405 | \n",
" 913 | \n",
" 843 | \n",
" 497 | \n",
"
\n",
" \n",
" 17 | \n",
" 00647 | \n",
" 2020 | \n",
" 5610 | \n",
" 226 | \n",
" 204 | \n",
" 264 | \n",
" 357 | \n",
" 411 | \n",
" 241 | \n",
" 406 | \n",
" ... | \n",
" 280 | \n",
" 403 | \n",
" 398 | \n",
" 332 | \n",
" 423 | \n",
" 502 | \n",
" 357 | \n",
" 285 | \n",
" 117 | \n",
" 172 | \n",
"
\n",
" \n",
" 18 | \n",
" 00650 | \n",
" 2020 | \n",
" 14664 | \n",
" 848 | \n",
" 886 | \n",
" 939 | \n",
" 1175 | \n",
" 1147 | \n",
" 819 | \n",
" 847 | \n",
" ... | \n",
" 963 | \n",
" 1058 | \n",
" 915 | \n",
" 715 | \n",
" 876 | \n",
" 851 | \n",
" 692 | \n",
" 460 | \n",
" 330 | \n",
" 240 | \n",
"
\n",
" \n",
"
\n",
"
19 rows × 21 columns
\n",
"
"
],
"text/plain": [
" ZCTA5 Year total_population age_5under age_5to9y age_10to14y \\\n",
"0 00601 2020 17599 938 1046 1278 \n",
"1 00602 2020 39209 1794 2320 2113 \n",
"2 00603 2020 50135 2469 2778 3260 \n",
"3 00606 2020 6304 312 304 420 \n",
"4 00610 2020 27590 1256 1346 1895 \n",
"5 00612 2020 62566 3260 3077 4249 \n",
"6 00616 2020 10687 469 417 405 \n",
"7 00617 2020 24508 1340 1401 1702 \n",
"8 00622 2020 7405 224 296 489 \n",
"9 00623 2020 42321 1945 2599 2457 \n",
"10 00624 2020 23238 1396 1604 1562 \n",
"11 00627 2020 32936 1557 1770 2109 \n",
"12 00631 2020 1555 34 54 54 \n",
"13 00637 2020 23610 1210 1199 1629 \n",
"14 00638 2020 17202 1025 1170 993 \n",
"15 00641 2020 28332 1359 1546 1760 \n",
"16 00646 2020 38008 2033 2570 2658 \n",
"17 00647 2020 5610 226 204 264 \n",
"18 00650 2020 14664 848 886 939 \n",
"\n",
" age_15to19y age_20to24y age_25to29y age_30to34y ... age_40to44y \\\n",
"0 1250 1218 1148 1073 ... 918 \n",
"1 2706 2751 2537 2350 ... 2227 \n",
"2 3443 3194 3264 3021 ... 3099 \n",
"3 479 443 302 329 ... 557 \n",
"4 1939 1924 1624 1575 ... 1944 \n",
"5 4397 4346 3621 3586 ... 4225 \n",
"6 678 901 784 574 ... 687 \n",
"7 1686 1709 1607 1570 ... 1766 \n",
"8 477 428 243 352 ... 436 \n",
"9 2906 2795 2528 2279 ... 3075 \n",
"10 1609 1684 1651 1399 ... 1108 \n",
"11 2333 2377 2024 1886 ... 1949 \n",
"12 75 133 116 103 ... 105 \n",
"13 1665 1607 1368 1309 ... 1576 \n",
"14 1076 1232 1077 1041 ... 1042 \n",
"15 1975 2037 1832 1532 ... 1645 \n",
"16 2730 2448 2078 2240 ... 3112 \n",
"17 357 411 241 406 ... 280 \n",
"18 1175 1147 819 847 ... 963 \n",
"\n",
" age_45to49y age_50to54y age_55to59y age_60to64y age_65to69y \\\n",
"0 1147 1257 1215 1093 1080 \n",
"1 2749 2942 2833 2810 2181 \n",
"2 3070 3336 2982 3323 3306 \n",
"3 457 483 317 555 345 \n",
"4 1937 1910 1748 1744 1629 \n",
"5 4312 4021 3607 3903 3812 \n",
"6 877 787 558 533 753 \n",
"7 1584 1434 1195 1521 1390 \n",
"8 477 579 604 365 519 \n",
"9 3050 2580 2298 2596 2652 \n",
"10 1477 1504 1602 1413 1103 \n",
"11 2320 2269 2051 2102 2022 \n",
"12 78 105 110 138 114 \n",
"13 1461 1424 1501 1432 1329 \n",
"14 993 1178 1086 1305 991 \n",
"15 1766 1985 1937 2080 1717 \n",
"16 2767 2477 2020 2349 2258 \n",
"17 403 398 332 423 502 \n",
"18 1058 915 715 876 851 \n",
"\n",
" age_70to74y age_75to79y age_80to84y age_above85y \n",
"0 616 524 349 328 \n",
"1 1753 991 789 648 \n",
"2 2489 1763 1315 727 \n",
"3 303 171 74 174 \n",
"4 1400 1140 576 365 \n",
"5 3038 2551 1433 1578 \n",
"6 712 245 326 320 \n",
"7 1117 814 450 503 \n",
"8 588 371 334 241 \n",
"9 2073 1777 1184 737 \n",
"10 1079 806 363 225 \n",
"11 1540 1109 410 742 \n",
"12 87 68 68 57 \n",
"13 1432 1056 527 501 \n",
"14 772 552 497 242 \n",
"15 1358 1092 494 638 \n",
"16 1405 913 843 497 \n",
"17 357 285 117 172 \n",
"18 692 460 330 240 \n",
"\n",
"[19 rows x 21 columns]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df3.insert(1, \"Year\", \"2020\", True)\n",
"df3"
]
},
{
"cell_type": "markdown",
"id": "bb001b45-3588-411a-8721-83af265e6b32",
"metadata": {},
"source": [
"If there are files for each year then oncatenate the data for all these years using concat function. Here I just used df3 as I showed only for one year, similarly we can create multiple dataframes for each year and concatenate using the below function"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e254361f-15aa-440b-a597-651bd56263bc",
"metadata": {},
"outputs": [],
"source": [
"dataframes = [df3]\n",
" \n",
"result = pd.concat(dataframes)"
]
},
{
"cell_type": "markdown",
"id": "80e41a62-5de2-450a-a821-211d4ff59de3",
"metadata": {},
"source": [
"### Load the result to the sql server database\n",
"\n",
"Now, connect to the sql and load the dataframe as new table using SQLAlchemy. It is a sql toolkit to connect and perform all the necessary sql operations from python."
]
},
{
"cell_type": "markdown",
"id": "05914398-d17b-4c4a-aba1-2b3a761577aa",
"metadata": {},
"source": [
"svr is the sql server name
\n",
"uid is the user id
\n",
"pwd is the password
\n",
"db is the database which is USZIPCODE
\n",
"create_engine will establish python jdbc to the sql
"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "31ea1c6d-c6e7-4c38-b6e0-b6b71f8190b4",
"metadata": {},
"outputs": [],
"source": [
"from sqlalchemy import create_engine\n",
"import pandas as pd\n",
"svr = 'essql1.walton.uark.edu'\n",
"uid = 'XXXXXX'\n",
"pwd = 'XXXXXX'\n",
"db = 'USZIPCODE'\n",
"# Create a connection using SQL Alchemy engine\n",
"engine = create_engine(f\"mssql+pyodbc://{uid}:{pwd}@{svr}/{db}?driver=SQL+Server+Native+Client+11.0&TrustServerCertificate=yes\", fast_executemany=True)\n",
"# Using pandas, leverage the SQL Alchemy engine that we created above\n",
"result.to_sql('AGE',engine, if_exists='replace')"
]
},
{
"cell_type": "markdown",
"id": "0f16a50a-a4aa-4148-99cd-93593a4c186a",
"metadata": {},
"source": [
"The above peice of code will create a new table called 'AGE' in the 'USZIPCODE' database. The table will have all the columns which are present in the results dataframe. "
]
},
{
"cell_type": "markdown",
"id": "cca6efd4-69f5-4da2-847a-ef1bde4b1ac7",
"metadata": {},
"source": [
"We can validate the data by logging into the sql server directly or we can read the table using read_sql as shown below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cd01b8a8-3304-4173-9ce9-e3b5a2b8aa36",
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_sql('select * from AGE', engine)\n",
"print('***Age data for the ZCTA5***')\n",
"print(df)"
]
},
{
"cell_type": "markdown",
"id": "986f8875-aa88-42f2-9883-e6b211ed8e44",
"metadata": {},
"source": [
"## Use cases\n",
"There are many use cases which can be derived from the age group dataset. Below are few use cases. \n",
"For all the use cases below, please update the following fields in the below python code before executing\n",
"\n",
"1) uid - user id\n",
"2) pwd - password\n",
"3) SQL Query - the sql query which needs to be executed\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a105fc46-6386-4778-8636-751e11cc220b",
"metadata": {},
"outputs": [],
"source": [
"from sqlalchemy import create_engine\n",
"import pandas as pd\n",
"svr = 'essql1.walton.uark.edu'\n",
"uid = 'xxx'\n",
"pwd = 'xxx'\n",
"db = 'USZIPCODE'\n",
"# Create a connection using SQL Alchemy engine\n",
"engine = create_engine(f\"mssql+pyodbc://{uid}:{pwd}@{svr}/{db}?driver=SQL+Server+Native+Client+11.0&TrustServerCertificate=yes\", fast_executemany=True)\n",
"# Using pandas, leverage the SQL Alchemy engine that we created above\n",
"df = pd.read_sql('SQL Query', engine)\n",
"print(df)"
]
},
{
"cell_type": "markdown",
"id": "fc721edd-d998-4972-bdf6-fecc6083a159",
"metadata": {},
"source": [
"### Use case 1: \n",
"Find the total population of all age groups across all years and ZCTA5\n",
"```sql\n",
"select year, sum(total_population) as total_population \n",
"from [USZIPCODE].[dbo].[AGE] \n",
"group by year;\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "d626f8ad-67df-4baf-8a98-b6fe3fadf1e2",
"metadata": {},
"source": [
"### Use case 2:\n",
"Top five ZCTA5 codes where the population is highest\n",
"```sql\n",
"select top(5) zcta5,total_population,year \n",
"from [USZIPCODE].[dbo].[AGE] \n",
"where year=2020 \n",
"order by total_population desc;\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "293ffbfc-7ccb-41eb-90ae-b7ad0ebf7adb",
"metadata": {},
"source": [
"### Use case 3:\n",
"ZCTA5 with the highest teenage population for a given year\n",
"\n",
"```sql\n",
"select top(1) ZCTA5, total \n",
"from ( select ZCTA5,sum(age_10to14y + age_15to19y) as total \n",
" from [USZIPCODE].[dbo].[AGE]\n",
" where year=2020 \n",
" group by ZCTA5)\n",
"order by total desc;\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "190e030f-0185-4275-9a0b-3d5eaec05a02",
"metadata": {},
"source": [
"### Use case 4:\n",
"Population growth across years\n",
"\n",
"```sql\n",
"select year,total_population, total_population-LAG(total_population) OVER (ORDER BY year) AS population_diff \n",
"from ( select year, sum(total_population) as total_population \n",
" from [USZIPCODE].[dbo].[AGE] \n",
" group by year);\n",
"```"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.9.13 ('base')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12"
},
"vscode": {
"interpreter": {
"hash": "d4d1e4263499bec80672ea0156c357c1ee493ec2b1c70f0acce89fc37c4a6abe"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}