My test data is Dated!
Contents
My test data is Dated!#
This example will increment all the datetime fields in the Hallux database using#
sqlalchemy to retrieve the data into a dataframe
a function that will increment a datetime value based on a specific number of years based on a specific calendar date, a day of the week, or holiday
pymssql to update the database using dynamic sql
The example starts by importing the required libraries and defining dictionaries#
from os import getenv
import pandas as pd
from datetime import *; from dateutil.relativedelta import *
import calendar
import pymssql
from sqlalchemy import create_engine
# Dictionary of date columns in Hallux with a list for the value (table, artificial key column, increment method)
Hallux_Dates = {
"Hire_Date" : ("Agent" , "Agent_Id" , "INCREMENT"),
"Release_Date" : ("Album" , "Album_Id" , "INCREMENT"),
"Formation_Date" : ("Band" , "Band_Id" , "INCREMENT"),
"Join_Date" : ("Band_Member" , "Member_Id" , "INCREMENT"),
"Begin_Date" : ("Contract" , "Contract_Id" , "INCREMENT"),
"End_Date" : ("Contract" , "Contract_Id" , "INCREMENT"),
"Order_Date" : ("Order_Header" , "Order_Id" , "INCREMENT"),
"Promise_Date" : ("Order_Header" , "Order_Id" , "INCREMENT"),
"Birthdate" : ("Customer_Profile","Profile_Id" , "INCREMENT"),
"Performance_Date": ("Performance" ,"Performance_Id", "DAYOFWEEK")
}
# holiday dictionary with a list for the value (month, day of week, earliest day, latest day)
Holidays = {
"New Year's Day" : (1,-1,1,1),
"Martin Luther King Day" : (1,0,15,21),
"Washington's Birthday" : (5,0,15,21),
"Memorial Day" : (5,0,25,31),
"Juneteenth National Independence Day" : (6,-1,19,19),
"Independence Day" : (7,-1,4,4),
"Labor Day" : (9,-1,1,7),
"Columbus Day" : (10,0,8,14),
"Halloween" : (10 ,-1,31,31),
"Veteran's Day" : (11,-1,11,11),
"Thanksgiving" : (11,3,22,28),
"Christmas Eve" : (12,-1,24,24),
"Christmas Day" : (12,-1,25,25),
"New Year’s Eve" : (12,-1,31,31)
}
The is_holiday function will determine if a specified date is a holiday#
The trick is that some holidays, like the 4th of July, are always on the same calendar date. But other holidays, like Thanksgiving, are on a specific weekday and week of the month.
This function can be improved by using the Holidays dictionary.
# determine if a date is considered a holiday
def is_holiday ( a_date ):
ret_val = 'No'
s_date = str(a_date)
s_date = s_date[0:10]
x_date = datetime.strptime(s_date, '%Y-%m-%d')
my_month = x_date.month
my_day = x_date.day
my_weekday = x_date.weekday()
# Exercise: change this logic to loop thru the Holidays dictionary
if my_month == 1 and my_day == 1 :
ret_val = "New Year's Day"
# (3rd Monday in January)
if my_month == 1 and my_weekday == 0 and my_day >= 15 and my_day <= 21 :
ret_val = "Martin Luther King Day"
# (3rd Monday in February)
if my_month == 2 and my_weekday == 0 and my_day >= 15 and my_day <= 21 :
ret_val = "Washington's Birthday"
# (Last Monday in May)
if my_month == 5 and my_weekday == 0 and my_day >= 25 and my_day <= 31 :
ret_val = "Memorial Day"
if my_month == 6 and my_day == 19 :
ret_val = "Juneteenth National Independence Day"
if my_month == 7 and my_day == 4 :
ret_val = "Independence Day"
# (First Monday in September)
if my_month == 9 and my_weekday == 0 and my_day >= 1 and my_day <= 7 :
ret_val = "Labor Day"
# (Second Monday in October)
if my_month == 10 and my_weekday == 0 and my_day >= 8 and my_day <= 14 :
ret_val = "Columbus Day"
if my_month == 10 and my_day == 31 :
ret_val = "Halloween"
if my_month == 11 and my_day == 11 :
ret_val = "Veteran's Day"
# (Fourth Thursday in October)
if my_month == 11 and my_weekday == 3 and my_day >= 22 and my_day <= 28 :
ret_val = "Thanksgiving"
if my_month == 12 :
if my_day == 24 :
ret_val = "Christmas Eve"
if my_day == 25 :
ret_val = "Christmas Day"
if my_day == 31 :
ret_val = "New Year’s Eve"
return ret_val
The increment_date function takes a table and column name and increments the field by the specified number of years.#
This function can update the field by calendar date (INCREMENT) or by the day of the week (DAYOFWEEK).
There are three possible outputs:
PREVIEW - Display the current date and the new incremented date
SQL - Generate the sql that would update the database
UPDATE - update the database directly using dynamic sql
The idea behind the DAYOFWEEK option is that when incrementing performance dates, a concert that was held on a Saturday should still be performed on a Saturday when the date is incremented. The exception is when a performance is on a fixed holiday like the 4th of July or New Year’s Eve.
Note: To support sales data the time portion of a datetime column is kept the same as the original date.
# Retrieve a datetime field from the specified table and column.
# increment the date by the specified number of years
# return the specified output:
# the old and new date (PREVIEW)
# the sql to update the dates (SQL)
# or actually update the dates (UPDATE)
def increment_date (a_Table, a_Id_Col, a_Date_Col, a_years, a_method, a_output):
ret_val = 0
# Rows to be updated
col=[a_Id_Col,'Curr_Date','New_Date']
sql_select = 'SELECT ' + a_Id_Col + ', ' + a_Date_Col + ' as Curr_Date, convert(date,' + a_Date_Col + ') as New_Date FROM ' + a_Table + ' where ' + a_Date_Col + ' is not null' + ' order by ' + a_Id_Col
df_date = pd.read_sql(sql_select, eng)
df_date.set_index(a_Id_Col)
# add specified number of years
for i in range(0,a_years):
df_date['New_Date'] = df_date['New_Date']+relativedelta(years=+1)
if a_method == 'DAYOFWEEK':
for i in range(len(df_date)):
#move to the closest day with the same
cdate = df_date['Curr_Date'][i]
ndate = df_date['New_Date'][i]
# .weekday() 0 = Monday, 6 is Sunday
# note that for payroll and other systems Sunday is considered the first day of the week not monday
day_offset = cdate.weekday() - ndate.weekday()
# Handle holidays, a concert on a saturday is not the same as a concert on the 4th of July
special_day = is_holiday (cdate)
if special_day in Holidays:
Holiday_details = Holidays[special_day]
# only adjust a floating holiday
if Holiday_details[1] != -1 :
ndate = ndate + pd.DateOffset(days=day_offset)
# floating holidays can miss by 1 week
if ndate.day < Holiday_details[2] :
ndate = ndate + pd.DateOffset(days=7)
else:
ndate = ndate + pd.DateOffset(days=day_offset)
df_date.at[i,'New_Date'] = ndate
# This procedure will show the data frame, output the sql statements, or update the table
if a_output == 'PREVIEW' :
print(df_date)
else :
# generate the sql to update the table with the new date
for i in range(len(df_date)):
did = df_date[a_Id_Col][i]
cdate = df_date['Curr_Date'][i]
# keep same time as the original date
ndate = str(df_date['New_Date'][i]) + ' ' + str(cdate.time())
sql_stm = "Update " + a_Table + " set " + a_Date_Col + " = '" + str(ndate) + "' where " + a_Id_Col + " = " + str(did)
if a_output == 'SQL' :
print(cdate,sql_stm)
if a_output == 'UPDATE' :
cursor.execute(sql_stm)
if a_output == 'UPDATE' :
conn.commit()
if ret_val < 0:
print(a_Table,'Failure')
else:
print(a_Table,str(len(df_date)) + ' rows updated')
return ret_val
Here we go…#
Create a database engine and a separate connection for dynamic sql
Specify the number of years to increment the dates and the desired output
Call the increment_date function for the desired table, column combinations
Fun Fact: If the dates are incremented in multiples of 4, leap year data will be maintained!
This function can be improved by adding user prompts for year and output.
Note: Dataframes perform updates by doing a separate delete and insert. Depending on the constraints defined in the database, the delete may fail. For this reason, dynamic sql was used to update the database with dynamic update statements.
# main
hallux_svr = getenv("halluxsvr")
hallux_usr = getenv("halluxusr")
hallux_psd = getenv("halluxpsd")
hallux_db = getenv("halluxdb")
# for use in creating a dataframe
conn_string = f"mssql+pyodbc://{hallux_usr}:{hallux_psd}@{hallux_svr}/{hallux_db}?driver=SQL+Server+Native+Client+11.0&TrustServerCertificate=yes"
eng = create_engine(conn_string, fast_executemany=True)
# used for executing dynamic sql
conn = pymssql.connect(hallux_svr, hallux_usr, hallux_psd, hallux_db)
cursor = conn.cursor()
# number of years to add to the dates
# Note that by adding a number years that is divisible by 4, leap years dates are maintained.
num_years = 8
# specify the output: PREVIEW, SQL, or UPDATE
#Preview shows the new dates, sql generates the update statements, update actually updates the dates in the database
the_output = 'PREVIEW'
# increment the dates
for curr_hallux_date, curr_value in Hallux_Dates.items():
ret = increment_date (curr_value[0], curr_value[1], curr_hallux_date, num_years, curr_value[2], the_output )
print(curr_hallux_date, ret)
conn.close()
eng.dispose()
Agent_Id Curr_Date New_Date
0 200001 2013-01-30 2021-01-30
1 200002 2018-08-21 2026-08-21
2 200003 2017-08-09 2025-08-09
3 200004 2022-12-01 2030-12-01
4 200005 2016-02-11 2024-02-11
5 200006 2015-04-22 2023-04-22
6 200007 2019-01-15 2027-01-15
7 200008 2021-09-26 2029-09-26
8 200009 2017-07-20 2025-07-20
9 200010 2014-02-12 2022-02-12
10 200011 2022-07-21 2030-07-21
11 200012 2013-01-01 2021-01-01
12 200013 2018-12-26 2026-12-26
13 200014 2015-07-21 2023-07-21
14 200015 2018-06-27 2026-06-27
15 200016 2013-01-01 2021-01-01
16 200017 2013-08-01 2021-08-01
17 200018 2021-03-19 2029-03-19
18 200019 2020-03-07 2028-03-07
19 200020 2014-01-30 2022-01-30
20 200021 2015-10-23 2023-10-23
21 200022 2021-08-16 2029-08-16
22 200023 2017-06-09 2025-06-09
23 200024 2018-07-19 2026-07-19
24 200025 2013-01-01 2021-01-01
25 200026 2018-01-14 2026-01-14
26 200027 2018-11-15 2026-11-15
27 200028 2014-06-22 2022-06-22
28 200029 2020-01-07 2028-01-07
29 200030 2013-07-23 2021-07-23
30 200031 2015-07-18 2023-07-18
31 200032 2015-11-27 2023-11-27
32 200033 2014-08-14 2022-08-14
33 200034 2018-02-23 2026-02-23
34 200035 2018-03-09 2026-03-09
35 200036 2019-06-01 2027-06-01
36 200037 2016-04-23 2024-04-23
37 200038 2016-11-04 2024-11-04
38 200039 2013-01-01 2021-01-01
39 200040 2013-11-30 2021-11-30
40 200041 2015-08-23 2023-08-23
41 200042 2014-05-12 2022-05-12
Hire_Date 0
Album_Id Curr_Date New_Date
0 20000 2007-01-04 2015-01-04
1 20001 2008-02-26 2016-02-26
2 20002 2009-07-24 2017-07-24
3 20003 2011-06-06 2019-06-06
4 20004 2012-12-04 2020-12-04
... ... ... ...
2636 27970 2020-12-13 2028-12-13
2637 27971 2022-02-22 2030-02-22
2638 27980 2019-01-18 2027-01-18
2639 27981 2020-02-27 2028-02-27
2640 27982 2021-08-17 2029-08-17
[2641 rows x 3 columns]
Release_Date 0
Band_Id Curr_Date New_Date
0 1000 2007-01-04 2015-01-04
1 1001 2019-04-24 2027-04-24
2 1002 2007-08-19 2015-08-19
3 1003 2021-10-06 2029-10-06
4 1004 2019-08-17 2027-08-17
.. ... ... ...
395 1395 2006-01-01 2014-01-01
396 1396 2016-12-08 2024-12-08
397 1397 2006-04-23 2014-04-23
398 1398 2006-01-01 2014-01-01
399 1399 2019-01-18 2027-01-18
[400 rows x 3 columns]
Formation_Date 0
Member_Id Curr_Date New_Date
0 110000 2007-01-04 2015-01-04
1 110001 2007-01-04 2015-01-04
2 110010 2019-04-24 2027-04-24
3 110011 2022-06-13 2030-06-13
4 110012 2022-04-20 2030-04-20
... ... ... ...
1410 113991 2019-01-18 2027-01-18
1411 113992 2019-01-18 2027-01-18
1412 113993 2019-01-18 2027-01-18
1413 113994 2019-01-18 2027-01-18
1414 113995 2019-01-18 2027-01-18
[1415 rows x 3 columns]
Join_Date 0
Contract_Id Curr_Date New_Date
0 10110 2010-09-23 2018-09-23
1 10130 2018-05-13 2026-05-13
2 10150 2014-05-10 2022-05-10
3 10170 2010-06-26 2018-06-26
4 10230 2022-04-15 2030-04-15
.. ... ... ...
134 13900 2021-09-16 2029-09-16
135 13910 2023-02-13 2031-02-13
136 13940 2022-05-17 2030-05-17
137 13950 2006-12-31 2014-12-31
138 13980 2006-06-14 2014-06-14
[139 rows x 3 columns]
Begin_Date 0
Contract_Id Curr_Date New_Date
0 10110 2026-01-01 2034-01-01
1 10130 2026-01-01 2034-01-01
2 10150 2026-01-01 2034-01-01
3 10170 2026-01-01 2034-01-01
4 10230 2026-01-01 2034-01-01
.. ... ... ...
134 13900 2026-01-01 2034-01-01
135 13910 2026-01-01 2034-01-01
136 13940 2026-01-01 2034-01-01
137 13950 2026-01-01 2034-01-01
138 13980 2026-01-01 2034-01-01
[139 rows x 3 columns]
End_Date 0
Order_Id Curr_Date New_Date
0 1000001 2006-01-01 13:52:28 2014-01-01
1 1000002 2006-01-01 02:19:03 2014-01-01
2 1000003 2006-01-01 11:30:38 2014-01-01
3 1000004 2006-01-01 15:27:33 2014-01-01
4 1000005 2006-01-01 04:47:53 2014-01-01
... ... ... ...
184121 1620849 2022-12-31 04:44:44 2030-12-31
184122 1620850 2022-12-31 03:19:24 2030-12-31
184123 1620851 2022-12-31 17:09:23 2030-12-31
184124 1620852 2022-12-31 00:54:38 2030-12-31
184125 1620853 2022-12-31 20:34:23 2030-12-31
[184126 rows x 3 columns]
Order_Date 0
Order_Id Curr_Date New_Date
0 1000001 2006-01-10 2014-01-10
1 1000002 2006-01-09 2014-01-09
2 1000003 2006-01-07 2014-01-07
3 1000004 2006-01-07 2014-01-07
4 1000005 2006-01-08 2014-01-08
... ... ... ...
184121 1620849 2023-01-06 2031-01-06
184122 1620850 2023-01-08 2031-01-08
184123 1620851 2023-01-05 2031-01-05
184124 1620852 2023-01-06 2031-01-06
184125 1620853 2023-01-06 2031-01-06
[184126 rows x 3 columns]
Promise_Date 0
Profile_Id Curr_Date New_Date
0 100002 1989-11-17 1997-11-17
1 100004 1989-11-06 1997-11-06
2 100006 1992-04-28 2000-04-28
3 100008 2013-11-13 2021-11-13
4 100012 1997-05-19 2005-05-19
... ... ... ...
4491 109992 1999-09-10 2007-09-10
4492 109994 1993-09-28 2001-09-28
4493 109996 1995-11-29 2003-11-29
4494 109998 1995-12-11 2003-12-11
4495 110000 1993-06-06 2001-06-06
[4496 rows x 3 columns]
Birthdate 0
Performance_Id Curr_Date New_Date
0 2000000 1999-01-16 2007-01-20 00:00:00
1 2000001 1999-01-17 2007-01-21 00:00:00
2 2000002 1999-01-22 2007-01-26 00:00:00
3 2000003 1999-01-26 2007-01-23 00:00:00
4 2000004 1999-02-07 2007-02-11 00:00:00
... ... ... ...
65435 2798221 2014-01-15 2022-01-12 00:00:00
65436 2798222 2014-01-23 2022-01-20 00:00:00
65437 2798223 2014-01-26 2022-01-30 00:00:00
65438 2798224 2014-02-05 2022-02-02 00:00:00
65439 2798225 2014-02-11 2022-02-08 00:00:00
[65440 rows x 3 columns]
Performance_Date 0