My test data is Dated!#

This example will increment all the datetime fields in the Hallux database using#

  • sqlalchemy to retrieve the data into a dataframe

  • a function that will increment a datetime value based on a specific number of years based on a specific calendar date, a day of the week, or holiday

  • pymssql to update the database using dynamic sql

The example starts by importing the required libraries and defining dictionaries#

from os import getenv
import pandas as pd
from datetime import *; from dateutil.relativedelta import *
import calendar

import pymssql
from sqlalchemy import create_engine



# Dictionary of date columns in Hallux with a list for the value (table, artificial key column, increment method) 

Hallux_Dates = {
    
    "Hire_Date"       : ("Agent"         , "Agent_Id"     , "INCREMENT"),
    "Release_Date"    : ("Album"         , "Album_Id"     , "INCREMENT"),
    "Formation_Date"  : ("Band"          , "Band_Id"      , "INCREMENT"),
    "Join_Date"       : ("Band_Member"   , "Member_Id"    , "INCREMENT"),
    "Begin_Date"      : ("Contract"      , "Contract_Id"  , "INCREMENT"),
    "End_Date"        : ("Contract"      , "Contract_Id"  , "INCREMENT"),
    "Order_Date"      : ("Order_Header"  , "Order_Id"     , "INCREMENT"),
    "Promise_Date"    : ("Order_Header"  , "Order_Id"     , "INCREMENT"),
    "Birthdate"       : ("Customer_Profile","Profile_Id"  , "INCREMENT"),
    "Performance_Date": ("Performance"   ,"Performance_Id", "DAYOFWEEK")
}



# holiday dictionary with a list for the value  (month, day of week, earliest day, latest day) 

Holidays = {
    "New Year's Day" : (1,-1,1,1), 
    "Martin Luther King Day" : (1,0,15,21),
    "Washington's Birthday" : (5,0,15,21),
    "Memorial Day" : (5,0,25,31),
    "Juneteenth National Independence Day" : (6,-1,19,19),
    "Independence Day" : (7,-1,4,4),
    "Labor Day" : (9,-1,1,7),
    "Columbus Day" : (10,0,8,14),
    "Halloween" : (10 ,-1,31,31),
    "Veteran's Day" : (11,-1,11,11),
    "Thanksgiving" : (11,3,22,28),
    "Christmas Eve" : (12,-1,24,24),
    "Christmas Day" : (12,-1,25,25),
    "New Year’s Eve" : (12,-1,31,31)
}

The is_holiday function will determine if a specified date is a holiday#

The trick is that some holidays, like the 4th of July, are always on the same calendar date. But other holidays, like Thanksgiving, are on a specific weekday and week of the month.

This function can be improved by using the Holidays dictionary.

# determine if a date is considered a holiday

def is_holiday ( a_date ):   

    
    ret_val = 'No'

    s_date = str(a_date)
    s_date = s_date[0:10]
    x_date = datetime.strptime(s_date, '%Y-%m-%d')
    
    my_month     = x_date.month
    my_day       = x_date.day
    my_weekday   = x_date.weekday()
    
    
    # Exercise: change this logic to loop thru the Holidays dictionary
    
    
    if my_month == 1 and my_day == 1 :
        ret_val = "New Year's Day"
        
    # (3rd Monday in January)
    if my_month == 1 and my_weekday == 0 and my_day >= 15 and my_day <= 21 :
        ret_val = "Martin Luther King Day"
    
    # (3rd Monday in February)
    if my_month == 2 and my_weekday == 0 and my_day >= 15 and my_day <= 21 :
        ret_val = "Washington's Birthday"
        
    # (Last Monday in May)
    if my_month == 5 and my_weekday == 0 and my_day >= 25 and my_day <= 31 :
        ret_val = "Memorial Day"
    
    if my_month == 6 and my_day == 19 :
        ret_val = "Juneteenth National Independence Day"
        
    if my_month == 7 and my_day == 4 :
        ret_val = "Independence Day"
   
    # (First Monday in September)
    if my_month == 9 and my_weekday == 0 and my_day >= 1 and my_day <= 7 :
        ret_val = "Labor Day"
        
    # (Second Monday in October)
    if my_month == 10 and my_weekday == 0 and my_day >= 8 and my_day <= 14 :
        ret_val = "Columbus Day"
        
    if my_month == 10 and my_day == 31 :
        ret_val = "Halloween"
        
    if my_month == 11 and my_day == 11 :
        ret_val = "Veteran's Day"
        
    # (Fourth Thursday in October)
    if my_month == 11 and my_weekday == 3 and my_day >= 22 and my_day <= 28 :
        ret_val = "Thanksgiving"
        
    if my_month == 12 :
        
        if my_day == 24 : 
            ret_val = "Christmas Eve"
        if my_day == 25 : 
            ret_val = "Christmas Day"
        if my_day == 31 : 
            ret_val = "New Year’s Eve"
    
    return ret_val

The increment_date function takes a table and column name and increments the field by the specified number of years.#

This function can update the field by calendar date (INCREMENT) or by the day of the week (DAYOFWEEK).

There are three possible outputs:

  • PREVIEW - Display the current date and the new incremented date

  • SQL - Generate the sql that would update the database

  • UPDATE - update the database directly using dynamic sql

The idea behind the DAYOFWEEK option is that when incrementing performance dates, a concert that was held on a Saturday should still be performed on a Saturday when the date is incremented. The exception is when a performance is on a fixed holiday like the 4th of July or New Year’s Eve.

Note: To support sales data the time portion of a datetime column is kept the same as the original date.

# Retrieve a datetime field from the specified table and column.
# increment the date by the specified number of years
# return the specified output: 
#  the old and new date (PREVIEW)
#  the sql to update the dates (SQL)
#  or actually update the dates (UPDATE)

def increment_date (a_Table, a_Id_Col, a_Date_Col, a_years, a_method, a_output):
    
    ret_val = 0
    
    # Rows to be updated    
    col=[a_Id_Col,'Curr_Date','New_Date']
          
    sql_select = 'SELECT ' + a_Id_Col + ', ' + a_Date_Col + ' as Curr_Date, convert(date,' + a_Date_Col + ') as New_Date FROM ' + a_Table + ' where ' + a_Date_Col + ' is not null' + ' order by ' + a_Id_Col      
    
    df_date = pd.read_sql(sql_select, eng)
    df_date.set_index(a_Id_Col)

    # add specified number of years 
    for i in range(0,a_years):
        df_date['New_Date'] = df_date['New_Date']+relativedelta(years=+1)

    if a_method == 'DAYOFWEEK':
        
        for i in range(len(df_date)):

            #move to the closest day with the same 

            cdate = df_date['Curr_Date'][i]
            ndate = df_date['New_Date'][i]
            
            # .weekday() 0 = Monday, 6 is Sunday
            # note that for payroll and other systems Sunday is considered the first day of the week not monday
            day_offset = cdate.weekday() - ndate.weekday()
            
            # Handle holidays, a concert on a saturday is not the same as a concert on the 4th of July
            
            special_day = is_holiday (cdate)
            
            if special_day in Holidays:
            
                Holiday_details = Holidays[special_day]

                # only adjust a floating holiday
                if Holiday_details[1] != -1 :
                    ndate = ndate + pd.DateOffset(days=day_offset)
                    # floating holidays can miss by 1 week    
                    if ndate.day < Holiday_details[2] :
                        ndate = ndate + pd.DateOffset(days=7)
                    
            else:    
                
                ndate = ndate + pd.DateOffset(days=day_offset)
       
            df_date.at[i,'New_Date'] = ndate


        
        
    # This procedure will show the data frame, output the sql statements, or update the table
    
    if a_output == 'PREVIEW' :
        
        print(df_date)
        
    else :    

        # generate the sql to update the table with the new date

        for i in range(len(df_date)):
        
            did   = df_date[a_Id_Col][i]
            cdate = df_date['Curr_Date'][i]
            # keep same time as the original date
            ndate = str(df_date['New_Date'][i]) + ' ' + str(cdate.time())
            sql_stm  = "Update " + a_Table + " set " + a_Date_Col + " = '" + str(ndate) + "' where " + a_Id_Col + " = " + str(did)
        
            if a_output == 'SQL' :
                print(cdate,sql_stm)           
            
            if a_output == 'UPDATE' :
                cursor.execute(sql_stm)    

        if a_output == 'UPDATE' :
        
            conn.commit()
        
        if ret_val < 0:
            print(a_Table,'Failure')
        else:        
            print(a_Table,str(len(df_date)) + ' rows updated')
        
    
    return ret_val

Here we go…#

  • Create a database engine and a separate connection for dynamic sql

  • Specify the number of years to increment the dates and the desired output

  • Call the increment_date function for the desired table, column combinations

Fun Fact: If the dates are incremented in multiples of 4, leap year data will be maintained!

This function can be improved by adding user prompts for year and output.

Note: Dataframes perform updates by doing a separate delete and insert. Depending on the constraints defined in the database, the delete may fail. For this reason, dynamic sql was used to update the database with dynamic update statements.

# main

hallux_svr = getenv("halluxsvr")
hallux_usr = getenv("halluxusr")
hallux_psd = getenv("halluxpsd")
hallux_db  = getenv("halluxdb")


# for use in creating a dataframe
conn_string = f"mssql+pyodbc://{hallux_usr}:{hallux_psd}@{hallux_svr}/{hallux_db}?driver=SQL+Server+Native+Client+11.0&TrustServerCertificate=yes"
eng = create_engine(conn_string, fast_executemany=True)

# used for executing dynamic sql
conn   = pymssql.connect(hallux_svr, hallux_usr, hallux_psd, hallux_db)
cursor = conn.cursor()

# number of years to add to the dates
# Note that by adding a number years that is divisible by 4, leap years dates are maintained.
num_years = 8

# specify the output: PREVIEW, SQL, or UPDATE
#Preview shows the new dates, sql generates the update statements, update actually updates the dates in the database
the_output = 'PREVIEW'

# increment the dates
for curr_hallux_date, curr_value in Hallux_Dates.items():
    ret = increment_date (curr_value[0], curr_value[1], curr_hallux_date, num_years, curr_value[2], the_output )
    print(curr_hallux_date, ret)

conn.close()
eng.dispose()
    Agent_Id  Curr_Date    New_Date
0     200001 2013-01-30  2021-01-30
1     200002 2018-08-21  2026-08-21
2     200003 2017-08-09  2025-08-09
3     200004 2022-12-01  2030-12-01
4     200005 2016-02-11  2024-02-11
5     200006 2015-04-22  2023-04-22
6     200007 2019-01-15  2027-01-15
7     200008 2021-09-26  2029-09-26
8     200009 2017-07-20  2025-07-20
9     200010 2014-02-12  2022-02-12
10    200011 2022-07-21  2030-07-21
11    200012 2013-01-01  2021-01-01
12    200013 2018-12-26  2026-12-26
13    200014 2015-07-21  2023-07-21
14    200015 2018-06-27  2026-06-27
15    200016 2013-01-01  2021-01-01
16    200017 2013-08-01  2021-08-01
17    200018 2021-03-19  2029-03-19
18    200019 2020-03-07  2028-03-07
19    200020 2014-01-30  2022-01-30
20    200021 2015-10-23  2023-10-23
21    200022 2021-08-16  2029-08-16
22    200023 2017-06-09  2025-06-09
23    200024 2018-07-19  2026-07-19
24    200025 2013-01-01  2021-01-01
25    200026 2018-01-14  2026-01-14
26    200027 2018-11-15  2026-11-15
27    200028 2014-06-22  2022-06-22
28    200029 2020-01-07  2028-01-07
29    200030 2013-07-23  2021-07-23
30    200031 2015-07-18  2023-07-18
31    200032 2015-11-27  2023-11-27
32    200033 2014-08-14  2022-08-14
33    200034 2018-02-23  2026-02-23
34    200035 2018-03-09  2026-03-09
35    200036 2019-06-01  2027-06-01
36    200037 2016-04-23  2024-04-23
37    200038 2016-11-04  2024-11-04
38    200039 2013-01-01  2021-01-01
39    200040 2013-11-30  2021-11-30
40    200041 2015-08-23  2023-08-23
41    200042 2014-05-12  2022-05-12
Hire_Date 0
      Album_Id  Curr_Date    New_Date
0        20000 2007-01-04  2015-01-04
1        20001 2008-02-26  2016-02-26
2        20002 2009-07-24  2017-07-24
3        20003 2011-06-06  2019-06-06
4        20004 2012-12-04  2020-12-04
...        ...        ...         ...
2636     27970 2020-12-13  2028-12-13
2637     27971 2022-02-22  2030-02-22
2638     27980 2019-01-18  2027-01-18
2639     27981 2020-02-27  2028-02-27
2640     27982 2021-08-17  2029-08-17

[2641 rows x 3 columns]
Release_Date 0
     Band_Id  Curr_Date    New_Date
0       1000 2007-01-04  2015-01-04
1       1001 2019-04-24  2027-04-24
2       1002 2007-08-19  2015-08-19
3       1003 2021-10-06  2029-10-06
4       1004 2019-08-17  2027-08-17
..       ...        ...         ...
395     1395 2006-01-01  2014-01-01
396     1396 2016-12-08  2024-12-08
397     1397 2006-04-23  2014-04-23
398     1398 2006-01-01  2014-01-01
399     1399 2019-01-18  2027-01-18

[400 rows x 3 columns]
Formation_Date 0
      Member_Id  Curr_Date    New_Date
0        110000 2007-01-04  2015-01-04
1        110001 2007-01-04  2015-01-04
2        110010 2019-04-24  2027-04-24
3        110011 2022-06-13  2030-06-13
4        110012 2022-04-20  2030-04-20
...         ...        ...         ...
1410     113991 2019-01-18  2027-01-18
1411     113992 2019-01-18  2027-01-18
1412     113993 2019-01-18  2027-01-18
1413     113994 2019-01-18  2027-01-18
1414     113995 2019-01-18  2027-01-18

[1415 rows x 3 columns]
Join_Date 0
     Contract_Id  Curr_Date    New_Date
0          10110 2010-09-23  2018-09-23
1          10130 2018-05-13  2026-05-13
2          10150 2014-05-10  2022-05-10
3          10170 2010-06-26  2018-06-26
4          10230 2022-04-15  2030-04-15
..           ...        ...         ...
134        13900 2021-09-16  2029-09-16
135        13910 2023-02-13  2031-02-13
136        13940 2022-05-17  2030-05-17
137        13950 2006-12-31  2014-12-31
138        13980 2006-06-14  2014-06-14

[139 rows x 3 columns]
Begin_Date 0
     Contract_Id  Curr_Date    New_Date
0          10110 2026-01-01  2034-01-01
1          10130 2026-01-01  2034-01-01
2          10150 2026-01-01  2034-01-01
3          10170 2026-01-01  2034-01-01
4          10230 2026-01-01  2034-01-01
..           ...        ...         ...
134        13900 2026-01-01  2034-01-01
135        13910 2026-01-01  2034-01-01
136        13940 2026-01-01  2034-01-01
137        13950 2026-01-01  2034-01-01
138        13980 2026-01-01  2034-01-01

[139 rows x 3 columns]
End_Date 0
        Order_Id           Curr_Date    New_Date
0        1000001 2006-01-01 13:52:28  2014-01-01
1        1000002 2006-01-01 02:19:03  2014-01-01
2        1000003 2006-01-01 11:30:38  2014-01-01
3        1000004 2006-01-01 15:27:33  2014-01-01
4        1000005 2006-01-01 04:47:53  2014-01-01
...          ...                 ...         ...
184121   1620849 2022-12-31 04:44:44  2030-12-31
184122   1620850 2022-12-31 03:19:24  2030-12-31
184123   1620851 2022-12-31 17:09:23  2030-12-31
184124   1620852 2022-12-31 00:54:38  2030-12-31
184125   1620853 2022-12-31 20:34:23  2030-12-31

[184126 rows x 3 columns]
Order_Date 0
        Order_Id  Curr_Date    New_Date
0        1000001 2006-01-10  2014-01-10
1        1000002 2006-01-09  2014-01-09
2        1000003 2006-01-07  2014-01-07
3        1000004 2006-01-07  2014-01-07
4        1000005 2006-01-08  2014-01-08
...          ...        ...         ...
184121   1620849 2023-01-06  2031-01-06
184122   1620850 2023-01-08  2031-01-08
184123   1620851 2023-01-05  2031-01-05
184124   1620852 2023-01-06  2031-01-06
184125   1620853 2023-01-06  2031-01-06

[184126 rows x 3 columns]
Promise_Date 0
      Profile_Id  Curr_Date    New_Date
0         100002 1989-11-17  1997-11-17
1         100004 1989-11-06  1997-11-06
2         100006 1992-04-28  2000-04-28
3         100008 2013-11-13  2021-11-13
4         100012 1997-05-19  2005-05-19
...          ...        ...         ...
4491      109992 1999-09-10  2007-09-10
4492      109994 1993-09-28  2001-09-28
4493      109996 1995-11-29  2003-11-29
4494      109998 1995-12-11  2003-12-11
4495      110000 1993-06-06  2001-06-06

[4496 rows x 3 columns]
Birthdate 0
       Performance_Id  Curr_Date             New_Date
0             2000000 1999-01-16  2007-01-20 00:00:00
1             2000001 1999-01-17  2007-01-21 00:00:00
2             2000002 1999-01-22  2007-01-26 00:00:00
3             2000003 1999-01-26  2007-01-23 00:00:00
4             2000004 1999-02-07  2007-02-11 00:00:00
...               ...        ...                  ...
65435         2798221 2014-01-15  2022-01-12 00:00:00
65436         2798222 2014-01-23  2022-01-20 00:00:00
65437         2798223 2014-01-26  2022-01-30 00:00:00
65438         2798224 2014-02-05  2022-02-02 00:00:00
65439         2798225 2014-02-11  2022-02-08 00:00:00

[65440 rows x 3 columns]
Performance_Date 0