Refactoring Microsoft Access Databases With Python

development

Working with an inherited legacy application can feel like an insurmountable task. I found myself in this position at work where there is a sizable amount of code in legacy Microsoft Access databases. Access in particular can be difficult to work with as it doesn't make implementing software development best practices easy. In the wrong hands an Access database can quickly turn into a mess.

The way I've approached refactoring Access database is to first export all of the database objects (tables, queries, forms, reports, macros, and modules) to text files. I then use Python to search the exported files for object references. This makes the task of determining which objects can be safely removed or changed much more manageable.

Exporting Microsoft Access Objects

Enter the code below in a new or existing module in the chosen Access database, update exportLocation to the desired output directory, and run it to export all objects as text files.

Option Compare Database: Option Explicit

Public Sub ExportDatabaseObjects()
    On Error GoTo Err_ExportDatabaseObjects

    Dim db As DAO.Database
    Set db = CurrentDb()

    ' Make sure to include a closing back slash (ie: C:\Temp\).
    Dim exportLocation As String
    exportLocation = "Enter path to folder to store exported files"

    ' Export all tables.
    Dim td As TableDef
    Dim name As String
    For Each td In db.TableDefs
        Dim isValidTable As Boolean
        isValidTable = Left(td.name, 4) <> "MSys" And Left(td.name, 1) <> "~"

        If isValidTable Then
            DoCmd.TransferText _
                acExportDelim, , td.name, _
                exportLocation & "Table_" & td.name & ".txt", True
        End If
    Next td

    Set td = Nothing

    ' Export all forms.
    Dim c As Container
    Set c = db.Containers("Forms")
    Dim d As Document
    For Each d In c.Documents
        Application.SaveAsText _
            acForm, d.name, exportLocation & "Form_" & d.name & ".txt"
    Next d

    ' Export all reports.
    Set c = db.Containers("Reports")
    For Each d In c.Documents
        Application.SaveAsText _
            acReport, d.name, exportLocation & "Report_" & d.name & ".txt"
    Next d

    ' Export all macros.
    Set c = db.Containers("Scripts")
    For Each d In c.Documents
        Application.SaveAsText _
            acMacro, d.name, exportLocation & "Macro_" & d.name & ".txt"
    Next d

    ' Export all modules.
    Set c = db.Containers("Modules")
    For Each d In c.Documents
        Application.SaveAsText _
            acModule, d.name, exportLocation & "Module_" & d.name & ".txt"
    Next d

    Set c = Nothing
    Set d = Nothing

    ' Export all queries.
    Dim i As Integer
    For i = 0 To db.QueryDefs.Count - 1
        Application.SaveAsText _
            acQuery, db.QueryDefs(i).name, _
            exportLocation & "Query_" & db.QueryDefs(i).name & ".txt"
    Next i

    Set db = Nothing

    MsgBox "All database objects have been exported as a text file to " & _
           exportLocation, vbInformation, "Export complete"
    Exit Sub

Err_ExportDatabaseObjects:
    MsgBox Err.Number & " - " & Err.Description, vbCritical, "Error"
End Sub

Searching Exported Data With Python

The Python code below will search all exported files for references to a given object. In main() update directory to the location chosen for the Access export and update object_name to a particular object you are looking to remove or alter.

import codecs
import os

def foundinfile(openfile, term):
    contents = openfile.readlines()
    found = False

    for line in contents:
        if term.upper() in line.upper():
            found = True

    return found

def main():
    directory = "Enter path to exported files"
    object_name = "Enter object name"
    found = []

    for root, _, files in os.walk(directory):
        for f in files:
            path = os.path.join(root, f)

            # Open without decoding first.
            with open(path, "r") as openfile:
                if foundinfile(openfile, object_name):
                    found.append(f)

            # If nothing found then try decoding.
            if f not in found:
                with codecs.open(path, "r", "utf-16-le") as openfile:
                    if foundinfile(openfile, object_name):
                        found.append(f)

    print("Results for: {0}".format(object_name))
    for f in found:
        print(f)

if __name__ == "__main__":
    main()

The code above can be found on GitHub here.

View other posts