Web Based Data Load Tool
Vivekanand S. Rao
Oral Defence Date:
Tuesday, August 3, 2010 - 14:10
Professor Murphy and Professor Petkovic
Migrating large volumes of data from old systems to new ones is challenging. Migration is often carried out in heterogeneous environments where syntactic and semantic differences between the source and target data repositories make the process even more complex. Manually entering large volumes of data into a new repository is both time-consuming and error-prone. These problems motivate the web based data load tool presented in this report. The application automates the migration of data stored in Microsoft Excel spreadsheets, one of the most commonly used applications for manipulating data, into a standard relational database management system such as PostgreSQL. Using the tool partially frees human resources from the monotonous job of data entry when migrating from an existing spreadsheet-based application to one based on relational database technology. We assume that both the spreadsheet and the relational schema correspond to a conceptual schema represented by an Entity-Relationship (ER) schema diagram with full integrity constraints. The tool extracts data from Excel spreadsheets and loads the corresponding data into a PostgreSQL database. The user manually enters the correspondence between spreadsheet columns and table attributes, and the application checks for additional spreadsheet data that must be uploaded to satisfy primary and foreign keys in the database tables. Users can also use an update feature to modify existing data or fill in missing data in database tables that have been partially populated with spreadsheet data. A prototype implementation with a graphical user interface and full integration with Microsoft Excel and PostgreSQL is fully functional and executes correctly when used with an ER schema containing two entities and one M:N relationship.
Data migration, Entity-Relationship schema, Data load tool, Apache POI
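The abstract describes a user-supplied correspondence between spreadsheet columns and table attributes, followed by a check that the uploaded data can satisfy the table's keys. The sketch below illustrates that idea in plain Java and is not the report's implementation. The class `ColumnMappingCheck`, its methods, and the sample table and attribute names are all hypothetical, and the sketch assumes that table and attribute names are trusted identifiers. It reports which key attributes have no mapped column and builds a parameterized INSERT statement for the mapped attributes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: names are illustrative and not taken from the report.
class ColumnMappingCheck {

    // Returns the key attributes that no spreadsheet column has been mapped to.
    static List<String> missingKeyAttributes(Map<String, String> columnToAttribute,
                                             List<String> keyAttributes) {
        List<String> missing = new ArrayList<>();
        for (String key : keyAttributes) {
            if (!columnToAttribute.containsValue(key)) {
                missing.add(key);
            }
        }
        return missing;
    }

    // Builds a parameterized INSERT for the mapped attributes, in mapping order.
    // Assumes the table and attribute names are trusted identifiers.
    static String buildInsert(String table, Map<String, String> columnToAttribute) {
        String columns = String.join(", ", columnToAttribute.values());
        String placeholders = String.join(", ",
                Collections.nCopies(columnToAttribute.size(), "?"));
        return "INSERT INTO " + table + " (" + columns + ") VALUES (" + placeholders + ")";
    }

    public static void main(String[] args) {
        // Spreadsheet column letter -> table attribute (sample data, assumed).
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("A", "student_id");
        mapping.put("B", "name");

        System.out.println(missingKeyAttributes(mapping, List.of("student_id", "course_id")));
        System.out.println(buildInsert("student", mapping));
    }
}
```

In a real tool the cell values would be read from the workbook (for example with Apache POI) and bound to the placeholders through a JDBC `PreparedStatement`. Those steps are omitted here to keep the sketch self-contained.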