Data Mining Access Patterns (DMAP)
Julie Erdman
May 2005

Data mining has become an extremely important tool for finding useful information in massive datasets.  It plays a role in several facets of data management, especially in handling data from databases and from the Web.  The methodologies and classifications used for mining databases differ from those used for mining the Web.

This thesis project focuses on Data Mining the Web.  Specifically, the project deals with mining web log data in order to gain insight into user access patterns of a website.  Web logs contain data about accesses to files on a web server.  Analysis of this data can be used to find patterns and trends of website usage.  The Data Mining Access Patterns (DMAP) application, developed as part of this thesis project, is a data mining tool that processes web log data and provides results about user access patterns.
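The kind of processing described above can be illustrated with a minimal sketch.  It is not the DMAP implementation; it assumes the web server writes entries in Common Log Format, and the names LOG_LINE, parse_line, page_counts, visitor_paths, and SAMPLE_LOG, as well as the sample entries, are hypothetical and used only for illustration.

```python
import re
from collections import Counter

# Assumed input: Common Log Format,
#   host ident user [time] "METHOD path PROTOCOL" status bytes
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


def page_counts(lines):
    """Count successful (2xx) requests per requested path."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and entry["status"].startswith("2"):
            counts[entry["path"]] += 1
    return counts


def visitor_paths(lines):
    """Group requested paths by client host, preserving log order."""
    paths = {}
    for line in lines:
        entry = parse_line(line)
        if entry:
            paths.setdefault(entry["host"], []).append(entry["path"])
    return paths


# Made-up entries, for illustration only.
SAMPLE_LOG = [
    '10.0.0.1 - - [01/May/2005:10:00:00 -0400] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.1 - - [01/May/2005:10:00:05 -0400] "GET /courses.html HTTP/1.0" 200 2210',
    '10.0.0.2 - - [01/May/2005:10:01:00 -0400] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.2 - - [01/May/2005:10:01:09 -0400] "GET /missing.html HTTP/1.0" 404 210',
]
```

Calling page_counts(SAMPLE_LOG) gives per-page hit counts, and visitor_paths(SAMPLE_LOG) gives the sequence of pages each host requested, which is the raw material for finding access patterns.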


Data Mining:  An Anomaly to the Detection of Web-based Attacks
David Grizzanti
May 2005

Data Mining has become a significant tool for finding useful information in massive datasets.  Data Mining, in general, can apply to many areas of data management, especially handling data from databases and the Web.  In particular, web server logs contain an enormous amount of potential information.

Web-based attacks represent a substantial share of the total security exposures throughout computer networks.  To detect known attacks, computer networks are equipped with misuse intrusion detection systems with a large signature base.  Signature-based detection systems rely on a set of pre-determined definitions, which make it possible to detect known attacks.  However, it is difficult to keep up to date on all web-based vulnerabilities, and installation-specific vulnerabilities may be introduced by web applications.  Therefore, to be better able to detect all types of attacks, misuse detection systems should be complemented with an anomaly detection system.  Anomaly-based systems make it possible to apply a learning-based approach to detecting attacks rather than using a pre-defined set of definitions in the detection process.

This thesis project focuses on data mining web log data and introduces an intrusion detection system that uses an anomaly-based approach to detecting attacks against web servers and web-based applications.  The system analyzes queries in the log files that call server-side programs and uses a number of different models for analyzing and categorizing different features of these queries.  Examples of such features include access patterns to server-side programs and values of parameters and their attributes within each query.  The use of an application-specific classification of the query parameters allows the system to focus its analysis and produce fewer false positives.
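As an illustration of the learning-based approach, the following minimal sketch models one query feature: the length of each parameter value, learned separately for each server-side program.  This particular model is an assumption chosen for illustration and is not necessarily one of the models used in the thesis; the names query_features and LengthModel are hypothetical.  Plausibility is estimated with the Chebyshev inequality, so unusually long values (as in many overflow or injection attempts) receive a score near zero.

```python
from collections import defaultdict
from statistics import mean, pvariance
from urllib.parse import urlsplit, parse_qsl


def query_features(url):
    """Yield ((program, parameter), value) pairs for one request URL."""
    parts = urlsplit(url)
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        yield (parts.path, name), value


class LengthModel:
    """Learns the mean and variance of value lengths per (program, parameter)."""

    def __init__(self):
        self.stats = {}

    def train(self, urls):
        """Learn length statistics from queries assumed to be attack-free."""
        lengths = defaultdict(list)
        for url in urls:
            for key, value in query_features(url):
                lengths[key].append(len(value))
        self.stats = {k: (mean(v), pvariance(v)) for k, v in lengths.items()}

    def score(self, url):
        """Return the lowest plausibility in [0, 1] over all parameters.

        Parameters never seen in training score 0.0 (most anomalous).
        """
        worst = 1.0
        for key, value in query_features(url):
            if key not in self.stats:
                return 0.0
            mu, var = self.stats[key]
            excess = len(value) - mu
            if excess > 0:
                # Chebyshev bound: P(|X - mu| >= excess) <= var / excess**2
                worst = min(worst, min(1.0, var / excess ** 2))
        return worst
```

A request would be flagged when its score falls below a threshold chosen by the operator; because statistics are kept per program and per parameter, a long value that is normal for one parameter does not raise alarms for another.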


Web Services
Suraj Cheruvathoor
November 2005

This thesis aims to build an understanding of how web services work by using a real-world example.  Using this example, the paper will examine the core functionalities of web services and, for each one, compare the options that were available with the option the company chose for the project.  The technologies that were used to make the project work will also be discussed, along with the technology limitations encountered during the project.  The paper will also include a survey of the state of the art in Web Services technology.


Design Pattern Code Generator
Patricia Kraker
May 2005

Professional software developers have five goals when designing their software.  Their design should enable software to be reusable, flexible, and maintainable.  All three of these stress the importance of limiting the chore of rewriting code.  Reusability enables programmers to move on to other tasks because code that they need has already been written and can be incorporated in their program.  Because changes are likely, software flexibility is vital so software does not have to be redesigned.  Maintainability allows a developer to add changes and features with minimal effort.  Robustness and correctness are two other goals of software developers.  Robust code has the ability to handle unusual situations such as bad data, user errors, programming errors and environmental conditions.  In addition, correctness indicates that every program must do what it is intended to do.  Good software design enables these goals to be realized.  Design patterns enable good design to be realized [Braude, 2004].

Design patterns denote an idea.  This idea can be expressed as a way to organize classes accompanied by algorithms that perform the pattern’s basic operations [Braude, 2004].  Many times in creating software, a common situation or problem occurs.  Patterns are developed from the solutions to these common problems.  These solutions, or patterns, were designed in such a way that they help in achieving the five goals of software development.

The organization of classes in the patterns plays a major role in their fulfillment of reusability, flexibility, maintainability, robustness and correctness.  Each class in a pattern exists for a specific purpose.  Design patterns arrange objects in such a way that they can communicate with one another with minimal dependency upon the implementation of the other objects.  Objects created in this way are self-contained.  Allowing objects to be self-contained enables them to be used in another application that may need the functionality that this object independently provides.  Because these objects do not rely on the implementation of others in order to work, flexibility and maintainability are enabled.  As long as objects can still communicate effectively, one object’s implementation can be changed or additions can be made and this will not affect the functionality of the other objects.  Robustness and correctness are two of the more obvious goals in everyday programming.  Design patterns fulfill these two goals in that they are proven, working solutions.  Their advantages and disadvantages are documented.
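The idea of objects that communicate with minimal dependency on one another's implementation can be sketched with the Observer pattern, one of the patterns cataloged in [Gamma, 1995].  The example below is only an illustration; the class names Subject, History, and RunningTotal are hypothetical.

```python
from abc import ABC, abstractmethod


class Observer(ABC):
    """The only thing a Subject knows about its listeners."""

    @abstractmethod
    def update(self, value):
        """Receive a new value published by the subject."""


class Subject:
    """Publishes values without knowing how observers use them."""

    def __init__(self):
        self._observers = []

    def attach(self, observer):
        self._observers.append(observer)

    def notify(self, value):
        for observer in self._observers:
            observer.update(value)


class History(Observer):
    """Records every value it receives."""

    def __init__(self):
        self.values = []

    def update(self, value):
        self.values.append(value)


class RunningTotal(Observer):
    """Keeps a sum of the values it receives."""

    def __init__(self):
        self.total = 0

    def update(self, value):
        self.total += value
```

Subject depends only on the update method, so History or RunningTotal can be rewritten, or a new observer added, without any change to Subject, and each class can be reused on its own in another application.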

Patterns gained popularity and became more widely used with the publication of the book Design Patterns:  Elements of Reusable Object-Oriented Software [Gamma, 1995].  Developers who are avid users of design patterns currently use this book as their main resource.  This book is the only source for an in-depth discussion of each pattern, situations in which the pattern should be used, and how to use the pattern.  Information and sample code can be found on websites devoted to design patterns.  There are also sites that host tutorials for using the patterns.

It takes time to become familiar with patterns and to understand their full use and power.  For beginners, multiple examples are the key to understanding them.  Diagrams that show the relationships of objects in a pattern are also a necessity.  Developers who already know this information want to see sample code in multiple programming languages that illustrates how to implement a pattern.  The problem that currently exists is that all of this information cannot be found in one place.  Also, current development environments do not support the use of patterns by providing sample code templates.  As new patterns are discovered, the book by Gamma et al. becomes outdated.  The languages of the examples in the book are also dated, as the samples are coded in C++ and Smalltalk.

This project designs a system called Design Pattern Code Generator (abbreviated DPCG) that integrates all of these features into one system.  Here, beginners can find descriptions of the patterns and class diagrams that explain object relationships.  Multiple examples, including those that describe a real-life situation, are found in this system.  Patterns are divided into different types based on their functionality.  For a better understanding of the patterns and how they are categorized, each type is explained.  Developers can find code samples in multiple languages.  With an advanced feature of DPCG, a plugin that contains this information can be downloaded and used from within a developer’s IDE (Integrated Development Environment).  Professionals who discover new patterns can add them to this system along with all of the descriptions, diagrams, examples, and code.  This offers a way to keep the system and the information it provides up to date.  There is no longer a need to search in multiple places when everything can be found in DPCG.  The time people once spent looking for information can now be spent learning it.
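One simple way to produce pattern code samples in multiple languages is template substitution.  The sketch below is an assumption about how such a generator might work, not a description of DPCG itself; the TEMPLATES table and the generate function are hypothetical, and only the Singleton pattern is shown.

```python
from string import Template

# Hypothetical template table keyed by (pattern, language).
TEMPLATES = {
    ("singleton", "python"): Template(
        "class $name:\n"
        "    _instance = None\n"
        "\n"
        "    @classmethod\n"
        "    def instance(cls):\n"
        "        if cls._instance is None:\n"
        "            cls._instance = cls()\n"
        "        return cls._instance\n"
    ),
    ("singleton", "java"): Template(
        "public final class $name {\n"
        "    private static final $name INSTANCE = new $name();\n"
        "\n"
        "    private $name() {}\n"
        "\n"
        "    public static $name getInstance() {\n"
        "        return INSTANCE;\n"
        "    }\n"
        "}\n"
    ),
}


def generate(pattern, language, name):
    """Return skeleton source code for a pattern in the requested language."""
    key = (pattern.lower(), language.lower())
    if key not in TEMPLATES:
        raise KeyError(f"no template for {pattern!r} in {language!r}")
    return TEMPLATES[key].substitute(name=name)
```

For example, generate("Singleton", "Java", "Registry") returns a Java class skeleton named Registry.  Supporting a newly discovered pattern or another language then amounts to adding an entry to the table, which matches the goal of keeping the system up to date.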