Abstract:
It is inherently difficult to study changes taking place in an uncontrolled and boundless atmosphere. Traditionally, the laws of physics have been employed to forecast weather, and satellite or radar imagery is normally correlated with 'synoptic' weather attributes to predict significant weather phenomena. This research investigates the suitability of various Data Mining (DM) techniques, applied in a hybrid fashion, for weather prediction. Fourteen years of daily weather data for Islamabad, stored according to the standard of the World Meteorological Organization, were obtained from the Pakistan Meteorological Department (PakMet). The data were cleansed to bring them to a level where DM techniques could be applied. Three parameters were targeted as outputs: Precipitation, Maximum Temperature and Minimum Temperature. The DM techniques used were Artificial Neural Network (ANN), Clustering, Decision Tree (DT), Linear Regression (LR) and Memory (Case) Based Reasoning. A model to evaluate the DM techniques in a hybrid fashion is suggested. First, DT was used to classify cases into 'Rain' and 'No Rain'; the 'Rain' cases were then used for Quantitative Precipitation Forecast (QPF). Finally, all DM techniques were applied for simple time series analysis, which proved very useful for Minimum and Maximum Temperatures. Because of the extremely high proportion of zeros in the precipitation data, it was difficult for the DM techniques to produce a QPF or a time series analysis of 'Rain'. The classes were balanced to reduce skewness, and models built on the balanced data set were more accurate than those built on the unbalanced one. Significant discoveries were made regarding precipitation, wind, relative humidity and temperature values. It became apparent that no single technique can be called the best for all situations; each has varying significance in different situations. The results show that DM can be applied successfully to Meteorology and to related domains, such as electricity load forecasting.
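The two-stage idea described above (classify 'Rain' versus 'No Rain' first, then estimate the amount only for 'Rain' cases) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the model used in this research: the single predictor (relative humidity), the one-level decision tree standing in for a full DT, the simple linear regression standing in for the QPF step, and the toy numbers, which are invented and do not come from the PakMet data.

```python
from statistics import mean


def fit_stump(xs, labels):
    """Fit a one-level decision tree: predict 'Rain' when x >= threshold.

    Tries every observed value as a threshold and keeps the one that
    classifies the most training cases correctly.
    """
    best_threshold, best_correct = None, -1
    for t in sorted(set(xs)):
        correct = sum((x >= t) == y for x, y in zip(xs, labels))
        if correct > best_correct:
            best_threshold, best_correct = t, correct
    return best_threshold


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def predict_precip(x, threshold, a, b):
    """Stage 1 decides Rain/No Rain; stage 2 estimates the amount."""
    if x < threshold:
        return 0.0  # classified as 'No Rain'
    return max(0.0, a + b * x)  # amounts cannot be negative


# Invented toy data (NOT from the study): relative humidity in %, rain in mm.
humidity = [35, 40, 48, 55, 60, 72, 80, 85, 90, 95]
precip = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 3.0, 4.0, 5.0]

# Stage 1: Rain / No Rain classifier trained on all cases.
is_rain = [p > 0 for p in precip]
threshold = fit_stump(humidity, is_rain)

# Stage 2: regression trained on the 'Rain' cases only, so the many
# zero-precipitation days do not dominate the fit.
rain_x = [h for h, r in zip(humidity, is_rain) if r]
rain_y = [p for p, r in zip(precip, is_rain) if r]
a, b = fit_line(rain_x, rain_y)

print(predict_precip(50, threshold, a, b))
print(predict_precip(90, threshold, a, b))
```

Fitting the second stage on 'Rain' cases only is one way to keep the large number of zero-precipitation days from pulling the amount estimates towards zero, which is the difficulty noted above.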