Research Article
Modeling and Visualization of Scattered Positive Data
Department of Computer Science and Engineering, UET Lahore, Pakistan
Abad Ali Shah
R and D Center of Computer Science, UET, Lahore, Pakistan
Data visualization is an important tool used for analysis and understanding of observed phenomena. Scattered data samples are encountered in many areas of business, science and engineering. Since common visualization methods require an underlying grid, it is required to approximate the reality at the same grid. To approximate the unknown reality at the required grid, we need to construct the representation of the sampled data. Many interpolation methods are available to construct an empirical model from the scattered data sets. Characteristics of the data to be visualized are important consideration while selecting the interpolant for construction of the empirical model. We know number of characteristics of the data to be explored like spatial distribution (1D, 2D, 3D), type of data (scalar, vector etc), data distribution (scattered or ordered data), coordinate system, scale, continuity etc.
In addition to these features our visualization must not contradict the known characteristics about data otherwise the visualization will not be trustable. For example one would not trust in the reality discovered using a visualization that produces physically erroneous results. The problem of visualization based on inherent characteristics of data exists in many application domains. Examples of few such features are positivity, monotonicity, convexity and bounds that are encountered in various scientific and engineering domains. These are usually known facts about the datasets to be explored. When we are presenting data in visual format the visualization must not contradict such known realities inherent to the data.
Study of many researchers has been reported in literature that preserve the above mentioned features of data on regular grid, but relatively less attention has been given to scattered data. Few researchers have investigated the visualization of the scattered data based on the inherent features of the data. We refer to the work of (Sarfraz, 2000; Fritsh and Carlson, 1980; Schmidt and Hess, 1988) to preserve monotonicity of data on regular grid and (Han and Schumaker, 1979) for scattered data. Only a few researchers have worked to preserve the convexity of data. We refer to the work of (Schmidt, 1990) for convex, monotone and positive data on regular grid.
Positivity of data is encountered in many situations in science, engineering and business. The significance of positivity lies in the fact that sometimes it does not make sense to talk of some quantity to be negative. For example, percentage mass concentrations in a chemical reaction, volume, absolute temperate and pressure and mass of something are meaningless when negative. Many researchers have also considered the problem of positivity. For related literature and background we refer to the study of (Nadler, 1992; Sarfraz et al., 2000; Schumaker, 1982; Schmidt and Hess, 1988; Brodlie and Butt, 1993).
For scattered data, the inverse distance weighted methods are considered better due to their efficiency, extendibility to higher dimensions, global in nature and ease in implementation. Modified Quadratic Shepard (MQS) method, being smooth with C1 continuity, is a commonly used method in the field of science and engineering such as geophysics, astronomy and meteorology. However, it has problem when it is used for visualization of positive data as it interpolates negative values for inherently positive data. In this study we will present a scheme namely fixed point scaling scheme to preserve positivity while modeling scattered datasets that are inherently positive.
SHEPARD METHOD
MQS is an inverse distance weighted method that is based on the approach introduced by Shepard (1968). Let a set of N non-negative data values, fi, at associated scattered sampling locations, Xi = (x1i, x2i, x3i,......), where i = 1, 2, ....., N, is given. Shepard interpolant is defined as follow:
(1) |
where
and
This interpolant provides one of the solutions to the problem of visualization of positive data sets. However it is not a suitable choice for many visualization applications. Few problems have been discussed in (Xiao et al., 1996) and (Xiao and Woodburry, 1999). For example F(X) is bounded between maximum and minimum in the data set. However the sampled values do not always contain all extremes of the quantity. The bound preservation property may also result in loss of information carried by the original sampled data in the subsequent steps of visualization process. This interpolant also has an unnecessary property that slop at each reference point is zero as shown in Fig. 1. Continuity of this interpolant is not acceptable for many applications. Also as it is a global method so it becomes inefficient for large data sets.
Modified quadratic Shepard method: Numbers of modifications have been suggested to overcome the drawbacks of the original Shepards method. The most interesting modification for data visualization perspective is due to (Frank and Neilson, 1980). They improved continuity of the method by replacing constant basis function, fi, by quadratic basis function, Qi(X), with the following characteristics:
Qi(Xi) = fi |
Qi(X) is inverse distance weighted least square fit to the other data points. This made the method a C1 continuous method.
Fig. 1: | Plot of data in Table 1 using Shepard method |
The resulting interpolating function, F(X), called Modified Quadratic Shepard method, is defined as follows:
(2) |
The basis functions Qi (X) is defined as follow:
(3) |
By definition Qi(Xi) = fi and is an inverse distance weighted least squares approximation to the other data points. The matrix, Ai, is Hessian matrix of the quadratic and is gradient vector. The modifications given above not only improve continuity of the interpolant but also eliminate the problem of missing data values with the original Shepard method.
To overcome the inefficiency due to global nature of Shepard method, modified weight functions suggested by (Frank and Neilson, 1980) and (Renika, 1988) are important. Weight function due to Frank and Neilson are defined as follows:
where
R is the radius within which the nodes take part for construction of Q(X) and calculation of weight function wi(X). Frank and Neilson (1980) suggested two fixed values for R i.e., Rq for construction of Q and Rw for calculation of weight.
where
Nq are the number of control points used to calculate Q(X), Nw are the number of control points used to calculate w(X) and N is total number of control points. D is maximum distance between two points in the data set. Suggested values for evenly distributed 2D data sets are Nw = 9 and Nq = 18. For sparse data or where data sets are small (N<25) considerable increase in Nq and Nw is required with Nq/Nw = 2.
Renika (1988) obtained improvement in accuracy using different criteria and values of Nq and Nw. These are the number of control points that impose significant effect for the reference point. For 2D data sets suggested values for good results are Nw = 19 and Nq = 13.
The modification given above also improves extrapolation capability of the method. The resulting interpolant called Modified Quadratic Shepard method is a suitable choice for many scientific visualization domains.
Loss of positivity using MQS method: The Modified Quadratic Shepard Method does not guarantee to preserve positivity. In Table 1 samples of oxygen percentage in flue gases of a boiler with respect to time are given. Mass concentration is always positive. A negative mass concentration value does not make sense. So we expect interpolated values to be positive. Using Modified Quadratic Shepard method a curve has been constructed in Fig. 2 for this data set. The graph goes negative for the inherently positive data. The plot shows negative mass value that is ambiguous. A number of solutions to the problem of positivity preservation using this method have been suggested. We will discuss and compare some of the solutions in the following subsection.
Earlier work for preservation of positivity using MQS method: There are number of ways to preserve positivity using Modified Quadratic Shepard method. Basis functions are common target to achieve the goal of positivity. As the weight functions, w(X), of the interpolant are always positive so the function, F(X), will never go negative if all basis functions are positive. Most of the methods concentrate on basis function for preservation of positivity of the interpolant.
Fig. 2: | Plot of data in Table 1 using MQS method |
Fig. 3: | Plot of data in Table 1 using MQS (Dashed) and truncated basis functions (Solid) |
Table 1: | Oxygen levels in flue gases from a boiler |
Basis function truncation: As the weight functions of MQS interpolant are always positive, so the function, F(X), will never go negative if all basis functions, Q(X), are positive. One way to ensure positivity of Q(X) is to truncate its negative part by putting all negative values to zero. This is simple solution but it has drawback that continuity of the graph is reduced to °C as shown in Fig. 3. A similar approach is suggested by (Xiao and Woodbury, 1999) however the truncation of the negative value after evaluation. The resulting continuity is usually not acceptable for visualization applications.
Constraining radius of participation: In this scheme we constrain the radius of participation of basis function for construction of graph, Rw, such that the basis function is positive in this radius. This scheme is simple and low cost as small additional computations are involved at preprocessing stage only. However drawbacks are not tolerable. The most significant drawback is that the number of basis functions participating in construction of the graph may be too small to keep the continuity of the graph.
Simple positive modified quadratic Shepard method: This method, suggested by (Asim, 2000), scales all the basis functions that go negative. Positivity of Q(X) is achieved by scaling and shifting of basis function. The original negative basis functions are replaced by the modified positive basis functions (X). The modified basis functions, (X), have following characteristics:
• | It interpolate fi i.e., |
(Xi) = fi |
• | For 2D data sets it is a circular paraboloid. |
(x1, x2) = zk+ai[(x1-x1i)2+(x2-x2i)2] |
The paraboloid may be a convex or concave depending upon the shape of original basis function. The symmetrical axis of the paraboloid passes through the stationary point, zs, of the original basis function. Scaling is performed by shifting, zs to zk, upward or downward, based on the type of original basis function and satisfying the above given conditions. Asim (2000) discussed details of the scheme and implementation results.
This scheme also has draw backs. This scheme unnecessarily perturbs basis functions and the interpolant may sometimes lose continuity with respect to data. The resultant graph may not preserve the structure of original graph.
Scaling and shifting algorithm: This is another scheme given by (Asim, 2000). Implementation of the scheme is discussed in (Asim et al., 2004). This scheme scales Hessian matrix, Ai, of each negative basis function by a positive scaling factor αi and shift the basis function to interpolate the reference point fi. More specifically:
• | It interpolates fi i.e., |
(Xi) = fi |
• | Otherwise. (X) = zk+αi[(X-Xs)TA(X-Xs)] |
where A is known as Hessian Matrix and (X-Xs) is a vector defining coordinate position and (X-Xs)T is
transpose of this vector. Xs is the coordinates of the stationary points of the basis function. For 2D data sets the shape of paraboloid after scaling, is similar to the original. The paraboloid may be a convex, concave or hyperbolic depending upon the shape of original basis function. The symmetrical axis of the paraboloid passes through the stationary point, zs, of the original basis function.
Scaling is performed by shifting, zs to zk, upward or downward, based on the type of original basis function and satisfying the above given conditions. Details of the scheme and implementation results are discussed in (Asim, 2004).
Extension of the scheme to higher dimensions is difficult as many special cases emerge due to increase of eigenvalues combinations. In addition this technique faces divided by zero problem
This study presents fixed point scaling scheme that will remove the drawbacks of the previous schemes.
Fixed point scaling: In this scaling scheme we scale only those basis functions that go negative in the region of their participation. We constrain each negative basis function to be positive by using fixed point scaling scheme as given below.
• | We take reference point of the basis function as fixed point. |
• | We scale the basis function given in Eq. 3 by a positive scaling factor,, such that the following conditions for the modified positive basis function (Xi) hold: |
(4) |
where Xm is the coordinates of minima of the basis function in the region Rw. The scaled basis function takes the following form:
(5) |
To find scaling factor, αi, we apply the boundary conditions as stated above and get:
(6) |
The minimum of the basis function is given by:
(7) |
Fig. 4: | Plot of data in Table 1 using MQS (Dashed) and fixed point scaling schemes (Solid) |
where Xm is the point of minima
Solving Eq. 6 and 7 simultaneously we get:
(8) |
It is worth noting that αi varies between 0 and 1.
If fi = 0 then αi = 0. This means that the basis function becomes Q(X) = 0 for all cases where fi is zero. So we treat zero reference values as a special case For semi-positive value of Q(Xm) there is no need of scaling. Scaling is required only if Q(Xm) is negative. In this case scaling factor is defined by the Eq. 8.
The beauty of this scaling scheme is that it scales basis function without changing signs of eigenvalues. So the basis functions do not lose their shape during the constraining process. This preserves the shape (convex, concave and hyperbolic) of the basis function.
The implementation of this study is straight forward. A separate module for scaling is required without disturbing the existing modules. Module for scaling takes the coefficient of quadratic Q(x) and returns the modified coefficients of (x) after multiplying with the scaling factor. We have implemented and tested the results of the research in 1D and 2D data. Plot of 1D data in Table 1 is shown in Fig. 4 and plot of 2D data is shown in Fig. 6. The results are promising as compared to the earlier variants. We have got the improvements on two fronts:
First: instead of treating the convex, concave and hyperbolic cases separately our fixed point scaling technique treat these cases uniformly. It reduces not only the programming effort but also the computational cost.
Fig. 5: | Plot of test data using Modified Quadratic Shepard method |
Fig. 6: | Plot of test data using fixed point scaling algorithm |
Moreover the earlier scaling and shifting technique was facing divided by zero problem. This problem has been eliminated in this scheme.
Second: During implementation we have considered the fact that the cases where data value is zero, the scaling factor too become zero. So we have treated it as special case that further reduces the computational effort.
For construction of examples in 2D and comparison purposes we used following test functions for generation of test data.
(9) |
Data generated at 30 random locations using this function has been used for construction of examples. Plot of a data having many zero values, using MQS, is shown Fig. 5. The graph goes negative where as the test function is always non-negative. Plot of the same data using fixed point scaling is shown in Fig. 6. The positivity is preserved as depicted from the range of graph.
The criteria for comparison are based on the key features of MQS that make it suitable for certain applications. The key features of this interpolant are continuity, efficiency and extendibility to higher dimensions.
Original MQS method is C1. This level of continuity is often required for visualization applications. So C1 continuity is also required for positivity preserving algorithm. It is trivial to prove that the continuity of the positivity preserving algorithm is also C1 however gradient at reference point is also scaled by the scaling factor. For zero valued reference points the gradients also become zero.
Visualizations applications often involve higher dimensional data sets and MQS is easily extendible to higher dimensions. So extendibility of the positivity preserving algorithms to higher dimensions is also important criteria for comparison. We formulated the fixed point scaling scheme without its dimensional consideration so implementation and extension of the scheme to higher dimensional applications is straightforward.
The results are of comparative assessment of these features is summarized in Table 2. Computational cost relative to MQS method is given for 2D data plotted on a 200X200 grid for comparison purpose. Computational cost increase is small because additional processing required for positivity preservation is done at preprocessing stage. Treating zero reference values as a special case further reduces the computational cost.
Table 2: | Comparison of MQS with positivity preserving algorithm |
The deviation of the graph from original MQS method increases with the number of its basis function undergoing scaling. This deviation is high when the most of the reference values are zero or near zero. The error terms in Table 2 indicate that the modeling capability of the interpolant has reduced a lot due to the scaling for preservation of positivity.
The fixed point scaling provides an efficient solution to preserve positivity while visualizing positive data using Modified Quadratic Shepard Method. This research can be implemented using few additional software modules without any major disturbance to existing software modules.
The error estimates indicate that accuracy of the interpolant decreases due to this modification. More work is required to improve the accuracy of the interpolant while preserving positivity.
Regarding future work:
• | We are working to improved accuracy of the positivity preserving interpolant. |
• | We are exploring the possibility of achieving goal of positivity by fitting positive least square quadratic to data. |
The authors acknowledge enabling role of PAEC and Higher Education Commission, Islamabad, Pakistan and appreciates financial supports through Development of S and T Manpower through Indigenous Ph.D (300 Scholars).