Risk quantification using fuzzy-based Monte Carlo simulation

Estimating the cost contingency of construction projects depends largely on data captured from previous projects and/or the experience and judgment of project team members. Monte Carlo simulation is commonly used in estimating contingency, and its accuracy has been reported to depend on the number of iterations used in the simulation process, the probability density functions associated with each project cost item being considered, and the correlation among these cost items. The literature reveals that the latter is the most important issue for an accurate estimate of contingency. It, however, requires the calculation of coefficients of correlation among cost items based on captured historical records of cost data. Subjective correlation was introduced to alleviate the difficulties associated with the calculation of these coefficients. This paper presents a newly developed method for cost contingency estimation that considers subjective correlations and allows for contingency estimation with and without computer simulation. Unlike the methods reported in the literature, the present method considers the uncertainty associated with the coefficients of correlation and utilizes earlier work of the first author in calculating the variance of total project cost. It also allows for assessing the impact of a variable covariance matrix on the estimated project cost using a simple and user-friendly computational platform. The application of the developed method to cost data captured from two databases demonstrates its use and accuracy in estimating cost contingency. The results are compared to those produced by others using Monte Carlo simulation, with and without correlation, on actual project data.


INTRODUCTION
Although different project parties may have different definitions of contingency, it is strongly linked to risk and to project cost overrun on the original scope of work (Moselhi 1997). Contingency is a tool to mitigate and control this risk (Hammad et al., 2016). Estimating cost contingency accurately and without heavy computation can be a challenging task. It requires historical records of cost data in the domain of application, reliable modeling and selection of suitable estimating methods.
There is a wide range of methods in the literature on cost contingency estimation. Monte Carlo Simulation (MCS) is a commonly used probabilistic method for estimating cost contingency. Despite its common use, its application has two major limitations (Touran and Wiser 1992, Touran 1993, Wall 1997, Touran and Suphot 1997, Yang 2005, Ökmen and Öztaş 2010, Firouzi et al. 2016). Firstly, it requires development of probability density functions for each individual cost item. Secondly, it requires calculation of correlation coefficients among cost items to ensure accuracy. The second requirement is usually neglected, partly because it requires a great deal of data that are not always available (Touran and Wiser 1992).
In the absence of cost data, contractors estimate cost contingency subjectively, based on their gut feeling and experience (e.g. they allocate 5-10% of the contract amount), resulting at times in errors or overestimation (Smith et al. 1999, Baccarini 2004, Chou 2013). The quality of such subjective estimates can be attributed to their skills, knowledge and motivations (Burroughs et al. 2004).
In view of the above, a method is required to alleviate such requirements, particularly historical cost data records, while enabling accurate estimate of contingency that utilizes experience and judgment of contractors. The proposed method utilizes Monte Carlo Simulation with interdependent variables and fuzzy set theory. It introduces a systematic procedure that accounts for uncertainties associated with subjective correlation coefficients of cost items in estimating project cost contingency and performs calculation of total standard deviation of project cost regardless of the type of marginal distributions of its cost items.

LITERATURE REVIEW
Construction is a risky business and contingency is a vehicle for managing that risk. Contingency is defined by the Association for the Advancement of Cost Engineering (AACE) as: "An amount added to an estimate to allow for items, conditions, or events for which the state, occurrence, or effect is uncertain and that experience will likely result, in aggregate, in additional costs. It is typically estimated using statistical analysis or judgment based on past asset or experience." (AACE 2010).
Contingency estimating methods were studied by Bakhshi and Touran (2014) and clustered in three groups: (1) deterministic, (2) probabilistic, and (3) modern methods. Deterministic methods are the simplest methods, in which cost contingency is estimated as a predetermined percentage of project cost based on past experience and historical data (Baccarini, 2005). However, these methods rely heavily on expert experience and can lead to errors or overestimation (Yeo 1990, Smith et al. 1999, Baccarini 2004, Olumide et al. 2010, Chou 2013). Probabilistic methods include simulation and non-simulation methods. Monte Carlo Simulation (MCS) is the most commonly used probabilistic simulation method. The accuracy of MCS strongly relies on calculation of correlation coefficients among cost items. The research conducted by Touran and Wiser (1992) is one of the earliest efforts in modeling the impact of correlation among cost items on the total cost variance of construction projects and hence on the estimated contingency. Touran and Suphot (1997) concluded that the use of rank correlations for generating correlated random variables outperforms correlations established from traditional methods based on Pearson correlations. Moselhi (1997) presents a quantitative direct method for calculating total project cost variance considering correlations without the need for Monte Carlo simulation. To alleviate the difficulties associated with calculations of correlation coefficients, Touran (1993) introduced subjective correlations: high (with a correlation coefficient larger than a predefined threshold), middle and weak. That method, however, did not consider the uncertainties associated with subjective correlation coefficients among cost items.
The second category of probabilistic methods is non-simulation methods, which include probability tree, expected value, first-order second-moment, program evaluation and review technique (PERT), analytical hierarchy process, optimism bias uplifts, and regression methods (Diab et al., 2017). The last is one of the traditionally used methods in that category, in which various independent variables (e.g. location, size) are employed to predict the dependent variable (e.g. estimated final cost) (Baccarini, 2005). Lam and Siwingwa (2017) recently utilised multiple regression to predict the required contingency sum during the preconstruction phase of the project, considering the risks associated with construction phases and clients. Diab et al. (2017) investigated the impact of risk drivers on contingency estimation from the client's and contractors' points of view. They utilized a regression model to predict the required contingency budget in highway construction projects by rating the potential risk drivers based on their relative importance, cost impact, and schedule impact. However, the use of regression is recommended where there is a linear relationship between dependent and independent variables (Bakhshi and Touran, 2014), which is not the case in construction projects given their complex nature.
Therefore, modern methods such as Artificial Neural Networks (ANN) are employed to overcome the linearity assumption in estimating cost contingency (Leung et al., 2018). For instance, Chen and Hartman (2000) utilized a back propagation general regression neural network (GRNN) model to estimate cost contingency at the front-end stage of project development. Lhee et al. (2014) further proposed a two-step neural network-based method for optimal contingency estimation from an owner's perspective. Their proposed model accounted for non-linearity between the predictor variables and the corresponding target solution. However, ANN-based contingency estimation methods are not capable of capturing the uncertainty associated with input data provided by individual experts, and they require extensive data collection for training and testing (Chen and Hartman 2000, Leung et al., 2018). Moreover, these methods did not consider the impact of correlation coefficients in estimating project cost contingency.
In summary, the methods cited above, collectively or individually, are incapable of simultaneously: (1) considering correlations among project cost items, either subjective or objective, (2) performing contingency estimation with or without using Monte Carlo simulation, (3) accounting for uncertainty associated with subjective correlation coefficients among cost items, (4) calculating the variance of total project cost regardless of the type of the marginal distributions of its cost items, and (5) assessing the impact of variability of the elements of the covariance matrix in estimating project cost contingency using a simple and user-friendly platform.
Unlike existing methods in the literature, this paper introduces a new contingency estimation method that considers correlations among project cost items, either subjective or objective. The proposed method accounts for the subjectivity of input data provided by individual experts and models the interdependency between cost items. It is also capable of modeling project cost contingency with and without computer simulation. This is deemed particularly useful when using subjective correlations.

METHODOLOGY
The proposed method is designed to enable the use of an allocated range for each subjective correlation coefficient in estimating cost contingency with and without simulation. The components of the developed method are illustrated in Fig 1. The method consists of five steps, and the output of each step is used automatically as the input to the following step. In the first step, the user assigns a qualitative variation range to each subjective correlation coefficient. In this research, the term user refers to project managers or cost estimators who have enough knowledge and experience to assign that range. In the second step, based on the assigned qualitative variation ranges, three subjective correlation matrices are generated: optimistic, most likely, and pessimistic. From these three matrices, three covariance matrices are developed employing Equation (1) of Moselhi and Dimitrov (1993). Then, the sum of covariance of each cost item with other cost items is calculated using Equation (2). In the third step, when simulation is used, the MCS developed in Microsoft Excel is applied to simulate the variation range of the sum of covariance of each cost item with other cost items. In the fourth step, fuzzy set theory is applied to calculate the expected value of the sum of covariance of each cost item. In this step, the output of MCS is utilized to estimate a fuzzy number for each cost item. The fuzzification and defuzzification processes are performed utilizing Equations (3) and (4) and Equation (5), respectively. Finally, in the fifth step, the standard deviation of the project total cost is calculated based on Equation (9). The required steps for estimating cost contingency are depicted in Fig 2.

[Fig 2. Steps for estimating cost contingency]
Step 1 (Data gathering): Determine the qualitative variation range for the correlation coefficients of cost items. Select database 1 for method development (database 2 is used for validation). Extract the mean, standard deviation and correlation coefficients of cost items.
Step 2 (Data analysis): Construct optimistic, most likely, and pessimistic subjective correlation matrices. Construct optimistic, most likely, and pessimistic covariance matrices utilizing Eq. 1. Extract the summation values of covariance of each cost item with other cost items in each matrix.
Step 3 (Application of Monte Carlo Simulation): Determine the mean and standard deviation of the summation values of covariance for each cost item. Simulate the assigned qualitative variation range using 10,000 iterations.
Step 4 (Calculation of expected value using fuzzy set theory)
Step 5 (Contingency estimation): Calculate the total standard deviation utilizing Eq. 9. Calculate the project cost contingency.

Data gathering
Two databases from the literature are used for the development of the proposed method and for its validation. The first database was reported in the work of Wall (1997). The data are based on the analysis of elemental costs, which were provided on-line by the Building Cost Information Service (BCIS) of the Royal Institution of Chartered Surveyors in the UK. The data represent cost per square meter rates of 216 office buildings having two or more storeys, constructed between 1980 and 1994. The mean and standard deviation of the total unit cost are 543.8 (£/m2) and 181.1 (£/m2), respectively. The second database is drawn from the reported work of Touran (1993). This database represents various cost items of 1,014 low-rise office buildings consisting of two to four storeys. Each project cost is decomposed into 15 items. A sample of three correlated cost items, which was used by Touran (1993) from a selected sub-set of 26 projects built between 1981 and 1983, is used in this paper to enable a comparison. The mean and standard deviation of the total unit cost are 16.6 ($/ft2) and 10.5 ($/ft2), respectively. The cost data of the two databases are presented in Table 1 (Touran 1993, Wall 1997).

Assigned range for correlation coefficient
Based on their experience, users can assign a range to each correlation coefficient used for estimating cost contingency. For example, the user can assign a range from 0.0 to 0.30 for weak correlation, and ranges from 0.30 to 0.60 and from 0.60 to 1.0 for moderate and strong correlation, respectively.

Data analysis
The cost data gathered from database 1 are used for method development. The subjective correlation matrix of database 1 is shown in Table 2. To generate the subjective correlation matrix, all values in the ranges 0.0-0.3, 0.3-0.6, and 0.6-1.0 in the objective correlation matrix are replaced with 0.15, 0.45, and 0.8, respectively. Two more matrices, optimistic and pessimistic, are generated to cover the variation range of the correlation coefficients. The optimistic correlation matrix is produced by replacing all values in the ranges 0.0-0.3, 0.3-0.6, and 0.6-1.0 with 0.3, 0.6, and 0.9, respectively, while for the pessimistic correlation matrix these values are replaced with 0.1, 0.3, and 0.6. Based on the three correlation matrices produced, three covariance matrices are generated utilizing Equation (1) (Moselhi and Dimitrov 1993).
$$\mathrm{cov}(i, j) = \rho_{ij}\, sd_i\, sd_j$$ Equation (1)
where cov(i, j) is the covariance between cost items i and j, sd_i and sd_j are the standard deviations of cost items i and j, ρ_ij is the correlation coefficient between cost items i and j, and i and j = 1, 2, ..., n, with n the number of cost items. The most likely subjective covariance matrix of database 1 is shown in Table 3. Then, the sum of covariance of each cost item i (i = 1, 2, 3, ..., n) with other cost items (j = 1, 2, 3, ..., n) is calculated in each covariance matrix utilizing Equation (2), as shown in Table 4.
Equation (2)
The mean and standard deviation of the calculated Si (i = 1, 2, ..., n) are computed for each cost item, enabling the generation of a set of random data, as shown in Table 4.
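The data analysis step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the band edges are treated as inclusive at their upper end, the sum for each cost item is taken over all j ≠ i (with an option to count each pair once), the sample standard deviation is used across the three scenarios, and the three-item input is hypothetical rather than drawn from either database. All function names are invented for this sketch.

```python
import statistics

# Band upper edge -> (pessimistic, most likely, optimistic) values from the text.
# Treating each upper edge as inclusive is an assumption.
BANDS = [
    (0.3, (0.10, 0.15, 0.30)),  # weak
    (0.6, (0.30, 0.45, 0.60)),  # moderate
    (1.0, (0.60, 0.80, 0.90)),  # strong
]


def subjective_matrices(objective):
    """Return (pessimistic, most_likely, optimistic) correlation matrices."""
    n = len(objective)
    result = [[[1.0] * n for _ in range(n)] for _ in range(3)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a cost item is perfectly correlated with itself
            rho = objective[i][j]
            if not 0.0 <= rho <= 1.0:
                raise ValueError("coefficients outside [0, 1] are not covered")
            for upper, values in BANDS:
                if rho <= upper:
                    for k, value in enumerate(values):
                        result[k][i][j] = value
                    break
    return tuple(result)


def covariance_matrix(corr, sd):
    """Equation (1): cov(i, j) = rho_ij * sd_i * sd_j."""
    n = len(sd)
    return [[corr[i][j] * sd[i] * sd[j] for j in range(n)] for i in range(n)]


def covariance_sums(cov, count_pairs_once=False):
    """Sum of covariances of each item i with the other items.

    Assumption: by default the sum runs over all j != i. Pass
    count_pairs_once=True to sum over j > i so each pair appears once.
    """
    n = len(cov)
    sums = []
    for i in range(n):
        start = i + 1 if count_pairs_once else 0
        sums.append(sum(cov[i][j] for j in range(start, n) if j != i))
    return sums


# Hypothetical three-item example (not data from either database).
objective = [[1.0, 0.2, 0.7], [0.2, 1.0, 0.5], [0.7, 0.5, 1.0]]
sd = [10.0, 20.0, 5.0]

per_scenario = [covariance_sums(covariance_matrix(m, sd))
                for m in subjective_matrices(objective)]
# Mean and sample standard deviation of S_i across the three scenarios.
s_mean = [statistics.mean(col) for col in zip(*per_scenario)]
s_sd = [statistics.stdev(col) for col in zip(*per_scenario)]
print(s_mean, s_sd)
```

Whether each pair of cost items should be counted once or twice matters when the sums are later combined with the factor of 2 in the total variance, so the choice is exposed as a parameter rather than fixed.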

Application of Fuzzy-Based Monte Carlo Simulation
Correlation between variables (i.e. between cost items) has long been shown to be essential for an accurate estimate of contingency (Touran and Wiser 1992). In this research, MCS is utilized to generate data from the means and standard deviations of the sum of covariance of cost items given in Table 4. In other words, MCS is used to cover the variation range of the correlation coefficients between pairs of cost items using 10,000 iterations. This number of iterations is equal to that used by Wall (1997). The developed fuzzy-based MCS algorithm is shown in Fig 3.
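As a rough illustration of this simulation step, the sketch below draws 10,000 values for one cost item from the mean and standard deviation of its covariance sums. The text does not state which distribution is sampled, so the normal distribution used here is an assumption, as are the function name, the fixed seed and the input values.

```python
import random
import statistics


def simulate_covariance_sum(values, iterations=10_000, seed=0):
    """Simulate the variation range of one cost item's covariance sum.

    `values` holds the sums obtained from the pessimistic, most likely and
    optimistic covariance matrices. Sampling from a normal distribution is
    an assumption; the source does not name the distribution.
    """
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [rng.gauss(mu, sigma) for _ in range(iterations)]


# Hypothetical sums for a single cost item (not values from Table 4).
samples = simulate_covariance_sum([50.0, 70.0, 120.0])
print(min(samples), statistics.mean(samples), max(samples))
```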

Application of Fuzzy Set Theory
In this step, the output of MCS is used to generate a fuzzy random variable through two processes: fuzzification and defuzzification. Each is described below.

Fuzzy estimation
The use of fuzzy numbers allows imprecision and vagueness to be modeled. In fuzzy estimation, items can be evaluated during the data gathering process using one of the following fuzzy numbers:
• Crisp [a]: "a" is the item's definitive value.
• Uniform [a, b]: the item's value is expressed by a range [a, b].
• Triangular [a, b, c]: the item's value is assumed to be approximately equal to "b", but may lie anywhere between a minimum (a) and a maximum (c).
• Trapezoidal [a, b, c, d]: the item's value is most likely to lie within the [b, c] range, but cannot be less than "a" or greater than "d".
In this research, a trapezoidal membership function is developed for each cost item. Other membership functions can be used. The trapezoidal function of each cost item is calculated using Equation (3).
Equation (3)
where Si (i = 1, ..., n) is the fuzzy estimation of the sum of covariance of each cost item with other cost items, m is the number of fuzzy estimations per cost item, and min (ρij sdi sdj), mean (ρij sdi sdj), most likely (ρij sdi sdj), and max (ρij sdi sdj) are the minimum, mean, most likely, and maximum estimations of the covariance of each cost item, respectively. The fuzzy number associated with the covariance of each cost item is shown in Table 5. It should be noted that the fuzzy values associated with each cost item were extracted from the variation range generated by the 10,000 MCS iterations. The total fuzzy estimation of covariance for project cost items is calculated using Equation (4).
Equation (4)
The last column in Table 5 shows the result generated using Equation (4).
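The fuzzification step might look like the following sketch. Several choices here are assumptions rather than the paper's definitions: the trapezoid is taken as (minimum, lower middle, upper middle, maximum), with the sample mean and a caller-supplied most likely value placed in ascending order in the middle, and the total fuzzy estimation is formed by adding trapezoidal numbers component by component. The function names and input values are hypothetical.

```python
import statistics


def trapezoid(samples, most_likely):
    """Build a trapezoidal fuzzy number (a, b, c, d) from simulated values.

    Assumptions: a and d are the sample minimum and maximum; b and c are the
    sample mean and the caller-supplied most likely value, in ascending order.
    """
    a, d = min(samples), max(samples)
    b, c = sorted((statistics.mean(samples), most_likely))
    if not a <= b <= c <= d:
        raise ValueError("most_likely lies outside the simulated range")
    return (a, b, c, d)


def fuzzy_sum(numbers):
    """Add trapezoidal fuzzy numbers component by component."""
    return tuple(sum(parts) for parts in zip(*numbers))


# Hypothetical values for two cost items (not taken from Table 5).
item_1 = trapezoid([40.0, 55.0, 70.0, 95.0], most_likely=60.0)
item_2 = trapezoid([10.0, 20.0, 30.0, 60.0], most_likely=25.0)
total = fuzzy_sum([item_1, item_2])
print(item_1, item_2, total)
```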

Defuzzification
The commonly used method for defuzzification is the center of area (COA) method, which can be expressed as (Amaya et al. 2009):

$$y^* = \frac{\int \mu(x)\, x\, dx}{\int \mu(x)\, dx}$$ Equation (5)
where y*, μ, and x represent the defuzzified value, the membership function, and the output variable, respectively. The expected value (EV) represents the defuzzified value of a fuzzy number according to Equation (6) (Salah 2012, Shaheen et al. 2007).
Equation (6)
where a, b, c, and d are the four parameters of a trapezoidal membership function.
Therefore, in this study, the expected value is calculated using Equation (7).
Equation (7)
Utilizing Equation (7), the expected value is calculated to be 8476.79, which serves as the defuzzified value of the covariance matrix.
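The two defuzzification measures discussed above can be compared with a short sketch. The center of area is evaluated numerically from the trapezoidal membership function. The expected value is taken as the simple average (a + b + c + d) / 4, a form commonly used for trapezoidal fuzzy numbers; treating it as the paper's Equation (6) is an assumption, since that equation is not reproduced in the text. Function names and inputs are hypothetical.

```python
def membership(x, a, b, c, d):
    """Trapezoidal membership function with support [a, d] and core [b, c]."""
    if x < a or x > d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def centre_of_area(a, b, c, d, steps=10_000):
    """Center of area by midpoint integration over [a, d]."""
    width = (d - a) / steps
    xs = [a + (k + 0.5) * width for k in range(steps)]
    mus = [membership(x, a, b, c, d) for x in xs]
    return sum(x * m for x, m in zip(xs, mus)) / sum(mus)


def expected_value(a, b, c, d):
    """Average of the four trapezoid parameters (assumed form of the EV)."""
    return (a + b + c + d) / 4


# Hypothetical trapezoids: one symmetric, one skewed to the right.
print(centre_of_area(0, 1, 3, 4), expected_value(0, 1, 3, 4))
print(centre_of_area(0, 1, 2, 5), expected_value(0, 1, 2, 5))
```

For a symmetric trapezoid the two measures coincide, while for a skewed one they generally differ, so the choice of defuzzification rule can affect the resulting standard deviation.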

Contingency estimation
The standard deviation of the project cost is calculated using the method of Moselhi and Dimitrov (1993), expressed by Equation (8), which considers the correlation of cost items and avoids simulation.
$$sd = \left( \sum_{i=1}^{n} sd_i^2 + 2 \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \rho_{ij}\, sd_i\, sd_j \right)^{1/2}$$ Equation (8)
In this study, Equation (8) is adapted by replacing the covariance sum in its second term with the expected value (EV) generated from the application of fuzzy set theory, as shown in Equation (9). It should be noted that the second term addresses the covariances and their associated uncertainties calculated earlier (see Table 4).
$$sd = \left( \sum_{i=1}^{n} sd_i^2 + 2 \times EV \right)^{1/2}$$ Equation (9)
By determining the mean or project target cost (TC) and its associated standard deviation, the probability of exceeding or not exceeding that target by any specified sum (SS), i.e. the contingency, can be investigated using Equation (10) (Moselhi 1997).
A comparison between the results of the proposed method and those of Wall (1997) is shown in Table 6. Although the proposed method utilized subjective correlation and was performed without simulation, it has almost equal accuracy to that of Wall, which uses simulation and objective correlation (2% vs 1.98%). The results also indicate that the maximum difference in error between the proposed method and the best results of Wall, obtained after experimenting with different probability distributions, is 1.76% (2% - 0.24%).
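The contingency calculation itself is short enough to sketch directly. The first function follows Equation (9). The probability functions assume that total project cost is normally distributed around the target cost, which is an assumption of this sketch, since Equation (10) is not reproduced in the text. The function names and numerical inputs are hypothetical.

```python
import math
from statistics import NormalDist


def total_sd(sds, expected_value):
    """Equation (9): sqrt(sum of sd_i^2 + 2 * EV)."""
    return math.sqrt(sum(s * s for s in sds) + 2 * expected_value)


def prob_not_exceeding(target_cost, sd, specified_sum):
    """Probability that cost stays within target_cost + specified_sum.

    Assumes a normal distribution of total cost (an assumption of this sketch).
    """
    return NormalDist(mu=target_cost, sigma=sd).cdf(target_cost + specified_sum)


def contingency_for(confidence, sd):
    """Specified sum needed to reach a given confidence level (normal case)."""
    return NormalDist().inv_cdf(confidence) * sd


# Hypothetical inputs (not values from Table 6).
sd = total_sd([3.0, 4.0], expected_value=12.0)
print(sd)
print(prob_not_exceeding(100.0, sd, 0.0))
print(contingency_for(0.90, sd))
```

With these hypothetical inputs the total standard deviation is 7.0, and a specified sum of zero gives a 50% probability of not exceeding the target, as expected for a distribution centered on the target cost.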

METHOD VALIDATION
The cost data captured from the second database are utilized to validate the proposed method. Table 7 summarizes the cost data of the three sample cost items including electrical systems, mechanical systems, and moisture protection (the cost of roofing, insulation, and waterproofing) as reported by Touran (1993).
The subjective correlation matrix of cost items is shown in Table 8.
Based on the assumed variation range, the optimistic and pessimistic correlation matrices are generated, and the application of the developed method yielded the results summarized in Table 9. The results indicate that the developed method outperforms that of Touran (1993) in estimating the standard deviation of project cost (0.01% vs 1% error). It is interesting to note that the same performance is obtained even when the proposed method is applied without simulation.

CONCLUSION
This paper presented a novel method for estimating project cost contingency that considers correlations among project cost items, either subjective or objective, and performs the calculations with or without Monte Carlo simulation. As such, the method provides considerable flexibility in estimating project contingency and accommodates situations where the data needed for proper utilization of MCS may not be available. It is particularly useful when using subjective correlations.
The results of the two databases demonstrate the validity and good accuracy of the developed method in comparison with other methods. For instance, when estimating the standard deviation with a subjective correlation matrix and without simulation, the developed method yielded almost equal accuracy to that of Wall (1997), which uses simulation and objective correlation (2% vs 1.98%). In addition, the developed method outperforms that of Touran (1993), reducing the error in the standard deviation of project cost from 1% to 0.01%.
The contributions of the developed method include (1) accounting for uncertainty associated with subjective correlation coefficients, (2) calculating the variance of total project cost regardless of the type of the marginal distributions of cost items, and (3) assessing the impact of a variable covariance matrix on the estimated cost of a project using a simple and user-friendly platform.
The developed method is limited to the use of a trapezoidal membership function when applying fuzzy set theory to the subjective correlation matrix. A sensitivity analysis needs to be conducted in future work to investigate the effect of different qualitative variation ranges of the correlation coefficients between pairs of cost items on the accuracy of the estimated contingency.