MACHINE LEARNING-BASED FRAMEWORK FOR CONSTRUCTION DELAY MITIGATION

SUMMARY: For many decades, the construction industry has underperformed in terms of successful project delivery. Delays have become typical of many construction projects, leading to lawsuits, project termination, and ultimately dissatisfied stakeholders. Experts have highlighted the lack of adoption of modern technologies as a cause of underproductivity. Nevertheless, the construction industry has an opportunity to tackle many of its woes through Construction 4.0, driven by enabling digital technologies such as machine learning. Consequently, this paper describes a framework based on the application of machine learning for delay mitigation in construction projects. The key areas identified for machine learning application are "cost estimation", "duration estimation", and "delay risk assessment". The developed framework is based on the CRISP-DM graphical framework. Relevant data were obtained to implement the framework in the three key areas identified, and satisfactory results were achieved. The machine learning methods considered include Multi Linear Regression Analysis, K-Nearest Neighbours, Artificial Neural Networks, Support Vector Machines, and Ensemble methods. Finally, interviews with professional experts were carried out to validate the developed framework in terms of its applicability, appropriateness, practicality, and reliability. The main contribution of this research is the conceptualization and validation of a framework as a problem-solving strategy to mitigate construction delays. The study emphasizes the cross-disciplinary direction of the modern construction industry and the potential of machine learning in solving construction problems.


INTRODUCTION
For almost five decades, the construction industry has been plagued by inefficiency and underproductivity. Other industries, such as the service and manufacturing industries, have increased productivity by adopting digital technologies and automation across their value chains. Forbes and Ahmed (2010) report that up to 30% of wastage is due to inefficiency, errors, delays, and poor communication. Unfortunately, the majority of construction work is still based on century-old methods, which do not relieve the industry of its problems. Underperformance is further aggravated by the large and ambitious projects that are characteristic of the industry in the 21st century. Assaf et al. (1995) noted that delays are a critical issue in large construction projects. Aibinu and Jagboro (2002) define delays as situations where a project's duration is extended due to factors related to the client, consultant, and contractor. Delays in construction projects have negative effects on all stakeholders, including litigation (Santoso and Soeng, 2016), cost and time overruns, loss of productivity and revenue, and contract termination (Sambasivan and Soon, 2007).
Conventionally, construction research has focused on exploring the causes of construction delays in various economies globally (Sanni-Anibire et al., 2020a). Such exploratory research approaches have been subject to severe criticism. For instance, AlSehaimi et al. (2013) argued for constructive research approaches in the construction industry in general, and in delay mitigation in particular. Accordingly, some researchers have proposed frameworks to reduce the prevalence of delays in the industry. Examples include a conceptual framework based on Case-Based Reasoning (CBR) by Ng et al. (2000); a model based on knowledge management by Abdul-Rahman (2008); the adoption of the Last Planner System by AlSehaimi (2011); a risk response model by Motaleb (2014); and a management framework by Khair et al. (2018). These studies have viewed delay mitigation as a means of providing preventative measures, recommendations derived from lessons learned, and modified project management frameworks. Despite the contribution these studies have made to the research and professional community, delays are still a common feature of projects. It could be argued that these studies do not treat delay mitigation holistically, that is, by considering the cost estimate, schedule, and project risks together (Galway, 2004; Meyer, 2015). Moreover, none of these studies has sought to leverage the growth of industrial data, which has the potential to transform the construction industry. Success stories on the use of modern digital technologies to resolve performance issues have been reported in other industries (Larrañaga et al., 2018). Interestingly, Woodley (2019) suggests that digitalization has the potential to resolve claims and disputes arising from construction delays.
Regrettably, the construction industry is notorious for its slow adoption of modern technology; nonetheless, a technological transformation driven by the fourth industrial revolution (IR4.0) is now under way in the industry.
The current trend in the construction industry is Construction 4.0, aimed at digitalization and automation for improved productivity. Machine learning, a subset of artificial intelligence, is considered one of the top ten technologies driving IR4.0 (PricewaterhouseCoopers, 2017). Machine learning has been applied to various domains of construction and civil engineering (Adeli, 2020); however, only a few studies relate directly to delay mitigation in construction. Promisingly, there is current interest in applying machine learning to delay risk prediction in construction (Gondia et al. 2019, El-Kholy 2019), although such studies already predominate in other industries (Yaghini et al. 2013, Takeichi et al. 2017). While the prediction of delay risk is valuable towards the central objective of mitigating delays, a wide range of factors influence construction delay. Aziz and Abdel-Hakam (2016) identified two hundred and ninety-three (293) construction delay factors, while Enshassi et al. (2009) presented 110. Therefore, there could be many possible areas of machine learning application in delay mitigation.
In light of the foregoing, the goal of this study is to develop a framework based on machine learning to mitigate delays in construction projects. According to the Project Management Institute, project risk mitigation should holistically cover three areas: the cost estimate, the schedule, and identified project risks (Galway, 2004; Meyer, 2015). Thus, the researchers proposed three concepts: (1) delay mitigation through accurate estimation of cost (Abdul Rahman et al. 2006, Olawale and Sun 2010, Love et al. 2000, Smart Market Report 2011, El-Kholy 2019); (2) delay mitigation through accurate estimation of duration (Ng 2007, Abdul Rahman et al. 2006, Love et al. 2000, El-Kholy 2019); and (3) delay mitigation through delay risk management (Ng 2007, Abdul Rahman et al. 2006, Abedi et al. 2011, Olawale and Sun 2010, Gondia et al. 2020, Motaleb 2020). The researchers thus theorized that these three areas could serve as control points in the planning stage of construction projects to mitigate the occurrence of delays and, ultimately, ensure project success. Abdul Rahman et al. (2006) state: "it is important to predict and identify the problems in the early stages of construction and diagnose the cause to find and implement the most appropriate and economical solutions".
To demonstrate the framework, relevant data were obtained and machine learning models were developed in the three areas of cost, duration, and delay risk. Subsequently, the results, together with a graphical representation of the framework, were presented to industry experts for validation. The outcome of the study is a proposed machine learning-based framework for delay mitigation in construction. The framework was developed as an adapted form of the most prevalent analytics model: the Cross-Industry Standard Process for Data Mining (CRISP-DM) (Wirth and Hipp, 2000). The ensuing sections discuss productivity in the construction industry, Construction 4.0 and machine learning, a summary of other frameworks developed by researchers for delay mitigation, the methodology of the study, findings, discussion, and conclusions.

LITERATURE REVIEW
The following sections highlight initiatives by various governments to increase productivity in their respective construction industries. Industry 4.0 as a new paradigm in the construction industry, and the role of machine learning in this paradigm, are also discussed. Finally, a descriptive summary of previous efforts to develop frameworks to mitigate construction delay is presented.

Productivity in the construction industry
Construction is crucial to a country's industrialization and thus occupies center stage in a nation's socioeconomic development. The construction industry is estimated to be worth about $8.7 trillion, accounting for 12.2% of the world's economic output and employing about 200 million people worldwide (Zou and Sunindijo, 2015). Hence, problems in the construction industry have far-reaching consequences for a nation's socio-economic structure. It is therefore unsurprising that stakeholders are concerned about the industry's productivity, which lags behind that of other industries. In the UK, a report by Sir Michael Latham titled "Constructing the Team" provided a 30-point executive summary of strategies that could be used to enhance the productivity of the construction industry (Latham, 1994). Another report, "Rethinking Construction", was produced by Sir John Egan (Egan, 1998). A more recent review by Mark Farmer (2016), commonly known as the "Farmer Review" or by its subtitle "Modernise or Die", identified key deficiencies of the British construction industry. Farmer (2016) recommended the adoption of modern techniques, including greater use of robotics, machine learning, and automated planning decisions through digital design. Other countries have undertaken similar studies and developed action points as well as key performance indicators for the development of their construction industries (Sawhney et al., 2020). The continued proliferation of such studies may suggest that the industry has not made significant progress in its drive for enhanced productivity.
Notably, a few countries, such as China, Singapore, and Japan, have made giant strides in the past few decades in adopting technologies such as prefabrication (Xu et al. 2020, Gao et al. 2020). However, on a global scale, construction productivity is largely held back by the industry's slow adoption of modern technology. Farmer's (2016) report states, "Construction has not even made the transition to 'Industry 3.0' status which is predicated on large scale use of electronics and IT to automate production". Unlike the manufacturing, automotive, and aerospace industries, the construction industry has failed to embrace the opportunities afforded by technology and advances in data management. Modern digital technologies offer unique opportunities in construction project management through automated means of capturing, storing, and processing large quantities of data for effective decision-making (Forbes and Ahmed, 2010). Encouragingly, the industry has begun to embrace a new era of faster, more automated, and smarter construction processes (Keith, 2018).

Industry 4.0 and machine learning
To date, the production world has witnessed four industrial revolutions. The first relied on water and steam, the second on electricity, and the third on electronic systems and information technology. The fourth industrial revolution is a convergence of the digital and physical worlds. It merges the information technologies of the third industrial revolution, such as computer-integrated manufacturing, machine learning, the internet, and many other technologies, with operational technologies to create the disruptive technologies that form the backbone of IR4.0 (Larrañaga et al., 2018). Industry 4.0 was defined by Kagermann et al. (2013) as an initiative to secure the future of the German manufacturing industry. Although a German concept, it has been adopted globally by governments and organizations. Global examples include Industrie du Futur in France, Industria Conectada in Spain, Made in China 2025, Make in India, ASEAN 4.0 with manufacturing leaders such as Singapore and Malaysia, and Society 5.0 in Japan (Larrañaga et al., 2018).
In light of the fourth industrial revolution, the construction industry has the opportunity to move towards more efficient production, business models, and value chains. Such a transformation could be achieved through existing and emerging technologies that form part of the IR4.0 paradigm (Oestereich and Teuteberg, 2016). The construction industry's response to Industry 4.0 is Construction 4.0, which is based on a confluence of trends and technologies (both digital and physical) that promise to reshape the way built-environment assets are designed and constructed. Liao et al. (2017) presented a systematic literature review of 224 papers published until June 2016 to determine the technologies and key features of IR4.0, and identified machine learning as one of the enabling digital technologies driving IR4.0. Machine learning is a key technique for generating data-driven predictive models that can support decision-making. Cost management, scheduling, and construction management are applications that can be developed using digital technologies as an inherent concept of Construction 4.0 (Sawhney et al., 2020).

Frameworks for construction delay mitigation
Construction delay is one of the most significant problems in the construction industry, and numerous studies have therefore explored the main causes of construction delays in various countries (Sanni-Anibire et al., 2020a). AlSehaimi and Koskela (2008) attribute the failure of existing delay studies to their descriptive and exploratory nature, and thus suggest the need for alternative research approaches. The following is a descriptive summary of research work towards that aim. Love et al. (2000) proposed a system dynamics model of the effects of prolonged overtime work, used to mitigate delays, on project costs and quality. The study suggested that utility theory can be applied to determine the most appropriate solutions to mitigate project delays, and validated its procedure with 14 projects in Hong Kong. Ng et al. (2000) investigated the application of Case-Based Reasoning to develop a conceptual framework for delay mitigation. The framework consists of various components, including "delay identification", "delay analysis", "crashing activity scrutiny", "estimate", "schedule re-estimation", "data input", and "output". Abdul-Rahman (2008) proposed a delay mitigation model based on the adoption of knowledge management. The model was based on knowledge obtained through lessons learned in construction projects, and the study noted that the accuracy and volume of the information obtained are potential limitations of the model. AlSehaimi (2011) argued that traditional project management tools and practices are inadequate in modern construction and consequently proposed the Last Planner System as a viable solution. A risk response model was developed by Motaleb (2014). The study described preventative and mitigation measures in developing the model and further validated it through interviews with professionals from selected case studies. Similarly, Chai et al. (2015) presented a structural equation model based on preventive, predictive, organizational, and corrective measures for delay mitigation. Khair et al. (2018) proposed a project management framework based on the 'stage-gate' approach and validated it with input from focus group discussions with experts. Recently, Gunduz and Al-Naimi (2021) proposed a delay mitigation framework through the identification of 41 delay mitigation factors categorized into financial and enabler objectives. The main contribution of that study was the integration of the balanced scorecard approach and quality function deployment in developing the framework.
Despite the acknowledged contribution of these studies, construction delays remain unmitigated in various construction sectors globally (Sanni-Anibire et al., 2020a). Simplistic frameworks that do not consider the complex interplay among multiple delay factors have been perceived as ineffective (Woodley, 2019). Effective strategies for delay mitigation should cover three main aspects: the project's timeframe, its estimated costs, and its performance in terms of identified risks (Galway, 2004; Meyer, 2015). These three domains are also recognized by other researchers (Abdul Rahman et al. 2006, El-Kholy 2019, Gondia et al. 2020). Furthermore, adopting digital technologies has the potential to mitigate most delay factors through improved situational awareness and insights derived from data (Woodley, 2019). The current study seeks to address the research need for a digital and holistic delay mitigation framework.

METHODOLOGY
The methodology employed in this research can be summarily described as follows:

Identify areas of ML application for construction delay mitigation
A thorough review of the extant literature was made to identify the most common delay risk factors in the construction industry. This exercise led to the identification of 36 delay risk factors extracted from relevant studies published in the last 15 years (Abd El-Razek 2008, Sambasivan and Soon 2007, Aibinu and Odeyinka 2006, Doloi et al. 2012, Faridi and El-Sayegh 2006, Fugar and Agyakwah-Baah 2010, Gündüz et al. 2013, Lo et al. 2006, Sweis et al. 2008, Toor and Ogunlana 2008, Enshassi et al. 2009). Furthermore, the research defined three key areas for ML application based on relevant literature, including: "accurate estimation of cost"; "accurate estimation of duration"; and "delay risk assessment" as previously discussed.

Develop conceptual delay mitigation framework
Conceptual frameworks are representations of ideas as understood by the researcher. Miles and Huberman (1984) define a conceptual framework as "the current version of the researcher's map of the territory being investigated". Likewise, Weaver-Hart (1988) views a conceptual framework as "a structure for organizing and supporting ideas; a mechanism for systematically arranging abstractions; sometimes revolutionary or original, and usually rigid". Accordingly, the areas identified for ML application in delay mitigation were combined to develop the conceptual framework. The Cross-Industry Standard Process for Data Mining (CRISP-DM) was adapted as the vehicle for the graphical format of the framework. CRISP-DM is a standard process model for data mining that was developed in a project partly sponsored by the European Commission. The value of the model lies in its effectiveness, reliability, and adaptability to various situations, regardless of the industry sector and technology used (Wirth and Hipp, 2000). Although other data mining frameworks have been developed over the years (Shafique and Qaiser, 2014), CRISP-DM is widely adopted across various industries. For instance, CRISP-DM has been used to develop machine learning frameworks for quality management (Schäfer et al., 2018) and sports management (Bunker and Thabtah, 2019; Schelling and Robertson, 2020).

Implement delay mitigation framework
The delay mitigation framework was implemented using data obtained from past case studies. Data on the impact and likelihood of the 36 identified delay risk factors (see table 1 for a statistical summary and description) were obtained from 48 industry professionals working on skyscraper projects. The respondents spanned the project life cycle: contractors represented 30% of the respondents, consultants 35%, and clients' representatives/facility managers a further 35%. At the time of the survey, 23% of the respondents were designated as project managers, 21% held facility manager roles, 15% director roles, and 11% executive director roles. In terms of professional experience, the largest group of respondents (48%) had more than 15 years of experience. Most respondents were located in the UAE (42%) and Saudi Arabia (33%). The values obtained for impact and likelihood were used to compute a risk class for each response, and the results were organized in a suitable data structure as shown in table 4 (Sanni-Anibire et al., 2020b). Relevant cost and duration data for skyscrapers (see tables 2 and 3 for a statistical summary and description) were obtained from the Mega Project Case Study Centre of China (http://www.mpcsc.org/case_search.htm). Thirty-five projects with information on project duration and other relevant data were selected; as shown in table 2, project cost was missing for 5 of them. Hence, 35 projects were used for duration estimation and 30 projects for cost estimation.
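The risk-class computation mentioned above can be sketched as follows. The rule actually used is documented in Sanni-Anibire et al. (2020b) and is not reproduced here, so this sketch assumes a common convention instead: impact and likelihood are each rated on a 1-5 scale, multiplied into a score from 1 to 25, and binned into five ordinal classes. The class boundaries in `CLASS_BOUNDS` and the helper names `risk_score` and `risk_class` are hypothetical.

```python
# Hypothetical class boundaries on the product impact * likelihood (range 1-25).
# These cut-offs are illustrative assumptions, not the study's published values.
CLASS_BOUNDS = [
    (4, "Very Low"),
    (8, "Low"),
    (12, "Moderate"),
    (16, "High"),
    (25, "Very High"),
]


def risk_score(impact: int, likelihood: int) -> int:
    """Multiply impact and likelihood, both assumed to be rated on a 1-5 scale."""
    for name, value in (("impact", impact), ("likelihood", likelihood)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    return impact * likelihood


def risk_class(impact: int, likelihood: int) -> str:
    """Map one response's impact and likelihood ratings to an ordinal risk class."""
    score = risk_score(impact, likelihood)
    for upper, label in CLASS_BOUNDS:
        if score <= upper:
            return label
    raise AssertionError("unreachable: score is at most 25")


if __name__ == "__main__":
    # One hypothetical respondent's ratings for a single delay risk factor.
    print(risk_class(impact=4, likelihood=5))  # score 20 falls in the last bin
```

Applying `risk_class` to every (respondent, factor) pair would yield a labelled table of the kind the text describes for table 4, although the real labels depend on the study's own rule.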
Subsequently, machine learning algorithms were applied to the datasets and evaluated using standard performance metrics: the Root Mean Squared Error (RMSE), Correlation Coefficient (R²), Mean Absolute Percentage Error (MAPE), Classification Accuracy, and Misclassification Error. The Waikato Environment for Knowledge Analysis (WEKA 3.8.3), one of the most popular tools in the machine learning community, was used in this study (Larrañaga et al., 2018). Four ML algorithms were considered: Multi Linear Regression Analysis (MLRA), Artificial Neural Networks (ANN), K-Nearest Neighbors (KNN), and Support Vector Machines (SVM). Detailed theoretical and mathematical descriptions of these algorithms are given in the relevant references (Cortes and Vapnik 1995, Lek and Guégan 1999, Olatunji et al. 2013, Wauters and Vanhoucke 2017, Sethi et al. 2017, Olatunji 2017).
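As a minimal sketch of the regression metrics listed above, the functions below compute RMSE, MAPE, and R² in plain Python. Two assumptions are worth flagging. First, `r_squared` computes the coefficient of determination (1 - SS_res/SS_tot), whereas WEKA's regression summary reports a correlation coefficient; since the text does not say which quantity its "R²" denotes, a Pearson correlation (`pearson_r`) is included as well. Second, MAPE is returned here as a percentage. The sample values are hypothetical.

```python
import math


def _check(actual, predicted):
    """Guard against empty or mismatched inputs."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and of equal length")


def rmse(actual, predicted):
    """Root Mean Squared Error, in the units of the target (e.g. days)."""
    _check(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean Absolute Percentage Error as a percentage; undefined if an actual value is 0."""
    _check(actual, predicted)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(actual, predicted)
    mean_a = sum(actual) / len(actual)
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1.0 - ss_res / ss_tot


def pearson_r(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    _check(actual, predicted)
    mean_a = sum(actual) / len(actual)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_a * var_p)


if __name__ == "__main__":
    # Hypothetical actual vs. predicted durations (days) for three projects.
    actual = [100.0, 200.0, 300.0]
    predicted = [110.0, 190.0, 330.0]
    print(rmse(actual, predicted), mape(actual, predicted),
          r_squared(actual, predicted), pearson_r(actual, predicted))
```

Because RMSE is in the target's units while MAPE is scale-free, the two can rank models differently; reporting both, as the study does, guards against relying on either alone.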

Validation of the framework with construction professionals
The validation approach used in this study was based on structured interviews with professional experts on the applicability, appropriateness, practicability, and reliability of the framework. Applicability means "the framework is suitable to the context of the project", appropriateness means "the framework is clear, and easy to follow", practicability means "the framework is technically sound and correct", and reliability means "the results of the framework are dependable" (Bassioni et al., 2005). A total of 10 highly qualified professionals (see table 5) were contacted for the validation interview, in line with the recommendation of 5-10 interviewees as an adequate number of participants for framework validation (Bryman and Bell, 2003; Gao and Low, 2014). The professionals were first briefed on the aim of the interview, and a validation document describing the framework in figure 1 was then discussed with them. This included the performance of the ML models as shown in figures 2 and 3 and table 8. Subsequently, the professionals were asked for feedback on the validity of such a framework for their professional practice. A scoring sheet with a Likert scale from (1) to (5), as presented in table 6, was provided so that the professionals could validate the framework against the relevant criteria.
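The scoring step can be summarized as a per-criterion mean over the experts' Likert ratings. The sketch below assumes each scoring sheet records one integer from 1 to 5 per criterion; the function name `mean_validation_scores` and the two sheets in the example are hypothetical and are not the study's responses.

```python
CRITERIA = ("applicability", "appropriateness", "practicability", "reliability")


def mean_validation_scores(responses):
    """Average each criterion's 1-5 Likert score across all expert scoring sheets."""
    if not responses:
        raise ValueError("at least one response is required")
    for response in responses:
        for criterion in CRITERIA:
            if response[criterion] not in (1, 2, 3, 4, 5):
                raise ValueError(f"{criterion} score must be an integer from 1 to 5")
    # Plain division keeps every mean a float, e.g. 9 / 2 -> 4.5.
    return {c: sum(r[c] for r in responses) / len(responses) for c in CRITERIA}


if __name__ == "__main__":
    # Two hypothetical scoring sheets (not the study's data).
    sheets = [
        {"applicability": 5, "appropriateness": 4, "practicability": 4, "reliability": 3},
        {"applicability": 4, "appropriateness": 4, "practicability": 5, "reliability": 2},
    ]
    print(mean_validation_scores(sheets))
```

A mean alone hides disagreement between experts, so reporting the spread of scores per criterion alongside it may be informative with only ten respondents.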

RESULTS AND FINDINGS
The delay mitigation framework proposed in this study has been developed as an adapted format of the CRISP-DM framework. The framework as presented in figure 1 is briefly described as follows:

Business understanding
This phase entails defining the problem from a business perspective (i.e. cost, duration, and delay risk) and obtaining relevant data. As stated earlier, delays could be mitigated through three key areas: accurate estimation of cost, accurate estimation of duration, and delay risk assessment. Thus, three models were developed with relevant data in these domains. The datasets developed for delay mitigation are described in the third stage ("Implement delay mitigation framework") of the methodology section.

Data understanding
To understand the data, statistical analysis and visualization of the dataset containing information about the problem are required. Table 1 provides descriptive statistics of the delay risk factors (DRFs) obtained from professional experts, while tables 2 and 3 describe the cost and duration data obtained from the Mega Project Case Study Centre of China.

Data preparation
In this phase, the activities needed to construct the final dataset from the initial raw data are carried out. The study first examined the various filtering options that best expose the structure of the data to the machine learning algorithms. Consequently, WEKA's ReplaceMissingValues filter was adopted when deploying the ANN and KNN algorithms for cost estimation, while the Standardize filter was adopted for the MLRA and SVM algorithms. In developing the model for duration estimation, the entire dataset (input features and output class) was normalized, while the untransformed data were used for predicting delay risk, as presented in table 4. Another aspect of data pre-processing is feature subset selection, which is necessary because data used for machine learning sometimes suffer from the "curse of dimensionality". Thus, correlation-based feature selection was adopted for cost and duration estimation, while wrapper-based feature selection (with SVM as the base classifier) was adopted for delay risk prediction. The results revealed that the most relevant feature for estimating cost was "floor area", the most relevant feature for estimating duration was "# of total floors", and the most relevant feature for delay risk prediction was "slowness in decision making".
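The transformations named above can be sketched in plain Python. These are simplified re-implementations of the ideas behind WEKA's ReplaceMissingValues (mean imputation for numeric attributes), Standardize (zero mean, unit variance), and Normalize (rescaling to [0, 1]) filters, not WEKA's own code; the use of the population standard deviation in `standardize` is an assumption. The `cfs_merit` function evaluates Hall's correlation-based feature selection merit for one candidate subset; a full CFS run would also search over subsets, which is omitted. All function names and input values are hypothetical.

```python
import math


def replace_missing_with_mean(column):
    """Replace None entries in a numeric column with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def standardize(column):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    if std == 0:
        return [0.0 for _ in column]
    return [(v - mean) / std for v in column]


def min_max_normalize(column):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def cfs_merit(feature_class_corrs, mean_feature_feature_corr):
    """Hall's CFS merit k * r_cf / sqrt(k + k(k-1) * r_ff) for a subset of k features."""
    k = len(feature_class_corrs)
    if k == 0:
        raise ValueError("subset must contain at least one feature")
    r_cf = sum(abs(r) for r in feature_class_corrs) / k
    return k * r_cf / math.sqrt(k + k * (k - 1) * mean_feature_feature_corr)


if __name__ == "__main__":
    # Hypothetical floor areas (m^2) with one missing value.
    floor_area = [120000.0, None, 180000.0]
    filled = replace_missing_with_mean(floor_area)
    print(filled, standardize(filled), min_max_normalize(filled))
    # A second feature perfectly correlated with the first adds no merit.
    print(cfs_merit([0.8], 0.0), cfs_merit([0.8, 0.8], 1.0))
```

The last line shows why CFS favors compact subsets: a fully redundant feature leaves the merit unchanged, so a search guided by this score has no incentive to keep it.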

Modeling
This stage involved applying various machine learning algorithms, as well as ensemble methods, to the data. The data were split into training and test sets in a ratio of 66% to 34%. The final model for predicting duration was an ensemble with ANN as the combining classifier of three base classifiers, as described in table 7. For cost estimation, an ensemble model was also developed, with KNN as the combining classifier of three base classifiers (table 7). Likewise, the model for predicting delay risk was developed based on ANN, as presented in table 7.
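The 66%/34% hold-out split and a stacked ensemble with a combining learner can be sketched as follows. This is a hypothetical, dependency-free illustration rather than the WEKA configuration used in the study: the three base learners of table 7 are not reproduced here, so three simplified stand-ins are assumed (a mean predictor, least squares on the first feature only, and KNN), and KNN serves as the combining learner, as in the cost model. The combiner is trained on out-of-fold base predictions, one standard way of building stacking's meta-level data; the seed, fold count, and data are arbitrary.

```python
import math
import random


def split_train_test(rows, train_ratio=0.66, seed=42):
    """Shuffle rows and split them into train/test partitions (66% / 34% by default)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def fit_mean(X, y):
    """Baseline learner: always predicts the training mean."""
    m = sum(y) / len(y)
    return lambda x: m


def fit_ols_first_feature(X, y):
    """Least squares on the first feature only (a simplified stand-in for MLRA)."""
    xs = [row[0] for row in X]
    mx, my = sum(xs) / len(xs), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, y)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x[0]


def make_knn(k=3):
    """Return a fitter for a KNN regressor that averages the k nearest targets."""
    def fit(X, y):
        def predict(x):
            nearest = sorted(zip(X, y), key=lambda pair: math.dist(pair[0], x))[:k]
            return sum(t for _, t in nearest) / len(nearest)
        return predict
    return fit


def fit_stacking(X, y, base_fitters, meta_fitter, folds=3):
    """Train the combiner on out-of-fold base predictions, then refit bases on all data."""
    n = len(X)
    meta_X = [[0.0] * len(base_fitters) for _ in range(n)]
    for f in range(folds):
        held_out = [i for i in range(n) if i % folds == f]
        kept = [i for i in range(n) if i % folds != f]
        for j, fit in enumerate(base_fitters):
            model = fit([X[i] for i in kept], [y[i] for i in kept])
            for i in held_out:
                meta_X[i][j] = model(X[i])
    meta_model = meta_fitter(meta_X, y)
    base_models = [fit(X, y) for fit in base_fitters]
    return lambda x: meta_model([m(x) for m in base_models])


if __name__ == "__main__":
    # Hypothetical data: one numeric feature and a target that grows with it.
    rows = [([float(i)], 2.0 * i + 1.0) for i in range(30)]
    train, test = split_train_test(rows)
    X_train = [x for x, _ in train]
    y_train = [t for _, t in train]
    bases = [fit_mean, fit_ols_first_feature, make_knn(3)]
    model = fit_stacking(X_train, y_train, bases, make_knn(3))
    for x, t in test[:3]:
        print(x[0], t, round(model(x), 2))
```

Training the combiner on out-of-fold predictions keeps it from simply trusting whichever base learner memorizes the training set best; with real data, the stand-ins would be replaced by the configured MLRA, ANN, KNN, or SVM models.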

Evaluation
The developed models are evaluated using established performance metrics that depend on the type of problem, i.e. classification or regression. Models may also be evaluated against targets set by the company or against existing techniques. In this study, relevant performance metrics from the literature, namely the Root Mean Squared Error (RMSE), Correlation Coefficient (R²), and Mean Absolute Percentage Error (MAPE), were adopted for cost and duration (regression problems), while Classification Accuracy and Misclassification Error were adopted for delay risk (a classification problem). The model for duration achieved an R² of 0.69, a MAPE of 0.18, and an RMSE of 301.76, while the model for cost achieved an R² of 0.81, a MAPE of 80.95%, and an RMSE of 6.09. The results also show that the delay risk model correctly classified the risk of delay in all but one case, where a risk of "Very High" was misclassified as "Moderate"; the model thus achieved an accuracy of 93.75%. Cross plots were also made to illustrate the performance of the models. Figures 2 and 3 show a satisfactory level of performance for the duration and cost models, respectively. The actual and predicted values were closely correlated and sometimes matched, although in some instances significant gaps were observed between them. The predictive performance of the models could be improved by establishing larger datasets.
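The classification metrics can be illustrated with a short sketch. The reported accuracy of 93.75% is consistent with a single error among 16 test instances (15/16 = 0.9375), and 16 is roughly a 34% hold-out of the 48 survey responses; the paper does not state the test-set size, however, so this is an inference. The labels below are hypothetical apart from the one "Very High" case predicted as "Moderate", and the function names are illustrative.

```python
from collections import Counter


def accuracy_and_error(actual, predicted):
    """Return (classification accuracy, misclassification error) as fractions."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and of equal length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    accuracy = correct / len(actual)
    return accuracy, 1.0 - accuracy


def misclassifications(actual, predicted):
    """Count each (actual, predicted) pair where the predicted class was wrong."""
    return Counter((a, p) for a, p in zip(actual, predicted) if a != p)


if __name__ == "__main__":
    # Hypothetical test set of 16 delay-risk classes; only the last one is misclassified.
    actual = ["Moderate"] * 5 + ["High"] * 5 + ["Very High"] * 6
    predicted = actual[:-1] + ["Moderate"]
    print(accuracy_and_error(actual, predicted))
    print(misclassifications(actual, predicted))
```

With so few test instances, each error moves accuracy by 6.25 percentage points, which is one reason the text's call for larger datasets applies to the classifier as well.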

Deployment
The machine learning models could be deployed as standalone digital tools or as additions to existing project management tools. To assess the potential for deployment by project managers, interviews were carried out with ten highly experienced professionals (described in table 5) involved in the planning and delivery of construction projects. The results of their validation of the framework are presented in table 9. Their feedback on the validity of the proposed framework can be summarised as follows. The professionals believed in the applicability and appropriateness of the framework for mitigating construction delays, and emphasized its value in the pre-planning stage of construction projects. As table 9 shows, the mean validation scores for "applicability", "appropriateness", and "practicality" were 4.6, 4.4, and 4.1, respectively. On the other hand, the professionals were less confident of the "reliability" of the framework, which achieved a mean validation score of 3.3. Further discussions revealed that the credibility of the dataset used to develop such models is crucial to the dependability of the decisions based on them. The professionals opined that historical data from past projects must be properly documented for the framework to be reliable. Generally, the professionals welcomed the idea of the framework, should it be developed further into tools to be adopted in project planning. They also suggested that it is valuable as a baseline for comparison with existing techniques in the industry.

DISCUSSION
The construction industry continues to suffer from underproductivity issues such as the occurrence of delays. Although the research domain is saturated with delay studies, the majority of these studies are exploratory and hence do not provide solutions to the inherent problem. Few studies have adopted constructive research methods that develop tools and models with practical and theoretical implications (Love et al. 2000; Ng et al. 2000; Abdul-Rahman 2008; AlSehaimi 2011; Motaleb 2014; Chai et al. 2015; Khair et al. 2018; Gunduz and Al-Naimi 2021). These studies have mainly viewed delay mitigation as a means of providing preventative measures or recommendations derived from lessons learned. Some studies have sought to control the estimated time based on data from previous cases, while others have proposed modified project management frameworks. Despite the contribution of these studies to the body of knowledge on construction delay, there is a considerable gap in research exploring the potential of computing and information technology for delay mitigation. Interestingly, the construction industry is moving towards digitization under the umbrella of Construction 4.0. Construction 4.0 and its related technologies may transform the construction industry into a more efficient and transparent enterprise (Liao et al. 2017). A major enabling technology of Construction 4.0 is machine learning, which, although it has predominated in other industries, has been slow to spread in the construction industry.
Therefore, this paper presents the development of a framework to mitigate delay in construction projects based on the application of machine learning. The specific advantage of the current study lies in its adoption of a modern digital technology, machine learning, to develop a framework that could serve as a tool for delay mitigation in construction. One of the key advantages of machine learning is its use of historical data to develop data-driven predictive models. Furthermore, this study promotes the concept of research triangulation, in which multiple areas are investigated as potential means of delay mitigation. Heale and Forbes (2013) define triangulation as "the use of multiple theories, data sources, methods or investigators within the study of a single phenomenon". This is of significant value compared with previous research, in which the methodologies proposed for delay mitigation have focused on specific areas such as time management and project management methods. The extant literature, however, reveals a wide range of factors influencing construction delay; hence, tools that can control various project delay factors are of significant value. Notably, effective mitigation strategies for delay risk should holistically cover the project's duration, cost, and performance (Galway, 2004; Meyer, 2015).
In light of the foregoing, the development of the framework was based on identifying potential areas for machine learning application. Hence, existing studies in the domain of construction delays were consulted. As a result, the authors identified three fundamental areas of machine learning application, each sufficiently supported by the literature: cost estimation, duration estimation, and delay risk assessment. These areas formed the business understanding phase of the framework presented in figure 1. The framework was then implemented by obtaining relevant data from established databases, as well as from construction professionals. The implementation is discussed in the results and findings section, where machine learning models were developed and evaluated using standard performance metrics established in the relevant literature.
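This section does not name the specific metrics used. As a minimal sketch, assuming the commonly reported regression metrics MAE, RMSE, MAPE, and the coefficient of determination R², the evaluation step for a cost or duration estimation model could be computed as follows. The function name `regression_metrics` and the sample durations are hypothetical and are not data from this study.

```python
import math


def regression_metrics(actual, predicted):
    """Compute common regression error metrics for paired observations.

    Assumed metrics (not named in the source): MAE, RMSE, MAPE (%), and R^2.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # MAPE is undefined when any actual value is zero, so return NaN in that case
    if all(a != 0 for a in actual):
        mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    else:
        mape = float("nan")

    # R^2 = 1 - SS_res / SS_tot; undefined when all actual values are identical
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}


# Hypothetical actual vs. predicted durations (days) for four past projects
actual = [120, 200, 150, 300]
predicted = [110, 210, 160, 280]
print(regression_metrics(actual, predicted))
```

Reporting several metrics together is a design choice: MAE and RMSE are in the units of the target (days or currency), MAPE allows comparison across projects of different scale, and R² indicates the share of variance explained.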
Deploying the framework requires its use in a real-life project. In that case, a robust and reliable dataset would be established and the ML models would be developed as previously discussed. The outputs of the developed ML models would then support decision-making in project planning and risk management exercises. Ultimately, this would allow the validity and performance of the framework to be compared with those of other delay mitigation approaches. In the present study, however, the framework's validity for deployment was investigated through interviews with construction professionals. Validation is a fundamental aspect of construction research, and numerous studies in the construction domain have relied on interviews to validate the models and frameworks they developed (Bryman and Bell, 2003; Gao and Low, 2014). Accordingly, ten highly experienced construction professionals were interviewed to validate the framework. Demonstrated experience in project management, as well as education, were stipulated as selection criteria for the interviewees (see table 5). Their feedback generally indicates that the professional community welcomes the concept of the framework. The professionals suggested that if such a framework were developed into ICT tools or incorporated into existing project management tools, it would be a novel approach to mitigating construction delay. They believe such a framework could serve as a baseline in the planning stage of construction projects. They also noted that continuously developing the framework, by documenting data to increase the accuracy and reliability of the models' predictions, is a tangible benefit. In general, the framework was viewed as a valuable contribution to construction project management.
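To illustrate how a deployed model could draw on a growing historical dataset, the following is a minimal sketch of K-Nearest Neighbours regression, one of the methods considered in this study, written in plain Python for duration estimation. The class name `HistoricalKNN`, the two features (contract value in millions, gross floor area in square metres), and all figures are hypothetical. Features are min-max scaled so that no single attribute dominates the Euclidean distance, and `add_project` mirrors the continuous documentation of data that the interviewees highlighted.

```python
import math


class HistoricalKNN:
    """Hypothetical k-NN regressor over documented past projects."""

    def __init__(self, k=3):
        self.k = k
        self.features = []  # feature vectors of documented projects
        self.targets = []   # observed durations of those projects

    def add_project(self, features, duration):
        # Continuous documentation: each completed project enlarges the dataset
        self.features.append(tuple(float(x) for x in features))
        self.targets.append(float(duration))

    def _scale_bounds(self):
        # Per-feature (min, max) over all stored projects
        return [(min(col), max(col)) for col in zip(*self.features)]

    def predict(self, features):
        if len(self.targets) < self.k:
            raise ValueError("need at least k documented projects")
        bounds = self._scale_bounds()

        def scale(vec):
            # Min-max scaling; a constant feature maps to 0.0
            return [
                (x - lo) / (hi - lo) if hi > lo else 0.0
                for x, (lo, hi) in zip(vec, bounds)
            ]

        query = scale(features)
        ranked = sorted(
            (math.dist(query, scale(vec)), y)
            for vec, y in zip(self.features, self.targets)
        )
        # Unweighted mean of the k nearest observed durations
        return sum(y for _, y in ranked[: self.k]) / self.k


# Hypothetical historical records: (contract value, floor area) -> duration in days
model = HistoricalKNN(k=2)
model.add_project((1.0, 500), 100)
model.add_project((2.0, 1000), 200)
model.add_project((10.0, 5000), 900)
print(model.predict((1.5, 750)))  # prints 150.0
```

An unweighted mean keeps the sketch simple; distance-weighted averaging or cross-validating the choice of `k` would be natural refinements once a real project dataset is available.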
It is worth noting that, although machine learning provides an opportunity to modernize the construction industry, it also has some challenges. For instance, the construction industry has not yet been able to document data suitable for machine learning applications. Ballesteros-Perez et al. (2020) note that the size of the available data is one of the limitations of applying artificial intelligence methods in construction management. Strikingly, Farmer (2016) stated that "construction has not even made the transition to 'Industry 3.0' status which is predicated on large scale use of electronics and IT to automate production". Additionally, machine learning is not always appropriate, because it could be expensive or unnecessary if traditional engineering-based approaches can solve the problem effectively (Larrañaga et al., 2018). This study, however, shows that the construction industry has much to gain from seeking ways in which smart technologies could be used to solve some of its underperformance issues.

CONCLUSION
The construction industry has suffered for decades from underperformance issues such as delays. The slow adoption of emerging technologies is a crucial contributing factor to underproductivity in construction. The construction industry is currently witnessing a technological shift under the umbrella of Construction 4.0. One of the most significant technologies driving Construction 4.0 is machine learning, a subset of artificial intelligence. Construction 4.0 has ushered in a new age of construction digitization, in which technologies such as machine learning can be used to solve long-standing construction problems and thus provide value to stakeholders. To the authors' knowledge, this study is the first to propose a delay mitigation framework as an adaptation of the CRISP-DM framework. The proposed framework was implemented with relevant data from past case studies, and the results, as well as the graphical representation of the framework, were presented to industry professionals. The framework was viewed as a welcome development in the construction community, although it was noted that its value depends on the reliability of the data used for machine learning applications. This study makes an important contribution to the construction industry, as it shows the value of adopting emerging computing technologies in solving long-standing construction problems. Tools developed based on the framework could support decision-making at the planning stage of construction projects, with due consideration for the limitations arising from the data used.

DATA AVAILABILITY STATEMENT
All data used in this study are available upon request from the corresponding author.