Advanced Steel Construction

Vol. 22, No. 2, pp. 244-254 (2026)


A DATA-CENTRIC STRATEGY TO MITIGATE OVERFITTING OF ML MODELS FOR PREDICTING TORSIONAL CAPACITY FOR CFST COLUMNS

 

Ming-Xia Dang 1, Meng-Xue Guo 2, *, Ying Li 1, Hua Li 1 and Shi-Lin Yang 3

1 School of Intelligent Construction and Environment, Xian Jiaotong University City College, Xian, 710018, China

2 School of Civil & Architecture Engineering, Xi'an Technological University, Xian 710021, China

3 Shaanxi Construction Engineering Group Corporation Limited, Xian 710003, China

*(Corresponding author)

Received: 31 March 2025; Revised: 7 July 2025; Accepted: 2 August 2025

 

DOI:10.18057/IJASC.2026.22.2.10

 


ABSTRACT

This study investigates the effectiveness of model-centric and data-centric strategies in addressing overfitting in machine learning (ML) models that predict the torsional capacity of concrete-filled steel tubular (CFST) columns under combined loading. While prior work has largely focused on optimizing model architectures, our findings show that model-centric approaches offer limited improvement when training data are scarce. To address this, we propose a data-centric framework that enhances both the quantity and quality of training data. Specifically, we augment the experimental dataset with two sources: synthetic data generated by Conditional Generative Adversarial Networks (CGANs) and finite element analysis (FEA) results. To ensure reliability, we introduce a filtering mechanism that selects high-quality simulated data for model training. Directly incorporating unfiltered synthetic or FEA data into training can degrade test performance because of noisy or unreliable samples. In contrast, when high-quality FEA data are filtered and selectively combined with experimental data, the model generalizes substantially better, with a 5% increase in R² and only a marginal 0.45% rise in MAPE. The proposed data selection strategy also consistently reduces performance variance across multiple test splits, indicating strong robustness and resistance to overfitting.
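The two reported metrics, R² and MAPE, and the idea of filtering simulated data before training can be illustrated with a minimal sketch. The metric definitions are the standard ones. The filter criterion is an assumption made only for illustration, since the abstract does not state how high-quality samples are selected. Here it is taken to be a relative-error tolerance against a reference estimate, and the names `filter_simulated`, `reference_fn`, and `max_rel_error` are hypothetical.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true, y_pred):
    """Mean absolute percentage error, expressed in percent."""
    return 100.0 * mean(abs((t - p) / t) for t, p in zip(y_true, y_pred))


def filter_simulated(samples, reference_fn, max_rel_error=0.10):
    """Keep simulated (features, capacity) pairs close to a reference estimate.

    ASSUMPTION: this relative-error criterion is hypothetical; the actual
    filtering mechanism of the study is not described in the abstract.
    """
    kept = []
    for features, capacity in samples:
        ref = reference_fn(features)
        if abs(capacity - ref) / abs(ref) <= max_rel_error:
            kept.append((features, capacity))
    return kept


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not data from the study.
    measured = [10.0, 20.0, 30.0]
    predicted = [11.0, 19.0, 30.0]
    print("R2:", r2_score(measured, predicted))
    print("MAPE (%):", mape(measured, predicted))
```

In such a workflow, the retained simulated samples would be appended to the experimental training set only, with the test splits kept purely experimental so that the metrics reflect generalization to measured capacities.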

 

KEYWORDS

Data-centric, Overfitting, Machine learning, CTGAN, CFST columns

