Applying Linear Regression techniques in order to estimate the problem of modeling the compressive strength of concrete.

georgios-kalomitsinis/Concrete-compressive-strength

Concrete-compressive-strength

The compressive strength of concrete determines its quality. It measures the concrete's ability to resist loads that tend to compress it, and it is measured by crushing cylindrical concrete specimens in a compression testing machine. Concrete is the most important material in building construction, and its compressive strength is a non-linear function of its age and components. In this repo, we model the compressive strength as a function of these characteristics using linear regression techniques.

Dataset

The Concrete Compressive Strength DataSet consists of 1030 observations of 9 attributes: 8 input variables and 1 output variable. Seven input variables represent the amount of each raw material (measured in kg/m³) and one represents Age (in days). The target variable is the Concrete Compressive Strength, measured in MPa. The attributes cover factors that affect concrete strength, such as cement, water, coarse and fine aggregate, and fly ash. The dataset is obtained from the UCI Machine Learning Repository.

  • Number of instances - 1030
  • Number of Attributes - 9
    • Attribute breakdown - 8 quantitative inputs, 1 quantitative output
| Attribute | Unit |
|---|---|
| Cement | kg/m³ |
| Blast Furnace Slag | kg/m³ |
| Fly Ash | kg/m³ |
| Water | kg/m³ |
| Superplasticizer | kg/m³ |
| Coarse Aggregate | kg/m³ |
| Fine Aggregate | kg/m³ |
| Age | Days |
| Concrete Compressive Strength | MPa |

Table 1. The features of the Concrete Compressive Strength DataSet.

Modelling and Evaluation

ALGORITHMS

  • Linear regression
  • Lasso regression
  • Ridge regression

METRICS

Since the target variable is continuous, the regression evaluation metrics MSE (Mean Squared Error), MAE (Mean Absolute Error) and MAPE (Mean Absolute Percentage Error) are used.

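$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left\lvert y_i - \hat{y}_i\right\rvert$$

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left\lvert\frac{y_i - \hat{y}_i}{y_i}\right\rvert$$

Here $y_i$ is the measured compressive strength of observation $i$, $\hat{y}_i$ is the predicted value, and $n$ is the number of observations.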
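The three metrics can be computed directly with NumPy. This is a minimal sketch rather than the repository's own code: the function names and the toy values are illustrative, and MAPE is assumed to be reported as a percentage.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean Squared Error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent (assumes no zero targets)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))


# Toy strengths in MPa, made up for illustration only.
y_true = [20.0, 40.0, 50.0]
y_pred = [22.0, 36.0, 50.0]

print(mse(y_true, y_pred), mae(y_true, y_pred), mape(y_true, y_pred))
```

MAPE divides by the true value, so it is only meaningful when no target is zero, which holds for compressive strength measurements.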

Exploratory Data Analysis

The first step is to understand the data and gain insights from it before doing any modelling. This includes checking for missing values, plotting the features against the target variable, and observing the distributions of all the features. Figure 1 displays the correlation between the features as a heatmap. Figure 2 shows a seaborn pairplot, with the pairwise relations between all the features and the distribution of each feature along the diagonal.

Figure 1. Correlation between features.

Figure 2. Visual representation of correlations (pairplot).
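The checks described above can be sketched with pandas and seaborn. This is an illustrative sketch under stated assumptions: the data frame is filled with random numbers standing in for the real dataset, the column names follow Table 1, and `plot_eda` is a hypothetical helper rather than a function from this repo.

```python
import numpy as np
import pandas as pd

# Column names follow Table 1; the values are random placeholders, not real data.
columns = [
    "Cement", "Blast Furnace Slag", "Fly Ash", "Water", "Superplasticizer",
    "Coarse Aggregate", "Fine Aggregate", "Age", "Concrete Compressive Strength",
]
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(1.0, 100.0, size=(50, len(columns))), columns=columns)

# Missing values per column and the pairwise (Pearson) correlation matrix.
missing_per_column = df.isna().sum()
corr = df.corr()


def plot_eda(frame):
    """Hypothetical helper: draw a correlation heatmap and a pairplot."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.heatmap(frame.corr(), annot=True, cmap="coolwarm")
    plt.show()
    sns.pairplot(frame)
    plt.show()
```

Calling `plot_eda(df)` on the real data frame would produce plots of the kind shown in Figures 1 and 2.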

METHODOLOGY

STEP No1

The linear algorithms are tested with different values of the alpha parameter. Specifically:

alpha = [10⁻³, 10⁻², 10⁻¹, 1, 5, 10, 10², 10³]

As the value of the alpha parameter increases, the complexity of the Ridge model increases in both the training and evaluation processes, while the complexity of the Lasso model in the prediction process remains constant. This is because the Ridge model takes all the features of the dataset into account, while the Lasso model performs feature selection: the coefficients of the remaining features are set to zero or shrunk toward zero.
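The difference between the two penalties can be illustrated by counting the non-zero coefficients of each model over the alpha grid. The sketch below uses synthetic data with 8 features (an assumption made so the example is self-contained, not the concrete dataset) and scikit-learn's `Ridge` and `Lasso`.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

alphas = [1e-3, 1e-2, 1e-1, 1.0, 5.0, 10.0, 1e2, 1e3]

# Synthetic regression problem: 8 features, 4 of which are irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
true_coef = np.array([5.0, 3.0, 0.0, -2.0, 0.0, 0.0, 1.0, 0.0])
y = X @ true_coef + rng.normal(scale=0.5, size=200)

nonzero_counts = {"ridge": [], "lasso": []}
for alpha in alphas:
    ridge = Ridge(alpha=alpha).fit(X, y)
    lasso = Lasso(alpha=alpha, max_iter=10_000).fit(X, y)
    # Ridge shrinks coefficients but keeps them non-zero;
    # Lasso can set some of them exactly to zero.
    nonzero_counts["ridge"].append(int(np.sum(ridge.coef_ != 0)))
    nonzero_counts["lasso"].append(int(np.sum(lasso.coef_ != 0)))

for name, counts in nonzero_counts.items():
    print(name, counts)
```

On data like this, Ridge keeps all 8 coefficients for every alpha, while Lasso discards more features as alpha grows.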

STEP No2

To select the optimal value of the alpha parameter, cross-validation was applied. In linear regression, when the model parameters are independent of each other, the log-likelihood function of the parameter vector decomposes into the contribution of each parameter. Thus, by changing the value of alpha, we control the penalty on the coefficients: the higher the value, the stronger the penalty and therefore the smaller the values of the feature coefficients.
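One way to run such a selection is with scikit-learn's `RidgeCV` and `LassoCV`, which cross-validate over a grid of alphas. This is a sketch under assumptions: the data are synthetic, and 5-fold cross-validation is an illustrative choice rather than the setting used in this repo.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV

alphas = [1e-3, 1e-2, 1e-1, 1.0, 5.0, 10.0, 1e2, 1e3]

# Synthetic data standing in for the 8 input variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = X @ np.array([5.0, 3.0, 0.0, -2.0, 0.0, 0.0, 1.0, 0.0]) + rng.normal(scale=0.5, size=200)

# Each estimator fits one model per alpha and keeps the best cross-validated one.
ridge_cv = RidgeCV(alphas=alphas, cv=5).fit(X, y)
lasso_cv = LassoCV(alphas=alphas, cv=5, max_iter=10_000).fit(X, y)

print("best Ridge alpha:", ridge_cv.alpha_)
print("best Lasso alpha:", lasso_cv.alpha_)
```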

The dataset was randomly split into 70% for training and 30% for testing. This process was repeated 10 times, and the average and standard deviation of each metric were calculated.
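The repeated 70/30 split can be sketched as follows. The helper `repeated_holdout` is a hypothetical name, the data are synthetic, and using the repetition index as the random seed is an assumption made for reproducibility.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split


def repeated_holdout(model, X, y, n_repeats=10, test_size=0.3):
    """Refit `model` on n_repeats random splits; return (mean, std) per metric."""
    scores = {"MSE": [], "MAE": [], "MAPE": []}
    for seed in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed
        )
        y_hat = model.fit(X_tr, y_tr).predict(X_te)
        scores["MSE"].append(mean_squared_error(y_te, y_hat))
        scores["MAE"].append(mean_absolute_error(y_te, y_hat))
        # MAPE in percent; assumes strictly positive targets.
        scores["MAPE"].append(100.0 * np.mean(np.abs((y_te - y_hat) / y_te)))
    return {k: (float(np.mean(v)), float(np.std(v))) for k, v in scores.items()}


# Synthetic data with positive targets, for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(300, 8))
y = 30.0 + X @ np.arange(1.0, 9.0) + rng.normal(scale=1.0, size=300)

summary = repeated_holdout(Ridge(alpha=1.0), X, y)
for metric, (mean, std) in summary.items():
    print(f"{metric}: {mean:.3f} +/- {std:.3f}")
```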

STEP No3

Given the non-linearity of the function we are trying to model, it is worth evaluating more expressive linear regression models with polynomial terms of the features. For this reason, a function

test_poly_regression(X_train, y_train, X_test, y_test, n >= 2)

was implemented. Specifically:

  • Inputs

    • X_train: training set
    • y_train: labels of the training set
    • X_test: testing set
    • y_test: labels of the testing set
    • n: degree of polynomial 𝑛≥2
  • Outputs

    • A new set of features consisting of the original features and their versions raised to powers up to 𝑛.
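A function along these lines could look as follows. This is a sketch, not the repository's implementation: `expand_powers` and `poly_regression_sketch` are hypothetical names, the expansion is assumed to contain pure powers only (no interaction terms), and returning the test MSE alongside the expanded features is an assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error


def expand_powers(X, n):
    """Stack X, X**2, ..., X**n column-wise (no interaction terms)."""
    if n < 2:
        raise ValueError("n must be >= 2")
    X = np.asarray(X, dtype=float)
    return np.hstack([X ** k for k in range(1, n + 1)])


def poly_regression_sketch(X_train, y_train, X_test, y_test, n=2):
    """Fit linear regression on power-expanded features; return them and test MSE."""
    X_train_poly = expand_powers(X_train, n)
    X_test_poly = expand_powers(X_test, n)
    model = LinearRegression().fit(X_train_poly, y_train)
    test_mse = mean_squared_error(y_test, model.predict(X_test_poly))
    return X_train_poly, X_test_poly, test_mse


# Synthetic target with a quadratic term, for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = 2.0 * X[:, 0] ** 2 - X[:, 1] + rng.normal(scale=0.1, size=200)
X_train, X_test, y_train, y_test = X[:140], X[140:], y[:140], y[140:]

X_train_poly, X_test_poly, poly_mse = poly_regression_sketch(
    X_train, y_train, X_test, y_test, n=2
)
linear_mse = mean_squared_error(
    y_test, LinearRegression().fit(X_train, y_train).predict(X_test)
)
print("linear MSE:", linear_mse, "degree-2 MSE:", poly_mse)
```

If interaction terms between features are also wanted, scikit-learn's `PolynomialFeatures` generates them.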

Dependencies

Install all the necessary dependencies using pip3 install <package name>

Required packages:

- numpy (Version >= 1.19.4)
- matplotlib (Version >= 3.4.3)
- scikit-learn (Version >= 0.22.2)
- seaborn (Version >= 0.10.1)
- pandas (Version >= 1.0.3)

License

This project is licensed under the MIT license.
