How I Built and Tuned a Machine Learning Model to Predict Laptop Prices
🧠 Introduction
As a computer engineering undergraduate passionate about machine learning (ML), I embarked on a practical project: predicting the price of laptops based on their specifications. The aim was not only to build a regression model but to truly understand each phase of the ML workflow — from problem analysis to model evaluation, and ultimately toward web deployment.
This post details the entire journey: technical challenges, tools used, key learnings, and future possibilities.
📌 Problem Statement
The goal of this project was to create a predictive model that can estimate a laptop’s market price (in LKR) given a set of features such as:
- Brand (Company)
- Processor (CPU)
- Graphics card (GPU)
- RAM (in GB)
- Operating System (OpSys)
- Weight (in kg)
- Screen characteristics
- Touchscreen and IPS support
The target variable was Price_LKR, a continuous value, which makes this a supervised regression problem.
🧼 Data Preprocessing & Feature Engineering
The dataset contained a mix of categorical, numerical, and textual fields. Here’s how I prepared it for modeling:
1. Feature Cleaning
- Converted "8GB" → 8 using .str.replace() and .astype(int)
- Parsed weights like "1.37kg" → 1.37 as float
- Simplified CPU/GPU info (e.g., extracted the brand, such as Intel or AMD)
- Grouped OS types: Windows, Mac, Linux, Chrome, Android
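The cleaning steps above can be sketched roughly as follows. This is a minimal illustration on a toy frame, not the project's actual code: the raw column names (Ram, Weight, Cpu, OpSys), the sample values, and the OS grouping rules in group_os are all assumptions made for the example.

```python
import pandas as pd

# Toy rows standing in for the raw data; column names and values are assumed.
raw = pd.DataFrame({
    "Ram": ["8GB", "16GB", "4GB"],
    "Weight": ["1.37kg", "2.1kg", "1.25kg"],
    "Cpu": ["Intel Core i5 2.3GHz", "AMD Ryzen 5 2.1GHz", "Intel Celeron 1.1GHz"],
    "OpSys": ["macOS", "Windows 10", "Chrome OS"],
})


def group_os(name: str) -> str:
    """Map a raw OS string to a coarse family (hypothetical matching rules)."""
    lowered = name.lower()
    families = [
        ("windows", "Windows"),
        ("mac", "Mac"),
        ("linux", "Linux"),
        ("chrome", "Chrome"),
        ("android", "Android"),
    ]
    for keyword, family in families:
        if keyword in lowered:
            return family
    return "Other"


clean = pd.DataFrame({
    # "8GB" -> 8
    "Ram": raw["Ram"].str.replace("GB", "", regex=False).astype(int),
    # "1.37kg" -> 1.37
    "Weight": raw["Weight"].str.replace("kg", "", regex=False).astype(float),
    # Keep only the first token of the CPU string as the brand
    "CpuBrand": raw["Cpu"].str.split().str[0],
    # Collapse detailed OS names into a handful of families
    "OpSys": raw["OpSys"].map(group_os),
})
print(clean)
```

Passing regex=False keeps the replacement a literal string match, so unit suffixes are stripped without any pattern surprises.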
2. Categorical Encoding
To handle string labels, I used:
pd.get_dummies(data, dtype=int)
This converted all categorical variables into binary (0/1) columns suitable for modeling.
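To make the encoding step concrete, here is a small sketch on a toy frame (the column names and values are assumptions). It also shows one possible way to encode a single new row so that its columns line up with the training columns, which matters once predictions are made on fresh input.

```python
import pandas as pd

# Toy data; the column names are assumptions for illustration.
data = pd.DataFrame({
    "Company": ["Dell", "Apple", "Dell"],
    "OpSys": ["Windows", "Mac", "Linux"],
    "Ram": [8, 16, 4],
})

# Text columns become 0/1 indicator columns; numeric columns pass through.
encoded = pd.get_dummies(data, dtype=int)
print(encoded.columns.tolist())

# A single new row only yields dummy columns for the categories it contains,
# so reindex it against the training columns and fill the gaps with 0.
new_row = pd.DataFrame({"Company": ["Apple"], "OpSys": ["Windows"], "Ram": [8]})
new_encoded = pd.get_dummies(new_row, dtype=int).reindex(
    columns=encoded.columns, fill_value=0
)
print(new_encoded)
```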
🏗️ Model Building & Evaluation
The cleaned dataset was split into training and test sets using train_test_split(), with 20% held out for testing.
I experimented with several regression algorithms from scikit-learn.
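A minimal sketch of the split-and-fit step is shown below. The data is synthetic (random features with a made-up price relationship), and the random_state and n_estimators values are assumptions chosen only to make the example reproducible.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded feature matrix and LKR prices.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 5))
y = 100_000 + 50_000 * X[:, 0] + rng.normal(0, 1_000, size=200)

# Hold out 20% of the rows for testing, as described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestRegressor(n_estimators=50, random_state=42)
model.fit(X_train, y_train)

print(X_train.shape, X_test.shape)
# score() returns R² for regressors, computed here on the held-out rows.
print(round(model.score(X_test, y_test), 3))
```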
🌲 What is Random Forest Regression?
Random Forest Regression is a powerful and widely used machine learning algorithm that belongs to the ensemble learning family. It combines the predictions of multiple decision trees to produce a more accurate and stable output.
🔎 How Does It Work?
- Random Forest creates many decision trees using random subsets of the training data and features.
- Each tree gives its own prediction for the output.
- The final result is the average of all tree predictions (for regression tasks).
This helps reduce overfitting, a common issue in decision trees, and gives better generalization on unseen data.
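The averaging step can be checked directly in scikit-learn. The sketch below uses synthetic data (an assumption for illustration) and compares a manual average over the fitted trees, exposed through the estimators_ attribute, with the forest's own prediction.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data, used only to have something to fit.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(100, 3))
y = 10 * X[:, 0] + rng.normal(0, 0.5, size=100)

forest = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)

# One row of predictions per tree, for the first five samples.
per_tree = np.stack([tree.predict(X[:5]) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)

# The forest's prediction should match the mean of its trees.
print(np.allclose(manual_average, forest.predict(X[:5])))
```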
🛠 Why I Used It in My Project:
I chose RandomForestRegressor from scikit-learn because:
- It works well with both numerical and categorical (one-hot encoded) features.
- It handles large feature spaces effectively (like our 44-feature input).
- It doesn’t require feature scaling.
- It’s robust to outliers and noise in the dataset.
🔍 Hyperparameter Tuning with GridSearchCV
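To optimize RandomForestRegressor, I used:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
rf = RandomForestRegressor()
# Try 3 forest sizes x 3 split criteria, each scored with 5-fold cross-validation
GridSearchCV(estimator=rf, param_grid={
    'n_estimators': [10, 50, 100],
    'criterion': ['squared_error', 'absolute_error', 'poisson']
}, cv=5)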
🧪 Final Evaluation
The best-performing model was:
RandomForestRegressor(criterion='absolute_error')
It achieved:
- R² Score: 0.82, meaning the model explains about 82% of the variance in price
- MAE: low, an acceptable error relative to the LKR price range
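To show what these two metrics measure, here is a small dependency-free sketch. The helper names mae and r2 and the three sample prices are made up for the illustration; the formulas follow the standard definitions that sklearn.metrics also implements.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """R²: 1 minus (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up prices in LKR, each prediction off by 10,000.
actual = [100_000, 200_000, 300_000]
predicted = [110_000, 190_000, 310_000]

print(mae(actual, predicted))           # 10000.0
print(round(r2(actual, predicted), 3))  # 0.985
```

MAE is expressed in the same unit as the target (LKR here), which is why it is easiest to judge against the typical price range, while R² is unitless.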
⚙️ ML Pipeline Summary
🚧 Issues Faced
Laptop Price Predictor Web App Completed
Key Features:
- Web-based form interface for user input
- Real-time price prediction
- Machine Learning model trained on a labeled dataset
- Hosted locally with Flask (deployment-ready!)
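As a rough idea of how a form-driven prediction endpoint might look in Flask, consider the sketch below. It is not the app's actual code: the /predict route, the ram_gb and weight_kg field names, and the predict_price stand-in (a made-up formula in place of the trained model) are all assumptions.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_price(ram_gb: int, weight_kg: float) -> float:
    """Stand-in for the trained model; the formula is invented for this sketch."""
    return 50_000.0 + 10_000.0 * ram_gb - 5_000.0 * weight_kg


@app.route("/predict", methods=["POST"])
def predict():
    # type= converts the submitted string and yields None if it is missing
    # or cannot be converted.
    ram_gb = request.form.get("ram_gb", type=int)
    weight_kg = request.form.get("weight_kg", type=float)
    if ram_gb is None or weight_kg is None:
        return jsonify(error="ram_gb and weight_kg are required"), 400
    return jsonify(price_lkr=predict_price(ram_gb, weight_kg))
```

In a real app, predict_price would load the saved model and apply the same preprocessing and one-hot encoding used during training before calling its predict method.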
🔧 Tech Stack:
- Python 🐍
- Pandas, NumPy, Scikit-learn
- Flask (backend)
- HTML + CSS (frontend)
📦 Additional Features
- Add GPU benchmarks or rating scores
- Build a recommendation engine
- Integrate user reviews (sentiment analysis)
- Keep updating the model with new data
🧠 Reflections
This project gave me a firsthand understanding of what real ML work involves — not just model building, but data wrangling, debugging, evaluation, and iteration. I learned how preprocessing and clean data are the foundations of meaningful predictions.
As a machine learning beginner, this project helped me internalize how critical data understanding and pipeline structuring are in any ML workflow.
📣 Final Words
This is just the beginning of my ML journey. I hope this blog encourages other students and beginners to explore, build, break, fix, and share their work.
👉 Stay tuned — I’m currently working on deploying this model as a web app.
🔗 Connect with me on: GitHub
💬 Connect with me on: LinkedIn