How I Built and Tuned a Machine Learning Model to Predict Laptop Prices


🧠 Introduction

As a computer engineering undergraduate passionate about machine learning (ML), I embarked on a practical project: predicting the price of laptops based on their specifications. The aim was not only to build a regression model but to truly understand each phase of the ML workflow — from problem analysis to model evaluation, and ultimately toward web deployment.

This post details the entire journey: technical challenges, tools used, key learnings, and future possibilities.


📌 Problem Statement

The goal of this project was to create a predictive model that can estimate a laptop’s market price (in LKR) given a set of features such as:

  • Brand (Company)

  • Processor (CPU)

  • Graphics card (GPU)

  • RAM (in GB)

  • Operating System (OpSys)

  • Weight (in kg)

  • Screen characteristics

  • Touchscreen and IPS support

The target variable was Price_LKR, making this a supervised regression problem.


🧼 Data Preprocessing & Feature Engineering

The dataset contained a mix of categorical, numerical, and textual fields. Here’s how I prepared it for modeling:

1. Feature Cleaning

  • Converted RAM strings like "8GB" to the integer 8 using .str.replace() and .astype(int)

  • Parsed weight strings like "1.37kg" into the float 1.37

  • Simplified CPU/GPU info (e.g., extracted brand like Intel, AMD)

  • Grouped OS types: Windows, Mac, Linux, Chrome, Android
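The cleaning steps above can be sketched with pandas on a toy frame (the column names Ram, Weight, and Cpu are assumptions about the dataset's schema):

```python
import pandas as pd

# Toy frame mimicking the raw dataset (column names are assumptions)
df = pd.DataFrame({
    "Ram": ["8GB", "16GB"],
    "Weight": ["1.37kg", "2.1kg"],
    "Cpu": ["Intel Core i5 2.3GHz", "AMD Ryzen 5 3.0GHz"],
})

# "8GB" -> 8
df["Ram"] = df["Ram"].str.replace("GB", "").astype(int)
# "1.37kg" -> 1.37
df["Weight"] = df["Weight"].str.replace("kg", "").astype(float)
# Keep only the CPU brand (first token, e.g. "Intel", "AMD")
df["Cpu_brand"] = df["Cpu"].str.split().str[0]
```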

2. Categorical Encoding

To handle string labels, I used:

pd.get_dummies(data, dtype=int)

This converted all categorical variables into binary (0/1) columns suitable for modeling.
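A small illustration of what that call produces (toy data, not the real dataset):

```python
import pandas as pd

data = pd.DataFrame({"OpSys": ["Windows", "Mac", "Linux"], "Ram": [8, 16, 8]})
encoded = pd.get_dummies(data, dtype=int)

# Numeric columns pass through untouched; each OS label
# becomes its own 0/1 indicator column.
print(sorted(encoded.columns))
# ['OpSys_Linux', 'OpSys_Mac', 'OpSys_Windows', 'Ram']
```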


🏗️ Model Building & Evaluation

The cleaned dataset was split into training and test sets using train_test_split() with 20% held out for testing.
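That split looks roughly like this (X and y here are stand-ins for the encoded feature matrix and Price_LKR target):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)   # stand-in feature matrix
y = np.arange(50)                   # stand-in target (Price_LKR)

# 20% held out for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```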

I experimented with several regression algorithms from scikit-learn:


🌲 What is Random Forest Regression?

Random Forest Regression is a powerful and widely used machine learning algorithm that belongs to the ensemble learning family. It combines the predictions of multiple decision trees to produce a more accurate and stable output.

🔎 How Does It Work?

  • Random Forest creates many decision trees using random subsets of the training data and features.

  • Each tree gives its own prediction for the output.

  • The final result is the average of all tree predictions (for regression tasks).

This helps reduce overfitting, a common issue in decision trees, and gives better generalization on unseen data.

🛠 Why I Used It in My Project:

I chose RandomForestRegressor from scikit-learn because:

  • It works well with both numerical and categorical (one-hot encoded) features.

  • It handles large feature spaces effectively (like our 44-feature input).

  • It doesn’t require feature scaling.

  • It’s robust to outliers and noise in the dataset.
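Fitting the model itself is short; the sketch below uses synthetic data in place of the real laptop features, just to show the fit/predict cycle:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(200, 5))                          # stand-in features
y = 100_000 + 50_000 * X[:, 0] + rng.normal(0, 1_000, 200)    # toy "price"

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X, y)

# Each prediction is the average over all 100 trees' outputs
preds = model.predict(X[:5])
```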

🔍 Hyperparameter Tuning with GridSearchCV

To optimize RandomForestRegressor, I used:

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

grid = GridSearchCV(estimator=RandomForestRegressor(random_state=42), param_grid={
    'n_estimators': [10, 50, 100],
    'criterion': ['squared_error', 'absolute_error', 'poisson']
}, cv=5)
grid.fit(X_train, y_train)  # refits the best combination on the full training set

🧪 Final Evaluation

The best-performing model was:

RandomForestRegressor(criterion='absolute_error')

It achieved:

  • R² Score: 0.82 — explains 82% of price variation

  • MAE: Low, acceptable error in the LKR price range
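Both metrics come straight from scikit-learn; a tiny worked example (the numbers are made up, not my actual test-set results):

```python
import numpy as np
from sklearn.metrics import r2_score, mean_absolute_error

# Hypothetical true vs. predicted prices in LKR
y_test = np.array([150_000, 220_000, 310_000])
y_pred = np.array([145_000, 230_000, 300_000])

r2 = r2_score(y_test, y_pred)             # fraction of variance explained
mae = mean_absolute_error(y_test, y_pred) # average error in LKR
```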


⚙️ ML Pipeline Summary




🚧 Issues Faced




🚀 Laptop Price Predictor Web App Completed

web application interface

🔑 Key Features:

  • Web-based form interface for user input

  • Real-time price prediction

  • Machine learning model trained on a labeled dataset

  • Hosted locally with Flask (deployment-ready!)

🔧 Tech Stack:

  • Python 🐍

  • Pandas, NumPy, scikit-learn

  • Flask (backend)

  • HTML + CSS (frontend)
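A minimal sketch of how such a Flask backend can serve predictions. The route, form field names, and feature order here are assumptions for illustration, not the app's actual code, and the trained model is stubbed out with a placeholder:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    form = request.form
    # Feature order must match the order used at training time
    features = [[int(form["ram"]), float(form["weight"])]]
    # price = model.predict(features)[0]  # `model` would be loaded (e.g. pickle) at startup
    price = 0.0  # placeholder so the sketch runs without a trained model
    return jsonify({"price_lkr": price})
```

In the real app the form values would be one-hot encoded exactly as during training before being passed to the model; run locally with `flask run`.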

📦 Additional Features

  • Add GPU benchmarks or rating scores

  • Build a recommendation engine

  • Integrate user reviews (sentiment analysis)

  • Keep updating the model with new data


🧠 Reflections

This project gave me a firsthand understanding of what real ML work involves — not just model building, but data wrangling, debugging, evaluation, and iteration. I learned how preprocessing and clean data are the foundations of meaningful predictions.

As a machine learning beginner, this project helped me internalize how critical data understanding and pipeline structuring are in any ML workflow.


📣 Final Words

This is just the beginning of my ML journey. I hope this blog encourages other students and beginners to explore, build, break, fix, and share their work.

👉 Stay tuned — I’m currently working on deploying this model as a web app.
🔗 Connect with me on GitHub!
💬 Connect with me on LinkedIn!
