How I Built and Tuned a Machine Learning Model to Predict Laptop Prices


🧠 Introduction

As a computer engineering undergraduate passionate about machine learning (ML), I embarked on a practical project: predicting the price of laptops based on their specifications. The aim was not only to build a regression model but to truly understand each phase of the ML workflow — from problem analysis to model evaluation, and ultimately toward web deployment.

This post details the entire journey: technical challenges, tools used, key learnings, and future possibilities.


📌 Problem Statement

The goal of this project was to create a predictive model that can estimate a laptop’s market price (in LKR) given a set of features such as:

  • Brand (Company)

  • Processor (CPU)

  • Graphics card (GPU)

  • RAM (in GB)

  • Operating System (OpSys)

  • Weight (in kg)

  • Screen characteristics

  • Touchscreen and IPS support

The target variable was Price_LKR, making this a supervised regression problem.


🧼 Data Preprocessing & Feature Engineering

The dataset contained a mix of categorical, numerical, and textual fields. Here’s how I prepared it for modeling:

1. Feature Cleaning

  • Converted RAM strings like "8GB" to the integer 8 using .str.replace() and .astype(int)

  • Parsed weight strings like "1.37kg" into the float 1.37

  • Simplified CPU/GPU info (e.g., extracted brand like Intel, AMD)

  • Grouped OS types: Windows, Mac, Linux, Chrome, Android
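The cleaning steps above can be sketched with pandas on a toy frame (the column names Ram, Weight, and Cpu are assumptions about the dataset's schema):

```python
import pandas as pd

# Toy frame mimicking the raw dataset (column names are assumptions)
df = pd.DataFrame({
    "Ram": ["8GB", "16GB"],
    "Weight": ["1.37kg", "2.1kg"],
    "Cpu": ["Intel Core i5 2.3GHz", "AMD Ryzen 5 3.0GHz"],
})

# "8GB" -> 8
df["Ram"] = df["Ram"].str.replace("GB", "").astype(int)
# "1.37kg" -> 1.37
df["Weight"] = df["Weight"].str.replace("kg", "").astype(float)
# Keep only the CPU brand (first token, e.g. "Intel", "AMD")
df["Cpu_brand"] = df["Cpu"].str.split().str[0]
```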

2. Categorical Encoding

To handle string labels, I used:

pd.get_dummies(data, dtype=int)

This converted all categorical variables into binary (0/1) columns suitable for modeling.
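A small illustration of what that call produces (toy data, not the real dataset):

```python
import pandas as pd

data = pd.DataFrame({"OpSys": ["Windows", "Mac", "Linux"], "Ram": [8, 16, 8]})
encoded = pd.get_dummies(data, dtype=int)

# Numeric columns pass through untouched; each OS label
# becomes its own 0/1 indicator column.
print(sorted(encoded.columns))
# ['OpSys_Linux', 'OpSys_Mac', 'OpSys_Windows', 'Ram']
```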


🏗️ Model Building & Evaluation

The cleaned dataset was split into training and test sets using train_test_split() with 20% held out for testing.
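That split looks roughly like this (X and y here are stand-ins for the encoded feature matrix and Price_LKR target):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)   # stand-in feature matrix
y = np.arange(50)                   # stand-in target (Price_LKR)

# 20% held out for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```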

I experimented with several regression algorithms from scikit-learn:


🌲 What is Random Forest Regression?

Random Forest Regression is a powerful and widely used machine learning algorithm that belongs to the ensemble learning family. It combines the predictions of multiple decision trees to produce a more accurate and stable output.

🔎 How Does It Work?

  • Random Forest creates many decision trees using random subsets of the training data and features.

  • Each tree gives its own prediction for the output.

  • The final result is the average of all tree predictions (for regression tasks).

This helps reduce overfitting, a common issue in decision trees, and gives better generalization on unseen data.

🛠 Why I Used It in My Project:

I chose RandomForestRegressor from scikit-learn because:

  • It works well with both numerical and categorical (one-hot encoded) features.

  • It handles large feature spaces effectively (like our 44-feature input).

  • It doesn’t require feature scaling.

  • It’s robust to outliers and noise in the dataset.
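Fitting the model itself is short; the sketch below uses synthetic data in place of the real laptop features, just to show the fit/predict cycle:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(200, 5))                          # stand-in features
y = 100_000 + 50_000 * X[:, 0] + rng.normal(0, 1_000, 200)    # toy "price"

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X, y)

# Each prediction is the average over all 100 trees' outputs
preds = model.predict(X[:5])
```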

🔍 Hyperparameter Tuning with GridSearchCV

To optimize RandomForestRegressor, I used:

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

grid = GridSearchCV(estimator=RandomForestRegressor(random_state=42), param_grid={
    'n_estimators': [10, 50, 100],
    'criterion': ['squared_error', 'absolute_error', 'poisson']
}, cv=5)
grid.fit(X_train, y_train)  # refits the best combination on the full training set

🧪 Final Evaluation

The best-performing model was:

RandomForestRegressor(criterion='absolute_error')

It achieved:

  • R² Score: 0.82 — explains 82% of price variation

  • MAE: Low, acceptable error in the LKR price range
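Both metrics come straight from scikit-learn; a tiny worked example (the numbers are made up, not my actual test-set results):

```python
import numpy as np
from sklearn.metrics import r2_score, mean_absolute_error

# Hypothetical true vs. predicted prices in LKR
y_test = np.array([150_000, 220_000, 310_000])
y_pred = np.array([145_000, 230_000, 300_000])

r2 = r2_score(y_test, y_pred)             # fraction of variance explained
mae = mean_absolute_error(y_test, y_pred) # average error in LKR
```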


⚙️ ML Pipeline Summary




🚧 Issues Faced




🚀 Laptop Price Predictor Web App Completed

web application interface

🔑 Key Features:

  • Web-based form interface for user input

  • Real-time price prediction

  • Machine learning model trained on a labeled dataset

  • Hosted locally with Flask (deployment-ready!)

🔧 Tech Stack:

  • Python 🐍

  • Pandas, NumPy, scikit-learn

  • Flask (backend)

  • HTML + CSS (frontend)
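A minimal sketch of how such a Flask backend can serve predictions. The route, form field names, and feature order here are assumptions for illustration, not the app's actual code, and the trained model is stubbed out with a placeholder:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    form = request.form
    # Feature order must match the order used at training time
    features = [[int(form["ram"]), float(form["weight"])]]
    # price = model.predict(features)[0]  # `model` would be loaded (e.g. pickle) at startup
    price = 0.0  # placeholder so the sketch runs without a trained model
    return jsonify({"price_lkr": price})
```

In the real app the form values would be one-hot encoded exactly as during training before being passed to the model; run locally with `flask run`.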

📦 Additional Features

  • Add GPU benchmarks or rating scores

  • Build a recommendation engine

  • Integrate user reviews (sentiment analysis)

  • Keep updating the model with new data


🧠 Reflections

This project gave me a firsthand understanding of what real ML work involves — not just model building, but data wrangling, debugging, evaluation, and iteration. I learned how preprocessing and clean data are the foundations of meaningful predictions.

As a machine learning beginner, this project helped me internalize how critical data understanding and pipeline structuring are in any ML workflow.


📣 Final Words

This is just the beginning of my ML journey. I hope this blog encourages other students and beginners to explore, build, break, fix, and share their work.

👉 Stay tuned — I’m currently working on deploying this model as a web app.
🔗 Connect with me on GitHub!
💬 Connect with me on LinkedIn!
