Hi, I'm Harish
23 | Data Science | Software Engineering | AI & Machine Learning | Generative AI
Open My Resume | Contact

About
My Introduction
I am a final-year Integrated MTech student specializing in Data Science at VIT, with a strong foundation in Python, Machine Learning, NLP, Computer Vision, and Data Analytics. I have hands-on experience in developing and deploying AI/ML models, building data-driven solutions, and extracting meaningful insights from large-scale datasets. Passionate about leveraging cutting-edge AI frameworks and advanced analytics, I thrive on solving complex real-world problems and contributing to innovative R&D projects. If you're looking for someone to drive impactful AI initiatives and turn data into actionable intelligence, let's connect!
Experience
My journey on the academic & professional front

(Integrated) Master of Technology - Computer Science and Engineering with Specialization in Data Science
Vellore Institute of Technology, India | CGPA: 7.8

Class XII
Maths, Physics, Chemistry, Computer Science
State Board of Secondary Education, Tamil Nadu | THIRU G V C Higher Secondary School, India | Percentage: 79%

Class X

State Board of Secondary Education, Tamil Nadu | St Joseph Matriculation Higher Secondary School, India | Percentage: 87.8%

Software Developer Trainee
FactEntry Data Solutions (A SIX Company)

Associate Software Engineer - Trainee Intern
MIMASOFT Technologies Private Limited | Ref Link

Skills
My technical & other skills

Data Science & AI
Machine Learning
Deep Learning
Generative AI
Probability & Statistics
Computer Vision
Natural Language Processing
Data Mining
Configuring and Fine-tuning Ollama Models
AI/ML Frameworks & Libraries : TensorFlow, PyTorch, Scikit-learn
Programming & Backend Development
Python
SQL
C++/C
Python-Flask / Python-Django
FastAPI
Docker
Kubernetes
Computing & Cloud
GPU & Distributed Computing
Google Cloud Platform (GCP)
Microsoft Azure
Data Engineering & ETL
Data Preparation
ETL Pipelines
Data Warehousing
SQL (MySQL)
NoSQL (MongoDB)
Data Lakes
ETL Tools
Data Analytics & Visualization
Power BI
Excel
Business Analytics
Data Storytelling
DevOps & Version Control
Git / GitLab / GitHub
CI/CD
Automated Testing & Deployment
Projects
My projects and works

AI-Powered Document Q&A and Data Extraction System
FactEntry Data Solutions (A SIX Company)
Objective:
Developed an AI-driven RAG system for extracting structured data from scanned documents, including tables, forms, and text.
Key Contributions:
• Integrated Google Generative AI Embeddings (Google API) and LLaMA 3.3 (Groq API) for efficient retrieval and response generation.
• Implemented OCR-based text extraction using PyTesseract, enhanced with OpenCV preprocessing for improved accuracy.
• Leveraged Hugging Face LayoutLM for extracting data from complex layouts, including borderless tables and unstructured forms.
• Developed a Streamlit web interface for real-time document analysis and deployed it securely via ngrok.
Impact:
Automated document processing, reducing manual effort while improving data accuracy and retrieval speed.
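The retrieval step of a RAG pipeline like the one above can be sketched with plain cosine similarity; the embedding vectors and chunk texts here are toy stand-ins for the Google Generative AI embeddings used in the actual system:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, chunks, top_k=1):
    """Return the text of the top_k chunks most similar to the query."""
    scored = sorted(chunks, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return [c["text"] for c in scored[:top_k]]

# Toy document chunks with pre-computed (illustrative) embeddings:
chunks = [
    {"text": "Invoice total: $1,200", "vec": [0.9, 0.1, 0.0]},
    {"text": "Shipping address form", "vec": [0.1, 0.8, 0.2]},
]
print(retrieve([0.85, 0.15, 0.0], chunks))  # ['Invoice total: $1,200']
```

In the full system, the retrieved chunks would be passed as context to the LLM (LLaMA 3.3 via the Groq API) for answer generation.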
Hate Speech and Harmful Content Classification Using LLM
Vellore Institute of Technology | Capstone
Objective:
Developed an AI-powered cybersecurity system for detecting hate speech and harmful content on the web and social media using NLP and LLMs.
Key Contributions:
• Designed a hybrid approach integrating ML/DL models with a fine-tuned LLM for advanced content moderation.
• Experimented with LLaMA 3.3, 3.2, DeepSeek 7B/14B, and WizardLM 7B via Ollama to optimize detection.
• Automated web scraping, text preprocessing, and structured content extraction.
• Deployed a real-time monitoring system for detecting and flagging harmful content.
Impact:
Enhanced cybersecurity and content moderation, reducing manual effort while improving detection accuracy.
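Classification through a locally served Ollama model can be sketched as below; the model name is one of those experimented with above, the endpoint is Ollama's default `/api/generate`, and the label set is illustrative (the call itself requires a running Ollama server, so it is defined but not invoked here):

```python
import json
import urllib.request

def build_prompt(text):
    # Ask for a single label so downstream parsing stays trivial.
    return (
        "Classify the following text as HATE, HARMFUL, or SAFE. "
        "Reply with the label only.\n\nText: " + text
    )

def classify(text, model="llama3.2", host="http://localhost:11434"):
    """POST to Ollama's /api/generate endpoint (needs a local Ollama server)."""
    payload = json.dumps(
        {"model": model, "prompt": build_prompt(text), "stream": False}
    ).encode()
    req = urllib.request.Request(
        host + "/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"].strip()

# classify("example post")  # -> "SAFE" / "HATE" / "HARMFUL" from the model
```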
Language Distribution & Document Clustering Tool
FactEntry Data Solutions (A SIX Company)
Objective:
Developed and deployed Language Distribution and Document Layout Clustering tools, releasing basic versions on PyPI and advanced versions for FactEntry.
Key Contributions:
• Used Tesseract OCR, LangDetect, and StableLM to analyze PDFs, providing language distribution breakdowns (e.g., 60% English, 30% Spanish).
• Built a clustering system using VGG16, ResNet, and PCA-based fine-tuning for document classification.
• Applied OCR and deep learning for accurate data extraction from scanned financial documents.
• Integrated LLaMA Vision for enhanced image analysis, improving document ranking and content grouping.
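The language-distribution breakdown step can be sketched as a frequency count over per-page labels; in the actual tool the labels come from LangDetect on OCR'd text, while here they are hard-coded for illustration:

```python
from collections import Counter

def language_distribution(page_languages):
    """Turn per-page language labels (e.g. from langdetect) into percentages."""
    counts = Counter(page_languages)
    total = sum(counts.values())
    return {lang: round(100 * n / total, 1) for lang, n in counts.most_common()}

# Labels as langdetect would emit them, one per OCR'd page:
pages = ["en", "en", "en", "es", "es", "de"]
print(language_distribution(pages))  # {'en': 50.0, 'es': 33.3, 'de': 16.7}
```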
Disease Prediction using Ensemble Learning
SMART INTERNZ
Objective:
Engineered a disease prediction model using Python and machine learning algorithms such as Random Forest and SVM, achieving high accuracy in clinical trial predictions based on extensive patient historical data.
Key Contributions:
• Stored data in MongoDB Atlas, retrieved it with PyMongo, and transformed it into a binary dataset for analysis.
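The binary-dataset transformation mentioned above can be sketched as mapping each patient document to a 0/1 feature row; the field names and symptom vocabulary here are illustrative, and the document would come from a PyMongo query in the real pipeline:

```python
def to_binary_row(record, symptom_vocab):
    """Map a patient document (as fetched with PyMongo) to a 0/1 feature row."""
    present = set(record.get("symptoms", []))
    return [1 if s in present else 0 for s in symptom_vocab]

# Illustrative vocabulary and a document shaped like a MongoDB record:
vocab = ["fever", "cough", "fatigue", "headache"]
doc = {"patient_id": 7, "symptoms": ["cough", "headache"]}
print(to_binary_row(doc, vocab))  # [0, 1, 0, 1]
```

Rows built this way feed directly into scikit-learn classifiers such as Random Forest and SVM.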
Real-Time Room Temperature & Humidity Visualization (AR)
J Component for ARVR Course
• Engineered a real-time AR system integrating the MQTT protocol to capture and visualize environmental data (temperature and humidity) using Unity-based 3D models.
• Enhanced user engagement with actionable ML-driven insights and provided interactive climate reporting.
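The data-capture side of such a system boils down to decoding sensor messages as they arrive over MQTT; a paho-mqtt `on_message` callback would hand `msg.payload` to a parser like this (the JSON field names are illustrative and depend on the deployment):

```python
import json

def parse_telemetry(payload: bytes):
    """Decode a sensor reading as it might arrive on an MQTT topic."""
    data = json.loads(payload.decode())
    return float(data["temperature"]), float(data["humidity"])

# In a paho-mqtt on_message callback, msg.payload would be passed in here:
temp, hum = parse_telemetry(b'{"temperature": 24.5, "humidity": 61}')
print(temp, hum)  # 24.5 61.0
```

The decoded values would then drive the Unity 3D visualization on the AR side.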
Extractive Summarization & Information Retrieval
J Component for Web Mining Course
• Designed an advanced web scraping and information retrieval system leveraging natural language processing (NLP).
• Enabled the system to understand user queries in English, translate them into SQL, and extract targeted data with 95% accuracy from multiple websites.
• Stored extracted data in a relational database (MySQL/SQLite3) via Python.
• Automated user queries in English to retrieve relevant data efficiently.
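The English-to-SQL idea above can be sketched as a template mapper; real systems like the one described use NLP models, whereas this toy version only handles one "show <column> where <column> is <value>" phrasing, and the table name is a hypothetical default:

```python
import re

def query_to_sql(question, table="articles"):
    """Toy mapper from a narrow class of English questions to SQL."""
    m = re.match(r"show (\w+) where (\w+) is (\w+)", question.lower())
    if not m:
        return None  # phrasing not understood
    col, key, val = m.groups()
    return f"SELECT {col} FROM {table} WHERE {key} = '{val}'"

print(query_to_sql("show title where topic is sports"))
# SELECT title FROM articles WHERE topic = 'sports'
```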
Application Tracking System
JP INFOTECH
• Developed an NLP-based application tracking system using Python to analyze resumes, identify weak sentences, suggest improvements, and recommend keywords based on job descriptions.
• Implemented an n-gram model to predict the next word a user might type, providing a typing-assist feature.
• Developed an OCR-based system leveraging PyTesseract, combined with OpenCV preprocessing techniques, to process scanned documents with 95% accuracy.
• Built a pipeline for recognizing complex tabular structures, including borderless tables, by integrating Hugging Face models and custom algorithms.
• Automated document processing across various formats (PDF, images) by integrating multi-API workflows, reducing manual efforts.
• Optimized data extraction workflows for large-scale datasets, enhancing performance and reducing processing time by 30%.
• Collaborated on ML-driven NLP projects to generate automated suggestions for keyword optimization and grammar correction.
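The n-gram typing-assist idea above can be sketched as a bigram counter; the training corpus here is a toy stand-in:

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count word-pair frequencies for next-word prediction."""
    model = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for a, b in zip(words, words[1:]):
            model[a][b] += 1
    return model

def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = model.get(word.lower())
    return followers.most_common(1)[0][0] if followers else None

corpus = [
    "machine learning is fun",
    "machine learning models learn",
    "learning is iterative",
]
model = train_bigrams(corpus)
print(predict_next(model, "machine"))  # learning
```

A production typing assistant would train on a much larger corpus and back off to shorter n-grams for unseen contexts.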