About
Computer Science graduate with a strong foundation in data science, machine learning, and deep learning, backed by three IIT research internships and published papers. Experienced in designing scalable ML pipelines, building production-ready models, and deploying them with cloud and MLOps tooling, using Python, C++, and SQL. Seeking a role as a Data Scientist, AI/ML Engineer, or Deep Learning Engineer to apply research-backed problem solving.
Work
Indian Institute of Technology (Banaras Hindu University)
AI/ML Research Intern (Hybrid)
Varanasi, Uttar Pradesh, India
Summary
Led research and implementation of continual learning strategies for molecular property prediction, mitigating catastrophic forgetting and balancing model stability with plasticity.
Highlights
Co-authored research on catastrophic forgetting in molecular property prediction, leading the implementation of continual learning and refresh-learning strategies to improve model robustness.
Designed and deployed the MTL-PORL (Multi-Task Learner - Pareto Optimized Refresh Learning) framework using ChemBERTa, integrating refresh learning with Pareto optimization to balance stability and plasticity.
Engineered multi-task and hierarchical gradient aggregation and hyper-gradient-based unlearning pipelines, improving knowledge retention across sequential learning episodes.
Ran experiments on the BBBP, bitter, and sweet molecular datasets, reaching anytime average accuracy of up to 94.89% and test accuracy of up to 96.86%, with forgetting measures as low as -0.0063.
Indian Institute of Technology, Patna
DL/NLP Research Intern (Remote)
Patna, Bihar, India
Summary
Designed and implemented an OCR and NLP pipeline for processing Hindi legal documents, including summarization for low-resource languages.
Highlights
Built an OCR pipeline with PyMuPDF and Tesseract to extract text from over 1,000 Hindi legal documents.
Developed LLM-based summarization systems tailored to low-resource Indian languages and applied them to complex legal Hindi texts.
Expanded and preprocessed a dataset of over 2,000 Hindi legal documents, increasing the diversity of the training data.
Collaborated on NLP strategies that integrated open-source tools such as OpenHathi to improve summarization accuracy in the legal domain.
Indian Institute of Technology, Bhilai
ML Research Intern (Remote)
Bhilai, Chhattisgarh, India
Summary
Developed and optimized time-series anomaly detection models and ETL pipelines, enhancing data processing efficiency and model stability in resource-constrained environments.
Highlights
Developed time-series anomaly detection models using PySpark and SQL, reducing false positives by 20%.
Engineered data preprocessing and ETL pipelines covering cleaning, imputation, normalization, and structuring of high-frequency sensor data for ML models.
Deployed production-ready Apache Airflow DAGs that automate data validation, model scheduling, and monitoring, reducing manual intervention and improving system reliability.
Optimized feature engineering and training workflows, improving model throughput and stability in resource-constrained environments.
Education
Chhattisgarh Swami Vivekanand Technical University
B.Tech (Hons)
Computer Science & Engineering (Data Science)
Grade: 8.5/10 CGPA
Sheth Vidya Mandir English High School
Junior College (Senior Secondary)
Science
Grade: 95%
Ryan International School
Senior Secondary (Secondary)
Science
Grade: 86%
Languages
English
Hindi
Skills
Machine Learning & Deep Learning
TensorFlow, PyTorch, Keras, Scikit-learn, Hugging Face Transformers, BERT, CNNs, RNNs, GNNs, Continual Learning, ChemBERTa, RDKit, Pareto Optimization, ResNet50, MobileNetV2.
Data Science & Analytics
Python (Pandas, NumPy, SciPy), R, SQL, Statistics, Probability, Hypothesis Testing, Feature Engineering, Model Evaluation, PySpark, Tableau, Power BI, Matplotlib, Seaborn, Time-Series Analysis.
Data Engineering & MLOps
ETL pipelines, Apache Spark, Hadoop, BigQuery, Apache Airflow (DAGs), AWS (S3, EC2), GCP, Docker, CI/CD, GitHub Actions, REST APIs, Streamlit.
Computer Vision & NLP
OpenCV, Tesseract, PyMuPDF, OCR, Transformer Models, NLP, Text Analytics, Large Language Models, LLM-based Summarization.
Programming & Systems
Python, C/C++, JavaScript (Node.js), Bash, Linux/Unix, Object-Oriented Design, Data Structures, Algorithms, Solidity, React, IPFS, Hardhat, Ethers.js, NetworkX.
Projects
Overcoming Catastrophic Forgetting in Molecular Property Prediction Using Continual Learning
Summary
Implemented refresh-learning strategies with PyTorch and RDKit to mitigate catastrophic forgetting in predictive models.
Time Series Anomaly Detection using Graph Neural Networks (GNN)
Summary
Developed GNN models with PyTorch and NetworkX to detect anomalies in time-series data.
Development of Legal Language Summarization Systems for Hindi
Summary
Architected an OCR + NLP pipeline using PyMuPDF, Pytesseract, and transformer-based summarization to process 1,000+ legal PDFs.