Udacity Bertelsmann Scholar | Interested in Natural Language Processing, Deep Learning, Machine Learning
RECENTLY I HAVE READ THE FOLLOWING PAPERS:
From Theory to Reality
Research Interests: Natural Language Processing, Deep Learning, Machine Learning, and Information Retrieval. Read more about these projects below.
1. PHOTOREALISTIC TEXT-TO-IMAGE DIFFUSION MODELS WITH DEEP LANGUAGE UNDERSTANDING (BY GOOGLE BRAIN)
2. FEDNLP: BENCHMARKING FEDERATED LEARNING METHODS FOR NATURAL LANGUAGE PROCESSING TASKS
(NAACL 2022 FINDINGS)
Applying the federated learning concept to downstream NLP tasks
3. “I’D RATHER JUST GO TO BED”: UNDERSTANDING INDIRECT ANSWERS (GOOGLE'S PAPER AT ACL)
By: Md Mosharaf Hossain, Venelin Kovatchev, Pranoy Dutta, Tiffany Kao, Elizabeth Wei, and Eduardo Blanco
(ACL paper by Tongfei Chen, Zhengping Jiang, Adam Poliak, Keisuke Sakaguchi, Benjamin Van Durme)
COURSEWORK: NATURAL LANGUAGE PROCESSING (NLP)
This was a 96-hour credit course involving quizzes, assignments, a midterm, and an end-semester examination. Through this course, I learned about language models, part-of-speech tagging, Hidden Markov Models, grammars and parsing, statistical constituency parsing, dependency parsing, word sense and WordNet, statistical machine translation, Semantic Web ontologies, question answering, dialogue systems and chatbot development, and sentiment analysis.
In one assignment I used the News Headlines dataset for sarcasm detection:
https://www.kaggle.com/rmisra/news-headlines-dataset-for-sarcasm-detection
For this, I used a Multinomial Naive Bayes model to classify the headlines as sarcastic or not, and applied Hidden Markov Model (HMM) part-of-speech (POS) tagging to the first n rows of the 'headline' column.
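A minimal sketch of this pipeline is shown below, assuming the JSON-lines layout of the Kaggle file; it uses scikit-learn's Multinomial Naive Bayes on bag-of-words features and NLTK's default POS tagger in place of the assignment's HMM tagger, so treat it as illustrative rather than the original code.

```python
# Illustrative sketch: Multinomial Naive Bayes sarcasm classifier plus POS
# tagging of the first n headlines. The file path is an assumption.
import nltk
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# The Kaggle file is JSON lines with 'headline' and 'is_sarcastic' columns.
df = pd.read_json("Sarcasm_Headlines_Dataset.json", lines=True)

X_train, X_test, y_train, y_test = train_test_split(
    df["headline"], df["is_sarcastic"], test_size=0.2, random_state=42)

vec = CountVectorizer(stop_words="english")
clf = MultinomialNB()
clf.fit(vec.fit_transform(X_train), y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(vec.transform(X_test))))

# POS-tag the first n headlines (NLTK's perceptron tagger, not a hand-built
# HMM; resource names may differ slightly across NLTK versions).
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")
n = 5
for headline in df["headline"].head(n):
    print(nltk.pos_tag(nltk.word_tokenize(headline)))
```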
COURSEWORK: DEEP LEARNING
This was a 128-hour credit course involving quizzes, assignments, a midterm, and an end-term examination. In this course, along with the machine learning prerequisites, I learned about deep feedforward networks, regularization and optimization for deep models, convolutional neural networks, recurrent neural networks, autoencoders, and generative adversarial networks (GANs).
In the computer vision assignment, I worked on CIFAR-10, a dataset of 60,000 32x32 colour images in 10 classes. Using Google Colab, I performed data visualization and augmentation, used a pretrained ResNet50 model, and evaluated it with a confusion matrix, precision, recall, and the two most incorrectly classified images for each class in the test set. For hyperparameter tuning I used dropout and regularization. The dataset is available here: https://www.tensorflow.org/datasets/catalog/cifar10
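The following is a minimal sketch of that setup, assuming tf.keras with a frozen ImageNet ResNet50 backbone; the hyperparameters, the single dense head, and the omission of augmentation are simplifications, not the actual assignment notebook.

```python
# Illustrative CIFAR-10 sketch with a pretrained ResNet50 backbone.
import numpy as np
import tensorflow as tf
from sklearn.metrics import confusion_matrix, classification_report

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train = tf.keras.applications.resnet50.preprocess_input(x_train.astype("float32"))
x_test = tf.keras.applications.resnet50.preprocess_input(x_test.astype("float32"))

base = tf.keras.applications.ResNet50(
    include_top=False, weights="imagenet", input_shape=(32, 32, 3), pooling="avg")
base.trainable = False  # freeze the pretrained backbone

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.5),                     # dropout for regularization
    tf.keras.layers.Dense(10, activation="softmax"),  # 10 CIFAR-10 classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, validation_split=0.1, epochs=5, batch_size=128)

# Confusion matrix, precision, and recall on the test set.
y_pred = np.argmax(model.predict(x_test), axis=1)
print(confusion_matrix(y_test.ravel(), y_pred))
print(classification_report(y_test.ravel(), y_pred))
```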
In another assignment, I worked on an NLP dataset: the Sentiment140 sentiment analysis dataset of 1.6 million tweets. The dataset is available here: https://www.kaggle.com/kazanova/sentiment140
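The assignment's exact approach isn't described above, so the sketch below is only illustrative: logistic regression over TF-IDF features on the Sentiment140 CSV. The local file name matches the Kaggle download, and the column labels are my own names for its six unnamed columns.

```python
# Illustrative Sentiment140 sketch (not the assignment's actual method).
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# The Kaggle CSV has no header row; these column names are my own labels.
cols = ["polarity", "id", "date", "query", "user", "text"]
df = pd.read_csv("training.1600000.processed.noemoticon.csv",
                 encoding="latin-1", names=cols)

X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["polarity"], test_size=0.2, random_state=42)

vec = TfidfVectorizer(max_features=50_000, ngram_range=(1, 2))
clf = LogisticRegression(max_iter=1000)
clf.fit(vec.fit_transform(X_train), y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(vec.transform(X_test))))
```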
COURSEWORK: INFORMATION RETRIEVAL
This was a 128-hour credit course involving quizzes, assignments, a midterm, and an end-semester examination. In this course, I learned about the vector space model, efficient text mining, web search, cross-lingual retrieval, multimedia information retrieval, and recommender systems.
In the literature survey assignment, I worked on the topic "Deep Understanding of Query Intent", using Python, to study users' true intent when they search online search engines for various topics.
I read six papers for the literature survey, describing existing methods, their limitations, and future work. In most of the papers, CNNs, BERT, and variants of LSTM were used to model bidirectional context and predict the missing or next word as accurately as possible; BERT gave better results because of its architectural setup and superior performance. The survey document is available here: https://drive.google.com/file/d/13GD2lqL75wko7Z1wsxO9tPKk4ALy4p7Z/view?usp=sharing
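As a small illustration of the bidirectional-context idea (not code from the survey itself), the snippet below uses the Hugging Face transformers fill-mask pipeline with the public bert-base-uncased model to predict a masked word in a query-like sentence.

```python
# Illustrative only: BERT scores each candidate word using the context on
# BOTH sides of the [MASK] token.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for candidate in fill("how do i [MASK] a flight from delhi to mumbai"):
    print(f"{candidate['token_str']:>10}  {candidate['score']:.3f}")
```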
COURSEWORK: MACHINE LEARNING
This was a 128-hour course involving quizzes, assignments, a midterm, and an end-semester examination. In this course, along with the mathematical prerequisites, I learned about Bayesian learning, linear models for classification, linear models for regression, decision trees, instance-based learning, ensemble learning, Support Vector Machines, unsupervised learning, and neural networks.
In one assignment I worked on predicting the number of comments a Facebook post will receive, based on the various features provided in the dataset:
https://www.kaggle.com/kiranraje/prediction-facebook-comment#Dataset.csv
I used Python to perform this task.
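A hypothetical sketch of such a regression setup is shown below; the file name Dataset.csv, the choice of a random forest regressor, and the assumption that the last column is the comment-count target are all mine, not details from the assignment.

```python
# Illustrative regression sketch for the Facebook comment prediction task.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

df = pd.read_csv("Dataset.csv")
X = df.drop(columns=[df.columns[-1]])   # all features except the last column
y = df[df.columns[-1]]                  # assumed target: comment count

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```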
COURSEWORK: DATA MINING
This was a 96-hour course involving quizzes, assignments, a midterm, and an end-semester examination.
I did a case study cum assignment to construct a classification model on a hospital dataset of about 37k data points with 25 features. The work covered the usual activities on the data: preparing the dataset for analysis, checking for correlations, building a model, and evaluating the performance of an ensemble XGBoost model.
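Below is a minimal sketch of that workflow; since the hospital dataset isn't public here, the file name hospital.csv, the target column outcome, and the hyperparameters are assumptions.

```python
# Illustrative XGBoost classification workflow for a tabular hospital dataset.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from xgboost import XGBClassifier

df = pd.read_csv("hospital.csv")        # ~37k rows, 25 features (per the text)
print(df.corr(numeric_only=True))       # quick check for correlated features

X = df.drop(columns=["outcome"])        # assumed target column name
y = df["outcome"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.1,
                      eval_metric="logloss")
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```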
COURSEWORK: BIG DATA SYSTEMS
This was a 128-hour course involving quizzes, assignments, a midterm, and an end-semester examination. In this course, I learned about parallel and distributed processing, memory hierarchy in distributed systems, the big data analytics lifecycle, distributed computing design strategies, Hadoop, Spark, Apache Kafka, and Spark ML.
As part of one assignment on Hadoop and Hive, I analyzed the New York City Taxi dataset and visualized the results.
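The assignment itself used Hadoop and Hive; as a Python-based stand-in, here is a small PySpark sketch of the same kind of hourly aggregation. The CSV path and the column names follow the public NYC yellow-taxi schema and are assumptions about the assignment data.

```python
# Illustrative PySpark aggregation over NYC yellow-taxi trip records.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("nyc-taxi-analysis").getOrCreate()
trips = spark.read.csv("yellow_tripdata.csv", header=True, inferSchema=True)

# Average trip distance and fare per pickup hour, the kind of summary that
# feeds a simple visualization.
summary = (trips
           .withColumn("hour", F.hour(F.to_timestamp("tpep_pickup_datetime")))
           .groupBy("hour")
           .agg(F.avg("trip_distance").alias("avg_distance"),
                F.avg("total_amount").alias("avg_total"))
           .orderBy("hour"))
summary.show(24)
```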
In a second assignment, I used Hadoop for historical sales data analysis.
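As an illustration of the Hadoop side, below is a hypothetical Hadoop Streaming mapper/reducer in Python that totals sales per product; the CSV field layout, file names, and the streaming invocation shown in the comments are assumptions, not the actual assignment code.

```python
# Hypothetical Hadoop Streaming sketch for totalling sales per product from
# CSV lines like "date,product,quantity,amount".
#
# Run (illustrative):
#   hadoop jar hadoop-streaming.jar \
#     -input /sales/history.csv -output /sales/totals \
#     -mapper "python3 sales_mr.py map" -reducer "python3 sales_mr.py reduce"
import sys

def mapper():
    # Emit "product<TAB>amount" for every input record.
    for line in sys.stdin:
        fields = line.strip().split(",")
        if len(fields) < 4 or fields[0] == "date":
            continue  # skip malformed lines and the header
        product, amount = fields[1], fields[3]
        print(f"{product}\t{amount}")

def reducer():
    # Input arrives sorted by key, so totals accumulate per product.
    current, total = None, 0.0
    for line in sys.stdin:
        product, amount = line.rstrip("\n").split("\t")
        if product != current:
            if current is not None:
                print(f"{current}\t{total:.2f}")
            current, total = product, 0.0
        total += float(amount)
    if current is not None:
        print(f"{current}\t{total:.2f}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```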