RECENTLY I HAVE READ THE FOLLOWING PAPERS:

From Theory to Reality

Research Interests: Natural Language Processing, Deep Learning, Machine Learning, Information Retrieval. Read more about these projects below.

1. PHOTOREALISTIC TEXT-TO-IMAGE DIFFUSION MODELS WITH DEEP LANGUAGE UNDERSTANDING (BY GOOGLE BRAIN)

2. APPLYING FEDERATED LEARNING CONCEPTS IN DOWNSTREAM NLP TASKS

3. “I’D RATHER JUST GO TO BED”: UNDERSTANDING INDIRECT ANSWERS (GOOGLE’S PAPER AT ACL)

By: Md Mosharaf Hossain, Venelin Kovatchev, Z Pranoy Dutta, Tiffany Kao, Elizabeth Wei, and Eduardo Blanco

(ACL paper by Tongfei Chen, Zhengping Jiang, Adam Poliak, Keisuke Sakaguchi, Benjamin Van Durme)

NATURAL LANGUAGE PROCESSING (NLP)

This was a 96-hour course involving quizzes, assignments, a midterm, and end-semester examinations. Through this course, I learned about Language Models, Part-of-Speech Tagging, Hidden Markov Models, Grammars and Parsing, Statistical Constituency Parsing, Dependency Parsing, Word Sense and WordNet, Statistical Machine Translation, Semantic Web Ontologies, Question Answering, Dialogue Systems and Chatbot Development, and Sentiment Analysis.
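
As an illustration of how Hidden Markov Models are used for part-of-speech tagging, here is a minimal Viterbi decoder sketch. It is not taken from the course material: the tag set, the probabilities, and the smoothing floor for unseen words are all hypothetical toy values chosen only to make the example runnable.

```python
import math

# Hypothetical toy HMM: tags, start, transition, and emission probabilities.
TAGS = ["DET", "NOUN", "VERB"]
START_P = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS_P = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.2, "VERB": 0.7},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
EMIT_P = {
    "DET": {"the": 0.7, "a": 0.3},
    "NOUN": {"dog": 0.4, "cat": 0.4, "barks": 0.2},
    "VERB": {"barks": 0.6, "sleeps": 0.4},
}


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-8):
    """Return the most likely tag sequence for `words` under the HMM.

    Works in log space to avoid underflow; zero probabilities (e.g. unseen
    words) are replaced by a small hypothetical `floor` value.
    """
    def log(p):
        return math.log(p) if p > 0 else math.log(floor)

    # best[i][t] = (log-prob of the best path ending in tag t at word i, previous tag)
    best = [{t: (log(start_p[t]) + log(emit_p[t].get(words[0], 0)), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            prev_tag, score = max(
                ((p, best[i - 1][p][0] + log(trans_p[p][t])) for p in tags),
                key=lambda item: item[1],
            )
            column[t] = (score + log(emit_p[t].get(words[i], 0)), prev_tag)
        best.append(column)

    # Backtrack from the highest-scoring final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return list(reversed(path))


print(viterbi(["the", "dog", "barks"], TAGS, START_P, TRANS_P, EMIT_P))
```

With these toy numbers, "barks" is tagged VERB rather than NOUN because the NOUN to VERB transition dominates; a real tagger would estimate these probabilities from a tagged corpus.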

In one assignment, I used the News Headlines Dataset for Sarcasm Detection:
https://www.kaggle.com/rmisra/news-headlines-dataset-for-sarcasm-detection

For this, I used a Multinomial Naive Bayes model to classify the headlines, and a Hidden Markov Model (HMM) part-of-speech (POS) tagger to display the POS tags for the first n rows of the ‘headline’ column.
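
The following is a from-scratch sketch of a Multinomial Naive Bayes text classifier with add-alpha (Laplace) smoothing, to show how such a model scores a headline. The original assignment may well have used a library implementation such as scikit-learn's `MultinomialNB`; the class below, its whitespace tokenizer, and the example headlines and labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Hypothetical minimal Multinomial Naive Bayes over bag-of-words counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # add-alpha smoothing constant

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.word_counts = defaultdict(Counter)  # class -> word -> count
        for doc, label in zip(docs, labels):
            tokens = doc.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        class_docs = Counter(labels)
        self.log_prior = {c: math.log(class_docs[c] / len(labels)) for c in self.classes}
        self.total = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        # Words never seen in training are ignored.
        tokens = [t for t in doc.lower().split() if t in self.vocab]
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            denom = self.total[c] + self.alpha * v
            scores[c] = self.log_prior[c] + sum(
                math.log((self.word_counts[c][t] + self.alpha) / denom) for t in tokens
            )
        return max(scores, key=scores.get)


# Hypothetical toy headlines: 1 = sarcastic, 0 = not sarcastic.
docs = [
    "area man thrilled to attend another meeting",
    "nation shocked that monday arrives again",
    "government announces new budget plan",
    "scientists publish study on climate",
]
labels = [1, 1, 0, 0]
model = MultinomialNB().fit(docs, labels)
print(model.predict("man thrilled that monday meeting arrives"))
```

Summing log-probabilities rather than multiplying raw probabilities keeps long headlines from underflowing to zero.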

DEEP LEARNING

This was a 128-hour course involving quizzes, assignments, a midterm, and end-semester examinations. In this course, along with the machine learning prerequisites, I learned about Deep Feedforward Networks, Regularization and Optimization for Deep Models, Convolutional Neural Networks, Recurrent Neural Networks, Autoencoders, and Generative Adversarial Networks (GANs).

In a computer vision assignment, I worked on CIFAR-10, a dataset of 60,000 32x32 colour images in 10 classes. Using Google Colab, I performed data visualization and augmentation and used a pretrained ResNet50 model. For model evaluation, I used a confusion matrix, precision, and recall, and examined the two most incorrectly classified images for each class in the test set. For hyperparameter tuning, I experimented with dropout and regularization. The dataset is available here: https://www.tensorflow.org/datasets/catalog/cifar10
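
To make the evaluation step concrete, here is a small sketch of how per-class precision and recall follow from a confusion matrix. The function name and the toy labels are hypothetical; in the assignment the true and predicted labels would come from the CIFAR-10 test set and the model's outputs.

```python
def per_class_metrics(y_true, y_pred, num_classes):
    """Build a confusion matrix and derive per-class (precision, recall).

    matrix[i][j] counts samples whose true class is i and predicted class is j.
    """
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1

    metrics = []
    for c in range(num_classes):
        tp = matrix[c][c]
        predicted_c = sum(matrix[r][c] for r in range(num_classes))  # column sum
        actual_c = sum(matrix[c])  # row sum
        precision = tp / predicted_c if predicted_c else 0.0
        recall = tp / actual_c if actual_c else 0.0
        metrics.append((precision, recall))
    return matrix, metrics


# Hypothetical toy labels for 3 classes.
matrix, metrics = per_class_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], 3)
print(matrix)   # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(metrics)
```

Precision divides by the column (everything predicted as that class), while recall divides by the row (everything that truly belongs to that class).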

In another assignment, I worked on an NLP dataset for sentiment analysis containing 1.6 million tweets. The dataset is available here: https://www.kaggle.com/kazanova/sentiment140

INFORMATION RETRIEVAL

This was a 128-hour course involving quizzes, assignments, a midterm, and end-semester examinations. In this course, I learned about the Vector Space Model, Efficient Text Mining, Web Search, Cross-Lingual Retrieval, Multimedia Information Retrieval, and Recommender Systems.
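
As an example of the Vector Space Model mentioned above, the sketch below represents documents and a query as TF-IDF vectors and ranks the documents by cosine similarity. It assumes raw term frequency, idf = log(N / df), and a whitespace tokenizer; the function names and the three example documents are hypothetical, and many other weighting variants exist.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each document to a sparse {term: tf * idf} vector."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # A term that appears in every document gets idf = 0 and carries no weight.
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(tokens).items()} for tokens in tokenized]
    return vectors, idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, docs):
    """Return document indices ordered from most to least similar to the query."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the collection are dropped.
    q_vec = {t: tf * idf[t] for t, tf in Counter(query.lower().split()).items() if t in idf}
    scores = [cosine(q_vec, v) for v in vectors]
    # sorted() is stable, so documents with equal scores keep their original order.
    return sorted(range(len(docs)), key=lambda i: -scores[i])


docs = ["cheap flights to paris", "paris hotel deals", "learn python programming"]
print(rank("flights to paris", docs))  # [0, 1, 2]
```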

In the literature survey assignment, I worked on the topic "Deep Understanding of Query Intent", which concerns identifying users' true intent when they search for various topics on online search engines. I used Python for this work.

For the literature survey, I read six papers and described their existing methods, limitations, and future work. Most of the papers used CNNs, BERT, and LSTM variants to capture bidirectional context and predict the next word as accurately as possible; BERT gave the best results, owing to its bidirectional Transformer architecture. The survey document is available here: https://drive.google.com/file/d/13GD2lqL75wko7Z1wsxO9tPKk4ALy4p7Z/view?usp=sharing

MACHINE LEARNING

This was a 128-hour course involving quizzes, assignments, a midterm, and end-semester examinations. Along with the mathematical prerequisites, I learned about Bayesian Learning, Linear Models for Classification, Linear Models for Regression, Decision Trees, Instance-Based Learning, Ensemble Learning, Support Vector Machines, Unsupervised Learning, and Neural Networks.

In one assignment, I used Python to predict the length of Facebook comments on a post from the various features provided in the dataset:
https://www.kaggle.com/kiranraje/prediction-facebook-comment#Dataset.csv
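
The text does not say which model was used for this regression task, so the following is only a sketch of one common baseline: linear regression fitted by batch gradient descent on mean squared error, in plain Python. The function name, learning rate, epoch count, and the tiny synthetic dataset (generated from y = 2*x1 + 3*x2 + 1) are hypothetical stand-ins for the real features and target.

```python
def fit_linear_regression(X, y, lr=0.05, epochs=2000):
    """Fit y ≈ w·x + b by batch gradient descent on mean squared error.

    Hypothetical baseline; real features should be scaled first so that a
    single learning rate works for every feature.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi  # prediction error
            for j in range(d):
                grad_w[j] += 2 * err * xi[j] / n
            grad_b += 2 * err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical synthetic data following y = 2*x1 + 3*x2 + 1.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 2]]
y = [1, 3, 4, 6, 8, 9]
w, b = fit_linear_regression(X, y)
print([round(v, 3) for v in w], round(b, 3))
```

In practice a closed-form least-squares solver or a library regressor would usually be preferred; the loop form just makes each gradient step visible.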

DATA MINING

This was a 96-hour course involving quizzes, assignments, a midterm, and end-semester examinations.
As a combined case study and assignment, I built a classification model on a hospital dataset of about 37k data points with 25 features. The work covered preparing the dataset for analysis, checking for correlations, creating the model, and evaluating the performance of an ensemble XGBoost model.
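
The correlation check can be sketched as computing the Pearson coefficient for every pair of numeric features and flagging pairs above a threshold. This is a plain-Python illustration, not the code from the case study; the feature names, values, and the 0.9 threshold are hypothetical.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def correlated_pairs(columns, threshold=0.9):
    """Return (feature_a, feature_b, r) for pairs with |r| >= threshold.

    columns: dict mapping a feature name to its list of values.
    """
    pairs = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs


# Hypothetical feature columns.
columns = {
    "age": [30, 40, 50, 60],
    "num_visits": [1, 2, 3, 4],
    "length_of_stay": [5, 3, 6, 2],
}
print(correlated_pairs(columns))  # [('age', 'num_visits', 1.0)]
```

Flagged pairs are candidates for dropping one feature; tree ensembles such as XGBoost tolerate correlated inputs, but removing redundant features can make feature importances easier to interpret.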

BIG DATA SYSTEMS

This was a 128-hour course involving quizzes, assignments, a midterm, and end-semester examinations. In this course, I learned about Parallel and Distributed Processing, Memory Hierarchy in Distributed Systems, Big Data Analytics and its Lifecycle, Distributed Computing Design Strategy, Hadoop, Spark, Apache Kafka, and SparkML.

In one assignment, on Hadoop and Hive, I analysed and visualized the New York City Taxi dataset.

In a second assignment, I used Hadoop to analyse historical sales data.
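
The MapReduce pattern behind this kind of Hadoop job can be simulated in a few lines of Python: a mapper emits key/value pairs, a shuffle step groups values by key, and a reducer aggregates each group. The input format (`date,product,amount`) and the total-sales-per-product job are hypothetical; with Hadoop Streaming, the mapper and reducer would instead be separate scripts reading from standard input.

```python
from collections import defaultdict
from itertools import chain


def mapper(line):
    """Emit (product, amount) from a hypothetical 'date,product,amount' CSV line."""
    _date, product, amount = line.strip().split(",")
    yield product, float(amount)


def shuffle(pairs):
    """Group mapper output by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Total sales for one product."""
    yield key, sum(values)


def run_job(lines):
    mapped = chain.from_iterable(mapper(line) for line in lines)
    grouped = shuffle(mapped)
    # Reducers receive keys in sorted order, mirroring Hadoop's sort phase.
    return dict(chain.from_iterable(reducer(k, v) for k, v in sorted(grouped.items())))


# Hypothetical sales records.
lines = ["2021-01-01,widget,10.0", "2021-01-02,gadget,5.5", "2021-01-03,widget,2.5"]
print(run_job(lines))  # {'gadget': 5.5, 'widget': 12.5}
```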

DEEP LEARNING

In an effort to gain a better understanding of the Internet of Things, I have recently begun to use a new technique to investigate the organization and functionality of the diverse parts of my experimental model. I am currently looking to expand this work by collaborating with other labs who have the facilities and prior experience to investigate this project further.

DATA MINING & MACHINE LEARNING

Building upon work done by a former lab colleague, I have developed a powerful tool for use in the identification and characterization of the processes in my model system. A major advantage of this development is its improved sensitivity, which allows it to detect subtle dynamic property changes in response to my experimental setup.