
Samir Bajaj

Builder.
Entrepreneur.
Mentor.
Writer.
Perpetual Learner.
Wannabe Musician.

Twitter LinkedIn GitHub Kaggle

Here is a non-exhaustive list of the personal and professional projects that I have worked on; most of them are in private repos for legal or other reasons.

ML & NLP

  • Siri—A pioneering personal digital assistant that laid the foundation for voice-enabled applications. See my resumé for details of my contributions.
  • Document Understanding—An end-to-end system that combines OCR and NLP technologies to extract structured data from business documents like invoices and receipts, at scale.
  • Fake News Detector—This one started as a school project and later became the basis of my Kaggle entry. It explores various neural network architectures for classifying the authenticity of a news report. Written in Python using TensorFlow 1.0.
  • Sentiment Analysis—A multi-class classifier for movie reviews, using stacked, bidirectional LSTMs. Python and Keras.
  • Image Classification—Explores transfer learning using pre-trained VGG and ResNet architectures. Python and Keras.
  • POS Tagging—One of my earlier projects to understand CRFs. This one was written in Java using Mallet.
  • Named Entity Recognizer—This NER system uses word-level as well as character-level LSTMs to classify entities in text. Python and Keras.
  • Language Modeling—Built N-gram as well as RNN-based language models, mostly as part of other projects like Spelling Correction.
  • Transformer—An implementation of the design in the paper Attention Is All You Need. Python and Keras.
  • Question Answering—An attempt at using SQuAD for a basic QA system. Python and Keras.
  • Fraud Detector—Using XGBoost and Scala, developed a simple model to detect outliers.
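
To give a flavor of the N-gram approach mentioned under Language Modeling, here is a minimal sketch of a bigram model with add-one smoothing. It is not the code from the private repos; the function names and the toy corpus are made up purely for illustration.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    # Words that can be predicted: everything except the start marker.
    vocab = set(unigrams) - {"<s>"}
    return unigrams, bigrams, vocab

def bigram_prob(prev, word, unigrams, bigrams, vocab):
    """P(word | prev) with add-one (Laplace) smoothing."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))

def sentence_logprob(tokens, unigrams, bigrams, vocab):
    """Log-probability of a whole sentence, including the end-of-sentence step."""
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(
        math.log(bigram_prob(p, w, unigrams, bigrams, vocab))
        for p, w in zip(padded, padded[1:])
    )

# Toy corpus (hypothetical), already tokenized.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
unigrams, bigrams, vocab = train_bigram(corpus)

likely = sentence_logprob(["the", "cat", "sat"], unigrams, bigrams, vocab)
unlikely = sentence_logprob(["sat", "the", "cat"], unigrams, bigrams, vocab)
```

In a spelling corrector, scores like these can be used to rank candidate corrections by how well they fit the surrounding words; word orders seen in training score higher than scrambled ones.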

Systems

  • ETL/Data Pipelines—Every ML project starts with data, and every company has disparate silos of data that need to be extracted, processed, and consolidated into a form that can be consumed for training, analysis, or inference. I have built data pipelines to address business needs at various organizations in my career.
  • A DSL for Business Rules—Defined and implemented an XML-based rules engine that allowed business analysts and other stakeholders to specify contract terms for vendors and customers in the field. An interpreter subsequently consumed the rules and executed the corresponding logic to process transactions in accordance with the intended contract. Written in Java.
  • Container-Based Function Server—A front end to a Kubernetes-based deployment system with a simple RESTful interface that accepts a container image path and the desired resource allocation as inputs, and creates a microservice with a custom HTTP endpoint. Scala, Akka.
  • Model Store Service—An HDFS-backed object store complemented by a MySQL-based metadata repository to track versioned ML models. Scala, Akka.
  • Frictionless Deployment of ML Models—A model server enhanced to deliver predictions without having to deploy any microservices. The system accepts as input the format or source of the model, as well as the data needed to generate the prediction, and makes a forward pass to compute the result. Python, Flask.
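
To make the rules-engine idea from the DSL for Business Rules entry more concrete, here is a minimal sketch of how an interpreter might consume XML rules and apply them to a transaction. This is not the original Java implementation, which is private. The XML schema, the rule names, and the functions below are all hypothetical and exist only for this example.

```python
import operator
import xml.etree.ElementTree as ET

# Hypothetical rule schema, invented for this example.
RULES_XML = """
<rules>
  <rule name="volume-discount">
    <when field="quantity" op="ge" value="100"/>
    <then action="discount" percent="10"/>
  </rule>
  <rule name="rush-surcharge">
    <when field="rush" op="eq" value="yes"/>
    <then action="surcharge" percent="5"/>
  </rule>
</rules>
"""

# Comparison operators and actions the interpreter understands.
OPS = {"eq": operator.eq, "ge": operator.ge, "le": operator.le}
SIGNS = {"discount": -1, "surcharge": 1}

def load_rules(xml_text):
    """Parse the XML into a list of plain dictionaries, one per rule."""
    rules = []
    for rule in ET.fromstring(xml_text).findall("rule"):
        when, then = rule.find("when"), rule.find("then")
        rules.append({
            "name": rule.get("name"),
            "field": when.get("field"),
            "op": OPS[when.get("op")],
            "value": when.get("value"),
            "sign": SIGNS[then.get("action")],
            "percent": float(then.get("percent")),
        })
    return rules

def apply_rules(rules, txn):
    """Apply each matching rule in order; return the adjusted amount and the rules that fired."""
    amount = txn["amount"]
    fired = []
    for r in rules:
        actual = txn[r["field"]]
        # XML attributes are strings, so convert to the type of the transaction field.
        expected = type(actual)(r["value"])
        if r["op"](actual, expected):
            amount += r["sign"] * amount * r["percent"] / 100
            fired.append(r["name"])
    return round(amount, 2), fired

rules = load_rules(RULES_XML)
total, fired = apply_rules(rules, {"amount": 1000.0, "quantity": 150, "rush": "no"})
```

Keeping the rules in data rather than code is what lets analysts change contract terms without a redeploy; the interpreter only needs to know the small set of operators and actions.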