Elie Niring
PROFESSIONAL SUMMARY
Innovative Generative AI Researcher and Machine Learning Developer with over 12 years of experience applying deep learning, artificial intelligence, and statistical techniques to data science challenges, improving organizational insight, profitability, and market presence. Adept at crafting algorithms and deploying innovative solutions to complex business problems promptly and effectively; experienced in knowledge management systems and language ontologies. Proficient with popular Generative AI and NLP frameworks and libraries in Python (LangChain, LlamaIndex, Hugging Face, NLTK, spaCy) and with vector databases (Pinecone, FAISS). Familiar with implementing Neural Networks, Support Vector Machines (SVM), and Random Forests. Keeps abreast of the latest advancements in data science, operations research, and Natural Language Processing to ensure the use of cutting-edge techniques, algorithms, and technologies. Experienced in remote sensing; skilled at identifying and designing suitable algorithms to uncover patterns and at validating findings through experimental, iterative methods. Strong interpersonal and analytical abilities; capable of multitasking and adapting in high-pressure environments; a creative problem solver with logical thinking skills and keen attention to detail.
TECHNICAL EXPERIENCE
IDEs: Jupyter, Google Colab, PyCharm, RStudio
Python Libraries: TensorFlow, PyTorch, NLTK, NumPy, Pandas, OpenCV, Python Imaging Library (PIL), Scikit-learn, SciPy, Matplotlib, Seaborn, Hugging Face
Computer Vision: Convolutional Neural Networks (CNNs), Hourglass CNN, R-CNNs, YOLO, Generative Adversarial Networks (GANs)
Tree Algorithms & Tuning: Decision Tree, Bagging, Random Forest, AdaBoost, Gradient Boosting, XGBoost; hyperparameter tuning via Random Search and Grid Search
PROFESSIONAL EXPERIENCE
BAXTER – Deerfield, IL Sep 2023 – Present
Senior Data Scientist Architect – ML-Ops
In this project, I leveraged Elasticsearch for RAG, using it to supply few-shot examples to a Text2SQL engine. I deployed the application with FastAPI and Docker, reducing latency through caching and enabling automated scaling via Kubernetes. Using advanced Large Language Models and transformer architectures, I analyzed consumer sentiment from platforms such as Yammer and Cultura, implementing classification algorithms to identify emerging trends. I also designed a CI/CD pipeline to streamline data processing and model deployment, contributing to a solution that personalizes Medicare plan recommendations during the annual enrollment period.
Utilized Elasticsearch for the RAG implementation, establishing it as the source of few-shot examples for the Text2SQL engine.
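A minimal sketch of that retrieval step, assuming an Elasticsearch index of curated question/SQL pairs; the index name, field names, and endpoint are illustrative placeholders, not the production schema:

    from elasticsearch import Elasticsearch  # assumes the v8 Python client

    es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

    def fetch_few_shot_examples(user_question: str, k: int = 3) -> str:
        """Retrieve the k most similar question/SQL pairs to seed the prompt."""
        resp = es.search(
            index="text2sql-examples",                     # hypothetical index
            query={"match": {"question": user_question}},  # lexical match; kNN also works
            size=k,
        )
        shots = [
            f"Q: {hit['_source']['question']}\nSQL: {hit['_source']['sql']}"
            for hit in resp["hits"]["hits"]
        ]
        return "\n\n".join(shots)  # prepended to the Text2SQL prompt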
Decreased latency in the analytics engine application by applying caching techniques.
Deployed the application using FastAPI and Docker containers, incorporating automated scaling via the Kubernetes framework.
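The two bullets above combine in the serving layer; a condensed sketch assuming a simple in-process cache (the endpoint shape and cache choice are illustrative):

    from functools import lru_cache
    from fastapi import FastAPI

    app = FastAPI()

    @lru_cache(maxsize=1024)               # simple in-process cache; a shared store such
    def run_query(question: str) -> str:   # as Redis would suit multi-replica scaling
        ...  # hypothetical call into the Text2SQL engine and SQL execution
        return "result"

    @app.get("/analytics")
    def analytics(question: str):
        return {"answer": run_query(question)}  # repeated questions are served from cache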
Achieved a 30% reduction in LLM inference costs by utilizing RouteLLM.
Leveraged advanced Large Language Models (LLMs) and transformer-based architectures to analyze patterns and trends in consumer comments and posts from platforms like Yammer and Cultura, capturing real-time insight into sentiment and emerging trends.
Implemented classification algorithms to categorize consumer comments and posts into predefined topics, enabling the identification of trending content and a deeper understanding of consumer sentiment on specific subjects.
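A minimal supervised sketch of this categorization in scikit-learn, assuming a labeled set of posts mapped to the predefined topics (the sample data and labels are placeholders):

    from sklearn.pipeline import make_pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    posts = ["great new benefits portal", "app keeps crashing on login"]  # placeholder posts
    topics = ["benefits", "it-issues"]                                    # placeholder labels

    # TF-IDF features feeding a linear classifier; one illustrative option among many
    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    clf.fit(posts, topics)
    print(clf.predict(["cannot log in to the portal"]))  # -> predicted topic label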
Developed techniques to automatically generate new topics from Yammer and Cultura posts and comments, enriching the topic classification system with more relevant and dynamic categories.
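One plausible way to surface such new topics is to cluster unlabeled posts and read off each cluster's top TF-IDF terms; in the sketch below, the corpus, cluster count, and term cutoff are illustrative assumptions:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import TfidfVectorizer

    corpus = [  # placeholder posts standing in for Yammer/Cultura content
        "benefits portal rollout feedback",
        "vpn drops during standup calls",
        "new parental leave policy praise",
        "vpn timeout after the latest update",
    ]
    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(corpus)

    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    terms = np.array(vec.get_feature_names_out())
    for c in range(km.n_clusters):
        top = terms[np.argsort(km.cluster_centers_[c])[::-1][:5]]
        print(f"candidate topic {c}: {', '.join(top)}")  # reviewed before adoption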
Seamlessly deployed and managed both processes through Azure Pipelines, ensuring automated, scalable, and efficient delivery.
Played a pivotal role in a collaborative project, overseeing crucial stages such as developing an unsupervised outlier-identification algorithm and implementing the CI/CD pipeline.
Executed unsupervised outlier detection with an ensemble of five distinct detection methods, aggregating their labels into an outlier_percent column for streamlined filtering.
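A sketch of that aggregation idea, assuming the five detectors below stand in for the actual methods (detector choices, thresholds, and data are illustrative):

    import numpy as np
    import pandas as pd
    from scipy import stats
    from sklearn.covariance import EllipticEnvelope
    from sklearn.ensemble import IsolationForest
    from sklearn.neighbors import LocalOutlierFactor
    from sklearn.svm import OneClassSVM

    df = pd.DataFrame(np.random.randn(200, 3), columns=list("abc"))  # placeholder data
    X = df.values

    # each method votes; True marks a row that the method considers an outlier
    flags = pd.DataFrame({
        "iforest": IsolationForest(random_state=0).fit_predict(X) == -1,
        "lof": LocalOutlierFactor().fit_predict(X) == -1,
        "elliptic": EllipticEnvelope(random_state=0).fit_predict(X) == -1,
        "ocsvm": OneClassSVM(nu=0.05).fit_predict(X) == -1,
        "zscore": (np.abs(stats.zscore(X)) > 3).any(axis=1),
    })
    df["outlier_percent"] = flags.mean(axis=1) * 100  # share of methods flagging each row
    clean = df[df["outlier_percent"] < 60]            # keep rows most methods trust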
Developed robust and scalable data science solutions in Python, focusing on data preprocessing, feature engineering, and machine learning model development.
Wrote efficient and maintainable code for data analysis, model training, and deployment, adhering to best practices in software development.
Implemented and optimized machine learning algorithms in Python, utilizing popular libraries such as TensorFlow, PyTorch, Scikit-learn, and Pandas.
Leveraged Azure Cloud, Databricks, Jenkins, Docker, and Kubernetes to design, implement, and
manage a robust CI/CD pipeline for automated data ingestion, processing, model training, and
deployment.
Developed and deployed scalable machine learning models using Azure Machine Learning, optimizing performance and cost-effectiveness for tasks such as classification, regression, and clustering.
Hosted on Databricks, the pipeline seamlessly extracted raw data from Snowflake, conducted ETL
operations, identified outliers, and uploaded processed data to Azure Blob Storage.
Designed and deployed scalable machine learning solutions in Azure, utilizing services like Azure Machine Learning, Azure Databricks, and Azure Synapse Analytics.
Developed and managed data pipelines using Azure Data Factory and Azure Data Lake to support data science workflows.
Implemented and optimized cloud-based infrastructure for data storage, processing, and model deployment.
Ensured the security and compliance of data science applications by configuring and managing Azure Identity and Access Management (IAM) roles, Key Vault, and encryption standards.
Monitored and optimized the performance of machine learning models in production using Azure Monitor and Application Insights.
Leveraged transformer-based architectures such as GPT, BERT, and T5 to build and optimize models capable of understanding and generating human-like text.
Designed and implemented NLP pipelines that preprocess, tokenize, and clean large text corpora for model training and inference.
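A minimal sketch of such a preprocessing step with spaCy (assumes the en_core_web_sm model is installed; the cleaning rules are illustrative):

    import spacy

    nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])  # tokenizer/lemmatizer only

    def preprocess(texts):
        """Lowercase, lemmatize, and drop stop words, punctuation, and whitespace."""
        for doc in nlp.pipe(texts, batch_size=256):
            yield [t.lemma_.lower() for t in doc
                   if not (t.is_stop or t.is_punct or t.is_space)]

    tokens = list(preprocess(["The models were trained on cleaned corpora."]))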
Created and managed complex SQL queries for extracting, transforming, and loading (ETL) large datasets from relational databases, data warehouses, and cloud platforms.
Optimized SQL queries and database structures to enhance data retrieval and analytics processes.
Conducted data validation, cleansing, and preparation using SQL to ensure data quality and integrity for machine learning models.
Orchestrated Jenkins tasks for data preparation, merging dataset files into a unified CSV and storing it in the workspace's datastore with versioning.
Initiated model training tasks in Azure, executing code on a Databricks cluster and saving the output as a new model in AzureML.
Successfully deployed models using an embedded-architecture approach, packaging each model with the application inside a single Docker image for efficient deployment.
Implemented creational design patterns in the CI/CD pipeline for reusability and behavioral patterns in algorithms and integrations to enhance efficiency.
Collaborated within a team structure led by a Data Science Manager, working alongside three Data
Scientists to achieve project objectives.
Utilized tools such as Snowflake, Jenkins, Azure Cloud, Docker, Databricks, PySpark, and Twistlock
to streamline various aspects of the project.
Adopted a canary deployment process, starting with limited access and gradually expanding, to ensure smooth and controlled model rollout.
Contributed significantly to the overarching project goal of developing a workflow for personalizing
Medicare plan recommendations for members seeking new plans during annual enrollment.
Leveraged LangChain and Azure OpenAI to build a full-fledged, scalable Generative AI application.
Built a natural-language data analytics and Text2SQL engine using an agentic workflow in LangChain.
Integrated RAG (Retrieval-Augmented Generation) with the natural-language data analytics and Text2SQL engine so it can answer queries specific to the organization.
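A heavily pared-down sketch of that wiring; LangChain's APIs move quickly, so the module paths, deployment name, and connection string below are assumptions rather than the production configuration:

    from langchain_openai import AzureChatOpenAI
    from langchain_community.utilities import SQLDatabase
    from langchain_community.agent_toolkits import create_sql_agent

    # credentials/endpoint are read from environment variables; values are placeholders
    llm = AzureChatOpenAI(azure_deployment="gpt-4o", api_version="2024-02-01")
    db = SQLDatabase.from_uri("snowflake://...")  # placeholder connection string

    agent = create_sql_agent(llm, db=db, agent_type="openai-tools", verbose=True)
    # org-specific context retrieved via RAG can be prepended to the question
    agent.invoke({"input": "Average claim cost per plan over the last quarter?"})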
NEW YORK LIFE INSURANCE – New York City, NY Nov 2015 – Jul 2017
Data Scientist
As a Data Scientist at New York Life Insurance, I developed customized product recommendations using
advanced machine learning algorithms, focusing on Collaborative Filtering to enhance customer
engagement and attract new clients. I led the design and deployment of various machine learning models,
including logistic regression and neural networks, while innovating optimization algorithms for diverse
applications. By conducting in-depth research on statistical techniques and leveraging tools like R and
Tableau for data visualization, I gained valuable insights into customer behavior, ensuring high data
integrity through meticulous cleaning and analysis.
Developed tailored product recommendations by implementing sophisticated machine learning algorithms, focusing on Collaborative Filtering to meet the unique needs of current customers and attract new ones.
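An item-based collaborative-filtering sketch follows; the interaction matrix, product names, and similarity choice are illustrative, not the production data:

    import pandas as pd
    from sklearn.metrics.pairwise import cosine_similarity

    # rows = customers, columns = products; 1 marks an owned product (placeholder data)
    ratings = pd.DataFrame(
        [[1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 1]],
        columns=["term_life", "whole_life", "annuity", "disability"],
    )

    # product-to-product similarity from co-ownership patterns
    item_sim = pd.DataFrame(
        cosine_similarity(ratings.T), index=ratings.columns, columns=ratings.columns
    )

    def recommend(customer: pd.Series, k: int = 2) -> pd.Series:
        """Score unowned products by similarity to the products a customer holds."""
        scores = item_sim[customer[customer > 0].index].sum(axis=1)
        return scores[customer == 0].nlargest(k)

    print(recommend(ratings.iloc[0]))  # suggestions for the first customer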
Led the design and deployment of a variety of machine learning algorithms, utilizing techniques such as logistic regression, random forest, KNN, SVM, neural networks, linear regression, lasso regression, and k-means for comprehensive modeling.
Innovated optimization algorithms specifically designed for data-driven models, broadening their use
across multiple machine learning approaches, including supervised, unsupervised, and reinforcement
learning.
Conducted thorough research on statistical machine learning techniques, covering forecasting,
supervised learning, classification, and Bayesian methods, to integrate cutting-edge methods into the
modeling framework.
Enhanced the technical sophistication of solutions by incorporating machine learning and advanced technologies, leading to improved overall model performance.
Performed exploratory data analysis and created impactful data visualizations using R and Tableau to deepen insights into underlying data patterns.
Collaborated effectively with data engineers to implement the ETL process, playing a key role in optimizing SQL queries for efficient data extraction and merging from Oracle databases.
Utilized a diverse skill set in R, Python, and Spark to develop a range of models and algorithms, addressing various analytical needs within the project.
Maintained data integrity through thorough checks, effective data cleaning, exploratory analysis, and feature engineering, using both R and Python to uphold high data quality standards.
EDUCATION
Master of Science in Information Technology
Carnegie Mellon University – 2012