With the emergence of the Internet of Things (IoT), devices will generate massive data streams demanding services that pose significant technical challenges given limited device resources. Furthermore, IoT systems increasingly need to run complex and energy-intensive Machine Learning (ML) algorithms, yet they lack the resources to run many state-of-the-art ML models and instead send their data to the cloud for processing. This results in weaker security, slower-moving data, and energy-intensive data centers. To achieve real-time learning in
IoT systems, we need to redesign the algorithms themselves using strategies that more closely
model the ultimate efficient learning machine: the human brain.
This dissertation focuses on increasing the computing efficiency of machine learning on IoT devices through the application of Hyperdimensional Computing (HDC). HDC mimics several desirable properties of the human brain, including robustness to noise, robustness to hardware failures, and single-pass learning, in which training completes in a single pass over the data without storing the training data points or relying on complex gradient-based algorithms. These features make HDC a promising solution for today's embedded devices, which have limited storage, battery, and compute resources and are prone to noise and variability.
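For readers unfamiliar with HDC, the following minimal Python/NumPy sketch illustrates this single-pass training style: samples are encoded into high-dimensional hypervectors, bundled (summed) into per-class hypervectors, and classified by similarity. The random-projection encoder and all names here are illustrative assumptions, not the specific algorithms developed in this dissertation.

```python
import numpy as np

class HDClassifier:
    """Minimal HDC sketch: random-projection encoding plus single-pass bundling."""

    def __init__(self, n_features, n_classes, dim=10_000, seed=0):
        rng = np.random.default_rng(seed)
        # Random bipolar projection used to encode raw features into hypervectors.
        self.proj = rng.choice([-1.0, 1.0], size=(n_features, dim))
        self.class_hvs = np.zeros((n_classes, dim))

    def encode(self, x):
        # Map a feature vector to a bipolar hypervector of dimensionality `dim`.
        return np.sign(x @ self.proj)

    def fit(self, X, y):
        # Single-pass training: bundle (add) each encoded sample into its class hypervector.
        for x, label in zip(X, y):
            self.class_hvs[label] += self.encode(x)
        return self

    def predict(self, X):
        # Classify by highest cosine similarity to the class hypervectors.
        H = np.stack([self.encode(x) for x in X])
        norms = np.linalg.norm(self.class_hvs, axis=1, keepdims=True) + 1e-12
        return (H @ (self.class_hvs / norms).T).argmax(axis=1)
```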
Research in the HDC field has targeted improving these key features of HDC and expanding them with new ones. There are five main paths in HDC research: (1) algorithmic changes for faster and more energy-efficient learning, (2) novel architectures to accelerate HDC, usually targeting low-power IoT devices, (3) extending HDC applications beyond classification, (4) exploiting the robustness of HDC for more efficient and faster inference, and (5) HDC theory and its connections to neuroscience and mathematics. This dissertation contributes to four of these research paths.
Our contributions include: (1) We introduce the first adaptive bitwidth model for HDC. In this work, we propose a new quantization method, and during inference we iterate through the bits along all dimensions, accumulating the Hamming distance. At each iteration, we check whether the current Hamming distance passes a similarity threshold; if it does, we terminate execution early to save energy and time.
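To make this early-termination idea concrete, the sketch below accumulates the Hamming distance one bit position at a time and stops as soon as the running similarity of the leading class crosses a confidence threshold. The bit-plane layout, the threshold rule, and all names are illustrative assumptions rather than the exact formulation proposed in this work.

```python
import numpy as np

def adaptive_bitwidth_predict(query_bits, class_bits, threshold=0.85):
    """Early-terminating HDC inference over bit positions (illustrative sketch).

    query_bits: (n_bits, dim) boolean array, one row per bit position of the
                quantized query hypervector.
    class_bits: (n_classes, n_bits, dim) boolean array for the class models.
    threshold:  normalized similarity at which inference stops early.
    """
    n_classes, n_bits, dim = class_bits.shape
    hamming = np.zeros(n_classes, dtype=np.int64)

    for b in range(n_bits):
        # Accumulate the Hamming distance contributed by bit position b.
        hamming += np.count_nonzero(class_bits[:, b, :] != query_bits[b], axis=1)

        # Normalized similarity of the current best class over the bits seen so far.
        bits_seen = (b + 1) * dim
        best = hamming.argmin()
        similarity = 1.0 - hamming[best] / bits_seen

        if similarity >= threshold:
            # Confident enough: terminate early to save energy and latency.
            return best, b + 1

    return hamming.argmin(), n_bits
```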
(2) We redesign the entire HDC pipeline with a locality-based encoding, quantized retraining, and online dimension reduction during inference, all accelerated by a novel FPGA design. In this work, our locality-based encoding removes random memory accesses from HDC encoding and adds sparsity for greater efficiency. We also introduce a general method to quantize the model to any desired bitwidth. Finally, we propose a method to identify insignificant dimensions in the HDC model and remove them for greater energy efficiency during inference.
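As a hedged illustration of the dimension-reduction step only, the sketch below scores each dimension by how much it varies across the trained class hypervectors and masks out the lowest-scoring ones before inference. The variance-based score and the drop ratio are assumptions made for illustration, not the criterion used in this work.

```python
import numpy as np

def drop_insignificant_dimensions(class_hvs, drop_ratio=0.25):
    """Select dimensions that contribute most to class separation.

    class_hvs:  (n_classes, dim) trained class hypervectors.
    drop_ratio: fraction of dimensions to remove.
    Returns the sorted indices of the dimensions that are kept.
    """
    # Dimensions where all classes look alike carry little discriminative
    # information; score each dimension by its variance across classes.
    scores = class_hvs.var(axis=0)
    n_keep = int(class_hvs.shape[1] * (1.0 - drop_ratio))
    keep = np.argsort(scores)[-n_keep:]
    return np.sort(keep)

# During inference, both the query and the class hypervectors are indexed with
# the kept dimensions, so fewer values are fetched and compared.
```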
(3) We extend HDC to support multi-label classification. We perform multi-label classification by creating a binary classification model for each label; upon inference, each model independently determines whether its label is present. This differs from prior work, which reduced the problem to single-label classification over the power set of the labels, an approach with which HDC scales poorly.
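A minimal sketch of this per-label formulation is shown below; the shared random-projection encoder, the decision by cosine similarity, and all names are illustrative assumptions rather than the exact method developed in this work.

```python
import numpy as np

def train_multilabel_hdc(X, Y, dim=10_000, seed=0):
    """Train one independent binary HDC model per label (illustrative sketch).

    X: (n_samples, n_features) real-valued features.
    Y: (n_samples, n_labels) binary label matrix.
    Returns the shared random projection and per-label (absent, present) hypervectors.
    """
    rng = np.random.default_rng(seed)
    proj = rng.choice([-1.0, 1.0], size=(X.shape[1], dim))  # shared encoder
    H = np.sign(X @ proj)                                   # encoded training samples
    n_labels = Y.shape[1]
    models = np.zeros((n_labels, 2, dim))
    for i in range(n_labels):
        present = Y[:, i] == 1
        models[i, 1] = H[present].sum(axis=0)   # bundle of "label present" samples
        models[i, 0] = H[~present].sum(axis=0)  # bundle of "label absent" samples
    return proj, models

def predict_multilabel_hdc(proj, models, X):
    # Each label is decided independently: compare the encoded query against that
    # label's "absent" (index 0) and "present" (index 1) hypervectors.
    H = np.sign(X @ proj)
    normed = models / (np.linalg.norm(models, axis=2, keepdims=True) + 1e-12)
    sims = np.einsum('nd,lcd->nlc', H, normed)  # (n_samples, n_labels, 2)
    return sims.argmax(axis=2)                  # 1 where a label is predicted present
```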
(4) Finally, we experimentally evaluate the robustness of HDC for the first time and create a new analog processing-in-memory (PIM) architecture with reduced-precision analog-to-digital converters (ADCs) that exploits this robustness. We test HDC robustness in a federated learning environment in which edge devices wirelessly send encoded hypervectors to a central server. We evaluate the impact of wireless transmission errors on this data and show that HDC is 48× more robust than other classifiers. We then leverage this robustness to create a more efficient analog PIM circuit by reducing the bitwidth of the ADCs.
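The sketch below shows one way such a robustness experiment can be simulated in software: random bit flips, standing in for wireless transmission errors, are injected into binarized hypervectors before nearest-class inference, and accuracy is compared against the error-free baseline. The i.i.d. flip model and all names are illustrative assumptions, not the exact experimental setup of this work.

```python
import numpy as np

def flip_bits(hv_bits, error_rate, rng):
    """Inject i.i.d. bit flips, standing in for wireless transmission errors."""
    flips = rng.random(hv_bits.shape) < error_rate
    return hv_bits ^ flips

def accuracy_under_errors(query_bits, class_bits, labels, error_rate, rng=None):
    """Classify noisy binary hypervectors by minimum Hamming distance.

    query_bits: (n_samples, dim) boolean encoded test hypervectors.
    class_bits: (n_classes, dim) boolean class hypervectors.
    labels:     (n_samples,) ground-truth labels.
    """
    rng = rng or np.random.default_rng(0)
    noisy = flip_bits(query_bits, error_rate, rng)
    # Hamming distance of each noisy query to each class hypervector.
    dists = (noisy[:, None, :] != class_bits[None, :, :]).sum(axis=2)
    return float((dists.argmin(axis=1) == labels).mean())

# Sweeping error_rate upward from zero shows how gracefully accuracy degrades,
# the same property that tolerates reduced-precision ADC readout in analog PIM.
```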