Electrical and Computer Engineering
http://hdl.handle.net/10012/9908
2024-03-28T08:55:46Z

A Centralized System Performance Monitoring Infrastructure
Mohammed Sajjad Jafri, Mohammed Sajjad Jafri
http://hdl.handle.net/10012/20403
2024-03-23T02:30:55Z
2024-03-22T00:00:00Z
In this thesis, we introduce a centralized performance monitoring infrastructure. Performance monitoring architectures are becoming increasingly important for both academic and industrial applications. Performance counters reveal valuable insight into the functioning of the platform, and this information can be exploited for debugging applications, improving performance, identifying bottlenecks, and more. In our proposed infrastructure, we envision a configurable Advanced Performance Monitoring Unit (APMU) connected to a set of monitoring Event Units (EVUs) that are installed in various hardware system IPs across the platform. These EVUs send hardware event information to the APMU. The APMU has smart counters that are capable of operating on the incoming events, and an instruction processor that can implement any desired software mechanisms on the counter data. Our design allows for efficient collection and correlation of event data, giving the APMU a more holistic view of system behaviour and revealing microarchitecture-specific information. We intend for users to be able to develop EVUs for the IPs relevant to them. For instance, in the implementation phase of this work, we developed an AXI4-based Snooping Unit as a concrete example of a custom EVU. To help integrate such custom EVUs with an APMU infrastructure, we also standardize an EVU-APMU interface. We provide the specification for this interface, ensuring that users can connect any custom EVU to an APMU, as long as both abide by the interface specification.
In this work, we implement two design IPs. One is the previously mentioned AXI4-based Snooping Unit and the other is a RISC-V compliant APMU. We also provide a software stack to support programming on its processor. The implemented design is emulated on an AMD Virtex UltraScale+ FPGA VCU118 device. To evaluate the implementation of our design, we present the hardware synthesis results for the FPGA, and the execution results of a latency-based regulation case study, demonstrating the functionality of our design.
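As a rough illustration of the kind of software mechanism an APMU processor could run over counter data, the sketch below models a latency-based regulation decision in Python. It is not the thesis's implementation: the `Event` record, the `LatencyCounter` class, the `sources_to_throttle` function, and the threshold value are all hypothetical names and assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical event record that an EVU might send to the APMU."""
    source_id: int       # which monitored IP produced the event
    latency_cycles: int  # observed transaction latency, in cycles


class LatencyCounter:
    """Hypothetical 'smart counter': tracks event count and total latency per source."""

    def __init__(self):
        self.count = {}
        self.total = {}

    def record(self, event):
        # Accumulate one event into the per-source count and latency sum.
        self.count[event.source_id] = self.count.get(event.source_id, 0) + 1
        self.total[event.source_id] = (
            self.total.get(event.source_id, 0) + event.latency_cycles
        )

    def mean_latency(self, source_id):
        # Mean latency for a source, or 0.0 if no events were recorded for it.
        n = self.count.get(source_id, 0)
        return self.total.get(source_id, 0) / n if n else 0.0


def sources_to_throttle(counter, victim_id, threshold_cycles):
    """Assumed policy: if the victim's mean latency exceeds the threshold,
    return every other observed source as a candidate for throttling."""
    if counter.mean_latency(victim_id) <= threshold_cycles:
        return []
    return sorted(s for s in counter.count if s != victim_id)


counter = LatencyCounter()
for ev in [Event(0, 120), Event(0, 180), Event(1, 40), Event(2, 60)]:
    counter.record(ev)

print(sources_to_throttle(counter, victim_id=0, threshold_cycles=100))
```

In this toy run, source 0 averages 150 cycles against a 100-cycle threshold, so sources 1 and 2 are flagged. A real regulation policy would depend on the platform and on the EVU-APMU interface specification.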
Design of practical computer vision system with real-time object detection capability
chen, guanyu
http://hdl.handle.net/10012/20385
2024-03-08T03:31:01Z
2024-03-07T00:00:00Z
Computer vision nowadays relies heavily on machine learning techniques to interpret
useful information from images or videos. Object detection is one such computer vision
technique for identifying and locating objects in images. This type of application is of
great interest for its potential use in various fields including product inspection, analysis,
security, etc.
Object recognition, another important computer vision technique that identifies the
objects in an image, was accomplished earlier. Classic models including LeNet and
VGG16 had already adopted CNN-like architectures. In comparison, an object detection
model not only identifies objects but also labels each detected object with a bounding
box. Given ground-truth labels for both object class and bounding-box coordinates,
object detection models can be trained in the usual supervised manner to make both
predictions. Several families of object detection models exist. In the R-CNN family,
the Region Proposal Network (RPN), introduced with Faster R-CNN, produces region
proposals: rectangular regions of the image in which a target object may be present.
YOLO divides the input image into a grid and predicts the bounding boxes and class
confidences simultaneously for each grid cell. SSD is similar to YOLO but improves
accuracy by using features at different scales. As a result of improved hardware
performance and innovative network architectures in recent years, real-time object
detection has become possible with both satisfactory speed and accuracy.
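To make the grid idea concrete, the following minimal sketch shows how a ground-truth box can be mapped to a grid cell, assuming the convention of the original YOLO formulation in which the cell containing the box centre is responsible for that object. The function name `grid_cell_for_box` and the 448-pixel image with a 7-by-7 grid are illustrative assumptions, not details taken from the thesis.

```python
def grid_cell_for_box(box, image_w, image_h, grid_size):
    """Return the (row, col) of the grid cell that contains the box centre.

    box is (x_min, y_min, x_max, y_max) in pixels.
    """
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2.0
    cy = (y_min + y_max) / 2.0
    # Scale the centre to grid units; clamp so a centre on the far edge
    # still falls in the last cell.
    col = min(int(cx / image_w * grid_size), grid_size - 1)
    row = min(int(cy / image_h * grid_size), grid_size - 1)
    return row, col


# Box centre is (200, 300) in a 448x448 image split into a 7x7 grid.
print(grid_cell_for_box((100, 200, 300, 400), image_w=448, image_h=448, grid_size=7))
```

Each cell would then predict box offsets and class confidences for the objects assigned to it; that prediction head is omitted here.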
The goal of this thesis is to implement a real-time object detection system based on
some of the already published models, with the Proposal Connection Network (PCN)
discussed in more detail. In simple terms, PCN is a two-stage, anchor-free object
detection model with unique advantages. The demonstration of system design and setup
is followed by the training and experimental processes, which focus primarily on
performance analysis and comparison among models.
Robust Sonographic Muscle Quality Assessment: A Live, Accurate Tissue Speed-of-Sound Estimation Framework
Xiao, Di
http://hdl.handle.net/10012/20365
2024-02-24T03:31:11Z
2024-02-23T00:00:00Z
Muscle quality can act as an indicator for physical health through qualitative – yet measurable – changes in muscle architecture and composition. One option for assessing muscle quality is through musculoskeletal ultrasound, which can provide metrics such as echogenicity or texture. These existing ultrasound-based metrics relating to muscle quality may depend on scanner capabilities and operator assessment of the resulting B-mode images. Tissue speed-of-sound (SoS) can robustly augment muscle quality assessment as an intrinsic property of the tissue; however, sonographic measurement of muscle SoS is frequently limited by hardware which requires bilateral access to the tissue, or which cannot perform simultaneous ultrasound imaging in real-time.
In this dissertation, I designed an ultrasound framework for live, accurate SoS estimation in vivo. A novel global SoS estimation algorithm with real-time potential was created using the principles of high-frame-rate ultrasound with a standard pulse-echo ultrasound probe. This algorithm was experimentally validated to be highly accurate both in vitro and in vivo with demonstrable live imaging capacity; a portable research scanner and laptop were used to realize the framework. In the process of framework design, the underlying engineering challenges of real-time data transfer rates from the probe to the system and image formation efficiency were addressed using deep learning principles and sparse matrix formulations, respectively. Lastly, the framework was used to conduct a human study consisting of forty volunteers. The study served to demonstrate the applicability of a live SoS system, establish a replicable experimental protocol for SoS measurement in large muscles, and investigate relationships between demographics and muscle SoS.
This dissertation research is intended to bridge the engineering innovations of ultrasound algorithm development with the clinical applications for SoS as a tissue biomarker. For the targeted muscle quality application, the results confirm the physiological relevance of muscle SoS and exhibit the utility of real-time SoS estimation with simultaneous B-mode imaging. The significant relationships between muscle SoS and demographic factors support the potential for clinical translation of such a real-time system. The realization of live SoS estimation can help derive new insights into the relationships between tissue SoS and pathological conditions or imaging applications.
Fair and Efficient Resource Scheduling in Heterogeneous Multi-Agent Systems
Omidi, Mohammadhadi
http://hdl.handle.net/10012/20362
2024-02-23T03:30:57Z
2024-02-22T00:00:00Z
The performance of machine-learning applications heavily relies on the choice of the underlying hardware architecture, encompassing factors such as computational power, scalability, memory, and storage capabilities. These hardware choices significantly impact the efficiency and effectiveness of machine-learning systems. Resource-intensive programs can lead to competition for system resources, causing delays, while inefficient resource usage can saturate resources and harm user experience. To address resource variation among applications, resource sharing is implemented, allowing applications to dynamically allocate resources as needed, promoting efficient resource utilization. However, resource-allocation strategies often prioritize performance, potentially overlooking fairness among users or applications, especially in shared environments. Balancing performance optimization and fair resource-allocation is a complex challenge, requiring mechanisms that encourage resource sharing, prevent envy, and ensure a fair distribution of resources. Incorporating these characteristics promotes collaboration, minimizes negative emotions, and prioritizes the well-being of all participants in the system.
This research introduces an innovative resource-allocation mechanism that addresses shortcomings in traditional methods. Our method prioritizes both fairness and efficiency in resource distribution, utilizing a token-based mechanism to ensure fairness and implementing individual preferences based on learned thresholds through an Actor-Critic method to improve efficiency. A computer simulation involving 40 accelerators and 20 agents in different environments demonstrates a performance improvement of 1.28× compared to standard approaches. This study contributes by shedding light on the complex challenges of resource allocation in heterogeneous systems and providing a practical solution with our approach.
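The abstract does not spell out the allocation rule, so the sketch below is only one possible reading of a token-based, threshold-gated round of allocation. Every detail is an assumption made for illustration: the `allocate` function, the per-agent `tokens`, `threshold`, and `gain` fields, the rule that the requester holding the most tokens wins, and the one-token cost. In the thesis the thresholds are learned with an Actor-Critic method; here they are fixed constants.

```python
def allocate(accelerators, agents):
    """Assign each accelerator to at most one agent for a single round.

    agents maps a name to {"tokens": int, "threshold": float, "gain": {acc: float}}.
    Returns a dict mapping accelerator -> winning agent name.
    """
    assignment = {}
    busy = set()
    for acc in accelerators:
        # An agent requests an accelerator only if it still has tokens, is not
        # already served this round, and its expected gain meets its threshold.
        requesters = [
            name
            for name, a in agents.items()
            if name not in busy
            and a["tokens"] > 0
            and a["gain"].get(acc, 0.0) >= a["threshold"]
        ]
        if not requesters:
            continue
        # Assumed fairness rule: the requester with the most tokens wins;
        # ties are broken alphabetically by name.
        winner = min(requesters, key=lambda n: (-agents[n]["tokens"], n))
        agents[winner]["tokens"] -= 1  # assumed cost: one token per grant
        busy.add(winner)
        assignment[acc] = winner
    return assignment


agents = {
    "a": {"tokens": 2, "threshold": 1.2, "gain": {"gpu0": 1.5, "gpu1": 1.1}},
    "b": {"tokens": 3, "threshold": 1.0, "gain": {"gpu0": 1.4, "gpu1": 1.3}},
}
print(allocate(["gpu0", "gpu1"], agents))  # round 1
print(allocate(["gpu0", "gpu1"], agents))  # round 2, after tokens were spent
```

Because winning costs a token, an agent that was served in one round is less likely to win a contested accelerator in the next, which is the intuition behind using tokens for fairness. The thresholds keep agents from requesting accelerators that would bring them little benefit.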