Publications
Large pages have been the de facto mitigation technique for the translation overheads of virtual memory, with prior work mostly focusing on the large page sizes supported by the x86 architecture, i.e., 2MiB and 1GiB. ARMv8-A and RISC-V support additional intermediate translation sizes, i.e., 64KiB and 32MiB, via OS-assisted TLB coalescing, but their performance potential has largely fallen under the radar due to limited system software support. In this paper, we propose Elastic Translations (ET), a holistic memory management solution that fully explores and exploits the aforementioned translation sizes for both native and virtualized execution. ET implements mechanisms that make the OS memory manager coalescing-aware, enabling the transparent and efficient use of intermediate-sized translations. ET also employs policies to guide translation size selection at runtime using lightweight HW-assisted TLB miss sampling. We design and implement ET for ARMv8-A in Linux and KVM. Our real-system evaluation shows that ET improves the performance of memory-intensive workloads by up to 39% in native execution and by 30% on average in virtualized execution.
Published at:
57th IEEE/ACM International Symposium on Microarchitecture (MICRO’24)
With the proliferation of Serverless Computing, the Function-as-a-Service (FaaS) paradigm is nowadays ubiquitous. As a result, the domain has attracted extensive research, both in industry and academia, identifying opportunities and addressing limitations across all aspects of this new Cloud paradigm. Recently, FaaS providers have released production workload traces of their commercial platforms. These expose important characteristics, such as the execution time of function invocations, their number, and the distribution of their inter-arrival times, which must be taken into account for a concrete evaluation of innovative solutions. Nevertheless, the Serverless ecosystem still lacks a unified evaluation methodology based on such information. In this paper, we attempt to fill this gap by developing a methodology for fitting existing, real, open-source workloads found in FaaS benchmarking suites to production FaaS workload traces, in a way that sufficiently preserves the aforementioned core statistical properties of such traces. Based on this, we build FaaSRail, an open-source load generator that receives a target maximum request rate and a target total execution duration as inputs from the user and generates representative, scaled-down FaaS load.
Published at:
33rd International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2024
The Standard Portable Intermediate Representation (SPIR-V) is a low-level binary format designed for representing shaders and compute kernels, consumed by OpenCL for compute and by Vulkan for graphics rendering. As a binary representation, SPIR-V is meant to be produced by compilers and runtime systems, which is usually done by C/C++ programs and the LLVM software and compiler ecosystem. However, not all programming environments, runtime systems, and language implementations are written in C/C++ or based on LLVM.
This paper presents the Beehive SPIR-V Toolkit; a framework that can automatically generate a Java composable and functional library for dynamically building SPIR-V binary modules. The Beehive SPIR-V Toolkit can be used by optimizing compilers and runtime systems to generate and validate SPIR-V binary modules from managed runtime systems. Furthermore, our framework is architected to accommodate new SPIR-V releases in an easy-to-maintain manner, and it facilitates the automatic generation of Java libraries for other standards besides SPIR-V. The Beehive SPIR-V Toolkit also includes an assembler that emits SPIR-V binary modules from disassembled SPIR-V text files, and a disassembler that converts SPIR-V binary code into a text file. To the best of our knowledge, the Beehive SPIR-V Toolkit is the first Java programming framework that can dynamically generate SPIR-V binary modules.
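What makes dynamic generation from Java practical is that a SPIR-V module is a simple little-endian word stream. The following sketch is purely illustrative and does not use the Beehive SPIR-V Toolkit's actual API (the class and method names are hypothetical); it emits the mandatory five-word SPIR-V header followed by a single OpCapability instruction, as defined by the SPIR-V specification:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch of emitting a SPIR-V module prefix as raw words.
public class SpirvHeaderSketch {
    static final int MAGIC = 0x07230203;        // SPIR-V magic number
    static final int VERSION_1_3 = 0x00010300;  // major/minor packed into bytes 2 and 1

    static ByteBuffer emitModulePrefix() {
        ByteBuffer buf = ByteBuffer.allocate(7 * 4).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(MAGIC);
        buf.putInt(VERSION_1_3);
        buf.putInt(0);              // generator magic (0 = unregistered tool)
        buf.putInt(1);              // bound: all result <id>s must be below this value
        buf.putInt(0);              // reserved schema word
        // Every instruction starts with (wordCount << 16) | opcode.
        buf.putInt((2 << 16) | 17); // OpCapability is opcode 17, 2 words long
        buf.putInt(1);              // Capability operand: Shader
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer module = emitModulePrefix();
        System.out.printf("first word: 0x%08X%n", module.getInt(0));
    }
}
```

The magic number in the first word is how consumers such as Vulkan and OpenCL detect both the format and the endianness of the stream.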
Published at:
2023 Workshop on Virtual Machines and Language Implementations (VMIL 2023)
Adopting heterogeneous execution on GPUs and FPGAs in managed runtime systems, such as Java, is a challenging task due to the complexities of the underlying virtual machine. The majority of current work has focused on compiler toolchains to solve the challenge of transparent just-in-time compilation of different code segments onto accelerators. However, apart from providing automatic code generation, another challenge is seamless interoperability with the host memory manager and the Garbage Collector (GC). Currently, heterogeneous programming models on top of managed runtime systems, such as Aparapi and TornadoVM, need to block the GC when running native code (e.g., JNI code) in order to prevent the GC from moving data while the native code is still running on the hardware accelerator.
To tackle this challenge, this paper proposes a novel Unified Memory (UM) allocator for heterogeneous programming frameworks on managed runtime systems. We show how, by making small changes to a Java runtime system, automatic memory management can be enhanced to perform object reclamation not only on the host but also on the device. This is done by allocating the Java Virtual Machine's object heap in unified memory, which is visible to all hardware accelerators. In this manner, we enable transparent page migration of Java heap-allocated objects between the host and the accelerator, since our UM system is aware of pointers and object migration due to GC collections. This technique has been implemented in the context of MaxineVM, an open-source research VM for Java written in Java. We evaluated our approach on a discrete and an integrated GPU, showcasing under which conditions UM can benefit execution across different benchmarks and configurations. Our results indicate that when hardware acceleration is not employed, UM does not pose significant overheads unless memory-intensive workloads are encountered, which can exhibit up to 12% (worst case) and 2% (average) slowdowns. In addition, if hardware acceleration is used, UM can achieve up to 9.3x speedup compared to the non-UM baseline implementation.
Published at:
20th International Conference on Managed Programming Languages & Runtimes (MPLR'23)
Java benchmarking suites like DaCapo and Renaissance are employed by the research community to evaluate the performance of novel features in managed runtime systems. These suites encompass various applications with diverse behaviors in order to stress-test different subsystems of a managed runtime. Therefore, understanding and characterizing the behavior of these benchmarks is important when interpreting experimental results.
This paper presents an in-depth study of the memory behavior of 30 DaCapo and Renaissance applications. To realize the study, a characterization methodology based on a two-faceted profiling process of the Java applications is employed. The two-faceted profiling offers comprehensive insights into the memory behavior of Java applications, as it is composed of high-level and low-level metrics obtained through a Java object profiler (NUMAProfiler) and a microarchitectural event profiler (PerfUtil) of MaxineVM, respectively. Using this profiling methodology, we classify the DaCapo and Renaissance applications regarding their intensity in object allocations, object accesses, LLC pressure, and main memory pressure. In addition, several other aspects, such as the JVM's impact on the memory behavior of the application, are discussed.
Published at:
20th International Conference on Managed Programming Languages & Runtimes (MPLR'23)
The Intrusion Detection System (IDS) is an effective tool utilized in cybersecurity systems to detect and identify intrusion attacks. With the increasing volume of data generation, the possibility of various forms of intrusion attacks also increases. Feature selection is crucial and often necessary to enhance performance. The structure of the dataset can impact the efficiency of the machine learning model. Furthermore, data imbalance can pose a problem, but sampling approaches can help mitigate it. This research aims to explore machine learning (ML) approaches for IDS, specifically focusing on datasets, ML algorithms, and metrics. Three datasets were utilized in this study: KDD 99, UNSW-NB15, and CSE-CIC-IDS 2018. Various machine learning algorithms were chosen and examined to assess IDS performance. The primary objective was to provide a taxonomy for interconnected intrusion detection systems and supervised machine learning algorithms. The selection of datasets is crucial to ensure the suitability of the model construction for IDS usage. The evaluation was conducted for both binary and multi-class classification to ensure the consistency of the selected ML algorithms for the given dataset. The experimental results demonstrated accuracy rates of 100% for binary classification and 99.4% for multi-class classification. In conclusion, supervised machine learning algorithms exhibit high and promising classification performance based on the study of three popular datasets.
Published at:
Applied Sciences Journal
The address translation (AT) overhead has been widely studied in the literature, and the new 5-level paging is expected to make translation even costlier. Multiple solutions have been proposed to alleviate the issue, either by reducing the number of TLB misses or by reducing their overhead. The solution widely adopted by industry involves extending the page sizes supported by the hardware and software, with the most common being 2MB and 1GB. We evaluate the usefulness of intermediate translation sizes, using memory-intensive workloads running on an ARMv8-A server.
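The appeal of larger and intermediate page sizes follows from a back-of-the-envelope "TLB reach" calculation: the memory covered by cached translations is simply the number of TLB entries multiplied by the page size. A minimal sketch of that arithmetic, using a purely illustrative 1024-entry TLB:

```java
// Illustrative TLB-reach arithmetic; the 1024-entry TLB size is an assumption,
// not a figure from the paper.
public class TlbReach {
    // Reach = entries * page size: the memory whose translations fit in the TLB.
    static long reachBytes(int entries, long pageSizeBytes) {
        return entries * pageSizeBytes;
    }

    public static void main(String[] args) {
        long[] pageSizes = {4L << 10, 64L << 10, 2L << 20, 32L << 20, 1L << 30};
        String[] names = {"4KiB", "64KiB", "2MiB", "32MiB", "1GiB"};
        for (int i = 0; i < pageSizes.length; i++) {
            System.out.printf("%-6s pages, 1024 entries -> reach %d MiB%n",
                    names[i], reachBytes(1024, pageSizes[i]) >> 20);
        }
    }
}
```

With 4KiB pages the example TLB covers only 4MiB, while 2MiB pages already cover 2GiB; the intermediate 64KiB and 32MiB sizes fill the large gaps between the classic page sizes.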
Published at:
18th European Conference on Computer Systems (EuroSys 2023)
In this talk, we will present the newly funded EU project AERO (Accelerated EU Cloud), whose mission is to bring up and optimize the software stack of cloud deployments on top of the EU processor. After providing an overview of the AERO project, we will expand on two main components of the software stack that enable seamless acceleration of various programming languages on RISC-V architectures: namely, ComputeAorta, which enables the generation of RISC-V vector instructions from SPIR-V binary modules, and TornadoVM, which enables transparent hardware acceleration of managed applications. Finally, we will describe how the ongoing integration of ComputeAorta and TornadoVM will enable a plethora of applications from managed languages to harness RISC-V auto-vectorization completely transparently to developers.
Published at:
RISC-V Summit Europe 2023, 5-9 June 2023, Barcelona, Spain
We advocate for, and present the initial design of, FaaSCell, an intra-node orchestrator for serverless functions. It aims to enable single-node resource management and performance studies, while remaining compatible with the distributed software stack of FaaS. FaaSCell could potentially be integrated with Kubernetes and its ecosystem, or with any other upper-layer component or platform used for cluster-wide orchestration of FaaS deployments.
Published at:
1st Workshop on SErverless Systems, Applications and MEthodologies (SESAME 2023)
Scaling up the performance of managed applications on Non-Uniform Memory Access (NUMA) architectures has been a challenging task, as it requires a good understanding of the underlying architecture and of managed runtime environments (MREs). Prior work has studied this problem from the scope of specific components of managed runtimes, such as the Garbage Collector, as a means to increase NUMA awareness in MREs.
In this paper, we follow a different approach that complements prior work by studying the behavior of managed applications on NUMA architectures during mutation time. First, we perform a characterization study that classifies several DaCapo and Renaissance applications per their scalability-critical properties. Based on this study, we propose a novel lightweight mechanism in MREs for optimizing the scalability of managed applications on NUMA systems in an application-agnostic way. Our experimental results show that the proposed mechanism achieves relative performance ranging from 0.66x up to 3.29x, with a geometric mean of 1.11x, against a NUMA-agnostic execution.
Published at:
2023 ACM SIGPLAN International Symposium on Memory Management (ISMM 2023), June 2023 Orlando, Florida, United States
In recent years, the Java Virtual Machine has evolved from a cross-ISA virtualization layer to a system that can also offer multilingual support. GraalVM paved the way for the interoperability of Java with other programming languages, such as JavaScript, Python, R, and even C++, which can run on top of the Truffle framework in a unified manner. Additionally, there have been numerous academic and industrial endeavors to bridge the gap between the JVM and modern heterogeneous hardware resources. All these efforts beckon the opportunity to use the JVM as a unified system that enables interoperability between multiple programming languages and multiple heterogeneous hardware resources.
In this paper, we focus on the interoperability of code that accelerates applications on heterogeneous hardware with multiple programming languages. To realize this concept, we employ TornadoVM, state-of-the-art software that enables various JDK distributions to exploit hardware acceleration. Although TornadoVM can transparently generate heterogeneous code at runtime, several challenges hinder the portability of the generated code to other programming languages and systems. Therefore, we analyze these challenges and propose a set of modifications at the compiler and runtime levels to establish Java as a prototyping language for the generation of heterogeneous code that can be used by other programming languages and systems.
Published at:
7th MoreVMs workshop (MoreVMs'23)