Perceptum Subscribe
SUN MAY 03
Weekly issue

Perceptum's weekly briefing.

Top 25 recent arXiv papers, ranked and summarized for a fast weekly scan.

2604.26624v1 · Apr 29 · Distributed Systems

DMRlib: Easy-coding and Efficient Resource Management for Job Malleability

Sergio Iserte, Rafael Mayo, Enrique S. Quintana-Ortí, Antonio J. Peña

A new software library makes it much easier to write supercomputer programs that can grow or shrink while running, boosting overall job throughput by more than 3x.

Why it matters

Data centers and HPC clusters are expensive, power-hungry, and almost always under contention. If jobs could elastically resize while running, the scheduler could pack work more tightly, finish more jobs per hour, and even save energy. The catch has always been that rewriting scientific code to support resizing is painful. This library lowers that barrier so more applications can benefit.

Method

  • — Builds a library called DMRlib that sits on top of MPI, the standard messaging system for HPC programs.
  • — Exposes a small, MPI-style set of calls so programmers add only a few lines to mark where their app can safely resize (a sketch of this pattern follows the list).
  • — Ships ready-made communication patterns for redistributing data when the number of processes changes, so users don't write that plumbing themselves.
  • — Works with the cluster's job scheduler to actually expand or shrink running jobs when resources free up or get tight.
  • — Tests two job-submission styles: rigid (user fixes the size) and moldable (scheduler picks the size at launch).
  • — Compares against traditional non-resizable workloads using metrics like jobs finished per second, how well processors are kept busy, and energy consumed.
  • — Demonstrates the approach on applications with different scaling behaviors, including iterative solvers and N-body-style simulations.
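
To make the "few added lines" claim concrete, here is a minimal sketch of the malleability-point pattern that a library like DMRlib automates, written with mpi4py. The names dmr_check_reconfig and redistribute_block are hypothetical stand-ins: the abstract describes an MPI-like syntax but does not show the actual API.

```python
"""Sketch of a malleability point in an iterative MPI code (mpi4py).
dmr_check_reconfig and redistribute_block are hypothetical stand-ins
illustrating the pattern, not DMRlib's real API."""
from mpi4py import MPI
import numpy as np

CHECK_PERIOD = 10  # iterations between safe resize points

def dmr_check_reconfig(comm):
    """Stand-in for asking the resource manager whether to resize.
    A real runtime would spawn or drain ranks; this stub never resizes."""
    return None  # None means: keep the current communicator

def redistribute_block(local, old_comm, new_comm):
    """Naive block redistribution: gather everywhere, re-split, take our chunk."""
    full = np.concatenate(old_comm.allgather(local))
    return np.array_split(full, new_comm.size)[new_comm.rank]

def iterate(comm, local, n_iters=100):
    for it in range(n_iters):
        local = 0.99 * local + 1e-6 * comm.allreduce(local.sum())  # dummy solver step
        if it % CHECK_PERIOD == 0:            # safe point between iterations
            new_comm = dmr_check_reconfig(comm)
            if new_comm is not None:
                local = redistribute_block(local, comm, new_comm)
                comm = new_comm
    return local

if __name__ == "__main__":
    comm = MPI.COMM_WORLD
    result = iterate(comm, np.ones(8) * comm.rank)
    print(f"rank {comm.rank}: checksum {result.sum():.4f}")
```

The library's pitch, per the abstract, is that the redistribution plumbing and the scheduler handshake are provided, so the application author writes only the few lines inside the loop.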

Result

Across the tested workload scenarios, letting jobs resize while running improved overall throughput by more than 3x compared to standard non-resizable workloads. The cluster also kept processors busier and showed favorable energy-use patterns, with the biggest wins appearing when incoming jobs were submitted in flexible (moldable) form rather than fully rigid. The library let researchers add malleability with a small, familiar-looking code change rather than a heavy rewrite.

Caveats

The 3x figure is from controlled experiments and depends on workload mix; real production traces may show smaller gains. The approach still requires the scheduler, MPI runtime, and application to all cooperate, which is not standard in most clusters today. Apps with awkward data layouts or tight coupling may not slot into the predefined communication patterns. The paper does not claim it works seamlessly for arbitrary code, and energy savings can hinge on hardware specifics. Wider adoption would need integration with mainstream schedulers like Slurm and more diverse benchmarks.

Builds on

  • Iserte et al, 2018

    The same group's earlier DMR API work introduced the underlying mechanism for turning MPI apps malleable. DMRlib repackages and simplifies that into a higher-level, easier-to-use library with predefined patterns.

  • Comprés et al, 2016

    Proposed MPI infrastructure and API extensions for elastic execution. DMRlib targets the same goal but emphasizes minimal code changes and ready-made redistribution patterns rather than new MPI primitives.

  • El Maghraoui et al, 2007

    Early work on dynamic malleability for iterative MPI applications. DMRlib generalizes beyond iterative apps and packages the redistribution logic for reuse.

  • Lemarinier et al, 2016

    Architected malleable MPI apps for priority-driven scheduling. DMRlib shares the malleability goal but focuses on developer ergonomics and throughput/energy metrics.

DMRlib is a thin, MPI-style library that lets MPI applications become malleable with minimal code changes, and the authors show this can more than triple workload throughput versus rigid scheduling.

Why it matters

Cluster utilization studies (e.g., Lublin and Feitelson) consistently show that rigid allocation leaves significant idle capacity, and prior malleability research has shown 1.5-3x gains under favorable conditions. But adoption has been blocked by the engineering cost of writing resize-aware MPI code: handling MPI_Comm_spawn, redistributing distributed arrays, checkpoint/restart, and integrating with a scheduler that actually issues resize commands. DMRlib's contribution is less a new algorithm than a usability and integration layer that makes the existing performance argument practically accessible. For practitioners, the relevant question is whether the abstraction is general enough and whether scheduler integration is realistic in production environments like Slurm or PBS.

Method

  • — DMRlib is built on top of an earlier DMR API from the same group, exposing a minimalist, MPI-like syntax so programmers mark malleability points with a few directives rather than hand-coding spawn/redistribute logic.
  • — Provides a catalog of predefined communication patterns for data redistribution when process count changes (e.g., block, cyclic, replicated layouts typical in iterative solvers and stencil codes); a toy block-redistribution plan is sketched after this list.
  • — At reconfiguration points, the runtime coordinates with the resource manager: it can either expand the MPI communicator by spawning new ranks, or shrink it by draining ranks after redistributing their state.
  • — Supports both expand and shrink operations and is designed to handle iterative and non-iterative scientific applications, building on the authors' prior work extending malleability beyond iterative codes (Iserte et al., 2018).
  • — Evaluation uses synthetic workloads driven by Lublin-Feitelson-style models to generate realistic job streams, with two submission modes: rigid (fixed size, no resize) and moldable (scheduler chooses initial size, then malleable resizing during execution).
  • — Tested on representative HPC kernels including a conjugate-gradient-style iterative solver and an N-body-like simulation, covering different scalability profiles (strong-scaling-friendly vs. communication-bound).
  • — Metrics: resource allocation rate (effective utilization), completed jobs per second (throughput), and energy consumption.
  • — Comparisons are against non-malleable baselines under the same submission distributions, isolating the contribution of runtime resizing vs. scheduling flexibility.
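
To illustrate what a "predefined pattern" buys you, here is a small, self-contained sketch of computing a block-layout redistribution plan when the process count changes. It is a generic illustration of the pattern family, not DMRlib's code; the ranges it produces are exactly what an MPI_Alltoallv-style exchange would consume.

```python
"""Sketch: a block-layout redistribution plan when the process count
changes from p_old to p_new. Generic illustration, not DMRlib's code."""

def block_bounds(n, p, rank):
    """Half-open [lo, hi) row range owned by `rank` in a block layout."""
    base, rem = divmod(n, p)
    lo = rank * base + min(rank, rem)
    return lo, lo + base + (1 if rank < rem else 0)

def redistribution_plan(n, p_old, p_new):
    """For each (old_rank, new_rank) pair, the overlapping row range that
    old_rank must send to new_rank. Feeds naturally into MPI_Alltoallv."""
    plan = {}
    for old in range(p_old):
        olo, ohi = block_bounds(n, p_old, old)
        for new in range(p_new):
            nlo, nhi = block_bounds(n, p_new, new)
            lo, hi = max(olo, nlo), min(ohi, nhi)
            if lo < hi:
                plan[(old, new)] = (lo, hi)
    return plan

# Expanding from 2 to 3 processes over 10 rows:
print(redistribution_plan(10, 2, 3))
# {(0, 0): (0, 4), (0, 1): (4, 5), (1, 1): (5, 7), (1, 2): (7, 10)}
```

The point of shipping such patterns in a library is that this bookkeeping, and the matching message exchange, is the part application developers otherwise rewrite per code.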

Result

The headline number is a >3x improvement in global throughput (jobs completed per unit time) for the elastic configuration relative to a traditional non-malleable workload. The biggest gains appear when jobs are submitted in moldable form, because the scheduler then has freedom both at launch and during execution. Resource allocation rate is correspondingly higher, indicating that the cluster keeps more cores productively occupied. Energy results, while not articulated as a single headline figure in the abstract, follow throughput in the expected direction: completing more work in less wall time tends to reduce per-job energy even when instantaneous power is similar, and moldable+malleable scenarios outperform rigid ones on this axis. The applications studied span different scalability patterns, and the authors argue this strengthens the generality claim, though the absolute throughput uplift varies by application class.

Caveats

Several things deserve scrutiny. First, the 3x figure depends heavily on workload generation parameters and the chosen scheduling policy; rigid baselines can look artificially weak if the workload has many small, easily packable jobs. Second, the experiments are run with the authors' own resource manager integration, not a stock Slurm/PBS deployment, so portability claims need real-world validation. Third, malleability cost (data redistribution time, MPI dynamic process overhead, communicator reconstruction) is non-trivial and can dominate for short jobs or fine-grained resize events; the paper should ideally characterize the break-even resize frequency. Fourth, the predefined communication patterns are a productivity win precisely when an application's data layout matches one of them; codes with irregular meshes, AMR, or custom domain decompositions will still need bespoke redistribution. Fifth, fault tolerance during resize and interaction with checkpoint/restart are largely out of scope. Finally, MPI dynamic process management (spawn) has historically had uneven implementation quality across MPI distributions, which affects practical deployability.

Builds on

  • Iserte et al, 2018 (DMR API)

    DMR API provided the underlying runtime mechanism for turning MPI apps malleable, including scheduler coordination. DMRlib is the higher-level, MPI-style ergonomic wrapper that makes DMR usable with minimal code changes.

  • Comprés et al, 2016

    Proposed MPI runtime and API extensions for elastic execution. DMRlib pursues the same elasticity goal but at the library level over standard MPI, prioritizing developer effort over standardization.

  • El Maghraoui et al, 2009

    Demonstrated malleability for iterative MPI applications with custom reconfiguration logic. DMRlib generalizes the redistribution patterns and removes much of the per-application boilerplate.

  • Lemarinier et al, 2016

    Architected malleable MPI apps for priority-driven adaptive scheduling. DMRlib emphasizes ease of coding and reports throughput/energy under rigid vs. moldable submission rather than priority dynamics.

DMRlib provides an MPI-style abstraction layer over the authors' DMR runtime so MPI applications become malleable with localized code changes, and an empirical study quantifies when moldable-plus-malleable scheduling delivers >3x throughput over rigid baselines.

Why it matters

Process malleability has been studied since at least the late 1990s (Feitelson and Rudolph's taxonomy), and several systems — ReSHAPE, Charm++/AMPI, FLEX-MPI, the EuroMPI elastic-MPI proposals — have demonstrated significant gains. The gap between demonstrated potential and production deployment is almost entirely a software-engineering and integration gap: dynamic process management in MPI is fragile, data redistribution is application-specific, and resource managers don't speak a common protocol for resize. Work that genuinely lowers application-side cost is therefore high-leverage. DMRlib is also timely because power-capped and cloud-bursted HPC environments, where resource availability fluctuates, increasingly need elasticity — Sarood et al.'s overprovisioned/power-capped scenario and Iserte and Rojek's GPU energy study both motivate runtime resize as a power-management tool, not just a throughput tool. The paper is therefore worth reading as a usability and integration progress report on a long-standing research line.

Method

  • — Architecture: DMRlib wraps the previously published DMR API (Iserte et al., 2018) which mediates between the MPI runtime and the resource manager. DMRlib adds an MPI-style developer surface plus a catalog of canonical redistribution patterns.
  • — Programming model: malleability points are explicit (cooperative resize), not preemptive. The application reaches a safe point, calls into DMRlib, and the library handles communicator reconstruction, process spawn/drain, and data redistribution according to a declared pattern (block, block-cyclic, replicated, master-worker, etc.).
  • — Reconfiguration mechanics: expansion uses MPI_Comm_spawn (or equivalent dynamic process creation) to add ranks; shrink drains state from departing ranks via the chosen redistribution pattern before they exit. The new intracommunicator becomes the application's working communicator (see the mpi4py sketch after this list).
  • — Scheduler coupling: the resource manager pushes resize decisions; the application acknowledges at the next safe point. This decouples policy (scheduler) from mechanism (DMRlib) but means latency between decision and realization is workload-dependent.
  • — Workload generation: Lublin-Feitelson workload model (2003) supplies job arrival distributions and size characteristics, a standard choice for HPC scheduling experiments.
  • — Submission modes evaluated: rigid (user-fixed size, no resize), moldable (scheduler picks size at launch, fixed thereafter), and moldable+malleable (scheduler picks size at launch and may resize during execution). Malleable resizing of rigidly submitted jobs is also implicitly part of the comparison.
  • — Applications: a CG-style iterative linear solver (Hestenes-Stiefel lineage; Saad's iterative methods baseline), an N-body simulation (Aarseth-style gravitational), and an RNA-seq mapping kernel (Medina et al., 2016). These span communication-bound iterative, compute-bound irregular, and embarrassingly-parallel-with-IO regimes respectively.
  • — Metrics: resource allocation rate (fraction of cores doing useful work), completed jobs per second (system throughput), and energy consumption (likely from RAPL or node-level meters; abstract does not specify).
  • — Baselines: non-malleable execution under the same submission distributions, which is the right comparison for isolating the value of runtime elasticity.
  • — Novelty claims: minimalist MPI-like API surface, predefined communication patterns to amortize redistribution engineering across applications, and an integrated empirical characterization of when malleability pays off as a function of submission mode.
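
The expand path described above uses only standard MPI dynamic process management. The runnable mpi4py sketch below shows the raw primitives a library like DMRlib would wrap: spawning extra ranks and merging the resulting intercommunicator into a new working communicator. Scheduler coordination, error handling, and data redistribution are omitted.

```python
"""Expand-by-spawn sketch using standard MPI dynamic process management.
Launch with: mpiexec -n 2 python expand.py  (spawns 2 extra ranks)."""
import sys
from mpi4py import MPI

def main():
    parent = MPI.Comm.Get_parent()
    if parent == MPI.COMM_NULL:
        # Original ranks: collectively spawn two more copies of this script.
        inter = MPI.COMM_WORLD.Spawn(sys.executable, args=[__file__], maxprocs=2)
        high = False                      # original ranks come first after merge
    else:
        inter = parent                    # spawned ranks join via their parent
        high = True                       # spawned ranks are ordered last
    work = inter.Merge(high=high)         # one intracommunicator for everyone
    print(f"rank {work.rank} of {work.size} in the merged communicator")
    work.Free()

if __name__ == "__main__":
    main()
```

The cost of Spawn plus Merge, and of the subsequent data redistribution, is exactly the resize overhead the caveats below argue should be characterized.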

Result

>3x global throughput improvement is reported for the elastic approach versus traditional non-malleable workloads, with the largest gains in moldable submission scenarios where the scheduler has both initial sizing freedom and runtime resize freedom. Resource allocation rate increases correspondingly, consistent with malleability's role as a packing aid: when a small job finishes and frees a fragment of nodes, malleable jobs can absorb that fragment instead of letting it idle until a backfill match arrives. Energy consumption results follow throughput; with shorter makespans at similar instantaneous power, energy-per-job decreases. Across the three application archetypes, the iterative CG-like code benefits most cleanly because its data layout matches DMRlib's predefined patterns and resize cost amortizes over many iterations. The N-body case is more sensitive to redistribution overhead because of irregular access patterns. The RNA-seq kernel benefits from elasticity primarily through better packing rather than per-job speedup. The 3x figure should be read as an upper-bound-ish result for favorable workload mixes, not a universal expectation.

Caveats

Several methodological and engineering issues warrant scrutiny. (1) Workload sensitivity: throughput uplift from malleability is notoriously sensitive to job size distribution, arrival rate, and the rigid baseline's backfill policy (e.g., EASY vs. conservative). The abstract does not specify the baseline scheduler's backfill aggressiveness; a weak rigid baseline inflates the malleability win. A sweep over backfill configurations would strengthen the claim. (2) Resize cost characterization: there is no headline number for redistribution latency or MPI_Comm_spawn cost, which can be hundreds of milliseconds to seconds depending on the MPI implementation and fabric. The break-even analysis (minimum job duration for malleability to pay off) is the most important missing ablation. (3) Pattern coverage: the predefined redistribution patterns are a productivity win only when applications match them. AMR codes, unstructured-mesh solvers with custom partitioners (ParMETIS-managed), and codes with persistent one-sided windows or custom datatypes will still require bespoke redistribution logic, and the paper does not delineate the supported pattern set quantitatively. (4) MPI dynamic process management portability: MPI_Comm_spawn has uneven support and performance across Open MPI, MPICH, MVAPICH, and Cray MPI; the experiments likely ran on one stack, and portability claims need cross-implementation validation. (5) Resource manager integration: integration appears to be with the authors' research scheduler, not stock Slurm/PBS/LSF. Without a Slurm plugin or a PMIx-based path, production adoption is blocked. The PMIx ecosystem is the obvious target and is not discussed. (6) Fault tolerance: dynamic process count interacts non-trivially with checkpoint/restart, ULFM-style fault-tolerant MPI, and node failures during resize. The paper does not address robustness when a spawn fails or a drained rank's state cannot be redistributed. (7) Energy measurement methodology: the abstract claims energy benefits but does not state whether measurements come from RAPL, IPMI, or wall meters, nor whether DVFS was held constant. Energy claims without methodology are weak. (8) Comparison gaps: no head-to-head with Charm++/AMPI (Acun et al., 2014; Gupta et al., 2014), which is the most mature malleability runtime, or with FLEX-MPI (Martín et al., 2013/2015). A side-by-side on developer effort (LOC delta, time-to-malleable) would directly support the usability claim. (9) Statistical reporting: throughput claims should include variance across workload seeds; a single 3x point estimate is insufficient. (10) Hidden assumption: cooperative resize requires applications to reach safe points frequently. Long-running monolithic phases (e.g., a multi-hour FFT) defeat the model. The paper should quantify the resize-latency distribution actually observed.

Builds on

  • Iserte et al, 2018 (DMR API)

    DMR API is the runtime substrate that handles scheduler-application handshake and dynamic process management. DMRlib is the ergonomic library layer on top, contributing the MPI-style API and the predefined redistribution patterns rather than new runtime mechanism.

  • Comprés et al, 2016

    Proposed infrastructure and API extensions to MPI itself for elastic execution. DMRlib takes the alternative path of staying within standard MPI and providing elasticity as a library, which trades expressive power for portability and zero-MPI-modification deployment.

  • El Maghraoui et al, 2007/2009

    Demonstrated malleability for iterative MPI applications with custom per-application logic. DMRlib generalizes this by factoring redistribution into reusable patterns and supporting non-iterative applications via the prior Iserte et al. 2018 work.

  • Lemarinier et al, 2016

    Architected malleable MPI applications for priority-driven adaptive scheduling at EuroMPI. DMRlib shares the malleable-MPI goal but emphasizes developer ergonomics and quantifies throughput/energy under rigid vs. moldable submission rather than priority response.

Original abstract

Process malleability has proved to have a highly positive impact on the resource utilization and global productivity in data centers compared with the conventional static resource allocation policy. However, the non-negligible additional development effort this solution imposes has constrained its adoption by the scientific programming community. In this work, we present DMRlib, a library designed to offer the global advantages of process malleability while providing a minimalist MPI-like syntax. The library includes a series of predefined communication patterns that greatly ease the development of malleable applications. In addition, we deploy several scenarios to demonstrate the positive impact of process malleability featuring different scalability patterns. Concretely, we study two job submission modes (rigid and moldable) in order to identify the best-case scenarios for malleability using metrics such as resource allocation rate, completed jobs per second, and energy consumption. The experiments prove that our elastic approach may improve global throughput by a factor higher than 3x compared to the traditional workloads of non-malleable jobs.

2604.26530v1 · Apr 29 · Quantum Physics · AI · General Relativity and Quantum Cosmology

Fundamental Physics, Existential Risks and Human Futures

Adrian Kent

A physicist argues that quantum theory is probably not the final word, and that the next physics could reshape AI, computing, and humanity's long-term future.

Why it matters

Most discussions of AI risk and humanity's long-term future assume the laws of physics are basically settled and the only big unknowns are technological and social. This paper pushes back: if our deepest theory of nature is incomplete, then forecasts about computation, intelligence, and what minds can do may be missing key ingredients. Taking foundational physics seriously could change which technologies we expect, which existential risks we prioritize, and how we think about the relationship between minds and machines.

Method

  • — This is a perspective / position paper, not an experiment or proof - the author reflects on 25 years of his own work in the foundations of physics.
  • — He surveys three linked puzzles: the quantum reality problem (what is actually happening when a measurement occurs?), how quantum theory should fit with gravity, and whether consciousness plays any role in physical law.
  • — He argues that popular 'fixes' like the many-worlds interpretation don't fully work, and that alternatives involving real physical wavefunction collapse or extra hidden variables remain live options.
  • — He points to laboratory experiments now becoming feasible - tabletop tests of gravity-induced entanglement and tests of collapse models - that could push physics beyond standard quantum theory.
  • — He then speculates about consequences: new measurement rules or new dynamics could enable forms of information processing different from today's quantum computing.
  • — Caveat: the arguments are largely conceptual and programmatic. Concrete predictions are sketched, not derived in detail in this piece.

Result

There are no benchmark numbers or experimental results here - this is a synthesis essay. The main 'output' is a coherent worldview: quantum theory is probably an approximation to deeper laws; those deeper laws may include new ways the world evolves and new ways measurements work; some of this is testable with near-term experiments (for example, looking for gravitationally induced entanglement between small masses, or for tiny deviations from quantum predictions in increasingly large systems); and the implications stretch from cosmology to AI safety. The author also flags that mainstream physics surveys show no consensus on quantum foundations, which he treats as evidence that the field is genuinely unsettled rather than merely philosophical.

Caveats

The paper is speculative by design. It does not present a finished alternative theory, just reasons to expect one. Many physicists will disagree - for instance, defenders of many-worlds or of purely operational readings of quantum mechanics will reject the premise that there is a 'reality problem' at all. The links from foundational physics to AI and existential risk are suggestive rather than rigorous; nothing here shows that a post-quantum theory must change AI capabilities. Readers should treat this as a research agenda and a set of bets, not as established results. What's needed next is concrete: experiments that actually detect deviations from quantum mechanics, and worked-out models showing what new computational powers (if any) those deviations would unlock.

Builds on

  • Ghirardi, Rimini & Weber, 1986

    A concrete proposal in which wavefunctions occasionally and spontaneously 'collapse,' giving a real physical mechanism for measurement. Kent draws on this tradition as evidence that alternatives to standard quantum theory are viable and testable, though he develops his own variants.

  • Bell, 1976

    Bell's idea of 'local beables' - things that really exist at points in spacetime - underlies Kent's preference for theories where quantum reality is anchored to physical events rather than abstract wavefunctions. Kent extends this into Lorentz-invariant 'beable' models.

  • Bose et al., 2017

    Proposed a tabletop experiment using entanglement between masses to test whether gravity is quantum. Kent uses such proposals as evidence that the quantum-gravity interface is becoming experimentally accessible, which is central to his claim that new physics is within reach.

  • Chalmers, 1996

    Argued consciousness is a fundamental feature not reducible to standard physics. Kent takes this seriously as motivation for considering whether physical law itself might involve mind, though he does not endorse Chalmers' specific framework.

A foundations-of-physics perspective arguing that quantum theory is likely incomplete and that successor theories could materially change information processing, AI, and existential-risk thinking.

Why it matters

Quantum foundations is often treated as philosophy adjacent to physics, with little bearing on technology or AI safety. This paper argues the opposite: that foundational commitments - whether one takes Everettian branching seriously, whether dynamical collapse is real, whether gravity is quantized - have downstream implications for what kinds of computation are physically realizable, what 'measurement' even means in an AI context, and whether mind has any non-trivial role in physical law. For an ML/CS audience, the relevant claim is that the assumed substrate of computation (standard quantum mechanics) may itself be an approximation, and post-quantum effects could open or close avenues for information processing that are invisible to current complexity-theoretic analysis.

Method

  • — Format: a reflective perspective/position paper synthesizing 25 years of the author's own work; not an experimental or formal-theoretic contribution.
  • — Critique of interpretations: argues that Everettian/many-worlds accounts fail to provide adequate stories for probability, scientific confirmation, and evolutionary biology (drawing on his contributions to the Saunders-Barrett-Kent-Wallace volume), motivating a search for theories with a single actual reality.
  • — Single-world realist programs: takes GRW-style spontaneous collapse, CSL, and de Broglie-Bohm pilot-wave theory seriously as templates, while developing his own 'Lorentzian quantum reality' and 'beable-guided' frameworks where local beables are defined via late-time photodetection or generalized probability laws (the standard GRW collapse rule is displayed after this list for concreteness).
  • — Quantum-gravity stance: invokes his refutation of the Eppley-Hannah argument and recent work with Fedida on mixture-equivalence principles to argue that gravity need not be quantized in the standard sense, opening room for hybrid or post-quantum theories at the gravitational interface.
  • — Empirical hooks: cites the Carlesso et al. review of non-interferometric collapse tests and the Bose-Mazumdar-style gravity-mediated entanglement proposals as near-term experiments capable of falsifying or constraining standard quantum mechanics.
  • — Measurement postulates: leans on his recent argument (and exchange with Masanes-Galley-Müller) that the Born rule and projective measurement postulates are not derivable from the rest of quantum mechanics, hence are independent physical assumptions that could be modified.
  • — Speculative bridge to AI: suggests that new evolution laws or new measurement rules could enable novel information-processing primitives, and that this should be folded into long-term thinking about AI capabilities and existential risk, though no concrete computational model is proposed.
  • — Assumptions/caveats: the argument relies on the reader accepting that quantum foundations has genuine empirical content and that the absence of consensus (per Schlosshauer-Kofler-Zeilinger surveys) signals real incompleteness rather than mere interpretive taste.
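
For readers who haven't met dynamical collapse models: the textbook GRW rule below shows why they are empirically distinct from standard quantum mechanics rather than reinterpretations of it. The parameters are the original GRW choices from the general literature, not values taken from this paper.

```latex
% GRW spontaneous localization (standard textbook form).
% Between events the wavefunction evolves unitarily; each particle
% independently suffers a localization ("hit") at rate \lambda:
\begin{align}
  \hat{L}_{\mathbf{x}} &= (\pi r_c^2)^{-3/4}
      \exp\!\Big({-\tfrac{(\hat{\mathbf{q}}-\mathbf{x})^2}{2 r_c^2}}\Big),
  \qquad
  \psi \;\to\; \frac{\hat{L}_{\mathbf{x}}\,\psi}{\lVert \hat{L}_{\mathbf{x}}\,\psi \rVert},
  \qquad
  p(\mathbf{x}) = \lVert \hat{L}_{\mathbf{x}}\,\psi \rVert^{2}, \\
  \lambda &\approx 10^{-16}\ \mathrm{s}^{-1},
  \qquad r_c \approx 10^{-7}\ \mathrm{m}.
\end{align}
```

Because an N-particle superposition of distinct positions collapses at an amplified rate of roughly N times lambda, single particles are essentially unaffected while macroscopic superpositions localize almost instantly; this is also what gives collapse-test programs like the Carlesso et al. review mentioned above something quantitative to bound.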

Result

There are no quantitative results - the paper's outputs are conceptual. The structured claim is roughly: (1) standard quantum mechanics has unresolved foundational issues that are not merely interpretive; (2) several concrete alternative frameworks (GRW, CSL, pilot-wave, Kent's Lorentzian beable models, hodology) make different empirical predictions; (3) experiments such as gravity-mediated entanglement tests and improved bounds on collapse rates are now sensitive enough to discriminate between some of these; (4) plausible post-quantum physics could include modified evolution and modified measurement, both of which would in principle alter the space of physically realizable computations; (5) AI safety and existential-risk discussions, which currently assume a fixed physical substrate, should accommodate this uncertainty. The author treats the lack of foundational consensus among physicists as itself evidence that betting on novelty is reasonable.

Caveats

The paper is programmatic. The link from 'new physics is plausible' to 'AI futures will be transformed' is asserted rather than argued in detail - there is no model showing what computational class a post-quantum theory would correspond to, nor any analysis of whether such effects would be exploitable at scales relevant to AI hardware. Critics will note that (a) operationalist and QBist readers reject the framing of a reality problem; (b) defenders of many-worlds (Wallace, Saunders, Deutsch) have rebuttals to the probability and confirmation objections that this paper does not engage with in depth; (c) the consciousness-in-physics thread is the most speculative and risks being unfalsifiable; (d) tabletop quantum-gravity experiments are extraordinarily hard and null results to date constrain but do not settle anything. What needs proof next: a concrete, predictive post-quantum theory whose deviations are testable at currently accessible scales, and a worked example showing that those deviations could (or could not) be harnessed for computation beyond BQP. Without that, the AI-relevance claim remains suggestive.

Builds on

  • Ghirardi, Rimini & Weber, 1986

    Foundational dynamical-collapse model adding stochastic localization to the Schrödinger equation. Kent uses GRW (and the later CSL extension by Ghirardi-Pearle-Rimini, 1990) as an existence proof that empirically distinct alternatives to standard QM can be written down, while pursuing his own Lorentz-invariant beable variants instead.

  • Bell, 1976

    Introduced 'local beables' as the ontological primitives a serious quantum theory should specify. Kent's Lorentzian quantum reality and beable-guided proposals are direct descendants, generalizing the framework to relativistic settings and to modified probability laws.

  • Kent, 2010 (One world versus many)

    His own critique of Everettian accounts of probability, evolution, and confirmation. The present paper takes those criticisms as established and uses them to motivate single-world realist alternatives.

  • Bose et al., 2017

    Proposed gravitationally-induced entanglement between mesoscopic masses as a test of whether gravity is quantum. Kent cites this experimental program as a near-term route to empirical evidence about post-quantum or hybrid quantum-gravitational dynamics, complementing his theoretical refutation of older arguments (Eppley-Hannah) that gravity must be quantized.

A programmatic perspective contending that unresolved issues in quantum foundations, quantum gravity, and the physics of observers point toward post-quantum dynamics and measurement laws with potentially significant consequences for computation, AI, and existential-risk analysis.

Why it matters

For foundations specialists, the paper is a useful synthesis of Kent's program: Lorentzian beable theories, late-time photodetection-based ontologies, hodology, and beable-guided generalizations of the Born rule, framed against Everettian and operationalist alternatives. For the quantum-gravity community, it lines up with the post-Eppley-Hannah revival of hybrid classical-quantum models and with the Bose-Marletto-Vedral experimental program. The genuinely unusual contribution is the bridge to AI/x-risk discourse: most existential-risk analysis tacitly assumes BQP-bounded substrates and a settled physics; Kent flags that both assumptions are live and that foundational physics is, in expectation, decision-relevant for long-horizon technology forecasting. Whether one accepts the bridge or not, the paper makes explicit a set of assumptions usually left implicit.

Method

  • — Genre: a reflective perspective essay synthesizing roughly 25 years of the author's own work plus selected community results; not an original technical contribution. Arguments are conceptual, occasionally referencing his prior formal results without re-deriving them.
  • — Anti-Everettian commitments: relies on his 2010 critique of many-worlds accounts of probability, confirmation, and (notably) evolutionary biology - the latter argument being that natural selection presupposes a single actual lineage of outcomes, which branching ontologies undermine. He does not engage in detail with the Deutsch-Wallace decision-theoretic derivation of the Born rule or with Saunders-style self-locating uncertainty responses, treating those debates as effectively closed against Everett.
  • — Single-world realism: organizes the alternatives into (i) dynamical collapse (GRW 1986, CSL 1990, and the Carlesso et al. 2022 experimental program), (ii) hidden-variable / pilot-wave theories (de Broglie 1928, Bohm 1952, Valentini's nonequilibrium extension), and (iii) his own beable-based proposals - Lorentzian quantum reality (2015), late-time photodetection ontology (2017), beable-guided generalized probability laws (2013), hodology (2022), and 'beyond boundary conditions' cosmological framing (1998).
  • — Measurement-postulate independence: leans heavily on his arXiv:2307.06191 result (accepted in Quantum) that the projective-measurement and Born-rule postulates are not derivable from unitary QM plus structural assumptions, contra Masanes-Galley-Müller. This is load-bearing: it licenses the move that 'new measurement laws' is a coherent direction for post-quantum theory, not just a reinterpretation of existing formalism.
  • — Quantum gravity: refutation of Eppley-Hannah (CQG 2018) is cited to undermine the standard no-go argument that gravity must be quantized; the Fedida-Kent (2024) work on mixture-equivalence principles formalizes constraints any post-quantum theory of gravity must satisfy, and the Bose et al. (2017) gravity-mediated entanglement proposal is invoked as the canonical empirical test. The 2005 'Nonlinearity without superluminality' result is implicit background, since hybrid and modified-measurement theories typically introduce nonlinearities that must be made consistent with relativistic causality (the core phase argument behind the Bose et al. test is sketched after this list).
  • — Consciousness/observer thread: invokes James (1879), Schrödinger, and Chalmers (1996) to keep open the possibility that observers play a non-trivial role in physical law. This is the most speculative leg and is presented as a serious option rather than a defended position.
  • — Cosmological/algorithmic thread: nods to Solomonoff (1964) and Rissanen (1978) MDL ideas, suggesting that algorithmic-complexity priors may be relevant for selecting among post-quantum cosmological theories, and references DESI's recent dark-energy results as evidence that even mainstream cosmology is unsettled.
  • — Bridge to AI/x-risk: argues qualitatively that (a) new evolution laws could change effective computational complexity classes, (b) new measurement laws could provide novel I/O primitives between physical systems and observers, and (c) if observers/consciousness enter physics nontrivially, then claims about machine consciousness and AI alignment acquire physical (not merely functional) content. No formal computational model accompanies these claims.
  • — Implicit assumptions: that foundational disagreement (Schlosshauer-Kofler-Zeilinger 2013) tracks genuine empirical underdetermination rather than sociology; that beable-style ontologies are the right language for post-quantum theory; that consciousness has explanatory work to do in physics; that long-horizon AI forecasting is sensitive to substrate physics rather than dominated by software/economic factors.
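
For context on why gravity-mediated entanglement is treated as the canonical test, here is the usual back-of-envelope form of the Bose et al. (2017) argument, stated generically rather than taken from this paper.

```latex
% Two masses, each held in a superposition of two locations (L, R),
% accumulate branch-dependent gravitational phases over time \tau:
\begin{equation}
  \lvert \Psi(\tau) \rangle \;\propto\;
  \sum_{i,j \in \{L,R\}} e^{\,i\phi_{ij}}\,
  \lvert i \rangle_{1} \lvert j \rangle_{2},
  \qquad
  \phi_{ij} \;=\; \frac{G\, m_{1} m_{2}\, \tau}{\hbar\, d_{ij}},
\end{equation}
% where d_{ij} is the branch-dependent separation. Unless the phases
% factorize (\phi_{LL} + \phi_{RR} = \phi_{LR} + \phi_{RL} mod 2\pi),
% the joint state is entangled; since a purely classical mediator
% cannot create entanglement between otherwise uncoupled systems,
% witnessing it would show the gravitational field carries quantum
% coherence.
```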

Result

No quantitative results - the deliverables are positions and a research agenda. Concretely: (1) a unified case that QM's measurement postulates, its compatibility with gravity, and its account of observers are jointly underdetermined; (2) a taxonomy in which Kent's own Lorentzian-beable and hodological frameworks are positioned as relativistic, single-world realist alternatives to GRW/CSL and Bohmian mechanics; (3) identification of the Bose-Mazumdar-style entanglement experiments and improved CSL bounds (Carlesso et al.) as the near-term empirical frontier; (4) a claim that the measurement-postulate independence result (Kent 2025, Quantum) blocks structural-derivation programs (Masanes-Galley-Müller) and thereby leaves room for genuinely new measurement physics; (5) a high-level argument that AI and existential-risk discourse should treat foundational physics as a source of model uncertainty rather than a fixed background. The empirical anchor is qualitative: existing collapse-model bounds and gravity-entanglement proposals are cited as making the program testable in principle, without committing to a specific predicted deviation.

Caveats

Several weaknesses are worth flagging. (i) The Everett critique is asserted more than argued here; readers persuaded by Wallace's Emergent Multiverse or by recent work on self-locating probability will find the dismissal too quick, and the evolutionary-biology argument against branching is contentious - one can reasonably hold that selection operates branch-wise. (ii) The bridge from foundations to AI capabilities is the paper's most ambitious move and the least supported; nothing rules out the possibility that post-quantum corrections are real but operationally negligible at AI-relevant scales (collapse rates inferred from current bounds suggest deviations far below room-temperature electronic noise). A worked example showing a computational primitive enabled by, say, a specific CSL or beable-guided modification would strengthen the case enormously; absent that, the claim is decision-theoretic hand-waving. (iii) The consciousness thread risks unfalsifiability and is in tension with the otherwise operationalizable empirical program; Kent does not specify what observable signature 'consciousness in physics' would have. (iv) The mixture-equivalence and Eppley-Hannah-refutation results are load-bearing but the paper does not flag that hybrid classical-quantum theories generically face well-known difficulties (signaling, energy non-conservation, decoherence-induced effective quantization) that any concrete model must address. (v) The treatment of the Masanes-Galley-Müller exchange is one-sided; their response (ref. 29) is not engaged in detail. (vi) Missing ablations of the program: no discussion of how QBism, relational QM, or consistent histories fit the taxonomy; no engagement with recent algorithmic-information approaches to wavefunction selection beyond a bare Solomonoff/MDL nod; and no concrete commitment about whether Kent's own beable-guided generalized probability laws make distinct predictions from standard QM at currently accessible energies. (vii) The DESI citation is used loosely - evolving dark energy is interesting but not obviously connected to the measurement problem. Failure modes for the program: if next-generation collapse-model and gravity-entanglement experiments confirm standard QM to higher precision, the empirical case for novelty narrows considerably, and the AI-relevance claim collapses to a pure prior. Strong follow-ups: (a) derive an explicit post-quantum computational model (e.g., what oracle or complexity class does a beable-guided theory with modified Born rule realize?); (b) connect mixture-equivalence constraints to concrete predictions in proposed BMV-type experiments at realistic noise levels; (c) operationalize the 'observers in physics' claim into a measurable asymmetry (e.g., observer-dependent collapse rates) or retire it; (d) engage Wallace and Saunders directly on the evolution/probability arguments rather than relying on the 2010 volume; (e) build a quantitative x-risk analysis that propagates posterior uncertainty over physical theories into AI capability forecasts, so the 'fundamental physics matters for AI' claim becomes testable as a forecasting hypothesis rather than a slogan.

Builds on

  • Kent, 2010 (One world versus many)

    His own anti-Everettian arguments concerning probability, confirmation, and evolution. The present paper assumes those critiques succeed and uses them to motivate single-world realist programs; it does not re-litigate the Wallace/Saunders responses.

  • Kent, 2015 (Lorentzian quantum reality)

    Introduces relativistic beable postulates and toy models; together with the 2017 late-time photodetection paper and the 2022 hodology paper, this is the technical core of Kent's positive program that the essay synthesizes.

  • Ghirardi, Rimini & Weber, 1986; Ghirardi, Pearle & Rimini, 1990

    Dynamical collapse models providing the canonical example of an empirically distinct alternative to standard QM. Kent treats them as existence proofs and as the experimental target of programs like Carlesso et al. (2022), while preferring relativistic beable variants over nonrelativistic stochastic dynamics.

  • Bose et al., 2017

    Spin-entanglement-witness proposal for testing whether gravity mediates entanglement, hence whether it must be quantum. Combined with Kent's 2018 refutation of Eppley-Hannah and the Fedida-Kent (2024) mixture-equivalence work, this defines the empirical frontier the essay points to for post-quantum gravity.

Original abstract

Over the past 25 years, I have been involved in some intriguing developments in the foundations of physics, exploring the quantum reality problem, the relationship between quantum theory and gravity and the interplay between consciousness and physical laws. These investigations make it plausible that we will find physics beyond quantum theory, potentially including both new evolution laws and new types of measurement. There is also a significant chance they could have potentially transformative impact on information processing and on the development of and our future with AI.

2604.26654v1 · Apr 29 · Neural Computing

Evolutionary feature selection for spiking neural network pattern classifiers

Michal Valko, Nuno C. Marques, Marco Castellani

Researchers combined a brain-inspired 'spiking' neural network with an evolutionary algorithm that picks useful input features, getting smaller networks that handle noisy data well.

Why it matters

Most machine learning systems use very simplified models of neurons. Biology offers richer alternatives that may be more robust, but they're harder to train and tune. This paper is a small early step showing that pairing a biologically realistic neuron model with an automated 'try many combinations' search for the right inputs can make these models practical for real classification tasks. If that holds up at scale, it points toward more efficient, noise-tolerant systems that look more like brains than like typical deep nets.

Method

  • — Use the JASTAP neuron model, a biologically inspired neuron that communicates with timed pulses rather than just numbers, as the building block of the network.
  • — Compare against the standard multi-layer perceptron, the textbook neural network where neurons output continuous values.
  • — Apply an evolutionary algorithm — a search method that imitates natural selection — to two jobs at once: picking which input features to use, and training the network's weights (a toy version of this joint search is sketched after this list).
  • — Test on the classic IRIS dataset, a small flower-classification benchmark, including versions with added noise to see how robust the system is.
  • — Measure classification accuracy and the size of the resulting network (fewer features and neurons is better).
  • — Caveat: this is described as preliminary work on one small dataset, so the results are suggestive rather than conclusive.
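
As a concrete (and deliberately tiny) illustration of the joint search, the sketch below evolves a binary feature mask together with linear readout weights on IRIS. It is an assumption-laden toy: the paper's encoding, operators, and the JASTAP network itself are not specified in the abstract, so a plain linear readout stands in for the classifier.

```python
"""Toy joint evolution of a feature mask and classifier weights on IRIS.
Illustrates the wrapper idea only; not the paper's actual EA or network."""
import numpy as np
from sklearn.datasets import load_iris

rng = np.random.default_rng(0)
X, y = load_iris(return_X_y=True)
X = (X - X.mean(0)) / X.std(0)
n_feat, n_cls = X.shape[1], 3

def decode(g):
    mask = g[:n_feat] > 0                      # which features are used
    W = g[n_feat:].reshape(n_feat, n_cls)      # linear readout weights
    return mask, W

def fitness(g):
    mask, W = decode(g)
    if not mask.any():
        return 0.0
    acc = ((X[:, mask] @ W[mask]).argmax(1) == y).mean()
    return acc - 0.01 * mask.sum()             # accuracy minus a size penalty

pop = rng.normal(size=(40, n_feat + n_feat * n_cls))
for gen in range(200):
    fit = np.array([fitness(g) for g in pop])
    parents = pop[np.argsort(fit)[-20:]]       # truncation selection
    children = parents[rng.integers(20, size=20)] + rng.normal(0, 0.1, parents.shape)
    pop = np.vstack([parents, children])       # elitism + mutated offspring

best = max(pop, key=fitness)
mask, _ = decode(best)
print("selected features:", mask, "fitness:", round(fitness(best), 3))
```

The explicit size penalty is what pushes the search toward smaller feature subsets, mirroring the paper's smaller-network claim.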

Result

On the IRIS dataset, the JASTAP-based network plus evolutionary feature selection produced smaller networks that handled noisier inputs without losing classification accuracy compared to the standard multi-layer perceptron approach. The abstract doesn't give specific accuracy numbers, but the takeaway is qualitative: biologically realistic neurons combined with automated feature selection didn't sacrifice performance, and seemed to gain robustness.

Caveats

This is explicitly a 'preliminary' result on IRIS, which is a tiny, easy benchmark — three flower types with four measurements each. We don't know yet whether the approach scales to bigger, harder problems like images or text. Evolutionary algorithms are also slow because they evaluate many candidate networks, so training cost is a real concern. The paper is from a niche corner of neural network research, and the biologically realistic neuron model is less standardized than mainstream deep learning tools, which makes reproduction and comparison harder. Stronger evidence would require multiple datasets, head-to-head timing comparisons, and tests on noise levels that mimic real-world sensors.

Builds on

  • Castellani & Marques, 2005

    An earlier technical report from the same group on using evolutionary algorithms for feature selection and training in standard multi-layer perceptrons. The current paper extends that procedure to the biologically realistic JASTAP neuron model.

  • Janco, Stavrovsky & Pavlasek, 1994

    Introduced the JASTAP neuron model — a biologically inspired neuron with graded responses. This paper adopts that model as its core network unit instead of the usual perceptron.

  • Bohte, Kok & Poutré, 2002

    Showed how to do error-backpropagation-style learning in spiking neural networks. The current paper sits in this lineage of trying to make biologically realistic spiking networks actually trainable for classification.

  • Rumelhart, Hinton & Williams, 1986

    The classic backpropagation paper that defined how multi-layer perceptrons are trained. This work positions JASTAP networks as an alternative to that standard model.

The authors extend an evolutionary feature-selection-plus-training procedure from MLPs to a biologically realistic spiking neuron model (JASTAP), reporting smaller networks with preserved accuracy under noise on IRIS.

Why it matters

Spiking neural networks (SNNs) are biologically realistic and theoretically attractive for noise tolerance and energy efficiency, but they're notoriously hard to train compared to standard MLPs trained with backpropagation. Two practical levers help: (1) reducing input dimensionality so the SNN has less to learn, and (2) using a search procedure that doesn't rely on differentiability. Evolutionary algorithms address both. This paper is an early demonstration that the same evolutionary feature-selection wrapper that works for MLPs transfers cleanly to a biologically realistic spiking model, and that the resulting models are at least competitive on a standard benchmark while being smaller and more robust to noise. For practitioners watching the SNN space, it's a data point that bio-realistic units don't have to mean accuracy regressions when paired with the right training scaffolding.

Method

  • — Network model: the JASTAP neuron, a biologically realistic model with graded responses and pulse-based communication originally introduced by Janco, Stavrovsky & Pavlasek (1994). It's positioned as an alternative to the standard sigmoid/MLP unit and falls within the broader spiking neural network family (a generic spiking-unit sketch follows this list).
  • — Baseline: standard multi-layer perceptron (MLP) trained via backpropagation, following the Rumelhart–Hinton–Williams lineage.
  • — Training/search procedure: an evolutionary algorithm that jointly performs (a) feature selection — choosing which subset of inputs to feed the network — and (b) network parameter optimization. This is the same wrapper Castellani & Marques (2005) applied to MLPs, now extended to JASTAP networks.
  • — Why evolutionary search: spiking models often lack clean gradients, so gradient-free optimization is a natural fit; it also handles the discrete feature-subset selection problem in the same loop as continuous parameter tuning.
  • — Evaluation: the IRIS benchmark from the UCI repository (Hettich, Bay & Merz, 1998), a 3-class, 4-feature classification task. Authors also evaluate under noisier variants of the data to probe robustness.
  • — Reported metrics (qualitative from abstract): classification accuracy and resulting network size; specific numbers are not given in the abstract.
  • — Stated novelty: not the JASTAP model itself, not the evolutionary feature-selection wrapper itself, but the integration of the two and the empirical claim that the combination yields smaller networks that tolerate noise without accuracy loss.
  • — Caveats acknowledged in framing: results are described as 'preliminary' and limited to a single, small standard dataset.
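
The abstract doesn't describe JASTAP's internal dynamics, so as a generic stand-in for "pulse-based unit", here is the simplest mainstream spiking model, a leaky integrate-and-fire neuron. It is shown only to make the spiking-vs-MLP contrast concrete; JASTAP is a richer model than this.

```python
"""A standard leaky integrate-and-fire (LIF) neuron -- a generic spiking
unit shown for illustration; not the JASTAP model used in the paper."""
import numpy as np

def lif_run(input_current, dt=1e-3, tau=0.02, v_th=1.0, v_reset=0.0):
    """Simulate one LIF neuron; returns the membrane trace and spike times."""
    v, trace, spikes = v_reset, [], []
    for t, i_t in enumerate(input_current):
        v += dt / tau * (-v + i_t)        # leaky integration of the input
        if v >= v_th:                     # threshold crossing -> emit a pulse
            spikes.append(t * dt)
            v = v_reset                   # reset after the spike
        trace.append(v)
    return np.array(trace), spikes

I = np.where(np.arange(1000) > 200, 1.5, 0.0)  # step input at t = 0.2 s
_, spike_times = lif_run(I)
print(f"{len(spike_times)} spikes, first at {spike_times[0]:.3f} s")
```

Unlike a sigmoid unit, the output here is a sequence of spike times rather than a continuous value, which is why gradient-based training is awkward for spiking models and gradient-free search like the EA above is attractive.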

Result

On IRIS, the JASTAP-plus-evolutionary-feature-selection system reportedly matches the classification accuracy of standard MLPs while using smaller networks and tolerating noisier input data. The abstract frames this qualitatively rather than reporting specific accuracy percentages, error rates, or feature-subset sizes. Concretely, the takeaways are: (1) extending the evolutionary feature-selection procedure to JASTAP doesn't degrade performance relative to the MLP version, (2) the JASTAP variant appears to handle noise more gracefully, and (3) network size after evolutionary selection is smaller, suggesting that the biologically realistic units extract information more compactly when paired with a good feature subset. Without numbers in the abstract, comparisons to other SNN training approaches (e.g., SpikeProp from Bohte et al., 2002) can't be made directly from this summary.

Caveats

The most obvious limitation is the benchmark: IRIS is a 150-sample, 3-class, 4-feature toy problem that nearly any classifier handles well. Claims about scalability, generalization, or robustness need replication on harder datasets — at minimum other UCI benchmarks, ideally something with real sensor noise or higher dimensionality. Second, the abstract gives no concrete numbers, so the magnitude of the noise tolerance and size reduction effects is unclear. Third, evolutionary algorithms have substantial computational cost; the paper presumably trades wall-clock training time for the ability to train an otherwise hard-to-train spiking model, but the cost-benefit isn't quantified in the abstract. Fourth, the JASTAP model is relatively niche — most modern SNN work uses leaky integrate-and-fire or related variants — which limits direct comparability to mainstream SNN literature. Fifth, joint feature-selection-plus-training procedures are known to risk overfitting on small datasets unless cross-validation is rigorous, and IRIS is small enough for this to matter. What needs proof next: larger benchmarks, head-to-head comparison with gradient-based SNN training (e.g., SpikeProp), explicit noise-level sweeps with quantitative robustness curves, and timing/compute comparisons against the MLP baseline.

Builds on

  • Castellani & Marques, 2005

    Internal technical report developing the evolutionary procedure that simultaneously performs feature selection and neural network training for standard MLP pattern classifiers. The current paper directly extends that procedure to swap in JASTAP networks for MLPs.

  • Janco, Stavrovsky & Pavlasek, 1994

    Introduced the JASTAP neuron model — a biologically realistic neuron-like element with graded response. The current paper uses this as its core computational unit instead of a sigmoid/MLP unit.

  • Bohte, Kok & Poutré, 2002

    Developed error-backpropagation for temporally encoded networks of spiking neurons (SpikeProp), a foundational result for trainable SNNs. The current paper takes a different route — evolutionary search rather than gradient-based training — to make a biologically realistic spiking model usable for classification.

  • Rumelhart, Hinton & Williams, 1986

    The canonical backpropagation paper for MLPs. This work uses MLPs trained in this lineage as the baseline against which the JASTAP-plus-evolution approach is compared.

Original abstract

This paper presents an application of the biologically realistic JASTAP neural network model to classification tasks. The JASTAP neural network model is presented as an alternative to the basic multi-layer perceptron model. An evolutionary procedure previously applied to the simultaneous solution of feature selection and neural network training on standard multi-layer perceptrons is extended with JASTAP model. Preliminary results on IRIS standard data set give evidence that this extension allows the use of smaller neural networks that can handle noisier data without any degradation in classification accuracy.

2604.26437v1 · Apr 29 · Vision

Are Data Augmentation and Segmentation Always Necessary? Insights from COVID-19 X-Rays and a Methodology Thereof

Aman Swaraj, Arnav Agarwal, Hitendra Singh Bhadouria, Sandeep Kumar, Karan Verma

Researchers show that cropping chest X-rays down to just the lungs matters a lot for COVID-19 AI, while heavy image augmentation can actually hurt accuracy.

Why it matters

Chest X-ray classifiers for COVID-19 went viral during the pandemic, and many published models reported very high accuracy. But if a model is making its decision based on text labels in the corner of an X-ray or on the patient's shoulder rather than the lungs, that accuracy is meaningless in a real hospital. This paper is a useful reality check: it shows that two design choices many practitioners treat as either optional or always-good-more-is-better actually decide whether the model is trustworthy. That has implications well beyond COVID for any medical imaging pipeline.

Method

  • — Task: classify chest X-rays as COVID-19 vs. other categories using CNNs.
  • — Tool to peek inside the model: class activation mapping, which produces a heatmap showing which pixels drove the prediction (a minimal sketch follows this list).
  • — Experiment 1 (segmentation): compare models trained on full X-rays vs. X-rays where only the lung region is kept, and have medical experts inspect the heatmaps.
  • — Experiment 2 (augmentation): train the same model on datasets with no augmentation and with progressively more augmented images, then track test accuracy.
  • — Proposed pipeline, called SDL-COVID, combines lung segmentation with a measured, not excessive, amount of augmentation.
  • — Caveat: the abstract does not specify which CNN architectures, which datasets, or how segmentation was done.
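
For orientation, here is a minimal sketch of classic class activation mapping (Zhou et al., 2016) for a CNN that ends in global average pooling plus a linear layer. The ResNet-18 backbone is a hypothetical stand-in; the paper does not name its architecture or CAM variant.

```python
# Sketch of class activation mapping (CAM) for a CNN ending in global
# average pooling + a linear classifier. ResNet-18 is a stand-in backbone.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(num_classes=2).eval()
feats = {}

# Forward hook captures the last conv feature map (the input to GAP).
model.layer4.register_forward_hook(lambda m, i, o: feats.update(a=o))

def cam(x, cls):
    """Heatmap for class `cls` on a 1xCxHxW image tensor."""
    with torch.no_grad():
        model(x)
    A = feats["a"][0]                           # (K, h, w) activations
    w = model.fc.weight[cls].detach()           # (K,) classifier weights
    heat = F.relu(torch.einsum("k,khw->hw", w, A))
    heat = heat / (heat.max() + 1e-8)           # normalize to [0, 1]
    return F.interpolate(heat[None, None], size=x.shape[-2:],
                         mode="bilinear", align_corners=False)[0, 0]

heatmap = cam(torch.randn(1, 3, 224, 224), cls=1)  # dummy input for shapes
```

The paper's reliability check is then human: clinicians inspect where the heat lands, inside the lung fields or outside them.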

Result

With expert review of the heatmaps, the authors confirm that without lung segmentation the model often relies on irrelevant parts of the image, so segmentation is necessary for trustworthy predictions. For augmentation, test accuracy climbs at first but then drops once you go past a certain volume of synthetic images, a classic overfitting signature. Their final SDL-COVID pipeline reaches 95.21% precision and, importantly, a low false negative rate, meaning it rarely misses actual COVID cases. Specific comparisons against named baselines are not given in the abstract.

Caveats

The abstract reports precision but not recall, F1, or AUC, and does not say how big the test set was or whether it came from a different hospital than the training data, which is the usual way these COVID X-ray models fail. 'Beyond a certain threshold' for augmentation is also vague, so it is hard to know when augmentation hurts in practice. Many published COVID X-ray datasets are known to be small and biased (for example, COVID images and non-COVID images coming from different sources), and the abstract does not discuss this. Finally, claiming a methodology is 'reliable' really needs external validation on data the model has never seen, ideally from multiple hospitals.

Original abstract

Purpose: Rapid and reliable diagnostic tools are crucial for managing respiratory diseases like COVID-19, where chest X-ray analysis coupled with artificial intelligence techniques has proven invaluable. However, most existing works on X-ray images have not considered lung segmentation, raising concerns about their reliability. Additionally, some have employed disproportionate and impractical augmentation techniques, making models less generalized and prone to overfitting. This study presents a critical analysis of both issues and proposes a methodology (SDL-COVID) for more reliable classification of chest X-rays for COVID-19 detection. Methods: We use class activation mapping to obtain a visual understanding of the predictions made by Convolutional Neural Networks (CNNs), validating the necessity of lung segmentation. To analyze the effect of data augmentation, deep learning models are implemented on two levels: one for an augmented dataset and another for a non-augmented dataset. Results: Careful analysis of X-ray images and their corresponding heat maps under expert medical supervision reveals that lung segmentation is necessary for accurate COVID-19 prediction. Regarding data augmentation, test accuracy significantly drops beyond a certain threshold with additional augmented images, indicating model overfitting. Conclusion: Our proposed methodology, SDL-COVID, achieves a precision of 95.21% and a lower false negative rate, ensuring its reliability for COVID-19 detection using chest X-rays.

An empirical study showing lung segmentation is essential and aggressive data augmentation is harmful for CNN-based COVID-19 chest X-ray classification, packaged as a pipeline called SDL-COVID.

Why it matters

Early-pandemic literature produced many CNN classifiers reporting >95% accuracy on small, heterogeneous COVID X-ray datasets. Subsequent audits (e.g., DeGrave et al.) showed many were exploiting shortcut features like dataset-source artifacts, text markers, or anatomy outside the lungs. This paper contributes a practitioner-facing analysis that ties two design decisions (segmentation and augmentation magnitude) directly to model reliability via CAM inspection and accuracy-vs-augmentation curves. For medical imaging teams, the takeaway is that gains from heavier augmentation can be illusory and that pre-segmenting the region of interest is not just nice to have; it changes what the model learns.

Method

  • — Problem framing: COVID-19 classification from chest X-rays with CNN classifiers; the focus is on methodology rather than novel architecture.
  • — Interpretability check: Class Activation Maps are generated for predictions on full vs. lung-segmented X-rays. Heatmaps are reviewed under expert medical supervision to determine whether model attention falls within lung fields.
  • — Augmentation ablation: models are trained at two configurations, augmented and non-augmented, with augmentation volume varied to characterize the test-accuracy curve. The authors report a threshold beyond which accuracy declines, consistent with overfitting to augmentation-induced artifacts (a schematic of this sweep follows the list).
  • — Proposed pipeline (SDL-COVID): segmentation-first preprocessing of chest X-rays followed by CNN classification with controlled augmentation. The abstract does not specify the segmentation network (e.g., U-Net), the backbone CNN (e.g., ResNet, VGG, DenseNet), the augmentation operations (rotation, flips, intensity jitter, etc.), or the dataset composition.
  • — Evaluation: primary metric reported is precision (95.21%) plus a qualitative claim of low false negative rate. Recall, specificity, F1, AUROC, and confusion matrices are not given in the abstract.
  • — Assumptions/caveats baked into the design: (1) lung-segmentation masks are accurate enough not to remove diagnostic signal; (2) clinician-reviewed CAMs are a sufficient proxy for clinical reliability; (3) the augmentation threshold generalizes across architectures and datasets.
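
The augmentation experiment is conceptually a sweep: retrain the same model on progressively larger augmented datasets and track held-out accuracy. A schematic version follows, where `train_and_eval` and the jitter-only `augment` are hypothetical stand-ins (the paper's actual augmentation operations and volumes are not given in the abstract).

```python
# Schematic accuracy-vs-augmentation-volume sweep. `train_and_eval` is a
# hypothetical callable (train on the given data, return test accuracy);
# `augment` is an illustrative perturbation only.
import numpy as np

def augment(X, rng):
    # Small intensity jitter as a placeholder augmentation.
    return np.clip(X + rng.normal(scale=0.05, size=X.shape), 0, 1)

def sweep(X_train, y_train, X_test, y_test, train_and_eval,
          copies=(0, 1, 2, 4, 8)):
    rng = np.random.default_rng(0)
    curve = []
    for k in copies:
        Xa = [X_train] + [augment(X_train, rng) for _ in range(k)]
        ya = [y_train] * (k + 1)
        acc = train_and_eval(np.concatenate(Xa), np.concatenate(ya),
                             X_test, y_test)
        curve.append((k, acc))   # expect rise, then fall past some threshold
    return curve
```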

Result

Two qualitative findings and one headline number. First, expert review of CAMs confirms that without segmentation, CNNs frequently attend to extra-pulmonary regions, and segmentation moves attention into clinically meaningful areas. This is offered as direct evidence that high accuracy without segmentation can be untrustworthy. Second, scaling augmented training data improves accuracy up to a point and then degrades it, indicating that excessive augmentation pushes the model toward overfitting on synthetic perturbations rather than improving generalization. Third, the proposed SDL-COVID pipeline achieves 95.21% precision with a reportedly low false negative rate, which the authors frame as the appropriate operating point for screening (minimizing missed COVID cases). The abstract does not benchmark against named prior models or report cross-dataset generalization.

Caveats

Several limitations are visible from the abstract alone. (1) Precision alone is insufficient; for a screening tool, recall/sensitivity and AUROC matter more, and the 'low false negative rate' claim should be quantified. (2) The well-known issue with public COVID X-ray datasets is source bias, where COVID-positive and -negative images come from different repositories, leading to spuriously high in-distribution performance; the abstract does not describe steps to mitigate this, such as patient-level splits or external validation. (3) The 'augmentation threshold' is dataset- and architecture-specific, so the prescriptive claim that heavy augmentation is harmful needs more controlled experiments across backbones (e.g., ResNet50, EfficientNet) and augmentation policies (e.g., RandAugment, MixUp). (4) Reliance on CAMs is itself contested: CAMs can be misleading, low-resolution, and not always faithful to the model's true reasoning, so 'expert-validated CAMs' is suggestive but not definitive evidence of clinical reliability. (5) The segmentation step adds a dependency on a separate model whose errors propagate; the abstract does not characterize how segmentation failures affect classification. (6) Finally, the scope is COVID-19 vs. other classes; whether the segmentation/augmentation conclusions transfer to pneumonia subtype classification or other thoracic tasks is not addressed. Likely reviewer pushback: this is a methodology paper restating known best practice (segment first, augment carefully) without strong novelty unless the empirical curves and CAM analysis are unusually rigorous. The strongest version of a follow-up would add multi-architecture ablations, a quantitative augmentation-vs-accuracy curve, external test sets from different hospitals, and comparison to faithful interpretability methods (e.g., integrated gradients) rather than CAM alone.


2604.26473v1 · Apr 29 · Robotics

Alter-Art: Exploring Embodied Artistic Creation through a Robot Avatar

Do Won Park, Samuele Bordini, Giorgio Grioli, Manuel G. Catalano, Antonio Bicchi

Researchers built a robot that artists can 'inhabit' from a distance, letting them dance, act, and paint as if their body were the robot.

Why it matters

If artists can fully embody a robot somewhere else, art stops being limited by where the human body is or what it can physically do. A dancer with a disability could perform on stage, a painter could work with materials too dangerous to touch, and an actor could appear in a theater hundreds of miles away. The same idea also matters for non-art uses like remote work in hazardous environments, since it pushes telepresence beyond just steering a machine to actually feeling like you are it.

Method

  • — The team uses a humanoid robot called Alter-Ego as the 'avatar' the artist controls remotely.
  • — The artist wears a VR headset and motion-tracking gear so they see through the robot's eyes and the robot copies their movements in real time.
  • — The robot uses soft, springy joints instead of stiff motors, so its arms can press a brush into a canvas or react gently when bumped, more like a human limb.
  • — They tested the setup in three very different art forms: a dance piece, a theater scene with human actors, and a painting session.
  • — Instead of measuring accuracy with numbers, they collected the artists' own descriptions of what it felt like to create through the robot.
  • — The study is exploratory: a small number of artists, no formal experiment with control groups, and feedback is qualitative.

Result

Artists reported that within a short time they stopped feeling like they were 'driving' a robot and started feeling like the robot was their body. The robot's physical limits, like slower arms or a different reach, became part of the creative process rather than a problem to overcome: dancers leaned into the robot's particular way of moving, the actor adjusted timing and gestures to fit the avatar, and the painter discovered new brushstroke styles that came from the robot's compliant arms. Each art form was affected differently: dance was mostly about presence and rhythm, theater raised questions about identity (am I the robot or the character?), and painting was the most shaped by physical constraints because of its direct contact with materials. The authors argue that this sense of embodiment should be treated as a core design goal for social and assistive robots, not just a nice extra.

Caveats

This is more of a creative exploration than a controlled study. There are no statistics, no comparison to other telepresence systems, and only a handful of artists tried it, so we don't know how general the findings are. 'Sense of presence' is reported by the artists themselves, which is valuable but easy to bias and hard to reproduce. Important practical issues are not deeply addressed: the latency of teleoperation, how tiring the gear is to wear, what happens when the network glitches mid-performance, and whether non-artists or audiences feel the same connection. The robot's hands are also limited, so fine tasks like detailed painting or handling delicate props are still out of reach. The next steps would need bigger and more diverse user groups, audience reactions, and head-to-head comparisons with simpler remote-control setups to show that full embodiment really matters.

Original abstract

As with every emerging technology, new tools in the hands of artists reshape the nature of artwork creation. Current frameworks for robotics in arts deploy the robot as an autonomous creator or a collaborator, thus leaving a certain gap between the human artist and the machine. Now, we stand at the dawn of an era where artists can escape physical limitations and reshape their creative identity by inhabiting an alternative body. This new paradigm allows artists not only to command a robot remotely, but also to *be* a robot, to see and feel through it, experiencing a new embodied reality. Unlike virtual reality, where art is created in a digital dimension, in this case art creation is still firmly grounded in the material world: clay molded by mechanical hands, paint swept across a canvas or gestures performed on a physical stage alongside human actors. Through the robot avatar Alter-Ego, we explore the Alter-Art paradigm in dance, theater, and painting; it integrates immersive teleoperation and compliant actuation to enable a first-person creative experience. Analyzing qualitative artistic feedback, we investigate how embodiment shapes creative agency, identity and interaction with the environment. Our findings suggest that artists rapidly develop a sense of presence within the robotic body. The robot's physical constraints influence the creative process, manifesting differently across artistic domains. We highlight embodiment as a central design principle, contributing to social robotics and expanding the possibilities for telepresence and accessible artistic expression.

Alter-Art proposes embodied teleoperation of a compliant humanoid as a new artistic paradigm, with case studies in dance, theater, and painting.

Why it matters

For social robotics and HRI, this reframes telepresence around the operator's phenomenology rather than task throughput, suggesting embodiment should be a first-class design objective alongside dexterity and safety. For the arts and accessibility, it points toward tools that let artists with mobility limitations or those separated by distance perform in the physical world, not just in VR. It also offers a concrete domain in which compliant actuation, immersive control interfaces, and identity-laden interaction collide, which is useful for benchmarking ideas that pure task-based robotics cannot easily evaluate.

Method

  • — Platform: the Alter-Ego humanoid robot avatar, a wheeled-base humanoid with two arms and a head-mounted camera system used as the artist's surrogate body.
  • — Teleoperation stack: immersive first-person control where the artist wears a VR headset and motion-capture/wearable interfaces; the robot's head streams stereo (or wide-angle) video back, and the artist's upper-body motions drive the robot's arms and head.
  • — Compliant actuation: the arms use variable/soft impedance actuators rather than stiff position control, enabling safe contact with canvases, props, and human partners and allowing physically grounded interaction (brush pressure, gentle pushes on stage); the general idea is sketched after this list.
  • — Three case studies as the core empirical unit: (i) dance, where embodiment, rhythm and presence dominate; (ii) theater, performed alongside human actors, where character identity and the avatar's identity interact; (iii) painting, where direct material contact makes the robot's mechanical properties most visible in the artifact itself.
  • — Novelty vs. prior work: previous robotic-art frameworks treat the robot as an autonomous artist (generative/algorithmic art on a manipulator) or as a collaborator (turn-taking with a human). Here the robot is an avatar; the creative agency stays with the human, but the human's body is replaced. This shifts the research questions from 'is the robot creative?' to 'how does inhabiting a robot change the artist?'.
  • — Evaluation: qualitative artistic feedback from participating artists across the three domains, focused on perceived agency, sense of presence, identity, and how physical constraints were experienced. No quantitative HRI metrics or psychophysical scales are emphasized in the abstract.
  • — Assumptions/caveats baked into the design: low-latency, reliable teleop link; artists willing and able to use immersive interfaces; tasks within the avatar's kinematic and force envelope; physical world (not VR) as the canvas of interest.
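
'Compliant actuation' here means the arms behave like spring-damper systems rather than stiff position servos. The textbook joint-space impedance law captures the idea; Alter-Ego's actual controller, gains, and actuator design are not described in the abstract, so everything below is illustrative.

```python
# Generic joint-space impedance control, the textbook form of "compliant
# actuation". All gains and values here are made up for illustration.
import numpy as np

def impedance_torque(q, qd, q_des, K, D, gravity):
    """Spring-damper torque pulling toward the teleoperated reference pose.

    Low stiffness K lets contact (brush pressure, a bump from a human
    actor) deflect the arm instead of being fought by a stiff servo loop.
    """
    return K @ (q_des - q) - D @ qd + gravity(q)

# Two-joint example: soft spring, light damping, gravity term ignored.
K = np.diag([8.0, 8.0])      # N*m/rad
D = np.diag([1.5, 1.5])      # N*m*s/rad
tau = impedance_torque(np.zeros(2), np.zeros(2), np.array([0.3, -0.2]),
                       K, D, lambda q: np.zeros(2))
```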

Result

The reported finding is that artists develop a sense of presence in the robot quickly, in line with embodiment literature, and treat the avatar's body as their own during creation. Across the three domains the effect of constraints differs: in dance, the robot's morphology and dynamics modulate movement vocabulary and the felt connection to the audience; in theater, embodiment surfaces tensions between the artist's identity, the robot's identity, and the staged character, with the avatar becoming a third presence on stage; in painting, compliant arms and limited end-effector dexterity directly imprint on the produced strokes and composition, leading artists to adapt or even prefer certain emergent qualities. The authors argue this supports treating embodiment as a central design principle for social robots and telepresence systems, beyond accuracy of motion mapping. The paper does not, per the abstract, report quantitative metrics, controlled comparisons, or audience-side evaluations.

Caveats

The contribution is conceptual and exploratory. The evidence is qualitative feedback from a small number of artists with no control condition (e.g., screen-based teleop, non-compliant arms, third-person view), making it impossible to attribute the reported sense of presence specifically to compliant actuation, first-person view, or novelty effects. Latency, calibration, fatigue, and failure modes of the teleop pipeline are not characterized in the abstract; for live theater and dance these are critical. Generalization is limited: a few trained artists, self-selected for openness to new tools, may not represent broader populations, including the artists with disabilities the work hopes to serve. The avatar's hand dexterity will bound the painting and prop-handling claims, and we have no evidence about audience reception, which is arguably the ultimate test for performing arts. Finally, ethical and authorship questions (who is the author when the body is partly mechanical, what happens if the robot is shared across artists, accessibility cost) are flagged by the framing but not resolved. Convincing follow-ups would include controlled HRI studies with validated presence/agency questionnaires, ablations over compliance and viewpoint, audience-perception studies, longitudinal use by artists with motor impairments, and quantitative analysis of how the avatar's mechanical signature shows up in the resulting artworks.


2604.27169v1 · Apr 29 · Language · Machine Learning

Semantic Structure of Feature Space in Large Language Models

Austin C. Kozlowski, Andrei Boutyline

Large language models organize word meanings in their internal space in ways that closely match how humans psychologically associate those same words.

Why it matters

AI safety and interpretability researchers want to understand what's actually happening inside language models. This paper offers evidence that models aren't just doing statistical tricks — they've built something structurally similar to human conceptual maps. That has practical consequences: if you try to nudge a model to be less biased on one dimension (say, gender), you'll inevitably nudge it on related dimensions too, whether you wanted to or not. It also bridges AI research with decades of work in psychology and sociology on how humans organize meaning.

Method

  • — Took 360 common words that psychologists studied back in 1958, where humans rated each word on 32 scales like beautiful-ugly, soft-hard, fast-slow.
  • — Fed those words into a large language model (Llama 3 family) and extracted the model's internal numerical representation of each word.
  • — Built a semantic axis for each scale by taking the model's representation of one end (e.g. 'beautiful') and subtracting the other end ('ugly'), giving a direction in the model's internal space (sketched after this list).
  • — Projected each word onto each axis to get the model's implicit 'rating' of that word, then compared those ratings to the human survey ratings from 1958.
  • — Checked whether axes that humans treat as related (like good-bad and beautiful-ugly) also point in similar directions inside the model.
  • — Ran a steering experiment: artificially pushed a word along one axis inside the model and watched how its position on other axes changed.
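
The axis-projection recipe is simple enough to sketch end to end. Here `embed` is a hypothetical stand-in for extracting a word's hidden-state vector from the model; the paper's layer choice and pooling are not specified in the abstract.

```python
# Axis construction and projection; `embed` is a hypothetical word ->
# hidden-state function standing in for the LLM internals.
import numpy as np

def axis(embed, pos, neg):
    """Semantic axis = difference of pole embeddings, unit-normalized."""
    v = embed(pos) - embed(neg)               # e.g. h('beautiful') - h('ugly')
    return v / np.linalg.norm(v)

def project(embed, words, axes):
    """Model-implied rating of each word on each axis (cosine projection)."""
    W = np.stack([embed(w) for w in words])
    W = W / np.linalg.norm(W, axis=1, keepdims=True)
    return W @ np.stack(axes).T               # (n_words, n_axes)

def axis_similarity(axes):
    """Pairwise cosines between axes, to compare with human scale correlations."""
    A = np.stack(axes)
    return A @ A.T

# Toy stand-in embedding, for demonstration only (real use: LLM hidden states).
embed = lambda w: np.random.default_rng(abs(hash(w)) % 2**32).normal(size=64)
ax = [axis(embed, "beautiful", "ugly"), axis(embed, "soft", "hard")]
ratings = project(embed, ["flower", "stone"], ax)   # shape (2, 2)
```

The first comparison correlates the columns of `ratings` with the 1958 human ratings; the second correlates the off-diagonal entries of `axis_similarity(ax)` with the human survey's scale-by-scale correlation matrix.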

Result

The model's projections lined up strongly with human ratings across the 32 scales. Beyond that, the geometric similarity between two axes inside the model predicted how correlated those scales were in the human survey — meaning the model recovered not just individual word meanings but the relational structure between concepts. Most of the variation across the 32 axes lived in just a few underlying dimensions, echoing a classic finding in psychology where human judgments collapse onto a small number of core factors (like evaluation, potency, activity). Finally, the steering experiment confirmed causal spillover: making a word 'more beautiful' inside the model also made it 'better' and 'softer,' and the size of the spillover matched how geometrically close those axes were.

Caveats

The study uses a 1958 psychology dataset, so the human ratings reflect mid-20th-century American intuitions, which may not generalize. It looks at single words in isolation, not phrases or context-dependent meanings. The work is correlational at the representation level; while the steering experiment adds causal evidence, it's still inside one model family. There's also no test of whether this structure helps or hurts downstream tasks — it's a finding about the geometry, not about model behavior in the wild. And because the structure is so entangled, debiasing efforts that target one axis will almost certainly affect others, which is more of a warning than a solution.

Builds on

  • Jenkins, Russell & Suci, 1958

    This is the original psychology study that collected human ratings of 360 words on dozens of semantic scales. The new paper uses these decades-old human ratings as the ground truth to compare against the language model's internal geometry.

  • Osgood, Suci & Tannenbaum, 1957

    Introduced the semantic differential technique and the finding that human word judgments cluster onto a few core dimensions (evaluation, potency, activity). The paper shows language models reproduce this same low-dimensional structure.

  • Kozlowski, Taddy & Evans, 2019

    Earlier work using word embeddings to study cultural meaning by projecting words onto axes like rich-poor. This paper extends the technique from older embeddings to modern large language model hidden states and adds causal steering experiments.

  • Panickssery et al, 2023

    Developed contrastive activation steering for Llama 2, showing you can push models in a direction by adding a vector. This paper uses similar steering ideas to test whether moving a word on one semantic axis spills over to related axes.

Original abstract

We show that the geometric relations between semantic features in large language models' hidden states closely mirror human psychological associations. We construct feature vectors corresponding to 360 words and project them on 32 semantic axes (e.g. beautiful-ugly, soft-hard), and find that these projections correlate highly with human ratings of those words on the respective semantic scales. Second, we find that the cosine similarities between the semantic axes themselves are highly predictive of the correlations between these scales in the survey. Third, we show that substantial variance across the 32 semantic axes lies on a low-dimensional subspace, reproducing patterns typical of human semantic associations. Finally, we demonstrate that steering a word on one semantic axis causes spillover effects on the model's rating of that word on other semantic scales proportionate to the cosine similarity between those semantic axes. These findings suggest that features should be understood not only in isolation but through their geometric relations and the meaningful subspaces they form.

Projecting LLM hidden-state word representations onto semantic-differential axes reproduces human psychological ratings, inter-scale correlations, low-dimensional structure, and causal spillover under steering.

Why it matters

Mechanistic interpretability has largely focused on identifying individual features (via probing, SAEs, contrastive directions). This paper argues that the relations between features carry just as much information, and that those relations align with decades-old findings from psychometrics and cultural sociology (Osgood's semantic differential, evaluation/potency/activity factors). That has two implications. First, interpretability tooling that treats features as independent will miss systematic entanglement. Second, alignment interventions like feature steering or activation addition will have predictable, geometry-determined side effects — debiasing along one axis will move the model along correlated axes whether you want it to or not. It also lends empirical weight to the linear representation hypothesis at the level of structured semantic subspaces.

Method

  • — Stimulus set: the 360 words from Jenkins, Russell & Suci (1958), each rated by humans on 32 bipolar semantic-differential scales (e.g. beautiful-ugly, soft-hard, strong-weak).
  • — Model: Llama 3 family. The paper extracts hidden-state representations of single words; the abstract does not specify which layer(s) but the body presumably analyzes one or more residual stream layers.
  • — Axis construction: for each of the 32 scales, build an axis vector by differencing the hidden-state representations of the two pole words (e.g. h(beautiful) − h(ugly)). This is the standard semantic differential / contrastive direction approach used in word-embedding sociology and in CAA-style steering.
  • — Word scoring: project each of the 360 word vectors onto each axis (cosine or dot product) to obtain a model-implied rating per word per scale.
  • — Test 1 — word-level alignment: correlate model projections with human ratings across the 360 words for each of the 32 scales.
  • — Test 2 — axis-level alignment: compute pairwise cosine similarities between the 32 axis vectors and correlate them with the human-survey correlations between the 32 scales.
  • — Test 3 — dimensionality: run PCA (or analogous decomposition) on the axis set to check whether a few components capture most variance, paralleling Osgood's evaluation/potency/activity finding.
  • — Test 4 — causal steering: intervene on a word's hidden state by adding a scaled axis vector, then re-measure projections on all other axes. Check whether spillover magnitude tracks cosine similarity between the steered axis and the measured axis (Tests 3 and 4 are sketched after this list).
  • — Assumptions: (i) the linear representation hypothesis — that semantic attributes correspond to linear directions; (ii) pole-difference vectors are reasonable axis estimators; (iii) single-token / single-word probes are meaningful despite LLMs operating on context.
  • — Caveats baked into the design: 1958 norms are culturally and temporally narrow; only 32 scales and 360 words; one model family; out-of-context word probing.
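
Tests 3 and 4 reduce to a few lines of linear algebra once the unit-norm axis vectors exist. A schematic sketch, not the paper's code:

```python
# Schematic of Test 3 (dimensionality of the axis set) and Test 4
# (steering with spillover measurement); axes are unit-norm vectors.
import numpy as np

def effective_dims(axes, var_frac=0.9):
    """Number of principal components covering `var_frac` of axis variance."""
    A = np.stack(axes)
    s = np.linalg.svd(A - A.mean(axis=0), compute_uv=False)
    ratio = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(ratio, var_frac)) + 1

def steer_and_measure(h_word, axes, i, alpha=1.0):
    """Add alpha * axis_i to a word's state; return the shift on every axis."""
    A = np.stack(axes)                        # (m, d)
    before = A @ h_word
    after = A @ (h_word + alpha * A[i])
    return after - before                     # = alpha * (A @ A[i])
```

One caution about the sketch: if spillover were measured by re-projecting the same steered vector, the cosine proportionality would be mechanical (the return value above is exactly alpha times the cosines). The abstract says steering changes 'the model's rating' on other scales, which suggests the readout happens downstream of the intervention; that is what makes the proportionality an empirical finding rather than an artifact of the measurement.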

Result

Across the 32 scales, model projections correlate highly with human ratings — the abstract characterizes the correlation as 'high' but does not give a numeric value; the body presumably reports per-scale correlations and an aggregate. More striking is the second-order result: the cosine-similarity matrix between the 32 LLM axes is itself highly predictive of the correlation matrix between the same 32 scales in human ratings. That means the model recovers the relational geometry of human semantic space, not just per-word meanings. PCA-style analysis shows that substantial variance across the 32 axes lies on a low-dimensional subspace, reproducing the kind of compressed structure (a few dominant factors) that Osgood and successors found in human judgment data. The steering experiment closes the loop causally: pushing a word along one axis produces shifts on other axes whose magnitude is proportional to the cosine similarity between the source and target axes. So the geometry is not just descriptive — it predicts intervention outcomes.

Caveats

Several limitations are worth flagging. (1) Stimulus age: the 1958 norms encode mid-century American associations; modern human ratings might agree less with the model, and disagreements would be informative but aren't tested here. (2) Single model family: results are reported on Llama 3; whether the same structure holds across model scales, training data, instruction tuning, and architectures (e.g. Gemma, Mistral, GPT-class) is open. The body may include more, but the abstract commits only to one. (3) Pole-difference axes are known to be noisy estimators (see Boutyline & Johnston 2025); reliability of individual axes likely varies, and scales with weaker axes may drive most of the residual error. (4) Out-of-context word probing: LLM representations are deeply contextual, and single-word hidden states may underrepresent polysemy or context-dependent meaning shifts. (5) The steering spillover result is consistent with linear superposition but does not distinguish a causally entangled representation from one where downstream readout heads happen to share components. (6) No downstream behavioral test: it's unclear whether this geometric alignment translates to model outputs in generation, classification, or alignment-relevant tasks. (7) The low-dimensional finding is qualitatively reminiscent of evaluation/potency/activity but the abstract doesn't claim a clean three-factor recovery — readers should not over-interpret. Likely pushback: interpretability researchers focused on SAEs may argue that residual-stream contrastive axes are too coarse and that monosemantic SAE features would tell a different story. The natural next step is to repeat the analysis on SAE feature spaces (Bricken et al., Templeton et al., Gemma Scope) and check whether the same relational geometry emerges, and whether spillover under SAE-feature steering follows the same cosine law.

Builds on

  • Jenkins, Russell & Suci, 1958

    Source of the 360-word, 32-scale semantic-differential dataset used as human ground truth. The paper directly imports their stimuli and ratings to benchmark LLM hidden-state geometry.

  • Kozlowski, Taddy & Evans, 2019

    Established projecting words onto pole-difference axes in word embeddings to study cultural meaning (e.g. class, gender). This paper inherits the axis-projection methodology but applies it to LLM hidden states and adds inter-axis structure plus causal steering.

  • Park, Choe & Veitch, 2023

    Formalized the linear representation hypothesis and the geometry of concepts in LLMs. The current paper provides empirical support for structured, relationally consistent linear representations and extends the picture from individual concepts to a connected semantic manifold.

  • Panickssery et al, 2023

    Contrastive activation addition for steering Llama 2. This paper uses analogous contrastive directions but tests a new prediction: spillover magnitude under steering is proportional to cosine similarity between source and target axes, providing a quantitative law for steering side effects.

