Tag: Engineering

  • Huge SpaceX rocket explosion shredded the upper atmosphere

    The SpaceX "Starship" rocket just about to take off, surrounded by plumes of smoke

    SpaceX’s Starship rocket is the most powerful launch vehicle ever built. Credit: Joe Marino/UPI/Shutterstock

    The huge explosions that destroyed SpaceX’s Starship mega-rocket last year also blew one of the biggest ‘holes’ ever detected in the ionosphere, a layer of thin air in the upper atmosphere. The hole stretched for thousands of kilometres and persisted for nearly an hour, a study found1.

    Study co-author Yury Yasyukevich, an atmospheric physicist at the Institute of Solar‐Terrestrial Physics in Irkutsk, Russia, says that the extent of the disturbance took his team by surprise: “It means we don’t understand processes which take place in the atmosphere.” He adds that such phenomena could have implications for future autonomous vehicles that might require precision satellite navigation. The results were published on 26 August in Geophysical Research Letters.

    Record-setting rocket

    On 18 November last year, SpaceX launched its Starship rocket — the biggest and most powerful rocket ever built — from a launchpad in Boca Chica, Texas. Starship’s first stage is designed to return safely to the surface for reuse but blew up shortly after separating from the upper stage, roughly 90 kilometres above the Gulf of Mexico. Minutes later, the self-destruction mechanism on the upper stage fired, triggering a second explosion at an altitude of around 150 kilometres.

    Yasyukevich and his collaborators were curious to find out how such massive explosions could affect the ionosphere, a zone of the atmosphere extending from about 50 to 1,000 kilometres above sea level in which the Sun’s radiation can strip some air molecules of their electrons. The result is that a small percentage of the ionosphere’s mass consists of electrons and positively charged ions, while the rest of the air molecules remain neutral. The exact ratio of ionized to neutral molecules varies with factors such as altitude and latitude.

    That ratio affects the speed at which the radio waves beamed down by global navigation satellites propagate in the ionosphere. Crucially, changes in the ratio have different effects on different radio frequencies. This enables researchers to measure the amount of ionization in real time by comparing the speeds of radio waves of two different frequencies, Yasyukevich explains.
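
    The arithmetic behind this dual-frequency trick is simple. The sketch below is illustrative only and is not code from the study: it applies the standard geometry-free combination used in GNSS ionosphere monitoring, with GPS L1/L2 frequencies and made-up numbers, to convert the extra delay of the lower-frequency signal into total electron content (TEC).

    ```python
    # Illustrative sketch, not from the study: estimating slant total electron
    # content (TEC) from dual-frequency GNSS ranges. The lower-frequency signal
    # is delayed more by free electrons, so the range difference is proportional
    # to the electron content along the path. Example numbers are hypothetical.
    GPS_L1_HZ = 1575.42e6
    GPS_L2_HZ = 1227.60e6
    K = 40.3  # first-order ionospheric delay constant (m^3 s^-2)

    def slant_tec_tecu(range_l1_m: float, range_l2_m: float) -> float:
        """Return slant TEC in TEC units (1 TECU = 1e16 electrons per m^2)."""
        f1sq, f2sq = GPS_L1_HZ ** 2, GPS_L2_HZ ** 2
        electrons_per_m2 = (range_l2_m - range_l1_m) * f1sq * f2sq / (K * (f1sq - f2sq))
        return electrons_per_m2 / 1e16

    # A 5-metre extra delay on L2 relative to L1 corresponds to roughly 48 TECU.
    print(f"{slant_tec_tecu(20_200_000.0, 20_200_005.0):.1f} TECU")
    ```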

    Such data have been used for decades to reveal how events ranging from earthquakes to underground nuclear tests affect the ionosphere. These natural and human-caused disruptions can temporarily nullify the effects of solar radiation by causing electrons and ions to recombine into neutral molecules.

    Neutralizing the air

    The team examined publicly available data from more than 2,500 ground stations across North America and the Caribbean that receive satellite navigation signals. They found that the Starship explosions produced shock waves that travelled faster than the speed of sound, turning the ionosphere into a region of neutral atmosphere — a “hole” — for nearly an hour over a region stretching from Mexico’s Yucatán peninsula to the southeastern United States. Rocket exhaust can trigger chemical reactions that produce temporary holes in the ionosphere even in the absence of an explosion, but in this case the shockwaves themselves had by far the larger effect, Yasyukevich says.

    “I was impressed by this case study,” says Kosuke Heki, a geophysicist at Hokkaido University in Sapporo, Japan, who was an open reviewer for the paper. But he thinks that the chemical effects of the large conflagration were the dominant cause of the hole.

    The hole was not quite as large as the one caused by the eruption of a Tongan volcano in early 2022, Heki says, but it beat the one produced by the historic meteor that fell near Chelyabinsk, Russia, in 2013 — the biggest in a century.

    Ionospheric disturbances can affect not only satellite navigation but also communications and radio astronomy. As launch frequencies increase, these effects might become more of a problem.

  • Beetle-inspired flapping robots effortlessly deploy and retract their wings

    Insects are thought to use specific chest muscles to actively open and close their wings. However, high-speed imaging reveals that rhinoceros beetles flap their hindwings to deploy them for flight, and use their forewings to push their hindwings back to rest. This inspired the design of flapping microrobots with self-deploying, self-retracting wings.

  • We need to prepare our transport systems for heatwaves — here’s how

    Climate change is making heatwaves more frequent and severe. In June, at least 1,300 people died because of heat during the annual Hajj pilgrimage in Saudi Arabia, marking a higher toll than in previous years. People living in the tropics experience the worst effects of heat. But by 2100, three-quarters of the world’s population could be exposed to climatic conditions that exceed a lethal threshold of temperature and humidity, compared with just under one-third in 2000 (ref. 1).

    Transport systems, too, are adversely affected. Extreme heat buckles rails, melts wires and road surfaces and bursts tyres. This year, traffic in India has been disrupted by melting road surfaces. And last summer, bus passengers in Houston, Texas, had to wait in bus shelters hot enough to make people ill. During the 2022 heatwave in the United Kingdom, scores of flights were disrupted by melting runways and trains were delayed by warped rails.

    Such problems will grow as the world warms. For example, by 2050, Spain could face up to 500 cases of rail buckling each year, compared with 20 expected in 2025 (ref. 2). By 2100, pavements in New Hampshire might need to be replaced every 4 years, compared with every 16 years today (ref. 3). Also by the end of the century, annual costs of road and rail operation and maintenance in the European Union and United Kingdom are projected to increase by €0.9 billion (US$1 billion) compared with 2016 if there is 1.5 °C of global warming above pre-industrial levels, and by €4.8 billion for 4 °C of warming (ref. 4).

    The effects will be felt unevenly, with disadvantaged communities hit the hardest. For example, in Oregon, between 2012 and 2017, the number of people taking the bus on very hot days (30 °C or more) dropped by 1.6% in lower-income neighbourhoods, by 1% in middle-income ones, and hardly at all in high-income areas5. Governments, bus and rail operators and cities should urgently prepare transport systems for a warmer future. But efforts so far have been inadequate. Just over 160 countries mention ‘transport’ in their policies for adapting to extreme heat. Yet, few have progressed from planning to implementation, owing to lack of funding, prioritization, coordination, technical expertise and capacity, as well as uncertainty over what the future holds.

    A comprehensive, strategic approach is needed. Existing frameworks for coping with extreme heat either focus only on disaster response or single out physical transport infrastructure without considering the people who use it. A wider range of climate-change hazards, vulnerabilities, risks and infrastructure conditions needs to be factored in.

    Here we propose a road map to strengthen transport systems’ heat resilience. In the short and medium term, transport operators should assess climate risks and the feasibility of various solutions and implement some ‘low regret’ options to improve resilience to extreme weather. In the longer term, strategies need to be designed to improve the resilience of transport steadily as conditions change. This road map reflects deliberations of an expert panel on transport infrastructure and extreme heat, convened with funding from the Global Facility for Disaster Reduction and Recovery.

    Set up a governance framework

    The first step is to establish what immediate risks extreme heat poses to local, regional and national transport networks and users. Municipalities and other areas should set up a transport task force of researchers, agencies, citizens, policymakers and governments. The task force should consider climate projections and impact assessments from the Intergovernmental Panel on Climate Change and the annual United Nations Framework Convention on Climate Change Conference of the Parties. Mitigation or adaptation strategies already in place for heatwaves should be identified. All modes of transport need to be represented, including air, water, rail and road.

    A train stopped in Melbourne, Australia, because of railway tracks buckled by extreme heat, which workers are cooling.

    Cooling railway tracks with water or by using white paint can reduce rail buckling.

    Governance should be put in place — heat policies remain scarce worldwide. Ideally, a regional chief heat officer would coordinate efforts across sectors. As well as protecting human health, policies should consider the resilience of transport infrastructure to heat, how best to guide investments and how to avoid failures of crucial systems. First responders, utility companies and transport-concession firms should be involved in shaping policies.

    Transport should be embedded in all heatwave-related efforts. Impacts on movement of people and goods should be included, as is the case in the US National Integrated Heat Health Information System (NIHHIS), created by the National Oceanic and Atmospheric Administration and Centers for Disease Control and Prevention. The NIHHIS compiles and shares information about extreme heat with the public and with decision makers.

    International collaboration is crucial. This includes in capacity building, finance to support adaptation to heat and sharing tools and data for assessing risks. The Global Heat Health Information Network, run by the World Health Organization and the World Meteorological Organization, is one such forum.

    Best practices can be borrowed from other sectors. For example, flood risk governance has specific objectives, such as preventing a once-every-100-years flood. Heat governance needs equivalent targets for planning.

    A heat sensor attached to a car in South Africa.

    Volunteers in Cape Town, South Africa, are attaching sensors to their cars to map heat across the city. Credit: Chris Morgan/World Bank

    Deploy sensors and monitoring

    Temperature sensors and monitoring systems should be installed along transport networks, especially around the most vulnerable parts, such as exposed sections of railway track and underground infrastructure with poor ventilation6. Combining temperature sensors with Internet of Things technologies can enable real-time monitoring of conditions. This would allow improved planning for maintenance as well as better-informed, local, dynamic responses to changing weather conditions — for instance, imposing speed restrictions to improve safety.
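
    As a simple illustration of the kind of dynamic response such monitoring enables, the sketch below maps a live rail-temperature reading to a reduced speed limit; the thresholds are hypothetical placeholders, not any operator’s policy.

    ```python
    # Hypothetical rule-of-thumb sketch: convert a line-side rail-temperature
    # reading into a dynamic speed restriction. Thresholds are illustrative only.
    def restricted_speed_kmh(rail_temp_c: float, normal_limit_kmh: float) -> float:
        if rail_temp_c >= 55:   # assumed high buckling-risk threshold
            return min(normal_limit_kmh, 30.0)
        if rail_temp_c >= 45:   # assumed elevated-risk threshold
            return min(normal_limit_kmh, 90.0)
        return normal_limit_kmh
    ```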

    Start feasibility studies

    Practical assessments should start as soon as possible to test existing heat-adaptation strategies in various geographical contexts. Tempe in Arizona, Abu Dhabi and Singapore are increasing the amount of shade provided for pedestrians, cyclists, passengers and transport workers, such as bus drivers and maintenance staff7. Phoenix, Arizona, has installed ‘cool pavements’, by applying coatings that reflect solar radiation. Spain is using heat sensors on rail tracks to provide early warnings, as well as heat-reflective coatings and paints to limit rail temperature increases. The United Kingdom sets speed limits during heatwaves to prevent rail buckling and reduce the risk of accidents.

    Cost–benefit analyses are needed to prioritize strategies, resources and investments. In Seoul, for example, transforming roads into green corridors has been a cost-effective way to reduce heatwave impacts for pedestrians, cyclists and public-transport users.

    Develop tools for adaptation

    Given uncertainty about the extent and impacts of climate change, governments and cities need to develop ‘resilience toolkits’ for keeping the transport sector running in extreme conditions. Such tools range from simple rules of thumb for local practitioners to ‘digital twins’ of transport systems for monitoring and planning. Current toolkits tend to focus only on local climate risks (see go.nature.com/3sazyuk) or on public health (the NIHHIS, for example).

    To inform these toolkits, transport managers need to gather meteorological data, predict future climate patterns, assess the heat resilience of infrastructure components and monitor user exposure to heat.

    Transport toolkits should be aligned with wider climate adaptation strategies or pathways. They should support an iterative process for continual improvement: both the infrastructure and toolkit should evolve as the climate changes8. Policymakers should identify pathways that optimize resilience with minimal financial outlay, unless investment permits pricier alternatives that will result in longer-term resilience. They can and should make decisions and adjustments as conditions evolve. Standards and design guidelines for heat-resilient systems will also need to be updated regularly as harsh conditions become more frequent6.

    A tourist bus rides past a thermometer displaying 49 degrees Celsius (120 degrees Fahrenheit) in Madrid, Spain.

    Countries should put guidelines for public transport in place to account for increasingly hot conditions. Credit: Isabel Infantes/Reuters

    Invest in technological innovations

    Governments must invest in research and development to foster technological innovations that address the challenges posed by extreme heat. For example, thermochromic materials, which change colour with temperature, can be used for regulating temperature in both hot and cold conditions. However, current versions are expensive and degrade when exposed to ultraviolet radiation. Techniques for storing heat, such as graphite-powder-filled pavements, can help to regulate pavement temperatures. Renewable-energy technologies can also be installed in pavements to harvest heat from them and turn it into electricity. Bus stops are being upgraded with cooling panels and industrial fans.

    Consider heat equity

    Low-income households, people of colour, women, older people and those with disabilities are affected disproportionately by heatwaves. In 2018, for example, 20% of women cancelled metro trips in Delhi because of extreme heat, compared with around 10% of men9. Women might be more vulnerable to extreme heat than men, because of physiological differences, cultural norms or responsibilities, for instance. More research is necessary to better understand such disparities.

    To address heat inequity, planners and decision makers must engage with marginalized communities to understand their specific challenges and concerns and incorporate the input into heat strategies. Widespread tree-planting campaigns and financial incentives for green roofs, for example, can improve living conditions in cities for everyone. For example, in Los Angeles, California, the Cool LA programme was launched in 2019 to lay 400 kilometres of cool pavements and plant about 2,000 trees in low-income areas. Other interventions include establishing cooling centres and improving public-transport infrastructure and services. For instance, Kelowna in Canada offers free public transport for people travelling to and from cooling centres and sprays water vapour on some pedestrian areas when heat warnings are issued.

    Education and awareness campaigns are essential to ensure all community members can access information about heat risks and protective measures. More actions must be taken to integrate heat equity into the transport-planning process to avoid climate gentrification and favouring of the wealthiest.

    Resilient and inclusive systems can better serve the needs of all residents, irrespective of their socio-economic status or background. The time to act is now — the costs of inaction will only grow as our world continues to warm.

  • Revolutionary Catalyst Coating Technology Skyrockets Fuel Cell Performance in Just 4 Minutes

    Morphology Evolution of Oxide Nano Catalyst

    A collaborative research team has developed a new catalyst coating technology that enhances solid oxide fuel cell performance threefold in just four minutes, offering potential advancements in energy conversion technology. Credit: Korea Institute of Energy Research (KIER)

    A new oxide catalyst coating technique significantly enhances the performance of solid oxide fuel cells, tripling their peak power output. This breakthrough technology is versatile and can be applied to various applications, including solid oxide fuel cells and high-temperature electrolysis.

    Researchers have developed a groundbreaking catalyst coating technology for solid oxide fuel cells (SOFCs) that drastically enhances performance within just four minutes. The technology, which employs nanoscale praseodymium oxide catalysts, targets the oxygen reduction reaction at the air electrode, increasing the power output of SOFCs significantly. This new method, which is economical and compatible with existing manufacturing processes, promises broader applications, including high-temperature electrolysis for hydrogen production.

    Dr. Yoonseok Choi of the Hydrogen Convergence Materials Laboratory at the Korea Institute of Energy Research (KIER), along with Professor WooChul Jung from the Department of Materials Science and Engineering at KAIST and Professor Beom-Kyung Park from the Department of Materials Science and Engineering at Pusan National University, has successfully developed a catalyst coating technology that dramatically enhances the performance of solid oxide fuel cells (SOFCs) in just 4 minutes.

    Fuel cells are gaining attention as highly efficient and clean energy devices driving the hydrogen economy. Among them, solid oxide fuel cells (SOFCs), which have the highest power generation efficiency, can use various fuels such as hydrogen, biogas, and natural gas. They also allow for combined heat and power generation by utilizing the heat generated during the process, making them a subject of active research and development.

    Schematic Illustrations of the Electrochemical Coating Process on LSM–YSZ Electrode of SOFCs

    Schematic illustrations of the electrochemical coating process on LSM–YSZ electrode of SOFCs. Credit: Korea Institute of Energy Research (KIER)

    Challenges in SOFC Performance

    The performance of solid oxide fuel cells (SOFCs) is largely determined by the kinetics of oxygen reduction reaction (ORR) occurring at the air electrode (cathode). The reaction rate at the air electrode is slower than that of the fuel electrode (anode), thus limiting the overall reaction rate. To overcome this sluggish kinetics, researchers are developing new air electrode materials with high ORR activity. However, these new materials generally still lack chemical stability, requiring ongoing research.

    Yoon Seok Choi and Research Team

    Photo of the Joint Research Team (Yoon-Seok Choi, Senior Researcher, on the far right). Credit: Korea Institute of Energy Research (KIER)

    Instead, the research team focused on enhancing the performance of the LSM-YSZ composite electrode, a material widely used in industry due to its excellent stability. As a result, they developed a coating process for applying nanoscale praseodymium oxide (PrOx) catalysts on the surface of the composite electrode, which actively promotes the oxygen reduction reaction. By applying this coating process, they significantly improved the performance of solid oxide fuel cells.

    Simplified Electrochemical Deposition Method

    The research team introduced an electrochemical deposition method that operates at room temperature and atmospheric pressure, requiring no complex equipment or processes. By immersing the composite electrode in a solution containing praseodymium (Pr) ions and applying an electric current, hydroxide ions (OH-) generated at the electrode surface react with praseodymium ions, forming a precipitate that uniformly coats the electrode. This coating layer undergoes a drying process, transforming into an oxide that remains stable and effectively promotes the oxygen reduction reaction of the electrode in high-temperature environments. The entire coating process takes only 4 minutes.

    Additionally, the research team elucidated the mechanism by which the coated nano-catalyst promotes surface oxygen exchange and ionic conduction. They provided fundamental evidence that the catalyst coating method can address the low reaction rate of the composite electrode.

    By operating the developed catalyst-coated composite electrode and the conventional composite electrode for over 400 hours, the team observed that the polarization resistance was reduced tenfold. Additionally, the SOFC using this coated electrode exhibited a peak power density three times higher (142 mW/cm² → 418 mW/cm²) than that of the uncoated case, at 650 degrees Celsius. This represents the highest performance reported for SOFCs using LSM-YSZ composite electrodes in literature.

    Dr. Yoonseok Choi, co-corresponding author, stated, “The electrochemical deposition technique we developed is a post process that does not significantly impact the existing manufacturing process of SOFCs. This makes it economically viable for introducing oxide nano-catalysts, enhancing its industrial applicability.” He added, “We have secured a core technology that can be applied not only to SOFCs but also to various energy conversion devices, such as high-temperature electrolysis (SOEC) for hydrogen production.”

    Reference: “Revitalizing Oxygen Reduction Reactivity of Composite Oxide Electrodes via Electrochemically Deposited PrOx Nanocatalysts” by Seongwoo Nam, Jinwook Kim, Hyunseung Kim, Sejong Ahn, SungHyun Jeon, Yoonseok Choi, Beom-Kyeong Park and WooChul Jung, 22 March 2024, Advanced Materials.
    DOI: 10.1002/adma.202307286

    The study was conducted with support from the Ministry of Trade, Industry, and Energy’s Core Technology Development Program for New and Renewable Energy and the Ministry of Science and ICT’s Individual Basic Research Program.



  • See how your body works in real time — wearable ultrasound is on its way

    A wearable ultrasound imaging device, about two centimetres in size, that attaches to the skin with an adhesive.

    A wearable ultrasound device that attaches to the skin using a bioadhesive. Credit: Chonghe Wang

    On a sunny day in April, one of us (C.W.) jogged along the Charles River in Cambridge, Massachusetts, with a series of small patches adhered to the skin. Each patch was an ultrasound sticker. On a smartphone, a live feed cycled through views of heart valves fluttering, muscles flexing, the diaphragm’s rise and fall and the flow of blood through arteries. Seamlessly connected to these hidden layers of physiology, the runner was able to watch the workings of their body in real time.

    Wearable ultrasound devices such as these stickers still face many challenges. But they represent our vision for the future of ultrasound imaging and wearable devices1,2. Here we outline the promise of such technologies, and highlight remaining hurdles towards rolling out the devices in the next 5–10 years.

    Imaging the body in real time

    In the field of personal health-care technology, wearables such as Fitbit and Apple Watch have become household names. These devices — small powerhouses of sensors and smart technology — can track steps, monitor heart rates and even perform electrocardiograms that once required a visit to the doctor’s office. They present biometrics in neat, digestible metrics, nudging people towards healthier lifestyles3. Wearable glucose monitors are also freeing people with diabetes from the frequent pricks of a needle, providing continuous readings of blood sugar.

    Yet existing wearables generally collect data from only within millimetres below the skin’s surface. Other technologies can see far beyond this superficial layer — magnetic resonance imaging (MRI), X-rays and ultrasound, for example, can image internal organs. Of these, ultrasound is emerging as the front-runner in the race towards wearable adaptation.

    Ultrasound operates on the principle of sonar, sending high-frequency sound waves into the body, which bounce back from internal structures to produce real-time images of dynamic processes such as a heart beating or blood flowing. Conventional point-of-care ultrasound imaging devices require a trained sonographer to press a handheld ultrasound probe against a patient who is static, meaning these devices are usually confined to hospitals and clinics. However, unlike X-rays, which require complex and potentially harmful ionizing radiation, ultrasound waves are relatively easy to produce, non-invasive and safe. These characteristics make ultrasound uniquely suited to a wearable form that is capable of continuous monitoring.
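
    A minimal sketch of the pulse-echo arithmetic behind this: the depth of a reflecting structure follows directly from the round-trip time of the echo and the average speed of sound in soft tissue (about 1,540 metres per second); the numbers below are illustrative.

    ```python
    # Pulse-echo principle: depth = speed_of_sound * round_trip_time / 2.
    SPEED_OF_SOUND_TISSUE_M_PER_S = 1540.0  # conventional average for soft tissue

    def reflector_depth_cm(round_trip_time_s: float) -> float:
        """Depth of the structure that produced an echo after the given delay."""
        return SPEED_OF_SOUND_TISSUE_M_PER_S * round_trip_time_s / 2.0 * 100.0

    print(f"{reflector_depth_cm(65e-6):.1f} cm")  # an echo after ~65 microseconds -> ~5 cm
    ```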

    Our team has been exploring wearable ultrasound technology for the past few years1,2, alongside other groups of researchers pursuing the same goal4–11. The device we have developed is just a few centimetres long, attached to the skin using a bioadhesive and connected by wires to a pocket-sized battery and data-transmission system. This system wirelessly transmits clinical-level data to a tablet or smartphone (see ‘Wearable ultrasound imaging’). Our prototype, although still in its infancy, can deliver continuous, high-quality images of deep tissues. The battery life currently allows for minute-long videos to be taken intermittently: multiple times per hour over several days. We have not yet conducted clinical trials, but we expect to reach that stage in the next few years.

    Wearable ultrasound imaging. A graphic showing how adhesive patches interact with the skin and how ultrasound scan images are recorded and transmitted.

    Source: C. Wang & X. Zhao

    Wearable ultrasound systems hold great promise for transforming health care, supporting a shift towards preventive care and proactive health management. In clinical settings, they offer the potential for constant monitoring of high-risk patients, tracking fetal health in high-risk pregnancies or overseeing recovery after surgery. Beyond hospitals, these devices could bring diagnostic and monitoring tools to remote areas, making medical imaging more accessible and affordable in low- and middle-income countries. As the technology is refined, we foresee its integration into the daily lives of individuals to manage chronic conditions such as hypertension, or to enable early detection of heart failure, abdominal aortic aneurysms and deep vein thrombosis3.

    The ability to image multiple organ systems continuously and simultaneously for extended periods also opens up opportunities to enhance our understanding of complex physiological and pathological processes. The data and insights obtained could substantially broaden our understanding of human biology and physiology at a systemic level.

    Seven steps to market

    Wearable ultrasound technology has already overcome some key technical challenges, but there are more ahead. These include refining the durability, flexibility and accuracy of these devices, as well as making them more comfortable to wear and extending their battery life.

    Miniaturization. The company Butterfly Network in Burlington, Massachusetts, has made important strides in miniaturizing ultrasound technology. The firm has created a compact, ultrasound-on-chip platform for its Butterfly iQ device12, a small handheld ultrasound unit for use in clinics that was approved by the US Food and Drug Administration (FDA) in 2017. Although Butterfly iQ is not wireless or fully wearable, it demonstrates that high-quality imaging can be achieved with a compact ultrasound device.

    Furthermore, recent advancements in ultrasonic-system-on-patch (USoP) technology have led to the development of fully integrated systems that combine ultrasound probes with miniaturized wireless control electronics in a soft, wearable format13.

    Although the USoP technology cannot provide high-quality imaging, it allows continuous tracking of physiological signals from deep tissues and can operate wirelessly. The next step for these technologies is to integrate the capability of clinical-quality ultrasound imaging with fully integrated, wireless and wearable ultrasound devices.

    Skin connection. To transmit and receive sound waves efficiently, an ultrasound probe needs to have a good connection with the skin. In the clinic, sonographers typically use a liquid gel to fill the gap between the probe and the skin, but the gel flows away quickly in wearable applications.

    A few research groups have made the ultrasound probes stretchable to conform to the skin’s curved surface without the need for liquid gel9,11,13. However, the stretchable probes face an intrinsic challenge: because high-quality imaging relies on precise knowledge of transducer positions in the ultrasound probe, stretching it could markedly impair its imaging performance.

    Our team has developed a bioadhesive ultrasound, which uses a thin, rigid ultrasound probe adhered to the skin with a specialized ‘couplant’ layer made of a hydrogel–elastomer hybrid that is soft yet tough, anti-dehydrating and bioadhesive1. This provides a robust and flexible connection between the probe and the skin, maintaining a good acoustic interface for high-quality imaging.

    A medical team walk on a mountainside path carrying medical equipment in China's Yunnan Province.

    A team of doctors carries ultrasound equipment to a remote village in China’s Yunnan province. Credit: Zhang Jiayang/Xinhua via Zuma

    Directionality. Another important challenge is ensuring that the ultrasound probe continues to point into the body in exactly the right direction even as the wearer moves around. We have been working on making our bioadhesive couplant adjustable so that the direction of the sound waves can be fine-tuned2. Currently, this requires an initial manual adjustment by clinicians to ensure that the probe is correctly aligned.

    In practice, the bioadhesive ultrasound patches generally maintain their orientations well during typical activities, such as jogging and walking. However, they can require readjustment when subjected to more extreme movements, such as rolling over during sleep or if they are bumped.

    Data analysis. Artificial intelligence (AI) can help with data analysis by interpreting the images produced and then alerting a clinician — or even the user — to potential problems and health concerns. AI-assisted data analysis is already used in clinical settings to interpret diagnostic data, including ultrasound images and X-rays. However, these systems face fresh challenges in wearable ultrasound, because the user’s movements introduce extra noise and variability into the data.

    We are working on neural networks and generative AI models that enhance image clarity and reduce false alarms. These advanced AI algorithms are being developed to filter out motion artefacts and to improve the accuracy of continuous monitoring, ensuring that only significant health concerns are flagged for further examination.

    Data transmission. Ensuring data privacy and developing robust wireless communication protocols that can handle the vast data streams from continuous imaging will be pivotal. Although our current system can handle intermittent data sampling, continuous data transmission remains a hurdle.

    The main challenges include developing secure and efficient data-compression methods and ensuring real-time transmission without compromising patient privacy. We are making progress, but this requires further research and development to improve bandwidth efficiency and ensure robust encryption methods to protect patient data.

    Translation. Collaboration between scientists, engineers, clinicians and regulatory bodies is already happening and is essential for achieving the full potential of wearable ultrasound technology. Partnerships with technology innovators are driving advancements in components such as batteries and sensors, and collaboration with data scientists is refining the algorithms needed for operation guidance and data analysis.

    As we advance wearable ultrasound technology, ongoing rigorous research, clinical trials and patient feedback will be crucial. These collaborations are fostering an environment in which innovative solutions can be developed and implemented, moving us closer to fully realizing the potential of wearable ultrasound systems.

    Regulation. Whereas there are clear regulatory pathways for conventional ultrasound devices, establishing similar pathways for wearable ultrasound devices will accelerate their safe and effective integration into clinical practice. Currently, the FDA and other regulatory bodies are working to adapt existing regulations to account for the unique aspects of wearable technology.

    The main challenges include creating standards for continuous monitoring, ensuring data privacy and securing wireless transmission to prevent unauthorized access to sensitive information. Addressing these regulatory gaps will be crucial to ensuring the safety and efficacy of wearable ultrasound devices in clinical settings.

    Achieving these objectives will be key to transitioning wearable ultrasound from a promising prototype to an indispensable tool in personalized medicine. Ultimately, the widespread adoption of wearable ultrasound devices will transform not only how we monitor chronic conditions, but also how we understand the human body14.

  • Robotic exoskeleton adapts to its wearer through simulated training

    Nature, Published online: 12 June 2024; doi:10.1038/d41586-024-01506-6

    A strategy for training a robotic exoskeleton through simulation takes the user out of the equation — saving users of wearable devices time and energy, and smoothing the transition between different types of movement.

  • World’s first wooden satellite could herald era of greener space exploration

    Takao Doi holds a model of the wooden artificial satellite, LignoSat.

    Takao Doi, an astronaut and engineer at Kyoto University, holds the world’s first wooden satellite. Credit: Kota Kawasaki/Yomiuri Shimbun via AP/Alamy

    Researchers unveiled the world’s first wooden satellite last month, billing it as clearing a path for more uses of wood in outer space. The material will be more sustainable and less polluting than the metals used in conventional satellites, they say.

    Researchers at Kyoto University in Japan and the Tokyo-based logging company Sumitomo Forestry showed off the satellite, called LignoSat, in late May. The roughly 10-centimetre-long cube is made of magnolia-wood panels and has an aluminium frame, solar panels, circuit boards and sensors. The panels incorporate traditional Japanese wood joinery methods that do not rely on glue or metal fittings.

    Wood might seem counterintuitive for use in space because it is combustible — but that feature can be desirable. To curb the growing problem of space junk threatening spacecraft and space stations, rocket stages and satellites are deliberately plunged into the Earth’s atmosphere to burn up. But during combustion, they release particles of aluminium and other metals. Many more spacecraft launches are planned, and scientists have warned that the environmental effects of this pollution are unknown.

    When LignoSat plunges back to Earth, after six months to a year of service, the magnolia will incinerate completely and release only water vapour and carbon dioxide, says Takao Doi, an astronaut and engineer at Kyoto University, who is part of the research team. He points to other benefits of wood: it’s resilient in the harsh environment of space and does not block radio waves, making it suitable for enclosing an antenna.

    And there is a precedent for spacecraft with wooden parts. Launched in 1962, NASA’s Ranger 3 lunar probe had a balsa-wood casing intended to protect its capsule as it landed on the lunar surface (the probe malfunctioned, missed the Moon and began orbiting the Sun).

    Timber pioneers

    LignoSat will cost about US$191,000 to design, manufacture, launch and operate. Sensors onboard will evaluate strain on the wood, temperature, geomagnetic forces and cosmic radiation, as well as receive and transmit radio signals. The satellite has been handed over to the Japan Aerospace Exploration Agency (JAXA) and will be transferred to the International Space Station in September, before being launched into orbit in November.

    Growth has been slow for the project, which began in 2020 with speculation about the wider potential of wood to make space activities more sustainable.

    “In our first conversations, Dr Doi proposed we build wooden housing on the Moon,” says team member Koji Murata at the biomaterials-design laboratory at Kyoto University’s Graduate School of Agriculture. “We have also discussed the possibility of building domes on Mars out of wood in order to grow timber forests.”

    Martian and lunar colonists, like all pioneers, would have to make use of local materials — regolith (rocky material on the surface), silicon dioxide and other minerals, in the case of Mars. But wood could play a part in crafting temporary or permanent shelters. Murata points to plans by JAXA and industrial partners to develop shelters made partly of wood that could be used in Antarctica or on the Moon.

    “The natural radiation-shielding properties of wood could be used effectively to design walls or outer shells of space habitats to provide protection,” says Nisa Salim, who specializes in engineered materials at Swinburne University of Technology in Melbourne, Australia, and is not part of the project. “Wood is an effective insulator, capable of regulating temperature and minimizing heat transfer to maintain a comfortable indoor environment. Wood is easy to work with, renewable and biodegradable, aligning with sustainability goals for space exploration.”

    Salim notes that the structural integrity, safety and longevity of wood need to be confirmed in space.

    Wood consists of cellulose held together by lignin, a kind of organic polymer. That makes it a naturally occurring member of the class of materials known as composites, says Scott J McCormack, a materials engineer at the University of California, Davis, who is not involved in the project. Composites are often used in the aerospace industry, so he does not find it surprising that their use in satellites might be explored.

    “Composites are ideal for the aerospace industry — and also satellites — due to their high strength-to-weight ratio,” says McCormack. But he has doubts about how wood will fare as a structural material on the Moon or Mars. “The first concern that comes to mind is galactic cosmic radiation [GCR] and how it might degrade the mechanical properties of wood over time. GCR isn’t that big of a problem for us here on Earth, thanks to our atmosphere.”

    But Murata says that the team has studied measurements of GCR and solar energetic particles — high-energy particles that are released from the Sun — taken by NASA’s Curiosity rover on Mars, as well as the effects of gamma rays on wood on Earth. He thinks that wood on Mars could potentially last for thousands of years. “Radiation on Mars is a big problem for living organisms, including humans,” he says. “I don’t think this is going to be much of an issue for wood.”

  • Brain fluid probed by ultrasound using squishy cubes

    Many of the debilitating effects of diabetes can be mitigated by monitoring the concentration of glucose in the blood or in the interstitial fluid that surrounds the organs. But doing so continuously requires implantable glucose sensors, and these devices have proved difficult to design, despite 40 years of research1. One reason is that the human body often responds by forming a fibrous shell around the implanted device, which can affect its performance. This reaction is caused mainly by an immune response that occurs because the electronic components of an implanted sensor are much stiffer than the surrounding tissues. Writing in Nature, Tang et al.2 propose a clever method that could help to circumvent this problem, by using a soft biocompatible material that allows the harder components to be positioned on the surface of the skin.

    Competing Interests

    J.J.M. has a financial interest in Applied Biosensors, Inc., Salt Lake City, UT, USA.

  • China’s Chang’e-6 collects first rock samples from Moon’s far side

    China’s Chang’e-6 robotic Moon-lander has wrapped up two days of drilling into the surface of the far side of the Moon and the ascender has blasted back into space. The spacecraft, with its precious rock samples, is now in lunar orbit, waiting to dock with the orbiter for the trip back home. It is the first time samples have been taken from the far side of the Moon.

    The Chang’e-6 lander made a successful touch-down on the Moon early on Sunday morning (Beijing time) at a pre-selected site within the South Pole-Aitken (SPA) basin, the oldest and largest lunar impact basin. Since then, Chang’e-6 has autonomously deployed its drill and scoop to collect soil and lunar regolith — the rocky material covering the surface of the Moon. Together the samples are expected to weigh up to two kilograms. “The sampling process has gone very smoothly,” says Chunlai Li, the mission’s deputy chief designer at the National Astronomical Observatories in Beijing.

    With the specimens loaded and sealed, the ascender fired its engine at 7:38 am Tuesday morning to lift off from the landing site and reached the designated lunar orbit six minutes later, according to the China National Space Administration (CNSA).

    “China is successfully carrying out complex operations on the lunar far side,” says Jonathan McDowell, an astronomer at the Harvard-Smithsonian Center for Astrophysics in Cambridge, Massachusetts. “The launch of the ascent stage was the first time anyone has taken off from the far side.”

    Captivating basalt

    According to Li, Chang’e-6’s precise landing location is 41.63 degrees south and 153.99 degrees west, which means that the samples will mainly consist of basalts — dark-coloured, cooled lava. Similar material has previously been brought back to Earth for analysis from the Moon’s near side.

    The basalts are estimated to be around 2.4 billion years old — much younger than the SPA basin itself, says planetary geologist Alfred McEwen at the University of Arizona, Tucson. “There should also be fragments of older rocks in the regolith they collected,” McEwen says.

    Scientists hope to use samples returned from the SPA to precisely measure the basin’s age, and improve their understanding of the early history of the Earth and other planets, notes planetary geologist Jim Head at Brown University, in Providence, Rhode Island.

    Regardless of whether this information can be gleaned from the samples, the scientific value of the Chang’e-6 samples, if successfully returned, will be very high, he says. They will be the first rocks ever retrieved from the Moon’s far side, which is dramatically different from the near side. “Obtaining dates and compositional information from the many hundreds of fragments sampled by the Chang’e-6 drill and scoop is like having a treasure chest full of critical parts of lunar history, and will very likely revolutionize our view of the entire Moon,” he says.

    Rock then dock

    In the coming days, Chang’e-6 will face one of the trickiest parts of the whole mission — rendezvous and docking of the ascender with the orbiter and transferring the samples, says McDowell. “You have two robots orbiting the Moon separately at 5,900 kilometres per hour, which have to come together and touch each other gently without crashing into each other,” he says.

    The Chang’e-6 samples’ trip home is expected to last about three weeks, ending with a return capsule piercing through Earth’s atmosphere and landing in the grasslands of the Siziwang banner in northern China’s Inner Mongolia autonomous region around 25 June.

    Planetary scientist Michel Blanc at the Research Institute in Astrophysics and Planetology, in Toulouse, France, who watched the launch of Chang’e-6 on Hainan island a month ago and followed the key steps of the mission, says that the scientific impact of the mission cannot be over-emphasized, because it will not only bring the first sample from the lunar far side, but also from one of the lowest-altitude regions of the Moon, where the surface might be closest to the mantle.

    “We planetary scientists are crossing fingers for the success of the rest of the mission,” Blanc says.

  • Low-latency automotive vision with event cameras

    In the first step, we will give a general overview of our hybrid neural network architecture, together with the processing model to generate high-rate object detections (see section ‘Network overview’). Then we will provide more details about the asynchronous GNN (see section ‘Deep asynchronous GNN’) and will discuss the new network blocks that simultaneously push the performance and efficiency of our GNN. Finally, we will describe how our model is used in an asynchronous, event-based processing mode (see section ‘Asynchronous operation’).

    Network overview

    An overview of the network is shown in Extended Data Fig. 1. Our method processes dense images and sparse events (red and blue dots, top left) with a hybrid neural network. A CNN branch FI processes each new image \(I\in {{\mathbb{R}}}^{H\times W\times 3}\) at time tI, producing detection outputs \({{\mathcal{D}}}^{I}\) and intermediate features \({{\mathcal{G}}}^{I}={\{\,{g}_{l}^{I}\}}_{l=1}^{L}\) (blue arrows), where l is the layer index. The GNN branch FE then takes image-based detection outputs, image features and event graphs constructed from raw events \({\mathcal{E}}=\{{e}_{i}| {t}_{I} < {t}_{i} < {t}_{E}\}\) with tI < tE and events ei, as input to generate detections for each time tE. In summary, the detections at time tE are computed as

    $${{\mathcal{D}}}^{I},{{\mathcal{G}}}^{I}={F}_{I}(I)$$

    (1)

    $${{\mathcal{D}}}^{E}={F}_{E}({{\mathcal{D}}}^{I},{{\mathcal{G}}}^{I},{\mathcal{E}}),$$

    (2)

    In normal operation, equation (1) is executed each time a new image arrives and essentially generates feature banks \({{\mathcal{D}}}^{I}\) and \({{\mathcal{G}}}^{I}\) that are then reused in equation (2). As will be seen later, FE, being an asynchronous GNN, can be first trained on full event graphs, and then deployed to consume individual events in an incremental fashion, with low computational complexity and identical output to the batched form. As a result, the above equations describe a high-rate object detector that updates its detections for each new event. In the next section, we will have a closer look at our new GNN, before delving into the full hybrid architecture.
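
    The following pseudocode-style sketch (class and method names are our own, not from any released implementation) restates equations (1) and (2) as a processing loop: the CNN branch runs once per frame and caches its outputs, and the GNN branch reuses that cache for every batch of events until the next frame arrives.

    ```python
    # Schematic sketch of the hybrid processing model in equations (1) and (2).
    # Names are illustrative; this is not the authors' code.
    class HybridDetector:
        def __init__(self, cnn_branch, gnn_branch):
            self.F_I = cnn_branch            # dense CNN over images
            self.F_E = gnn_branch            # asynchronous GNN over event graphs
            self.det_I = None                # cached image detections D^I
            self.feat_I = None               # cached image feature bank G^I

        def on_image(self, image):
            # Equation (1): run once per frame, refresh the cached outputs.
            self.det_I, self.feat_I = self.F_I(image)
            return self.det_I

        def on_events(self, events_since_last_image):
            # Equation (2): fuse the cached image outputs with new events to
            # produce detections at the time of the most recent event.
            return self.F_E(self.det_I, self.feat_I, events_since_last_image)
    ```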

    Deep asynchronous GNN

    Here we propose a new, highly efficient GNN, which we term the deep asynchronous GNN (DAGr). It processes events as spatio-temporal graphs. However, before we can describe it, we first give some preliminaries on how events are converted into graphs.

    Graph construction

    Event cameras have independent pixels that respond asynchronously to changes in logarithmic brightness L. Whenever the magnitude of this change exceeds the contrast threshold C, that pixel triggers an event ei = (xi, ti, pi) characterized by the position xi, timestamp ti with microsecond resolution and polarity (sign) pi ∈ {−1, 1} of the change. An event is triggered when

    $${p}_{i}[{\bf{L}}({{\bf{x}}}_{i},{t}_{i})-{\bf{L}}({{\bf{x}}}_{i},{t}_{i}-\Delta {t}_{i})] > C.$$

    (3)

    The event camera thus outputs a sparse stream of events \({\mathcal{E}}={\{{e}_{i}\}}_{i=0}^{N-1}\). As in refs. 31,32,43,44,45, we interpret events as three-dimensional (3D) points, connected by spatio-temporal edges.

    From these points, we construct the event graph \({\mathcal{G}}=\{{\mathcal{V}},E\}\) consisting of nodes \({\mathcal{V}}\) and edges E. Each event ei corresponds to a node. These nodes \({{\bf{n}}}^{i}\in {\mathcal{V}}\) are characterized by their position \({{\bf{n}}}_{{\rm{p}}}^{i}=({\widehat{{\bf{x}}}}_{i},\beta {t}_{i})\in {{\mathbb{R}}}^{3}\) and node features \({{\bf{n}}}_{{\rm{f}}}^{i}={p}_{i}\in {\mathbb{R}}\). Here \({\widehat{{\bf{x}}}}_{i}\) is the event pixel coordinate, normalized by the height and width, and ti and pi are taken from the corresponding event. To map ti into the same range as xi, we rescale it by a factor of \(\beta ={10}^{-6}\). These nodes are connected by edges, \((i,j)\in E\), connecting nodes ni and nj, each with edge attributes \({e}_{ij}\in {{\mathbb{R}}}^{{d}_{{\rm{e}}}}\). We connect nodes that are temporally ordered and within a spatio-temporal distance from each other:

    $$(i,j)\in E\,{\rm{if}}\,\parallel {{\bf{n}}}_{{\rm{p}}}^{i}-{{\bf{n}}}_{{\rm{p}}}^{j}{\parallel }_{\infty } < R\,\,{\rm{and}}\,\,{t}_{i} < {t}_{j}.$$

    (4)

    Here \({\parallel \cdot \parallel }_{\infty }\) returns the absolute value of the largest component. For each edge, we associate edge features \({e}_{ij}=({{\bf{n}}}_{xy}^{j}-{{\bf{n}}}_{xy}^{i})/2r+1/2\). Here, nxy denote the x and y components of each node, and r is a constant, such that \({e}_{ij}\in {[0,1]}^{2}\). Constructing the graph in this way gives us several advantages. First, we can leverage the queue-based graph construction method in ref. 32 to implement a highly parallel graph construction algorithm on GPU. Our implementation constructs full event graphs with 50,000 nodes in 1.75 ms and inserts single events in 0.3 ms on a Quadro RTX 4000 laptop GPU. Second, the temporal ordering constraint above makes the event graph directed32,45, which will enable high efficiency in early layers before pooling (see section ‘Asynchronous operation’). In this work, we select R = 0.01 and limit the number of neighbours of each node to 16.
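
    For illustration, a brute-force version of this construction rule is sketched below (the queue-based GPU implementation described above is far more efficient); events are assumed to be time-ordered, and the neighbour limit and radius follow the values quoted above.

    ```python
    # Unoptimized sketch of event-graph construction under equation (4).
    # events: array of (x, y, t_microseconds, polarity), assumed sorted by time.
    import numpy as np

    def build_event_graph(events, height, width, R=0.01, beta=1e-6, max_neighbours=16):
        pos = np.stack([events[:, 0] / width,            # normalized x
                        events[:, 1] / height,           # normalized y
                        events[:, 2] * beta], axis=1)    # rescaled time
        feat = events[:, 3:4].astype(np.float32)         # node feature = polarity
        edges, n_in = [], np.zeros(len(events), dtype=int)
        for j in range(len(events)):
            for i in range(j - 1, -1, -1):               # only older nodes (t_i < t_j)
                if n_in[j] >= max_neighbours:
                    break
                if np.max(np.abs(pos[i] - pos[j])) < R:  # infinity-norm radius check
                    edges.append((i, j))                 # directed edge i -> j
                    n_in[j] += 1
        return pos, feat, np.array(edges)
    ```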

    Deep asynchronous GNN

    In this section, we describe the function FE in equation (2). For simplicity, we first describe it without the fusion terms \({{\mathcal{D}}}^{I}\) and \({{\mathcal{G}}}^{I}\) and describe only how processing is performed on events alone. We later give a complete description, incorporating fusion.

    An overview of our neural network architecture is shown in Extended Data Fig. 1. It processes the spatio-temporal graphs from the previous section and outputs object detection at multiple scales (top right). It consists of five alternating residual layers (Extended Data Fig. 1c) and max pooling blocks (Extended Data Fig. 1d), followed by a YOLOX-inspired detection head at two scales (Extended Data Fig. 1e). Crucially, our network has a total of 13 convolution layers. By contrast, the methods in ref. 32 and ref. 31 feature only five and seven layers, respectively, making our network almost twice as deep as the previous methods. Before each residual layer, we concatenate the x and y coordinates of the node position onto the node feature, which is indicated by +2 at the residual layer input. Residual layers and the detection head use the lookup table-based spline convolutions (LUT-SCs) as the basic building block (Extended Data Fig. 1f). These LUT-SCs are trained as a standard spline convolution31,35 and later deployed as an efficient lookup table (see section ‘Asynchronous operation’).

    Spline convolutions. Spline convolutions, shown in Extended Data Fig. 1f, update the node features by aggregating messages from neighbouring nodes:

    $$\begin{array}{r}{{\bf{n}}}_{{\rm{f}}}^{{\prime} i}=W{{\bf{n}}}_{{\rm{f}}}^{i}+\sum _{(\,j,i)\in E}W({e}_{ij}){{\bf{n}}}_{{\rm{f}}}^{j},\,{\rm{and}}\,{{\bf{n}}}_{{\rm{p}}}^{{\prime} i}={{\bf{n}}}_{{\rm{p}}}^{i}\end{array}.$$

    (5)

    Here \({{\bf{n}}}_{{\rm{f}}}^{{\prime} i}\) is the updated feature at node \({{\bf{n}}}_{i}\), \(W\in {{\mathbb{R}}}^{{c}_{{\rm{out}}}\times {c}_{{\rm{in}}}}\) is a matrix that maps the current node feature \({{\bf{n}}}_{{\rm{f}}}^{i}\) to the output, and \(W({e}_{ij})\in {{\mathbb{R}}}^{{c}_{{\rm{out}}}\times {c}_{{\rm{in}}}}\) is a matrix that maps neighbouring node features \({{\bf{n}}}_{{\rm{f}}}^{j}\) to the output. In ref. 35, W(eij) is a matrix-valued smooth function of the edge feature eij. Remember that the edge features \({e}_{ij}\in {[0,1]}^{2}\), which is the domain of W(eij). The function W(eij) is modelled by a B-spline of degree d in m = 2 dimensions with k × k learnable weight matrices equally spaced in \({[0,1]}^{2}\). During the evaluation, the function interpolates between these learnable weights according to the value of eij. In this work, we choose d = 1 and k = 5.
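
    A compact way to see what equation (5) does for d = 1 is the sketch below: the kernel W(eij) is obtained by bilinearly interpolating a k × k grid of learnable weight matrices at the edge attribute, and the interpolated matrix is applied to the neighbouring node’s feature. Array shapes and names are assumptions made for illustration, not the authors’ implementation.

    ```python
    # Illustrative spline-convolution layer (equation (5)) with linear B-splines.
    # node_feat: (N, c_in); edges: (E, 2) as (src, dst); edge_attr: (E, 2) in [0, 1]^2;
    # W_self: (c_out, c_in); W_grid: (k, k, c_out, c_in) learnable kernel matrices.
    import numpy as np

    def spline_conv(node_feat, edges, edge_attr, W_self, W_grid):
        k = W_grid.shape[0]
        out = node_feat @ W_self.T                            # W n_f^i term
        for (src, dst), e in zip(edges, edge_attr):
            u, v = e * (k - 1)                                # position on the weight grid
            u0, v0 = int(min(u, k - 2)), int(min(v, k - 2))
            du, dv = u - u0, v - v0
            W_e = ((1 - du) * (1 - dv) * W_grid[u0, v0]       # bilinear interpolation
                   + du * (1 - dv) * W_grid[u0 + 1, v0]
                   + (1 - du) * dv * W_grid[u0, v0 + 1]
                   + du * dv * W_grid[u0 + 1, v0 + 1])
            out[dst] += W_e @ node_feat[src]                  # message from neighbour
        return out
    ```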

    Max pooling. Max pooling, shown in Extended Data Fig. 1d, splits the input space into gx × gy × gt voxels V and clusters nodes in the same voxel. At the output, each non-empty voxel has a node, located at the rounded mean of the input node positions and with its feature equal to the maximum of the input nodes features.

    $$\begin{array}{c}{{\bf{n}}}_{{\rm{f}}}^{{\prime} i}=\mathop{max}\limits_{{\bf{n}}\in {V}_{i}}\,{{\bf{n}}}_{{\rm{f}}},\,{\rm{a}}{\rm{n}}{\rm{d}}\,{{\bf{n}}}_{{\rm{p}}}^{{\prime} i}=\frac{1}{\alpha }\left[\frac{\alpha }{|{V}_{i}|}\sum _{{\bf{n}}\in {V}_{i}}{{\bf{n}}}_{{\rm{p}}}\right]\end{array}.$$

    (6)

    Here multiplying by \(\alpha ={\left[H,W,\frac{1}{\beta }\right]}^{{\rm{T}}}\) scales the mean to the original resolution. To compute the new edges, it forms a union of all edges connecting the cluster centres and removes duplicates. Formally, the edge set of the output graph after pooling, \({E}_{{\rm{pool}}}^{{\prime} }\), is computed as

    $${E}_{{\rm{pool}}}^{{\prime} }=\{{e}_{{c}_{i}{c}_{j}}| {e}_{ij}\in E\}.$$

    (7)

    Here ci retrieves the index of the voxel in which the node ni resides, and duplicates are removed from the set. This operation can result in bidirectional edges between output nodes if at least one node from voxel A is connected to one of voxel B and vice versa. The combination of max pooling and position rounding has two main benefits: first, it allows the implementation of highly efficient LUT-SC, and second, it enables update pruning, which further reduces computation, discussed in the section ‘Events only’ under ‘Ablations’. For our pooling layers, we select \({({g}_{x},{g}_{y},{g}_{t})}_{i}=(56/{2}^{i},40/{2}^{i},1)\), where i is the index of the pooling layer. As seen in this section, selecting gt = 1 is crucial to obtain high performance because it accelerates the information mixing in the network.
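
    The sketch below mirrors equations (6) and (7): nodes falling in the same voxel are merged, features are max-pooled, positions are mean-pooled and rounded back onto the original grid, and the surviving inter-cluster edges are deduplicated. It assumes node positions are normalized so that each coordinate lies in [0, 1), and the names are our own.

    ```python
    # Rough voxel max-pooling sketch (equations (6) and (7)); not the authors' code.
    import numpy as np

    def voxel_max_pool(node_pos, node_feat, edges, grid, alpha):
        """grid = (g_x, g_y, g_t); alpha rescales positions to original units
        (for example image width/height and 1/beta) before rounding."""
        grid = np.asarray(grid)
        vox = np.minimum(np.floor(node_pos * grid).astype(int), grid - 1)
        keys, cluster = np.unique(vox, axis=0, return_inverse=True)
        cluster = np.asarray(cluster).reshape(-1)
        n = len(keys)
        pooled_feat = np.full((n, node_feat.shape[1]), -np.inf)
        pos_sum = np.zeros((n, node_pos.shape[1]))
        count = np.zeros(n)
        for i, c in enumerate(cluster):
            pooled_feat[c] = np.maximum(pooled_feat[c], node_feat[i])  # feature max
            pos_sum[c] += node_pos[i]
            count[c] += 1
        alpha = np.asarray(alpha, dtype=float)
        pooled_pos = np.round(alpha * pos_sum / count[:, None]) / alpha  # rounded mean
        pooled_edges = {(cluster[i], cluster[j]) for i, j in edges
                        if cluster[i] != cluster[j]}                     # deduplicated
        return pooled_pos, pooled_feat, sorted(pooled_edges), cluster
    ```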

    Directed voxel grid pooling. As previously mentioned, the constructed event graph has a temporal ordering, which means that the edges pass only from older to newer nodes. Although this property is conserved in the first few layers of the GNN, after pooling it is lost to a certain extent. This is because edge pooling, described in equation (7), has the potential to generate bidirectional edges (Extended Data Fig. 2d, top). Bidirectional edges are formed when there is at least one edge going from voxel A to voxel B, and one edge going from voxel B to voxel A, such that pooling merges them into one bidirectional edge between A and B. Although bidirectional edges facilitate the distribution of messages throughout the network and thus boost accuracy, they also increase computation during asynchronous operation significantly. This is because bidirectional edges grow the k-hop subgraph that needs to be recomputed at each layer. In this work, we introduce a specialized directed voxel pooling, which instead curbs this growth, by eliminating bidirectional edges from the output, thus creating temporally ordered graphs at all layers. It does this, by redefining the pooling operations. Although feature pooling is the same, position pooling becomes

    $${{\bf{n}}}_{t}^{{\prime} i}=\mathop{max}\limits_{{\bf{n}}\in {V}_{i}}\,{{\bf{n}}}_{t}\,{\rm{a}}{\rm{n}}{\rm{d}}\,{{\bf{n}}}_{xy}^{{\prime} i}=\frac{1}{\alpha }\left[\frac{\alpha }{|{V}_{i}|}\sum _{{\bf{n}}\in {V}_{i}}{{\bf{n}}}_{xy}\right].$$

    (8)

    Here we pool the coordinates x and y using mean pooling and timestamps t with max pooling. We then redefine the edge pooling operation as

    $${E}_{{\rm{dpool}}}^{{\prime} }=\{{e}_{{c}_{i}{c}_{j}}| {e}_{ij}\in E\,\text{and}\,{{\bf{n}}}_{t}^{{c}_{j}} > {{\bf{n}}}_{t}^{{c}_{i}}\},$$

    (9)

    where we now impose that edges between output nodes can exist only if the timestamp of the source node is smaller than that of the destination node. This condition essentially acts as a filter on the total number of pooled edges. As will be discussed later, this pooling layer increases efficiency at the cost of some accuracy. However, we show that when combined with images (see section ‘Images and events’), this pooling layer achieves both high accuracy and efficiency.
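    A minimal sketch of the directed edge pooling of equation (9), assuming the per-voxel timestamps have already been max-pooled as in equation (8); names are illustrative.

```python
# Hedged sketch of equation (9): an edge c_i -> c_j between pooled nodes survives
# only if the max-pooled timestamp of the source voxel is strictly smaller than
# that of the destination voxel.
def directed_edge_pool(edges, cluster, voxel_t):
    """edges: iterable of (i, j) input edges; cluster: node index -> voxel index;
    voxel_t: max-pooled timestamp per voxel (equation (8))."""
    out = set()
    for i, j in edges:
        ci, cj = cluster[i], cluster[j]
        if voxel_t[cj] > voxel_t[ci]:      # keep only temporally ordered edges
            out.add((ci, cj))
    return sorted(out)
```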

    Detection head. Inspired by the YOLOX detection head, we design a series of (LUT-SC, BN and ReLU) blocks that progressively compute a bounding box regression \({{\bf{f}}}_{{\rm{reg}}}\in {{\mathbb{R}}}^{4}\), class score \({{\bf{f}}}_{{\rm{cls}}}\in {{\mathbb{R}}}^{{n}_{{\rm{cls}}}}\) and object score \({{\bf{f}}}_{{\rm{obj}}}\in {\mathbb{R}}\) for each output node. We then decode the bounding box location as in ref. 34 but relative to the voxel location in which the node resides. This results in a sparse set of output detections.

    Now that all components of the GNN are discussed, we will introduce the fusion strategy that combines the CNN and GNN outputs.

    CNN branch and fusion

    The CNN branch FI (Extended Data Fig. 1) is implemented as a classical CNN, here a ResNet (ref. 30) pretrained on ImageNet (ref. 46), whereas the GNN has the architecture from the section ‘Deep asynchronous GNN’.

    To generate the image features \({{\mathcal{G}}}^{I}\) used by the GNN, we process the features after each ResBlock with a depthwise convolution. To generate the detection output, we also apply a depthwise convolution to the last two scales of the output before using a standard YOLOX detection head34. We fuse features from the CNN with those from the GNN using sparse directed feature sampling and detection adding.

    Feature sampling

    Our GNN makes use of the intermediate image feature maps \({{\mathcal{G}}}^{I}\) using a feature sampling layer (Extended Data Fig. 1b), which, for each graph node, samples the image feature at that layer at the corresponding node position and concatenates it with the node feature. In summary, at each GNN layer, we update the node features with features derived from \({{\mathcal{G}}}^{I}\) by taking into account the spatial location of nodes in the image plane:

    $${\widehat{g}}_{l}^{i}={g}_{l}^{I}({{\bf{n}}}_{{\rm{p}}}^{i})$$

    (10)

    $${\widehat{{\bf{n}}}}_{{\rm{f}}}^{i}=[{\widehat{g}}_{l}^{i}\parallel {{\bf{n}}}_{{\rm{f}}}^{i}],$$

    (11)

    where \({\widehat{{\bf{n}}}}_{{\rm{f}}}^{i}\) is the updated node feature of node ni. Equation (10) samples image features at each event node location and equation (11) concatenates these features with the existing node features. Note that equations (10) and (11) can be done in an event-by-event fashion.
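    A minimal sketch of equations (10) and (11); the nearest-neighbour lookup and the stride scaling of node positions to the feature-map resolution are assumptions of this sketch.

```python
# Hedged sketch of feature sampling: look up the image feature at each node's
# pixel position (equation (10)) and concatenate it with the node feature
# (equation (11)).
import numpy as np

def sample_and_concat(node_pos_xy, node_feat, image_feat, stride):
    """node_pos_xy: (N, 2) pixel coordinates, node_feat: (N, C_n),
    image_feat: (H, W, C_i) CNN feature map at this layer, stride: its downscaling."""
    px = np.clip((node_pos_xy[:, 0] / stride).astype(int), 0, image_feat.shape[1] - 1)
    py = np.clip((node_pos_xy[:, 1] / stride).astype(int), 0, image_feat.shape[0] - 1)
    g_hat = image_feat[py, px]                          # equation (10): sampled features
    return np.concatenate([g_hat, node_feat], axis=1)   # equation (11): [g || n_f]
```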

    Detection adding

    Finally, we add the outputs of the corresponding detection heads of the two branches. We do this before the decoding step34, which applies an exponential map to the regression outputs and sigmoid to the objectness scores. As the outputs of the GNN-based and CNN-based detection heads are sparse and dense, respectively, care must be taken when adding them together. We thus initialize the detections at tE with \({{\mathcal{D}}}^{I}\) and then add the detection outputs of the GNN to the pixels corresponding to the graph nodes. This operation is also compatible with event-by-event updating of the GNN-based detections.

    Detection adding is an essential step to overcome the limitations of event-based object detection in static conditions, because then the RGB-based detector can provide an initial guess even when no events are present. It also guarantees that in the absence of events, the performance of the method is lower bounded by the performance of the image-based detector.
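    The following sketch illustrates detection adding under the assumption that both heads emit raw (pre-decoding) outputs on a common output grid; shapes, names and the stride scaling are assumptions made for illustration.

```python
# Hedged sketch of detection adding: the dense CNN head output initializes the
# prediction map, and the sparse GNN head outputs are added at the pixels of
# their graph nodes, before the YOLOX-style decoding step.
import numpy as np

def add_detections(cnn_out, gnn_nodes_xy, gnn_out, stride):
    """cnn_out: (H, W, C) raw CNN head output at frame time; gnn_nodes_xy: (N, 2)
    node pixel positions; gnn_out: (N, C) raw GNN head outputs."""
    fused = cnn_out.copy()
    cols = np.clip((gnn_nodes_xy[:, 0] / stride).astype(int), 0, cnn_out.shape[1] - 1)
    rows = np.clip((gnn_nodes_xy[:, 1] / stride).astype(int), 0, cnn_out.shape[0] - 1)
    np.add.at(fused, (rows, cols), gnn_out)   # sparse add at the node pixels
    return fused
```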

    Training procedure

    Our hybrid method consists of two coupled object detectors that generate detection outputs at two different timestamps: one at the timestamp of the image tI and the other after observing events until time tE (Extended Data Fig. 1). As our labels are collocated with the image frames, this enables us to define a loss in both instances. We found that the following training strategy produced the best results: pretraining the image branch with the image labels first, then freezing the weights and training the depthwise convolutions and DAGr branch separately on the event labels.

    As both branches are trained to predict detections separately, the DAGr network essentially learns to update the detections made by the image branch. This means that DAGr learns to track, detect and forget objects from the previous view.

    Asynchronous operation

    As in refs. 31,32,36, after training, we deploy our hybrid neural network in an asynchronous mode, in which instead of feeding full event graphs, we input only individual events. Local recursive update rules are formulated at each layer that enforce that the output of the network after each new event is identical to the output obtained by processing the augmented graph, that is, the old graph with the new event added. As seen in refs. 31,32,36, the rules update only a fraction of the activations at each layer, leading to a drastic reduction in computation compared with a dense forward pass. In this section, we will describe the steps that are taken after training to perform asynchronous processing.

    Initialization

    The conversion to asynchronous mode happens in three steps: (1) precomputing the image features; (2) LUT-SC caching and batch norm fusing; and (3) network activation initialization.

    As a first step, when we get an image, we precompute the image features by running a forward pass through the CNN and applying the depthwise convolutions. This results in the image feature banks \({{\mathcal{G}}}^{I}\) and detections \({{\mathcal{D}}}^{I}\).

    The second step addresses LUT-SC caching. Spline convolutions generate the highest computational burden in our method because they involve evaluating a multivariate, matrix-valued function and performing a matrix–vector multiplication. Following the implementation in ref. 35, computing a single message between neighbours requires

    $${C}_{{\rm{msg}}}=(2{[d+1]}^{m}-1){c}_{{\rm{in}}}{c}_{{\rm{out}}}+(2{c}_{{\rm{in}}}-1){c}_{{\rm{out}}},$$

    (12)

    floating point operations (FLOPS), in which the first term computes the interpolation of the weight matrix and the second computes the matrix–vector product. Here the first term dominates because of the highly superlinear dependence on d and m. Our LUT-SC eliminates this term. We recognize that the edge attributes eij depend only on the relative spatial node positions. As events are triggered on a grid, and the distance between neighbours is bounded, these edge attributes can only take on a finite number of possible values. Therefore, instead of recomputing the interpolated weight at each step, we can precompute all weight matrices once and store them in a lookup table. This table stores the relative offsets of nodes together with their weight matrix. We thus replace the message propagation equation with

    $${{\bf{n}}}_{{\rm{f}}}^{{\prime} i}=W{{\bf{n}}}_{{\rm{f}}}^{i}+\sum _{(j,i)\in E}{W}_{ij}{{\bf{n}}}_{{\rm{f}}}^{j}$$

    (13)

    $${W}_{ij}={\rm{LUT}}({\rm{d}}x,{\rm{d}}y),$$

    (14)

    where dx and dy are the relative two-dimensional (2D) positions of nodes i and j. Note that this transformation reduces the complexity of our convolution operation to Cmsg = (2cin − 1)cout, which is on the level of the classical graph convolution (GC) used in ref. 32. However, crucially, LUT-SC still retains the relative spatial awareness of spline convolutions, as Wij changes with the relative position, and it is thus more expressive than GCs. After caching, we fuse the weights computed above with the batch norm layer immediately following each convolution, thereby eliminating its computation from the tally. After pooling, ordinarily, node positions would no longer lie on a grid, as their coordinates are set to the centroid location. However, because we apply position rounding, we can apply LUT-SC caching in all layers of the network.
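    For d = 1 and m = 2, the first term of equation (12) amounts to 7cincout FLOPS per message, which the lookup removes entirely. The sketch below illustrates this caching under the assumption that edge attributes are normalized integer offsets within a bounded radius; weight_fn stands for any spline-weight evaluation, for example the bilinear interpolation sketched earlier, and all other names are illustrative.

```python
# Hedged sketch of LUT-SC caching (equations (13) and (14)): because relative
# node offsets (dx, dy) lie on a bounded integer grid, the interpolated spline
# weight matrix can be precomputed once per offset and reused at run time.
import numpy as np

def build_lut(weight_fn, radius):
    """weight_fn maps an edge attribute in [0, 1]^2 to a (c_out, c_in) matrix."""
    lut = {}
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            e = (np.array([dx, dy], dtype=float) + radius) / (2 * radius)
            lut[(dx, dy)] = weight_fn(e)               # cached weight matrix W_ij
    return lut

def lut_sc_update(i, pos, feat, neighbours, W_self, lut):
    """Equation (13): n'_i = W n_i + sum_{j in N(i)} LUT(dx, dy) n_j.
    pos holds integer pixel coordinates (an assumption of this sketch)."""
    out = W_self @ feat[i]
    for j in neighbours:
        dx, dy = (pos[j, :2] - pos[i, :2]).astype(int)
        out += lut[(dx, dy)] @ feat[j]
    return out
```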

    In the third step (network activation initialization), before asynchronous processing, we pass a dense graph through our network and cache the intermediate activations at each layer. Although in convolution layers we cache the activation, that is, the results of sums computed from equation (13), in max pooling layers we cache (1) the indices of input nodes used to compute the output feature for each voxel; (2) a list of currently occupied output voxels; and (3) a partial sum of node positions and node counts per voxel to efficiently update output node positions after pooling.

    Update propagation

    When a new event is inserted, we compute updates to all relevant nodes in all layers of the network. The goal of these updates is to achieve an output identical to the output the network would have computed if the complete graph with one event added was processed from scratch. The updates include (1) adding and recomputing messages in equation (13) if a node position or feature has changed; (2) recomputing the maximum and node position for output nodes after each max pooling layer; and (3) adding and flipping edges when new edges are formed at the input. We will discuss these updates in the following sections. To facilitate computation, at each layer, we maintain a running list of unchanged nodes (grey) and changed nodes (cyan) and whether their position has changed, the feature has changed or both. The propagation rules are outlined in Extended Data Fig. 2.

    Convolution layers. In a convolution layer (Extended Data Fig. 2a), if a node has a different position (Extended Data Fig. 2a, top), we recompute the feature of that node and resend messages from that node to all its neighbours. These are marked as green and orange arrows in Extended Data Fig. 2a (top row). If, instead, only the feature of the node changed (Extended Data Fig. 2a, bottom), we update only the messages sent from that node to its neighbours. We can gain an intuition for these rules from equation (13). A change in the node feature nf changes only one term in the sum, which has to be recomputed. A node position change, instead, causes all weight matrices Wij to change, resulting in a recomputation of the entire sum.

    Pooling layers. Pooling layers update only output nodes for which at least one input node has a changed feature or changed position. For these output nodes, the position and feature are recomputed using equation (6). Special care must be taken when using directed voxel pooling layers. Sometimes it can happen that an edge at the output of this layer needs to be inverted such that temporal ordering is conserved. In this case, the next convolution layer must compute two messages (Extended Data Fig. 2e), one to undo the first message and the other corresponding to the new edge direction. In this case, two nodes are changed instead of only one. However, edge inversion happens rarely and thus does not contribute markedly to computation.

    Reducing computation

    In this section, we describe various considerations and algorithms for reducing the computation of the two basic layers described above.

    Directed event graph. As previously discussed, using a directed event graph notably reduces computation, as it reduces the number of nodes that need to be updated at each layer. We illustrate this concept in Extended Data Fig. 2c, in which we compare update propagation in graphs that are directed or possess bidirectional edges. Note that we encounter directed graphs either at the input layer (before the first pooling) or after directed voxel pooling layers. Instead, graphs with bidirectional edges are encountered after regular voxel pooling layers. As seen in Extended Data Fig. 2c (top), directed graphs keep the number of messages that need to be updated in each layer constant, as no additional nodes are updated at any layer. Instead, bidirectional edges send new messages to previously untouched nodes, leading to a proliferation of update messages, and as a result, computation.

    Update pruning. Even when input nodes to a voxel pooling layer change, the output position and feature may stay the same, even after recomputation. If this is the case, we simply terminate propagation at that node (an operation we call update pruning) and thus save significant computation. We show this phenomenon in Extended Data Fig. 2b. This can happen when (1) the rounding operation in equation (6) simply rounds a slightly updated position to the same position as before; and (2) the maximal features at the output belong to input nodes that have not been updated. Let us state the second condition more formally. Let \({{\bf{n}}}_{f,j}^{{\prime} i}\) be the jth entry of the feature vector belonging to the ith output node. Now let

    $${{\bf{n}}}^{{k}_{j}^{i}}={\rm{\arg }}\,\mathop{{{\max }}}\limits_{{\bf{n}}\in {V}_{i}}\,{{\bf{n}}}_{{\rm{f}},j}$$

    (15)

    be the input node for which the feature nf,j at the jth position is maximal. The index \({k}_{j}^{i}\) selects this node from the voxel Vi. Thus, we may rewrite the equation for max pooling for each component as

    $${{\bf{n}}}_{{\rm{f}},j}^{{\prime} i}={{\bf{n}}}_{{\rm{f}},j}^{{k}_{j}^{i}}.$$

    (16)

    This means that essentially, only a subset of input nodes in the voxel contributes to the output, and this subset is exactly

    $${{\mathcal{P}}}_{i}=\{{{\bf{n}}}^{{k}_{j}^{i}}| \,j=0,…,c-1\}\subset {V}_{i}.$$

    (17)

    Moreover, as these nodes are indexed by j, and the \({k}_{j}^{i}\) could repeat, we know that the size of this subset satisfies \(| {{\mathcal{P}}}_{i}| \le c\), where c is the number of features. We thus find that output features do not change if none of the changed input nodes to a given output node are within the set \({{\mathcal{P}}}_{i}\).

    Thus, for each output node, we check the following conditions to see if update pruning can be performed. For all input nodes that have a changed position or feature, we check if (1) the changed node is currently in the set of unused nodes (greyed out in Extended Data Fig. 2b); (2) the changed feature of the node does not beat the current maximum at any feature index; and (3) its position change did not deflect the average output node position sufficiently to change rounding. If not all three conditions are met, we recompute the output feature for that node; otherwise, we prune the update and skip the computation in the lower layers. Skipping happens surprisingly often. In our case, we found that 73% of updates are skipped because of this mechanism. This also motivated us to place the max pooling layer in the early layers, as it has the highest potential to save computation. In a later section, we will show the impact these features have on the computation of the method.
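    The pruning test can be summarized in the following hedged sketch; the cached per-voxel state and the dictionary layout are assumptions made for illustration.

```python
# Hedged sketch of the update-pruning test described above (equations (15)-(17)).
import numpy as np

def can_prune(changed, cached):
    """changed: dict with 'node', optionally 'new_feat' and, for a position
    change, 'old_pos'/'new_pos'. cached: per-voxel state with 'argmax_nodes'
    (the set P_i), 'max_feat' (c,), 'sum_pos' (3,), 'count', 'alpha' (3,)
    and 'out_pos' (3,) from the last forward pass."""
    if changed['node'] in cached['argmax_nodes']:
        return False                                    # condition (1) violated
    f = changed.get('new_feat')
    if f is not None and np.any(f > cached['max_feat']):
        return False                                    # condition (2) violated
    p = changed.get('new_pos')
    if p is not None:
        new_mean = (cached['sum_pos'] - changed['old_pos'] + p) / cached['count']
        rounded = np.round(cached['alpha'] * new_mean) / cached['alpha']
        if not np.allclose(rounded, cached['out_pos']):
            return False                                # condition (3) violated
    return True                                         # the update can be pruned
```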

    Simplification of concatenation operation. During feature fusion in the hybrid network, owing to the concatenation of node-level features with image features (equation (11)), the number of intermediate features at the input to each layer of the GNN increases. This would essentially increase the computation of these layers. However, we apply a simplification, which significantly reduces this additional cost. Note that from equation (13) the output of the layer after the concatenation becomes

    $$\begin{array}{c}{{\bf{n}}}_{{\rm{f}}}^{{\prime} i}=\,W{\hat{{\bf{n}}}}_{{\rm{f}}}^{i}+\sum _{(\,j,i)\in E}{W}_{ij}{\hat{{\bf{n}}}}_{{\rm{f}}}^{j}\\ \,=\,W[\,{\hat{g}}_{l}^{i}\parallel {{\bf{n}}}_{{\rm{f}}}^{i}]+\sum _{(\,j,i)\in E}{W}_{ij}[\,{\hat{g}}_{l}^{j}\parallel {{\bf{n}}}_{{\rm{f}}}^{j}]\\ \,=\,{W}^{g}{\hat{g}}_{l}^{i}+{W}^{{\rm{f}}}{{\bf{n}}}_{{\rm{f}}}^{i}+\sum _{(j,i)\in E}{W}_{ij}^{g}{\hat{g}}_{l}^{j}+\sum _{(j,i)\in E}{W}_{ij}^{{\rm{f}}}{{\bf{n}}}_{{\rm{f}}}^{j}\\ \,=\,\mathop{\underbrace{{W}^{g}{\hat{g}}_{l}^{i}+\sum _{(j,i)\in E}{W}_{ij}^{g}{\hat{g}}_{l}^{j}}}\limits_{{\rm{a}}{\rm{f}}{\rm{f}}{\rm{e}}{\rm{c}}{\rm{t}}{\rm{e}}{\rm{d}}\,{\rm{b}}{\rm{y}}\,{{\bf{n}}}_{{\rm{p}}}\,{\rm{c}}{\rm{h}}{\rm{a}}{\rm{n}}{\rm{g}}{\rm{e}}}\,\,+\mathop{\underbrace{{W}^{{\rm{f}}}{{\bf{n}}}_{{\rm{f}}}^{i}+\sum _{(j,i)\in E}{W}_{ij}^{{\rm{f}}}{{\bf{n}}}_{{\rm{f}}}^{j}.}}\limits_{{\rm{a}}{\rm{f}}{\rm{f}}{\rm{e}}{\rm{c}}{\rm{t}}{\rm{e}}{\rm{d}}\,{\rm{b}}{\rm{y}}\,{{\bf{n}}}_{{\rm{p}}}\,{\rm{a}}{\rm{n}}{\rm{d}}\,{{\bf{n}}}_{{\rm{f}}}\,{\rm{c}}{\rm{h}}{\rm{a}}{\rm{n}}{\rm{g}}{\rm{e}}}\end{array}$$

    (18)

    In the equation above, we made use of the fact that the weight matrix \({W}_{ij}=[{W}_{ij}^{g}\parallel {W}_{ij}^{{\rm{f}}}]\) splits along the concatenation, and thus the multiplication results in the sum of products \({W}_{ij}^{g}{\widehat{g}}_{l}^{i}\) and \({W}_{ij}^{{\rm{f}}}{{\bf{n}}}_{{\rm{f}}}\). Note that this simplification does not imply that a similar operation could be performed with a pure depthwise convolution and addition of features, as the weight matrices Wij change for each neighbour. During asynchronous operation, the terms on the left need to be recomputed when there is a node position change, and the terms on the right need to be recomputed when there is a node position or node feature change. At most one node experiences a node position change in each layer, and thus the terms on the left do not need to be recomputed often.
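    The identity behind equation (18) can be verified with a few lines of code; the channel sizes below are arbitrary.

```python
# Hedged sketch of the split in equation (18): because the fused node feature is
# a concatenation [g || n_f], each weight matrix factors as W = [W_g || W_f], so
# W [g || n_f] = W_g g + W_f n_f.
import numpy as np

c_img, c_evt, c_out = 8, 16, 32
W = np.random.randn(c_out, c_img + c_evt)
W_g, W_f = W[:, :c_img], W[:, c_img:]           # split along the input channels

g = np.random.randn(c_img)                       # sampled image feature g_hat
n_f = np.random.randn(c_evt)                     # event node feature

full = W @ np.concatenate([g, n_f])
split = W_g @ g + W_f @ n_f
assert np.allclose(full, split)                  # both forms give the same output
```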

    Datasets

    Purely event-based datasets

    We evaluate our method on the N-Caltech101 detection dataset42 and the Gen1 Detection Dataset41. N-Caltech101 consists of recordings by a DAVIS240 (ref. 17) undergoing saccadic motion in front of a projector, which projects samples of Caltech101 (ref. 47) on a wall. In post-processing, bounding boxes around the visible objects were placed by hand. The Gen1 Detection Dataset is a more challenging, large-scale dataset targeting an automotive setting. It was recorded with an ATIS sensor48 at a resolution of 304 × 240 pixels and contains two classes, with 228,123 annotated cars and 27,658 annotated pedestrians. As in ref. 19, we remove bounding boxes with diagonals below 30 pixels and sides below 20 pixels from Gen1.

    Event- and image-based dataset

    We curate a multimodal dataset for object detection by using the DSEC40 dataset, which we term DSEC-Detection. A preview of the dataset can be seen in Extended Data Fig. 6a.

    It features data collected from a stereo pair of Prophesee Gen3 event cameras and FLIR Blackfly S global shutter RGB cameras recording at 20 fps. We select the left event camera and left RGB camera and align the RGB images with the distorted event camera frame by infinite depth alignment. Essentially, we first undistort the camera image, then rotate it into the same orientation as the event camera and then distort the image. The resulting image exhibits a maximal disparity of only roughly 6 pixels for close objects at the edges of the image plane, owing to the small baseline (4.5 cm). As object detection is not a precise per-pixel task, this degree of misalignment is acceptable for sensor fusion.

    To create labels, we use the QDTrack49,50 multiobject tracker to annotate the RGB images, followed by a manual inspection and removal of false detections and tracks. Using this method, we annotate the official training and test sets of DSEC40. Moreover, we label several sequences for the validation set and one complex sequence with pedestrians for the test set. We do this because the original dataset split was chosen to minimize the number of moving objects. However, this excludes cluttered scenes with pedestrians and moving cars. By including these additional sequences, we thus also address more complex and dynamic scenes. A detailed breakdown and comparison of the number of classes, instances per class and the number of samples are given in Extended Data Fig. 6b. Our dataset is the only one to feature images and events and consider semantic classes, to the best of our knowledge. By contrast, refs. 19,41 contain only events, and ref. 51 considers only moving objects, that is, it does not provide class information and omits stationary objects.

    Statistics of edge cases

    We compute the percentage of edge cases for the DSEC-Detection dataset. We will define an edge case as an image that contains at least one appearing or disappearing object, which would presumably be missed by a purely image-based algorithm. We found that this proportion is 31% of the training set and 30% of the test set. Moreover, we counted the number of objects that suddenly appear or disappear. We found that in the training set, 4.2% of objects disappear and 4.2% appear, whereas in the test set, 3.5% appear and 3.5% disappear.

    Comments on time synchronization

    Events and frames were hardware synchronized by an external computer that sent trigger signals simultaneously to the image and event sensor. While the image sensor would capture an image with a fixed exposure on triggering, the event camera would record a special event that exactly marked the time of triggering. We assign the timestamp of this event (and half an exposure time) to the image. We found that this synchronization accuracy was of the order of 78 μs, which we determined by measuring the root-mean-square deviation of the frame timestamps from the nominal frame interval of 50,000 μs. More details can be found in ref. 40.

    Comments on network and event transport latencies

    As discussed earlier, we estimate the mean synchronization error to be of the order of 78 μs with hardware synchronization. Moreover, in a real-time system, the event camera will experience event transport delays that are split into a maximal sensor latency, MIPI to USB transfer latency and a USB to computer transfer latency, as discussed in ref. 52. For the Gen3 sensor, the sum of all worst-case latencies can be as low as 6 ms. It can be further reduced by using an MIPI interface directly, in which case this latency reduces to 4 ms. However, this worst-case delay is achieved only during static scenarios, in which there is an exceptionally low event rate such that MIPI packets are not filled sufficiently. In practice, this case rarely occurs because of the presence of sensor noise, and it does not affect dynamic scenarios with high event rates. More details can be found in ref. 53. Finally, note that although all three latencies would affect a closed-loop system, our work is evaluated in an open loop and thus does not experience these latencies, or synchronization errors due to these latencies.

    In view of integrating our method into a multi-sensor system, which uses the network-based time synchronization standard IEEE1588v2, we analyse how the method performs when small synchronization errors between images and events are present. To test this, we introduce a fixed time delay Δtd ∈ [−20, 20] ms between the event and image stream. Note that for a given stimulus a delay of Δtd < 0 denotes that events arrive earlier than images, whereas Δtd > 0 denotes that events arrive later than images. We report the performance of DAGr-S + ResNet-50 on the DSEC-Detection test set in Extended Data Fig. 3b. As can be seen, our method is robust to synchronization errors up to 20 ms, suffering only a maximal performance decrease of 0.5 mAP. Making our method more robust to such errors remains the topic of further work.

    Comment on event-to-image alignment

    Throughout the dataset, event-to-image misalignment is small and never exceeds 6 pixels, and this is further supported by visual inspection of Extended Data Fig. 6a. Nonetheless, we characterize the accuracy that a hypothetical decision-making system would have if worst-case errors were considered. Consider a decision-making system that relies on accurate and low-latency positioning of actors such as cars and pedestrians. This system could use the proposed object detector (using the small-baseline stereo setup with an event and image camera) as well as a state-of-the-art event camera-based stereo depth method54 (using the wide-baseline stereo event camera setup) to map a conservative region around a proposed detection. This system would still have a low latency and provide a low depth uncertainty because of a low disparity error of 1.2–1.3 pixels, characterized on DSEC in ref. 40.

    We can calculate the depth uncertainty due to the stereo system as \({\sigma }_{D}=\frac{{D}^{2}}{f{b}_{{\rm{w}}}}{\sigma }_{{\rm{d}}}\). With a maximal disparity uncertainty σd = 1.3 pixels, a depth of D = 3 m, a focal length of f = 581 pixels and an event-camera-to-event-camera baseline of bw = 50 cm, this results in a depth uncertainty of σD ≈ 4 cm. Likewise, the lateral positioning uncertainty (due to shifted events) is \({\sigma }_{l}=\frac{D}{f}{\sigma }_{{\rm{d}}}\).

    For lateral positioning, we can assume a disparity error that is bounded by the misalignment between events and frames, which is \({\sigma }_{{\rm{d}}} < \frac{f{b}_{{\rm{s}}}}{D}\), where bs = 4.5 cm is the small baseline between the event and image camera. Inserting this uncertainty, the resulting lateral uncertainty is bounded by \({\sigma }_{{\rm{p}}}=\frac{D}{f}{\sigma }_{{\rm{d}}} < \frac{D}{f}\frac{f{b}_{{\rm{s}}}}{D}={b}_{{\rm{s}}}\), which means σp < 4.5 cm. These numbers are well within the tolerance limits of automotive systems that typically expect an uncertainty of 3% of the distance to the target, which for 3 m would be 9 cm. Moreover, this lies within the tolerance limit of the current agent-forecasting methods10,11,12 that are currently finding their way into commercial patents13, in which we see displacement errors in prediction of the order of 0.6 m, more than one order of magnitude higher than the worst-case error of our system.
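    These numbers can be reproduced directly from the quantities quoted above.

```python
# Worked check of the uncertainty numbers quoted in the text.
f = 581.0        # focal length in pixels
b_w = 0.50       # wide event-to-event baseline in metres
b_s = 0.045      # small event-to-image baseline in metres
D = 3.0          # object distance in metres
sigma_d = 1.3    # disparity error in pixels

sigma_D = D**2 / (f * b_w) * sigma_d             # depth uncertainty
sigma_p_bound = (D / f) * (f * b_s / D)          # lateral bound, simplifies to b_s
print(f"sigma_D ~ {100 * sigma_D:.1f} cm, sigma_p < {100 * sigma_p_bound:.1f} cm")
# prints: sigma_D ~ 4.0 cm, sigma_p < 4.5 cm
```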

    Finally, we argue that despite the misalignment, our object detector learns to implicitly realign events to the image frame because of the training setup. As the network is trained with object detection labels that are aligned with the image frame, and slightly misaligned events, the network learns to implicitly realign the events to compensate for the misalignment. As the misalignment is small, this is simple to learn. To test this hypothesis, we used the LiDAR scans in DSEC to align the object detection labels with the event stream, that is, a frame it was not trained for, and observed a performance drop from 41.87 mAP to 41.8 mAP. First, the slight performance drop indicates that we are moving the detection labels slightly out of distribution, thus confirming that the network learns to implicitly apply an alignment correction. Second, the small magnitude of the change highlights that the misalignment is small.

    Ground truth generation for inter-frame detection

    To evaluate our method between consecutive frames, we generate ground truth as follows. We generate ground truth for multiple temporal offsets \(\frac{i}{n}\Delta t\) with n = 10 and i = 0, …, 10 and Δt = tE − tI = 50 ms. We then remove the samples from our dataset in which two consecutive images do not share the same object tracks and generate inter-frame labels by linearly interpolating the position (x and y coordinates of the top left bounding box corner) and size (height and width) of each object. We then aggregate detection evaluations at the same temporal offset across the dataset.
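    A minimal sketch of this interpolation, assuming boxes are stored per track ID as (x, y, w, h) tuples; the data layout is an assumption made for illustration.

```python
# Hedged sketch of inter-frame ground-truth generation: boxes sharing a track ID
# across two consecutive frames are linearly interpolated at offsets i/n * dt.
def interpolate_tracks(boxes_t0, boxes_t1, n=10):
    """boxes_t0, boxes_t1: dict track_id -> (x, y, w, h). Returns n + 1 dicts,
    one per temporal offset i/n * dt, for tracks present in both frames."""
    shared = boxes_t0.keys() & boxes_t1.keys()
    out = []
    for i in range(n + 1):
        s = i / n
        out.append({tid: tuple((1 - s) * a + s * b
                               for a, b in zip(boxes_t0[tid], boxes_t1[tid]))
                    for tid in shared})
    return out
```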

    Comment on approximation errors due to linear interpolation

    To measure the inter-frame detection performance of our method, we use linear interpolation between consecutive frames to generate ground truth. Although this linear interpolation affects ground truth accuracy within the interval because of interpolation errors, at the frame borders, that is, t = 0 ms and t = 50 ms, no approximation is made. Still, we verify the accuracy of the ground truth by evaluating our method for different interpolation methods. We focus on the subset with object tracks of length at least four and then apply cubic and linear interpolation of object tracks on the interval between the second and third frames. We report the results in Extended Data Fig. 3a. We see that the performance of our method deviates at most 0.2 mAP between linear and cubic interpolations. Although there is a small difference, we focus on using linear interpolation, as it allows us to use a larger subset of the test set for inter-frame object detection.

    Training details

    On Gen1 and N-Caltech101, we use the AdamW optimizer55 with a learning rate of 0.01 and weight decay of \({10}^{-5}\). We train each model for 150,000 iterations with a batch size of 64. We randomly crop the events to 75% of the full resolution and randomly translate them by up to 10% of the full resolution. We use the YOLOX loss34, which includes an IOU loss, class loss and a regression loss, discussed in ref. 34. To stabilize training, we also use exponential model averaging56.

    On DSEC-Detection, we train with a batch size of 32 and a learning rate of \(2\times {10}^{-4}\) for 800 epochs using the AdamW optimizer55, as before. Apart from the data augmentations described before, we now also use random horizontal flipping with a probability of 0.5 and random magnification with a scale \(s \sim {\mathcal{U}}(1,1.5)\). We train the network to predict with one image and 50 ms of events leading up to the next image, corresponding to the frequency of labels (20 Hz).

    Baselines

    In the purely event-based setting, we compare with the following state-of-the-art methods.

    Dense recurrent methods

    In this category, RED (ref. 19) and ASTM-Net (ref. 28) are the state-of-the-art methods, and they feature recurrent architectures. We also include MatrixLSTM + YOLOv3 (ref. 29) that features a recurrent, learnable representation and a YOLOv3 detection head.

    Dense feedforward methods

    Reference 28 provides the results on Gen1 for the dense feedforward methods, which we term Events + RRC (ref. 38), Inception + SSD (ref. 26) and Events + YOLOv3 (ref. 27). These methods use dense event representations with the RRC, SSD or YOLOv3 detection head.

    Spiking methods

    We compare with the spiking network Spiking DenseNet (ref. 39), which uses an SSD detection head.

    Asynchronous methods

    Here we compare with the state-of-the-art methods AEGNN (ref. 31) and NVS-S (ref. 32), both graph-based, AsyNet (ref. 36), which uses submanifold sparse convolutions57, and YOLE (ref. 58), which uses an asynchronous CNN. All of these methods deploy their networks in an asynchronous mode during testing.

    As implementation details are not available for Events + RRC (ref. 38), Inception + SSD (ref. 26) and Events + YOLOv3 (ref. 27), MatrixLSTM + YOLOv3 (ref. 29) and ASTM-Net (ref. 28), we find a lower bound on the per-event computation necessary to update their network based on the complexity of their detection backbone. Whereas for Events + YOLOv3 and MatrixLSTM + YOLOv3 we use the DarkNet-53 backbone, for ASTM-Net and Events + RRC, we use the VGG11 backbone, and for Inception + SSD the Inception v.2 backbone. As Spiking DenseNet uses spike-based computation, we do not report FLOPS because they are undefined and mark that entry with N/A.

    Hybrid methods

    In the event- and image-based setting, we additionally compare with an event- and frame-based baseline, which we term Events + YOLOX. It takes in concatenated images and event histograms59 from events up to time t and generates detections for time t.

    Image-based methods

    We compare with YOLOX (ref. 34). As YOLOX provides only detections at frame time, we present a variation that can provide detections in the blind time between the frames, using either constant or linear extrapolation of detections extracted at frame time. Whereas for constant extrapolation we simply keep object positions constant over time, for linear extrapolation we use detections in the past and current frames to fit a linear motion model on the position, height and width of the object. As YOLOX is an object detector, we need to establish associations between the past and current objects. We did this as follows: for each object in the current frame, we selected the object of the same class in the previous frame with the highest IOU overlap and used it to fit a linear function on the bounding box parameters (height, width, x position and y position). If no match was found (that is, all IOUs were 0 for the selected object), it was not extrapolated but instead kept constant.
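    The following sketch illustrates this extrapolation baseline; the box format (x, y, w, h of the top-left corner plus size) and variable names are assumptions made for illustration.

```python
# Hedged sketch of the linear-extrapolation baseline described above: each
# current detection is matched to the same-class previous detection with the
# highest IoU; matched boxes are extrapolated linearly, unmatched ones kept constant.
import numpy as np

def iou(a, b):
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def extrapolate(prev, curr, s):
    """prev, curr: lists of (cls, np.array([x, y, w, h], float)); s >= 0 is the
    time past the current frame in units of the frame interval."""
    out = []
    for cls, box in curr:
        best_iou, best_box = 0.0, None
        for c, b in prev:
            if c == cls and iou(box, b) > best_iou:
                best_iou, best_box = iou(box, b), b
        if best_box is not None:
            out.append((cls, box + s * (box - best_box)))   # linear motion model
        else:
            out.append((cls, box))                          # no match: keep constant
    return out
```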

    Finally, we compare the bandwidth and latency requirements of the Prophesee Gen3 camera with those of a set of automotive cameras, which are summarized in Extended Data Table 2. We also illustrate the concept of bandwidth–latency trade-off in Fig. 1a. The bandwidth–latency trade-off, discussed in ref. 60, states that cameras such as the automotive cameras in Extended Data Table 2 cannot simultaneously achieve low bandwidth and low latency because of their reliance on a frame rate. By contrast, the Prophesee Gen3 camera can minimize both because it is an asynchronous sensor.

    Related work

    Dense neural network-based methods

    Since the introduction of powerful object detectors in classical image-based computer vision, such as R-CNN (refs. 61,62,63), SSD (ref. 64) and the YOLO series34,65,66, and the widespread adoption of these methods in automotive settings, event-based object detection research has focused on leveraging the available models on dense, image-like event representations19,26,27,28,29,38. This approach enables the use of pretraining and well-established architecture designs and loss functions, while maintaining the advantages of events, such as their high dynamic range and negligible motion blur. The most recent examples of these methods include RED (ref. 19) and ASTM-Net (ref. 28), which operate recurrently on events and have shown high performance on detection tasks in automotive settings. However, owing to their design, these approaches need to convert events into dense frames. This invariably sacrifices the efficiency and high temporal resolution present in the events, which are important in many application scenarios such as low-power, always-on surveillance67,68 and low-latency, low-power object detection and avoidance3,69.

    Geometric learning methods

    As a result, a parallel line of research has emerged that tries to reintroduce sparsity into the present models by adopting either spiking neural network architectures39 or geometric learning approaches31,36. Of these, spiking neural networks are capable of processing raw events asynchronously and are thus closest in spirit to the event-based data. However, these architectures lack efficient learning rules and thus do not yet scale to complex tasks and datasets42,70,71,72,73,74. Recently, geometric learning approaches have filled this gap. These approaches treat events as spatio-temporal point clouds75, submanifolds36 or graphs31,32,43,76 and process them with specialized neural networks. Particular instances of these methods that have found use in large-scale point-cloud processing are PointNet++ (ref. 77) and Flex-Conv (ref. 78). These methods retain the spatio-temporal sparsity in the events and can be implemented recursively, in which case single-event insertions are highly efficient.

    Asynchronous GNNs

    Of the geometric learning methods, processing events with GNNs is found to be most scalable, achieving high performance on complex tasks such as object recognition32,43,44, object detection31 and motion segmentation45. Recently, a line of research31,32 has focused on converting these GNNs, once trained, into asynchronous models. These models can process in an event-by-event fashion while maintaining low computational complexity and generating an identical output to feedforward GNNs. They do so by efficiently inserting events into the event graph32 and then propagating the changes to lower layers, for which at each layer only a subset of nodes needs to be recomputed. However, these works are limited in three main aspects. First, they work only at a per-node level, meaning that they flag nodes that have changed and then recompute the messages to recompute the feature of each node. This incurs redundant computation because effectively only a subset of the messages passing to each changed node needs to be recomputed. Second, they do not consider update pruning, which means that when node features do not change at a layer, they simply treat them as changed nodes, leading to additional computation. Finally, the number of changed nodes increases as the layer depth increases, meaning that these architectures work efficiently only for shallow neural networks, limiting the depth of the network.

    In this work, we address all three limitations. First, we pass updates on a per-message level, that is, we recompute only messages that have changed. Second, we apply update pruning and explore a specialized network architecture that maximizes this effect by placing the max pooling layer early in the network. By modulating the number of output features of this layer, we can control the amount of pruning that takes place. Finally, we also apply a specialized LUT-SC that cuts the computation markedly. With the reduced computational complexity, we are able to design architectures that are twice as deep, which markedly boosts the network accuracy.

    Hybrid methods

    One of the reasons for the lower performance of event-based detectors also lies in the properties of the sensor itself. Although event cameras can detect objects quickly, even in high-speed and high-dynamic-range conditions, the lack of explicit texture information in the event stream prevents the networks from extracting rich semantic cues. For this reason, several methods have combined events and frames for moving-object detection79, tracking80, computational photography22,81,82 and monocular depth estimation40. However, these are usually based on dense feedforward networks and simple event and image concatenation22,82,83,84 or multi-branch feature fusion40,83. As events are treated as dense frames, these methods suffer from the same drawbacks as standard dense methods. In this work, we combine events and frames in a sparse way without sacrificing the low computational complexity of event-by-event processing. This is, to our knowledge, the first paper to address asynchronous processing in a hybrid network.

    Ablations

    Events only

    Here we motivate the use of the features of our method. We split our ablation studies into two parts: those targeting the efficiency (Extended Data Fig. 4d) and those targeting the accuracy (Extended Data Fig. 4e) of the method. For all experiments, we use the model shown in Extended Data Fig. 1 without the image branch as a baseline and report the standard object detection score of mAP (higher is better)85 on the validation set of the Gen1 dataset41 as well as the computation necessary to process a single event in terms of floating point operations per event (FLOPS per event, lower is better).

    Ablations on efficiency. Key building blocks of our method are LUT-SCs, which are an accelerated version of standard spline convolutions35. An enabling factor for using LUT-SCs lies in transitioning from 3D to 2D convolutions, which we investigate by training a model with 3D spline convolutions (Extended Data Fig. 4d, row 1). With an mAP of 31.84, it achieves a 0.05 lower mAP than our baseline (bottom row). Using 3D convolutions yields a slight decrease in accuracy and does not allow us to perform an efficient lookup, yielding 150.87 MFLOPS per new event. Using 2D convolutions (row 2) reduces the computation to 79.6 MFLOPS per event because of the dependence on the dimension m in equation (12), which is further reduced to 17.3 MFLOPS per event after implementing LUT-SCs (row 3). In addition to the small increase in performance due to 2D convolutions, we gain a factor of 8.7 in terms of FLOPS per event.

    Next, we investigate pruning. We recompute the FLOPS of the previous model by terminating update propagation after max pooling layers, shown in Extended Data Fig. 2b, and reported in Extended Data Fig. 4d (row 4). We find that this reduces the computational complexity from 17.3 to 16.3 MFLOPS per event. This reduction comes from removing the orange messages in Extended Data Fig. 2a (bottom). Implementing node position rounding in equation (6) (Extended Data Fig. 4d, row 5) enables us to fully prune updates. This method requires only 4.58 MFLOPS per event. Node position rounding reduces mAP only by 0.01, justifying its use.

    In a final step, we also investigate the use of directed pooling, shown in Extended Data Fig. 2d. Owing to this pooling method, fewer edges are present after each pooling layer, thus restricting the message passing, that is, the context-aggregation ability of our network. For this reason, it achieves only an mAP of 18.35. However, owing to the directedness of the graph, at most one node needs to be updated in each layer (except for rare edge inversions), as shown in Extended Data Fig. 2c, leading to an overall computational complexity of only 0.31 MFLOPS per event. Owing to the lower performance, we instead use the previous method when comparing with the state-of-the-art methods. However, as will be seen later, the performance is affected to a much lesser degree when combined with images.

    Ablations on accuracy. We found that three features of our network had a marked impact on performance. First, we applied early temporal aggregation, that is, using gt = 1, which sped up training and led to higher accuracy. We trained another model that pooled the temporal dimension more gradually by setting gt = 8/2^i, where i is the index of the pooling layer. This model reached only an mAP of 21.2 (Extended Data Fig. 4e, row 3), after reducing the learning rate to 0.002 to enable stable training. This highlights that early pooling plays an important part because it improves our result by 10.6 mAP. We believe that it is important for mixing features quickly so that they can be used in lower layers.

    Next, we investigate the importance of network depth on task performance. To see this, we trained another network, in which we removed the skip connection and second (LUT-SC and BN) block from the layer in Extended Data Fig. 1c, which resulted in a network with a total of eight layers, comparable with the network in ref. 31, which had seven layers. We see that this network achieves only an mAP of 22.5 (Extended Data Fig. 4e, row 2), highlighting that 9.4 mAP of the difference is explained by the deeper network architecture. We also combine this ablation with the previous one about early pooling and see that the network achieves only 15.8 mAP, another drop of 6.7 mAP (Extended Data Fig. 4e, row 1). This result is on par with ref. 31, which achieved 16.3 mAP. This highlights the importance of using a deep neural network to boost performance.

    Finally, we investigate using multiple layers before the max pooling layer. We train another model that has only a single input layer, replacing the layer in Extended Data Fig. 1 with a (LUT-SC, BN and ReLU) block. This yielded a performance of 30.0 mAP (Extended Data Fig. 4e, row 4), which is 1.8 mAP lower than the baseline (Extended Data Fig. 4e, row 5). The computational complexity is only marginally lower, which is explained by Extended Data Fig. 2c (top). We see that adding layers at the input generates only a few additional messages. This highlights the benefits of using a directed event graph.

    Timing experiments. We measure the time it takes for our dense GNN to process a batch of 50,000 events, averaged over Gen1, and compare it with our asynchronous implementation on a Quadro RTX 4000 laptop GPU. We found that our dense network takes 30.8 ms, whereas the asynchronous method requires 8.46 ms, a 3.7-fold reduction. We believe that with further optimizations, and when deployed on specialized, potentially spiking, hardware, this method can reduce power and latency by additional factors.

    Max pooling

    In this section, we take a closer look at the pruning mechanism. We find that almost all pruning happens in the very first max pooling layer. This motivates the placement of the pooling layer at the early stages of the network, which allows us to skip most computations when pruning happens. Also, as the subgraph is still small in the early layers, it is easy to prune the entire update tree. We interpret this case as event filtering and investigate this filter in Extended Data Fig. 4.

    When this filter is applied to raw events (Extended Data Fig. 4a), we obtain filtered events (Extended Data Fig. 4b), that is, events that passed through the first max pooling layer. We observe that max pooling makes the events more uniformly distributed over the image plane. This is also supported by the density plot in Extended Data Fig. 4b, which shows that the distribution of the number of events per pixel shifts to the left after filtering, removing events in regions in which there are too many. This behaviour can be explained by the pigeon-hole principle when applied to max pooling layers. Max pooling usually uses only a fraction of its input nodes to compute the output feature. The number of input nodes used by the max pooling layer is upper bounded by its output channel dimension, cout, because each output channel takes its maximum from a single input node. As a result, max pooling selects at most cout nodes for each voxel, resulting in more uniformly sampled events.

    To study the effect of the output channel dimension on filtering, we train four models with cout ∈ {8, 16, 24, 32}, in which our baseline model has cout = 16. We report the mAP, MFLOPS per event and fraction of events remaining after filtering ϕ, averaged over Gen1, in Extended Data Fig. 4c. As predicted, we find that increasing cout increases mAP, MFLOPS and ϕ. However, the increase happens at different rates. While MFLOPS and ϕ grow roughly linearly, mAP growth slows down significantly after cout = 24. Interestingly, by selecting cout = 8 we still achieve an mAP of 30.6, while using only 21% of events. This type of filtering has interesting implications for future work. An interesting question would be whether events that are not pruned carry salient and interpretable information.

    Images and events

    In this section, we ablate the importance of different design choices when combining events and images. In all experiments, we report the mAP and mean number of MFLOPS per newly inserted event over the DSEC-Detection validation set. When computing the FLOPS, we do not take into account the computation necessary by the CNN, because it needs to be executed only once. Our baseline model uses DAGr-S for the events branch and ResNet-18 (ref. 30).

    Ablations on fusion. In the following ablation studies, we investigate the influence of (1) the feature sampling layer and (2) the effect of detection adding at the detection outputs of the event and image branches. We summarize the results of this experiment in Extended Data Fig. 5d. In summary, we see that our baseline (Extended Data Fig. 5d, row 4) achieves an mAP of 37.3 with 6.73 MFLOPS per event. Removing feature sampling results in a drop of 3.1 mAP, while reducing the computational complexity by 0.73 MFLOPS per event. We argue that the performance gain due to feature sampling justifies this small increase in computational complexity. Removing detection adding at the output reduces the performance by 5.8 mAP, while also reducing the computation by 1.24 MFLOPS per event. We argue that this reduction comes from the fact that the image features are predominantly used to generate the output (that is, compared with the events only, which is 18.5 mAP lower), and thus more event features are pruned at the max pooling layer (roughly 20% more). Finally, if both feature sampling and detection adding are removed, we arrive at the original DAGr architecture, which achieves an mAP of 14.0 with 6.05 MFLOPS per event. It has a computational complexity on par with the baseline with detection adding, but its performance is 20.2 mAP lower, justifying the use of detection adding.

    Other ablations. We found that two more factors helped the performance of the method without affecting the computation markedly: (1) CNN pretraining and (2) concatenation of image and event features, which we ablate in Extended Data Fig. 5e. To test the first feature, we trained the model end to end, without pretraining the CNN branch, and found that this resulted in a 0.2-mAP reduction in performance, with a negligible reduction in computational complexity. Next, we replaced the concatenation operation with a summation, which reduces the number of input channels to each spline convolution. This change reduces the mAP by 0.5 and the computation by 1.24 MFLOPS per event. Instead, naive concatenation requires 7.49 MFLOPS per event without the simplifications in equation (18). If we use equation (18), we can reduce this computation to 6.74 MFLOPS per event, a roughly 10% reduction with no performance impact.

    Ablation on CNN backbone. We evaluate the ability of our method to perform inter-frame detection using different network backbones, namely, ResNet-18, ResNet-34 and ResNet-50, and provide the results in Extended Data Fig. 5a. Green and reddish colours indicate methods with and without events, respectively. As seen previously with the ResNet-50 backbone, the event- and image-based methods (green) all show stable performance, successfully detecting objects in the 50 ms between two frames. As the backbone capacity increases, their performance level also increases. We also observe that with increasing time t ranging from 0 ms to 50 ms, the scores of all methods slightly increase, reach a maximum and then decrease again, improving the initial score at t = 0 by between 0.6 mAP and 0.7 mAP. The performance increase can be explained by the addition of events, that is, more information becomes available so that detections can be refined, especially in the dark and blurry regions of the image. The subsequent slight decrease can then be explained by the fact that image information becomes more outdated. By contrast, purely image-based methods (red) suffer significantly in this setting. While starting off at the same level as the image and event-based methods, they quickly degrade by between 8.7 mAP and 10.8 mAP after 50 ms. The performance change over time for all methods is shown in Extended Data Fig. 5c, in which we confirm our findings. This decrease highlights the importance of updating the prediction between the frames. Using events is an effective and computationally cheap way to do so, closing the gap of up to 10.8 mAP. We illustrate this gain in performance from using events qualitatively in Fig. 5, in which we show object detections of DAGr-S + ResNet-50 in edge-case scenarios.

    Timing experiments. We report the runtime of our method in Extended Data Table 1 and find the fastest method to be DAGr-S + ResNet-50 with 9.6 ms. Specific hardware implementations are likely to reduce this number substantially. Moreover, as can be seen in the comparison, MFLOPS per event does not correlate with runtime at these low computation regimes, and this indicates that significant overhead is present in the implementation. We use the PyTorch Geometric86 library, which is optimized for batch processing and thus introduces data-handling overhead. When eliminating this overhead, runtimes are expected to decrease even more.

    Further experiments on DSEC-Detection

    Event cameras provide additional information

    One of the proposed use cases for an event camera is to detect objects before they become fully visible within the frame. These could be objects, or parts of objects, appearing from behind occlusions, or entering the field of view. In this case, the first image does not carry sufficient information to make an informed decision, which requires waiting for information from additional sensors, or integrating context-enriched information from details such as shadows and body parts. Integrating this information can reduce the uncertainties in partially observable situations and is applicable to both image- and event-based algorithms. Event cameras, however, provide additional information, which invariably enhances prediction, even under partial observability (for example, an arm appearing from behind an occlusion or cargo being lost on a highway). To test this hypothesis, we compared our method with the image-based baseline with extrapolation on the subset of DSEC-Detection in which objects suddenly appear or disappear (a total of 8% of objects). This subset requires further information to fill in these detections. Our event- and image-based method achieves 37.2 mAP, and the image-based method achieves 33.8 mAP, showing that events can provide a 3.4-mAP boost in this case.

    Incorporating CNN latency into the prediction

    Our hybrid method relies on dense features provided by a standard CNN, which is computationally expensive to run. We thus try to understand if our method would also work in a scenario in which dense features appear only after computation is finished and then need to be updated by later events. To test this case, we perform the following modification to our method. After a computation time Δt for computing the dense features, we integrate the events from the interval [Δt, 50 ms] into the detector. This means that for time 0 < t < Δt, no detection can be made, as no features are available from images. In this interval, either the event-only method from Extended Data Fig. 5d (row 1) can be used, or the detections from the previous interval can be propagated linearly. At time t > Δt, we use the events in interval [Δt, t]. The runtimes for the different image networks (ResNet-18, ResNet-34 and ResNet-50 + detection head) were 5.3 ms, 8.2 ms and 12.7 ms, respectively, on a Quadro RTX 4000 laptop GPU. We report the results in Extended Data Fig. 3c. We see that on the full DSEC-Detection test set after 50 ms of events, DAGr-S + ResNet-50 achieves a performance of 41.6 mAP, 0.3 mAP lower than without latency consideration. On the inter-frame detection task, this translates to a reduction from 44.2 mAP to 43.8 mAP, still 6.7 mAP higher than the image-based baseline with extrapolation implemented. This demonstrates that our method outperforms image-based methods even when considering computational latency due to CNN processing. For smaller networks ResNet-34 and ResNet-18, the degradations on the full test set are 0.1 mAP and 0.1 mAP, respectively, compared with the corresponding methods without latency consideration. Notably, smaller networks have lower latency and thus incur smaller degradations. However, the largest model still achieves the highest performance. Nonetheless, to minimize the effect of this latency, future work could consider incorporating the latency into the training loop, in which case the method will probably learn to compensate for it.

    Research ethics

    The study has been conducted in accordance with the Declaration of Helsinki. The study protocol is exempt from review by an ethics committee according to the rules and regulations of the University of Zurich, because no health-related data have been collected. The participants gave their written informed consent before participating in the study.
