Tag Archives: Open source

Android and RISC-V: What you need to know to be ready

Android is an open source operating system that is freely available to port to many devices and architectures, and it already supports many different device types and CPU architectures. We’re excited to be adding a new one to that list: RISC-V.

RISC-V is a free and open instruction set architecture (ISA), bringing the same spirit of industry-wide collaboration and innovation that we see in software around open source to the hardware ecosystem. Invented 10 years ago at the University of California, Berkeley, RISC-V has seen rapid adoption in embedded and microcontroller spaces, and in recent years has expanded into accelerators, servers, and mobile computing.

In November of 2022, we announced at the RISC-V Summit that we were accepting patches for RISC-V.

The latest update is that we are now not only accepting patches, but have also begun to mature support for RISC-V in Android. RISC-V is a modular ISA, meaning that there are a large number of optional extensions. We have determined an initial set of extensions that we feel is critical to ensure that any CPU running RISC-V will have all of the features we expect in order to achieve high performance. This set includes the RVA22 profile as well as the vector and vector crypto extensions. This update was provided at the RISC-V Summit in Europe.

You can now build, test, and run the Android support for RISC-V on your own machine as well! Just like other platform targets in AOSP, you can use the Cuttlefish Virtual Device support:

$ lunch aosp_cf_riscv64_phone-userdebug
$ m -j
$ launch_cvd -cpus=8 -memory_mb=8192

Then, you can use vncviewer to connect to the running device and interact with it.

[Animation: vncviewer interacting with a running Android RISC-V device]

At this time, these patches support building and running a basic Android Open Source Project experience, but they are not yet fully optimized. For example, a fully optimized backend for the Android Runtime (ART) is still a work in progress. Additionally, AOSP, our external projects, and our compilers do not yet generate fully optimized, reduced code that takes advantage of the latest ratified extensions, such as the vector extension. However, we believe the support is ready for experimentation and collaboration.

Later this year, we expect to have the NDK ABI finalized, canary builds available on Android’s public CI, and RISC-V emulation on x86-64 and ARM64 hosts available for easier testing of riscv64 Android applications on a host machine. By 2024, the plan is to have emulators available publicly, with a full feature set to test applications for various device form factors! As recently announced in our collaboration with Qualcomm, we expect wearables to be the first form factor available.

However, just porting the Android operating system itself is not enough! We are working with the community and RISE (RISC-V Software Ecosystem). The RISE Project was established to accelerate the availability of software for high-performance and power-efficient RISC-V processor cores running high-level operating systems. That includes not only Android, but also Linux and other operating systems across a variety of application domains, including high-performance computing. The RISE Project includes members from Andes, Google, Intel, Imagination Technologies, MediaTek, Nvidia, Qualcomm Technologies, Red Hat, Rivos, Samsung, SiFive, T-Head, and Ventana.

Google is also continuing and expanding our strong investment in RISC-V International, beyond our long-standing Premium membership and board participation. We also have many contributors in key roles on horizontal committees, working groups, and technical committees, helping to ensure that specifications are rapidly designed and ratified to benefit not only Android but many other use cases.

Android's support for RISC-V depends on a wide range of contributions, from toolchains to basic support libraries. We are very appreciative of the ongoing efforts, which require countless projects to support RISC-V build configurations and provide quality implementations. If you are interested in contributing, please visit the following resources:

  • Visit https://github.com/google/android-riscv64 for detailed information on how to build and test the RISC-V support in Android, a list of known issues, and opportunities to contribute to AOSP at source.android.com as well as to toolchain projects and support libraries.
  • Subscribe to the RISC-V Android SIG mailing list, or join directly if your organization is a member of RISC-V International, to stay tuned to progress and offer your suggestions and feedback.

Make sure to stay tuned as we look into ways to make it as easy for Android developers writing native code to target new platforms as it is for our Java and Kotlin developers!

Planning to head to the RISC-V International Summit in November? Find us there; we’ll be hosting a Community Collaboration Breakfast on Wednesday morning! Not attending the conference but interested? Learn more and register here.

By Lars Bergstrom – Android Platform Programming Languages & Greg Simon – Google Low-level Operating System

Finding Stability in Open Source Work

At Google, open source is at the core of our infrastructure, processes, and culture. For the last 19 years, Google’s Open Source Programs Office (OSPO) has enabled our organization to support open source ecosystems through funding, training, mentorship and direct contribution. Every year for the last 5 years, roughly 10% of our workforce has contributed to open source projects as part of their work as well as in their personal time. We’re focused on investing in and protecting open source communities and infrastructure, as well as expanding access to open source opportunities around the world. Every day we seek to promote open and connected ecosystems as the foundation of technological advancement.

For the last four years, researchers in Google's Open Source Programs Office (OSPO) have analyzed our open source contribution activity annually to identify trends and changes in behavior. The goal of this effort has been to increase transparency and accountability across all of the communities we engage with, as well as provide feedback indicators for Alphabet’s internal tools, processes, and policies. In this iteration, our 2022 open source contribution metrics were remarkably consistent with what we found in 2021, which gives us confidence that what we're measuring is a good representation of open source behavior, especially after the extreme outlier year of 2020.


Security remains a priority

At Alphabet, open source software remains a critical component of our infrastructure, products, and services and we continue to rely on the health and availability of open source projects. Through internal efforts and collaboration with industry-led efforts such as OpenSSF, Alphabet is committed to bolstering the security posture of projects, users, and developers of open source software.

In 2021, Google began funding two Linux Foundation contractors to focus exclusively on security, and in 2022 we've continued to sponsor their work to eliminate fragile C language features and APIs in the kernel. We also continue to support the Rust-in-Linux project, with the goal of improving memory safety, strengthening APIs, and reducing the number of bugs overall in the project. In late 2022, Rust infrastructure support landed in the upstream kernel.

The deps.dev project released a public BigQuery dataset, allowing anyone to explore and analyze the dependencies, advisories, ownership, license, and other metadata of open source packages across supported ecosystems, and explore how this metadata has changed over time.
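
As an illustrative sketch, here is how that dataset can be queried with the BigQuery Python client. The dataset path, table, and column names below are assumptions for illustration; consult the published deps.dev schema for the current layout.

from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()  # requires a Google Cloud project with BigQuery access

# Hypothetical query: count package versions per ecosystem in the
# deps.dev public dataset; table and column names are illustrative.
sql = """
SELECT System AS ecosystem, COUNT(*) AS n_versions
FROM `bigquery-public-data.deps_dev_v1.PackageVersions`
GROUP BY System
ORDER BY n_versions DESC
"""

for row in client.query(sql).result():
    print(row.ecosystem, row.n_versions)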

In 2022 we announced:

  • The OSV-Scanner, a free tool enabling open source developers and users to identify and remediate known vulnerabilities in their project's OSS dependencies. The OSV-Scanner provides a supported frontend to the OSV database, which connects a project’s list of dependencies with the vulnerabilities that affect them (see the sketch after this list).
  • The GOSST Upstream Team, a dedicated staff of Google open source security engineers who spend 100% of their time working closely with upstream maintainers to improve the security of critical open source projects.
  • Graph for Understanding Artifact Composition (GUAC), which aggregates software security metadata into a high-fidelity graph database, normalizing entity identities and mapping standard relationships between them.
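
The OSV database behind these tools also exposes a public query API. As a minimal sketch (the package and version here are just an example), the following asks api.osv.dev for known vulnerabilities affecting a specific package version; OSV-Scanner builds on this kind of lookup, resolving a project's lockfiles and dependency lists into queries against the same database.

import json
import urllib.request

# Ask the OSV API which known vulnerabilities affect one package version.
payload = {
    "version": "2.4.1",
    "package": {"name": "jinja2", "ecosystem": "PyPI"},
}
req = urllib.request.Request(
    "https://api.osv.dev/v1/query",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    vulns = json.load(resp).get("vulns", [])

for v in vulns:
    print(v["id"], v.get("summary", ""))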

Our contributions continue to scale with our growing workforce

In 2022, roughly 10% of Alphabet's full-time workforce contributed to open source projects hosted on GitHub or Git-on-Borg, our internal production Git service (more details below). This percentage has remained roughly consistent over the last five years, indicating that our open source contribution has continued to scale with the growth of Alphabet. Similar to last year, FTEs represented over 95% of our open source workers, while the remainder included vendors, independent contractors, temporary staff, and interns who contributed to open source projects during their tenure at Alphabet.

As open source work is core to our ongoing operations, we continue to track engagement over time, helping us compare continuous and sporadic participation. Over 45% of our active* contributing population for the year logged an activity on GitHub or Git-on-Borg in an average month (see Figure 1).
[Figure 1: Alphabet's monthly active users on GitHub and Git-on-Borg. Over the last five years, monthly active users have continued to increase on both platforms by more than 15% year over year.]

Our portfolio of projects remains active

We estimate that more than 2,000 projects that originated from Alphabet teams and employees were still active* (not archived) in 2022. To make this estimate, we chose a broad and variable definition of an open source project, including developer tools, utilities, languages, frameworks, libraries, demos, sample code, models, raw data, designs, and more.

Project counts should not be confused with repository counts, as projects can include many repositories. Within Alphabet, we maintain over 7,500 public repositories on GitHub and 1,600 public repositories on Git-on-Borg. Our total repositories under management have decreased over time with the enforcement of a new archiving policy that flags repositories for archiving based on activity levels and owner feedback. Most of these repositories are open to outside contribution: more than 500,000 unique GitHub accounts not affiliated with Alphabet workers contributed to Alphabet projects in 2022.

The majority of our open source work happens outside of Alphabet organizations

The majority of repositories we work on are outside of Alphabet organizations: over the last five years, more than 70% of non-personal GitHub repositories Alphabet contributors interacted with were outside of Google-managed organizations. We updated the methodology behind this metric since our last edition to filter out forks created in the pull request workflow. The top projects (by unique contributors at Alphabet) include Google-initiated projects such as Kubernetes, Apache Beam, and gRPC, as well as community-led projects such as LLVM, Envoy, and Rust.


We continue to invest in the sustainability of open source ecosystems

The mission of the Google Open Source Programs Office remains the same: we sponsor, create, and invest in projects and programs that enable everyone to join and contribute to the global open source ecosystem. In 2022, OSPO provided $5.7M in membership fees and sponsorship funding to 60 key open source projects and organizations. This funding was in addition to our established annual programs:

  • In its 18th year, Google Summer of Code enabled more than 1000 individuals to contribute to more than 150 organizations. Over the lifetime of this program, more than 19,000 individuals from 112 countries have contributed to more than 800 open source organizations across the globe.
  • In its fourth year, Google Season of Docs provided direct grants to 30 open source projects to hire more than 50 technical writers to improve open source project documentation, and published its second case study report highlighting useful open source documentation metrics. More than half of the documentation created in the 2022 program was how-tos, tutorials, and reference documentation; projects primarily wanted to add documentation for missing use cases and fix disorganized documentation.
  • Since 2011, the Google Open Source Peer Bonus Program has awarded bonuses for open source contributions to members of our extended community. In 2022 more than 300 contributors received awards, working in over 40 countries on more than 200 open source projects.

Our open source work will continue to grow and evolve to support the changing needs of our communities. Thank you to our colleagues and community members who continue to dedicate their personal and professional time supporting the open source ecosystem. Follow our work at opensource.google.

By Sophia Vargas – Researcher, Google Open Source Programs Office


About this data:

This report features metrics provided by many teams and programs across Alphabet. In regards to the code and code-adjacent activities data, we wanted to share more details about the derivation of those metrics.

2022 updates: This year, we decided to remove event counts as it is increasingly difficult to differentiate automated activities from human-centered work. Even after filtering out non-human accounts, we couldn’t correlate these events to employee time spent on open source projects, and so we reduced our reporting to focus on our population and scope of effort.

  • Data sources: These data represent activities on repositories hosted on GitHub and our internal production Git service Git-on-Borg. These sources represent a subset of open source activity currently tracked by Google OSPO.
    • GitHub: We continue to use GitHub Archive as the primary source for GitHub data, which is available as a public dataset on BigQuery. Alphabet activity within GitHub is identified by self-registered accounts, which we estimate underreports actual activity.
    • Git-on-Borg: This is our primary platform for internal projects and some of our larger, long running public projects such as Android and Chromium. While we continue to develop on this platform, most of our open source activity has moved to GitHub to increase exposure and encourage community growth.
    • Distinct event types: Note that Git-on-Borg and GitHub APIs produce distinct sets of events—so we report activity metrics per platform. Where GitHub Event logs capture a wide range of activity from code creation and review to issue creation and comments, the Gerrit Event stream (used by Git-on-Borg) only captures code changes and reviews.
  • Driven by humans: We have created many automated bots and systems that can propose changes on various hosting platforms. We have intentionally filtered these data to focus on human-initiated activities.
  • Business and personal: Activity on GitHub reflects a mixture of Alphabet projects, third party projects, experimental efforts, and personal projects. Our metrics report on all of the above unless otherwise specified.
  • Alphabet contributors: Please note that unless additional detail is specified, activity counts attributed to Alphabet open source contributors will include our full-time employees as well as our extended Alphabet community (temps, vendors, contractors, and interns).
  • GitHub Accounts: For counts of GitHub accounts not affiliated with Alphabet, we cannot assume that one account is equivalent to one person, as multiple accounts could be tied to one individual or bot account.
  • *Active counts: Where possible, we will show ‘active users’ defined by logged activity (excluding ‘WatchEvent’) within a specified timeframe (a month, year, etc.) and ‘active repositories’ and ‘active projects’ as those that have enough activity to meet our internal criteria and have not been archived.

Full support of PostgreSQL engine comes to Logica

Logica is a logic programming language designed for intuitive and efficient data manipulation, which we open sourced in 2020. It compiles to SQL, providing access to the power of SQL engines with the convenience of a logic programming syntax.

When it was open sourced, Logica's only fully supported engine was BigQuery, a powerful data warehouse, executing queries with high parallelization and processing terabytes of data within seconds.

Modern machines can store and process significant amounts of data, even on a single computer, and relational SQL databases remain as popular as ever. They hold a lot of data, and analyzing that data is important. Among open source options, PostgreSQL and SQLite are some of the most popular database engines (example1, example2). Logica added support for SQLite in 2021.

Now we are pleased to announce a new release of Logica that adds support for PostgreSQL.

As Logica compiles to SQL, it is natural to extend the language to use PostgreSQL as the engine. However, there are nuances in the SQL dialect of PostgreSQL that require addressing. The biggest distinction is that PostgreSQL requires the types of records to be explicitly spelled out in your query, while BigQuery determines the types automatically.

For example, consider a Logica predicate where for each user we collect a list of records with information about their purchases.

UserPurchases(
    user_id:,
    user_name:,
    purchases? List= {item_name:, item_price:}) distinct :-
  Purchase(purchase_id:, user_id:, item_name:, item_price:),
  UserInfo(user_id:, user_name:);


We can translate this Logica predicate to GoogleSQL to run on BigQuery as follows:

SELECT
  user_id,
  user_name,
  ARRAY_AGG(STRUCT(item_name as item_name, item_price as item_price)) as purchases
FROM
  Purchase INNER JOIN
  UserInfo USING (user_id)
GROUP BY 1, 2;

Logica's record {item_name:, item_price:} simply compiles into GoogleSQL's STRUCT(item_name as item_name, item_price as item_price).

However, in the PostgreSQL dialect, composite types must be explicitly defined and specified. In our example, we need to define the type PurchaseRecord with fields item_name and item_price. We should also specify in the query that the purchases column aggregates records of type PurchaseRecord. Thus the PostgreSQL query for our predicate would be written like so:

CREATE TYPE PurchaseRecord as (item_name text, item_price numeric);

SELECT
  user_id,
  user_name,
  ARRAY_AGG(ROW(item_name,
                item_price)::PurchaseRecord) AS purchases
FROM
  Purchase INNER JOIN
  UserInfo USING (user_id)
GROUP BY UserInfo.user_id, UserInfo.user_name;


Records and lists are also useful as intermediates in calculations, even if the input and output data are normalized. For example, say we have a table called ItemSales and we want to find the most sold items in each of the stores that the table describes. Specifically, we want to assemble a table with information about the top three most sold items in each store. For each of those items, we may want to list the department of the store where the item is sold. This can be achieved intuitively using the ArgMax3 aggregate function, which accumulates all the information about the items that we need, so no extra join is required.


# Collecting information of top 3 most sold items for each store.
StoreTopItemsCollection(store) ArgMax3= {item:,
                                         department:} -> sales_volume :-
  ItemSales(store:, item:, department:, sales_volume:);

# Flattening top items collection.
StoreTopItems(store:, item:, department:) :-
  {item:, department:} in StoreTopItemsCollection(store);


To support the PostgreSQL engine, we extended the Logica compiler with type inference. Logica now infers data types for all expressions that a user employs. For records and arrays, Logica specifies their types in the produced SQL, just as PostgreSQL requires. Commands to create the necessary types are produced as part of the compiled SQL. In this Colab, we show an example of a program that writes a PostgreSQL table, and in this Colab, we show how to give type hints when the program does not have enough information for complete inference.

As a byproduct of type inference, we were able to improve error messages: since we know the types, we can point the user to where a mistake is made within the Logica program, rather than leaving the user to debug the generated SQL statement.

PostgreSQL is a popular and powerful engine. It is easy to start your own instance (maybe just in Colab!) or use a serverless option. We are excited to provide users of Logica with the option to run on Postgres. If you already use PostgreSQL, we encourage you to give Logica a try; it is a joy to write data analysis with logic programming! If you have any feedback or questions, please share them in the discussion section of the Logica repository.

By Evgeny Skvortsov, Software Engineer – Google

Showing Our Work: A Study In Understanding Open Source Contributors

In 2022, the research team within Google’s Open Source Programs Office launched an in-depth study to better understand open source developers, contributors, and maintainers. Since Alphabet is a large consumer of and contributor to open source, our primary goals were to investigate the evolving needs and motivations of open source contributors, and to learn how we can best support the communities we depend on. We also wanted to share our findings with the community in order to further research efforts and our collective understanding of open source work.

Key findings from this work suggest that community leaders should:

  • Value your time together and apart: Lack of time was cited as the leading reason ‘not to contribute’ as well as motivation to ‘leave a community’. This should encourage community leaders to adopt practices that ensure that they are making the most of the time they have together. One example: some projects have planned breaks, no-meeting weeks, or official slowdowns during holidays or popular conference weeks.
  • Invest in documentation: Contributors and maintainers expressed that task variety, delegation, and onboarding new maintainers could help to reduce burnout in open source. Documentation is one way to make individual knowledge accessible to the community. In addition to technical and procedural overviews, documentation can also be used to clarify roles, tasks, expectations, and a path to leadership.
  • Always communicate with care: Contributors prefer projects that have welcoming communities, clear onboarding paths, and a code of conduct. Communication is the primary way for community leaders to promote welcoming and inclusive communities and set norms around language and behavior (as documented in a Code of Conduct). Communication is also how we build relationships, trust, and respect for each other.

  • Create spaces for anonymous feedback: Variable answers between demographic subsets in our research suggest that while systematic approaches can be taken to reduce burnout, there is no one-size-fits-all approach. Feedback is a valuable tool for any project to adjust to the evolving needs of their contributor and user communities. When designed appropriately, surveys can serve as safe, anonymous, retaliation-free spaces for individuals to provide honest feedback.

How do contributors select projects?

We asked respondents to share their most important criteria when selecting an open source project to contribute to in their personal time. The top responses were: welcoming community, clear onboarding path, and code of conduct.
[Base: 517 international OSS developers, contributors, maintainers, and students who worked on open source in their personal time]

Within Google's Open Source Programs Office, we are constantly looking for ways to improve support for contributors inside and outside of Google. Studies such as this one provide guidance to our programs and investments in the community. This work shows us that we should continue to:

  • Invest in documentation competency: Google Season of Docs provides support for open source projects to improve their documentation and gives professional technical writers an opportunity to gain experience in open source.
  • Document roles and promote tactics that recognize work within communities: The ACROSS project continues to work with projects and communities to establish consistent language to define roles, responsibilities, and work done within open source projects.
  • Exercise and discuss ‘better’ practices within the community: While we continually seek to improve our engagement practices within communities, we will also continue to share these experiences with the broader community in hopes that we can all learn from our successes and challenges. For example, we’ve published documentation around our release process, including resources for the creation and management of a code of conduct.

This research, along with other articles authored by the OSPO research team is now available on our site.

By Sophia Vargas – Researcher, Google Open Source Programs Office

GSoC 2023: project results and feedback part 1



In 2023, Google Summer of Code brought 966 new contributors into open source software development to work with open source organizations on a 12+ week project. We had 168 participating open source organizations with mentors and contributors from over 75 countries this year.

For 19 years, Google Summer of Code has thrived thanks to the enthusiasm of our open source communities and the 19k+ volunteer mentors who have each spent 50-150 hours mentoring our 20k+ contributors since 2005! This year, 168 mentoring organizations and over 1,950 mentors are participating in the 2023 program. A sincere thank you to our mentors and organization administrators for guiding and supporting our contributors this year. We are also looking forward to hosting many of the 2023 GSoC mentors on campus this fall for the annual Mentor Summit.

September 4th concluded the standard 12-week project timeline, and we are pleased to announce that 628 contributors have successfully completed this year’s program as of today, September 5th, 2023. Congratulations to all the contributors and mentors who have wrapped up their summer coding projects!

2023 has shown us that GSoC continues to grow in popularity with students and developers 19 years after the program began. GSoC had a record high of 5,679 contributor applicants from 106 countries submitting project proposals this year. Interest in the program was also huge, with 43,765 registrants from 160 countries applying during the two-week application period.

The final step of every GSoC program is to hear back from mentors and contributors on their experiences through evaluations. This helps GSoC Admins continuously improve the program and gives us a chance to see the impact the program has on so many individuals! Some notable results and comments from the standard 12-week project length evaluations are below:

  • 95.63% of contributors think that GSoC helped their programming skills
  • 99.06% of contributors would recommend their GSoC mentors
  • 97.81% of contributors will continue working with their GSoC organization
  • 99.84% of contributors plan to continue working on open source
  • 82.81% of contributors said they would consider being a mentor
  • 96.25% of contributors said they would apply to GSoC again

Here’s some of what our GSoC 2023 Contributors had to say about the program!


At the suggestion of last year’s contributors, we added multiple live talks throughout the coding period to bring contributors together and provide tips to help them make the most of their GSoC experience. Each of these talks was attended on average by 42% of the 2023 GSoC contributors.

Another request from our previous contributors was to hear more about the cool projects their colleagues did over the summer and the opportunity to talk about their own projects with others. Over the coming weeks we are hosting three lightning talk sessions where over 40 of the 2023 contributors will have the opportunity to present their project learnings to the other contributors and their mentors.

We’ll be back in a couple of months to give a final update on the GSoC projects that will conclude later this year. Almost 30% of contributors (286 contributors) are still completing their projects, so please stay tuned for their results in part two of this blog post later this year!

By Perry Burnham – Google Open Source

ChromeOS EC testing suite in Renode for consumer products

Besides the main application cores that are directly exposed to users, many industrial and consumer devices include embedded controllers which, although fairly invisible to the user, perform critical system tasks such as power management and processing input from users or from sensors such as thermal sensors. Given their role in the system, these MCUs need to be rigorously tested in CI. This is why the ChromeOS team has collaborated with Antmicro to simulate the ChromeOS FPMCU (fingerprint MCU) module, based on the ChromeOS EC (Embedded Controller) firmware, in Antmicro’s Renode open source simulation framework.

This enables automated testing of embedded controllers in CI at scale, in a deterministic manner, and with complete observability. It also streamlines the developer feedback loop for faster development of the microcontroller firmware that ChromeOS uses to drive peripherals such as fingerprint readers or touchpads. To make this possible, we have improved the simulation capabilities for two of the microcontrollers used in FPMCU modules, popular in consumer electronics like Chromebooks and wearables but also in many industrial applications: STM32F412 and STM32H743.

[Image: Testing consumer-grade products with Renode]

Continuous testing for embedded systems

The project required implementing continuous testing of the FPMCU module against tens of binaries that exercise the controller in the most common operations and scenarios, to ensure maximum reliability at all times. A traditional approach would require reflashing the physical microcontroller memory with each binary, which is time-consuming and error-prone. To scratch that itch, we developed the CrOS EC Tester, which runs all EC tests in a Renode simulation and uses GitHub Actions to handle building and executing test binaries for a truly automated workflow, useful both in CI and in an interactive development environment.

Renode has broad support for architectures including (but not limited to) RISC-V, ARM Cortex-M, Cortex-A, and the recently added Cortex-R, and it runs binary-compatible software. Thus, it is not limited to testing embedded controllers; it can simulate entire multi-CPU systems. You can easily add Renode to an existing workflow without any major changes and test in real-life scenarios. By moving all testing efforts into Renode's interactive and deterministic environment, you can implement a fully CI-driven testing approach in your projects and benefit from advanced debugging, tracing, and prototyping capabilities.

Comprehensive simulation of STM32 microcontrollers

The Renode models of the STM32F412 and STM32H743 microcontrollers give you access to a broad range of peripherals, allowing you to run various scenarios you’d typically test on hardware. As a result of our collaboration with Google, we have added or improved models of ST peripherals like UART, EXTI, GPIO, DMA, ADC, SPI, flash controllers, timers, watchdogs, and more.

The need for in-depth testing has led to the introduction of many enhancements to ARM Cortex-M support in general, such as the MPU (Memory Protection Unit), which allows you to protect certain memory areas from unauthorized modification or access, and FPU interrupts. These features can now be used by other Cortex-M-based projects to further extend their test coverage with Renode.

Renode for rapid, interactive prototyping

One of the tests from our test suite used the microcontroller's MPU module to test address space security. When you run the test-rollback test case, you can see that the MPU simulated in Renode successfully protected the OS from unauthorized memory access:

[Image: Testing consumer-grade products with Renode]

Another Renode feature that allowed us to increase our test coverage of the EC ecosystem is support for dummy SPI and I2C devices. While Renode supports a recently added advanced framework for time-controlled feeding of sensor data, many scenarios require much simpler interaction with an external device. For this purpose, we developed a dummy SPI device that simply returns pre-programmed data to the controller, which allowed us to pass initialization tests for a sensor without modeling the sensor itself, as sketched below. From the functional point of view of the simulation, the dummy sensor data is identical to real data, which is useful when a specific component is difficult to model or lacks documentation.
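
To illustrate the idea (this is a minimal Python sketch of the concept, not Renode's actual peripheral API), a dummy SPI device can answer every transfer with pre-programmed bytes, which is enough to satisfy a driver's init-time identity check:

class DummySpiDevice:
    """Answers every SPI transfer with canned bytes (illustrative only)."""

    def __init__(self, canned_response: bytes):
        self._response = canned_response
        self._pos = 0

    def transmit(self, byte_in: int) -> int:
        # Ignore what the controller sends; return the next pre-programmed
        # byte, wrapping around so the controller can poll indefinitely.
        out = self._response[self._pos % len(self._response)]
        self._pos += 1
        return out

# E.g., a sensor driver that reads a WHO_AM_I register during initialization
# would receive the expected (hypothetical) ID without the sensor being modeled.
sensor_stub = DummySpiDevice(bytes([0x3B]))
assert sensor_stub.transmit(0x00) == 0x3B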

Build a CI-driven test workflow with Renode

Renode is a powerful tool for automating and simplifying the test workflow in the project at any stage of development, even pre-silicon. It helps you reduce the tedium typically associated with embedded software testing by providing a fully controllable environment that can lead to fewer bugs and vulnerabilities, which is naturally important for mass-market products such as Chromebooks.

By Michael Gielda – Antmicro

A vision for more efficient media management

Petit Press’ new open source, cloud-based DAM platform helps publishers get rich media content in front of their audience at pace and scale.

Picture the scene: you’re an investigative journalist who has just wrapped up a new piece of video content offering incisive, timely commentary on a pressing issue of the day. Your editor wants to get the content in front of your audience as quickly as possible, but you soon find yourself bogged down in a laborious, manual process of archiving and uploading files: a process that is subject to human error and involves repeating the same tasks as you prepare the content for YouTube and for embedding within an article.

With the development of a new open source digital asset management (DAM) system, Slovak publishing house, Petit Press, is hoping to help the wider publishing ecosystem overcome these types of challenges.

Striving towards a universal, open source solution

Like many publishers in today’s fast-paced, fast-changing news landscape, Petit Press was feeling the pressure to be more efficient and do more with less, while at the same time maximizing the amount of high-quality, rich media content its journalists could deliver. “We wanted to find a solution to two main asset delivery issues in particular,” says Ondrej Podstupka, deputy editor in chief of SME.sk. “Firstly, to reduce the volume of work involved in transferring files from our journalists to our admin teams to the various platforms and CMS we use. Secondly, to avoid the risk of misplacing archived files or losing them entirely in an archive built on legacy technologies.”

As a publisher of over 35 print and digital titles, including one of Slovakia’s most-visited news portals, SME.sk, Petit Press also had a first-hand understanding of how useful the solution might be if it could flex to the different publishing scales, schedules, and platforms found across the news industry. With encouragement and support from GNI, Petit Press challenged themselves to build an entirely open source, API-based DAM system that flexes beyond their own use cases and can be easily integrated with any CMS, meaning other publishers can adapt it and add functionality with minimal development costs.

Getting out of the comfort zone to overcome complexity

For the publisher, creating an open source project requires collaboration, skill development, and a strong sense of purpose. GNI inspired our team members to work together in a positive, creative, and supportive environment. Crucial resources from GNI also enabled the team to broaden the scope of the project beyond Petit Press’ direct requirements to cover the edge use cases and automations that a truly open source piece of software requires.

“GNI has enabled our organization to make our code open source, helping to create a more collaborative and innovative environment in the media industry.” 
– Ondrej Podstupka, deputy editor in chief of SME.sk

Building and developing the tool was difficult at times with a team of software engineers, product managers, newsroom managers, UX designers, testers, and cloud engineers all coming together to see the project to completion. For a team not used to working on GitHub, the open source aspect of the project proved the primary challenge. The team, however, also worked to overcome everything from understanding the complexities of integrating a podcast feature, to creating an interface all users felt comfortable with, to ensuring compliance with YouTube’s security requirements.

Unburdening the newsroom and minimizing costs

The hard work paid off, though, when the system initially launched in early 2023. Serving as a unified distribution platform, asset delivery service, and long-term archive, the single solution is already unburdening the newsroom. It also benefits the tech and admin teams by addressing concerns about the long-term costs of various media storage services.

On Petit Press’ own platforms, the DAM system has already been successfully integrated into SME.sk’s user-generated content (UGC) blog. This integration allows for seamless content management and curation, enhancing the overall user experience. The system also makes regulatory compliance easier, thanks to its GDPR-compliant user deletion process.

In addition to the UGC blog system, the DAM system has now launched for internal Petit Press users, specifically for managing video and podcast content, which has led to increased efficiency and organization within the team. By streamlining the video and podcast creation and distribution processes, Petit Press has already seen a 5-10% productivity boost. The new DAM system shaves an estimated 15-20 minutes of admin time off every piece of video/podcast content Petit Press produces.

Working towards bigger-picture benefits

Zooming out, the DAM system is also playing a central part in Petit Press’ year-long, org-wide migration to the cloud. This transformation was set in motion to enhance infrastructure, streamline processes, and improve overall efficiency within the department.

Podstupka also illustrates how the system might benefit other publishers. “It could be used as an effective standalone, automated archive for videos and podcasts,” he says. For larger publishing houses, “if you use [the DAM system] to distribute videos to YouTube and archive podcasts, there is minimal traffic cost and very low storage cost. But you still have full control over the content in case you decide to switch to a new distribution platform or video hosting service.”

As the team at Petit Press continues to get to grips with the new system, there is a clear goal in mind: To have virtually zero administrative overhead related to audio or video.

Beyond the automation-powered efficiency savings, the team at Petit Press are also exploring the new monetisation opportunities that the DAM system presents. They are currently working on a way to automatically redistribute audio and image assets to their video hosting platform, to automatically create video from every podcast they produce. This video is then pushed to their CMS and optimized for monetisation on the site with very little additional development required.

Ultimately, though, the open source nature of the system makes the whole team excited to see where other publishers and developers might take the product. “It’s a futureproof way to leverage media content with new services, platforms and ideas that emerge in technology or media landscapes,” says Igor, Head Of Development & Infrastructure. A succinct, but undeniably compelling way of summing up the system’s wide-ranging potential.

A guest post by the Petit Press team

Kubeflow joins the CNCF family

We are thrilled to announce a major milestone in the journey of the Kubeflow project. After a comprehensive review process and several months of meticulous preparation, Kubeflow has been accepted by the Cloud Native Computing Foundation (CNCF) as an incubating project. This momentous step marks a new chapter in our collaborative and open approach to accelerating machine learning (ML) in the cloud native ecosystem.

The acceptance of Kubeflow into the incubation stage by the CNCF reflects not just the project's maturity, but also its widespread adoption and expanding user base. It underscores the tremendous value of the diverse suite of components that Kubeflow provides, including Notebooks, Pipelines, Training Operators, Katib, Central Dashboard, Manifests, and many more. These tools have been instrumental in creating a cohesive, end-to-end ML platform that streamlines the development and deployment of ML workflows.

Furthermore, the alignment of Kubeflow with the CNCF acknowledges the project's foundational reliance on several existing CNCF projects such as Argo, Cert-Manager, and Istio. The joining of Kubeflow with the CNCF will serve to strengthen these existing relationships and foster greater collaboration among cloud native projects, leading to even more robust and innovative solutions for users.

Looking ahead, Google and the Kubeflow community are eager to collaborate with the CNCF on the transition process. Rest assured, our commitment to Kubeflow's ongoing development remains unwavering during this transition. We will continue to support new feature development, plan and execute upcoming releases, and strive to deliver further improvements to the Kubeflow project.

We extend our heartfelt thanks to the CNCF Technical Oversight Committee and the wider CNCF community for their support and recognition of the Kubeflow project. We look forward to this exciting new phase in our shared journey towards advancing machine learning in the cloud native landscape.

As Kubeflow continues to evolve, we invite developers, data scientists, ML engineers, and all other interested individuals to join us in shaping the future of cloud native machine learning. Let's innovate together, with Kubeflow and the CNCF, to make machine learning workflows more accessible, manageable, and scalable than ever before!

By James Liu – GCP Cloud AI

Google Dev Library Letters: 21st Edition

Posted by Swathi Dharshna Subbaraj, Google Dev Library

In this newsletter, we highlight the best projects developed with Google technologies that have been contributed to the Google Dev Library platform. We hope this will spark some inspiration for your next project!

Highlights of the Month

In the past two months, we asked contributors to look back, revisit, and update their older Dev Library contributions as a best practice. Most contributors took the time to revise their content and incorporate recent releases. This campaign encourages developers to update their repositories with the latest Google technologies, which is advantageous to users and the broader developer community.

Here are some of the standout up-to-date projects:

  • Sheets Compose Dialogs by Maximilian Keppeler

See how an Android library offers dialogs and views for various use cases, built with Jetpack Compose for Compose projects. All dialogs and views are easy and quick to implement.

  • Round Corner Progress Bar by Somkiat Khitwongwattana

Use this extensive “Rounded Corner progress bar” library for your own Android projects. 

During the campaign, we noticed that some new projects were submitted. Here are some of the new projects from our contributors:

  • Android TV sample projects by Ademir Queiroga

Explore Android TV sample projects covering the main topics of Android TV development; the projects follow Google's best practices with a few experience-based insights.

  • Storage provisioning with Cloud SQL using Workload Identity by Fermin Blanco

Learn how to create a production-ready GKE cluster in a matter of seconds.

Android


Using Android’s new Credential Manager API by Priya Sindkar
Dive into this blog on how Android's new Credential Manager API provides a seamless way for your app’s users to log in with one-click solutions.  

KStore by Isuru Rajapakse
Learn about this tiny Kotlin multiplatform library that assists in saving and restoring objects to and from disk using kotlinx.coroutines, kotlinx.serialization, and okio.

DevBricksX by Nan YE
Discover DevBricksX, a remarkable remake and extended version of DevBricks. The project covers various aspects of daily development, from low-level database tasks to user interface design, eliminating the need for repetitive work.

Dose app by Waseef Akhtar
Learn how Dose, a reminder app for people to take their medications on time, was built using Kotlin and Jetpack Compose with MVVM + clean architecture.  

Compose_adaptive_scaffold by Thomas Künneth
Explore how to write Jetpack Compose apps that support large screens and foldables.  

Cloud


Troubleshooting reachability with a Network Intelligence Center connectivity test by Gaurav Madan
Learn why network troubleshooting becomes crucial when time is of the essence, and how to troubleshoot reachability efficiently with a Network Intelligence Center connectivity test.

From data chaos to data insights with Google Cloud and GitLab CI: A cutting-edge solution by Gursimar Singh
Take a look at a streamlined, effective approach to acquiring important insights from data, and learn how to easily deal with the turmoil of manual data deployment and analysis.

Machine Learning


Client-side in-decent content checking
Discover a JavaScript library to help you quickly identify unseemly images, all in the client's browser.

YoloV7 in Tensorflow.js by Hugo Zanini
Learn object detection using YOLOv7 in TensorFlow.js, and how the model, trained on the MS COCO dataset, recognizes up to 80 different classes.

Flutter


Exploring Inherited Widget: The powerful state management solution by Muhammad Salman
Take a deep dive into the backstory of state management in Flutter and explore one of the most important concepts in Flutter state management, the Inherited Widget.  

Control your Flutter app on the fly with Firebase Remote Config by Mangirdas Kazlauskas
Get an overview of Firebase Remote Config and learn how to use it to enable real-time features in your Flutter application.

The ultimate Flutter Navigator 2.0 series using AutoRoute by Cavin Macwan
Explore the differences between Navigator 1.0 and 2.0 and why you need Navigator 2.0. You’ll also learn how to implement Navigator 2.0 using the AutoRoute package in Flutter.

Angular


Papanasi (UI library) by Quique Fdez Guerra
Learn to use this frontend UI library across frameworks.  

How to manage complex forms in Angular by Roland Tubongye Wabubindja
See how to save and modify data from a form containing several FormArray.  

Community Updates


🚀 Announcing Google Maps Platform added to Dev Library

[Image: Google Maps Platform in Dev Library]

Google Maps Platform has now been officially added to the Dev Library! With these resources, developers can create applications that enable them to visualize geospatial data and build projects ranging from hyperlocal logistics to location-driven app development, and have access to even more resources to take their projects to the next level.

Dev Library contributors will be better able to write and create innovative and useful applications that utilize Google’s mapping, places, and routing data and features.

Visit the Google Maps Platform product page in Dev Library



Browse Dev Library | Google Developers Online on Discord | Newsletter Archives

An open-source gymnasium for machine learning assisted computer architecture design

Computer architecture research has a long history of developing simulators and tools to evaluate and shape the design of computer systems. For example, the SimpleScalar simulator was introduced in the late 1990s and allowed researchers to explore various microarchitectural ideas. Computer architecture simulators and tools, such as gem5, DRAMSys, and many more, have played a significant role in advancing computer architecture research. Since then, these shared resources and infrastructure have benefited industry and academia and have enabled researchers to systematically build on each other's work, leading to significant advances in the field.

Nonetheless, computer architecture research is evolving, with industry and academia turning towards machine learning (ML) optimization to meet stringent domain-specific requirements, such as ML for computer architecture, ML for TinyML acceleration, DNN accelerator datapath optimization, memory controllers, power consumption, security, and privacy. Although prior work has demonstrated the benefits of ML in design optimization, the lack of strong, reproducible baselines hinders fair and objective comparison across different methods and poses several challenges to their deployment. To ensure steady progress, it is imperative to understand and tackle these challenges collectively.

To alleviate these challenges, in “ArchGym: An Open-Source Gymnasium for Machine Learning Assisted Architecture Design”, accepted at ISCA 2023, we introduced ArchGym, which includes a variety of computer architecture simulators and ML algorithms. Enabled by ArchGym, our results indicate that with a sufficiently large number of samples, any of a diverse collection of ML algorithms is capable of finding the optimal set of architecture design parameters for each target problem; no one solution is necessarily better than another. These results further indicate that selecting the optimal hyperparameters for a given ML algorithm is essential for finding the optimal architecture design, but choosing them is non-trivial. We release the code and dataset across multiple computer architecture simulations and ML algorithms.


Challenges in ML-assisted architecture research

ML-assisted architecture research poses several challenges, including:

  1. For a specific ML-assisted computer architecture problem (e.g., finding an optimal solution for a DRAM controller), there is no systematic way to identify optimal ML algorithms or hyperparameters (e.g., learning rate, warm-up steps, etc.). There is a wide range of ML and heuristic methods, from random walk to reinforcement learning (RL), that can be employed for design space exploration (DSE). While these methods have shown noticeable performance improvements over their chosen baselines, it is not evident whether the improvements are because of the choice of optimization algorithms or of hyperparameters.

    Thus, to ensure reproducibility and facilitate widespread adoption of ML-aided architecture DSE, it is necessary to outline a systematic benchmarking methodology.

  2. While computer architecture simulators have been the backbone of architectural innovations, there is an emerging need to address the trade-offs between accuracy, speed, and cost in architecture exploration. The accuracy and speed of performance estimation vary widely from one simulator to another, depending on the underlying modeling details (e.g., cycle-accurate vs. ML-based proxy models). While analytical or ML-based proxy models are nimble by virtue of discarding low-level details, they generally suffer from high prediction error. Also, due to commercial licensing, there can be strict limits on the number of runs collected from a simulator. Overall, these constraints exhibit distinct performance vs. sample efficiency trade-offs, affecting the choice of optimization algorithm for architecture exploration.

    It is challenging to delineate how to systematically compare the effectiveness of various ML algorithms under these constraints.

  3. Finally, the landscape of ML algorithms is rapidly evolving and some ML algorithms need data to be useful. Additionally, rendering the outcome of DSE into meaningful artifacts such as datasets is critical for drawing insights about the design space.

    In this rapidly evolving ecosystem, it is consequential to understand how to amortize the overhead of search algorithms for architecture exploration. It is not apparent, nor systematically studied, how to leverage exploration data while remaining agnostic to the underlying search algorithm.

ArchGym design

ArchGym addresses these challenges by providing a unified framework for evaluating different ML-based search algorithms fairly. It comprises two main components: 1) the ArchGym environment and 2) the ArchGym agent. The environment is an encapsulation of the architecture cost model — which includes latency, throughput, area, energy, etc., to determine the computational cost of running the workload, given a set of architectural parameters — paired with the target workload(s). The agent is an encapsulation of the ML algorithm used for the search and consists of hyperparameters and a guiding policy. The hyperparameters are intrinsic to the algorithm for which the model is to be optimized and can significantly influence performance. The policy, on the other hand, determines how the agent selects a parameter iteratively to optimize the target objective.

Notably, ArchGym also includes a standardized interface that connects these two components, while also saving the exploration data as the ArchGym Dataset. At its core, the interface entails three main signals: hardware state, hardware parameters, and metrics. These signals are the bare minimum to establish a meaningful communication channel between the environment and the agent. Using these signals, the agent observes the state of the hardware and suggests a set of hardware parameters to iteratively optimize a (user-defined) reward. The reward is a function of hardware performance metrics, such as performance, energy consumption, etc. 

ArchGym comprises two main components: the ArchGym environment and the ArchGym agent. The ArchGym environment encapsulates the cost model and the agent is an abstraction of a policy and hyperparameters. With a standardized interface that connects these two components, ArchGym provides a unified framework for evaluating different ML-based search algorithms fairly while also saving the exploration data as the ArchGym Dataset.
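
To make the interface concrete, here is a minimal Python sketch of the agent-environment loop implied by those three signals. All class and method names are illustrative assumptions rather than ArchGym's actual API, and the cost model is a toy stand-in for a simulator such as DRAMSys.

import random

class ToyDramEnv:
    """Stand-in for an ArchGym environment: wraps an architecture cost model."""

    PARAM_SPACE = {
        "queue_depth": [8, 16, 32, 64],
        "page_policy": ["open", "closed"],
    }

    def step(self, params):
        # Toy cost model: deeper queues and an open-page policy lower latency.
        latency = (100.0
                   - 0.3 * params["queue_depth"]
                   - (10.0 if params["page_policy"] == "open" else 0.0))
        metrics = {"latency": latency}
        reward = -latency  # user-defined reward derived from the metrics
        return metrics, reward

class RandomWalkAgent:
    """Stand-in for an ArchGym agent: a guiding policy plus hyperparameters."""

    def suggest(self):
        return {k: random.choice(v) for k, v in ToyDramEnv.PARAM_SPACE.items()}

env, agent = ToyDramEnv(), RandomWalkAgent()
log = []  # exploration data that would be saved as an ArchGym Dataset
best_params, best_reward = None, float("-inf")
for _ in range(100):
    params = agent.suggest()
    metrics, reward = env.step(params)
    log.append((params, metrics, reward))
    if reward > best_reward:
        best_params, best_reward = params, reward
print("best parameters:", best_params, "reward:", best_reward)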

ML algorithms could be equally favorable to meet user-defined target specifications

Using ArchGym, we empirically demonstrate that across different optimization objectives and DSE problems, for each ML algorithm there exists at least one set of hyperparameters that results in the same hardware performance that other ML algorithms achieve. A poorly selected (e.g., randomly selected) hyperparameter for the ML algorithm or its baseline can lead to the misleading conclusion that a particular family of ML algorithms is better than another. We show that with sufficient hyperparameter tuning, different search algorithms, even random walk (RW), are able to identify the best possible reward. However, note that finding the right set of hyperparameters may require an exhaustive search or even luck to make an algorithm competitive.

With a sufficient number of samples, there exists at least one set of hyperparameters that results in the same performance across a range of search algorithms. Here the dashed line represents the maximum normalized reward. Cloud-1, cloud-2, stream, and random indicate four different memory traces for DRAMSys (DRAM subsystem design space exploration framework).

Dataset construction and high-fidelity proxy model training

Creating a unified interface using ArchGym also enables the creation of datasets that can be used to design better data-driven, ML-based proxy architecture cost models to improve the speed of architecture simulation. To evaluate the benefits of datasets in building an ML model to approximate architecture cost, we leverage ArchGym’s ability to log the data from each run of DRAMSys to create four dataset variants, each with a different number of data points. For each variant, we create two categories: (a) Diverse Dataset, which represents the data collected from different agents (ACO, GA, RW, and BO), and (b) ACO only, which shows the data collected exclusively from the ACO agent; both are released along with ArchGym. We train a proxy model on each dataset using random forest regression with the objective of predicting the latency of designs for a DRAM simulator (a minimal sketch of this kind of proxy-model fit follows the list below). Our results show that:

  1. As we increase the dataset size, the average normalized root mean squared error (RMSE) slightly decreases.
  2. However, as we introduce diversity in the dataset (e.g., collecting data from different agents), we observe 9× to 42× lower RMSE across different dataset sizes.
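
As a minimal sketch of this kind of proxy-model fit (the feature layout and data here are synthetic assumptions; in the study the pairs come from DRAMSys runs logged through ArchGym), the following trains a random forest to predict latency and reports a normalized RMSE:

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 4))  # four hypothetical architecture parameters
y = 50 + 20 * X[:, 0] - 10 * X[:, 1] + rng.normal(scale=2.0, size=2000)  # latency

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
proxy = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
rmse = mean_squared_error(y_te, proxy.predict(X_te), squared=False)
print(f"normalized RMSE: {rmse / (y_te.max() - y_te.min()):.3f}")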

[Figure: Diverse dataset collection across different agents using the ArchGym interface.]
[Figure: The impact of a diverse dataset and dataset size on the normalized RMSE.]

The need for a community-driven ecosystem for ML-assisted architecture research

While ArchGym is an initial effort towards creating an open source ecosystem that (1) connects a broad range of search algorithms to computer architecture simulators in a unified and easy-to-extend manner, (2) facilitates research in ML-assisted computer architecture, and (3) forms the scaffold for developing reproducible baselines, there are a lot of open challenges that need community-wide support. Below we outline some of the open challenges in ML-assisted architecture design. Addressing these challenges requires a well-coordinated effort and a community-driven ecosystem.

Key challenges in ML-assisted architecture design.

We call this ecosystem Architecture 2.0. We outline the key challenges and a vision for building an inclusive ecosystem of interdisciplinary researchers to tackle the long-standing open problems in applying ML for computer architecture research. If you are interested in helping shape this ecosystem, please fill out the interest survey.


Conclusion

ArchGym is an open source gymnasium for ML-assisted architecture DSE that provides a standardized interface which can be readily extended to suit different use cases. Additionally, ArchGym enables fair and reproducible comparison between different ML algorithms and helps to establish stronger baselines for computer architecture research problems.

We invite the computer architecture community as well as the ML community to actively participate in the development of ArchGym. We believe that the creation of a gymnasium-type environment for computer architecture research would be a significant step forward in the field and provide a platform for researchers to use ML to accelerate research and lead to new and innovative designs.


Acknowledgements

This blogpost is based on joint work with several co-authors at Google and Harvard University. We would like to acknowledge and highlight Srivatsan Krishnan (Harvard) who contributed several ideas to this project in collaboration with Shvetank Prakash (Harvard), Jason Jabbour (Harvard), Ikechukwu Uchendu (Harvard), Susobhan Ghosh (Harvard), Behzad Boroujerdian (Harvard), Daniel Richins (Harvard), Devashree Tripathy (Harvard), and Thierry Thambe (Harvard).  In addition, we would also like to thank James Laudon, Douglas Eck, Cliff Young, and Aleksandra Faust for their support, feedback, and motivation for this work. We would also like to thank John Guilyard for the animated figure used in this post. Amir Yazdanbakhsh is now a Research Scientist at Google DeepMind and Vijay Janapa Reddi is an Associate Professor at Harvard.




Source: Google AI Blog