Kubernetes CRD Validation Using CEL

Motivation

CRDs support two major categories of built-in validation:

  • CRD structural schemas: Provide type checking of custom resources against schemas.
  • OpenAPIv3 validation rules: Provide regex matching ('pattern' property), range limits on individual fields ('minimum' and 'maximum' properties), and size limits on maps and lists ('minItems', 'maxItems').

For use cases that cannot be covered by built-in validation support, there are two options:

  • Admission webhooks: run a validating admission webhook for further validation
  • Custom validators: write custom checks in languages such as Rego

While admission webhooks do support CRD validation, they significantly complicate the development and operability of CRDs.

To provide self-contained, in-process validation, an inline expression language, the Common Expression Language (CEL), was introduced into CRDs so that a much larger portion of validation use cases can be solved without webhooks.

CEL is sufficiently lightweight and safe to run directly in the kube-apiserver, has a straightforward and unsurprising grammar, and supports pre-parsing and type checking of expressions, allowing syntax and type errors to be caught at CRD registration time.


CRD validation rules

CRD validation rules, which were promoted to GA in Kubernetes 1.29, validate custom resources using CEL expressions.

Validation rules use the Common Expression Language (CEL) to validate custom resource values. Validation rules are included in CustomResourceDefinition schemas using the x-kubernetes-validations extension.

Each rule is scoped to the location of the x-kubernetes-validations extension in the schema, and the self variable in the CEL expression is bound to the scoped value.
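For instance, a rule placed on a string field sees that string, rather than the whole object, as self. A minimal sketch (the host field here is hypothetical):

  host:
    type: string
    x-kubernetes-validations:
      - rule: "self.startsWith('api.')"
        message: "host must start with 'api.'"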

All validation rules are scoped to the current object: no cross-object or stateful validation rules are supported.

For example:

...
  openAPIV3Schema:
    type: object
    properties:
      spec:
        type: object
        x-kubernetes-validations:
          - rule: "self.minReplicas <= self.replicas"
            message: "replicas should be greater than or equal to minReplicas."
          - rule: "self.replicas <= self.maxReplicas"
            message: "replicas should be smaller than or equal to maxReplicas."
        properties:
          ...
          minReplicas:
            type: integer
          replicas:
            type: integer
          maxReplicas:
            type: integer
        required:
          - minReplicas
          - replicas
          - maxReplicas

will reject a request to create this custom resource:

apiVersion: "stable.example.com/v1"


kind: CronTab


metadata:


 name: my-new-cron-object


spec:


 minReplicas: 0


 replicas: 20


 maxReplicas: 10

with the response:

The CronTab "my-new-cron-object" is invalid:
* spec: Invalid value: map[string]interface {}{"maxReplicas":10, "minReplicas":0, "replicas":20}: replicas should be smaller than or equal to maxReplicas.

x-kubernetes-validations may contain multiple rules. Each rule holds the expression that CEL will evaluate, and message holds the text displayed when that rule fails.

Note: You can quickly test CEL expressions in CEL Playground.

Validation rules are compiled when CRDs are created or updated. The create or update request will fail if compilation of any validation rule fails. The compilation process includes type checking as well.

Validation rules support a wide range of use cases. To get a sense of some of the capabilities, let's look at a few examples:

  • self.minReplicas <= self.replicas: Validate that an integer field is less than or equal to another integer field.
  • 'Available' in self.stateCounts: Validate that an entry with the 'Available' key exists in a map.
  • self.set1.all(e, !(e in self.set2)): Validate that the elements of two sets are disjoint.
  • self == oldSelf: Validate that a required field is immutable once it is set.
  • self.created + self.ttl < self.expired: Validate that the 'expired' date is after the 'created' date plus the 'ttl' duration.

Validation rules are expressive and flexible. See the Validation Rules documentation to learn more about what validation rules are capable of.


CRD transition rules

Transition Rules make it possible to compare the new state against the old state of a resource in validation rules. You use transition rules to make sure that the cluster's API server does not accept invalid state transitions. A transition rule is a validation rule that references 'oldSelf'. The API server only evaluates transition rules when both an old value and new value exist.

Transition rule examples:

  • self == oldSelf: For a required field, make that field immutable once it is set. For an optional field, only allow transitioning from unset to set, or from set to unset.
  • has(self.field) == has(oldSelf.field) on the parent of the field, together with self == oldSelf on the field itself: Make a field immutable; validate that a field, even if optional, never changes after the resource is created (for a required field, the previous rule is simpler).
  • self.all(x, x in oldSelf): Only allow adding items to a field that represents a set (prevent removals).
  • self >= oldSelf: Validate that a number is monotonically increasing.
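In schema form, a transition rule is attached like any other validation rule. A minimal sketch of the monotonic counter case from the list above (the field name is illustrative):

  counter:
    type: integer
    x-kubernetes-validations:
      - rule: "self >= oldSelf"
        message: "counter may not decrease"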

Using the Function Libraries

Validation rules have access to several function libraries, including the CEL standard functions and macros, the CEL extended string library, and the Kubernetes CEL extension library.

Examples of function libraries in use:

  • !(self.getDayOfWeek() in [0, 6]): Validate that a date is not a Sunday or Saturday.
  • isURL(self) && url(self).getHostname() in ['a.example.com', 'b.example.com']: Validate that a URL has an allowed hostname.
  • self.map(x, x.weight).sum() == 1: Validate that the weights of a list of objects sum to 1.
  • int(self.find('^[0-9]*')) < 100: Validate that a string starts with a number less than 100.
  • self.isSorted(): Validate that a list is sorted.

Resource use and limits

To prevent CEL evaluation from consuming excessive compute resources, validation rules impose some limits. These limits are based on CEL cost units, a platform- and machine-independent measure of execution cost. As a result, the limits are the same regardless of where they are enforced.


Estimated cost limit

CEL is, by design, non-Turing-complete in such a way that the halting problem isn’t a concern. CEL takes advantage of this design choice to include an "estimated cost" subsystem that can statically compute the worst-case run time cost of any CEL expression. Validation rules are integrated with the estimated cost system and disallow CEL expressions from being included in CRDs if they have a sufficiently poor (high) estimated cost. The estimated cost limit is set quite high and typically requires an O(n²) or worse operation, across something of unbounded size, to be exceeded. Fortunately the fix is usually quite simple: because the cost system is aware of size limits declared in the CRD's schema, CRD authors can add size limits to the CRD's schema (maxItems for arrays, maxProperties for maps, maxLength for strings) to reduce the estimated cost.

Good practice:

Set maxItems, maxProperties and maxLength on all array, map (object with additionalProperties) and string types in CRD schemas! This results in lower and more accurate estimated costs and generally makes a CRD safer to use.
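As a sketch of this practice, the schema fragment below bounds an array, a map, and a string; the field names and limits are illustrative:

  hosts:
    type: array
    maxItems: 16          # bounds the cost of per-item rules
    items:
      type: string
      maxLength: 253      # bounds the cost of string operations
  labels:
    type: object
    maxProperties: 32     # bounds the cost of per-entry rules
    additionalProperties:
      type: string
      maxLength: 63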


Runtime cost limits for CRD validation rules

In addition to the estimated cost limit, CEL keeps track of actual cost while evaluating a CEL expression and will halt execution of the expression if a limit is exceeded.

With the estimated cost limit already in place, the runtime cost limit is rarely encountered. But it is possible. For example, it might be encountered for a large resource composed entirely of a single large list and a validation rule that is either evaluated on each element in the list, or traverses the entire list.

CRD authors can ensure the runtime cost limit will not be exceeded in much the same way the estimated cost limit is avoided: by setting maxItems, maxProperties and maxLength on array, map and string types.


Adoption and Related work

This feature has been turned on by default since Kubernetes 1.25 and finally graduated to GA in Kubernetes 1.29. It raised a lot of interest and is now widely used in the Kubernetes ecosystem. We are excited to share that the Gateway API was able to replace all of the validating webhooks it previously used with this feature.

After introducing CEL into Kubernetes, we are excited to expand its power to multiple areas, including the admission chain and authorization configuration. We will cover these in a separate blog post.

We look forward to working with the community on the continued adoption of CRD validation rules, and hope to see the feature improve further in upcoming Kubernetes releases.


Acknowledgements

Special thanks to Joe Betz, Kermit Alexander, Ben Luddy, Jordan Liggitt, David Eads, Daniel Smith, Dr. Stefan Schimanski, Leila Jalali and everyone who contributed to CRD Validation Rules!

By Cici Huang – Software Engineer


Open sourcing Project Guideline: A platform for computer vision accessibility technology

Two years ago we announced Project Guideline, a collaboration between Google Research and Guiding Eyes for the Blind that enabled people with visual impairments (e.g., blindness and low-vision) to walk, jog, and run independently. Using only a Google Pixel phone and headphones, Project Guideline leverages on-device machine learning (ML) to navigate users along outdoor paths marked with a painted line. The technology has been tested all over the world and even demonstrated during the opening ceremony at the Tokyo 2020 Paralympic Games.

Since the original announcement, we set out to improve Project Guideline by adding new features, such as obstacle detection and advanced path planning, to safely and reliably navigate users through more complex scenarios (such as sharp turns and nearby pedestrians). The early version featured a simple frame-by-frame image segmentation that detected the position of the path line relative to the image frame. This was sufficient for orienting the user to the line, but provided limited information about the surrounding environment. Improving the navigation signals, such as alerts for obstacles and upcoming turns, required a much better understanding and mapping of the users’ environment. To solve these challenges, we built a platform that can be utilized for a variety of spatially-aware applications in the accessibility space and beyond.

Today, we announce the open source release of Project Guideline, making it available for anyone to use to improve upon and build new accessibility experiences. The release includes source code for the core platform, an Android application, pre-trained ML models, and a 3D simulation framework.


System design

The primary use case is an Android application; however, we wanted to be able to run, test, and debug the core logic in a variety of environments in a reproducible way. This led us to design and build the system using C++ for close integration with MediaPipe and other core libraries, while still being able to integrate with Android using the Android NDK.

Under the hood, Project Guideline uses ARCore to estimate the position and orientation of the user as they navigate the course. A segmentation model, built on the DeepLabV3+ framework, processes each camera frame to generate a binary mask of the guideline (see the previous blog post for more details). Points on the segmented guideline are then projected from image-space coordinates onto a world-space ground plane using the camera pose and lens parameters (intrinsics) provided by ARCore. Since each frame contributes a different view of the line, the world-space points are aggregated over multiple frames to build a virtual mapping of the real-world guideline. The system performs piecewise curve approximation of the guideline world-space coordinates to build a spatio-temporally consistent trajectory. This allows refinement of the estimated line as the user progresses along the path.

Project Guideline builds a 2D map of the guideline, aggregating detected points in each frame (red) to build a stateful representation (blue) as the runner progresses along the path.

A control system dynamically selects a target point on the line some distance ahead based on the user’s current position, velocity, and direction. An audio feedback signal is then given to the user to adjust their heading to coincide with the upcoming line segment. By using the runner’s velocity vector instead of camera orientation to compute the navigation signal, we eliminate noise caused by irregular camera movements common during running. We can even navigate the user back to the line while it’s out of camera view, for example if the user overshot a turn. This is possible because ARCore continues to track the pose of the camera, which can be compared to the stateful line map inferred from previous camera images.

Project Guideline also includes obstacle detection and avoidance features. An ML model is used to estimate depth from single images. To train this monocular depth model, we used SANPO, a large dataset of outdoor imagery from urban, park, and suburban environments that was curated in-house. The model is capable of detecting the depth of various obstacles, including people, vehicles, posts, and more. The depth maps are converted into 3D point clouds, similar to the line segmentation process, and used to detect the presence of obstacles along the user’s path and then alert the user through an audio signal.

Using a monocular depth ML model, Project Guideline constructs a 3D point cloud of the environment to detect and alert the user of potential obstacles along the path.

A low-latency audio system based on the AAudio API was implemented to provide the navigational sounds and cues to the user. Several sound packs are available in Project Guideline, including a spatial sound implementation using the Resonance Audio API. The sound packs were developed by a team of sound researchers and engineers at Google who designed and tested many different sound models. The sounds use a combination of panning, pitch, and spatialization to guide the user along the line. For example, a user veering to the right may hear a beeping sound in the left ear to indicate the line is to the left, with increasing frequency for a larger course correction. If the user veers further, a high-pitched warning sound may be heard to indicate the edge of the path is approaching. In addition, a clear “stop” audio cue is always available in the event the user veers too far from the line, an anomaly is detected, or the system fails to provide a navigational signal.

Project Guideline has been built specifically for Google Pixel phones with the Google Tensor chip. The Google Tensor chip enables the optimized ML models to run on-device with higher performance and lower power consumption. This is critical for providing real-time navigation instructions to the user with minimal delay. On a Pixel 8 there is a 28x latency improvement when running the depth model on the Tensor Processing Unit (TPU) instead of CPU, and 9x improvement compared to GPU.



Testing and simulation

Project Guideline includes a simulator that enables rapid testing and prototyping of the system in a virtual environment. Everything from the ML models to the audio feedback system runs natively within the simulator, giving the full Project Guideline experience without needing all the hardware and physical environment set up.

Screenshot of Project Guideline simulator.


Future direction

To carry the technology forward, WearWorks has become an early adopter and teamed up with Project Guideline to integrate their patented haptic navigation experience, utilizing haptic feedback in addition to sound to guide runners. WearWorks has been developing haptics for over 8 years, and previously empowered the first blind marathon runner to complete the NYC Marathon without sighted assistance. We hope that integrations like these will lead to new innovations and make the world a more accessible place.

The Project Guideline team is also working towards removing the painted line completely, using the latest advancements in mobile ML technology, such as the ARCore Scene Semantics API, which can identify sidewalks, buildings, and other objects in outdoor scenes. We invite the accessibility community to build upon and improve this technology while exploring new use cases in other fields.


Acknowledgements

Many people were involved in the development of Project Guideline and the technologies behind it. We’d like to thank Project Guideline team members: Dror Avalon, Phil Bayer, Ryan Burke, Lori Dooley, Song Chun Fan, Matt Hall, Amélie Jean-aimée, Dave Hawkey, Amit Pitaru, Alvin Shi, Mikhail Sirotenko, Sagar Waghmare, John Watkinson, Kimberly Wilber, Matthew Willson, Xuan Yang, Mark Zarich, Steven Clark, Jim Coursey, Josh Ellis, Tom Hoddes, Dick Lyon, Chris Mitchell, Satoru Arao, Yoojin Chung, Joe Fry, Kazuto Furuichi, Ikumi Kobayashi, Kathy Maruyama, Minh Nguyen, Alto Okamura, Yosuke Suzuki, and Bryan Tanaka. Thanks to ARCore contributors: Ryan DuToit, Abhishek Kar, and Eric Turner. Thanks to Alec Go, Jing Li, Liviu Panait, Stefano Pellegrini, Abdullah Rashwan, Lu Wang, Qifei Wang, and Fan Yang for providing ML platform support. We’d also like to thank Hartwig Adam, Tomas Izo, Rahul Sukthankar, Blaise Aguera y Arcas, and Huisheng Wang for their leadership support. Special thanks to our partners Guiding Eyes for the Blind and Achilles International.

Source: Google AI Blog


Enabling large-scale health studies for the research community

As consumer technologies like fitness trackers and mobile phones become more widely used for health-related data collection, so does the opportunity to leverage these data pathways to study and advance our understanding of medical conditions. We have previously touched upon how our work explores the use of this technology within the context of chronic diseases, in particular multiple sclerosis (MS). This effort leverages the FDA MyStudies platform, an open source platform for creating clinical study apps that makes it easier for anyone to run their own studies and collect good-quality healthcare data in a trusted and safe way.

Today, we describe the setup that we developed by expanding the FDA MyStudies platform and demonstrate how it can be used to set up a digital health study. We also present our exploratory research study created through this platform, called MS Signals, which consists of a symptom tracking app for MS patients. The goal for this app is twofold: 1) to ensure that the enhancements to the FDA MyStudies platform made for a more streamlined study creation experience; and 2) to understand how new data collection mechanisms can be used to revolutionize patients’ chronic disease management and tracking. We have open sourced our extension to the FDA MyStudies platform under the Apache 2.0 license to provide a resource for the community to build their own studies.


Extending the FDA MyStudies platform

The original FDA MyStudies platform allowed people to configure their own study apps, manage participants, and create separate iOS and Android apps. To simplify the study creation process and ensure increased study engagement, we made a number of accessibility changes. Some of the main improvements include: cross-platform (iOS and Android) app generation through the use of Flutter, an open source framework by Google for building multi-platform applications from a single codebase; a simplified setup, so that users can prototype their study quickly (under a day in most cases); and, most importantly, an emphasis on accessibility so that diverse patients’ voices are heard. The accessibility enhancements include changes to the underlying features of the platform and to the particular study design of the MS Signals study app.


Multi-platform support with rapid prototyping

We decided on the use of Flutter as it would be a single point that would generate both iOS and Android apps in one go, reducing the work required to support multiple platforms. Flutter also provides hot-reloading, which allows developers to build and preview features quickly. The design system in the app takes advantage of this feature to provide a central point from which the branding and theme of the app can be changed to match the tone of a new study and previewed instantly. The demo environment in the app also utilizes this feature to allow developers to mock and preview questionnaires locally on their machines. In our experience this has been a huge time-saver in A/B testing the UX and the format and wording of questions live with clinicians.


System accessibility enhancements

To improve the accessibility of the platform for more users, we made several usability enhancements:

  1. Light & dark theme support
  2. Bold text & variable font-sizes
  3. High-contrast mode
  4. Improving user awareness of accessibility settings

Extended exposure to bright light themes can strain the eyes, so supporting dark theme features was necessary to make it easier to use the study app frequently. Some small or light text-elements are illegible to users with vision impairments, so we added 1) bold-text and support for larger font-sizes and 2) high-contrast color-schemes. To ensure that accessibility settings are easy to find, we placed an introductory one-time screen that was presented during the app’s first launch, which would directly take users to their system accessibility settings.


Study accessibility enhancements

To make the study itself easier to interact with and reduce cognitive overload, we made the following changes:

  1. Clarified the onboarding process
  2. Improved design for questionnaires

First, we clarified the on-boarding process by presenting users with a list of required steps when they first open the app in order to reduce confusion and participant drop-off.

The original questionnaire design in the app presented each question in a card format, which utilizes part of the screen for shadows and depth effects of the card. In many situations, this is a pleasant aesthetic, but in apps where accessibility is priority, these visual elements restrict the space available on the screen. Thus, when more accessible, larger font-sizes are used there are more frequent word breaks, which reduces readability. We fixed this simply by removing the card design elements and instead using the entire screen, allowing for better visuals with larger font-sizes.


The MS Signals prototype study

To test the usability of these changes, we used our redesigned platform to create a prototype study app called MS Signals, which uses surveys to gather information about a participant’s MS-related symptoms.

MS Signals app screenshots.

MS Studies app design

As a first step, before entering any study information, participants are asked to complete an eligibility and study comprehension questionnaire to ensure that they have read through the potentially lengthy terms of study participation. This might include, for example, questions like "In what country is the study available?" or "Can you withdraw from the study?" A section like this is common in most health studies, and it tends to be the first drop-off point for participants.

To minimize study drop-off at this early stage, we kept the eligibility test brief and reflected correct answers for the comprehension test back to the participants. This helps minimize the number of times a user may need to go through the initial eligibility questionnaire and ensures that the important aspects of the study protocol are made clear to them.

After successful enrollment, participants are taken to the main app view, which consists of three pages:

  • Activities:
    This page lists the questionnaires available to the participant and is where the majority of their time is spent. The questionnaires vary in frequency — some are one-time surveys created to gather medical history, while others are repeated daily, weekly or monthly, depending on the symptom or area they are exploring. For the one-time survey we provide a counter above each question to signal to users how far they have come and how many questions are left, similar to the questionnaire during the eligibility and comprehension step.
  • Dashboard:
    To ensure that participants get something back in return for the information they enter during a study, the Dashboard area presents a summary of their responses in graph or pie chart form. Participants could potentially show this data to their care provider as a summary of their condition over the last 6 months, an improvement over the traditional pen and paper methods that many employ today.
  • Resources:
    A set of useful links, help articles and common questions related to MS.

Questionnaire design

Since needing to frequently input data can lead to cognitive overload, participant drop off, and bad data quality, we reduced the burden in two ways:

  1. We break down large questionnaires into smaller ones, resulting in 6 daily surveys, containing 3–5 questions each, where each question is multiple choice and related to a single symptom. This way we cover a total of 20 major symptoms, and present them in a similar way to how a clinician would ask these questions in an in-clinic setting.
  2. We ensure previously entered information is readily available in the app, along with the time of the entry.

In designing the survey content, we collaborated closely with experienced clinicians and researchers to finalize the wording and layout. While studies in this field typically use the Likert scale to gather symptom information, we defined a more intuitive verbose scale to provide better experience for participants tracking their disease and the clinicians or researchers viewing the disease history. For example, in the case of vision issues, rather than asking participants to rate their symptoms on a scale from 1 to 10, we instead present a multiple choice question where we detail common vision problems that they may be experiencing.

This verbose scale helps patients track their symptoms more accurately by including context that helps them more clearly define their symptoms. This approach also allows researchers to answer questions that go beyond symptom correlation. For example, for vision issues, data collected using the verbose scale would reveal to researchers whether nystagmus is more prominent in patients with MS compared to double vision.

Side-by-side comparison with a Likert scale on the left, and a Verbose scale on the right.

Focusing on accessibility

Mobile-based studies can often present additional challenges for participants with chronic conditions: the text can be hard to read, the color contrast could make it difficult to see certain bits of information, or it may be challenging to scroll through pages. This may result in participant drop off, which, in turn, could yield a biased dataset if the people who are experiencing more advanced forms of a disease are unable to provide data.

In order to prevent such issues, we include the following accessibility features:

  • Throughout, we employ color blind accessible color schemes. This includes improving the contrast between crucial text and important additional information, which might otherwise be presented in a smaller font and a faded text color.
  • We reduced the amount of movement required to access crucial controls by placing all buttons close to the bottom of the page and ensuring that pop-ups are controllable from the bottom part of the screen.

To test the accessibility of MS Signals, we collaborated with the National MS Society to recruit participants for a user experience study. For this, a call for participation was sent out by the Society to their members, and 9 respondents were asked to test out the various app flows. The majority indicated that they would like a better way than their current method to track their symptom data, that they considered MS Signals to be a unique and valuable tool that would enhance the accuracy of their symptom tracking, and that they would want to share the dashboard view with their healthcare providers.


Next steps

We want to encourage everyone to make use of the open source platform to start setting up and running their own studies. We are working on creating a set of standard study templates, which would incorporate what we learned from above, and we hope to release those soon. For any issues, comments or questions please check out our resource page.

Source: Google AI Blog


Google Summer of Code 2024 Celebrating our 20th Year!

Google Summer of Code (GSoC) will be celebrating its 20th anniversary with our upcoming 2024 program. Over the past 19 years we have welcomed over 19,000 new contributors to open source through the program under the guidance of 19,000+ mentors from over 800 open source organizations in a wide range of fields.

We are honored and thrilled to keep GSoC’s mission of bringing new contributors into open source communities alive for 20 years. Open source communities thrive when they have new contributors with fresh, exciting ideas and the renewed energy they bring to these communities. Mentorship is a vital way to keep these new contributors coming into the open source ecosystem where they can see collaboration at its finest from their community members all across the world, all with different backgrounds and skills working towards a common goal.

With just over a week left in the 2023 program, we have had one of our most enthusiastic groups of GSoC contributors yet: 841 GSoC contributors have completed their projects with 159 open source organizations, and 68 more are wrapping up their projects. A GSoC 2023 wrap-up blog post will be coming later this month with stats and quotes from our contributors and mentors.

Our contributors and mentors have given us invaluable feedback, and we are making one adjustment around project time commitment and scope. For the 2024 program, there will be three options for project scope: medium (~175 hours), large (~350 hours), and a new small size (~90 hours). The idea is to remove the barrier of available time that many potential contributors have and open the program to people who want to learn about open source development but can’t dedicate all or even half of their summer to the program.

As a reminder, GSoC 2024 is open to students and to beginners in open source software development who are over the age of 18 at the time of registration.


Interested in applying to the Google Summer of Code Program?


Open Source Organizations

Check out our website to learn what it means to be a participating mentor organization. Watch the GSoC Org Highlight videos and get inspired about projects that contributors have worked on in the past.

Take a look through our mentor guide to learn about what it means to be part of Google Summer of Code, how to prepare your community, gather excited mentors, create achievable project ideas, and tips for applying. We welcome all types of open source organizations and encourage you to apply—it is especially exciting for us to welcome new orgs into the program and we hope you are inspired to get involved with our growing community. In 2024, we look forward to accepting more artificial intelligence/machine learning open source organizations.


Want to be a GSoC Contributor?

New to open source development or a student? Eager to gain experience on real-world software development projects used by thousands of people? It is never too early to start thinking about what kind of open source organization you’d like to learn more about and how the application process works!

Watch our ‘Introduction to GSoC’ video to see a quick overview of the program. Read through our contributor guide for important tips from past participants on preparing your proposal, what to think about if you wish to apply for the program, and explore our website for other resources. Continue to check for more information about the 2024 program once the 2023 program ends later this month.

Please share information about the 2024 GSoC program with your friends, family, colleagues, and anyone you think may be interested in joining our community. We are excited to welcome new contributors and mentoring organizations to celebrate the 20th year of Google Summer of Code!

By Stephanie Taylor – Program Manager, Google Open Source Programs Office

Open source PDKs joining the Linux Foundation’s CHIPS Alliance

In November 2020, we launched our Open Source MPW Shuttle Program to make it easier for researchers and developers to build custom silicon and to enable a thriving ecosystem around open source hardware. Working with our partner, SkyWater Technology, we released the first foundry-supported open source process design kit (PDK) for their 130nm mixed-signal CMOS technology (SKY130), then welcomed GlobalFoundries as a partner with the release of an open source PDK for their 180nm MCU process (GF180MCU).

Then, to give researchers and developers a way to validate and prove their designs made with the PDKs, we partnered with Efabless to fund a series of no-cost manufacturing shuttles for open source designs. In support of this program, Efabless released an end-to-end RTL to GDS design stack called OpenLane that is open source, freely available, and fully supported by their manufacturing platform. OpenLane is now being maintained as part of the OpenROAD Project. When combined with open source PDKs, a design’s verification results can now be freely shared and easily replicated by other researchers and developers, which has enabled a new collaborative model to evaluate and iterate on ideas.

Pictures of a full wafer from the first SKY130 shuttle, a tray of bare dies, and a project bring-up from SKY130 MPW-2.

Results

The Open Source MPW Shuttle Program has been a success and we’re excited by the growth we’ve seen in this ecosystem. Since its inception, the program has launched eight shuttle runs on SKY130 and an initial test run on GF180MCU, the last of which are being packaged now. With 40 slots per shuttle, we’ve manufactured 360 designs out of over 600 submissions from 19 countries around the world.

Graph showing number of designs submitted to Open Source MPW shuttles across versions 1 through 8

The program has also fostered collaboration between the open source community and Google. We’ve learned valuable lessons from designers who participated in the program giving feedback and filing hundreds of bugs and pull requests. These have helped to improve each successive run and to make the platforms and tools more feature-complete.

Elsewhere in the ecosystem, we’re excited by the release of new open source PDKs from foundries like the 130nm BiCMOS process from IHP, the SOI-CMOS PDK from Minimal Fab, and also by the publication of new semiconductor research using open source PDKs. Multiple universities have incorporated open source PDKs into their curriculum, and last year, NIST adopted the SKY130 PDK to migrate their existing planarized wafer designs for nanotechnology research.

Announcing GF180MCU MPW-1

We’ve just launched a new MPW-1 shuttle for GF180MCU in partnership with Efabless. Submissions will be accepted until December 11th, targeting delivery in early 2024.


Next Steps

The open source silicon ecosystem is continuing to grow and evolve. After GF180MCU MPW-1 concludes, the open source SKY130 and GF180MCU PDKs will be joining the Linux Foundation’s CHIPS Alliance under a new working group to foster continued open source PDK development, and we expect future PDK releases will join as well. This will help with the transition to a broader governance model that enables more participation by industry, academia and the community, opening the possibility for larger shuttle programs with multiple sponsors as the ecosystem continues to grow.

Low-cost manufacturing options will continue to be available through this transition, both through commercial shuttle offerings like Efabless’ ChipIgnite program and also through educational efforts like TinyTapeout.

Thank you

Lastly, we’d like to thank the open source community. Your feedback has been invaluable to the success so far, and has helped to improve the tools and documentation to be more user-friendly. We have also seen contributions from the community in the form of hundreds of new and fully manufacturable designs, which have helped to expand the range and capabilities of open source hardware available to the community. We look forward to continuing partnerships to build a thriving ecosystem around open source silicon.

By Aaron Cunningham – Technical Program Manager, Core Hardware Tools

Project Open Se Cura Open Source Announcement

As AI permeates our lives, developing secure, scalable, and efficient compute systems is crucial for safe and trustworthy AI experiences. However, hardware advancements lag behind machine learning (ML) models and software development, hindering the deployment of secure and efficient full-stack systems. Furthermore, consumer demand for smaller devices outpaces battery technology advancements, constraining the power envelope and limiting the capabilities of deployable AI systems.

Towards these challenges, Google is launching Project Open Se Cura, an open-source framework to accelerate the development of secure, scalable, transparent and efficient AI systems. Previously known internally as Project Sparrow, Project Open Se Cura is a testament to our commitment to open-source development. Our goal with Open Se Cura is to evolve a set of open-source design tools and IP libraries that will accelerate the development of full-stack systems with ML workloads through co-design and development. This will enable us to better center system designs around security, efficiency, and scalability, empowering the next generation of AI experiences.

This work was developed in close collaboration with our partners such as lowRISC, Antmicro and VeriSilicon for key parts of the tooling and infrastructure. lowRISC has contributed a transparent root of trust, along with development and integration tools, ensuring a secure foundation for the project. Antmicro has contributed system simulation tooling with Renode and expertise in open-source system-level software. VeriSilicon has contributed expertise in IP design, silicon design, BSP development and commercialization. And together, we’ve used these tools for the first time to extend our IP library with secure ML capabilities and generated a proof of concept for a low-power AI system.

To accelerate open development and transparency of AI system design, we’re releasing our entire code base for developers to explore; you can get started by following the instructions in the README.md.

Going forward, we’ll continue to evolve Open Se Cura in the open and seek to onboard additional partners, such as Cambridge University (for CHERI innovations) and the University of Michigan (for low-power and generative AI). We are excited to explore what we can build with these new tools and hope the open source community will join us and contribute.

We look forward to working with the open-source community to drive new innovations and new AI experiences with secure, scalable and efficient ML systems.

Join us and our partners, Brian Murray from VeriSilicon and Michael Gielda from Antmicro, for a technical deep dive at the CHIPS Alliance Technology Update. This hybrid workshop event will be held from 9:10 to 10:00 AM PST on November 9th at Google's Sunnyvale office.

By Kai Yick – Research Cerebra Open Source team

The Story of Gateway API

Earlier this week, Gateway API v1.0 was released, marking the significant milestone of General Availability. This Kubernetes API represents the future of load balancing, routing, and service mesh configuration. It already has more than 20 implementations, including GKE and Istio. In this post, we’ll take a look back at some of the key moments that led to this point, starting with the proposal that started it all.

Initial Proposal

The core ideas for this new API were initially proposed by Bowei Du (Software Engineer, Google) at KubeCon San Diego as “Ingress v2”, the next generation of the Ingress API for Kubernetes. This proposal came as the shortcomings of the original Ingress API were becoming apparent. The community had started to develop alternative APIs, notably including Istio’s VirtualService API and Contour’s IngressRoute API. We had reached an inflection point where the Kubernetes ecosystem was diverging, and Bowei believed it was important to develop a new standard that would expose all these advanced features in a portable way.

The initial proposal for this API provided a great foundation to build on. Specifically, this proposal focused on a role-oriented model that split capabilities into resources that were aligned with three different personas. It emphasized both expressiveness and extensibility as core design principles. This early sketch from that proposal closely resembles the API today:

Sketch of early proposal for Ingress API
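For comparison, here is a minimal sketch of the role-oriented split as it looks in the v1.0 API; all names are illustrative, with the GatewayClass supplied by an infrastructure provider, the Gateway owned by a cluster operator, and the HTTPRoute owned by an application developer:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: example-gateway
spec:
  gatewayClassName: example-class  # provided by the infrastructure provider
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example-route
spec:
  parentRefs:
    - name: example-gateway        # attaches this route to the Gateway above
  hostnames:
    - "www.example.com"
  rules:
    - backendRefs:
        - name: example-svc        # a hypothetical backend Service
          port: 8080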

One of the key limitations of the Ingress API was that it was designed with the lowest common denominator in mind: every feature included in the API needed to be implemented by everyone. This meant that the API surface was very small, and implementations that wanted to support more advanced features either relied on long lists of implementation-specific annotations or developing new custom APIs.

Bowei proposed that this new API could introduce a concept of “support levels.” This would allow us to add features to the API even if not every implementation could support them; for example, “Extended” features would be fully portable if they were supported.

Diagram showing proposed Custom API support levels

Evolution of the API

Since that initial proposal, the API has evolved significantly, benefiting from the expertise of many in the community. Gateway has been referred to as the “most collaborative API in Kubernetes history” due to the hundreds of contributors representing dozens of companies that have helped refine the API over the years.

One of the things that makes this API unique is that it is built on top of Custom Resource Definitions (CRDs). This has meant that Gateway API is developed and released outside of the main Kubernetes project, enabling broader collaboration and shorter feedback loops. For example, each new release of this API supports the 5 most recent versions of Kubernetes, covering the vast majority of clusters in use today. So, instead of waiting until you can upgrade to the latest version of Kubernetes, most will be able to try out these APIs close to the time they’re released.

As the first official Kubernetes API to take this approach, it has developed several unique concepts along the way:

GEPs

Similar to Kubernetes Enhancement Proposals (KEPs), Gateway Enhancement Proposals provide a streamlined approach for proposing significant new enhancements to Gateway API. As the API grew and attracted more contributors, it became critical to have a better way to document key design decisions. The concept of GEPs was initially proposed by Bowei in 2021.

More than 30 of these have already merged, with many more in progress right now. This pattern has been invaluable in keeping track of when and why key design decisions were made. All key parts of the API now have GEPs documenting when and why they were proposed, along with alternatives considered.

Release Channels

In 2021 we proposed a simplified approach to versioning that would introduce the concept of release channels, our own version of Kubernetes’ “feature gates”, which denote the stability of individual fields and features.

All new resources, fields, and features start in the “Experimental” release channel. As the name implies, this channel provides no stability guarantees and can include breaking changes to enable us to iterate more quickly on APIs.

As these experimental APIs stabilize, individual resources, fields, and features can graduate to the “Standard” release channel when they meet predefined graduation criteria. These two release channels enable us to both provide a stable and predictable API with the “Standard” release channel while still iterating on experimental concepts with the “Experimental” release channel.

Conformance Tests

We added the first conformance tests in 2022, before this API reached beta, and since then these tests have become a key part of every new feature in Gateway API, ensuring that implementations were truly providing a portable experience. Before a feature can graduate to the “Standard” release channel, thorough conformance tests need to be developed, and multiple implementations need to pass them.

Service Mesh Support

Earlier this year, mesh support launched its “Experimental” version, marking the first time a Kubernetes API has ever officially underpinned the concept of Service Mesh. In 2022, key Service Mesh projects came together to form the GAMMA initiative (Gateway API for Mesh Management and Administration). The core idea was that the Gateway API was sufficiently modular that the Routing and Policy layers could be used for both mesh and ingress use cases.

Trying it Out

Gateway API enables great new features on GKE, such as advanced multi-cluster routing. Yesterday GKE announced GA support for multi-cluster Gateways. In the coming weeks, GKE will also be rolling out the v1.0 CRDs for all customers that have enabled Gateway API in their clusters. In the meantime, you can access all of the same features with the v1beta1 CRDs already supported by GKE. For more information on how to get started with the Gateway API on GKE, refer to the GKE Gateway documentation.

If you’re interested in Gateway API’s support for Service Mesh, you can try it out with Anthos Service Mesh.

Alternatively, if you’d like to use this API with another implementation, refer to the open source project’s Getting Started documentation.

By Rob Scott – GKE Networking

Open source and CI-driven RTL testing and verification for Caliptra’s RISC-V VeeR core

As part of CHIPS Alliance’s mission to enable a software-driven approach to silicon, Google, Antmicro and other CHIPS members have been developing and improving a growing number of open source tools to enable effective, CI-driven silicon development.

Fully reproducible and scalable workflows based on open source tooling are especially beneficial for efforts spanning across multiple industrial and academic actors such as Caliptra, a Root of Trust project driven by Google, AMD, NVIDIA and Microsoft which joined CHIPS in order to host the ongoing development and provide the necessary structure, working environment and support for the reference implementation of the standard, originally hosted by Open Compute Project.

In this blog post, we describe Antmicro and Google’s collaborative effort focused on introducing a Continuous Integration (CI) based pipeline for code quality checks, code indexing, coverage and functional testing into the RISC-V VeeR core family, as used within the Caliptra project.

Open source RTL CI testing and verification for Caliptra/VeeR

Caliptra and VeeR

VeeR (Very Efficient & Elegant RISC-V) is an open source, production-grade RISC-V core family hosted by CHIPS Alliance that comes in three variants: EH1, EH2 and EL2.

Caliptra's hardware block structure, with an EL2 VeeR CPU core at its center, is shown in the diagram below:

Caliptra's hardware block structure

As can be seen in the diagram, VeeR EL2 plays a central role in the implementation and, while it is a mature and well-tested technology, keeping both the core itself and its integration with Caliptra consistently tested is important.

Advanced code processing with Verible and Kythe

Many of Antmicro's efforts focus on building not only the end products but also the scalable CI solutions for collaborative hardware development environments that power them. Caliptra's need for a more open process ties in perfectly with Antmicro's, and all of CHIPS Alliance's, open source-based approach to tooling.

One of the core parts of this effort involved Verible, an open source SystemVerilog parser developed by Google in collaboration with Antmicro within CHIPS Alliance, offering a number of code processing functionalities, including linting, formatting, indexing and producing a Kythe schema. Verible also includes a Language Server Protocol implementation which enables integration with popular text editors such as VS Code, Vim, Neovim, Emacs, Sublime, Kakoune and Kate, described in detail in a separate article on Antmicro's blog.

Antmicro's work for Google around Caliptra involved adding the Verible formatter to the VeeR CI, which marks non-compliant formatting changes and uses the reviewdog bot to add comments with suggested fixes in the pull request discussion. Furthermore, we added a Verible linting Action that helps developers maintain good coding practices by providing lint rules for continuous validation of the code, before it even reaches the compilation phase. Notably, the provided lint rules are flexible and can be adjusted to the project's requirements, or even turned off completely through the creation of a waiver file or an inline directive.

Verible linting example

Thanks to Verible's ability to output a Kythe schema, besides linting and formatting code changes we can also provide an indexed overview of the entire codebase, viewable online. The Kythe Verible Indexer, built on the Verible Indexing Action, lets the user select multiple repositories to create a set of indexed webpages. The source code is available in the Verible Indexer GitHub repo.

Verible Indexer landing page

The workflow also checks if a newer revision is available for any of the defined repositories and, if needed, performs indexing. The indexed code browser webpages were deployed for Cores-VeeR-EL2 and Caliptra-rtl.

Example from Caliptra-RTL Code Navigator demonstrating visual representation of the project structure (left pane), code browser (right pane) and usage of active references (bottom pane)

Putting riscv-dv to use

The riscv-dv framework is another tool hosted by CHIPS Alliance that helps address the complexities of SoC design and verification. It is an SV/UVM based open source instruction generator for RISC-V processors, originally developed for Google's own needs but now used by a wide array of organizations and companies working on verification of RISC-V cores.

The riscv-dv framework generates random instruction chains to exercise certain core features. These instructions are then simultaneously executed by the core (through RTL simulation) and by a reference RISC-V ISS (instruction set simulator), for example Spike or Renode, Antmicro’s open source simulation framework.

Core states of both are then compared after each executed instruction and an error is reported in case of a mismatch.

riscv-dv flow
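To make this comparison step concrete, below is a minimal Python sketch of a per-instruction trace comparison in the spirit of riscv-dv's checking flow, assuming both the RTL and ISS traces have already been normalized to a common CSV layout. The column names ("pc", "binary", "gpr") and file layout are illustrative assumptions, not riscv-dv's exact schema.

```python
# Minimal sketch of a per-instruction trace comparison, assuming both the
# RTL simulation trace and the ISS trace were normalized to the same CSV
# layout. Column names here are illustrative, not riscv-dv's exact schema.
import csv
import sys

def compare_traces(rtl_csv: str, iss_csv: str) -> int:
    """Return the number of mismatching instruction steps."""
    mismatches = 0
    with open(rtl_csv) as rtl_f, open(iss_csv) as iss_f:
        for step, (rtl, iss) in enumerate(zip(csv.DictReader(rtl_f),
                                              csv.DictReader(iss_f))):
            # Compare the program counter and any register writeback.
            for field in ("pc", "binary", "gpr"):
                if rtl.get(field) != iss.get(field):
                    print(f"step {step}: {field} mismatch: "
                          f"RTL={rtl.get(field)} ISS={iss.get(field)}")
                    mismatches += 1
    return mismatches

if __name__ == "__main__":
    sys.exit(1 if compare_traces(sys.argv[1], sys.argv[2]) else 0)
```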

While working on the Caliptra project, Antmicro introduced the riscv-dv framework for testing the VeeR-EL2 core, as well as a GitHub Actions CI flow which builds or downloads all the dependencies (Verilator, Spike, VeeR-ISS and Renode) and runs the tests. To use riscv-dv with VeeR we had to write a VeeR-specific execution trace log parser, whose task is to translate the log into a format understood by the riscv-dv framework; a rough sketch of the idea follows.
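The sketch below converts hypothetical trace lines into a CSV log. The line format matched by the regular expression is invented for the example and does not reproduce VeeR's actual trace output or riscv-dv's exact column set.

```python
# Sketch of an execution-trace-to-CSV converter. The matched line format
# ("<cycle> : <pc> <encoding> <mnemonic> [xN=value]") is hypothetical and
# does not reproduce VeeR's actual trace output.
import csv
import re

TRACE_RE = re.compile(
    r"^\s*(?P<cycle>\d+)\s*:\s*(?P<pc>[0-9a-f]+)\s+(?P<bin>[0-9a-f]+)\s+"
    r"(?P<instr>\S+)(?:\s+(?P<rd>x\d+)=(?P<val>[0-9a-f]+))?"
)

def parse_trace(log_path: str, csv_path: str) -> None:
    """Translate a raw trace log into a normalized CSV file."""
    with open(log_path) as log, open(csv_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=["pc", "binary", "instr", "gpr"])
        writer.writeheader()
        for line in log:
            m = TRACE_RE.match(line)
            if not m:
                continue  # skip lines that are not retired instructions
            writer.writerow({
                "pc": m["pc"],
                "binary": m["bin"],
                "instr": m["instr"],
                # Record the register writeback as "xN:value" when present.
                "gpr": f"{m['rd']}:{m['val']}" if m["rd"] else "",
            })
```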

As an interesting detail, VeeR implements the division (div) and remainder (rem) instructions in a way that delegates the calculation to dedicated division logic and proceeds with the execution of the program. Once the division logic finishes, the result is written back to the div/rem instruction's result register. This flow takes into account the situation where an instruction following div/rem requires the div/rem result: in such cases the pipeline is stalled until the result is available. If any instruction following div/rem overwrites the result register before the division logic finishes, the division operation is canceled.

To handle the case where the division result becomes available only after a few other instructions have executed, we developed a lazy parsing method for the VeeR trace log, which catches the result register update even when it is not immediate. The second case, cancellation of the division calculation, is handled by a code post-processing script: it detects situations where a cancellation would happen and prevents it by injecting NOP instructions, allowing the division logic to finish. The sketch after this paragraph illustrates the lazy parsing idea.
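Here is a simplified illustration of the lazy approach, operating on hypothetical pre-parsed trace entries rather than the actual VeeR log structures: a div/rem entry is recorded immediately with a pending result, which is patched in once the delayed writeback appears, or discarded if the register is overwritten first.

```python
# Simplified illustration of lazy div/rem result matching over pre-parsed
# trace entries. Entry fields ("instr", "rd", "rd_val") are hypothetical.
def resolve_lazy_writebacks(entries):
    """Patch delayed div/rem writebacks into their originating entries."""
    resolved = []
    pending = {}  # destination register -> index of the waiting div/rem entry

    for e in entries:
        if e["instr"] in ("div", "divu", "rem", "remu") and e["rd_val"] is None:
            # Result not available yet: emit the entry and remember it.
            resolved.append(e)
            pending[e["rd"]] = len(resolved) - 1
        elif e["instr"] == "writeback" and e["rd"] in pending:
            # The delayed division result arrived: patch the earlier entry.
            resolved[pending.pop(e["rd"])]["rd_val"] = e["rd_val"]
        else:
            # Any other instruction; a regular write to a register with a
            # pending result cancels the outstanding division.
            pending.pop(e["rd"], None)
            resolved.append(e)
    return resolved
```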

Custom GitHub Actions runners for greater scalability and more flexibility to mix tools

Much like a large part of the industry, the Caliptra project uses Universal Verification Methodology (UVM) as its verification methodology. While Antmicro’s ongoing work on enabling fully open source UVM support in Verilator should ultimately enable completely open source verification, today UVM testbenches or tools like RISC-V DV cannot be run using open source tools only.

Fortunately, this problem already has a solution, also developed within CHIPS Alliance: custom GitHub Actions runners, which are already in use by a large number of CHIPS projects.

A custom runner setup, currently in development for Caliptra, allows mixing and matching open and closed source tools for CI testing purposes, exposing only the results (such as pass/fail or coverage) with fine-grained control.

What is more, given that RTL design testing and verification of RISC-V based cores and SoCs often require long, memory-consuming and computationally demanding simulations, the custom runners will play another very important role in Caliptra. While GitHub is the obvious choice for hosting the reference RTL, the processing power and throughput of the CI machines available in GitHub Actions is simply not enough to cover the needs of simulation of complex designs, especially in a highly dynamic, collaborative environment with lots of CI angles.

In order to enable public-facing yet secure CI, and to improve the flexibility and scalability of Caliptra's and VeeR's pipelines, the custom runners will be deployed for the respective repositories. This setup will enable us to precisely select machines for specific workloads (e.g. by architecture, virtual CPU count, memory size and disk space), and also to use tools stored on an external cloud disk that can be attached to a virtual machine running the job workload.

Seeding other verification methods

The Caliptra SoC is meant as a macro for use in a variety of chip designs, big and small. Various teams adopting Caliptra/VeeR as their Root of Trust solution will need to plug it into a larger ecosystem of tools used in their organization (of course hopefully using Caliptra as a good reference and role model).

As part of the project, on top of the original Caliptra test suite we implemented more specific tests around the VeeR integration in cocotb, a co-simulation testbench library that connects Python coroutines with your HDL simulator of choice. We prepared a cocotb testbench that can not only run programs from the generator, but also apply dedicated stimuli and monitor the results in Python coroutines.
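As a flavor of what such a testbench looks like, here is a minimal cocotb sketch. The signal names (clk, rst_l, and the stimulus/result ports) are assumptions made for illustration and do not correspond to the actual Caliptra/VeeR top-level ports.

```python
# Minimal cocotb testbench sketch. Signal names are illustrative assumptions,
# not the actual Caliptra/VeeR port names.
import cocotb
from cocotb.clock import Clock
from cocotb.triggers import ClockCycles, RisingEdge

@cocotb.test()
async def smoke_test(dut):
    """Drive clock and reset, apply a stimulus, and check the response."""
    cocotb.start_soon(Clock(dut.clk, 10, units="ns").start())

    dut.rst_l.value = 0            # assert active-low reset
    await ClockCycles(dut.clk, 5)
    dut.rst_l.value = 1            # release reset
    await RisingEdge(dut.clk)

    dut.stimulus_in.value = 0xA5   # hypothetical input port
    await ClockCycles(dut.clk, 10)
    assert dut.result_out.value == 0xA5, "DUT did not produce expected result"
```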

Furthermore, for teams that prefer a more UVM-like testing methodology but need an open source option today, we also provide some example tests using pyuvm, a Pythonic library that mirrors the industry-accepted SystemVerilog implementation. We have implemented a minimal UVM agent for the programmable interrupt controller of the VeeR-EL2 core, which will be used to verify the handling of interrupt service routines triggered by external or local-to-core timer interrupts. The verification environment is expected to grow as more test cases are added, covering the DMA controller, closely coupled memory buses and the debug interface.
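The sketch below shows the general shape of such a pyuvm environment: a monitor watching a hypothetical interrupt line, wrapped in an env and a test. As with the cocotb example, the signal names (clk, irq) are assumptions, and the real agent for the interrupt controller is more elaborate.

```python
# Minimal pyuvm environment sketch: a monitor watching a hypothetical
# interrupt line. Signal names (clk, irq) are illustrative assumptions.
import cocotb
from cocotb.triggers import ClockCycles, RisingEdge
import pyuvm
from pyuvm import uvm_env, uvm_monitor, uvm_test

class IrqMonitor(uvm_monitor):
    async def run_phase(self):
        dut = cocotb.top
        while True:
            await RisingEdge(dut.clk)
            if dut.irq.value:
                self.logger.info("interrupt asserted")

class PicEnv(uvm_env):
    def build_phase(self):
        self.monitor = IrqMonitor("monitor", self)

@pyuvm.test()
class PicSmokeTest(uvm_test):
    def build_phase(self):
        self.env = PicEnv("env", self)

    async def run_phase(self):
        self.raise_objection()
        await ClockCycles(cocotb.top.clk, 100)  # let the monitor observe
        self.drop_objection()
```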

For system-level tests we decided to connect to an interactive simulation of the complete design via JTAG, using commonly used tools: the Open On-Chip Debugger (OpenOCD) and the GNU Project Debugger (GDB). The simulation exposes a virtual JTAG port, which is used to establish a connection with OpenOCD; the OpenOCD instance then connects to the GNU debugger. Finally, test scripts are run in GDB, which verify core register contents, memory accesses and peripheral accesses.
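Such a GDB test script can be written in GDB's embedded Python, as in the hedged sketch below. The script must be sourced from inside GDB (for example with `gdb -x`), port 3333 is OpenOCD's default GDB server port, and the peripheral address and expected values are placeholders rather than real Caliptra addresses.

```python
# Hedged sketch of a GDB test script using GDB's built-in Python API.
# Run from inside GDB (e.g. `riscv64-unknown-elf-gdb -x this_script.py`).
import gdb

# 3333 is OpenOCD's default GDB server port.
gdb.execute("target extended-remote localhost:3333")

# Verify that a core register can be written and read back over JTAG.
gdb.execute("set $t0 = 0xdeadbeef")
t0 = int(gdb.parse_and_eval("$t0")) & 0xFFFFFFFF
assert t0 == 0xDEADBEEF, f"register readback failed: {t0:#x}"

# Verify a peripheral access through the debug core's bus.
PERIPH_ADDR = 0x20001000  # hypothetical peripheral register address
gdb.execute(f"set *(unsigned int *){PERIPH_ADDR:#x} = 1")
val = int(gdb.parse_and_eval(f"*(unsigned int *){PERIPH_ADDR:#x}")) & 0xFFFFFFFF
print(f"peripheral readback: {val:#x}")
```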

With this testing methodology we exposed an actual problem in the design which prevented accessing system peripherals via JTAG. As it turned out, the issue was caused by the side AHB bus of the debug core being disconnected.

Incorrect waveform - no system bus activity

Once the side bus was connected, it became possible to access all the peripherals. A 2-to-1 AHB multiplexer was used to join the system and side AHB master ports and forward requests to the peripherals.

Correct waveform - correct system bus activity

To verify the effectiveness of all kinds of tests, both ISS and RTL level, and to help ensure that all design states are properly tested, we use coverage analysis. Open source tools and frameworks already have some support for gathering and presenting these metrics; Verilator, for example, supports line, toggle and functional coverage. Some additional work remains to integrate all of those metrics and present them in a comprehensive, visual form, which will be part of our future efforts.

Transparent and open source-driven hardware ecosystem

In addition to the efforts described in this article, there are other interesting developments within CHIPS Alliance, including improving Verilator to better handle large designs and verification tasks, which are helping bring more open source-driven development and verification solutions to the Caliptra project and the entire open hardware ecosystem.

To learn more about the Caliptra project, watch a recording from a joint talk by Google and Antmicro given at this year’s RISC-V Summit Europe. You can also join the contributors at the upcoming 2023 OCP Global Summit for several talks about the latest developments in the project and future plans.

If you are interested in the work of CHIPS Alliance, keep an eye out for updates during their next Technology Update coinciding with the RISC-V Summit this Fall.

By Michael Gielda – Antmicro

Android and RISC-V: What you need to know to be ready

Android is an open source operating system that is freely available to port to many devices and architectures, and as such it supports many different device types and CPU architectures. We're excited to be adding a new one to that list: RISC-V.

RISC-V is a free and open instruction set architecture (ISA), bringing the same spirit of industry-wide collaboration and innovation that we see in software around open source to the hardware ecosystem. Invented 10 years ago at the University of California, Berkeley, RISC-V has seen rapid adoption in embedded and microcontroller spaces, and in recent years has expanded into accelerators, servers, and mobile computing.

In November of 2022, we announced at the RISC-V Summit that we were accepting patches for RISC-V.

The latest update is that we are now not only accepting patches, but have begun to mature support for RISC-V in Android. RISC-V is a modular ISA, meaning that it has a large number of optional extensions. We have determined an initial set that we feel is critical to ensure that any CPU running RISC-V will have all of the features we expect in order to achieve high performance: the RVA22 profile as well as the vector and vector crypto extensions. This update was provided at this year's RISC-V Summit in Europe.

You can build, test, and run the Android support for RISC-V on your own machine as well now! Just like other platform targets in AOSP, you can use the Cuttlefish Virtual Device support:

$ lunch aosp_cf_riscv64_phone-userdebug
$ m -j
$ launch_cvd -cpus=8 -memory_mb=8192

Then, you can use vncviewer to connect to the running device and interact.


At this time, these patches support building and running a basic Android Open Source Project experience, but they are not yet fully optimized. For example, work on a fully optimized backend for the Android Runtime (ART) is still in progress. Additionally, AOSP, our external projects, and the compilers do not yet generate fully optimized code that takes advantage of the latest ratified extensions, such as the vector extension. However, we believe the port is ready for experimentation and collaboration.

Later this year, we expect to have the NDK ABI finalized, canary builds available on Android's public CI, and RISC-V emulation on x86-64 and ARM64 hosts for easier testing of riscv64 Android applications on a host machine. By 2024, the plan is to have emulators available publicly, with a full feature set to test applications for various device form factors! As recently announced in our collaboration with Qualcomm, we expect wearables to be the first form factor available.

However, just porting the Android operating system itself is not enough! We are working with the community and RISE (RISC-V Software Ecosystem). The RISE Project has been established to provide a way to accelerate the availability of software for high-performance and power-efficient RISC-V processor cores running high-level operating systems. That includes not only Android, but also Linux and other operating systems across a variety of application domains, including high-performance computing. The RISE Project includes members from Andes, Google, Intel, Imagination Technologies, MediaTek, Nvidia, Qualcomm Technologies, Red Hat, Rivos, Samsung, SiFive, T-Head, and Ventana.

Google is also continuing and expanding our strong investments at RISC-V International, even beyond our long-standing Premium membership and board participation. We also have many other contributors in key roles on horizontal committees, working groups, and technical committees to ensure that specifications are rapidly being designed and ratified to benefit not only Android but also many other use cases.

Android's support for RISC-V depends on a wide range of contributions, from toolchains to basic support libraries. We are very appreciative of the ongoing effort, which requires countless projects to support RISC-V build configurations with quality implementations. If you are interested in contributing, please visit the following resources:

  • Visit https://github.com/google/android-riscv64 for detailed information on how to build and test the RISC-V support in Android, a list of known issues, and opportunities to contribute to AOSP at source.android.com as well as to toolchain projects and support libraries.
  • Subscribe to the RISC-V Android SIG mailing list, or join directly if your organization is a member of RISC-V International, to stay tuned to progress and offer your suggestions and feedback.

Make sure to stay tuned as we look into ways to make it as easy for Android developers writing native code to target new platforms as it is for our Java and Kotlin developers!

Planning to head to the RISC-V International Summit in November? Find us there: we'll be hosting a Community Collaboration Breakfast on Wednesday morning! Not attending the conference but interested? Learn more and register here.

By Lars Bergstrom – Android Platform Programming Languages & Greg Simon - Google Low-level Operating System