
How We Made the CES 2024 AR Experience: Android Virtual Guide, powered by Geospatial Creator

Posted by Kira Rich – Senior Product Marketing Manager, AR and Bradford Lee – Product Marketing Manager, AR

Navigating a large-scale convention like CES can be overwhelming. To enhance the attendee experience, we've created a 360° event-scale augmented reality (AR) experience in our Google booth. Our friendly Android Bot served as a digital guide, providing:

  • Seamless wayfinding within our booth, letting you know about the must-try demos
  • Delightful content, only possible with AR, like replacing the Las Vegas Convention Center facade with our Generative AI Wallpapers or designing an interactive version of Android on Sphere for those who missed it in real life
  • Helpful navigation tips and quick directions to transportation hubs (Monorail, shuttle buses)

In partnership with Left Field Labs and Adobe, we used Google’s latest AR technologies to inspire developers, creators, and brands on how to elevate the conference experience for attendees. Here’s a behind-the-scenes look at how we used Geospatial Creator, powered by ARCore and Photorealistic 3D Tiles from Google Maps Platform, to promote the power and usefulness of Google on Android.

Moving image showing end-to-end experience
Using Google’s Geospatial Creator, we helped attendees navigate CES with the Android Bot as a virtual guide, providing helpful and delightful immersive tips on what to experience in our Google Booth.


Tools We Used


Geospatial Creator in Adobe Aero Pre-Release

Geospatial Creator in Adobe Aero enables creators and developers to easily visualize where in the real world they want to place their digital content, similar to how Google Earth visualizes the world. With Geospatial Creator, we were able to bring up the Las Vegas Convention Center in Photorealistic 3D Tiles from Google Maps Platform and understand the surroundings of where the Google Booth would be placed. In this case, the booth did not exist in the Photorealistic 3D Tiles because it was a temporary build for the conference. However, by using the 3D model of the booth and the coordinates of where it would be built, we were able to easily estimate and visualize the booth inside Adobe Aero and build the experience around it seamlessly, including anchoring points for the digital content and the best attendee viewing points for the experience.


"At CES 2024, the Android AR experience, created in partnership with the Google AR, Android, and Adobe teams, brought smiles and excitement to attendees - ultimately that's what it's all about. The experience not only showcased the amazing potential of app-less AR with Geospatial Creator, but also demonstrated its practical applications in enhancing event navigation and engagement, all accessible with a simple QR scan." 
– Yann Caloghiris, Executive Creative Director at Left Field Labs
Moving image of developer timelapse
Adobe Aero provided us with an easy way to visualize and anchor the AR experience around the 3D model of the Google Booth at the Las Vegas Convention Center.

With Geospatial Creator, we had multiple advantages for designing the experience:

  • Rapid iteration with live previews of 3D assets and high-fidelity visualization of the location with Photorealistic 3D Tiles from Google Maps Platform were crucial for building a location-based AR experience without having to be there physically.
  • Easy selection of the Las Vegas Convention Center and robust previews of the environment, as you would navigate in Google Earth, helped us visualize and develop the AR experience with precision and alignment to the real world location.

In addition, Google Street View imagery was used to generate a panoramic skybox, which helped us visualize the sight lines in Cinema 4D for storyboards. We also imported this, along with Photorealistic 3D Tiles from Google Maps Platform, into Unreal Engine to visualize occlusion models at real-world scale.

In Adobe Aero, we did the final assembly of all 3D assets and created all interactive behaviors in the experience. We also used it for animating simpler navigational elements, like the info panel assets in the booth.

AR development was primarily done with Geospatial Creator in Adobe Aero. Supplementary tools, including Unreal Engine and Autodesk Maya, were used to bring the experience to life.

Adobe Aero also supports Google Play Instant apps and App Clips1, which means attendees did not have to download an app to access the experience. They simply scanned a QR code at the booth and launched directly into the experience, which proved to be ideal for onboarding users and reducing friction, especially at a busy event like CES.

Unreal Engine was used to bring in the Photorealistic 3D Tiles, allowing us to build a 3D animated Android Bot that interacted closely with the surrounding environment. This approach was crucial for previews of the experience, allowing us to understand sight lines and where to best locate content for optimal viewing from the Google booth.

Autodesk Maya was used to create the Android Bot character, environmental masks, and additional 3D props for the different scenes in the experience. It was also used for authoring the final materials.

The Babylon exporter was used to export from Autodesk Maya to glTF format for import into Adobe Aero.

Figma was used for designing flat user interface elements that could be easily imported into Adobe Aero.

Cinema 4D was used for additional visualization and promotional shots, which helped with stakeholder alignment during the development of the experience.



Designing the experience

During the design phase, we envisioned the AR experience to have multiple interactions, so attendees could experience the delight of seeing precise and robust AR elements blended into the real world around them. In addition, they could experience the helpfulness of contextual information embedded into the real objects around them, providing the right information at the right time.

Image of Creative storyboard
To make the AR experience more engaging for attendees, we created several possibilities for people to interact with their environment (click to enlarge).

Creative storyboarding

Creating an effective storyboard for a Geospatial AR experience using Adobe Aero begins with a clear vision of how the digital overlays interact with the real-world locations.

Left Field Labs started by mapping out key geographical points at the Las Vegas Convention Center location where the Google booth was going to stand, integrating physical and digital elements along the way. Each scene sketched in the storyboard illustrated how virtual objects and real-world environments would interplay, ensuring that user interactions and movements felt natural and intuitive.


“Being able to pin content to a location that’s mapped by Google and use Photorealistic 3D Tiles in Google’s Geospatial Creator provided incredible freedom when choosing how the experience would move around the environment. It gave us the flexibility to create the best flow possible.” 
– Chris Wnuk, Technical Director at Left Field Labs

Early on in the storyboarding process, we decided that the virtual 3D Android Bot would act as the guide. Users could follow the Bot around the venue by turning in 360° while staying at the same vantage point. This allowed us to design the interactive experience, and each element in it, for the right perspective from where the user would be standing, and give them a full look around the Google Booth and surrounding Google experiences, like the Monorail or Sphere.

The storyboard not only depicted the AR elements but also considered user pathways, sightlines, and environmental factors like time of day, occlusion, and overall layout of the AR content around the Booth and surrounding environment.

We aimed to connect the attendees with engaging, helpful, and delightful content, helping them visually navigate Google Booth at CES.

User experience and interactivity

When designing for AR, we have learned that user interactivity and ensuring that the experience has both helpful and delightful elements are key. Across the experience, we added multiple interactions that allowed users to explore different demo stations in the Booth, get navigation via Google Maps for the Monorail and shuttles, and interact with the Android Bot directly.

The Android brand team and Left Field Labs created the Android character to be both simple and expressive, showing playfulness and contextual understanding of the environment to delight users while managing the strain on users’ devices. Taking an agile approach, the team iterated on a wide range of both Android and iOS mobile devices to ensure smooth performance across different smartphones, form factors such as foldables, as well as operating system versions, making the AR experience accessible and enjoyable to the widest audience.

testing content in Adobe Aero
With Geospatial Creator in Adobe Aero, we were able to ensure that 3D content would be accurate to specific locations throughout the development process.

Testing the experience

We consistently iterated on the interactive elements based on location testing. We performed two location tests: first, in the middle of the design phase, which helped us validate the performance of the Visual Positioning Service (VPS) at the Las Vegas Convention Center; second, at the end of the design phase and a few days before CES, which further validated the placement of the 3D content and enabled us to make any final adjustments once the Google booth structure was built on site.


“It was really nice to never worry about deploying. The tracking on physical objects and quickness of localization was some of the best I’ve seen!” 
– Devin Thompson, Associate Technical Director at Left Field Labs

Attendee Experience

When attendees came to the Google Booth, they saw a sign with the QR code to enter the AR experience. We positioned the sign at the best vantage point at the booth, ensuring that people had enough space around them to scan with their device and engage in the AR experience.

Sign with QR code to scan for entry at the Google Booth, Las Vegas Convention Center
By scanning a QR code, attendees entered directly into the experience and saw the virtual Android Bot pop up behind the Las Vegas Convention Center, guiding them through the full AR experience.

Attendees enjoyed seeing the Android Bot take over the Las Vegas Convention Center. Upon initializing the AR experience, the Bot revealed a Generative AI wallpaper scene right inside of a 3D view of the building, all while performing skateboarding tricks at the edge of the building’s facade.

Moving image of GenAI Wallpaper scene
With Geospatial Creator, it was possible for us to “replace” the facade of the Las Vegas Convention Center, revealing a playful scene where the Android Bot highlighted the depth and occlusion capabilities of the technology while showcasing a Generative AI Wallpaper demo.

Many people also called out the usefulness of seeing location-based AR content with contextual information, like navigation through Google Maps, embedded into interesting locations around the Booth. Interactive panels overlaid around the Booth then introduced the key physical demos located at each station. Attendees could quickly scan the different themes and features demoed, orient themselves around the Booth, and decide which area they wanted to visit first.


“I loved the experience! Maps and AR make so much sense together. I found it super helpful seeing what demos are in each booth, right on top of the booth, as well as the links to navigation. I could see using this beyond CES as well!” 
– CES Attendee
Moving image Booth navigation
The Android Bot helped attendees visually understand the different areas and demos at the Google Booth, helping them decide what they wanted to go see first.

Of the attendees we spoke to, over half engaged with the full experience. They were able to skip parts of the experience that felt less relevant to them and focus only on the interactions that added value. Overall, we learned that most people liked seeing a mix of delightful and helpful content, and they felt excited to explore the Booth further with other demos.

Moving image of people navigating augmented reality at CES
Many attendees engaged with the full AR experience to learn more about the Google Booth at CES.

Photo of Shahram Izadi watching a demonstration of the full Geospatial AR experience at CES.
Shahram Izadi, Google’s VP and GM, AR/XR, watching a demonstration of the full Geospatial AR experience at CES.

Location-based, AR experiences can transform event experiences for attendees who desire more ways to discover and engage with exhibitors at events. This trend underscores a broader shift in consumer expectations for a more immersive and interactive world around them and the blurring lines between online and offline experiences. At events like CES, AR content can offer a more immersive and personalized experience that not only entertains but also educates and connects attendees in meaningful ways.

To hear the latest updates about Google AR, Geospatial Creator, and more, follow us on LinkedIn (@GoogleARVR) and X (@GoogleARVR). Plus, visit our ARCore and Geospatial Creator websites to learn how to get started building with Google’s AR technology.


1Available on select devices and may depend on regional availability and user settings.

People of AI – Season 3

Posted by Ashley Oldacre

If you are joining us for the first time, you can binge listen to Seasons 1 and 2 wherever you get your podcasts.

We are back for another season of People of AI with a new lineup of incredible guests! I am so excited to continue co-hosting with Luiz Gustavo Martins as we meet inspiring people with interesting stories in the field of Artificial Intelligence.

Last season we focused on the big shift in technology spurred on by Generative AI. Fast forward 12 months, with the launch of multimodal models, we are at an interesting point in history.

In Season 3, we will continue to uncover our guests' personal and professional journeys into the field of AI, highlighting the important work and products they are focusing on. At the same time, we want to dig deeper into the societal implications of what our guests create. We will ask questions to understand how they are leveraging AI to solve problems and create new experiences, while also looking to understand what challenges they may face and what potential this technology has for both good and bad. We want to hold both truths to light through conversations with our guests. All this with the goal of aligning our technology with the public narrative and painting a realistic picture of how this technology is being used, the amazing things we can do with it, and the right questions to ask to make sure it is used safely and responsibly.

Starting today, we will release one new episode of Season 3 per week, alternating video and audio. Listen to the first episode on the People of AI site or wherever you get your podcasts.

  • Episode 1: meet Adrit Rao, a 16-year-old high school student, app developer, and research intern at Stanford University. We talk about app development and how learning about TensorFlow enabled him to create life-changing apps in healthcare.
  • Episode 2: meet Indira Negi, a product and tech executive investing in medical devices, AI, and digital health at the Bill and Melinda Gates Foundation, as we learn about the latest investments in AI and healthcare.
  • Episode 3: meet Tris Warkentin, Director of Product Management at Google DeepMind, as we talk about the exciting new launches of Google’s latest Large Language Models.
  • Episode 4: meet Kathleen Kenealy, Senior Software Engineer at Google DeepMind, as we learn about the engineering genius behind Google’s latest Large Language Model launches.
  • Episode 5: meet Jeanine Banks, Vice President and General Manager of Google Developer X and Head of Developer Relations. Join us as we learn about Google’s latest AI innovations and how they will change the developer landscape.
  • Episode 6: meet François Chollet, creator of Keras and Senior Software Engineer and AI researcher at Google. Join us as we learn about Google’s latest AI innovations and how they will change the developer landscape.
  • Episode 7: meet Chansung Park, Google Developer Expert and researcher, as we talk about the importance of building and planning for Large Language Model infrastructure.
  • Episode 8: meet Fergus Hurley and Nia Castelly, co-founders of Checks, a privacy platform for mobile app developers that helps create a safer digital ecosystem by simplifying the path to privacy compliance for development teams and the apps they’re building.
  • Episode 9: meet Sam Sepah and Thad Starner, as they talk about leveraging the power of Generative AI to unlock sign language capabilities.

    Listen now to the first episode of Season 3. We can’t wait to share the stories of these exceptional People of AI with you!


    This podcast is sponsored by Google. Any remarks made by the speakers are their own and are not endorsed by Google.

    How recommerce startup Beni uses AI to help you shop secondhand

    Posted by Lillian Chen – Global Brand and Content Marketing Manager, Google Accelerator Programs

    Sarah Pinner’s passion to reduce waste began as a child when she would reach over and turn off her sibling’s water when they were brushing their teeth. This passion has fueled her throughout her career, from joining zero-waste grocery startup Imperfect Foods to co-founding Beni, an AI-powered browser extension that aggregates and recommends resale options while users shop their favorite brands. Together with her co-founder and Beni CTO Celine Lightfoot, Sarah built Beni to make online apparel resale accessible to everyday shoppers in order to accelerate the circular economy and reduce the burden of fashion on the planet.

Sarah explains how the platform helps connect shoppers to secondhand clothing: “Let’s say you’re looking at a Nike shoe. While on the Nike site, Beni pulls resale listings for that same shoe from over 40 marketplaces like Poshmark or eBay or TheRealReal. Users can simply buy the resale version instead of new to save money and purchase more sustainably. On average, Beni users save about 55% from the new item, and it’s also a lot more sustainable to buy the item secondhand.”

    Beni was one of the first companies in the recommerce platform software space, and the competitive landscape is growing. “The more recommerce platforms the better, but Beni is ahead in terms of our partnerships and access to data as well as the ability to search across data,” says Sarah.


    How Beni Uses AI

AI helps Beni ingest the data feeds from its 40+ partnerships into Beni’s database so it can surface the most relevant resale items to the shopper. For example, when Beni receives eBay’s feed for a product search, there may be 100,000 different sizes. The team has trained the Beni model to normalize sizing data. That’s one piece of their categorization.
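Beni’s actual models aren’t public, so purely as an illustration of what normalizing sizing data can involve, a rule-based stand-in might map each marketplace’s size labels onto one canonical scale (the mapping table and names below are hypothetical, not Beni’s trained model):

// Hypothetical, rule-based stand-in for Beni's trained size-normalization model:
// map assorted marketplace size labels onto one canonical scale.
const SIZE_ALIASES = {
  "extra small": "XS", "xs": "XS",
  "small": "S", "s": "S",
  "medium": "M", "m": "M", "med": "M",
  "large": "L", "l": "L",
  "extra large": "XL", "xl": "XL",
};

function normalizeSize(rawSize) {
  const key = rawSize.trim().toLowerCase().replace(/\./g, "");
  // Fall back to the original label when no alias is known (e.g. "EU 38").
  return SIZE_ALIASES[key] ?? rawSize.trim();
}

// normalizeSize(" Med. ") -> "M"; normalizeSize("EU 38") -> "EU 38"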

    “When we first started Beni, the intention wasn’t to start a company. It was to solve a problem, and AI has been a great tool to be able to do that,” says Sarah.


    Participating in Google for Startups Accelerator: Circular Economy

Beni’s product was built using Google technology: it is hosted on Google Cloud and utilizes Vision API Product Search, Vertex AI, BigQuery, and the Chrome Web Store.

    When they heard about the Google for Startups Accelerator: Circular Economy program, it seemed like the perfect fit. “Having been in the circular economy space, and being a software business already using a plethora of Google products, and having a Google Chrome extension - getting plugged into the Google world gave us great insights about very niche questions that are very hard to find online,” says Sarah.

    As an affiliate business in resale, Beni’s revenue per transaction is low—a challenge for a business model that requires scale. The Beni team worked one-on-one with Google mentors to best use Google tools in a cost-effective way. Keeping search results relevant is a core piece of the zero-waste model. “Being plugged in and being able to work through ways to improve that relevancy and that reliability with the people in Google who know how to build Google Chrome extensions, know how to use the AI tools on the backend, and deeply understand Search is super helpful.” The Google for Startups Accelerator: Circular Economy program also educated the team in how to selectively use AI tools such as Google’s Vision API Product Search versus building their own tech in-house.

    “Having direct access to people at Google was really key for our development and sophisticated use of Google tools. And being a part of a cohort of other circular economy businesses was phenomenal for building connections in the same space,” says Sarah.

    Google for Startups Accelerator support extended beyond tech. A program highlight for Sarah was a UX writing deep dive specifically for sustainability. “It showed us all this amazing, tangible research that Google has done about what is actually effective in terms of communicating around sustainability to drive behavior change,” said Sarah. “You can’t shame people into doing things. The way in which you communicate is really important in terms of if people will actually make a change or be receptive.”

    Additionally, the new connections made with other circular economy startups and experts in their space was a huge benefit of participating in Google for Startups Accelerator. Mentorship, in particular, provided product-changing value. Google technical mentors shared advice that had a huge impact on the decision for Beni to move from utilizing Vision API Product Search to their own reverse image search. “Our mentors guided us to shift a core part of our technology. It was a big decision and was one of the biggest pieces of mentorship that helped drive us forward. This was a prime example of how the Google for Startups Accelerator program is truly here to support us in building the best products,” says Sarah.


    What’s next for Beni

Beni’s mission is straightforward: ease the burden for shoppers of finding and buying items secondhand, so they can bring new people into resale and make resale the new norm.

Additionally, Beni continues to be built out as a search platform for secondhand clothing. Beni offers its Chrome extension on desktop and mobile, and will add a searchable interface. In addition to building out the platform further, Beni is looking at how it can support other e-commerce platforms and integrate resale into their offerings.

    Learn about how to get involved in Google accelerator programs here.

    Carbon Limit’s concrete technology is saving the environment using AI

    Posted by Lillian Chen – Global Brand and Content Marketing Manager, Google Accelerator Programs

Located in Boca Raton, Carbon Limit aims to decarbonize the industry and take part in saving, protecting, and healing the environment. Cofounder Tim Sperry explains that for him and his cofounders Oro Padron and Christina Stavridi, the mission is personal. “I’ve lost family members [to polluted air]. Oro has his own story, Christina has her own story, and our other core team member Angel just had kids. All of us have our own connection to our mission. And with that, we've developed a really strong company culture,” he says.

    Today, Carbon Limit is evolving to create sustainable solutions for the built environment. Their flagship product, CaptureCrete, is an additive that gives concrete the ability to capture and store CO2 directly from the air.

    Carbon Limit’s initial prototype — a portable shipping container fitted with solar panels, filtered media, and intake fans — was a direct air capture system. With a business model that was dependent on tax credits and carbon credits, the team decided to pivot. “We took our original technology, which was always meant to capture CO2 to store in concrete as a permanent storage solution to CO2 in the air, and turned that into concrete technology,” explains Tim. “We’re lowering the carbon footprint of concrete projects and problems, and providing the ability to generate valuable carbon credits. It actually pays to use our technology: you’re quantifiably lowering the carbon footprint and improving the environment, and you can make money from these carbon credits.”


    How Carbon Limit uses AI

    Combating climate change is a race against time, as cofounder and CMO Oro explains: “We are in an industry that moves at a pace that when technology catches up, sometimes it’s too late.”

    “We have found that AI actually is not eliminating, it is creating—it is letting our own people discover things about themselves and possibilities that they didn’t know about,” says Oro. “We embrace AI because we are embracing the future, and we strive to be pioneers.”

    Artificial intelligence also allows for transparency in a space that can become congested by unreliable data. “We’re developing tools, specifically the digital MRV, which stands for measurement, reporting, and verification of carbon credits,” says Tim. “There is bad press that there’s a lot of fake or unverified carbon credits being sold, generated, or created.” AI gives real-time, real-world data, exposure, and quantification of the carbon credits. Carbon Limit is generating carbon credits with hard tech, bringing trust into tech.


    How Carbon Limit uses Google technology

    Carbon Limit is a team of developers, programmers, and data scientists working across multiple operating systems, so they needed a centralized system for collaborating. “Google Workspace has allowed us to build our own CRMs with Google Sheets and Google Docs, which we’ve found to be the easiest way to onboard quickly. Google has been an amazing tool for us to communicate internally.” Christina adds, “We have a small but diverse team with ages that vary. Not every single team member is used to using the same tools, so the way Oro has onboarded the team and utilized these tools in a customizable way where they’re easily adoptable and used by every single team member to optimize our work has been super beneficial.”

Additionally, the Carbon Limit team uses Google data for training their CO2-related models, and Google Colab to train them. “We have some models that were made in Python, but utilizing Google Cloud has helped us predict models faster,” says Oro.


    Participating in Google for Startups Accelerator: Climate Change

    Before Carbon Limit started the Google for Startups Accelerator: Climate Change program, the Carbon Limit team considered integrating artificial intelligence (AI) and machine learning (ML) into their process but wanted to ensure that they were making the right decision. With Google mentorship and support, they went full force with AI and ML algorithms. “Accelerator: Climate Change helped us realize exactly what we needed to do,” says Oro.

    Participating in the program also gave Carbon Limit access to resources that helped enhance their SEO. “We learned how to increment our backlinks and how to improve performance, which has been extremely helpful to put us on the map. Our whole backbone has been built thanks to Google Workspace,” says Oro.

“The Google for Startups Accelerator program gave us valuable resources and guidance on what we can do, how we can do it, and what not to do,” says Tim. “The mentorship and learning from people who developed the technology, use the technology, and work with it every day was invaluable for us.” Christina adds, “The mentors also helped us refine our pitch when communicating our solution on different platforms. That was very useful to understand how to speak to different customers and investors.”

    The program also led to a new client for Carbon Limit: Google. “That was critical because with Google as an early adopter, that helped us build a significant amount of credibility and validation,” Tim tells us.


    What’s next for Carbon Limit

Looking ahead, Carbon Limit will be launching a new technology that can be used in data centers to mitigate electricity use as well as reduce and remove CO2 pollution.

    “We went from a carbon capture solution to sustainable solutions because we wanted to go even bigger,” says Tim. “We want to inspire others to do what we’re doing and help create more awareness and a more environmentally friendly world.”

    Tim shares, “I love what I do. I love to be able to invent something that didn’t exist. But more importantly, it helps protect my family, my loved ones, future generations, and the environment. And I get to do it with this amazing group of people at Carbon Limit.”

    Learn about how to get involved in Google accelerator programs here.

    Introducing Android emulators, iOS simulators, and other product updates from Project IDX

    Posted by the IDX team

    Six months ago, we launched Project IDX, an experimental, cloud-based workspace for full-stack, multiplatform software development. We built Project IDX to simplify and streamline the developer workflow, aiming to reduce the sea of complexities traditionally associated with app development. It certainly seems like we've piqued your interest, and we love seeing what IDX has helped you build.

    For example, we recently learned about Tanaki, an AI-enhanced content creation app built using Project IDX:

Image of content creation app Tanaki on a mobile device in the foreground, with coding in Project IDX on a computer screen in the background.

Pasquale D’Silva, one of the developers who built Tanaki, said:

    "Using the IDX shared workspace to build Tanaki has been so fun. It allows our remote team of imagineers to build together in one place. It is a magic collaboration portal!"

Developers at Google have also been using IDX internally to help speed up development across various projects. One example is the Firebase Blog, where the full authoring, development, and deployment of the Astro-powered project is handled using IDX:

    Screen grab of The Firebase Blog on a computer

    Another interesting project leveraging IDX’s extensibility model is Malloy, a new open-source data language available as a VS Code extension that operates against databases like BigQuery:

    Screen grab of Malloy in Project IDX

    Lloyd Tabb, a Distinguished Software Engineer at Google, told us:

    “I use IDX with the Malloy project. I often have several different data projects going simultaneously and IDX lets me quickly spin up an instance to solve a problem and it is trivial to configure."

    If you want to share what IDX has helped you build, use the #ProjectIDX tag on X.


    What’s new in IDX?

    In addition to seeing how you’re using IDX, a key part of building Project IDX is your feedback, so we’ve continued to roll out features for you to test. We're excited to share the latest updates we've implemented to expedite and streamline multiplatform app development, so you can deliver with speed, ease and quality.


    Preview your app directly in IDX with our iOS simulator and Android emulator

    We’re bringing the iOS Simulator and Android Emulator to the browser. Whether you’re building a Flutter or web app, Project IDX now allows you to preview your applications without having to leave your workspace. When you use a Flutter or web template, Project IDX intelligently loads the right preview environment for your application — Safari mobile and Chrome for web templates, or Android, iOS, and Chrome for Flutter templates.

    Screen grab of an animation project in Project IDX

    IDX’s web and Android emulators allow you to develop, test, and debug directly from your workspace, consolidating your multi-step, multiplatform process into one place. With iOS simulation you can spot-check your app's layout and behavior while you work. This feature is still experimental, so be sure to test it out and send us feedback.


    Get started fast with a rich library of project templates

    Four of our top ten feature requests have been to support more templates, so we’re pleased to share that we’ve added new templates for Astro, Go, Python/Flask, Qwik, Lit, Preact, Solid.js, and Node.js. Use these templates to jump right into your project so you can spend less time setting up and more time creating.

    Preview of template gallery in Project IDX
    Check out our new and improved template gallery

Of course, you can still import your own repo from GitHub or directly from your local files, or you can choose your own setup using a custom Nix environment.


    Quickly build and customize your IDX workspace with improvements to Nix

    .idx/dev.nix

    IDX uses Nix to define the environment configuration for each workspace to give you flexibility and extensibility in IDX – even our templates and previews are configured using Nix to ensure they’re working correctly inside IDX. We’re continuously working on Nix improvements to help boost your productivity, so now you can:

    • Customize IDX starter templates easily by leveraging Nix extensibility.
    • Reduce the likelihood of errors and write code more efficiently with Nix file editing, including support for syntax highlighting, error detection, and suggested code completions.
    • Recover from broken configurations quickly and avoid unnecessary rebuild attempts with major improvements to our environment customization workflow, including seamless environment rebuilds and troubleshooting.

    Easily build, test, and deploy apps with additional new IDX features and resources

    image showing backend ports and workspace tasks in IDX
    • Auto-detect network ports needed for applications or services and adjust the firewall settings to permit ingress and egress without any additional configuration on your end.
    • Instantly run command-line tools, scripts, and utilities directly within your workspace without the need to install them locally on your machine.
    • Simplify the process of working with Docker containers and images directly from the development environment by enabling Docker in your dev.nix file.

    AI launched in 15 new regions


    We’ve launched our AI capabilities in the following 15 countries: India, Australia, Israel, Brazil, Mexico, Colombia, Argentina, Peru, Chile, Singapore, Bangladesh, Pakistan, Canada, Japan, and South Korea. More countries will be enabled with AI access soon – indicate your interest for AI expansion in this feature tracking post and stay tuned for more AI updates.


    Improving together

    We're constantly working on adding new capabilities to help you do higher quality work, more efficiently, with less friction. We’ve addressed dozens of your feature requests and fixed a multitude of bugs you flagged for us, so thank you for your continued support and engagement – please keep the feedback coming by filing bugs and feature requests.

    For walkthroughs and more information on all the features mentioned above, check out our documentation page. If you haven’t already, visit our website to sign up to try Project IDX and join us on our journey. Also, be sure to check out our new Project IDX Blog for the latest product announcements and updates from the team.

    We can’t wait to see what you create with Project IDX!

    How it’s Made – Exploring AI x Learning through ShiffBot, an AI experiment powered by the Gemini API

    Posted by Jasmin Rubinovitz, AI Researcher

    Google Lab Sessions is a series of experimental collaborations with innovators. In this session, we partnered with beloved creative coding educator and YouTube creator Daniel Shiffman. Together, we explored some of the ways AI, and specifically the Gemini API, could provide value to teachers and students during the learning process.

Dan Shiffman started out teaching programming courses at NYU ITP and later created his YouTube channel The Coding Train, making his content available to a wider audience. Learning to code can be challenging; sometimes even small obstacles can be hard to overcome when you are on your own. So, together with Dan, we asked: could we complement his teaching even further by creating an AI-powered tool that can help students while they are actually coding, in their coding environment?

Dan uses the wonderful p5.js JavaScript library and its accessible editor to teach code. So we set out to create an experimental Chrome extension for the editor that brings Dan’s teaching style, as well as his various online resources, into the coding environment itself.

In this post, we'll share how we used the Gemini API to craft ShiffBot with Dan. We're hoping that some of the things we learned along the way will inspire you to create and build your own ideas.

To learn more about ShiffBot, visit shiffbot.withgoogle.com.

    As we started defining and tinkering with what this chatbot might be, we found ourselves faced with two key questions:

    1. How can ShiffBot inspire curiosity, exploration, and creative expression in the same way that Dan does in his classes and videos?
2. How can we surface the variety of creative-coding approaches and the deep knowledge of Dan and the community?

Let’s take a look at how we approached these questions by combining the Gemini API’s capabilities: prompt engineering for Dan’s unique teaching style, alongside embeddings and semantic retrieval over Dan’s collection of educational content.


    Tone and delivery: putting the “Shiff” in “ShiffBot”

    A text prompt is a thoughtfully designed textual sequence that is used to prime a Large Language Model (LLM) to generate text in a certain way. Like many AI applications, engineering the right prompt was a big part of sculpting the experience.

    Whenever a user asks ShiffBot a question, a prompt is constructed in real time from a few different parts; some are static and some are dynamically generated alongside the question.

    ShiffBot prompt building blocks
    ShiffBot prompt building blocks (click to enlarge)

    The first part of the prompt is static and always the same. We worked closely with Dan to phrase it and test many texts, instructions and techniques. We used Google AI Studio, a free web-based developer tool, to rapidly test multiple prompts and potential conversations with ShiffBot.

    ShiffBot’s prompt starts with setting the bot persona and defining some instructions and goals for it to follow. The hope was to both create continuity for Dan’s unique energy, as seen in his videos, and also adhere to the teaching principles that his students and fans adore.

    We were hoping that ShiffBot could provide encouragement, guidance and access to relevant high-quality resources. And, specifically, do it without simply providing the answer, but rather help students discover their own answers (as there can be more than one).

    The instructions draw from Dan’s teaching style by including sentences like “ask the user questions” because that’s what Dan is doing in the classroom. This is a part of the persona / instructions part of the prompt:

    You are a ShiffBot, a chat bot embedded in the p5.js web editor that can help users while they learn creative coding. You are based on Daniel Shiffman's personality and The Coding Train YouTube channel. You are playful, silly, friendly, and educational. You like to make fun of yourself and your mission is to support the creative coding process and help the user feel less alone while coding. You love humans, are fascinated by them, and want to learn more about humans. You just LOVE to help them with their code, it brings you extraordinary happiness...

    The next piece of the prompt utilizes another capability of LLMs called few-shot learning. It means that with just a small number of examples, the model learns patterns and can then use those in new inputs. Practically, as part of the prompt, we provide a number of demonstrations of input and expected output.

    We worked with Dan to create a small set of such few-shot examples. These are pairs of <user-input><bot-response> where the <bot-response> is always in our desired ShiffBot style. It looks like this:

    User: How do I draw a circle? 

    ShiffBot: Oh! 🟣 🟢 🔵 That's what the `circle()` function is for! You can think of a function as a command: "I command you computer-machine to draw a circle!" You have to also specify where to draw the circle and how big (that's what "arguments" to a function are!), the p5.js reference which was written by actual, beautiful humans can tell you more! 

    --- 

User: My homework is to draw a self-portrait.

ShiffBot: Ooh! Self portraits are so fun! You could draw shapes to represent you, but you should be creative and think abstractly! What does a self-portrait mean to you? Try starting with `square()` or `circle()` and let me know how it goes! 🎨

    Our prompt includes 13 such pairs.

Another thing we noticed as we were working on the extension is that sometimes giving more context in the prompt helps. In the case of learning creative coding in p5.js, explaining some p5.js principles in the prompt guides the model to use those principles as it answers the user’s question. So we also include notes like:

    Use let rather than var or const.

    Everything we discussed up to now is static, meaning that it remains the same for every turn of the conversation between the user and ShiffBot. Now let's explore some of the parts that are constructed dynamically as the conversation evolves.


    Conversation and code context

Because ShiffBot is embedded inside the p5.js editor, it can “see” the current code the user is working on, so it can generate responses that are more personalized and relevant. We grab that information from the HTML DOM and append it to the prompt as well.

    the p5.js editor environment
    the p5.js editor environment (click to enlarge)
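As a rough sketch of that step (the selector below is an assumption; the real extension may read the code through the editor’s CodeMirror instance rather than the DOM text), a content script could pull the current sketch and fold it into the prompt:

// Content-script sketch: read the user's current sketch out of the p5.js web editor page.
// ".CodeMirror-code" is a hypothetical selector, not ShiffBot's actual implementation.
function getCurrentSketchCode() {
  const editorEl = document.querySelector(".CodeMirror-code");
  return editorEl ? editorEl.innerText : "";
}

// Append the code context to the static part of the prompt.
function withCodeContext(basePrompt) {
  const code = getCurrentSketchCode();
  return basePrompt + "\n\nThe user's current p5.js sketch:\n" + code + "\n";
}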

Then, the full conversation history is appended, e.g.:

ShiffBot: ☀️ Good morning! Is it morning? I wouldn't know. I'm just a robot. What are you coding today?

User: how do I draw a square?

    ShiffBot: Squares! I love squares! They are so stable and reliable. To draw a square, you can use the square() function. It takes four arguments: the x and y coordinates of the top-left corner, and the width and height of the square. For example, square(10, 10, 50, 50) would draw a square with its top-left corner at (10, 10) and a width and height of 50 pixels.

    User: how can I change the color?

    ShiffBot:

    We make sure to end with

    ShiffBot:

    So the model understands that it now needs to complete the next piece of the conversation by ShiffBot.
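Putting these pieces together, a simplified sketch of the real-time assembly might look like this (the strings are placeholders for the real persona, the 13 few-shot pairs, and the p5.js notes described above; this is an illustration, not ShiffBot’s exact code):

// Sketch of the prompt assembly: static persona + few-shot pairs + p5.js notes,
// then the dynamic parts (code context and conversation history),
// always ending with "ShiffBot:" so the model completes ShiffBot's next turn.
const PERSONA = "You are ShiffBot, a chat bot embedded in the p5.js web editor..."; // static
const FEW_SHOT_PAIRS = [
  { user: "How do I draw a circle?", bot: "Oh! That's what the `circle()` function is for! ..." },
  // ...13 pairs in total
];
const P5_NOTES = "Use let rather than var or const.";

function buildPrompt(codeContext, history, userMessage) {
  const fewShot = FEW_SHOT_PAIRS
    .map((p) => "User: " + p.user + "\n\nShiffBot: " + p.bot)
    .join("\n\n---\n\n");
  const turns = history
    .map((t) => (t.role === "user" ? "User: " : "ShiffBot: ") + t.text)
    .join("\n\n");
  return [
    PERSONA,
    fewShot,
    P5_NOTES,
    "The user's current sketch:\n" + codeContext,
    turns,
    "User: " + userMessage,
    "ShiffBot:", // cue the model to answer as ShiffBot
  ].join("\n\n");
}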


    Semantic Retrieval: grounding the experience in p5.js resources and Dan’s content

    Dan has created a lot of material over the years, including over 1,000 YouTube videos, books and code examples. We wanted to have ShiffBot surface these wonderful materials to learners at the right time. To do so, we used the Semantic Retrieval feature in the Gemini API, which allows you to create a corpus of text pieces, and then send it a query and get the texts in your corpus that are most relevant to your query. (Behind the scenes, it uses a cool thing called text embeddings; you can read more about embeddings here.) For ShiffBot we created corpuses from Dan’s content so that we could add relevant content pieces to the prompt as needed, or show them in the conversation with ShiffBot.


    Creating a Corpus of Videos

    In The Coding Train videos, Dan explains many concepts, from simple to advanced, and runs through coding challenges. Ideally ShiffBot could use and present the right video at the right time.

    The Semantic Retrieval in Gemini API allows users to create multiple corpuses. A corpus is built out of documents, and each document contains one or more chunks of text. Documents and chunks can also have metadata fields for filtering or storing more information.

    In Dan’s video corpus, each video is a document and the video url is saved as a metadata field along with the video title. The videos are split into chapters (manually by Dan as he uploads them to YouTube). We used each chapter as a chunk, with the text for each chunk being

    <videoTitle>

    <videoDescription>

    <chapterTitle>

    <transcriptText>

    We use the video title, the first line of the video description and chapter title to give a bit more context for the retrieval to work.
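For illustration, assembling one chunk’s text and metadata in this format could look like the sketch below (the field names are assumptions; the post doesn’t show the actual ingestion code):

// Build the text for one corpus chunk from a video chapter, following the
// videoTitle / first description line / chapterTitle / transcriptText layout.
function buildChunkText(video, chapter) {
  const firstDescriptionLine = video.description.split("\n")[0];
  return [video.title, firstDescriptionLine, chapter.title, chapter.transcript].join("\n\n");
}

// Metadata stored alongside the chunk so the right video can be linked in the conversation.
function buildChunkMetadata(video) {
  return { videoUrl: video.url, videoTitle: video.title };
}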

    This is an example of a chunk object that represents the R, G, B chapter in this video.

    1.4: Color - p5.js Tutorial


    In this video I discuss how color works: RGB color, fill(), stroke(), and transparency.


    Chapter 1: R, G, B


    R stands for red, g stands for green, b stands for blue. The way that you create a digital color is by mixing some amount of red, some amount of green, and some amount of blue. So that's that that's where I want to start. But that's the concept, how do I apply that concept to function names, and arguments of those functions? Well, actually, guess what? We have done that already. In here, there is a function that is talking about color. Background is a function that draws a solid color over the entire background of the canvas. And there is, somehow, 220 sprinkles of red, zero sprinkles of green, right? RGB, those are the arguments. And 200 sprinkles of blue. And when you sprinkle that amount of red, and that amount of blue, you get this pink. But let's just go with this. What if we take out all of the blue? You can see that's pretty red. What if I take out all of the red? Now it's black. What if I just put some really big numbers in here, like, just guess, like, 1,000? Look at that. Now we've got white, so all the colors all mixed together make white. That's weird, right? Because if you, like, worked with paint, and you were to mix, like, a whole lot of paint together, you get this, like, brown muddy color, get darker and darker. This is the way that the color mixing is working, here. It's, like, mixing light. So the analogy, here, is I have a red flashlight, a green flashlight, and a blue flashlight. And if I shine all those flashlights together in the same spot, they mix together. It's additive color, the more we add up all those colors, the brighter and brighter it gets. But, actually, this is kind of wrong, the fact that I'm putting 1,000 in here. So the idea, here, is we're sprinkling a certain amount of red, and a certain amount of green, and a certain amount of blue. And by the way, there are other ways to set color, but I'll get to that. This is not the only way, because some of you watching, are like, I heard something about HSB color. And there's all sorts of other ways to do it, but this is the fundamental, basic way. The amount that I can sprinkle has a range. No red, none more red, is zero. The maximum amount of red is 255. By the way, how many numbers are there between 0 and 255 if you keep the 0? 0, 1, 2, 3, 4-- it's 256. Again, we're back to this weird counting from zero thing. So there's 256 possibilities, 0 through 255. So, now, let's come back to this and see. All right, let's go back to zero, 0, 0, 0. Let's do 255, we can see that it's blue. Let's do 100,000, it's the same blue. So p5 is kind of smart enough to know when you call the background function, if you by accident put a number in there that's bigger than 255, just consider it 255. Now, you can customize those ranges for yourself, and there's reasons why you might want to do that. Again, I'm going to come back to that, you can look up the function color mode for how to do that. But let's just stay with the default, a red, a green, and a blue. So, I'm not really very talented visual design wise. So I'm not going to talk to you about how to pick beautiful colors that work well together. You're going to have that talent yourself, I bet. Or you might find some other resources. But this is how it works, RGB. One thing you might notice is, did you notice how when they were all zero, it was black, and they were all 255 it was white? What happens if I make them all, like, 100? It's, like, this gray color. 
When r equals g equals b, when the red, green, and blue values are all equal, this is something known as grayscale color.

When the user asks ShiffBot a question, the question is embedded into a numerical representation, and Gemini’s Semantic Retrieval feature is used to find the texts whose embeddings are closest to the question. Those relevant video transcripts and links are added to the prompt, so the model can use that information when generating an answer (and potentially add the video itself into the conversation).

    Semantic Retrieval Graph
    Semantic Retrieval Graph (click to enlarge)
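A minimal sketch of that query step, assuming the @google/generative-ai JavaScript SDK and a locally cached copy of the chunk embeddings. The hosted Semantic Retrieval feature does this ranking server-side, so treat the model name, response shapes, and cosine-similarity ranking below as assumptions rather than ShiffBot’s exact implementation:

import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const embedder = genAI.getGenerativeModel({ model: "embedding-001" }); // assumed model name

// Cosine similarity between two embedding vectors.
function cosine(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the chunks whose embeddings are closest to the user's question.
async function retrieveRelevantChunks(question, corpus, topK = 3) {
  const result = await embedder.embedContent(question);
  const queryVector = result.embedding.values;
  return corpus
    .map((chunk) => ({ chunk, score: cosine(queryVector, chunk.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((entry) => entry.chunk);
}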

    Creating a Corpus of Code Examples

We do the same with another corpus of p5.js examples written by Dan. To create the code examples corpus, we used Gemini and asked it to explain what each piece of code is doing. Those natural-language explanations are added as chunks to the corpus, so that when the user asks a question, we try to find matching descriptions of code examples. The URL of the p5.js sketch itself is saved in the metadata, so after retrieval the code, along with the sketch URL, is added to the prompt.

    To generate the textual description, Gemini was prompted with:

    The following is a p5.js sketch. Explain what this code is doing in a short simple way.

    code:

    ${sketchCode}
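A hedged sketch of how that description generation could be wired up with the @google/generative-ai SDK (the model name and response handling are assumptions, not the exact pipeline used for ShiffBot):

import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-pro" }); // assumed model name

// Ask Gemini for a short natural-language description of a p5.js sketch;
// the description then becomes a retrievable chunk in the code-examples corpus.
async function describeSketch(sketchCode) {
  const prompt =
    "The following is a p5.js sketch. Explain what this code is doing in a short simple way.\n\n" +
    "code:\n\n" + sketchCode;
  const result = await model.generateContent(prompt);
  return result.response.text();
}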


    Example for a code chunk:
    Text:
     

    Arrays - Color Palette

    This p5.js sketch creates a color palette visualization. It first defines an array of colors and sets up a canvas. Then, in the draw loop, it uses a for loop to iterate through the array of colors and display them as rectangles on the canvas. The rectangles are centered on the canvas and their size is determined by the value of the blockSize variable.

    The sketch also displays the red, green, and blue values of each color below each rectangle.

    Finally, it displays the name of the palette at the bottom of the canvas.

    Related video: 7.1: What is an array? - p5.js Tutorial - This video covers the basics on using arrays in JavaScript. What do they look like, how do they work, when should you use them?

    Moving image showing constructing the ShiffBot prompt
    Constructing the ShiffBot prompt (click to enlarge)

    Other ShiffBot Features Implemented with Gemini

    Beside the long prompt that is running the conversation, other smaller prompts are used to generate ShiffBot features.


    Seeding the conversation with content pre-generated by Gemini

ShiffBot greetings should be welcoming and fun. Ideally they make the user smile, so we started by thinking with Dan about what good ShiffBot greetings could be. After phrasing a few examples, we used Gemini to generate a bunch more, so we could have variety in the greetings. Those greetings go into the conversation history and seed it with a unique style, making ShiffBot feel fun and new every time you start a conversation. We did the same with the initial suggestion chips that show up when you start the conversation. When there’s no conversation context yet, it’s important to have some suggestions of what the user might ask. We pre-generated those to seed the conversation in an interesting and helpful way.


    Dynamically Generated Suggestion Chips

    Suggestion chips during the conversation should be relevant for what the user is currently trying to do. We have a prompt and a call to Gemini that are solely dedicated to generating the suggested questions chips. In this case, the model’s only task is to suggest followup questions for a given conversation. We also use the few-shot technique here (the same technique we used in the static part of the prompt described above, where we include a few examples for the model to learn from). This time the prompt includes some examples for good suggestions, so that the model could generalize to any conversation:

    Given a conversation between a user and an assistant in the p5js framework, suggest followup questions that the user could ask.

    Return up to 4 suggestions, separated by the ; sign.

    Avoid suggesting questions that the user already asked. The suggestions should only be related to creative coding and p5js.


    Examples:

    ShiffBot: Great idea! First, let's think about what in the sketch could be an object! What do you think?

    Suggestions: What does this code do?; What's wrong with my code?; Make it more readable please


    User: Help!

    ShiffBot: How can I help?

    Suggestions: Explain this code to me; Give me some ideas; Cleanup my code

    suggested response chips, generated by Gemini
    suggested response chips, generated by Gemini (click to enlarge)
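Because the prompt asks for up to four suggestions separated by the ; sign, turning the model’s reply into chips is a small parsing step; a sketch:

// Parse Gemini's reply ("question 1; question 2; question 3") into at most four chips.
function parseSuggestionChips(modelReply, maxChips = 4) {
  return modelReply
    .split(";")
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .slice(0, maxChips);
}

// parseSuggestionChips("What does this code do?; Give me some ideas")
//   -> ["What does this code do?", "Give me some ideas"]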

    Final thoughts and next steps

    ShiffBot is an example of how you can experiment with the Gemini API to build applications with tailored experiences for and with a community.

We found that the techniques above helped us bring out much of the experience that Dan had in mind for his students during our co-creation process. AI is a dynamic field and we’re sure your techniques will evolve with it, but hopefully they are helpful to you as a snapshot of our explorations and a starting point for your own. We are also excited for things to come, both in terms of Gemini and API tools that broaden human curiosity and creativity.

    For example, we’ve already started to explore how multimodality can help students show ShiffBot their work and the benefits that has on the learning process. We’re now learning how to weave it into the current experience and hope to share it soon.

    experimental exploration of multimodality in ShiffBot
    experimental exploration of multimodality in ShiffBot (click to enlarge)

    Whether for coding, writing and even thinking, creators play a crucial role in helping us imagine what these collaborations might look like. Our hope is that this Lab Session gives you a glimpse of what’s possible using the Gemini API, and inspires you to use Google’s AI offerings to bring your own ideas to life, in whatever your craft may be.

    HealthPulse AI Leverages MediaPipe to Increase Health Equity

    A guest post by Rouella Mendonca, AI Product Lead and Matt Brown, Machine Learning Engineer at Audere

    Please note that the information, uses, and applications expressed in the below post are solely those of our guest authors from Audere.


    About HealthPulse AI and its application in the real world

Preventable and treatable diseases like HIV, COVID-19, and malaria infect ~12 million people per year globally, with a disproportionate number of cases impacting already underserved and under-resourced communities1. Communicable and non-communicable diseases are impeding human development through their negative impact on education, income, life expectancy, and other health indicators2. Lack of access to timely, accurate, and affordable diagnostics and care is a key contributor to high mortality rates.

Due to their low cost and relative ease of use, ~1 billion rapid diagnostic tests (RDTs) are used globally per year, and that number is growing. However, there are challenges with RDT use.

    • Where RDT data is reported, results are hard to trust due to inflated case counts, lack of reported expected seasonal fluctuations, and non-adherence to treatment regimens.
    • They are used in decentralized care settings by those with limited or no training, increasing the risk of misadministration and misinterpretation of test results.

    HealthPulse AI, developed by a digital health non-profit Audere, leverages MediaPipe to address these issues by providing digital building blocks to increase trust in the world’s most widely used RDTs.

HealthPulse AI is a set of building blocks that can turn any digital solution into a rapid diagnostic test (RDT) reader. These building blocks solve prominent global health problems by improving rapid diagnostic test accuracy, reducing misadministration of tests, and expanding the availability of testing for conditions including malaria, COVID, and HIV in decentralized care settings. With just a low-end smartphone, HealthPulse AI improves the accuracy of rapid diagnostic test results while automatically digitizing data for surveillance, program reporting, and test validation. It provides AI-facilitated digital capture and result interpretation; quality, accessible digital use instructions for provider tests and self-tests; and standards-based real-time reporting of test results.

    These capabilities are available to local implementers, global NGOs, governments, and private sector pharmacies via a web service for use with chatbots, apps or server implementations; a mobile SDK for offline use in any mobile application; or directly through native Android and iOS apps.

    It enables innovative use cases such as quality-assured virtual care models, which enable stigma-free, convenient HIV home testing with linkage to education, prevention, and treatment options.

    HealthPulse AI Use Cases

    HealthPulse AI can substantially democratize access to timely, quality care in the private sector (e.g. pharmacies), in the public sector (e.g. clinics), in community programs (e.g. community health workers), and in self-testing use cases. Using only an RDT image captured on a low-end smartphone, HealthPulse AI can power virtual care models by providing valuable decision support and quality control to clinicians, especially in cases where lines may be faint and hard to detect with the human eye. In the private sector, it can automate and scale incentive programs so auditors only need to review automated alerts based on test anomalies; procedures which presently require human review of each incoming image and transaction. In community care programs, HealthPulse AI can be used as a training tool for health workers learning how to correctly administer and interpret tests. In the public sector, it can strengthen surveillance systems with real-time disease tracking and verification of results across all channels where care is delivered - enabling faster response and pandemic preparedness [3].


    HealthPulse AI algorithms

    HealthPulse AI provides a library of AI algorithms for the top RDTs for malaria, HIV, and COVID. Each algorithm is a collection of Computer Vision (CV) models that are trained using machine learning (ML) algorithms. From an image of an RDT, our algorithms can:

    • Flag image quality issues common on low-end phones (blurriness, over/underexposure)
    • Detect the RDT type
    • Interpret the test result

    Image Quality Assurance

    When capturing an image of an RDT, it is important to ensure that the image is interpretable by both humans and AI in order to power the use cases described above. Image quality issues are common, particularly when images are captured with low-end phones, in settings with poor lighting, or simply by users with shaky hands. As such, HealthPulse AI provides image quality assurance (IQA) to identify adversarial image conditions. IQA returns the concerns it detects, which can be used to ask users to retake the photo in real time. Without IQA, clients in telehealth use cases, for example, would have to retest due to uninterpretable images and expired RDT read windows. With just-in-time flagging of quality concerns, additional cost and treatment delays can be avoided. Examples of some adversarial images that IQA would flag are shown in Figure 1 below.

    Figure 1: Images of malaria, HIV and COVID tests that are dark, blurry, too bright, and too small.
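    While Audere’s IQA models are learned rather than hand-tuned, a minimal sketch of the kinds of checks involved might look like the following, assuming OpenCV and NumPy; the thresholds and concern labels here are illustrative only, not part of HealthPulse AI.

```python
import cv2
import numpy as np

# Illustrative thresholds; a real IQA component is trained, not hand-tuned.
BLUR_THRESHOLD = 100.0    # variance of the Laplacian below this suggests blur
DARK_THRESHOLD = 50       # mean brightness below this suggests underexposure
BRIGHT_THRESHOLD = 205    # mean brightness above this suggests overexposure

def check_image_quality(image_path: str) -> list[str]:
    """Return a list of quality concerns detected in an RDT photo."""
    concerns = []
    image = cv2.imread(image_path)
    if image is None:
        return ["unreadable_image"]

    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Blur check: low variance of the Laplacian means few sharp edges.
    if cv2.Laplacian(gray, cv2.CV_64F).var() < BLUR_THRESHOLD:
        concerns.append("blurry")

    # Exposure checks based on mean brightness.
    mean_brightness = float(np.mean(gray))
    if mean_brightness < DARK_THRESHOLD:
        concerns.append("too_dark")
    elif mean_brightness > BRIGHT_THRESHOLD:
        concerns.append("too_bright")

    return concerns

# If any concern is flagged, the host app can ask the user to retake the photo.
if check_image_quality("rdt_photo.jpg"):
    print("Please retake the photo in better lighting and hold the phone steady.")
```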

    Classification

    With just an image captured on a 5MP camera from the low-end smartphones commonly used in Africa, Southeast Asia, and Latin America, where a disproportionate disease burden exists, HealthPulse AI can identify a specific test (brand, disease), identify individual test lines, and provide an interpretation of the test. Our current library of AI algorithms supports many of the most commonly used, WHO-prequalified RDTs for malaria, HIV, and COVID-19. Our AI is condition-agnostic and can be easily extended to support any RDT for a range of communicable and non-communicable diseases (diabetes, influenza, tuberculosis, pregnancy, STIs, and more).

    HealthPulse AI is able to detect the type of RDT in the image (for supported RDTs that the model was trained for), detect the presence of lines, and return a classification for the particular test (e.g. positive, negative, invalid, uninterpretable). See Figure 2.

    Figure 2: Interpretation of a supported lateral flow rapid test.

    How and why we use MediaPipe

    Deploying HealthPulse AI in decentralized care settings with unstable infrastructure comes with a number of challenges. The first is a lack of reliable internet connectivity, which often requires our CV and ML algorithms to run locally. The second is that phones available in these settings are often very old, lacking the latest hardware (< 1 GB of RAM and comparable CPU specs) and running a variety of mobile platforms and OS versions (iOS, Android, Huawei), some of which no longer receive OS updates. This necessitates a platform-agnostic, highly efficient inference engine. MediaPipe’s out-of-the-box multi-platform support for image-focused machine learning processes makes it an efficient way to meet these needs.

    As a non-profit operating in cost-recovery mode, it was important that solutions:

    • have broad reach globally,
    • are low-lift to maintain, and
    • meet the needs of our target population for offline, low resource, performant use.

    Without needing to write a lot of glue code, HealthPulse AI can support Android, iOS, and cloud deployments using the same library built on MediaPipe.

    Our pipeline

    MediaPipe’s graph definitions allow us to build and iterate our inference pipeline on the fly. After a user submits a picture, the pipeline determines the RDT type, and attempts to classify the test result by passing the detected result-window crop of the RDT image to our classifier.

    For good human and AI interpretability, it is important to have good quality images. However, input images to the pipeline have a high level of variability we have little to no control over. Variability factors include (but are not limited to) varying image quality due to a range of smartphone camera features/megapixels/physical defects, decentralized testing settings which include differing and non-ideal lighting conditions, random orientations of the RDT cassettes, blurry and unfocused images, partial RDT images, and many other adversarial conditions that add challenges for the AI. As such, an important part of our solution is image quality assurance. Each image passes through a number of calculators geared towards highlighting quality concerns that may prevent the detector or classifier from doing its job accurately. The pipeline elevates these concerns to the host application, so an end-user can be requested in real-time to retake a photo when necessary. Since RDT results have a limited validity time (e.g. a time window specified by the RDT manufacturer for how long after processing a result can be accurately read), IQA is essential to ensure timely care and save costs. A high level flowchart of the pipeline is shown below in Figure 3.

    Figure 3: HealthPulse AI pipeline
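    Audere’s production graph and models are not public, but a rough sketch of this detect-then-classify flow using the MediaPipe Tasks Python API might look like the following; the model files, labels, and cropping logic are illustrative assumptions rather than the actual HealthPulse AI pipeline.

```python
import mediapipe as mp
import numpy as np
from mediapipe.tasks import python as mp_tasks
from mediapipe.tasks.python import vision

# Hypothetical model files standing in for an RDT detector and a result classifier.
detector = vision.ObjectDetector.create_from_options(
    vision.ObjectDetectorOptions(
        base_options=mp_tasks.BaseOptions(model_asset_path="rdt_detector.tflite"),
        score_threshold=0.5))
classifier = vision.ImageClassifier.create_from_options(
    vision.ImageClassifierOptions(
        base_options=mp_tasks.BaseOptions(model_asset_path="rdt_result_classifier.tflite")))

image = mp.Image.create_from_file("rdt_photo.jpg")

# Step 1: locate the RDT (and its result window) in the photo.
detections = detector.detect(image).detections
if not detections:
    print("No supported RDT detected; ask the user to retake the photo.")
else:
    box = detections[0].bounding_box
    # Step 2: crop the detected region and classify the test result.
    crop = np.ascontiguousarray(
        image.numpy_view()[box.origin_y:box.origin_y + box.height,
                           box.origin_x:box.origin_x + box.width])
    result = classifier.classify(mp.Image(image_format=mp.ImageFormat.SRGB, data=crop))
    top = result.classifications[0].categories[0]
    print(f"Interpretation: {top.category_name} ({top.score:.2f})")
```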

    Summary

    HealthPulse AI is designed to improve the quality and richness of testing programs and data in underserved communities that are disproportionately impacted by preventable communicable and non-communicable diseases.

    Towards this mission, MediaPipe plays a critical role by providing a platform that allows Audere to quickly iterate and support new rapid diagnostic tests. This is imperative as new rapid tests come to market regularly, and test availability for community and home use can change frequently. Additionally, the flexibility allows for lower overhead in maintaining the pipeline, which is crucial for cost-effective operations. This, in turn, reduces the cost of use for governments and organizations globally that provide services to people who need them most.

    HealthPulse AI offerings allow organizations and governments to benefit from new innovations in the diagnostics space with minimal overhead. This is an essential component of the primary health journey - to ensure that populations in under-resourced communities have access to timely, cost-effective, and efficacious care.


    About Audere

    Audere is a global digital health nonprofit developing AI based solutions to address important problems in health delivery by providing innovative, scalable, interconnected tools to advance health equity in underserved communities worldwide. We operate at the unique intersection of global health and high tech, creating advanced, accessible software that revolutionizes the detection, prevention, and treatment of diseases — such as malaria, COVID-19, and HIV. Our diverse team of passionate, innovative minds combines human-centered design, smartphone technology, artificial intelligence (AI), open standards, and the best of cloud-based services to empower innovators globally to deliver healthcare in new ways in low- and middle-income settings. Audere operates primarily in Africa with projects in Nigeria, Kenya, Côte d’Ivoire, Benin, Uganda, Zambia, South Africa, and Ethiopia.


    [1] WHO malaria fact sheets

    Accelerating startup growth through technology, expertise, and community

    Posted by Nivedita Kumari – Technical Anchor Mentor, Accelerator Program, and Prabhu Thiagarajan – Accelerator Success Mentor, Accelerator Program, Google for Startups Accelerator: Sustainable Development Goals

    This International Mentoring Day, we recognize that mentorship is a critical part of the startup journey. Google for Startups Accelerator programs provide founders and teams with the technology, expertise, and mentorship they need to grow and succeed. As program mentors, we have had the opportunity to engage with and empower many early-stage startups, helping them scale and grow.


    The Startup Challenge

    Although the startup ecosystem is rapidly expanding, success is rarely a smooth journey. On average, it takes startups two to three years to turn a profit, and fewer than 10% of startups that raise a seed round successfully raise a Series A investment. Even those that manage to secure funding still face other hurdles, such as driving organic growth, fundraising, building a brand, and expanding into new markets. Mentorship and access to networks have been proven time and again to make the critical difference for successful founders.

    To level the playing field for startup success, Google for Startups connects founders to the people, programs, and best practices they need to grow and scale their companies. Google for Startups Accelerator programs provide participants with hands-on mentorship and support from Googlers as well as experienced entrepreneurs and investors. These experts work directly with startups over the course of 10 weeks to provide tailored technology, product development, marketing, sales, and fundraising guidance.


    Success Story

    As Google for Startups Accelerator program mentors, we had the opportunity to partner closely with founders from around the world through the 2023 accelerator cohorts, including those focused on Cloud and Climate Change. One of these startups, from the Cloud cohort, was RealKey, an Automated Loan Processing (ALP) SaaS platform.

    RealKey automates document collection and review processes and centralizes communication to reduce touch points with underwriting. Through the Google for Startups Cloud Program, RealKey was able to accelerate Google AI-based document processing and loan process automation to help create a clean loan submission process. Google for Startups helped RealKey reduce loan processing time and frustration for all parties involved.

    "Google's Lending DocAI service enabled our platform to include document processing where we classify documents and run complex validation algorithms to ensure that a loan package meets all lending criteria. This is typically a manual process and our platform saves our clients valuable time and labor.” 
     – Christopher Hussain, Founder & CEO, RealKey
     
    Over the course of the 10-week program, the RealKey team worked closely with us to develop and track their program Objectives and Key Results (OKRs). Through a series of tailored technical deep dives, mentor-led product and program workshops, and pairing with relevant experts from Google and the industry, RealKey was able to solve several business and technical challenges and accelerate their results. “With Google for Startups mentor support, we accomplished goals that we budgeted would take substantially longer,” says Christopher.

    The Google for Startups Accelerator program provides startups with the resources and support they need to thrive in a competitive landscape. Through its comprehensive program, startups gain access to funding, technical expertise, networking opportunities, and mentorship from Google experts, enabling them to overcome technical challenges, develop effective go-to-market strategies, and accelerate their growth. With that guidance and support, startups can navigate the complexities of developing and commercializing their products, effectively reach their target audience, and establish themselves as leaders in their field.


    Next Steps for Founders

    If you're a startup founder, Google for Startups Accelerator programs are a great way to get the help you need to grow your business and achieve your goals. Applications are now open for Google for Startups Accelerator: Women Founders and Black Founders cohorts in North America. We encourage applications from U.S. and Canadian headquartered technology startups until February 1, 2024, with the 10-week programs commencing in March. Learn more and register here.

    In addition to accelerator programs, Google for Startups offers a wide range of programs and initiatives to help startups at every stage of their journey. Whether you're just starting out or ready to scale, Google for Startups can help you connect with the right technology, expertise, and community to grow your business. Explore the best Google for Startups offerings for you and your team here.

    Applications are open for the Google for Startups Accelerator Black Founders and Women Founders programs: startup.google.com

    #WeArePlay | Learn how a childhood experience with an earthquake shaped Álvaro’s entrepreneurial journey

    Posted by Leticia Lago – Developer Marketing

    Being trapped inside a house following a major earthquake as a child motivated Álvaro to research and improve the outcomes of destructive, large-scale quakes in Mexico. Using SkyAlert technology, sensors detect and report warnings of incoming earthquakes, giving people valuable time to prepare and get to safety.

    Álvaro shared his story in our latest film for #WeArePlay, which spotlights the founders and creatives behind inspiring apps and games on Google Play. We caught up with him to find out his motivations for SkyAlert, the impact the app has had, and his plans for the future.


    What was the inspiration behind SkyAlert?

    Being in Colima near the epicenter of a massive earthquake as a kid had a huge impact on me. I remember feeling powerless to nature and very vulnerable watching everything falling apart around me. I was struck by how quick and smart you had to be to get to a safe place in time. I remember hugging my family once it was over and looking towards the sea to watch out for an impending tsunami – which fortunately didn’t hit my region badly. It was at this moment that I became determined to find out what had caused this catastrophe and what could be done to prevent it being so destructive another time.

    Through my research, I learned that Mexico sits on five tectonic plates and, as a result, it is particularly prone to earthquakes. In fact, there've been seven major quakes in the last seven years, with hundreds losing their lives. Reducing the threat of earthquakes is my number one goal and the motivation behind SkyAlert. The technology we’ve developed can detect the warning signs of an earthquake early on, deliver alerts to vulnerable people and hopefully save lives.


    How does SkyAlert work exactly?

    SkyAlert collects data from a network of sensors and translates that information into alerts. People can put their zip code in order to filter updates for their locality. We’re constantly investing in getting the most reliable and fast technology available so we can make the service as timely and effective as possible.


    Did you always imagine you’d be an entrepreneur?

    Since I was a kid I knew I wanted to be an entrepreneur. This was inspired by my grandfather who ran a large candy company with factories all over Mexico. However, what I really wanted, beyond just running my own company, was to have a positive social impact and change lives for the better: a feat I feel proud to have achieved with SkyAlert.


    How is Google Play helping your app to grow?

    Being on Google Play helps us to reach the maximum number of people. We’ve achieved some amazing numbers in the last 10 years through Google Play, with over 7 million downloads. With 35% of our income coming from Google Play, this reach has helped us invest in new technologies and sensors.

    We also often receive advice from Google Play and they invite us to meetings to tell us how to do better and how to make the most of the platform. Google Play is a close partner that we feel really takes care of us.


    What impact has SkyAlert had on the people of Mexico?

    The biggest advantage of SkyAlert is that it can help them prepare for an earthquake. In 2017, we were able to notify people of a massive quake 12 seconds before it hit Mexico City. At least with those few seconds, many were able to get themselves to a safe place. Similarly, with a large earthquake in Oaxaca, we were able to give a warning of over a minute, allowing teachers to get students in schools away from infrastructure – saving kids’ lives.

    Also, many find having SkyAlert on their phone gives them peace of mind, knowing they’ll have some warning before an earthquake strikes. This can be very reassuring.


    What does the future look like for SkyAlert?

    We’re working hard to expand our services into new risk areas like flooding, storms and wildfires. The hope is to become a global company that can deliver alerts on a variety of natural phenomena in countries around the world.


    Read more about Álvaro and other inspiring app and game founders featured in #WeArePlay.




    YouTube Ads Creative Analysis

    Posted by Brian Craft, Satish Shreenivasa, Huikun Zhang, Manisha Arora and Paul Cubre – gTech Data Science Team


    Introduction


    Why analyze YouTube ads?

    YouTube has billions of monthly logged-in users, and every day people watch billions of hours of video and generate billions of views. Businesses can connect with YouTube users through YouTube ads, which are promotional videos that appear on YouTube's website and app, with a variety of video ad formats and goals.

    A sample YouTube in-stream skippable video ad

    The Challenge

    An effective video ad focuses on the ABCDs.

    • Attention: Capturing the viewer's attention until the end.
    • Branding: Helping them hear or visualize the brand.
    • Connection: Making them feel something about the brand.
    • Direction: Encouraging them to take action.

    But each YouTube ad contains a varying mix of components, for instance objects, background music, or a logo. Each of these components affects the view-through rate (referred to as VTR for the remainder of this post) of the video ad. Therefore, analyzing video ads through the lens of their components helps businesses understand what about the ad improves VTR. The insights from these analyses can be used to inform the creation of new creatives and to optimize existing creatives to improve VTR.


    The Proposal

    We propose a machine learning based approach for analyzing a company’s YouTube ads to assess which components affect VTR, for the purpose of optimizing a video ad’s performance. We illustrate how to:

    • Use Google Cloud Video Intelligence API to extract the components of each video ad, using the underlying video files.
    • Transform that extracted data to engineered features that map to actionable business questions.
    • Use a machine learning model to isolate the effect on VTR of each engineered feature.
    • Interpret and act on those insights to improve video ad performance, for instance by altering existing creatives or creating new creatives to be used in an A/B test.

    Approach


    The Process

    The proposed analysis has 5 steps, discussed below.

    1. Define Business Questions
    Align on a list of business questions that are actionable, for instance “does having a logo in the opening shot affect VTR?” We suggest taking feasibility into account ahead of time; for instance, if a product disclaimer is required for legal reasons, there is no reason to assess the impact the disclaimer has on VTR.

    2. Raw Component Extraction
    Use Google Cloud technologies, such as the Google Cloud Video Intelligence API, and the underlying video files to extract raw components from each video ad. These include, but are not limited to, objects appearing in the video at a particular timestamp, the presence of text and its location on the screen, and the presence of specific sounds.

    3. Feature Engineering
    Using the raw components extracted in step 2, engineer features that align to the business questions defined in step 1. For example, if the business question is “does having a logo in the opening shot affect VTR?”, create a feature that labels each video as either 1 (has a logo in the opening shot) or 0 (does not). Repeat this for each business question.

    4. Modeling
    Create an ML model using the engineered features from step 3, using VTR as the target in the model.

    5. Interpretation
    Extract statistically significant features from the ML model and interpret their effect on VTR. For example, “there is an xx% observed uplift in VTR when there is a logo in the opening shot.”


    Feature Engineering


    Data Extraction

    Consider two different YouTube video ads for a web browser, each highlighting a different product feature. Ad A has text that says “Built In Virus Protection”, while Ad B has text that says “Automatic Password Saving”.

    The raw text extracted from each video ad allows for the creation of tabular datasets, such as the one below. For brevity and simplicity, the example carried forward will deal with text features only and forgo the timestamp dimension.

    Ad   | Detected Raw Text
    -----|---------------------------
    Ad A | Built In Virus Protection
    Ad B | Automatic Password Saving
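    A minimal sketch of that extraction step with the Google Cloud Video Intelligence API Python client is shown below; the Cloud Storage URI is a placeholder, and a production pipeline would also keep the segment timestamps for each detection.

```python
from google.cloud import videointelligence

def detect_text(gcs_uri: str) -> list[str]:
    """Return the text strings detected in a video ad stored in Cloud Storage."""
    client = videointelligence.VideoIntelligenceServiceClient()
    operation = client.annotate_video(
        request={
            "input_uri": gcs_uri,
            "features": [videointelligence.Feature.TEXT_DETECTION],
        })
    result = operation.result(timeout=600)
    # One annotation result per input video; each text annotation carries the text
    # plus the segments (timestamps and bounding boxes) where it appears.
    annotations = result.annotation_results[0].text_annotations
    return [annotation.text for annotation in annotations]

raw_text = detect_text("gs://my-bucket/ad_a.mp4")  # placeholder URI
print(raw_text)  # e.g. ["Built In Virus Protection", ...]
```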


    Preprocessing

    After extracting the raw components from each ad, preprocessing may need to be applied, such as lowercasing the text and removing punctuation.

    Ad   | Detected Raw Text         | Processed Text
    -----|---------------------------|---------------------------
    Ad A | Built In Virus Protection | built in virus protection
    Ad B | Automatic Password Saving | automatic password saving
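    A lightweight sketch of that preprocessing step in Python (the exact rules are illustrative):

```python
import re

def preprocess(text: str) -> str:
    """Lowercase the detected text and strip punctuation."""
    return re.sub(r"[^\w\s]", "", text.lower()).strip()

print(preprocess("Built In Virus Protection!"))  # -> "built in virus protection"
```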


    Manual Feature Engineering

    Consider a scenario where the goal is to answer the business question, “does having a textual reference to a product feature affect VTR?”

    This feature could be built manually by exploring all the text in all the videos in the sample and creating a list of tokens or phrases that indicate a textual reference to a product feature. However, this approach can be time-consuming and hard to scale.

    Pseudo code for manual feature engineering
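    In that spirit, a minimal Python sketch of manual feature engineering might look like the following; the keyword list is illustrative and would need to be curated for each campaign.

```python
# Hand-curated tokens that suggest a textual reference to a product feature.
FEATURE_KEYWORDS = ["protection", "saving", "built in", "automatic"]

def has_feature_callout(processed_text: str) -> int:
    """Return 1 if any curated keyword appears in the ad's processed text, else 0."""
    return int(any(keyword in processed_text for keyword in FEATURE_KEYWORDS))

print(has_feature_callout("built in virus protection"))  # 1
print(has_feature_callout("automatic password saving"))  # 1
```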

    AI Based Feature Engineering

    Instead of manual feature engineering as described above, the text detected in each video ad creative can be passed to an LLM along with a prompt that performs the feature engineering automatically.

    For example, if the goal is to explore the value of highlighting a product feature in a video ad, ask an LLM whether the text “built in virus protection” is a feature callout, and then whether the text “automatic password saving” is a feature callout.

    The answers can be extracted and transformed to a 0 or 1, to later be passed to a machine learning model.
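    A minimal sketch of this idea using the Gemini API via the google-generativeai Python SDK is shown below; the prompt wording, model name, and yes/no parsing are illustrative, and a production pipeline would batch requests and validate the responses.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder
model = genai.GenerativeModel("gemini-pro")

def is_feature_callout(processed_text: str) -> int:
    """Ask the LLM whether the ad text calls out a product feature; map the answer to 0/1."""
    prompt = (
        "Answer yes or no only. Is the following ad text a callout of a "
        f"product feature? Text: '{processed_text}'")
    answer = model.generate_content(prompt).text.strip().lower()
    return int(answer.startswith("yes"))

print(is_feature_callout("built in virus protection"))  # expected: 1
```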

    Ad   | Raw Text                  | Processed Text            | Has Textual Reference to Feature
    -----|---------------------------|---------------------------|---------------------------------
    Ad A | Built In Virus Protection | built in virus protection | Yes
    Ad B | Automatic Password Saving | automatic password saving | Yes



    Modeling


    Training Data

    The result of the feature engineering step is a dataframe with columns that align to the initial business questions, which can be joined to a dataframe that has the VTR for each video ad in the sample.

    Ad   | Has Textual Reference to Feature | VTR*
    -----|----------------------------------|-----
    Ad A | Yes                              | 10%
    Ad B | Yes                              | 50%


    *Values are random and not to be interpreted in any way.

    Modeling is done using fixed effects, bootstrapping, and ElasticNet. More information can be found in the post Introducing Discovery Ad Performance Analysis, written by Manisha Arora and Nithya Mahadevan.
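    The full approach is described in that post; as a simplified illustration of the bootstrapped ElasticNet idea, a scikit-learn sketch with toy data (and without fixed effects) might look like this:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet
from sklearn.utils import resample

# Toy training data: one engineered feature per business question, plus VTR.
data = pd.DataFrame({
    "has_feature_callout":   [1, 1, 0, 0, 1, 0, 1, 0],
    "has_logo_opening_shot": [1, 0, 1, 0, 1, 1, 0, 0],
    "vtr": [0.10, 0.50, 0.20, 0.15, 0.45, 0.25, 0.40, 0.12],
})
X = data.drop(columns="vtr")
y = data["vtr"]

# Bootstrap the coefficient estimates to get a spread for each feature.
coefs = []
for seed in range(200):
    X_b, y_b = resample(X, y, random_state=seed)
    coefs.append(ElasticNet(alpha=0.01, l1_ratio=0.5).fit(X_b, y_b).coef_)
coefs = np.array(coefs)

for name, mean, std in zip(X.columns, coefs.mean(axis=0), coefs.std(axis=0)):
    # Treat a feature as "significant" here if 0 lies well outside the bootstrap spread.
    print(f"{name}: coefficient={mean:.4f}, std={std:.4f}, significant={abs(mean) > 2 * std}")
```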

    Interpretation

    The model output can be used to extract significant features, coefficient values, and standard deviations.

    Coefficient Value (+/- X%)
    Represents the absolute percentage uplift in VTR. A positive value indicates a positive impact on VTR and a negative value indicates a negative impact on VTR.

    Significant Value (True/False)
    Represents whether the feature has a statistically significant impact on VTR.

    Feature                          | Coefficient* | Standard Deviation* | Significant?*
    ---------------------------------|--------------|---------------------|--------------
    Has Textual Reference to Feature | 0.0222       | 0.000033            | True


    *Values are random and not to be interpreted in any way.

    In the above hypothetical example, the feature “Has Textual Reference to Feature” has a statistically significant, positive impact on VTR. This can be interpreted as “there is an observed 2.22% absolute uplift in VTR when an ad has a textual reference to a product feature.”

    Challenges

    Challenges of the above approach are:

    • Interactions among the individual features input into the model are not considered. For example, if “has logo” and “has logo in the lower left” are individual features in the model, their interaction will not be assessed. However, a third feature can be engineered that combines the two, such as “has logo AND has logo in the lower left”.
    • Inferences are based on historical data and are not necessarily representative of future ad creative performance. There is no guarantee that insights will improve VTR.
    • Dimensionality can be a concern given the number of components in a video ad.

    Activation Strategies


    Ads Creative Studio

    Ads Creative Studio is an effective tool for businesses to create multiple versions of a video by quickly combining text, images, video clips or audio. Use this tool to create new videos quickly by adding/removing features in accordance with model output.

    Sample video creation features in Ads Creative Studio

    Video Experiments

    Design a new creative, varying a component based on the insights from the analysis, and run an A/B test. For example, change the size of the logo and set up an experiment using Video Experiments.


    Summary


    Identifying which components of a YouTube Ad affect VTR is difficult, due to the number of components contained in the ad, but there is an incentive for advertisers to optimize their creatives to improve VTR. Google Cloud technologies, GenAI models and ML can be used to answer creative centric business questions in a scalable and actionable way. The resulting insights can be used to optimize YouTube ads and achieve business outcomes.


    Acknowledgements

    We would like to thank our collaborators at Google, specifically Luyang Yu, Vijai Kasthuri Rangan, Ahmad Emad, Chuyi Wang, Kun Chang, Mike Anderson, Yan Sun, Nithya Mahadevan, Tommy Mulc, David Letts, Tony Coconate, Akash Roy Choudhury, Alex Pronin, Toby Yang, Felix Abreu and Anthony Lui.