Tag Archives: UI

Meet the Android Studio Team: A Conversation with Android Developer UX Manager, Dan Dole

Posted by Ashley Tschudin – Social Media Specialist, MTP at Google

Welcome to "Meet the Android Studio Team"! In this blog series, we introduce you to the passionate people who create the Android development tools you use every day. Get to know the engineers, designers, product managers, and more who work hard to craft the best possible experience for Android developers, and explore their unique perspectives.

Dan Dole: Building Android Studio for You

Meet Dan Dole, a UX Manager for Android Developer UX, who offers a unique perspective on the Android development journey. He highlights the passion and talent within the Android Developer team, emphasizing the importance of elegant solutions and efficient experiences for developers.

Dan also delves into the exciting potential of AI and machine learning to transform Android development, foreseeing a future where AI accelerates learning, refines code, and empowers developers to focus on innovation.

Through his insights, Dan underscores the collaborative spirit and unwavering commitment to developer success that defines the Android Developer Experience.

Can you tell us about your journey to becoming a part of the Android Studio team? What sparked your interest in Android development?

My journey with Android Development and the Android Studio team started with a conversation with a former colleague and the product lead for Android Developer. She was a leader I respected as someone who was passionate about developers, and believed that UX was a critical component of product development. After meeting with her and understanding the direction of Android, I was convinced that Android could be not just an outstanding mobile platform but a platform that spanned devices, and this was an organization that was focused on enabling developers to bring their talents and creativity to billions of users. Each year, I see us advancing in that direction and feel more confident in my choice to be part of the Android Developer team.

This question can’t be answered without mentioning that the people working on Android Developer tools and APIs are some of the most passionate and talented people I have ever worked with.

What are some of the biggest challenges you've faced in your career as a developer, and how have those experiences shaped your approach to your job?

I am a UX professional in a highly technical environment. This has been the case for about two decades. One of the challenges I have faced is articulating the value of elegant solutions for developers.

This is partially because developers are very capable and resourceful. Clearly, they are tolerant and they will overcome issues that average users won’t. Prior to joining Android Developer Experience, I would have to create processes and negotiate quality bars to drive quality and build efficient experiences.

This challenge gave me skill in release management and how to understand some complexities unique to this space, but it also gave me tools to help explain that developers may be able to manage complexity better than most. Developers appreciate refinement, productivity, and quality, as much as they appreciate flexibility and capability.

How has the integration of AI and machine learning impacted Android developer capabilities, and how do you see it evolving in the future?

We are in the very early stages of AI and its ability to impact developers. As we learn how to be transparent and give developers control over how an AI can benefit them, we are seeing an immediate impact on accelerating learning and refining code.

I expect AI to remove the “chores” that developers have to do, creating more space for them to be productive. I also expect AI to evolve from generating artifacts to generating actions. Making AI features more proactive and allowing developers to more quickly adjust to users' needs.

How does the Android Studio team ensure that products or features meet the ever-changing needs of developers?

I lead our Android Developer research and design team. We spent countless hours listening to developers, evaluating feedback, and understanding technology investments. We approach these conversations and instruments by evaluating what we have already delivered, looking and listening to the challenges developers face, and designing and evaluating new approaches.

The Android Developer team (ENG, Product, UX and Test) are motivated by supporting developers, so all developer feedback is received with gratitude and influences all our investments.

What advice would you give to aspiring Android developers who are just starting their journey?

Android is a vibrant and welcoming community, so my advice would be to engage the community. It is where we learn, inspire and grow together. I have heard many Android developers talk about the pride they have working on this platform and the conviction they have in it being the best platform to work on. I feel like this is unique to Android, the platform isn’t a means to an end, it’s an identity and value system. Android is a community of amazing people, get involved.

Make Gemini in Android Studio Your Coding Companion

Embrace Dan's vision for the future of Android development and explore the latest AI advancements in Android Studio. Features like AI-powered code generation and refactoring tools empower you to develop higher-quality apps with greater efficiency.

Stay tuned!

Want to meet more of the Android Studio team? Stay tuned for future installments of this series, where we'll introduce you to new faces and share their unique insights.

Find Dan Dole on LinkedIn.

Source: Android Developers Blog

Meet the Android Studio Team: A Conversation with Engineering Director, Tor Norbye

Posted by Ashley Tschudin – Social Media Specialist, MTP at Google

Welcome to "Meet the Android Studio Team," our new ongoing blog series. Each week, we'll introduce you to the talented people behind Android Studio. Get to know the engineers, designers, product managers, and more who create the best possible experience for Android developers like you. Join us and explore their unique perspectives.

Tor Norbye: Building Android Studio for You

Trevor Johns, Staff Developer Programs Engineer

Meet Tor Norbye, an Engineering Director at Google leading the development of Android Studio.

From his early days of coding to leading the charge on AI-powered development tools, Tor shares his insights on the evolution of Android and the vital role Android Studio plays in its future.

We'll delve into the challenges of creating developer tools, the importance of community feedback, and how Google strives to empower developers worldwide.

Can you tell us about your journey to becoming a part of the Android Studio team? What sparked your interest in Android development?

I grew up in Norway and I was fascinated by programming; my first exposure was as a middle schooler reading program listings in magazines (yes, in the early 80s, monthly computer magazines would include source code!) and in 1983 I got my hands on a microcomputer, and knew immediately that's what I wanted to do as a career. And now, 40+ years later, I still love programming. It's not my day-job anymore, but I still write bits and pieces of code for Android Studio on the shuttle and during quiet periods.

I've worked on developer tools my whole career - first, 14 years at Sun Microsystems after college. In 2010 I got increasingly interested in the rise of mobile computing and really wanted to be part of it, so I joined the Android team, and I've been here since.

Back then there was no "Android Studio". At the time we were working on Eclipse-based tooling for Android development. But we all knew that IntelliJ was the gold-standard for Java development, so a couple years later we began the work on building Android Studio on top of IntelliJ and with various new and ported code from our Eclipse plugins. I then had the honor of doing the unveiling demo at Google I/O in 2013.

How has the integration of AI and machine learning impacted Android developer capabilities, and how do you see it evolving in the future?

The integration of artificial intelligence has absolutely impacted Android developer capabilities, and this is just the beginning.

I felt very fortunate to be part of bringing about the massive shift from desktop computing to mobile computing when I joined Android, and I can't believe I get to be in the middle of a second massive industry shift as well, with AI and large language models.

I actually spend a lot of my time on this, working with Studio engineers, UX and product managers on our various AI related features, and talking to partner AI teams at Google. We've made a huge amount of progress in the last couple of years, both on the Studio feature integration side, as well as Google-wide on the AI side. While there is some skepticism that we're just doing AI features for AI's sake, I don't see it that way. With AI, we can suddenly, with relatively low effort, build useful features not previously possible.

Here's a very simple example from the latest Studio version: When you invoke the Rename refactoring feature, we use Gemini to add additional naming suggestions into the name popup based on what your code is doing. Here we're helping you pick good names – and naming is famously one of the two hardest problems in computer science – naming, cache invalidation and off-by-one errors. Yet LLMs are good at this – so coupled with the safe refactoring machinery in the IDE, we were able to safely add a useful feature with relatively low engineering cost on the IDE side (of course, this is building on top of a massive investment from Google over on the Gemini side).

The field is moving incredibly quickly, so it's hard to predict where things are going, but we're actively working in several areas, making the AI more aware of your codebase, and making it handle larger, complex tasks via AI Agents, and so much more.

What are some of the biggest challenges you've faced in your career as a developer, and how have those experiences shaped your approach to your job?

Earlier in my career, at a different company, we had big annual releases. I took a lot of pride in my productivity, and as my responsibilities grew, I'd try to do the impossible and deliver, no matter what. I'd not only work long hours, but I'd also try to work as quickly as I can. This led to a lot of stress. I remember putting my (at the time) young children to bed and impatiently waiting for them to fall asleep such that I could head back out to the garage office and start the evening coding shift. And I knew that stress isn't healthy, so I'd also stress about being stressed! This obviously wasn't sustainable.

Now, I emphasize work life balance not only for myself, but also for our team. I want to make sure our work is sustainable, and that people can thrive and be in it for the long term. It's a marathon, not a sprint.

Can you share an example of how feedback from the developer community has directly influenced a feature or improvement?

We have a number of feedback channels; the most important one is the Android Studio issue tracker.

We still have a very large backlog of bugs, so it's easy to get the impression that we're ignoring user reports, but that's not true. As a team, we've actually fixed several thousand bugs in 2024 alone. The best bugs are those that are clear and actionable, ideally with steps to reproduce.

I'm also very thankful to everyone who turns on data sharing in Studio; if you don't already, please consider it! Our analytics is more of an indirect, but still vital, feedback channel from the community. In addition to collecting information on, for example, which menu items are clicked, we also use it to collect quality metrics on system health. For instance, when we detect that the UI is lagging (such as a 1+ second freeze in the UI thread), we grab a thread dump and send it to the server, then aggregate these into a dashboard where we can see top freeze spots in the IDE across the user population, and can focus our efforts on fixing those.

How does the Studio team contribute to Google's broader vision for the Android platform?

In Android Studio we're always making sure we support the latest technologies and recommendations from Android, Firebase, Material, and other Google technologies. That way, it's easier for developers to adopt recommendations, like using Kotlin, Coroutines, Compose, Material, and so on.

Explore the Power of AI

Unlock the full potential of AI in your Android development journey. Explore the latest advancements in Android Studio, including intelligent code completion, automated refactoring, and other AI-driven tools.

Stay tuned!

Don't miss our next and final installment in the "Meet the Android Studio Team" series; we'll feature one more talented team member and share their unique perspective. Stay tuned to learn more about the amazing people behind Android Studio.

Find Tor Norbye on Bluesky.

Source: Android Developers Blog

Meet the Android Studio Team: A Conversation with Staff Developer Programs Engineer, Trevor Johns

Posted by Ashley Tschudin – Social Media Specialist, MTP at Google

Android Studio isn't just code and algorithms – it's built by real people with fascinating stories. Our "Meet the Android Studio Team" series gives you a glimpse into the lives and passions of the talented individuals who craft the tools you use every day. Tune in each month to meet new team members and discover their unique journey.

Trevor Johns: Building Android Studio for You

Meet Trevor Johns, a seasoned Staff Developer Programs Engineer at Google.

Reflecting on his journey, Trevor sheds light on the most impactful advancements in the Android ecosystem and offers a glimpse into his vision for the future where AI plays a pivotal role in streamlining development workflows.

Trevor discusses the Android Studio team's dedication to enhancing developer productivity through AI, highlighting their focus on understanding and addressing developer needs, and reflects on the dynamic journey of Android development while sharing valuable insights.

Can you tell us about your journey to becoming a part of the Android Studio team? What sparked your interest in Android development?

I've been at Google in various roles since Google since 2007, and transferred to Android team in 2009 shortly after the launch of the HTC G1 — the first publicly available Android phone. Even in those early days it was clear that mobile computing was a unique opportunity to reimagine many of the limitations of desktop computers and how users interact with the digital world.

Among my first projects were helping developers optimize their apps for the MyTouch 3G and Motorola Droid, as well as creating developer resources for Android's 1.6 Donut release.

Over the years, I've worked on various parts of the Android OS including our first tablet devices, Android Wear, helping develop the original Android support libraries (which later became Jetpack), and the migration to Kotlin.

Recently I joined the Android Studio team to help improve developer productivity, using AI to streamline common developer tasks and help developers have more time to focus on creativity.

How does the Android Studio team ensure that products or features meet the ever-changing needs of developers?

Like the rest of Android, we approach development of new features by listening to our developer community. We hold regular listening sessions with publishers, work with our UX research team to conduct case studies, and participate in online discussions to get a sense for where developers face the most friction — and then try to find ways to reduce that friction.

For example, we developed Gemini in Android Studio's integration with Play Vitals and Firebase Crashlytics based on feedback from members of the developer community who commented to let us know where they would find AI most useful across their developer workflow.

Speaking of, if you'd like to provide us with feedback, you can always file a bug or feature request on the Android Studio issue tracker.

How does the Studio team contribute to Google's broader vision for the Android platform?

In addition to listening to the Android community, we also keep an eye on what's being developed across the rest of the Android team and make sure that Android Studio has the right tools to help developers quickly migrate between Android versions and adopt those new platform features.

Beyond that, the Studio team provides leading edge editing tools to make sure that Android remains one of the easiest computing platforms to develop for — unlocking this unique computing platform for millions of developers.

In your opinion, what is the most impactful feature or improvement the Android team has introduced in recent years, and why?

For developers, my answer would have to be the migration to Kotlin. This language has modernized the Android developer experience — letting developers write apps with less code and fewer errors. It's also the foundation for Jetpack Compose, which is the future of Android UI development.

If you could wave a magic wand and add one dream feature to the Android universe, what would it be and why?

I'd love to see Gemini be able to not just autocomplete code for me, but generate scaffolds for new projects. That way I can focus on building features rather than worrying about basic structure when starting a new project.

Develop Android Apps with Kotlin

Follow Trevor's lead and embrace the power of Kotlin for modern Android development. Enhance your skills and write better Android apps faster with Kotlin.

Stay tuned!

Get ready for another inspiring story! The "Meet the Android Studio Team" series continues next week with a new team member in the spotlight. Don't miss their unique insights and journey.

Find Trevor Johns on LinkedIn, X, Bluesky, and Medium.

Source: Android Developers Blog

Meet the Android Studio Team: A Conversation with Director of Product Management, Jamal Eason

Posted by Ashley Tschudin – Social Media Specialist, MTP at Google

Dive into the world of Android Studio and meet the masterminds behind your favorite development tools! In our recurring blog series, "Meet the Android Studio Team," we'll introduce you to the brilliant engineers, designers, product managers, and more who are shaping the future of Android development.

Join us each week to uncover the unique perspectives and stories of the people who make Android Studio the best it can be.

Jamal Eason: Building better Android apps - insights on Gemini, Crashlytics, and App Quality

Meet Jamal Eason, a Director of Product Management at Google, whose passion for empowering developers shines through in his work on Android Studio.

His journey, from studying computer science at West Point to developing Android hardware at Intel (including contributions to the Motorola Razr i), showcases a deep understanding of the developer experience. From attending the very first Android Studio unveiling at Google I/O to now shaping its future, Jamal brings a unique perspective to the team.

Jamal shares his insights on the evolution of Android Studio, the importance of a strong developer community, and the features he's most proud of.

Can you tell us about your journey to becoming a part of the Android Studio team? What sparked your interest in Android development?

I have had an interest in programming at an early age especially since studying computer science in undergrad at the United States Military Academy (West Point), and in that time I have had an interest not just in the creation of software but also in the tools developers use to make software.

My interest in Android development came when I was preparing for my first job after my telecommunications & computer networks military career when I was joining a team at the Intel Corporation that worked with Google to build Android hardware products. I thought the best way to understand Google and mobile was to download the Android SDK and create my own app end to end. My first taste of Android was Froyo 2.2 using the Eclipse based Android Developer Tools IDE.

At Intel, I worked on creating the x86 based version of the Android Emulator and Emulator system image, and also a new Hypervisor that would accelerate the performance of the Android Emulator on x86 based laptops. After helping ship the Motorola Razr i (xt890) Android phone with Intel technology inside and x86 optimized apps on the device, I made the move to the Android team at Google. With my experience in developing Android apps, and shipping Android developer tools, the Android developer tools team was a natural fit.

Interestingly, I attended the Google I/O the year Android Studio was first revealed as an attendee, and the following year I was working on the team to bring Android Studio to its Beta release at the following years Google I/O.

What unique perspective or experience do you bring to the Android Studio team, and how does it influence your work?

Unique experiences I bring include:

Technical Translation - In my prior roles, I worked with highly technical teams, and learned how to take absurd technical concepts and present them to different audiences of different technical skill levels. And in the reverse, I worked with many non-technical customers and colleagues and learned how to translate their pain points into product opportunities solved with technical solutions and innovation.
User Empathy - Previously, I was a software developer, and I regularly like to code on small side projects, and really enjoy spending time with developers who use Android Studio. From first-hand experience and user engagement, I regularly bring in the voice of the user into the discussion from the inception of a product idea to the final stages of the release process.
UX Design Sense - In a previous career, I designed and created websites, and user interfaces for software. I developed an eye for good UX design and flows particularly in technical software products. These skills aid in complementing the dedicated UX design team in Android Studio, and aids in avoiding productivity pitfalls with poor product and UX flows.

In your opinion, what is the most impactful feature or improvement the Android team has introduced in recent years, and why?

It’s hard to nail down just one, but the top three are:

1) product quality

2) integration of Gemini and

3) integrations with Crashlytics and Play with App Quality Insights.

The most impactful feature we worked on is product quality. We treat quality, especially the core code editing experience as a feature. If a developer can’t write a line of code and deploy it to a device, then everything else is secondary. Since Android is always evolving, it is an on-going effort but critical for the team to stay focused on.

On top of quality, thoughtful integration of Gemini into Android Studio is a real accelerate for app development. Our focus with AI is to make Android developers more productive, and make the harder tasks and toil easier. So from AI powered code completion, or built-in Gemini chat for Android app development, to enhancing existing tools with AI such as using Gemini to generate Jetpack Compose UI Previews, we are just at the beginning of leveraging AI to make Android app developers more productive.

Lastly, with App Quality Insights, it is now much easier for app developers to address the performance and quality issues found with Firebase Crashlytics and Android Vitals from Google Play. Surfacing these issues right next to source code and source control, make resolving issues much faster and intuitive.

How does the Android Studio team ensure that products or features meet the ever-changing needs of developers?

First step, the Android Studio team works hand-in-hand with the Android OS team so we strive to deliver developer tools in concert with new Android OS and API changes so developers are ready to adopt new Android platform capability into their apps. Then, we constantly review and prioritize developer feedback received via our issue tracker or via our bi-annaul developer survey we post on the Android Developers site. When we can, we sometimes engage with developers via various social media channels. And lastly, we regularly interview developers at various experience levels, and regions around the world in targeted User Research studies.

What advice would you give to aspiring Android developers who are just starting their journey?

Start with a robust set of code labs and tutorials.
Get inspired on the possibilities of Android and what you can build.
Join the Android developer community:

Deploy with Confidence

Inspired by Jamal's journey and dedication to empowering developers? Explore the latest Android Studio features, including App Quality Insights, to improve your app's performance and address issues quickly.

Stay tuned

Don't miss the next installment of our "Meet the Android Studio Team" series, where we'll introduce you to another amazing member of our team and share their unique journey. Stay tuned for more!

Find Jamal Eason on LinkedIn and X.

Source: Android Developers Blog

Meet the Android Studio Team: A Conversation with Product Manager, Paris Hsu

Posted by Ashley Tschudin – Social Media Specialist, MTP at Google

Welcome to "Meet the Android Studio Team"; a short blog series where we pull back the curtain and introduce you to the passionate people who build your favorite Android development tools. Get to know the talented minds – engineers, designers, product managers, and more – who pour their hearts into crafting the best possible experience for Android developers.

Join us each week to meet a new member of the team and explore their unique perspectives.

Paris Hsu: Empowering Android developers with Compose tools

Meet Paris Hsu, a Product Manager at Google passionate about empowering developers to build incredible Android apps.

Her journey to the Android Studio team started with a serendipitous internship at Microsoft, where she discovered the power of developer tools. Now, as part of the UI Tools team, Paris champions intuitive solutions that streamline the development process, like the innovative Compose Tools suite.

In this installment of "Meet the Android Studio Team," Paris shares insights into her work, the importance of developer feedback, and her dream Android feature (hint: it involves acing that forehand).

Can you tell us about your journey to becoming a part of the Android Studio team? What sparked your interest in Android development?

Honestly, I joined a bit by chance! The summer before my last year of grad school, I was in the Microsoft's Garage incubator internship program. Our project, InkToCode, turned handwritten designs into code. It was my first experience building developer tools and made me realize how powerful developer tools can be, which led me to the Android Studio team. Now, after 6 years, I'm constantly amazed by what Android developers create – from innovative productivity apps to immersive games. It's incredibly rewarding to build tools that empower developers to create more.

In your opinion, what is the most impactful feature or improvement the Android Studio team has introduced in recent years, and why?

As part of the UI Tools team in Android Studio, I'm biased towards Compose Tools! Our team spent a lot of time rethinking how we can take a code-first approach for tools as we transition the community for XML to Compose. Features like the Compose Preview and its submodes (Interactive, Animation, Deploy preview) enable fast UI iteration, while features such as Layout Inspector or Compose UI Check helps find and diagnose UI issues with ease. We are also exploring ways to apply multimodal AI into these tools to help developers write more high quality, adaptive, and inclusive Compose code quicker.

How does the Android Studio team ensure that products or features meet the ever-changing needs of developers?

We are constantly engaging and listening to developer feedback to ensure we are meeting their needs! some examples:

Direct feedback: UXR studies, Annual developer surveys, and Buganizer reports provide valuable insights.

Early access: We release Early Access Programs (EAPs) for new features, allowing developers to test them and provide feedback before official launch.

Community engagement: We have advisory boards with experienced Android developers, gather feedback from Google Developer Experts (GDEs), and attend conferences to connect directly with the community.

How does the Studio team contribute to Google's broader vision for the Android platform?

I think Android Studio contributes to Google's broader mission by providing Android developers with powerful and intuitive tools. This way, developers are empowered to create amazing apps that bring the best of Google's services and information to our users. Whether it's accessing knowledge through Search, leveraging Gemini, staying connected with Maps, or enjoying entertainment on YouTube, Android Studio helps developers build the experiences that connect people to what matters most.

If you could wave a magic wand and add one dream feature to the Android universe, what would it be and why?

Anyone who knows me knows that I am recently super obsessed with tennis. I would love to see more coaching wearables (e.g. Pixel Watch, Pixel Racket?!). I would love real-time feedback on my serve and especially forehand stroke analysis.

Learn more about Compose Tools

Inspired by Paris’ passion for empowering developers to build incredible Android apps? To learn more about how Compose Tools can streamline your app development process, check out the Compose Tools documentation and get started with the Jetpack Compose Tutorial.

Stay tuned

Keep an eye out for the next installment in our “Meet the Android Studio Team” series, where we’ll shine the spotlight on another team member and delve into their unique insights.

Find Paris Hsu on LinkedIn, X, and Medium.

Source: Android Developers Blog

ScreenAI: A visual language model for UI and visually-situated language understanding

Posted by Srinivas Sunkara and Gilles Baechler, Software Engineers, Google Research

Screen user interfaces (UIs) and infographics, such as charts, diagrams and tables, play important roles in human communication and human-machine interaction as they facilitate rich and interactive user experiences. UIs and infographics share similar design principles and visual language (e.g., icons and layouts), that offer an opportunity to build a single model that can understand, reason, and interact with these interfaces. However, because of their complexity and varied presentation formats, infographics and UIs present a unique modeling challenge.

To that end, we introduce “ScreenAI: A Vision-Language Model for UI and Infographics Understanding”. ScreenAI improves upon the PaLI architecture with the flexible patching strategy from pix2struct. We train ScreenAI on a unique mixture of datasets and tasks, including a novel Screen Annotation task that requires the model to identify UI element information (i.e., type, location and description) on a screen. These text annotations provide large language models (LLMs) with screen descriptions, enabling them to automatically generate question-answering (QA), UI navigation, and summarization training datasets at scale. At only 5B parameters, ScreenAI achieves state-of-the-art results on UI- and infographic-based tasks (WebSRC and MoTIF), and best-in-class performance on Chart QA, DocVQA, and InfographicVQA compared to models of similar size. We are also releasing three new datasets: Screen Annotation to evaluate the layout understanding capability of the model, as well as ScreenQA Short and Complex ScreenQA for a more comprehensive evaluation of its QA capability.

ScreenAI

ScreenAI’s architecture is based on PaLI, composed of a multimodal encoder block and an autoregressive decoder. The PaLI encoder uses a vision transformer (ViT) that creates image embeddings and a multimodal encoder that takes the concatenation of the image and text embeddings as input. This flexible architecture allows ScreenAI to solve vision tasks that can be recast as text+image-to-text problems.

On top of the PaLI architecture, we employ a flexible patching strategy introduced in pix2struct. Instead of using a fixed-grid pattern, the grid dimensions are selected such that they preserve the native aspect ratio of the input image. This enables ScreenAI to work well across images of various aspect ratios.

The ScreenAI model is trained in two stages: a pre-training stage followed by a fine-tuning stage. First, self-supervised learning is applied to automatically generate data labels, which are then used to train ViT and the language model. ViT is frozen during the fine-tuning stage, where most data used is manually labeled by human raters.

ScreenAI model architecture.

Data generation

To create a pre-training dataset for ScreenAI, we first compile an extensive collection of screenshots from various devices, including desktops, mobile, and tablets. This is achieved by using publicly accessible web pages and following the programmatic exploration approach used for the RICO dataset for mobile apps. We then apply a layout annotator, based on the DETR model, that identifies and labels a wide range of UI elements (e.g., image, pictogram, button, text) and their spatial relationships. Pictograms undergo further analysis using an icon classifier capable of distinguishing 77 different icon types. This detailed classification is essential for interpreting the subtle information conveyed through icons. For icons that are not covered by the classifier, and for infographics and images, we use the PaLI image captioning model to generate descriptive captions that provide contextual information. We also apply an optical character recognition (OCR) engine to extract and annotate textual content on screen. We combine the OCR text with the previous annotations to create a detailed description of each screen.

A mobile app screenshot with generated annotations that include UI elements and their descriptions, e.g., TEXT elements also contain the text content from OCR, IMAGE elements contain image captions, LIST_ITEMs contain all their child elements.

LLM-based data generation

We enhance the pre-training data's diversity using PaLM 2 to generate input-output pairs in a two-step process. First, screen annotations are generated using the technique outlined above, then we craft a prompt around this schema for the LLM to create synthetic data. This process requires prompt engineering and iterative refinement to find an effective prompt. We assess the generated data's quality through human validation against a quality threshold.

You only speak JSON. Do not write text that isn’t JSON.
You are given the following mobile screenshot, described in words. Can you generate 5 questions regarding the content of the screenshot as well as the corresponding short answers to them? 

The answer should be as short as possible, containing only the necessary information. Your answer should be structured as follows:
questions: [
{{question: the question,
    answer: the answer
}},
 ...
]

{THE SCREEN SCHEMA}

A sample prompt for QA data generation.

By combining the natural language capabilities of LLMs with a structured schema, we simulate a wide range of user interactions and scenarios to generate synthetic, realistic tasks. In particular, we generate three categories of tasks:

Question answering: The model is asked to answer questions regarding the content of the screenshots, e.g., “When does the restaurant open?”
Screen navigation: The model is asked to convert a natural language utterance into an executable action on a screen, e.g., “Click the search button.”
Screen summarization: The model is asked to summarize the screen content in one or two sentences.

Block diagram of our workflow for generating data for QA, summarization and navigation tasks using existing ScreenAI models and LLMs. Each task uses a custom prompt to emphasize desired aspects, like questions related to counting, involving reasoning, etc.

LLM-generated data. Examples for screen QA, navigation and summarization. For navigation, the action bounding box is displayed in red on the screenshot.

Experiments and results

As previously mentioned, ScreenAI is trained in two stages: pre-training and fine-tuning. Pre-training data labels are obtained using self-supervised learning and fine-tuning data labels comes from human raters.

We fine-tune ScreenAI using public QA, summarization, and navigation datasets and a variety of tasks related to UIs. For QA, we use well established benchmarks in the multimodal and document understanding field, such as ChartQA, DocVQA, Multi page DocVQA, InfographicVQA, OCR VQA, Web SRC and ScreenQA. For navigation, datasets used include Referring Expressions, MoTIF, Mug, and Android in the Wild. Finally, we use Screen2Words for screen summarization and Widget Captioning for describing specific UI elements. Along with the fine-tuning datasets, we evaluate the fine-tuned ScreenAI model using three novel benchmarks:

Screen Annotation: Enables the evaluation model layout annotations and spatial understanding capabilities.
ScreenQA Short: A variation of ScreenQA, where its ground truth answers have been shortened to contain only the relevant information that better aligns with other QA tasks.
Complex ScreenQA: Complements ScreenQA Short with more difficult questions (counting, arithmetic, comparison, and non-answerable questions) and contains screens with various aspect ratios.

The fine-tuned ScreenAI model achieves state-of-the-art results on various UI and infographic-based tasks (WebSRC and MoTIF) and best-in-class performance on Chart QA, DocVQA, and InfographicVQA compared to models of similar size. ScreenAI achieves competitive performance on Screen2Words and OCR-VQA. Additionally, we report results on the new benchmark datasets introduced to serve as a baseline for further research.

Comparing model performance of ScreenAI with state-of-the-art (SOTA) models of similar size.

Next, we examine ScreenAI’s scaling capabilities and observe that across all tasks, increasing the model size improves performances and the improvements have not saturated at the largest size.

Model performance increases with size, and the performance has not saturated even at the largest size of 5B params.

Conclusion

We introduce the ScreenAI model along with a unified representation that enables us to develop self-supervised learning tasks leveraging data from all these domains. We also illustrate the impact of data generation using LLMs and investigate improving model performance on specific aspects with modifying the training mixture. We apply all of these techniques to build multi-task trained models that perform competitively with state-of-the-art approaches on a number of public benchmarks. However, we also note that our approach still lags behind large models and further research is needed to bridge this gap.

Acknowledgements

This project is the result of joint work with Maria Wang, Fedir Zubach, Hassan Mansoor, Vincent Etter, Victor Carbune, Jason Lin, Jindong Chen and Abhanshu Sharma. We thank Fangyu Liu, Xi Chen, Efi Kokiopoulou, Jesse Berent, Gabriel Barcik, Lukas Zilka, Oriana Riva, Gang Li,Yang Li, Radu Soricut, and Tania Bedrax-Weiss for their insightful feedback and discussions, along with Rahul Aralikatte, Hao Cheng and Daniel Kim for their support in data preparation. We also thank Jay Yagnik, Blaise Aguera y Arcas, Ewa Dominowska, David Petrou, and Matt Sharifi for their leadership, vision and support. We are very grateful toTom Small for helping us create the animation in this post.

Source: Google AI Blog

Enabling conversational interaction on mobile with LLMs

Posted by Bryan Wang, Student Researcher, and Yang Li, Research Scientist, Google Research

Intelligent assistants on mobile devices have significantly advanced language-based interactions for performing simple daily tasks, such as setting a timer or turning on a flashlight. Despite the progress, these assistants still face limitations in supporting conversational interactions in mobile user interfaces (UIs), where many user tasks are performed. For example, they cannot answer a user's question about specific information displayed on a screen. An agent would need to have a computational understanding of graphical user interfaces (GUIs) to achieve such capabilities.

Prior research has investigated several important technical building blocks to enable conversational interaction with mobile UIs, including summarizing a mobile screen for users to quickly understand its purpose, mapping language instructions to UI actions and modeling GUIs so that they are more amenable for language-based interaction. However, each of these only addresses a limited aspect of conversational interaction and requires considerable effort in curating large-scale datasets and training dedicated models. Furthermore, there is a broad spectrum of conversational interactions that can occur on mobile UIs. Therefore, it is imperative to develop a lightweight and generalizable approach to realize conversational interaction.

In “Enabling Conversational Interaction with Mobile UI using Large Language Models”, presented at CHI 2023, we investigate the viability of utilizing large language models (LLMs) to enable diverse language-based interactions with mobile UIs. Recent pre-trained LLMs, such as PaLM, have demonstrated abilities to adapt themselves to various downstream language tasks when being prompted with a handful of examples of the target task. We present a set of prompting techniques that enable interaction designers and developers to quickly prototype and test novel language interactions with users, which saves time and resources before investing in dedicated datasets and models. Since LLMs only take text tokens as input, we contribute a novel algorithm that generates the text representation of mobile UIs. Our results show that this approach achieves competitive performance using only two data examples per task. More broadly, we demonstrate LLMs’ potential to fundamentally transform the future workflow of conversational interaction design.

Animation showing our work on enabling various conversational interactions with mobile UI using LLMs.

Prompting LLMs with UIs

LLMs support in-context few-shot learning via prompting — instead of fine-tuning or re-training models for each new task, one can prompt an LLM with a few input and output data exemplars from the target task. For many natural language processing tasks, such as question-answering or translation, few-shot prompting performs competitively with benchmark approaches that train a model specific to each task. However, language models can only take text input, while mobile UIs are multimodal, containing text, image, and structural information in their view hierarchy data (i.e., the structural data containing detailed properties of UI elements) and screenshots. Moreover, directly inputting the view hierarchy data of a mobile screen into LLMs is not feasible as it contains excessive information, such as detailed properties of each UI element, which can exceed the input length limits of LLMs.

To address these challenges, we developed a set of techniques to prompt LLMs with mobile UIs. We contribute an algorithm that generates the text representation of mobile UIs using depth-first search traversal to convert the Android UI's view hierarchy into HTML syntax. We also utilize chain of thought prompting, which involves generating intermediate results and chaining them together to arrive at the final output, to elicit the reasoning ability of the LLM.

Animation showing the process of few-shot prompting LLMs with mobile UIs.

Our prompt design starts with a preamble that explains the prompt’s purpose. The preamble is followed by multiple exemplars consisting of the input, a chain of thought (if applicable), and the output for each task. Each exemplar’s input is a mobile screen in the HTML syntax. Following the input, chains of thought can be provided to elicit logical reasoning from LLMs. This step is not shown in the animation above as it is optional. The task output is the desired outcome for the target tasks, e.g., a screen summary or an answer to a user question. Few-shot prompting can be achieved with more than one exemplar included in the prompt. During prediction, we feed the model the prompt with a new input screen appended at the end.

Experiments

We conducted comprehensive experiments with four pivotal modeling tasks: (1) screen question-generation, (2) screen summarization, (3) screen question-answering, and (4) mapping instruction to UI action. Experimental results show that our approach achieves competitive performance using only two data examples per task.

Task 1: Screen question generation

Given a mobile UI screen, the goal of screen question-generation is to synthesize coherent, grammatically correct natural language questions relevant to the UI elements requiring user input.

We found that LLMs can leverage the UI context to generate questions for relevant information. LLMs significantly outperformed the heuristic approach (template-based generation) regarding question quality.

Example screen questions generated by the LLM. The LLM can utilize screen contexts to generate grammatically correct questions relevant to each input field on the mobile UI, while the template approach falls short.

We also revealed LLMs' ability to combine relevant input fields into a single question for efficient communication. For example, the filters asking for the minimum and maximum price were combined into a single question: “What’s the price range?

We observed that the LLM could use its prior knowledge to combine multiple related input fields to ask a single question.

In an evaluation, we solicited human ratings on whether the questions were grammatically correct (Grammar) and relevant to the input fields for which they were generated (Relevance). In addition to the human-labeled language quality, we automatically examined how well LLMs can cover all the elements that need to generate questions (Coverage F1). We found that the questions generated by LLM had almost perfect grammar (4.98/5) and were highly relevant to the input fields displayed on the screen (92.8%). Additionally, LLM performed well in terms of covering the input fields comprehensively (95.8%).

	Template	2-shot LLM
Grammar	3.6 (out of 5)	4.98 (out of 5)
Relevance	84.1%	92.8%
Coverage F1	100%	95.8%

Task 2: Screen summarization

Screen summarization is the automatic generation of descriptive language overviews that cover essential functionalities of mobile screens. The task helps users quickly understand the purpose of a mobile UI, which is particularly useful when the UI is not visually accessible.

Our results showed that LLMs can effectively summarize the essential functionalities of a mobile UI. They can generate more accurate summaries than the Screen2Words benchmark model that we previously introduced using UI-specific text, as highlighted in the colored text and boxes below.

Example summary generated by 2-shot LLM. We found the LLM is able to use specific text on the screen to compose more accurate summaries.

Interestingly, we observed LLMs using their prior knowledge to deduce information not presented in the UI when creating summaries. In the example below, the LLM inferred the subway stations belong to the London Tube system, while the input UI does not contain this information.

LLM uses its prior knowledge to help summarize the screens.

Human evaluation rated LLM summaries as more accurate than the benchmark, yet they scored lower on metrics like BLEU. The mismatch between perceived quality and metric scores echoes recent work showing LLMs write better summaries despite automatic metrics not reflecting it.

Left: Screen summarization performance on automatic metrics. Right: Screen summarization accuracy voted by human evaluators.

Task 3: Screen question-answering

Given a mobile UI and an open-ended question asking for information regarding the UI, the model should provide the correct answer. We focus on factual questions, which require answers based on information presented on the screen.

Example results from the screen QA experiment. The LLM significantly outperforms the off-the-shelf QA baseline model.

We report performance using four metrics: Exact Matches (identical predicted answer to ground truth), Contains GT (answer fully containing ground truth), Sub-String of GT (answer is a sub-string of ground truth), and the Micro-F1 score based on shared words between the predicted answer and ground truth across the entire dataset.

Our results showed that LLMs can correctly answer UI-related questions, such as "what's the headline?". The LLM performed significantly better than baseline QA model DistillBERT, achieving a 66.7% fully correct answer rate. Notably, the 0-shot LLM achieved an exact match score of 30.7%, indicating the model's intrinsic question answering capability.

Models

Exact Matches

Contains GT

Sub-String of GT

Micro-F1

0-shot LLM

30.7%

6.5%

5.6%

31.2%

1-shot LLM

65.8%

10.0%

7.8%

62.9%

2-shot LLM

66.7%

12.6%

5.2%

64.8%

DistillBERT

36.0%

8.5%

9.9%

37.2%

Task 4: Mapping instruction to UI action

Given a mobile UI screen and natural language instruction to control the UI, the model needs to predict the ID of the object to perform the instructed action. For example, when instructed with "Open Gmail," the model should correctly identify the Gmail icon on the home screen. This task is useful for controlling mobile apps using language input such as voice access. We introduced this benchmark task previously.

Example using data from the PixelHelp dataset. The dataset contains interaction traces for common UI tasks such as turning on wifi. Each trace contains multiple steps and corresponding instructions.

We assessed the performance of our approach using the Partial and Complete metrics from the Seq2Act paper. Partial refers to the percentage of correctly predicted individual steps, while Complete measures the portion of accurately predicted entire interaction traces. Although our LLM-based method did not surpass the benchmark trained on massive datasets, it still achieved remarkable performance with just two prompted data examples.

Models	Partial	Complete
0-shot LLM	1.29	0.00
1-shot LLM (cross-app)	74.69	31.67
2-shot LLM (cross-app)	75.28	34.44
1-shot LLM (in-app)	78.35	40.00
2-shot LLM (in-app)	80.36	45.00
Seq2Act	89.21	70.59

Takeaways and conclusion

Our study shows that prototyping novel language interactions on mobile UIs can be as easy as designing a data exemplar. As a result, an interaction designer can rapidly create functioning mock-ups to test new ideas with end users. Moreover, developers and researchers can explore different possibilities of a target task before investing significant efforts into developing new datasets and models.

We investigated the feasibility of prompting LLMs to enable various conversational interactions on mobile UIs. We proposed a suite of prompting techniques for adapting LLMs to mobile UIs. We conducted extensive experiments with the four important modeling tasks to evaluate the effectiveness of our approach. The results showed that compared to traditional machine learning pipelines that consist of expensive data collection and model training, one could rapidly realize novel language-based interactions using LLMs while achieving competitive performance.

Acknowledgements

We thank our paper co-author Gang Li, and appreciate the discussions and feedback from our colleagues Chin-Yi Cheng, Tao Li, Yu Hsiao, Michael Terry and Minsuk Chang. Special thanks to Muqthar Mohammad and Ashwin Kakarla for their invaluable assistance in coordinating data collection. We thank John Guilyard for helping create animations and graphics in the blog.

Source: Google AI Blog

Clue’s development speed improves 3X after rebuilding the app with Jetpack Compose

Posted by the Android team

Clue is a freemium menstrual health application founded in 2012 and was among the earliest developers in femtech. The app helps women and people who menstruate track their cycles and serves 11 million monthly active users in over 190 countries. Additional features, including tools for tracking prenatal and postpartum health, are available through the app’s subscription tier, Clue Plus.

Having access to streamlined and easily digestible menstrual health data can be an invaluable resource for people who menstruate, and Clue has supported these insights for Android users for over a decade. As with any codebase, however, the Clue app inherited technical debt. This limited the team’s ability to push changes and features quickly, scale developer efforts, and provide a modern UI to its users.

Clue previously relied heavily on custom views that made extending the existing codebase difficult and required time-consuming testing methods that slowed the development process. Clue’s codebase had additionally amassed UI inconsistencies from hard-coded theme values such as colors and sizes, and in 2022 Clue’s engineers recognized that they needed a more efficient and flexible solution. They ultimately landed on migrating to Jetpack Compose, Android’s modern toolkit for building native UI.

“We decided that a complete rewrite of the application, with a specific emphasis on the UI layer, would be the best course of action,” said Moctezuma Rojas, an Android developer at Clue. “This decision was based on the fact that it would enable us to have a more efficient and faster development cycle, quickly implement features that would have taken much longer to develop using views, and make our code more testable.”

Building a faster and more efficient codebase with Compose

The Clue team saw immediate benefits by rewriting its app with Compose. For one, a faster, more efficient testing and development cycle significantly reduced the time and effort necessary to improve the codebase while reducing bugs and errors. Compose also enabled Clue’s engineers to implement features faster than they could with Views.

Migrating the app to Compose resulted in improved testability for screens, faster development from ideation to release, and better standardization processes that aligned with the best practices recommended by Android developer resources. Compose also helped the Clue team double—and in some cases, triple—their development speed when compared to the old codebase.

“With the traditional view system, adding new features, visual representations, or user interactions was difficult due to the need for custom view creation and maintenance. However, by utilizing Jetpack Compose, we've been able to effortlessly develop and expand the Cycle View feature without any limitations in adding elements,” said Moctezuma.

Photo of Moctezuma Rojas, Android developer at Clue,smiling while snuggling a cat, with quote text which reads, '...By utilizing Jetpack Compose, we've been able to effortlessly develop and expand the Cycle View feature without any limitations in adding elements.'

Compose also helped Clue’s developers quickly overhaul several other important features within the application, including Calendar View, Analysis View, and the account management and settings screens.

More creativity made possible with Compose

Compose enabled developers to make Clue screens more intuitive, improve scrolling, and deploy a custom color system and component library that aligned with the brand—a huge win for the team. Previously, adding new features, visual representations, or user interactions was complicated because they required creating a custom view and ongoing maintenance.

Compose APIs made it much easier to test UI so Clue developers felt more confident about what they were shipping to users. As an added benefit, Clue developers now have more space for exploring UX innovation.

“The custom dynamic theming allows designers to freely explore their creativity without being limited by technological constraints,” said Moctezuma. “It provides a flexible and scalable approach to styling that can be easily adapted as our app evolves and grows, resulting in a visually appealing and cohesive user experience.”

All of these changes vastly improved the user experience for Clue subscribers, resulting in fewer error messages and bug reports. The Clue team also says that using Compose has enabled them to identify areas of improvement in the app’s code that could have potentially impacted its users.

“Compose increases developer velocity by eliminating boilerplate code, works seamlessly with the existing code base thanks to its Interoperability APIs, and improves UI testing—which has always been painful in Android development,” said Tilbe Saltan, a senior Android developer at Clue.

Continued success with Jetpack Compose

Compose has improved each subsequent app release and made preview and live editing features more reliable for Clue engineers, allowing for a more flexible development experience from start to finish. Since implementing Compose, the Clue team has also seen excitement from prospective candidates interested in working on the app so they can work with more modern development technologies.

“The future of Compose holds many potential development areas that could benefit developers and companies. As Compose continues to evolve, we can expect to see more improvements in performance, stability, tooling, and cross-platform support, which will make it an even more compelling choice for building high-quality UIs,” said Tilbe.

Get started

Optimize your UI development with Jetpack Compose.

Source: Android Developers Blog

Material Design 3 for Compose hits stable

Posted by Gurupreet Singh, Developer Advocate; Android

Today marks the first stable release of Compose Material 3. The library allows you to build Jetpack Compose UIs with Material Design 3, the next evolution of Material Design. You can start using Material Design 3 in your apps today!

Note: The terms "Material Design 3", "Material 3", and "M3" are used interchangeably.

Material 3 includes updated theming and components, exclusive features like dynamic color, and is designed to be aligned with the latest Android visual style and system UI.

Multiple apps using Material Design 3 theming

You can start using Material Design 3 in your apps by adding the Compose Material 3 dependency to your build.gradle files:

// Add dependency in module build.gradle

implementation "androidx.compose.material3:material3:$material3_version"

Note: See the latest M3 versions on the Compose Material 3 releases page.

Color schemes

Material 3 brings extensive, finer grained color customisation, and comes with both light and dark color scheme support out of the box. The Material Theme Builder allows you to generate a custom color scheme using core colors, and optionally export Compose theming code. You can read more about color schemes and color roles.

Material Theme Builder to export Material 3 color schemes

Dynamic color

Dynamic color derives from the user’s wallpaper. The colors can be applied to apps and the system UI.

Dynamic color is available on Android 12 (API level 31) and above. If dynamic color is available, you can set up a dynamic ColorScheme. If not, you should fall back to using a custom light or dark ColorScheme.

Reply Dynamic theming from wallpaper(Left) and Default app theming (Right)

The ColorScheme class provides builder functions to create both dynamic and custom light and dark color schemes:

Theme.kt

// Dynamic color is available on Android 12+
val dynamicColor = Build.VERSION.SDK_INT >= Build.VERSION_CODES.S
val colorScheme = when {
dynamicColor && darkTheme -> dynamicDarkColorScheme(LocalContext.current)
dynamicColor && !darkTheme -> dynamicLightColorScheme(LocalContext.current)
darkTheme -> darkColorScheme(...)
else -> lightColorScheme(...)
}

MaterialTheme(
colorScheme = colorScheme,
typography = typography,
shapes = shapes
) {
// M3 App content
}

Material components

The Compose Material 3 APIs contain a wide range of both new and evolved Material components, with more planned for future versions. Many of the Material components, like Card, RadioButton and CheckBox, are no longer considered experimental; their APIs are stable and they can be used without the ExperimentalMaterial3Api annotation.

The M3 Switch component has a brand new UI refresh with accessibility-compliant minimum touch target size support, color mappings, and optional icon support in the switch thumb. The touch target is bigger, and the thumb size increases on user interaction, providing feedback to the user that the thumb is being interacted with.

Material 3 Switch thumb interaction

Switch(
checked = isChecked,
onCheckedChange = { /*...*/ },
thumbContent = {
Icon(
imageVector = Icons.Default.Check,
contentDescription = stringResource(id = R.string.switch_check)
)
},
)

Navigation drawer components now provide wrapper sheets for content to change colors, shapes, and elevation independently.

Navigation drawer component	Content
ModalNavigationDrawer	ModalDrawerSheet
PermanentNavigationDrawer	PermanentDrawerSheet
DismissableNavigationDrawer	DismissableDrawerSheet

ModalNavigationDrawer with content wrapped in ModalDrawerSheet

ModalNavigationDrawer {
ModalDrawerSheet(
drawerShape = MaterialTheme.shapes.small,
drawerContainerColor = MaterialTheme.colorScheme.primaryContainer,
drawerContentColor = MaterialTheme.colorScheme.onPrimaryContainer,
drawerTonalElevation = 4.dp,
) {
DESTINATIONS.forEach { destination ->
NavigationDrawerItem(
selected = selectedDestination == destination.route,
onClick = { ... },
icon = { ... },
label = { ... }
)
}
}
}

We have a brand new CenterAlignedTopAppBar in addition to already existing app bars. This can be used for the main root page in an app: you can display the app name or page headline with home and action icons.

Material CenterAlignedTopAppBar with home and action items.

CenterAlignedTopAppBar(
title = {
Text(stringResources(R.string.top_stories))
},
scrollBehavior = scrollBehavior,
navigationIcon = { /* Navigation Icon */},
actions = { /* App bar actions */}
)

See the latest M3 components and layouts on the Compose Material 3 API reference overview. Keep an eye on the releases page for new and updated APIs.

Typography

Material 3 simplified the naming and grouping of typography to:

Display
Headline
Title
Body
Label

There are large, medium, and small sizes for each, providing a total of 15 text style variations.

The Typography constructor offers defaults for each style, so you can omit any parameters that you don’t want to customize:

val typography = Typography(
titleLarge = TextStyle(
fontWeight = FontWeight.SemiBold,
fontSize = 22.sp,
lineHeight = 28.sp,
letterSpacing = 0.sp
),
titleMedium = TextStyle(
fontWeight = FontWeight.SemiBold,
fontSize = 16.sp,
lineHeight = 24.sp,
letterSpacing = 0.15.sp
),
...
}

You can customize your typography by changing default values of TextStyle and font-related properties like fontFamily and letterSpacing.

bodyLarge = TextStyle(
fontWeight = FontWeight.Normal,
fontFamily = FontFamily.SansSerif,
fontStyle = FontStyle.Italic,
fontSize = 16.sp,
lineHeight = 24.sp,
letterSpacing = 0.15.sp,
baselineShift = BaselineShift.Subscript
)

Shapes

The Material 3 shape scale defines the style of container corners, offering a range of roundedness from square to fully circular.

There are different sizes of shapes:

Extra small
Small
Medium
Large
Extra large

Material Design 3 shapes used in various components as default value.

Each shape has a default value but you can override it:

val shapes = Shapes(
extraSmall = RoundedCornerShape(4.dp),
small = RoundedCornerShape(8.dp),
medium = RoundedCornerShape(12.dp),
large = RoundedCornerShape(16.dp),
extraLarge = RoundedCornerShape(28.dp)
)

You can read more about applying shape.

Window size classes

Jetpack Compose and Material 3 provide window size artifacts that can help make your apps adaptive. You can start by adding the Compose Material 3 window size class dependency to your build.gradle files:

// Add dependency in module build.gradle

implementation "androidx.compose.material3:material3-window-size-class:$material3_version"

Window size classes group sizes into standard size buckets, which are breakpoints that are designed to optimize your app for most unique cases.

WindowWidthSize Class for grouping devices in different size buckets

See the Reply Compose sample to learn more about adaptive apps and the window size classes implementation.

Window insets support

M3 components, like top app bars, navigation drawers, bar, and rail, include built-in support for window insets. These components, when used independently or with Scaffold, will automatically handle insets determined by the status bar, navigation bar, and other parts of the system UI.

Scaffold now supports the contentWindowInsets parameter which can help to specify insets for the scaffold content.

Scaffold insets are only taken into consideration when a topBar or bottomBar is not present in Scaffold, as these components handle insets at the component level.

Scaffold(
contentWindowInsets = WindowInsets(16.dp)
) {
// Scaffold content
}

Resources

With Compose Material 3 reaching stable, it’s a great time to start learning all about it and get ready to adopt it in your apps. Check out the resources below to get started.

Fully Material 3 and Compose sample Reply
The Material 3 guidance to start adding Material 3 to your apps
The Material 2 to Material 3 migration guide
The Jetpack Compose Samples GitHub repository, where you’ll find a variety of up-to-date samples using Material 3
The Compose community on StackOverflow and the Kotlin Slack group
The bug tracker, where you can report an issue and track feature requests

Source: Android Developers Blog

Monster Mash: A Sketch-Based Tool for Casual 3D Modeling and Animation

Posted by Cassidy Curtis, Visual Designer and David Salesin, Principal Scientist, Google Research

3D computer animation is a time-consuming and highly technical medium — to complete even a single animated scene requires numerous steps, like modeling, rigging and animating, each of which is itself a sub-discipline that can take years to master. Because of its complexity, 3D animation is generally practiced by teams of skilled specialists and is inaccessible to almost everyone else, despite decades of advances in technology and tools. With the recent development of tools that facilitate game character creation and game balance, a natural question arises: is it possible to democratize the 3D animation process so it’s accessible to everyone?

To explore this concept, we start with the observation that most forms of artistic expression have a casual mode: a classical guitarist might jam without any written music, a trained actor could ad-lib a line or two while rehearsing, and an oil painter can jot down a quick gesture drawing. What these casual modes have in common is that they allow an artist to express a complete thought quickly and intuitively without fear of making a mistake. This turns out to be essential to the creative process — when each sketch is nearly effortless, it is possible to iteratively explore the space of possibilities far more effectively.

In this post, we describe Monster Mash, an open source tool presented at SIGGRAPH Asia 2020 that allows experts and amateurs alike to create rich, expressive, deformable 3D models from scratch — and to animate them — all in a casual mode, without ever having to leave the 2D plane. With Monster Mash, the user sketches out a character, and the software automatically converts it to a soft, deformable 3D model that the user can immediately animate by grabbing parts of it and moving them around in real time. There is also an online demo, where you can try it out for yourself.

Creating a walk cycle using Monster Mash. Step 1: Draw a character. Step 2: Animate it.

Creating a 2D Sketch
The insight that makes this casual sketching approach possible is that many 3D models, particularly those of organic forms, can be described by an ordered set of overlapping 2D regions. This abstraction makes the complex task of 3D modeling much easier: the user creates 2D regions by drawing their outlines, then the algorithm creates a 3D model by stitching the regions together and inflating them. The result is a simple and intuitive user interface for sketching 3D figures.

For example, suppose the user wants to create a 3D model of an elephant. The first step is to draw the body as a closed stroke (a). Then the user adds strokes to depict other body parts such as legs (b). Drawing those additional strokes as open curves provides a hint to the system that they are meant to be smoothly connected with the regions they overlap. The user can also specify that some new parts should go behind the existing ones by drawing them with the right mouse button (c), and mark other parts as symmetrical by double-clicking on them (d). The result is an ordered list of 2D regions.

Steps in creating a 2D sketch of an elephant.

Stitching and Inflation
To understand how a 3D model is created from these 2D regions, let’s look more closely at one part of the elephant. First, the system identifies where the leg must be connected to the body (a) by finding the segment (red) that completes the open curve. The system cuts the body’s front surface along that segment, and then stitches the front of the leg together with the body (b). It then inflates the model into 3D by solving a modified form of Poisson’s equation to produce a surface with a rounded cross-section (c). The resulting model (d) is smooth and well-shaped, but because all of the 3D parts are rooted in the drawing plane, they may intersect each other, resulting in a somewhat odd-looking “elephant”. These intersections will be resolved by the deformation system.

Illustration of the details of the stitching and inflation process. The schematic illustrations (b, c) are cross-sections viewed from the elephant’s front.

Layered Deformation
At this point we just have a static model — we need to give the user an easy way to pose the model, and also separate the intersecting parts somehow. Monster Mash’s layered deformation system, based on the well-known smooth deformation method as-rigid-as-possible (ARAP), solves both of these problems at once. What’s novel about our layered “ARAP-L” approach is that it combines deformation and other constraints into a single optimization framework, allowing these processes to run in parallel at interactive speed, so that the user can manipulate the model in real time.

The framework incorporates a set of layering and equality constraints, which move body parts along the z axis to prevent them from visibly intersecting each other. These constraints are applied only at the silhouettes of overlapping parts, and are dynamically updated each frame.

In steps (d) through (h) above, ARAP-L transforms a model from one with intersecting 3D parts to one with the depth ordering specified by the user. The layering constraints force the leg’s silhouette to stay in front of the body (green), and the body’s silhouette to stay behind the leg (yellow). Equality constraints (red) seal together the loose boundaries between the leg and the body.

Meanwhile, in a separate thread of the framework, we satisfy point constraints to make the model follow user-defined control points (described in the section below) in the xy-plane. This ARAP-L method allows us to combine modeling, rigging, deformation, and animation all into a single process that is much more approachable to the non-specialist user.

The model deforms to match the point constraints (red dots) while the layering constraints prevent the parts from visibly intersecting.

Animation
To pose the model, the user can create control points anywhere on the model’s surface and move them. The deformation system converges over multiple frames, which gives the model’s movement a soft and floppy quality, allowing the user to intuitively grasp its dynamic properties — an essential prerequisite for kinesthetic learning.

Because the effect of deformations converges over multiple frames, our system lends 3D models a soft and dynamic quality.

To create animation, the system records the user’s movements in real time. The user can animate one control point, then play back that movement while recording additional control points. In this way, the user can build up a complex action like a walk by layering animation, one body part at a time. At every stage of the animation process, the only task required of the user is to move points around in 2D, a low-risk workflow meant to encourage experimentation and play.

Conclusion
We believe this new way of creating animation is intuitive and can thus help democratize the field of computer animation, encouraging novices who would normally be unable to try it on their own as well as experts who often require fast iteration under tight deadlines. Here you can see a few of the animated characters that have been created using Monster Mash. Most of these were created in a matter of minutes.

A selection of animated characters created using Monster Mash. The original hand-drawn outline used to create each 3D model is visible as an inset above each character.

All of the code for Monster Mash is available as open source, and you can watch our presentation and read our paper from SIGGRAPH Asia 2020 to learn more. We hope this software will make creating 3D animations more broadly accessible. Try out the online demo and see for yourself!

Acknowledgements
Monster Mash is the result of a collaboration between Google Research, Czech Technical University in Prague, ETH Zürich, and the University of Washington. Key contributors include Marek Dvorožňák, Daniel Sýkora, Cassidy Curtis, Brian Curless, Olga Sorkine-Hornung, and David Salesin. We are also grateful to Hélène Leroux, Neth Nom, David Murphy, Samuel Leather, Pavla Sýkorová, and Jakub Javora for participating in the early interactive sessions.