Tag Archives: API

Producer java library for Data Lineage is now open source

Integrating OpenLineage producers with GCP Lineage just got a lot easier


What is Data Lineage

Data Lineage is a GCP feature that allows tracking data movement. This tool helps data owners and analysts detect anomalies in data flows, find connections between data sources and verify the potential consequences of planned changes in data pipelines.

Lineage is injected automatically for some Google Cloud products (BigQuery, Cloud Data Fusion, Cloud Composer, Dataproc, Vertex AI). If Lineage integration with any of those products is enabled in a project, the data movements produced by jobs that these products run are reported to GCP Lineage.

For custom integrations, the API can be used to report and fetch lineage.

After injecting, lineage can be viewed in the Google Cloud console (available from DataCatalog UI, BigQuery UI, Vertex UI). There are two representations: graph view, with data sources as nodes and data movements as edges, and list view, a tabular representation. Lineage information can also be fetched from the API.

More information is available in the documentation.


GCP Lineage information model

We describe data flows using the following concepts:

  • Process is a definition of some data transformation. For example, a SQL or Spark script.
  • Run is an execution of a Process.
  • Lineage Event is a data transformation event. It is reported in context of a Run.
  • A Link represents a connection between two data sources, when data in the link’s Target depends on its Source. A Lineage Event contains a list of Links.
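
To make the relationships between these four concepts concrete, here is a minimal Java sketch. The class and record names are hypothetical and are not the types exposed by the GCP Lineage client library. The helper `linksBetween` assumes, for illustration only, that every target depends on every source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the information model; not the library's API.
final class LineageModel {
    /** A definition of a data transformation, e.g. a SQL or Spark script. */
    record Process(String name) {}

    /** One execution of a Process. */
    record Run(Process process, String runId) {}

    /** A dependency: data in target depends on data in source. */
    record Link(String source, String target) {}

    /** A data transformation event, reported in the context of a Run. */
    record LineageEvent(Run run, List<Link> links) {}

    /** Assumption for this sketch: each target depends on each source. */
    static List<Link> linksBetween(List<String> sources, List<String> targets) {
        List<Link> links = new ArrayList<>();
        for (String source : sources) {
            for (String target : targets) {
                links.add(new Link(source, target));
            }
        }
        return links;
    }
}
```

A Run always points back to its Process, and a Lineage Event carries the Links observed during that Run, which mirrors the containment described in the list above.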

OpenLineage support

OpenLineage is an open standard for reporting lineage information. It unifies lineage reporting between systems, which means the events generated in this format can be consumed by any product supporting it. This leads to more flexibility: adding or replacing a lineage producer does not imply changing the consumer, and vice versa.

OpenLineage format is adopted by a number of lineage producers and consumers, meaning there is already tooling available to report lineage from/to those systems. GCP Lineage is one of those consumers: users can report events in OpenLineage format, see the resulting lineage on the UI, and query it via the API.

OpenLineage is the preferred method for reporting lineage in GCP Lineage. It is used by the Dataproc lineage integration. To find out more about sending OpenLineage events to GCP Lineage refer to the documentation.

After injecting lineage in OpenLineage format, it can be accessed in the same way as if it was injected via other API methods or automatically: from the Google Cloud console or the API.


Why producer library

The GCP Lineage producer library is an extension of the client library. Client libraries are recommended for calling Cloud APIs programmatically. They handle low level API call details, leaving the necessary user code simpler and shorter.

The producer library further simplifies integration by providing ready-to-use code for calling the API from Java. It adds functionality such as synchronous and asynchronous clients, translation of OpenLineage JSON messages to the API-friendly format, and error handling.

Using the producer library, all the code needed to send a request to GCP Lineage API is:

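// Create a synchronous producer client.
SyncLineageProducerClient client = SyncLineageProducerClient.create();
// parent is the resource name "projects/{project}/locations/{location}";
// openLineageMessage is the OpenLineage event as a protobuf Struct.
ProcessOpenLineageRunEventRequest request =
        ProcessOpenLineageRunEventRequest.newBuilder()
            .setParent(parent)
            .setOpenLineage(openLineageMessage)
            .build();
// Send the event to the GCP Lineage API.
client.processOpenLineageRunEvent(request);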

The field openLineageMessage here is a protobuf Struct that includes information about job execution, inputs and outputs and other metadata. The object model is described in the documentation. An example message is:

{
  "eventType": "START",
  "eventTime": "2023-04-04T13:21:16.098Z",
  "run": {
    "runId": "502483d6-3e3d-474f-9380-da565eaa7516",
    "facets": {
      "spark_properties": {
        "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.22.0/integration/spark",
        "_schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet",
        "properties": {
          "spark.master": "yarn",
          "spark.app.name": "sparkJobTest.py"
        }
      }
    }
  },
  "job": {
    "namespace": "project-name",
    "name": "cluster-name",
    "facets": {
      "jobType": {
        "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.22.0/integration/spark",
        "_schemaURL": "https://openlineage.io/spec/facets/2-0-3/JobTypeJobFacet.json#/$defs/JobTypeJobFacet",
        "processingType": "BATCH",
        "integration": "SPARK",
        "jobType": "SQL_JOB"
      }
    }
  },
  "inputs": [
    {
      "namespace": "bigquery",
      "name": "project.dataset.input_table"
    }
  ],
  "outputs": [
    {
      "namespace": "bigquery",
      "name": "project.dataset.output_table"
    }
  ],
  "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.18.0/integration/spark",
  "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/$defs/RunEvent"
}

Learn more about building an OpenLineage message.


Best Practices for Constructing OpenLineage Messages

The openLineageMessage should follow the OpenLineage format. The fields that are required for correct parsing by the GCP Lineage API are:

  • job: mapped to Process
  • job.namespace: used to construct Process name
  • job.name: used to construct Process name
  • run: mapped to Run
  • run.runId: used to construct Run name
  • producer: URI identifying the producer of this metadata
  • eventTime: time of the data movement
  • schemaURL: URL pointing to the schema definition for this message

In addition to those, the fields used to create lineage are:

  • eventType: corresponds to the status of the Run
  • inputs: mapped to sources of links. Must be specified according to the naming conventions
  • outputs: mapped to targets of links. Must be specified according to the naming conventions

The GCP Lineage API supports OpenLineage major versions 1 and 2. For more information please refer to the documentation.
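
As a rough pre-flight check, the required fields listed above can be verified before a message is sent. The sketch below is an assumption-laden illustration, not part of the producer library: it takes a message that has already been parsed into nested `Map` objects (the parsing step is omitted), and the class name `OpenLineageFieldCheck` and method `missingFields` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of the GCP Lineage producer library.
final class OpenLineageFieldCheck {
    // Dotted paths taken from the required-field list above.
    static final List<String> REQUIRED = List.of(
        "job.namespace", "job.name", "run.runId", "producer", "eventTime", "schemaURL");

    /** Returns the required paths that are absent or blank in the parsed message. */
    static List<String> missingFields(Map<String, Object> message) {
        List<String> missing = new ArrayList<>();
        for (String path : REQUIRED) {
            Object value = lookup(message, path);
            if (value == null || (value instanceof String s && s.isBlank())) {
                missing.add(path);
            }
        }
        return missing;
    }

    /** Walks a dotted path through nested maps; returns null if any step is missing. */
    private static Object lookup(Map<String, Object> node, String path) {
        Object current = node;
        for (String key : path.split("\\.")) {
            if (!(current instanceof Map<?, ?> map)) {
                return null;
            }
            current = map.get(key);
        }
        return current;
    }
}
```

An empty result means all required fields are present. This check does not validate the naming conventions for inputs and outputs, which the API also expects.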


How to access GCP Lineage?

The code is now publicly available on GitHub. The library is also published to Maven.


GcpLineageTransport

To simplify integration with GCP Lineage, we offer GcpLineageTransport. It is available in the OpenLineage GitHub repository and is published as a separate Maven artifact. It is built on top of the producer library described above.

Using the transport minimises the code for sending events to GCP Lineage. The GcpLineageTransport can be configured as the event sink for any existing OpenLineage producer such as Airflow, Spark, and Flink. Find more information and examples on GCP Lineage.

By Mary Idamkina – Data Lineage

Google Meet provides additional privacy for livestreaming with new eCDN On-Premises API

What’s changing

Earlier this year, we introduced Enterprise Content Delivery Network (eCDN) to enhance livestreaming in Google Meet. When configured by admins, eCDN has the potential to reduce bandwidth consumption to a fraction of the traffic volume through peer-assisted media delivery.

However, environments that have additional security requirements would not be able to benefit from the network traffic savings enabled by eCDN. That changes today with the introduction of the eCDN On-Premises API for Google Meet, which admins can use to configure their network for eCDN while keeping classified IP addresses and network information private. Specifically, IP addresses will be replaced with self-assigned peering group names and encrypted information for session description protocol (SDP) handshakes. This ensures that no IP information is shared with Google, so customers can take advantage of eCDN while adhering to their own security guidelines.


Admin console > Apps > Google Workspace > Google Meet > Meet video settings > eCDN


Who’s impacted

Admins

Why it’s important

The eCDN On-Premises API can be used to deploy eCDN for Google Meet live streaming in a way that allows the eCDN tracker service to optimize peering topologies without access to internal network information such as IP addresses or subnets. A customer-supplied service uses the API to replace all IP address information with arbitrary text labels. The service also manages encryption of SDP offers/answers using encryption keys that are never made available to Google. Any decryption needed by client peers is performed completely inside the customer's own network. No network information is sent outside the organization's network, not even to Google. This ensures that bandwidth-optimized media delivery via eCDN can also be implemented in sensitive environments without compromising organizations’ internal security guidelines.

Getting started

Rollout pace

Availability

  • Available for all Google Workspace customers

Resources


Now generally available: the Groups Editor & Groups Reader roles can now be provisioned for specific group types

What’s changing

At the beginning of the year, we launched the ability to assign the Groups Editor and Groups Reader roles for security groups or non-security groups in open beta. Beginning today, this feature is now generally available. Groups Admins have access to all groups. The new roles of Groups Editor and Groups Reader offer delegated admin permissions for groups, and can use conditions to limit access to sensitive groups as needed.

Getting started: 

Create and manage rubrics using the Google Classroom API

What’s changing 

The Google Classroom API enables third-party developers to manage classes, rosters, invitations and more in Google Classroom. Since 2019, teachers have been able to create or reuse a rubric for an assignment, but this capability did not previously exist in the Classroom API. To close this gap, we’re excited to announce that developers can now manage assignment rubrics via the Classroom API.

More specifically, developers can read and write rubrics using the API, and also see student submission scores broken down by the corresponding rubric criteria, rather than just accessing the total score, enabling deeper insights into student performance. 


Who’s impacted

Admins and developers 


Why it’s important 

This update enables developers to create and manage rubrics on behalf of teachers at scale, and retrieve rubric-based grades to support more holistic student performance insights. 


Getting started

  • Admins: The Classroom API provides a RESTful interface for you to manage courses and rosters in Google Classroom. Learn more about the Classroom API overview. 
  • Developers:

Rollout pace 

  • Available now. 

Availability 

Available for Google Workspace: 
  • Education Plus 

Resources

Now generally available: configure third-party apps by select API scopes

What’s changing 

Earlier this year, we launched the ability to configure third-party apps by select API scopes to open beta. Beginning today, this feature is now generally available. 


This update gives admins more granular control. They can limit third-party app access to specific OAuth 2.0 scopes for Google APIs, like Drive or Gmail. This prevents apps from gaining additional access without admin consent, even if they request new API scopes in the future. This helps ensure data access is restricted to only what admins deem necessary.



Getting started


Rollout pace


Availability

  • Available to all Google Workspace customers, as well as Cloud Identity Free and Premium customers


Resources


Google Workspace Updates Weekly Recap – November 8, 2024

4 New updates

Unless otherwise indicated, the features below are available to all Google Workspace customers, and are fully launched or in the process of rolling out. Rollouts should take no more than 15 business days to complete if launching to both Rapid and Scheduled Release at the same time. If not, each stage of rollout should take no more than 15 business days to complete.




Import data into group chats using the Google Chat API 
In September, we introduced a feature through the Google Workspace Developer Preview Program that enables developers to create group chats in import mode using the spaces.create method when migrating to Google Chat from other messaging platforms. This week, we’re excited to announce that this is now generally available for Google Workspace developers. | Roll out to Rapid Release domains and Scheduled Release domains is complete. | Available to all Google Workspace customers. | Learn more about import mode. 


Search for and reuse pre-defined queries from BigQuery in Connected Sheets
Currently, users can define saved queries in BigQuery Studio and notebooks, but they cannot reuse those queries in Connected Sheets without copying and pasting them. This week, we’re excited to announce that users can now search for and reuse pre-defined queries directly from BigQuery to load Connected Sheets data. To do so, go to Connection Settings > Edit connection > Saved queries and query editor and search for your query by project. | Rolling out now to Rapid Release and Scheduled Release domains at an extended rollout pace (potentially longer than 15 days for feature visibility), with expected completion by December 6, 2024. | Available to all Google Workspace customers, Workspace Individual Subscribers, and users with personal Google accounts. | Visit the Help Center to learn more about writing & editing a query and getting started with BigQuery data in Google Sheets.



Launching to beta: Import sensitive Microsoft Word documents as client-side encrypted Google Docs. 
Beginning this week, eligible customers can import and convert sensitive Microsoft Word files into Google Docs with client-side encryption. When collaborating with external and internal stakeholders, you may find yourself working across both Google Docs and Microsoft Word. This update keeps your work moving by layering interoperability on top of the confidentiality benefits of client-side encryption: customers are in direct control of their encryption keys and the identity service that they choose to authenticate for those keys. Eligible Google Workspace admins can use this form to request access to the beta. | Available to Google Workspace Enterprise Plus, Education Plus, and Education Standard customers. | Visit the Help Center to learn more about client-side encryption. More specific instructions will be shared once you’re accepted into the beta. 


Select Google Chat settings can now be applied at the group level 
Admins can now apply the following Google Chat settings at the group level: 
While these settings can also be configured at the Organizational Unit (OU) level, this update provides more granular control for admins. Customers frequently request more flexibility in how they apply settings, and group-level configuration lets them tailor settings to the varied needs of their users. | Roll out to Rapid Release domains and Scheduled Release domains is complete. | Available to all Google Workspace customers.


Previous announcements

The announcements below were published on the Workspace Updates blog earlier this week. Please refer to the original blog posts for complete details.


View in-meeting chat messages in Google Meet live streams 
Starting this week, when you’re viewing a Google Meet live stream, you will be able to see chat messages that are sent by participants who have joined via the meeting link. | Learn more about in-meeting chat messages in Meet live streams. 


Now generally available: use Gemini in the side panel of Workspace apps in seven additional languages 
Beginning this week, select users can use Gemini in the side panel of Google Docs, Google Sheets, Google Drive, and Gmail, in seven additional languages: French, German, Italian, Japanese, Korean, Portuguese and Spanish. | Learn more about additional Gemini languages. 


Announcing general availability of Google Vids: Our new AI-powered video creation app for work to help tell stories across your organization 
Earlier this year, we announced Google Vids, the newest productivity app in our suite of Google Workspace products. Vids is an AI-powered video creation app for work designed to help teams in customer service, learning and development, project ops and marketing tell more engaging stories at work through video. This week, we’re excited to announce the general availability of Google Vids for select Workspace editions. | Learn more about Vids


Google Vids is now available for Google Workspace for Education, providing easy video creation for teaching and learning 
Earlier this year, we announced Google Vids would soon empower educators and students to easily create and collaborate with video. This week, we’re excited to announce the general availability of Google Vids for Education Plus and Gemini for Workspace customers. | Learn more about Vids for EDU.


Introducing a refreshed library of high-quality Google Slides templates that elevate your presentations
We’re introducing a new collection of modern, professionally designed templates in Google Slides to help users build presentations much faster. These new templates cater to a wide range of use cases that provide users with the perfect starting point for their presentations. | Learn more about Slides templates.


Expanding access to the Gemini app for teen students in education
Google Workspace for Education admins can now turn on the Gemini app with added data protection as an additional service for their teen users (ages 13+ or the applicable age in your country) in the following languages and countries. | Learn more about the Gemini app for teen students in education.


Completed rollouts

The features below completed their rollouts to Rapid Release domains, Scheduled Release domains, or both. Please refer to the original blog posts for additional details.


Rapid Release Domains: 
Scheduled Release Domains: 
Rapid and Scheduled Release Domains: 

Paused rollouts

We have paused the rollout for this feature while we evaluate performance and quality. We will provide an update with new rollout information as soon as possible.

For a recap of announcements in the past six months, check out What’s new in Google Workspace (recent releases).  

Audit security settings using the Policy API, now available in open beta

What’s changing

Simplifying the management of Workspace settings continues to be a priority for us. To that end, we’re introducing new tools to help streamline the process for admins. 

Launching to open beta today, the Policy API helps super admins programmatically access information about how the service-level settings and rules in their Google Workspace environment are configured. With the Policy API, customers gain a comprehensive view of their Workspace security and compliance configurations, and admins no longer have to navigate through numerous pages in the Admin console.

To start, the Policy API is available as a read-only API. In future releases, admins will be able to use the API to create, update, and delete their settings, as well as data loss prevention (DLP) rules. Admins will be able to use the API to audit certain settings in the following categories:

  • Authentication controls such as account recovery, advanced protection program, login challenges, passwords.
  • Chat
  • Classroom
  • Docs and Drive 
  • Gmail 
  • Groups
  • Marketplace
  • Meet 
  • Sites
  • Takeout

The Policy API can also be used to read DLP rules, including the ability to:
  • Read all DLP rule configurations in the admin console, including: rule names and descriptions; applicable organization units (OUs) and groups; triggers and conditions; and app-specific alert actions.
  • Read existing DLP detectors available in the admin console including the detector name, description, and wordlist configurations.
  • Read admin-modified system defined alerts.

Who’s impacted

Super Admins


Why it’s important

With the increase in sophistication and scale of cyber threats, the Cybersecurity & Infrastructure Security Agency’s Secure Cloud Business Applications (SCuBA) project provides guidance to secure agencies’ cloud business application environments and protect federal information that is created, accessed, shared and stored in those environments. 


The Policy API provides access to the settings that are part of these recommendations published in CISA’s Google Workspace secure configuration baselines. Customers who wish to evaluate their Workspace policies against these baselines can start testing using the Policy API. In future releases, we plan to expand support for additional policies described in CISA’s Workspace baselines.  


Getting started

  • Admins: You must be a super admin to use the Policy API. Use our Developer Documentation to learn more about the Policy API.
  • End users: There is no end user impact or action required.

Rollout pace

  • Available now.

Availability

  • Available to all Google Workspace customers

Resources


Google Workspace Updates Weekly Recap – September 20, 2024

3 New updates

Unless otherwise indicated, the features below are available to all Google Workspace customers, and are fully launched or in the process of rolling out. Rollouts should take no more than 15 business days to complete if launching to both Rapid and Scheduled Release at the same time. If not, each stage of rollout should take no more than 15 business days to complete.



Ability to create announcement spaces using the Google Chat API is now generally available 
In June, we introduced the option to create announcement spaces using the Google Chat API through the Google Workspace Developer Preview Program. We’re excited to announce Google Workspace developers can now use the Chat API to create announcement spaces, plus read and update the permission settings of a space. | Rolling out now to Rapid Release domains and Scheduled Release domains. | Available to all Google Workspace customers. | Visit these Developer Docs for more information: PredefinedPermissionSettings and PermissionSettings fields.


Additional improvements to tables in Google Sheets 
Following our announcement of improvements to tables in Google Sheets, we’re adding even more enhancements to the experience. More specifically, you can now: 

1. Insert a blank table from the pre-built table sidebar 

2. Reference tables successfully via IMPORTRANGE. For example, if you had a table named Table1 with column header values of Column 1, Column 2, Column 3, etc.:
    • To import the table range, including header cells, you would input: =IMPORTRANGE(spreadsheet_url, "Table1[#ALL]")
    • To import only the footers of the table range, you would input: =IMPORTRANGE(spreadsheet_url, "Table1[#TOTALS]")
    • To import the table range, excluding header cells, you would input: =IMPORTRANGE(spreadsheet_url, "Table1[#DATA]")
    • To import the first two columns of the table range, including header cells, you would input: =IMPORTRANGE(spreadsheet_url, "Table1[[Column 1]:[Column 2],[#ALL]]")
3. Use the following keyboard shortcuts to easily convert ranges to tables: 
  • Cmd+Opt+T for Mac 
  • Ctrl+Alt+T for Linux and Windows 
    Rollout to Rapid Release and Scheduled Release domains for #1 is complete. | Rolling out to Rapid Release domains now for #2 and #3; launch to Scheduled Release domains planned for September 26, 2024 for #3 and October 3, 2024 for #2. | Available to all Google Workspace customers, Workspace Individual Subscribers, and users with personal Google accounts. | Visit the Help Center to learn more about using tables in Google Sheets.


    Introducing Dual Screen on Meet
    Following the recent announcement of an improved user experience for Google Meet on Android devices, we’re excited to introduce an additional feature, available on the Pixel 9 Pro Fold device, that provides you with a more immersive video call experience. Through the use of the front and inner cameras, you can now show both yourself and what you're looking at at the same time. Additionally, the person you're video chatting with can be seen on both the inner and outer screens to include everyone around you in the call. | Rollout to Rapid Release domains is complete; rolling out now to Scheduled Release domains. | Available to all Google Workspace customers, Workspace Individual Subscribers, and users with personal Google accounts. | Visit the Help Center to learn how Dual Screen on Google Meet works.

    Use both screens during a Meet call - rear screen view of foldable



    Use both screens during a Meet call - front screen view of foldable



    Previous announcements

    The announcements below were published on the Workspace Updates blog earlier this week. Please refer to the original blog posts for complete details.


    Adding multi-monitor support to Google Slides
    We’re making it easier to view your Google Slides presentation controls on your computer while presenting to an audience using a connected external monitor or projector. | Learn more about multi-monitor support in Slides. 

    New beta available that restricts access to folders in Google Drive 
    We’ve introduced a beta that allows shared drive managers to restrict folders to specific users within a shared drive. This provides shared drive managers with greater flexibility to keep relevant content within a single shared drive, while restricting access to shared folders with sensitive information. | Learn more about restricting folder access.

    New design and accessibility improvements for embedded Google Calendars 
    Starting this week, you’ll notice a refreshed look and feel for embedded calendars that is in line with Google Material Design 3 and now includes enhanced accessibility features, such as the ability to use an embedded calendar with a screen reader and keyboard shortcuts to navigate more easily, and more. | Learn more about embedded Google Calendars.

    NotebookLM now available as an Additional Service 
    Last year, we introduced an Early Access App called NotebookLM, an experimental product using some of Google's most advanced models, like Gemini 1.5 Pro, that helps you gain critical insights grounded in the content of source documents you trust. We’re excited to announce that NotebookLM is officially available as an Additional Service. | Learn more about NotebookLM

    Admin features for space management via the Chat API are now generally available 
    Earlier this year, we introduced a series of space management capabilities for Workspace admins in the Google Chat API via the Google Workspace Developer Preview Program. These API features are now generally available for all Google Workspace customers and developers. | Learn more about space management via the Chat API. 

    Create birthdays in Google Calendar 
    To ensure a birthday is never missed, we’re introducing the ability to create and modify birthday events in Google Calendar on Android devices. | Learn more about birthdays in Calendar. 

    Additional iOS data exfiltration enhancement: account level data sharing between Google Workspace apps and non-Google Workspace apps on or off 
    Admins can now enable content sharing on personal Workspace accounts while preventing data sharing from corporate Workspace accounts on iOS devices. | Learn more about iOS data exfiltration enhancements.


    For a recap of announcements in the past six months, check out What’s new in Google Workspace (recent releases).  

    Admin features for space management via the Chat API are now generally available

    What’s changing

    Earlier this year, we introduced a series of space management capabilities for Workspace admins in the Google Chat API via the Google Workspace Developer Preview Program. These API features are now generally available for all Google Workspace customers and developers.

    Using these features, admins can easily perform a variety of space management related tasks at scale. This includes membership management, like adding and removing members, onboarding and offboarding users from spaces, cleaning up inactive spaces, and more. 

    These features are also available when using the Google Apps Manager (GAM), an open source command-line tool that helps administrators to perform bulk operations associated with various aspects of their Google Workspace. The tool can be used to automate space management tasks with command-line scripts, helping to reduce admin overhead and potential errors when using APIs. See this article in our Help Center for more information on using a third-party tool for mass provisioning.

    Who’s impacted

    Admins and developers

    Why you’d use it

    In 2023, we launched the Space Management tool, which allowed admins to view all the spaces within their organization, understand the activity within those spaces, and perform essential actions like deleting a space or assigning space managers. While finding the tool helpful to perform one-off tasks, admins expressed a desire for tools to perform these tasks at scale, for example, with the help of APIs. Admins can now use the Chat API to find information and manage spaces in their organization in bulk or programmatically. Specifically they can:

    • Find and delete inactive spaces: Using spaces.search, you can find spaces that haven’t been used since a specified date and time and then delete them.
    • Onboard and offboard users: Automatically add new users to relevant spaces and remove them from spaces when they leave or change roles.
    • Audit external members: Monitor and control access to your organization's data by identifying and removing external members from sensitive conversations.
    • Lookup and update space details: Easily manage space information like names, descriptions, and guidelines.
    • Verify user membership and upgrade roles: Manage user access and roles within spaces.
    • And more — please refer to our developer guidance for even more information.
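
    As an illustration of the first item in the list above, the following Java sketch builds a search query string for spaces that have been inactive since a given cutoff. It only constructs the query text and makes no API call. The filter field names (`customer`, `spaceType`, `lastActiveTime`) and the `customers/my_customer` value reflect our reading of the spaces.search reference and should be verified against the current documentation; the class name `InactiveSpacesQuery` is hypothetical.

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;

// Hypothetical helper that only builds the query string; it does not call the Chat API.
final class InactiveSpacesQuery {
    /**
     * Builds a query matching spaces whose last activity is before the cutoff.
     * Field names are assumptions based on the spaces.search documentation.
     */
    static String build(Instant cutoff) {
        // ISO_INSTANT yields an RFC 3339 UTC timestamp such as 2024-01-01T00:00:00Z.
        String timestamp = DateTimeFormatter.ISO_INSTANT.format(cutoff);
        return "customer = \"customers/my_customer\""
            + " AND spaceType = \"SPACE\""
            + " AND lastActiveTime < \"" + timestamp + "\"";
    }
}
```

    The resulting string would be passed as the query of a spaces.search request made with admin access; each space returned could then be reviewed and deleted as described above.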

    Getting started

    Available in open beta: configure third-party apps by select API scopes

    What’s changing 

    When your users sign in to third-party apps using the "Sign in with Google" option (single sign-on) or use OAuth to share their data with those apps, you can control what access those apps have to your organization’s Google data using app access controls.


    Admins currently can configure the third-party apps as “Trusted”, giving them access to all OAuth scopes or as “Limited”, giving them access to scopes only from Google services which are not restricted. Beginning today, we’re giving admins another layer of granular control for third-party apps. Specifically, you can now configure apps to be limited by selected OAuth 2.0 Scopes for Google APIs, such as Drive or Gmail scopes. This helps ensure that these apps do not gain additional access without admin consent based on new API scopes that they might request in the future, keeping data access limited to only what is deemed absolutely necessary by admins.




    Getting started

    Rollout pace


    Availability

    • Available to all Google Workspace customers, as well as Cloud Identity Free and Premium customers


    Resources