Tag Archives: PostgreSQL

Full support of PostgreSQL engine comes to Logica

Logica is a logic programming language designed for intuitive and efficient data manipulation, which we open sourced in 2020. It compiles to SQL, providing access to the power of SQL engines with the convenience of a logic programming syntax.

When it was open sourced, Logica's only fully supported engine was BigQuery, a powerful data warehouse, executing queries with high parallelization and processing terabytes of data within seconds.

Modern machines can store and process significant amounts of data, even within a single computer. Thus relational SQL databases are as popular as ever. They contain a lot of data and its analysis is important. Among open source database options, PostgreSQL and SQLite are some of the most popular database engines (example1, example2). Logica added support for SQLite in 2021.

Now we are pleased to announce a new release of Logica that adds support for PostgreSQL.

As Logica compiles to SQL, it is natural to extend the language to use PostgreSQL as the engine. However, there are nuances in the SQL dialect of Postgres which require addressing. The biggest distinction is that PostgreSQL requires types of records to be explicitly spelled out in your query, while BigQuery determines the types automatically.

For example, consider a Logica predicate where for each user we collect a list of records with information about their purchases.

UserPurchases(

    user_id:,

    user_name:,

    purchases? List= {item_name:, item_price:}) distinct :-

  Purchase(purchase_id:, user_id:, item_name:, item_price:),

  UserInfo(user_id:, user_name:);


We can translate this Logica predicate to GoogleSQL to run on BigQuery as follows:

SELECT

  user_id,

  user_name,

  ARRAY_AGG(STRUCT(item_name as item_name, item_price as item_price)) as purchases

FROM

  Purchases INNER JOIN

  UserInfo USING (user_id)

GROUP BY 1, 2;

Logica's record {item_name:, item_price:} simply compiles into GoogleSQL's STRUCT(item_name as item_name, item_price as item_price).

However, in the dialect of PostgreSQL composite types must be explicitly defined and specified. In our example, we need to define the type PurchaseRecord with fields item_name and item_price. We should also specify in the query that the purchases column is aggregating records of type PurchaseRecord. Thus PostgreSQL query for our predicate would be written like so.

CREATE TYPE PurchaseRecord as (item_name text, item_price numeric);


SELECT

  user_id,

  user_name,

  ARRAY_AGG(ROW(item_name,

                item_price)::PurchaseRecord) AS purchases

FROM

  Purchase INNER JOIN

  UserInfo USING (user_id)

GROUP BY UserInfo.user_id, UserInfo.user_name;


Records and lists are also useful as intermediates in calculations, even if the input and output data are normalized. For example, we have a table called ItemSales and want to find a list of most sold items in each of the stores that the table describes. Specifically, we want to assemble a table with information about the top three most sold items among all of the stores. For each of the items, we may want to list the department of the store where the item is being sold. This can be achieved intuitively using the ArgMax3 aggregate function, which accumulates all the information about the items that we need, and no extra join is needed.


# Collecting information of top 3 most sold items for each store.

StoreTopItemsCollection(store) ArgMax3= {item:,

                                         department:} -> sales_volume :-

  ItemSales(store:, item:, department:, sales_volume:);


# Flattening top items collection.

StoreTopItems(store:, item:, department:) :-

  {item:, department:} in StoreTopItemsCollection(store);


To support the PostgreSQL engine, we extended the Logica compiler with type inference. Logica now infers data types for all expressions that a user employs. For records and arrays, Logica specifies their type in the produced SQL, just as PostgreSQL requires. Commands to create necessary types are produced as part of the compiled SQL. In this collab, we show an example of a program that writes a PostgreSQL table, and in this collab, we show how to give type hints when the program does not have enough information for complete inferences.

As a byproduct of type inference, we were able to improve error messages. Now that we know the types, we can point to the user where a mistake is made within the Logica program, rather than the user having to debug the generated SQL statement.

PostgreSQL is a popular and powerful engine. It is easy to start your own instance (maybe just in CoLab!), or use a serverless option. We are excited to provide users of Logica with the option to run on Postgres. If you already use PostgreSQL, we encourage you to give Logica a try, it is a joy to write data analysis with logic programming! If you have any feedback or questions, please share at the discussion section of Logica repository.

By Evgeny Skvortsov, Software Engineer – Google

The API Registry API

We’ve found that many organizations are challenged by the increasing number of APIs that they make and use. APIs become harder to track, which can lead to duplication rather than reuse. Also, as APIs expand to cover an ever-broadening set of topics, they can proliferate different design styles, at times creating frustrating inefficiencies.

To address this, we’ve designed the Registry API, an experimental approach to organizing information about APIs. The Registry API allows teams to upload and share machine-readable descriptions of APIs that are in use and in development. These descriptions include API specifications in standard formats like OpenAPI, the Google API Discovery Service Format, and the Protocol Buffers Language.

An organized collection of API descriptions can be the foundation for a wide range of tools and services that make APIs better and easier to use.
  • Linters verify that APIs follow standard patterns
  • Documentation generators provide documentation in consistent, easy-to-read, accessible formats
  • Code generators produce API clients and server scaffolding
  • Searchable online catalogs make everything easier to find
But perhaps most importantly, bringing everything about APIs together into one place can accelerate the consistency of an API portfolio. With organization-wide visibility, many find they need less explicit governance even as their APIs become more standardized and easy to use.

The Registry API is a gRPC service that is formally described by Protocol Buffers and that closely follows the Google API Design Guidelines at aip.dev. The Registry API description is annotated to support gRPC HTTP/JSON transcoding, which allows it to be automatically published as a JSON REST API using a proxy. Proxies also enable gRPC web, which allows gRPC calls to be directly made from browser-based applications, and the project includes an experimental GraphQL interface.

We’ve released a reference implementation that can be run locally or deployed in a container with Google Cloud Run or other container-based services. It stores data using the Google Cloud Datastore API or a configurable relational interface layer that currently supports PostgreSQL and SQLite.

Following AIP-181, we’ve set the Registry API’s stability level as "alpha," but our aim is to make it a stable base for API lifecycle applications. We’ve open-sourced our implementation to share progress and gather feedback. Please tell us about your experience if you use it.

By Tim Burks, Tech Lead – Apigee API Lifecycle and Governance

HarbourBridge: From PostgreSQL to Cloud Spanner

Would you like to try out Cloud Spanner with data from an existing PostgreSQL database? Maybe you’ve wanted to ‘kick the tires’ on Spanner, but have been discouraged by the effort involved?

Today, we’re announcing a tool that makes trying out Cloud Spanner using PostgreSQL data simple and easy.

HarbourBridge is a tool that loads Spanner with the contents of an existing PostgreSQL database. It requires zero configuration—no manifests or data maps to write. Instead, it ingests pg_dump output, automatically builds a Spanner schema, and creates a new Spanner database populated with data from pg_dump.

HarbourBridge is part of the Cloud Spanner Ecosystem, a collection of public, open source repositories contributed to, owned, and maintained by the Cloud Spanner user community. None of these repositories are officially supported by Google as part of Cloud Spanner.

Get up and running fast

HarbourBridge is designed to simplify Spanner evaluation, and in particular to bootstrap the process by getting moderate-size PostgreSQL datasets into Spanner (up to a few GB). Many features of PostgreSQL, especially those that don't map directly to Spanner features, are ignored, e.g. (non-primary) indexes, functions and sequences.

View HarbourBridge as a way to get up and running fast, so you can focus on critical things like tuning performance and getting the most out of Spanner. Expect that you'll need to tweak and enhance what HarbourBridge produces—More on this later.

Quick-start guide

The HarbourBridge README contains a step-by-step quick-start guide. We’ll quickly review the main steps. Before you begin, you'll need a Cloud Spanner instance, Cloud Spanner API enabled for your Google Cloud project, authentication credentials configured to use the Cloud API, and Go installed on your development machine.

To download HarbourBridge and install it, run
go get -u github.com/cloudspannerecosystem/harbourbridge
The tool should now be installed as $GOPATH/bin/harbourbridge. To use HarbourBridge on a PostgreSQL database called mydb, run
pg_dump mydb | $GOPATH/bin/harbourbridge
The tool will use the cloud project specified by the GCLOUD_PROJECT environment variable, automatically determine the Cloud Spanner instance associated with this project, convert the PostgreSQL schema for mydb to a Spanner schema, create a new Cloud Spanner database with this schema, and finally, populate this new database with the data from mydb. HarbourBridge also generates several files when it runs: a schema file, a report file (with details of the conversion), and a bad data file (if any data is dropped). See Files Generated by HarbourBridge.

Take care with ACLs

Note that PostgreSQL table-level and row-level ACLs are dropped during conversion since they are not supported by Spanner (Spanner manages access control at the database level). All data written to Spanner will be visible to anyone who can access the database created by HarbourBridge (which inherits default permissions from your Cloud Spanner instance).

Next steps

The tables created by HarbourBridge provide a starting point for evaluation of Spanner. While they preserve much of the core structure of your PostgreSQL schema and data, many important PostgreSQL features have been dropped.

In particular, HarbourBridge preserves primary keys but drops all other indexes. This means that the out-of-the-box performance you get from the tables created by HarbourBridge can be significantly slower than PostgreSQL performance. If HarbourBridge has dropped indexes that are important to the performance of your SQL queries, consider adding Secondary Indexes to the tables created by HarbourBridge. Use the existing PostgreSQL indexes as a guide. In addition, Spanner's Interleaved Tables can provide a significant performance boost.

Other dropped features include functions, sequences, procedures, triggers, and views. In addition, types have been mapped based on the types supported by Spanner. Types such as integers, floats, char/text, bools, timestamps and (some) array types map fairly directly to Spanner, but many other types do not and instead are mapped to Spanner's STRING(MAX). See Schema Conversion for details of the type conversions and their tradeoffs.

Recap

HarbourBridge automates much of the manual work of trying out Cloud Spanner using PostgreSQL data. The goal is to bootstrap your evaluation and help get you to the meaty issues as quickly as possible. The tables generated by HarbourBridge provide a starting point, but they will likely need to be tweaked and enhanced to support a full evaluation.

We encourage you to try out the tool, send feedback, file issues, fork and modify the codebase, and send PRs for fixes and new functionality. Our plans and aspirations for developing HarbourBridge further are outlined in the HarbourBridge Whitepaper. HarbourBridge is part of the Cloud Spanner Ecosystem, owned and maintained by the Cloud Spanner user community. It is not officially supported by Google as part of Cloud Spanner.

By Nevin Heintze, Cloud Spanner