Tag Archives: BigTable

Four areas of open source contributions from Cloud Databases

Open Cloud enables you to develop software faster, innovate more easily, and scale more efficiently—while also reducing technology risk. Google has a long history of leadership in open source, and today, I want to look back at our activities around open source projects, for databases, over the past year.

Give developers the best tools to be efficient

Developers choose to build applications with managed database services on Google Cloud to benefit from velocity, scalability, security, and performance. To enable you to be most efficient and deliver your best possible work, we deliver tools and frameworks that work with your preferred development environments, no matter if you develop in the cloud or on premises. To make local testing, building and continuous integration easier for our cloud-native databases, we released emulators for Cloud Spanner, Firestore, and Cloud Bigtable so that you can test your code wherever you develop it - without the need to create or re-create cloud infrastructure with every test run.

Another area where we are helping developers is with instrumentation of Cloud SQL for easier debugging and performance tuning. With Cloud SQL Insights it is easier than ever to pinpoint underperforming SQL statements. That said, without additional instrumentation, it can be cumbersome to identify the source code or microservice that issued that SQL - let alone tying a SQL statement to a client session and its context. So we released Sqlcommenter as an open source library that will automatically add this instrumentation as SQL comments in queries that are generated by popular ORMs like Hibernate, Django, Sqlalchemy, and others (repo blog). We didn’t stop there, but merged Sqlcommenter with OpenTelemetry (blog) to add SQL insights from instrumented queries back to OpenTelemetry traces.

Lastly, we want to broaden access to our differentiated offerings, like Spanner. The recently announced Spanner PostgreSQL interface allows organizations to access Spanner’s industry-leading consistency and availability at scale using tools and skills from the popular PostgreSQL ecosystem. This new way of working with Spanner provides familiarity for developers and portability for administrators. (blog) Learn more in the documentation or sign up for the preview today.

Provide connectivity that is simple and secure

Connecting to APIs and databases from an application running in the cloud should be simple and secure. That’s why we recommend using IAM and Application Default Credentials when authenticating to other services. The Cloud SQL Proxy (repo) has been doing this and also setting up firewalls for you for a while. It works by running a local client either inside your VM or a GKE cluster. This year, we added libraries for Java (repo) and Python (repo) that can provide similar functionality without the overhead of running an extra client such as the proxy.

Cloud Spanner also offers an open source adapter for its new PostgreSQL interface (repo). This local proxy allows tools, starting with psql, to connect to a Spanner database using the PostgreSQL wire protocol.

Image 1: White pipes in datacenter

Manage cloud infrastructure with the tools of your choice

When it comes to provisioning, monitoring, and managing your cloud database services, flexibility and choice are important. We provide you with our cloud console, gcloud cli, and APIs as well as our own Deployment Manager. That said, you may prefer different ways to manage cloud infrastructure - whether through interactive tools or scripts or embedded into CI/CD pipelines that support GitOps or other controls, checks, and balances. Terraform is one of those open tools that is very popular - and we ensure that our cloud databases can be managed from it as documented in this blog about creating Spanner instances with Terraform.

If you manage the majority of your resources with Kubernetes either directly or through package managers like Helm, then our Kubernetes Config Connector (KCC) might be for you. In a nutshell, KCC exposes Google Cloud services such as Cloud SQL, Spanner, and others as Custom Resources in Kubernetes. This allows you to create and reconcile cloud resources outside of Kubernetes just like K8s native objects.

Once you are managing cloud infrastructure with CI/CD, the next step is to extend that same mechanism to manage objects within your databases such as tables, indexes, and views. To that extent we have released a Liquibase extension for Cloud Spanner.

Help you to move data with confidence

Cloud journeys often involve moving data either in a lift and shift process or sometimes replatforming to a different database. Whatever your journey, we want to simplify the process and give you the confidence that your migration is successful.

For enterprise users with Oracle databases, we have several open source projects. First, we have the Optimus Prime database assessment tool (repo) that queries your database and collects information about schemas and historic performance to be analyzed for migration complexity and consolidation potential. Our own professional services teams have been using this toolset to plan migrations to Bare Metal Solution for Oracle.

Some Oracle users are looking for opportunities to transform their workloads to fit with their bigger strategy of modernizing applications with Kubernetes. For this group we developed and open sourced the El Carro Kubernetes operator for Oracle. This not only automates database lifecycle tasks for systems running on Kubernetes, but also exposes declarative APIs for these operations.

If your application supports replatforming from Oracle to PostgreSQL, then we have a toolset for schema conversion along with dataflow pipelines that will read the output of a change data capture job and load it into a PostgreSQL database. What a great use-case for Datastream - our new serverless change data capture service.

Another case of heterogeneous database migration is to move MySQL or PostgreSQL databases to Cloud Spanner. HarbourBridge helps with the evaluation and data migration, and our latest contribution was adding support for DynamoDB as a source database. Part of every heterogeneous migration should be to validate that the source and target data are matching - we have released the Data Validation Toolkit for that use-case. DVT can connect to a number of source and target databases and compare the data on each side - giving you the confidence that your migration did not miss or change any records.

Conclusion

Whether you are migrating existing databases or you are building your next application in the cloud - we want to make your journey as comfortable and seamless as possible. Open source projects play a big role in meeting you where you are and providing you with the connectivity options, language support, and tools you want for management and migrations.

By Bjoern Rost, Product Manager, Google Cloud Databases

BigQuery cost controls now let you set a daily maximum for query costs

Today we’re giving you better cost controls in BigQuery to help you manage your spend, along with improvements to the streaming API, a performance diagnostic tool, and a new way to capture detailed usage logs.

BigQuery is a Google-powered supercomputer that lets you derive meaningful analytics in SQL, letting you only pay for what you use. This makes BigQuery an analytics data warehouse that’s both powerful and flexible. Those accustomed to a traditional fixed-size cluster – where cost is fixed, performance degrades with increased load, and scaling is complex – may find granular cost controls helpful in budgeting your BigQuery usage.

In addition, we’re announcing availability of BigQuery access logs in Audit Logs Beta, improvements to the Streaming API, and a number of UI enhancements. We’re also launching Query Explain to provide insight on how BigQuery executes your queries, how to optimize your queries and how to troubleshoot them.

Custom Quotas: No fear of surprise when the bill comes


Custom quotas allow you to set daily quotas that will help prevent runaway query costs. There are two ways you can set the quota:

  • Project wide: an entire BigQuery project cannot exceed the daily custom quota.
  • Per user: each individual user within a BigQuery project is subject to the daily custom quota.


Query Explain: understand and optimize your queries

Query Explain shows, stage by stage, how BigQuery executes your queries. You can now see if your queries are write, read or compute heavy, and where any performance bottlenecks might be. You can use BigQuery Explain to optimize queries, troubleshoot errors or understand if BigQuery Slots might benefit you.

In the BigQuery Web UI, use the “Explanation” button next to “Results” to see this information.

Improvements to the Streaming API

Data is most valuable when it’s fresh, but loading data into an analytics data warehouse usually takes time. BigQuery is unique among warehouses in that it can easily ingest a stream of up to 100,000 rows per second per table, available for immediate analysis. Some customers even stream 4.5 million rows per second by sharding ingest across tables. Today we’re bringing several improvements to BigQuery Streaming API.

  • Streaming API in EU locations. It’s not just for the US anymore: you may now use the Streaming API to load data into your BigQuery datasets residing in EU.
  • Template tables is a new way to manage related tables used for streaming. It allows an existing table to serve as a template for a streaming insert request. The generated table will have the same schema, and be created in the same dataset and project as the template table. Better yet, when the schema of the template table is updated, the schema of the tables generated from this template will also be updated.
  • No more “warm-up” delay. After streaming the first row into a table, we no longer require a warm-up period of a couple of minutes before the table becomes available for analysis. Your data is available immediately after the first insertion.

Create a paper trail of queries with Audit Logs Beta


BigQuery Audit Logs form an audit trail of every query, every job and every action taken in your project, helping you analyze BigQuery usage and access at the project level, or down to individual users or jobs. Please note that Audit Logs is currently in Beta.

Audit Logs can be filtered in Cloud Logging, or exported back to BigQuery with one click, allowing you to analyze your usage and spend in real-time in SQL.

With today’s announcements, BigQuery gives you more control and visibility. BigQuery is already very easy to use, and with recently launched products like Datalab (a data science notebook integrated with BigQuery), just about anyone in your organization can become a big data expert. If you’re new to BigQuery, take a look at the Quickstart Guide, and the first 1TB of data processed per month is on us. To fully understand the power of BigQuery, check out the documentation and feel free to ask your questions using the “google-bigquery” tag on Stack Overflow.

-Posted by Tino Tereshko, Technical Program Manager