Much software development in the capital markets (and enterprise systems in general) revolves around the transformation, enrichment and movement of data from one system to another. The unpredictable nature of financial market data volumes, often driven by volatility, exacerbates the pain of scaling and posting data when and where it’s needed for daily trade reconciliation, settlement and regulatory reporting. The implications of technology missteps within such crucial business processes range from missed business opportunities to undesired risk exposure to regulatory non-compliance. These activities must be relentlessly predictable, repeatable and measurable to yield maximum value to stakeholders.
While developers rely on the Extract, Transform and Load (ETL) activities that are so crucial to processing data, they now face limits in terms of the speed and efficiency of ETL as the amount of transactions grows faster than they can process it. As shortened settlement durations and the Consolidated Audit Trail (CAT) loom on the horizon, financial services institutions need simple, fast and powerful approaches to quickly scale and ultimately mitigate time-sensitive risks and operational costs.
Traditionally, developers have considered the activities around ETL data an unglamorous yet necessary dimension of building software products for encapsulating functions that are core to every tier of computing. So when data-driven enterprises are tasked with harvesting insights from massive data sets, it’s quite likely that ETL, in one form or another, is lurking nearby. But in today’s world, data can come from anywhere and in any format, creating a series of labor, time and intellectual challenges. While there may be hundreds of ways to solve the problem, few provide the efficiency and effectiveness so needed in our “big data” world — until recently.
The Google Cloud Dataflow service and its associated software development kit (SDK) provides a series of powerful tools for a myriad of data transformation duties. Designed to perform data processing tasks of any size in a managed services environment, Google Cloud Dataflow simplifies the mechanics of large-scale transformation and supports both batch and stream processing using the same programming model. In our latest white paper, we introduce some of the main concepts behind building and running applications that use Dataflow, then get “hands on” with a job to transform and ingest options market symbol data before storing the transformations within a Google BigQuery data set.
In short, Google Cloud Dataflow allows you to focus on data processing tasks and not cluster management. Rather than asking you to guess the right cluster size, Dataflow automatically scales up or down horizontally as much as needed for your exact processing requirements. This includes scaling all the way down to zero when there is no work, so you’re never paying for an idle cluster. Dataflow also alleviates the pain of writing ETL jobs by standardizing the process of implementing application requirements. As a result, you’ll be able to focus on the data transformations you need to make rather than on the processing mechanics themselves. This not only provides greater flexibility, lower latency and enhanced control of ETL jobs; it offers built-in cost management and ties together other useful Google Cloud services. Beyond common ETL, Dataflow pipelines may also include inline computation ranging from simple counting to highly complex, multi-step analysis. In our experience with the service so far, it can potentially remove much of the work from engineers within financial institutions and regulatory organizations, while providing elasticity to the entire process and ensuring accuracy, scale, performance and cost efficiency.
As market volatility and reporting requirements drive the need for accuracy, low latency and risk reduction, transforming and interpreting market data in a big data world is imperative to trading efficiency and accessibility. Every second counts. With a more cost-effective, real-time and scalable method of processing an ever-increasing volume of data, financial institutions will be able to address specific requirements and volumes at hand while keeping up with the demands of a rapidly evolving global financial system. We hope our experience, as captured in the technical white paper, will prove useful to others in their quest for the more effective way to process data.
Please see this paper’s GitHub page for the complete and buildable project source code.
- Posted by Salvatore Sferrazza, Principal at FIS and Sebastian Just, Manager at FIS