Q1 2023 Release Notes
New Features
Webhook real-time notifications
We have implemented a new section on the Platform where users can configure multiple webhooks to receive real-time notifications from the platform. Webhooks are currently supported in Dataflow Studio (DFS): users can select an existing webhook or set up a new one while configuring a DFS job.
Data Catalog
We have revamped our Data Catalog from both a UI and a UX perspective. A new section on the Platform now lets all our users navigate the Data Catalog and explore how our data is structured, how it is partitioned, and how it can be queried.
Upgrade of Jupyter to v3.4.8
We released an upgrade of the Jupyter tool that introduces hundreds of bug fixes and new features, such as the ability to debug your code, a visual indicator showing that a cell has been modified since its last execution, and an improved autocompletion assistant. Learn more in our Help Center about the changes you may need to apply to your code for this new version.
New API domains
We released a new API gateway dedicated to the Spectus API (api.spectus.ai, auth.spectus.ai) to simplify access to the AdTech environment (Cuebiq) and the platform environment (Spectus). Learn more here.
Product Enhancements
Expanded data through HLL table and Longitudinal APIs
We have added Spectus proprietary data to the HLL table from Jan 1st, 2019, with more than 1.5M DAU (daily active users) over the entire period. We have also added probabilistic HLL data to the HLL table and the Longitudinal APIs response, starting from Jan 1st, 2023.
Finally, we updated our Longitudinal APIs layer with a dedicated filter that lets users select data providers, choosing between the new sources described above and the pre-existing ones. Learn more about it in our API documentation.
Trino fault-tolerant execution
We have enabled Trino’s fault-tolerant execution, which lets a cluster mitigate query failures by retrying failed queries. The mechanism is fully transparent to end users, as everything happens under the hood.
Visit Engine providers support
A new, enhanced Visit Engine template is now available in Dataflow Studio. This version works with the latest version of the Spectus core data assets (paas_cda_v3) and lets users select which data providers to use for the visits computation. Furthermore, new data providers are now available for the visits computation!
Connected cars enhanced trajectories
The trajectory asset in the vehicle_v1 schema has been improved with the following changes. New filters now enable users to remove noisy trajectories (unrealistically fast trajectories, or those covering an insignificant distance or time). Moreover, we have added new fields to enable further use cases and extend the functionality of our connected vehicle trajectories. These include:
- start and end block_group_id: another geographical dimension of where a car began and ended its trip
- max_time_gap_seconds: a measure of the trajectory accuracy
- trajectory_wkt: a string to easily plot and handle all the vehicle_location points composing the trajectory
- bounding_box_diagonal_meters: another measure of the spatial extent of the trajectory
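To illustrate how these fields relate, here is a minimal Python sketch that parses a trajectory_wkt string and derives a path length and a bounding-box diagonal. It assumes trajectory_wkt is a 2D WKT LINESTRING in longitude/latitude order and uses the haversine formula on a spherical Earth; the exact definition Spectus uses for bounding_box_diagonal_meters may differ.

```python
import math
import re

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def parse_linestring(wkt: str) -> list[tuple[float, float]]:
    """Parse 'LINESTRING (lon lat, lon lat, ...)' into (lon, lat) tuples.

    Assumption: trajectory_wkt is a 2D LINESTRING in lon/lat order.
    """
    found = re.fullmatch(r"\s*LINESTRING\s*\((.*)\)\s*", wkt, flags=re.IGNORECASE)
    if found is None:
        raise ValueError(f"not a LINESTRING: {wkt!r}")
    points = []
    for pair in found.group(1).split(","):
        lon, lat = (float(v) for v in pair.split())
        points.append((lon, lat))
    return points


def path_length_m(points: list[tuple[float, float]]) -> float:
    """Sum of haversine distances between consecutive points."""
    return sum(haversine_m(*p, *q) for p, q in zip(points, points[1:]))


def bbox_diagonal_m(points: list[tuple[float, float]]) -> float:
    """Distance between the south-west and north-east bounding-box corners."""
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    return haversine_m(min(lons), min(lats), max(lons), max(lats))
```

Because the diagonal only looks at the extreme corners, a looping trajectory can have a long path length but a small diagonal, which is why the two measures complement each other.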
Dataflow Studio improvements
We increased the number of visible runs for a single job from 100 to 2000, giving a broader view of jobs that run multiple times per day or week and generate a large number of runs.
Moreover, we added a visual indicator of the next scheduled run in the Trigger section of a scheduled job, under the Schedules panel.
Expanded historical data in Longitudinal APIs
We have also expanded our Longitudinal APIs with data since May 1st, 2020.
New traffic analysis tutorial
We have added a new tutorial to our collection of Jupyter notebooks to help clients familiarize themselves with our core data assets. This tutorial illustrates how a variety of datasets can enable numerous use cases, and explains some of the most common traffic analyses enabled by the assets in the vehicle_v1 schema. The new tutorial is available in the Use Cases section of the Tutorials app or in the Use Cases section of the Jupyter tool.
Bug Fixes
- Expired Mapbox token on Superset: we refreshed the Mapbox token that Superset uses to render map charts, avoiding 403 unauthorized errors.
- Improved DFS run creation process: we made the DFS run creation process more robust, as in rare cases it could fail.
Q4 2022 Release Notes
New Features
Home Switchers DPS
We have implemented a new Data Processing Service that enables users to easily identify the devices that moved home locations in areas of interest during a specified period.
This new tool lets users both schedule a daily home switchers extraction and run it over any range of historical data. Learn more in the Help Center.
Visit Map
We have implemented a new app that makes it easy to compare the Visitation Index (VI) between different brands – regardless of the industry. The number of visits per store and per brand is computed daily, and users can simply select the verticals and brands of interest to check their VI trends over time.
With our new Visit Map, users can visualize brands’ VIs in different areas – darker areas represent a higher VI. Businesses can use our Visit Map to visualize a given brand’s VI trends and to determine optimal DMAs based on the locations at which a brand drives more relative visits.
Tutorial for traceability analysis
We created a new tutorial that explains how to compute flows between polygons of interest. You can find it in the Use Cases section of the Tutorials app.
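The tutorial remains the reference for computing flows on the Platform. As a minimal illustration of the underlying idea only, the Python sketch below assigns each stop to a polygon with a ray-casting point-in-polygon test and counts origin-to-destination transitions between consecutive polygons per device. The input shape (time-ordered (lon, lat) stops per hypothetical device ID) and the polygon names are assumptions, not the schema of any Spectus asset.

```python
from collections import Counter
from typing import Optional

Point = tuple[float, float]  # (lon, lat)


def point_in_polygon(pt: Point, ring: list[Point]) -> bool:
    """Ray-casting test: toggle on each edge crossed by a ray cast east of pt."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def locate(pt: Point, polygons: dict[str, list[Point]]) -> Optional[str]:
    """Return the name of the first polygon containing pt, or None."""
    for name, ring in polygons.items():
        if point_in_polygon(pt, ring):
            return name
    return None


def polygon_flows(stops_by_device: dict[str, list[Point]],
                  polygons: dict[str, list[Point]]) -> Counter:
    """Count (origin, destination) transitions between different polygons.

    Stops outside every polygon are skipped, so A -> (outside) -> B counts as A -> B.
    """
    flows: Counter = Counter()
    for stops in stops_by_device.values():
        visited = [p for p in (locate(s, polygons) for s in stops) if p is not None]
        for origin, dest in zip(visited, visited[1:]):
            if origin != dest:
                flows[(origin, dest)] += 1
    return flows
```

Skipping stops outside every polygon is only one possible design choice; an analysis could instead break the chain at such stops so that only direct moves between polygons count.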
User Preferences
We have implemented a new section on the Platform where users can configure which email notifications they receive from Dataflow Studio. We plan to add more customization settings to this page in the future. To reach this screen, click on your name in the top right corner and then click Preferences.
Product Enhancements
Endpoints migration
We have migrated all endpoints of our tools (Jupyter, Superset, Trino, etc.) from *.cuebiq.com to *.spectus.ai. This change is necessary to scale the platform and support future developments.
Please review your browser bookmarks to avoid issues accessing the tools.
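To update saved bookmarks in bulk, a small helper like the Python sketch below can rewrite hostnames. It assumes the subdomain is preserved one-to-one (for example, a hypothetical jupyter.cuebiq.com becoming jupyter.spectus.ai); check the actual new address of each tool before relying on it.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_SUFFIX = ".cuebiq.com"
NEW_SUFFIX = ".spectus.ai"


def migrate_url(url: str) -> str:
    """Rewrite a *.cuebiq.com URL to *.spectus.ai, keeping path, query and fragment.

    Assumes the subdomain maps one-to-one; other URLs are returned unchanged.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit lowercases the hostname
    if not host.endswith(OLD_SUFFIX):
        return url
    new_host = host[: -len(OLD_SUFFIX)] + NEW_SUFFIX
    netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```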
TrinoDB engine v402
We upgraded our core SQL engine to Trino v402, and we now update it continuously so that clients can leverage the new functionality and performance improvements released by the community.
Longitudinal Analysis APIs
We updated our Longitudinal Analysis APIs infrastructure to significantly reduce the amount of time needed to process requests.
Dataflow Studio – Show deleted jobs
In Dataflow Studio, users can now review previous configurations and executions by viewing deleted jobs. Deleted jobs are hidden by default; users can show them by toggling the switch in the filter section above the main job list, and can also duplicate them.
Bug Fixes
- Jupyter connection error 500 on v1/statement – we fixed a bug that caused a connection error between Jupyter and Trino and forced users to restart their Jupyter instance.
- Trino engine autoscaler tuning – we tuned the Trino autoscaler component to better manage the scale-up/down of workers to avoid query interruption and job failures.
- Permission error 403 on Apps – we fixed an authentication issue that caused a ‘permission denied’ error while accessing the Apps (Data Catalog, Tutorials, Hurricanes, etc.).
- Superset SQL Editor timeout increased – we increased the timeout duration of a query execution in the SQL Editor within Superset from 3 to 30 minutes.
- Dataflow Studio job activation time reduction – we reduced Dataflow Studio job activation time by ~20% by optimizing the deployment pipeline.
- “Map” column type on Dataflow Studio – we fixed the map column type in the data schema definition, which now works as expected.
- Minor Dataflow Studio UI improvements:
  - Added the job name in the ‘delete confirmation’ dialog
  - Fixed the documentation link within the changelog popup
  - Increased notebook dropdown width to improve readability
- Improved S3 copy operations – we reduced the time needed to copy large files across different S3 buckets by 80%.
- Dataflow Studio – Import action – we fixed an issue related to the Import action in the Standard ETL Template when trying to import a partitioned dataset with the option FIXED enabled.
- Dataflow Studio – Notebook logs autosave – the logs related to the last cell of a notebook were sometimes missing. We enabled autosave after each notebook cell execution to guarantee that the logs for all cells are available in case of computation failure.
Q3 2022 Release Notes
New Features
Two New Methods to Access Footfall
APIs for On-Demand Footfall
With our APIs, you can request real-time processing of custom polygons over desired windows of time to receive historical (up to 3 years) and ongoing aggregate footfall (count of distinct devices) within a short response period [5]. Our on-demand APIs are ideal for use cases in which the areas of interest and the desired time windows cannot be known ahead of time, and requests are typically made by multiple users at the same time.
Example: An interactive multi-user application where users select the desired locations and time-windows and expect to receive associated footfall data within a short period of time.
Our APIs provide options to filter out specific days from the chosen time windows and to request additional granularity (hourly, daily, monthly) in the footfall delivered to you. Spectus can serve up to 5 API requests per second.
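If you call the on-demand APIs from your own code, a client-side throttle helps keep bursts of calls within the stated throughput of 5 per second. The Python sketch below is a hypothetical helper that does not call any Spectus endpoint; the clock and sleep functions are injectable so the pacing can be tested without actually waiting.

```python
import threading
import time
from typing import Callable


class Throttle:
    """Space calls so that at most `rate` calls start per second."""

    def __init__(self, rate: float = 5.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()
        self._lock = threading.Lock()  # lets several threads share one throttle

    def wait(self) -> None:
        """Block until the next call slot is available, then reserve it."""
        with self._lock:
            now = self._clock()
            if now < self._next_allowed:
                self._sleep(self._next_allowed - now)
                now = self._next_allowed
            self._next_allowed = now + self.interval
```

Call `wait()` on a single shared `Throttle(rate=5)` immediately before each request; it spaces request starts 0.2 seconds apart but does not retry failed calls.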
Batch Processing for POIs at Scale
With improved processing speed, our batch processing mode allows you to request footfall for millions of POIs over the desired windows of time and get the associated footfall delivered directly to your S3 buckets. You can choose the POIs from Spectus’ Geosets or upload your own list of custom polygons to receive historical and ongoing footfall.
Our batch processing method also allows you to normalize values, create sub-aggregate results, and compare overlaps between different time periods and POIs.
[5] Processing times range from seconds to minutes, depending on the density of the polygon and the duration of the time window requested.
Stoppers by Geohash – A New Core Data Asset
We’ve built the new Stoppers by Geohash asset to support our On-Demand Footfall APIs and the Batch Processing Mode for POIs at Scale with reliable and rapid footfall calculations. Available within the paas_cda_v3 schema, the Stoppers by Geohash asset is designed to report hourly and daily footfall, along with the HyperLogLog (HLL) values of the distinct devices that stop in any chosen geographical location.
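For readers unfamiliar with geohashes, the Python sketch below shows the standard geohash encoding: the longitude and latitude ranges are bisected alternately, and every 5 bits become one base-32 character, so longer hashes denote smaller cells nested inside shorter ones. It is illustrative only; the geohash precision used by the Stoppers by Geohash asset is not stated here, so check the Data Catalog before joining on geohash values.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
    """Encode a lat/lon pair as a geohash string of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    even = True  # geohash starts with a longitude bit
    while len(chars) < precision:
        rng, value = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        even = not even
        bit_count += 1
        if bit_count == 5:  # 5 bits -> one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)
```

For example, `geohash_encode(42.6, -5.6, 5)` yields the commonly cited cell `"ezs42"`.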
Dedicated Trino Clusters For Speed and Efficiency
Historically, the jobs configured via Dataflow Studio and the solutions built on Jupyter & Superset shared the same underlying Trino cluster, so queries from the interactive tools could delay the ongoing jobs scheduled through Dataflow Studio.
We’ve now launched separate Trino clusters for Dataflow Studio and Spectus’ interactive tools (Jupyter & Superset) to ensure that queries from Jupyter & Superset don’t interrupt the scheduled Dataflow Studio jobs that are in-progress.
Clusters – A Trino Cluster Management Solution
Our new cluster management feature lets you modify the computational power of the individual clusters that run your Dataflow Studio jobs and queries from Jupyter & Superset. The Cluster Configuration Page, available under the Clusters module, lets you choose the range of processing capacity (workers) you’d like to assign to your queries. Spectus will automatically adjust the processing power within this range to run your solutions.
Device Metrics – A New Core Data Asset
The Device Metrics asset contains information about the individual devices in our panel. We’ve extracted aggregated metrics from other core data assets, such as Device Location and Stops, to build an easy-to-use table that helps you choose the devices that meet your criteria and run solutions with the associated data. You can access aggregated data related to device location, stops, device footprint, etc., via the Device Metrics asset.
Brand New Look
The Spectus platform now has a new look. You can find all of your existing notebooks and tables where you left them. The rebranded platform lets you access all the data clean room components directly from the home page.
New privacy enhanced trajectory
Following the release of the improved uplevelling algorithm, designed to privatize home locations more robustly, we have released a new version of the trajectory core data asset, paas_cda_v3.trajectory, which lets users identify trajectories starting or ending in a personal area.
We have also updated the tutorial documentation about this specific core data asset with a dedicated section. You can find it in the Platform documentation folder under Explore The Catalog/Core Data Assets/05 Trajectories.
Product Enhancements
An Enhanced Trajectories Data Asset
We’ve improved the data within our Trajectories core data asset by eliminating:
- Trajectories with unacceptably high speeds
- Trajectories that span extremely short time windows
- Trajectories that cover too short a distance
This ready-to-use dataset removes the need to filter out data that might not meet your quality thresholds.
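Spectus applies these filters before delivering the asset, and their exact thresholds are not stated in these notes. If you derive trajectories of your own, the Python sketch below shows one way to apply similar speed, duration, and distance checks; the threshold values and the `Trajectory` structure are hypothetical placeholders.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_008.8  # spherical Earth approximation


@dataclass
class Trajectory:
    points: list[tuple[float, float]]  # (lon, lat), in time order
    start_ts: float  # epoch seconds
    end_ts: float    # epoch seconds


def haversine_m(p: tuple[float, float], q: tuple[float, float]) -> float:
    """Great-circle distance in meters between two (lon, lat) points."""
    lon1, lat1 = map(math.radians, p)
    lon2, lat2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Hypothetical thresholds for illustration only, not Spectus' actual settings.
MAX_SPEED_MPS = 70.0    # roughly 250 km/h
MIN_DURATION_S = 60.0
MIN_DISTANCE_M = 200.0


def is_plausible(t: Trajectory) -> bool:
    """Return False for trajectories that are too brief, too short, or too fast."""
    duration = t.end_ts - t.start_ts
    distance = sum(haversine_m(a, b) for a, b in zip(t.points, t.points[1:]))
    if duration < MIN_DURATION_S or distance < MIN_DISTANCE_M:
        return False
    return distance / duration <= MAX_SPEED_MPS  # average speed check
```

Checking duration first also guards the speed division against zero-length time windows.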
Improved Dataflow Studio error reporting
We’ve improved the error management of Dataflow Studio jobs by moving error detection from the execution phase to the activation phase. For example, before this release, if a user selected a malformed notebook in the Compute action of a job, the activation would complete successfully, but the job would then fail at its first run. With this release, the job activation fails immediately, so users can check the notebook and retry the activation.
The same early failure now also applies when the job configuration contains invalid characters, such as accents or white spaces.
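Since invalid characters now make activation fail, you may want to check names before activating a job. The Python sketch below assumes a conservative allow-list (unaccented ASCII letters, digits, underscores, and hyphens); that rule is an assumption for illustration, not the documented Dataflow Studio validation.

```python
import re
import unicodedata

# Assumed allow-list: unaccented ASCII letters, digits, "_" and "-".
SAFE_CHAR = re.compile(r"[A-Za-z0-9_-]")


def find_invalid_chars(value: str) -> list[str]:
    """Return the characters in `value` that fall outside the assumed allow-list."""
    return [ch for ch in value if not SAFE_CHAR.fullmatch(ch)]


def suggest_safe_name(value: str) -> str:
    """Strip accents (é -> e), then replace any other disallowed character with "_"."""
    decomposed = unicodedata.normalize("NFKD", value)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(ch if SAFE_CHAR.fullmatch(ch) else "_" for ch in no_accents)
```

For example, `suggest_safe_name("café daily")` gives `"cafe_daily"`.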
Easily access logs of a notebook execution
Before this release, the only way to download the logs of a notebook execution was through an unfriendly mechanism that involved multiple manual steps.
In this release, we have significantly improved this process by providing a simple button to download the logs. You can then upload them into your Jupyter instance or debug them on your local machine.
Moreover, logs can now be downloaded for successful runs as well as failed ones.
Longitudinal APIs now support the MULTIPOLYGON WKT type
Longitudinal APIs allow you to calculate the footfall over a custom polygon, with up to three years of historical data, down to hourly granularity in a few minutes.
It’s now possible to submit longitudinal requests with the MULTIPOLYGON WKT type, which reduces footfall errors for areas that need multiple polygons to be well defined, such as islands or areas with holes.
We have also updated the tutorial documentation about longitudinal analysis in order to include this use case. You can find it in the Platform documentation folder under use-cases/longitudinal_analysis.ipynb.
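A MULTIPOLYGON WKT string lists several polygons, each made of one or more closed rings of x y (here, longitude latitude) coordinates: the first ring is the outer boundary and any further rings are holes. The Python sketch below builds such a string from plain coordinate lists, closing any open ring; how the string is then submitted is covered in the longitudinal analysis tutorial and is not shown here.

```python
Ring = list[tuple[float, float]]  # (lon, lat) vertices


def _ring_wkt(ring: Ring) -> str:
    """Format one ring, closing it if the last vertex differs from the first."""
    if len(ring) < 3:
        raise ValueError("a ring needs at least 3 vertices")
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]
    return "(" + ", ".join(f"{lon} {lat}" for lon, lat in ring) + ")"


def multipolygon_wkt(polygons: list[list[Ring]]) -> str:
    """Build a MULTIPOLYGON WKT string.

    Each polygon is [outer_ring, *hole_rings]; holes are optional.
    """
    parts = ["(" + ", ".join(_ring_wkt(r) for r in poly) + ")" for poly in polygons]
    return "MULTIPOLYGON (" + ", ".join(parts) + ")"
```

For two separate islands, pass two polygons with one outer ring each; an area with a hole is one polygon with two rings.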
Notebook Tutorials
Footfall Time-Series – In this tutorial, we walk you through all the steps needed to calculate footfall time-series over a polygon using the HLL table available in the paas_cda_v3 schema.
Please find all our tutorials in the data-clean-room-help folder within Jupyter.
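HLL values are approximate distinct-count sketches: daily sketches can be merged (register-wise maximum) to count distinct devices over a longer window, whereas summing daily counts would double-count devices seen on several days. The toy Python HyperLogLog below illustrates that property only; it is not the implementation behind the paas_cda_v3 tables (Trino provides its own HyperLogLog type, with functions such as `merge` and `cardinality`), so follow the tutorial’s queries for real data.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog with 2**p registers (standard error about 1.04 / sqrt(2**p))."""

    def __init__(self, p: int = 12) -> None:
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        # 64-bit hash: the first p bits pick a register, the rest give the rank.
        h = int.from_bytes(hashlib.sha256(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        rank = (64 - self.p) - rest.bit_length() + 1  # position of the first 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        """Union of two sketches: register-wise maximum."""
        if self.p != other.p:
            raise ValueError("sketches must use the same precision")
        out = HyperLogLog(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def count(self) -> float:
        """Estimate the number of distinct items added."""
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # small-range correction
        return raw
```

With two days of 15,000 devices each and 5,000 devices in common, the merged sketch estimates close to 25,000 distinct devices, while adding the two daily estimates gives about 30,000.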
Q2 2022 Release Notes
New Features
Expand your analysis with Density Metrics
Density metrics across different geospatial standards are now directly accessible via our new Density Tables. Six new tables containing daily and monthly updated metrics let you pick and choose the parameters that strengthen your analysis.
Three daily updated tables [1]:
- stoppers_hll_by_geohash
- stoppers_hll_by_bing_tiles
- stoppers_hll_by_h3
Three monthly updated tables [2]:
- stoppers_metrics_by_geohash
- stoppers_metrics_by_bing_tiles
- stoppers_metrics_by_h3
Additionally, our new Spatial Index Functions allow you to convert metrics to your preferred spatial standards to support your custom analysis [3].
Unlock new global use cases with EU Trajectories
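As background on one of these standards, Bing tiles are identified by quadkeys derived from the Web Mercator tile grid, with one base-4 digit per zoom level. The Python sketch below follows the publicly documented Bing Maps tile-system math; it is not the Spatial Index Functions’ API, whose actual names and signatures are described in the python_spatial_indexing_systems tutorial.

```python
import math

MIN_LAT, MAX_LAT = -85.05112878, 85.05112878  # Web Mercator latitude limits


def lat_lon_to_tile_xy(lat: float, lon: float, level: int) -> tuple[int, int]:
    """Return the (tile_x, tile_y) containing a point at the given zoom level."""
    lat = min(max(lat, MIN_LAT), MAX_LAT)
    lon = min(max(lon, -180.0), 180.0)
    x = (lon + 180.0) / 360.0
    sin_lat = math.sin(math.radians(lat))
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)
    map_size = 256 << level  # map width and height in pixels
    pixel_x = int(min(max(x * map_size + 0.5, 0), map_size - 1))
    pixel_y = int(min(max(y * map_size + 0.5, 0), map_size - 1))
    return pixel_x // 256, pixel_y // 256  # tiles are 256 x 256 pixels


def tile_xy_to_quadkey(tile_x: int, tile_y: int, level: int) -> str:
    """Interleave tile_x/tile_y bits, most significant first, into base-4 digits."""
    digits = []
    for i in range(level, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if tile_x & mask:
            digit += 1
        if tile_y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)
```

Because each quadkey is a prefix of the quadkeys of its sub-tiles, aggregating to a coarser level amounts to truncating the string.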
Expanding on our last quarter’s innovative offering, the Trajectories data asset is now available for Europe (UK, France, Italy, Spain, Germany).
[1] Daily updated tables contain the approximate number of distinct devices.
[2] Monthly updated density tables include total distinct devices, average daily distinct devices, minimum daily distinct devices, and maximum daily distinct devices.
[3] For additional information, please refer to the tutorial python_spatial_indexing_systems available in the cuebiq-examples folder on Jupyter.
[Image: trajectories observed in London]
Extending our “Trips” data asset, which only gives information on the origin and destination stops, our “Trajectories” data asset captures the actual path traveled by a user. It does so by collecting location pings along a user’s entire path to reveal the route and the different stops they make on the way to their destination.
A New ‘provider_id’ to Help You Choose Data that Best Suits Your Needs
We’ve introduced a new field ‘provider_id’ in the following core data assets to help you differentiate among data sources and choose data from a provider that meets your standards and requirements:
- Device Location
- Stop
- Visit
- Device Recurring Area
A New Version of the paas_cda Schema
We’ve introduced the paas_cda_v3 schema to include our updated data assets, such as Device Location, Stop, Visit, Geography Registry and Device Recurring Area. Our versioning system improves the organization of data assets and provides you with a smooth operational experience. All changes incorporated in paas_cda_v3 can be found in both the Cuebiq Data Catalog and the Changelog within Jupyter. Learn more about the changes across multiple versions of the CDA schema here.
Product Enhancements
Experience an improved operational process
We’ve added enhancements throughout Dataflow Studio to make it easier for you to track and optimize workflows.
1. Activation ERROR Status
Introducing a new job status that indicates if a job failed the activation process. This feature also provides a list of errors that help users easily identify reasons for failure and fix them independently.
2. Step-by-Step Logs of Job Runs
Follow your job run step by step, in the same order as the actions configured in the job.
3. View the Last Run Status
Know the status of the last run for any job right on the Job Listing page. Look for the filled circle in the Last Run column to recognize if a run was successful (green), failed (red) or is in progress (blue).
An Updated Geographical Model for Improved Location Accuracy
As we strive to continually improve our offerings, we’ve updated the geography model we use to map mobility data to geographical areas (e.g. Countries, States, Counties) with higher precision and accuracy. This new model, which covers the entire globe and is available via the Geography Registry core data asset, is also enriched with further administrative regions (e.g. csa, cbsa, dma) and yields improved results for your location analyses, especially around borders and coasts.
Auto Refresh on Dataflow Studio
All pages within Dataflow Studio now refresh automatically as you configure and run your jobs, so you can view changes in your job-run status without manually refreshing the user interface.
Notebook Tutorials
New Tutorial – Stop Density guides you to quickly and easily compute how many devices stopped in a given area over a period of time [4].
Our notebook tutorials help you gain unique insights into how Spectus data assets can be leveraged to fuel your use cases and analyses. You can find them in the data-clean-room-help and cuebiq-examples folders on Jupyter.
[4] Stop Density can be found in cuebiq-examples/explore-the-catalog/core-data-assets/12_densities
We’ve updated the Manage Table tutorial to include section 5.2, which describes how to insert in-memory data from pandas into a table in your dedicated catalog. You can find it at data-clean-room-help/getting-started/01_manage_tables.
Note: The data-clean-room-help folder is read-only, so users cannot edit the tutorials inside it. To modify a tutorial, users can copy the tutorial notebook and save it in a folder outside data-clean-room-help where they have read and write permissions.
Q1 2022 Release Notes
New Features
- Data Event Trigger to automate time-sensitive analyses as soon as new data is available.
- Custom POI Calculator to calculate visitation data on your own POIs.
- New Core Data Asset – Trajectories (US/CA) to help you unlock new use cases involving the actual path traveled by a user between two consecutive stops.
Product Enhancements
- Improved Visits Table with higher degree of accuracy, dynamically updated POI data, and strengthened user-privacy.
- Improved POI & Brand Tables with new brand additions, SIC codes, and an indicator of whether a brand is a distributor.
Notebook Tutorials
New Notebook Tutorials on Trajectories, Travelers, Visits, Commuters and Optimal Store Location.