Data Overview
Understanding the Dataset
The workshop uses two main data sources: Prishtina Places Dataset:- Restaurants: Individual JSON files (r1.json, r2.json, etc.)
- Coffee Shops: Individual JSON files (c1.json, c2.json, etc.)
- Data Fields: Name, location, rating, reviews, coordinates, address
- Source: Scraped using SerpAPI from Google Places
- Format: PostgreSQL database
- Content: Synthetic daily revenue data for each establishment
- Time Range: Historical data for trend analysis
Data Structure
Each JSON file contains a raw SERPApi response with a JSON object with metadata and local results of place objects with the following structure:JSON Data Exploration and Processing
Explore the JSON Data
Start by exploring the structure of your JSON files:Exploration Hints
Exploration Hints
Clean the JSON
The JSON is somewhat unstructured because it contains a lot of unncessary metadata. Find a way to parse out the individual restaurant data.Parsing Hints
Parsing Hints
Create a Table named ‘places’from the JSON
Create Table from JSON
Create Table from JSON
Query aggregations
Query the count of places
Count of places
Count of places
Query the average rating of places
Average rating of places
Average rating of places
Analyze the data further on your own
Query the data on your own with other types of interesting aggregations / analytics. Write down some interesting queries and insights.Find Top 10 Rated Places in Prishtina
Analyze the highest-rated and lowest-rated places with at least 50 reviews.Hints
Hints
Geospatial Analytics
Install and Explore DuckDB Geospatial Extension
Enable geospatial capabilities:Geospatial Extension Resources
Geospatial Extension Resources
Calculate Distances Between Places
Find the nearest place to ICK (Innovation Centre Kosovo)
Hint
Hint
Task Summary
Congratulations! You’ve completed the batch analytics portion of the workshop. You’ve learned to: ✅ Explore and clean JSON data✅ Analyze places data from Google ✅ Implement geospatial analytics Next Step: Ready for real-time analytics? Move on to Task 2: Real-Time Analytics to learn about ClickHouse and streaming data processing.