Databricks

San Francisco, CA | Data & AI

Interview Questions

MockHashMap with Load Measurement

Asked at Databricks
technical
data structures
system design

Design a MockHashMap class that tracks operation frequency.

Required Methods:

  1. Basic HashMap Operations:
put(key, val)  // Insert key-value pair
get(key)       // Retrieve value for key
  2. Load Measurement Methods:
measure_put_load()  // Average put calls/second in last 5 minutes
measure_get_load()  // Average get calls/second in last 5 minutes

Implementation Requirements:

  • Record a timestamp for each put and get operation
  • Calculate rolling averages over the last 5 minutes
  • Count only the records that fall within the time window
  • Divide that count by 300 seconds to get the average calls per second

Example:

// 600 put calls in 5 minutes
measure_put_load() → 2.0  // (600/300 seconds)

// 150 get calls in 5 minutes
measure_get_load() → 0.5  // (150/300 seconds)
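
One way to meet these requirements is sketched below in Python. It is a minimal illustration, not a reference solution: the injectable `clock` parameter, the `WINDOW_SECONDS` constant, the private `_load` helper, and returning `None` for a missing key are all assumptions made here for the example. Timestamps are kept in a deque per operation type, and entries older than the window are dropped whenever a load is measured.

```python
import time
from collections import deque
from typing import Any, Callable, Deque, Dict, Hashable

WINDOW_SECONDS = 300  # 5-minute window from the problem statement


class MockHashMap:
    """Dict-backed map that records a timestamp for every put and get."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        # `clock` is a hypothetical hook so tests can control time.
        self._clock = clock
        self._data: Dict[Hashable, Any] = {}
        self._put_times: Deque[float] = deque()
        self._get_times: Deque[float] = deque()

    def put(self, key: Hashable, val: Any) -> None:
        self._put_times.append(self._clock())
        self._data[key] = val

    def get(self, key: Hashable) -> Any:
        self._get_times.append(self._clock())
        # Assumption: a missing key yields None rather than raising.
        return self._data.get(key)

    def _load(self, times: Deque[float]) -> float:
        # Drop timestamps that are 300 seconds old or older, then average.
        cutoff = self._clock() - WINDOW_SECONDS
        while times and times[0] <= cutoff:
            times.popleft()
        return len(times) / WINDOW_SECONDS

    def measure_put_load(self) -> float:
        return self._load(self._put_times)

    def measure_get_load(self) -> float:
        return self._load(self._get_times)
```

Because timestamps arrive in order, expired entries always sit at the left end of the deque, so pruning costs amortized O(1) per recorded call. In this sketch, pruning happens only when a load is measured, so memory grows between measurements; a bucketed counter (one count per second) would bound memory at the cost of coarser resolution.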

Web Crawler Design

Asked at Databricks
technical
system design
distributed systems

Design a web crawler that downloads pages up to depth K from a start URL.

Core Requirements:

  1. Crawling Logic:
  • Start from start_url
  • Crawl recursively up to depth K
  • Extract and follow links
  2. Storage Management:
  • Create directory structure for downloads
  • Store pages based on URL and depth
  • Efficient storage organization
  3. Robustness:
  • Track visited URLs (deduplication)
  • Detect and handle cycles
  • Handle malformed URLs
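
The core requirements above could be sketched as a breadth-first crawl in Python. This is only an illustration under stated assumptions: the `fetch` callable is injected by the caller (so no particular HTTP library is implied), and the names `crawl`, `normalize`, `LinkParser`, and the `depth_<n>/<hash>.html` layout are hypothetical choices, not part of the question.

```python
import hashlib
from collections import deque
from html.parser import HTMLParser
from pathlib import Path
from typing import Callable, Deque, List, Optional, Set, Tuple
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def normalize(base: str, href: str) -> Optional[str]:
    """Resolve href against base; return None for malformed or non-HTTP URLs."""
    try:
        url, _fragment = urldefrag(urljoin(base, href))
        parts = urlparse(url)
    except ValueError:  # raised for some malformed URLs
        return None
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None
    return url


def crawl(start_url: str, max_depth: int,
          fetch: Callable[[str], str], out_dir: Path) -> List[str]:
    """Crawl breadth-first to max_depth, saving pages; return URLs fetched."""
    visited: Set[str] = {start_url}  # deduplication also breaks cycles
    queue: Deque[Tuple[str, int]] = deque([(start_url, 0)])
    fetched: List[str] = []
    while queue:
        url, depth = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        fetched.append(url)
        # Assumed layout: one directory per depth, file named by URL hash.
        name = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16] + ".html"
        page_dir = out_dir / f"depth_{depth}"
        page_dir.mkdir(parents=True, exist_ok=True)
        (page_dir / name).write_text(html, encoding="utf-8")
        if depth == max_depth:
            continue  # do not follow links beyond depth K
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = normalize(url, href)
            if link is not None and link not in visited:
                visited.add(link)
                queue.append((link, depth + 1))
    return fetched
```

Breadth-first order is used here instead of literal recursion so that each URL is first reached at its shallowest depth and deep link chains cannot exhaust the call stack. Marking a URL as visited when it is enqueued, not when it is fetched, keeps duplicates out of the queue.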

Follow-up Questions:

  1. Distributed Architecture:
  • How to parallelize crawling?
  • Handle shared state
  • Load balancing
  2. Performance:
  • Handle slow responses
  • Implement timeouts
  • Rate limiting
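
Two of these follow-ups lend themselves to a short Python sketch: assigning URLs to workers and limiting the request rate per host. Everything named here (`worker_for`, `HostRateLimiter`, `min_interval`, the injectable `clock` and `sleep`) is a hypothetical illustration, and hashing the host name is just one possible partitioning scheme.

```python
import hashlib
import time
from typing import Callable, Dict
from urllib.parse import urlparse


def worker_for(url: str, num_workers: int) -> int:
    """Map a URL to a worker index using a stable hash of its host.

    Every URL on one host goes to the same worker, so that worker can
    hold the host's visited set and rate-limit state locally.
    """
    host = urlparse(url).netloc.lower()
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_workers


class HostRateLimiter:
    """Enforces a minimum interval between requests to the same host.

    Not thread-safe as written; a lock would be needed if several
    threads shared one instance.
    """

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._min_interval = min_interval
        self._clock = clock    # injectable so tests can fake time
        self._sleep = sleep
        self._next_allowed: Dict[str, float] = {}

    def wait(self, url: str) -> float:
        """Block until the URL's host may be contacted; return seconds waited."""
        host = urlparse(url).netloc.lower()
        now = self._clock()
        ready_at = self._next_allowed.get(host, now)
        delay = max(0.0, ready_at - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule the next permitted request for this host.
        self._next_allowed[host] = max(now, ready_at) + self._min_interval
        return delay
```

Partitioning by host keeps shared state small, but a few very large sites can overload one worker, so a real design would likely need rebalancing on top of this. For slow responses, most HTTP clients accept a per-request timeout; for example, the standard library's `urllib.request.urlopen` takes a `timeout` argument in seconds.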
