Cache Content with Azure Managed Redis
Introduction
3 min
Azure Managed Redis is a fully managed Redis service. Redis is an in-memory key-value store that typically responds with sub-millisecond latency, roughly 100-1,000x faster than a typical database query. AI applications use it to cache model inference results, prompt/response pairs, embedding vectors, rate-limit counters, and session state, all to reduce cost and latency.
Explore Azure Managed Redis
7 min
Cache-Aside Pattern: App checks Redis → HIT returns instantly → MISS fetches from the DB, then caches the result
1. Azure Managed Redis Tiers
| # | Tier | Memory | Persistence | Cluster | Use For |
|---|---|---|---|---|---|
| 1 | Memory Optimized | Up to 1.5 TB | AOF | ✓ | Large AI caches, embedding stores |
| 2 | Balanced | Up to 120 GB | AOF | ✓ | General production AI workloads |
| 3 | Compute Optimized | Up to 120 GB | AOF | ✓ | High throughput, CPU-intensive operations |
| 4 | Flash Optimized | Up to 13 TB | AOF | ✓ | Very large datasets, warm data on flash |
Also: Azure Cache for Redis with Basic/Standard/Premium tiers. Basic is single-node dev/test; Standard adds replication; Premium adds geo-replication and VNet.
2. Connection: Port 10000 (Not 6379)
Azure Managed Redis uses port 10000 with TLS required, not the default Redis port 6379. This is a common exam trap.
rediss://myinstance.<region>.redis.azure.net:10000 (TLS required; the rediss:// scheme selects TLS in redis-py)
3. Caching Strategies: Which to Use When
- Cache-aside (Lazy loading): the app checks the cache first; on a MISS it fetches from the DB and stores the result in the cache. Most common for AI inference results. Requires cache invalidation when the source data changes.
- Write-through: writes go to the cache AND the DB simultaneously. The cache is always consistent. Higher write latency. Good for user profiles that must always be current.
- Write-behind (Write-back): writes go to the cache, and the DB is updated asynchronously. Low write latency. Risk of data loss on crash. Advanced scenario.
- Read-through: the cache fetches from the DB on a MISS automatically. The cache acts as a transparent proxy. Simplifies application code.
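The difference between cache-aside and write-through can be sketched as follows. To keep the example runnable without a server, plain dictionaries stand in for Redis and the database; `cache_aside_get` and `write_through_set` are hypothetical names, and TTL handling is left out.

```python
# Illustrative sketch: dicts stand in for Redis and the database (assumption).
cache = {}
database = {"user:1": "Alice"}
db_reads = 0  # counts how often the slow source is hit


def cache_aside_get(key):
    """Cache-aside: check the cache, fall back to the DB on a miss, then store."""
    global db_reads
    if key in cache:              # HIT: served from the cache
        return cache[key]
    db_reads += 1                 # MISS: go to the source of truth
    value = database.get(key)
    if value is not None:
        cache[key] = value        # populate the cache for next time
    return value


def write_through_set(key, value):
    """Write-through: update the DB and the cache in the same operation."""
    database[key] = value
    cache[key] = value


first = cache_aside_get("user:1")    # miss, reads the DB
second = cache_aside_get("user:1")   # hit, no DB read
write_through_set("user:1", "Alicia")
third = cache_aside_get("user:1")    # hit, already consistent with the DB
```

With cache-aside alone, the update to "Alicia" would have left a stale "Alice" in the cache until it was invalidated or expired; write-through avoids that at the cost of a slower write.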
Client Libraries and Configuration
5 min
1. Python: redis-py with Managed Identity
import redis
from redis_entraid.cred_provider import create_from_default_azure_credential

# Access key authentication
r = redis.Redis(
    host='your-instance.<region>.redis.azure.net',
    port=10000,
    ssl=True,
    decode_responses=True,
    password='your-access-key'
)

# Entra ID (Managed Identity) authentication
credential_provider = create_from_default_azure_credential(
    ("https://redis.azure.com/.default",)
)
r = redis.Redis(host='...', port=10000, ssl=True,
                decode_responses=True, credential_provider=credential_provider)
create_from_default_azure_credential comes from the redis-entraid package. The token refreshes automatically. An access key is a shared, long-lived secret; avoid it in production.
2. decode_responses=True
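Set this in every client that handles text data. Without it, redis-py returns bytes objects; with it, replies are decoded to Python str. This is nearly always what you want. The exception is binary data such as packed embedding vectors, which need decode_responses=False.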
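What the option changes can be illustrated without a server: Redis stores byte strings, and decode_responses=True asks the client to decode replies as text (UTF-8 by default in redis-py). The snippet below mimics that step by hand; `raw_reply` and `binary_payload` are made-up values standing in for what a server would send.

```python
# Hand-rolled illustration of what decode_responses=True does for you.
raw_reply = b'{"answer": "42"}'        # what a client sees with decode_responses=False

as_text = raw_reply.decode("utf-8")    # what a client sees with decode_responses=True

# Binary payloads (for example packed float vectors) are generally not valid
# UTF-8 text, which is why they need decode_responses=False.
binary_payload = bytes([0xFF, 0xFE, 0x00, 0x80])
try:
    binary_payload.decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    decodable = False
```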
Implement Redis Data Operations
12 min
1. Strings: Most Common Type (Key→Value)
r.set("inference:result:user-123", '{"answer": "42", "confidence": 0.98}', ex=300)
result = r.get("inference:result:user-123")  # None if missing or expired
r.setex("rate:user-123", 60, 1)  # Set with TTL of 60 seconds
r.incr("rate:user-123")  # Atomic increment: thread-safe counter
remaining = r.ttl("rate:user-123")  # Time to live in seconds
2. Hashes: Object with Multiple Fields
Use for session data and user profiles. A hash avoids serializing/deserializing a full JSON blob when you only need one field.
r.hset("user:1001", mapping={"name": "Alice", "tier": "premium", "credits": "500"})
name = r.hget("user:1001", "name")  # Get one field
profile = r.hgetall("user:1001")  # Get all fields as dict
r.hincrby("user:1001", "credits", -10)  # Atomically decrement credits
3. Lists: Ordered Queue / Recent History
r.lpush("chat:session-abc", "Hello")  # Push to left (head)
r.rpush("chat:session-abc", "World")  # Push to right (tail)
history = r.lrange("chat:session-abc", 0, -1)  # All items
r.ltrim("chat:session-abc", 0, 9)  # Keep only the first 10 items (the 10 newest if you add with lpush)
4. Sets: Unique Members
r.sadd("active-sessions", "sess-001", "sess-002")
r.sismember("active-sessions", "sess-001")  # True/False
r.smembers("active-sessions")  # All members
5. Sorted Sets: Leaderboards / Priority Queues
r.zadd("model-latency", {"gpt-4o": 250.5, "gpt-4o-mini": 85.2})
fastest = r.zrange("model-latency", 0, -1, withscores=True)  # Ascending (fastest first)
6. TTL Strategy by Data Type
| # | Data | TTL | Reason |
|---|---|---|---|
| 1 | Rate limit counters | 60-300 seconds | Reset per time window |
| 2 | Inference result cache | 5-60 minutes | Reasonably fresh, invalidate on model update |
| 3 | User session data | 15-60 minutes | Expire inactive sessions |
| 4 | Product catalog | 1-24 hours | Changes infrequently |
| 5 | Static config | 24+ hours | Very stable data |
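One way to keep these choices in one place is a small lookup of TTLs per data category. The category names and exact values below are illustrative picks from within the ranges in the table, not Azure recommendations, and `ttl_for` is a hypothetical helper.

```python
# Hypothetical TTL policy; values chosen from within the ranges in the table above.
TTL_SECONDS = {
    "rate_limit": 60,             # reset per time window
    "inference_result": 5 * 60,   # reasonably fresh
    "session": 30 * 60,           # expire inactive sessions
    "product_catalog": 6 * 3600,  # changes infrequently
    "static_config": 24 * 3600,   # very stable data
}


def ttl_for(category, default=300):
    """Return the TTL in seconds for a data category, or a default if unknown."""
    return TTL_SECONDS.get(category, default)


# With redis-py this would be used as: r.set(key, value, ex=ttl_for("session"))
session_ttl = ttl_for("session")
unknown_ttl = ttl_for("something_else")
```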
Redis Master Cheatsheet
- r.set(key, val, ex=300)
- r.incr(key): thread-safe
- r.hset(key, mapping=dict)
- r.sadd
- r.zadd
- decode_responses=True
Exercise: Cache AI Inference Results
30 min
- Create an Azure Managed Redis instance and connect on port 10000 with TLS
- Implement cache-aside for model inference: GET → MISS → model call → SET with 5-min TTL
- Implement rate limiting: INCR per user per minute, block at limit
- Store session history in a Redis List, trim to last 10 messages
- Measure latency difference: Redis HIT vs database query
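The rate-limiting step can be prototyped before an instance exists. The sketch below implements a fixed-window counter against a tiny in-memory stand-in that offers only incr and expire (same call shapes as redis-py, but TTLs are recorded rather than enforced). `FakeRedis`, `allow_request`, and the limit of 3 are assumptions made for illustration.

```python
import time


class FakeRedis:
    """In-memory stand-in exposing incr/expire; TTLs are recorded, not enforced."""

    def __init__(self):
        self.values = {}
        self.ttls = {}

    def incr(self, key):
        self.values[key] = self.values.get(key, 0) + 1
        return self.values[key]

    def expire(self, key, seconds):
        self.ttls[key] = seconds
        return True


def allow_request(client, user_id, limit=3, window=60, now=None):
    """Fixed-window limiter: one counter per user per window."""
    now = time.time() if now is None else now
    key = f"rate:{user_id}:{int(now) // window}"
    count = client.incr(key)
    if count == 1:
        client.expire(key, window)   # start the TTL when the window's key is created
    return count <= limit


client = FakeRedis()
# Four calls in the same window (now=120 falls in bucket 2), then one in the next window.
decisions = [allow_request(client, "u1", now=120) for _ in range(4)]
next_window = allow_request(client, "u1", now=180)
```

Against a real server, INCR and EXPIRE are two separate round trips; sending them in a pipeline or a Lua script is a common way to treat them as one unit.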
Knowledge Check
5 min
- Q: Azure Managed Redis connection port? A: 10000 with TLS (not 6379)
- Q: Cache frequently-read inference results, update on source change. Which pattern? A: Cache-aside (lazy loading)
- Q: Track API calls per user per minute atomically. Which command? A: INCR, with EXPIRE (or an initial SETEX) for the window TTL
- Q: Store user profile fields individually without full JSON serialization. Which data type? A: Redis Hash
- Q: Production authentication for Redis without stored keys? A: Entra ID with redis-entraid + DefaultAzureCredential
Summary
2 min
Azure Managed Redis = in-memory cache with sub-millisecond latency. Connect on port 10000 with TLS. Implement cache-aside for AI inference results. Choose data types by pattern: String (simple cache), Hash (objects), List (queues/history), Set (unique members), Sorted Set (leaderboards). Set TTLs based on data volatility. Use Entra ID for production auth.
Memory Tricks
Cache-aside flow: "Get → Miss → DB → Set", always in that order
Data type mnemonic: "SHLSS": String (cache), Hash (objects), List (queue), Set (unique), Sorted Set (ranked)
Azure Redis port: 10000 = "ten thousand ms slower than RAM... but still fast" (just remember: 10000, not 6379)
Azure Managed Redis
Key Facts
- Azure Redis port: 10000 with TLS, NOT the open-source default 6379
- Cache-aside flow: GET → MISS → DB → SET with TTL → return
- Atomic counter: r.incr(key), thread-safe, use for rate limiting
- Hash (object): r.hset(key, mapping=dict), avoids full JSON serialize/deserialize
- List (queue): lpush/rpush + lrange + ltrim, for chat history and task queues
- Set (unique): r.sadd/sismember, for active sessions and deduplicated sets
- Sorted Set (ranked): r.zadd, for leaderboards, priority queues, TTL ordering
- Prod auth: redis-entraid + DefaultAzureCredential (no stored key)
Commands & Patterns
import time
import redis

r = redis.Redis(host="myinst.<region>.redis.azure.net",
                port=10000, ssl=True, decode_responses=True,
                password="your-access-key")

# Cache-aside
def get_result(key):
    hit = r.get(key)
    if hit is not None:
        return hit
    val = db_query()  # placeholder for your own data-source lookup
    r.set(key, val, ex=300)  # 5-min TTL
    return val

# Atomic rate limit (RateLimitError and check_rate_limit are illustrative names)
class RateLimitError(Exception):
    pass

def check_rate_limit(user_id, limit=10):
    key = f"rate:{user_id}:{int(time.time()) // 60}"  # one counter per minute
    count = r.incr(key)
    r.expire(key, 60)
    if count > limit:
        raise RateLimitError()

# Hash for user profile
r.hset("user:1001", mapping={"name": "Alice", "tier": "premium"})
r.hincrby("user:1001", "credits", -10)

# List: keep last 10 messages
r.lpush("chat:abc", "Hello")
r.ltrim("chat:abc", 0, 9)
Implement Vector Indexing and Semantic Caching with Azure Managed Redis
Introduction to Redis Vector Search
3 min
Azure Managed Redis (via the RediSearch module) supports vector similarity search: store embeddings in Redis hashes and query them with FT.SEARCH KNN. The primary AI use case is semantic caching, which returns cached LLM responses for semantically similar prompts instead of calling the API every time.
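Two small pieces underlie this: embeddings are stored as packed FLOAT32 bytes, and with the COSINE metric the search reports a distance equal to 1 minus the cosine similarity. The following standard-library sketch shows both with toy 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions); the helper names are hypothetical.

```python
import math
import struct


def pack_float32(vec):
    """Pack a list of floats as little-endian FLOAT32 bytes (4 bytes each)."""
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack_float32(blob):
    """Inverse of pack_float32."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def cosine_distance(a, b):
    """1 - cosine similarity, the value a COSINE vector index reports."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


blob = pack_float32([1.0, 0.0, 0.5])
same_direction = cosine_distance([1.0, 0.0, 0.0], [2.0, 0.0, 0.0])
orthogonal = cosine_distance([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

Vectors pointing the same way have a distance near 0 regardless of length, while unrelated (orthogonal) vectors have a distance near 1.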
Create a Vector Index with FT.CREATE
8 min
Create RediSearch Vector Index
import redis, struct

r = redis.Redis(
    host="myredis.<region>.redis.azure.net",
    port=10000, password="key", ssl=True,
    decode_responses=False  # MUST be False for binary vectors
)
r.execute_command(
    "FT.CREATE", "idx:docs", "ON", "HASH",
    "PREFIX", "1", "doc:",
    "SCHEMA",
    "title", "TEXT",
    "embedding", "VECTOR", "HNSW", "6",
    "TYPE", "FLOAT32",
    "DIM", "1536",
    "DISTANCE_METRIC", "COSINE"
)

# Store document with embedding
# oai is assumed to be an already-initialized OpenAI (or Azure OpenAI) client
def embed(text):
    vec = oai.embeddings.create(input=text,
        model="text-embedding-3-small").data[0].embedding
    return struct.pack(f"{len(vec)}f", *vec)  # pack as FLOAT32 bytes

r.hset("doc:1", mapping={
    "title": "Azure Key Vault",
    "embedding": embed("Azure Key Vault secures secrets")
})
decode_responses=False is required for binary vector data. HNSW is an approximate nearest-neighbor index type. DIM 1536 must match your embedding model's dimensions.
KNN Vector Search
7 min
FT.SEARCH with KNN
def vector_search(query_text, top_k=5):
    q_emb = embed(query_text)
    results = r.execute_command(
        "FT.SEARCH", "idx:docs",
        f"*=>[KNN {top_k} @embedding $blob AS score]",
        "PARAMS", "2", "blob", q_emb,
        "RETURN", "2", "title", "score",  # count must match the number of fields listed
        "SORTBY", "score",
        "DIALECT", "2"
    )
    return results
$blob passes the binary embedding. AS score names the distance field. SORTBY score returns the nearest first. DIALECT 2 is required for vector syntax.
Semantic Caching Pattern
8 min
Cache-Aside with Similarity Threshold
import hashlib

THRESHOLD = 0.95  # cosine similarity for cache hit

# Assumes an index "idx:cache" over the "cache:" key prefix, created like idx:docs
def semantic_cache_get(prompt):
    q_emb = embed(prompt)
    res = r.execute_command(
        "FT.SEARCH", "idx:cache",
        "*=>[KNN 1 @embedding $blob AS score]",
        "PARAMS", "2", "blob", q_emb,
        "RETURN", "2", "response", "score",
        "DIALECT", "2"
    )
    if res[0] > 0:
        # res is [count, key, [field, value, ...]]; names are bytes here
        fields = dict(zip(res[2][::2], res[2][1::2]))
        distance = float(fields[b"score"])
        if (1 - distance) >= THRESHOLD:
            return fields[b"response"]
    return None

def semantic_cache_set(prompt, response, ttl=3600):
    key = f"cache:{hashlib.md5(prompt.encode()).hexdigest()}"
    r.hset(key, mapping={
        "prompt": prompt,
        "response": response,
        "embedding": embed(prompt)
    })
    r.expire(key, ttl)
Cosine similarity = 1 - distance. Always set a TTL on cached responses, because LLM answers become stale. Never cache user-specific or sensitive data.
Exercise
20 min
- Create an Azure Managed Redis instance with the RediSearch module enabled
- Create an FT.CREATE index with HNSW FLOAT32 DIM=1536 COSINE
- Store 10 Q&A pairs with embeddings using r.hset()
- Implement semantic_cache_get() with 0.95 similarity threshold
- Test with rephrased questions: verify cache hits vs misses
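The hit-or-miss logic can be checked offline before wiring it to Redis. The sketch below replaces FT.SEARCH with a brute-force nearest-neighbour scan over an in-memory list and uses hand-made 2-dimensional vectors in place of real embeddings; `SemanticCache` and its methods are hypothetical names.

```python
import math

THRESHOLD = 0.95  # minimum cosine similarity for a cache hit


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


class SemanticCache:
    """Brute-force stand-in for a vector index: scans every stored entry."""

    def __init__(self, threshold=THRESHOLD):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response)

    def set(self, embedding, response):
        self.entries.append((embedding, response))

    def get(self, embedding):
        best_score, best_response = -1.0, None
        for stored, response in self.entries:
            score = cosine_similarity(embedding, stored)
            if score > best_score:
                best_score, best_response = score, response
        # Only return the nearest neighbour if it is similar enough.
        return best_response if best_score >= self.threshold else None


cache = SemanticCache()
cache.set([1.0, 0.0], "Key Vault stores secrets.")

near = cache.get([0.99, 0.05])   # almost the same direction: expected hit
far = cache.get([0.0, 1.0])      # orthogonal: expected miss
```

Lowering the threshold trades more cache hits for a higher chance of returning an answer to a question that only looks similar, so it is worth testing with rephrased and with genuinely different prompts.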
Summary
2 min
Redis vector search: FT.CREATE with HNSW index → HSET with binary embedding → FT.SEARCH KNN with DIALECT 2. Cosine similarity = 1 - distance. Semantic cache returns cached LLM responses for similar prompts (≥0.95 threshold). Always set a TTL on cache entries. Use decode_responses=False for binary vector operations.