Streaming allows you to receive partial responses from the Perplexity API as they are generated, rather than waiting for the complete response. This is particularly useful for:
- **Real-time user experiences** - Display responses as they're generated
- **Long responses** - Start showing content immediately for lengthy analyses
- **Interactive applications** - Provide immediate feedback to users
Streaming is supported across all Perplexity models, including Sonar, Sonar Pro, and the reasoning models.
The following snippet streams responses from the Perplexity API using the `requests` library. Note that the raw server-sent events (SSE) must be parsed manually to extract the content, search results, and other metadata.
```python
import requests

# Set up the API endpoint and headers
url = "https://api.perplexity.ai/chat/completions"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

payload = {
    "model": "sonar-pro",
    "messages": [
        {"role": "user", "content": "Who are the top 5 tech influencers on X?"}
    ],
    "stream": True  # Enable streaming for real-time responses
}

response = requests.post(url, headers=headers, json=payload, stream=True)

# Process the streaming response (simplified example)
for line in response.iter_lines():
    if line:
        print(line.decode('utf-8'))
```
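If you prefer not to handle raw SSE lines yourself, an OpenAI-compatible client can also consume the stream. The sketch below assumes the OpenAI Python SDK pointed at `https://api.perplexity.ai` and an API key stored in a `PERPLEXITY_API_KEY` environment variable; adjust to your own setup.

```python
import os
from openai import OpenAI

# Sketch: Perplexity's chat completions endpoint is OpenAI-compatible,
# so the OpenAI SDK can be used with a custom base_url.
client = OpenAI(
    api_key=os.environ["PERPLEXITY_API_KEY"],
    base_url="https://api.perplexity.ai",
)

stream = client.chat.completions.create(
    model="sonar-pro",
    messages=[{"role": "user", "content": "Who are the top 5 tech influencers on X?"}],
    stream=True,
)

# Each chunk carries an incremental delta; print tokens as they arrive.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

If you also need search results and usage metadata, parse the SSE stream yourself and collect those fields as they arrive, as in the next example.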
```python
import requests
import json

def stream_with_requests_metadata():
    url = "https://api.perplexity.ai/chat/completions"
    headers = {
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    }
    payload = {
        "model": "sonar",
        "messages": [{"role": "user", "content": "Explain quantum computing"}],
        "stream": True
    }

    response = requests.post(url, headers=headers, json=payload, stream=True)

    content = ""
    metadata = {}

    for line in response.iter_lines():
        if line:
            line = line.decode('utf-8')
            if line.startswith('data: '):
                data_str = line[6:]
                if data_str == '[DONE]':
                    break
                try:
                    chunk = json.loads(data_str)

                    # Process content
                    if 'choices' in chunk and chunk['choices'][0]['delta'].get('content'):
                        content_piece = chunk['choices'][0]['delta']['content']
                        content += content_piece
                        print(content_piece, end='', flush=True)

                    # Collect metadata
                    for key in ['search_results', 'usage']:
                        if key in chunk:
                            metadata[key] = chunk[key]

                    # Check if streaming is complete
                    if chunk['choices'][0].get('finish_reason'):
                        print(f"\n\nMetadata: {metadata}")

                except json.JSONDecodeError:
                    continue

    return content, metadata

stream_with_requests_metadata()
```
Important: With streaming, search results may not be available until later in the response. If displaying search results immediately is critical to your user interface, consider using a non-streaming request instead.
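As a minimal sketch of the non-streaming alternative, the request below omits `"stream": True` so the complete response arrives in a single JSON body. It assumes search results are exposed under a top-level `search_results` key, mirroring the field collected in the streaming metadata example above.

```python
import requests

url = "https://api.perplexity.ai/chat/completions"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}
payload = {
    "model": "sonar-pro",
    "messages": [{"role": "user", "content": "Who are the top 5 tech influencers on X?"}]
    # No "stream": True -- the full response arrives at once
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

# The complete answer is available immediately
print(data["choices"][0]["message"]["content"])

# Assumption: search results sit under a top-level "search_results" key,
# as in the streaming metadata example above
for result in data.get("search_results", []):
    print(result)
```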