26 changes: 24 additions & 2 deletions README.md
@@ -105,10 +105,10 @@ import litellm

litellm.api_base = "http://localhost:8080"
```
### Cache-aside

Install with:

### Cache-aside
Install with:
```bash
pip install semcache
```
@@ -127,6 +127,28 @@ response = client.get("Tell me France's capital city.")
print(response) # "Paris"
```


Or, in Node.js:

Install with:
```bash
npm install semcache
```
Use the SDK in your service:

```javascript
const SemcacheClient = require('semcache');

const client = new SemcacheClient('http://localhost:8080');

(async () => {
await client.put('What is the capital of France?', 'Paris');

const result = await client.get('What is the capital of France?');
console.log(result); // => 'Paris'
})();
```

## Configuration

Configure via environment variables or `config.yaml`:
52 changes: 48 additions & 4 deletions docs/semcache/docs/getting-started.md
@@ -19,7 +19,7 @@ docker run -p 8080:8080 semcache/semcache:latest

Semcache will start on `http://localhost:8080` and is ready to proxy LLM requests.

## Setting Up The Client
## Setting up the proxy client

Semcache acts as a drop-in replacement for LLM APIs. Point your existing SDK to Semcache instead of the provider's endpoint:

@@ -105,7 +105,7 @@ This request will:
3. The provider responds with the answer
4. Semcache caches the response and returns it to you
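
As a minimal sketch of this flow, the LiteLLM snippet from the README can be pointed at Semcache and used to send the first request; the model name and the upstream API key handling here are illustrative assumptions, not prescribed by Semcache:

```python
import litellm

# Route LiteLLM traffic through the local Semcache proxy instead of the
# provider's endpoint, as in the README snippet.
litellm.api_base = "http://localhost:8080"

# First request: Semcache finds no semantically similar entry, forwards the
# call to the upstream provider (authenticated with your usual key, e.g. the
# OPENAI_API_KEY environment variable), caches the answer, and returns it.
response = litellm.completion(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```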

## Testing Semantic Similarity
### Testing Semantic Similarity

Now try a semantically similar but differently worded question:

@@ -153,7 +153,7 @@ Now try a semantically similar but differently worded question:

Even though the wording is different, Semcache recognizes the semantic similarity and returns the cached response instantly - no API call to the upstream provider!
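
Sketched again with LiteLLM (same illustrative assumptions as the example above), the reworded question comes straight from the cache:

```python
import litellm

litellm.api_base = "http://localhost:8080"  # Semcache proxy, as above

# Differently worded but semantically equivalent question: Semcache should
# serve this from the cache without contacting the upstream provider.
response = litellm.completion(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Tell me France's capital city."}],
)
print(response.choices[0].message.content)
```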

## Checking Cache Status
### Checking Cache Status

You can verify cache hits by checking the response headers. If there is a cache hit, the `X-Cache-Status` header will be set to `hit`:

@@ -249,6 +249,50 @@ You can verify cache hits by checking the response headers. If there is a cache
</TabItem>
</Tabs>


## Setting up a cache-aside instance

<Tabs groupId="sdk">
<TabItem value="python" label="Python" default>
Install with:
```bash
pip install semcache
```

```python
from semcache import Semcache

# Initialize the client
client = Semcache(base_url="http://localhost:8080")

# Store a key-data pair
client.put("What is the capital of France?", "Paris")

# Retrieve data by semantic similarity
response = client.get("Tell me France's capital city.")
print(response) # "Paris"
```
</TabItem>
<TabItem value="Node.js" label="Node.js">
Install with:
```bash
npm install semcache
```
```javascript
const SemcacheClient = require('semcache');

const client = new SemcacheClient('http://localhost:8080');

(async () => {
await client.put('What is the capital of France?', 'Paris');

const result = await client.get('What is the capital of France?');
console.log(result); // => 'Paris'
})();
```
</TabItem>
</Tabs>

## Monitor Your Cache

Visit the built-in admin dashboard at `http://localhost:8080/admin` to monitor:
@@ -263,4 +307,4 @@ The process is identical across all providers - Semcache automatically detects t

- **[LLM Providers & Tools](./llm-providers-tools.md)** - Configure additional providers like DeepSeek, Mistral, and custom LLMs
- **[Configuration](./configuration/cache-settings.md)** - Adjust similarity thresholds and cache behavior
- **[Monitoring](./monitoring/metrics.md)** - Set up production monitoring with Prometheus and Grafana
- **[Monitoring](./monitoring/metrics.md)** - Set up production monitoring with Prometheus and Grafana