The Optimized Data Similarity API is a high-speed fuzzy matching and deduplication API built for real-world, messy data. It helps you identify near-duplicate records and reconcile entities even when values don't match exactly because of typos, casing differences, missing punctuation, spacing issues, abbreviations, or minor word-order changes.
Instead of building and tuning your own fuzzy matching pipeline, you send your strings (or records) to the API and get back similarity-scored matches you can trust. Typical outputs include matched pairs (e.g., “Apple” ↔ “apple inc.”), similarity scores, and structured results that are easy to plug into data cleaning workflows, CRMs, ETL jobs, and analytics pipelines.
Common use cases:
Deduplicate lists: find duplicates inside a dataset (all-to-all matching) and return likely duplicate pairs.
Reconcile against a master list: match an incoming list to a canonical set (list-to-master).
CRM and customer data hygiene: clean leads/accounts/companies where duplicates break reporting and outreach.
Entity resolution & record linkage: connect references to the same real-world entity across sources.
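The difference between all-to-all matching and list-to-master matching can be sketched locally. This is only an illustration of the two modes: it uses Python's standard-library `difflib.SequenceMatcher` as a stand-in scorer, which is an assumption and not the API's matching algorithm, and the function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a, b):
    """Case-insensitive ratio in [0, 1]; a stand-in, not the API's scorer."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def all_to_all(items, threshold):
    """Compare every pair within one list; keep pairs at or above threshold."""
    pairs = []
    for a, b in combinations(items, 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs

def list_to_master(incoming, master):
    """For each incoming value, pick the best-scoring entry in the master list."""
    results = []
    for item in incoming:
        best = max(master, key=lambda m: similarity(item, m))
        results.append((item, best, similarity(item, best)))
    return results

print(all_to_all(["Apple", "apple", "Banana"], threshold=0.9))
print(list_to_master(["aple"], ["Apple", "Banana"]))
```

All-to-all compares a list against itself, so the number of comparisons grows quadratically with list size, while list-to-master compares each incoming value only against the canonical set.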
Why teams use it:
Works on messy text out of the box (no manual rules for every edge case)
Similarity scores for ranking and thresholds (you choose how strict to be)
Built for scale and automation (designed to run in pipelines, not just one-off scripts)
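Example request (the data parameter is the URL-encoded JSON array ["Apple", "appl!e"]):
curl --location --request POST 'https://zylalabs.com/api/11920/optimized+data+similarity+api/22654/dedupe?data=%5B%22Apple%22%2C%20%22appl%21e%22%5D' --header 'Authorization: Bearer YOUR_API_KEY'
Example response:
{"status":"success","response_data":[["Apple","appl!e",1.0]]}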
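The same request can be assembled in Python. The sketch below uses only the standard library and builds the request without sending it. The endpoint URL, the `data` query parameter, and the Bearer header come from the curl example, while the helper name `build_dedupe_request` is hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example.
DEDUPE_URL = "https://zylalabs.com/api/11920/optimized+data+similarity+api/22654/dedupe"

def build_dedupe_request(strings, api_key):
    """Build (but do not send) a POST request mirroring the curl example."""
    # The list travels as a JSON array in the `data` query parameter;
    # urlencode takes care of quotes, brackets, and spaces.
    query = urllib.parse.urlencode({"data": json.dumps(strings)})
    return urllib.request.Request(
        f"{DEDUPE_URL}?{query}",
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
    )

req = build_dedupe_request(["Apple", "appl!e"], "YOUR_API_KEY")
print(req.full_url)
# To send it, pass `req` to urllib.request.urlopen and read the JSON body.
```

Encoding the JSON array programmatically avoids problems with brackets, quotes, and spaces that are not valid in a raw URL.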
| Header | Description |
|---|---|
| Authorization | [Required] Should be Bearer access_key. See "Your API Access Key" above when you are subscribed. |
No long-term commitment. Upgrade, downgrade, or cancel anytime.
The Dedupe endpoint returns a JSON object containing matched pairs of strings, similarity scores, and optional deduplicated results. The output can be formatted as string pairs, index pairs, or deduplicated strings, depending on the specified configuration.
Key fields in the response data include "status" (indicating success or error) and "response_data," which contains the results formatted according to the user's request, such as matched pairs or deduplicated strings.
Users can customize requests by adjusting parameters in the "config" object, such as "similarity_threshold" for match strictness, "remove_punctuation" for preprocessing, and "output_format" to choose the desired result structure.
The response data is organized as an array of results, where each entry corresponds to a match or deduplicated string. Depending on the output format, entries may include original strings, indices, and similarity scores, facilitating easy integration into workflows.
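As a rough illustration, a response in the string-pair shape can be unpacked as follows. The sample body is the example response shown on this page. The assumption that every entry is a `[string, string, score]` triple holds only for that output format, and the `Match` type and `parse_matches` helper are hypothetical names.

```python
import json
from typing import NamedTuple

class Match(NamedTuple):
    left: str
    right: str
    score: float

def parse_matches(body):
    """Parse a Dedupe response body whose entries are [string, string, score]."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"API reported status: {payload.get('status')!r}")
    # Assumes the string-pair output format; other formats need other parsing.
    return [Match(a, b, float(s)) for a, b, s in payload.get("response_data", [])]

sample = '{"status":"success","response_data":[["Apple","appl!e",1.0]]}'
for match in parse_matches(sample):
    print(match.left, match.right, match.score)
```

Checking "status" before reading "response_data" keeps error responses from being mistaken for an empty result set.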
Typical use cases include deduplicating customer lists, reconciling records against a master list, cleaning CRM data, and performing entity resolution across different data sources to ensure data integrity and accuracy.
Data accuracy is maintained through advanced fuzzy matching algorithms that account for common data issues like typos and casing differences. The API is designed to handle messy data effectively, ensuring reliable matching results.
Accepted parameter values include "similarity_threshold" (0 to 1), "remove_punctuation" (boolean), "to_lowercase" (boolean), "use_token_sort" (boolean), and "top_k" (integer or "all"). These parameters allow users to tailor the matching process to their specific needs.
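A small client-side check can catch out-of-range values before a request is sent. The sketch below validates only the parameters and ranges listed above. The helper name `make_config` is hypothetical, and how the resulting object is transmitted to the endpoint is not specified here, so that step is left out.

```python
def make_config(similarity_threshold=None, remove_punctuation=None,
                to_lowercase=None, use_token_sort=None, top_k=None):
    """Return a config dict containing only the options that were set."""
    config = {}
    if similarity_threshold is not None:
        if not 0 <= similarity_threshold <= 1:
            raise ValueError("similarity_threshold must be between 0 and 1")
        config["similarity_threshold"] = float(similarity_threshold)
    flags = {
        "remove_punctuation": remove_punctuation,
        "to_lowercase": to_lowercase,
        "use_token_sort": use_token_sort,
    }
    for name, value in flags.items():
        if value is not None:
            if not isinstance(value, bool):
                raise TypeError(f"{name} must be a boolean")
            config[name] = value
    if top_k is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_int = isinstance(top_k, int) and not isinstance(top_k, bool)
        if not (is_int or top_k == "all"):
            raise ValueError('top_k must be an integer or "all"')
        config["top_k"] = top_k
    return config

print(make_config(similarity_threshold=0.85, to_lowercase=True, top_k="all"))
```

Leaving unset options out of the dict lets the API apply its own defaults, which are not documented in this section.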
If the Dedupe endpoint returns partial or empty results, check the input data for quality issues, such as excessive duplicates, and review the "similarity_threshold": a threshold set too high filters out near-matches, while one set too low lets weak matches through. Adjusting the "similarity_threshold" or reviewing the input list can help improve results.
To obtain your API key, first sign in to your account and navigate to the API you want to use. From the API's Pricing section, choose a plan and complete the subscription process. Once subscribed, return to the API page and you will see your API Access Key displayed at the top of the documentation page. You can use this key to authenticate your requests.
You can’t switch APIs during the free trial. If you subscribe to a different API, your trial will end and the new subscription will start as a paid plan.
The free trial lasts for 7 days and allows you to make up to 50 API requests.
No, the free trial is available only once, so we recommend using it on the API that interests you the most. Most of our APIs offer a free trial, but some may not include this option.
Yes. If the API offers a free trial, you will see a "Free 7-Day Trial" option in its Pricing section. The trial lasts for 7 days and allows up to 50 API requests, enabling you to evaluate the API before subscribing to a paid plan.
Zyla API Hub is like a big store for APIs, where you can find thousands of them all in one place. We also offer dedicated support and real-time monitoring of all APIs. Once you sign up, you can pick and choose which APIs you want to use. Just remember, each API needs its own subscription. But if you subscribe to multiple ones, you'll use the same key for all of them, making things easier for you.
You can monitor your API usage through the response headers included with every request:
x-zyla-api-calls-monthly-used: Shows the total number of API requests you have used during the current billing period.
x-zyla-api-calls-monthly-remaining: Shows the number of API requests you have remaining for the current billing period.
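Reading these two headers from a response might look like the sketch below. It assumes the header values are plain integer strings, which this page does not state, and the helper name `read_quota` is hypothetical. Any mapping of header names to values works as input, for example the headers of a `urllib` or `requests` response.

```python
def read_quota(headers):
    """Return (used, remaining) from the usage headers, or None where absent."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    used = lowered.get("x-zyla-api-calls-monthly-used")
    remaining = lowered.get("x-zyla-api-calls-monthly-remaining")
    # Assumption: values are integer strings such as "12".
    return (
        int(used) if used is not None else None,
        int(remaining) if remaining is not None else None,
    )

used, remaining = read_quota({
    "x-zyla-api-calls-monthly-used": "12",
    "x-zyla-api-calls-monthly-remaining": "38",
})
print(used, remaining)
```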
Yes, you can cancel your subscription at any time. Simply go to the Pricing section of the API you're subscribed to and click the "Unsubscribe" button.
Please note that upgrades, downgrades, and cancellations take effect immediately. Once your subscription is canceled, access to the service will end immediately, regardless of any remaining API calls in your quota.
Please have a look at our Refund Policy: https://zylalabs.com/terms#refund