Optimized Data Similarity API (API ID: 11920)

Optimized Data Similarity API: Enhance your applications with efficient data similarity solutions tailored for performance.

Use this API from your AI agent via MCP

Works with OpenClaw, Claude Code/Desktop, Cursor, Windsurf, Cline and any MCP-compatible AI client.

Docs & setup

Create a skill by wrapping this MCP: https://mcp.zylalabs.com/mcp?apikey=YOUR_ZYLA_API_KEY

Optimized Data Similarity API is a high-speed fuzzy matching and deduplication API built for real-world, messy data. It helps you identify near-duplicate records and reconcile entities even when values don’t match exactly—typos, casing differences, missing punctuation, spacing issues, abbreviations, and minor word-order changes.

Instead of building and tuning your own fuzzy matching pipeline, you send your strings (or records) to the API and get back similarity-scored matches you can trust. Typical outputs include matched pairs (e.g., “Apple” ↔ “apple inc.”), similarity scores, and structured results that are easy to plug into data cleaning workflows, CRMs, ETL jobs, and analytics pipelines.

Common use cases:

Deduplicate lists: find duplicates inside a dataset (all-to-all matching) and return likely duplicate pairs.
Reconcile against a master list: match an incoming list to a canonical set (list-to-master).
CRM and customer data hygiene: clean leads/accounts/companies where duplicates break reporting and outreach.
Entity resolution & record linkage: connect references to the same real-world entity across sources.

Why teams use it:

Works on messy text out of the box (no manual rules for every edge case)
Similarity scores for ranking and thresholds (you choose how strict to be)
Built for scale and automation (designed to run in pipelines, not just one-off scripts)

API Documentation

Endpoints

Dedupe Endpoint ID: 22654

Dedupe is an all-to-all fuzzy matching endpoint for finding duplicates within a single list of strings. Instead of comparing only two inputs per API call, you send a dataset and it returns similar pairs and/or deduplicated groups across the entire set.

Why you’d use it

Massive speedup: typically ~300× to 1,000× faster than “regular” approaches people try first (pairwise comparisons, looping fuzzy scorers, etc.) once you go beyond tiny lists.
Optional cleanup built-in: you can enable common text cleanup (lowercasing, punctuation removal, token sorting). This saves hours (or days) of development + ongoing maintenance.
Company suffixes handled automatically: common endings like “Inc”, “LLC”, “Ltd”, etc. are stripped so you match the real name.

Benchmarks: similarity-api/blog/speed-benchmarks (1M records in ~7 minutes; faster than common Python fuzzy matching libraries).
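For context, the "regular" approach mentioned above can be sketched as a loop over every pair of strings. The sketch below uses Python's standard-library difflib.SequenceMatcher as a stand-in scorer. It is not the algorithm the API uses, and its scores will not match the API's. It only shows why pairwise comparison becomes slow: 1,000 strings already require 499,500 comparisons.

```python
from difflib import SequenceMatcher
from itertools import combinations


def naive_pairs(strings, threshold=0.75):
    """Compare every pair of strings: n * (n - 1) / 2 comparisons."""
    matches = []
    for a, b in combinations(strings, 2):
        # Lowercase both sides so the comparison is case-insensitive.
        score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if score >= threshold:
            matches.append([a, b, round(score, 3)])
    return matches


names = ["Microsoft", "Micsrosoft", "Apple", "Globex"]
print(naive_pairs(names))
```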

Hard limits on Zyla

Max 1,000 strings per request (enforced).

Need bigger / unlimited?

Use the full version at similarity-api/docs

Parameters (POST request)

data (required)

A string containing a JSON array of strings.

Example value for data:
["Acme Inc","ACME LLC","Globex GmbH"]
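A minimal sketch of preparing the data value in Python. The helper name build_data_param is hypothetical; the 1,000-string limit comes from the section above. Checking the limit locally avoids sending a request that would be rejected.

```python
import json

MAX_STRINGS = 1000  # per-request limit stated in the documentation above


def build_data_param(strings):
    """Serialize a list of strings into the JSON-array string expected by `data`."""
    if not all(isinstance(s, str) for s in strings):
        raise TypeError("every item must be a string")
    if len(strings) > MAX_STRINGS:
        raise ValueError(f"{len(strings)} strings exceeds the {MAX_STRINGS}-string limit")
    return json.dumps(strings)


print(build_data_param(["Acme Inc", "ACME LLC", "Globex GmbH"]))
```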

similarity_threshold (optional, 0.0 to 1.0, default 0.75)

Higher = stricter matching (fewer pairs). Typical: 0.80–0.90 for company dedupe.
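One way to try different strictness levels without spending extra requests is to call the endpoint once at a looser threshold and tighten the cutoff locally. The sketch below assumes results in the string_pairs shape ([string_A, string_B, similarity]); the rows and scores are made up for illustration.

```python
def filter_pairs(pairs, min_score):
    """Keep only [a, b, similarity] rows whose similarity is at least min_score."""
    return [row for row in pairs if row[2] >= min_score]


# Illustrative rows in the string_pairs shape; the scores are invented.
pairs = [["Acme Inc", "ACME LLC", 0.93], ["Acme Inc", "Acme Foods", 0.78]]
print(filter_pairs(pairs, 0.85))
```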

remove_punctuation (optional, true/false, default true)

Removes punctuation differences (e.g., “A.C.M.E.” vs “ACME”).

to_lowercase (optional, true/false, default true)

Makes matching case-insensitive.

use_token_sort (optional, true/false, default false)

Helps when word order changes (e.g., “Bank of America” vs “America Bank of”).
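To build intuition for the three cleanup options, here is a rough local approximation. It is an assumption about what such cleanup typically does, not the API's actual implementation, and the API's exact rules may differ.

```python
import string


def normalize(text, remove_punctuation=True, to_lowercase=True, use_token_sort=False):
    """Local approximation of the three cleanup options described above."""
    if to_lowercase:
        text = text.lower()
    if remove_punctuation:
        # Delete every ASCII punctuation character.
        text = text.translate(str.maketrans("", "", string.punctuation))
    if use_token_sort:
        # Sort whitespace-separated tokens so word order no longer matters.
        text = " ".join(sorted(text.split()))
    return text


print(normalize("A.C.M.E."))
print(normalize("Bank of America", use_token_sort=True))
print(normalize("America Bank of", use_token_sort=True))
```

With token sorting enabled, both spellings of the bank name reduce to the same string, which is why the option helps when word order varies.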

output_format (optional, default string_pairs)

This endpoint can return data in multiple formats. Select one of the following:
- string_pairs:
  - Returns the duplicate matches as text, so you can read them immediately.
    Each row is: [string_A, string_B, similarity]
    Use when: you want to see which names matched which names.
- index_pairs:
  - Same as string_pairs, but returns positions in your input list instead of the strings.
    Each row is: [index_A, index_B, similarity]
    Use when: you want to join results back to your source rows safely (databases, spreadsheets, CRM exports).
- deduped_strings:
  - Returns a cleaned list with duplicates removed (keeps one representative from each duplicate group).
    Use when: you want a final list to export/use, without worrying about mapping back.
- deduped_indices:
  - Same idea as deduped_strings, but returns the indices of the kept items.
    Use when: you want to keep the original rows (by index) and drop the duplicates.
- membership_map:
  - Returns a list the same length as your input where each position tells you the representative index for that item.
    Example: [0,0,0,3,3] means rows 0/1/2 are one group (rep=0) and rows 3/4 are another (rep=3).
    Use when: you want clustering / group IDs per row.
- row_annotations:
  - Returns one object per input row with an explanation of what it belongs to (rep row + similarity).
    Use when: you want a human-readable, per-row result for debugging or UI display.
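The index-based formats are easiest to understand with a small example. The sketch below turns a membership_map into groups and joins index_pairs back to source rows. Both helper names are hypothetical, and the inputs are hand-written in the shapes described above rather than taken from a live response.

```python
from collections import defaultdict


def groups_from_membership_map(membership_map):
    """Map each representative index to the list of row indices in its group."""
    groups = defaultdict(list)
    for row_index, rep_index in enumerate(membership_map):
        groups[rep_index].append(row_index)
    return dict(groups)


def resolve_index_pairs(index_pairs, rows):
    """Attach the original rows to [index_A, index_B, similarity] results."""
    return [(rows[i], rows[j], score) for i, j, score in index_pairs]


print(groups_from_membership_map([0, 0, 0, 3, 3]))
```

For the membership map [0,0,0,3,3] this yields one group with rows 0, 1, 2 under representative 0 and another with rows 3, 4 under representative 3.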

top_k (optional, integer or "all", default "all")

"all" returns every match above the threshold.

An integer (e.g., 50) limits the number of matches returned per row, which is faster and yields fewer results.

Sample request in Python

import json

import requests

API_KEY = "YOUR_ZYLA_KEY"
URL = "API_URL/dedupe"  # API_URL is a placeholder for the endpoint's base URL

data_list = ["Microsoft", "Micsrosoft", "Apple Inc", "Apple", "Google LLC", "9oogle"]

# All parameters are sent as strings; `data` is a JSON array serialized to a string.
params = {
    "data": json.dumps(data_list),
    "similarity_threshold": "0.75",
    "remove_punctuation": "true",
    "to_lowercase": "true",
    "use_token_sort": "false",
    "output_format": "string_pairs",
    "top_k": "all",
}

# Authenticate with the bearer token and send the parameters in the query string.
headers = {"Authorization": f"Bearer {API_KEY}"}
r = requests.post(URL, headers=headers, params=params, timeout=60)
print(r.status_code)
print(r.json())

                                                                            
POST https://pr189-testing.zylalabs.com/api/11920/optimized+data+similarity+api/22654/dedupe

Dedupe - Endpoint Features

Object	Description
`data`	[Required] JSON array of strings to deduplicate (max 1,000). Example: ["a","b","c"]
`similarity_threshold`	[Optional] Similarity cutoff from 0 to 1. Higher values are stricter (fewer matches). Default is 0.75.
`remove_punctuation`	[Optional] If true, punctuation is removed before matching. Default is true.
`to_lowercase`	[Optional] If true, strings are lowercased before matching. Default is true.
`use_token_sort`	[Optional] If true, tokens in each string are sorted before matching. Useful when word order varies. Default is false.
`output_format`	[Optional] Default is string_pairs. Allowed values: index_pairs (list of matches as [i, j, score], where i and j are indices in the input list); string_pairs (list of matches as [string_i, string_j, score], using the original strings); deduped_strings (list of strings with duplicates removed, one representative per group); deduped_indices (list of indices representing the deduplicated set, one representative per group); membership_map (array of length N where entry i is the representative index for the group of data[i]); row_annotations (array of objects, one per input row, with fields index, original_string, rep_index, rep_string, similarity_to_rep).
`top_k`	[Optional] Limits how many neighbors are returned per input string. Use all for full dedupe, or a positive integer for top matches per row.

API EXAMPLE RESPONSE

{"status":"success","response_data":[["Apple","appl!e",1.0]]}
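A short sketch of consuming this response in Python. It relies only on the two fields visible in the example ("status" and "response_data"). The shape of an error response is not documented here, so raising on any non-"success" status is an assumption.

```python
import json


def extract_pairs(payload):
    """Return response_data from a parsed response, or raise if status is not "success"."""
    if payload.get("status") != "success":
        raise RuntimeError(f"request failed: {payload}")
    return payload.get("response_data", [])


# The example response shown above, parsed from its JSON text.
payload = json.loads('{"status":"success","response_data":[["Apple","appl!e",1.0]]}')
for a, b, score in extract_pairs(payload):
    print(f"{a!r} matches {b!r} with similarity {score}")
```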

Dedupe - CODE SNIPPETS


curl --location --request POST 'https://zylalabs.com/api/11920/optimized+data+similarity+api/22654/dedupe?data=%5B%22Apple%22%2C%20%22appl%21e%22%5D' --header 'Authorization: Bearer YOUR_API_KEY'

API Access Key & Authentication

After signing up, every developer is assigned a personal API access key: a unique combination of letters and digits that grants access to our API endpoints. To authenticate with the Optimized Data Similarity API, include your bearer token in the Authorization header.

Headers

Header	Description
`Authorization`	[Required] Should be `Bearer access_key`. See "Your API Access Key" above when you are subscribed.
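As an illustration of the header format, the sketch below builds (but does not send) an authenticated POST request using only the Python standard library. The URL is the one from the curl snippet above, the key is a placeholder, and build_request is a hypothetical helper name.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# URL taken from the curl snippet above; the API key is a placeholder.
URL = "https://zylalabs.com/api/11920/optimized+data+similarity+api/22654/dedupe"


def build_request(api_key, strings):
    """Build (but do not send) a POST request carrying the bearer token."""
    # urlencode percent-encodes the JSON array so it is safe in a query string.
    query = urlencode({"data": json.dumps(strings)})
    return Request(
        f"{URL}?{query}",
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
    )


req = build_request("YOUR_API_KEY", ["Apple", "appl!e"])
print(req.get_header("Authorization"))
print(req.full_url)
```

Passing the request to urllib.request.urlopen would send it; that step is left out so the example runs without network access.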

Questions

Simple Transparent Pricing

No long-term commitment. Upgrade, downgrade, or cancel anytime.

💫Basic

$24.99/Month

50 Requests / Month
Then $0.6497400 per request if limit exceeded.
Rate Limit: 60 reqs per minute
Specialized Customer Support
Real-Time API Monitoring
Unlimited Data Transfer Included

⚡Pro

$49.99/Month

100 Requests / Month
Then $0.6497400 per request if limit exceeded.
Rate Limit: 60 reqs per minute
Specialized Customer Support
Real-Time API Monitoring
Unlimited Data Transfer Included

🔥Pro Plus

$99.99/Month

200 Requests / Month
Then $0.6497400 per request if limit exceeded.
Rate Limit: 120 reqs per minute
Specialized Customer Support
Real-Time API Monitoring
Unlimited Data Transfer Included
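To compare plans for a given monthly volume, the arithmetic can be sketched as follows. The fees, quotas, and overage rate are copied from the plan cards above. The sketch assumes overage is billed linearly per extra request and ignores taxes and currency conversion.

```python
# Figures copied from the Basic, Pro, and Pro Plus plan cards.
PLANS = {
    "Basic": {"monthly_fee": 24.99, "included": 50},
    "Pro": {"monthly_fee": 49.99, "included": 100},
    "Pro Plus": {"monthly_fee": 99.99, "included": 200},
}
OVERAGE_PER_REQUEST = 0.6497400


def monthly_cost(plan, requests_made):
    """Monthly fee plus overage for requests beyond the plan's included quota."""
    p = PLANS[plan]
    extra = max(0, requests_made - p["included"])
    return round(p["monthly_fee"] + extra * OVERAGE_PER_REQUEST, 2)


for name in PLANS:
    print(name, monthly_cost(name, 120))
```

Under these assumptions, 120 requests in a month cost about $70.47 on Basic but $62.98 on Pro, so the higher tier becomes cheaper once overage accumulates.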

🚀 Enterprise

Starts at $10,000/Year

Custom Volume
Custom Rate Limit
Specialized Customer Support
Real-Time API Monitoring

Customer favorite features

✔︎ Only Pay for Successful Requests
✔︎ Free 7-Day Trial
✔︎ Multi-Language Support
✔︎ One API Key, All APIs.
✔︎ Intuitive Dashboard

✔︎ Comprehensive Error Handling
✔︎ Developer-Friendly Docs
✔︎ Postman Integration
✔︎ Secure HTTPS Connections
✔︎ Reliable Uptime

Optimized Data Similarity API FAQs

What type of data does the Dedupe endpoint return?

The Dedupe endpoint returns a JSON object with a "status" field and the results in "response_data". Depending on the "output_format" parameter, the results are string pairs, index pairs, deduplicated strings, deduplicated indices, a membership map, or per-row annotations.

What are the key fields in the response data?

Key fields in the response data include "status" (indicating success or error) and "response_data," which contains the results formatted according to the user's request, such as matched pairs or deduplicated strings.

How can users customize their data requests?

Users can customize requests by adjusting the request parameters, such as "similarity_threshold" for match strictness, "remove_punctuation" for preprocessing, and "output_format" to choose the desired result structure.

How is the response data organized?

The response data is organized as an array of results, where each entry corresponds to a match or deduplicated string. Depending on the output format, entries may include original strings, indices, and similarity scores, facilitating easy integration into workflows.

What are typical use cases for this data?

Typical use cases include deduplicating customer lists, reconciling records against a master list, cleaning CRM data, and performing entity resolution across different data sources to ensure data integrity and accuracy.

How is data accuracy maintained?

Data accuracy is maintained through advanced fuzzy matching algorithms that account for common data issues like typos and casing differences. The API is designed to handle messy data effectively, ensuring reliable matching results.

What are the accepted parameter values for the Dedupe endpoint?

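Accepted parameter values are "data" (a JSON array of up to 1,000 strings, required), "similarity_threshold" (0 to 1), "remove_punctuation" (boolean), "to_lowercase" (boolean), "use_token_sort" (boolean), "output_format" (string_pairs, index_pairs, deduped_strings, deduped_indices, membership_map, or row_annotations), and "top_k" (integer or "all"). These parameters allow users to tailor the matching process to their specific needs.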

How to handle partial or empty results?

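If the Dedupe endpoint returns fewer matches than expected, or none at all, first check whether the similarity threshold is too strict: higher values return fewer pairs, so lowering "similarity_threshold" surfaces more candidates. If "top_k" is set to an integer, remember that results are limited per row. Reviewing the input list for quality issues and enabling the cleanup options can also help improve results.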
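One way to automate the threshold adjustment is to retry with progressively looser thresholds until some pairs come back. In the sketch below, call_dedupe is a hypothetical callable that wraps the HTTP request; a stand-in function is used so the control flow can be run offline. The start, floor, and step values are arbitrary choices, and each retry would count as a separate API request.

```python
def dedupe_with_relaxed_threshold(call_dedupe, strings, start=0.90, floor=0.70, step=0.05):
    """Retry with progressively lower thresholds until some pairs come back.

    call_dedupe is a hypothetical callable (strings, threshold) -> list of pairs
    that wraps the actual HTTP request.
    """
    threshold = start
    # The small epsilon guards against floating-point drift at the floor value.
    while threshold >= floor - 1e-9:
        pairs = call_dedupe(strings, round(threshold, 2))
        if pairs:
            return round(threshold, 2), pairs
        threshold -= step
    return None, []


# Stand-in for the real request, used here only to show the control flow.
def fake_call(strings, threshold):
    return [["Apple", "appl!e", 0.82]] if threshold <= 0.80 else []


print(dedupe_with_relaxed_threshold(fake_call, ["Apple", "appl!e"]))
```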

General FAQs

How do I get an API key?

To obtain your API key, first sign in to your account and navigate to the API you want to use. From the API's Pricing section, choose a plan and complete the subscription process. Once subscribed, return to the API page and you will see your API Access Key displayed at the top of the documentation page. You can use this key to authenticate your requests.

Can I switch APIs during the free trial?

You can’t switch APIs during the free trial. If you subscribe to a different API, your trial will end and the new subscription will start as a paid plan.

When does the free trial end?

The free trial lasts for 7 days and allows you to make up to 50 API requests.

Can I use the free trial more than once?

No, the free trial is available only once, so we recommend using it on the API that interests you the most. Most of our APIs offer a free trial, but some may not include this option.

Does the API offer a free trial?

Yes. If the API offers a free trial, you will see a "Free 7-Day Trial" option in its Pricing section. The trial lasts for 7 days and allows up to 50 API requests, enabling you to evaluate the API before subscribing to a paid plan.

What is Zyla API Hub?

Zyla API Hub is like a big store for APIs, where you can find thousands of them all in one place. We also offer dedicated support and real-time monitoring of all APIs. Once you sign up, you can pick and choose which APIs you want to use. Just remember, each API needs its own subscription. But if you subscribe to multiple ones, you'll use the same key for all of them, making things easier for you.

What currencies and payment methods are allowed?

Prices are listed in USD (United States Dollar), EUR (Euro), CAD (Canadian Dollar), AUD (Australian Dollar), and GBP (British Pound). We accept all major debit and credit cards. Our payment system uses the latest security technology and is powered by Stripe, one of the world's most reliable payment companies. If you have any trouble paying by card, just contact us at [email protected]

Additionally, if you already have an active subscription in any of these currencies (USD, EUR, CAD, AUD, GBP), that currency will remain for subsequent subscriptions. You can change the currency at any time as long as you don't have any active subscriptions.

Why can't I pay with my local currency even though I see it on the pricing page?

The local currency shown on the pricing page is based on the country of your IP address and is provided for reference only. The actual prices are in USD (United States Dollar). When you make a payment, the charge will appear on your card statement in USD, even if you see the equivalent amount in your local currency on our website. This means you cannot pay directly with your local currency.

My payment was declined, what should I do?

Occasionally, a bank may decline the charge due to its fraud protection settings. We suggest contacting your bank first to check whether it is blocking our charges. You can also open the Billing Portal and change the card used for the payment. If this does not work and you need further assistance, please contact our team at [email protected]

How will I be charged for my API subscription?

Prices are determined by a recurring monthly or yearly subscription, depending on the chosen plan.

How will my API calls be deducted from my plan?

API calls are deducted from your plan based on successful requests. Each plan comes with a specific number of calls that you can make per month. Only successful calls, indicated by a Status 200 response, will be counted against your total. This ensures that failed or incomplete requests do not impact your monthly quota.

How does your billing cycle work?

Zyla API Hub works on a recurring monthly subscription system. Your billing cycle starts the day you purchase one of the paid plans and renews on the same day of the following month. To avoid future charges, cancel your subscription before the renewal date.

How do I upgrade my current subscription plan with an API?

To upgrade your current subscription plan, simply go to the pricing page of the API and select the plan you want to upgrade to. The upgrade will be instant, allowing you to immediately enjoy the features of the new plan. Please note that any remaining calls from your previous plan will not be carried over to the new plan, so be aware of this when upgrading. You will be charged the full amount of the new plan.

How can I see the remaining number of API calls I can make this month?

To check how many API calls you have left for the current month, refer to the 'X-Zyla-API-Calls-Monthly-Remaining' field in the response header. For example, if your plan allows 1,000 requests per month and you've used 100, this field in the response header will indicate 900 remaining calls.

How can I check how many API requests I've used and how many I have remaining?

You can monitor your API usage through the response headers included with every request:

x-zyla-api-calls-monthly-used: Shows the total number of API requests you have used during the current billing period.
x-zyla-api-calls-monthly-remaining: Shows the number of API requests you have remaining for the current billing period.
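A small sketch of reading these counters in Python. The helper name is hypothetical and the header values in the example are illustrative. Header names are matched case-insensitively, since HTTP header names are not case-sensitive.

```python
def quota_from_headers(headers):
    """Read the monthly usage counters from a response's headers."""
    # Normalize names to lowercase so lookups work regardless of casing.
    lowered = {k.lower(): v for k, v in headers.items()}
    used = lowered.get("x-zyla-api-calls-monthly-used")
    remaining = lowered.get("x-zyla-api-calls-monthly-remaining")
    return (
        int(used) if used is not None else None,
        int(remaining) if remaining is not None else None,
    )


# Example header values are illustrative.
print(quota_from_headers({
    "X-Zyla-API-Calls-Monthly-Used": "100",
    "X-Zyla-API-Calls-Monthly-Remaining": "900",
}))
```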

How do I know when my rate limit will reset?

The 'X-Zyla-RateLimit-Reset' header shows the number of seconds until your rate limit resets. This tells you when your request count will start fresh. For example, if it displays 3,600, it means 3,600 seconds are left until the limit resets.
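A minimal sketch of using this header to pause before retrying. The helper names and the 60-second fallback are assumptions, and when to call it (for example, after a rate-limit error) is left to the caller because the error format is not documented here.

```python
import time


def seconds_until_reset(headers, default=60):
    """Return the wait time from X-Zyla-RateLimit-Reset, or a default if absent."""
    for name, value in headers.items():
        if name.lower() == "x-zyla-ratelimit-reset":
            # Tolerate thousands separators such as "3,600".
            return int(str(value).replace(",", ""))
    return default


def wait_for_reset(headers, sleep=time.sleep):
    """Sleep until the rate limit resets; `sleep` is injectable for testing."""
    delay = seconds_until_reset(headers)
    sleep(delay)
    return delay


print(seconds_until_reset({"X-Zyla-RateLimit-Reset": "3600"}))
```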

Can I cancel anytime?

Yes, you can cancel your subscription at any time. Simply go to the Pricing section of the API you're subscribed to and click the "Unsubscribe" button.

Please note that upgrades, downgrades, and cancellations take effect immediately. Once your subscription is canceled, access to the service will end immediately, regardless of any remaining API calls in your quota.

What happens if I forget to cancel my free trial?

After 7 days, you will be charged the full amount for the plan you were subscribed to during the trial. Therefore, it's important to cancel before the trial period ends. Refund requests for forgetting to cancel on time are not accepted.

How many calls can I make during the free trial?

When you subscribe to an API free trial, you can make up to 50 API calls. If you wish to make additional API calls beyond this limit, the API will prompt you to start your paid plan. You can find the "Start Your Paid Plan" button in your profile under Subscription -> Choose the API you are subscribed to -> Pricing tab.

If I have any problems, who should I contact?

You can contact us through our chat channel to receive immediate assistance. We are always online from 8 am to 5 pm (EST). If you reach us after that time, we will get back to you as soon as possible. Additionally, you can contact us via email at [email protected]

What is your refund policy?

Please have a look at our Refund Policy: https://zylalabs.com/terms#refund

Service Level

100%

Response Time

3,110ms

Category:

Natural Language Processing (NLP)

Tags:

#Fuzzy Matching

#Text Similarity

#Name Matching

#Company Matching

#Confidence Scoring

#Data Processing

