Question 1

What is included in the dataset?

Accepted Answer

The data comes in three tiers, all built on the same 180,000 products with brand name, product category, country of origin, and safety flags (fragrance, drying alcohol, parabens, sulfates, silicones):
• Basic Product Index: adds a flat, label-order ingredient list per product.
• Products & Ingredients: adds a normalized ingredient dictionary (irritancy, comedogenicity, ratings, functions) with a product↔ingredient link table.
• Complete Archive: everything in Products & Ingredients plus the full product-image archive (~18GB, 180,000 images).
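To illustrate how a product↔ingredient link table is typically queried, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and rows below are hypothetical placeholders invented for this example; the actual names are defined in the data dictionary that ships with each tier.

```python
import sqlite3

# Hypothetical schema and toy rows; the real table and column names
# come from the data dictionary and may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE ingredients (ingredient_id INTEGER PRIMARY KEY, inci_name TEXT);
CREATE TABLE product_ingredients (
    product_id INTEGER REFERENCES products(product_id),
    ingredient_id INTEGER REFERENCES ingredients(ingredient_id),
    position INTEGER  -- label order, 1 = first listed
);
INSERT INTO products VALUES (1, 'Example Moisturizer');
INSERT INTO ingredients VALUES (10, 'AQUA'), (11, 'GLYCERIN');
INSERT INTO product_ingredients VALUES (1, 11, 2), (1, 10, 1);
""")


def ingredients_for(conn, product_id):
    """Return a product's ingredient names in label order via the link table."""
    rows = conn.execute(
        """
        SELECT i.inci_name
        FROM product_ingredients AS pi
        JOIN ingredients AS i ON i.ingredient_id = pi.ingredient_id
        WHERE pi.product_id = ?
        ORDER BY pi.position
        """,
        (product_id,),
    ).fetchall()
    return [row[0] for row in rows]


label_order = ingredients_for(conn, 1)
```

The same join pattern should carry over to the shipped SQLite file once the placeholder names are swapped for the real ones.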

Question 2

What formats do you deliver?

Accepted Answer

Every tier ships as CSV, line-delimited JSON (JSONL), and a standalone SQLite database. Products & Ingredients and the Complete Archive use nested JSONL, relational CSVs, and SQLite with foreign keys and indexes, so they load directly into Postgres, MongoDB, or data-science tools such as pandas. A full data dictionary and schema guide are included with every tier.
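As a rough sketch of what working with nested JSONL can look like, the snippet below reads one JSON object per line and flattens the nested ingredient list into relational-style rows. The field names (product_id, name, ingredients, inci_name) and the two records are assumptions made up for illustration; check the schema guide for the real structure.

```python
import io
import json

# Two made-up records standing in for a nested JSONL file.
# All field names here are hypothetical.
SAMPLE_JSONL = io.StringIO(
    '{"product_id": 1, "name": "Example Cleanser", '
    '"ingredients": [{"inci_name": "AQUA"}, {"inci_name": "GLYCERIN"}]}\n'
    '{"product_id": 2, "name": "Example Toner", '
    '"ingredients": [{"inci_name": "AQUA"}]}\n'
)


def iter_records(handle):
    """Yield one parsed object per non-empty line of a JSONL stream."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


def flatten(records):
    """Turn nested records into (product_id, position, inci_name) rows."""
    for rec in records:
        for pos, ing in enumerate(rec.get("ingredients", []), start=1):
            yield (rec["product_id"], pos, ing["inci_name"])


rows = list(flatten(iter_records(SAMPLE_JSONL)))
```

For a real file, replace the in-memory stream with open("your_file.jsonl", encoding="utf-8"); the generator approach keeps memory use flat regardless of file size.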

Question 3

How accurate is the ingredient analysis?

Accepted Answer

From the Products & Ingredients tier up, our normalized dictionary follows a research-backed schema that maps each ingredient to its functional classes (e.g., humectants, exfoliants) and an expert safety rating. Each entry is enriched with chemical identifiers: CAS and EC numbers, IUPAC and Ph. Eur. names, and common synonyms. We use standardized INCI names and include common aliases to keep your search logic robust. Because we aggregate global public data, brand-name normalization is ongoing; we continuously merge aliases, but you may occasionally see variants of the same brand. (The entry-level Basic Product Index ships a flat ingredient list without this analysis.)
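One way aliases can keep search robust is a case-insensitive index that resolves any known name to its INCI name. The sketch below assumes each dictionary entry exposes an inci_name and an aliases list; both field names and the two toy entries are hypothetical, not the shipped schema.

```python
def build_alias_index(dictionary):
    """Map every INCI name and alias (case-insensitively) to its INCI name."""
    index = {}
    for entry in dictionary:
        names = [entry["inci_name"], *entry.get("aliases", [])]
        for name in names:
            index[name.strip().casefold()] = entry["inci_name"]
    return index


def resolve(index, query):
    """Return the INCI name for a user query, or None if it is unknown."""
    return index.get(query.strip().casefold())


# Toy entries for illustration only; field names are assumptions.
toy_dictionary = [
    {"inci_name": "TOCOPHEROL", "aliases": ["Vitamin E"]},
    {"inci_name": "AQUA", "aliases": ["Water"]},
]
alias_index = build_alias_index(toy_dictionary)
```

With this in place, resolve(alias_index, "vitamin e") and resolve(alias_index, "Tocopherol") both land on the same entry, so user-facing search does not depend on the exact spelling a label uses.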

Question 4

How are the images mapped to the products?

Accepted Answer

Every product record contains a unique image_name field (e.g., 1042.jpeg). Its value matches the filename in the image archive, so you can link metadata to high-resolution imagery with a simple string match. You host the ~18GB archive on your own infrastructure and may display the images directly to end users in your app. The image archive is included exclusively with the Complete Archive tier.
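A minimal sketch of that string match, assuming the archive has been extracted into a single flat directory (the directory layout is an assumption; only the image_name field comes from the description above):

```python
import tempfile
from pathlib import Path


def image_path(record, archive_dir):
    """Return the path to a record's image, or None if it is not on disk."""
    name = record.get("image_name")
    if not name:
        return None
    candidate = Path(archive_dir) / name
    return candidate if candidate.is_file() else None


# Demo against a temporary directory standing in for the extracted archive.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "1042.jpeg").write_bytes(b"")  # empty placeholder file
    found = image_path({"image_name": "1042.jpeg"}, tmp)
    found_name = found.name if found else None
    missing = image_path({"image_name": "9999.jpeg"}, tmp)
```

Returning None for a missing file lets your app fall back to a placeholder image instead of failing on a broken link.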

Question 5

Is my brand covered?

Accepted Answer

We track more than 24,000 brands across 180,000 products. The fastest way to check is the live coverage tool on any product page, which confirms specific brands instantly.

Question 6

What product categories are covered?

Accepted Answer

The catalogue is skincare-led. Approximate mix:
• Skincare: 62%
• Haircare: 12%
• Suncare: 8%
• Body & bath: 8%
• Makeup: 7%
• Other: 3%

Question 7

How complete are the fields?

Accepted Answer

We are transparent about coverage rather than letting you discover it mid-evaluation. Weighted by ingredient frequency across the dataset:
• Products with an ingredient list: ~100%
• Ingredient functions: ~94%
• CAS number: ~86%
• EC number: ~75%
• Comedogenicity rating: ~23%
• Irritancy rating: ~23%
• Concentration: <1%

Concentration is intentionally sparse: we only capture it where a manufacturer publicly discloses it, and we never estimate or extrapolate. The free sample reflects the same coverage you get in the full set.
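If you want to run a similar check on a sample pack, the sketch below shows one plausible reading of "weighted by ingredient frequency": each ingredient counts in proportion to the number of products it appears in. This interpretation, the product_count field, and the toy rows are assumptions for illustration, not a description of how the published figures were computed.

```python
def weighted_coverage(ingredients, field):
    """Share of ingredient occurrences whose ingredient has `field` populated.

    Each ingredient is weighted by `product_count`, a hypothetical field
    holding the number of products that list it.
    """
    total = sum(ing["product_count"] for ing in ingredients)
    if total == 0:
        return 0.0
    covered = sum(
        ing["product_count"]
        for ing in ingredients
        if ing.get(field) not in (None, "")
    )
    return covered / total


# Toy data: two common ingredients with a CAS number, one rare one without.
toy = [
    {"inci_name": "AQUA", "product_count": 6, "cas": "7732-18-5"},
    {"inci_name": "GLYCERIN", "product_count": 3, "cas": "56-81-5"},
    {"inci_name": "RARE EXTRACT", "product_count": 1, "cas": None},
]
cas_coverage = weighted_coverage(toy, "cas")
```

In this toy set, two of three ingredients have a CAS number, but because the missing one is rare, the frequency-weighted coverage is higher than the unweighted share.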

Question 8

How is the data licensed?

Accepted Answer

Every tier includes an indefinite commercial license to the snapshot you purchase. You may integrate the data into your own app, run queries, and display the product images and information to your end users. However, you may not redistribute, resell, or publicly host the raw files (CSV/JSONL/SQLite/images) for third parties to download. Future updates are sold separately (see the next question).

Question 9

How often is the dataset updated, and how do I stay current?

Accepted Answer

We run continuous background audits and refresh the master dataset ad hoc, at least once a month (most recent: 25 June 2026). The dataset you buy is a static snapshot taken at the time of purchase. You have two ways to stay current: (1) differential refreshes, where you buy only the data points that have changed since your last purchase, at $0.0025 per updated data point; or (2) the hosted API, for real-time freshness.
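To budget for a differential refresh, the arithmetic is simply the number of changed data points multiplied by the per-point price. The sketch below uses Decimal to avoid floating-point drift; the function name is hypothetical, and it applies no rounding, minimum charge, or tax, since none of those are specified here.

```python
from decimal import Decimal

PRICE_PER_POINT = Decimal("0.0025")  # USD per updated data point


def refresh_cost(changed_points):
    """Estimated cost in USD of a differential refresh (no rounding applied)."""
    if changed_points < 0:
        raise ValueError("changed_points must be non-negative")
    return PRICE_PER_POINT * changed_points


# Example: 40,000 changed data points since the last purchase.
estimate = refresh_cost(40_000)
```

At this rate, 40,000 updated data points come to $100, and 400 data points cost $1.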

Question 10

Will there be a live API?

Accepted Answer

Yes. A hosted REST API for real-time lookups is scheduled to launch in Fall 2026. It will support search by name, filtering by ingredient, batch requests, and a partner feedback loop for flagging products for inclusion. Final pricing will be announced closer to launch; by licensing the full dataset today, you lock in early-supporter rates and priority access.

Question 11

How do API limits and overage work?

Accepted Answer

You choose one of two modes. With a hard cap, any calls beyond your monthly limit are declined as over-limit until your next cycle. With metered overage, calls beyond the limit go through and are billed per call. Email alerts are supported, and batch requests are metered per item processed (a batch of 50 products counts as 50 calls). Final rate limits will be set at launch.
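The metering rule can be sketched as a small calculation. This is only an illustration of the per-item counting described above: the names are hypothetical, and because final limits are not yet set, it works on monthly totals and does not model how a single batch that straddles the limit would be handled.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    metered: int   # calls counted against the plan (one per item)
    served: int    # calls actually answered
    declined: int  # calls rejected as over-limit (hard-cap mode)
    overage: int   # calls billed per call (metered-overage mode)


def summarize(batch_sizes, monthly_limit, allow_overage):
    """Summarize a month of batch requests under either limit mode."""
    metered = sum(batch_sizes)  # a batch of N products counts as N calls
    excess = max(0, metered - monthly_limit)
    if allow_overage:
        return Usage(metered, metered, 0, excess)
    return Usage(metered, metered - excess, excess, 0)


# Three batches (50 + 50 + 30 items) against a hypothetical 100-call limit.
capped = summarize([50, 50, 30], monthly_limit=100, allow_overage=False)
billed = summarize([50, 50, 30], monthly_limit=100, allow_overage=True)
```

In the capped case 30 calls are declined; with overage enabled the same 30 calls are served and billed per call.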

Question 12

Can I see a sample before buying?

Accepted Answer

Absolutely. Each tier has its own 1,000-product sample pack you can download instantly from this site. Every sample includes the full schema in CSV and JSONL formats so you can test your import scripts before committing to a tier.
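A quick pre-import check might confirm that a sample CSV has the columns your pipeline expects. In this sketch only image_name is a field name taken from this page; the other column names and the inline CSV are placeholders to be replaced with the names in the data dictionary.

```python
import csv
import io

# Columns your import script depends on. Apart from image_name, these
# names are hypothetical; use the ones listed in the data dictionary.
REQUIRED = {"brand", "category", "country_of_origin", "image_name"}


def missing_columns(handle, required=REQUIRED):
    """Return the required column names absent from a CSV header, sorted."""
    reader = csv.DictReader(handle)
    header = set(reader.fieldnames or [])
    return sorted(required - header)


# Inline stand-in for a sample file that lacks one expected column.
sample_csv = io.StringIO("brand,category,image_name\nExample,Skincare,1042.jpeg\n")
problems = missing_columns(sample_csv)
```

An empty result means the header matches your expectations; anything else is a list of columns to reconcile before running a full import.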

Question 13

What is your refund policy?

Accepted Answer

Due to the digital nature of the dataset and the immediate access provided upon purchase, all sales are final. Because the data cannot be 'returned' once it has been accessed, we are unable to offer refunds. We strongly recommend downloading the free sample pack for your chosen tier to verify the data quality and schema compatibility before completing your purchase.

Frequently Asked Questions
