Arthur Vander Voort
Feb 18, 2023

Optimizely PIM - Data Cleansing in PIM: Save Time by Importing Messy Data

In my previous post, I explained at length why setting up data governance is both an obstacle and a prerequisite to getting data into the PIM. Another common obstacle is trying to perfect your data before importing it into the PIM. You can spend a lot of time in an Excel spreadsheet, finding and replacing bad values across dozens or hundreds of columns. While this is a valid approach, Optimizely PIM also has functionality that lets you cleanse your data directly in the application. This article outlines an approach that speeds up your implementations without sacrificing data quality.

MessyDataSample.png

Before we get into it, here are a few caveats:

  • This approach is specifically for what we commonly call 'attribute' data – product data that has a defined list of values. Think of size, color, gauge, width, length, material, finish, wattage, and so on.
  • Depending on the number of properties and property values, this can still be a time-consuming process.
  • This approach will not help you clean up data for unique product properties like product description or product title.
  • This approach is only recommended during implementation. You must re-establish strong data governance after cleansing your data to get value from this exercise.

Set up your properties to allow bad data

Properties that only allow a defined list of values typically make up the bulk of product data. Normally, if an imported product has a value that is not on the list for such a property, that value is not imported and the bad data is isolated. The key to this approach is to intentionally configure your properties so that bad data can be ingested. You must set every property that uses a defined list of values to allow ad hoc values. With ad hoc values enabled, no data governance is enforced for that property: any value will be imported and added to the property's value list.
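To make the difference concrete, here is a minimal sketch of the two behaviors described above. It is a toy model, not PIM code: the function name `import_value` and the in-memory list are hypothetical, and the real import also isolates the rejected data for review.

```python
def import_value(value_list: list[str], value: str, allow_ad_hoc: bool) -> bool:
    """Toy model of importing one value for a list-based property.

    Returns True if the value is accepted. With ad hoc values allowed,
    an unknown value is accepted and appended to the value list.
    """
    if value in value_list:
        return True
    if allow_ad_hoc:
        value_list.append(value)  # the unknown value joins the list
        return True
    return False  # governance enforced: the unknown value is rejected


widths = ["24 in", "33 in"]

# Governance on: the messy value is rejected and the list is unchanged.
strict_result = import_value(widths, "33 inches", allow_ad_hoc=False)
strict_list = list(widths)

# Ad hoc values allowed: the messy value is imported and added to the list.
lenient_result = import_value(widths, "33 inches", allow_ad_hoc=True)
```

After the second call, `widths` holds both '33 in' and '33 inches', which is exactly the kind of duplicate the cleanup step later removes.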

If you already have all of your properties created, you can easily toggle them to allow ad hoc values:

  1. Filter the property list for properties that use a dropdown control type.
  2. Export properties and choose the option to include only the filtered list.
  3. In the exported file, set 'Allow Adhoc Values' to 'yes' for every row.
  4. Import the properties to the PIM.
SetAdhocs.gif
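If the exported list is long, step 3 can be scripted instead of edited by hand. The sketch below assumes the export has been saved as CSV and that the column header is exactly 'Allow Adhoc Values'; the other column names in the sample are hypothetical, so check them against your own export before using anything like this.

```python
import csv
import io


def set_column(csv_text: str, column: str, value: str) -> str:
    """Return the CSV text with `column` set to `value` on every data row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column not found: {column!r}")

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[column] = value  # overwrite the flag, leave other columns alone
        writer.writerow(row)
    return out.getvalue()


# Hypothetical two-row export; only 'Allow Adhoc Values' comes from the article.
sample = (
    "Name,Control Type,Allow Adhoc Values\n"
    "Color,Dropdown,no\n"
    "Gauge,Dropdown,no\n"
)
updated = set_column(sample, "Allow Adhoc Values", "yes")
```

The same function can be reused with the value 'no' when you turn governance back on at the end of the exercise.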

Import your messy product data

With the properties configured, you can now import your product data. Data governance will not prevent your messy data from being imported for these properties, which is exactly what we want.

  1. Go to imports.
  2. Select your product data file.
  3. Map your data and proceed through the import wizard.
  4. Initiate the product import.

Review & cleanse your data

With the import complete, all of the data, good and bad, has been imported and the property value lists have been updated. This is where the work starts. We are going to review the values for each property and correct any bad ones. To do this, we will filter for the properties we configured to allow ad hoc values and then work our way through the list.

  1. On the property list, apply a filter for properties that use a dropdown control type.
  2. Edit the first property.
  3. Go to the values tab.
  4. Review the list of values for bad data.
  5. For any bad value, click the edit icon and enter the preferred value (for instance, if you want '33 inches' to be '33 in', update it here).
  6. When you have cleaned up all the bad values, toggle 'allow ad hoc values' to false and save the property.
  7. Repeat until you are through all properties.
PropertyValueCleanup.gif

When the data was initially imported, any bad values were added to the property list and saved to products that had those values. When you edit the property value and save the property, we will propagate the changes to all products that had the value. This lets you remove duplicate (e.g. 24" & 24 inches) and erroneous values while automatically updating the product data.
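For properties with long value lists, it can help to get a shortlist of likely duplicates before you start clicking through values. The sketch below is a review aid only and does not talk to the PIM: it assumes you have copied a property's values into a Python list, and the normalization rules (treating `"`, 'inch', and 'inches' as 'in') are illustrative assumptions you would extend for your own units.

```python
import re
from collections import defaultdict


def normalize(value: str) -> str:
    """Reduce a value to a comparable form (illustrative rules only)."""
    v = value.strip().lower()
    v = v.replace('"', " in")              # 24" -> 24 in
    v = re.sub(r"\binch(es)?\b", "in", v)  # 24 inches -> 24 in
    return re.sub(r"\s+", " ", v).strip()  # collapse repeated spaces


def likely_duplicates(values: list[str]) -> dict[str, list[str]]:
    """Group values that normalize to the same form; keep groups of 2 or more."""
    groups = defaultdict(list)
    for value in values:
        groups[normalize(value)].append(value)
    return {key: originals for key, originals in groups.items() if len(originals) > 1}


values = ['24"', "24 inches", "33 in", "33 Inches", "48 in"]
duplicates = likely_duplicates(values)
```

Each group is a set of values that probably should be merged into one preferred value when you edit the property.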

Re-establish your data governance to maintain quality

Now that all of our property value lists have been reviewed and the product data is cleansed, we need to make sure this effort is not wasted by allowing bad data back into the PIM. In the previous step, we recommended toggling the ad hoc value option off as you finished each property, but it is worth confirming that you did not miss any. To do this, we can use the same export/import approach we used to enable ad hoc values.

  1. Filter the property list for properties that use a dropdown control type.
  2. Export properties and choose the option to include only the filtered list.
  3. In the exported file, set 'Allow Adhoc Values' to 'no' for every row.
  4. Import the properties to the PIM.
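As a final check, you can export the filtered property list once more and confirm that nothing was missed. This sketch assumes a CSV export with a flag column named 'Allow Adhoc Values' (the header used when enabling the option earlier) and a hypothetical 'Name' column; adjust both to match your file.

```python
import csv
import io


def still_ad_hoc(
    csv_text: str,
    name_column: str = "Name",
    flag_column: str = "Allow Adhoc Values",
) -> list[str]:
    """Return the names of properties whose ad hoc flag is still 'yes'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row[name_column]
        for row in reader
        if (row.get(flag_column) or "").strip().lower() == "yes"
    ]


# Hypothetical export where one property was missed during cleanup.
export = (
    "Name,Control Type,Allow Adhoc Values\n"
    "Color,Dropdown,no\n"
    "Gauge,Dropdown,Yes\n"
    "Finish,Dropdown,no\n"
)
missed = still_ad_hoc(export)
```

An empty result means every list-based property in the export is back under governance.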