
Or how to provide meaningful data for tests while preparing a data migration.

Very often in multi-functional projects, such as an ERP implementation, different teams evaluate their solutions independently of each other until the later stages of testing.  And regularly, the data migration workstream operates the same way, identifying and validating data transformations in parallel with the rest of the project.

The risk is that the new solution and the new processes are tested only with “positive” data, data that is meant to produce an expected result, and not with “negative” data, data that is not expected to cause a failure but does cause one.  Indeed, with today’s integrated ERP solutions, values entered in one functional area can negatively impact another for unforeseen reasons.

To reduce this risk and to accelerate the design of the data transformation, it is useful to work with “real” data and to sample it in a smart way.  Real data is extracted from the operational databases, transformed as necessary for the new data model, and injected into the test environment.  Let us see this process in more detail.

Sampling real data

There are two prerequisites.

  1. You must, of course, know what data or kind of data is needed. Do we need products and customers to test an order management solution? Do we need bills of materials? Do we need raw materials, engineering documents, demand forecasts, historical data…?
  2. You must obtain agreement from all parties engaged. Once the scope is understood, we can start picking pieces of data, but it is essential to explain the sampling process and obtain alignment about it. Indeed, every group will have different requirements, from high level to very detailed, and will demand data to satisfy all of them. In my experience, the less detailed the requirements are, the more data (in terms of volume) is requested. I call this the “just-in-case (I need more)” syndrome.

No alignment is a sure path to scope creep and change requests.

To make a good sample, we must also bear in mind that this is a dynamic exercise. In my experience, most tests can start with a basic sample. But it needs to evolve, in both breadth and volume. We can work it up from a basic sample of very few records to a large one with hundreds of thousands of records.

Building a basic sample

In most circumstances, the business people will know best.  No matter what the project is about, they know what the key business cases are and they know the data used for executing them.  For example, if we look at goods manufacturing: your business people will know what the different goods are, what makes them different from each other, how they are made or sold, etc.  This enables you to propose a sample based on types or categories, production methods, sites, trade channels, transportation means, accounting methods, warehousing methods, and so on.  In asset management, it might be the classes of assets, their depreciation rules, the different kinds of charges they cause, etc.

Again, such a basic sample will usually do fine for early test stages, where versatility of the data matters more than volume.  But as testing progresses, more data is required, to repeat test scenarios with an increasing number of small variances and to provide data for training purposes.  For now, we have a basic sample and, because of its small size, we can manipulate it manually, updating records one by one if needed, and transform it as required for the target solution.  Be careful, however, to avoid handpicking the data.  You want to define a set of rules to select a sample, not to select records directly from a list or from a database.
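To make the idea of rule-based selection concrete, here is a minimal sketch. It is not tied to any particular ERP; the field names (category, site) and the rules are hypothetical, and in practice the rules would come from the business cases identified with the business people.

```python
# Sketch of rule-based sampling: instead of handpicking records, we
# encode each business case as a rule and apply them to the full extract.
# Field names and rules are illustrative, not from any specific system.

def build_sample(records, rules, per_rule=2):
    """Keep up to `per_rule` records for each named rule, so every
    business case is represented without handpicking individual records."""
    sample, counts = [], {}
    for rec in records:
        for name, rule in rules.items():
            if rule(rec) and counts.get(name, 0) < per_rule:
                sample.append(rec)
                counts[name] = counts.get(name, 0) + 1
                break  # a record is taken once, for the first rule it satisfies
    return sample

# Illustrative product extract
products = [
    {"id": "P1", "category": "finished", "site": "BE01"},
    {"id": "P2", "category": "raw", "site": "BE01"},
    {"id": "P3", "category": "finished", "site": "DE02"},
    {"id": "P4", "category": "raw", "site": "DE02"},
]

# One business case per rule: finished goods and raw materials
rules = {
    "finished_goods": lambda r: r["category"] == "finished",
    "raw_materials": lambda r: r["category"] == "raw",
}

sample = build_sample(products, rules, per_rule=1)
# One finished good and one raw material, selected by rule, not by hand.
```

The point is that the sample is reproducible: rerunning the rules on a fresh extract yields an equivalent sample, which a handpicked list cannot guarantee.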

Why should we prepare the initial sample with business people?  Because as we try to transform the data, we challenge, early on, data quality and transformation rules with data that our business people can relate to.  We are not working with artificial data created from assumptions, but with actual, operational, current information.  This helps create commitment from the people, who work with their data and not with some data.  It also helps get into action before blueprinting is fully complete.  Indeed, not all teams will complete their blueprinting (or matching, or whatever you name it) at the same time, and that usually does not mean that one may sit idle until everyone else catches up.  They will want to start testing, if only to validate some assumptions.  With this approach of building an initial sample with the business people, we have the key data elements and business cases covered.  We can leap into action without losing any effort for the main data migration.

To summarize, we enable a few things:

  • Commitment from business people;
  • Data for prototyping and early tests;
  • Data that is consistent for the scope of the project and not for a single team alone;
  • Early move into action.

What if you are challenged and some people want a lot more data, or different data?  In my experience, challenging back with one short question does wonders: why?  Why is the initial sample not working for them?  Why do they need more or different data?  Very often, the additional data will not be used anyway: it is only just-in-case.  With that information in hand, you can factually push back, or accept, depending on time and resources.

Profiling and business intelligence

Ideally, this exercise needs to happen early in the project, before or during the blueprinting phase. At the latest, it should be done immediately after creating the initial sample.

Roughly speaking, profiling is about identifying patterns in the data. It is often used with data quality in mind: it helps define data quality rules, and we could easily stop there. But while this analysis remains a cornerstone of the migration, profiling can achieve more. Indeed, we can use the vast amount of information collected to understand how the business is structured. Looking into the operational or transactional data, we can gather a few kinds of items:

  • Master data, such as customers, materials or assets
  • Transactional data, such as sales orders or bills of lading
  • Reference data, such as order types, plant codes or terms of payments

This data will help answer questions such as: what kinds of orders are used by what sorts of customers, in which countries, and from which warehouses are they supplied?
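A very simple form of profiling, frequency counts per field, already reveals such patterns. Below is a hedged sketch with stdlib Python; the field names (order_type, plant, country) are illustrative, not taken from any specific ERP.

```python
# Sketch of basic profiling: count value frequencies per field to see
# which order types, plants or countries dominate the operational data.
# Field names are illustrative only.
from collections import Counter

def profile(records, fields):
    """Return a value-frequency Counter for each requested field."""
    return {f: Counter(r[f] for r in records) for f in fields}

# Illustrative sales order extract
orders = [
    {"order_type": "ZOR", "plant": "BE01", "country": "BE"},
    {"order_type": "ZOR", "plant": "BE01", "country": "NL"},
    {"order_type": "ZRE", "plant": "DE02", "country": "DE"},
]

stats = profile(orders, ["order_type", "plant", "country"])
# stats["order_type"] -> Counter({'ZOR': 2, 'ZRE': 1})
```

Cross-tabulating these counts (order type by country, plant by customer, and so on) is what turns a quality check into a map of how the business actually operates.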

We cannot go into more detail here, as there are many particular business cases, but the logic is simple. With all this information, we have a proper grasp of the data and the business logic that unites it. We are ready to discuss the new samples.


We are now able to break the data down according to the business scenarios that we need to test. And we can further limit the data with additional filters, such as a period of time: for example, selecting the data used in the last six months of operations. Obviously, there will be scenarios that do not fit within such a window. But they will be easily identified by comparing the profiling results with the selected data, and the related records can then be added to the sample.
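This window-then-compare step can be sketched as follows. It is a simplified illustration under stated assumptions: the record layout and the six-months-as-180-days cutoff are hypothetical choices, not a prescription.

```python
# Sketch: restrict the sample to a recent time window, then compare
# profiles to spot business scenarios the window missed.
from datetime import date, timedelta

def within_window(records, date_field, months=6, today=None):
    """Keep records whose date falls inside the last `months` months
    (approximated here as 30 days per month)."""
    today = today or date.today()
    cutoff = today - timedelta(days=months * 30)
    return [r for r in records if r[date_field] >= cutoff]

def missing_values(all_records, sampled, field):
    """Values present in the full data set but absent from the sample."""
    return {r[field] for r in all_records} - {r[field] for r in sampled}

# Illustrative orders, including a once-a-year seasonal scenario
orders = [
    {"order_type": "ZOR", "created": date(2024, 5, 1)},
    {"order_type": "ZSE", "created": date(2023, 1, 15)},  # seasonal order
]

recent = within_window(orders, "created", months=6, today=date(2024, 6, 1))
gaps = missing_values(orders, recent, "order_type")
# gaps == {"ZSE"}: the seasonal scenario fell outside the window and
# its records must be added back to the sample.
```

Running the same comparison on every profiled field (order type, plant, customer group, and so on) turns "did we miss a scenario?" into a mechanical check rather than a guess.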

Let’s take our sales orders again: we might have identified the types of orders used by the different customers, the products they typically purchase, the agreed shipping conditions, etc. To help with sampling, we can also build rankings and select, for example, the top 10 customers in terms of sales volume or number of transactions. We can also identify the products making up 80% of the sales, and so on. By combining the identified business cases with these rankings, we can build a very strong test database.
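Both rankings mentioned above (top-N customers, the products covering 80% of sales) are a few lines of code once the totals are aggregated. The sketch below uses made-up figures; the 80% threshold is the usual Pareto-style cut, adjustable to taste.

```python
# Sketch: rank keys by total value, and find the smallest set of keys
# covering a given share of the grand total (a simple Pareto cut).
# The sales figures are illustrative.

def top_n(totals, n):
    """The n keys with the largest totals, highest first."""
    return sorted(totals, key=totals.get, reverse=True)[:n]

def pareto(totals, threshold=0.8):
    """Smallest set of keys (largest first) whose cumulative total
    reaches `threshold` of the grand total."""
    grand = sum(totals.values())
    keys, running = [], 0.0
    for k in sorted(totals, key=totals.get, reverse=True):
        keys.append(k)
        running += totals[k]
        if running >= threshold * grand:
            break
    return keys

sales_by_product = {"A": 500, "B": 300, "C": 150, "D": 50}
leaders = top_n(sales_by_product, 2)       # ['A', 'B']
core = pareto(sales_by_product)            # ['A', 'B'] cover 800 of 1000
```

Intersecting `core` with the business cases from the basic sample gives a candidate record set that is both representative and small enough to manage.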

Supporting the project even further

With this new set of information, we are now able to assist the overall project with deeper insight into some business scenarios. We have an inventory of transactions and trends, organized in such a way that we can rank them but also pinpoint meaningful exceptions (a once-a-year order that accounts for a fair share of the turnover, for example) and oddities. All of these facts can help improve the blueprint, by including overlooked cases and removing noise.


I have shown the value of building samples and performing data profiling for business intelligence early in the project. This new set of information can efficiently support the data migration and the overall project in several ways:

  • Involve the business people early and constructively;
  • Heighten the understanding of the business for all parties involved;
  • Provide meaningful data for testing with manageable volumes.

Sometimes it is difficult to follow this approach. There are customers who will reject data profiling for fear of what it might reveal, and people who will challenge the findings. For sure, there is no “one size fits all”. In such situations, I try to educate my stakeholders with a few examples, or to keep the results for the project team alone.

Jean-Francois Minsart
Jef is a veteran SAP data consultant with over 20 years’ experience. His career has taken him around the world to support initiatives as varied as mergers, systems integration, upgrades, software development and rollouts.
