Modify a Process

A ‘process’ is a description of how to move data from one place to another. This page shows everything for a single process.

A number of options are available on the sidebar. These include:

  • Start – execute this process (move data from source to destination).
  • Preview – similar to Start, but data is not written to the destination; instead, it is available to preview after the process has run. This is useful for checking what the data looks like before actually writing it to your destination system.
  • History – show the log information for previous executions of this process.
  • Edit – what you’re doing right now.
  • Rebuild – throw away the schema information and re-create it from the underlying source and destination objects, or from the data itself.
  • Delete – permanently delete this process.

There are a number of sections on the page:

Detail

Shows the name and description of the process. During process creation, the name of the source table or file is used to name the process, but it is often a good idea to give it a more descriptive name, perhaps explaining what data is being moved, or where it is coming from and going to.

From / To

Shows the data stores that data is read from and written to. Depending on the type of data store, there may be other options available – for example, the name of the database table to read from, or the name of the Excel file where your data is stored. There may be other data store-specific fields here too. Some of the common ones are listed here:

  • Database Data Stores – allow you to specify the name of the object (table), or to choose from the available database queries associated with the data store.
  • Excel Data Stores – allow you to specify where to find the column headers and data rows.
  • CSV Data Stores – similar to Excel Data Stores, but additionally allow you to specify which characters to use as row delimiters, column delimiters and text qualifiers (see the sketch after this list).
  • Cloud Data Stores – such as Cloudant, will often ask for information such as database name, username and password.
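
To illustrate what the CSV delimiter and text qualifier settings control, here is a minimal sketch using Python’s standard csv module. The delimiter and qualifier characters are just example values, not Conductor’s defaults:

    import csv
    import io

    # ';' is the column delimiter and '"' is the text qualifier, so a
    # delimiter inside a qualified value is treated as data, not a separator.
    raw = 'name;email\n"Smith; Jane";jane@example.com\n'

    reader = csv.reader(io.StringIO(raw), delimiter=';', quotechar='"')
    for row in reader:
        print(row)
    # ['name', 'email']
    # ['Smith; Jane', 'jane@example.com']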

Keys

If your process is merging data into your destination data store (instead of overwriting – the default action), then you will need to specify one or more key columns so Conductor knows which rows to merge. For a key to work, the combination of data in the selected columns must be unique across the entire table. For example, if you were merging a list of contacts you might use the email column as a key, because no two contacts should share the same email address.
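
To illustrate the uniqueness requirement, here is a minimal sketch that checks whether a candidate key (one column, or a combination of columns) is unique across a set of rows. The function name and contact data are invented for the example:

    def is_valid_key(rows, key_columns):
        """Return True if the combination of key column values is unique."""
        seen = set()
        for row in rows:
            key = tuple(row[col] for col in key_columns)
            if key in seen:
                return False  # duplicate key: merging could not match rows reliably
            seen.add(key)
        return True

    contacts = [
        {"name": "Jane Smith", "email": "jane@example.com"},
        {"name": "John Smith", "email": "john@example.com"},
    ]
    print(is_valid_key(contacts, ["email"]))  # True: email works as a key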

If you are only overwriting or appending, you generally don’t need to specify any keys, although some data stores, such as SQL Azure, require at least one key column to be defined. When you initially create the process, the keys will be determined automatically if possible, but you can change them here.

Schema

Here, you can see which columns will be used to read data and which columns they map to for writing data. These columns and their mappings are determined automatically when the process is created, but you can change them here.

To edit a column’s name or data type, double-click it, edit its details in the pop-up window, then click OK. From here, you can also delete a column entirely, or just tell the system not to use it. It’s important to note that changing a column here does not change the source or destination object itself, only the definition used by the process.

When the process was created, we tried to map the source and destination columns as best we could, but sometimes our best guess isn’t quite what you wanted. You can remap columns here by single-clicking the source column, then single-clicking the destination column it should map to. This forces a mapping between those columns and has another go at remapping the rest. If a column can’t be mapped automatically, it will show in the list of unmapped columns at the bottom of this section – you can manually map these in the same way.

An indicator will appear beside each mapping showing whether it is a good match or a poor one. Some fields will not automatically map to each other because they have incompatible data types, even if their names match; in that case, the next closest match will be selected, or the column will simply remain unmapped.

If you want to rebuild the process and re-create the schema mappings from scratch (either because the underlying objects have changed or because something else went wrong), you can click the Rebuild button on the left side of the screen. This throws away all schema mappings and recreates them from the underlying data store objects.

Options

The controls in this section change how the process responds to problems encountered while rebuilding or running it.

Tolerate – if these conditions are encountered while running a process, you can choose to log them as a warning and carry on (checked), or stop the process (unchecked):

  • Truncation – data going into a field is bigger than the field can hold, so some of it will get ‘cut off’ (see the sketch after this list).
  • Ambiguity – confusion over which fields map together, because more than one combination of fields is plausible.
  • Uncertainty – the source column type is not completely compatible with the destination column type.
  • Unusual – a conversion from source to destination column is possible but unusual, e.g. time to integer.
  • Missing Source Columns – we expected more columns in the source data but they were missing.
  • Additional Source Columns – we found more columns in the source data than we were expecting.
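
As a sketch of the difference between carrying on with a warning (checked) and stopping (unchecked), here is what tolerating truncation might look like. The function and field width are hypothetical, not Conductor’s internals:

    import warnings

    def fit_value(value, width, tolerate_truncation):
        """Fit a string into a fixed-width field, truncating if allowed."""
        if len(value) <= width:
            return value
        if tolerate_truncation:
            # Checked: log a warning and carry on with the truncated value.
            warnings.warn(f"Truncating {value!r} to {width} characters")
            return value[:width]
        # Unchecked: stop the process.
        raise ValueError(f"{value!r} does not fit in a {width}-character field")

    print(fit_value("Northamptonshire", 10, tolerate_truncation=True))  # 'Northampto'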

Destination Action – the options available will vary depending on your destination data store type, but can include:

  • Replace existing data – delete any existing data first then write in the new data.
  • Append to existing data – append the new data onto the end of any existing data.
  • Merge new and changed data (insert/update) – merge data where it matches or append if it doesn’t.
  • Merge all data (insert/update/delete) – fully synchronise the destination with the source, including deleting rows that don’t exist in the source. A full source dataset is required for this (see the sketch after this list).
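
To make the two merge actions concrete, here is a minimal sketch that upserts source rows into a destination by key, optionally deleting destination rows that are missing from the source. The row data is invented, and this illustrates the behaviour rather than Conductor’s implementation:

    def merge(dest, source, key, delete_missing=False):
        """Upsert source rows into dest by key; optionally delete rows
        that no longer exist in the source (full synchronisation)."""
        by_key = {row[key]: dict(row) for row in dest}
        for row in source:
            by_key[row[key]] = dict(row)  # update a match, or insert a new row
        if delete_missing:
            source_keys = {row[key] for row in source}
            by_key = {k: r for k, r in by_key.items() if k in source_keys}
        return list(by_key.values())

    dest = [{"email": "jane@example.com", "city": "Leeds"},
            {"email": "old@example.com", "city": "York"}]
    src = [{"email": "jane@example.com", "city": "London"}]

    print(merge(dest, src, "email"))                       # jane updated, old kept
    print(merge(dest, src, "email", delete_missing=True))  # only jane remains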

Column matching threshold – how strict we are when trying to determine which fields map to each other. A lower threshold gives more flexibility, but could result in fields being mapped that should have remained unmapped. A higher threshold could leave fields unmapped that should have been mapped, but weren’t close enough in name or type.
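
To give a feel for what the threshold means, this sketch scores column names with Python’s difflib and accepts a mapping only if the best score reaches the threshold. The scoring method and column names are illustrative; Conductor’s actual matching also considers data types:

    from difflib import SequenceMatcher

    def map_column(source_name, dest_names, threshold):
        """Return the closest destination column name, or None if the best
        score falls below the threshold (the column stays unmapped)."""
        best, best_score = None, 0.0
        for dest in dest_names:
            score = SequenceMatcher(None, source_name.lower(), dest.lower()).ratio()
            if score > best_score:
                best, best_score = dest, score
        return best if best_score >= threshold else None

    dests = ["EmailAddress", "FullName", "PostCode"]
    print(map_column("email", dests, threshold=0.5))  # 'EmailAddress'
    print(map_column("email", dests, threshold=0.9))  # None: stricter, stays unmapped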

Anonymization

This lets you randomise your data, giving you secure, anonymous data to test with. Here’s more on how it works and why you might want to use it:

Conductor can use your source data to seed the random changes that make your data anonymous, so you can test with it freely. Data structures are classified as names, addresses, or identification numbers, and the source data is used to seed the new values, so the changes are repeatable.
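
As an illustration of seeded, repeatable anonymisation, this sketch derives the replacement value from a hash of the source value, so the same input always produces the same anonymous output. The classification and name pool are invented for the example:

    import hashlib
    import random

    FIRST_NAMES = ["Alex", "Sam", "Jo", "Chris", "Pat"]

    def anonymise_name(value):
        """Seed the random choice from the source value, so runs repeat."""
        seed = int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)
        return random.Random(seed).choice(FIRST_NAMES)

    print(anonymise_name("Jane"))                            # same output every run
    print(anonymise_name("Jane") == anonymise_name("Jane"))  # True: repeatable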

Rules

By adding one or more rules, you can filter the incoming data, rejecting anything that doesn’t match. If no rules are defined, all data is accepted.
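
As a sketch of how rule-based filtering behaves (the rule format here is hypothetical, not Conductor’s), each rule is a predicate, a row is accepted only if it matches every rule, and with no rules everything passes:

    def apply_rules(rows, rules):
        """Keep only rows that satisfy every rule; no rules accepts all."""
        return [row for row in rows if all(rule(row) for rule in rules)]

    rows = [{"country": "UK", "age": 34}, {"country": "FR", "age": 17}]
    rules = [lambda r: r["country"] == "UK", lambda r: r["age"] >= 18]
    print(apply_rules(rows, rules))  # [{'country': 'UK', 'age': 34}]
    print(apply_rules(rows, []))     # no rules: all rows accepted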

Related Topics

  • Edit a Process
  • New Process
  • Process Page
  • Process Group