Best Practices for Managing Data Sets

General Best Practices


  • Make sure that every data set you create or import has a name and a description with specific details about what the data set contains.
  • Don't include a date range in the data set name. This also helps you avoid having to rename the data set later.
  • In the data set description, write a basic summary of what data is being pulled.
  • Always designate an owner for each data set. This person is responsible for the data set and receives an alert if it breaks.
  • Document the inputs and outputs of each data flow so future users can track down information relevant to the data set or data flow. In addition, note how the data flow runs (for example, automated or scheduled).
  • Append Calc if the data flow includes calculated fields.
  • Name data sets using the following template: Type_ClientInfo_Source_ReportName

  • Recommended name prefixes (Type) and what each means:
  1. Raw_ : raw data pulled directly from a source
  2. INT_ : intermediary data set used within a data flow
  3. Dev_ : data that is being audited; change to Prod_ once the audit is complete
  4. Prod_ : production, used for final data sets; these are the data sets you can build cards on
  5. Temp_ : used for testing, development, and ad hoc data sets; these should be audited periodically

  • Recommended naming suffix:

  1. Calc: calculated fields have been added
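
As a rough illustration, the naming template above can be checked with a small script. The prefixes come from the list above; the function name and the example data set names are hypothetical.

```python
# Prefixes from the naming convention above.
PREFIXES = ("Raw_", "INT_", "Dev_", "Prod_", "Temp_")

def follows_convention(name):
    """Check a data set name against Type_ClientInfo_Source_ReportName."""
    if not name.startswith(PREFIXES):
        return False
    # After the prefix, expect at least three underscore-separated parts:
    # client info, source, and report name.
    rest = name.split("_", 1)[1]
    return len(rest.split("_")) >= 3

# Hypothetical example names:
print(follows_convention("Prod_AcmeCorp_Salesforce_MonthlySales"))  # True
print(follows_convention("Sales Data 2021-01"))                     # False
```

A check like this could run as part of a periodic audit to flag data sets that drift from the convention.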

Data Governance

If your company wants to create an extensive governance model, you have the option of adding comments to a Beast Mode calculation for a data set. These comments, which are essentially metadata within the Beast Mode calculation itself, can identify the author of the calculation, its creation date, and a description:

/*
Author:
Created date:
Description:
*/
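
For example, a Beast Mode calculation with a governance comment might look like the following. The author, date, field names, and formula are hypothetical; only the comment structure follows the template above.

```sql
/*
Author: J. Smith
Created date: 2024-03-01
Description: Gross margin as a fraction of revenue
*/
(SUM(`Revenue`) - SUM(`Cost`)) / SUM(`Revenue`)
```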

When testing the connection for a new data set, always set it to update manually instead of trying to set up an automatic feed.

Audit data sets on a regular basis to make sure there are no redundant data sets, data sets with zero cards, or data sets with zero data flows connected to them.

The MajorDomo can audit cards by sorting by the number of cards. Unused data flows are trickier to distinguish, but if a data flow hasn't run in one to two months, that is an indicator that it has no schedule; you may be able to delete it, or investigate with the owner what is going on.
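
The audit criteria above can be sketched as a simple filter over exported data set metadata. The field names (`cards`, `dataflows`, `last_run`) and the sample records are hypothetical; real values would come from a governance export such as a DomoStats data set.

```python
from datetime import date

# Hypothetical metadata export; real values would come from a
# governance data set such as DomoStats.
datasets = [
    {"name": "Prod_Acme_Salesforce_Sales", "cards": 12,
     "dataflows": 2, "last_run": date(2024, 3, 1)},
    {"name": "Temp_Acme_Test_Scratch", "cards": 0,
     "dataflows": 0, "last_run": date(2023, 11, 5)},
]

def audit_candidates(datasets, today, stale_days=60):
    """Flag data sets with zero cards, zero connected data flows,
    or no run in roughly the last two months."""
    flagged = []
    for ds in datasets:
        stale = (today - ds["last_run"]).days > stale_days
        if ds["cards"] == 0 or ds["dataflows"] == 0 or stale:
            flagged.append(ds["name"])
    return flagged

print(audit_candidates(datasets, today=date(2024, 3, 15)))
```

Flagged data sets are candidates for deletion or for a follow-up conversation with the owner, not automatic removal.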

Have a process in place for users to:

  • Upload the data.
  • Validate that the data is correct, either in Workbench or another tool.
  • Validate the data again in Domo to make sure your numbers are projecting as expected.
  • Go through every step of the process to ensure your data is correct and you can build cards from it.
  • After a card is built, have the data owner validate it.
Have those who audit data always check for errors and run failures. Make sure these people have credentials and access to the account if they are not the credential owners.

It is good practice to audit the Data Center once a month to make sure all of the important data flows and data sets are running. Also confirm that all credentials are working, and reauthenticate any that are out of date.