Walking through data models in Power BI for better data visualization
*Data models in Power BI*
The goal of this learning path is to cover the different options available when building data models in Power BI. The purpose of modeling data is to ensure that it can be represented accurately and meaningfully in visualizations, and we will start off by covering various aspects of data modeling in Power BI. At a high level, data models determine how data is represented in the first place. This starts with the very basics, such as setting the data type of each column in a table and the formatting for each of those columns, both of which affect how the data appears when it is used in visualizations.
What is Data Modeling in Power BI?
Now, throughout this learning path, we are going to explore various ways in which data can be modeled. The reason for covering such a breadth of techniques is that data models have a huge bearing on the accuracy of the data conveyed in visualizations and reports. For instance, if you need to calculate the total expenses of the various departments in an organization, you will need to correctly sum the expenditures across all of the departments. Furthermore, modeling data has a huge impact on the overall performance of visualizations: model the data correctly, and your charts load quicker and interactions with them remain smooth, ensuring a high-quality user experience. And, of course, there are various intricate details when it comes to optimization, many of which will be covered in this learning path. So, from the importance of data models, let's now dive a little deeper and see what modeling specifically involves.
Well, one step will be to load the data which is required. While loading the data may fall under the purview of preparing data for analysis, the operative phrase here is data that is needed: when it comes to modeling, it is important that only the required data is selected and loaded into Power BI, since smaller datasets lead to improved performance. Furthermore, it is also important to set the right data type for each of the fields. Once we load data into columns, we should make sure that each column is mapped to the correct data type. For example, if a specific column contains dates, we should model it as a date, which will allow us to perform a lot of time-related analysis using that column.
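To make the data type point concrete, here is a minimal, hedged sketch of a calculated column that converts dates stored as text into a proper date type; the Orders table and the Order Date Text column are assumptions made for the example.

```
// Hypothetical calculated column on an Orders table: convert a text value such
// as "2023-04-15" into a true Date so that time-related analysis becomes possible.
Order Date = DATEVALUE ( Orders[Order Date Text] )
```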
Furthermore, the data which we load into Power BI in its raw form may not be sufficient for the analysis we want to perform. For instance, we may have order information which includes the quantity of each product in the order along with its price, and we will need to derive the value of each order by multiplying price and quantity. This kind of derivation is also part of data modeling.
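A minimal sketch of such a derived (calculated) column follows; the Orders table and its column names are assumptions made for the example.

```
// Hypothetical calculated column on an Orders table: derive the value of each
// order row by multiplying the quantity ordered by the unit price.
Order Value = Orders[Quantity] * Orders[Unit Price]
```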
Setting up relationships between different tables which contain related data is another part of modeling. Very often when working with Power BI, you may have related data split across multiple tables, and if you're familiar with relational databases, you will know that this is often done to optimize space utilization and avoid duplication. When data is split across tables, however, we need some way to combine it when required, and this is where relationships come into the picture. So, having summarized each of these steps, let's dive into each of them in a little more detail, starting with the loading of data into Power BI.
How is data modeling done?
So first of all, we should have a good idea of what needs to be presented in our reports, and based on that we need to ensure that all the data we need is loaded. Ideally, the data will be in a form where we can use it as is, but in some cases we may need to derive additional fields from the existing data. An easy trap to fall into when loading data is bringing in a lot more than what we really need. At first, this won't seem to have any impact on either overall performance or the building of visualizations, but over time, having too many fields leads to clutter, and picking the right field for analysis becomes more painstaking. Beyond that, when loading data in import mode, loading more than is required will use up resources unnecessarily.
Even if we're not working in import mode and have set the storage mode to DirectQuery, we need to consider that retrieving more data from the underlying data source means greater network traffic and, with it, higher latency. So, all of these are important factors to consider when loading data into Power BI. Another step which we looked at was setting the right data type for each column.
At a very basic level, any field which contains numeric data should be modeled as a numeric type, whether a decimal number or a whole number, because this is what we will probably need to perform aggregations on; we cannot, for example, calculate an average over text. Furthermore, column types are also important when it comes to establishing relationships. A relationship can only be set up when there is at least one common column between two tables, and by common I mean that the values as well as the types must correspond. Let's move on then to deriving data. This is where we may need to create calculated columns within our tables by performing operations on other existing columns.
The example I cited a little earlier is where we have sales data in a table which includes the quantity of a product ordered along with its price, and we simply multiply those two columns to get the total sale amount. One type of data derivation which we will make extensive use of in the demos of this learning path is the creation of measures using DAX.
While a calculated column gives us a value for each row in a table, a DAX measure may perform a row-by-row calculation followed by some kind of aggregation. And speaking of aggregation, this can be an important operation when it comes to deriving data; another important operation is filtering. In fact, we can create measures using DAX which perform some filtering along with an aggregation. For instance, we may set a filter for all of the sales recorded in the region of North Africa and then aggregate them to get the total sales in North Africa.
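Here is a minimal, hedged sketch of such a measure; the Sales table, its columns, and the region value are assumptions made for the example.

```
// Hypothetical measure: filter the Sales table to one region, then aggregate
// the row-by-row product of quantity and price into a single total.
North Africa Sales =
CALCULATE (
    SUMX ( Sales, Sales[Quantity] * Sales[Price] ),
    Sales[Region] = "North Africa"
)
```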
Moving along then to defining relationships between different tables. When it comes to modeling data, relationships are often the first thing that comes to mind. If you have any background in relational databases, you will know that data is often split across multiple tables to avoid duplication and to improve performance. So, in analyzing bank information, we may have one table containing customer data and another containing the details of their bank accounts. In order to set up a relationship, the two tables need to have some common column, and as discussed, it has to be of the same type in both.
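Once such a relationship is defined in the model, it can be put to work in calculations. As a hedged sketch, a calculated column on the accounts table could pull in the related customer's name; the Accounts and Customers tables, their columns, and the many-to-one relationship between them are assumptions made for the example.

```
// Hypothetical calculated column on the Accounts table, relying on an existing
// many-to-one relationship from Accounts[CustomerID] to Customers[CustomerID].
Customer Name = RELATED ( Customers[Name] )
```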
When it comes to setting up relationships, it's not just the common column we need to worry about; we also need to ensure that the cardinality of the relationship is set correctly. So, if one customer can have multiple bank accounts, then the relationship should be set as 1:many. If an account can also belong to multiple customers, say in the case of a joint account, while a single customer can still have multiple accounts in the same bank, the relationship is better represented as many:many. Then there is also the direction of filtering, known as cross filtering in the context of Power BI, which can be set to a single direction or to both. Sticking with the example of customers and bank accounts: should we be able to filter bank accounts based on customers, or customers based on bank accounts? Later on, in the demos of this learning path, we will see that this can have a bearing on some aggregations. So, getting the relationships right in a data model is essential in order to portray accurate information in our Power BI reports, and also to ensure that they perform in an optimal manner; a sketch of how the cross-filter direction can be adjusted for a single calculation follows.
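As a hedged illustration, DAX's CROSSFILTER function can override the cross-filter direction of a relationship within one calculation; the Accounts and Customers tables, their columns, and the relationship between the two CustomerID columns are assumptions made for the example.

```
// Hypothetical measure: count customers while letting filters applied to the
// Accounts table flow across to Customers in both directions, for this calculation only.
Customers With Accounts =
CALCULATE (
    DISTINCTCOUNT ( Customers[CustomerID] ),
    CROSSFILTER ( Accounts[CustomerID], Customers[CustomerID], Both )
)
```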
Some important tools for data modeling:
What are some of the tools which we can use in order to help us model our data correctly? Well, one of them is a date table. When it comes to data analysis, time is one of the crucial dimensions along which analysis can be performed, and time can, of course, be expressed at various degrees of granularity. For example, if we're analyzing sales data, we may want to do so over a year, a quarter, or a month. So when we look at a date, we should see it not just as a single value, but as a number of different values which can be arranged in a hierarchical form, and this is where Power BI's date tables enter the picture. In a date table, we can not only express a date through its various dimensions, such as year, quarter, and month, but also format each of those dimensions in a manner which is suitable for our visualizations. For example, we may represent the month of February using the text February and also the number 2. A date table can be treated like any other Power BI table and related to other tables.
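To make this concrete, here is a minimal sketch of a date table defined as a DAX calculated table; the table name and the date range are assumptions made for the example, and further columns (such as a month number for sorting) could be added in the same way.

```
// Hypothetical calculated date table: one row per day over an assumed range,
// with the month as a three-letter string plus quarter and day columns.
Date Table =
ADDCOLUMNS (
    CALENDAR ( DATE ( 2020, 1, 1 ), DATE ( 2025, 12, 31 ) ),
    "Month", FORMAT ( [Date], "MMM" ),
    "Quarter", "Q" & QUARTER ( [Date] ),
    "Day", DAY ( [Date] )
)
```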
When it comes to producing the different dimensions of a date, we can enlist the help of the DAX formula language to design the table, as in the sketch above. DAX allows us to extract specific pieces of information from a date, such as the day of the week, and also helps in formatting that information. There are four columns in the example: the first holds the Date in year, month, and day form; the Month column represents the month as a three-letter string; and separate columns hold the Quarter and the Day. The Date column can be used to join the date table with any other table containing a date field, for example the purchase date of a product. Let's move along then to another family of techniques for data modeling in Power BI, and consider data which is inherently hierarchical in nature.
Example:
For example, if we have geographical data such as country, state, and city in a particular table, these can be arranged in a hierarchical structure within Power BI rather than being treated as three entirely independent string columns. The reason this can be beneficial is that Power BI allows us to use such hierarchical data within visualizations, for example to drill down from the country level to the state level and further down to the city level. As we move up and down the hierarchy, we will be able to see values aggregated at each level: sales at a country level, at a state level, and so on. Beyond that, related data can also be grouped together into a single unit. Another important consideration when modeling data is establishing row-level security, which determines what data a user can access in the Power BI service. This works because users can be assigned to roles, and in the demos we will see that row-level security can be set up to restrict users to just their own data.
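As a hedged illustration, a row-level security role in Power BI Desktop is typically defined with a DAX filter expression on a table; here the Sales table and its Owner Email column are assumptions made for the sketch. Users or groups mapped to the role in the Power BI service then only see the rows that pass the filter.

```
// Hypothetical filter expression for a security role: each signed-in user sees
// only the Sales rows whose Owner Email matches their own login.
Sales[Owner Email] = USERPRINCIPALNAME()
```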