With PowerBI you have the ability to connect to multiple data sources that can then feed into your data model. In a school which uses Microsoft, it would not be uncommon that files are saved on SharePoint sites and made accessible to groups of people. Since they are in the cloud, connecting to them via PowerBI can be automated and updates can occur at regular intervals via the option to set a scheduled refresh.
The rest of this post goes through the steps to get a single Excel table from a SharePoint site into PowerBI. In the past, I’ve fumbled around and managed it but with Power Query automatically creating parameters and sample files; neither of which are needed.
I’m a big fan of using data flows to keep most of the data processing (in Power Query) and storage separate from the data model. It is much more flexible and potentially improves performance when working on the model in PowerBI Desktop (this is entirely hypothetical – I haven’t tested this!). The example below shows the steps of setting this up in a data flow, however most of the same steps would apply if you import the data directly to PowerBI Desktop.
Once you’ve mastered the steps below, it then becomes relatively easy to merge multiple Excel files that have the same structure (something I’ve done for assessment data) or to connect to the Excel file results from Microsoft Forms (as I’ve done for our pupil voice data).
To start with, within your workspace create a new Dataflow. This is essentially Power Query online and has most of the same functionality as is in PowerBI Desktop.
The next screen asks you to start creating your data flow. Here, you just need to select ‘add new entities’.
The next screen shows all of the different data sources that you can connect to. In this instance we want to find the SharePoint Folder. Don’t be tempted to click the Excel icon!
Once you’ve done that, you need to enter your SharePoint site URL. It’s important that you this is the top level of the site, rather than any sub-folders that are contained within. For example, if your file is found at http://mysite.sharepoint.com/teams/teaching/assessment/year-9, then the address you need to put in the site URL is http://mysite.sharepoint.com/teams/teaching/.
You’ll be asked to sign in to the SharePoint site and once this is done click next (4).
Assuming you’ve entered the SharePoint site correctly, you’ll be presented with a list containing all the files in that SharePoint site. You don’t need to do anything here other than click, transform data in the bottom right.
We’re now into the Power Query interface that enables us to filter and search for particular files, file types, folders and so on.
In this instance, because I am importing one file that will never have a file of the same name in the SharePoint site, I’ll just search for the name of the file and select it.
You’ll now only have one row showing. We can go a step further now and remove all columns except the ‘Content’ column. To do this, right hand click on the Content column and click ‘Remove Other Columns’.
Now add a custom column from the Add column menu in the ribbon. Name the column whatever you want, but add the following formula: Excel.Workbook([Content])
You now have an extra column with the two ‘expand arrows’ next to the column name. Click on them, deselect ‘Use original column name as prefix’ then click OK.
After clicking OK, you be presented with this which shows the content of the Excel file. It will identify every sheet and defined table that exists within the workbook. In this case, I have a table defined (named Table1) that I want to use so I’ll filter to only show the tables and not the sheets.
We can now remove more columns. At this point, all that is needed is the data column. So right hand click on that and then click ‘remove other columns’.
We’re nearly there! Expand the data table, select the columns you want to import, deselect ‘use original column name as prefix’ and click OK.
Finally, we get the contents of the Excel file inside Power Query. The next step is to change the data types for each column, followed by any other processing in Power Query. Name the query, then click ‘Save & close’.
The dataflow will run a validation process to check for errors and assuming all is good you will be presented with the option to refresh now or set a scheduled refresh. I like to keep everything automated to minimise the maintenance I need to do to keep the system ticking over and so usually choose to set a refresh schedule.
The frequency that you choose for the data to be updated really depends upon how often it the data changes. But it isn’t a big performance drag for me to to run these updates on PowerBI Service so I’ll have regular updates even if the data doesn’t change that frequently.
And that’s it! The data in that particular Excel file will now feed into my dataflow several times a day automatically. From here, the next step would be to connect the dataflow to your data model and create the relevant relationships between the imported table and the rest of the model.