Using PowerBI to create a dynamic mark book (2)

The first post on this topic (Using PowerBI to create a dynamic mark book) outlined why I created the ability for our teachers to add their assessment data to the PowerBI app, and what the report pages look like once that data is in and connected with the rest of the model.

This post is more of a how-to and is focused on combining data from multiple sheets in multiple Excel files that are stored on a SharePoint site. There are a few more stages beyond what is in this post which I’ll go through in the final post on this topic. However, this will get you 90% of the way towards getting your data in the right shape for creating a dynamic mark book.

Step 1 is to export your data from your MIS in the correct format. Our IT gurus have set up a custom report in ISAMS that pulls each teacher's classes with the other custom columns required. There is one Excel file per teacher and each class has a separate sheet. These are all saved in folders on our SharePoint site, which is easily accessible from within Teams.

To access the assessment data input files, teachers navigate to the document library for our SharePoint site (1), then their department (2).

Within each department folder are files for each teacher. With the help of some admin support, we downloaded each teacher's file and saved them in the correct folder using the Initials Subject naming format.

Each teacher has their own file to add assessment data.

The content of the file looks a little like the screenshot below. The columns in the first red box (1) are pre-populated in the custom report. The other column headings are created in the custom report, but contain no data – the assessment info must be added to these columns by the teacher.

It is important to note that the structure of the sheet cannot be changed by teachers: they cannot reorder columns, add new columns or rename columns. This is the structure that everyone must work with.

The first assessment shown here has the name 'Causes of climate change starter quiz'. To add a new assessment, the rows must be copied and pasted below the previous assessment, as shown in (2). The details of the new assessment are then changed. Further assessments are added by copying and pasting the rows down the page.

The columns which I have highlighted in yellow must be completed for each assessment. As mentioned, the first six columns shown in (1) are pre-populated, so nothing needs changing there. The assessment date must be added in dd/mm/yyyy format, then the assessment name, marks scored and marks available columns also need completing. Other columns provide more information about the assessment, allowing more thorough analysis, but they are not required for the system to work.

The sheet names at the bottom of the page (3) are Sheet1, Sheet2 etc. upon the original download; however, the user can rename them as they like. If a user wants to add a sheet that won't be imported into the data model, they can create a new sheet and prefix its name with a '-'. Some departments have used this to add assessment details that are common across a department, or assessment percentage to Wellington grade lookup tables that can be used in VLOOKUPs.

An example assessment data input spreadsheet for one class. Columns include StudentID, StaffID, Subject, Set, Fullname, Teacher Initials, Topic, Sub-topic, High / low stakes, Assessment type, Assessment date, Assessment name, Marks scored, Marks available, Assessment %, Wellington grade, Late (L), Incomplete (I), Redo (R) and Comment.

The next stage is to set up a dataflow that extracts the data from all sheets (except those starting with '-') in all Excel files in the Assessment data folder in the WCCSI Senior School SharePoint site. I have outlined the first stages of this in a previous post – How to connect PowerBI to Excel workbooks in a SharePoint site. However, in this instance, rather than filtering on filename to select the relevant files, we want to filter to show all files in a particular folder path. To do so, go to the Folder Path column and choose Text Filters > Begins With. Enter the SharePoint address for the assessment data folder (and not any sub-folders within it). You should now be able to see all the files within the folder. At this stage, I also filter the Extension column so that only Excel files are included.

Filtering the Folder Path in PowerQuery to include all files within a particular folder.
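For reference, the M code that Power Query generates up to this point looks something like the sketch below (visible via the Advanced Editor). The site URL and folder path are placeholders for your own SharePoint site, and I'm assuming the workbooks are saved as .xlsx.

```
let
    // Connect to the SharePoint site (replace the URL with your own)
    Source = SharePoint.Files("https://contoso.sharepoint.com/sites/SeniorSchool", [ApiVersion = 15]),
    // Keep only files whose folder path begins with the assessment data folder
    #"Filtered Folder Path" = Table.SelectRows(Source, each Text.StartsWith(
        [Folder Path], "https://contoso.sharepoint.com/sites/SeniorSchool/Shared Documents/Assessment data/")),
    // Keep only Excel workbooks
    #"Filtered Extension" = Table.SelectRows(#"Filtered Folder Path", each [Extension] = ".xlsx")
in
    #"Filtered Extension"
```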

From here, follow the instructions in the other post until you have the columns Name, Data, Item, Kind and Hidden.

At this stage, we need to jump back in and make a few other tweaks. Firstly, we want to remove any sheets whose name starts with a '-'. To do so, click on the down arrow by the Name column (or Name.1 in this instance), go to Text Filters and then choose Does Not Begin With.

Removing sheets that start with a particular prefix enables any sheet name to be imported, except those that start with the prefix.
Using the filter panel to remove sheets that start with a ‘-‘.
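In the Advanced Editor, this filter is a single Table.SelectRows step, roughly as below. The step names are illustrative; #"Combined Sheets" stands in for whatever your previous step is called after following the other post.

```
// Keep only sheets whose name (Name.1 after the earlier combine) does not begin with '-'
#"Filtered Sheet Names" = Table.SelectRows(#"Combined Sheets",
    each not Text.StartsWith([Name.1], "-"))
```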

It’s also necessary to keep only the sheets and not any other kind of data in the Excel files. To do this, filter the Kind column to keep only the sheets.

Filtering out any other type of data except worksheets.
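The equivalent step in M:

```
// Keep only worksheets; Excel workbooks can also contain tables and named ranges
#"Filtered Kind" = Table.SelectRows(#"Filtered Sheet Names", each [Kind] = "Sheet")
```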

We can now continue by removing all the columns except the Data column, which we can expand by clicking on the two-arrows icon shown at (1) below.

Having filtered to just keep the relevant worksheets of the Excel files in the designated SharePoint folder, we can now extract the data from each of those sheets.

It isn't necessary to use the original column name as a prefix, so I always deselect it.

Extracting the data from each Excel sheet.
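Behind the UI, these two steps look roughly like the sketch below. Because headers haven't been promoted yet, Power Query names the expanded columns Column1, Column2 and so on; I've listed just five for illustration, but the list generated for you will match the actual width of your sheets.

```
// Keep only the Data column, which holds each sheet's contents as a nested table
#"Removed Other Columns" = Table.SelectColumns(#"Filtered Kind", {"Data"}),
// Expand the nested tables ('use original column name as prefix' deselected)
#"Expanded Data" = Table.ExpandTableColumn(#"Removed Other Columns", "Data",
    {"Column1", "Column2", "Column3", "Column4", "Column5"})
```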

Having extracted the data, we now have the contents of every sheet (except those starting with ‘-‘) in the folder. Earlier I mentioned that the column order cannot be changed by the user. If they do change the column order then the data input will not work. For example, if a teacher deletes the assessment type column, then the assessment date column will move to the left and not align with the assessment date column in other sheets. As such, when the data types are set, this will produce an error and the rows will be removed.

There is still work to do. The correct column names are not yet in the header (1), column names are repeated (2) and there are blank rows (3). The following screenshots and description outline how to manage this.

Data from the worksheets is now extracted, but there is still tidying up to do.

The first step is simply to promote the first row to headers. Click the table icon to the left of the first column header and then select Use first row as headers.

Using the first row as headers.
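The generated step is a straightforward Table.PromoteHeaders:

```
// Promote the first row of the combined table to become the column headers
#"Promoted Headers" = Table.PromoteHeaders(#"Expanded Data", [PromoteAllScalars = true])
```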

We’ll now remove null rows (2), rows that are blank (3) and rows that contain a repeat of the column heading (4) by filtering the StudentID column (1). Notice how I’ve toggled to the advanced settings so that multiple clauses can be added.

Filtering on the StudentID column to remove rows containing blanks, nulls and the text ‘StudentID’.
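The three clauses added in the advanced filter dialog translate into one Table.SelectRows step:

```
// Remove rows where StudentID is null, blank, or a repeat of the column heading
#"Filtered Rows" = Table.SelectRows(#"Promoted Headers",
    each [StudentID] <> null and [StudentID] <> "" and [StudentID] <> "StudentID")
```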

Since we extracted the data from each sheet manually, we also need to set the data type for each column (PowerQuery does not do this automatically using this method). To do so, click on the icon next to the column name (1) and then select the relevant data type (2).

Selecting the relevant data type for each column.
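In M this becomes one Table.TransformColumnTypes step. The sketch below covers a few of the columns; the optional culture argument ("en-GB") is my assumption for making dd/mm/yyyy dates parse reliably (the UI adds it if you use Change Type > Using Locale), and the column names are those from the input sheet.

```
// Set data types explicitly; "en-GB" interprets dates as dd/mm/yyyy
#"Changed Type" = Table.TransformColumnTypes(#"Filtered Rows", {
    {"StudentID", type text},
    {"Assessment date", type date},
    {"Marks scored", type number},
    {"Marks available", type number}
}, "en-GB")
```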

It's also important to remove any rows that throw up errors when the data types are set. In my setup, I remove errors in the assessment date, marks scored and marks available columns. Essentially, if the assessment date (1) is not entered in dd/mm/yyyy format, or any numerical fields (2 & 3) are not entered as a number, then the rows are removed and the data will not be added.

Removing any rows that do not follow the correct data types.
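Table.RemoveRowsWithErrors takes an optional list of columns to check, so only the three critical columns need to be error-free:

```
// Drop any row where these columns failed the type conversion above
#"Removed Errors" = Table.RemoveRowsWithErrors(#"Changed Type",
    {"Assessment date", "Marks scored", "Marks available"})
```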

The final transformation I make in this part of the process is to remove the assessment % column that is extracted from the sheets, and add a custom column which calculates the percentage (marks scored divided by marks available). This minimises the risk of data entry errors and makes it quicker for staff to add data to the spreadsheet. To add the new assessment % column, click on add column (1), then highlight the marks scored column (2), before clicking standard and divide (3). Name the new column Assessment % and change the data type to %.

Calculating the percentage from two columns and adding it as a new column in PowerQuery.
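Sketched in M, assuming the column names shown in the screenshots:

```
// Drop the manually entered percentage column extracted from the sheets
#"Removed Columns" = Table.RemoveColumns(#"Removed Errors", {"Assessment %"}),
// Recalculate the percentage so it cannot suffer from data entry errors
#"Added Assessment %" = Table.AddColumn(#"Removed Columns", "Assessment %",
    each [Marks scored] / [Marks available], Percentage.Type)
```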

Our data is now looking pretty good. Multiple sheets (with different names but the same structure), from multiple Excel files in a SharePoint folder, are now all combined using PowerQuery. We can check that it has worked by profiling the columns. Go to View (1), then Data view (2), tick the options (3) and see the data (4). Looking at the Set column, we have 34 distinct sets, which equates to 34 different worksheets imported (1 worksheet = 1 set = 1 class); note that column profiling is based on the first 1,000 rows only. Therefore, we can be confident that the process has worked and we now have the data from multiple sheets in multiple workbooks.

The next post (in the next week or so) will look at how I manage the transition from one academic year to another without losing the data, ensure consistency with naming conventions (e.g. High stakes vs. High vs. high) and create a compound key that enables this table to be linked to the dimension tables in my data model.
