Data Views
Introduction
Often times it is useful to get attributes across a large number of objects in the Flywheel hierarchy. One possible way of retrieving fields in such a manner would be looping over each level in the hierarchy, collecting the desired fields at each level. This is problematic because it is tedious to write and inefficient to execute.
A view is an abstraction that describes how to collect and aggregate data on a Flywheel instance. Views are defined as a set of columns to collect and are executed against a container (such as a project, subject or session). In addition, tabular or JSON file data may be aggregated from files, even files that are members of archives, or part of analysis output.
Data views provide an alternative way to efficiently aggregate data across the hierarchy in a tabular way, producing results like this:
project.label |
subject.age |
subject.sex |
acquisition.label |
trial_type |
response_time |
accuracy |
---|---|---|---|---|---|---|
color study |
1040688000 |
male |
task-redBlue |
go |
1.435 |
1 |
color study |
1040688000 |
male |
task-redBlue |
stop |
1.739 |
0 |
color study |
851472000 |
female |
task-redBlue |
go |
1.379 |
1 |
color study |
851472000 |
female |
task-redBlue |
stop |
1.534 |
1 |
Loading An Existing View
Views that already exist on flywheel can be accessed through the SDK using the get_view() command:
# Load a data view
view = fw.get_view(<view_ID>)
the view ID can be found in the URL on the UI when accessing a data view.
A list of data views available on the site can be found with the ???? command
Creating A View
Views can be created using the ViewBuilder
class, or the short-hand View()
method.
At a minimum, one column or file must be specified when creating a data view. A simple example would be to create a view that collects all of the built-in subject columns:
# Build a data view
view = fw.View(columns='subject')
Executing A View
A View object doesn’t become useful until you execute it against a container. View results can be loaded directly into memory, or saved to a file. If the pandas python package is available, you can also load view results directly into a DataFrame. Finally, you can also save the results of a view execution directly to a container as a file.
Load JSON View
import json
with fw.read_view_data(view, project_id) as resp:
data = json.load(resp)
Load pandas DataFrame
df = fw.read_view_dataframe(view, project_id)
Save DataFrame to Local CSV
fw.save_view_data(view, project_id, '/tmp/results.csv', format='csv')
Data Filtering
When executing a data view, you can control the data that is included in the result by passing the following named keywords to any of the functions above:
filter - a comma separated list of filters to apply to the data
skip - an amount of data to skip before returning results, helpful when paginating data
limit - an max amount of data to return, helpful when paginating data
Data Formats
The following output formats are supported:
json - The default format. Results will be returned as an array of objects, one per row.
json-row-column - A second json format where columns are separate from rows, and rows is an array of arrays.
json-flat - Similar to json, except that instead of an object, the rows are the top level array.
csv - Comma-separated values format
tsv - Tab-separated values format
Data format can be specified on any of the data view execution functions.
Columns
Data view columns are references to container fields in the form of <container>.<field>
.
A current list of pre-defined columns and groups of columns is available via the print_view_columns()
method.
For example:
project (group): All column aliases belonging to project
project.id (string): The project id
project.label (string): The project label
project.info (string): The freeform project metadata
subject (group): All column aliases belonging to subject
subject.id (string): The subject id
subject.label (string): The subject label or code
subject.firstname (string): The subject first name
subject.lastname (string): The subject last name
subject.age (int): The subject age, in seconds
subject.info (string): The freeform subject metadata
...
Adding the project
group column will result in project.id
and project.label
being added. Likewise adding the subject
group column
will result in the subject id
, label
, firstname
, lastname
, age
(and more) columns being added to the view.
Info Columns
The info
columns are unique in that they represent the unstructured metadata associated with a container. As such, they are not included in
the column groups, and behave a little bit differently. If the output data format is CSV or TSV, then a set of columns are extracted from the first
row encountered, which is generally the first object created. This may result in unexpected behavior if info fields are not uniform across each
object. It’s better in most cases to explicitly state which info fields you wish as columns: e.g. subject.info.IQ
.
Files
Rows can also be extracted from CSV, TSV and JSON files that are present on the Flywheel instance. This can be done with the view builder by
specifying which container
type to find files on and a filename
wildcard match. In addition, analysis files can be matched by specifying
analysis_label
, analysis_gear_name
and/or analysis_gear_version
.
For example:
# Read all columns from files named behavioral_results_*.csv on each session
view = fw.View(container='session', filename='behavioral_results*.csv')
# Read Mean_Diffusivity.csv results from the newest AFQ analyses on each session, and include session and subject labels
builder = flywheel.ViewBuilder(columns=['subject.label', 'session.label'], container='session', analysis_gear_name='afq', filename='Mean_Diffusivity.csv')
builder.file_match('newest')
builder.file_column('Left_Thalamic_Radiation', type='float')
builder.file_column('Right_Thalamic_Radiation', type='float')
view = builder.build()
Saving Views
View definitions can be saved to your user account, or any project you have access to. For example:
me = fw.get_current_user().id
subjects_view = fw.View(label='Subject Info', columns=['subject'])
view_id = fw.add_view(me, subjects_view)
Then you can execute the view any time against any container
df = fw.read_view_dataframe(view_id, project_id)