In addition to our traditional Data Warehouse, we have developed a Data Lakehouse to enhance our data platform. The work began as part of DTS's Digital Campus Roadmap, which aims to centralize university-related data for the benefit of stakeholders across campus.

What is the Data Lakehouse?

The Data Lakehouse was built as a one-stop enterprise repository of university data that provides unique capabilities to combine information across domains. It is a foundational framework that applications, analytics platforms, and intelligent tools can use to access information in real time. Its purpose is to serve as a shared analytics space that allows UCLA to shift from decentralized data silos to managed and timely data sharing. It is an important component of our modern data platform, as shown in the EDA Data Platform Model diagram.

EDA Data Platform Model

What are the benefits of the Data Lakehouse?

Integration

Combine diverse data to broaden insights. Robust integrations make information quick and easy to share and consume.

Equitable Access

Create opportunities for data exploration and democratize data across all reaches of campus. Enable advanced analytics with a single source of enterprise data.

Governance

Collaborate to define data access policies, standardize data definitions, establish data stewardship, and provide a clear understanding of appropriate data use.

What security and governance does the Data Lakehouse offer?

Isolation

  • Hosted in a Virtual Private Cloud (VPC) on AWS.
  • AWS PrivateLink for communication with external entities.

Encryption

  • Amazon S3 server-side encryption with S3-managed keys (SSE-S3) for data at rest, HTTPS for data in transit, and S3 bucket policies that allow traffic only over secure channels.
  • The Amazon Redshift data warehouse is encrypted with AWS KMS keys.
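
The at-rest and in-transit controls above correspond to two standard S3 settings. The sketch below, assuming a hypothetical bucket named `example-lakehouse-raw`, builds a bucket policy that denies any request not made over HTTPS (via the `aws:SecureTransport` condition key) and a default-encryption rule for SSE-S3 (`AES256`). It only constructs the JSON documents and is not the Lakehouse's actual configuration.

```python
import json

# Hypothetical bucket name used only for illustration.
BUCKET = "example-lakehouse-raw"


def secure_transport_policy(bucket: str) -> dict:
    """Bucket policy denying every S3 request that is not sent over HTTPS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                # aws:SecureTransport is "false" when the request used plain HTTP.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


def sse_s3_default_encryption() -> dict:
    """Default encryption rule: new objects are encrypted with S3-managed keys (SSE-S3)."""
    return {
        "Rules": [
            {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
        ]
    }


if __name__ == "__main__":
    print(json.dumps(secure_transport_policy(BUCKET), indent=2))
    print(json.dumps(sse_s3_default_encryption(), indent=2))
```

To apply them with boto3, the documents would be passed to `s3.put_bucket_policy(Bucket=..., Policy=json.dumps(policy))` and `s3.put_bucket_encryption(Bucket=..., ServerSideEncryptionConfiguration=config)`.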

Alerts, Audits & Logs

  • AWS CloudTrail and AWS Config for infrastructure-related audit trails.
  • Amazon CloudWatch for logging events from services and databases.
  • Custom-built CloudWatch dashboards for security.
  • Slack and email integration for critical alerts.
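
One common way to route CloudWatch alarms to Slack is an SNS topic that triggers a small AWS Lambda function. The page does not say how the Lakehouse integration is built, so this is a minimal sketch under that assumption: it parses the CloudWatch alarm JSON carried in each SNS record (`AlarmName`, `NewStateValue`, `NewStateReason`) and posts a one-line message to a Slack incoming webhook. The `SLACK_WEBHOOK_URL` environment variable and the message format are hypothetical.

```python
from __future__ import annotations

import json
import os
import urllib.request


def slack_payloads(sns_event: dict) -> list[dict]:
    """Turn each CloudWatch alarm delivered via SNS into a Slack webhook payload."""
    payloads = []
    for record in sns_event.get("Records", []):
        # SNS delivers the CloudWatch alarm as a JSON string in Sns.Message.
        alarm = json.loads(record["Sns"]["Message"])
        state = alarm.get("NewStateValue", "UNKNOWN")
        name = alarm.get("AlarmName", "unnamed alarm")
        reason = alarm.get("NewStateReason", "")
        payloads.append({"text": f"[{state}] {name}: {reason}"})
    return payloads


def post_to_slack(webhook_url: str, payload: dict) -> int:
    """POST one JSON payload to a Slack incoming webhook; returns the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def handler(event, context):
    """Lambda entry point; SLACK_WEBHOOK_URL is a hypothetical environment variable."""
    webhook_url = os.environ["SLACK_WEBHOOK_URL"]
    for payload in slack_payloads(event):
        post_to_slack(webhook_url, payload)
```

Keeping the formatting in `slack_payloads` separate from the network call makes it easy to test without a live webhook; email delivery could be handled by a second SNS subscription rather than in code.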

Access Control

  • User, group, and role management using AWS IAM.
  • Shibboleth with multi-factor authentication for identity federation.
  • Serverless components hosted in private subnets.
  • Private interfaces when dealing with SaaS vendors.
  • Security groups that follow the principle of least privilege.
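
Least privilege in security groups can be checked mechanically. A minimal sketch, assuming rule data in the shape returned by EC2's `DescribeSecurityGroups` API (`IpPermissions`, `IpRanges`/`CidrIp`, `Ipv6Ranges`/`CidrIpv6`): it flags any ingress rule open to the whole internet. The function name and finding format are hypothetical.

```python
from __future__ import annotations

# CIDR blocks that mean "any address on the internet".
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def open_ingress_rules(security_group: dict) -> list[str]:
    """Return a finding for every ingress rule open to the whole internet.

    `security_group` follows the shape of one entry in EC2 DescribeSecurityGroups
    output (GroupId, IpPermissions, IpRanges, Ipv6Ranges).
    """
    findings = []
    group_id = security_group.get("GroupId", "unknown-group")
    for perm in security_group.get("IpPermissions", []):
        cidrs = [r.get("CidrIp") for r in perm.get("IpRanges", [])]
        cidrs += [r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])]
        protocol = perm.get("IpProtocol")
        # IpProtocol "-1" means all protocols and all ports.
        if protocol == "-1":
            scope = "all traffic"
        else:
            scope = f"{protocol} {perm.get('FromPort')}-{perm.get('ToPort')}"
        for cidr in cidrs:
            if cidr in OPEN_CIDRS:
                findings.append(f"{group_id}: {scope} open to {cidr}")
    return findings
```

In practice the input could come from `boto3.client("ec2").describe_security_groups()["SecurityGroups"]`, with each group passed to `open_ingress_rules`.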

Authorization

  • AWS Lake Formation and Amazon Redshift for row-level and column-level security.
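
Row-level and column-level security combine two restrictions: which rows a principal may see, and which columns of those rows. Lake Formation expresses this as a data cells filter (a row filter expression plus a column list), and Redshift uses row-level security policies and column-level `GRANT`s. The Python below calls neither service; it is only a conceptual model of the resulting behavior, with hypothetical payroll-style column names.

```python
from __future__ import annotations

from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


def apply_cell_filter(
    rows: Iterable[Row],
    row_filter: Callable[[Row], bool],
    allowed_columns: List[str],
) -> List[Row]:
    """Keep only rows the predicate allows, then only the allowed columns."""
    return [
        # Column-level security: project each visible row onto the allowlist.
        {col: row[col] for col in allowed_columns if col in row}
        # Row-level security: drop rows the principal may not see.
        for row in rows
        if row_filter(row)
    ]


if __name__ == "__main__":
    # Hypothetical payroll-style records.
    payroll = [
        {"employee_id": 1, "department": "Chemistry", "title": "Analyst", "salary": 100},
        {"employee_id": 2, "department": "Physics", "title": "Manager", "salary": 200},
    ]
    # A Chemistry analyst sees only Chemistry rows and never the salary column.
    visible = apply_cell_filter(
        payroll,
        row_filter=lambda r: r["department"] == "Chemistry",
        allowed_columns=["employee_id", "department", "title"],
    )
    print(visible)
```

In the managed services the filter is enforced by the query engine, so users never receive the hidden rows or columns in the first place.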

Data Security

  • Multi-AZ code deployments.
  • S3 bucket versioning, delete prevention, and archival policies.
  • OAuth 2.0 authentication for API access, resource policies on APIs that allow traffic only from known sources, and rate limits.
  • Database snapshots.
  • Disaster recovery account (work in progress).
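
The page does not name the API layer. If it is Amazon API Gateway (an assumption), its request throttling is documented as a token bucket: a steady-state rate refills the bucket and a burst size caps it. Below is a minimal, single-threaded sketch of that algorithm with an injectable clock so it can be tested deterministically; the class and parameter names are hypothetical.

```python
from __future__ import annotations

import time
from typing import Callable


class TokenBucket:
    """Token-bucket rate limiter: `rate` tokens/second, at most `burst` stored."""

    def __init__(self, rate: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = burst
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when the caller is throttled."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A service would typically keep one bucket per OAuth 2.0 client ID and return HTTP 429 (Too Many Requests) when `allow()` returns `False`.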

What data is currently in the Data Lakehouse?

The following data has been successfully curated and ingested into the Data Lakehouse:

  • UCPath
  • BruinCard
  • AQMD (Air Quality Survey)
  • UID
  • Personnel/Payroll Historical Data
  • Student Data
  • Wi-Fi
  • Salesforce
  • Canvas (Bruin Learn)
  • Parking
  • Pay Station Parking
  • Kronos

How do I take advantage of the Data Lakehouse for my department's data analysis needs?

Please contact us at <enter names, email address and/or phone number here>