March 4, 2023

Writing an Analytics Server using Vapor Part 6 - Retrieving Info About Users

Some people use analytics to track users, to get personal info about them, to sell their data. That's not what this post is about, and it's not what I'm about.

That said, there is some information I want from my analytics server related to users:

  • How many individual users have used my app?
  • In a given time period (day, month, year), how many users use my app?
  • How many users had the flag turned on when they used my app?
  • In a given time period, how many users perform each of the actions in UserEvent.Action?
  • How many UserEvents does the user who uses my app most make in a given time period?

For this reason, the UserEvent type has a userID that is sent on each request.

In order to efficiently access most of this data, I'll need to store users in a separate table in my database and link it to my userevents table.

But for now, I can answer the first question pretty simply.

A New Controller

To find out how many users have used the app, I will need a new endpoint. I want to have a separate path for this endpoint, independent from the userevents path. So I will create a new controller.

I create a new module called UserController.swift.

I will need a constant to refer to the path for the new controller, so I add an extension to PathComponent:

extension PathComponent {
    static var users: PathComponent { PathComponent(stringLiteral: UserController.users) }
    static var count: PathComponent { PathComponent(stringLiteral: UserController.count) }
}

I add a new UserController struct with static vars that name its database schema and the endpoints I expect to support:

struct UserController {
    static var users: String { #function }
    static var count: String { #function }
}

and I conform it to RouteCollection:

extension UserController: RouteCollection {
    func boot(routes: Vapor.RoutesBuilder) throws {}
}
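
The #function trick in those static vars is easy to miss. Inside a computed property's getter, Swift's #function literal evaluates to the name of that property, so each static var returns its own name as a string. Here is a minimal standalone sketch (the Paths type is hypothetical and exists only for illustration):

```swift
// Hypothetical type for illustration; not part of the server code.
struct Paths {
    // In a property getter, #function expands to the property's name.
    static var users: String { #function }
    static var count: String { #function }
}

print(Paths.users) // users
print(Paths.count) // count
```

The benefit is that renaming the property renames the path with it, at the cost of the path being a little less obvious to a reader.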

Now in routes.swift, I register the new UserController:

func routes(_ app: Application) throws {
    
    // register the routes from both controllers
    try app.register(collection: UserEventController())
    try app.register(collection: UserController())
...

Listing Users

Let's start by adding an endpoint that lists all the users that have used the app. I start, as usual, with a test, in a new UserControllerTests.swift module:

@testable import App
import XCTVapor

final class UserControllerTests: XCTestCase {
    
    private var sut: Application!
    
    override func setUp() {
        sut = Application(.testing)
        try! configure(sut)
    }
    
    override func tearDown() {
        sut.shutdown()
    }
    
    // MARK: GET - list users
    func test_get_list_returns_200() throws {
        try sut.test(.GET, UserController.users) { response in
            XCTAssertEqual(response.status, .ok)
        }
    }
}

and the test fails because I haven't set up the route yet.

In UserController.swift, I add a route:

extension UserController: RouteCollection {
    func boot(routes: Vapor.RoutesBuilder) throws {
        let getroutes = routes
            .grouped(.constant(Self.users))
        getroutes.get(use: list)
    }
    
    private func list(request: Request) async throws -> [String] {
        []
    }
}

and the test passes, though the endpoint is currently useless.

Retrieving Users that have Posted UserEvents

If a user has posted to the endpoint at userevents, then their userID is listed in the database (the client is assumed to create the userID; the server just accepts whatever the client sends). So as a first go, all I have to do is search for those UserEvents and return the userID for each.

I start with a test:

    func test_get_list_returns_all_users_that_have_used_app() throws {
                
        let userIDs = (0..<10).map { _ in UUID() }
        let expected = userIDs.map(\.uuidString)
        let sent = userIDs.map { UserEvent.random(for: $0, at: Date()) }
        try post(sent)
        
        try sut.test(.GET, UserController.users) { response in
            let received = try JSONDecoder().decode([String].self, from: response.body)
            XCTAssertEqual(Set(received), Set(expected))
        }
    }

Note that this test uses some helpers that are very similar to the helpers used in UserEventControllerTests.

To make the test pass, I access the userevents table in the route:

    private func list(request: Request) async throws -> [String] {
        try await UserEventRecord.query(on: request.db)
            .all()
            .map(\.userID)
            .map(String.init)
    }

This seems to work, but what happens when the same user posts multiple UserEvents? A request to users should still only report the user once, right?

    func test_get_list_returns_each_user_that_has_used_app_once() throws {
        
        let user = UUID()
        
        let expected = [user.uuidString]
        let sent = (0..<10).map { _ in UserEvent.random(for: user, at: Date().addingTimeInterval(.random(in: 60..<3600))) }
        try post(sent)
        
        try sut.test(.GET, UserController.users) { response in
            let received = try JSONDecoder().decode([String].self, from: response.body)
            XCTAssertEqual(received, expected)
        }
    }

This test fails because received ends up holding the same UUID 10 times.

But QueryBuilder offers a unique() method that removes duplicate results. All I have to do is tell the query which field to fetch, so that duplicates are judged on userID alone.

    private func list(request: Request) async throws -> [String] {
        try await UserEventRecord.query(on: request.db)
            .unique()
            .all(\.$userID)
            .map(String.init)
    }
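
Conceptually, combining unique() with all(\.$userID) asks the database for each distinct userID. As a rough in-memory analogy (plain Swift with no Fluent; the Event type here is a hypothetical stand-in for UserEventRecord), the result is the same as de-duplicating the userID column:

```swift
import Foundation

// Hypothetical stand-in for a stored event; only the field needed here.
struct Event {
    let userID: UUID
}

// Returns each userID once, in order of first appearance.
func distinctUserIDs(in events: [Event]) -> [String] {
    var seen = Set<UUID>()
    return events
        .map(\.userID)
        .filter { seen.insert($0).inserted } // true only the first time an ID is seen
        .map(\.uuidString)
}

let user = UUID()
let events = (0..<10).map { _ in Event(userID: user) }
print(distinctUserIDs(in: events).count) // 1
```

Unlike this sketch, the database makes no promise about the order of the returned IDs, which is why the first test compares sets rather than arrays.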

Counting Users

This is only so useful, though. A UUID on its own doesn't tell me much, and probably the only useful information I can get from this query is the number of users that have posted events. So why should I send all this data when the most the client app could want is to count them?

So let's add another endpoint that just returns the number of users.

    func boot(routes: Vapor.RoutesBuilder) throws {
        let getroutes = routes
            .grouped(.constant(Self.users))
        getroutes.get(use: list)
        getroutes.get(.count, use: count)
    }
    
    private func count(request: Request) async throws -> Int {
        42
    }

and add a test:

    func test_get_count_returns_count_of_users_that_have_used_app() throws {
        
        let users = (0..<Int.random(in: 3..<20)).map { _ in UUID() }

        // send twice so that each user has 2 events in the database
        try post(users.map { UserEvent.random(for: $0, at: Date()) })
        try post(users.map { UserEvent.random(for: $0, at: Date()) })

        try sut.test(.GET, UserController.countPath) { response in
            let received = try JSONDecoder().decode(Int.self, from: response.body)
            XCTAssertEqual(received, users.count)
        }
    }

Now let's get the route to return the count instead of the entire list. Just count the response:

    private func count(request: Request) async throws -> Int {
        try await UserEventRecord.query(on: request.db)
            .unique()
            .all(\.$userID)
            .count
    }

But I have some pretty serious repetition in this code. I can clean it up a little if I factor out the query:

    private func userIDs(in database: Database) -> QueryBuilder<UserEventRecord> {
        UserEventRecord.query(on: database)
            .unique()
    }
    
    private func list(request: Request) async throws -> [String] {
  
        try await userIDs(in: request.db)
            .all(\.$userID)
            .map(\.uuidString)
    }
    
    private func count(request: Request) async throws -> Int {
        try await userIDs(in: request.db)
            .all(\.$userID)
            .count
    }

Adding Query Parameters to the Query

I would like to be able to count how many users used a given action in a given time period. So I would like to add all the same kinds of filtering to my users endpoint as I have in my userevents/list endpoint. If I can do this cleanly, I'll gain all kinds of functionality for free from the code I've already written for UserEventController.

I'll add a test:

    func test_get_count_returns_count_of_users_that_have_used_app_and_sent_a_given_action() throws {
        
        let user1 = UUID()
        let user2 = UUID()
        let user3 = UUID()
        let events = [
            UserEvent(date: Date(), action: .start, userID: user1),
            UserEvent(date: Date(), action: .start, userID: user1),
            UserEvent(date: Date(), action: .start, userID: user2),
            UserEvent(date: Date(), action: .start, userID: user2),
            UserEvent(date: Date(), action: .stop, userID: user3),
            UserEvent(date: Date(), action: .stop, userID: user3),
        ]
        try post(events)
        
        try sut.test(.GET, countPath(action: .start)) { response in
            let received = try JSONDecoder().decode(Int.self, from: response.body)
            XCTAssertEqual(response.status, .ok)
            XCTAssertEqual(received, 2)
        }
    }

This ends up using some very similar helper methods to the ones in UserEventControllerTests, so I won't repeat them here.
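
For readers who haven't seen those helpers, here is a rough sketch of what a countPath helper could look like. This is a guess for illustration, not the actual helper: the query keys "action" and "flag" and the String-typed action parameter are assumptions, and the real helper also accepts dates and a userID.

```swift
import Foundation

// Hypothetical helper: builds "users/count" plus any query items that were supplied.
// The key names "action" and "flag" are assumptions, not the server's confirmed keys.
func countPath(action: String? = nil, flag: Bool? = nil) -> String {
    var components = URLComponents()
    components.path = "users/count"

    var items = [URLQueryItem]()
    if let action = action {
        items.append(URLQueryItem(name: "action", value: action))
    }
    if let flag = flag {
        items.append(URLQueryItem(name: "flag", value: String(flag)))
    }

    // Leave queryItems nil when there is nothing to add, so no trailing "?" appears.
    components.queryItems = items.isEmpty ? nil : items
    return components.string ?? components.path
}

print(countPath(action: "start"))
```

Letting URLComponents assemble the string means percent-encoding of the values is handled for me rather than by hand-built string concatenation.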

I then refactor the code for extracting a QueryBuilder from a request into a separate extension on UserEventRecord:

extension UserEventRecord {
    
    static func query(from request: Request) -> QueryBuilder<UserEventRecord>? {
        
        var query = UserEventRecord.query(on: request.db)
        
        var queryWasFound = false
        
        if let dateRange = try? request.query.decode(DateRangeQuery.self) {
            query = dateRange.filter(query)
            queryWasFound = true
        }
        if let actionQuery = try? request.query.decode(ActionQuery.self) {
            query = actionQuery.filter(query)
            queryWasFound = true
        }
        if let userIDQuery = try? request.query.decode(UserIDQuery.self) {
            query = userIDQuery.filter(query)
            queryWasFound = true
        }
        
        if let flagQuery = try? request.query.decode(FlagQuery.self),
           let q = flagQuery.filter(query) {
            query = q
            queryWasFound = true
        }

        if !queryWasFound && request.url.query?.isEmpty == false {
            return nil
        }

        return query
    }
    
}
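
The shape of that method (apply every filter that decodes, and reject the request only when a query string was sent but nothing in it was recognized) can be sketched without Vapor or Fluent. Everything below is a simplified stand-in: Event, Filters and apply are hypothetical names, and an array plays the role of the QueryBuilder.

```swift
import Foundation

// Hypothetical stand-ins; the real code uses Fluent's QueryBuilder and Vapor's Request.
struct Event {
    let action: String
    let flag: Bool
}

struct Filters {
    var action: String?
    var flag: Bool?
}

// Applies every filter that is present. Returns nil when a query string was
// supplied but none of its keys were recognized, mirroring the `return nil` branch.
func apply(_ filters: Filters, rawQuery: String?, to events: [Event]) -> [Event]? {
    var result = events
    var queryWasFound = false

    if let action = filters.action {
        result = result.filter { $0.action == action }
        queryWasFound = true
    }
    if let flag = filters.flag {
        result = result.filter { $0.flag == flag }
        queryWasFound = true
    }

    if !queryWasFound && rawQuery?.isEmpty == false {
        return nil
    }
    return result
}

let events = [
    Event(action: "start", flag: true),
    Event(action: "start", flag: false),
    Event(action: "stop", flag: true),
]

let starts = apply(Filters(action: "start"), rawQuery: "action=start", to: events)
let unknown = apply(Filters(), rawQuery: "colour=blue", to: events)
let everything = apply(Filters(), rawQuery: nil, to: events)
```

Here starts holds the two "start" events, unknown is nil (which the route turns into a 400), and everything is the unfiltered list.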

And I rewrite UserEventController.list() to use that method:

    func list(request: Request) async throws -> [UserEvent] {  
        guard let query = UserEventRecord.query(from: request) else {  throw Abort(.badRequest) }
        
        return try await query
            .all()
            .map(\.userEvent)
    }

An extra side-benefit is that it makes the route much more readable.

Just to be safe, I run my tests again and of course everything passes except the new test that I just added. I'll have to update UserController to use the same method.

    private func query(from request: Request) throws -> QueryBuilder<UserEventRecord> {
        guard let query = UserEventRecord.query(from: request) else {
            throw Abort(.badRequest)
        }
        
        return query.unique()
    }
    
    private func list(request: Request) async throws -> [String] {
        try await query(from: request)
            .all(\.$userID)
            .map(\.uuidString)
    }
    
    private func count(request: Request) async throws -> Int {
        try await query(from: request)
            .count(\.$userID)
    }

And the test passes.

Assessing Where I'm At

It's time to look at the original list of things I wanted to get from the userID and see what's doable at this point.

How many individual users have used my app?

This was covered by test_get_count_returns_count_of_users_that_have_used_app()

In a given time period (day, month, year), how many users use my app?

Let's write a test and see.

    func test_get_count_returns_count_of_all_users_that_used_app_in_date_range() throws {
                
        let now = Date()
            
        let users = [UUID(), UUID(), UUID()]
        
        let sent = users.flatMap { [
            UserEvent.random(for: $0, at: now.addingTimeInterval(-.oneDay)),
            UserEvent.random(for: $0, at: now),
            UserEvent.random(for: $0, at: now.addingTimeInterval(.oneDay))
        ]
        }
        
        try post(sent)
        
        let startOfDay = Calendar.current.startOfDay(for: now)
        let endOfDay = Calendar.current.startOfDay(for: now.addingTimeInterval(.oneDay))
        try sut.test(.GET, countPath(startDate: startOfDay, endDate: endOfDay)) { response in
            let received = try JSONDecoder().decode(Int.self, from: response.body)
            XCTAssertEqual(received, users.count)
        }
    }

How many users had the flag turned on when they used my app?

Let's write a test and see.

    func test_get_count_returns_count_of_all_users_that_have_had_flag_turned_on() throws {
                
        let now = Date()
            
        let users = (0..<30).map { _ in UUID() }
        
        let sent = users.map {
            UserEvent.random(for: $0, at: now)
        }
        let expected = sent.filter { $0.flag == true }.count

        try post(sent)
                
        try sut.test(.GET, countPath(flag: true)) { response in
            let received = try JSONDecoder().decode(Int.self, from: response.body)
            XCTAssertEqual(received, expected)
        }
    }

In a given time period, how many users perform each of the actions in UserEvent.Action?

Let's write a test and see.

    func test_get_count_returns_count_of_all_users_that_used_app_in_date_range_with_a_given_action() throws {
                
        let now = Date()
        let action = UserEvent.Action.pause

        let startOfDay = Calendar.current.startOfDay(for: now)
        let endOfDay = Calendar.current.startOfDay(for: now.addingTimeInterval(.oneDay))

        let users = (0..<30).map { _ in UUID() }
        
        let sent = users.flatMap { [
            UserEvent.random(for: $0, at: now.addingTimeInterval(-.oneDay)),
            UserEvent.random(for: $0, at: now),
            UserEvent.random(for: $0, at: now.addingTimeInterval(.oneDay))
        ]
        }
        
        let eventsWithActionOnDate = sent.filter { $0.action == action }
            .filter { $0.timestamp.value >= startOfDay }
            .filter { $0.timestamp.value <= endOfDay }
        let usersWhoSentAction = Set(eventsWithActionOnDate.map(\.userID))
        
        try post(sent)
        
        try sut.test(.GET, countPath(startDate: startOfDay, endDate: endOfDay, action: action)) { response in
            let received = try JSONDecoder().decode(Int.self, from: response.body)
            XCTAssertEqual(received, usersWhoSentAction.count)
        }
    }

How many UserEvents does the user who uses my app most make in a given time period?

We're almost there with this one. We could do it with what we have so far, but we'd have to:

  • call users to get all users by id
  • iterate over each user and call userevents/list, filtering by userID and date range
  • count the number of UserEvents in each response
  • find the largest number

This is a lot of network traffic for something that should be a pretty simple question.

What if instead I could call one endpoint that told me how many requests were made by each user in a given time period? I could even do the same filtering that I'm doing for userevents/list and users/count.

Summarizing User Activity

I'll add one last route to UserController in order to get a summary of user activity for each user. It will receive the same query keys and return a dictionary matching userID to a count of UserEvents.

I start by adding a summaryPath to UserController, then add a test:

    func test_get_summary_returns_200() throws {
        try sut.test(.GET, UserController.summaryPath) { response in
            XCTAssertEqual(response.status, .ok)
        }
    }

and add the route to UserController

    func boot(routes: Vapor.RoutesBuilder) throws {
        let getroutes = routes
            .grouped(.constant(Self.users))
        
        getroutes.get(use: list)
        getroutes.get(.count, use: count)
        getroutes.get(.summary, use: summarize)
    }
    private func summarize(request: Request) async throws -> [String:Int] { [:] }

Once that's working, I add a test to make sure I get back the right data:

    func test_get_summary_returns_count_for_each_user_that_has_used_the_app() throws {
        
        let users = (0..<30).map { _ in UUID() }

        var counts = [String:Int]()
        for user in users {
            try (0..<Int.random(in: 0..<3)).forEach { _ in
                try post([UserEvent.random(for: user, at: Date().addingTimeInterval(.random(in: -1000 ... 1000)))])
                counts[user.uuidString] = counts[user.uuidString, default: 0] + 1
            }
        }
        
        try sut.test(.GET, summaryPath()) { response in
            let received = try JSONDecoder().decode([String:Int].self, from: response.body)
            XCTAssertEqual(received, counts)
        }
    }

and implement the search in UserController.summarize()

    private func summarize(request: Request) async throws -> [String:Int] {
        let query = try query(from: request)
        
        let events = try await query
            .all()
        
        let users = try await query
            .unique()
            .all(\.$userID)
            .map(\.uuidString)
        
        return users.toDictionary { user in
            events
                .filter { $0.userID.uuidString == user }
                .count
        }
    }

There are arguments for doing this a few different ways. Since I can hope that the database query will be relatively quick, I choose to make two queries: one to retrieve all UserEventRecords that match the given criteria and one to retrieve the unique userIDs that have posted events matching the given criteria. I then just combine these into a single dictionary.

The toDictionary() method is fairly self-explanatory, but for the record here it is:

extension Array where Element: Hashable {
    func toDictionary<Output>(_ transform: (Element) -> Output) -> [Element: Output] {
        var out = [Element:Output]()
        for element in self {
            out[element] = transform(element)
        }
        return out
    }
}
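
One of the other ways to do this would be a single query followed by a single pass over the results, instead of filtering the full event list once per user. This is only a sketch of that alternative, assuming the matching userIDs (one per event) have already been fetched; eventCounts is a hypothetical name.

```swift
import Foundation

// Counts how many times each userID appears, in one pass over the list.
func eventCounts(for userIDs: [UUID]) -> [String: Int] {
    userIDs.reduce(into: [String: Int]()) { counts, id in
        counts[id.uuidString, default: 0] += 1
    }
}

let alice = UUID()
let bob = UUID()
let counts = eventCounts(for: [alice, bob, alice, alice])
print(counts[alice.uuidString] ?? 0) // 3
print(counts[bob.uuidString] ?? 0)   // 1
```

The result has the same shape as the dictionary built with toDictionary(): only users with at least one matching event get an entry.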

Now I still want to test to make sure that the users/summary endpoint will respect the request passed in, so I add one more test, this time only requesting a summary of users that have sent events that had their flag set to true:

    func test_get_summary_returns_count_for_each_user_that_has_used_the_app_and_passed_the_given_flag() throws {
        
        let users = (0..<30).map { _ in UUID() }
        let flag = true
        
        var counts = [String:Int]()
        for user in users {
            try (0..<Int.random(in: 0..<10)).forEach { _ in
                let event = UserEvent.random(for: user, at: Date().addingTimeInterval(.random(in: -1000 ... 1000)))
                try post([event])
                if event.flag == flag {
                    counts[user.uuidString] = counts[user.uuidString, default: 0] + 1
                }
            }
        }
        
        try sut.test(.GET, summaryPath(flag: flag)) { response in
            let received = try JSONDecoder().decode([String:Int].self, from: response.body)
            XCTAssertEqual(received, counts)
        }
    }

You would expect that it would pass, and it does.

Now back to that last requirement:

How many UserEvents does the user who uses my app most make in a given time period?

I can now retrieve a dictionary that associates each userID to the number of times that user made a given request:

    func test_get_summary_returns_count_for_each_user_that_has_used_the_app_in_the_given_date_range() throws {
        
        let users = (0..<30).map { _ in UUID() }
        
        let now = Date()
        let startOfDay = Calendar.current.startOfDay(for: now)
        let endOfDay = Calendar.current.startOfDay(for: now.addingTimeInterval(.oneDay))

        var counts = [String:Int]()
        for user in users {
            try (0..<Int.random(in: 0..<10)).forEach { _ in
                let event = UserEvent.random(for: user, at: Date().addingTimeInterval(.random(in: -.oneDay ... .oneDay)))
                try post([event])
                if event.timestamp.value >= startOfDay && event.timestamp.value <= endOfDay {
                    counts[user.uuidString] = counts[user.uuidString, default: 0] + 1
                }
            }
        }
        
        try sut.test(.GET, summaryPath(startDate: startOfDay, endDate: endOfDay)) { response in
            let received = try JSONDecoder().decode([String:Int].self, from: response.body)
            XCTAssertEqual(received, counts)
        }
    }

From there, I would just have to find the largest number in that dictionary.
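
On the client side, that last step could be as small as this sketch. The summary dictionary below is made-up sample data standing in for a decoded users/summary response.

```swift
// Made-up sample data standing in for a decoded users/summary response.
let summary = ["user-a": 3, "user-b": 7, "user-c": 5]

// max(by:) returns the entry with the largest count, or nil for an empty dictionary.
if let busiest = summary.max(by: { $0.value < $1.value }) {
    print("\(busiest.key) sent \(busiest.value) events") // user-b sent 7 events
}
```

If only the number matters and not which user it belongs to, summary.values.max() gives the same count directly.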

Reconsidering Naming

I've noticed that UserController only ever sends back lists of users, and UserEventController has one endpoint that takes a single UserEvent, but its other endpoint sends back lists of UserEvents. So maybe it makes more sense to do some renaming and refactoring.

So I add a new module called UserEventsController and copy over the routes for listing events from UserEventController. I also split the tests up into two test modules. I won't go into the details (there are lots of namespace issues to sort out), but in the end I have a nice, cleaner set of controllers.

I also decide to move the list endpoint from userevents/list to just userevents.

With these two changes, I have a much cleaner UserEventsController:

extension UserEventsController: RouteCollection {
    
    func boot(routes: Vapor.RoutesBuilder) throws {
        let getroutes = routes
            .grouped(.userevents)
        getroutes.get(use: list)
    }
  
    func list(request: Request) async throws -> [UserEvent] {
        guard let query = UserEventRecord.query(from: request) else {  throw Abort(.badRequest) }
        
        return try await query
            .all()
            .map(\.userEvent)
    }
}

And while we're here...

Really, I'll rarely want to retrieve individual UserEvent records, just like I'll rarely want to retrieve individual user records. So I can add an endpoint to UserEventsController that just returns the count of any request.

I'll skip the intermediate steps and just give the tests:

    func test_get_count_returns_200() throws {
        try sut.test(.GET, UserEventsController.countPath) { response in
            XCTAssertEqual(response.status, .ok)
        }
    }

    func test_get_count_returns_all_userevents_that_match_flag_requested_true() throws {
                
        let sent = (0..<Int.random(in: 3..<20)).map { _ in
            UserEvent.random(at: Date().addingTimeInterval(.random(in: 60...3600)))
        }
        
        let expected = sent.filter { $0.flag == true }
        
        try post(sent)
        
        try sut.test(.GET, countPath(flag: true)) { response in
            let received = try JSONDecoder().decode(Int.self, from: response.body)
            XCTAssertEqual(received, expected.count)
        }
    }

    func test_get_count_stress_test_send_all_query_keys() throws {
                
        let now = Date()
        let dateRange: ClosedRange<TimeInterval> = -.oneDay ... .oneDay
        
        let sent = (0..<300).map { _ in
            UserEvent.random(at: now.addingTimeInterval(.random(in: dateRange)))
        }

        let startOfDay = Calendar.current.startOfDay(for: now)
        let endOfDay = Calendar.current.startOfDay(for: now.addingTimeInterval(.oneDay))
        
        let happeningToday = sent.filter {
            $0.timestamp.value >= startOfDay &&
            $0.timestamp.value <= endOfDay
        }
        
        let expected = happeningToday.randomElement()!
        
        try post(sent)
        
        let path = countPath(startDate: startOfDay,
                            endDate: endOfDay,
                            userID: expected.userID,
                            action: expected.action,
                            flag: expected.flag)
        try sut.test(.GET, path) { response in
            let received = try JSONDecoder().decode(Int.self, from: response.body)
            XCTAssertEqual(received, 1)
        }
    }

and the new route:

    func count(request: Request) async throws -> Int {
        guard let query = UserEventRecord.query(from: request) else {  throw Abort(.badRequest) }
        
        return try await query
            .count()
    }

Summary

And with that I believe my analytics server returns all the info that I am looking for. I should now be able to write an app that can retrieve the analytics and display it to the user.
