Truly huge files and the problem of continuous virtual address space

As we all know does mmap, or even worse on Windows CreateFileMapping, need contiguous virtual address space for a given mapping size. That can become a problem when you want to load a file of a gigabyte with mmap.

The solution is of course to mmap the big file using multiple mappings. For example like adapting yesterday’s demo this way:

void FileModel::setFileName(const QString &fileName)
{
    ...
    if (m_file->open(QIODevice::ReadOnly)) {
        if (m_file->size() > MAX_MAP_SIZE) {
            m_mapSize = MAX_MAP_SIZE;
            m_file_maps.resize(1 + m_file->size() / MAX_MAP_SIZE, nullptr);
        } else {
            m_mapSize = static_cast(m_file->size());
            m_file_maps.resize(1, nullptr);
        }
        ...
    } else {
        m_index->open(QFile::ReadOnly);
        m_rowCount = m_index->size() / 4;
    }
    m_file_maps[0] = m_file->map(0, m_mapSize, QFileDevice::NoOptions);
    qDebug() << "Done loading " << m_rowCount << " lines";
    map_index = m_index->map(0, m_index->size(), QFileDevice::NoOptions);

    beginResetModel();
    endResetModel();
    emit fileNameChanged();
}

And in the data() function:

QVariant FileModel::data( const QModelIndex& index, int role ) const
{
    QVariant ret;
    ...
    quint32 mapIndex = pos_i / MAX_MAP_SIZE;
    quint32 map_pos_i = pos_i % MAX_MAP_SIZE;
    quint32 map_end_i = end_i % MAX_MAP_SIZE;
    uchar* map_file = m_file_maps[mapIndex];
    if (map_file == nullptr)
        map_file = m_file_maps[mapIndex] = m_file->map(mapIndex * m_mapSize, m_mapSize, QFileDevice::NoOptions);
    position = m_file_maps[mapIndex] + map_pos_i;
    if (position) {
            const int length = static_cast(end_i - pos_i);
            char *buffer = (char*) alloca(length+1);
            if (map_end_i >= map_pos_i)
                strncpy (buffer, (char*) position, length);
            else {
                const uchar *position2 = m_file_maps[mapIndex+1];
                if (position2 == nullptr) {
                    position2 = m_file_maps[mapIndex+1] = m_file->map((mapIndex+1) *
                         m_mapSize, m_mapSize, QFileDevice::NoOptions);
                }
                strncpy (buffer, (char*) position, MAX_MAP_SIZE - map_pos_i);
                strncpy (buffer + (MAX_MAP_SIZE - map_pos_i), (char*) position2, map_end_i);
            }
            buffer[length] = 0;
            ret = QVariant(QString(buffer));
        }
    }
    return ret;
}

You could also not use mmap for the very big source text file and use m_file.seek(map_pos_i) and m_file.read(buffer, length). The most important mapping is of course the index one, as the reading of the individual lines can also be done fast enough with normal read() calls (as long as you don’t have to do it for each and every line of the very big file and as long as you know in a O(1) way where the QAbstractListModel’s index.row()’s data is).

But you already knew that. Right?

Loading truly truly huge text files with a QAbstractListModel

Sometimes people want to do crazy stuff like loading a gigabyte sized plain text file into a Qt view that can handle QAbstractListModel. Like for example a QML ListView. You know, the kind of files you generate with this commando:

base64 /dev/urandom | head -c 100000000 > /tmp/file.txt

But, how do they do it?

FileModel.h

So we will make a custom QAbstractListModel. Its private member fields I will explain later:

#ifndef FILEMODEL_H
#define FILEMODEL_H

#include <QObject>
#include <QVariant>
#include <QAbstractListModel>
#include <QFile>

class FileModel: public QAbstractListModel {
    Q_OBJECT

    Q_PROPERTY(QString fileName READ fileName WRITE setFileName NOTIFY fileNameChanged )
public:
    explicit FileModel( QObject* a_parent = nullptr );
    virtual ~FileModel();

    int columnCount(const QModelIndex &parent) const;
    int rowCount( const QModelIndex& parent =  QModelIndex() ) const Q_DECL_OVERRIDE;
    QVariant data( const QModelIndex& index, int role = Qt::DisplayRole ) const  Q_DECL_OVERRIDE;
    QVariant headerData( int section, Qt::Orientation orientation,
                         int role = Qt::DisplayRole ) const  Q_DECL_OVERRIDE;
    void setFileName(const QString &fileName);
    QString fileName () const
        { return m_file->fileName(); }
signals:
    void fileNameChanged();
private:
    QFile *m_file, *m_index;
    uchar *map_file;
    uchar *map_index;
    int m_rowCount;
    void clear();
};

#endif// FILEMODEL_H

FileModel.cpp

We will basically scan the very big source text file for newline characters. We’ll write the offsets of those to a file suffixed with “.mmap”. We’ll use that new file as a sort of “partition table” for the very big source text file, in the data() function of QAbstractListModel. But instead of sectors and files, it points to newlines.

The reason why the scanner itself isn’t using the mmap’s address space is because apparently reading blocks of 4kb is faster than reading each and every byte from the mmap in search of \n characters. Or at least on my hardware it was.

You should probably do the scanning in small qEventLoop iterations (make sure to use nonblocking reads, then) or in a thread, as your very big source text file can be on a unreliable or slow I/O device. Plus it’s very big, else you wouldn’t be doing this (please promise me to just read the entire text file in memory unless it’s hundreds of megabytes in size: don’t micro optimize your silly homework notepad.exe clone).

Note that this is demo code with a lot of bugs like not checking for \r and god knows what memory leaks and stuff was remaining when it suddenly worked. I leave it to the reader to improve this. An example is that you should check for validity of the “.mmap” file: your very big source text file might have changed since the newline partition table was made.

Knowing that I’ll soon find this all over the place without any of its bugs fixed, here it comes ..

#include "FileModel.h"

#include <QDebug>

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

FileModel::FileModel( QObject* a_parent )
    : QAbstractListModel( a_parent )
    , m_file (nullptr)
    , m_index(nullptr)
    , m_rowCount ( 0 ) { }

FileModel::~FileModel() { clear(); }

void FileModel::clear()
{
    if (m_file) {
        if (m_file->isOpen() && map_file != nullptr)
            m_file->unmap(map_file);
        delete m_file;
    }
    if (m_index) {
        if (m_index->isOpen() && map_index != nullptr)
            m_index->unmap(map_index);
        delete m_index;
    }
}

void FileModel::setFileName(const QString &fileName)
{
   clear();
   m_rowCount = 0;
   m_file = new QFile(fileName);
   int cur = 0;
   m_index = new QFile(m_file->fileName() + ".mmap");
   if (m_file->open(QIODevice::ReadOnly)) {
       if (!m_index->exists()) {
           char rbuffer[4096];
           m_index->open(QIODevice::WriteOnly);
           char nulbuffer[4];
           int idxnul = 0;
           memset( nulbuffer +0, idxnul >> 24 & 0xff, 1 );
           memset( nulbuffer +1, idxnul >> 16 & 0xff, 1 );
           memset( nulbuffer +2, idxnul >>  8 & 0xff, 1 );
           memset( nulbuffer +3, idxnul >>  0 & 0xff, 1 );
           m_index->write( nulbuffer, sizeof(quint32));
           qDebug() << "Indexing to" << m_index->fileName();
           while (!m_file->atEnd()) {
               int in = m_file->read(rbuffer, 4096);
               if (in == -1)
                   break;
               char *newline = (char*) 1;
               char *last = rbuffer;
               while (newline != 0) {
                   newline = strchr ( last, '\n');
                   if (newline != 0) {
                     char buffer[4];
                     int idx = cur + (newline - rbuffer);
                     memset( buffer +0, idx >> 24 & 0xff, 1 );
                     memset( buffer +1, idx >> 16 & 0xff, 1 );
                     memset( buffer +2, idx >>  8 & 0xff, 1 );
                     memset( buffer +3, idx >>  0 & 0xff, 1 );
                     m_index->write( buffer, sizeof(quint32));
                     m_rowCount++;
                     last = newline + 1;
                  }
               }
               cur += in;
           }
           m_index->close();
           m_index->open(QFile::ReadOnly);
           qDebug() << "done";
       } else {
           m_index->open(QFile::ReadOnly);
           m_rowCount = m_index->size() / 4;
       }
       map_file= m_file->map(0, m_file->size(), QFileDevice::NoOptions);
       qDebug() << "Done loading " << m_rowCount << " lines";
       map_index = m_index->map(0, m_index->size(), QFileDevice::NoOptions);
   }
   beginResetModel();
   endResetModel();
   emit fileNameChanged();
}

static quint32
read_uint32 (const quint8 *data)
{
    return data[0] << 24 |
           data[1] << 16 |
           data[2] << 8 |
           data[3];
}

int FileModel::rowCount( const QModelIndex& parent ) const
{
    Q_UNUSED( parent );
    return m_rowCount;
}

int FileModel::columnCount(const QModelIndex &parent) const
{
    Q_UNUSED( parent );
    return 1;
}

QVariant FileModel::data( const QModelIndex& index, int role ) const
{
    if( !index.isValid() )
        return QVariant();
    if (role == Qt::DisplayRole) {
        QVariant ret;
        quint32 pos_i = read_uint32(map_index + ( 4 * index.row() ) );
        quint32 end_i;
        if ( index.row() == m_rowCount-1 )
            end_i = m_file->size();
        else
            end_i = read_uint32(map_index + ( 4 * (index.row()+1) ) );
        uchar *position;
        position = map_file +  pos_i;
        uchar *end = map_file + end_i;
        int length = end - position;
        char *buffer = (char*) alloca(length +1);
        memset (buffer, 0, length+1);
        strncpy (buffer, (char*) position, length);
        ret = QVariant(QString(buffer));
        return ret;
    }
    return QVariant();
}

QVariant FileModel::headerData( int section, Qt::Orientation orientation, int role ) const
{
    Q_UNUSED(section);
    Q_UNUSED(orientation);
    if (role != Qt::DisplayRole)
           return QVariant();
    return QString("header");
}

main.cpp

#include <QGuiApplication>
#include <QQmlApplicationEngine>
#include <QtQml>// qmlRegisterType

#include "FileModel.h"

int main(int argc, char *argv[])
{
    QGuiApplication app(argc, argv);
    qmlRegisterType<FileModel>( "FileModel", 1, 0, "FileModel" );
    QQmlApplicationEngine engine;
    engine.load(QUrl(QStringLiteral("qrc:/main.qml")));
    return app.exec();
}

main.qml

import QtQuick 2.3
import QtQuick.Window 2.2
import FileModel 1.0

Window {
    visible: true

    FileModel { id: fileModel }
    ListView {
        id: list
        anchors.fill: parent
        delegate: Text { text: display }
        MouseArea {
            anchors.fill: parent
            onClicked: {
                list.model = fileModel
                fileModel.fileName = "/tmp/file.txt"
            }
        }
    }
}

profile.pro

TEMPLATE = app
QT += qml quick
CONFIG += c++11
SOURCES += main.cpp \
    FileModel.cpp
RESOURCES += qml.qrc
HEADERS += \
    FileModel.h

qml.qrc

<RCC>
    <qresource prefix="/">
        <file>main.qml</file>
    </qresource>
</RCC>

Judges Dredd in het land gesignaleerd

Blijkbaar zijn er weer een paar rechercheurs die geloven dat de rechtstaat niet aan hen besteed is; dat ze zoals Judge Dredd aan de slag kunnen met het afluisteren van ziekenhuizen, artsen en hulpverleners zoals psychologen.

Alles is goed om hun parallelle constructies te ondersteunen. De wet zit voor hen niet in de weg. Wie heeft dat nu nodig? Wetten? Pfuh. Daar doet Dredd niet aan mee. Judge Dredd is de wet. Wat is dat nu.

Hadden ze aanwijzing dat de arts mee in het complot zat? Nee dat was er niet. Want waarom is de orde der geneesheren dan niet op de hoogte gebracht? Het was gewoon volstrekt illegaal om dat ziekenhuis af te luisteren.

Ik hoop van harte dat deze rechercheurs een zware gevangenisstraf krijgen en tot slot nooit nog het beroep van rechercheur mogen uitoefenen.

We hebben dat hier niet nodig. Ga maar in Het VK politieagentje spelen. Zolang het nog bestaat. Bende knoeiers.

Composition and aggregation with QObject

Consider these rather simple relationships between classes

Continuing on this subject, here are some code examples.

Class1 & Class2: Composition
An instance of Class1 can not exist without an instance of Class2.

Example of composition is typically a Bicycle and its Wheels, Saddle and a HandleBar: without these the Bicycle is no longer a Bicycle but just a Frame.

It can no longer function as a Bicycle. Example of when you need to stop thinking about composition versus aggregation is whenever you say: without the other thing can’t in our software the first thing work.

Note that you must consider this in the context of Class1. You use aggregation or composition based on how Class2 exists in relation to Class1.

Class1 with QScopedPointer:

#ifndef CLASS1_H
#define CLASS1_H

#include <QObject>
#include <QScopedPointer>
#include <Class2.h>

class Class1: public QObject
{
    Q_PROPERTY( Class2* class2 READ class2 WRITE setClass2 NOTIFY class2Changed)
public:
    Class1( QObject *a_parent = nullptr )
        : QObject ( a_parent) {
        // Don't use QObject parenting on top here
        m_class2.reset (new Class2() );
    }
    Class2* class2() {
        return m_class2.data();
    }
    void setClass2 ( Class2 *a_class2 ) {
        Q_ASSERT (a_class2 != nullptr); // Composition can't set a nullptr!
        if ( m_class2.data() != a_class2 ) {
            m_class2.reset( a_class2 );
            emit class2Changed()
        }
    }
signals:
    void class2Changed();
private:
    QScopedPointer<Class2> m_class2;
};

#endif// CLASS1_H

Class1 with QObject parenting:

#ifndef CLASS1_H
#define CLASS1_H

#include <QObject>
#include <Class2.h>

class Class1: public QObject
{
    Q_PROPERTY( Class2* class2 READ class2 WRITE setClass2 NOTIFY class2Changed)
public:
    Class1( QObject *a_parent = nullptr )
        : QObject ( a_parent )
        , m_class2 ( nullptr ) {
        // Make sure to use QObject parenting here
        m_class2 = new Class2( this );
    }
    Class2* class2() {
        return m_class2;
    }
    void setClass2 ( Class2 *a_class2 ) {
         Q_ASSERT (a_class2 != nullptr); // Composition can't set a nullptr!
         if ( m_class2 != a_class2 ) {
             // Make sure to use QObject parenting here
             a_class2->setParent ( this );
             delete m_class2; // Composition can never be nullptr
             m_class2 = a_class2;
             emit class2Changed();
         }
    }
signals:
    void class2Changed();
private:
    Class2 *m_class2;
};

#endif// CLASS1_H

Class1 with RAII:

#ifndef CLASS1_H
#define CLASS1_H

#include <QObject>
#include <QScopedPointer>

#include <Class2.h>

class Class1: public QObject
{
    Q_PROPERTY( Class2* class2 READ class2 CONSTANT)
public:
    Class1( QObject *a_parent = nullptr )
        : QObject ( a_parent ) { }
    Class2* class2()
        { return &m_class2; }
private:
    Class2 m_class2;
};
#endif// CLASS1_H

Class3 & Class4: Aggregation

An instance of Class3 can exist without an instance of Class4. Example of composition is typically a Bicycle and its driver or passenger: without the Driver or Passenger it is still a Bicycle. It can function as a Bicycle.

Example of when you need to stop thinking about composition versus aggregation is whenever you say: without the other thing can in our software the first thing work.

Class3:

#ifndef CLASS3_H
#define CLASS3_H

#include <QObject>

#include <QPointer>
#include <Class4.h>

class Class3: public QObject
{
    Q_PROPERTY( Class4* class4 READ class4 WRITE setClass4 NOTIFY class4Changed)
public:
    Class3( QObject *a_parent = nullptr );
    Class4* class4() {
        return m_class4.data();
    }
    void setClass4 (Class4 *a_class4) {
         if ( m_class4 != a_class4 ) {
             m_class4 = a_class4;
             emit class4Changed();
         }
    }
signals:
    void class4Changed();
private:
    QPointer<Class4> m_class4;
};
#endif// CLASS3_H

Class5, Class6 & Class7: Shared composition
An instance of Class5 and-or an instance of Class6 can not exist without a instance of Class7 shared by Class5 and Class6. When one of Class5 or Class6 can and one can not exist without the shared instance, use QWeakPointer at that place.

Class5:

#ifndef CLASS5_H
#define CLASS5_H

#include <QObject>
#include <QSharedPointer>

#include <Class7.h>

class Class5: public QObject
{
    Q_PROPERTY( Class7* class7 READ class7 CONSTANT)
public:
    Class5( QObject *a_parent = nullptr, Class7 *a_class7 );
        : QObject ( a_parent )
        , m_class7 ( a_class7 ) { }
    Class7* class7()
        { return m_class7.data(); }
private:
    QSharedPointer<Class7> m_class7;
};

Class6:

#ifndef CLASS6_H
#define CLASS6_H

#include <QObject>
#include <QSharedPointer>

#include <Class7.h>

class Class6: public QObject
{
    Q_PROPERTY( Class7* class7 READ class7 CONSTANT)
public:
    Class6( QObject *a_parent = nullptr, Class7 *a_class7 )
        : QObject ( a_parent )
        , m_class7 ( a_class7 ) { }
    Class7* class7()
        { return m_class7.data(); }
private:
    QSharedPointer<Class7> m_class7;
};
#endif// CLASS6_H

Interfaces with QObject

FlyBehavior:

#ifndef FLYBEHAVIOR_H
#define FLYBEHAVIOR_H
#include <QObject>
// Don't inherit QObject here (you'll break multiple-implements)
class FlyBehavior {
    public:
        Q_INVOKABLE virtual void fly() = 0;
};
Q_DECLARE_INTERFACE(FlyBehavior , "be.codeminded.Flying.FlyBehavior /1.0") 
#endif// FLYBEHAVIOR_H

FlyWithWings:

#ifndef FLY_WITH_WINGS_H
#define FLY_WITH_WINGS_H
#include <QObject>  
#include <Flying/FlyBehavior.h>
// Do inherit QObject here (this is a concrete class)
class FlyWithWings: public QObject, public FlyBehavior
{
    Q_OBJECT
    Q_INTERFACES( FlyBehavior )
public:
    explicit FlyWithWings( QObject *a_parent = nullptr ): QObject ( *a_parent ) {}
    ~FlyWithWings() {}

    virtual void fly() Q_DECL_OVERRIDE;
}
#endif// FLY_WITH_WINGS_H