Truly huge files and the problem of contiguous virtual address space

As we all know, mmap (or, even worse on Windows, CreateFileMapping) needs contiguous virtual address space for a given mapping size. That can become a problem when you want to load a gigabyte-sized file with mmap.
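
To make the problem concrete, here is a small sketch (mine, not from the demo) of the naive single-mapping approach; when no contiguous range of file.size() bytes is left in the process, for instance in a 32-bit build or a badly fragmented address space, map() simply returns nullptr:

#include <QFile>
#include <QDebug>

// Sketch: map a huge file with one single mapping and watch it fail when the
// address space can't provide a contiguous range of file.size() bytes.
bool mapWholeFile(const QString &fileName)
{
    QFile file(fileName);
    if (!file.open(QIODevice::ReadOnly))
        return false;
    uchar *whole = file.map(0, file.size(), QFileDevice::NoOptions);
    if (whole == nullptr) {
        qDebug() << "No contiguous virtual address space for" << file.size() << "bytes";
        return false;
    }
    // ... use the mapping ...
    file.unmap(whole);
    return true;
}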

The solution is of course to mmap the big file using multiple mappings, for example by adapting yesterday’s demo like this:

void FileModel::setFileName(const QString &fileName)
{
    ...
    if (m_file->open(QIODevice::ReadOnly)) {
        if (m_file->size() > MAX_MAP_SIZE) {
            m_mapSize = MAX_MAP_SIZE;
            m_file_maps.resize(1 + m_file->size() / MAX_MAP_SIZE, nullptr);
        } else {
            m_mapSize = static_cast<size_t>(m_file->size());
            m_file_maps.resize(1, nullptr);
        }
        ...
        } else {                       // the index already exists (built by yesterday’s demo)
            m_index->open(QFile::ReadOnly);
            m_rowCount = m_index->size() / 4;
        }
        // Map only the first chunk up front; data() maps the other chunks on demand
        m_file_maps[0] = m_file->map(0, m_mapSize, QFileDevice::NoOptions);
        qDebug() << "Done loading " << m_rowCount << " lines";
        map_index = m_index->map(0, m_index->size(), QFileDevice::NoOptions);
    }

    beginResetModel();
    endResetModel();
    emit fileNameChanged();
}
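
For reference, the members this assumes could look roughly like the following. These declarations are my own guess rather than code from the demo; the resize(count, nullptr) overload hints at std::vector rather than QVector, and m_file_maps has to be mutable because the const data() function fills it lazily.

#include <QAbstractListModel>
#include <QFile>
#include <vector>

// Hypothetical member layout matching the snippets in this post (an assumption, not the demo's code)
class FileModel : public QAbstractListModel
{
    Q_OBJECT
public:
    void setFileName(const QString &fileName);
    QVariant data(const QModelIndex &index, int role) const override;
    int rowCount(const QModelIndex &parent = QModelIndex()) const override;
signals:
    void fileNameChanged();
private:
    static constexpr qint64 MAX_MAP_SIZE = 1073741824; // 1 GiB per chunk, for example
    QFile *m_file = nullptr;                  // the very big text file
    QFile *m_index = nullptr;                 // index file holding one quint32 offset per line
    uchar *map_index = nullptr;               // mapping of the whole index file
    size_t m_mapSize = 0;                     // size of each chunk mapping
    quint32 m_rowCount = 0;                   // number of lines (index size / 4)
    mutable std::vector<uchar*> m_file_maps;  // one lazily created mapping per chunk
};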

And in the data() function:

QVariant FileModel::data( const QModelIndex& index, int role ) const
{
    QVariant ret;
    ...
        quint32 mapIndex = pos_i / MAX_MAP_SIZE;    // which chunk the line starts in
        quint32 map_pos_i = pos_i % MAX_MAP_SIZE;   // offset of the line's start within that chunk
        quint32 map_end_i = end_i % MAX_MAP_SIZE;   // offset of the line's end within its chunk
        uchar* map_file = m_file_maps[mapIndex];
        if (map_file == nullptr)   // map the chunk lazily, the first time a row in it is requested
            map_file = m_file_maps[mapIndex] = m_file->map(mapIndex * m_mapSize,
                                                           m_mapSize, QFileDevice::NoOptions);
        position = m_file_maps[mapIndex] + map_pos_i;
        if (position) {
            const int length = static_cast<int>(end_i - pos_i);
            char *buffer = (char*) alloca(length + 1);
            if (map_end_i >= map_pos_i) {
                // The whole line lives inside a single mapping
                strncpy (buffer, (char*) position, length);
            } else {
                // The line straddles a chunk boundary: copy the tail of this
                // chunk, then the head of the next one
                const uchar *position2 = m_file_maps[mapIndex+1];
                if (position2 == nullptr) {
                    position2 = m_file_maps[mapIndex+1] = m_file->map((mapIndex+1) *
                         m_mapSize, m_mapSize, QFileDevice::NoOptions);
                }
                strncpy (buffer, (char*) position, MAX_MAP_SIZE - map_pos_i);
                strncpy (buffer + (MAX_MAP_SIZE - map_pos_i), (char*) position2, map_end_i);
            }
            buffer[length] = 0;
            ret = QVariant(QString(buffer));
        }
    }
    return ret;
}
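
To see what that modulo arithmetic does at a chunk boundary, here is a worked example with made-up numbers, assuming a MAX_MAP_SIZE of 1 GiB:

// Hypothetical numbers, purely to illustrate the arithmetic in data()
const quint32 MAX_MAP_SIZE = 1073741824U;   // 1 GiB per chunk
quint32 pos_i = 1073741800U;                // the line starts 24 bytes before the chunk boundary
quint32 end_i = 1073741860U;                // and ends 36 bytes after it

quint32 mapIndex  = pos_i / MAX_MAP_SIZE;   // 0: the line starts in chunk 0
quint32 map_pos_i = pos_i % MAX_MAP_SIZE;   // 1073741800: offset within chunk 0
quint32 map_end_i = end_i % MAX_MAP_SIZE;   // 36: the line's end falls into chunk 1

// map_end_i < map_pos_i, so the line straddles two mappings:
// copy MAX_MAP_SIZE - map_pos_i = 24 bytes from chunk 0,
// then map_end_i = 36 bytes from chunk 1 (length = 60 bytes in total).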

You could also skip mmap entirely for the very big source text file and instead use m_file->seek(pos_i) followed by m_file->read(buffer, length). The most important mapping is of course the index one: reading the individual lines can be done fast enough with normal read() calls, as long as you don’t have to do it for each and every line of the very big file, and as long as you know in an O(1) way where the data for the QAbstractListModel’s index.row() lives.
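
A minimal sketch of that variant, assuming the same pos_i and end_i values read from the mapped index as before, so only the copying of the line changes and the O(1) lookup stays:

// Inside data(), replacing the chunked mmap lookups above (a sketch, not the demo's code)
const int length = static_cast<int>(end_i - pos_i);
char *buffer = (char*) alloca(length + 1);
m_file->seek(pos_i);               // absolute offset of the line in the big file
m_file->read(buffer, length);      // a plain read() instead of touching a mapping
buffer[length] = 0;
ret = QVariant(QString(buffer));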

But you already knew that. Right?