Software Engineer in Data

Team: CoreTech

Location: Remote

Last updated: September 23, 2024

Summary

At Kamu, we are developing a novel Web3 technology that, similar to the invention of the SQL database 40 years ago, will write a new chapter in humanity’s transition towards a data economy.

This is your opportunity to join an ambitious early-stage startup that has already secured funding, and work on a technology that will shape the future of data science from a place of relative financial stability.

About us

Kamu is building a unique decentralized network for the exchange and collaborative processing of structured data (whitepaper). Think of it as GitHub on top of a decentralized database, where people and organizations can share near real-time data streams, and data scientists can collaboratively compose them with SQL into high-quality data products ready for use by data-centric apps and for AI/ML. The network guarantees that all data is 100% auditable and verifiable and brings superior automation, accountability, and transparency to the data flows that underpin our society.

Kamu is backed by multiple investors and companies including Protocol Labs (the creators of IPFS and Filecoin) and Dell Technologies.

We are:

A distributed, multinational company with a presence in Canada, Ukraine, and Portugal
A highly technical group with decades of experience in big data, distributed software, and PhDs in AI/ML and computer science
A team that takes pride in delivering quality products and efficient workflows
Strong believers in Web3, decentralization, and personal data ownership
Open-source enthusiasts who develop technology in the open and constantly share progress with the community through publications and conferences

About you

You have sharded databases, tuned replication, and dived into the intricacies of transaction isolation levels. You know your way around OLAP cubes and have transitioned companies from map-reduce to data lakes and fabrics. You have developed countless data APIs, optimized join performance in your favorite dataframe library, and written a few analytics engines of your own.

You have a burning passion for data … yet also a lingering sense that something is missing. You can’t help but wonder:

Why do data flows in every company resemble a Rube Goldberg machine?
Why are there hundreds of different analytical databases, while the vast majority of organizations in the world still cannot afford anything beyond Excel?
Why, two decades into the Big Data age, does everyone still struggle to keep even small data up to date and of good quality?
Why, despite the mantra of “breaking down the silos”, has enterprise data produced nothing but silos?
Why is there no reuse or collaboration in data, and why do all attempts to create cross-company data repositories turn into “data graveyards”?

You, just like us, feel that the world of data is ready for a major innovation that will shake up the status quo. So instead of continuing the “rat race” towards bigger and more performant data that benefits only big tech companies, you want to apply your skills to something that can make a real difference in the world and democratize data globally.

If this is true, you should talk to us!

Responsibilities

As a Data Engineer at Kamu, you will work on the core technologies that power our network and platform:

A stream-oriented data format for structured dynamic data that can work with conventional (S3, HDFS) and decentralized (IPFS, Arweave) storage
A metadata format that serves as a passport of data and describes every event that influenced it
A protocol for 100% verifiable, reproducible, and auditable multi-party data processing
A fleet of plug-in data processing engines
And an infrastructure that turns this technology into a novel decentralized and near real-time data lake!

Core technology stack:

Rust
Parquet
Apache Arrow
Streaming (temporal) SQL
Apache Spark, Flink, DataFusion
IPLD, IPFS, Filecoin
Ethereum blockchain

Your work will include:

Evolving the core data formats and protocols
Improving the existing data engines and integrating new ones
Building an efficient distributed processing infrastructure for running data pipelines and API queries
Designing data access APIs for ingress and egress of data
Building a federated data sharing and compute network
Integrating Kamu with 3rd-party data providers and consumers
Integrating Kamu with blockchain decoding/indexing technologies
Researching and implementing features such as:
- Privacy-preserving compute
- Fine-grained provenance
- AI/ML integration with Kamu data pipelines
Communicating your progress to users and the community
Contributing to the product documentation and automated testing

Requirements

BSc in CS or equivalent experience
6+ years of industry experience
Required skills:
- High proficiency in Rust, Java, or Scala
- Strong knowledge of SQL and database internals
- Modern data lake architecture and horizontal scaling
- Data science toolkits (Pandas, R)
- Data integration systems and patterns
- Software quality (test pyramid, CI/CD)
Bonus skills:
- Structured data formats (Parquet, Arrow)
- Stateful stream processing fundamentals
- CDC, event sourcing
- Docker, AWS, Kubernetes
- Data visualization (PowerBI, Tableau, Jupyter)
- Development methodologies (Agile, Scrum)
- Open source collaboration
- Blockchain indexing and analytics (Dune, TrueBlocks)
- Decentralized storage (IPFS)
Good written English and the ability to write clear documentation

What we offer

🤙 Remote work with flexible hours
💵 Competitive salary, equity
💻 $1,500 home office equipment stipend
🏖️ 21 days of paid vacation per year
✈️ Conference travel and education budget

Application process

Technical screening [40m]
Chat with one of the founders [40m]
Online interview [90m]

Apply Now

Send your CV to join@kamu.dev

All applications are reviewed by a human

🇺🇦✊ We stand with Ukraine and employ refugees and people in free and occupied territories. Ukrainian applicants can expect:

Accelerated recruitment process
Interview in their native language
Home office equipment support
Relocation support