FAQ

 

Streamly Q and A


General Questions:

Q:
What is Streamly?
A:

Streamly is an integrated real-time data processing solution. The keywords here are “integrated” and “solution”: Streamly is not just a stream processing engine such as Apache Storm, Spark Streaming, or Apache Flink, but a set of complementary applications glued together to cover your stream processing needs end to end. These applications are Apache Kafka, MQTT, Spark, Elasticsearch, Zeppelin, Kibana, and Cassandra. You can think of it as a public version of the Netflix Keystone Pipeline.

Q:
What are the benefits of Streamly for hosting my applications?
A:

The main purpose of Streamly is to lower the barrier to stream processing and analytics. While all the tools used in Streamly are open source, ensuring QoS, availability, operability, and scalability on your own is a herculean task. Today, even a minimum viable stream processing environment takes weeks or months to get production ready, and the cost of owning such an environment is high. Streamly’s goal is to provide a low-cost environment for stream processing.

 

Q:
How does Streamly work?
A:

Streamly does not change the way you already build stream processing applications with Spark Streaming. Instead, it simply provides an environment where you can do what you already know how to do.

One of the first steps in building your stream processing application is naturally to identify the tools with which it has to interface. Once you have, you can create the necessary resources in your Streamly workspace. For instance, if your Spark application uses Cassandra, create a Cassandra keyspace and tables as you normally would. If it uses Elasticsearch, create the Elasticsearch index in Streamly.

After creating all these resources, you can proceed to configure your application accordingly and deploy it onto Spark within Streamly.
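
To make this concrete, here is a minimal sketch (not a definitive template) of a Spark Streaming job that consumes a Kafka topic and writes to a Cassandra table, assuming the spark-streaming-kafka-0-10 and spark-cassandra-connector libraries are bundled with the application. The topic, broker, keyspace, table, and connection host below are placeholders for the resources you would create in your workspace:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import com.datastax.spark.connector._

    // Rows written to the Cassandra table created beforehand in the workspace.
    case class Reading(device: String, value: Double)

    object StreamingJob {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("streamly-sketch")
          // Placeholder: point this at the Cassandra cluster of your workspace.
          .set("spark.cassandra.connection.host", "cassandra.example.com")
        val ssc = new StreamingContext(conf, Seconds(10))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "kafka.example.com:9092", // placeholder broker
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "readings-consumer"
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent,
          Subscribe[String, String](Seq("readings-topic"), kafkaParams))

        // Parse simple "device,value" records and persist them to the
        // pre-created Cassandra table.
        stream.map(_.value)
          .flatMap { line =>
            line.split(",") match {
              case Array(device, value) => Seq(Reading(device, value.toDouble))
              case _                    => Seq.empty[Reading]
            }
          }
          .foreachRDD(_.saveToCassandra("my_keyspace", "readings",
            SomeColumns("device", "value")))

        ssc.start()
        ssc.awaitTermination()
      }
    }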

Q:
But I can do all of this on AWS. What is different?
A:

AWS provides a lot of great tools: Kinesis, Dynamo, Elasticsearch, Spark, etc. So you can indeed deploy your stream processing application on AWS. However, to deploy your Spark Streaming application there, you must create a Spark cluster that you own and operate yourself.

Similarly, you will have to deploy tools such as Zeppelin and Kibana yourself. In addition, tools such as Kinesis and Dynamo tie you to AWS. For these reasons, and because of the significant AWS costs, we believe you should think carefully before deploying your stream processing application on AWS.

 

Q:
When should I use Streamly?
A:

If you want to build a streaming application and you do NOT want to spend a huge amount of time setting up and operating a gazillion tools, then Streamly is the platform you should use.

Q:
What are some Use Cases that can be solved using Streamly?
A:

The use cases that can be solved by Streamly are traditional stream processing use cases. They include IoT, sensor networks, log analysis, clickstream processing, ad tracking, monitoring of pumps and rigs, fast data aggregation, and more.

 

Q:
Can I use Streamly with my on-premises resources?
A:

Yes. Streamly is available as an on-premises solution as well. The Streamly deployment, configuration, and integration procedure is fully automated using Ansible. Get in touch and we will be happy to help.

Q:
How does Streamly integrate with our enterprise environment?
A:

Streamly integrates using LDAP and Kerberos. Because Streamly is a multi-tenant solution, different groups within the organization can create their own private workspaces and work alongside each other without conflict.

 

Q:
How do I access my streams?
A:

You bring your streams to Streamly by publishing your data to either MQTT or Kafka. Your Spark application can then consume the data from those topics. The Kafka and MQTT topics can be secured or unsecured. You can define the security configuration in your Streamly dashboard.
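
As an illustration, here is a minimal sketch of publishing records to a Kafka topic with the standard kafka-clients producer; the broker address and topic name are placeholders, and any security properties would mirror the configuration you defined in the dashboard:

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object Publisher {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "kafka.example.com:9092") // placeholder broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        // For a secured topic, add the SSL/SASL properties that match your dashboard settings.

        val producer = new KafkaProducer[String, String](props)
        producer.send(new ProducerRecord[String, String]("readings-topic", "device-1", "device-1,42.0"))
        producer.close()
      }
    }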

Q:
What kind of skill sets are required to deploy applications on Streamly?
A:

You can submit any Java or Scala application on Streamly. It is not limited to streaming applications. You are free to use any framework and library that you choose.

 

Q:
What happens to data written to standard out?
A:

Your Spark application should route standard out and standard error through Logback or Log4j. You can see examples in our GitHub repository. From there, standard out is redirected to the logger and pushed to Elasticsearch.
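
For instance, a minimal sketch using SLF4J (backed by Logback or Log4j) instead of println looks like this:

    import org.slf4j.LoggerFactory

    object LoggingExample {
      private val log = LoggerFactory.getLogger(getClass)

      def main(args: Array[String]): Unit = {
        // Use the logger rather than println/System.out so the output ends up
        // in Elasticsearch with the rest of your application logs.
        log.info("Application started")
        log.error("Something went wrong", new RuntimeException("example"))
      }
    }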

Q:
Do I need to push my source code to Streamly?
A:

No. Submit your bundled jar file using the Streamly dashboard. No source code is needed.

 

Q:
Can I use build systems other than Maven?
A:

Yes. You can use any other Java build tool to create your executable jar file. Make sure you test your Spark application locally before deploying it to Streamly. If it runs on your local Spark instance, then it should be fine on Streamly.
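
For example, if you build with sbt, a minimal setup using the sbt-assembly plugin to produce the executable (fat) jar could look like this; the versions shown are illustrative and should match what you run locally:

    // project/plugins.sbt
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")

    // build.sbt
    name := "streamly-sketch"
    version := "0.1.0"
    scalaVersion := "2.11.8"

    libraryDependencies ++= Seq(
      // Spark is already available on the cluster, so mark it "provided"
      // to keep the bundled jar well under the 200 MB upload limit.
      "org.apache.spark" %% "spark-streaming" % "2.1.0" % "provided",
      "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.1.0"
    )

Running sbt assembly then produces a single jar under target/ that you can upload through the dashboard.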

Q:
What constraints should I be aware of when submitting applications on Streamly?
A:

Here are some things to keep in mind when deploying applications on Streamly.

  • The size limit of files you can upload is 200 MB.
  • Any data written to the file system is potentially available to any other application. In general, avoid writing to the file system; write to the database or to Elasticsearch instead (see the sketch below).
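
For instance, a batch of results can be written straight to an Elasticsearch index with the elasticsearch-spark connector instead of the local file system. This is only a sketch: the index name, fields, and connection setting are placeholders for what you configure in your workspace:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.elasticsearch.spark._

    object EsWriteSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("es-write-sketch")
          .set("es.nodes", "elasticsearch.example.com") // placeholder endpoint
        val sc = new SparkContext(conf)

        // Each Map becomes one document in the target index.
        val docs = sc.makeRDD(Seq(
          Map("device" -> "device-1", "value" -> 42.0),
          Map("device" -> "device-2", "value" -> 17.5)
        ))
        docs.saveToEs("readings/reading") // "index/type" target; names are illustrative
      }
    }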

 

Q:
Can I access Streamly tools externally?
A:

The Kafka and MQTT services are externally reachable, but the others are not.

Q:
How can I upload files to Spark driver/executor?
A:

When you upload a file on Streamly for a specific application, Streamly distributes it across all Spark nodes.
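
Assuming the file is shipped through Spark's standard file-distribution mechanism (the equivalent of spark-submit --files), which is one plausible way to picture it, the file can then be read by name wherever the code runs; the file name below is hypothetical:

    import org.apache.spark.SparkFiles
    import scala.io.Source

    // Hypothetical example: read an uploaded "lookup.csv" on the driver or an executor.
    val path  = SparkFiles.get("lookup.csv")
    val lines = Source.fromFile(path).getLines().toList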

 

Q:
What are the best use cases for Streamly?
A:

There are many use cases for Streamly. We have listed a few on our Use Cases page.

Q:
Where can I get some help?
A:

If you have any questions or problems when using Streamly, please first check this FAQ. If you do not find the answer to your question here, you can post it on the mailing list. And, of course, you can always contact us, especially if your question is confidential. We would be delighted to help where possible.

 

Q:
How can I migrate data from my existing Elasticsearch (or Cassandra) cluster to my new Streamly account?
A:

Get in touch and we will work out a solution.

 

Q:
Are there limits to the number of topics (keyspaces, tables, applications, indices)?
A:

There is currently no limit on these resources.

Q:
How can I increase the limit on the number of my Elasticsearch indices?
A:

There is no limit on the number of Elasticsearch indices.

 

Q:
Is it possible to delete an account on Streamly?
A:

We would be sorry to see you go, but shoot us a message via the Contact Us form and we will close your account.

Q:
How can I report an error?
A:

Log into the Streamly Dashboard and use the feedback form to send us your thoughts. Alternatively, use the Contact Us form. We welcome any input.

Getting Started:

Q:
How do I get started with Streamly?
A:

You can get started with Streamly in less than 10 minutes by following the steps described in the Developers section.

Q:
What languages does Streamly support?
A:

Streamly applications can be written in Java, Scala, and Python.

 

Q:
Is Streamly compatible with all Java, Scala and Python versions?
A:

Streamly is compatible with Java 1.7 and 1.8, Scala 2.10 and 2.11, and Python 2 and 3.

Q:
Does Streamly work with all Apache Spark versions?
A:

Streamly was developed and tested with Spark 2.0.0 and 2.1.0, and Hadoop 2.7.

 

Q:
Are there any constraints on using the core Java APIs?
A:

No. Your applications are executed using a recent version of JDK 8 with no modifications.

Q:
How do I specify which JDK I would like my application to use?
A:

You can specify the Java version from within your application, by configuring the maven-compiler-plugin in your pom.xml.
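
For example, a pom.xml fragment pinning the application to Java 1.8 might look like this (the plugin version is illustrative):

    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-compiler-plugin</artifactId>
          <version>3.6.1</version>
          <configuration>
            <source>1.8</source>
            <target>1.8</target>
          </configuration>
        </plugin>
      </plugins>
    </build>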

Deployment:

Q:
Can I run Streamly on my own data center/cluster?
A:

Absolutely. Streamly comes with an Ansible-based provisioning framework. As a result, the whole solution can be deployed in any data center, in the cloud, or on-premises. Many companies use their dedicated Streamly deployment as an internal Stream Processing as a Service platform.

Q:
Does Streamly provide a command line interface?
A:

No. This feature is not yet available.

 

Q:
Are my streams isolated during deployment on my account?
A:

Yes. Streamed data flows through EMQTT or Kafka topics, which you can configure as secured in your Streamly dashboard. Once you do, no one else can interfere with your streams.

Q:
Can I deploy a Spark application with a database other than Cassandra?
A:

Yes, you can use any database of your choice with your applications. However, Streamly only hosts Cassandra, so you will have to host any other database elsewhere.
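
As a sketch, assuming you host a PostgreSQL instance outside Streamly and bundle its JDBC driver in your application jar, writing results from Spark could look like this; the host, database, table, and credentials are placeholders:

    import java.util.Properties
    import org.apache.spark.sql.DataFrame

    // Appends a DataFrame of results to a table in an externally hosted database.
    def writeToExternalDb(results: DataFrame): Unit = {
      val props = new Properties()
      props.setProperty("user", "app_user")         // placeholder credentials
      props.setProperty("password", "app_password")
      results.write
        .mode("append")
        .jdbc("jdbc:postgresql://external-db.example.com:5432/metrics", "aggregates", props)
    }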

Performance:

Q:
How good is Streamly’s performance?
A:

Streamly supports high availability. Every service is clustered and configured so that your applications are scalable, fault tolerant, and highly available. We continue to tune Streamly for high throughput and low latency. Please contact us if you want to know more.

Q:
What does high-performance processing mean?
A:

High performance in the context of stream processing has two dimensions: latency and throughput. Latency measures how long it takes to process an event flowing through the data stream; throughput measures how many events can be processed per unit of time. Different stream processing scenarios have different requirements along these two dimensions, but in general, lower latency and higher throughput are the hallmarks of a high-performance stream processing system.

 

Backup and Restore:

Q:
Should I back up my application data and streams?
A:

There is no backup and restore service available in Streamly yet. Make sure your Elasticsearch indices and Cassandra keyspaces have a replication factor greater than 1.

Q:
If my streaming process is interrupted, what happens next?
A:

If your stream process is interrupted, you can check your application logs in Kibana/Elasticsearch to understand what happened. You can also restart your application.

 

Q:
Can I share my Dashboard with another account?
A:

No. Dashboards are not shared across accounts. When creating an account on Streamly you get access to all resources within your workspace and nothing else.

Q:
Will I be billed for sharing?
A:

No. Dashboards are not shared across accounts, which means you can’t be billed for an unavailable feature.

 

Q:
Can I recover a deleted topic (index, table, …)?
A:

No. Delete actions are not reversible.

Security:

Q:
Can I authenticate other users?
A:

No.

Q:
How secure is Streamly, given that it is multi-tenant?
A:

In Streamly, all services are secured and each user works within his or her own workspace. Stream data is not shared across workspaces unless your topics are unsecured. Streamly uses a combination of security mechanisms to protect your data: Search Guard, Shiro, Kerberos, LDAP, as well as many security protocols built into tools such as Kafka, Cassandra, etc.

 

Q:
Are my Streams safe?
A:

Yes. All stream operations are performed in a way that keeps each user’s data secure.

Q:
Where is Streamly hosted?
A:

Streamly is currently hosted in private data centers in North America.

 

Q:
What industrial security controls and practices does Streamly follow?
A:

Streamly uses a combination of security mechanisms to protect your data: Search Guard, Shiro, Kerberos, LDAP, and JAAS, as well as many security protocols built into tools such as Kafka, Cassandra, etc.

Q:
Where do I find more information about Streamly Security?
A:

For more information on Streamly Security, please send an email to info@streamly.io.

Pricing and Billing:

Q:
How am I charged for using Streamly?
A:

Streamly is currently in beta and free to use. We intend to introduce billing in the future, and your feedback on pricing will be appreciated. For now, you can make use of all the features Streamly provides for free. So why not spread the word?