Streamly Q and A
Streamly is an integrated real-time data processing solution. The keywords here are “integrated” and “solution.” These two words emphasize that Streamly is not only a stream processing engine such as Apache Storm, Spark Streaming, or Apache Flink, but a set of complementary applications glued together to form a solution for your stream processing needs. These applications are Apache Kafka, MQTT, Spark, Elasticsearch, Zeppelin, Kibana, and Cassandra. You can think of it as a public version of the Netflix Keystone pipeline.
The main purpose of Streamly is to lower the barrier to stream processing and analytics. While all the tools used in Streamly are open source, it is a herculean task to ensure QoS, availability, operability, and scalability across them. Today, even a minimum viable stream processing environment takes weeks or even months to get production ready, and the cost of ownership of such environments is huge. Streamly’s goal is to provide a low-cost environment for stream processing.
Streamly does not change the way you are used to building stream processing applications with Spark streaming. Instead, Streamly simply provides you with an environment where you can do what you already know how to do.
One of the first steps in building your stream processing application is naturally to identify the tools with which it has to interface. Once you do, you can create the necessary resources in your Streamly workspace. For instance, if your Spark application uses Cassandra, you create the Cassandra keyspace and tables as you normally would. If it uses Elasticsearch, you create the Elasticsearch index in Streamly.
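As an illustration, creating the Cassandra resources is plain CQL. The keyspace and table below are hypothetical (a sensor-readings workload); the names and schema are not part of Streamly itself:

```sql
-- Hypothetical keyspace and table for sensor readings.
-- Names (iot, readings) and replication settings are illustrative.
CREATE KEYSPACE IF NOT EXISTS iot
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

CREATE TABLE IF NOT EXISTS iot.readings (
  sensor_id text,
  ts        timestamp,
  value     double,
  PRIMARY KEY (sensor_id, ts)
);
```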
After creating all these resources, you can proceed to configure your application accordingly and deploy it onto Spark within Streamly.
AWS provides a lot of great tools: Kinesis, DynamoDB, Elasticsearch, Spark, and so on. So you can indeed deploy your stream processing application on AWS. However, you will notice that to deploy your Spark Streaming application there, you must create a Spark cluster, own it, and operate it yourself.
Similarly, you will have to deploy tools such as Zeppelin and Kibana yourself. In addition, tools such as Kinesis and DynamoDB tie you to AWS. All these reasons, plus the substantial AWS costs, make us believe that you should think carefully before deploying your stream processing application on AWS.
If you want to build a streaming application and you do NOT want to spend a huge amount of time setting up and operating a gazillion tools, then Streamly is the platform you should use.
The use cases that can be solved by Streamly are traditional stream processing use cases. They include IoT, sensor networks, log analysis, clickstream processing, ad tracking, monitoring of pumps and rigs, fast data aggregation, and more.
Yes. Streamly is available as an on-premises solution as well. The Streamly deployment, configuration, and integration procedure is fully automated using Ansible. Get in touch and we will be happy to help.
Streamly integrates with LDAP and Kerberos. Because Streamly is a multi-tenant solution, different groups within an organization can create their own private workspaces and work alongside one another without conflict.
You bring your streams to Streamly by publishing your data to either MQTT or Kafka. Your Spark application can then consume the data from those topics. The Kafka and MQTT topics can be secured or unsecured. You can define the security configuration in your Streamly dashboard.
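As an illustration, here is a minimal Scala sketch of a Spark Streaming application consuming a Kafka topic with the Kafka 0.10 direct stream API. The broker address, group id, and topic name are placeholders; take the real values and any security settings from your Streamly dashboard:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

object KafkaConsumerSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-consumer-sketch")
    val ssc  = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker-host:9092",   // placeholder
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "my-consumer-group"   // placeholder
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("my-topic"), kafkaParams)
    )

    // Count events per batch as a stand-in for real processing logic.
    stream.map(_.value).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```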
You can submit any Java or Scala application on Streamly. It is not limited to streaming applications. You are free to use any framework and library that you choose.
Your Spark application should redirect standard output and standard error to Logback or Log4j. You can see examples in our GitHub repository. From there, standard output is redirected to the logger and pushed to Elasticsearch.
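One hypothetical way to wire this up, assuming an SLF4J-backed logger (Logback or Log4j); the `LogRedirect` object and its names are illustrative, not a Streamly API:

```scala
import java.io.{OutputStream, PrintStream}
import org.slf4j.LoggerFactory

object LogRedirect {
  private val log = LoggerFactory.getLogger(getClass)

  // Redirect System.out / System.err so that stray println output
  // also ends up in the logger (and, on Streamly, in Elasticsearch).
  def install(): Unit = {
    System.setOut(new PrintStream(lineLogger((s: String) => log.info(s)), true))
    System.setErr(new PrintStream(lineLogger((s: String) => log.error(s)), true))
  }

  // Buffer bytes until a newline, then emit the line to the logger.
  private def lineLogger(emit: String => Unit): OutputStream =
    new OutputStream {
      private val buf = new StringBuilder
      override def write(b: Int): Unit =
        if (b == '\n') { emit(buf.toString); buf.clear() }
        else buf.append(b.toChar)
    }
}
```

Call `LogRedirect.install()` early in your application's `main` method, before any output is produced.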
No. Submit your bundled jar file using the Streamly dashboard. No source code is needed.
Yes. You can use any other Java build process to create your executable jar file. Make sure you test your Spark application locally before deploying it to Streamly. If it runs on your local Spark instance, it should run fine on Streamly.
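A typical local test cycle is sketched below; the main class and jar path are placeholders for your own build output:

```
# Build the bundled jar, then run it against a local Spark instance.
sbt assembly                                  # or: mvn package
spark-submit \
  --class com.example.MyApp \
  --master "local[2]" \
  target/scala-2.11/my-app-assembly-0.1.jar
```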
Here are some things to keep in mind when deploying applications on Streamly.
- The size limit of files you can upload is 200MB.
- Any data written to the file system is potentially available to any other application. In general, avoid writing to the file system. Write instead either to the database or to Elasticsearch.
Kafka and MQTT services are externally reachable; the other services are not.
When you upload a file on Streamly for a specific application, Streamly distributes it across all Spark nodes.
There are many use cases for Streamly. We have listed a few in our Use Cases page.
If you have any questions or problems when using Streamly, please first check our frequently asked questions. If you do not find the answer in the FAQ, you can post your question to the mailing list. And, of course, you can always contact us, especially if your question is confidential. We would be delighted to help where possible.
Get in touch and we will work out a solution.
There is currently no limit on these resources.
There is no limit on the number of Elasticsearch indices.
We would be sorry to see you go. But, shoot us a message via the Contact Us form and we will close your account.
Log into the Streamly dashboard and use the feedback form to send us your thoughts. Alternatively, use the Contact Us form. We welcome any input.
You can get started with Streamly in less than 10 minutes by following the steps described in the Developers section.
Streamly applications can be written in Java, Scala, and Python.
Streamly is compatible with Java 1.8 and 1.7, Scala 2.11 and 2.10, and Python 2 and 3.
Streamly was developed and tested with Spark 2.1.0 and 2.0.0 and Hadoop 2.7.
No. Your applications are executed using a recent version of JDK 8 with no modifications.
You can specify your Java version from within your application by configuring the maven-compiler-plugin in your pom.xml.
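For example, a pom.xml fragment along these lines pins the compiler to Java 1.8 (the plugin version shown is illustrative):

```xml
<!-- Hypothetical pom.xml fragment pinning the Java version to 1.8 -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <version>3.6.1</version>
      <configuration>
        <source>1.8</source>
        <target>1.8</target>
      </configuration>
    </plugin>
  </plugins>
</build>
```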
Absolutely. Streamly comes with an Ansible-based provisioning framework. As a result, the whole solution can be deployed in any data center, in the cloud, or on-premises. Many companies use their dedicated Streamly deployment as an internal Stream Processing as a Service platform.
No. This feature is not yet available.
Yes. Streamed data flows via EMQTT or Kafka topics, which you can configure as secured in your Streamly dashboard. Once secured, no one else can interfere with them.
Yes, your applications can use any database of your choice. However, Streamly only hosts Cassandra, so you will have to host any other database elsewhere.
Streamly supports high availability. Every service is clustered and configured to provide scalable, fault-tolerant, and highly available applications. We continue to tune Streamly for high throughput and low latency. Please contact us if you want to know more.
High performance in the context of stream processing has two dimensions: the latency and the throughput. The latency determines how long it takes to process events that flow in the data stream. The throughput determines how many events can be processed in a unit of time. Different stream processing scenarios have different requirements along these two dimensions. In general, however, lower latency and higher throughput are the hallmarks of high-performance stream processing systems.
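As a back-of-the-envelope illustration of the two dimensions (the numbers below are made up), consider a micro-batch system such as Spark Streaming:

```scala
// Made-up figures for one micro-batch, for illustration only.
val batchIntervalSec = 10.0      // each micro-batch covers 10 s of events
val eventsPerBatch   = 1200000L  // events arriving in that window

// Throughput: events processed per unit of time.
val throughput = eventsPerBatch / batchIntervalSec  // 120,000 events/s

// Latency: an event can wait up to one full batch interval before it
// is processed, so end-to-end latency here is at least ~10 s.
// Shrinking the interval lowers latency but gives each batch less
// time to keep up, which can cap throughput -- hence the trade-off.
```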
Backup and Restore:
Streamly does not yet offer a backup and restore service. Make sure your Elasticsearch indices and Cassandra keyspaces have a replication factor of at least 1.
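For Elasticsearch, replicas can be added to an existing index with a settings update; the index name and host below are illustrative. For Cassandra, the replication factor is part of the keyspace definition.

```
# Hypothetical example: give an existing Elasticsearch index one replica.
curl -XPUT 'http://elasticsearch-host:9200/events/_settings' -d '
{
  "index": { "number_of_replicas": 1 }
}'
```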
If your stream process is interrupted, you can check your application logs in Kibana/Elasticsearch to understand what happened. You can also restart your application.
No. Dashboards are not shared across accounts. When you create an account on Streamly, you get access to all resources within your workspace and nothing else.
No. Dashboards are not shared across accounts, which means you cannot be billed for a feature that is not available to you.
No. Delete actions are not reversible.
In Streamly, all services are secured and each user works in their own workspace. Stream data is not shared across workspaces unless your topics are unsecured. Streamly uses a combination of security mechanisms to protect your data: Search Guard, Shiro, Kerberos, and LDAP, as well as the security protocols built into tools such as Kafka and Cassandra.
Yes. All stream operations are performed in such a way that each user’s data remains secure.
Streamly is currently hosted in private data centers in North America.
For more information on Streamly Security, please send an email to firstname.lastname@example.org
Pricing and Billing:
Streamly is currently in beta and free. We intend to introduce billing in the future, and your feedback on pricing will be appreciated. For now, you can make use of all the features Streamly provides for free. So why not spread the word?